fix: harden webhook reliability, fix bugs, add test suite

- Statuspage webhook always returns 200 so Statuspage never removes the webhook subscription
- Fix parseKvKey returning string chatId instead of number
- Queue consumer retries on Telegram 5xx instead of acking (prevents message loss)
- Fix observability top-level enabled flag (false → true)
- Add defensive null checks for webhook payload body
- Cache Bot instance per isolate to avoid middleware rebuild per request
- Add vitest + @cloudflare/vitest-pool-workers with 31 tests
- Document DLQ and KV sharding as declined features
2026-04-09 10:29:30 +07:00
parent bb8f4dcde8
commit 8c993df72b
15 changed files with 1680 additions and 57 deletions


@@ -13,7 +13,10 @@ Telegram bot that forwards [status.claude.com](https://status.claude.com/) (Atla
 - `npx wrangler deploy --dry-run` — Verify build without deploying
 - `node scripts/setup-bot.js` — One-time: register bot commands + set Telegram webhook (interactive prompts)
-No test framework configured yet. No linter configured.
+- `npm test` — Run tests (vitest + @cloudflare/vitest-pool-workers, runs in Workers runtime)
+- `npm run test:watch` — Run tests in watch mode
+No linter configured.
 ## Secrets (set via `wrangler secret put`)


@@ -78,7 +78,23 @@ Ordered by likelihood of future implementation (top = most likely to revisit).
 **Why this rank**: Out of scope. The bot is the product — adding a web frontend changes the project's nature.
-### 10. Digest / Quiet Mode
+### 10. Dead Letter Queue for Failed Messages
+**Idea**: After CF Queues exhausts 3 retries, persist failed messages to KV or a dedicated DLQ for debugging.
+**Decision**: Skip. CF Workers already logs all queue consumer errors (including final retry failures) via the observability config. With 100% log sampling and persisted invocation logs, failed messages are visible in the Cloudflare Dashboard. Adding a KV-based DLQ introduces write overhead on every failure and cleanup logic for stale entries — not worth it when logs already provide the same visibility.
+**Why this rank**: Logging is sufficient for current scale. Revisit only if log retention (3-day free tier) is too short for debugging patterns.
+### 11. KV List Scalability (Subscriber Sharding)
+**Idea**: Shard subscriber keys by event type (e.g., `sub:incident:{chatId}`, `sub:component:{chatId}`) to avoid listing all subscribers on every webhook.
+**Decision**: Skip. Current `kv.list({ prefix: "sub:" })` pagination works for hundreds of subscribers. Sharding requires a KV schema migration, dual-write logic during transition, and doubles storage for subscribers who want both types. Not justified until `kv.list()` latency or cost becomes measurable.
+**Why this rank**: Clear trigger: slow webhook response times at high subscriber counts. Migration path is straightforward when needed.
+### 12. Digest / Quiet Mode
 **Idea**: Batch notifications into a daily summary instead of instant alerts.
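The sharding decision above rests on `kv.list({ prefix: "sub:" })` paging through every subscriber key on each webhook. A minimal sketch of that cursor loop follows, assuming the standard Workers KV `list()` result fields (`keys`, `list_complete`, `cursor`); `listAllKeys` and the in-memory `fakeKv` stub are hypothetical helpers for illustration, not code from this repository.

```javascript
// Hypothetical helper: collect every key name under a prefix by following
// the KV list cursor until the namespace reports list_complete.
async function listAllKeys(kv, prefix) {
  const names = [];
  let cursor;
  do {
    // Each call returns one page (Workers KV defaults to at most 1000 keys).
    const page = await kv.list({ prefix, cursor });
    for (const key of page.keys) names.push(key.name);
    cursor = page.list_complete ? undefined : page.cursor;
  } while (cursor);
  return names;
}

// In-memory stand-in for a KV namespace (illustration only).
// Returns sorted matching keys in pages of `pageSize`; the cursor is an offset.
function fakeKv(keyNames, pageSize) {
  return {
    async list({ prefix = "", cursor } = {}) {
      const matching = keyNames.filter((n) => n.startsWith(prefix)).sort();
      const start = cursor ? Number(cursor) : 0;
      const end = start + pageSize;
      const done = end >= matching.length;
      return {
        keys: matching.slice(start, end).map((name) => ({ name })),
        list_complete: done,
        cursor: done ? undefined : String(end),
      };
    },
  };
}
```

Each page is a separate `list()` call, so the number of calls per webhook grows with the subscriber count; that growth is the latency and cost signal the decision names as its trigger for revisiting.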


@@ -56,7 +56,7 @@ A middleware in `index.js` normalizes double slashes in URL paths (Statuspage oc
 | File | Lines | Responsibility |
 |------|-------|---------------|
 | `index.js` | ~30 | Hono router, path normalization middleware, export handlers |
-| `bot-commands.js` | ~145 | `/start`, `/stop`, `/subscribe` — subscription management |
+| `bot-commands.js` | ~155 | `/start`, `/stop`, `/subscribe` — subscription management (cached Bot instance) |
 | `bot-info-commands.js` | ~125 | `/help`, `/status`, `/history`, `/uptime` — read-only info |
 | `statuspage-webhook.js` | ~85 | Webhook validation, event parsing, subscriber fan-out |
 | `queue-consumer.js` | ~65 | Batch message delivery, retry/removal logic |
@@ -94,6 +94,7 @@ Binding: `claude-status` queue
 - **Batch size**: 30 messages per consumer invocation
 - **Max retries**: 3 (configured in `wrangler.jsonc`)
 - **429 handling**: `msg.retry()` with CF Queues backoff; `Retry-After` header logged
+- **5xx handling**: `msg.retry()` for transient Telegram server errors
 - **403/400 handling**: subscriber removed from KV, message acknowledged
 - **Network errors**: `msg.retry()` for transient failures
@@ -108,6 +109,7 @@ Enabled via `wrangler.jsonc` `observability` config. Automatic — no code chang
 ## Security
+- **Statuspage webhook always-200**: Handler always returns HTTP 200 (even on errors) to prevent Statuspage from removing the webhook subscription. Errors are logged, not surfaced as HTTP status codes.
 - **Statuspage webhook auth**: URL path secret validated with timing-safe SHA-256 comparison
 - **Telegram webhook**: Registered via `setup-bot.js` — Telegram only sends to the registered URL
 - **No secrets in code**: `BOT_TOKEN` and `WEBHOOK_SECRET` stored as Cloudflare secrets
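The always-200 rule can be expressed as a wrapper that turns any failure into a logged `200 OK`. A minimal sketch using the standard Fetch API `Request`/`Response` classes (available in Workers and Node 18+); `alwaysOk` is a hypothetical name, and the committed handler instead inlines a `try`/`catch` inside the Hono route.

```javascript
// Hypothetical wrapper: run a webhook handler, but never let an error or a
// non-2xx status reach the sender. Failures are logged and answered with 200.
function alwaysOk(handler, log = console.error) {
  return async (request) => {
    try {
      const res = await handler(request);
      if (res.ok) return res; // 200-299 passes through unchanged
      log(`webhook: handler returned ${res.status}, replying 200`);
    } catch (err) {
      log("webhook: unexpected error", err);
    }
    return new Response("OK", { status: 200 });
  };
}
```

One consequence of this design is that a wrong secret or a malformed payload also gets a 200, so a misconfigured webhook URL fails silently and the logs are the only signal.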

package-lock.json (generated; diff suppressed because it is too large)


@@ -6,7 +6,9 @@
   "type": "module",
   "scripts": {
     "dev": "wrangler dev",
-    "deploy": "wrangler deploy"
+    "deploy": "wrangler deploy",
+    "test": "vitest run",
+    "test:watch": "vitest"
   },
   "repository": {
     "type": "git",
@@ -24,6 +26,8 @@
     "hono": "^4.12.12"
   },
   "devDependencies": {
+    "@cloudflare/vitest-pool-workers": "^0.14.2",
+    "vitest": "^4.1.3",
     "wrangler": "^4.81.0"
   }
 }


@@ -8,6 +8,13 @@ import {
 } from "./kv-store.js";
 import { fetchComponentByName, escapeHtml } from "./status-fetcher.js";
 import { registerInfoCommands } from "./bot-info-commands.js";
+/**
+ * Module-level KV reference, updated each request.
+ * Safe because CF Workers are single-threaded per isolate.
+ */
+let kv = null;
 /**
  * Extract chatId and threadId from grammY context
  */
@@ -19,11 +26,10 @@ function getChatTarget(ctx) {
 }
 /**
- * Handle incoming Telegram webhook via grammY
+ * Create Bot with all commands registered. Called once per isolate.
  */
-export async function handleTelegramWebhook(c) {
-  const bot = new Bot(c.env.BOT_TOKEN);
-  const kv = c.env.claude_status;
+function createBot(token) {
+  const bot = new Bot(token);
   bot.command("start", async (ctx) => {
     const { chatId, threadId } = getChatTarget(ctx);
@@ -140,6 +146,29 @@ export async function handleTelegramWebhook(c) {
     );
   });
-  const handler = webhookCallback(bot, "cloudflare-mod");
-  return handler(c.req.raw);
+  return bot;
+}
+/**
+ * Cached Bot instance — avoids rebuilding middleware chain on every request.
+ * CF Workers reuse isolates, so module-level state persists across requests.
+ */
+let cachedBot = null;
+let cachedToken = null;
+let cachedHandler = null;
+/**
+ * Handle incoming Telegram webhook via grammY
+ */
+export async function handleTelegramWebhook(c) {
+  // Update module-level KV ref (same binding across requests, but kept explicit)
+  kv = c.env.claude_status;
+  if (!cachedBot || cachedToken !== c.env.BOT_TOKEN) {
+    cachedBot = createBot(c.env.BOT_TOKEN);
+    cachedToken = c.env.BOT_TOKEN;
+    cachedHandler = webhookCallback(cachedBot, "cloudflare-mod");
+  }
+  return cachedHandler(c.req.raw);
 }
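The handler above rebuilds the Bot only when the cached token differs from `c.env.BOT_TOKEN`. The same memoize-by-key idea without grammY, as a minimal sketch; `makeCachedFactory` and its `build` callback are hypothetical names.

```javascript
// Hypothetical memoizer: keep one built value per isolate and rebuild it only
// when the key (here, the bot token) changes between requests.
function makeCachedFactory(build) {
  let cachedKey = null;
  let cachedValue = null;
  return function get(key) {
    if (cachedValue === null || cachedKey !== key) {
      cachedValue = build(key);
      cachedKey = key;
    }
    return cachedValue;
  };
}
```

State held this way lives only as long as the isolate, so a cold start simply rebuilds the value on its first request.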


@@ -17,7 +17,7 @@ function parseKvKey(kvKey) {
   const lastColon = raw.lastIndexOf(":");
   // No colon or only negative sign prefix — no threadId
   if (lastColon <= 0) {
-    return { chatId: raw, threadId: null };
+    return { chatId: Number(raw), threadId: null };
   }
   // Check if the part after last colon is a valid threadId (numeric)
   const possibleThread = raw.slice(lastColon + 1);
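The fix matters because KV key names are strings: a chat ID sliced out of a key stays a string unless it is converted, and `"80001" === 80001` is `false`. The new `returns chatId as number` test checks exactly that with a strict comparison. A minimal sketch of a parser in the same spirit, assuming keys shaped like `sub:{chatId}` or `sub:{chatId}:{threadId}` (the full key format is not visible in this hunk); `parseSubscriberKey` is a hypothetical name, not the repository's `parseKvKey`.

```javascript
// Hypothetical parser for keys assumed to look like "sub:{chatId}" or
// "sub:{chatId}:{threadId}". Both IDs are returned as numbers so they
// compare equal (===) to the numeric IDs Telegram sends.
function parseSubscriberKey(kvKey, prefix = "sub:") {
  const raw = kvKey.startsWith(prefix) ? kvKey.slice(prefix.length) : kvKey;
  const lastColon = raw.lastIndexOf(":");
  // No colon: the whole remainder is the (possibly negative) chat ID.
  if (lastColon <= 0) {
    return { chatId: Number(raw), threadId: null };
  }
  const possibleThread = raw.slice(lastColon + 1);
  if (/^\d+$/.test(possibleThread)) {
    return {
      chatId: Number(raw.slice(0, lastColon)),
      threadId: Number(possibleThread), // 0 (General topic) stays 0, not null
    };
  }
  // Non-numeric suffix: treat the whole remainder as the chat ID (NaN if not numeric).
  return { chatId: Number(raw), threadId: null };
}
```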


@@ -46,6 +46,10 @@ export async function handleQueue(batch, env) {
         console.log(`Queue: rate limited for ${chatId}, Retry-After: ${retryAfter ?? "unknown"}`);
         retried++;
         msg.retry();
+      } else if (res.status >= 500) {
+        console.error(`Queue: Telegram 5xx (${res.status}) for ${chatId}, retrying`);
+        retried++;
+        msg.retry();
       } else {
         console.error(`Queue: unexpected HTTP ${res.status} for ${chatId}`);
         failed++;
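The branch order in `handleQueue` amounts to a small status-to-action table. A minimal sketch of that mapping as a pure function, which is easier to unit-test than the fetch loop; `classifyTelegramStatus` and its return labels are hypothetical, and the 400/403 removal rule comes from the documented retry policy rather than from this hunk. Statuses outside those branches are reported as `"fail"` without assuming whether the real code acknowledges or retries them.

```javascript
// Hypothetical classifier mirroring the documented delivery policy:
//   2xx           -> "ack"    (delivered)
//   429           -> "retry"  (rate limited; let CF Queues back off)
//   5xx           -> "retry"  (transient Telegram server error)
//   400 / 403     -> "remove" (drop the subscriber, then ack)
//   anything else -> "fail"   (unexpected; logged and counted as failed)
function classifyTelegramStatus(status) {
  if (status >= 200 && status < 300) return "ack";
  if (status === 429) return "retry";
  if (status >= 500 && status < 600) return "retry";
  if (status === 400 || status === 403) return "remove";
  return "fail";
}
```

Network errors never produce a status at all, so they still need their own `catch` path; the documented policy retries them.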


@@ -34,53 +34,73 @@ function formatComponentMessage(component, update) {
 }
 /**
- * Handle incoming Statuspage webhook
+ * Handle incoming Statuspage webhook.
+ * CRITICAL: Always return 200 — Statuspage removes subscriber webhooks on non-2xx responses.
  */
 export async function handleStatuspageWebhook(c) {
-  // Validate URL secret (timing-safe)
-  const secret = c.req.param("secret");
-  if (!await timingSafeEqual(secret, c.env.WEBHOOK_SECRET)) {
-    return c.text("Unauthorized", 401);
-  }
-  // Parse body
-  let body;
   try {
-    body = await c.req.json();
-  } catch {
-    return c.text("Bad Request", 400);
+    // Validate URL secret (timing-safe)
+    const secret = c.req.param("secret");
+    if (!await timingSafeEqual(secret, c.env.WEBHOOK_SECRET)) {
+      console.error("Statuspage webhook: invalid secret");
+      return c.text("OK", 200);
+    }
+    // Parse body
+    let body;
+    try {
+      body = await c.req.json();
+    } catch {
+      console.error("Statuspage webhook: invalid JSON body");
+      return c.text("OK", 200);
+    }
+    const eventType = body?.meta?.event_type;
+    if (!eventType) {
+      console.error("Statuspage webhook: missing event_type");
+      return c.text("OK", 200);
+    }
+    console.log(`Statuspage webhook: ${eventType}`);
+    // Determine category and format message
+    let category, html, componentName;
+    if (eventType.startsWith("incident.")) {
+      if (!body.incident) {
+        console.error("Statuspage webhook: incident event missing incident data");
+        return c.text("OK", 200);
+      }
+      category = "incident";
+      html = formatIncidentMessage(body.incident);
+    } else if (eventType.startsWith("component.")) {
+      if (!body.component) {
+        console.error("Statuspage webhook: component event missing component data");
+        return c.text("OK", 200);
+      }
+      category = "component";
+      componentName = body.component.name || null;
+      html = formatComponentMessage(body.component, body.component_update);
+    } else {
+      console.error(`Statuspage webhook: unknown event type ${eventType}`);
+      return c.text("OK", 200);
+    }
+    // Get filtered subscribers (with component name filtering)
+    const subscribers = await getSubscribersByType(c.env.claude_status, category, componentName);
+    // Enqueue messages for fan-out via CF Queues (batch for performance)
+    const messages = subscribers.map(({ chatId, threadId }) => ({
+      body: { chatId, threadId, html },
+    }));
+    for (let i = 0; i < messages.length; i += 100) {
+      await c.env["claude-status"].sendBatch(messages.slice(i, i + 100));
+    }
+    console.log(`Enqueued ${messages.length} messages for ${category}${componentName ? `:${componentName}` : ""}`);
+    return c.text("OK", 200);
+  } catch (err) {
+    // Catch-all: log error but still return 200 to prevent Statuspage from removing us
+    console.error("Statuspage webhook: unexpected error", err);
+    return c.text("OK", 200);
   }
-  const eventType = body?.meta?.event_type;
-  if (!eventType) return c.text("Bad Request", 400);
-  console.log(`Statuspage webhook: ${eventType}`);
-  // Determine category and format message
-  let category, html, componentName;
-  if (eventType.startsWith("incident.")) {
-    category = "incident";
-    html = formatIncidentMessage(body.incident);
-  } else if (eventType.startsWith("component.")) {
-    category = "component";
-    componentName = body.component?.name || null;
-    html = formatComponentMessage(body.component, body.component_update);
-  } else {
-    return c.text("Unknown event type", 400);
-  }
-  // Get filtered subscribers (with component name filtering)
-  const subscribers = await getSubscribersByType(c.env.claude_status, category, componentName);
-  // Enqueue messages for fan-out via CF Queues (batch for performance)
-  const messages = subscribers.map(({ chatId, threadId }) => ({
-    body: { chatId, threadId, html },
-  }));
-  for (let i = 0; i < messages.length; i += 100) {
-    await c.env["claude-status"].sendBatch(messages.slice(i, i + 100));
-  }
-  console.log(`Enqueued ${messages.length} messages for ${category}${componentName ? `:${componentName}` : ""}`);
-  return c.text("OK", 200);
 }
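The enqueue loop slices messages into groups of 100 because a single Cloudflare Queues `sendBatch()` call accepts at most 100 messages. A minimal sketch of that chunking as standalone helpers; `chunk` and `enqueueAll` are hypothetical names, and the queue object in the test is a stub rather than a real binding.

```javascript
// Hypothetical helper: split an array into consecutive slices of at most `size`.
function chunk(items, size) {
  const out = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Send every message through a queue binding, one sendBatch() call per chunk.
// An empty message list results in no calls at all.
async function enqueueAll(queue, messages, maxBatch = 100) {
  for (const batch of chunk(messages, maxBatch)) {
    await queue.sendBatch(batch);
  }
  return messages.length;
}
```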

test/crypto-utils.test.js (new file)

@@ -0,0 +1,20 @@
import { describe, it, expect } from "vitest";
import { timingSafeEqual } from "../src/crypto-utils.js";

describe("timingSafeEqual", () => {
  it("returns true for identical strings", async () => {
    expect(await timingSafeEqual("secret123", "secret123")).toBe(true);
  });

  it("returns false for different strings", async () => {
    expect(await timingSafeEqual("secret123", "wrong")).toBe(false);
  });

  it("returns false for empty vs non-empty", async () => {
    expect(await timingSafeEqual("", "something")).toBe(false);
  });

  it("returns true for both empty", async () => {
    expect(await timingSafeEqual("", "")).toBe(true);
  });
});

test/kv-store.test.js (new file)

@@ -0,0 +1,124 @@
import { describe, it, expect } from "vitest";
import { env } from "cloudflare:test";
import {
  addSubscriber,
  removeSubscriber,
  getSubscriber,
  updateSubscriberTypes,
  updateSubscriberComponents,
  getSubscribersByType,
} from "../src/kv-store.js";

// Each test uses unique chatIds to avoid cross-test interference (miniflare KV persists across tests)
describe("kv-store", () => {
  const kv = env.claude_status;

  describe("addSubscriber / getSubscriber", () => {
    it("adds subscriber with default types", async () => {
      await addSubscriber(kv, 100, null);
      const sub = await getSubscriber(kv, 100, null);
      expect(sub).toEqual({ types: ["incident", "component"], components: [] });
    });

    it("adds subscriber with threadId", async () => {
      await addSubscriber(kv, 101, 456);
      const sub = await getSubscriber(kv, 101, 456);
      expect(sub).toEqual({ types: ["incident", "component"], components: [] });
    });

    it("handles threadId=0 (General topic)", async () => {
      await addSubscriber(kv, 102, 0);
      const sub = await getSubscriber(kv, 102, 0);
      expect(sub).toEqual({ types: ["incident", "component"], components: [] });
    });

    it("preserves existing data on re-subscribe", async () => {
      await addSubscriber(kv, 103, null);
      await updateSubscriberTypes(kv, 103, null, ["incident"]);
      await addSubscriber(kv, 103, null);
      const sub = await getSubscriber(kv, 103, null);
      expect(sub.types).toEqual(["incident"]);
    });
  });

  describe("removeSubscriber", () => {
    it("removes existing subscriber", async () => {
      await addSubscriber(kv, 200, null);
      await removeSubscriber(kv, 200, null);
      const sub = await getSubscriber(kv, 200, null);
      expect(sub).toBeNull();
    });
  });

  describe("updateSubscriberTypes", () => {
    it("updates types for existing subscriber", async () => {
      await addSubscriber(kv, 300, null);
      const result = await updateSubscriberTypes(kv, 300, null, ["incident"]);
      expect(result).toBe(true);
      const sub = await getSubscriber(kv, 300, null);
      expect(sub.types).toEqual(["incident"]);
    });

    it("returns false for non-existent subscriber", async () => {
      const result = await updateSubscriberTypes(kv, 99999, null, ["incident"]);
      expect(result).toBe(false);
    });
  });

  describe("updateSubscriberComponents", () => {
    it("sets component filter", async () => {
      await addSubscriber(kv, 400, null);
      await updateSubscriberComponents(kv, 400, null, ["API"]);
      const sub = await getSubscriber(kv, 400, null);
      expect(sub.components).toEqual(["API"]);
    });
  });

  describe("getSubscribersByType", () => {
    it("filters by event type", async () => {
      // Use unique IDs unlikely to collide with other tests
      await addSubscriber(kv, 50001, null);
      await updateSubscriberTypes(kv, 50001, null, ["incident"]);
      await addSubscriber(kv, 50002, null);
      await updateSubscriberTypes(kv, 50002, null, ["component"]);
      const incident = await getSubscribersByType(kv, "incident");
      const incidentIds = incident.map((s) => s.chatId);
      expect(incidentIds).toContain(50001);
      expect(incidentIds).not.toContain(50002);
      const component = await getSubscribersByType(kv, "component");
      const componentIds = component.map((s) => s.chatId);
      expect(componentIds).toContain(50002);
      expect(componentIds).not.toContain(50001);
    });

    it("filters by component name", async () => {
      await addSubscriber(kv, 60001, null);
      await updateSubscriberComponents(kv, 60001, null, ["API"]);
      await addSubscriber(kv, 60002, null); // no component filter = all
      const results = await getSubscribersByType(kv, "component", "API");
      const ids = results.map((s) => s.chatId);
      expect(ids).toContain(60001);
      expect(ids).toContain(60002);
    });

    it("excludes non-matching component filter", async () => {
      await addSubscriber(kv, 70001, null);
      await updateSubscriberComponents(kv, 70001, null, ["Console"]);
      const results = await getSubscribersByType(kv, "component", "API");
      const ids = results.map((s) => s.chatId);
      expect(ids).not.toContain(70001);
    });

    it("returns chatId as number", async () => {
      await addSubscriber(kv, 80001, null);
      const results = await getSubscribersByType(kv, "incident");
      const match = results.find((s) => s.chatId === 80001);
      expect(match).toBeDefined();
      expect(typeof match.chatId).toBe("number");
    });
  });
});
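The `getSubscribersByType` tests imply a filtering rule: a subscriber matches when its `types` list contains the event category and, when the event carries a component name, its `components` list is either empty (meaning all components) or contains that name. A minimal sketch of that predicate over the stored `{ types, components }` record; `matchesEvent` is a hypothetical name, and applying the component filter only when a component name is present is an assumption the tests do not pin down.

```javascript
// Hypothetical predicate for one stored subscriber record.
// sub: { types: string[], components: string[] }, as returned by getSubscriber.
function matchesEvent(sub, category, componentName) {
  if (!sub || !Array.isArray(sub.types) || !sub.types.includes(category)) {
    return false;
  }
  // Assumption: the component filter only applies when the event names a component.
  if (!componentName) return true;
  const filter = Array.isArray(sub.components) ? sub.components : [];
  // An empty filter means "all components".
  return filter.length === 0 || filter.includes(componentName);
}
```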


@@ -0,0 +1,79 @@
import { describe, it, expect, vi, beforeEach } from "vitest";
import { handleQueue } from "../src/queue-consumer.js";

/**
 * Create a mock queue message with ack/retry tracking
 */
function mockMessage(body) {
  return {
    body,
    ack: vi.fn(),
    retry: vi.fn(),
  };
}

describe("handleQueue", () => {
  let env;

  beforeEach(() => {
    env = {
      BOT_TOKEN: "test-token",
      claude_status: {
        delete: vi.fn(),
      },
    };
    vi.restoreAllMocks();
  });

  it("acks on successful send", async () => {
    vi.stubGlobal("fetch", vi.fn().mockResolvedValue({ ok: true, status: 200 }));
    const msg = mockMessage({ chatId: 123, html: "<b>test</b>" });
    await handleQueue({ messages: [msg] }, env);
    expect(msg.ack).toHaveBeenCalled();
    expect(msg.retry).not.toHaveBeenCalled();
  });

  it("removes subscriber and acks on 403", async () => {
    vi.stubGlobal("fetch", vi.fn().mockResolvedValue({ ok: false, status: 403 }));
    const msg = mockMessage({ chatId: 123, threadId: null, html: "<b>test</b>" });
    await handleQueue({ messages: [msg] }, env);
    expect(msg.ack).toHaveBeenCalled();
    expect(env.claude_status.delete).toHaveBeenCalled();
  });

  it("retries on 429 rate limit", async () => {
    vi.stubGlobal(
      "fetch",
      vi.fn().mockResolvedValue({
        ok: false,
        status: 429,
        headers: new Headers({ "Retry-After": "5" }),
      })
    );
    const msg = mockMessage({ chatId: 123, html: "<b>test</b>" });
    await handleQueue({ messages: [msg] }, env);
    expect(msg.retry).toHaveBeenCalled();
    expect(msg.ack).not.toHaveBeenCalled();
  });

  it("retries on 5xx server error", async () => {
    vi.stubGlobal("fetch", vi.fn().mockResolvedValue({ ok: false, status: 502 }));
    const msg = mockMessage({ chatId: 123, html: "<b>test</b>" });
    await handleQueue({ messages: [msg] }, env);
    expect(msg.retry).toHaveBeenCalled();
    expect(msg.ack).not.toHaveBeenCalled();
  });

  it("retries on network error", async () => {
    vi.stubGlobal("fetch", vi.fn().mockRejectedValue(new Error("network fail")));
    const msg = mockMessage({ chatId: 123, html: "<b>test</b>" });
    await handleQueue({ messages: [msg] }, env);
    expect(msg.retry).toHaveBeenCalled();
  });

  it("skips malformed messages", async () => {
    const msg = mockMessage({ chatId: null, html: null });
    await handleQueue({ messages: [msg] }, env);
    expect(msg.ack).toHaveBeenCalled();
  });
});


@@ -0,0 +1,63 @@
import { describe, it, expect } from "vitest";
import {
  escapeHtml,
  humanizeStatus,
  statusIndicator,
  formatComponentLine,
  formatOverallStatus,
} from "../src/status-fetcher.js";

describe("escapeHtml", () => {
  it("escapes HTML special chars", () => {
    expect(escapeHtml('<script>"alert&"</script>')).toBe(
      "&lt;script&gt;&quot;alert&amp;&quot;&lt;/script&gt;"
    );
  });

  it("returns empty string for null/undefined", () => {
    expect(escapeHtml(null)).toBe("");
    expect(escapeHtml(undefined)).toBe("");
  });
});

describe("humanizeStatus", () => {
  it("maps known statuses", () => {
    expect(humanizeStatus("operational")).toBe("Operational");
    expect(humanizeStatus("major_outage")).toBe("Major Outage");
    expect(humanizeStatus("resolved")).toBe("Resolved");
  });

  it("returns raw string for unknown status", () => {
    expect(humanizeStatus("custom_status")).toBe("custom_status");
  });
});

describe("statusIndicator", () => {
  it("returns green check for operational", () => {
    expect(statusIndicator("operational")).toBe("\u2705");
  });

  it("returns question mark for unknown", () => {
    expect(statusIndicator("unknown_status")).toBe("\u2753");
  });
});

describe("formatComponentLine", () => {
  it("formats component with indicator and escaped name", () => {
    const line = formatComponentLine({ name: "API", status: "operational" });
    expect(line).toContain("\u2705");
    expect(line).toContain("<b>API</b>");
    expect(line).toContain("Operational");
  });
});

describe("formatOverallStatus", () => {
  it("maps known indicators", () => {
    expect(formatOverallStatus("none")).toContain("All Systems Operational");
    expect(formatOverallStatus("critical")).toContain("Critical System Outage");
  });

  it("returns raw value for unknown indicator", () => {
    expect(formatOverallStatus("custom")).toBe("custom");
  });
});
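The `escapeHtml` test expects `&`, `<`, `>` and `"` to be escaped and `null`/`undefined` to become an empty string. A minimal sketch that satisfies those expectations; `escapeHtmlSketch` is a hypothetical name, and whether the real function escapes further characters (such as `'`) is not visible in this diff. Escaping `&` first matters: doing it later would turn the `&` inside an already-produced `&lt;` into `&amp;lt;`.

```javascript
// Hypothetical escaper for text interpolated into Telegram HTML messages.
// "&" is replaced first so entities produced by later steps are not re-escaped.
function escapeHtmlSketch(value) {
  if (value === null || value === undefined) return "";
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}
```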

vitest.config.js (new file)

@@ -0,0 +1,22 @@
import { defineConfig } from "vitest/config";
import { cloudflarePool, cloudflareTest } from "@cloudflare/vitest-pool-workers";

export default defineConfig({
  plugins: [
    cloudflareTest({
      wrangler: { configPath: "./wrangler.jsonc" },
      miniflare: {
        // Override remote KV with local-only for tests
        kvNamespaces: ["claude_status"],
      },
    }),
  ],
  test: {
    pool: cloudflarePool({
      wrangler: { configPath: "./wrangler.jsonc" },
      miniflare: {
        kvNamespaces: ["claude_status"],
      },
    }),
  },
});


@@ -25,7 +25,7 @@
     ]
   },
   "observability": {
-    "enabled": false,
+    "enabled": true,
     "head_sampling_rate": 1,
     "logs": {
       "enabled": true,