fix: harden webhook reliability, fix bugs, add test suite

- Statuspage webhook always returns 200 to prevent subscriber removal
- Fix parseKvKey returning string chatId instead of number
- Queue consumer retries on Telegram 5xx instead of acking (prevents message loss)
- Fix observability top-level enabled flag (false → true)
- Add defensive null checks for webhook payload body
- Cache Bot instance per isolate to avoid middleware rebuild per request
- Add vitest + @cloudflare/vitest-pool-workers with 31 tests
- Document DLQ and KV sharding as declined features
2026-04-09 10:29:30 +07:00
parent bb8f4dcde8
commit 8c993df72b
15 changed files with 1680 additions and 57 deletions


@@ -13,7 +13,10 @@ Telegram bot that forwards [status.claude.com](https://status.claude.com/) (Atla
- `npx wrangler deploy --dry-run` — Verify build without deploying
- `node scripts/setup-bot.js` — One-time: register bot commands + set Telegram webhook (interactive prompts)
No test framework configured yet. No linter configured.
- `npm test` — Run tests (vitest + @cloudflare/vitest-pool-workers, runs in Workers runtime)
- `npm run test:watch` — Run tests in watch mode
No linter configured.
## Secrets (set via `wrangler secret put`)


@@ -78,7 +78,23 @@ Ordered by likelihood of future implementation (top = most likely to revisit).
**Why this rank**: Out of scope. The bot is the product — adding a web frontend changes the project's nature.
### 10. Digest / Quiet Mode
### 10. Dead Letter Queue for Failed Messages
**Idea**: After CF Queues exhausts 3 retries, persist failed messages to KV or a dedicated DLQ for debugging.
**Decision**: Skip. CF Workers already logs all queue consumer errors (including final retry failures) via the observability config. With 100% log sampling and persisted invocation logs, failed messages are visible in the Cloudflare Dashboard. Adding a KV-based DLQ introduces write overhead on every failure and cleanup logic for stale entries — not worth it when logs already provide the same visibility.
**Why this rank**: Logging is sufficient for current scale. Revisit only if log retention (3-day free tier) is too short for debugging patterns.
### 11. KV List Scalability (Subscriber Sharding)
**Idea**: Shard subscriber keys by event type (e.g., `sub:incident:{chatId}`, `sub:component:{chatId}`) to avoid listing all subscribers on every webhook.
**Decision**: Skip. Current `kv.list({ prefix: "sub:" })` pagination works for hundreds of subscribers. Sharding requires a KV schema migration, dual-write logic during transition, and doubles storage for subscribers who want both types. Not justified until `kv.list()` latency or cost becomes measurable.
**Why this rank**: Clear trigger: slow webhook response times at high subscriber counts. Migration path is straightforward when needed.
### 12. Digest / Quiet Mode
**Idea**: Batch notifications into a daily summary instead of instant alerts.


@@ -56,7 +56,7 @@ A middleware in `index.js` normalizes double slashes in URL paths (Statuspage oc
| File | Lines | Responsibility |
|------|-------|---------------|
| `index.js` | ~30 | Hono router, path normalization middleware, export handlers |
| `bot-commands.js` | ~145 | `/start`, `/stop`, `/subscribe` — subscription management |
| `bot-commands.js` | ~155 | `/start`, `/stop`, `/subscribe` — subscription management (cached Bot instance) |
| `bot-info-commands.js` | ~125 | `/help`, `/status`, `/history`, `/uptime` — read-only info |
| `statuspage-webhook.js` | ~85 | Webhook validation, event parsing, subscriber fan-out |
| `queue-consumer.js` | ~65 | Batch message delivery, retry/removal logic |
@@ -94,6 +94,7 @@ Binding: `claude-status` queue
- **Batch size**: 30 messages per consumer invocation
- **Max retries**: 3 (configured in `wrangler.jsonc`)
- **429 handling**: `msg.retry()` with CF Queues backoff; `Retry-After` header logged
- **5xx handling**: `msg.retry()` for transient Telegram server errors
- **403/400 handling**: subscriber removed from KV, message acknowledged
- **Network errors**: `msg.retry()` for transient failures
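
The handling rules above amount to a small decision table. A minimal sketch, assuming a hypothetical helper `classifyTelegramStatus` (not part of the source) that maps an HTTP status to the action the consumer takes; the real `queue-consumer.js` inlines this logic and may differ in detail:

```javascript
// Hypothetical helper: maps a Telegram HTTP status to a queue action.
// The action names ("ack", "retry", "remove", "fail") are illustrative only.
function classifyTelegramStatus(status) {
  if (status >= 200 && status < 300) return "ack"; // delivered
  if (status === 429) return "retry"; // rate limited, let CF Queues back off
  if (status === 400 || status === 403) return "remove"; // drop subscriber, then ack
  if (status >= 500) return "retry"; // transient Telegram server error
  return "fail"; // unexpected status, logged as an error
}

console.log(classifyTelegramStatus(502)); // "retry"
console.log(classifyTelegramStatus(403)); // "remove"
```

Network errors never produce a status code, so they would be handled in a `catch` around the request rather than in this table.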
@@ -108,6 +109,7 @@ Enabled via `wrangler.jsonc` `observability` config. Automatic — no code chang
## Security
- **Statuspage webhook always-200**: Handler always returns HTTP 200 (even on errors) to prevent Statuspage from removing the webhook subscription. Errors are logged, not surfaced as HTTP status codes.
- **Statuspage webhook auth**: URL path secret validated with timing-safe SHA-256 comparison
- **Telegram webhook**: Registered via `setup-bot.js` — Telegram only sends to the registered URL
- **No secrets in code**: `BOT_TOKEN` and `WEBHOOK_SECRET` stored as Cloudflare secrets

package-lock.json (generated, 1237 lines changed)

File diff suppressed because it is too large


@@ -6,7 +6,9 @@
"type": "module",
"scripts": {
"dev": "wrangler dev",
"deploy": "wrangler deploy"
"deploy": "wrangler deploy",
"test": "vitest run",
"test:watch": "vitest"
},
"repository": {
"type": "git",
@@ -24,6 +26,8 @@
"hono": "^4.12.12"
},
"devDependencies": {
"@cloudflare/vitest-pool-workers": "^0.14.2",
"vitest": "^4.1.3",
"wrangler": "^4.81.0"
}
}


@@ -8,6 +8,13 @@ import {
} from "./kv-store.js";
import { fetchComponentByName, escapeHtml } from "./status-fetcher.js";
import { registerInfoCommands } from "./bot-info-commands.js";
/**
* Module-level KV reference, refreshed on each request.
* Safe because every request handled by an isolate receives the same KV binding,
* so interleaved requests never see a different namespace.
*/
let kv = null;
/**
* Extract chatId and threadId from grammY context
*/
@@ -19,11 +26,10 @@ function getChatTarget(ctx) {
}
/**
* Handle incoming Telegram webhook via grammY
* Create Bot with all commands registered. Called once per isolate.
*/
export async function handleTelegramWebhook(c) {
const bot = new Bot(c.env.BOT_TOKEN);
const kv = c.env.claude_status;
function createBot(token) {
const bot = new Bot(token);
bot.command("start", async (ctx) => {
const { chatId, threadId } = getChatTarget(ctx);
@@ -140,6 +146,29 @@ export async function handleTelegramWebhook(c) {
);
});
const handler = webhookCallback(bot, "cloudflare-mod");
return handler(c.req.raw);
return bot;
}
/**
* Cached Bot instance — avoids rebuilding middleware chain on every request.
* CF Workers reuse isolates, so module-level state persists across requests.
*/
let cachedBot = null;
let cachedToken = null;
let cachedHandler = null;
/**
* Handle incoming Telegram webhook via grammY
*/
export async function handleTelegramWebhook(c) {
// Update module-level KV ref (same binding across requests, but kept explicit)
kv = c.env.claude_status;
if (!cachedBot || cachedToken !== c.env.BOT_TOKEN) {
cachedBot = createBot(c.env.BOT_TOKEN);
cachedToken = c.env.BOT_TOKEN;
cachedHandler = webhookCallback(cachedBot, "cloudflare-mod");
}
return cachedHandler(c.req.raw);
}
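
The caching pattern above can be shown in isolation from grammY. A minimal sketch, assuming a hypothetical `createCachedFactory` helper (not in the source) that rebuilds the cached value only when the key, here the bot token, changes:

```javascript
// Hypothetical helper illustrating per-isolate caching keyed by a token.
// `build` stands in for createBot + webhookCallback in the real module.
function createCachedFactory(build) {
  let cachedKey = null;
  let cachedValue = null;
  return function get(key) {
    if (cachedValue === null || cachedKey !== key) {
      cachedValue = build(key); // rebuilt only on first use or key change
      cachedKey = key;
    }
    return cachedValue;
  };
}

// Usage: count how often the expensive build step runs.
let builds = 0;
const getHandler = createCachedFactory((token) => {
  builds++;
  return { token };
});
getHandler("token-a");
getHandler("token-a"); // cache hit, no rebuild
getHandler("token-b"); // token changed, rebuilt
console.log(builds); // 2
```

Comparing the token on every call keeps the cache correct if a secret is rotated while an isolate is still warm, at the cost of one string comparison per request.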


@@ -17,7 +17,7 @@ function parseKvKey(kvKey) {
const lastColon = raw.lastIndexOf(":");
// No colon or only negative sign prefix — no threadId
if (lastColon <= 0) {
return { chatId: raw, threadId: null };
return { chatId: Number(raw), threadId: null };
}
// Check if the part after last colon is a valid threadId (numeric)
const possibleThread = raw.slice(lastColon + 1);
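
To illustrate why the numeric conversion matters, here is a simplified sketch. The function name `parseSubscriberKey` and the key layout `sub:{chatId}` or `sub:{chatId}:{threadId}` are assumptions for illustration, and the sketch omits the validity check on the thread part that the real `parseKvKey` performs:

```javascript
// Hypothetical re-implementation for illustration; the key layout
// "sub:{chatId}" or "sub:{chatId}:{threadId}" is an assumption.
function parseSubscriberKey(kvKey) {
  const raw = kvKey.slice("sub:".length);
  const lastColon = raw.lastIndexOf(":");
  if (lastColon <= 0) {
    // Number() keeps the sign, so group chat IDs like -100123 stay negative
    return { chatId: Number(raw), threadId: null };
  }
  return {
    chatId: Number(raw.slice(0, lastColon)),
    threadId: Number(raw.slice(lastColon + 1)),
  };
}

const parsed = parseSubscriberKey("sub:-100123");
console.log(parsed.chatId === -100123); // true: strict equality works with numbers
console.log([-100123].includes("-100123")); // false: a string chatId would not match
```

With a string `chatId`, strict comparisons and `includes` checks against numeric IDs fail silently, which is the class of bug the one-line fix above removes.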


@@ -46,6 +46,10 @@ export async function handleQueue(batch, env) {
console.log(`Queue: rate limited for ${chatId}, Retry-After: ${retryAfter ?? "unknown"}`);
retried++;
msg.retry();
} else if (res.status >= 500) {
console.error(`Queue: Telegram 5xx (${res.status}) for ${chatId}, retrying`);
retried++;
msg.retry();
} else {
console.error(`Queue: unexpected HTTP ${res.status} for ${chatId}`);
failed++;


@@ -34,13 +34,16 @@ function formatComponentMessage(component, update) {
}
/**
* Handle incoming Statuspage webhook
* Handle incoming Statuspage webhook.
* CRITICAL: Always return 200 — Statuspage removes subscriber webhooks on non-2xx responses.
*/
export async function handleStatuspageWebhook(c) {
try {
// Validate URL secret (timing-safe)
const secret = c.req.param("secret");
if (!await timingSafeEqual(secret, c.env.WEBHOOK_SECRET)) {
return c.text("Unauthorized", 401);
console.error("Statuspage webhook: invalid secret");
return c.text("OK", 200);
}
// Parse body
@@ -48,25 +51,38 @@ export async function handleStatuspageWebhook(c) {
try {
body = await c.req.json();
} catch {
return c.text("Bad Request", 400);
console.error("Statuspage webhook: invalid JSON body");
return c.text("OK", 200);
}
const eventType = body?.meta?.event_type;
if (!eventType) return c.text("Bad Request", 400);
if (!eventType) {
console.error("Statuspage webhook: missing event_type");
return c.text("OK", 200);
}
console.log(`Statuspage webhook: ${eventType}`);
// Determine category and format message
let category, html, componentName;
if (eventType.startsWith("incident.")) {
if (!body.incident) {
console.error("Statuspage webhook: incident event missing incident data");
return c.text("OK", 200);
}
category = "incident";
html = formatIncidentMessage(body.incident);
} else if (eventType.startsWith("component.")) {
if (!body.component) {
console.error("Statuspage webhook: component event missing component data");
return c.text("OK", 200);
}
category = "component";
componentName = body.component?.name || null;
componentName = body.component.name || null;
html = formatComponentMessage(body.component, body.component_update);
} else {
return c.text("Unknown event type", 400);
console.error(`Statuspage webhook: unknown event type ${eventType}`);
return c.text("OK", 200);
}
// Get filtered subscribers (with component name filtering)
@@ -81,6 +97,10 @@ export async function handleStatuspageWebhook(c) {
}
console.log(`Enqueued ${messages.length} messages for ${category}${componentName ? `:${componentName}` : ""}`);
return c.text("OK", 200);
} catch (err) {
// Catch-all: log error but still return 200 to prevent Statuspage from removing us
console.error("Statuspage webhook: unexpected error", err);
return c.text("OK", 200);
}
}
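
The defensive payload checks in the handler can be expressed as a pure function, which makes them easy to unit test without a request context. A minimal sketch, assuming a hypothetical `categorizeEvent` helper that is not part of the source:

```javascript
// Hypothetical helper mirroring the defensive checks in the handler.
// Returns null for any payload the handler would log and answer with 200.
function categorizeEvent(body) {
  const eventType = body?.meta?.event_type;
  if (typeof eventType !== "string" || eventType === "") return null;
  if (eventType.startsWith("incident.")) {
    return body.incident ? { category: "incident", componentName: null } : null;
  }
  if (eventType.startsWith("component.")) {
    if (!body.component) return null;
    return { category: "component", componentName: body.component.name || null };
  }
  return null; // unknown event type
}

console.log(categorizeEvent({ meta: { event_type: "component.updated" } })); // null
```

Every `null` result corresponds to one of the early `return c.text("OK", 200)` branches above, so the HTTP status never depends on the payload contents.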

test/crypto-utils.test.js (new file, 20 lines)

@@ -0,0 +1,20 @@
import { describe, it, expect } from "vitest";
import { timingSafeEqual } from "../src/crypto-utils.js";
describe("timingSafeEqual", () => {
it("returns true for identical strings", async () => {
expect(await timingSafeEqual("secret123", "secret123")).toBe(true);
});
it("returns false for different strings", async () => {
expect(await timingSafeEqual("secret123", "wrong")).toBe(false);
});
it("returns false for empty vs non-empty", async () => {
expect(await timingSafeEqual("", "something")).toBe(false);
});
it("returns true for both empty", async () => {
expect(await timingSafeEqual("", "")).toBe(true);
});
});

test/kv-store.test.js (new file, 124 lines)

@@ -0,0 +1,124 @@
import { describe, it, expect } from "vitest";
import { env } from "cloudflare:test";
import {
addSubscriber,
removeSubscriber,
getSubscriber,
updateSubscriberTypes,
updateSubscriberComponents,
getSubscribersByType,
} from "../src/kv-store.js";
// Each test uses unique chatIds to avoid cross-test interference (miniflare KV persists across tests)
describe("kv-store", () => {
const kv = env.claude_status;
describe("addSubscriber / getSubscriber", () => {
it("adds subscriber with default types", async () => {
await addSubscriber(kv, 100, null);
const sub = await getSubscriber(kv, 100, null);
expect(sub).toEqual({ types: ["incident", "component"], components: [] });
});
it("adds subscriber with threadId", async () => {
await addSubscriber(kv, 101, 456);
const sub = await getSubscriber(kv, 101, 456);
expect(sub).toEqual({ types: ["incident", "component"], components: [] });
});
it("handles threadId=0 (General topic)", async () => {
await addSubscriber(kv, 102, 0);
const sub = await getSubscriber(kv, 102, 0);
expect(sub).toEqual({ types: ["incident", "component"], components: [] });
});
it("preserves existing data on re-subscribe", async () => {
await addSubscriber(kv, 103, null);
await updateSubscriberTypes(kv, 103, null, ["incident"]);
await addSubscriber(kv, 103, null);
const sub = await getSubscriber(kv, 103, null);
expect(sub.types).toEqual(["incident"]);
});
});
describe("removeSubscriber", () => {
it("removes existing subscriber", async () => {
await addSubscriber(kv, 200, null);
await removeSubscriber(kv, 200, null);
const sub = await getSubscriber(kv, 200, null);
expect(sub).toBeNull();
});
});
describe("updateSubscriberTypes", () => {
it("updates types for existing subscriber", async () => {
await addSubscriber(kv, 300, null);
const result = await updateSubscriberTypes(kv, 300, null, ["incident"]);
expect(result).toBe(true);
const sub = await getSubscriber(kv, 300, null);
expect(sub.types).toEqual(["incident"]);
});
it("returns false for non-existent subscriber", async () => {
const result = await updateSubscriberTypes(kv, 99999, null, ["incident"]);
expect(result).toBe(false);
});
});
describe("updateSubscriberComponents", () => {
it("sets component filter", async () => {
await addSubscriber(kv, 400, null);
await updateSubscriberComponents(kv, 400, null, ["API"]);
const sub = await getSubscriber(kv, 400, null);
expect(sub.components).toEqual(["API"]);
});
});
describe("getSubscribersByType", () => {
it("filters by event type", async () => {
// Use unique IDs unlikely to collide with other tests
await addSubscriber(kv, 50001, null);
await updateSubscriberTypes(kv, 50001, null, ["incident"]);
await addSubscriber(kv, 50002, null);
await updateSubscriberTypes(kv, 50002, null, ["component"]);
const incident = await getSubscribersByType(kv, "incident");
const incidentIds = incident.map((s) => s.chatId);
expect(incidentIds).toContain(50001);
expect(incidentIds).not.toContain(50002);
const component = await getSubscribersByType(kv, "component");
const componentIds = component.map((s) => s.chatId);
expect(componentIds).toContain(50002);
expect(componentIds).not.toContain(50001);
});
it("filters by component name", async () => {
await addSubscriber(kv, 60001, null);
await updateSubscriberComponents(kv, 60001, null, ["API"]);
await addSubscriber(kv, 60002, null); // no component filter = all
const results = await getSubscribersByType(kv, "component", "API");
const ids = results.map((s) => s.chatId);
expect(ids).toContain(60001);
expect(ids).toContain(60002);
});
it("excludes non-matching component filter", async () => {
await addSubscriber(kv, 70001, null);
await updateSubscriberComponents(kv, 70001, null, ["Console"]);
const results = await getSubscribersByType(kv, "component", "API");
const ids = results.map((s) => s.chatId);
expect(ids).not.toContain(70001);
});
it("returns chatId as number", async () => {
await addSubscriber(kv, 80001, null);
const results = await getSubscribersByType(kv, "incident");
const match = results.find((s) => s.chatId === 80001);
expect(match).toBeDefined();
expect(typeof match.chatId).toBe("number");
});
});
});


@@ -0,0 +1,79 @@
import { describe, it, expect, vi, beforeEach } from "vitest";
import { handleQueue } from "../src/queue-consumer.js";
/**
* Create a mock queue message with ack/retry tracking
*/
function mockMessage(body) {
return {
body,
ack: vi.fn(),
retry: vi.fn(),
};
}
describe("handleQueue", () => {
let env;
beforeEach(() => {
env = {
BOT_TOKEN: "test-token",
claude_status: {
delete: vi.fn(),
},
};
vi.restoreAllMocks();
vi.unstubAllGlobals(); // a stubbed fetch would otherwise leak into later tests
});
it("acks on successful send", async () => {
vi.stubGlobal("fetch", vi.fn().mockResolvedValue({ ok: true, status: 200 }));
const msg = mockMessage({ chatId: 123, html: "<b>test</b>" });
await handleQueue({ messages: [msg] }, env);
expect(msg.ack).toHaveBeenCalled();
expect(msg.retry).not.toHaveBeenCalled();
});
it("removes subscriber and acks on 403", async () => {
vi.stubGlobal("fetch", vi.fn().mockResolvedValue({ ok: false, status: 403 }));
const msg = mockMessage({ chatId: 123, threadId: null, html: "<b>test</b>" });
await handleQueue({ messages: [msg] }, env);
expect(msg.ack).toHaveBeenCalled();
expect(env.claude_status.delete).toHaveBeenCalled();
});
it("retries on 429 rate limit", async () => {
vi.stubGlobal(
"fetch",
vi.fn().mockResolvedValue({
ok: false,
status: 429,
headers: new Headers({ "Retry-After": "5" }),
})
);
const msg = mockMessage({ chatId: 123, html: "<b>test</b>" });
await handleQueue({ messages: [msg] }, env);
expect(msg.retry).toHaveBeenCalled();
expect(msg.ack).not.toHaveBeenCalled();
});
it("retries on 5xx server error", async () => {
vi.stubGlobal("fetch", vi.fn().mockResolvedValue({ ok: false, status: 502 }));
const msg = mockMessage({ chatId: 123, html: "<b>test</b>" });
await handleQueue({ messages: [msg] }, env);
expect(msg.retry).toHaveBeenCalled();
expect(msg.ack).not.toHaveBeenCalled();
});
it("retries on network error", async () => {
vi.stubGlobal("fetch", vi.fn().mockRejectedValue(new Error("network fail")));
const msg = mockMessage({ chatId: 123, html: "<b>test</b>" });
await handleQueue({ messages: [msg] }, env);
expect(msg.retry).toHaveBeenCalled();
});
it("skips malformed messages", async () => {
const msg = mockMessage({ chatId: null, html: null });
await handleQueue({ messages: [msg] }, env);
expect(msg.ack).toHaveBeenCalled();
});
});


@@ -0,0 +1,63 @@
import { describe, it, expect } from "vitest";
import {
escapeHtml,
humanizeStatus,
statusIndicator,
formatComponentLine,
formatOverallStatus,
} from "../src/status-fetcher.js";
describe("escapeHtml", () => {
it("escapes HTML special chars", () => {
expect(escapeHtml('<script>"alert&"</script>')).toBe(
"&lt;script&gt;&quot;alert&amp;&quot;&lt;/script&gt;"
);
});
it("returns empty string for null/undefined", () => {
expect(escapeHtml(null)).toBe("");
expect(escapeHtml(undefined)).toBe("");
});
});
describe("humanizeStatus", () => {
it("maps known statuses", () => {
expect(humanizeStatus("operational")).toBe("Operational");
expect(humanizeStatus("major_outage")).toBe("Major Outage");
expect(humanizeStatus("resolved")).toBe("Resolved");
});
it("returns raw string for unknown status", () => {
expect(humanizeStatus("custom_status")).toBe("custom_status");
});
});
describe("statusIndicator", () => {
it("returns green check for operational", () => {
expect(statusIndicator("operational")).toBe("\u2705");
});
it("returns question mark for unknown", () => {
expect(statusIndicator("unknown_status")).toBe("\u2753");
});
});
describe("formatComponentLine", () => {
it("formats component with indicator and escaped name", () => {
const line = formatComponentLine({ name: "API", status: "operational" });
expect(line).toContain("\u2705");
expect(line).toContain("<b>API</b>");
expect(line).toContain("Operational");
});
});
describe("formatOverallStatus", () => {
it("maps known indicators", () => {
expect(formatOverallStatus("none")).toContain("All Systems Operational");
expect(formatOverallStatus("critical")).toContain("Critical System Outage");
});
it("returns raw value for unknown indicator", () => {
expect(formatOverallStatus("custom")).toBe("custom");
});
});

vitest.config.js (new file, 22 lines)

@@ -0,0 +1,22 @@
import { defineConfig } from "vitest/config";
import { cloudflarePool, cloudflareTest } from "@cloudflare/vitest-pool-workers";
export default defineConfig({
plugins: [
cloudflareTest({
wrangler: { configPath: "./wrangler.jsonc" },
miniflare: {
// Override remote KV with local-only for tests
kvNamespaces: ["claude_status"],
},
}),
],
test: {
pool: cloudflarePool({
wrangler: { configPath: "./wrangler.jsonc" },
miniflare: {
kvNamespaces: ["claude_status"],
},
}),
},
});


@@ -25,7 +25,7 @@
]
},
"observability": {
"enabled": false,
"enabled": true,
"head_sampling_rate": 1,
"logs": {
"enabled": true,