fix: harden webhook reliability, fix bugs, add test suite

- Statuspage webhook always returns 200 so Statuspage does not remove the webhook subscription
- Fix parseKvKey returning string chatId instead of number
- Queue consumer retries on Telegram 5xx instead of acking (prevents message loss)
- Fix observability top-level enabled flag (false → true)
- Add defensive null checks for webhook payload body
- Cache Bot instance per isolate to avoid middleware rebuild per request
- Add vitest + @cloudflare/vitest-pool-workers with 31 tests
- Document DLQ and KV sharding as declined features
2026-04-09 10:29:30 +07:00
parent bb8f4dcde8
commit 8c993df72b
15 changed files with 1680 additions and 57 deletions


@@ -56,7 +56,7 @@ A middleware in `index.js` normalizes double slashes in URL paths (Statuspage oc
| File | Lines | Responsibility |
|------|-------|---------------|
| `index.js` | ~30 | Hono router, path normalization middleware, export handlers |
-| `bot-commands.js` | ~145 | `/start`, `/stop`, `/subscribe` — subscription management |
+| `bot-commands.js` | ~155 | `/start`, `/stop`, `/subscribe` — subscription management (cached Bot instance) |
| `bot-info-commands.js` | ~125 | `/help`, `/status`, `/history`, `/uptime` — read-only info |
| `statuspage-webhook.js` | ~85 | Webhook validation, event parsing, subscriber fan-out |
| `queue-consumer.js` | ~65 | Batch message delivery, retry/removal logic |
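
The "cached Bot instance" noted for `bot-commands.js` can be pictured as module-scope memoization: a variable declared at module level survives across requests served by the same isolate, so the bot and its middleware are built once rather than per request. The sketch below is only an illustration under that assumption. `getBot` and the injected `createBot` factory are hypothetical names, not taken from the repository.

```javascript
// Module-scope state: lives as long as the isolate does (assumption: one
// module instance per isolate, shared by all requests it serves).
let cachedBot = null;
let cachedToken = null;

// Hypothetical accessor. `createBot` stands in for whatever builds the bot
// and registers its command middleware.
function getBot(token, createBot) {
  // Rebuild only on first use or if the token changes.
  if (cachedBot === null || cachedToken !== token) {
    cachedBot = createBot(token);
    cachedToken = token;
  }
  return cachedBot;
}
```

Keying the cache on the token keeps the sketch safe if the secret is rotated while an isolate is still alive.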
@@ -94,6 +94,7 @@ Binding: `claude-status` queue
- **Batch size**: 30 messages per consumer invocation
- **Max retries**: 3 (configured in `wrangler.jsonc`)
- **429 handling**: `msg.retry()` with CF Queues backoff; `Retry-After` header logged
+- **5xx handling**: `msg.retry()` for transient Telegram server errors
- **403/400 handling**: subscriber removed from KV, message acknowledged
- **Network errors**: `msg.retry()` for transient failures
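
The handling rules in this list can be sketched as a small status-to-action mapping plus a consumer loop. This is a minimal illustration, not the code in `queue-consumer.js`: `decideAction`, `handleBatch`, `send`, and `removeSubscriber` are hypothetical names, and treating other 4xx responses as non-retryable is an assumption. Only `msg.ack()`, `msg.retry()`, and `batch.messages` come from the Cloudflare Queues consumer API.

```javascript
// Hypothetical helper: map a Telegram HTTP status to a queue action.
function decideAction(status) {
  if (status >= 200 && status < 300) return "ack";        // delivered
  if (status === 429) return "retry";                      // rate limited
  if (status >= 500) return "retry";                       // transient server error
  if (status === 400 || status === 403) return "remove";   // chat unreachable
  return "ack"; // assumption: remaining 4xx responses are not worth retrying
}

// Sketch of a consumer loop. `send` performs the Telegram request and
// `removeSubscriber` deletes the chat from KV; both are injected stand-ins.
async function handleBatch(batch, { send, removeSubscriber }) {
  for (const msg of batch.messages) {
    try {
      const res = await send(msg.body);
      const action = decideAction(res.status);
      if (action === "retry") {
        msg.retry(); // redelivered later, up to the configured max retries
      } else {
        if (action === "remove") await removeSubscriber(msg.body.chatId);
        msg.ack();
      }
    } catch (err) {
      msg.retry(); // network error: treat as transient
    }
  }
}
```

Acknowledging a message only after a definitive outcome is what prevents loss: a 5xx that was acked would never be redelivered.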
@@ -108,6 +109,7 @@ Enabled via `wrangler.jsonc` `observability` config. Automatic — no code chang
## Security
+- **Statuspage webhook always-200**: Handler always returns HTTP 200 (even on errors) to prevent Statuspage from removing the webhook subscription. Errors are logged, not surfaced as HTTP status codes.
- **Statuspage webhook auth**: URL path secret validated with timing-safe SHA-256 comparison
- **Telegram webhook**: Registered via `setup-bot.js` — Telegram only sends to the registered URL
- **No secrets in code**: `BOT_TOKEN` and `WEBHOOK_SECRET` stored as Cloudflare secrets