feat: structured logging, JSDoc types, and WEBHOOK_SECRET guard

- Replace string-interpolated console.log/error with JSON.stringify
  for searchable/filterable CF Workers dashboard logs
- Add shared JSDoc typedefs (Subscriber, QueueMessage, ChatTarget)
  with @param/@returns annotations on key functions
- Guard against undefined WEBHOOK_SECRET env var (auth bypass on
  misconfigured deploy)
- Add 3 declined features to feature-decisions.md (fan-out decoupling,
  idempotency keys, /ping)
2026-04-09 11:30:00 +07:00
parent 39afb0fd68
commit 7949da3734
7 changed files with 104 additions and 31 deletions
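
As an illustration of the structured-logging change described in the commit message, here is a minimal sketch. The `logEvent` helper is hypothetical; the commit itself inlines `JSON.stringify` at each call site. The sketch also shows one reason to log `err.message` rather than the error object: `JSON.stringify` serializes an `Error` instance as `{}` because its `message` and `stack` properties are not enumerable.

```javascript
// Hypothetical helper, not part of the commit: emit one JSON object per log line.
function logEvent(level, event, fields = {}) {
  const line = JSON.stringify({ event, ...fields });
  if (level === "error") {
    console.error(line);
  } else {
    console.log(line);
  }
  return line; // returned so callers (and tests) can inspect the exact output
}

const err = new Error("fetch failed");

// Passing the Error object itself would lose the message.
console.log(JSON.stringify({ error: err })); // {"error":{}}

// Passing err.message keeps it searchable.
logEvent("error", "command_error", { command: "status", error: err.message });
```

Fields whose value is `undefined` are dropped from the output, while `null` is kept, so an absent optional field simply does not appear in the log line.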


@@ -6,7 +6,32 @@ Ordered by likelihood of future implementation (top = most likely to revisit).
 ## Declined Features
-### 1. Admin Commands (/stats)
+### 1. Fan-Out Decoupling (Two-Phase Queue)
+**Idea**: Webhook handler enqueues a single "dispatch" message; queue consumer lists subscribers and re-enqueues individual "deliver" messages. Converts O(N) webhook handler to O(1).
+**Decision**: Skip. Current subscriber count is small. The webhook handler completing in one pass is simpler to reason about and debug. Adding a two-phase queue introduces message type routing, a new queue message schema, and makes the data flow harder to follow — all for a scaling problem that doesn't exist yet.
+**Why this rank**: Clear trigger: webhook response times or CPU usage climbing in CF dashboard. Straightforward to implement when needed.
+### 2. Queue Message Idempotency Keys
+**Idea**: Include `{ incidentId, chatId }` hash as dedup key. Check short-TTL KV key before sending to prevent duplicate delivery on queue retries.
+**Decision**: Skip. Duplicate notifications are a minor UX annoyance, not a correctness issue. Adding a KV read+write per message doubles KV operations in the queue consumer for a rare edge case (crash between successful Telegram send and `msg.ack()`). CF Queues retry is already bounded to 3 attempts.
+**Why this rank**: Only worth it if users report duplicate notifications as a real problem.
+### 3. /ping Command
+**Idea**: Bot replies with worker region + timestamp for liveness check.
+**Decision**: Skip. `/status` already proves the bot is alive (it fetches from external API and replies). A dedicated `/ping` adds another command for marginal value. The web health check endpoint (`GET /`) serves the same purpose for monitoring.
+**Why this rank**: Trivial to add but not useful enough to justify another command.
+### 4. Admin Commands (/stats)
 **Idea**: `/stats` to show subscriber count, recent webhook events (useful for bot operator).
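
The two-phase queue described under "Fan-Out Decoupling" in the hunk above can be sketched with an in-memory array standing in for the queue. Every name here (`enqueueDispatch`, `drainQueue`, the `dispatch`/`deliver` type tags) is hypothetical; the feature was declined, so no such code exists in the project.

```javascript
// In-memory sketch of a two-phase queue. All names are hypothetical.

// Phase 1: the webhook handler enqueues one message regardless of subscriber count.
function enqueueDispatch(queue, html) {
  queue.push({ type: "dispatch", html });
}

// Phase 2: the consumer expands each "dispatch" into one "deliver" message per
// subscriber, then sends each "deliver" message as it is dequeued.
function drainQueue(queue, listSubscribers, send) {
  while (queue.length > 0) {
    const msg = queue.shift();
    if (msg.type === "dispatch") {
      for (const chatId of listSubscribers()) {
        queue.push({ type: "deliver", chatId, html: msg.html });
      }
    } else if (msg.type === "deliver") {
      send(msg.chatId, msg.html);
    } else {
      throw new Error(`unknown message type: ${msg.type}`);
    }
  }
}

const demoQueue = [];
const demoSent = [];
enqueueDispatch(demoQueue, "<b>Incident resolved</b>");
drainQueue(demoQueue, () => [101, 102, 103], (chatId, html) => demoSent.push({ chatId, html }));
console.log(demoSent.length); // 3
```

The type routing inside `drainQueue` is the extra moving part that the decision cites as a cost: the consumer now has to understand two message shapes instead of one.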
@@ -14,7 +39,7 @@ Ordered by likelihood of future implementation (top = most likely to revisit).
 **Why highest**: Low effort, no architectural changes. Just a new command + `kv.list()` count. First thing to add if the bot grows.
-### 2. Webhook HMAC Signature Verification
+### 5. Webhook HMAC Signature Verification
 **Idea**: Verify Statuspage webhook payloads using HMAC signatures as a second auth layer beyond URL secret.
@@ -22,7 +47,7 @@ Ordered by likelihood of future implementation (top = most likely to revisit).
 **Why this rank**: Not blocked by effort — blocked by platform. Would be implemented immediately if Atlassian ships HMAC support.
-### 3. Proactive Rate Limit Tracking
+### 6. Proactive Rate Limit Tracking
 **Idea**: Track per-chat message counts to stay within Telegram's rate limits proactively.
@@ -30,7 +55,7 @@ Ordered by likelihood of future implementation (top = most likely to revisit).
 **Why this rank**: Becomes necessary at scale. Clear trigger: frequent 429 errors in logs.
-### 4. Status Change Deduplication
+### 7. Status Change Deduplication
 **Idea**: If a component flaps (operational → degraded → operational in 2 minutes), debounce into one message.
@@ -38,7 +63,7 @@ Ordered by likelihood of future implementation (top = most likely to revisit).
 **Why this rank**: Useful if flapping becomes noisy. Moderate effort with clear user-facing benefit.
-### 5. Inline Keyboard for /subscribe
+### 8. Inline Keyboard for /subscribe
 **Idea**: Replace text commands with clickable buttons using grammY's inline keyboard support.
@@ -46,7 +71,7 @@ Ordered by likelihood of future implementation (top = most likely to revisit).
 **Why this rank**: Nice UX polish but not functional gap. grammY supports it well — moderate effort.
-### 6. Scheduled Status Digest
+### 9. Scheduled Status Digest
 **Idea**: CF Workers `scheduled` cron trigger sends a daily "all clear" or summary to subscribers.
@@ -54,7 +79,7 @@ Ordered by likelihood of future implementation (top = most likely to revisit).
 **Why this rank**: Low user value. Only useful if users explicitly request daily summaries.
-### 7. Mute Command (/mute \<duration>)
+### 10. Mute Command (/mute \<duration>)
 **Idea**: Temporarily pause notifications without unsubscribing (e.g., `/mute 2h`).
@@ -62,7 +87,7 @@ Ordered by likelihood of future implementation (top = most likely to revisit).
 **Why this rank**: Contradicts real-time purpose. `/stop` + `/start` is sufficient.
-### 8. Multi-Language Support
+### 11. Multi-Language Support
 **Idea**: At minimum English/Vietnamese support.
@@ -70,7 +95,7 @@ Ordered by likelihood of future implementation (top = most likely to revisit).
 **Why this rank**: Source data is English-only. Translating bot chrome while incidents stay English creates a mixed-language experience.
-### 9. Web Dashboard
+### 12. Web Dashboard
 **Idea**: Replace the `/` health check with a status page showing subscriber count and recent webhook events.
@@ -78,7 +103,7 @@ Ordered by likelihood of future implementation (top = most likely to revisit).
 **Why this rank**: Out of scope. The bot is the product — adding a web frontend changes the project's nature.
-### 10. Dead Letter Queue for Failed Messages
+### 13. Dead Letter Queue for Failed Messages
 **Idea**: After CF Queues exhausts 3 retries, persist failed messages to KV or a dedicated DLQ for debugging.
@@ -86,7 +111,7 @@ Ordered by likelihood of future implementation (top = most likely to revisit).
 **Why this rank**: Logging is sufficient for current scale. Revisit only if log retention (3-day free tier) is too short for debugging patterns.
-### 11. KV List Scalability (Subscriber Sharding)
+### 14. KV List Scalability (Subscriber Sharding)
 **Idea**: Shard subscriber keys by event type (e.g., `sub:incident:{chatId}`, `sub:component:{chatId}`) to avoid listing all subscribers on every webhook.
@@ -94,7 +119,7 @@ Ordered by likelihood of future implementation (top = most likely to revisit).
 **Why this rank**: Clear trigger: slow webhook response times at high subscriber counts. Migration path is straightforward when needed.
-### 12. Digest / Quiet Mode
+### 15. Digest / Quiet Mode
 **Idea**: Batch notifications into a daily summary instead of instant alerts.


@@ -1,3 +1,5 @@
+/** @import { ChatTarget } from "./types.js" */
 import { Bot, webhookCallback } from "grammy";
 import {
 addSubscriber,
@@ -17,6 +19,7 @@ let kv = null;
 /**
 * Extract chatId and threadId from grammY context
+* @returns {ChatTarget}
 */
 function getChatTarget(ctx) {
 return {


@@ -69,7 +69,7 @@ export function registerInfoCommands(bot) {
 );
 }
 } catch (err) {
-console.error("status command error:", err);
+console.error(JSON.stringify({ event: "command_error", command: "status", error: err.message }));
 await ctx.reply("Unable to fetch status. Please try again later.");
 }
 });
@@ -91,7 +91,7 @@ export function registerInfoCommands(bot) {
 { parse_mode: "HTML", disable_web_page_preview: true }
 );
 } catch (err) {
-console.error("history command error:", err);
+console.error(JSON.stringify({ event: "command_error", command: "history", error: err.message }));
 await ctx.reply("Unable to fetch incident history. Please try again later.");
 }
 });
@@ -116,7 +116,7 @@ export function registerInfoCommands(bot) {
 { parse_mode: "HTML", disable_web_page_preview: true }
 );
 } catch (err) {
-console.error("uptime command error:", err);
+console.error(JSON.stringify({ event: "command_error", command: "uptime", error: err.message }));
 await ctx.reply("Unable to fetch uptime data. Please try again later.");
 }
 });


@@ -1,3 +1,5 @@
+/** @import { Subscriber, ChatTarget } from "./types.js" */
 const KEY_PREFIX = "sub:";
 /**
@@ -54,6 +56,10 @@ async function listAllSubscriberKeys(kv) {
 /**
 * Add or re-subscribe a user. Preserves existing types and components if already subscribed.
+* @param {KVNamespace} kv
+* @param {number} chatId
+* @param {?number} threadId
+* @param {string[]} types
 */
 export async function addSubscriber(kv, chatId, threadId, types = ["incident", "component"]) {
 const key = buildKvKey(chatId, threadId);
@@ -105,6 +111,10 @@ export async function updateSubscriberComponents(kv, chatId, threadId, component
 /**
 * Get a single subscriber's data, or null if not subscribed
+* @param {KVNamespace} kv
+* @param {number} chatId
+* @param {?number} threadId
+* @returns {Promise<?Subscriber>}
 */
 export async function getSubscriber(kv, chatId, threadId) {
 const key = buildKvKey(chatId, threadId);
@@ -114,6 +124,10 @@ export async function getSubscriber(kv, chatId, threadId) {
 /**
 * Get subscribers filtered by event type and optional component name.
 * Uses KV metadata from list() — O(1) list call, no individual get() needed.
+* @param {KVNamespace} kv
+* @param {string} eventType
+* @param {?string} componentName
+* @returns {Promise<ChatTarget[]>}
 */
 export async function getSubscribersByType(kv, eventType, componentName = null) {
 const keys = await listAllSubscriberKeys(kv);


@@ -1,8 +1,12 @@
+/** @import { QueueMessage } from "./types.js" */
 import { removeSubscriber } from "./kv-store.js";
 import { telegramUrl } from "./telegram-api.js";
 /**
 * Process a batch of queued messages, sending each to Telegram.
 * Handles rate limits (429 → retry), blocked bots (403/400 → remove subscriber).
+* @param {{ messages: Array<{ body: QueueMessage, ack: () => void, retry: () => void }> }} batch
+* @param {object} env
 */
 export async function handleQueue(batch, env) {
 let sent = 0, failed = 0, retried = 0, removed = 0;
@@ -12,7 +16,7 @@ export async function handleQueue(batch, env) {
 // Defensive check for malformed messages
 if (!chatId || !html) {
-console.error("Queue: malformed message, skipping", msg.body);
+console.error(JSON.stringify({ event: "queue_skip", reason: "malformed", body: msg.body }));
 msg.ack();
 continue;
 }
@@ -37,32 +41,32 @@ export async function handleQueue(batch, env) {
 sent++;
 msg.ack();
 } else if (res.status === 403 || res.status === 400) {
-console.log(`Queue: removing subscriber ${chatId}:${threadId} (HTTP ${res.status})`);
+console.log(JSON.stringify({ event: "queue_remove", chatId, threadId, status: res.status }));
 await removeSubscriber(env.claude_status, chatId, threadId);
 removed++;
 msg.ack();
 } else if (res.status === 429) {
 const retryAfter = res.headers.get("Retry-After");
-console.log(`Queue: rate limited for ${chatId}, Retry-After: ${retryAfter ?? "unknown"}`);
+console.log(JSON.stringify({ event: "queue_ratelimit", chatId, retryAfter }));
 retried++;
 msg.retry();
 } else if (res.status >= 500) {
-console.error(`Queue: Telegram 5xx (${res.status}) for ${chatId}, retrying`);
+console.error(JSON.stringify({ event: "queue_retry", chatId, status: res.status }));
 retried++;
 msg.retry();
 } else {
-console.error(`Queue: unexpected HTTP ${res.status} for ${chatId}`);
+console.error(JSON.stringify({ event: "queue_error", chatId, status: res.status }));
 failed++;
 msg.ack();
 }
 } catch (err) {
-console.error("Queue: network error, retrying", err);
+console.error(JSON.stringify({ event: "queue_network_error", chatId, error: err.message }));
 retried++;
 msg.retry();
 }
 }
 if (sent || failed || retried || removed) {
-console.log(`Queue batch: sent=${sent} failed=${failed} retried=${retried} removed=${removed}`);
+console.log(JSON.stringify({ event: "queue_batch", sent, failed, retried, removed }));
 }
 }


@@ -40,9 +40,15 @@ function formatComponentMessage(component, update) {
 export async function handleStatuspageWebhook(c) {
 try {
 // Validate URL secret (timing-safe)
+// Guard against misconfigured deploy (undefined env var)
+if (!c.env.WEBHOOK_SECRET) {
+console.error(JSON.stringify({ event: "webhook_error", reason: "WEBHOOK_SECRET not configured" }));
+return c.text("OK", 200);
+}
 const secret = c.req.param("secret");
 if (!await timingSafeEqual(secret, c.env.WEBHOOK_SECRET)) {
-console.error("Statuspage webhook: invalid secret");
+console.error(JSON.stringify({ event: "webhook_error", reason: "invalid_secret" }));
 return c.text("OK", 200);
 }
@@ -51,37 +57,37 @@ export async function handleStatuspageWebhook(c) {
 try {
 body = await c.req.json();
 } catch {
-console.error("Statuspage webhook: invalid JSON body");
+console.error(JSON.stringify({ event: "webhook_error", reason: "invalid_json" }));
 return c.text("OK", 200);
 }
 const eventType = body?.meta?.event_type;
 if (!eventType) {
-console.error("Statuspage webhook: missing event_type");
+console.error(JSON.stringify({ event: "webhook_error", reason: "missing_event_type" }));
 return c.text("OK", 200);
 }
-console.log(`Statuspage webhook: ${eventType}`);
+console.log(JSON.stringify({ event: "webhook_received", eventType }));
 // Determine category and format message
 let category, html, componentName;
 if (eventType.startsWith("incident.")) {
 if (!body.incident) {
-console.error("Statuspage webhook: incident event missing incident data");
+console.error(JSON.stringify({ event: "webhook_error", reason: "missing_incident_data", eventType }));
 return c.text("OK", 200);
 }
 category = "incident";
 html = formatIncidentMessage(body.incident);
 } else if (eventType.startsWith("component.")) {
 if (!body.component) {
-console.error("Statuspage webhook: component event missing component data");
+console.error(JSON.stringify({ event: "webhook_error", reason: "missing_component_data", eventType }));
 return c.text("OK", 200);
 }
 category = "component";
 componentName = body.component.name || null;
 html = formatComponentMessage(body.component, body.component_update);
 } else {
-console.error(`Statuspage webhook: unknown event type ${eventType}`);
+console.error(JSON.stringify({ event: "webhook_error", reason: "unknown_event_type", eventType }));
 return c.text("OK", 200);
 }
@@ -96,11 +102,11 @@ export async function handleStatuspageWebhook(c) {
 await c.env["claude-status"].sendBatch(messages.slice(i, i + 100));
 }
-console.log(`Enqueued ${messages.length} messages for ${category}${componentName ? `:${componentName}` : ""}`);
+console.log(JSON.stringify({ event: "webhook_enqueued", category, componentName, count: messages.length }));
 return c.text("OK", 200);
 } catch (err) {
 // Catch-all: log error but still return 200 to prevent Statuspage from removing us
-console.error("Statuspage webhook: unexpected error", err);
+console.error(JSON.stringify({ event: "webhook_error", reason: "unexpected", error: err.message }));
 return c.text("OK", 200);
 }
 }
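
To illustrate why the `WEBHOOK_SECRET` guard in the hunk above matters, consider a hypothetical comparison that coerces both inputs to strings. The project's `timingSafeEqual` is not shown in this diff, so `naiveEqual` below is an assumption, used only to show one way an unset secret could turn into an auth bypass; `isAuthorized` is likewise a hypothetical wrapper that mirrors the shape of the guard.

```javascript
// Hypothetical comparison that stringifies its inputs.
// This is NOT the project's timingSafeEqual, which is not shown in the diff.
function naiveEqual(a, b) {
  return String(a) === String(b);
}

// Mirrors the shape of the guard: reject outright when no secret is configured.
function isAuthorized(provided, expected) {
  if (!expected) return false;
  return naiveEqual(provided, expected);
}

// Without the guard, a request whose path segment is the literal text
// "undefined" matches an unset secret.
console.log(naiveEqual("undefined", undefined)); // true

// With the guard, the same request is rejected.
console.log(isAuthorized("undefined", undefined)); // false
console.log(isAuthorized("s3cret", "s3cret")); // true
```

Because the check is `!expected`, an empty-string secret is rejected as well, which seems like the safer default for a misconfigured deploy.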

src/types.js Normal file

@@ -0,0 +1,21 @@
/**
* Shared JSDoc type definitions for the project.
* Import via: @import { Subscriber, QueueMessage, ChatTarget } from "./types.js"
*/
/**
* Subscriber preferences stored in KV value and metadata
* @typedef {{ types: string[], components: string[] }} Subscriber
*/
/**
* Message body enqueued to CF Queue for fan-out delivery
* @typedef {{ chatId: number, threadId: ?number, html: string }} QueueMessage
*/
/**
* Chat target extracted from Telegram update
* @typedef {{ chatId: number, threadId: ?number }} ChatTarget
*/
export {};
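
A short sketch of how these typedefs might be consumed. The `toQueueMessage` helper is hypothetical, and the typedefs are repeated inline so the block stands alone; in the project they would come from the `@import` line at the top of each file.

```javascript
/** @typedef {{ chatId: number, threadId: ?number }} ChatTarget */
/** @typedef {{ chatId: number, threadId: ?number, html: string }} QueueMessage */

/**
 * Hypothetical helper: build a queue message body for one chat target.
 * @param {ChatTarget} target
 * @param {string} html
 * @returns {QueueMessage}
 */
function toQueueMessage(target, html) {
  // Normalize a missing threadId to null so the shape matches the typedef.
  return { chatId: target.chatId, threadId: target.threadId ?? null, html };
}

console.log(JSON.stringify(toQueueMessage({ chatId: 42, threadId: null }, "<b>hi</b>")));
```

The annotations have no runtime effect. They only matter to tooling, for example an editor or `tsc` with `checkJs` enabled.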