feat: structured logging, JSDoc types, and WEBHOOK_SECRET guard

- Replace string-interpolated console.log/error with JSON.stringify
  for searchable/filterable CF Workers dashboard logs
- Add shared JSDoc typedefs (Subscriber, QueueMessage, ChatTarget)
  with @param/@returns annotations on key functions
- Guard against undefined WEBHOOK_SECRET env var (auth bypass on
  misconfigured deploy)
- Add 3 declined features to feature-decisions.md (fan-out decoupling,
  idempotency keys, /ping)
This commit is contained in:
2026-04-09 11:30:00 +07:00
parent 39afb0fd68
commit 7949da3734
7 changed files with 104 additions and 31 deletions

View File

@@ -6,7 +6,32 @@ Ordered by likelihood of future implementation (top = most likely to revisit).
## Declined Features
### 1. Admin Commands (/stats)
### 1. Fan-Out Decoupling (Two-Phase Queue)
**Idea**: Webhook handler enqueues a single "dispatch" message; queue consumer lists subscribers and re-enqueues individual "deliver" messages. Converts O(N) webhook handler to O(1).
**Decision**: Skip. Current subscriber count is small. The webhook handler completing in one pass is simpler to reason about and debug. Adding a two-phase queue introduces message type routing, a new queue message schema, and makes the data flow harder to follow — all for a scaling problem that doesn't exist yet.
**Why this rank**: Clear trigger: webhook response times or CPU usage climbing in CF dashboard. Straightforward to implement when needed.
### 2. Queue Message Idempotency Keys
**Idea**: Include `{ incidentId, chatId }` hash as dedup key. Check short-TTL KV key before sending to prevent duplicate delivery on queue retries.
**Decision**: Skip. Duplicate notifications are a minor UX annoyance, not a correctness issue. Adding a KV read+write per message doubles KV operations in the queue consumer for a rare edge case (crash between successful Telegram send and `msg.ack()`). CF Queues retry is already bounded to 3 attempts.
**Why this rank**: Only worth it if users report duplicate notifications as a real problem.
### 3. /ping Command
**Idea**: Bot replies with worker region + timestamp for liveness check.
**Decision**: Skip. `/status` already proves the bot is alive (it fetches from external API and replies). A dedicated `/ping` adds another command for marginal value. The web health check endpoint (`GET /`) serves the same purpose for monitoring.
**Why this rank**: Trivial to add but not useful enough to justify another command.
### 4. Admin Commands (/stats)
**Idea**: `/stats` to show subscriber count, recent webhook events (useful for bot operator).
@@ -14,7 +39,7 @@ Ordered by likelihood of future implementation (top = most likely to revisit).
**Why highest**: Low effort, no architectural changes. Just a new command + `kv.list()` count. First thing to add if the bot grows.
### 2. Webhook HMAC Signature Verification
### 5. Webhook HMAC Signature Verification
**Idea**: Verify Statuspage webhook payloads using HMAC signatures as a second auth layer beyond URL secret.
@@ -22,7 +47,7 @@ Ordered by likelihood of future implementation (top = most likely to revisit).
**Why this rank**: Not blocked by effort — blocked by platform. Would be implemented immediately if Atlassian ships HMAC support.
### 3. Proactive Rate Limit Tracking
### 6. Proactive Rate Limit Tracking
**Idea**: Track per-chat message counts to stay within Telegram's rate limits proactively.
@@ -30,7 +55,7 @@ Ordered by likelihood of future implementation (top = most likely to revisit).
**Why this rank**: Becomes necessary at scale. Clear trigger: frequent 429 errors in logs.
### 4. Status Change Deduplication
### 7. Status Change Deduplication
**Idea**: If a component flaps (operational → degraded → operational in 2 minutes), debounce into one message.
@@ -38,7 +63,7 @@ Ordered by likelihood of future implementation (top = most likely to revisit).
**Why this rank**: Useful if flapping becomes noisy. Moderate effort with clear user-facing benefit.
### 5. Inline Keyboard for /subscribe
### 8. Inline Keyboard for /subscribe
**Idea**: Replace text commands with clickable buttons using grammY's inline keyboard support.
@@ -46,7 +71,7 @@ Ordered by likelihood of future implementation (top = most likely to revisit).
**Why this rank**: Nice UX polish but not functional gap. grammY supports it well — moderate effort.
### 6. Scheduled Status Digest
### 9. Scheduled Status Digest
**Idea**: CF Workers `scheduled` cron trigger sends a daily "all clear" or summary to subscribers.
@@ -54,7 +79,7 @@ Ordered by likelihood of future implementation (top = most likely to revisit).
**Why this rank**: Low user value. Only useful if users explicitly request daily summaries.
### 7. Mute Command (/mute \<duration>)
### 10. Mute Command (/mute \<duration>)
**Idea**: Temporarily pause notifications without unsubscribing (e.g., `/mute 2h`).
@@ -62,7 +87,7 @@ Ordered by likelihood of future implementation (top = most likely to revisit).
**Why this rank**: Contradicts real-time purpose. `/stop` + `/start` is sufficient.
### 8. Multi-Language Support
### 11. Multi-Language Support
**Idea**: At minimum English/Vietnamese support.
@@ -70,7 +95,7 @@ Ordered by likelihood of future implementation (top = most likely to revisit).
**Why this rank**: Source data is English-only. Translating bot chrome while incidents stay English creates a mixed-language experience.
### 9. Web Dashboard
### 12. Web Dashboard
**Idea**: Replace the `/` health check with a status page showing subscriber count and recent webhook events.
@@ -78,7 +103,7 @@ Ordered by likelihood of future implementation (top = most likely to revisit).
**Why this rank**: Out of scope. The bot is the product — adding a web frontend changes the project's nature.
### 10. Dead Letter Queue for Failed Messages
### 13. Dead Letter Queue for Failed Messages
**Idea**: After CF Queues exhausts 3 retries, persist failed messages to KV or a dedicated DLQ for debugging.
@@ -86,7 +111,7 @@ Ordered by likelihood of future implementation (top = most likely to revisit).
**Why this rank**: Logging is sufficient for current scale. Revisit only if log retention (3-day free tier) is too short for debugging patterns.
### 11. KV List Scalability (Subscriber Sharding)
### 14. KV List Scalability (Subscriber Sharding)
**Idea**: Shard subscriber keys by event type (e.g., `sub:incident:{chatId}`, `sub:component:{chatId}`) to avoid listing all subscribers on every webhook.
@@ -94,7 +119,7 @@ Ordered by likelihood of future implementation (top = most likely to revisit).
**Why this rank**: Clear trigger: slow webhook response times at high subscriber counts. Migration path is straightforward when needed.
### 12. Digest / Quiet Mode
### 15. Digest / Quiet Mode
**Idea**: Batch notifications into a daily summary instead of instant alerts.

View File

@@ -1,3 +1,5 @@
/** @import { ChatTarget } from "./types.js" */
import { Bot, webhookCallback } from "grammy";
import {
addSubscriber,
@@ -17,6 +19,7 @@ let kv = null;
/**
* Extract chatId and threadId from grammY context
* @returns {ChatTarget}
*/
function getChatTarget(ctx) {
return {

View File

@@ -69,7 +69,7 @@ export function registerInfoCommands(bot) {
);
}
} catch (err) {
console.error("status command error:", err);
console.error(JSON.stringify({ event: "command_error", command: "status", error: err.message }));
await ctx.reply("Unable to fetch status. Please try again later.");
}
});
@@ -91,7 +91,7 @@ export function registerInfoCommands(bot) {
{ parse_mode: "HTML", disable_web_page_preview: true }
);
} catch (err) {
console.error("history command error:", err);
console.error(JSON.stringify({ event: "command_error", command: "history", error: err.message }));
await ctx.reply("Unable to fetch incident history. Please try again later.");
}
});
@@ -116,7 +116,7 @@ export function registerInfoCommands(bot) {
{ parse_mode: "HTML", disable_web_page_preview: true }
);
} catch (err) {
console.error("uptime command error:", err);
console.error(JSON.stringify({ event: "command_error", command: "uptime", error: err.message }));
await ctx.reply("Unable to fetch uptime data. Please try again later.");
}
});

View File

@@ -1,3 +1,5 @@
/** @import { Subscriber, ChatTarget } from "./types.js" */
const KEY_PREFIX = "sub:";
/**
@@ -54,6 +56,10 @@ async function listAllSubscriberKeys(kv) {
/**
* Add or re-subscribe a user. Preserves existing types and components if already subscribed.
* @param {KVNamespace} kv
* @param {number} chatId
* @param {?number} threadId
* @param {string[]} types
*/
export async function addSubscriber(kv, chatId, threadId, types = ["incident", "component"]) {
const key = buildKvKey(chatId, threadId);
@@ -105,6 +111,10 @@ export async function updateSubscriberComponents(kv, chatId, threadId, component
/**
* Get a single subscriber's data, or null if not subscribed
* @param {KVNamespace} kv
* @param {number} chatId
* @param {?number} threadId
* @returns {Promise<?Subscriber>}
*/
export async function getSubscriber(kv, chatId, threadId) {
const key = buildKvKey(chatId, threadId);
@@ -114,6 +124,10 @@ export async function getSubscriber(kv, chatId, threadId) {
/**
* Get subscribers filtered by event type and optional component name.
* Uses KV metadata from list() — O(1) list call, no individual get() needed.
* @param {KVNamespace} kv
* @param {string} eventType
* @param {?string} componentName
* @returns {Promise<ChatTarget[]>}
*/
export async function getSubscribersByType(kv, eventType, componentName = null) {
const keys = await listAllSubscriberKeys(kv);

View File

@@ -1,8 +1,12 @@
/** @import { QueueMessage } from "./types.js" */
import { removeSubscriber } from "./kv-store.js";
import { telegramUrl } from "./telegram-api.js";
/**
* Process a batch of queued messages, sending each to Telegram.
* Handles rate limits (429 → retry), blocked bots (403/400 → remove subscriber).
* @param {{ messages: Array<{ body: QueueMessage, ack: () => void, retry: () => void }> }} batch
* @param {object} env
*/
export async function handleQueue(batch, env) {
let sent = 0, failed = 0, retried = 0, removed = 0;
@@ -12,7 +16,7 @@ export async function handleQueue(batch, env) {
// Defensive check for malformed messages
if (!chatId || !html) {
console.error("Queue: malformed message, skipping", msg.body);
console.error(JSON.stringify({ event: "queue_skip", reason: "malformed", body: msg.body }));
msg.ack();
continue;
}
@@ -37,32 +41,32 @@ export async function handleQueue(batch, env) {
sent++;
msg.ack();
} else if (res.status === 403 || res.status === 400) {
console.log(`Queue: removing subscriber ${chatId}:${threadId} (HTTP ${res.status})`);
console.log(JSON.stringify({ event: "queue_remove", chatId, threadId, status: res.status }));
await removeSubscriber(env.claude_status, chatId, threadId);
removed++;
msg.ack();
} else if (res.status === 429) {
const retryAfter = res.headers.get("Retry-After");
console.log(`Queue: rate limited for ${chatId}, Retry-After: ${retryAfter ?? "unknown"}`);
console.log(JSON.stringify({ event: "queue_ratelimit", chatId, retryAfter }));
retried++;
msg.retry();
} else if (res.status >= 500) {
console.error(`Queue: Telegram 5xx (${res.status}) for ${chatId}, retrying`);
console.error(JSON.stringify({ event: "queue_retry", chatId, status: res.status }));
retried++;
msg.retry();
} else {
console.error(`Queue: unexpected HTTP ${res.status} for ${chatId}`);
console.error(JSON.stringify({ event: "queue_error", chatId, status: res.status }));
failed++;
msg.ack();
}
} catch (err) {
console.error("Queue: network error, retrying", err);
console.error(JSON.stringify({ event: "queue_network_error", chatId, error: err.message }));
retried++;
msg.retry();
}
}
if (sent || failed || retried || removed) {
console.log(`Queue batch: sent=${sent} failed=${failed} retried=${retried} removed=${removed}`);
console.log(JSON.stringify({ event: "queue_batch", sent, failed, retried, removed }));
}
}

View File

@@ -40,9 +40,15 @@ function formatComponentMessage(component, update) {
export async function handleStatuspageWebhook(c) {
try {
// Validate URL secret (timing-safe)
// Guard against misconfigured deploy (undefined env var)
if (!c.env.WEBHOOK_SECRET) {
console.error(JSON.stringify({ event: "webhook_error", reason: "WEBHOOK_SECRET not configured" }));
return c.text("OK", 200);
}
const secret = c.req.param("secret");
if (!await timingSafeEqual(secret, c.env.WEBHOOK_SECRET)) {
console.error("Statuspage webhook: invalid secret");
console.error(JSON.stringify({ event: "webhook_error", reason: "invalid_secret" }));
return c.text("OK", 200);
}
@@ -51,37 +57,37 @@ export async function handleStatuspageWebhook(c) {
try {
body = await c.req.json();
} catch {
console.error("Statuspage webhook: invalid JSON body");
console.error(JSON.stringify({ event: "webhook_error", reason: "invalid_json" }));
return c.text("OK", 200);
}
const eventType = body?.meta?.event_type;
if (!eventType) {
console.error("Statuspage webhook: missing event_type");
console.error(JSON.stringify({ event: "webhook_error", reason: "missing_event_type" }));
return c.text("OK", 200);
}
console.log(`Statuspage webhook: ${eventType}`);
console.log(JSON.stringify({ event: "webhook_received", eventType }));
// Determine category and format message
let category, html, componentName;
if (eventType.startsWith("incident.")) {
if (!body.incident) {
console.error("Statuspage webhook: incident event missing incident data");
console.error(JSON.stringify({ event: "webhook_error", reason: "missing_incident_data", eventType }));
return c.text("OK", 200);
}
category = "incident";
html = formatIncidentMessage(body.incident);
} else if (eventType.startsWith("component.")) {
if (!body.component) {
console.error("Statuspage webhook: component event missing component data");
console.error(JSON.stringify({ event: "webhook_error", reason: "missing_component_data", eventType }));
return c.text("OK", 200);
}
category = "component";
componentName = body.component.name || null;
html = formatComponentMessage(body.component, body.component_update);
} else {
console.error(`Statuspage webhook: unknown event type ${eventType}`);
console.error(JSON.stringify({ event: "webhook_error", reason: "unknown_event_type", eventType }));
return c.text("OK", 200);
}
@@ -96,11 +102,11 @@ export async function handleStatuspageWebhook(c) {
await c.env["claude-status"].sendBatch(messages.slice(i, i + 100));
}
console.log(`Enqueued ${messages.length} messages for ${category}${componentName ? `:${componentName}` : ""}`);
console.log(JSON.stringify({ event: "webhook_enqueued", category, componentName, count: messages.length }));
return c.text("OK", 200);
} catch (err) {
// Catch-all: log error but still return 200 to prevent Statuspage from removing us
console.error("Statuspage webhook: unexpected error", err);
console.error(JSON.stringify({ event: "webhook_error", reason: "unexpected", error: err.message }));
return c.text("OK", 200);
}
}

21
src/types.js Normal file
View File

@@ -0,0 +1,21 @@
/**
* Shared JSDoc type definitions for the project.
* Import via: @import { Subscriber, QueueMessage, ChatTarget } from "./types.js"
*/
/**
* Subscriber preferences stored in KV value and metadata
* @typedef {{ types: string[], components: string[] }} Subscriber
*/
/**
* Message body enqueued to CF Queue for fan-out delivery
* @typedef {{ chatId: number, threadId: ?number, html: string }} QueueMessage
*/
/**
* Chat target extracted from Telegram update
* @typedef {{ chatId: number, threadId: ?number }} ChatTarget
*/
export {};