mirror of
https://github.com/tiennm99/claude-status-webhook.git
synced 2026-04-17 11:20:30 +00:00
feat: structured logging, JSDoc types, and WEBHOOK_SECRET guard
- Replace string-interpolated console.log/error with JSON.stringify for searchable/filterable CF Workers dashboard logs - Add shared JSDoc typedefs (Subscriber, QueueMessage, ChatTarget) with @param/@returns annotations on key functions - Guard against undefined WEBHOOK_SECRET env var (auth bypass on misconfigured deploy) - Add 3 declined features to feature-decisions.md (fan-out decoupling, idempotency keys, /ping)
This commit is contained in:
@@ -6,7 +6,32 @@ Ordered by likelihood of future implementation (top = most likely to revisit).
|
||||
|
||||
## Declined Features
|
||||
|
||||
### 1. Admin Commands (/stats)
|
||||
### 1. Fan-Out Decoupling (Two-Phase Queue)
|
||||
|
||||
**Idea**: Webhook handler enqueues a single "dispatch" message; queue consumer lists subscribers and re-enqueues individual "deliver" messages. Converts O(N) webhook handler to O(1).
|
||||
|
||||
**Decision**: Skip. Current subscriber count is small. The webhook handler completing in one pass is simpler to reason about and debug. Adding a two-phase queue introduces message type routing, a new queue message schema, and makes the data flow harder to follow — all for a scaling problem that doesn't exist yet.
|
||||
|
||||
**Why this rank**: Clear trigger: webhook response times or CPU usage climbing in CF dashboard. Straightforward to implement when needed.
|
||||
|
||||
### 2. Queue Message Idempotency Keys
|
||||
|
||||
**Idea**: Include `{ incidentId, chatId }` hash as dedup key. Check short-TTL KV key before sending to prevent duplicate delivery on queue retries.
|
||||
|
||||
**Decision**: Skip. Duplicate notifications are a minor UX annoyance, not a correctness issue. Adding a KV read+write per message doubles KV operations in the queue consumer for a rare edge case (crash between successful Telegram send and `msg.ack()`). CF Queues retry is already bounded to 3 attempts.
|
||||
|
||||
**Why this rank**: Only worth it if users report duplicate notifications as a real problem.
|
||||
|
||||
### 3. /ping Command
|
||||
|
||||
**Idea**: Bot replies with worker region + timestamp for liveness check.
|
||||
|
||||
**Decision**: Skip. `/status` already proves the bot is alive (it fetches from external API and replies). A dedicated `/ping` adds another command for marginal value. The web health check endpoint (`GET /`) serves the same purpose for monitoring.
|
||||
|
||||
**Why this rank**: Trivial to add but not useful enough to justify another command.
|
||||
|
||||
### 4. Admin Commands (/stats)
|
||||
|
||||
|
||||
**Idea**: `/stats` to show subscriber count, recent webhook events (useful for bot operator).
|
||||
|
||||
@@ -14,7 +39,7 @@ Ordered by likelihood of future implementation (top = most likely to revisit).
|
||||
|
||||
**Why highest**: Low effort, no architectural changes. Just a new command + `kv.list()` count. First thing to add if the bot grows.
|
||||
|
||||
### 2. Webhook HMAC Signature Verification
|
||||
### 5. Webhook HMAC Signature Verification
|
||||
|
||||
**Idea**: Verify Statuspage webhook payloads using HMAC signatures as a second auth layer beyond URL secret.
|
||||
|
||||
@@ -22,7 +47,7 @@ Ordered by likelihood of future implementation (top = most likely to revisit).
|
||||
|
||||
**Why this rank**: Not blocked by effort — blocked by platform. Would be implemented immediately if Atlassian ships HMAC support.
|
||||
|
||||
### 3. Proactive Rate Limit Tracking
|
||||
### 6. Proactive Rate Limit Tracking
|
||||
|
||||
**Idea**: Track per-chat message counts to stay within Telegram's rate limits proactively.
|
||||
|
||||
@@ -30,7 +55,7 @@ Ordered by likelihood of future implementation (top = most likely to revisit).
|
||||
|
||||
**Why this rank**: Becomes necessary at scale. Clear trigger: frequent 429 errors in logs.
|
||||
|
||||
### 4. Status Change Deduplication
|
||||
### 7. Status Change Deduplication
|
||||
|
||||
**Idea**: If a component flaps (operational → degraded → operational in 2 minutes), debounce into one message.
|
||||
|
||||
@@ -38,7 +63,7 @@ Ordered by likelihood of future implementation (top = most likely to revisit).
|
||||
|
||||
**Why this rank**: Useful if flapping becomes noisy. Moderate effort with clear user-facing benefit.
|
||||
|
||||
### 5. Inline Keyboard for /subscribe
|
||||
### 8. Inline Keyboard for /subscribe
|
||||
|
||||
**Idea**: Replace text commands with clickable buttons using grammY's inline keyboard support.
|
||||
|
||||
@@ -46,7 +71,7 @@ Ordered by likelihood of future implementation (top = most likely to revisit).
|
||||
|
||||
**Why this rank**: Nice UX polish but not functional gap. grammY supports it well — moderate effort.
|
||||
|
||||
### 6. Scheduled Status Digest
|
||||
### 9. Scheduled Status Digest
|
||||
|
||||
**Idea**: CF Workers `scheduled` cron trigger sends a daily "all clear" or summary to subscribers.
|
||||
|
||||
@@ -54,7 +79,7 @@ Ordered by likelihood of future implementation (top = most likely to revisit).
|
||||
|
||||
**Why this rank**: Low user value. Only useful if users explicitly request daily summaries.
|
||||
|
||||
### 7. Mute Command (/mute \<duration>)
|
||||
### 10. Mute Command (/mute \<duration>)
|
||||
|
||||
**Idea**: Temporarily pause notifications without unsubscribing (e.g., `/mute 2h`).
|
||||
|
||||
@@ -62,7 +87,7 @@ Ordered by likelihood of future implementation (top = most likely to revisit).
|
||||
|
||||
**Why this rank**: Contradicts real-time purpose. `/stop` + `/start` is sufficient.
|
||||
|
||||
### 8. Multi-Language Support
|
||||
### 11. Multi-Language Support
|
||||
|
||||
**Idea**: At minimum English/Vietnamese support.
|
||||
|
||||
@@ -70,7 +95,7 @@ Ordered by likelihood of future implementation (top = most likely to revisit).
|
||||
|
||||
**Why this rank**: Source data is English-only. Translating bot chrome while incidents stay English creates a mixed-language experience.
|
||||
|
||||
### 9. Web Dashboard
|
||||
### 12. Web Dashboard
|
||||
|
||||
**Idea**: Replace the `/` health check with a status page showing subscriber count and recent webhook events.
|
||||
|
||||
@@ -78,7 +103,7 @@ Ordered by likelihood of future implementation (top = most likely to revisit).
|
||||
|
||||
**Why this rank**: Out of scope. The bot is the product — adding a web frontend changes the project's nature.
|
||||
|
||||
### 10. Dead Letter Queue for Failed Messages
|
||||
### 13. Dead Letter Queue for Failed Messages
|
||||
|
||||
**Idea**: After CF Queues exhausts 3 retries, persist failed messages to KV or a dedicated DLQ for debugging.
|
||||
|
||||
@@ -86,7 +111,7 @@ Ordered by likelihood of future implementation (top = most likely to revisit).
|
||||
|
||||
**Why this rank**: Logging is sufficient for current scale. Revisit only if log retention (3-day free tier) is too short for debugging patterns.
|
||||
|
||||
### 11. KV List Scalability (Subscriber Sharding)
|
||||
### 14. KV List Scalability (Subscriber Sharding)
|
||||
|
||||
**Idea**: Shard subscriber keys by event type (e.g., `sub:incident:{chatId}`, `sub:component:{chatId}`) to avoid listing all subscribers on every webhook.
|
||||
|
||||
@@ -94,7 +119,7 @@ Ordered by likelihood of future implementation (top = most likely to revisit).
|
||||
|
||||
**Why this rank**: Clear trigger: slow webhook response times at high subscriber counts. Migration path is straightforward when needed.
|
||||
|
||||
### 12. Digest / Quiet Mode
|
||||
### 15. Digest / Quiet Mode
|
||||
|
||||
**Idea**: Batch notifications into a daily summary instead of instant alerts.
|
||||
|
||||
|
||||
@@ -1,3 +1,5 @@
|
||||
/** @import { ChatTarget } from "./types.js" */
|
||||
|
||||
import { Bot, webhookCallback } from "grammy";
|
||||
import {
|
||||
addSubscriber,
|
||||
@@ -17,6 +19,7 @@ let kv = null;
|
||||
|
||||
/**
|
||||
* Extract chatId and threadId from grammY context
|
||||
* @returns {ChatTarget}
|
||||
*/
|
||||
function getChatTarget(ctx) {
|
||||
return {
|
||||
|
||||
@@ -69,7 +69,7 @@ export function registerInfoCommands(bot) {
|
||||
);
|
||||
}
|
||||
} catch (err) {
|
||||
console.error("status command error:", err);
|
||||
console.error(JSON.stringify({ event: "command_error", command: "status", error: err.message }));
|
||||
await ctx.reply("Unable to fetch status. Please try again later.");
|
||||
}
|
||||
});
|
||||
@@ -91,7 +91,7 @@ export function registerInfoCommands(bot) {
|
||||
{ parse_mode: "HTML", disable_web_page_preview: true }
|
||||
);
|
||||
} catch (err) {
|
||||
console.error("history command error:", err);
|
||||
console.error(JSON.stringify({ event: "command_error", command: "history", error: err.message }));
|
||||
await ctx.reply("Unable to fetch incident history. Please try again later.");
|
||||
}
|
||||
});
|
||||
@@ -116,7 +116,7 @@ export function registerInfoCommands(bot) {
|
||||
{ parse_mode: "HTML", disable_web_page_preview: true }
|
||||
);
|
||||
} catch (err) {
|
||||
console.error("uptime command error:", err);
|
||||
console.error(JSON.stringify({ event: "command_error", command: "uptime", error: err.message }));
|
||||
await ctx.reply("Unable to fetch uptime data. Please try again later.");
|
||||
}
|
||||
});
|
||||
|
||||
@@ -1,3 +1,5 @@
|
||||
/** @import { Subscriber, ChatTarget } from "./types.js" */
|
||||
|
||||
const KEY_PREFIX = "sub:";
|
||||
|
||||
/**
|
||||
@@ -54,6 +56,10 @@ async function listAllSubscriberKeys(kv) {
|
||||
|
||||
/**
|
||||
* Add or re-subscribe a user. Preserves existing types and components if already subscribed.
|
||||
* @param {KVNamespace} kv
|
||||
* @param {number} chatId
|
||||
* @param {?number} threadId
|
||||
* @param {string[]} types
|
||||
*/
|
||||
export async function addSubscriber(kv, chatId, threadId, types = ["incident", "component"]) {
|
||||
const key = buildKvKey(chatId, threadId);
|
||||
@@ -105,6 +111,10 @@ export async function updateSubscriberComponents(kv, chatId, threadId, component
|
||||
|
||||
/**
|
||||
* Get a single subscriber's data, or null if not subscribed
|
||||
* @param {KVNamespace} kv
|
||||
* @param {number} chatId
|
||||
* @param {?number} threadId
|
||||
* @returns {Promise<?Subscriber>}
|
||||
*/
|
||||
export async function getSubscriber(kv, chatId, threadId) {
|
||||
const key = buildKvKey(chatId, threadId);
|
||||
@@ -114,6 +124,10 @@ export async function getSubscriber(kv, chatId, threadId) {
|
||||
/**
|
||||
* Get subscribers filtered by event type and optional component name.
|
||||
* Uses KV metadata from list() — O(1) list call, no individual get() needed.
|
||||
* @param {KVNamespace} kv
|
||||
* @param {string} eventType
|
||||
* @param {?string} componentName
|
||||
* @returns {Promise<ChatTarget[]>}
|
||||
*/
|
||||
export async function getSubscribersByType(kv, eventType, componentName = null) {
|
||||
const keys = await listAllSubscriberKeys(kv);
|
||||
|
||||
@@ -1,8 +1,12 @@
|
||||
/** @import { QueueMessage } from "./types.js" */
|
||||
|
||||
import { removeSubscriber } from "./kv-store.js";
|
||||
import { telegramUrl } from "./telegram-api.js";
|
||||
/**
|
||||
* Process a batch of queued messages, sending each to Telegram.
|
||||
* Handles rate limits (429 → retry), blocked bots (403/400 → remove subscriber).
|
||||
* @param {{ messages: Array<{ body: QueueMessage, ack: () => void, retry: () => void }> }} batch
|
||||
* @param {object} env
|
||||
*/
|
||||
export async function handleQueue(batch, env) {
|
||||
let sent = 0, failed = 0, retried = 0, removed = 0;
|
||||
@@ -12,7 +16,7 @@ export async function handleQueue(batch, env) {
|
||||
|
||||
// Defensive check for malformed messages
|
||||
if (!chatId || !html) {
|
||||
console.error("Queue: malformed message, skipping", msg.body);
|
||||
console.error(JSON.stringify({ event: "queue_skip", reason: "malformed", body: msg.body }));
|
||||
msg.ack();
|
||||
continue;
|
||||
}
|
||||
@@ -37,32 +41,32 @@ export async function handleQueue(batch, env) {
|
||||
sent++;
|
||||
msg.ack();
|
||||
} else if (res.status === 403 || res.status === 400) {
|
||||
console.log(`Queue: removing subscriber ${chatId}:${threadId} (HTTP ${res.status})`);
|
||||
console.log(JSON.stringify({ event: "queue_remove", chatId, threadId, status: res.status }));
|
||||
await removeSubscriber(env.claude_status, chatId, threadId);
|
||||
removed++;
|
||||
msg.ack();
|
||||
} else if (res.status === 429) {
|
||||
const retryAfter = res.headers.get("Retry-After");
|
||||
console.log(`Queue: rate limited for ${chatId}, Retry-After: ${retryAfter ?? "unknown"}`);
|
||||
console.log(JSON.stringify({ event: "queue_ratelimit", chatId, retryAfter }));
|
||||
retried++;
|
||||
msg.retry();
|
||||
} else if (res.status >= 500) {
|
||||
console.error(`Queue: Telegram 5xx (${res.status}) for ${chatId}, retrying`);
|
||||
console.error(JSON.stringify({ event: "queue_retry", chatId, status: res.status }));
|
||||
retried++;
|
||||
msg.retry();
|
||||
} else {
|
||||
console.error(`Queue: unexpected HTTP ${res.status} for ${chatId}`);
|
||||
console.error(JSON.stringify({ event: "queue_error", chatId, status: res.status }));
|
||||
failed++;
|
||||
msg.ack();
|
||||
}
|
||||
} catch (err) {
|
||||
console.error("Queue: network error, retrying", err);
|
||||
console.error(JSON.stringify({ event: "queue_network_error", chatId, error: err.message }));
|
||||
retried++;
|
||||
msg.retry();
|
||||
}
|
||||
}
|
||||
|
||||
if (sent || failed || retried || removed) {
|
||||
console.log(`Queue batch: sent=${sent} failed=${failed} retried=${retried} removed=${removed}`);
|
||||
console.log(JSON.stringify({ event: "queue_batch", sent, failed, retried, removed }));
|
||||
}
|
||||
}
|
||||
|
||||
@@ -40,9 +40,15 @@ function formatComponentMessage(component, update) {
|
||||
export async function handleStatuspageWebhook(c) {
|
||||
try {
|
||||
// Validate URL secret (timing-safe)
|
||||
// Guard against misconfigured deploy (undefined env var)
|
||||
if (!c.env.WEBHOOK_SECRET) {
|
||||
console.error(JSON.stringify({ event: "webhook_error", reason: "WEBHOOK_SECRET not configured" }));
|
||||
return c.text("OK", 200);
|
||||
}
|
||||
|
||||
const secret = c.req.param("secret");
|
||||
if (!await timingSafeEqual(secret, c.env.WEBHOOK_SECRET)) {
|
||||
console.error("Statuspage webhook: invalid secret");
|
||||
console.error(JSON.stringify({ event: "webhook_error", reason: "invalid_secret" }));
|
||||
return c.text("OK", 200);
|
||||
}
|
||||
|
||||
@@ -51,37 +57,37 @@ export async function handleStatuspageWebhook(c) {
|
||||
try {
|
||||
body = await c.req.json();
|
||||
} catch {
|
||||
console.error("Statuspage webhook: invalid JSON body");
|
||||
console.error(JSON.stringify({ event: "webhook_error", reason: "invalid_json" }));
|
||||
return c.text("OK", 200);
|
||||
}
|
||||
|
||||
const eventType = body?.meta?.event_type;
|
||||
if (!eventType) {
|
||||
console.error("Statuspage webhook: missing event_type");
|
||||
console.error(JSON.stringify({ event: "webhook_error", reason: "missing_event_type" }));
|
||||
return c.text("OK", 200);
|
||||
}
|
||||
|
||||
console.log(`Statuspage webhook: ${eventType}`);
|
||||
console.log(JSON.stringify({ event: "webhook_received", eventType }));
|
||||
|
||||
// Determine category and format message
|
||||
let category, html, componentName;
|
||||
if (eventType.startsWith("incident.")) {
|
||||
if (!body.incident) {
|
||||
console.error("Statuspage webhook: incident event missing incident data");
|
||||
console.error(JSON.stringify({ event: "webhook_error", reason: "missing_incident_data", eventType }));
|
||||
return c.text("OK", 200);
|
||||
}
|
||||
category = "incident";
|
||||
html = formatIncidentMessage(body.incident);
|
||||
} else if (eventType.startsWith("component.")) {
|
||||
if (!body.component) {
|
||||
console.error("Statuspage webhook: component event missing component data");
|
||||
console.error(JSON.stringify({ event: "webhook_error", reason: "missing_component_data", eventType }));
|
||||
return c.text("OK", 200);
|
||||
}
|
||||
category = "component";
|
||||
componentName = body.component.name || null;
|
||||
html = formatComponentMessage(body.component, body.component_update);
|
||||
} else {
|
||||
console.error(`Statuspage webhook: unknown event type ${eventType}`);
|
||||
console.error(JSON.stringify({ event: "webhook_error", reason: "unknown_event_type", eventType }));
|
||||
return c.text("OK", 200);
|
||||
}
|
||||
|
||||
@@ -96,11 +102,11 @@ export async function handleStatuspageWebhook(c) {
|
||||
await c.env["claude-status"].sendBatch(messages.slice(i, i + 100));
|
||||
}
|
||||
|
||||
console.log(`Enqueued ${messages.length} messages for ${category}${componentName ? `:${componentName}` : ""}`);
|
||||
console.log(JSON.stringify({ event: "webhook_enqueued", category, componentName, count: messages.length }));
|
||||
return c.text("OK", 200);
|
||||
} catch (err) {
|
||||
// Catch-all: log error but still return 200 to prevent Statuspage from removing us
|
||||
console.error("Statuspage webhook: unexpected error", err);
|
||||
console.error(JSON.stringify({ event: "webhook_error", reason: "unexpected", error: err.message }));
|
||||
return c.text("OK", 200);
|
||||
}
|
||||
}
|
||||
|
||||
21
src/types.js
Normal file
21
src/types.js
Normal file
@@ -0,0 +1,21 @@
|
||||
/**
|
||||
* Shared JSDoc type definitions for the project.
|
||||
* Import via: @import { Subscriber, QueueMessage, ChatTarget } from "./types.js"
|
||||
*/
|
||||
|
||||
/**
|
||||
* Subscriber preferences stored in KV value and metadata
|
||||
* @typedef {{ types: string[], components: string[] }} Subscriber
|
||||
*/
|
||||
|
||||
/**
|
||||
* Message body enqueued to CF Queue for fan-out delivery
|
||||
* @typedef {{ chatId: number, threadId: ?number, html: string }} QueueMessage
|
||||
*/
|
||||
|
||||
/**
|
||||
* Chat target extracted from Telegram update
|
||||
* @typedef {{ chatId: number, threadId: ?number }} ChatTarget
|
||||
*/
|
||||
|
||||
export {};
|
||||
Reference in New Issue
Block a user