Files
tiennm99 55c873965c feat(observability): phase 06 — timing telemetry + soak analyzer + burst tester
Code prerequisites for the Phase 06 cold-start soak gate. The 24-72h soak
itself is operator-run; this commit ships the instrumentation + analysis
tools needed to make the PROCEED-or-PIVOT decision.

Telemetry
- src/util/timing.js: startTiming(cmd) returns {mark, end} that emits a
  structured cmd_timing log. takeColdFlag() returns {cold, isolateAgeMs}
  using a module-scoped boolean — first request in an isolate is cold,
  subsequent are warm. This replaces the originally-planned
  isolate_age_ms < 200ms classifier (broken because Mongo cold-connect
  itself is ~1500ms; cold requests would always bucket as warm —
  code-reviewer #11).
- src/util/request-context.js: setLastCold/getLastCold shared state
  bridges fetch-level cold detection into the dispatcher middleware
  without a circular import.
- src/index.js: takeColdFlag at the top of fetch() emits a request log
  and primes the request context for the dispatcher.
- src/modules/dispatcher.js: bot.use() middleware times every command.
  Chosen over per-handler wrapping to preserve the existing identity
  assertion in tests (handler === reg.allCommands.get(name).cmd.handler)
  — single instrumentation point, no contract change.

Soak tools (operator-run)
- scripts/analyze-soak.js: parses CF Logs export (NDJSON or CSV), filters
  cmd_timing events, computes p50/p95/p99 per (cmd, cold/warm). Counts
  dual-write secondary failures, mongo connection errors, CPU-time
  exceeded events. Writes markdown report.
- scripts/synthetic-burst.js: fires N parallel synthetic Telegram updates
  at the deployed Worker URL with cache-busting tokens. Used for the
  pre-deploy connection-cap stress test (debugger #2 — 20 parallel cold
  requests, abort if Atlas peak > 60% of 500-conn cap).
- package.json: analyze:soak + burst:synthetic scripts wired.

Tests
- tests/util/timing.test.js: 8 tests — timing semantics, cold flag flip.
- tests/scripts/analyze-soak.test.js: 22 tests — percentile math, NDJSON
  + CSV parse, aggregation, markdown formatting.

Tests: 667 → 697 (+30). Lint clean.

Operator runbook for Phase 06 (NOT executed by this commit):
1. Verify telemetry live via wrangler tail.
2. Run synthetic burst test: npm run burst:synthetic -- --url <prod>
3. Configure Atlas + CF Observability email alerts.
4. 24h soak (extend to 72h on stop-conditions per phase plan).
5. Daily npm run verify:mongo.
6. npm run analyze:soak -- --input <cf-logs.json> → soak-decision.md.
7. PROCEED to Phase 07 if cold-start P95 ≤ 2.5 × BASELINE_COLD_PING_MS;
   else execute phase-07-alt-pivot.md (Upstash standby).
2026-04-26 09:22:04 +07:00
..