3 Commits

Author SHA1 Message Date
tiennm99 3f03521e84 feat(scripts): phase 07 — reverse-backfill scripts + delete guard
Pre-execution prerequisites for the Phase 07 cutover. Stage 2 of the
cutover keeps DUAL_WRITE=0 for ~6 days; if anything regresses during
that window the operator MUST be able to roll back to KV/D1 with the
last N days of Mongo-only writes recovered. Pre-building these scripts
(per code-reviewer #4) eliminates "draft a backfill under outage
pressure" — the anti-pattern of writing untested code at 4am.

Reverse-backfill
- scripts/backfill-mongo-to-kv.js: full-scan Mongo collection per module,
  PUT each doc back to CF KV via REST. expiresAt → expirationTtl (clamped
  to 60s minimum per CF KV); already-expired docs are skipped (won't
  resurrect dead state). 50 ops/sec throttle. --dry-run + --module flags.
- scripts/backfill-mongo-to-d1.js: full-scan trading_trades, build INSERT
  SQL preserving legacy_id where present (round-trips D1 autoincrement IDs
  preserved by phase-05 forward backfill). Sequential int generation for
  any docs without legacy_id. Pipes through wrangler d1 execute.
- scripts/lib/migration-helpers.js: cfKvPut helper added.

Delete guard (debugger #12)
- scripts/wrangler-delete-guard.sh: interactive CONFIRM wrapper around
  wrangler kv namespace delete + wrangler d1 delete. Exits 3 when stdin
  is not a tty so it cannot run in CI. Documented: never run in CI.

package.json: backfill:mongo:kv[:dry] + backfill:mongo:d1[:dry] scripts
wired.

Tests: 697 → 733 (+36).
- 7 cfKvPut tests (REST URL, querystring, body, expiration_ttl param).
- 10 reverse-KV TTL math tests (expired sentinel, future seconds, no-TTL,
  CF 60s minimum clamp).
- 9 reverse-D1 SQL construction tests (escaping, legacy_id preservation,
  sequential generation).

Lint clean. No Worker code touched. Stage 1 cutover, 7-day soak,
snapshots, and Stage 3 cleanup (delete CFKVStore + simplify factories +
edit package.json deploy chain) remain operator-driven and will be
committed separately after binding deletion.
2026-04-26 09:29:14 +07:00
tiennm99 55c873965c feat(observability): phase 06 — timing telemetry + soak analyzer + burst tester
Code prerequisites for the Phase 06 cold-start soak gate. The 24-72h soak
itself is operator-run; this commit ships the instrumentation + analysis
tools needed to make the PROCEED-or-PIVOT decision.

Telemetry
- src/util/timing.js: startTiming(cmd) returns {mark, end} that emits a
  structured cmd_timing log. takeColdFlag() returns {cold, isolateAgeMs}
  using a module-scoped boolean — first request in an isolate is cold,
  subsequent are warm. This replaces the originally-planned
  isolate_age_ms < 200ms classifier (broken because Mongo cold-connect
  itself is ~1500ms; cold requests would always bucket as warm —
  code-reviewer #11).
- src/util/request-context.js: setLastCold/getLastCold shared state
  bridges fetch-level cold detection into the dispatcher middleware
  without a circular import.
- src/index.js: takeColdFlag at the top of fetch() emits a request log
  and primes the request context for the dispatcher.
- src/modules/dispatcher.js: bot.use() middleware times every command.
  Chosen over per-handler wrapping to preserve the existing identity
  assertion in tests (handler === reg.allCommands.get(name).cmd.handler)
  — single instrumentation point, no contract change.

Soak tools (operator-run)
- scripts/analyze-soak.js: parses CF Logs export (NDJSON or CSV), filters
  cmd_timing events, computes p50/p95/p99 per (cmd, cold/warm). Counts
  dual-write secondary failures, mongo connection errors, CPU-time
  exceeded events. Writes markdown report.
- scripts/synthetic-burst.js: fires N parallel synthetic Telegram updates
  at the deployed Worker URL with cache-busting tokens. Used for the
  pre-deploy connection-cap stress test (debugger #2 — 20 parallel cold
  requests, abort if Atlas peak > 60% of 500-conn cap).
- package.json: analyze:soak + burst:synthetic scripts wired.

Tests
- tests/util/timing.test.js: 8 tests — timing semantics, cold flag flip.
- tests/scripts/analyze-soak.test.js: 22 tests — percentile math, NDJSON
  + CSV parse, aggregation, markdown formatting.

Tests: 667 → 697 (+30). Lint clean.

Operator runbook for Phase 06 (NOT executed by this commit):
1. Verify telemetry live via wrangler tail.
2. Run synthetic burst test: npm run burst:synthetic -- --url <prod>
3. Configure Atlas + CF Observability email alerts.
4. 24h soak (extend to 72h on stop-conditions per phase plan).
5. Daily npm run verify:mongo.
6. npm run analyze:soak -- --input <cf-logs.json> → soak-decision.md.
7. PROCEED to Phase 07 if cold-start P95 ≤ 2.5 × BASELINE_COLD_PING_MS;
   else execute phase-07-alt-pivot.md (Upstash standby).
2026-04-26 09:22:04 +07:00
tiennm99 0859356ec7 feat(scripts): phase 05 — backfill + verify + wipe (local node, no admin routes)
Operator-run migration scripts for KV→Mongo and D1→trading_trades, plus a
parity verifier and a rollback wiper. Pure local Node — no Worker code,
no /__admin/* routes, no new Worker secrets. Complies with
docs/architecture.md §10.

Scripts
- backfill-kv-to-mongo.js: paginates CF KV REST API per module, fetches
  values, $setOnInsert upsert into per-module Mongo collection. Resumes
  from .backfill-cursor-<module>.json on restart. Throttles 50 ops/sec.
  expiresAt derived from KV metadata.expiration (debugger #10). --dry-run
  and --module flags for incremental work.
- backfill-d1-to-mongo.js: wrangler d1 execute --remote --json → parse →
  insertMany batches into trading_trades, preserving original integer id
  as legacy_id (code-reviewer #13). Pre-flight aborts if collection
  non-empty unless --force.
- verify-mongo-parity.js: count parity ±1%, SHA256 value compare,
  expiresAt ±5min bucket. Full-scan when <10K docs, sqrt-sample
  capped at 500 otherwise (code-reviewer #21). Trading: full-scan
  on legacy_id/ts/user_id/symbol/qty.
- wipe-mongo.js: rollback helper. deleteMany across all collections
  with readline confirm. --yes for CI.
- lib/migration-helpers.js: shared sleep, sha256, checkpoint I/O,
  cfKvList/cfKvGet, MongoClient singleton, sample strategy.

Surface updates
- .env.deploy.example: CF account/token/namespace placeholders.
- package.json: backfill:kv[:dry], backfill:d1[:dry], verify:mongo,
  wipe:mongo scripts.
- check-secret-leaks.js: SECRETS array gains CLOUDFLARE_API_TOKEN +
  CLOUDFLARE_ACCOUNT_ID for defense-in-depth.
- .gitignore: .backfill-cursor-*.json excluded.

Tests: 638 → 667 (+29 pure-logic tests for sha256, checkpoint round-trip,
count-diff, sample-size, fetch-mocked CF REST). Lint clean.

Operator-run sequence (after Phase 06 deploy):
  npm run backfill:kv:dry   # preview
  npm run backfill:kv
  npm run backfill:d1:dry
  npm run backfill:d1
  npm run verify:mongo      # exit 0 = parity ok
2026-04-26 09:13:00 +07:00