Pre-execution prerequisites for the Phase 07 cutover. Stage 2 of the
cutover keeps DUAL_WRITE=0 for ~6 days; if anything regresses during
that window the operator MUST be able to roll back to KV/D1 with the
last N days of Mongo-only writes recovered. Pre-building these scripts
(per code-reviewer #4) avoids the "draft a backfill under outage
pressure" anti-pattern: writing untested code at 4am.
Reverse-backfill
- scripts/backfill-mongo-to-kv.js: full-scans the Mongo collection for
each module and PUTs each doc back to CF KV via the REST API. expiresAt
is converted to expirationTtl (clamped to the 60s minimum CF KV
accepts); docs that have already expired are skipped so dead state is
not resurrected. Throttled to 50 ops/sec. Supports --dry-run and
--module flags.
- scripts/backfill-mongo-to-d1.js: full-scans trading_trades and builds
INSERT SQL, preserving legacy_id where present (round-tripping the D1
autoincrement IDs that the phase-05 forward backfill carried over).
Docs without a legacy_id get sequentially generated integer IDs.
Output is piped through wrangler d1 execute.
- scripts/lib/migration-helpers.js: cfKvPut helper added.
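For illustration, the expiresAt → expirationTtl conversion described above can be sketched as follows. This is a reconstruction under stated assumptions, not the actual helper in backfill-mongo-to-kv.js; the names docToTtl and MIN_TTL_SECONDS are hypothetical.

```javascript
// CF KV rejects expiration TTLs under 60 seconds, hence the clamp.
const MIN_TTL_SECONDS = 60;

// Maps a doc's expiresAt (epoch ms, or null for no expiry) to the TTL
// to send to KV. expired=true means the doc is skipped entirely;
// ttl=null with expired=false means "store without a TTL".
function docToTtl(expiresAt, nowMs = Date.now()) {
  if (expiresAt == null) return { expired: false, ttl: null };
  const remainingSec = Math.ceil((expiresAt - nowMs) / 1000);
  if (remainingSec <= 0) return { expired: true, ttl: null }; // don't resurrect
  return { expired: false, ttl: Math.max(remainingSec, MIN_TTL_SECONDS) };
}
```

The clamp trades precision for writability: a doc with 30s of life left is stored with a 60s TTL rather than dropped.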
Delete guard (debugger #12)
- scripts/wrangler-delete-guard.sh: interactive CONFIRM wrapper around
wrangler kv namespace delete and wrangler d1 delete. Exits with code 3
when stdin is not a TTY, so it cannot run in CI; documented as
never-run-in-CI.
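The guard's decision logic, sketched here as a pure JavaScript function for clarity (the real guard is a shell script; guardDecision and its exit-code-1 branch for a declined prompt are illustrative assumptions — only the exit-3-on-no-TTY behavior comes from the commit text):

```javascript
// Returns the process exit code the guard should use.
// 3: stdin is not a TTY (CI or a pipe) -> hard refuse, never prompt.
// 1: operator typed anything other than CONFIRM -> abort.
// 0: proceed to the wrapped `wrangler ... delete` command.
function guardDecision(isTty, answer) {
  if (!isTty) return 3;
  if (answer !== "CONFIRM") return 1;
  return 0;
}
```

In a Node context the isTty input would come from process.stdin.isTTY, which is undefined when stdin is a pipe — exactly the CI case the guard blocks.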
package.json: backfill:mongo:kv[:dry] + backfill:mongo:d1[:dry] scripts
wired.
Tests: 697 → 733 (+36).
- 7 cfKvPut tests (REST URL, querystring, body, expiration_ttl param).
- 10 reverse-KV TTL math tests (expired sentinel, future seconds, no-TTL,
CF 60s minimum clamp).
- 9 reverse-D1 SQL construction tests (escaping, legacy_id preservation,
sequential generation).
Lint clean. No Worker code touched. Stage 1 cutover, 7-day soak,
snapshots, and Stage 3 cleanup (delete CFKVStore + simplify factories +
edit package.json deploy chain) remain operator-driven and will be
committed separately after binding deletion.
Code prerequisites for the Phase 06 cold-start soak gate. The 24-72h soak
itself is operator-run; this commit ships the instrumentation + analysis
tools needed to make the PROCEED-or-PIVOT decision.
Telemetry
- src/util/timing.js: startTiming(cmd) returns {mark, end}, which emits
a structured cmd_timing log. takeColdFlag() returns {cold, isolateAgeMs}
using a module-scoped boolean: the first request in an isolate is cold,
subsequent ones are warm. This replaces the originally planned
isolate_age_ms < 200ms classifier, which was broken because Mongo
cold-connect itself is ~1500ms, so cold requests would always bucket as
warm (code-reviewer #11).
- src/util/request-context.js: setLastCold/getLastCold shared state
bridges fetch-level cold detection into the dispatcher middleware
without a circular import.
- src/index.js: takeColdFlag at the top of fetch() emits a request log
and primes the request context for the dispatcher.
- src/modules/dispatcher.js: bot.use() middleware times every command.
Chosen over per-handler wrapping to preserve the existing identity
assertion in tests (handler === reg.allCommands.get(name).cmd.handler):
a single instrumentation point, no contract change.
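The module-scoped cold-flag pattern above can be sketched like this. The function names mirror the commit text, but the bodies are reconstructions under stated assumptions, not the actual src/util/timing.js source; the log field names are invented.

```javascript
// Module scope: one copy per isolate, so the flag flips exactly once
// per isolate lifetime.
let seenFirstRequest = false;
const isolateStartMs = Date.now();

// First call in a fresh isolate reports cold=true; every later call
// in the same isolate reports cold=false.
function takeColdFlag() {
  const cold = !seenFirstRequest;
  seenFirstRequest = true;
  return { cold, isolateAgeMs: Date.now() - isolateStartMs };
}

// Returns a {mark, end} pair; end() emits a structured timing line.
function startTiming(cmd, log = console.log) {
  const mark = Date.now();
  return {
    mark,
    end() {
      log(JSON.stringify({ event: "cmd_timing", cmd, ms: Date.now() - mark }));
    },
  };
}
```

This is why the module-scoped boolean beats an isolate-age threshold: it needs no tuning against connect latency, because it observes isolate lifecycle directly rather than inferring it from timing.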
Soak tools (operator-run)
- scripts/analyze-soak.js: parses CF Logs export (NDJSON or CSV), filters
cmd_timing events, computes p50/p95/p99 per (cmd, cold/warm). Counts
dual-write secondary failures, mongo connection errors, CPU-time
exceeded events. Writes markdown report.
- scripts/synthetic-burst.js: fires N parallel synthetic Telegram updates
at the deployed Worker URL with cache-busting tokens. Used for the
pre-deploy connection-cap stress test (debugger #2 — 20 parallel cold
requests, abort if Atlas peak > 60% of 500-conn cap).
- package.json: analyze:soak + burst:synthetic scripts wired.
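A minimal sketch of the per-bucket percentile math analyze-soak.js needs. The nearest-rank method and the function name are assumptions, not necessarily what the script implements:

```javascript
// Nearest-rank percentile: sort ascending, take the ceil(p/100 * N)-th
// sample (1-indexed). p50/p95/p99 per (cmd, cold/warm) bucket are just
// three calls over that bucket's latency samples.
function percentile(samples, p) {
  if (samples.length === 0) return NaN;
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.min(
    sorted.length,
    Math.max(1, Math.ceil((p / 100) * sorted.length)),
  );
  return sorted[rank - 1];
}
```

Nearest-rank always returns an observed sample (no interpolation), which keeps tail percentiles honest on the small cold-request buckets a 24h soak produces.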
Tests
- tests/util/timing.test.js: 8 tests — timing semantics, cold flag flip.
- tests/scripts/analyze-soak.test.js: 22 tests — percentile math, NDJSON
+ CSV parse, aggregation, markdown formatting.
Tests: 667 → 697 (+30). Lint clean.
Operator runbook for Phase 06 (NOT executed by this commit):
1. Verify telemetry live via wrangler tail.
2. Run synthetic burst test: npm run burst:synthetic -- --url <prod>
3. Configure Atlas + CF Observability email alerts.
4. 24h soak (extend to 72h on stop-conditions per phase plan).
5. Daily npm run verify:mongo.
6. npm run analyze:soak -- --input <cf-logs.json> → soak-decision.md.
7. PROCEED to Phase 07 if cold-start P95 ≤ 2.5 × BASELINE_COLD_PING_MS;
else execute phase-07-alt-pivot.md (Upstash standby).