tiennm99 6f0b5ff0a8 feat(db): phase 01 — atlas wrangler config + secret-leak lint + mongodb dep
Code/config slice of plan phase 01 (operator-only steps for cluster
provisioning, secrets, and runtime smoke tests deferred to user).

- wrangler.toml: add `compatibility_flags = ["nodejs_compat_v2"]`
  (compatibility_date `2025-10-01` already satisfies ≥ 2025-03-20)
- .env.deploy.example: add `MONGODB_URI` placeholder with mirror-protocol note
- scripts/check-secret-leaks.js: lint that fails build on `console.log(env.<SECRET>)`
  for MONGODB_URI / TELEGRAM_BOT_TOKEN / TELEGRAM_WEBHOOK_SECRET / ADMIN_TOKEN
- package.json: install mongodb@^6.7.0 (resolved 6.21.0); wire secret-leak
  check into `npm run lint`
- docs/using-mongodb.md: operational runbook (cluster spec, free-tier ceiling,
  auto-pause behavior, network access permanence, rollback, rotation)

Bundle-size HARD GATE: PASS. Probe with `import { MongoClient }` measures
226 KiB gzipped (3 MiB Free cap, 92% headroom) — nodejs_compat_v2 provides
node:net/tls/crypto from runtime so transitive deps stay unbundled.

CPU-time gate and auto-pause behavior gate require real Atlas access;
deferred to operator (see docs/using-mongodb.md for procedure).

503/503 vitest tests still pass.
2026-04-26 08:32:19 +07:00

# Using MongoDB Atlas
Operational runbook for the MongoDB Atlas backend introduced by `plans/260425-1945-mongodb-atlas-migration/`.
## Cluster
| Field | Value |
|---|---|
| Provider | MongoDB Atlas |
| Tier | M0 Free |
| Region | `aws-ap-southeast-1` (Singapore) |
| Cluster name | `miti99bot-prod` (operator confirms) |
| Database | `miti99bot` |
| DB user | `miti99bot-worker` (`readWrite@miti99bot`) |

Connection string format:
```
mongodb+srv://miti99bot-worker:<pass>@<host>/miti99bot?retryWrites=true&w=majority
```
Stored in two places (must match):
1. CF Worker secret: `wrangler secret put MONGODB_URI`
2. `.env.deploy` (gitignored, used by local backfill / verify scripts)
Same secret-mirror protocol as `TELEGRAM_BOT_TOKEN`.
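
A randomly generated password can contain characters such as `@`, `/` or `:` that break the URI unless they are percent-encoded. The sketch below shows one way to assemble the connection string in the format above. `buildMongoUri` and the example host are hypothetical and are not part of the repository.

```javascript
// Hypothetical helper: assembles a mongodb+srv URI in the format shown above.
// Credentials are percent-encoded so reserved characters cannot corrupt the URI.
function buildMongoUri({ user, password, host, database }) {
  const credentials = `${encodeURIComponent(user)}:${encodeURIComponent(password)}`;
  return `mongodb+srv://${credentials}@${host}/${database}?retryWrites=true&w=majority`;
}

// Example with a placeholder host and a password containing reserved characters.
const exampleUri = buildMongoUri({
  user: "miti99bot-worker",
  password: "p@ss/word",
  host: "cluster.example.mongodb.net", // placeholder, not the real cluster host
  database: "miti99bot",
});
```

The same encoded string would then go into both `wrangler secret put MONGODB_URI` and `.env.deploy`, so the two copies stay identical.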
## Free-tier ceiling
- 512 MB storage (data + indexes)
- 500 max concurrent connections
- ~100 ops/sec sustained (no daily cap)
- No backups, single region, no PITR
- Auto-pauses after 30 days of zero ops
Upgrade path: **Flex Tier, $8 to $30/month** (M2/M5 deprecated as of 2026).
## Auto-pause
After 30 days idle the cluster pauses. First request after pause:
- Driver throws `MongoServerSelectionError` after `serverSelectionTimeoutMS` (5s).
- Worker code (see `src/db/mongo-client.js`, lands Phase 02) catches and returns 503 with `Retry-After: 30`.
- Cluster auto-wakes within 30 to 60 s on attempted connection.
The bot has 6+ daily crons; any cron that writes to Mongo prevents the pause. Phase 08 confirms this.
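
The error-to-503 mapping described above could look roughly like the sketch below. `src/db/mongo-client.js` does not exist until Phase 02, so `responseForMongoError` and `CLIENT_OPTIONS` are hypothetical names. Only `serverSelectionTimeoutMS` and the `MongoServerSelectionError` name come from the driver.

```javascript
// Driver option that bounds how long server selection may take (5 s, as above).
// It would be passed as the second argument to `new MongoClient(uri, options)`.
const CLIENT_OPTIONS = { serverSelectionTimeoutMS: 5000 };

// Hypothetical helper: maps a paused-cluster error to a 503 response.
// Returns null for any other error so the caller can rethrow it.
function responseForMongoError(err) {
  if (err?.name !== "MongoServerSelectionError") return null;
  return new Response("Database temporarily unavailable", {
    status: 503,
    headers: { "Retry-After": "30" },
  });
}
```

Matching on `err.name` keeps the sketch independent of the `mongodb` import. An `instanceof MongoServerSelectionError` check would be the stricter alternative once the driver is wired in.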
## Network access
The Atlas IP access list is `0.0.0.0/0` because Cloudflare Workers do NOT have static egress IPs on the Free or basic Paid plans. Only authentication (SCRAM-SHA-256) and TLS gate connections.
**Permanent risk** unless the project upgrades to the CF Workers paid static-egress IP add-on (~$10/mo).
Mitigations:
- DB user has `readWrite` on one db only (NOT `dbAdmin` / `clusterAdmin`).
- Password ≥32 chars random.
- Rotate quarterly.
- Atlas free-tier email alerts configured for cluster unavailability + connections > 400.
## Bundle gate (Phase 01 result)
Measured with `npx wrangler deploy --dry-run` on a minimal probe that imports `MongoClient`:

| Metric | Value | Cap (Free) | Cap (Paid) |
|---|---|---|---|
| Compressed (gzip) | **226 KiB** | 3 MiB | 10 MiB |
| Raw (minified) | 1.74 MiB | — | — |
| On-disk (uncompressed) | 3.9 MiB | — | — |

Pass on both plans with **>92% headroom**. `nodejs_compat_v2` provides `node:net`/`node:tls`/`node:crypto` from the runtime, so the driver's transitive deps are not bundled.
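
The headroom figure follows directly from the table: the share of the cap that the compressed bundle leaves unused. A minimal sketch of that arithmetic, with `headroomPercent` as a hypothetical helper name:

```javascript
// Percentage of the size cap left unused by the compressed bundle.
function headroomPercent(usedKiB, capMiB) {
  const capKiB = capMiB * 1024;
  return (1 - usedKiB / capKiB) * 100;
}

const freeHeadroom = headroomPercent(226, 3).toFixed(1);  // "92.6" (Free, 3 MiB cap)
const paidHeadroom = headroomPercent(226, 10).toFixed(1); // "97.8" (Paid, 10 MiB cap)
```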
## CPU-time gate (Phase 01 — operator-run)
Requires real Atlas + `wrangler dev`. Procedure:
1. Add a temporary `/__mongo-ping` route that connects, runs `db.command({ ping: 1 })` (the Node.js driver's counterpart of the shell's `db.runCommand`), and returns `{wall_ms}`.
2. Run 5+ cold cycles (10-min spaced).
3. Inspect CF dashboard CPU column for each invocation.
4. **Hard gate**: if any cold-start CPU time approaches 10 ms (the Workers Free plan CPU limit per invocation), abort migration. Escalate to paid plan or pivot via `phase-07-alt-pivot.md`.
5. Record cold-ping P95 wall-clock as `BASELINE_COLD_PING_MS` here:
```
BASELINE_COLD_PING_MS = <fill after measurement>
```
Phase 06 derives the abort threshold from this value: `2.5 × BASELINE_COLD_PING_MS`.
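
The baseline and the Phase 06 threshold can be derived from the recorded `wall_ms` values as sketched below. The sample numbers are invented for illustration, and the nearest-rank method is an assumption, since the plan does not say how P95 is computed.

```javascript
// Nearest-rank P95: the smallest sample with at least 95% of samples at or below it.
function p95(samples) {
  if (samples.length === 0) throw new Error("need at least one sample");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil(0.95 * sorted.length);
  return sorted[rank - 1];
}

// Phase 06 abort threshold, as stated above: 2.5 x BASELINE_COLD_PING_MS.
function abortThreshold(baselineColdPingMs) {
  return 2.5 * baselineColdPingMs;
}

// Invented wall_ms values from five cold cycles, for illustration only.
const coldPingSamplesMs = [120, 400, 150, 180, 140];
const baselineColdPingMs = p95(coldPingSamplesMs);       // 400
const thresholdMs = abortThreshold(baselineColdPingMs);  // 1000
```

With only five cold cycles, the nearest-rank P95 equals the slowest sample, so a single outlier sets the baseline. More cycles give a more stable figure.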
## Auto-pause behavior gate (Phase 01 — operator-run)
In Atlas UI, manually pause the cluster, then hit `/__mongo-ping`. Confirm:
- Driver throws within 5s (does NOT hang indefinitely).
- Error class is `MongoServerSelectionError` (or driver subclass).
- Phase 02 `getDb()` catches this and surfaces a 503.
## Node API surface
`src/` (the Worker) imports zero `node:*` modules today. `nodejs_compat_v2` is enabled solely for the `mongodb` driver:

| Module | Used by Worker? | Used by scripts/? |
|---|---|---|
| `node:fs` | no | yes (build/scrape/migrate) |
| `node:path` | no | yes |
| `node:child_process` | no | yes (migrate.js) |
| `node:net` | indirectly (via mongodb) | no |
| `node:tls` | indirectly (via mongodb) | no |
| `node:crypto` | indirectly (via mongodb) | no |
| `process.env` | no | yes (register.js) |
| `Buffer` | no | no |

Risk: minimal. No existing module relies on the absence of these globals.
## Rollback
If migration is abandoned at any phase before cutover:
1. `wrangler secret delete MONGODB_URI`
2. Revert `wrangler.toml`: remove `compatibility_flags = ["nodejs_compat_v2"]`.
3. `npm uninstall mongodb`.
4. `npm run deploy` — bot continues on KV/D1 unchanged.
5. (Optional) Delete Atlas cluster from UI.
`scripts/check-secret-leaks.js` should stay — it covers other secrets too.
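
The script itself is not reproduced in this runbook. The sketch below only illustrates the kind of check the commit message describes: flagging `console.*` calls that print a secret from `env`. `findSecretLeaks` and the exact pattern are assumptions, not the script's actual implementation.

```javascript
// Secret names listed in the Phase 01 commit message.
const SECRET_NAMES = [
  "MONGODB_URI",
  "TELEGRAM_BOT_TOKEN",
  "TELEGRAM_WEBHOOK_SECRET",
  "ADMIN_TOKEN",
];

// Matches a console call whose argument list mentions env.<SECRET> on one line.
const LEAK_PATTERN = new RegExp(
  String.raw`console\.\w+\s*\([^)]*\benv\.(${SECRET_NAMES.join("|")})\b`
);

// Hypothetical helper: returns one finding per offending line (1-based line numbers).
function findSecretLeaks(source) {
  const findings = [];
  source.split("\n").forEach((text, index) => {
    const match = LEAK_PATTERN.exec(text);
    if (match) findings.push({ line: index + 1, secret: match[1] });
  });
  return findings;
}
```

A line-based regex like this misses calls split across several lines and secrets copied into another variable first, so it is a guard against accidents, not a complete audit.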
## Rotation
`MONGODB_URI` rotation cadence: every 90 days, owner = repo maintainer.
Procedure:
1. In Atlas UI → Database Access → edit `miti99bot-worker` → reset password.
2. Update `.env.deploy` with new URI.
3. `wrangler secret put MONGODB_URI` (paste new URI).
4. `npm run deploy` (re-runs register; no Worker restart needed since the secret is read at request time via `env.MONGODB_URI`).
Mismatch between `.env.deploy` and CF secret causes register-script failure on next deploy — same fail-loud pattern as `TELEGRAM_WEBHOOK_SECRET`.
## Alerts
Configured in Atlas free-tier UI:
- **Cluster unavailable** → email maintainer.
- **Current connections > 400** (80% of cap) → email maintainer.
Plus CF Observability rule (Phase 06): >10 errors per 1 min window → email.