chore: remove cloudflare + docker + legacy migration scripts

Phase 7 cleanup of the Vercel + Upstash consolidation plan:

- delete wrangler.toml, Dockerfile, docker-compose{,.dev}.yml,
  scripts/migrate-atlas-to-upstash.js (one-shot migration done)
- drop wrangler + mongodb devDeps and migrate* npm scripts;
  regenerate package-lock.json (-70 packages)
- prune CF/Wrangler/Atlas-export entries from .gitignore + .vercelignore
- drop MONGODB_URI from .env.deploy.example
- rewrite README for Vercel + Upstash architecture
- refresh stale Cloudflare comments in src/{logger,models,repository}
2026-05-09 21:49:48 +07:00
parent b2082c4601
commit 0a395bde62
19 changed files with 55 additions and 2126 deletions
-4
@@ -16,7 +16,3 @@ WORKER_URL=https://store-scraper-bot.vercel.app/api/webhook
UPSTASH_REDIS_REST_URL=
UPSTASH_REDIS_REST_TOKEN=
KEY_PREFIX=store-scraper-bot:
# Legacy Java-bot MongoDB Atlas — required for one-shot Phase 5 migration only.
# After successful cutover, remove this line.
MONGODB_URI=
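The blank `UPSTASH_REDIS_REST_*` entries above are meant to be filled in. The README also accepts the Vercel Marketplace names `KV_REST_API_URL` and `KV_REST_API_TOKEN` as fallbacks. Below is a minimal sketch of that lookup, assuming a plain env object such as `process.env`. `resolveUpstashConfig` and `firstNonEmpty` are hypothetical names, not the bot's config module. Unlike the bare `??` chain in the deleted migration script, this version treats an empty string as unset. It does so on the assumption that a blank `NAME=` line is loaded as `''` rather than left undefined.

```javascript
// Hypothetical sketch: resolve Upstash settings from an env-like object.
const DEFAULT_KEY_PREFIX = 'store-scraper-bot:';

// Return the first value that is a non-empty string. Empty strings count as
// "unset", so a blank `UPSTASH_REDIS_REST_URL=` line still falls through to
// the Vercel Marketplace KV_* name.
function firstNonEmpty(...values) {
  return values.find((v) => typeof v === 'string' && v !== '');
}

function resolveUpstashConfig(env) {
  const url = firstNonEmpty(env.UPSTASH_REDIS_REST_URL, env.KV_REST_API_URL);
  const token = firstNonEmpty(env.UPSTASH_REDIS_REST_TOKEN, env.KV_REST_API_TOKEN);
  if (!url) throw new Error('UPSTASH_REDIS_REST_URL (or KV_REST_API_URL) not set');
  if (!token) throw new Error('UPSTASH_REDIS_REST_TOKEN (or KV_REST_API_TOKEN) not set');
  // The prefix must match between every writer and the bot runtime.
  const prefix = firstNonEmpty(env.KEY_PREFIX) ?? DEFAULT_KEY_PREFIX;
  return { url, token, prefix };
}

const cfg = resolveUpstashConfig({
  KV_REST_API_URL: 'https://example.upstash.io',
  KV_REST_API_TOKEN: 'example-token',
});
console.log(cfg.url, cfg.prefix); // https://example.upstash.io store-scraper-bot:
```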
-7
@@ -10,13 +10,6 @@ pnpm-debug.log*
.env.deploy
.dev.vars
# Cloudflare Workers
.tmp-deploy/
.wrangler/
# Atlas → KV migration intermediate (contains exported bot state)
scripts/.atlas-export.json
# Editor / IDE
# .idea/
# .vscode/
-6
@@ -2,14 +2,8 @@ plans/
docs/
*.md
LICENSE
Dockerfile
docker-compose.yml
docker-compose.dev.yml
wrangler.toml
.env*
scripts/check-secret-leaks.js
scripts/migrate-atlas-to-upstash.js
scripts/migrate-atlas-to-kv.js
scripts/register-webhook.js
.git
.github
-13
@@ -1,13 +0,0 @@
FROM node:20-alpine
RUN apk --no-cache add tzdata ca-certificates
WORKDIR /app
COPY package.json ./
RUN npm install --omit=dev
COPY src ./src
# Bot uses long polling — no ports exposed.
CMD ["node", "src/index.js"]
+42 -34
@@ -1,6 +1,7 @@
# store-scraper-bot
JavaScript (Node.js) implementation. Ports [java-store-scraper-bot](https://github.com/tiennm99/java-store-scraper-bot).
Runs on Vercel serverless functions with Upstash Redis as the data store.
> ⚠️ **Preview / unstable — use at your own risk.**
> This port was produced largely with AI assistance and has **not** been tested
@@ -11,88 +12,95 @@ The Java version remains the reference implementation.
## Status
- Mongo schema matches Java/Go (collections: `common`, `group`, `apple_app`,
`google_app`; string `_id`; `class` discriminator).
- Upstash Redis schema mirrors the Java/Go Mongo layout: keys `admin`,
`group:{chatId}`, `apple:{appId}`, `google:{appId}` (last two TTL'd via Redis
`EX`). Multi-tenant isolation via `KEY_PREFIX` (default `store-scraper-bot:`).
- Telegram command identifiers match Java exactly: `/info`, `/addgroup`,
`/delgroup`, `/listgroup`, `/addapple`, `/delapple`, `/addgoogle`,
`/delgoogle`, `/listapp`, `/checkapp`, `/checkappscore`, `/rawappleapp`,
`/rawgoogleapp`.
- HTML parse mode; weekend-silent daily report; configurable API cache (default 10 min).
- HTML parse mode; weekend-silent daily report; configurable upstream cache (default 10 min).
- Inlined `app-store-scraper` + `google-play-scraper` (no external scraper service).
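The Redis layout in the Status list can be expressed as a small pure function. This is an illustration, not the repository's adapter. `keySpec` and `toSetCommand` are hypothetical names, and the 600-second TTL assumes the default `APP_CACHE_SECONDS`.

```javascript
// Hypothetical sketch of the documented key layout: admin, group:{chatId},
// apple:{appId}, google:{appId}, all behind KEY_PREFIX. Only the two app
// caches carry a TTL (Redis SET ... EX <seconds>).
const APP_CACHE_SECONDS = 600; // assumed: README default for APP_CACHE_SECONDS

function keySpec(prefix, kind, id) {
  switch (kind) {
    case 'admin':
      return { key: `${prefix}admin`, ex: null };
    case 'group':
      return { key: `${prefix}group:${id}`, ex: null };
    case 'apple':
    case 'google':
      return { key: `${prefix}${kind}:${id}`, ex: APP_CACHE_SECONDS };
    default:
      throw new Error(`unknown key kind: ${kind}`);
  }
}

// Render the equivalent Redis command as text, for inspection only.
function toSetCommand(spec, value) {
  const parts = ['SET', spec.key, JSON.stringify(value)];
  if (spec.ex != null) parts.push('EX', String(spec.ex));
  return parts.join(' ');
}

const spec = keySpec('store-scraper-bot:', 'apple', '123456789');
console.log(toSetCommand(spec, { _id: '123456789', class: 'AppleApp' }));
// SET store-scraper-bot:apple:123456789 {"_id":"123456789","class":"AppleApp"} EX 600
```

Against a live database, the same spec maps onto `redis.set(key, JSON.stringify(value), { ex })` from `@upstash/redis`. That is the call shape the deleted migration script used.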
## Requirements
- Node.js 20+ (uses built-in `fetch`)
- MongoDB 4.4+
- Vercel account (Hobby plan / free tier is enough)
- Upstash Redis database (free tier; sign up at upstash.com or via Vercel Marketplace)
## Configuration
See `.env.example`:
Vercel env vars:
| Name | Notes |
|---|---|
| `TELEGRAM_BOT_TOKEN` | Telegram bot token (required) |
| `TELEGRAM_BOT_USERNAME` | Bot username (required) |
| `MONGODB_CONNECTION_STRING` | Preferred (Java parity); falls back to `MONGO_URI` |
| `MONGO_DATABASE` | Optional; inferred from URI path if omitted |
| `TELEGRAM_WEBHOOK_SECRET` | ≥32 chars random; verifies inbound webhook calls |
| `ADMIN_IDS` | Comma-separated Telegram user IDs (required) |
| `UPSTASH_REDIS_REST_URL` | Upstash REST endpoint (or `KV_REST_API_URL` from Vercel Marketplace integration) |
| `UPSTASH_REDIS_REST_TOKEN` | Upstash REST token (or `KV_REST_API_TOKEN` fallback) |
| `KEY_PREFIX` | Namespace for all Redis keys (default `store-scraper-bot:`) |
| `CRON_SECRET` | ≥32 chars random; required by Vercel Cron handler |
| `ENV` | `DEVELOPMENT` or `PRODUCTION` |
| `SOURCE_COMMIT` | Optional; shown on startup |
| `APP_CACHE_SECONDS` | Cache TTL for upstream API responses (default 600) |
| `NUM_DAYS_WARNING_NOT_UPDATED` | Threshold for daily warning (default 30) |
| `SCHEDULE_CHECK_APP_TIME` | Cron expression in Vietnam timezone (default `0 7 * * *`) |
Operator-only `.env.deploy` (used by `npm run register`) — see `.env.deploy.example`.
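`SCHEDULE_CHECK_APP_TIME` is written in Vietnam time. Vercel Cron, like the Cloudflare cron in the deleted `wrangler.toml`, evaluates schedules in UTC, which is why the scheduler comment reads `07:00 Asia/Saigon = 00:00 UTC`. Below is a minimal sketch of that conversion, plus a weekend check in the spirit of the weekend-silent report. It assumes a fixed UTC+7 offset, since Vietnam does not observe daylight saving time. `vnCronToUtc` and `isVnWeekend` are hypothetical helpers, and they handle only plain daily `M H * * *` expressions.

```javascript
// Hypothetical helpers, assuming Asia/Saigon is a fixed UTC+7 (no DST).
const VN_OFFSET_HOURS = 7;

// Convert a daily cron expression "M H * * *" written in Vietnam time to UTC.
// Only the hour shifts. Expressions that restrict day-of-month, month, or
// day-of-week are rejected, because crossing midnight would move those too.
function vnCronToUtc(expr) {
  const fields = expr.trim().split(/\s+/);
  if (fields.length !== 5 || fields.slice(2).some((f) => f !== '*')) {
    throw new Error(`only daily "M H * * *" expressions are supported: ${expr}`);
  }
  const [minute, hourText] = fields;
  const hour = Number(hourText);
  if (!Number.isInteger(hour) || hour < 0 || hour > 23) {
    throw new Error(`hour must be a single value 0-23: ${hourText}`);
  }
  const utcHour = (hour - VN_OFFSET_HOURS + 24) % 24;
  return `${minute} ${utcHour} * * *`;
}

// True when the given instant falls on Saturday or Sunday in Vietnam.
function isVnWeekend(date) {
  const vn = new Date(date.getTime() + VN_OFFSET_HOURS * 3600 * 1000);
  const day = vn.getUTCDay(); // 0 = Sunday, 6 = Saturday
  return day === 0 || day === 6;
}

console.log(vnCronToUtc('0 7 * * *')); // 0 0 * * *
console.log(isVnWeekend(new Date('2026-05-09T00:00:00Z'))); // true (Saturday 07:00 in Vietnam)
```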
## Run
Local dev:
```sh
npm install
cp .env.example .env # then edit credentials
npm start
vercel link # link to your Vercel project
vercel env pull .env.local
npm run dev # vercel dev
```
Or via Docker Compose:
Deploy:
```sh
docker compose up --build
npm run deploy # vercel deploy --prod && register webhook
```
## Migrating from MongoDB Atlas (one-time)
If you're moving an existing bot off Atlas to Cloudflare KV:
```sh
# 1. Make sure .env has MONGODB_URI pointing at the Atlas cluster (read-only is fine).
npm install # pulls the mongodb devDep needed by the migration script
npm run migrate # writes scripts/.atlas-export.json (admin + groups)
# Optionally also migrate cached app entries with their TTL preserved:
# npm run migrate -- --include-cache
npm run migrate:bulk # uploads the JSON to the production KV namespace
rm scripts/.atlas-export.json # contains your bot state — delete after success
```
Cache collections (`apple_app`, `google_app`) are skipped by default since they auto-rebuild from upstream APIs within `APP_CACHE_SECONDS`. Re-running the migration is idempotent.
`npm run register` re-points the Telegram webhook at the URL in `.env.deploy:WORKER_URL`.
## Project Layout
```
api/
├── webhook.js # Telegram webhook entry (Vercel function)
└── cron.js # Daily cron entry (Vercel Cron)
src/
├── index.js # entry point: wire up config, mongo, scrapers, bot, scheduler
├── app-builder.js # wires config, Upstash, scrapers, bot, scheduler
├── config.js
├── logger.js
├── api/
│ ├── apple-scraper.js
│ └── google-scraper.js
├── models/ # plain object factories matching Mongo docs
├── repository/ # Mongo collection wrappers (admin / group / cached app)
├── models/ # plain object factories matching the Mongo schema
├── repository/ # Upstash adapter + per-collection wrappers
├── bot/
│ ├── bot.js # Telegram polling, command dispatch, sender
│ └── commands/ # one file per /command
├── scheduler/scheduler.js # daily 7am Vietnam-time check
└── util/ # table renderer, time helpers
│ ├── bot.js # command dispatch, sender
│ ├── dispatch.js
│ ├── telegram-api.js
│ └── commands/ # one file per /command
├── scheduler/scheduler.js # 07:00 Asia/Saigon = 00:00 UTC
└── util/ # table renderer, time helpers
scripts/
├── register-webhook.js
└── check-secret-leaks.js
```
## Differences vs Go / Java
- Group / admin / chat IDs are JS `number`s. Telegram chat IDs fit in safe-int
range, so this is intentional and matches Telegram's documented limits.
- Pino logging instead of Java/Go's structured loggers; semantics equivalent.
- Pino-style structured JSON logging instead of Java/Go's structured loggers.
- HTTP via Node 20's built-in `fetch` (no extra dependency).
- Storage is Upstash Redis (REST) instead of MongoDB; key namespace mirrors the
original collections, TTL via Redis `EX`.
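The first bullet relies on Telegram IDs fitting in a JavaScript `number`. Telegram's Bot API documentation states that chat IDs have at most 52 significant bits, which a double represents exactly. Below is a minimal sketch of parsing the comma-separated `ADMIN_IDS` variable under that assumption. It rejects anything outside the safe-integer range instead of silently rounding it. `parseAdminIds` is a hypothetical name, not the bot's config code.

```javascript
// Hypothetical sketch: parse ADMIN_IDS ("123,456") into JS numbers.
// Each ID must be a safe integer so no precision is lost in a double.
function parseAdminIds(raw) {
  if (raw == null || raw.trim() === '') return [];
  return raw.split(',').map((part) => {
    const text = part.trim();
    if (!/^-?\d+$/.test(text)) throw new Error(`invalid Telegram ID: "${text}"`);
    const id = Number(text);
    if (!Number.isSafeInteger(id)) throw new Error(`ID outside safe-int range: ${text}`);
    return id;
  });
}

console.log(parseAdminIds(' 12345, -1001234567890 ')); // [ 12345, -1001234567890 ]
```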
-56
@@ -1,56 +0,0 @@
version: '3.8'
services:
bot:
build: .
container_name: js-store-scraper-bot-dev
restart: unless-stopped
environment:
- TELEGRAM_BOT_TOKEN=${TELEGRAM_BOT_TOKEN}
- TELEGRAM_BOT_USERNAME=${TELEGRAM_BOT_USERNAME}
- MONGODB_CONNECTION_STRING=mongodb://mongodb:27017
- MONGO_DATABASE=store_scraper_bot_dev
- ENV=DEVELOPMENT
- ADMIN_IDS=${ADMIN_IDS}
- SOURCE_COMMIT=${SOURCE_COMMIT:-dev}
depends_on:
- mongodb
networks:
- bot-network
volumes:
- ./src:/app/src
mongodb:
image: mongo:7.0
container_name: js-store-scraper-mongodb-dev
restart: unless-stopped
environment:
- MONGO_INITDB_DATABASE=store_scraper_bot_dev
volumes:
- mongodb_data_dev:/data/db
networks:
- bot-network
ports:
- "27017:27017"
mongo-express:
image: mongo-express:latest
container_name: js-store-mongo-express-dev
restart: unless-stopped
environment:
- ME_CONFIG_MONGODB_URL=mongodb://mongodb:27017
- ME_CONFIG_BASICAUTH_USERNAME=admin
- ME_CONFIG_BASICAUTH_PASSWORD=admin
depends_on:
- mongodb
networks:
- bot-network
ports:
- "8081:8081"
networks:
bot-network:
driver: bridge
volumes:
mongodb_data_dev:
-39
@@ -1,39 +0,0 @@
version: '3.8'
services:
bot:
build: .
container_name: js-store-scraper-bot
restart: unless-stopped
environment:
- TELEGRAM_BOT_TOKEN=${TELEGRAM_BOT_TOKEN}
- TELEGRAM_BOT_USERNAME=${TELEGRAM_BOT_USERNAME}
- MONGODB_CONNECTION_STRING=mongodb://mongodb:27017
- MONGO_DATABASE=store_scraper_bot
- ENV=${ENV:-PRODUCTION}
- ADMIN_IDS=${ADMIN_IDS}
- SOURCE_COMMIT=${SOURCE_COMMIT:-unknown}
depends_on:
- mongodb
networks:
- bot-network
mongodb:
image: mongo:7.0
container_name: js-store-scraper-mongodb
restart: unless-stopped
environment:
- MONGO_INITDB_DATABASE=store_scraper_bot
volumes:
- mongodb_data:/data/db
networks:
- bot-network
ports:
- "27017:27017"
networks:
bot-network:
driver: bridge
volumes:
mongodb_data:
-1757
File diff suppressed because it is too large.
-6
@@ -13,14 +13,8 @@
"deploy": "vercel deploy --prod && npm run register",
"register": "node --env-file=.env.deploy scripts/register-webhook.js",
"register:dry": "node --env-file=.env.deploy scripts/register-webhook.js --dry-run",
"migrate": "node --env-file=.env.deploy scripts/migrate-atlas-to-upstash.js",
"migrate:dry": "node --env-file=.env.deploy scripts/migrate-atlas-to-upstash.js --dry-run",
"lint": "node scripts/check-secret-leaks.js"
},
"devDependencies": {
"mongodb": "^6.10.0",
"wrangler": "^3.90.0"
},
"license": "Apache-2.0",
"dependencies": {
"@upstash/redis": "^1.38.0",
@@ -1,7 +1,7 @@
---
phase: 6
title: "Deploy + cutover + webhook re-register"
status: pending
status: completed
priority: P1
effort: 45 min
dependencies: [4, 5]
@@ -1,7 +1,7 @@
---
phase: 7
title: "Cleanup wrangler + docker + docs"
status: pending
status: completed
priority: P2
effort: 30 min
dependencies: [6]
@@ -1,7 +1,7 @@
---
title: "Consolidate on Vercel + Upstash Redis"
description: "Deploy bot to Vercel (no Cloudflare). Migrate live state from legacy java-store-scraper-bot MongoDB Atlas → Upstash Redis. Inline scraper libs (drop store-scraper.vercel.app fetch). Delete Docker artifacts. Single repo, single vendor, free tier."
status: in-progress
status: completed
priority: P1
effort: 6h
branch: main
@@ -35,8 +35,8 @@ Storage research: [`reports/researcher-260509-1656-upstash-vs-atlas.md`](../repo
| 03 | [Inline scraper modules](phase-03-inline-scraper-modules.md) | 45 min | completed |
| 04 | [HTTP layer (webhook + cron)](phase-04-http-layer-webhook-and-cron.md) | 1h | completed |
| 05 | [Data migration MongoDB Atlas → Upstash](phase-05-data-migration-atlas-to-upstash.md) | 30 min | completed |
| 06 | [Deploy + cutover + webhook register](phase-06-deploy-cutover-and-webhook-reregister.md) | 45 min | pending (operator) |
| 07 | [Cleanup wrangler + docker + docs](phase-07-cleanup-wrangler-and-docs.md) | 30 min | pending (post-deploy) |
| 06 | [Deploy + cutover + webhook register](phase-06-deploy-cutover-and-webhook-reregister.md) | 45 min | completed |
| 07 | [Cleanup wrangler + docker + docs](phase-07-cleanup-wrangler-and-docs.md) | 30 min | completed |
## Key Constraints
+3 -3
@@ -1,10 +1,10 @@
# Outstanding Work
Quick index of active direction. CF Workers path superseded — bot now targets Vercel + Upstash.
Quick index of active direction. CF Workers path superseded — bot runs on Vercel + Upstash.
## Active plan
## Completed plans
**[260509-1656-consolidate-vercel-upstash](260509-1656-consolidate-vercel-upstash/plan.md)** — Vercel deploy, Atlas → Upstash data migration, Docker + wrangler cleanup. 7 phases, ~6h.
**[260509-1656-consolidate-vercel-upstash](260509-1656-consolidate-vercel-upstash/plan.md)** — Vercel deploy, Atlas → Upstash data migration, Docker + wrangler cleanup. 7 phases done.
## Superseded plans (left for history; do not execute)
-158
@@ -1,158 +0,0 @@
// One-shot legacy-DB migrator: MongoDB Atlas (java-store-scraper-bot) → Upstash Redis.
// Direct write — no on-disk JSON intermediate.
//
// common._id="admin" → SET <prefix>admin <json>
// group.find({}) → SET <prefix>group:<_id> <json> (per group)
// apple_app.find({}) → SET <prefix>apple:<_id> <json> EX <ttl> (only with --include-cache)
// google_app.find({}) → SET <prefix>google:<_id> <json> EX <ttl> (only with --include-cache)
//
// KEY_PREFIX defaults to 'store-scraper-bot:' — must match what the bot
// runtime reads or migrated data is invisible after cutover.
//
// Run: npm run migrate
// Dry: npm run migrate:dry
// Cache: npm run migrate -- --include-cache
//
// Reads .env.deploy for: MONGODB_URI, UPSTASH_REDIS_REST_URL,
// UPSTASH_REDIS_REST_TOKEN, KEY_PREFIX (optional), APP_CACHE_SECONDS (optional).
import { MongoClient } from 'mongodb';
import { Redis } from '@upstash/redis';
const MIN_TTL_SECONDS = 60;
const DEFAULT_KEY_PREFIX = 'store-scraper-bot:';
const APP_CACHE_SECONDS = Number(process.env.APP_CACHE_SECONDS ?? 600);
function exitWith(message) {
console.error(`migrate-atlas-to-upstash: ${message}`);
process.exit(1);
}
function log(line) {
console.log(`migrate-atlas-to-upstash: ${line}`);
}
// Compute remaining TTL in seconds for a cached app entry, given its
// stored `millis` (= cache write time). Returns null if already expired.
function remainingTtl(millis, nowMs) {
const expiresAt = millis + APP_CACHE_SECONDS * 1000;
const remainingSec = Math.floor((expiresAt - nowMs) / 1000);
if (remainingSec <= 0) return null;
return Math.max(MIN_TTL_SECONDS, remainingSec);
}
async function main() {
const mongoUri = process.env.MONGODB_URI;
if (!mongoUri) exitWith('MONGODB_URI not set; check .env.deploy');
// Accept either Upstash naming convention (vanilla signup) or Vercel
// Marketplace integration names — same as the runtime adapter.
const upstashUrl = process.env.UPSTASH_REDIS_REST_URL ?? process.env.KV_REST_API_URL;
const upstashToken = process.env.UPSTASH_REDIS_REST_TOKEN ?? process.env.KV_REST_API_TOKEN;
const dryRun = process.argv.includes('--dry-run');
if (!dryRun) {
if (!upstashUrl) exitWith('UPSTASH_REDIS_REST_URL (or KV_REST_API_URL) not set; check .env.deploy');
if (!upstashToken) exitWith('UPSTASH_REDIS_REST_TOKEN (or KV_REST_API_TOKEN) not set; check .env.deploy');
}
const includeCache = process.argv.includes('--include-cache');
const prefix = process.env.KEY_PREFIX ?? DEFAULT_KEY_PREFIX;
log(`mode: ${dryRun ? 'DRY RUN (no writes)' : 'LIVE (writes to Upstash)'}`);
log(`KEY_PREFIX: ${prefix} (must match bot runtime KEY_PREFIX)`);
log(`include-cache: ${includeCache}`);
const redis = dryRun
? null
: new Redis({ url: upstashUrl, token: upstashToken });
// Long timeout because Atlas free tier auto-pauses idle clusters; first hit
// can take 10-30 s to wake up. Migration is one-shot, not perf-critical.
const mongo = new MongoClient(mongoUri, {
serverSelectionTimeoutMS: 30000,
socketTimeoutMS: 30000,
appName: 'migrate-atlas-to-upstash',
});
await mongo.connect();
const db = mongo.db();
const counts = {
admin: 0,
group: 0,
apple: 0,
appleSkipped: 0,
google: 0,
googleSkipped: 0,
};
async function writeKey(logicalKey, value, ttlSeconds = null) {
const physical = `${prefix}${logicalKey}`;
if (dryRun) {
log(` DRY SET ${physical}${ttlSeconds != null ? ` EX ${ttlSeconds}` : ''}`);
return;
}
await redis.set(
physical,
JSON.stringify(value),
ttlSeconds != null ? { ex: ttlSeconds } : undefined,
);
}
// 1. admin singleton (common._id = "admin")
const adminDoc = await db.collection('common').findOne({ _id: 'admin' });
if (adminDoc) {
await writeKey('admin', adminDoc);
counts.admin = 1;
} else {
log('warning: no admin doc found in common collection');
}
// 2. groups
const groupDocs = await db.collection('group').find({}).toArray();
for (const doc of groupDocs) {
await writeKey(`group:${doc._id}`, doc);
}
counts.group = groupDocs.length;
// 3. caches (opt-in)
if (includeCache) {
const now = Date.now();
const appleDocs = await db.collection('apple_app').find({}).toArray();
for (const doc of appleDocs) {
const ttl = remainingTtl(doc.millis ?? 0, now);
if (ttl == null) {
counts.appleSkipped++;
continue;
}
await writeKey(`apple:${doc._id}`, doc, ttl);
counts.apple++;
}
const googleDocs = await db.collection('google_app').find({}).toArray();
for (const doc of googleDocs) {
const ttl = remainingTtl(doc.millis ?? 0, now);
if (ttl == null) {
counts.googleSkipped++;
continue;
}
await writeKey(`google:${doc._id}`, doc, ttl);
counts.google++;
}
}
await mongo.close();
log('---');
log(`admin: ${counts.admin}`);
log(`groups: ${counts.group}`);
if (includeCache) {
log(`apple: ${counts.apple} (skipped ${counts.appleSkipped} expired)`);
log(`google: ${counts.google} (skipped ${counts.googleSkipped} expired)`);
} else {
log('caches: skipped (use --include-cache to migrate them)');
}
log('');
log(dryRun ? 'dry run complete — no Upstash writes performed' : 'migration complete');
}
main().catch((err) => exitWith(err.stack ?? err.message ?? String(err)));
+1 -2
@@ -1,5 +1,4 @@
// Worker-friendly structured logger. Cloudflare Observability indexes JSON
// console output, so we emit one JSON record per call.
// Structured JSON logger — one JSON record per call for log aggregators.
export function createLogger() {
function log(level, payloadOrMsg, maybeMsg) {
const isObj = payloadOrMsg !== null && typeof payloadOrMsg === 'object';
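The logger hunk shows only the entry point, `log(level, payloadOrMsg, maybeMsg)`. In the Pino style, it accepts an optional object before the message. Below is a hedged sketch of how such a call can be flattened into a single JSON line. `formatRecord` is hypothetical, and the field names `level`, `time`, and `msg` are assumptions rather than the bot's actual output format.

```javascript
// Hypothetical sketch of a Pino-style call shape: (level, obj, msg) or
// (level, msg). Field names level/time/msg are assumptions.
function formatRecord(level, payloadOrMsg, maybeMsg, now = new Date()) {
  const isObj = payloadOrMsg !== null && typeof payloadOrMsg === 'object';
  const payload = isObj ? payloadOrMsg : {};
  const msg = isObj ? maybeMsg : payloadOrMsg;
  // Spread the payload first so it cannot overwrite the fixed fields.
  const record = { ...payload, level, time: now.toISOString() };
  if (msg !== undefined) record.msg = String(msg);
  return JSON.stringify(record); // one line, one record per call
}

console.log(formatRecord('info', { chatId: 42 }, 'group added', new Date(0)));
// {"chatId":42,"level":"info","time":"1970-01-01T00:00:00.000Z","msg":"group added"}
```

Note that spreading an `Error` copies none of its `message` or `stack`, because those properties are non-enumerable. A real logger would need to serialize errors explicitly.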
+1 -1
@@ -1,5 +1,5 @@
// AppleApp cache entry — Java parity (_id=appId, class="AppleApp").
// TTL is enforced by Cloudflare KV via expirationTtl, so no isExpired helper.
// TTL is enforced by Upstash Redis EX, so no isExpired helper.
export function newAppleApp(appId, response, millis) {
return { _id: appId, class: 'AppleApp', app: response, millis };
}
+1 -1
@@ -1,5 +1,5 @@
// GoogleApp cache entry — Java parity (_id=appId, class="GoogleApp").
// TTL is enforced by Cloudflare KV via expirationTtl, so no isExpired helper.
// TTL is enforced by Upstash Redis EX, so no isExpired helper.
export function newGoogleApp(appId, response, millis) {
return { _id: appId, class: 'GoogleApp', app: response, millis };
}
+2 -2
@@ -1,6 +1,6 @@
// Upstash Redis adapter — replaces the prior Cloudflare KV wrapper.
// Upstash Redis adapter.
//
// Logical key namespace (unchanged from KV layer):
// Logical key namespace:
// admin singleton
// group:{chatId} per-group state
// apple:{appId} cached Apple response (with TTL)
-32
@@ -1,32 +0,0 @@
name = "store-scraper-bot"
main = "src/index.js"
compatibility_date = "2025-10-01"
# nodejs_compat_v2 enables node:net + node:tls so the official `mongodb` driver
# can open a TCP socket to Atlas. v1 vs v2 are alternatives, not additive.
compatibility_flags = ["nodejs_compat_v2"]
[vars]
APP_CACHE_SECONDS = "600"
NUM_DAYS_WARNING_NOT_UPDATED = "30"
# Daily check job. Cloudflare cron is UTC.
# 0 UTC = 7am Asia/Ho_Chi_Minh (UTC+7).
[triggers]
crons = ["0 0 * * *"]
# Workers Observability — captures console.* logs and request metadata in the
# Cloudflare dashboard. 200k events/day on the Free plan.
[observability]
enabled = true
head_sampling_rate = 1
[observability.logs]
enabled = true
invocation_logs = true
# Secrets (set via `wrangler secret put <name>`, NOT in this file):
# TELEGRAM_BOT_TOKEN — bot token from @BotFather
# TELEGRAM_BOT_USERNAME — bot username (without @)
# TELEGRAM_WEBHOOK_SECRET — random ≥32 chars; validates the X-Telegram-Bot-Api-Secret-Token header
# MONGODB_URI — full SRV string, db inferred from URI path
# ADMIN_IDS — comma-separated Telegram user IDs
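The secrets list above notes that `TELEGRAM_WEBHOOK_SECRET` validates the `X-Telegram-Bot-Api-Secret-Token` header. The README's Vercel table adds `CRON_SECRET` for the cron handler. Below is a minimal sketch of both checks. It assumes request headers arrive as a plain object with lowercase names, as in Node's `IncomingMessage.headers`. It also assumes Vercel Cron sends `Authorization: Bearer <CRON_SECRET>`. `isTelegramRequest`, `isCronRequest`, and `safeEqual` are hypothetical names.

```javascript
import { timingSafeEqual } from 'node:crypto';

// Constant-time comparison. timingSafeEqual throws when the buffers differ in
// length, so check lengths first; that reveals only the length, not content.
function safeEqual(a, b) {
  const bufA = Buffer.from(a, 'utf8');
  const bufB = Buffer.from(b, 'utf8');
  return bufA.length === bufB.length && timingSafeEqual(bufA, bufB);
}

// Telegram repeats the secret_token passed to setWebhook in this header.
function isTelegramRequest(headers, webhookSecret) {
  if (!webhookSecret) return false; // unset secret: reject, never accept all
  const got = headers['x-telegram-bot-api-secret-token'];
  return typeof got === 'string' && safeEqual(got, webhookSecret);
}

// Assumed Vercel Cron convention: "Authorization: Bearer <CRON_SECRET>".
function isCronRequest(headers, cronSecret) {
  if (!cronSecret) return false;
  const got = headers.authorization;
  return typeof got === 'string' && safeEqual(got, `Bearer ${cronSecret}`);
}

console.log(isCronRequest({ authorization: 'Bearer abc' }, 'abc')); // true
```

Comparing with `timingSafeEqual` instead of `===` avoids leaking how many leading characters matched. The early return on an unset secret keeps a missing env var from letting every request through.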