Mirror of https://github.com/tiennm99/store-scraper-bot.git (synced 2026-05-14 06:59:15 +00:00)
store-scraper-bot
JavaScript (Node.js) implementation. Ports java-store-scraper-bot. Runs on Vercel serverless functions with Upstash Redis as the data store.
Status
- Upstash Redis schema mirrors the Java/Go Mongo layout: keys `admin`, `group:{chatId}`, `apple:{appId}`, `google:{appId}` (last two TTL'd via Redis `EX`). Multi-tenant isolation via `KEY_PREFIX` (default `store-scraper-bot:`).
- Telegram command identifiers match Java plus per-group settings: `/info`, `/addgroup`, `/delgroup`, `/listgroup`, `/addapple`, `/delapple`, `/addgoogle`, `/delgoogle`, `/listapp`, `/checkapp`, `/checkappscore`, `/rawappleapp`, `/rawgoogleapp`, `/settings`, `/setdayswarning`.
- HTML parse mode; weekend-silent daily report; configurable upstream cache (default 10 min).
- Per-group warning threshold override via `/setdayswarning` (falls back to `NUM_DAYS_WARNING_NOT_UPDATED` env default).
- Inlined `app-store-scraper` + `google-play-scraper` (no external scraper service).
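The key layout above can be illustrated with a small key builder. This is a minimal sketch, not the repository's code: `makeKeys` and `cacheSetCommand` are hypothetical names, and only the key shapes, the default prefix, and the use of Redis `EX` come from this README. The 600-second default mirrors `APP_CACHE_SECONDS`.

```javascript
// Hypothetical helpers: names are illustrative. Only the key shapes and the
// default prefix are taken from the README.
const DEFAULT_PREFIX = 'store-scraper-bot:';

function makeKeys(prefix = DEFAULT_PREFIX) {
  return {
    admin: () => `${prefix}admin`,
    group: (chatId) => `${prefix}group:${chatId}`,
    apple: (appId) => `${prefix}apple:${appId}`,
    google: (appId) => `${prefix}google:${appId}`,
  };
}

// Builds a plain Redis SET command with an EX (seconds) expiry, the form
// assumed here for the two cached key types (apple / google).
function cacheSetCommand(key, value, ttlSeconds = 600) {
  return ['SET', key, JSON.stringify(value), 'EX', String(ttlSeconds)];
}

const keys = makeKeys();
console.log(keys.group(-1001234567890));
console.log(cacheSetCommand(keys.apple('123456789'), { version: '1.2.3' }));
```

Passing a different prefix to `makeKeys` (for example `makeKeys('staging:')`) is how two deployments could share one Upstash database without colliding.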
Requirements
- Node.js 20+ (uses built-in `fetch`)
- Vercel account (Hobby plan / free tier is enough)
- Upstash Redis database (free tier; sign up at upstash.com or via Vercel Marketplace)
Configuration
Vercel env vars:
| Name | Notes |
|---|---|
| `TELEGRAM_BOT_TOKEN` | Telegram bot token (required) |
| `TELEGRAM_BOT_USERNAME` | Bot username (required) |
| `TELEGRAM_WEBHOOK_SECRET` | ≥32 chars random; verifies inbound webhook calls |
| `ADMIN_IDS` | Comma-separated Telegram user IDs (required) |
| `UPSTASH_REDIS_REST_URL` | Upstash REST endpoint (or `KV_REST_API_URL` from Vercel Marketplace integration) |
| `UPSTASH_REDIS_REST_TOKEN` | Upstash REST token (or `KV_REST_API_TOKEN` fallback) |
| `KEY_PREFIX` | Namespace for all Redis keys (default `store-scraper-bot:`) |
| `CRON_SECRET` | ≥32 chars random; required by Vercel Cron handler |
| `APP_CACHE_SECONDS` | Cache TTL for upstream API responses (default `600`) |
| `NUM_DAYS_WARNING_NOT_UPDATED` | Default warning threshold in days (default `30`; per-group override via `/setdayswarning`) |
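The fallbacks and defaults in the table can be read as a small loader. The sketch below is an assumption about how such a loader might look, not a copy of `src/config.js`: `loadConfig` and the returned property names are hypothetical, and treating non-positive numbers as "use the default" is a choice made for this example.

```javascript
// Hypothetical config loader. Variable names, fallbacks, and defaults follow
// the table; everything else is illustrative.
function loadConfig(env) {
  const required = (name, value) => {
    if (!value) throw new Error(`Missing required env var: ${name}`);
    return value;
  };
  // Falls back when the value is absent, non-numeric, or not positive.
  const intOr = (raw, fallback) => {
    const n = Number.parseInt(raw ?? '', 10);
    return Number.isFinite(n) && n > 0 ? n : fallback;
  };

  return {
    botToken: required('TELEGRAM_BOT_TOKEN', env.TELEGRAM_BOT_TOKEN),
    botUsername: required('TELEGRAM_BOT_USERNAME', env.TELEGRAM_BOT_USERNAME),
    adminIds: required('ADMIN_IDS', env.ADMIN_IDS)
      .split(',')
      .map((s) => s.trim())
      .filter(Boolean)
      .map(Number)
      .filter((n) => Number.isSafeInteger(n)),
    redisUrl: required(
      'UPSTASH_REDIS_REST_URL',
      env.UPSTASH_REDIS_REST_URL || env.KV_REST_API_URL,
    ),
    redisToken: required(
      'UPSTASH_REDIS_REST_TOKEN',
      env.UPSTASH_REDIS_REST_TOKEN || env.KV_REST_API_TOKEN,
    ),
    keyPrefix: env.KEY_PREFIX || 'store-scraper-bot:',
    appCacheSeconds: intOr(env.APP_CACHE_SECONDS, 600),
    numDaysWarning: intOr(env.NUM_DAYS_WARNING_NOT_UPDATED, 30),
  };
}

// Example with placeholder values (a real deployment would pass process.env).
const exampleConfig = loadConfig({
  TELEGRAM_BOT_TOKEN: 'placeholder-token',
  TELEGRAM_BOT_USERNAME: 'placeholder_bot',
  ADMIN_IDS: '111, 222',
  KV_REST_API_URL: 'https://example.invalid',
  KV_REST_API_TOKEN: 'placeholder',
});
console.log(exampleConfig.adminIds, exampleConfig.appCacheSeconds);
```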
Operator-only `.env.deploy` (used by `npm run register` + `npm run describe`): see `.env.deploy.example`.
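The two secrets in the table gate the two HTTP entry points. Telegram echoes the `secret_token` passed to `setWebhook` in the `X-Telegram-Bot-Api-Secret-Token` request header, and Vercel Cron sends `CRON_SECRET` as `Authorization: Bearer <secret>`. A minimal sketch of the checks follows, assuming header names arrive lower-cased as Node exposes them; the function names are hypothetical and the actual handlers in `api/` may be structured differently.

```javascript
// Compares two strings without returning early on the first mismatch.
// In Node, crypto.timingSafeEqual from node:crypto is the usual choice;
// this loop only keeps the sketch dependency-free.
function safeEqual(a, b) {
  if (typeof a !== 'string' || typeof b !== 'string') return false;
  let diff = a.length ^ b.length;
  const len = Math.max(a.length, b.length);
  for (let i = 0; i < len; i++) {
    // charCodeAt returns NaN past the end of the string; treat that as 0.
    diff |= (a.charCodeAt(i) || 0) ^ (b.charCodeAt(i) || 0);
  }
  return diff === 0;
}

// headers: plain object with lower-cased header names.
function isTelegramWebhook(headers, secret) {
  return Boolean(secret) &&
    safeEqual(headers['x-telegram-bot-api-secret-token'], secret);
}

function isVercelCron(headers, secret) {
  return Boolean(secret) &&
    safeEqual(headers['authorization'], `Bearer ${secret}`);
}

const demoSecret = 'example-secret-of-at-least-32-characters';
console.log(isTelegramWebhook({ 'x-telegram-bot-api-secret-token': demoSecret }, demoSecret));
console.log(isVercelCron({ authorization: 'Bearer wrong' }, demoSecret));
```

Rejecting requests when the secret itself is unset (the `Boolean(secret)` guard) avoids accidentally accepting every call on a misconfigured deployment.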
Run
Local dev:
npm install
vercel link # link to your Vercel project
vercel env pull .env.local
npm run dev # vercel dev
Deploy:
npm run deploy # vercel deploy --prod && register webhook
`npm run register` re-points the Telegram webhook at the URL in `.env.deploy:WORKER_URL`.
`npm run describe` updates the bot's profile description / about-text (run once when copy changes).
Operations
Dashboards
- Vercel project — function logs, cron history, deploy status
- Upstash console — Redis metrics, key browser, request latency
Credential rotation (quarterly)
- Upstash REST token: regenerate in Upstash console, update `UPSTASH_REDIS_REST_TOKEN` in Vercel env, redeploy
- Telegram webhook secret: generate new value, update `TELEGRAM_WEBHOOK_SECRET` in Vercel env, redeploy, then `npm run register`
Dependency security
- Transitive vulnerabilities from `app-store-scraper → request` are pinned via `overrides` in `package.json` (`form-data`, `qs`, `tough-cookie`).
- The unfixable `request` SSRF advisory is risk-accepted: only known endpoints (`itunes.apple.com`, `play.google.com`) are called; no user-controlled URLs reach `request`.
Project Layout
api/
├── webhook.js # Telegram webhook entry (Vercel function)
└── cron.js # Daily cron entry (Vercel Cron)
src/
├── app-builder.js # wires config, Upstash, scrapers, bot, scheduler
├── config.js
├── logger.js
├── api/
│ ├── apple-scraper.js
│ └── google-scraper.js
├── models/ # plain object factories matching the Mongo schema
├── repository/ # Upstash adapter + per-collection wrappers
├── bot/
│ ├── bot.js # command dispatch, sender
│ ├── dispatch.js
│ ├── telegram-api.js
│ └── commands/ # one file per /command
├── scheduler/scheduler.js # 07:00 Asia/Saigon = 00:00 UTC
└── util/ # table renderer, time helpers
scripts/
├── register-webhook.js
└── check-secret-leaks.js
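The scheduler comment above (07:00 Asia/Saigon = 00:00 UTC) combined with the weekend-silent report means the weekday has to be evaluated in local time, not UTC. A minimal sketch of such a check follows; `isWeekendIn` is a hypothetical name and the real `scheduler.js` may do this differently. `Asia/Ho_Chi_Minh` is the current tz database name for the legacy `Asia/Saigon` alias.

```javascript
// Returns true when the given instant falls on Saturday or Sunday in the
// given IANA time zone. Name and default zone are assumptions.
function isWeekendIn(date, timeZone = 'Asia/Ho_Chi_Minh') {
  const weekday = new Intl.DateTimeFormat('en-US', {
    timeZone,
    weekday: 'short',
  }).format(date);
  return weekday === 'Sat' || weekday === 'Sun';
}

// 00:00 UTC on Saturday 2026-05-16 is 07:00 Saturday locally: report skipped.
console.log(isWeekendIn(new Date('2026-05-16T00:00:00Z'))); // true
// 00:00 UTC on Monday 2026-05-18 is 07:00 Monday locally: report sent.
console.log(isWeekendIn(new Date('2026-05-18T00:00:00Z'))); // false
// 18:00 UTC on Friday is already 01:00 Saturday at UTC+7.
console.log(isWeekendIn(new Date('2026-05-15T18:00:00Z'))); // true
```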
Differences vs Go / Java
- Group / admin / chat IDs are JS `number`s. Telegram chat IDs fit in safe-int range, so this is intentional and matches Telegram's documented limits.
- Pino-style structured JSON logging instead of Java/Go's structured loggers.
- HTTP via Node 20's built-in `fetch` (no extra dependency).
- Storage is Upstash Redis (REST) instead of MongoDB; key namespace mirrors the original collections, TTL via Redis `EX`.
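The safe-integer point can be made concrete: Telegram documents its identifiers as having at most 52 significant bits, while a JS `number` represents integers exactly up to `Number.MAX_SAFE_INTEGER` (2^53 - 1). The sketch below shows one way to parse a chat ID from command text while rejecting anything outside that range; `parseChatId` is a hypothetical helper, not a function from this repository.

```javascript
// Hypothetical parser for chat IDs arriving as text (e.g. command arguments).
// Returns a number, or null when the input is not a safe integer.
function parseChatId(raw) {
  const text = String(raw).trim();
  if (!/^-?\d+$/.test(text)) return null;
  const id = Number(text);
  return Number.isSafeInteger(id) ? id : null;
}

console.log(parseChatId('-1001234567890'));
console.log(parseChatId('9007199254740993')); // beyond 2^53 - 1, rejected
console.log(parseChatId('not-a-number'));
```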