tiennm99 8f86e4dc3b feat: refresh data from baotintuc.vn source, fix overflow sheet loss
Dataset update:
- Crawl all 63 .xls province files from baotintuc.vn CDN (original source)
- Old xlsx dataset moved to data-old/ for reference
- Net: +13,719 students (Hà Nội +7,275, HCM +6,445); the old .xls → .xlsx
  conversion silently dropped rows beyond the 65,536-row per-sheet cap
- Also removes 1 bogus header row that had leaked into the old DB
- 100% identical scores on the 847,348 SBDs (số báo danh, candidate
  numbers) present in both datasets

Build pipeline:
- build-database.js: iterate ALL sheets per workbook (fixes the overflow
  loss) and accept .xls in addition to .xlsx
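
A minimal sketch of the all-sheets iteration described above, not the actual build-database.js code. It assumes the workbook object has the SheetJS shape (`SheetNames` plus a `Sheets` map); `collectRowsFromAllSheets`, `sheetToRows`, and `isWorkbookFile` are hypothetical names introduced only for illustration.

```javascript
// Sketch only: `workbook` is assumed to look like a SheetJS workbook,
// i.e. { SheetNames: string[], Sheets: { [name]: sheet } }.
// `sheetToRows` is a hypothetical converter supplied by the caller,
// for example (sheet) => XLSX.utils.sheet_to_json(sheet).
function collectRowsFromAllSheets(workbook, sheetToRows) {
  const rows = [];
  // Walk every sheet, not just the first one, so rows that overflowed
  // past the per-sheet cap into later sheets are not lost.
  for (const name of workbook.SheetNames) {
    const sheet = workbook.Sheets[name];
    if (!sheet) continue; // skip names with no backing sheet
    for (const row of sheetToRows(sheet)) {
      rows.push(row);
    }
  }
  return rows;
}

// Hypothetical input filter: accept .xls as well as .xlsx files.
function isWorkbookFile(filename) {
  return /\.xlsx?$/i.test(filename);
}
```

Reading only `workbook.SheetNames[0]` would return just the first sheet's rows, which is the kind of loss the fix addresses.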

Audit tooling:
- scripts/crawl-baotintuc.js: idempotent 63-province downloader
- scripts/diff-datasets.js: compares two DBs by SBD set and per-column
  score deltas
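
The comparison performed by the diff script could look roughly like the sketch below. It is an illustration under stated assumptions, not the contents of scripts/diff-datasets.js: rows are assumed to be plain objects keyed by an `sbd` field, and `diffDatasets` plus the column names `toan` and `van` are hypothetical.

```javascript
// Sketch only: compares two datasets by SBD set and counts, per score
// column, how many shared SBDs have a different value.
// Assumes each row is an object like { sbd: "01000001", toan: 8, van: 7 }.
function diffDatasets(oldRows, newRows, scoreColumns) {
  const oldBySbd = new Map(oldRows.map((r) => [r.sbd, r]));
  const newBySbd = new Map(newRows.map((r) => [r.sbd, r]));

  // Set difference in both directions.
  const onlyInOld = [...oldBySbd.keys()].filter((k) => !newBySbd.has(k));
  const onlyInNew = [...newBySbd.keys()].filter((k) => !oldBySbd.has(k));

  const changedByColumn = Object.fromEntries(scoreColumns.map((c) => [c, 0]));
  let shared = 0;
  for (const [sbd, oldRow] of oldBySbd) {
    const newRow = newBySbd.get(sbd);
    if (!newRow) continue;
    shared += 1;
    for (const col of scoreColumns) {
      // Treat a missing score and null as the same "no score" value.
      const a = oldRow[col] ?? null;
      const b = newRow[col] ?? null;
      if (a !== b) changedByColumn[col] += 1;
    }
  }
  return { onlyInOld, onlyInNew, shared, changedByColumn };
}
```

A result where every count in `changedByColumn` is zero corresponds to the "100% identical scores" statement for the SBDs present in both datasets.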
2026-04-14 21:42:29 +07:00