5 Commits

Author SHA1 Message Date
tiennm99 8f86e4dc3b feat: refresh data from baotintuc.vn source, fix overflow sheet loss
Dataset update:
- Crawl all 63 .xls province files from baotintuc.vn CDN (original source)
- Old xlsx dataset moved to data-old/ for reference
- Net: +13,719 students (Hà Nội +7,275, HCM +6,445) — the old .xls → xlsx
  conversion silently dropped rows beyond the 65,536 per-sheet cap
- Also removes 1 bogus header row that had leaked into the old DB
- 100% identical scores on the 847,348 SBDs (số báo danh, exam IDs) present in both datasets

Build pipeline:
- build-database.js: iterate ALL sheets per workbook (fixes the overflow
  loss) and accept .xls in addition to .xlsx
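
The overflow fix described above can be sketched as follows. This is a minimal illustration, not the actual build-database.js code: the workbook shape (`sheetNames`, `sheets`), the function name `collectRowsFromAllSheets`, and the assumption that every sheet starts with its own header row are all hypothetical.

```javascript
// Hypothetical workbook shape: { sheetNames: string[], sheets: { [name]: row[] } }.
// Each row is an array of cell values.
// Assumption: every sheet, including overflow sheets, starts with a header row.
function collectRowsFromAllSheets(workbook) {
  const rows = [];
  for (const name of workbook.sheetNames) {
    const sheetRows = workbook.sheets[name] || [];
    // Skip the per-sheet header row and keep only the data rows.
    for (const row of sheetRows.slice(1)) {
      rows.push(row);
    }
  }
  return rows;
}

// Example: a province whose rows spilled onto a second sheet.
const exampleWorkbook = {
  sheetNames: ['Sheet1', 'Sheet2'],
  sheets: {
    Sheet1: [['SBD', 'TOAN'], ['01000001', 7.5], ['01000002', 6]],
    Sheet2: [['SBD', 'TOAN'], ['01000003', 8.25]],
  },
};

// Reading only the first sheet loses the overflow rows.
const firstSheetOnly = exampleWorkbook.sheets[exampleWorkbook.sheetNames[0]].slice(1);
const allSheets = collectRowsFromAllSheets(exampleWorkbook);
console.log(firstSheetOnly.length, allSheets.length); // 2 3
```

Skipping the header on every sheet, rather than only on the first, is one way to keep a repeated header row from leaking into the database.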

Audit tooling:
- scripts/crawl-baotintuc.js: idempotent 63-province downloader
- scripts/diff-datasets.js: compares two DBs by SBD set and per-column
  score deltas
2026-04-14 21:42:29 +07:00
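
The comparison that scripts/diff-datasets.js is described as performing (SBD set difference plus per-column score deltas) might look roughly like the sketch below. It assumes each dataset has already been loaded into a Map from SBD to a plain object of scores; the function name `diffDatasets` and the column names `toan` and `van` are illustrative, not taken from the repository.

```javascript
// Each dataset is modeled as Map<SBD, { [column]: score }> (hypothetical shape).
function diffDatasets(oldDb, newDb) {
  const onlyInOld = [];
  const onlyInNew = [];
  const changedByColumn = {}; // column name -> number of SBDs whose score differs

  for (const sbd of oldDb.keys()) {
    if (!newDb.has(sbd)) onlyInOld.push(sbd);
  }

  for (const [sbd, newScores] of newDb) {
    if (!oldDb.has(sbd)) {
      onlyInNew.push(sbd);
      continue;
    }
    const oldScores = oldDb.get(sbd);
    const columns = new Set([...Object.keys(oldScores), ...Object.keys(newScores)]);
    for (const col of columns) {
      // Treat a missing column as null so "absent" and "null" compare equal.
      const a = oldScores[col] ?? null;
      const b = newScores[col] ?? null;
      if (a !== b) changedByColumn[col] = (changedByColumn[col] || 0) + 1;
    }
  }

  return { onlyInOld, onlyInNew, changedByColumn };
}

// Example with made-up records.
const oldDb = new Map([
  ['01000001', { toan: 7.5, van: 6 }],
  ['01000002', { toan: 5 }],
]);
const newDb = new Map([
  ['01000001', { toan: 7.5, van: 6.5 }],
  ['01000003', { toan: 9 }],
]);
console.log(diffDatasets(oldDb, newDb));
```

An empty `changedByColumn` on the shared SBDs would correspond to the "100% identical scores" result reported in the commit message.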
tiennm99 718e2e9117 refactor: flatten data layout to data/, drop update/ overrides
- Move 63 Excel files from data/raw/ to data/ (single flat dir)
- Remove all 53 files in data/raw/update/: their SBD coverage was verified
  identical to raw/ (847,349 rows either way), so they added no new
  students, only potential score corrections that can be reintroduced
  later if the source is recovered
- Update build-database.js to read data/ directly
- Add scripts/audit-row-counts.js: compares source row count to DB row
  count to verify zero-loss parsing
- Point check-duplicates.js at new data/ location
2026-04-14 21:02:47 +07:00
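
The zero-loss check attributed to scripts/audit-row-counts.js can be illustrated with a small sketch. It assumes the row counts have already been gathered into two plain objects keyed by source file; the function name `auditRowCounts`, the file names, and the counts in the example are hypothetical.

```javascript
// sourceCounts and dbCounts map a source file name to a row count (hypothetical shape).
// Returns one entry per file whose parsed row count differs from the source.
function auditRowCounts(sourceCounts, dbCounts) {
  const mismatches = [];
  for (const [file, sourceRows] of Object.entries(sourceCounts)) {
    const dbRows = dbCounts[file] ?? 0; // a file missing from the DB counts as zero rows
    if (sourceRows !== dbRows) {
      mismatches.push({ file, sourceRows, dbRows, lost: sourceRows - dbRows });
    }
  }
  return mismatches;
}

// Example with made-up counts: the second file lost rows during parsing.
const report = auditRowCounts(
  { 'province-a.xlsx': 100, 'province-b.xlsx': 70000 },
  { 'province-a.xlsx': 100, 'province-b.xlsx': 65535 },
);
console.log(report);
// [ { file: 'province-b.xlsx', sourceRows: 70000, dbRows: 65535, lost: 4465 } ]
```

An empty result means every source row reached the database. Note that a check like this only catches rows lost during parsing; rows already missing from the source files themselves would pass unnoticed.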
tiennm99 52cc7ac2d0 feat: diacritics-insensitive search and SQL query tab
- Add ho_ten_ascii column + index for accent-folded name search
- Parse Tiếng Pháp/Nga/Trung (French/Russian/Chinese) scores (recovers ~2k students' foreign-language data)
- Loosen score regex to accept integer values (e.g. KHTN: 4)
- App.jsx: 3-mode search (SBD / ASCII / Vietnamese) and Tra cứu (lookup)/SQL tabs
- New custom-query component: 8 presets adapted to 2017 schema, read-only safety, auto-LIMIT
- ScoreTable: render new foreign-lang columns, hide columns null across all results
2026-04-14 20:44:43 +07:00
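
The accent folding behind the ho_ten_ascii column could be done along these lines. This is a sketch of one common technique (Unicode NFD decomposition, then stripping combining marks), not necessarily what the project does; the function name `foldVietnamese` is hypothetical, and whether the stored value is also lowercased is not stated in the commit, so case is left untouched here.

```javascript
// Fold Vietnamese diacritics to plain ASCII letters.
function foldVietnamese(name) {
  return name
    .normalize('NFD')                 // split base letters from combining marks
    .replace(/[\u0300-\u036f]/g, '')  // drop combining marks (tones, circumflex, breve, horn)
    .replace(/đ/g, 'd')               // đ/Đ do not decompose, so map them explicitly
    .replace(/Đ/g, 'D');
}

console.log(foldVietnamese('Nguyễn Văn Đức')); // Nguyen Van Duc
```

Folding both the stored name and the user's query with the same function lets an unaccented query such as "Nguyen Van Duc" match the accented original through an ordinary indexed comparison.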
tiennm99 4474547433 refactor: remove Java code, move web app to project root
- Remove Gradle build, Java sources, Hibernate config, old database.sqlite
- Move Excel data files from src/main/resources/raw/ to data/raw/
- Move Vite+React app from web/ to project root
- Merge package.json into single root-level config
- Update build script paths and CI workflow accordingly
2026-04-13 00:06:22 +07:00
tiennm99 1cf65be51c feat: add static score lookup site with Node.js DB builder
- Node script parses 119 Excel files into SQLite (847K students)
- Vite + React frontend with sql.js for client-side querying
- Search by exam ID (số báo danh) or student name
- Gzipped DB (36MB) with download progress bar
- GitHub Actions workflow for GitHub Pages deployment
2026-04-12 23:54:06 +07:00