thptqg2017

tiennm99/thptqg2017

mirror of https://github.com/tiennm99/thptqg2017.git synced 2026-06-04 02:14:02 +00:00

Author	SHA1	Message	Date
tiennm99	8f86e4dc3b	feat: refresh data from baotintuc.vn source, fix overflow sheet loss Dataset update: - Crawl all 63 .xls province files from baotintuc.vn CDN (original source) - Old xlsx dataset moved to data-old/ for reference - Net: +13,719 students (Hà Nội +7,275, HCM +6,445) — the old .xls → xlsx conversion silently dropped rows beyond the 65,536 per-sheet cap - Also removes 1 bogus header row that had leaked into the old DB - 100% identical scores on the 847,348 SBDs present in both datasets Build pipeline: - build-database.js: iterate ALL sheets per workbook (fixes the overflow loss) and accept .xls in addition to .xlsx Audit tooling: - scripts/crawl-baotintuc.js: idempotent 63-province downloader - scripts/diff-datasets.js: compares two DBs by SBD set and per-column score deltas	2026-04-14 21:42:29 +07:00
tiennm99	718e2e9117	refactor: flatten data layout to data/, drop update/ overrides - Move 63 Excel files from data/raw/ to data/ (single flat dir) - Remove all 53 files in data/raw/update/: verified identical SBD coverage to raw/ (847,349 rows either way), so they added no new students, only potential score corrections that can be reintroduced later if source is recovered - Update build-database.js to read data/ directly - Add scripts/audit-row-counts.js: compares source row count to DB row count to verify zero-loss parsing - Point check-duplicates.js at new data/ location	2026-04-14 21:02:47 +07:00
tiennm99	f10046f63d	chore: remove duplicate Excel files, add md5 audit script - Drop 10_LamDong_GNFT (1) and 2.BacKan_YQNX(1): identical row content to siblings (Excel metadata differs but file size & sheet rows match) - Add scripts/check-duplicates.js to detect byte-identical and row-identical files across data/raw and data/raw/update	2026-04-14 20:49:41 +07:00
tiennm99	52cc7ac2d0	feat: diacritics-insensitive search and SQL query tab - Add ho_ten_ascii column + index for accent-folded name search - Parse French/Russian/Chinese (Tiếng Pháp/Nga/Trung) scores (recovers ~2k students' foreign-language data) - Loosen score regex to accept integer values (e.g. KHTN: 4) - App.jsx: 3-mode search (SBD / ASCII / Vietnamese) and Tra cứu (lookup)/SQL tabs - New custom-query component: 8 presets adapted to 2017 schema, read-only safety, auto-LIMIT - ScoreTable: render new foreign-lang columns, hide columns null across all results	2026-04-14 20:44:43 +07:00
tiennm99	4474547433	refactor: remove Java code, move web app to project root - Remove Gradle build, Java sources, Hibernate config, old database.sqlite - Move Excel data files from src/main/resources/raw/ to data/raw/ - Move Vite+React app from web/ to project root - Merge package.json into single root-level config - Update build script paths and CI workflow accordingly	2026-04-13 00:06:22 +07:00
tiennm99	1cf65be51c	feat: add static score lookup site with Node.js DB builder - Node script parses 119 Excel files into SQLite (847K students) - Vite + React frontend with sql.js for client-side querying - Search by exam ID (số báo danh) or student name - Gzipped DB (36MB) with download progress bar - GitHub Actions workflow for GitHub Pages deployment	2026-04-12 23:54:06 +07:00
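
Commit 52cc7ac2d0 mentions an ho_ten_ascii column for accent-folded name search. The log does not show how the folding is done, so the following is only a minimal sketch of one common approach: Unicode NFD decomposition followed by stripping combining marks. The function name `foldVietnamese` is hypothetical and not taken from the repository, and lowercasing the result is an assumption.

```javascript
// Hypothetical helper; the repository's actual folding code is not shown in this log.
function foldVietnamese(name) {
  return name
    .normalize("NFD")                 // split base letters from their combining marks
    .replace(/[\u0300-\u036f]/g, "")  // drop the combining marks (tones, breve, horn, ...)
    .replace(/đ/g, "d")               // đ/Đ have no canonical decomposition, so map them by hand
    .replace(/Đ/g, "D")
    .toLowerCase();                   // assumption: the ASCII column is stored lowercased
}

console.log(foldVietnamese("Nguyễn Văn Đức")); // nguyen van duc
```

A value folded this way could be stored next to the original name, so that a user's query is folded with the same function before it is matched against the indexed column.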

6 Commits
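
Commit 8f86e4dc3b describes scripts/diff-datasets.js as comparing two DBs by SBD set and per-column score deltas. The script itself is not shown here, so this is only a sketch of that idea over in-memory rows. The function name `diffDatasets`, the row shape, and the column names `toan` and `van` are illustrative assumptions, not the repository's schema.

```javascript
// Hypothetical sketch: rows are plain objects keyed by `sbd` (exam ID).
function diffDatasets(oldRows, newRows, scoreColumns) {
  const oldBySbd = new Map(oldRows.map((r) => [r.sbd, r]));
  const newBySbd = new Map(newRows.map((r) => [r.sbd, r]));

  // Set difference in both directions: students present in only one dataset.
  const onlyOld = [...oldBySbd.keys()].filter((sbd) => !newBySbd.has(sbd));
  const onlyNew = [...newBySbd.keys()].filter((sbd) => !oldBySbd.has(sbd));

  // For students present in both, report every score column whose value differs.
  const changed = [];
  for (const [sbd, oldRow] of oldBySbd) {
    const newRow = newBySbd.get(sbd);
    if (!newRow) continue;
    for (const col of scoreColumns) {
      const before = oldRow[col] ?? null; // treat a missing score as null
      const after = newRow[col] ?? null;
      if (before !== after) changed.push({ sbd, col, before, after });
    }
  }
  return { onlyOld, onlyNew, changed };
}

// Made-up sample rows, for illustration only.
const oldRows = [
  { sbd: "01000001", toan: 8.5, van: 7 },
  { sbd: "01000002", toan: 6, van: 5.5 },
];
const newRows = [
  { sbd: "01000001", toan: 8.5, van: 7 },
  { sbd: "01000002", toan: 6.25, van: 5.5 },
  { sbd: "01000003", toan: 9, van: null },
];
const result = diffDatasets(oldRows, newRows, ["toan", "van"]);
console.log(result.onlyNew); // [ '01000003' ]
console.log(result.changed); // [ { sbd: '01000002', col: 'toan', before: 6, after: 6.25 } ]
```

An empty `changed` list over the shared SBDs would correspond to the "100% identical scores" claim in that commit, while `onlyNew` would hold the students recovered from the overflow sheets.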