Commit Graph

3 Commits

tiennm99 9b331fc24d refactor(semantle,doantu): drop ConceptNet vestiges, trim wordlist API
Now that both modules run on Workers AI embeddings, drop the legacy
Word2SimError alias, the unused wordlist helpers (getLine, LINE_COUNT,
pickFromPool), and every comment/README section still describing the
removed ConceptNet backend. Fix the bge-small doc typo in semantle/index.js
and align the semantle api-client test fake-vector dim with the real
384-dim output.
2026-04-23 00:19:28 +07:00
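A minimal sketch of what the aligned test fake could look like: a deterministic pseudo-embedding with the same 384-entry shape as the real bge-small output. The function name and seeding scheme are assumptions for illustration, not the repo's actual test helper.

```javascript
// Hypothetical test fake: produces a reproducible 384-dim vector,
// matching the dimensionality of bge-small embeddings.
function fakeVector(seed = 1) {
  const DIM = 384; // real embedding dimension of bge-small
  const v = new Array(DIM);
  for (let i = 0; i < DIM; i++) {
    // Arbitrary but deterministic values so assertions are stable.
    v[i] = Math.sin(seed * (i + 1));
  }
  return v;
}
```

Keeping the fake at the real dimension means any length checks or cosine-similarity math in the API client exercise the same shapes as production.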
tiennm99 4c2890ba25 refactor(semantle): drop word filter, expose line-based wordlist API
Use the full google-10000-english list verbatim (normalize only —
lowercase + dedupe, no length or alpha filtering). Pool goes from 7953
to 9894 entries; rare/short/long picks are still sieved by ConceptNet's
verify-and-fallback at round start.

Replaces TARGET_POOL/pickFromPool with a clearer line-based API:
  LINE_COUNT    — how many entries
  randomLine()  — uniform pick
  getLine(n)    — nth entry (n = frequency rank)

pickFromPool retained as a back-compat re-export so existing callers
don't break.
2026-04-22 23:19:51 +07:00
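A sketch of how the line-based API described above could be shaped. The tiny `WORDS` array here is a stand-in for the real 9894-entry normalized list, and in the actual module these names would be ES-module exports; whether `getLine` is 0- or 1-indexed is an assumption.

```javascript
// Stand-in for the normalized (lowercased, deduped) google-10000-english
// list; the real module holds 9894 entries in frequency order.
const WORDS = ["the", "of", "and", "word"];

// How many entries the list contains (exported in the real module).
const LINE_COUNT = WORDS.length;

// nth entry, where n is the frequency rank (0 = most frequent here).
function getLine(n) {
  return WORDS[n];
}

// Uniform pick over the whole list.
function randomLine() {
  return WORDS[Math.floor(Math.random() * LINE_COUNT)];
}

// Back-compat alias so existing pickFromPool callers keep working.
const pickFromPool = randomLine;
```

Exposing rank via `getLine(n)` keeps the frequency ordering of the source list usable by callers, while `randomLine()` stays a one-liner over `LINE_COUNT`.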
tiennm99 64c0248eea feat(semantle): source target pool from google-10000-english dictionary
The ~250-word hand-curated TARGET_POOL was too small for long-term play.
Replaces it with a build-script-generated dictionary:

- scripts/build-semantle-words.js fetches first20hours/google-10000-english
  (no-swears variant), filters to 4–10 ASCII letters, drops the top-200
  most frequent function words, and writes src/modules/semantle/words-data.js
  as a static ES-module export.
- wordlist.js now just re-exports that data via TARGET_POOL + pickFromPool.
- package.json: new build:semantle-words script; chained into `npm run build`
  alongside build:wordle-data so `npm run deploy` regenerates automatically.

Pool size: ~250 → 7953 words. Same ConceptNet verify-and-fallback flow, so
low-quality picks still cost at most one extra concept lookup.
2026-04-22 23:12:07 +07:00
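The filtering step the build script applies could look roughly like this. The function name, the exact order of operations (rank-200 cut before the length filter), and the regex are assumptions based on the commit description; fetching the upstream list and writing `words-data.js` are omitted.

```javascript
// Hypothetical sketch of the build script's filter stage: takes the raw
// frequency-ordered lines from google-10000-english (no-swears) and
// returns the target pool.
function filterTargetWords(lines) {
  return lines
    .map((w) => w.trim().toLowerCase())
    .slice(200)                              // drop the top-200 most frequent function words
    .filter((w) => /^[a-z]{4,10}$/.test(w)); // keep only 4-10 ASCII letters
}

// The surviving words would then be serialized as a static ES-module
// export, e.g.: `export default ${JSON.stringify(words)};`
```

Doing the rank cut before the length filter means "top-200" refers to the original frequency ranks, which matches the intent of dropping common function words rather than the 200 shortest survivors.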