Commit Graph

5 Commits

Author SHA1 Message Date
tiennm99 4f7f6896c5 refactor(semantle): switch embedding model from bge-small-en-v1.5 to bge-m3
Aligns semantle with doantu so both modules share one Workers AI model.
bge-m3 is multilingual and cheaper (1075 N/M input tokens vs 1841 N/M)
and produces 1024-dim vectors. Updates the api-client default, test
fake-vector dimensions, README, index.js doc comment, and the
wrangler.toml [ai] binding comment (Neurons/day budget recomputed).
2026-04-23 00:22:28 +07:00
tiennm99 9b331fc24d refactor(semantle,doantu): drop ConceptNet vestiges, trim wordlist API
Now that both modules run on Workers AI embeddings, drop the legacy
Word2SimError alias, the unused wordlist helpers (getLine, LINE_COUNT,
pickFromPool), and every comment/README section still describing the
removed ConceptNet backend. Fix the bge-small doc typo in semantle/index.js
and align the semantle api-client test fake-vector dim with the real
384-dim output.
2026-04-23 00:19:28 +07:00
tiennm99 31ced88b78 refactor(semantle): swap ConceptNet for Workers AI embeddings
ConceptNet (api.conceptnet.io) was returning sustained 502s, breaking
every guess with an "Upstream hiccup" reply. Replace with env.AI.run
on @cf/baai/bge-small-en-v1.5 and score guesses by computing cosine
similarity locally against the target vector.

The local google-10k wordlist doubles as the in/out-of-vocabulary set,
so OOV detection is an O(1) Set.has() with no upstream call. The
similarity() response shape is unchanged, so handlers/render/state
stay as-is.

Free on the Workers Free plan: 10k Neurons/day cap, ~0.0037 Neurons
per 2-word guess → ~2.7M guesses/day headroom for this bot.
2026-04-22 23:48:17 +07:00
tiennm99 fca6d733c9 refactor(semantle): swap word2sim backend for ConceptNet
ConceptNet provides a free public /relatedness endpoint (returns cosine-like
[-1, 1]) and /c/en/{term} for vocabulary check. No random-word endpoint, so
we ship a curated local target pool in wordlist.js (~250 words) and verify
each pick via the concept endpoint with a fallback to an unverified pick.

Each guess now makes two parallel ConceptNet calls (concept + relatedness)
instead of a single word2sim call. Slightly higher latency but zero hosting
cost and no dependency on the self-hosted word2sim instance.

- api-client.js rewritten; UpstreamError replaces Word2SimError (aliased
  for backwards compat with older imports).
- wordlist.js added (curated target pool + pickFromPool).
- handlers.js: drops RANDOM_FILTERS (no filtering needed; pool is curated).
- index.js: drops WORD2SIM_API_URL env var; ConceptNet base hardcoded.
- wrangler.toml + .dev.vars.example: drop WORD2SIM_API_URL.
- api-client tests rewritten for ConceptNet shape; total tests 336 → 341.
2026-04-22 23:07:54 +07:00
tiennm99 08ff72985a feat(semantle): add word2vec guessing game module
Telegram commands /semantle, /semantle_new, /semantle_giveup, /semantle_stats.
Round starts with /random pick from hosted word2sim; each guess scored via
/similarity. Unlimited guesses; solve on case-insensitive exact match.

New env var WORD2SIM_API_URL (wrangler.toml, .env.deploy). Includes
module README and 90 unit tests covering api-client, state, format,
render, and handlers.
2026-04-22 22:05:27 +07:00