Mirror the semantle migration but with @cf/baai/bge-m3 — BAAI's
multilingual embedding model — because the English-only BGE variants
can't produce meaningful Vietnamese vectors (their tokenizer shreds
diacritics into noisy byte-level subwords).
bge-m3 is trained across 194 languages, including Vietnamese, and is
actually cheaper in Neurons (1,075 vs 1,841 per M tokens for
bge-small-en-v1.5). The vocab check reuses the local Viet22K wordlist as
an in-memory Set — O(1) OOV detection, no upstream call.
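The in-memory vocab check can be sketched as follows; `VIET_WORDS` and `isKnownWord` are illustrative names standing in for the real ~22k-entry wordlist export:

```javascript
// Hedged sketch of the O(1) vocab check. VIET_WORDS stands in for the
// Viet22K export; the real module's names and shapes may differ.
const VIET_WORDS = ["chó", "mèo", "con chó", "nhà"];

// Normalize once at load time so diacritics compare byte-for-byte.
const vocab = new Set(VIET_WORDS.map((w) => w.normalize("NFC").toLowerCase()));

function isKnownWord(guess) {
  return vocab.has(guess.trim().normalize("NFC").toLowerCase());
}
```

Normalizing to NFC on both the load path and the lookup path keeps precomposed and decomposed diacritic forms from comparing unequal.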
Also add a test file for the module (mirrors semantle coverage plus
Vietnamese-specific cases: diacritics, multi-syllable compounds).
ConceptNet (api.conceptnet.io) was returning sustained 502s, breaking
every guess with an "Upstream hiccup" reply. Replace with env.AI.run
on @cf/baai/bge-small-en-v1.5 and score guesses by computing cosine
similarity locally against the target vector.
The local google-10k wordlist doubles as the in/out-of-vocabulary set,
so OOV detection is an O(1) Set.has() with no upstream call. The
similarity() response shape is unchanged, so handlers/render/state
stay as-is.
Free on the Workers Free plan: 10k Neurons/day cap, ~0.0037 Neurons
per 2-word guess → ~2.7M guesses/day headroom for this bot.
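Scoring locally reduces to a plain cosine over the two embedding vectors; a minimal sketch (the function name is illustrative):

```javascript
// Cosine similarity between two same-length embedding vectors, as used to
// score each guess against the stored target vector.
function cosineSimilarity(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```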
Near-clone of the semantle module, adapted for Vietnamese:
- Targets from duyet/vietnamese-wordlist Viet22K (~22k entries, GPL).
Regenerate via scripts/build-doantu-words.js; chained into npm run build.
- ConceptNet client uses /c/vi/<term> URIs; multi-word guesses (e.g.
"con chó") are space-to-underscore converted at URL build time so the
board keeps the natural display.
- lookup.js permits Unicode letters + combining marks + single internal
spaces; rejects digits/punctuation.
- All three commands (/doantu, /doantu_giveup, /doantu_stats) are
visibility=protected — shown in /help, hidden from Telegram's native /
autocomplete menu while the module is still experimental.
Wired into src/modules/index.js, wrangler.toml MODULES, .env.deploy(.example),
and package.json build chain.
Separate module rather than a shared base with semantle — matches the
repo's one-module-per-game convention (see loldle vs wordle); factor later
if a third language appears.
Use the full google-10000-english list verbatim (normalize only —
lowercase + dedupe, no length or alpha filtering). Pool goes from 7953
to 9894 entries; rare/short/long picks are still sieved by ConceptNet's
verify-and-fallback at round start.
Replaces TARGET_POOL/pickFromPool with a clearer line-based API:
LINE_COUNT — how many entries
randomLine() — uniform pick
getLine(n) — nth entry (n = frequency rank)
pickFromPool retained as a back-compat re-export so existing callers
don't break.
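A minimal sketch of the line-based API, with `WORDS` standing in for the generated, frequency-ranked words-data export:

```javascript
// Illustrative line-based wordlist API; WORDS stands in for the generated
// export, ordered by frequency rank.
const WORDS = ["the", "time", "people", "water"];

const LINE_COUNT = WORDS.length;

function getLine(n) {
  // n is the 0-based frequency rank.
  return WORDS[n];
}

function randomLine() {
  return WORDS[Math.floor(Math.random() * LINE_COUNT)];
}

// Back-compat re-export so existing pickFromPool callers keep working.
const pickFromPool = randomLine;
```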
The ~250-word hand-curated TARGET_POOL was too small for long-term play.
Replaces it with a build-script-generated dictionary:
- scripts/build-semantle-words.js fetches first20hours/google-10000-english
(no-swears variant), filters to 4–10 ASCII letters, drops the top-200
most frequent function words, and writes src/modules/semantle/words-data.js
as a static ES-module export.
- wordlist.js now just re-exports that data via TARGET_POOL + pickFromPool.
- package.json: new build:semantle-words script; chained into `npm run build`
alongside build:wordle-data so `npm run deploy` regenerates automatically.
Pool size: ~250 → 7953 words. Same ConceptNet verify-and-fallback flow, so
low-quality picks still cost at most one extra concept lookup.
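The filtering step might look roughly like this; `buildTargetPool` is an illustrative name, and the ordering of the two steps (rank cutoff before the letter filter) is an assumption:

```javascript
// Hedged sketch of the build-script filter: drop the 200 most frequent
// entries, then keep only 4–10 ASCII-letter words. The input list is
// assumed frequency-ordered, as google-10000-english is.
function buildTargetPool(lines) {
  return lines
    .slice(200) // drop top-200 most-frequent entries (function words)
    .map((w) => w.trim().toLowerCase())
    .filter((w) => /^[a-z]{4,10}$/.test(w)); // 4–10 ASCII letters only
}
```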
ConceptNet provides a free public /relatedness endpoint (returns cosine-like
[-1, 1]) and /c/en/{term} for vocabulary check. No random-word endpoint, so
we ship a curated local target pool in wordlist.js (~250 words) and verify
each pick via the concept endpoint with a fallback to an unverified pick.
Each guess now makes two parallel ConceptNet calls (concept + relatedness)
instead of a single word2sim call. Slightly higher latency but zero hosting
cost and no dependency on the self-hosted word2sim instance.
- api-client.js rewritten; UpstreamError replaces Word2SimError (aliased
for backwards compat with older imports).
- wordlist.js added (curated target pool + pickFromPool).
- handlers.js: drops RANDOM_FILTERS (no filtering needed; pool is curated).
- index.js: drops WORD2SIM_API_URL env var; ConceptNet base hardcoded.
- wrangler.toml + .dev.vars.example: drop WORD2SIM_API_URL.
- api-client tests rewritten for ConceptNet shape; total tests 336 → 341.
Giveup already auto-starts a fresh round on next /semantle, so /semantle_new
was redundant. Duplicate guesses now match loldle's behavior: reply with
"🔁 already guessed" and skip the similarity API call (fast-path dedup
against prior word or canonical, with a post-API fallback for different
inputs that canonicalize to the same token).
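The fast-path dedup can be sketched as below; the `{ word, canonical }` guess shape and the helper names are illustrative:

```javascript
// Hedged sketch: check prior guesses by raw word or canonical form before
// spending a similarity API call. canonicalize is assumed to lowercase + trim.
function canonicalize(word) {
  return word.trim().toLowerCase();
}

function findDuplicate(game, rawGuess) {
  const canonical = canonicalize(rawGuess);
  return game.guesses.find(
    (g) => g.word === rawGuess || g.canonical === canonical,
  );
}
```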
Telegram commands /semantle, /semantle_new, /semantle_giveup, /semantle_stats.
Each round starts with a /random pick from the hosted word2sim service;
each guess is scored via /similarity. Unlimited guesses; a round is
solved on a case-insensitive exact match.
New env var WORD2SIM_API_URL (wrangler.toml, .env.deploy). Includes
module README and 90 unit tests covering api-client, state, format,
render, and handlers.
Previously startFreshGame was called at the tail of every win/lose/giveup
path, stamping startedAt to that moment — so the clock accrued while the
player was away between rounds. Now:
- round-ending paths call clearGame (new helper in state.js), deleting the
KV record instead of pre-creating the next round
- getOrInitGame lazily creates the next round on the player's next /loldle
call, with startedAt: null
- the first actual guess inside handleLoldle stamps startedAt = Date.now()
Viewing an empty board gives no hints, so it shouldn't count against the
clock. handleGiveup no longer auto-creates a fresh round and now reports
"No active round" when called with nothing in progress.
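A sketch of the lazy-start flow, with KV modeled as a Map and all record shapes illustrative:

```javascript
// Hedged sketch of clearGame / getOrInitGame / stamp-on-first-guess.
// The real module persists to Workers KV; a Map stands in here.
const kv = new Map();

function clearGame(key) {
  kv.delete(key); // round-ending paths delete instead of pre-creating the next round
}

function getOrInitGame(key) {
  if (!kv.has(key)) {
    kv.set(key, { guesses: [], startedAt: null }); // clock not running yet
  }
  return kv.get(key);
}

function recordGuess(key, guess) {
  const game = getOrInitGame(key);
  if (game.startedAt === null) {
    game.startedAt = Date.now(); // first real guess starts the clock
  }
  game.guesses.push(guess);
  return game;
}
```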
Broaden `npm run format` / `npm run lint` to biome's full scan (`.`)
instead of a fixed src/tests/scripts list, so root-level files and any
new top-level directories stay formatted. Drop the stale ignore entry
for the deleted champions-data.js.
loldle.net's classic-mode bundle has two record shapes — older champions
carry _id/championId, newer ones (Bel'Veth, K'Sante, Nilah, …) don't.
The regex required those leading fields, silently dropping anyone added
since 2022.
Make _id/championId optional and non-capturing, and drop them from the
output record (the bot never read them anyway). Champion count:
169 → 172; /loldle k'sante, /loldle bel'veth, and /loldle nilah
now resolve correctly.
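An illustrative reconstruction of the fix (the real bundle regex matches many more fields; this one captures only championName):

```javascript
// The leading _id/championId fields become one optional, non-capturing group,
// so post-2022 records that omit them still match. Illustrative only.
const RECORD_RE = /\{(?:"_id":\d+,"championId":"[^"]+",)?"championName":"([^"]+)"/g;

function extractNames(bundle) {
  return [...bundle.matchAll(RECORD_RE)].map((m) => m[1]);
}
```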
Column headers now match loldle.net's classic-mode grid verbatim:
Range → Range type, Region → Region(s), Lane → Position(s),
Year → Release year. The champion row header becomes Champion (was
Name). Data field names already matched; only labels diverged.
KV payload cleanup:
- drop lastResultAt from stats (never read)
- drop solved/giveup flags from game state (round is immediately
replaced after finish, making the flags transient noise)
- skip redundant saveGame on winning/giveup/out-of-guesses paths;
startFreshGame overwrites anyway
Code cleanup:
- delete daily.js + daily.test.js (pickDaily/todayUtc were speculative
"future use" — only pickRandom was wired in, inlined into handlers)
- drop the dead switch default in compare.js
- trim file preambles across the module
Docs: rewrite README around current behavior with loldle.net as the
sole data source; update scraper header to match the raw schema.
Round state now keeps `guesses` as a plain string[] (the names the player
tried) instead of caching full comparison results. The board view
rehydrates rows at display time by re-running compareChampions against
the current target.
Smaller KV payloads, and the rendered board always reflects the live
champions.json — useful if a weekly data refresh lands mid-round.
Drop the in-scraper normalization step — champions.json now mirrors the
exact shape emitted by loldle.net's JS bundle. Records use _id,
championId, championName, arrays for positions/species/regions/
range_type, "Male"/"Female"/"Other" gender strings, and a full
YYYY-MM-DD release_date.
Comparison is schema-aware: multi-value keys accept arrays directly,
the year axis parses YYYY out of the ISO date, and exact compares stay
case-insensitive.
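A sketch of the two comparison rules under assumed names; the direction returned by compareYear is a guess at the hint semantics:

```javascript
// Multi-value axes (regions, positions, species) compared as case-insensitive
// sets: full overlap is exact, any overlap is partial.
function compareMulti(guessValues, targetValues) {
  const t = new Set(targetValues.map((v) => v.toLowerCase()));
  const g = guessValues.map((v) => v.toLowerCase());
  if (g.length === t.size && g.every((v) => t.has(v))) return "exact";
  if (g.some((v) => t.has(v))) return "partial";
  return "miss";
}

// Year axis: parse YYYY out of the ISO release_date. "higher" here means the
// target released later than the guess (hint semantics assumed).
function compareYear(guessDate, targetDate) {
  const gy = Number(guessDate.slice(0, 4));
  const ty = Number(targetDate.slice(0, 4));
  if (gy === ty) return "exact";
  return gy < ty ? "higher" : "lower";
}
```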
Node 24 + wrangler 4.x both accept `import ... with { type: "json" }`,
so the generated champions-data.js wrapper is no longer needed.
Drop scripts/build-loldle-data.js and the build:loldle-data npm script.
Scraper writes champions.json only.
loldle.net's JS bundle ships the complete set of classic-mode axes in
plaintext, so ddragon merging is no longer needed. Scraper now produces
the final schema directly.
Schema changes: drop title, skinCount, image, and genre (ddragon-only).
Replace genre (class tags like Fighter/Mage) with species (Human/Darkin/
Vastayan) — the axis loldle.net actually uses. Promote region to a
multi-value field so multi-region champions compare correctly.
Handlers no longer show "Name — Title" on win/giveup.
First regeneration via scripts/scrape-loldle-data.js. Corrects
multi-region champions (Aatrox runeterra,shurima; Ambessa
piltover,noxus; Ashe noxus,runeterra), fixes lane/region accuracy,
and canonicalizes region slugs (mount-targon -> targon).
Prevents two players in a group from wasting a slot on the same champion —
re-guessing a previously tried champion replies with a hint and does not
consume a guess.
- Send a random sticker from WIN/LOSE/GIVEUP pools before the text reply
(errors swallowed so a rotten file_id never blocks the message).
- Win message includes attempt-count flavor ("First try!" / "Sharp!" /
"Close call!" / "Phew — last one!") and elapsed solve time.
- Lose and giveup messages now also include the champion's title.
- Extract stickers.js and flavor.js so handlers.js stays at ~200 LoC.
Reply to any sticker with /stickerid to get its file_id and
file_unique_id. Collects IDs for hard-coded sticker pools in other
modules (upcoming loldle win/lose/giveup reactions).
Private visibility keeps the command out of /help and the Telegram
slash menu.
Literal <champion> inside an HTML parse_mode reply made Telegram reject
the message as an unknown entity, so the victory / out-of-guesses /
giveup text never reached the user — only the silently-started fresh
round was visible on the next /loldle.
- Remove /loldle_new; finished rounds (solve/giveup/out-of-guesses)
immediately roll into a fresh round.
- Render guesses as an HTML <pre> monospace table with auto-widthed
label column and a 🎯 Name row (uppercase champion name).
- Year direction uses ⬆️ / ⬇️.
The TCBS `apipubaws.tcbs.com.vn` host returns HTTP 404/500 for every
request, so every ticker resolved as "Unknown stock ticker" and /trade_buy
was unusable. Switch price + symbol resolution to the KBS public endpoint
that vnstock currently defaults to (`kbbuddywts.kbsec.com.vn/iis-server/
investment/stocks/{TICKER}/data_day`). KBS needs no auth, returns JSON,
and is Worker-compatible.
- `prices.fetchStockPrice` now queries KBS with a 14-day lookback window
(covers weekends/holidays) and drops the TCBS-specific ×1000 scaling;
KBS returns real VND.
- `symbols.resolveSymbol` delegates to `fetchStockPrice` for existence
checks — empty `data_day` means unknown ticker.
- Update test fetch stubs to match the `kbsec` host and KBS response
shape (`{ symbol, data_day: [{ c }] }`).
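Parsing the KBS response shape might look like this; `latestClose` is an illustrative helper, and treating the last `data_day` entry as the most recent is an assumption:

```javascript
// Hedged sketch of extracting the latest close from a KBS data_day response.
function latestClose(kbsResponse) {
  const days = kbsResponse?.data_day ?? [];
  if (days.length === 0) return null; // empty data_day ⇒ unknown ticker
  // KBS returns real VND, so no ×1000 scaling is applied.
  return Number(days[days.length - 1].c);
}
```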
No preview deploy stage is used, so the preview_id binding (and its
companion namespace-creation comment) just advertises a namespace that
never sees traffic. Keep wrangler.toml narrowly scoped to what we
actually deploy.
Users expect to scan a single region's upcoming week rather than a daily
fixture board. Reverses the nesting: /lolschedule_week now lists each
major league as a top-level section, then italicised ICT date headers
with that league's matches beneath.
- handlers.js header now reflects fan-out to subscribers, not a single chat.
- README "Time zone" references the correct command names and gains a
Subscribers section; Files section lists subscribers.js.
- formatEventLine's showLeague option is dead in production (renderToday
and renderWeek always group under a league header), so drop it and the
test that covered only the option toggle.
Replaces the single LOLSCHEDULE_CHAT_ID env var with a KV-backed
subscriber list. New commands /lolschedule_subscribe and
/lolschedule_unsubscribe let each chat opt in/out. The cron now fans
out to every subscriber via Promise.allSettled so one blocked chat
cannot break the others.
- Commands renamed: /lol_today → /lolschedule_today, /lol_week → /lolschedule_week.
- Today view groups events under a league header per section.
- Week view nests leagues under each ICT day.
- LEAGUE_ORDER gives tier-1 tournaments priority (Worlds / MSI / First Stand,
then LCK / LPL / LEC / LCS, etc).
- New cron "0 1 * * *" (08:00 ICT) pushes today's major-league schedule to
LOLSCHEDULE_CHAT_ID via the Telegram Bot API. Skips cleanly when chat id
or token is missing, or when today has no major-league matches.
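The Promise.allSettled fan-out can be sketched as follows; `sendToChat` is an assumed Telegram-send helper, not the module's real export:

```javascript
// One rejected send (blocked chat, revoked permissions) cannot prevent
// delivery to the remaining subscribers. Returns the delivered count.
async function fanOut(subscriberIds, text, sendToChat) {
  const results = await Promise.allSettled(
    subscriberIds.map((chatId) => sendToChat(chatId, text)),
  );
  return results.filter((r) => r.status === "fulfilled").length;
}
```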
Module header and api-client top comment still mentioned Leaguepedia /
MatchSchedule / keying by league filter. fetchSchedulePage also exposed
an unused `leagueId` parameter and returned an unused `olderToken`;
remove both to match the actual usage.
Leaguepedia's anonymous IP rate limit is too aggressive for a bot even
from CF Worker egress (~1–2 req/min), and authenticated Fandom tokens
don't lift it. Switching to the lolesports.com getSchedule endpoint —
the same data feed powering the official site — removes the limit and
provides richer fields: state (unstarted/inProgress/completed), per-team
result.gameWins and outcome, league metadata, bestOf strategy.
Handlers simplify back to cache-first (120 s fresh / 1 h stale fallback)
with no cron needed. Results are filtered to major leagues (LCK, LPL,
LEC, LCS, worlds, msi, first_stand, LCP, CBLOL, EMEA Masters) to keep
the week view under Telegram's 4096-char message limit.
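The cache-first / stale-fallback flow, sketched with the cache modeled as a Map of `{ at, data }` records (names illustrative):

```javascript
// Hedged sketch: serve fresh cache (< 120 s) without an upstream call;
// on upstream failure, fall back to a stale copy up to 1 h old.
const FRESH_MS = 120 * 1000;
const STALE_MS = 60 * 60 * 1000;

async function cachedFetch(cache, key, now, fetchUpstream) {
  const hit = cache.get(key);
  if (hit && now - hit.at < FRESH_MS) return hit.data;
  try {
    const data = await fetchUpstream();
    cache.set(key, { at: now, data });
    return data;
  } catch (err) {
    if (hit && now - hit.at < STALE_MS) return hit.data; // stale fallback
    throw err;
  }
}
```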
Swaps the best-effort console.warn for JSON log lines emitted via
console.log so Workers Observability + wrangler tail surface the real
cause (HTTP status, API error info, or non-JSON body) when /lol_today
and /lol_week fall into the error branch.
Captures the feasibility check and auth-token investigation that led to
the lolschedule module: confirmed endpoint + table + field set, and
documented why caching beats token-based rate-limit mitigation on Fandom.
New module exposing /lol_today and /lol_week commands, backed by the
Leaguepedia Cargo API (MatchSchedule table). Renders scores for
played/live matches and ICT times for scheduled ones. Caches range
queries in KV (60s today, 300s week) with stale-fallback on fetch error.
Cross-client column alignment between the marker row and a letter row is
unreliable in Telegram:
- <pre> monospace doesn't enforce equal width for emoji
- fullwidth Latin (U+FF21..FF3A) falls back to base Latin on mobile fonts
- squared-letter emoji (U+1F130/1F170..) render at different intrinsic
widths than color-square emoji (U+1F7E8/1F7E9/2B1C) on many clients
Instead, render each guess as the word on one line followed by the
colored marker row — the standard NYT Wordle share format:
CRANE
🟩🟨⬜🟩🟩
The association between a letter and its color is visually unambiguous
without depending on character-column alignment. Also drops the HTML
parse_mode requirement — replies are plain text again.
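The share-format renderer reduces to a couple of lines; `marks` is assumed to be an array of "hit" / "near" / "miss" per letter:

```javascript
// Word on one line, colored marker row beneath: the NYT Wordle share format.
const MARKERS = { hit: "🟩", near: "🟨", miss: "⬜" };

function renderGuess(word, marks) {
  const row = marks.map((m) => MARKERS[m]).join("");
  return `${word.toUpperCase()}\n${row}`;
}
```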
Previous fullwidth-Latin approach (U+FF21..FF3A) failed on mobile
Telegram clients because their monospace fonts don't ship fullwidth
glyphs — the codepoints fall back to base Latin at 1 cell, making the
letter row half as wide as the marker row.
Switch to Negative Squared Latin Capital Letters (U+1F170..1F189,
🅰🅱🅲..🆉) — these are emoji-class characters, so both rows are
drawn by the same emoji font at the same cell width. Column alignment
becomes a property of Unicode, not of the client's monospace font.
🟩🟨⬜🟩🟩
🅲🆁🅰🅽🅴
Color-square emoji (🟩🟨⬜) render at ~2 monospace cells wide in
Telegram's <pre> blocks; the previous letter row used " X " (3 cells
per letter = 15 cells total for a 5-letter word) against markers that
only span ~10 cells, so columns drifted.
Switching letters to fullwidth Latin (U+FF21..U+FF3A, e.g. 'Ａ' instead
of 'A') puts each letter at exactly 2 cells via East Asian Width =
Fullwidth, matching one emoji per letter with no padding:
🟩🟨⬜🟩🟩
ＣＲＡＮＥ
No spacing heuristics — alignment is a consequence of character width,
which every monospace-capable Telegram client respects.
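The fullwidth mapping as code: shift A–Z from U+0041 up into the U+FF21 block.

```javascript
// Map ASCII A–Z onto Fullwidth Latin Capital Letters (U+FF21..U+FF3A),
// each of which occupies exactly 2 monospace cells (East Asian Width = F).
function toFullwidth(word) {
  return word
    .toUpperCase()
    .replace(/[A-Z]/g, (c) => String.fromCodePoint(c.codePointAt(0) - 0x41 + 0xff21));
}
```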
Wrap renderGuess / renderBoard output in <pre> and send replies with
parse_mode: HTML. In Telegram's monospace font each " X " letter cell
is 3 characters wide, which is roughly the width of a single emoji
marker, so colored squares stack cleanly over the letter they score.
No user-controlled content lands inside the <pre> (guesses are
validated [a-z]{5}), so no HTML escaping is needed in the grid. The
inline <code>/wordle <word></code> placeholder is properly
entity-encoded for HTML parse mode.
Code fixes:
- trading/handlers + stats-handler: guard ctx.from?.id to prevent
cross-user state corruption when channel posts or inline queries lack
a sender
- trading/prices + trading/symbols: encodeURIComponent on ticker before
interpolating into TCBS API URLs
- trading/stats-handler: parallelize per-stock price fetches with
Promise.allSettled so N-stock portfolios don't stack serial latency
- loldle/handlers: guard target champion lookup against champions.json
refresh drift — start a fresh round or fall back to the stored id
- wordle + loldle: explicitly initialize giveup:false in startFreshGame
for stable state shape
- wordle/lookup: fix stale JSDoc that claimed null return
- biome: ignore auto-generated champions.json / champions-data.js /
words-data.js
- Apply formatter to src/index.js, loldle/handlers.js imports, and
loldle/compare.test.js (previously red)
Docs refresh:
- README: 105+ tests -> 200+; wordle/loldle described as real modules
- architecture: module tree updated, test count 105 -> 200, runtime
~500ms -> ~2s, stub list narrowed to misc only
- codebase-summary: module table rewritten (wordle/loldle now Complete
with real command lists and KV schema); test coverage table updated
- loldle/README: full rewrite matching the current implementation
(was describing the original stub)
- New docs/development-roadmap.md tracking upcoming features
(daily-mode for wordle + loldle, crypto/gold/forex trading, shared
picker util, handler-level tests, coverage reporting, staging env)
Tests: 200/200 passing. Lint: clean.
Replaces the wordle stub with a full implementation mirroring the loldle
module layout: compare/lookup/daily/render/state/handlers/index split,
per-subject KV state, standard 6 guesses, two-pass duplicate-letter
marking.
Commands: /wordle, /wordle_new, /wordle_giveup, /wordle_stats.
Word list (14,855 entries) sourced from dracos's gist
(https://gist.github.com/dracos/dd0668f281e685bad51479e5acaadb93) and
bundled via scripts/build-wordle-data.js. Credits in module README and
generated file headers.
Dispatcher test updated for the new command count (12 → 13).
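The two-pass duplicate-letter marking mentioned above follows the standard Wordle algorithm; a sketch with an assumed "hit" / "near" / "miss" mark vocabulary:

```javascript
// Pass 1 claims exact-position hits and counts the unmatched answer letters;
// pass 2 hands out "near" only while unclaimed copies of a letter remain,
// so duplicate letters are never over-marked.
function markGuess(guess, answer) {
  const marks = Array(guess.length).fill("miss");
  const remaining = {};
  for (let i = 0; i < guess.length; i++) {
    if (guess[i] === answer[i]) {
      marks[i] = "hit";
    } else {
      remaining[answer[i]] = (remaining[answer[i]] ?? 0) + 1;
    }
  }
  for (let i = 0; i < guess.length; i++) {
    if (marks[i] === "miss" && remaining[guess[i]] > 0) {
      marks[i] = "near";
      remaining[guess[i]] -= 1;
    }
  }
  return marks;
}
```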