Closes deferred phases 04 + 05 of loldle-new-modes plan.
- loldle-ability: 5 guesses, DDragon ability icon as photo. State pins a
slot (P/Q/W/E/R) so the same icon shows every turn. Abilities pulled
from DDragon per-champion — same source loldle.net uses at runtime.
- loldle-splash: 4 guesses, random skin splash as photo. Skin pool
scraped from loldle.net bundle (var Ad=[…] — 172 champs × 1939 skins,
non-chroma, matches their splash mode exactly). URLs from Riot
DDragon CDN (no version segment, stable across patches).
- fetch-ddragon-data.js: extended to write all four JSONs in one run.
Shares a single DDragon per-champion fetch cycle (concurrency 10).
- Credits loldle.net + Riot Games in all loldle-family READMEs.
19 new tests (503 total). Lint clean. register:dry reports 12 loldle_*
commands with no conflicts.
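The version-free splash path is simple enough to sketch; this mirrors the well-known DDragon CDN layout rather than the module's actual helper:

```javascript
// Splash art lives under a version-free CDN path, so URLs stay stable across
// patches. championId is DDragon's id (e.g. "Aatrox"); skinNum is the skin's
// numeric index (0 = base splash).
function splashUrl(championId, skinNum) {
  return `https://ddragon.leagueoflegends.com/cdn/img/champion/splash/${championId}_${skinNum}.jpg`;
}
```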
Ship two new loldle-family modules mirroring loldle.net's non-classic
modes. Text-only MVP (ability/splash phases stay deferred).
- loldle-emoji: 5 guesses, emoji-sequence clue. Pool derived algorithmically
from classic's champions.json metadata (species/region/resource mapping
table) since loldle.net's bundle has no static emoji pool.
- loldle-quote: 6 guesses, lore-blurb clue. Pool seeded from Data Dragon
champion title + first lore sentence; champion name redacted to ___.
- scripts/fetch-ddragon-data.js: single generator for both JSONs.
- src/util/normalize-name.js: shared lookup helper; loldle/lookup.js
refactored to import it.
35 new tests (484 total passing). Lint clean.
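Since no static emoji pool exists upstream, the derivation is a lookup-table walk over each champion record. A sketch with made-up tables (the real mapping tables live in the pool generator and are certainly larger):

```javascript
// Hypothetical mapping tables — illustrative only, not the module's real data.
const SPECIES_EMOJI = { Human: "🧑", Darkin: "😈", Vastayan: "🦊" };
const REGION_EMOJI = { ionia: "🌸", noxus: "💢", shurima: "🐪" };
const RESOURCE_EMOJI = { Mana: "🔵", Energy: "🟡", Fury: "🔺" };

// Build the emoji-sequence clue from classic's champions.json metadata.
function emojiClue(champ) {
  return [
    SPECIES_EMOJI[champ.species?.[0]],
    REGION_EMOJI[champ.regions?.[0]],
    RESOURCE_EMOJI[champ.resource],
  ]
    .filter(Boolean) // skip axes with no mapping rather than emitting "undefined"
    .join("");
}
```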
Self-review of the prior cleanup commit caught one omission — src/types.js
(central JSDoc typedefs file: Env, Module, Command, Cron, …) was listed in
the top-level README but absent from docs/architecture.md's src/ tree.
Previously seeds carried hand-curated {category, target, initialHint}.
Now SEEDS is a flat string[] of keywords — at round-start, the model
generates {category, initialHint} on the fly. Benefits:
- adding a seed is trivial (just append a word)
- every round gets a fresh cryptic opener (varies across plays of the
same word)
- HINT STYLE rules apply to the opening hint too, so the initial clue
isn't a definitional giveaway
Implementation:
- prompts.buildStartRoundPrompt(target) — with good/bad examples
- ai-client.generateRoundStart(env, target) — same JSON-in-content
approach as judge(), with defensive fallbacks + redactSecret
- handlers.startFreshGame now async; surfaces roundstart errors via the
existing UPSTREAM_FAIL path
Tests: 449 pass (5 new for generateRoundStart, 1 for roundstart error path).
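The defensive-fallback shape can be sketched like this; the field names follow the commit, but the validation logic and the generic opener are assumptions:

```javascript
// Parse the model's round-start reply; fall back to a generic opener when the
// JSON is malformed or the hint leaks the target word.
function parseRoundStart(raw, target) {
  const fallback = { category: "thing", initialHint: "It hides in plain sight." };
  try {
    const parsed = JSON.parse(raw);
    const leaky = parsed.initialHint?.toLowerCase().includes(target.toLowerCase());
    if (typeof parsed.category === "string" && typeof parsed.initialHint === "string" && !leaky) {
      return { category: parsed.category, initialHint: parsed.initialHint };
    }
  } catch {
    // malformed JSON — fall through to the generic opener
  }
  return fallback;
}
```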
Production showed: Request timed out after 10000 ms / status 500.
grammY's webhookCallback defaults to 10s — fine for simple handlers but
too tight for twentyq's Workers AI call (Gemma 4 26B cold-starts can
easily exceed 10s). Raise to 25s, leaving 5s headroom under Cloudflare
Workers' 30s wall-clock cap.
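The fix is a one-option change at the adapter boundary; with grammY's documented webhookCallback signature it looks roughly like this fragment (bot construction elided):

```javascript
import { webhookCallback } from "grammy";

// bot is constructed elsewhere; only the options argument changes.
const handleUpdate = webhookCallback(bot, "cloudflare-mod", {
  timeoutMilliseconds: 25_000, // grammY defaults to 10_000; 5s headroom under the 30s cap
});
```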
Player feedback: hints were too clear — gave away the answer in one or two
turns because the model was leaning on "it is used for X" / category-word
phrasings.
Reworked the hint-style section of the system prompt to force the model
toward indirect, riddle-style, lateral facts. Added good/bad example pairs
(secret="organ") so the model has concrete contrast to pattern-match.
No schema change — tests unaffected (444 pass).
Gemma 4 likely rejects the flat "traditional" tools schema we were sending
(the docs use the OpenAI-wrapped shape for this model), which made env.AI.run
throw and users see the "AI service hiccup" reply on every turn.
Switch to the universal approach:
- system prompt asks the model for a one-line JSON {is_guess, answer, hint}
- ai-client.extractText handles both Workers-AI and OpenAI response shapes
- parseJudgementJson walks brace-depth to extract JSON from stray prose /
accidental code fences
- logs twentyq_ai_throw / twentyq_ai_unparseable with preview on failure
so future issues surface in wrangler tail immediately
Tests: 7 new (parser + extractText); 444 total pass.
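A brace-depth JSON extractor along these lines (a sketch, not the module's exact parseJudgementJson) tolerates prose and stray fences around the payload:

```javascript
// Find the first '{', walk brace depth (string- and escape-aware), and parse
// the balanced slice. Returns null when no parseable object is found.
function extractJsonObject(text) {
  const start = text.indexOf("{");
  if (start === -1) return null;
  let depth = 0;
  let inString = false;
  for (let i = start; i < text.length; i++) {
    const ch = text[i];
    if (inString) {
      if (ch === "\\") i++; // skip the escaped character
      else if (ch === '"') inString = false;
    } else if (ch === '"') inString = true;
    else if (ch === "{") depth++;
    else if (ch === "}") {
      depth--;
      if (depth === 0) {
        try {
          return JSON.parse(text.slice(start, i + 1));
        } catch {
          return null;
        }
      }
    }
  }
  return null; // unbalanced braces
}
```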
Uses phow2sim /neighbors. Filters out capitalized foreign place names
that leak in from the corpus (e.g. al-Qantara, Nam_Afrin) and requires
tokens to look Vietnamese (diacritic or underscore compound) to dodge
pure-ASCII junk like "adiyeh". Samples 3 from the tail after skipping
the top 20% so the hint doesn't give away the answer.
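A sketch of that filter-and-sample flow; the regexes are assumptions matching the description above, not the module's exact source:

```javascript
// Keep only tokens that plausibly belong to the Vietnamese corpus.
const looksVietnamese = (w) =>
  w.includes("_") || /[\u00C0-\u024F\u1E00-\u1EFF]/.test(w); // compound or diacritic

const filterNeighbors = (words) =>
  // Reject anything with an uppercase letter (foreign place names like
  // al-Qantara, Nam_Afrin) and pure-ASCII junk like "adiyeh".
  words.filter((w) => !/[A-Z]/.test(w) && looksVietnamese(w));

// Sample n hints from the tail, skipping the top 20% closest neighbors
// so the hints don't give the answer away.
function sampleHints(neighbors, n = 3) {
  const pool = neighbors.slice(Math.floor(neighbors.length * 0.2));
  const picked = [];
  while (picked.length < n && pool.length > 0) {
    picked.push(pool.splice(Math.floor(Math.random() * pool.length), 1)[0]);
  }
  return picked;
}
```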
Sigmoid was inherited from semantle where bge-m3's narrow cone (unrelated
pairs at 0.40-0.55) needed spreading. phow2sim cosines span 0.0-0.8
naturally, so a linear map is honest and free of magic constants. Kept
the emoji buckets — they already work well against raw percentages.
format.js was inherited from semantle (bge-m3 transformer) whose raw
cosines live in a narrow 0.4-0.55 band for unrelated words. phow2sim
runs on PhoW2V word2vec — related pairs sit at 0.3-0.5, synonyms at
0.55-0.80 — so the FLOOR=0.4 cutoff was dumping real signal (làng/đất
=0.38, làng/phố=0.38) to a displayed 0.
Retune: FLOOR=0.1, CENTER=0.4, SCALE=6. Now 0.38 → 39, 0.52 → 64, 0.80 → 93.
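The message gives the retuned constants and three sample points but not the curve itself; a logistic min-max-rescaled over [FLOOR, 1] reproduces all three quoted mappings exactly, so the sketch below is a plausible reading of format.js:

```javascript
// Normalized sigmoid: logistic in raw-cosine space, then min-max rescaled so
// FLOOR maps to 0 and 1.0 maps to 100. Constants are the retuned values.
const FLOOR = 0.1;
const CENTER = 0.4;
const SCALE = 6;

const logistic = (x) => 1 / (1 + Math.exp(-SCALE * (x - CENTER)));

function calibrate(raw) {
  if (raw <= FLOOR) return 0; // below the floor: displayed 0
  const lo = logistic(FLOOR);
  const hi = logistic(1);
  return Math.round(((logistic(raw) - lo) / (hi - lo)) * 100);
}
```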
Pre-phow2sim games (Workers AI era) left targets in KV that phow2sim
doesn't know. The API returned in_vocab_a:false, similarity:null, which
our handler misread as a guess-OOV and blamed the player's word. Now we
detect target-OOV explicitly, wipe the stale round, and prompt the user
to start fresh.
Default /random pulled from the full Vietnamese corpus (rank 40k+ words
like "sa_mạc_hoá" showed up), which made rounds unplayable for casual
speakers. Filter targets to min_rank=100, max_rank=1000 so words stay
recognizable.
`npm run register` imports buildRegistry to derive the public command list
but ran outside the Worker runtime, so `env.AI` was undefined — semantle
(and previously doantu) tripped `createClient` type-checks. Add a no-op
AI stub alongside stubKv and wire it through the buildRegistry env.
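A minimal sketch of the stub wiring (binding names here are illustrative, not the repo's actual ones):

```javascript
// No-op stubs: enough shape to satisfy createClient-style type checks while
// `npm run register` enumerates commands outside the Worker runtime.
const stubAi = { run: async () => ({}) };
const stubKv = {
  get: async () => null,
  put: async () => {},
  delete: async () => {},
};
const env = { AI: stubAi, GAME_KV: stubKv }; // passed to buildRegistry(env)
```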
Doantu now mirrors semantle's pre-Workers-AI shape: a thin fetch wrapper
around /random + /similarity on https://phow2sim.sg.miti99.com (overridable
via PHOW2SIM_API_URL). Drops the local Viet22K wordlist + build script —
the service owns vocabulary now. Promotes commands from protected to
public so they show up in Telegram's native / menu.
BGE embeddings occupy a narrow cone in vector space, so raw cosine of
two unrelated words already sits at ~0.40-0.55. Displaying `raw * 100`
made every random guess read as 40-70% warm, which defeated the warmth
UX.
format.js now applies a normalized sigmoid (FLOOR 0.40, CENTER 0.60,
SCALE 8) to remap raw cosine → displayed 0-100. Unrelated pairs drop
to ≤30, loose relation lands around 40-55, clear synonyms hit 85+, and
exact match stays at 100. Emoji buckets were rebased onto the calibrated
score; formatWarmth lost its sign column (calibrated output is always
non-negative).
render.js rounds once and feeds the integer to both formatWarmth and
warmthEmoji so the display value and bucket stay in sync.
Constants are empirical — retune if swapping to a non-BGE model.
Aligns semantle with doantu so both modules share one Workers AI model.
bge-m3 is multilingual and cheaper (1,075 vs 1,841 Neurons per M input tokens)
and produces 1024-dim vectors. Updates the api-client default, test
fake-vector dimensions, README, index.js doc comment, and the
wrangler.toml [ai] binding comment (Neurons/day budget recomputed).
Now that both modules run on Workers AI embeddings, drop the legacy
Word2SimError alias, the unused wordlist helpers (getLine, LINE_COUNT,
pickFromPool), and every comment/README section still describing the
removed ConceptNet backend. Fix the bge-small doc typo in semantle/index.js
and align the semantle api-client test fake-vector dim with the real
384-dim output.
Mirror the semantle migration but with @cf/baai/bge-m3 — BAAI's
multilingual embedding model — because the English-only BGE variants
can't produce meaningful Vietnamese vectors (their tokenizer shreds
diacritics into noisy byte-level subwords).
bge-m3 is trained across 194 languages incl. Vietnamese and is
actually cheaper in Neurons (1,075 vs 1,841 per M tokens for
bge-small-en-v1.5). Vocab check reuses the local Viet22K wordlist as
an in-memory Set — O(1) OOV detection, no upstream call.
Also add a test file for the module (mirrors semantle coverage plus
Vietnamese-specific cases: diacritics, multi-syllable compounds).
ConceptNet (api.conceptnet.io) was returning sustained 502s, breaking
every guess with an "Upstream hiccup" reply. Replace with env.AI.run
on @cf/baai/bge-small-en-v1.5 and score guesses by computing cosine
similarity locally against the target vector.
The local google-10k wordlist doubles as the in/out-of-vocabulary set,
so OOV detection is an O(1) Set.has() with no upstream call. The
similarity() response shape is unchanged, so handlers/render/state
stay as-is.
Free on the Workers Free plan: 10k Neurons/day cap, ~0.0037 Neurons
per 2-word guess → ~2.7M guesses/day headroom for this bot.
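Scoring locally needs nothing more than a dot product over the two embedding vectors; a minimal version:

```javascript
// Cosine similarity of two equal-length embedding vectors, computed locally —
// no upstream relatedness call needed once both vectors are in hand.
function cosineSimilarity(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}
```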
Near-clone of the semantle module, adapted for Vietnamese:
- Targets from duyet/vietnamese-wordlist Viet22K (~22k entries, GPL).
Regenerate via scripts/build-doantu-words.js; chained into npm run build.
- ConceptNet client uses /c/vi/<term> URIs; multi-word guesses (e.g.
"con chó") are space-to-underscore converted at URL build time so the
board keeps the natural display.
- lookup.js permits Unicode letters + combining marks + single internal
spaces; rejects digits/punctuation.
- All three commands (/doantu, /doantu_giveup, /doantu_stats) are
visibility=protected — shown in /help, hidden from Telegram's native /
autocomplete menu while the module is still experimental.
Wired into src/modules/index.js, wrangler.toml MODULES, .env.deploy(.example),
and package.json build chain.
Separate module rather than a shared base with semantle — matches the
repo's one-module-per-game convention (see loldle vs wordle); factor later
if a third language appears.
Use the full google-10000-english list verbatim (normalize only —
lowercase + dedupe, no length or alpha filtering). Pool goes from 7953
to 9894 entries; rare/short/long picks are still sieved by ConceptNet's
verify-and-fallback at round start.
Replaces TARGET_POOL/pickFromPool with a clearer line-based API:
LINE_COUNT — how many entries
randomLine() — uniform pick
getLine(n) — nth entry (n = frequency rank)
pickFromPool retained as a back-compat re-export so existing callers
don't break.
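The replacement surface is small enough to sketch whole (stand-in data below; the real module re-exports the generated, frequency-ranked list):

```javascript
// Stand-in for the generated words-data list; index doubles as frequency rank.
const WORDS = ["the", "time", "house"];

const LINE_COUNT = WORDS.length;                                        // how many entries
const randomLine = () => WORDS[Math.floor(Math.random() * LINE_COUNT)]; // uniform pick
const getLine = (n) => WORDS[n];                                        // nth entry (n = frequency rank)
const pickFromPool = randomLine;                                        // back-compat re-export
```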
The ~250-word hand-curated TARGET_POOL was too small for long-term play.
Replaces it with a build-script-generated dictionary:
- scripts/build-semantle-words.js fetches first20hours/google-10000-english
(no-swears variant), filters to 4–10 ASCII letters, drops the top-200
most frequent function words, and writes src/modules/semantle/words-data.js
as a static ES-module export.
- wordlist.js now just re-exports that data via TARGET_POOL + pickFromPool.
- package.json: new build:semantle-words script; chained into `npm run build`
alongside build:wordle-data so `npm run deploy` regenerates automatically.
Pool size: ~250 → 7953 words. Same ConceptNet verify-and-fallback flow, so
low-quality picks still cost at most one extra concept lookup.
ConceptNet provides a free public /relatedness endpoint (returns cosine-like
[-1, 1]) and /c/en/{term} for vocabulary check. No random-word endpoint, so
we ship a curated local target pool in wordlist.js (~250 words) and verify
each pick via the concept endpoint with a fallback to an unverified pick.
Each guess now makes two parallel ConceptNet calls (concept + relatedness)
instead of a single word2sim call. Slightly higher latency but zero hosting
cost and no dependency on the self-hosted word2sim instance.
- api-client.js rewritten; UpstreamError replaces Word2SimError (aliased
for backwards compat with older imports).
- wordlist.js added (curated target pool + pickFromPool).
- handlers.js: drops RANDOM_FILTERS (no filtering needed; pool is curated).
- index.js: drops WORD2SIM_API_URL env var; ConceptNet base hardcoded.
- wrangler.toml + .dev.vars.example: drop WORD2SIM_API_URL.
- api-client tests rewritten for ConceptNet shape; total tests 336 → 341.
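The two-parallel-call flow might look like this sketch (fetchImpl injection is added here for testability; the shape checks lean on ConceptNet's documented edges array and /relatedness value field, but the exact client code is an assumption):

```javascript
const BASE = "https://api.conceptnet.io";

// One guess = two parallel ConceptNet calls: vocabulary check + relatedness.
async function scoreGuess(guess, target, fetchImpl = fetch) {
  const [conceptRes, relRes] = await Promise.all([
    fetchImpl(`${BASE}/c/en/${guess}`),
    fetchImpl(`${BASE}/relatedness?node1=/c/en/${guess}&node2=/c/en/${target}`),
  ]);
  if (!conceptRes.ok || !relRes.ok) throw new Error("UpstreamError");
  const concept = await conceptRes.json();
  const rel = await relRes.json();
  const inVocab = Array.isArray(concept.edges) && concept.edges.length > 0;
  return { inVocab, similarity: rel.value }; // /relatedness value is cosine-like, [-1, 1]
}
```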
Giveup already auto-starts a fresh round on next /semantle, so /semantle_new
was redundant. Duplicate guesses now match loldle's behavior: reply with
"🔁 already guessed" and skip the similarity API call (fast-path dedup
against prior word or canonical, with a post-API fallback for different
inputs that canonicalize to the same token).
Telegram commands /semantle, /semantle_new, /semantle_giveup, /semantle_stats.
Round starts with /random pick from hosted word2sim; each guess scored via
/similarity. Unlimited guesses; solve on case-insensitive exact match.
New env var WORD2SIM_API_URL (wrangler.toml, .env.deploy). Includes
module README and 90 unit tests covering api-client, state, format,
render, and handlers.
Previously startFreshGame was called at the tail of every win/lose/giveup
path, stamping startedAt to that moment — so the clock accrued while the
player was away between rounds. Now:
- round-ending paths call clearGame (new helper in state.js), deleting the
KV record instead of pre-creating the next round
- getOrInitGame lazily creates the next round on the player's next /loldle
call, with startedAt: null
- the first actual guess inside handleLoldle stamps startedAt = Date.now()
Viewing an empty board gives no hints, so it shouldn't count against the
clock. handleGiveup no longer auto-creates a fresh round and now reports
"No active round" when called with nothing in progress.
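A sketch of the described flow (pickRandom stands in for the real champion pick, and the KV record shape is an assumption):

```javascript
const pickRandom = () => "aatrox"; // stand-in for the real random target pick

// Lazily create the next round on the player's next call; the clock is not
// running until the first real guess.
async function getOrInitGame(kv, key) {
  const raw = await kv.get(key);
  if (raw) return JSON.parse(raw);
  const fresh = { target: pickRandom(), guesses: [], startedAt: null };
  await kv.put(key, JSON.stringify(fresh));
  return fresh;
}

// First actual guess stamps the clock.
function stampFirstGuess(game) {
  if (game.startedAt === null) game.startedAt = Date.now();
}

// Round-ending paths delete the record instead of pre-creating the next round.
const clearGame = (kv, key) => kv.delete(key);
```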
Broaden `npm run format` / `npm run lint` to biome's full scan (`.`)
instead of a fixed src/tests/scripts list, so root-level files and any
new top-level directories stay formatted. Drop the stale ignore entry
for the deleted champions-data.js.
loldle.net's classic-mode bundle has two record shapes — older champions
carry _id/championId, newer ones (Bel'Veth, K'Sante, Nilah, …) don't.
The regex required those leading fields, silently dropping anyone added
since 2022.
Make _id/championId optional and non-capturing, and drop them from the
output record (the bot never read them anyway). Champion count:
169 → 172; /loldle k'sante, /loldle bel'veth, and /loldle nilah now
resolve correctly.
Column headers now match loldle.net's classic-mode grid verbatim:
Range → Range type, Region → Region(s), Lane → Position(s),
Year → Release year. The champion row header becomes Champion (was
Name). Data field names already matched; only labels diverged.
KV payload cleanup:
- drop lastResultAt from stats (never read)
- drop solved/giveup flags from game state (round is immediately
replaced after finish, making the flags transient noise)
- skip redundant saveGame on winning/giveup/out-of-guesses paths;
startFreshGame overwrites anyway
Code cleanup:
- delete daily.js + daily.test.js (pickDaily/todayUtc were speculative
"future use" — only pickRandom was wired in, inlined into handlers)
- drop the dead switch default in compare.js
- trim file preambles across the module
Docs: rewrite README around current behavior with loldle.net as the
sole data source; update scraper header to match the raw schema.
Round state now keeps `guesses` as a plain string[] (the names the player
tried) instead of caching full comparison results. The board view
rehydrates rows at display time by re-running compareChampions against
the current target.
Smaller KV payloads, and the rendered board always reflects the live
champions.json — useful if a weekly data refresh lands mid-round.
Drop the in-scraper normalization step — champions.json now mirrors the
exact shape emitted by loldle.net's JS bundle. Records use _id,
championId, championName, arrays for positions/species/regions/
range_type, "Male"/"Female"/"Other" gender strings, and a full
YYYY-MM-DD release_date.
Comparison is schema-aware: multi-value keys accept arrays directly,
the year axis parses YYYY out of the ISO date, and exact compares stay
case-insensitive.
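A sketch of one schema-aware axis compare over the raw record shape (the return labels are assumptions, not the module's actual enum):

```javascript
// Compare one axis of a guess against the target champion record.
function compareAxis(key, guessVal, targetVal) {
  if (Array.isArray(targetVal)) {
    // Multi-value axes (positions, species, regions, range_type) accept arrays directly.
    const g = new Set(guessVal.map((v) => v.toLowerCase()));
    const t = new Set(targetVal.map((v) => v.toLowerCase()));
    const overlap = [...g].filter((v) => t.has(v)).length;
    if (overlap === g.size && g.size === t.size) return "exact";
    return overlap > 0 ? "partial" : "none";
  }
  if (key === "release_date") {
    // Year axis parses YYYY out of the full ISO date.
    const gy = Number(guessVal.slice(0, 4));
    const ty = Number(targetVal.slice(0, 4));
    if (gy === ty) return "exact";
    return ty > gy ? "higher" : "lower"; // target's year relative to the guess
  }
  // Everything else: case-insensitive exact compare.
  return String(guessVal).toLowerCase() === String(targetVal).toLowerCase() ? "exact" : "none";
}
```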
Node 24 + wrangler 4.x both accept `import ... with { type: "json" }`,
so the generated champions-data.js wrapper is no longer needed.
Drop scripts/build-loldle-data.js and the build:loldle-data npm script.
Scraper writes champions.json only.
loldle.net's JS bundle ships the complete set of classic-mode axes in
plaintext, so ddragon merging is no longer needed. Scraper now produces
the final schema directly.
Schema changes: drop title, skinCount, image, and genre (ddragon-only).
Replace genre (class tags like Fighter/Mage) with species (Human/Darkin/
Vastayan) — the axis loldle.net actually uses. Promote region to a
multi-value field so multi-region champions compare correctly.
Handlers no longer show "Name — Title" on win/giveup.
First regeneration via scripts/scrape-loldle-data.js. Corrects
multi-region champions (Aatrox runeterra,shurima; Ambessa
piltover,noxus; Ashe noxus,runeterra), fixes lane/region accuracy,
and canonicalizes region slugs (mount-targon → targon).
Prevents two players in a group from wasting a slot on the same champion —
re-guessing a previously tried champion replies with a hint and does not
consume a guess.
- Send a random sticker from WIN/LOSE/GIVEUP pools before the text reply
(errors swallowed so a rotten file_id never blocks the message).
- Win message includes attempt-count flavor ("First try!" / "Sharp!" /
"Close call!" / "Phew — last one!") and elapsed solve time.
- Lose and giveup messages now also include the champion's title.
- Extract stickers.js and flavor.js so handlers.js stays at ~200 LoC.
Reply to any sticker with /stickerid to get its file_id and
file_unique_id. Collects IDs for hard-coded sticker pools in other
modules (upcoming loldle win/lose/giveup reactions).
Private visibility keeps the command out of /help and the Telegram
slash menu.
Literal <champion> inside an HTML parse_mode reply made Telegram reject
the message as an unknown entity, so the victory / out-of-guesses /
giveup text never reached the user — only the silently-started fresh
round was visible on the next /loldle.
- Remove /loldle_new; finished rounds (solve/giveup/out-of-guesses)
immediately roll into a fresh round.
- Render guesses as an HTML <pre> monospace table with auto-widthed
label column and a 🎯 Name row (uppercase champion name).
- Year direction uses ⬆️ / ⬇️.
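A minimal sketch of the auto-widthed `<pre>` board (real code must also HTML-escape &, <, > in values before embedding them in a parse_mode: "HTML" message):

```javascript
// rows: [label, value] pairs; the label column is padded to the widest label
// so the monospace table lines up inside <pre>.
function renderBoard(rows) {
  const width = Math.max(...rows.map(([label]) => label.length));
  const body = rows
    .map(([label, value]) => `${label.padEnd(width)}  ${value}`)
    .join("\n");
  return `<pre>${body}</pre>`;
}
```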
The TCBS `apipubaws.tcbs.com.vn` host returns HTTP 404/500 for every
request, so every ticker resolved as "Unknown stock ticker" and /trade_buy
was unusable. Switch price + symbol resolution to the KBS public endpoint
that vnstock currently defaults to (`kbbuddywts.kbsec.com.vn/iis-server/
investment/stocks/{TICKER}/data_day`). KBS needs no auth, returns JSON,
and is Worker-compatible.
- `prices.fetchStockPrice` now queries KBS with a 14-day lookback window
(covers weekends/holidays) and drops the TCBS-specific ×1000 scaling;
KBS returns real VND.
- `symbols.resolveSymbol` delegates to `fetchStockPrice` for existence
checks — empty `data_day` means unknown ticker.
- Update test fetch stubs to match the `kbsec` host and KBS response
shape (`{ symbol, data_day: [{ c }] }`).
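The client shape can be sketched like this (fetchImpl injection is a testing convenience; the date-range query parameters for the 14-day lookback are not named in this message, so they are omitted rather than guessed, and the bar ordering in data_day is an assumption):

```javascript
const KBS_BASE = "https://kbbuddywts.kbsec.com.vn/iis-server/investment/stocks";

// Fetch the latest close for a ticker from the KBS data_day endpoint.
// KBS returns real VND, so there is no TCBS-style ×1000 scaling.
async function fetchStockPrice(ticker, fetchImpl = fetch) {
  const res = await fetchImpl(`${KBS_BASE}/${ticker.toUpperCase()}/data_day`);
  if (!res.ok) return null;
  const body = await res.json();
  if (!Array.isArray(body.data_day) || body.data_day.length === 0) return null; // unknown ticker
  return body.data_day[0].c; // most recent close assumed first
}

// Symbol existence delegates to the price fetch: empty data_day = unknown.
async function resolveSymbol(ticker, fetchImpl = fetch) {
  return (await fetchStockPrice(ticker, fetchImpl)) !== null;
}
```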