Roll classic loldle back to 8 guesses (from 6) and emoji back to 5 (from 4). The
/<module>_setmax override command stays — chats that want tighter limits
can opt in instead of having defaults forced on them.
Drop classic loldle from 8 → 6 (7-axis grid leaks too much per guess for 8
to feel earned) and emoji from 5 → 4 (3 emojis are usually unmistakable).
Add a hidden /<module>_setmax <n> command per loldle module so a chat can
override its own round length (1-10). Override stored at config:<subject>
in each module's KV; getMaxGuesses() falls back to the default when unset.
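A minimal sketch of the fallback, assuming the KV key shape named above (helper signature illustrative):

```js
// Hypothetical sketch — override lives at config:<subject> per this commit.
async function getMaxGuesses(kv, subject, defaultMax) {
  const cfg = await kv.getJSON(`config:${subject}`);
  const n = cfg?.maxGuesses;
  // Honor a stored 1-10 override; otherwise fall back to the module default.
  return Number.isInteger(n) && n >= 1 && n <= 10 ? n : defaultMax;
}
```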
Removed reports that document already-shipped work and aren't tied to
any archived plan dir:
- docs-manager-260420-2151-documentation-audit.md (one-off doc audit)
- researcher-260421-0845-leaguepedia-api-verification.md (lolschedule research)
- researcher-260421-0909-leaguepedia-auth-token.md (lolschedule research)
The lolschedule module is in src/modules/lolschedule/ but never had a
discrete plan dir. Findings from the leaguepedia reports are reflected
in the live module code; the markdown is no longer load-bearing.
plans/reports/ now contains only the 6 Atlas migration reports for the
active plan.
Six research reports were sitting in plans/reports/ but tied to
already-archived plans. Move each under the matching archived plan
dir so the archive is self-contained and plans/reports/ only holds
reports for in-flight or unarchived work.
Moves
- researcher-260422-2329-semantle-api-alternatives.md
- researcher-260423-0025-bge-m3-cosine-calibration.md
- researcher-260423-1110-vietnamese-embeddings-semantle.md
→ plans/archive/260422-2128-semantle-module/reports/
- researcher-260424-2215-loldle-ability-splash-modes.md
- researcher-260424-2215-loldle-emoji-and-modes-overview.md
- researcher-260424-2215-loldle-quote-mode.md
→ plans/archive/260424-2215-loldle-new-modes/reports/ (new dir)
Stays in plans/reports/
- 6 Atlas migration reports (active plan)
- docs-manager-260420-2151-documentation-audit.md (general audit, no
discrete plan home)
- researcher-260421-0845-leaguepedia-api-verification.md
- researcher-260421-0909-leaguepedia-auth-token.md
(lolschedule research; no archived lolschedule plan exists)
No code changes.
Three feature plans (semantle, twentyq, loldle-new-modes) are
status:completed in their frontmatter and the corresponding modules
exist in src/modules/. Move them to plans/archive/ to keep the active
plans/ dir focused on in-flight work.
Atlas migration (260425-1945-mongodb-atlas-migration/plan.md): bump
status from `planning` to `code-complete` and annotate each phase row
with its commit SHA + whether operator action is still pending. Plan
stays in active plans/ until cutover lands or the Upstash standby
(phase-07-alt-pivot.md) executes.
No code changes. Tests, lint, register:dry unaffected (733 passing).
Operator-facing summary in the plan.md status note: 8 phases of
implementation are committed on dev (6f0b5ff..e2e3112). Outstanding
operator work: Atlas provisioning, real-cluster smoke tests,
backfill runs, soak, cutover stages, Stage 3 code cleanup.
Pre-execution prerequisites for the Phase 07 cutover. Stage 2 of the
cutover keeps DUAL_WRITE=0 for ~6 days; if anything regresses during
that window the operator MUST be able to roll back to KV/D1 with the
last N days of Mongo-only writes recovered. Pre-building these scripts
(per code-reviewer #4) eliminates "draft a backfill under outage
pressure" — the anti-pattern of writing untested code at 4am.
Reverse-backfill
- scripts/backfill-mongo-to-kv.js: full-scan Mongo collection per module,
PUT each doc back to CF KV via REST. expiresAt → expirationTtl (clamped
to CF KV's 60s minimum; see the TTL sketch below); already-expired docs
are skipped (won't resurrect dead state). 50 ops/sec throttle. --dry-run +
--module flags.
- scripts/backfill-mongo-to-d1.js: full-scan trading_trades, build INSERT
SQL preserving legacy_id where present (round-tripping the D1 autoincrement
IDs the phase-05 forward backfill preserved). Sequential int generation for
any docs without legacy_id. Pipes through wrangler d1 execute.
- scripts/lib/migration-helpers.js: cfKvPut helper added.
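A minimal sketch of the TTL mapping, assuming millisecond expiresAt values (helper name illustrative):

```js
// expiresAt (ms epoch) → CF KV expiration_ttl (seconds); null = skip the doc.
function toExpirationTtl(expiresAt, now = Date.now()) {
  if (expiresAt == null) return undefined;           // no TTL: write without expiry
  const secondsLeft = Math.ceil((expiresAt - now) / 1000);
  if (secondsLeft <= 0) return null;                 // already expired: don't resurrect
  return Math.max(secondsLeft, 60);                  // CF KV rejects TTLs under 60s
}
```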
Delete guard (debugger #12)
- scripts/wrangler-delete-guard.sh: interactive CONFIRM wrapper around
wrangler kv namespace delete + wrangler d1 delete. Exits 3 when stdin
is not a tty so it cannot run in CI. Documented: never run in CI.
package.json: backfill:mongo:kv[:dry] + backfill:mongo:d1[:dry] scripts
wired.
Tests: 697 → 733 (+36).
- 7 cfKvPut tests (REST URL, querystring, body, expiration_ttl param).
- 10 reverse-KV TTL math tests (expired sentinel, future seconds, no-TTL,
CF 60s minimum clamp).
- 9 reverse-D1 SQL construction tests (escaping, legacy_id preservation,
sequential generation).
Lint clean. No Worker code touched. Stage 1 cutover, 7-day soak,
snapshots, and Stage 3 cleanup (delete CFKVStore + simplify factories +
edit package.json deploy chain) remain operator-driven and will be
committed separately after binding deletion.
Code prerequisites for the Phase 06 cold-start soak gate. The 24-72h soak
itself is operator-run; this commit ships the instrumentation + analysis
tools needed to make the PROCEED-or-PIVOT decision.
Telemetry
- src/util/timing.js: startTiming(cmd) returns {mark, end} that emits a
structured cmd_timing log. takeColdFlag() returns {cold, isolateAgeMs}
using a module-scoped boolean — the first request in an isolate is cold,
subsequent ones are warm (sketch below). This replaces the originally-planned
isolate_age_ms < 200ms classifier (broken because Mongo cold-connect
itself is ~1500ms; cold requests would always bucket as warm —
code-reviewer #11).
- src/util/request-context.js: setLastCold/getLastCold shared state
bridges fetch-level cold detection into the dispatcher middleware
without a circular import.
- src/index.js: takeColdFlag at the top of fetch() emits a request log
and primes the request context for the dispatcher.
- src/modules/dispatcher.js: bot.use() middleware times every command.
Chosen over per-handler wrapping to preserve the existing identity
assertion in tests (handler === reg.allCommands.get(name).cmd.handler)
— single instrumentation point, no contract change.
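A hedged sketch of the cold-flag and timing shapes described above (the real src/util/timing.js may differ):

```js
// Module scope survives for the isolate's lifetime, so the flag flips once.
let seen = false;
const bornAt = Date.now();

export function takeColdFlag() {
  const cold = !seen;                  // first request in this isolate is cold
  seen = true;
  return { cold, isolateAgeMs: Date.now() - bornAt };
}

export function startTiming(cmd) {
  const t0 = Date.now();
  const emit = (extra) =>
    console.log(JSON.stringify({ evt: "cmd_timing", cmd, ms: Date.now() - t0, ...extra }));
  return { mark: (label) => emit({ label }), end: (extra = {}) => emit(extra) };
}
```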
Soak tools (operator-run)
- scripts/analyze-soak.js: parses CF Logs export (NDJSON or CSV), filters
cmd_timing events, computes p50/p95/p99 per (cmd, cold/warm) — percentile
sketch below. Counts dual-write secondary failures, mongo connection
errors, CPU-time exceeded events. Writes markdown report.
- scripts/synthetic-burst.js: fires N parallel synthetic Telegram updates
at the deployed Worker URL with cache-busting tokens. Used for the
pre-deploy connection-cap stress test (debugger #2 — 20 parallel cold
requests, abort if Atlas peak > 60% of 500-conn cap).
- package.json: analyze:soak + burst:synthetic scripts wired.
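The percentile math is presumably a standard nearest-rank computation; a sketch (the analyzer's exact method may differ):

```js
// Nearest-rank percentile over an ascending-sorted array of durations (ms).
function percentile(sortedMs, p) {
  if (sortedMs.length === 0) return null;
  const idx = Math.ceil((p / 100) * sortedMs.length) - 1;
  return sortedMs[Math.max(0, idx)];
}
// percentile(durations, 95) → p95 for one (cmd, cold/warm) bucket
```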
Tests
- tests/util/timing.test.js: 8 tests — timing semantics, cold flag flip.
- tests/scripts/analyze-soak.test.js: 22 tests — percentile math, NDJSON
+ CSV parse, aggregation, markdown formatting.
Tests: 667 → 697 (+30). Lint clean.
Operator runbook for Phase 06 (NOT executed by this commit):
1. Verify telemetry live via wrangler tail.
2. Run synthetic burst test: npm run burst:synthetic -- --url <prod>
3. Configure Atlas + CF Observability email alerts.
4. 24h soak (extend to 72h on stop-conditions per phase plan).
5. Daily npm run verify:mongo.
6. npm run analyze:soak -- --input <cf-logs.json> → soak-decision.md.
7. PROCEED to Phase 07 if cold-start P95 ≤ 2.5 × BASELINE_COLD_PING_MS;
else execute phase-07-alt-pivot.md (Upstash standby).
Implements the KVStore interface against MongoDB Atlas with full behavioral
parity vs CFKVStore (null-on-missing, swallow-corrupt-JSON, idempotent delete,
throw-on-undefined-putJSON). Not wired into the request path yet — Phase 04
adds dual-write wrappers and factory routing.
- src/db/mongo-client.js: memoized MongoClient + getDb(env). On connect()
reject, nulls both client and connectPromise so next call retries cleanly
(regression-tested). Catches MongoServerSelectionError and emits a
structured warning before rethrow so callers can map to 503.
- src/db/mongo-kv-store.js: KVStore impl. get/getJSON filter on expiresAt
at read time to close the up-to-60s TTL-sweeper stale-read window vs
CFKVStore. list() returns keys WITH prefix preserved (parity — wrapper
in create-store.js:65 strips). Cursor pagination via sorted _id +
limit(N+1), NOT skip(). Lazy ensureIndex per (collection, isolate)
tracked in a module-scope Set. See the sketch after this list.
- src/db/mongo-list-cursor.js: extracted cursor encode/decode to keep
mongo-kv-store.js under 200 LOC.
- tests/fakes/fake-mongo.js: Map-backed fake covering the surface needed
by both Phase 02 (KVStore) and Phase 03 (MongoTradesStore).
- tests/db/mongo-kv-store.test.js: 26 tests, including TTL stale-read
regression (1s TTL + time advance), 2-level prefix list regression,
cursor pagination, connect-reject retry, MongoServerSelectionError
structured log.
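A sketch of the two behaviors above, with assumed field and option names (see src/db/mongo-kv-store.js for the real implementation):

```js
// Read-time TTL filter: a doc past expiresAt reads as missing even if the
// sweeper hasn't deleted it yet (closes the up-to-60s stale-read window).
async function get(coll, key) {
  const doc = await coll.findOne({ _id: key });
  if (!doc || (doc.expiresAt != null && doc.expiresAt <= Date.now())) return null;
  return doc.value;
}

// Cursor pagination: sort by _id, fetch limit+1, and hand back the extra
// doc's _id as the next cursor — O(1) resume, unlike skip().
async function list(coll, { prefix, cursor, limit }) {
  const query = { _id: { $gte: cursor ?? prefix, $lt: `${prefix}\uffff` } };
  const docs = await coll.find(query).sort({ _id: 1 }).limit(limit + 1).toArray();
  const keys = docs.slice(0, limit).map((d) => d._id); // prefix preserved
  return { keys, cursor: docs.length > limit ? docs[limit]._id : null };
}
```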
Tests: 503 → 529 (+26). Lint clean.
Closes deferred phases 04 + 05 of the loldle-new-modes plan.
- loldle-ability: 5 guesses, DDragon ability icon as photo. State pins
slot (P/Q/W/E/R) so the same icon shows every turn. Abilities pulled
from DDragon per-champion — same source loldle.net uses at runtime.
- loldle-splash: 4 guesses, random skin splash as photo. Skin pool
scraped from loldle.net's bundle (var Ad=[…] — 1939 non-chroma skins
across 172 champs, matching their splash mode exactly). URLs from Riot's
DDragon CDN (no version segment, stable across patches).
- fetch-ddragon-data.js: extended to write all four JSONs in one run.
Shares a single DDragon per-champion fetch cycle (concurrency 10).
- Credits loldle.net + Riot Games in all loldle-family READMEs.
19 new tests (503 total). Lint clean. register:dry reports 12 loldle_*
commands with no conflicts.
Ship two new loldle-family modules mirroring loldle.net's non-classic
modes. Text-only MVP (ability/splash phases stay deferred).
- loldle-emoji: 5 guesses, emoji-sequence clue. Pool derived algorithmically
from classic's champions.json metadata (species/region/resource mapping
table — see the sketch after this list) since loldle.net's bundle has no
static emoji pool.
- loldle-quote: 6 guesses, lore-blurb clue. Pool seeded from Data Dragon
champion title + first lore sentence; champion name redacted to ___.
- scripts/fetch-ddragon-data.js: single generator for both JSONs.
- src/util/normalize-name.js: shared lookup helper; loldle/lookup.js
refactored to import it.
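Purely illustrative sketch of the derivation idea (the real mapping table and emoji choices live in the module and certainly differ):

```js
// Hypothetical mapping fragments — NOT the module's actual table.
const SPECIES = { Yordle: "🐹", Dragon: "🐉", Darkin: "🗡️" };
const REGION = { Ionia: "🌸", Noxus: "⚔️", Piltover: "⚙️" };
const RESOURCE = { Mana: "🔵", Energy: "🟡", Fury: "🔴" };

function emojiClue(champ) {
  // champions.json fields per the classic module: species/regions arrays, resource string.
  return [SPECIES[champ.species?.[0]], REGION[champ.regions?.[0]], RESOURCE[champ.resource]]
    .filter(Boolean);
}
```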
35 new tests (484 total passing). Lint clean.
Self-review of the prior cleanup commit caught one omission — src/types.js
(central JSDoc typedefs file: Env, Module, Command, Cron, …) was listed in
the top-level README but absent from docs/architecture.md's src/ tree.
Previously seeds carried hand-curated {category, target, initialHint}.
Now SEEDS is a flat string[] of keywords — at round-start, the model
generates {category, initialHint} on the fly. Benefits:
- adding a seed is trivial (just append a word)
- every round gets a fresh cryptic opener (varies across plays of the
same word)
- HINT STYLE rules apply to the opening hint too, so the initial clue
isn't a definitional giveaway
Implementation:
- prompts.buildStartRoundPrompt(target) — with good/bad examples
- ai-client.generateRoundStart(env, target) — same JSON-in-content
approach as judge(), with defensive fallbacks + redactSecret
- handlers.startFreshGame now async; surfaces roundstart errors via the
existing UPSTREAM_FAIL path
Tests: 449 pass (5 new for generateRoundStart, 1 for roundstart error path).
Production showed: Request timed out after 10000 ms / status 500.
grammY's webhookCallback defaults to 10s — fine for simple handlers but
too tight for twentyq's Workers AI call (Gemma 4 26B cold-starts can
easily exceed 10s). Raise to 25s, leaving 5s headroom under Cloudflare
Workers' 30s wall-clock cap.
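The change is presumably a one-liner on the webhookCallback options (adapter name and bot construction assumed):

```js
import { Bot, webhookCallback } from "grammy";

const bot = new Bot("<token>"); // the real bot is built from env in the Worker (assumed)

// grammY defaults timeoutMilliseconds to 10_000; 25_000 leaves ~5s headroom
// under Cloudflare Workers' 30s wall-clock cap.
const handleUpdate = webhookCallback(bot, "cloudflare-mod", {
  timeoutMilliseconds: 25_000,
});
```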
Player feedback: hints were too clear — gave away the answer in one or two
turns because the model was leaning on "it is used for X" / category-word
phrasings.
Reworked the hint-style section of the system prompt to force the model
toward indirect, riddle-style, lateral facts. Added good/bad example pairs
(secret="organ") so the model has concrete contrast to pattern-match.
No schema change — tests unaffected (444 pass).
Gemma 4 likely rejects the flat "traditional" tools schema we were sending
(the docs use OpenAI-wrapped shape for this model) — causing env.AI.run to
throw and users to see the "AI service hiccup" reply every turn.
Switch to the universal approach:
- system prompt asks the model for a one-line JSON {is_guess, answer, hint}
- ai-client.extractText handles both Workers-AI and OpenAI response shapes
- parseJudgementJson walks brace-depth to extract JSON from stray prose /
accidental code fences (sketch below)
- logs twentyq_ai_throw / twentyq_ai_unparseable with preview on failure
so future issues surface in wrangler tail immediately
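A minimal brace-depth walker in the spirit of parseJudgementJson (the real parser likely also handles braces inside strings):

```js
// Extract the first balanced {...} span from model output and parse it.
function extractFirstJson(text) {
  const start = text.indexOf("{");
  if (start === -1) return null;
  let depth = 0;
  for (let i = start; i < text.length; i++) {
    if (text[i] === "{") depth++;
    else if (text[i] === "}" && --depth === 0) {
      try {
        return JSON.parse(text.slice(start, i + 1));
      } catch {
        return null; // balanced but not valid JSON
      }
    }
  }
  return null; // never closed — truncated output
}
```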
Tests: 7 new (parser + extractText); 444 total pass.
Uses phow2sim /neighbors. Filters out capitalized foreign place names
that leak in from the corpus (e.g. al-Qantara, Nam_Afrin) and requires
that tokens look Vietnamese (diacritic or underscore compound) to dodge
pure-ASCII junk like "adiyeh". Samples 3 from the tail after skipping
the top 20% so the hint doesn't give away the answer.
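A hedged sketch of the filter, assuming simple regex heuristics (the module's exact patterns may differ):

```js
// Underscore compounds (e.g. con_chó) or diacritic chars in the Latin
// Extended ranges count as "looks Vietnamese".
const looksVietnamese = (t) => t.includes("_") || /[\u00C0-\u1EF9]/.test(t);

// Corpus leaks like al-Qantara / Nam_Afrin carry a capital letter.
const isCorpusLeak = (t) => /\p{Lu}/u.test(t);

const hintPool = (neighbors) =>
  neighbors.filter((t) => looksVietnamese(t) && !isCorpusLeak(t));
```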
Sigmoid was inherited from semantle where bge-m3's narrow cone (unrelated
pairs at 0.40-0.55) needed spreading. phow2sim cosines span 0.0-0.8
naturally, so a linear map is honest and free of magic constants. Kept
the emoji buckets — they already work well against raw percentages.
format.js was inherited from semantle (bge-m3 transformer) whose raw
cosines live in a narrow 0.4-0.55 band for unrelated words. phow2sim
runs on PhoW2V word2vec — related pairs sit at 0.3-0.5, synonyms at
0.55-0.80 — so the FLOOR=0.4 cutoff was dumping real signal (làng/đất
=0.38, làng/phố=0.38) to a displayed 0.
Retune: FLOOR=0.1, CENTER=0.4, SCALE=6. Now 0.38 → 39, 0.52 → 64, 0.80 → 93.
Pre-phow2sim games (Workers AI era) left targets in KV that phow2sim
doesn't know. The API returned in_vocab_a:false, similarity:null, which
our handler misread as a guess-OOV and blamed the player's word. Now we
detect target-OOV explicitly, wipe the stale round, and prompt the user
to start fresh.
Default /random pulled from the full Vietnamese corpus (rank 40k+ words
like "sa_mạc_hoá" showed up), which made rounds unplayable for casual
speakers. Filter targets to min_rank=100, max_rank=1000 so words stay
recognizable.
`npm run register` imports buildRegistry to derive the public command list
but ran outside the Worker runtime, so `env.AI` was undefined — semantle
(and previously doantu) tripped `createClient` type-checks. Add a no-op
AI stub alongside stubKv and wire it through the buildRegistry env.
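The stub is presumably as small as stubKv; a sketch (names per this commit, wiring assumed):

```js
// No-op AI binding so buildRegistry's createClient type-checks pass outside
// the Worker runtime.
const stubAi = { run: async () => ({}) };
const registry = buildRegistry({ ...process.env, KV: stubKv, AI: stubAi });
```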
Doantu now mirrors semantle's pre-Workers-AI shape: a thin fetch wrapper
around /random + /similarity on https://phow2sim.sg.miti99.com (overridable
via PHOW2SIM_API_URL). Drops the local Viet22K wordlist + build script —
the service owns vocabulary now. Promotes commands from protected to
public so they show up in Telegram's native / menu.
BGE embeddings occupy a narrow cone in vector space, so raw cosine of
two unrelated words already sits at ~0.40-0.55. Displaying `raw * 100`
made every random guess read as 40-70% warm, which defeated the warmth
UX.
format.js now applies a normalized sigmoid (FLOOR 0.40, CENTER 0.60,
SCALE 8) to remap raw cosine → displayed 0-100. Unrelated pairs drop
to ≤30, loose relation lands around 40-55, clear synonyms hit 85+, and
exact match stays at 100. Emoji buckets were rebased onto the calibrated
score; formatWarmth lost its sign column (calibrated output is always
non-negative).
render.js rounds once and feeds the integer to both formatWarmth and
warmthEmoji so the display value and bucket stay in sync.
Constants are empirical — retune if swapping to a non-BGE model.
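A minimal sketch of the remap, assuming a floor-normalized sigmoid and an exact-match short-circuit (format.js may differ in detail; example outputs are from this sketch, not the module):

```js
const FLOOR = 0.40, CENTER = 0.60, SCALE = 8; // empirical; retune for non-BGE models
const sigmoid = (x) => 1 / (1 + Math.exp(-x));

function calibrate(raw) {
  if (raw >= 1) return 100;                      // exact match pinned to 100
  const lo = sigmoid((FLOOR - CENTER) * SCALE);  // sigmoid value at the floor
  const s = Math.max(sigmoid((raw - CENTER) * SCALE), lo);
  return Math.round(((s - lo) / (1 - lo)) * 100); // floor → 0, top of curve → 100
}
// calibrate(0.45) ≈ 8, calibrate(0.65) ≈ 52, calibrate(0.85) ≈ 86
```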
Aligns semantle with doantu so both modules share one Workers AI model.
bge-m3 is multilingual and cheaper (1,075 vs 1,841 Neurons per M input tokens)
and produces 1024-dim vectors. Updates the api-client default, test
fake-vector dimensions, README, index.js doc comment, and the
wrangler.toml [ai] binding comment (Neurons/day budget recomputed).
Now that both modules run on Workers AI embeddings, drop the legacy
Word2SimError alias, the unused wordlist helpers (getLine, LINE_COUNT,
pickFromPool), and every comment/README section still describing the
removed ConceptNet backend. Fix the bge-small doc typo in semantle/index.js
and align the semantle api-client test fake-vector dim with the real
384-dim output.
Mirror the semantle migration but with @cf/baai/bge-m3 — BAAI's
multilingual embedding model — because the English-only BGE variants
can't produce meaningful Vietnamese vectors (their tokenizer shreds
diacritics into noisy byte-level subwords).
bge-m3 is trained across 194 languages incl. Vietnamese and is
actually cheaper in Neurons (1,075 vs 1,841 per M tokens for
bge-small-en-v1.5). Vocab check reuses the local Viet22K wordlist as
an in-memory Set — O(1) OOV detection, no upstream call.
Also add a test file for the module (mirrors semantle coverage plus
Vietnamese-specific cases: diacritics, multi-syllable compounds).
ConceptNet (api.conceptnet.io) was returning sustained 502s, breaking
every guess with an "Upstream hiccup" reply. Replace with env.AI.run
on @cf/baai/bge-small-en-v1.5 and score guesses by computing cosine
similarity locally against the target vector.
The local google-10k wordlist doubles as the in/out-of-vocabulary set,
so OOV detection is an O(1) Set.has() with no upstream call. The
similarity() response shape is unchanged, so handlers/render/state
stay as-is.
Free on the Workers Free plan: 10k Neurons/day cap, ~0.0037 Neurons
per 2-word guess → ~2.7M guesses/day headroom for this bot.
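The local scoring is standard cosine similarity over the two embedding vectors; a sketch (the module's helper name is assumed):

```js
// cos(a, b) = a·b / (|a||b|), computed in one pass over the vectors.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```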
Near-clone of the semantle module, adapted for Vietnamese:
- Targets from duyet/vietnamese-wordlist Viet22K (~22k entries, GPL).
Regenerate via scripts/build-doantu-words.js; chained into npm run build.
- ConceptNet client uses /c/vi/<term> URIs; multi-word guesses (e.g.
"con chó") are space-to-underscore converted at URL build time so the
board keeps the natural display.
- lookup.js permits Unicode letters + combining marks + single internal
spaces; rejects digits/punctuation.
- All three commands (/doantu, /doantu_giveup, /doantu_stats) are
visibility=protected — shown in /help, hidden from Telegram's native /
autocomplete menu while the module is still experimental.
Wired into src/modules/index.js, wrangler.toml MODULES, .env.deploy(.example),
and package.json build chain.
Separate module rather than a shared base with semantle — matches the
repo's one-module-per-game convention (see loldle vs wordle); factor later
if a third language appears.
Use the full google-10000-english list verbatim (normalize only —
lowercase + dedupe, no length or alpha filtering). Pool goes from 7953
to 9894 entries; rare/short/long picks are still sieved by ConceptNet's
verify-and-fallback at round start.
Replaces TARGET_POOL/pickFromPool with a clearer line-based API:
- LINE_COUNT — how many entries
- randomLine() — uniform pick
- getLine(n) — nth entry (n = frequency rank)
pickFromPool is retained as a back-compat re-export so existing callers
don't break.
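A sketch of the API's likely shape (data export name assumed):

```js
import { WORDS } from "./words-data.js"; // assumed export name for the generated list

export const LINE_COUNT = WORDS.length;
export const randomLine = () => WORDS[Math.floor(Math.random() * LINE_COUNT)];
export const getLine = (n) => WORDS[n]; // n = frequency rank (0-based assumed)
export const pickFromPool = randomLine; // back-compat re-export
```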
The ~250-word hand-curated TARGET_POOL was too small for long-term play.
Replaces it with a build-script-generated dictionary:
- scripts/build-semantle-words.js fetches first20hours/google-10000-english
(no-swears variant), filters to 4–10 ASCII letters, drops the top-200
most frequent function words, and writes src/modules/semantle/words-data.js
as a static ES-module export.
- wordlist.js now just re-exports that data via TARGET_POOL + pickFromPool.
- package.json: new build:semantle-words script; chained into `npm run build`
alongside build:wordle-data so `npm run deploy` regenerates automatically.
Pool size: ~250 → 7953 words. Same ConceptNet verify-and-fallback flow, so
low-quality picks still cost at most one extra concept lookup.
ConceptNet provides a free public /relatedness endpoint (returns cosine-like
[-1, 1]) and /c/en/{term} for vocabulary check. No random-word endpoint, so
we ship a curated local target pool in wordlist.js (~250 words) and verify
each pick via the concept endpoint with a fallback to an unverified pick.
Each guess now makes two parallel ConceptNet calls (concept + relatedness)
instead of a single word2sim call. Slightly higher latency but zero hosting
cost and no dependency on the self-hosted word2sim instance.
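A hedged sketch of the per-guess flow against ConceptNet's public endpoints (helper name assumed):

```js
const BASE = "https://api.conceptnet.io";

async function scoreGuess(guess, target) {
  // Vocabulary check and relatedness fired in parallel to keep latency down.
  const [concept, rel] = await Promise.all([
    fetch(`${BASE}/c/en/${encodeURIComponent(guess)}`).then((r) => r.json()),
    fetch(
      `${BASE}/relatedness?node1=/c/en/${encodeURIComponent(guess)}&node2=/c/en/${encodeURIComponent(target)}`,
    ).then((r) => r.json()),
  ]);
  const inVocab = (concept.edges?.length ?? 0) > 0; // assumed vocab heuristic
  return { inVocab, similarity: rel.value };        // value ∈ [-1, 1]
}
```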
- api-client.js rewritten; UpstreamError replaces Word2SimError (aliased
for backwards compat with older imports).
- wordlist.js added (curated target pool + pickFromPool).
- handlers.js: drops RANDOM_FILTERS (no filtering needed; pool is curated).
- index.js: drops WORD2SIM_API_URL env var; ConceptNet base hardcoded.
- wrangler.toml + .dev.vars.example: drop WORD2SIM_API_URL.
- api-client tests rewritten for ConceptNet shape; total tests 336 → 341.
Giveup already auto-starts a fresh round on next /semantle, so /semantle_new
was redundant. Duplicate guesses now match loldle's behavior: reply with
"🔁 already guessed" and skip the similarity API call (fast-path dedup
against prior word or canonical, with a post-API fallback for different
inputs that canonicalize to the same token).
Telegram commands /semantle, /semantle_new, /semantle_giveup, /semantle_stats.
Round starts with /random pick from hosted word2sim; each guess scored via
/similarity. Unlimited guesses; solve on case-insensitive exact match.
New env var WORD2SIM_API_URL (wrangler.toml, .env.deploy). Includes
module README and 90 unit tests covering api-client, state, format,
render, and handlers.
Previously startFreshGame was called at the tail of every win/lose/giveup
path, stamping startedAt to that moment — so the clock accrued while the
player was away between rounds. Now:
- round-ending paths call clearGame (new helper in state.js), deleting the
KV record instead of pre-creating the next round
- getOrInitGame lazily creates the next round on the player's next /loldle
call, with startedAt: null
- the first actual guess inside handleLoldle stamps startedAt = Date.now()
Viewing an empty board gives no hints, so it shouldn't count against the
clock. handleGiveup no longer auto-creates a fresh round and now reports
"No active round" when called with nothing in progress.
Broaden `npm run format` / `npm run lint` to biome's full scan (`.`)
instead of a fixed src/tests/scripts list, so root-level files and any
new top-level directories stay formatted. Drop the stale ignore entry
for the deleted champions-data.js.
loldle.net's classic-mode bundle has two record shapes — older champions
carry _id/championId, newer ones (Bel'Veth, K'Sante, Nilah, …) don't.
The regex required those leading fields, silently dropping any champion
added since 2022.
Make _id/championId optional and non-capturing, and drop them from the
output record (the bot never read them anyway). Champion count:
169 → 172; /loldle k'sante, /loldle bel'veth, and /loldle nilah now
resolve correctly.
Column headers now match loldle.net's classic-mode grid verbatim:
Range → Range type, Region → Region(s), Lane → Position(s),
Year → Release year. The champion row header becomes Champion (was
Name). Data field names already matched; only labels diverged.
KV payload cleanup:
- drop lastResultAt from stats (never read)
- drop solved/giveup flags from game state (round is immediately
replaced after finish, making the flags transient noise)
- skip redundant saveGame on winning/giveup/out-of-guesses paths;
startFreshGame overwrites anyway
Code cleanup:
- delete daily.js + daily.test.js (pickDaily/todayUtc were speculative
"future use" — only pickRandom was wired in, inlined into handlers)
- drop the dead switch default in compare.js
- trim file preambles across the module
Docs: rewrite README around current behavior with loldle.net as the
sole data source; update scraper header to match the raw schema.
Round state now keeps `guesses` as a plain string[] (the names the player
tried) instead of caching full comparison results. The board view
rehydrates rows at display time by re-running compareChampions against
the current target.
Smaller KV payloads, and the rendered board always reflects the live
champions.json — useful if a weekly data refresh lands mid-round.
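The rehydration is presumably a map at render time (helper names assumed):

```js
// Board rows recomputed against the live champions.json on every render.
const rows = game.guesses.map((name) =>
  compareChampions(lookupChampion(name), target),
);
```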
Drop the in-scraper normalization step — champions.json now mirrors the
exact shape emitted by loldle.net's JS bundle. Records use _id,
championId, championName, arrays for positions/species/regions/
range_type, "Male"/"Female"/"Other" gender strings, and a full
YYYY-MM-DD release_date.
Comparison is schema-aware: multi-value keys accept arrays directly,
the year axis parses YYYY out of the ISO date, and exact compares stay
case-insensitive.