Match word2sim's pattern: the port stays unexposed by default, and the
caller uncomments the port mapping only to reach the service from the
host. This keeps the service internal to the compose network when it
runs behind a reverse proxy on that network.
The hard ${MODEL_URL:?...} gate made 'docker compose up' fail at config
parsing if no .env existed: the container never started, and there were
no logs beyond compose's own error. Now:
- MODEL_URL defaults to empty in compose. The Python loader checks at
startup and raises a clear FileNotFoundError naming the missing path
and the two ways to fix it (set MODEL_URL, or mount a local file).
- Document an alternative local-mount flow in README, mirroring
word2sim's ./vectors.bin pattern.
- Add container_name: phow2sim for easier docker ps / docker logs.
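A minimal sketch of the startup check described above, not the loader's
actual code. The function name resolve_model_path and the default path
/models/phow2v.bin are hypothetical; only MODEL_URL and the
FileNotFoundError behaviour come from this change.

```python
import os
from pathlib import Path


def resolve_model_path(model_path: str = "/models/phow2v.bin") -> Path:
    """Return the model path, or raise if nothing can supply the file.

    Hypothetical helper: the default path is an assumption for illustration.
    """
    path = Path(model_path)
    if path.exists():
        # A local file is already mounted or cached.
        return path
    if os.environ.get("MODEL_URL", "").strip():
        # A URL is configured; the caller is expected to download to `path`.
        return path
    # Neither source is available: name the path and both fixes.
    raise FileNotFoundError(
        f"Model file not found at {path}. Either set MODEL_URL to a "
        f"downloadable zip, or mount a local file at {path}."
    )
```

With this shape the container starts, and a missing model shows up in
'docker logs' rather than as a compose parse error.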
Was set in Dockerfile/compose/.env/README but never read by the app.
Tokenization is inferred at lookup time by _variant_candidates trying
both spaced and underscore-joined forms.
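One way the lookup-time inference could work, as a sketch only. The
real _variant_candidates may order or generate forms differently; the
names variant_candidates and resolve below are illustrative.

```python
from typing import Iterator, Optional, Set


def variant_candidates(term: str) -> Iterator[str]:
    """Yield lookup forms for `term`: as given, underscore-joined, spaced."""
    stripped = term.strip()
    forms = [
        stripped,                     # exactly as the caller wrote it
        stripped.replace(" ", "_"),   # word-level (compound) form
        stripped.replace("_", " "),   # spaced form
    ]
    seen: Set[str] = set()
    for form in forms:
        # Skip empty strings and forms already yielded.
        if form and form not in seen:
            seen.add(form)
            yield form


def resolve(term: str, vocab: Set[str]) -> Optional[str]:
    """Return the first candidate present in `vocab`, or None."""
    return next((f for f in variant_candidates(term) if f in vocab), None)
```

Because every form is tried against the loaded vocabulary, no separate
setting has to say which tokenization the model uses.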
Simpler contract: operator hosts the zip behind any URL that answers a
plain GET (Nextcloud public share, signed S3/R2 URL, etc.). Any auth is
baked into the URL; the service sends no Authorization headers.
Removes MODEL_DOWNLOAD_USER / MODEL_DOWNLOAD_PASSWORD and their
plumbing. .env.example and README rewritten around the URL-only flow.
The upstream public.vinai.io mirror is dead and PhoW2V's research
license forbids public redistribution, so anonymous auto-download is
no longer viable. Expect a private Nextcloud (WebDAV or password-
protected public share) per deployment.
- Stream downloads in 1 MiB chunks (flat memory use for ~1 GB zips)
- Basic auth via MODEL_DOWNLOAD_USER / MODEL_DOWNLOAD_PASSWORD
- Drop the broken public.vinai.io default; compose requires MODEL_URL
- Add .env.example with WebDAV and public-share recipes
- Remove scripts/download-phow2v.sh (pointed at the dead mirror)
- README rewritten around the Nextcloud workflow; update license caveat
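A sketch of chunked streaming using only the standard library. The
service's own downloader may use a different HTTP client; the name
stream_download and the .part temp-file convention are assumptions, and
credential handling is left out.

```python
import urllib.request
from pathlib import Path

CHUNK_SIZE = 1024 * 1024  # 1 MiB


def stream_download(url: str, dest: str, chunk_size: int = CHUNK_SIZE) -> int:
    """Copy `url` to `dest` one chunk at a time; return bytes written."""
    target = Path(dest)
    # Write to a temp name first so a partial download never looks complete.
    tmp = target.with_name(target.name + ".part")
    written = 0
    with urllib.request.urlopen(url) as resp, open(tmp, "wb") as out:
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:  # empty bytes means end of stream
                break
            out.write(chunk)
            written += len(chunk)
    tmp.replace(target)
    return written
```

Only one chunk is held in memory at a time, so peak usage does not grow
with the size of the zip.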
Tiny FastAPI service over PhoW2V Vietnamese word vectors. Mirrors
word2sim's endpoint shapes (/similarity /neighbors /vocab /random) so
clients can swap URLs without code changes.
- Auto-downloads VinAI's PhoW2V on first boot, caches binary .bin for ~5x faster restarts
- Vietnamese-aware canonicalizer: exact -> lowercase -> space-to-underscore
- Supports both word (compound) and syllable variants via env
- Unicode-aware random-word filter accepts diacritics, rejects digits and punctuation
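A sketch of what a Unicode-aware word filter might look like. The name
is_plain_word is hypothetical, and allowing underscores between
syllables (for compound tokens) is an assumption, not confirmed
behaviour of the service.

```python
import unicodedata


def is_plain_word(token: str) -> bool:
    """True if `token` is letters only, optionally joined by underscores."""
    # Compose base letter + combining marks so decomposed input such as
    # "e" + U+0302 + U+0301 is judged as the single letter "ế".
    normalized = unicodedata.normalize("NFC", token)
    # Assumption: underscores join syllables of a compound word.
    parts = normalized.split("_")
    # str.isalpha() is Unicode-aware and returns False for empty strings,
    # digits, and punctuation.
    return all(part.isalpha() for part in parts)
```

For example, "học_sinh" and "tiếng" pass, while "covid19", "a.b", and
the empty string do not.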