6 Commits

Author SHA1 Message Date
tiennm99 503d877a94 chore: comment out port mapping in docker-compose
Match word2sim's pattern: the port stays unexposed by default, and callers
uncomment the mapping only if they need to reach the service from the host.
This keeps the service reachable only inside the compose network, which suits
running it behind a reverse proxy on the same network.
2026-04-23 11:19:12 +07:00
tiennm99 7f8990fd30 fix: allow compose to start without MODEL_URL; defer missing-model error to runtime
The hard ${MODEL_URL:?...} gate made 'docker compose up' fail at config
parsing when no .env existed: the container never started, and there were
no logs beyond compose's own error. Now:

- MODEL_URL defaults to empty in compose. The Python loader checks at
  startup and raises a clear FileNotFoundError naming the missing path
  and the two ways to fix it (set MODEL_URL, or mount a local file).
- Document an alternative local-mount flow in README, mirroring
  word2sim's ./vectors.bin pattern.
- Add container_name: phow2sim for easier docker ps / docker logs.
2026-04-23 11:16:16 +07:00
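The startup check described above could look roughly like the sketch below. It assumes a mounted file, when present, takes precedence over MODEL_URL; the function name `check_model_source` is hypothetical and not the service's actual loader.

```python
import os
from pathlib import Path
from typing import Optional


def check_model_source(local_path: Path, model_url: Optional[str] = None) -> str:
    """Decide where the model comes from, failing fast with an actionable error.

    Hypothetical helper: returns "local" if a mounted file exists,
    "download" if MODEL_URL is non-empty, and raises otherwise.
    """
    if model_url is None:
        # Compose now defaults MODEL_URL to empty instead of refusing to start.
        model_url = os.environ.get("MODEL_URL", "")
    if local_path.is_file():
        return "local"  # assumption: a mounted file wins over MODEL_URL
    if model_url.strip():
        return "download"
    raise FileNotFoundError(
        f"Model not found at {local_path} and MODEL_URL is empty. "
        "Fix: set MODEL_URL in .env to a URL serving the model zip, "
        f"or mount a local file at {local_path}."
    )
```

Moving the check from compose into the app means the container starts, and the error shows up in `docker logs phow2sim` instead of as a bare compose parsing failure.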
tiennm99 8140b51d3d refactor: remove unused MODEL_VARIANT env var
Was set in Dockerfile/compose/.env/README but never read by the app.
Tokenization is inferred at lookup time by _variant_candidates trying
both spaced and underscore-joined forms.
2026-04-23 11:09:55 +07:00
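A minimal sketch of the candidate generation this message describes, assuming the app tries the input as given, then its underscore-to-space and space-to-underscore rewrites, and keeps the first form present in the vocabulary. `variant_candidates` and `lookup` are illustrative stand-ins, not the actual `_variant_candidates`.

```python
from typing import Container, List, Optional


def variant_candidates(word: str) -> List[str]:
    """Return the word plus its spaced and underscore-joined forms, deduplicated in order."""
    forms = [word, word.replace("_", " "), word.replace(" ", "_")]
    seen = set()
    out = []
    for form in forms:
        if form not in seen:
            seen.add(form)
            out.append(form)
    return out


def lookup(word: str, vocab: Container[str]) -> Optional[str]:
    """Return the first candidate found in vocab, whichever tokenization it uses."""
    for candidate in variant_candidates(word):
        if candidate in vocab:
            return candidate
    return None
```

Because both forms are tried on every lookup, the same query resolves against a word-segmented (underscore-joined) vocabulary or a syllable vocabulary without a config switch, which is why MODEL_VARIANT had nothing left to do.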
tiennm99 a1fd486937 refactor: drop Basic auth; require plain-GET MODEL_URL
Simpler contract: operator hosts the zip behind any URL that answers a
plain GET (Nextcloud public share, signed S3/R2 URL, etc.). Any auth is
baked into the URL; the service sends no Authorization headers.

Removes MODEL_DOWNLOAD_USER / MODEL_DOWNLOAD_PASSWORD and their
plumbing. .env.example and README rewritten around the URL-only flow.
2026-04-23 11:07:00 +07:00
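Under this contract the download reduces to a single unauthenticated GET. The sketch below uses the standard library's `urllib.request` and streams to disk in 1 MiB chunks (the chunk size comes from the Nextcloud commit); the `download_model` name and the `.part` temp-file convention are assumptions for illustration.

```python
import urllib.request
from pathlib import Path

CHUNK_SIZE = 1024 * 1024  # 1 MiB keeps memory flat even for ~1 GB zips


def download_model(url: str, dest: Path, chunk_size: int = CHUNK_SIZE) -> int:
    """Fetch url with a plain GET (no Authorization header) and stream it to dest.

    Any credentials must already be part of the URL itself (for example a
    signed S3/R2 URL). Returns the number of bytes written.
    """
    tmp = dest.with_name(dest.name + ".part")
    total = 0
    with urllib.request.urlopen(url) as resp, open(tmp, "wb") as out:
        while True:
            chunk = resp.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
            total += len(chunk)
    # Rename only after a complete download so a crash never leaves a truncated model.
    tmp.replace(dest)
    return total
```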
tiennm99 6b1b401283 feat: fetch model via Nextcloud WebDAV with Basic auth
The upstream public.vinai.io mirror is dead and PhoW2V's research
license forbids public redistribution, so anonymous auto-download is
no longer viable. Expect a private Nextcloud (WebDAV or password-
protected public share) per deployment.

- Stream downloads in 1 MiB chunks (memory stays flat even for ~1 GB zips)
- Basic auth via MODEL_DOWNLOAD_USER / MODEL_DOWNLOAD_PASSWORD
- Drop the broken public.vinai.io default; compose requires MODEL_URL
- Add .env.example with WebDAV and public-share recipes
- Remove scripts/download-phow2v.sh (pointed at the dead mirror)
- README rewritten around the Nextcloud workflow; update license caveat
2026-04-23 10:44:33 +07:00
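For reference, the Basic-auth variant this commit introduced (and the later refactor removed) amounts to one extra request header. A sketch with `urllib.request`, assuming credentials are sent only when both MODEL_DOWNLOAD_USER and MODEL_DOWNLOAD_PASSWORD are set; `build_request` is a hypothetical name.

```python
import base64
import os
import urllib.request
from typing import Optional


def build_request(
    url: str, user: Optional[str] = None, password: Optional[str] = None
) -> urllib.request.Request:
    """Build a GET request, adding HTTP Basic auth if both credentials are set."""
    if user is None:
        user = os.environ.get("MODEL_DOWNLOAD_USER", "")
    if password is None:
        password = os.environ.get("MODEL_DOWNLOAD_PASSWORD", "")
    req = urllib.request.Request(url, method="GET")
    if user and password:
        # Basic scheme (RFC 7617): base64-encode "user:password".
        token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
        req.add_header("Authorization", f"Basic {token}")
    return req
```

The resulting request would then be opened with `urllib.request.urlopen` and read in 1 MiB chunks.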
tiennm99 8dd17acd4f feat: initial phow2sim service
Tiny FastAPI service over PhoW2V Vietnamese word vectors. Mirrors
word2sim's endpoint shapes (/similarity /neighbors /vocab /random) so
clients can swap URLs without code changes.

- Auto-downloads VinAI's PhoW2V on first boot and caches a binary .bin copy for ~5x faster restarts
- Viet-aware canonicalizer: exact -> lowercase -> space-to-underscore
- Supports both word (compound) and syllable variants via env
- Unicode-aware random-word filter accepts diacritics, rejects digits/punct
2026-04-23 10:05:50 +07:00
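The canonicalizer and random-word filter listed above could be sketched as follows. This assumes each canonicalizer step builds on the previous one (so the last step both lowercases and joins syllables with underscores) and that the filter treats `_` as a syllable separator in compound tokens; `canonicalize` and `is_random_candidate` are hypothetical names.

```python
import unicodedata
from typing import Container, Optional


def canonicalize(word: str, vocab: Container[str]) -> Optional[str]:
    """Try exact -> lowercase -> lowercase with spaces joined by underscores."""
    lowered = word.lower()
    for candidate in (word, lowered, lowered.replace(" ", "_")):
        if candidate in vocab:
            return candidate
    return None


def is_random_candidate(token: str) -> bool:
    """Accept letters with Vietnamese diacritics; reject digits and punctuation."""
    # NFC first so decomposed diacritics (combining marks) don't fail isalpha().
    token = unicodedata.normalize("NFC", token)
    syllables = token.split("_")
    return all(s and s.isalpha() for s in syllables)
```

`str.isalpha()` is Unicode-aware, so precomposed letters such as "ế" or "đ" pass while digits and punctuation fail, which avoids a hand-maintained character list.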