tiennm99 54eaf95fc4 feat: progress logging during model download/parse
'Waiting for application startup.' was the last visible line for
several minutes while the lifespan hook silently downloaded ~1.2 GB and
parsed the text vectors, which looked like a hang.

- Print milestones for each load phase (cache hit / download /
  extract / parse / cache-write) with timings.
- During the download, print a line every ~50 MiB, with a running
  percentage when the server sent Content-Length.
- Set PYTHONUNBUFFERED=1 in the Dockerfile so the prints reach
  'docker compose logs' in real time.
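The first two bullets can be sketched in Python as shown below. This is a minimal illustration under stated assumptions, not the code from this commit. `phase`, `copy_with_progress`, `download` and the `[model]` prefix are all hypothetical names. `download` uses plain `urllib` and does not send the WebDAV credentials a real deployment needs.

```python
import io
import time
import urllib.request
from contextlib import contextmanager
from typing import BinaryIO, Callable, Iterable, Iterator, Optional

MIB = 1024 * 1024


@contextmanager
def phase(name: str, emit: Callable[[str], None] = print) -> Iterator[None]:
    """Emit a start line, then an elapsed-time line if the phase succeeds."""
    emit(f"[model] {name}...")
    start = time.monotonic()
    yield
    emit(f"[model] {name} done in {time.monotonic() - start:.1f}s")


def copy_with_progress(
    chunks: Iterable[bytes],
    dst: BinaryIO,
    total: Optional[int] = None,
    step: int = 50 * MIB,
    emit: Callable[[str], None] = print,
) -> int:
    """Write chunks to dst, emitting one line each time another `step` bytes land.

    `total` is the Content-Length value if the server sent one; without it,
    only the running size is reported. Returns the number of bytes written.
    """
    written = 0
    next_mark = step
    for chunk in chunks:
        dst.write(chunk)
        written += len(chunk)
        if written >= next_mark:
            line = f"[model] downloaded {written // MIB} MiB"
            if total:
                line += f" ({100 * written / total:.0f}%)"
            emit(line)
            # Jump to the next boundary above `written`, so one huge chunk
            # yields a single line rather than a burst.
            next_mark = (written // step + 1) * step
    return written


def download(url: str, path: str, step: int = 50 * MIB) -> int:
    """Hypothetical helper: stream `url` to `path` with progress (no auth)."""
    with urllib.request.urlopen(url) as resp, open(path, "wb") as dst:
        length = resp.headers.get("Content-Length")
        total = int(length) if length else None
        chunks = iter(lambda: resp.read(MIB), b"")  # 1 MiB reads until EOF
        return copy_with_progress(chunks, dst, total=total, step=step)


if __name__ == "__main__":
    # Offline demo: 10 x 1000-byte chunks, one progress line every 3000 bytes.
    with phase("download"):
        copy_with_progress([b"x" * 1000] * 10, io.BytesIO(), total=10_000, step=3000)
```

Reading in 1 MiB chunks keeps memory flat for a ~1.2 GB file. Moving `next_mark` past the current size means each 50 MiB boundary is reported at most once, regardless of chunk size.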

Uses plain print (not logging) because uvicorn's default log config
filters INFO on non-uvicorn loggers, and wrestling with that for six
operator-facing status lines isn't worth the surface area.
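The reasoning above can be illustrated with a small Python sketch. It assumes the loader runs under uvicorn's default log config and uses a hypothetical `app.model` logger. That logger has no level of its own, so it inherits the root logger's default of WARNING and its INFO calls are dropped. A plain `print` never passes through logging levels at all.

```python
import logging

# Hypothetical app logger. uvicorn's default LOGGING_CONFIG (recent versions)
# configures only the "uvicorn", "uvicorn.error" and "uvicorn.access" loggers,
# so this one has level NOTSET and inherits the root logger's level, which is
# WARNING unless something else reconfigures it. Its INFO records are dropped.
log = logging.getLogger("app.model")


def format_status(msg: str) -> str:
    """Build one operator-facing status line (the prefix is an assumption)."""
    return f"[model] {msg}"


def status(msg: str) -> None:
    """Write a status line straight to stdout, bypassing logger levels."""
    # flush=True shows the line immediately even if PYTHONUNBUFFERED is unset.
    print(format_status(msg), flush=True)


if __name__ == "__main__":
    log.info("cache miss, downloading")  # filtered out: INFO < WARNING
    status("cache miss, downloading")    # always reaches stdout
```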
2026-04-23 11:22:06 +07:00

FROM python:3.11-slim
WORKDIR /app
RUN apt-get update && apt-get install -y --no-install-recommends \
        curl unzip ca-certificates \
    && rm -rf /var/lib/apt/lists/*
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY app ./app
# MODEL_URL and its credentials are injected at runtime via the
# docker-compose env/.env (see docker-compose.yml), which can also
# override MODEL_PATH. MODEL_URL deliberately has no default here:
# PhoW2V's license forbids public redistribution, so every deployment
# must point at its own private mirror (typically Nextcloud WebDAV).
ENV MODEL_PATH=/data/phow2v/word2vec_vi_words_300dims.txt \
    PORT=8000 \
    PYTHONUNBUFFERED=1
EXPOSE 8000
# First boot downloads ~1.2 GB and then parses for ~60 s; later boots
# load the cached .bin in ~10 s. The 600 s start-period covers both.
HEALTHCHECK --interval=30s --timeout=5s --start-period=600s --retries=3 \
    CMD curl -fsS http://localhost:8000/health || exit 1
CMD ["uvicorn", "app.main:app", "--host", "0.0.0.0", "--port", "8000"]
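Here is a rough check of the first-boot budget. It assumes 1.2 GB means 1.2 × 10^9 bytes and that the ~60 s parse must also finish inside the 600 s start period. Under those assumptions, the mirror must sustain a download rate r of about:

```latex
t_{\text{download}} \le 600\,\text{s} - 60\,\text{s} = 540\,\text{s}
\quad\Longrightarrow\quad
r \ge \frac{1.2 \times 10^{9}\,\text{B}}{540\,\text{s}}
  \approx 2.2 \times 10^{6}\,\text{B/s} \approx 18\,\text{Mbit/s}
```

A slower mirror still gets roughly 3 × 30 s of failing probes after the start period before Docker marks the container unhealthy. Cached boots (~10 s) finish well inside the window.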