[Infra] Dockerfile.non_root: add BuildKit uv cache mount

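Mount /app/.cache/uv as a BuildKit type=cache mount on both 'uv sync'
steps. The cache persists across builds on the same builder, so repeat
builds don't re-download every wheel. Cache mounts are not part of the
exported layer cache, so type=gha alone does not carry them across CI
runs; that needs a separate step to save and restore the mount contents.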
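
For readers unfamiliar with the pattern, a minimal standalone sketch of a uv cache mount follows. It is not the project's Dockerfile: the base image, the way uv is installed, the cache id, and the UV_CACHE_DIR / UV_LINK_MODE settings are all assumptions chosen for illustration.

```dockerfile
# syntax=docker/dockerfile:1
# Hypothetical minimal image; base image and layout are assumptions.
FROM python:3.12-slim

# Copy the uv binary from the published uv image.
COPY --from=ghcr.io/astral-sh/uv:latest /uv /usr/local/bin/uv

WORKDIR /app

# Assumption: point uv's cache at the directory the mount targets.
# UV_LINK_MODE=copy avoids hardlink warnings, since the cache mount
# is a different filesystem from the virtualenv.
ENV UV_CACHE_DIR=/app/.cache/uv \
    UV_LINK_MODE=copy

COPY pyproject.toml uv.lock ./

# The mount is present only while this RUN executes. Downloaded wheels
# are stored by the builder, not committed to the image layer.
RUN --mount=type=cache,target=/app/.cache/uv,id=example-uv-cache \
    uv sync --frozen --no-install-project
```

Using the same id on several RUN steps lets them share one cache. By default a cache mount is owned by root, so a step that runs as a non-root user would also need the mount's uid and gid options set.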

Side effect: because the cache lives outside the image layers, the
~742MB of downloaded wheel archives that were previously baked into
/app/.cache/uv drop out of the final image. Compressed image size
goes from ~5.0GB to ~3.7GB, and the 'USER nobody' prisma-generate
layer shrinks from 2.4GB to 1.7GB.

Warm-build timing: an edit that invalidates the 'uv sync' layers now
rebuilds in ~1m30s, vs ~2m39s without the cache mount, on this dev VM.

API parity and UI visual regression continue to match baseline.
Trivy HIGH/CRITICAL: 6 at baseline -> 2 now, no new CVEs.

Co-authored-by: yuneng-jiang <yuneng-berri@users.noreply.github.com>
Cursor Agent
2026-04-19 06:35:39 +00:00
parent ca52e346b0
commit e24c02f478
+4 -2
@@ -41,7 +41,8 @@ COPY enterprise/pyproject.toml enterprise/
COPY litellm-proxy-extras/pyproject.toml litellm-proxy-extras/
# Install third-party dependencies (cached unless pyproject.toml/uv.lock change)
-RUN uv sync --frozen --no-install-project --no-install-workspace --no-default-groups --no-editable \
+RUN --mount=type=cache,target=/app/.cache/uv,id=litellm-uv-cache \
+    uv sync --frozen --no-install-project --no-install-workspace --no-default-groups --no-editable \
--extra proxy \
--extra proxy-runtime \
--extra extra_proxy \
@@ -71,7 +72,8 @@ RUN mkdir -p /var/lib/litellm/ui /var/lib/litellm/assets && \
done && \
touch .litellm_ui_ready )
-RUN if [ "$PROXY_EXTRAS_SOURCE" = "published" ]; then \
+RUN --mount=type=cache,target=/app/.cache/uv,id=litellm-uv-cache \
+    if [ "$PROXY_EXTRAS_SOURCE" = "published" ]; then \
uv sync --frozen --no-default-groups --no-editable \
--extra proxy \
--extra proxy-runtime \