* fix(helm): drop main- prefix from default image tag
The default image tag in the deployment + migrations-job templates was
`main-{{ .Chart.AppVersion }}`. The current release pipeline publishes
content tags without the `main-` prefix (e.g. `v1.85.1` / `1.85.1`,
`v1.86.0-rc.1` / `1.86.0-rc.1`), so the rendered ref points at a tag
that does not exist on GHCR or DockerHub and installs fail with
ImagePullBackOff.
- templates/deployment.yaml, templates/migrations-job.yaml: render
`.Chart.AppVersion` directly instead of `main-<AppVersion>`.
- Chart.yaml: bump stale `appVersion: v1.80.12` (not on either
registry) to `v1.85.1` so local-checkout installs also resolve.
- values.yaml: update the commented tag-override hint to match.
* fix(helm): use :latest in tag override example, not pinned version
Per review: ghcr.io/berriai/litellm-database:latest is a floating
alias for the most recent stable (same digest as :main-stable),
maintained by the release pipeline's UPDATE_LATEST advance step.
Better example than a pinned version that goes stale.
* fix(opentelemetry): JSON-serialize dict metadata fields for OTEL span attributes (#27451) (#27455)
Squash-merged by litellm-agent from Anai-Guo's PR.
* feat(dashscope): add embeddings and reranks(qwen3-rerank) support via OpenAI-compatible endpoint (#27508)
Squash-merged by litellm-agent from yimao's PR.
* fix(vertex_ai/gemini): raise BadRequestError when image_url or url fi… (#24550)
Squash-merged by litellm-agent from krisxia0506's PR.
* fix(vertex_ai): raise error on mid-stream 429/error chunks instead of silently swallowing (#23711)
Squash-merged by litellm-agent from krisxia0506's PR.
* fix: raise BadRequestError for file content blocks missing 'file' sub… (#24503)
Squash-merged by litellm-agent from krisxia0506's PR.
* Fix Gemini MIME detection for extensionless GCS URIs (#27278)
Squash-merged by litellm-agent from krisxia0506's PR.
* fix(vertex_ai/partner_models): drop unused vertexai SDK gate from count_tokens (closes#28084) (#28107)
Squash-merged by litellm-agent from voidborne-d's PR.
* feat(chart): add support for autoscaling behavior in HPA (#27990)
Squash-merged by litellm-agent from FabrizioCafolla's PR.
* feat(proxy): add blocked flag to models for pause/resume from the UI (#27927)
Squash-merged by litellm-agent from Cyberfilo's PR.
* fix: pass socket timeouts to Redis cluster clients (#27920)
Squash-merged by litellm-agent from tomdee's PR.
* Fix/cache token (#28009)
Squash-merged by litellm-agent from escon1004's PR.
* fix(deepseek): forward reasoning_content in multi-turn thinking mode conversations (#28080)
Squash-merged by litellm-agent from Divyansh8321's PR.
* fix(guardrails): return HTTP 400 instead of 500 for blocked requests (#27617)
* fix: reset org and tag budgets (#27326)
* reset org budgets
* reset tag budgets
---------
Co-authored-by: Michael Riad Zaky <michaelr@Mac.localdomain>
* fix(ui): omit allowed_routes from key edit save when unchanged (#27553)
* fix(ui): omit allowed_routes from key edit save when unchanged
When a team admin opens Edit Settings on a key with key_type=AI APIs and
saves without changing anything, the UI re-sends the existing allowed_routes
value, which the backend's _check_allowed_routes_caller_permission gate
rejects for non-proxy-admins (LIT-2681).
Strip allowed_routes from the patch in handleSubmit when it deep-equals the
original keyData.allowed_routes. The backend treats absence as "leave alone,"
so no-op saves now succeed for non-admins. Admins explicitly editing the
field still send the new value.
* fix(ui): order-insensitive allowed_routes diff + cover null-original case
Address Greptile review:
- Switch the "is allowed_routes unchanged" check to a Set-based comparison so
a server-side reorder of the array doesn't register as a user edit and
re-trigger LIT-2681.
- Add two regression tests: (1) keyData.allowed_routes is null and the form
is untouched — patch should strip the field; (2) server returned routes in
a different order than the user originally entered — patch should still
recognize the value as unchanged.
* chore(ui): strip ticket refs and tighten comments in key edit fix
- Remove internal-tracker references from in-code comments
- Tighten the WHY comment in handleSubmit to two lines
- Drop redundant test-block comments — test names already describe the case
* fix(ui): annotate Set<string> generic in allowed_routes diff to fix tsc
* fix(guardrails): return HTTP 400 instead of 500 for guardrail-blocked requests
GuardrailRaisedException and BlockedPiiEntityError both lacked a
status_code attribute. When these exceptions reached the proxy
exception handler (getattr(e, 'status_code', 500)), the fallback
defaulted to HTTP 500 — making intentional guardrail blocks
indistinguishable from server errors and causing unnecessary client
retries.
Changes:
- Add status_code=400 (keyword-only) to GuardrailRaisedException
- Add status_code=400 (keyword-only) to BlockedPiiEntityError
- Update _is_guardrail_intervention() to recognize both exceptions
so downstream loggers record 'guardrail_intervened' instead of
'guardrail_failed_to_respond'
- Add 6 unit tests for default/custom status codes and getattr pattern
- Strengthen existing blocked-action test with status_code assertion
Fixes#24348
---------
Co-authored-by: Michael-RZ-Berri <michael@berri.ai>
Co-authored-by: Michael Riad Zaky <michaelr@Mac.localdomain>
Co-authored-by: ryan-crabbe-berri <ryan@berri.ai>
Co-authored-by: Krrish Dholakia <krrish+github@berri.ai>
* fix(router/proxy): address Greptile P1+P2 review comments on PR #28161
- router: raise ServiceUnavailableError (503) instead of RouterRateLimitErrorBasic (429)
when a specifically-addressed deployment is administratively blocked; 429 misleads
retry-enabled clients into spinning forever against a paused model
- proxy_server: compute get_fully_blocked_model_names() once before both branches in
model_list() instead of duplicating the call in each branch
- deepseek: upgrade silent debug log to warning when injecting placeholder
reasoning_content so callers are clearly notified of degraded multi-turn quality
- tests: update two blocked-deployment assertions to expect ServiceUnavailableError
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix: address bug detection findings (cache token order, mutable defaults)
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix: address bugs in async pass-through, anthropic cache token detection, rerank tests
- async_get_available_deployment_for_pass_through: enforce blocked check on specific deployments
- cost_calculator: detect anthropic-style usage by attribute presence (not truthiness) to avoid mixing OpenAI cached_tokens into anthropic normalization when read=0
- dashscope rerank tests: pass request to httpx.Response constructions for consistency
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix code qa
* fix(vertex_ai/gemini): strip MIME parameters from GCS contentType
GCS object metadata's contentType field can include parameters such as
'text/html; charset=utf-8'. Strip them in _apply_gemini_mime_type_aliases
so downstream get_file_extension_from_mime_type sees a bare MIME type.
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(vertex_ai/gemini): clarify mime-type error message string concatenation
Co-authored-by: Yassin Kortam <yassin@berri.ai>
---------
Co-authored-by: Tai An <antai12232931@outlook.com>
Co-authored-by: Vincent <yimao1231@gmail.com>
Co-authored-by: Kris Xia <xiajiayi0506@gmail.com>
Co-authored-by: d 🔹 <liusway405@gmail.com>
Co-authored-by: Fabrizio Cafolla <developer@fabriziocafolla.com>
Co-authored-by: Filippo Menghi <113345637+Cyberfilo@users.noreply.github.com>
Co-authored-by: Tom Denham <tom@tomdee.co.uk>
Co-authored-by: escon1004 <70471150+escon1004@users.noreply.github.com>
Co-authored-by: Divyansh Singhal <97736786+Divyansh8321@users.noreply.github.com>
Co-authored-by: robin-fiddler <robin@fiddler.ai>
Co-authored-by: Michael-RZ-Berri <michael@berri.ai>
Co-authored-by: Michael Riad Zaky <michaelr@Mac.localdomain>
Co-authored-by: ryan-crabbe-berri <ryan@berri.ai>
Co-authored-by: Krrish Dholakia <krrish+github@berri.ai>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>
- Remove litellm-js/proxy and litellm-js/spend-logs TypeScript packages that provided Cloudflare Worker proxy and Node.js spend logging services, as these are no longer maintained
- Remove deprecated Docker variants (Dockerfile.alpine, Dockerfile.dev, Dockerfile.custom_ui, Dockerfile.health_check, Dockerfile.ghcr_base) that have been superseded by the primary Dockerfile
- Remove legacy Kubernetes manifests (kub.yaml, service.yaml) from deploy/kubernetes in favor of the Helm chart
- Remove stale index.yaml Helm chart index pinned to an old version (v1.43.18)
- Remove dev_config.yaml development configuration file that contained hardcoded credentials and example endpoints
- Clean up ~3,500 lines of unused code and configuration to reduce repository maintenance burden
Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu>
- Introduce RoutingPrismaWrapper that transparently routes read operations (find_*, count, group_by, query_raw, query_first) to a reader endpoint while writes remain on the writer, enabling Aurora-style reader/writer endpoint splits
- Add IAMEndpoint dataclass and parse_iam_endpoint_from_url() to capture static connection fields from a reader URL so only the IAM token needs to rotate, avoiding the need for separate DATABASE_HOST_READ_REPLICA/etc. env vars
- Enhance PrismaWrapper with per-instance knobs (db_url_env_var, iam_endpoint, recreate_uses_datasource, log_prefix) so writer and reader wrappers are independent: the reader writes its fresh URL to DATABASE_URL_READ_REPLICA and passes datasource override to Prisma since Prisma only auto-reads DATABASE_URL
- Fix deadlock in PrismaWrapper.__getattr__: when called from inside a running event loop, schedule the token refresh as a background task instead of blocking with run_coroutine_threadsafe + future.result(), which would deadlock the loop thread waiting for a coroutine that needs the loop to run
- Fix botocore crash when DATABASE_PORT is unset by defaulting to "5432" in both proxy_cli.py and PrismaWrapper.get_rds_iam_token(); passing None caused botocore to embed the literal string "None" in the presigned URL
- Implement graceful reader degradation: reader connect/recreate failures are non-fatal; wrapper sets _reader_unavailable=True and silently routes reads to the writer to keep the proxy serving traffic during transient reader outages
- Add PrismaClient.writer_db property so the reconnect smoke-test always validates the writer engine specifically; query_raw on the routing wrapper would route to the reader and not verify the newly-recreated writer
- Expose DATABASE_URL_READ_REPLICA in Helm chart (values.yaml + deployment.yaml) via both plain value and secret key reference, and document the field in docker-compose.yml
- Add 887-line test suite covering routing logic, IAM token refresh paths, reader degradation scenarios, datasource override behavior, and the deadlock regression
Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu>
Wrap toYaml with tpl in deployment and migration job templates so
users can reference Helm values (e.g. {{ .Values.image.repository }})
inside extraContainers and extraInitContainers definitions.
Pin every dependency across all Docker builds so upgrades are intentional.
Verified by building all 3 production images and diffing pip freeze against
known-good v1.83.0-nightly baselines — zero version drift.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* staged first pass
* black
* Update litellm/proxy/health_check.py
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* simpler
* restore cached logo
* fix tests for perform_health_check max_concurrency arg
* implement pr suggestion
* and the helm chart
* add configureable resources and probes to the deployment in the helm chart
* more helm chart unittests
* move some background healthcheck loggin to debug
---------
Co-authored-by: Sean Glover <sglover@athenahealth.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
The Helm chart on GHCR displays a `docker pull` command instead of
the correct `helm pull oci://` command. This is because the OCI artifact
is missing the `org.opencontainers.image.source` annotation that GHCR
uses to identify and properly display Helm charts.
Changes:
- Add OCI annotations to Chart.yaml (source + url) which Helm 3.10+
propagates to the OCI manifest on push
- Install explicit Helm v3.20.0 via azure/setup-helm@v4 for reproducible
builds and proper OCI annotation support
- Remove deprecated HELM_EXPERIMENTAL_OCI env var (OCI is GA since Helm 3.8)
* feat: add support for keda in helm chart
Signed-off-by: R.Sicart <roger.sicart@gmail.com>
* chore: bump chart version
---------
Signed-off-by: R.Sicart <roger.sicart@gmail.com>
* fix: sync Helm chart versioning with production standards and Docker versions
- Update Chart.yaml version from 0.4.10 to 1.0.0 (SemVer 0.x is for development, 1.0+ for production)
- Update appVersion from v1.50.2 to v1.80.12 to match current Docker image version
- Update workflow defaults from 0.1.0 to 1.0.0 for new chart version scheme
- Maintain independent chart versioning per Helm best practices
This ensures:
- Helm chart follows SemVer production standards (1.x instead of 0.x)
- appVersion stays synchronized with Docker/application version
- Chart version remains independent for flexibility (can update chart without waiting for app releases)
* fix: sync Helm chart appVersion with Docker image tags in release workflow
Updates the GitHub workflow to ensure Helm chart appVersion matches the
Docker image tags that are actually published:
- For stable/rc releases: Uses the workflow input tag (e.g., v1.80.12)
- For latest/dev releases: Uses the release_type to match main-{type} tags
- Makes 'tag' input required to prevent accidental releases with wrong versions
- Simplifies fallback logic by removing git-describe dependency
This ensures the chart's appVersion correctly references Docker images
that exist, preventing deployment failures from missing image tags.
* Update ghcr_deploy.yml
- Add support for envVars (simple key-value pairs) in migrations job
- Add support for extraEnvVars (complex environment variable configurations)
- Include comprehensive test coverage for both envVars and extraEnvVars
- Ensure backward compatibility with existing configurations
- Tests verify proper rendering of environment variables in container spec