Commit Graph

130 Commits

Author SHA1 Message Date
yuneng-jiang 7c667b8797 fix(helm): drop main- prefix from default image tag (#28710)
* fix(helm): drop main- prefix from default image tag

The default image tag in the deployment + migrations-job templates was
`main-{{ .Chart.AppVersion }}`. The current release pipeline publishes
content tags without the `main-` prefix (e.g. `v1.85.1` / `1.85.1`,
`v1.86.0-rc.1` / `1.86.0-rc.1`), so the rendered ref points at a tag
that does not exist on GHCR or DockerHub and installs fail with
ImagePullBackOff.

- templates/deployment.yaml, templates/migrations-job.yaml: render
  `.Chart.AppVersion` directly instead of `main-<AppVersion>`.
- Chart.yaml: bump stale `appVersion: v1.80.12` (not on either
  registry) to `v1.85.1` so local-checkout installs also resolve.
- values.yaml: update the commented tag-override hint to match.
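The chart does this in Go templates. The sketch below restates the fixed fallback in Python, assuming the templates use an explicit tag override when one is set and otherwise `.Chart.AppVersion` verbatim. `render_image_ref` is a hypothetical helper, not part of the chart.

```python
from typing import Optional


def render_image_ref(repository: str, app_version: str, tag: Optional[str] = None) -> str:
    """Hypothetical restatement of the fixed default image reference."""
    # An explicit override wins. Otherwise use AppVersion as-is; the old
    # fallback was f"main-{app_version}", a tag the pipeline no longer publishes.
    return f"{repository}:{tag or app_version}"


repo = "ghcr.io/berriai/litellm-database"
print(render_image_ref(repo, "v1.85.1"))                # ghcr.io/berriai/litellm-database:v1.85.1
print(render_image_ref(repo, "v1.85.1", tag="latest"))  # ghcr.io/berriai/litellm-database:latest
```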

* fix(helm): use :latest in tag override example, not pinned version

Per review: ghcr.io/berriai/litellm-database:latest is a floating
alias for the most recent stable (same digest as :main-stable),
maintained by the release pipeline's UPDATE_LATEST advance step.
Better example than a pinned version that goes stale.
2026-05-23 15:57:38 -07:00
Sameer Kankute 36c494fdd2 Litellm oss staging (#28161)
* fix(opentelemetry): JSON-serialize dict metadata fields for OTEL span attributes (#27451) (#27455)

Squash-merged by litellm-agent from Anai-Guo's PR.

* feat(dashscope): add embeddings and reranks(qwen3-rerank) support via OpenAI-compatible endpoint (#27508)

Squash-merged by litellm-agent from yimao's PR.

* fix(vertex_ai/gemini): raise BadRequestError when image_url or url fi… (#24550)

Squash-merged by litellm-agent from krisxia0506's PR.

* fix(vertex_ai): raise error on mid-stream 429/error chunks instead of silently swallowing (#23711)

Squash-merged by litellm-agent from krisxia0506's PR.

* fix: raise BadRequestError for file content blocks missing 'file' sub… (#24503)

Squash-merged by litellm-agent from krisxia0506's PR.

* Fix Gemini MIME detection for extensionless GCS URIs (#27278)

Squash-merged by litellm-agent from krisxia0506's PR.

* fix(vertex_ai/partner_models): drop unused vertexai SDK gate from count_tokens (closes #28084) (#28107)

Squash-merged by litellm-agent from voidborne-d's PR.

* feat(chart): add support for autoscaling behavior in HPA (#27990)

Squash-merged by litellm-agent from FabrizioCafolla's PR.

* feat(proxy): add blocked flag to models for pause/resume from the UI (#27927)

Squash-merged by litellm-agent from Cyberfilo's PR.

* fix: pass socket timeouts to Redis cluster clients (#27920)

Squash-merged by litellm-agent from tomdee's PR.

* Fix/cache token (#28009)

Squash-merged by litellm-agent from escon1004's PR.

* fix(deepseek): forward reasoning_content in multi-turn thinking mode conversations (#28080)

Squash-merged by litellm-agent from Divyansh8321's PR.

* fix(guardrails): return HTTP 400 instead of 500 for blocked requests (#27617)

* fix: reset org and tag budgets (#27326)

* reset org budgets

* reset tag budgets

---------

Co-authored-by: Michael Riad Zaky <michaelr@Mac.localdomain>

* fix(ui): omit allowed_routes from key edit save when unchanged (#27553)

* fix(ui): omit allowed_routes from key edit save when unchanged

When a team admin opens Edit Settings on a key with key_type=AI APIs and
saves without changing anything, the UI re-sends the existing allowed_routes
value, which the backend's _check_allowed_routes_caller_permission gate
rejects for non-proxy-admins (LIT-2681).

Strip allowed_routes from the patch in handleSubmit when it deep-equals the
original keyData.allowed_routes. The backend treats absence as "leave alone,"
so no-op saves now succeed for non-admins. Admins explicitly editing the
field still send the new value.

* fix(ui): order-insensitive allowed_routes diff + cover null-original case

Address Greptile review:

- Switch the "is allowed_routes unchanged" check to a Set-based comparison so
  a server-side reorder of the array doesn't register as a user edit and
  re-trigger LIT-2681.
- Add two regression tests: (1) keyData.allowed_routes is null and the form
  is untouched — patch should strip the field; (2) server returned routes in
  a different order than the user originally entered — patch should still
  recognize the value as unchanged.
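The UI fix is in TypeScript. Here is a Python transliteration of the same idea: compare the routes as sets and drop the field from the patch when nothing changed. The helper names are hypothetical. Treating a `null` original the same as an empty list is an assumption about what an untouched form holds. A set comparison also ignores duplicate entries.

```python
from typing import Any, Dict, Iterable, Optional


def routes_unchanged(original: Optional[Iterable[str]], current: Optional[Iterable[str]]) -> bool:
    """Order-insensitive check; None and an empty list are treated as equal."""
    return set(original or []) == set(current or [])


def build_patch(form_values: Dict[str, Any], original_key: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical patch builder: omit allowed_routes when the user left it alone."""
    patch = dict(form_values)
    if "allowed_routes" in patch and routes_unchanged(
        original_key.get("allowed_routes"), patch["allowed_routes"]
    ):
        # The backend reads absence as "leave alone", so no-op saves by
        # non-admins no longer hit the permission gate.
        del patch["allowed_routes"]
    return patch


original = {"allowed_routes": ["/chat/completions", "/embeddings"]}
# Same routes, different order: the field is stripped.
print(build_patch({"allowed_routes": ["/embeddings", "/chat/completions"]}, original))  # {}
```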

* chore(ui): strip ticket refs and tighten comments in key edit fix

- Remove internal-tracker references from in-code comments
- Tighten the WHY comment in handleSubmit to two lines
- Drop redundant test-block comments — test names already describe the case

* fix(ui): annotate Set<string> generic in allowed_routes diff to fix tsc

* fix(guardrails): return HTTP 400 instead of 500 for guardrail-blocked requests

GuardrailRaisedException and BlockedPiiEntityError both lacked a
status_code attribute.  When these exceptions reached the proxy
exception handler (getattr(e, 'status_code', 500)), the fallback
defaulted to HTTP 500 — making intentional guardrail blocks
indistinguishable from server errors and causing unnecessary client
retries.

Changes:
- Add status_code=400 (keyword-only) to GuardrailRaisedException
- Add status_code=400 (keyword-only) to BlockedPiiEntityError
- Update _is_guardrail_intervention() to recognize both exceptions
  so downstream loggers record 'guardrail_intervened' instead of
  'guardrail_failed_to_respond'
- Add 6 unit tests for default/custom status codes and getattr pattern
- Strengthen existing blocked-action test with status_code assertion

Fixes #24348
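A minimal sketch of why the missing attribute produced 500s. `GuardrailBlockedError` is a hypothetical stand-in for `GuardrailRaisedException` and `BlockedPiiEntityError`, and `http_status_for` mirrors the `getattr(e, 'status_code', 500)` fallback described above.

```python
class GuardrailBlockedError(Exception):
    """Hypothetical stand-in for the guardrail exceptions named above."""

    def __init__(self, message: str, *, status_code: int = 400) -> None:
        super().__init__(message)
        # Keyword-only, so existing positional call sites keep working.
        self.status_code = status_code


def http_status_for(exc: Exception) -> int:
    # Exceptions without a status_code attribute fall back to 500.
    return getattr(exc, "status_code", 500)


print(http_status_for(GuardrailBlockedError("blocked by policy")))         # 400
print(http_status_for(GuardrailBlockedError("blocked", status_code=422)))  # 422
print(http_status_for(RuntimeError("boom")))                               # 500
```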

---------

Co-authored-by: Michael-RZ-Berri <michael@berri.ai>
Co-authored-by: Michael Riad Zaky <michaelr@Mac.localdomain>
Co-authored-by: ryan-crabbe-berri <ryan@berri.ai>
Co-authored-by: Krrish Dholakia <krrish+github@berri.ai>

* fix(router/proxy): address Greptile P1+P2 review comments on PR #28161

- router: raise ServiceUnavailableError (503) instead of RouterRateLimitErrorBasic (429)
  when a specifically-addressed deployment is administratively blocked; 429 misleads
  retry-enabled clients into spinning forever against a paused model
- proxy_server: compute get_fully_blocked_model_names() once before both branches in
  model_list() instead of duplicating the call in each branch
- deepseek: upgrade silent debug log to warning when injecting placeholder
  reasoning_content so callers are clearly notified of degraded multi-turn quality
- tests: update two blocked-deployment assertions to expect ServiceUnavailableError
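The 503-versus-429 distinction for a specifically addressed deployment can be sketched as follows. The exception classes and the `blocked` / `rate_limited` dict keys are hypothetical and do not reflect the router's actual data model.

```python
class ServiceUnavailableError(Exception):
    """Hypothetical stand-in: the deployment is administratively paused."""
    status_code = 503


class RateLimitError(Exception):
    """Hypothetical stand-in: a transient capacity limit."""
    status_code = 429


def check_addressed_deployment(deployment: dict) -> dict:
    """Validate a deployment the caller asked for by id."""
    if deployment.get("blocked"):
        # An admin pause does not clear on its own. Per the commit, a 429 here
        # misleads retry-enabled clients into retrying a paused model.
        raise ServiceUnavailableError(f"deployment {deployment['id']} is paused")
    if deployment.get("rate_limited"):
        # A transient limit: retrying with backoff can succeed.
        raise RateLimitError(f"deployment {deployment['id']} is rate limited")
    return deployment


def http_status(deployment: dict) -> int:
    try:
        check_addressed_deployment(deployment)
        return 200
    except (ServiceUnavailableError, RateLimitError) as exc:
        return exc.status_code


print(http_status({"id": "d1", "blocked": True}))       # 503
print(http_status({"id": "d2", "rate_limited": True}))  # 429
print(http_status({"id": "d3"}))                        # 200
```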

Co-authored-by: Cursor <cursoragent@cursor.com>

* fix: address bug detection findings (cache token order, mutable defaults)

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix: address bugs in async pass-through, anthropic cache token detection, rerank tests

- async_get_available_deployment_for_pass_through: enforce blocked check on specific deployments
- cost_calculator: detect anthropic-style usage by attribute presence (not truthiness) to avoid mixing OpenAI cached_tokens into anthropic normalization when read=0
- dashscope rerank tests: pass request to httpx.Response constructions for consistency
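The presence-versus-truthiness bug in the cost calculator can be reproduced with a small sketch. It assumes a usage object carries Anthropic's `cache_read_input_tokens` / `cache_creation_input_tokens` fields only when the provider returned them, and OpenAI's `prompt_tokens_details.cached_tokens` otherwise. The helper functions are hypothetical.

```python
from types import SimpleNamespace


def is_anthropic_style(usage) -> bool:
    # Presence check: Anthropic-shaped usage with zero cache reads still qualifies.
    return hasattr(usage, "cache_read_input_tokens") or hasattr(usage, "cache_creation_input_tokens")


def cached_tokens_presence(usage) -> int:
    if is_anthropic_style(usage):
        return getattr(usage, "cache_read_input_tokens", 0) or 0
    details = getattr(usage, "prompt_tokens_details", None)
    return getattr(details, "cached_tokens", 0) or 0


def cached_tokens_truthiness(usage) -> int:
    # Buggy variant: a read count of 0 is falsy, so it falls through to the OpenAI field.
    if getattr(usage, "cache_read_input_tokens", 0):
        return usage.cache_read_input_tokens
    details = getattr(usage, "prompt_tokens_details", None)
    return getattr(details, "cached_tokens", 0) or 0


mixed = SimpleNamespace(
    cache_read_input_tokens=0,
    cache_creation_input_tokens=120,
    prompt_tokens_details=SimpleNamespace(cached_tokens=64),
)
print(cached_tokens_presence(mixed))    # 0
print(cached_tokens_truthiness(mixed))  # 64
```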

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix code qa

* fix(vertex_ai/gemini): strip MIME parameters from GCS contentType

GCS object metadata's contentType field can include parameters such as
'text/html; charset=utf-8'. Strip them in _apply_gemini_mime_type_aliases
so downstream get_file_extension_from_mime_type sees a bare MIME type.
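Stripping MIME parameters comes down to keeping everything before the first `;`. `bare_mime_type` is a hypothetical helper, not the body of `_apply_gemini_mime_type_aliases`.

```python
def bare_mime_type(content_type: str) -> str:
    """Drop parameters such as '; charset=utf-8' from a Content-Type value."""
    # partition returns the whole string as element 0 when there is no ';'.
    return content_type.partition(";")[0].strip()


print(bare_mime_type("text/html; charset=utf-8"))  # text/html
print(bare_mime_type("application/pdf"))           # application/pdf
```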

Co-authored-by: Yassin Kortam <yassin@berri.ai>

* fix(vertex_ai/gemini): clarify mime-type error message string concatenation

Co-authored-by: Yassin Kortam <yassin@berri.ai>

---------

Co-authored-by: Tai An <antai12232931@outlook.com>
Co-authored-by: Vincent <yimao1231@gmail.com>
Co-authored-by: Kris Xia <xiajiayi0506@gmail.com>
Co-authored-by: d 🔹 <liusway405@gmail.com>
Co-authored-by: Fabrizio Cafolla <developer@fabriziocafolla.com>
Co-authored-by: Filippo Menghi <113345637+Cyberfilo@users.noreply.github.com>
Co-authored-by: Tom Denham <tom@tomdee.co.uk>
Co-authored-by: escon1004 <70471150+escon1004@users.noreply.github.com>
Co-authored-by: Divyansh Singhal <97736786+Divyansh8321@users.noreply.github.com>
Co-authored-by: robin-fiddler <robin@fiddler.ai>
Co-authored-by: Michael-RZ-Berri <michael@berri.ai>
Co-authored-by: Michael Riad Zaky <michaelr@Mac.localdomain>
Co-authored-by: ryan-crabbe-berri <ryan@berri.ai>
Co-authored-by: Krrish Dholakia <krrish+github@berri.ai>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: Yassin Kortam <yassin@berri.ai>
2026-05-18 16:27:44 -07:00
Yassin Kortam fa5eae8bc9 chore: remove legacy deployment artifacts and litellm-js packages (#27541)
- Remove litellm-js/proxy and litellm-js/spend-logs TypeScript packages that provided Cloudflare Worker proxy and Node.js spend logging services, as these are no longer maintained
- Remove deprecated Docker variants (Dockerfile.alpine, Dockerfile.dev, Dockerfile.custom_ui, Dockerfile.health_check, Dockerfile.ghcr_base) that have been superseded by the primary Dockerfile
- Remove legacy Kubernetes manifests (kub.yaml, service.yaml) from deploy/kubernetes in favor of the Helm chart
- Remove stale index.yaml Helm chart index pinned to an old version (v1.43.18)
- Remove dev_config.yaml development configuration file that contained hardcoded credentials and example endpoints
- Clean up ~3,500 lines of unused code and configuration to reduce repository maintenance burden

Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu>
2026-05-09 20:51:34 +00:00
Yassin Kortam b5d3a5fc85 feat: add read-replica routing for Prisma DB via DATABASE_URL_READ_REPLICA (#27493)
- Introduce RoutingPrismaWrapper that transparently routes read operations (find_*, count, group_by, query_raw, query_first) to a reader endpoint while writes remain on the writer, enabling Aurora-style reader/writer endpoint splits
- Add IAMEndpoint dataclass and parse_iam_endpoint_from_url() to capture static connection fields from a reader URL so only the IAM token needs to rotate, avoiding the need for separate DATABASE_HOST_READ_REPLICA/etc. env vars
- Enhance PrismaWrapper with per-instance knobs (db_url_env_var, iam_endpoint, recreate_uses_datasource, log_prefix) so writer and reader wrappers are independent: the reader writes its fresh URL to DATABASE_URL_READ_REPLICA and passes datasource override to Prisma since Prisma only auto-reads DATABASE_URL
- Fix deadlock in PrismaWrapper.__getattr__: when called from inside a running event loop, schedule the token refresh as a background task instead of blocking with run_coroutine_threadsafe + future.result(), which would deadlock the loop thread waiting for a coroutine that needs the loop to run
- Fix botocore crash when DATABASE_PORT is unset by defaulting to "5432" in both proxy_cli.py and PrismaWrapper.get_rds_iam_token(); passing None caused botocore to embed the literal string "None" in the presigned URL
- Implement graceful reader degradation: reader connect/recreate failures are non-fatal; wrapper sets _reader_unavailable=True and silently routes reads to the writer to keep the proxy serving traffic during transient reader outages
- Add PrismaClient.writer_db property so the reconnect smoke-test always validates the writer engine specifically; query_raw on the routing wrapper would route to the reader and not verify the newly-recreated writer
- Expose DATABASE_URL_READ_REPLICA in Helm chart (values.yaml + deployment.yaml) via both plain value and secret key reference, and document the field in docker-compose.yml
- Add 887-line test suite covering routing logic, IAM token refresh paths, reader degradation scenarios, datasource override behavior, and the deadlock regression
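The read/write split and the graceful reader degradation described above can be sketched as a `__getattr__`-based proxy. `RoutingClient`, `mark_reader_unavailable` and `FakeEngine` are hypothetical. The sketch ignores Prisma's model accessors, IAM token refresh and reconnect logic.

```python
from typing import Any, Optional

READ_NAMES = {"count", "group_by", "query_raw", "query_first"}


class RoutingClient:
    """Hypothetical proxy: reads go to the reader unless it is marked unavailable."""

    def __init__(self, writer: Any, reader: Optional[Any] = None) -> None:
        self._writer = writer
        self._reader = reader
        self._reader_unavailable = reader is None

    @staticmethod
    def _is_read(name: str) -> bool:
        return name.startswith("find_") or name in READ_NAMES

    def mark_reader_unavailable(self) -> None:
        # Reader failures are non-fatal: subsequent reads fall back to the writer.
        self._reader_unavailable = True

    def __getattr__(self, name: str) -> Any:
        # Only called for names not found on the proxy itself.
        use_reader = self._is_read(name) and not self._reader_unavailable
        return getattr(self._reader if use_reader else self._writer, name)


class FakeEngine:
    """Toy engine that reports which endpoint served a call."""

    def __init__(self, label: str) -> None:
        self.label = label

    def find_many(self) -> str:
        return f"find_many on {self.label}"

    def create(self) -> str:
        return f"create on {self.label}"


db = RoutingClient(writer=FakeEngine("writer"), reader=FakeEngine("reader"))
print(db.find_many())  # find_many on reader
print(db.create())     # create on writer
db.mark_reader_unavailable()
print(db.find_many())  # find_many on writer
```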

Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu>
2026-05-08 21:05:50 -07:00
Yassin Kortam 451ce161fc fix: remove separate health app 2026-05-07 16:04:56 -07:00
Yassin Kortam dbc8f5a937 helm: skip proxy startup prisma db push when migrations Job is enabled (#27200)
Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu>
2026-05-05 16:58:53 -07:00
Yassin Kortam 618df94433 helm: increase default probe timeouts, disable debug logging by default (#27237)
Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu>
2026-05-05 16:58:34 -07:00
CHANGE 87d7e86479 feat(helm): add tpl support to extraContainers and extraInitContainers
Wrap toYaml with tpl in deployment and migration job templates so
users can reference Helm values (e.g. {{ .Values.image.repository }})
inside extraContainers and extraInitContainers definitions.
2026-04-10 09:41:33 -04:00
Yuneng Jiang 5f63873dca [Infra] Pin all Docker build dependencies to exact versions
Pin every dependency across all Docker builds so upgrades are intentional.
Verified by building all 3 production images and diffing pip freeze against
known-good v1.83.0-nightly baselines — zero version drift.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 00:05:39 -07:00
Chesars 1be6b31e2f merge: resolve conflicts between main and litellm_oss_staging_03_11_2026 2026-03-12 09:38:31 -03:00
RJ Duffner 0c95d415e1 Add Ability To Set minReadySeconds From values Files (#23173)
* Add Ability To Set minReadySeconds From values Files

* typo

* uppercase Min as it comes after deployment

* Don't use defaults, just omit
2026-03-11 23:29:15 +05:30
Harshit28j 3127d79da8 feat: add strategy to deployment for helmchart 2026-03-10 05:49:46 +05:30
Sean Marsh Glover 4652c73259 feat(proxy): limit concurrent health checks with health_check_concurrency (#20584)
* staged first pass

* black

* Update litellm/proxy/health_check.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* simpler

* restore cached logo

* fix tests for perform_health_check max_concurrency arg

* implement pr suggestion

* and the helm chart

* add configurable resources and probes to the deployment in the helm chart

* more helm chart unittests

* move some background health check logging to debug

---------

Co-authored-by: Sean Glover <sglover@athenahealth.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
2026-02-24 08:16:59 -08:00
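The `health_check_concurrency` change above caps how many health checks run at once. One common way to do this in asyncio is a semaphore. The sketch below assumes that approach; `run_health_checks` is hypothetical and only borrows the `max_concurrency` argument name from the commit.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional


async def run_health_checks(
    checks: List[Callable[[], Awaitable[bool]]], max_concurrency: Optional[int] = None
) -> List[bool]:
    if not max_concurrency:
        # No limit configured: run everything at once.
        return list(await asyncio.gather(*(c() for c in checks)))
    sem = asyncio.Semaphore(max_concurrency)

    async def limited(check: Callable[[], Awaitable[bool]]) -> bool:
        async with sem:  # at most max_concurrency checks inside this block
            return await check()

    return list(await asyncio.gather(*(limited(c) for c in checks)))


async def demo():
    in_flight = 0
    peak = 0

    def make_check():
        async def check() -> bool:
            nonlocal in_flight, peak
            in_flight += 1
            peak = max(peak, in_flight)
            await asyncio.sleep(0.01)  # simulate a slow model endpoint
            in_flight -= 1
            return True
        return check

    results = await run_health_checks([make_check() for _ in range(10)], max_concurrency=3)
    return results, peak


results, peak = asyncio.run(demo())
print(len(results), peak)  # 10 3
```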
Cesar Garcia 622983cf89 fix(helm): add OCI annotations so GHCR shows helm pull instead of docker pull (#20617)
The Helm chart on GHCR displays a `docker pull` command instead of
the correct `helm pull oci://` command. This is because the OCI artifact
is missing the `org.opencontainers.image.source` annotation that GHCR
uses to identify and properly display Helm charts.

Changes:
- Add OCI annotations to Chart.yaml (source + url) which Helm 3.10+
  propagates to the OCI manifest on push
- Install explicit Helm v3.20.0 via azure/setup-helm@v4 for reproducible
  builds and proper OCI annotation support
- Remove deprecated HELM_EXPERIMENTAL_OCI env var (OCI is GA since Helm 3.8)
2026-02-12 19:58:16 +05:30
Pragya Sardana b4a27712a1 Add Init Containers in the community helm chart (#19816) 2026-01-27 18:10:47 -08:00
Harshit Jain 9084c1d1bd feat(helm): Enable PreStop hook configuration in values.yaml (#19613) 2026-01-22 19:28:52 -08:00
R.Sicart 608979c7e9 feat: add support for keda in helm chart (#19337)
* feat: add support for keda in helm chart

Signed-off-by: R.Sicart <roger.sicart@gmail.com>

* chore: bump chart version

---------

Signed-off-by: R.Sicart <roger.sicart@gmail.com>
2026-01-19 10:38:41 -08:00
Harshit Jain 3ad8fa5422 fix: mount config.yaml as single file in Helm chart (#19146) 2026-01-15 21:21:13 +05:30
Cesar Garcia 46dd420833 fix: sync Helm chart versioning with production standards and Docker versions (#18868)
* fix: sync Helm chart versioning with production standards and Docker versions

- Update Chart.yaml version from 0.4.10 to 1.0.0 (SemVer 0.x is for development, 1.0+ for production)
- Update appVersion from v1.50.2 to v1.80.12 to match current Docker image version
- Update workflow defaults from 0.1.0 to 1.0.0 for new chart version scheme
- Maintain independent chart versioning per Helm best practices

This ensures:
- Helm chart follows SemVer production standards (1.x instead of 0.x)
- appVersion stays synchronized with Docker/application version
- Chart version remains independent for flexibility (can update chart without waiting for app releases)

* fix: sync Helm chart appVersion with Docker image tags in release workflow

Updates the GitHub workflow to ensure Helm chart appVersion matches the
Docker image tags that are actually published:

- For stable/rc releases: Uses the workflow input tag (e.g., v1.80.12)
- For latest/dev releases: Uses the release_type to match main-{type} tags
- Makes 'tag' input required to prevent accidental releases with wrong versions
- Simplifies fallback logic by removing git-describe dependency

This ensures the chart's appVersion correctly references Docker images
that exist, preventing deployment failures from missing image tags.
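The workflow's appVersion selection, as described in this commit, can be restated in Python. `resolve_app_version` is hypothetical; the real logic lives in the release workflow. This reflects the tagging scheme at the time: later releases stopped publishing `main-` prefixed content tags.

```python
def resolve_app_version(release_type: str, tag: str) -> str:
    """Hypothetical restatement of the workflow's appVersion choice."""
    if not tag:
        # The commit makes the 'tag' input required.
        raise ValueError("tag input is required")
    if release_type in ("stable", "rc"):
        return tag
    # latest/dev releases were published under main-<release_type> tags.
    return f"main-{release_type}"


print(resolve_app_version("stable", "v1.80.12"))  # v1.80.12
print(resolve_app_version("dev", "v1.80.12"))     # main-dev
```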

* Update ghcr_deploy.yml
2026-01-12 17:04:59 +05:30
Alexsander Hamir 1544e8f971 feat: Add line_profiler support for performance analysis and fix Windows CRLF issues in Docker builds (#18773) 2026-01-07 11:36:57 -08:00
Mehmet Can Şakiroğlu a3503e59c2 Litellm feat helm lifecycle support (#18517)
* feat(helm): add lifecycle hook support for helm

* add tests
2026-01-04 00:22:50 +05:30
Krrish Dholakia 7c2478b70e docs: replace ghcr link with docker.litellm.ai 2025-12-16 08:35:45 +05:30
expruc 2d112fc8b2 add option to include additional resources to chart (#17627) 2025-12-07 23:25:57 -08:00
Lukas de Boer 3b8a6ec888 Helm Chart: Add possibility to override command, args and add deployment labels (#17535)
* Helm Chart: Add possibility to override command, args and also add deployment labels

* Helm Chart: Fix helm lint issue

* Helm Chart: Fix helm unit tests
2025-12-06 14:01:09 -08:00
Fabian Reinold c173a4a275 Helm Chart: add ingress-only labels (#17348)
* feat(helm): add ingress-only labels

* feat(helm): add ingress configuration tests

* chore(helm): bump chart version
2025-12-02 22:30:54 -08:00
Saar wintrov 777ef628d2 Enhancement(helm): ServiceMonitor template rendering (#17038)
* Metadata: fix 401 when audio/transcriptions

* check if str, CR fixes

* Added new helmchart functionality

* .

* .

* adding new tests
2025-11-24 20:53:02 -08:00
tushar8408 5f94b372f8 Migration job labels (#16831)
* Add dynamic pod labels and annotations to migrations job

* Bump chart version to 0.4.8
2025-11-19 09:53:21 -08:00
YutaSaito 645f84c02e fix: add imagePullSecrets to migrations-job (#15681) 2025-10-18 13:56:31 -07:00
Krish Dholakia cf3c18a420 Merge pull request #13855 from edify42/allow-no-db-url
feat(helm): Allow no DATABASE_URL to be set on migration job to keep the behaviour same as deployment
2025-09-06 22:02:01 -07:00
Abhinav b6c26c3365 helm(chart): add optional PodDisruptionBudget for litellm proxy (#14062) (#14093) 2025-09-01 12:21:44 -07:00
Const-antine f8d1e03450 rework tests 2025-08-28 13:39:09 -04:00
Const-antine 1350336515 fix tests 2025-08-28 13:30:11 -04:00
Const-antine d3b526041f better formatting 2025-08-28 13:18:36 -04:00
Const-antine 730e9c90a2 fix formatting 2025-08-28 13:18:33 -04:00
Const-antine 5d973ea06e update readme 2025-08-28 13:18:26 -04:00
Const-antine 409429ddd6 add new tests 2025-08-28 13:18:23 -04:00
Const-antine ff4040bbe1 add functionality to mount existing configmap if needed 2025-08-28 13:18:05 -04:00
Jugal D. Bhatt d63f5f99e9 Enhance database configuration: add support for optional endpointKey in values.yaml and update deployment/migrations job templates to conditionally source DATABASE_HOST from the secret if endpointKey is set. (#13763) 2025-08-21 14:58:50 -07:00
Ishaan Jaff f498cf4901 Fix - Ensure Helm chart auto generated master keys follow sk-xxxx format (#13871)
* docs - master key

* fix - auto generate sk-xxx prefixed key

* test master key fix

* fix master key gen
2025-08-21 14:34:21 -07:00
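The master key fix above ensures generated keys carry the `sk-` prefix. The chart generates the key in its templates. The sketch below only illustrates the resulting format in Python with the standard `secrets` module. The alphanumeric alphabet and the 32-character length are assumptions, not the chart's actual settings.

```python
import secrets
import string


def generate_master_key(length: int = 32) -> str:
    """Hypothetical generator for an sk- prefixed random key."""
    alphabet = string.ascii_letters + string.digits
    # secrets.choice draws from a cryptographically secure source.
    return "sk-" + "".join(secrets.choice(alphabet) for _ in range(length))


key = generate_master_key()
print(key[:3], len(key))  # sk- 35
```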
Ed Kim c88a13c58b add unit test which confirms the removal of DATABASE_URL
Signed-off-by: Ed Kim <edward.kim@lendi.com.au>
2025-08-21 21:08:18 +10:00
edward kim 418b70b38e fixes
Signed-off-by: edward kim <edward.kim@lendi.com.au>
2025-08-21 17:44:54 +10:00
edward kim 2bd3daa742 fixes the mounting of this only when deployStandalone is true
Signed-off-by: edward kim <edward.kim@lendi.com.au>
2025-08-21 17:39:31 +10:00
Mattias Andersson 89f71af4cd Add possibility to configure resources for migrations-job in Helm chart 2025-08-14 17:08:26 +02:00
unique-jakub f58807ff6e Add labels to migrations job template (#13343)
* set labels on the migration job

* update comment to retrigger the pipeline
2025-08-07 09:41:24 -07:00
Jugal D. Bhatt 7cf3b4682a [Separate Health App] Update Helm Deployment.yaml (#13162)
* add helm deployment fix

* clean deployment
2025-08-01 16:50:23 -07:00
unique-jakub 3edb71e617 allow helm hooks for migrations job (#13174) 2025-07-31 21:51:07 -07:00
Marvin Huetter d23a6e3ea4 fix: best practices suggest setting this to true (#12809)
The order of the specification matters here: Kubernetes treats the last value as the effective one. Move it down to make sure the schema update is done by the migration job.
2025-07-29 15:40:12 -07:00
Anton f05ec34e11 feat: Add envVars and extraEnvVars support to Helm migrations job (#12591)
- Add support for envVars (simple key-value pairs) in migrations job
- Add support for extraEnvVars (complex environment variable configurations)
- Include comprehensive test coverage for both envVars and extraEnvVars
- Ensure backward compatibility with existing configurations
- Tests verify proper rendering of environment variables in container spec
2025-07-14 22:24:13 -07:00
Victor Krylov 1d58fc5429 Add deployment annotations (#11849)
* Add deployment annotations

* Correct the indent and simplify the case with no annotations
2025-06-19 20:11:31 -07:00
Steven Aldinger b8bdf98a4b feat(helm): [BerriAI/litellm#11648] support extraContainers in migrations-job.yaml (#11649) 2025-06-11 23:16:06 -07:00