litellm

mirror of https://github.com/tiennm99/litellm.git synced 2026-06-17 18:48:36 +00:00

Author	SHA1	Message	Date
Yassin Kortam	3a1c6bba97	feat(proxy): native /health/drain preStop hook for graceful shutdown (#29439)	2026-06-02 16:30:44 -07:00
Sameer Kankute	d52fbfb458	Litellm oss staging 250526 (#28770 ) * fix(mcp): handle OAuth IdP error responses in /callback (LIT-2750) Per RFC 6749 section 4.1.2.1, when the IdP rejects an OAuth authorization request it redirects back to the client with ?error=...&error_description=... and no code. The MCP /callback handler declared code and state as required query params, so FastAPI rejected such error responses with a 422 before the handler ran -- stranding the MCP client waiting on the loopback. This change: - Makes code and state optional and accepts the RFC-defined error, error_description, and error_uri params. - When state decodes to a trusted client redirect_uri, propagates the error params back to that URI with the client's original (un-wrapped) state preserved, so the client's OAuth library can surface the failure. - When state is missing/undecryptable or the encoded redirect_uri is no longer trusted, renders a 400 HTML page with the (HTML-escaped) error details instead of leaking to an attacker-controlled redirect. - Preserves the existing success path (code + state -> 302 to validated client redirect_uri with original state). Fixes LIT-2750. * test(mcp): regression tests for /callback handling IdP error responses (LIT-2750) Adds a new test module covering the LIT-2750 fix: the MCP OAuth /callback endpoint must accept IdP error responses (e.g. ?error=access_denied) per RFC 6749 section 4.1.2.1 instead of returning a 422 because ``code`` is missing. Coverage: - IdP error with no state -> 400 HTML page surfacing the error. - HTML escaping of user-controlled error / error_description fields. - IdP error with a trusted (loopback) state -> 302 propagating error / error_description / original client state to the client. - IdP error with an untrusted redirect_uri encoded in state -> 400 inline (no open-redirect to attacker-controlled origin). - IdP error with an undecryptable state -> 400 HTML fallback. 
- Bare GET /callback with no params -> 400 HTML (not Pydantic 422). - Success path (code + state) still 302 to validated client redirect_uri with the original (un-wrapped) state preserved. * refactor(mcp): drop unused _OAUTH_ERROR_PARAMS constant (Greptile P2) The tuple was leftover scaffolding from an earlier draft of the LIT-2750 fix; nothing references it. The explanatory RFC 6749 §4.1.2.1 comment block above the callback handler covers the same intent. * fix(mcp/oauth): preserve empty original_state and clarify missing-param error in /callback Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(thinking): handle None thinking param in is_thinking_enabled (#28598) Squash-merged by litellm-agent from Terrajlz's PR. * feat(helm): support tpl rendering in podAnnotations (#28609) Squash-merged by litellm-agent from devauxbr's PR. * fix: apply black formatting to base_llm chat transformation Fix CI black --check failure on is_thinking_enabled return formatting. Co-authored-by: Cursor <cursoragent@cursor.com> * merge main (#28836) * fix(proxy): Bedrock Knowledge Base pass-through: preserve SigV4 headers and signed request body (#27526) * Fix Bedrock KB pass-through SigV4 headers and signed body Coerce botocore HeadersDict to a dict for pass-through routes. When forward_headers is true, drop request headers that collide case-insensitively with signed headers so client Bearer auth does not shadow AWS SigV4. Send prepped.body as raw content so the outbound payload matches the signature after logging hooks mutate the parsed dict. Co-authored-by: Cursor <cursoragent@cursor.com> * Simplify pass-through raw body handling Read the SigV4-signed bytes directly from request.state inside pass_through_request instead of threading a custom_raw_body argument through three functions. Helper methods are restored to their original signatures, and the new branch lives in one place at each httpx call site. 
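The header-collision rule from the Bedrock KB pass-through fix above (drop forwarded client headers that collide case-insensitively with SigV4-signed headers) can be sketched roughly as follows. This is a minimal illustration, not the proxy's actual code: the function name `merge_forwarded_headers` and its signature are hypothetical, and the signed headers are assumed to arrive as any mapping that can be coerced to a plain dict.

```python
from typing import Dict, Mapping


def merge_forwarded_headers(
    client_headers: Mapping[str, str],
    signed_headers: Mapping[str, str],
) -> Dict[str, str]:
    """Hypothetical helper: signed headers always win over client headers."""
    # Coerce the signed headers (e.g. a botocore header mapping) to a plain dict.
    signed = dict(signed_headers)
    signed_names = {name.lower() for name in signed}

    # Keep only client headers whose names do not collide, ignoring case,
    # with a signed header; a client "authorization" must not shadow SigV4.
    merged = {
        name: value
        for name, value in client_headers.items()
        if name.lower() not in signed_names
    }
    merged.update(signed)
    return merged


client = {"authorization": "Bearer sk-123", "x-trace": "1"}
signed = {"Authorization": "AWS4-HMAC-SHA256 ...", "X-Amz-Date": "20260101T000000Z"}
result = merge_forwarded_headers(client, signed)
```

In this sketch the client's lowercase `authorization` header is dropped, so only the signed `Authorization` value reaches the upstream request.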
Co-authored-by: Cursor <cursoragent@cursor.com> * Harden pass-through raw body read from request.state Guard missing request.state (test fixtures) and ignore non-bytes/str values so MagicMock does not trigger the SigV4 raw-body path. Co-authored-by: Cursor <cursoragent@cursor.com> * Test pass_through_request state_raw_body uses httpx content= Cover non-streaming (async_client.request) and streaming (build_request) paths so SigV4 bytes on request.state are not replaced by json= of a hook-mutated dict. Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: Cursor <cursoragent@cursor.com> * chore(tests): migrate Bedrock CI to AWS account 941277531214 (#28728) * chore(tests): migrate Bedrock CI from AWS account 888602223428 to 941277531214 The original account (888602223428) was put under a security restriction by AWS after a root access key leaked in a PR comment. While that account works its way through the AWS Support unlock process, Bedrock-touching CI tests have been migrated to a fresh account (941277531214). Changes: - Replace 26 hardcoded references to 888602223428 with 941277531214 across 8 files (provisioned-model ARNs, imported-model ARNs, AgentCore runtime ARNs, batch execution role ARN, and example proxy config). - The provisioned-model and imported-model ARNs are referenced only from mocked unit tests — no AWS resources to recreate. - The batch execution IAM role has been recreated in the new account with the same name and equivalent permissions. - The two AgentCore runtimes (hosted_agent_r9jvp-3ySZuRHjLC, hosted_agent_13sf6-cALnp38iZD) are being recreated in the new account under the same names — see tools/agentcore-deploy/ in a follow-up. CircleCI env vars AWS_ACCESS_KEY_ID / AWS_SECRET_ACCESS_KEY / AWS_REGION_NAME were updated separately via the CircleCI API to point at the new account. 
Smoke-tested locally against the new account: aws bedrock-runtime converse --region us-west-2 \ --model-id us.anthropic.claude-sonnet-4-5-20250929-v1:0 \ --messages '[{"role":"user","content":[{"text":"ping"}]}]' → 200, model returned 'pong' Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * chore(tests): refresh AgentCore ARN suffixes to match newly-deployed runtimes The first migration commit replaced just the account ID, but AgentCore auto-assigns a random 10-char suffix to every runtime on creation — we can't reuse the original suffixes (`3ySZuRHjLC`, `cALnp38iZD`) in the new account. Updated the AgentCore-runtime ARNs in the three files that reference real runtime IDs (not the mock-based unit-test ARNs). Deployed runtimes: arn:aws:bedrock-agentcore:us-west-2:941277531214:runtime/hosted_agent_r9jvp-Rq79QFC2fp arn:aws:bedrock-agentcore:us-west-2:941277531214:runtime/hosted_agent_13sf6-4046UzHSwy Both runtimes are status=READY and pass a smoke invoke: $ aws bedrock-agentcore invoke-agent-runtime --agent-runtime-arn ... --payload '{"prompt":"ping"}' → 200, {"result": "echo: ping"} The agent is a minimal echo (see /tmp/agentcore_deploy/agent.py for the deploy artifacts). Tests that only verify the SDK wiring will pass; if any test asserts on agent output content, swap the echo for the real agent. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> * chore(tests): point Bedrock batch tests at new-account S3 bucket The account migration (888602223428 -> 941277531214) was a flat account-ID swap, which only rewrites ARNs that embed the account number. S3 bucket names carry no account ID, so the live Bedrock batch tests still uploaded to `litellm-proxy` — a bucket that lives in the old account. S3 names are globally unique, and the old account still holds that name, so it can't be recreated in the new account. Rename to `litellm-proxy-941277531214` (account-ID suffix guarantees global uniqueness). 
The bucket must be created in 941277531214 and the batch execution role granted s3:GetObject/PutObject/ListBucket on it before this job is run in CI. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(tests): point live S3 logging test at new-account bucket Same account-ID-free blind spot as the batch bucket: `load-testing-oct` lives in the old account and its name can't be reused globally. The `logging_testing` CI job is wired into the workflow and runs test_basic_s3_logging, which uploads to this bucket with the CI env creds, then lists and deletes objects — a live dependency. Rename to `load-testing-oct-941277531214`. The bucket must exist in the new account with the CI IAM principal granted s3:PutObject/GetObject/ListBucket/DeleteObject before this job runs. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * chore(tests): repoint Bedrock guardrail IDs to new-account guardrails The migration left guardrail IDs untouched (no account ID in them), so all live guardrail tests failed with "guardrail identifier or version does not exist" against 941277531214. Recreated both guardrails in the new account and updated the hardcoded IDs: - wf0hkdb5x07f -> zgkmukebruil (PII mask: PHONE + CREDIT_DEBIT_CARD, with explicit inputAction=ANONYMIZE so masking applies to INPUT, which is the source litellm's moderation hook sends) - ff6ujrregl1q -> 4w3d1di3snt5 (blocks "coffee"; blocked message set to the exact string the tests assert on) Updated test_bedrock_guardrails.py, otel_test_config.yaml, and the guardrailConfig in test_bedrock_completion.py. Verified locally: the 5 previously-failing guardrail tests now pass. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(bedrock): migrate legacy models to current inference profiles The new CI account (941277531214) cannot invoke legacy Bedrock models (AWS gates them: "marked by provider as Legacy... not actively using in the last 30 days"). 
Migrated the live-call tests: - anthropic.claude-3-sonnet-20240229 -> us.anthropic.claude-sonnet-4-5-20250929-v1:0 - anthropic.claude-3-haiku-20240307 -> us.anthropic.claude-haiku-4-5-20251001-v1:0 Current Claude models on Bedrock require the us. inference-profile prefix (bare on-demand ids are rejected). cohere.command-r-plus has no working replacement (all Cohere is legacy- gated in the new account): swapped to claude-haiku-4-5 in provider- agnostic param lists. amazon.titan-image-generator skipped (no working replacement). Mocked/transformation/cost tests that reference the legacy strings are intentionally left unchanged. Verified live against the new account. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(bedrock): repoint SageMaker + Knowledge Base to new-account resources These referenced account-scoped resources by hardcoded id that only existed in the old account, so the migration's account-ID swap missed them. Recreated in 941277531214 and repointed: - SageMaker endpoint jumpstart-dft-hf-textgeneration1-mp-20240815-185614 -> litellm-ci-textgen (gpt2 on a TGI container, ml.g5.xlarge) - Bedrock Knowledge Base T37J8R4WTM -> LCYXFBR2TU (OpenSearch Serverless vector store + titan-embed-text-v2, seeded with a LiteLLM doc) Verified live: test_sagemaker.py (12 passed) and test_bedrock_knowledgebase_hook.py (12 passed). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(reasoning_effort_grid): skip bedrock claude-opus-4-7 cells (not entitled on 941277531214) claude-opus-4-7 is listed in the new Bedrock CI account's foundation models but invoke is denied (AccessDeniedException: "not available for this account"). Bedrock access to the flagship Opus requires an AWS Sales request, not the self-serve model-access toggle, so it can't be enabled inline with the rest of the account migration. 
Add an optional `skip_reason` to ModelEntry and set it on the bedrock-claude-opus-4-7 entry; the grid test honors it via pytest.skip. Cell count (231) and route coverage are unchanged, so the structural asserts still pass. Restore coverage by deleting the one skip_reason line once access is granted. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * test(bedrock): swap/skip legacy-gated models unavailable on new CI account The migrated AWS account (941277531214) cannot access several models that the old account could, so the remaining red CI jobs were hitting real Bedrock "Access denied / Legacy" and "account not authorized" errors: - image_gen: skip both Nova Canvas test classes (amazon.nova-canvas-v1:0 is legacy-gated), matching the existing titan skip. - batches: skip test_async_file_and_batch (Bedrock batch inference is not authorized on the new account; requires an AWS support case). - litellm_overhead: swap legacy claude-3-5-haiku for the active us.anthropic.claude-haiku-4-5 inference profile. - test_completion_claude_3_function_call: swap legacy claude-3-sonnet for the active us.anthropic.claude-sonnet-4-5 inference profile. https://claude.ai/code/session_01Y7zgHYu9GX29YRwV4yiWAa * test(bedrock): fix remaining e2e legacy-model + batch failures on new CI account - e2e_openai_endpoints: skip test_bedrock_batches_api (Bedrock batch inference is not authorized on account 941277531214) and migrate the missed s3_bucket_name in oai_misc_config.yaml to litellm-proxy-941277531214. - build_and_test: swap legacy bedrock claude-3-sonnet for the active us.anthropic.claude-sonnet-4-5 inference profile in the proxy structured output e2e test. 
https://claude.ai/code/session_01Y7zgHYu9GX29YRwV4yiWAa * test(bedrock): make opus-4-7 + batch cells fail loudly and mock image-gen (#28791) Replace the silent skips added for the new CI account with noisier behavior: - reasoning-effort grid: opus-4-7 cells now fail (when AWS creds are present) instead of skipping, so the missing entitlement stays visible in CI; they still skip when AWS creds are absent (local dev) - Bedrock batch inference tests: drop the skip so they run and fail until batch access is granted - Titan + Nova Canvas image-gen tests: mock the Bedrock HTTP call so the transform + cost-tracking path stays under test without live model access https://claude.ai/code/session_01MT7SWDnXUjv6e6EPG7BDjT Co-authored-by: Claude <noreply@anthropic.com> * test(bedrock): use pytest.xfail for known-failing opus-4-7 cells Replace pytest.fail with pytest.xfail when a model has a fail_reason, so known-broken cells stay visible as XFAIL without keeping CI red. Co-authored-by: Yassin Kortam <yassin@berri.ai> --------- Co-authored-by: Mateo <mateo@Mateos-MacBook-Pro.local> Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com> Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(otel): export SERVER span on management-endpoint success without http_request (#28794) Co-authored-by: Yassin Kortam <yassinkortam@Yassins-MacBook-Pro.local> * chore(ci): merge dev branch (#28801) * chore(proxy): route path-dependent call sites through get_request_route Replace direct ``request.url.path`` reads in auth, ACL, routing, and audit-log decisions with ``get_request_route(request)`` — the helper already added in ``auth/auth_utils.py`` that returns the ASGI ``scope["path"]`` with ``root_path`` stripped. 
Starlette reconstructs ``url.path`` from the Host header; ``scope["path"]`` is uvicorn's parse of the request line and matches what FastAPI dispatches on, so it's the authoritative route for any decision that should agree with the actual handler. Sites: - _experimental/mcp_server/auth/user_api_key_auth_mcp.py - management_endpoints/mcp_management_endpoints.py - vector_store_endpoints/utils.py - pass_through_endpoints/pass_through_endpoints.py - auth/route_checks.py - litellm_pre_call_utils.py - spend_tracking/spend_management_endpoints.py - common_utils/http_parsing_utils.py - management_helpers/utils.py - health_endpoints/_health_endpoints.py Adds regression tests in tests/proxy_unit_tests/test_proxy_routes.py that construct a Request with scope["path"] set to a benign route and the Host header crafted so url.path would resolve differently; each site's decision is asserted against scope["path"]. * chore(proxy): make get_request_route imports lazy at call sites Move the ``from litellm.proxy.auth.auth_utils import get_request_route`` imports added in the prior commit back to the function bodies that use them. The module-level form participates in a long-standing import cycle through ``auth_utils -> _types -> ...`` and was flagged by CodeQL on the PR; the lazy form matches the pattern the proxy already uses for ``user_api_key_auth`` and related helpers elsewhere in these files. Also drop the ``RouteChecks._is_assistants_api_request`` delegation in ``_get_metadata_variable_name`` introduced in the prior commit — the delegation pulled ``RouteChecks`` into the same cycle, and the call site reuses the resolved route for its other branches, so inlining the substring check is both cycle-free and avoids a redundant second ``get_request_route`` call. Comment in test_proxy_routes.py acknowledges that the two MCP table entries exercise ``get_request_route`` directly rather than the full production handler (which needs ASGI scope + MCP state to invoke). 
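As a rough illustration of the idea behind ``get_request_route`` (read the ASGI ``scope["path"]`` and strip ``root_path``), the sketch below works on a plain scope dict. It is an assumption-laden stand-in, not the helper in ``auth/auth_utils.py``: the name ``resolve_route`` is hypothetical, and it assumes the server leaves the ``root_path`` prefix in ``scope["path"]``.

```python
from typing import Any, Mapping


def resolve_route(scope: Mapping[str, Any]) -> str:
    """Hypothetical sketch: return scope["path"] with root_path stripped."""
    path = scope.get("path", "")
    root = (scope.get("root_path") or "").rstrip("/")

    # Strip the prefix only on a path-segment boundary, so a root_path of
    # "/proxy" does not eat the start of "/proxyfoo/...".
    if root and (path == root or path.startswith(root + "/")):
        return path[len(root):] or "/"
    return path


route = resolve_route({"path": "/proxy/v1/chat/completions", "root_path": "/proxy"})
```

Because the function never consults the Host header, a crafted Host value cannot change the route that auth and ACL decisions see.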
--------- Co-authored-by: shin-berri <shin-laptop@berri.ai> Co-authored-by: user <70670632+stuxf@users.noreply.github.com> * chore(ci): merge dev branch (#28657) * feat(dashboard): navbar hierarchy + Agent Platform notifications (#27543) * feat(dashboard): refine navbar zones and Agent Platform notice Restructure the admin navbar for production users: clear product vs community vs personal columns with vertical dividers, icon-only Slack/GitHub in a shared chip, and Docs/Blog typography aligned on an 8px rhythm. Add a notifications bell with popover linking to the LiteLLM Agent Platform repo and optional mark-as-read persistence. Promote the account control with initials avatar, single-line display name, and navDisplayName mapping for placeholder user ids (e.g. default_user_id). Co-authored-by: Cursor <cursoragent@cursor.com> * fix(dashboard): address PR review — AntD buttons, public page guard, dedupe regex - Replace raw <button> with AntD Button in BlogDropdown, NotificationsBell, UserDropdown, and test mock - Guard NotificationsBell + container behind !isPublicPage to avoid rendering on public pages - Remove redundant equality checks in navDisplayName (regex already covers them) - Remove unused `lower` variable after simplification Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: yuneng-jiang <yuneng@berri.ai> * fix(dashboard): drop dead useHealthReadiness import in navbar The module was removed in #27896 (replaced by useHealthReadinessDetails), but the import survived the rebase. The symbol is unused — only useHealthReadinessDetails is consumed in the file. Removing the dead import unblocks the UI TypeScript build. 
* fix(dashboard): align CommunityEngagementButtons test with icon-only aria-labels The component was refactored to an icon-only chip with aria-label='LiteLLM on GitHub' (squash #27543), but the test still asserted /star us on github/i. Update the query to match the rendered accessible name. * refactor(dashboard): drop unused props from NavbarProps The navbar refactor moved user identity + dark-mode state to internal hooks (useAuthorized, useWorker), but the NavbarProps interface still declared userID, userEmail, userRole, premiumUser, isDarkMode, and toggleDarkMode as required, forcing every caller to thread them through. Drop them from the interface and all four call sites (page.tsx, (dashboard)/layout.tsx, public_model_hub.tsx, navbar.test.tsx). Also shrinks the destructure in layout.tsx so the now-unused locals stop being pulled out of useAuthorized(). * refactor(dashboard): use useSyncExternalStore for NotificationsBell dismiss flag Reads/writes of the litellmHideAgentPlatformBanner key were done directly inside NotificationsBell via a useEffect + useState pair. Every other localStorage-backed flag in the dashboard (Disable ShowPrompts, DisableBouncingIcon, DisableShowNewBadge, DisableUsageIndicator, DisableBlogPosts) is wrapped in a useSyncExternalStore hook over localStorageUtils so all mounted components stay in sync. Extract useHideAgentPlatformBanner to follow the same shape, swap NotificationsBell to consume it, and add a regression test that two sibling bells stay in sync without a remount when one is dismissed. * refactor: mask credential fields in proxy settings GET responses (#28682) * refactor: mask credential fields in proxy settings GET responses Brings SSO settings, cache settings, and the email/Slack alerting view in /get/config/callbacks in line with the HashiCorp Vault config-override pattern, so persisted credentials are not transported back to the UI in plaintext. 
* refactor: harden short-value masking and hoist alerting var constant Closes two review observations: - mask_sensitive_keys now replaces short values (below the visible prefix+suffix length) with an all-mask string instead of returning them unchanged, so a 1-7 character credential is no longer round-tripped verbatim. - _ALERTING_SENSITIVE_VARS is moved out of get_config() to a module-level constant, matching the analogous _SSO_SENSITIVE_FIELDS and _CACHE_SENSITIVE_FIELDS in the SSO and cache endpoint files. --------- Co-authored-by: Krrish Dholakia <krrish+github@berri.ai> Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> * fix(ui): show 2-decimal precision for max_budget on key overview (#28809) The Key Info Overview tab's Spend card truncated sub-dollar budgets to "$0" because formatNumberWithCommas defaults to 0 decimals. The Settings tab passes 2; align the overview so a $0.10 budget renders as "$0.10". Resolves LIT-2845 * feat(proxy): allow `llm_api_routes` virtual keys to list MCP servers (#28442) * feat(proxy): allow llm_api_routes virtual keys to list MCP servers Add a new `mcp_discovery_routes` group (GET /v1/mcp/server and GET /v1/mcp/server/{server_id}) and include it in `llm_api_routes` so that virtual keys configured with `allowed_routes=["llm_api_routes"]` can discover the MCP servers they have access to. Previously these calls failed with 'Virtual key is not allowed to call this route. Only allowed to call routes: [llm_api_routes]'. The GET handlers already sanitize the response for restricted virtual keys via `_sanitize_mcp_server_list_for_virtual_key`, stripping credential-bearing fields (url, headers, env). Write methods (POST/PUT/DELETE) on the same paths remain gated by the existing handler-level admin role checks. 
The new discovery list is intentionally kept OUT of `mcp_inference_routes`, so `is_llm_api_route()` still returns False for these paths — this preserves the existing contract that DISABLE_LLM_API_ENDPOINTS must not block the Admin UI from listing MCP servers. Co-authored-by: ryan-crabbe-berri <ryan-crabbe-berri@users.noreply.github.com> * refactor(proxy): make MCP discovery carve-out method-aware Replace the `mcp_discovery_routes` group in `llm_api_routes` with a method-aware special case inside `is_virtual_key_allowed_to_call_route`. Virtual keys with allowed_routes=["llm_api_routes"] are now permitted to call only GET /v1/mcp/server and GET /v1/mcp/server/{server_id} — non-GET methods and multi-segment admin sub-paths fall through to the existing 403. This keeps the general llm_api_routes list free of management paths and avoids accidentally exposing POST/PUT/DELETE writes through the route-check layer. --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: ryan-crabbe-berri <ryan-crabbe-berri@users.noreply.github.com> * chore(ci): merge dev branch (#28807) * chore(proxy): route path-dependent call sites through get_request_route Replace direct ``request.url.path`` reads in auth, ACL, routing, and audit-log decisions with ``get_request_route(request)`` — the helper already added in ``auth/auth_utils.py`` that returns the ASGI ``scope["path"]`` with ``root_path`` stripped. Starlette reconstructs ``url.path`` from the Host header; ``scope["path"]`` is uvicorn's parse of the request line and matches what FastAPI dispatches on, so it's the authoritative route for any decision that should agree with the actual handler. 
Sites: - _experimental/mcp_server/auth/user_api_key_auth_mcp.py - management_endpoints/mcp_management_endpoints.py - vector_store_endpoints/utils.py - pass_through_endpoints/pass_through_endpoints.py - auth/route_checks.py - litellm_pre_call_utils.py - spend_tracking/spend_management_endpoints.py - common_utils/http_parsing_utils.py - management_helpers/utils.py - health_endpoints/_health_endpoints.py Adds regression tests in tests/proxy_unit_tests/test_proxy_routes.py that construct a Request with scope["path"] set to a benign route and the Host header crafted so url.path would resolve differently; each site's decision is asserted against scope["path"]. * chore(proxy): make get_request_route imports lazy at call sites Move the ``from litellm.proxy.auth.auth_utils import get_request_route`` imports added in the prior commit back to the function bodies that use them. The module-level form participates in a long-standing import cycle through ``auth_utils -> _types -> ...`` and was flagged by CodeQL on the PR; the lazy form matches the pattern the proxy already uses for ``user_api_key_auth`` and related helpers elsewhere in these files. Also drop the ``RouteChecks._is_assistants_api_request`` delegation in ``_get_metadata_variable_name`` introduced in the prior commit — the delegation pulled ``RouteChecks`` into the same cycle, and the call site reuses the resolved route for its other branches, so inlining the substring check is both cycle-free and avoids a redundant second ``get_request_route`` call. Comment in test_proxy_routes.py acknowledges that the two MCP table entries exercise ``get_request_route`` directly rather than the full production handler (which needs ASGI scope + MCP state to invoke). 
--------- Co-authored-by: shin-berri <shin-laptop@berri.ai> Co-authored-by: user <70670632+stuxf@users.noreply.github.com> * fix(team): keep team_alias cache in sync on _cache_team_object writes (#28737) * fix(team): keep team_alias cache in sync on _cache_team_object writes _cache_team_object wrote only to the team_id:<id> cache key, but the JWT auth path that uses team_alias_jwt_field reads from a separate team_alias:<alias> key (get_team_object_by_alias caches under both keys on miss, but reads only the alias-keyed one). After any team-mutation endpoint (team_model_add, team_model_delete, update_team, the two access-group writes) the team_id cache was refreshed but the team_alias cache stayed stale until TTL — JWT callers using team_alias_jwt_field kept seeing the pre-mutation team for the full cache window. Mirror the write under the alias key inside _cache_team_object so every existing caller stays in sync without further changes. Skip the alias write when team_alias is None/empty so we don't collide across alias-less teams. Surfaced testing the LIT-3244 cherry-pick on patch/1.86.0: the LIT-3244 fix correctly invalidated the team_id cache but the customer's JWT used team_alias_jwt_field, so they kept hitting the stale alias-keyed entry. * fix(team): delete (not overwrite) team_alias cache on _cache_team_object The prior shape of this PR wrote both team_id:<id> AND team_alias:<alias> from _cache_team_object. team_alias is NOT unique in the schema (no @unique on LiteLLM_TeamTable.team_alias), and get_team_object_by_alias enforces uniqueness on its own DB-fetch path (len(teams) > 1 raises). Writing the alias-keyed cache from the generic refresh path bypassed that check: a team admin renaming their team to collide with another team's alias could silently overwrite the cached team for JWT-by-alias auth, swapping the resolved team under that alias for the cache window. 
Switch the alias-keyed operation from a write to a delete (mirroring the dual-cache delete pattern in _delete_cache_key_object). After every team write, the next JWT-by-alias reader cache-misses and falls through to get_team_object_by_alias, which (a) re-fetches the fresh team from DB, closing the LIT-3244 staleness gap that motivated this PR, and (b) enforces alias uniqueness before populating either cache key. team_id:<id> writes are unchanged: team_id is the table PK and is guaranteed unique. Surfaced in veria-ai review on #28739. * fix(managed-files): anchor model_id regex so it doesn't match llm_output_file_model_id extract_model_id_from_unified_id used `re.search(r"model_id,([^;]+)", ...)` which substring-matches the `model_id,` inside the file-ID encoding's `llm_output_file_model_id,<deployment_uuid>` field. parse_unified_id then fed that deployment UUID back into the auth path as a model candidate via _extract_models_from_managed_resource_id, and every team-BYOK file attach 403'd with: team not allowed to access model. This team can only access models=['openai/']. Tried to access <deployment-uuid> The team's models list correctly contains the public name (`openai/`) that target_model_names matches, but the bogus UUID candidate fails the wildcard check first. Anchor the regex to a field boundary (`(?:^|;)model_id,`) so it matches the legitimate top-level `model_id,<value>` field on vector_store unified IDs and skips substring matches inside other fields. File-IDs (which have no top-level `model_id` field) now return None and contribute no spurious UUID candidate. Surfaced reproducing LIT-3244 on patch/1.86.0 with the customer's exact flow: team with openai/* BYOK deployment, JWT-scoped user, POST /v1/vector_stores/{id}/files attaching a file uploaded with target_model_names=openai/gpt-4o. 
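The difference between the unanchored and the field-boundary-anchored pattern can be seen in a small sketch. The two regexes are the ones quoted in the commit message; the decoded ID strings and the helper name `extract_model_id` are made up for illustration and only assume the `key,value;key,value` field layout implied above.

```python
import re
from typing import Optional

# Pattern before the fix: matches "model_id," anywhere, including inside
# the longer field name "llm_output_file_model_id,".
UNANCHORED = re.compile(r"model_id,([^;]+)")
# Pattern after the fix: "model_id," must start the string or follow ";".
ANCHORED = re.compile(r"(?:^|;)model_id,([^;]+)")


def extract_model_id(decoded_id: str, pattern: re.Pattern = ANCHORED) -> Optional[str]:
    """Hypothetical helper: return the model_id field value, or None."""
    match = pattern.search(decoded_id)
    return match.group(1) if match else None


# Illustrative decoded IDs (not real LiteLLM values).
file_id = (
    "litellm_proxy:application/json;unified_id,abc;"
    "target_model_names,openai/gpt-4o;llm_output_file_id,file-123;"
    "llm_output_file_model_id,deployment-uuid"
)
vector_store_id = "litellm_proxy;model_id,my-model;vector_store_id,vs_123"

old_result = extract_model_id(file_id, UNANCHORED)  # spurious deployment UUID
new_result = extract_model_id(file_id)              # no top-level model_id field
vs_result = extract_model_id(vector_store_id)       # legitimate top-level field
```

With the anchored pattern the file ID yields no candidate at all, so the deployment UUID never reaches the team model-access check.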
* fix(proxy): hydrate wildcard discovery credentials (#28284) (#28822) * fix(proxy): hydrate wildcard discovery credentials * fix(proxy): constrain wildcard credential hydration Co-authored-by: Dibyo Mukherjee <dibyo@adobe.com> * ci: add daily oss-agent-shin branch creation workflow (#28829) Creates litellm_oss_agent_shin_MM_DD_YYYY from main every day at 00:00 UTC. Lets us retarget oss-agent-shin fork PRs onto a canonical branch so CircleCI runs with secrets, without granting the agent write access. Co-authored-by: shin-berri <shin-laptop@berri.ai> Co-authored-by: yuneng-jiang <yuneng@berri.ai> Co-authored-by: Ishaan Jaffer <ishaanjaffer0324@gmail.com> * test(proxy): add harness for proxy_server.py behavior-pinning (#28827) * test(proxy): add harness for proxy_server.py behavior-pinning Creates tests/test_litellm/proxy/proxy_server/ with: - conftest.py: 11 shared fixtures (app, client, mock_prisma, auth_as, mock_router with parametrized response builders, normalize, etc.) - _coverage_check.py: per-PR coverage gate (line + branch) against a baseline, self-selects target by inspecting which placeholder files have been filled - _pin_check.py: AST-based gate that verifies every pin-list item has >=1 happy + >=1 error test with a real assertion (no status-only) - test_harness_smoke.py: 19 smoke tests covering every fixture + both scripts end-to-end - 26 placeholder test files (one docstring each) reserved for follow-up PRs per the directory ownership in the Notion plan - .coverage_baseline pinned at 0% so future PRs measure deltas against new-tests-only and aren't entangled with the broader scattered test suite Adds a dedicated proxy-server job to test-unit-proxy-endpoints.yml so this directory's runtime + coverage are tracked independently. 
Plan: https://www.notion.so/36c43b8acdab81ee845fd5365128a2fc * ci(proxy-endpoints): allow workflow_dispatch Lets the workflow be triggered manually on a branch via `gh workflow run`, which is needed for the verify-first flow on workflow changes before opening a PR. * test(proxy): address review feedback on proxy_server harness - conftest.py: anchor sys.path insert to __file__ (Path(__file__).resolve().parents[4]) instead of CWD-relative os.path.abspath("../../../../") which resolved to the wrong directory when pytest is launched from the repo root. - _coverage_check.py: actually read .coverage_baseline and use it as the floor (line_min = max(target, baseline)). Closes the gap between the PR description's "delta semantics" and what the script was doing. With baseline=0.0 today this is a no-op; future PRs that update the baseline cause regressions (test deletions etc.) to trip the gate even if the static PR target is still met. - _pin_check.py: drop unreachable startswith("_") guard (test_.py glob never yields underscore-prefixed names) and read each test file once instead of twice. feat(openai): apply regional-processing cost uplift for EU/US data residency (#28626) * feat(openai): apply regional-processing cost uplift for EU/US data residency OpenAI charges a 10% uplift on the latest GPT models when requests are served from a regionalized hostname (eu./us.api.openai.com). Infer the region from `api_base`, expose it on `kwargs["litellm_params"]["data_residency"]`, and multiply the computed cost by a per-model `regional_processing_uplift_multiplier_<region>` field. 
https://claude.ai/code/session_012ebH44s7ohYxjoix5CXzTW * test: allow regional_processing_uplift_multiplier_{eu,us} in model_prices schema * fix(cost): tighten data_residency inference and restore model_cost in tests - Only infer OpenAI data_residency when custom_llm_provider == "openai"; drop the implicit None fallback so non-OpenAI callers can't accidentally pick up a regional tag from a stray OpenAI hostname. - _local_model_cost_map fixture now snapshots and restores litellm.model_cost and LITELLM_LOCAL_MODEL_COST_MAP so tests don't leak state across the session. * refactor(openai): move data_residency helper under llms/openai * fix: thread data_residency through realtime stream cost calculation Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(cost): thread data_residency through batch_cost_calculator Apply the OpenAI regional-processing uplift multiplier to retrieve_batch cost paths so Batch API requests served via eu./us.api.openai.com are priced at the same uplifted token rates as completions/transcriptions. * refactor(openai): encapsulate provider check inside infer_openai_data_residency Move the custom_llm_provider == "openai" guard from get_litellm_params into the helper itself so the core utility no longer carries provider-specific dispatch logic. Callers pass through the provider unconditionally; the helper returns None for any non-OpenAI provider. * fix(responses): thread data_residency through Responses logging params The Responses API paths build their logging litellm_params dict after provider resolution but did not include data_residency, so cost calc saw None even when the effective api_base was a regional OpenAI host. 
--------- Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Yassin Kortam <yassin@berri.ai> --------- Co-authored-by: milan-berri <milan@berri.ai> Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Mateo Wang <277851410+mateo-berri@users.noreply.github.com> Co-authored-by: Mateo <mateo@Mateos-MacBook-Pro.local> Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com> Co-authored-by: Yassin Kortam <yassin@berri.ai> Co-authored-by: Yassin Kortam <yassinkortam@Yassins-MacBook-Pro.local> Co-authored-by: yuneng-jiang <yuneng@berri.ai> Co-authored-by: shin-berri <shin-laptop@berri.ai> Co-authored-by: user <70670632+stuxf@users.noreply.github.com> Co-authored-by: Krrish Dholakia <krrish+github@berri.ai> Co-authored-by: ryan-crabbe-berri <ryan@berri.ai> Co-authored-by: ryan-crabbe-berri <ryan-crabbe-berri@users.noreply.github.com> Co-authored-by: Dibyo Mukherjee <dibyo@adobe.com> Co-authored-by: ishaan-berri <155045088+ishaan-berri@users.noreply.github.com> Co-authored-by: Ishaan Jaffer <ishaanjaffer0324@gmail.com> * fix: preserve OTEL response payload and remove duplicate constant - _emit_management_endpoint_otel_span now passes result as response on success - remove duplicate _CREDENTIAL_LITELLM_PARAM_FIELDS assignment in model_checks Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix: address bug detection findings - pass_through_endpoints: use request.method instead of hardcoded POST in streaming SigV4-signed request path for consistency with the non-streaming branch - llm_cost_calc/utils: hoist DataResidency value set to a module-level frozenset to avoid rebuilding it on every cost calculation - example_config_yaml/oai_misc_config: replace real-looking AWS account ID with placeholder 123456789012 in example bucket and role ARN Co-authored-by: Yassin Kortam <yassin@berri.ai> * chore(github_copilot): refresh model catalog from upstream /models API (#28055) Aligns the github_copilot catalog 
with values returned by Copilot's public /models endpoint (capabilities.limits + capabilities.supports + model.supported_endpoints). - Adds 10 new model entries: claude-opus-4.7, claude-sonnet-4.6, gemini-3-flash-preview, gemini-3.1-pro-preview, gpt-4-0125-preview, gpt-5.2-codex, gpt-5.4, gpt-5.4-mini, gpt-5.5, oswe-vscode-prime. - Updates max_input_tokens for existing entries to reflect each model's true context window (e.g. gpt-4o-mini 64000 -> 128000, gpt-5-mini 128000 -> 264000, gpt-5.3-codex 128000 -> 400000, claude-haiku-4.5 128000 -> 200000). - Adds supports_reasoning, supports_response_schema, supports_function_calling, supports_parallel_function_calling, supports_vision based on capabilities.supports. - Declares supported_endpoints for entries missing it (e.g. gpt-3.5-turbo, gpt-4o, embeddings). - For responses-only models (gpt-5.2-codex, gpt-5.4, gpt-5.4-mini, gpt-5.5), sets mode to 'responses'. - gpt-41-copilot.mode changes from 'completion' to 'chat' because Copilot reports capabilities.type = 'chat'. Revertible on request. Pricing fields and other manually-curated values are preserved. * feat(datadog): emit litellm.overhead.latency as a standalone Datadog metric (#28831) Adds a new `litellm.overhead.latency` gauge metric to `DatadogMetricsLogger` (the `/api/v2/series` path). The value is sourced from `hidden_params["litellm_overhead_time_ms"]` already computed in `ResponseMetadata` and exposed in `StandardLoggingPayload`. Matches the Prometheus integration which exposes the same value via `litellm_overhead_latency_metric`. Emitted in seconds (ms ÷ 1000) for consistency with the other latency series. 
Co-authored-by: shin-berri <shin-laptop@berri.ai> Co-authored-by: yuneng-jiang <yuneng@berri.ai> Co-authored-by: Shin <shin@litellm.ai> Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: ishaan-berri <155045088+ishaan-berri@users.noreply.github.com> * feat(arize): route Phoenix traces via per-project TracerProviders (#28876) Use LRU-cached TracerProviders with project-scoped OTEL Resources so team/key metadata routes traces correctly. On the proxy, project selection is limited to server-controlled user_api_key_auth_metadata; client metadata fields stay banned. * fix(arize_phoenix): skip _emit_semantic_logs on failure path Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(arize_phoenix): skip raw request logging and metrics on failure path Restores pre-refactor behavior: _handle_failure no longer emits raw-request sub-spans or records OTEL metrics, matching the original _handle_failure that did not call these helpers. Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(security): close two medium telemetry trust-boundary issues Issue 1 (arize_phoenix.py — caller-controlled telemetry routing): - _is_proxy_request no longer detects proxy mode by checking user_api_key_auth_metadata in request metadata. That field is user-supplied, so an authenticated caller could fake proxy-mode detection and have _project_from_metadata_dict read their own dict for project selection, routing telemetry to arbitrary Arize/Phoenix projects. Proxy mode is now determined solely by the server-set proxy_server_request field in litellm_params. - auth_utils.py adds user_api_key_auth_metadata to the banned request body params list so the proxy rejects any attempt to supply the field at the HTTP layer. The field is server-reserved: it is written exclusively by add_user_api_key_auth_to_request_metadata from the authenticated key's database record after the ban check runs. 
Issue 2 (management_helpers/utils.py — API key in OTEL span): - _emit_management_endpoint_otel_span stripped plaintext credential fields (key, token, api_key, secret, …) from the response dict before passing it to the OTEL success hook. dict(result) on a Pydantic GenerateKeyResponse includes the freshly-generated key field, which would previously be written as a span attribute to every configured OTEL collector/backend. Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: shin-berri <shin-laptop@berri.ai> Co-authored-by: yuneng-jiang <yuneng@berri.ai> Co-authored-by: oss-agent-shin <ext-agent-shin@berri.ai> Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Yassin Kortam <yassin@berri.ai> Co-authored-by: Terrajlz <info@jouleselectrictech.com> Co-authored-by: Bruno Devaux <devaux.br@gmail.com> Co-authored-by: milan-berri <milan@berri.ai> Co-authored-by: Mateo Wang <277851410+mateo-berri@users.noreply.github.com> Co-authored-by: Mateo <mateo@Mateos-MacBook-Pro.local> Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com> Co-authored-by: Yassin Kortam <yassinkortam@Yassins-MacBook-Pro.local> Co-authored-by: user <70670632+stuxf@users.noreply.github.com> Co-authored-by: Krrish Dholakia <krrish+github@berri.ai> Co-authored-by: ryan-crabbe-berri <ryan@berri.ai> Co-authored-by: ryan-crabbe-berri <ryan-crabbe-berri@users.noreply.github.com> Co-authored-by: Dibyo Mukherjee <dibyo@adobe.com> Co-authored-by: ishaan-berri <155045088+ishaan-berri@users.noreply.github.com> Co-authored-by: Ishaan Jaffer <ishaanjaffer0324@gmail.com> Co-authored-by: rinto <54238243+ririnto@users.noreply.github.com> Co-authored-by: Shin <shin@litellm.ai> Co-authored-by: mubashir1osmani <mubashir.osmani777@gmail.com>	2026-05-26 11:57:39 -07:00
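Note on d52fbfb458: the regional-processing uplift described in that commit (infer the region from `api_base`, then multiply the computed cost by a per-model `regional_processing_uplift_multiplier_<region>` field) can be sketched roughly as below. This is a minimal illustration, not the code from the commit: the function names `sketch_infer_data_residency` and `sketch_apply_uplift`, their signatures, and the fallback multiplier of 1.0 are assumptions made for the example; only the hostnames, the provider check, and the field-name pattern come from the commit message.

```python
from typing import Optional
from urllib.parse import urlparse

# Regional hostnames named in the commit message, mapped to a region tag.
_REGIONAL_HOSTS = {"eu.api.openai.com": "eu", "us.api.openai.com": "us"}


def sketch_infer_data_residency(
    api_base: Optional[str], custom_llm_provider: Optional[str]
) -> Optional[str]:
    """Hypothetical helper: return "eu"/"us" for regional OpenAI hosts, else None."""
    # The provider guard lives inside the helper, so non-OpenAI callers
    # never pick up a regional tag from a stray OpenAI hostname.
    if custom_llm_provider != "openai" or not api_base:
        return None
    host = urlparse(api_base).hostname
    return _REGIONAL_HOSTS.get(host or "")


def sketch_apply_uplift(base_cost: float, model_info: dict, region: Optional[str]) -> float:
    """Hypothetical helper: scale the cost by the per-model regional multiplier."""
    if region is None:
        return base_cost
    # Assumption: a model without the field is priced with no uplift (1.0).
    key = f"regional_processing_uplift_multiplier_{region}"
    return base_cost * model_info.get(key, 1.0)
```

With a multiplier of 1.1 (the 10% uplift the commit mentions), a request to a regional host would be priced at 1.1 times the base cost, while any other host or provider leaves the cost unchanged.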
yuneng-jiang	7c667b8797	fix(helm): drop main- prefix from default image tag (#28710 ) * fix(helm): drop main- prefix from default image tag The default image tag in the deployment + migrations-job templates was `main-{{ .Chart.AppVersion }}`. The current release pipeline publishes content tags without the `main-` prefix (e.g. `v1.85.1` / `1.85.1`, `v1.86.0-rc.1` / `1.86.0-rc.1`), so the rendered ref points at a tag that does not exist on GHCR or DockerHub and installs fail with ImagePullBackOff. - templates/deployment.yaml, templates/migrations-job.yaml: render `.Chart.AppVersion` directly instead of `main-<AppVersion>`. - Chart.yaml: bump stale `appVersion: v1.80.12` (not on either registry) to `v1.85.1` so local-checkout installs also resolve. - values.yaml: update the commented tag-override hint to match. * fix(helm): use :latest in tag override example, not pinned version Per review: ghcr.io/berriai/litellm-database:latest is a floating alias for the most recent stable (same digest as :main-stable), maintained by the release pipeline's UPDATE_LATEST advance step. Better example than a pinned version that goes stale.	2026-05-23 15:57:38 -07:00
Sameer Kankute	36c494fdd2	Litellm oss staging (#28161 ) * fix(opentelemetry): JSON-serialize dict metadata fields for OTEL span attributes (#27451) (#27455) Squash-merged by litellm-agent from Anai-Guo's PR. * feat(dashscope): add embeddings and reranks(qwen3-rerank) support via OpenAI-compatible endpoint (#27508) Squash-merged by litellm-agent from yimao's PR. * fix(vertex_ai/gemini): raise BadRequestError when image_url or url fi… (#24550) Squash-merged by litellm-agent from krisxia0506's PR. * fix(vertex_ai): raise error on mid-stream 429/error chunks instead of silently swallowing (#23711) Squash-merged by litellm-agent from krisxia0506's PR. * fix: raise BadRequestError for file content blocks missing 'file' sub… (#24503) Squash-merged by litellm-agent from krisxia0506's PR. * Fix Gemini MIME detection for extensionless GCS URIs (#27278) Squash-merged by litellm-agent from krisxia0506's PR. * fix(vertex_ai/partner_models): drop unused vertexai SDK gate from count_tokens (closes #28084) (#28107) Squash-merged by litellm-agent from voidborne-d's PR. * feat(chart): add support for autoscaling behavior in HPA (#27990) Squash-merged by litellm-agent from FabrizioCafolla's PR. * feat(proxy): add blocked flag to models for pause/resume from the UI (#27927) Squash-merged by litellm-agent from Cyberfilo's PR. * fix: pass socket timeouts to Redis cluster clients (#27920) Squash-merged by litellm-agent from tomdee's PR. * Fix/cache token (#28009) Squash-merged by litellm-agent from escon1004's PR. * fix(deepseek): forward reasoning_content in multi-turn thinking mode conversations (#28080) Squash-merged by litellm-agent from Divyansh8321's PR. 
* fix(guardrails): return HTTP 400 instead of 500 for blocked requests (#27617) * fix: reset org and tag budgets (#27326) * reset org budgets * reset tag budgets --------- Co-authored-by: Michael Riad Zaky <michaelr@Mac.localdomain> * fix(ui): omit allowed_routes from key edit save when unchanged (#27553) * fix(ui): omit allowed_routes from key edit save when unchanged When a team admin opens Edit Settings on a key with key_type=AI APIs and saves without changing anything, the UI re-sends the existing allowed_routes value, which the backend's _check_allowed_routes_caller_permission gate rejects for non-proxy-admins (LIT-2681). Strip allowed_routes from the patch in handleSubmit when it deep-equals the original keyData.allowed_routes. The backend treats absence as "leave alone," so no-op saves now succeed for non-admins. Admins explicitly editing the field still send the new value. * fix(ui): order-insensitive allowed_routes diff + cover null-original case Address Greptile review: - Switch the "is allowed_routes unchanged" check to a Set-based comparison so a server-side reorder of the array doesn't register as a user edit and re-trigger LIT-2681. - Add two regression tests: (1) keyData.allowed_routes is null and the form is untouched — patch should strip the field; (2) server returned routes in a different order than the user originally entered — patch should still recognize the value as unchanged. * chore(ui): strip ticket refs and tighten comments in key edit fix - Remove internal-tracker references from in-code comments - Tighten the WHY comment in handleSubmit to two lines - Drop redundant test-block comments — test names already describe the case * fix(ui): annotate Set<string> generic in allowed_routes diff to fix tsc * fix(guardrails): return HTTP 400 instead of 500 for guardrail-blocked requests GuardrailRaisedException and BlockedPiiEntityError both lacked a status_code attribute. 
When these exceptions reached the proxy exception handler (getattr(e, 'status_code', 500)), the fallback defaulted to HTTP 500 — making intentional guardrail blocks indistinguishable from server errors and causing unnecessary client retries. Changes: - Add status_code=400 (keyword-only) to GuardrailRaisedException - Add status_code=400 (keyword-only) to BlockedPiiEntityError - Update _is_guardrail_intervention() to recognize both exceptions so downstream loggers record 'guardrail_intervened' instead of 'guardrail_failed_to_respond' - Add 6 unit tests for default/custom status codes and getattr pattern - Strengthen existing blocked-action test with status_code assertion Fixes #24348 --------- Co-authored-by: Michael-RZ-Berri <michael@berri.ai> Co-authored-by: Michael Riad Zaky <michaelr@Mac.localdomain> Co-authored-by: ryan-crabbe-berri <ryan@berri.ai> Co-authored-by: Krrish Dholakia <krrish+github@berri.ai> * fix(router/proxy): address Greptile P1+P2 review comments on PR #28161 - router: raise ServiceUnavailableError (503) instead of RouterRateLimitErrorBasic (429) when a specifically-addressed deployment is administratively blocked; 429 misleads retry-enabled clients into spinning forever against a paused model - proxy_server: compute get_fully_blocked_model_names() once before both branches in model_list() instead of duplicating the call in each branch - deepseek: upgrade silent debug log to warning when injecting placeholder reasoning_content so callers are clearly notified of degraded multi-turn quality - tests: update two blocked-deployment assertions to expect ServiceUnavailableError Co-authored-by: Cursor <cursoragent@cursor.com> * fix: address bug detection findings (cache token order, mutable defaults) Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix: address bugs in async pass-through, anthropic cache token detection, rerank tests - async_get_available_deployment_for_pass_through: enforce blocked check on specific deployments - cost_calculator: 
detect anthropic-style usage by attribute presence (not truthiness) to avoid mixing OpenAI cached_tokens into anthropic normalization when read=0 - dashscope rerank tests: pass request to httpx.Response constructions for consistency Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix code qa * fix(vertex_ai/gemini): strip MIME parameters from GCS contentType GCS object metadata's contentType field can include parameters such as 'text/html; charset=utf-8'. Strip them in _apply_gemini_mime_type_aliases so downstream get_file_extension_from_mime_type sees a bare MIME type. Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(vertex_ai/gemini): clarify mime-type error message string concatenation Co-authored-by: Yassin Kortam <yassin@berri.ai> --------- Co-authored-by: Tai An <antai12232931@outlook.com> Co-authored-by: Vincent <yimao1231@gmail.com> Co-authored-by: Kris Xia <xiajiayi0506@gmail.com> Co-authored-by: d 🔹 <liusway405@gmail.com> Co-authored-by: Fabrizio Cafolla <developer@fabriziocafolla.com> Co-authored-by: Filippo Menghi <113345637+Cyberfilo@users.noreply.github.com> Co-authored-by: Tom Denham <tom@tomdee.co.uk> Co-authored-by: escon1004 <70471150+escon1004@users.noreply.github.com> Co-authored-by: Divyansh Singhal <97736786+Divyansh8321@users.noreply.github.com> Co-authored-by: robin-fiddler <robin@fiddler.ai> Co-authored-by: Michael-RZ-Berri <michael@berri.ai> Co-authored-by: Michael Riad Zaky <michaelr@Mac.localdomain> Co-authored-by: ryan-crabbe-berri <ryan@berri.ai> Co-authored-by: Krrish Dholakia <krrish+github@berri.ai> Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Yassin Kortam <yassin@berri.ai>	2026-05-18 16:27:44 -07:00
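Note on 36c494fdd2: the guardrail fix in that commit hinges on the handler's `getattr(e, 'status_code', 500)` fallback. A small sketch of the pattern follows, assuming a stand-in exception class; `SketchGuardrailBlocked` and `resolve_status_code` are hypothetical names for illustration and are not the real `GuardrailRaisedException` or the proxy's handler.

```python
class SketchGuardrailBlocked(Exception):
    """Stand-in for an exception raised when a guardrail blocks a request."""

    def __init__(self, message: str, *, status_code: int = 400) -> None:
        super().__init__(message)
        # Keyword-only, defaulting to 400 so a block reads as a client error.
        self.status_code = status_code


def resolve_status_code(exc: Exception) -> int:
    """Mirror the getattr fallback: exceptions without status_code map to 500."""
    return getattr(exc, "status_code", 500)
```

An exception that carries no `status_code` attribute falls through to 500, which is why an intentional block was previously indistinguishable from a server error; giving the exception a default of 400 changes the outcome without touching the handler.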
Yassin Kortam	fa5eae8bc9	chore: remove legacy deployment artifacts and litellm-js packages (#27541 ) - Remove litellm-js/proxy and litellm-js/spend-logs TypeScript packages that provided Cloudflare Worker proxy and Node.js spend logging services, as these are no longer maintained - Remove deprecated Docker variants (Dockerfile.alpine, Dockerfile.dev, Dockerfile.custom_ui, Dockerfile.health_check, Dockerfile.ghcr_base) that have been superseded by the primary Dockerfile - Remove legacy Kubernetes manifests (kub.yaml, service.yaml) from deploy/kubernetes in favor of the Helm chart - Remove stale index.yaml Helm chart index pinned to an old version (v1.43.18) - Remove dev_config.yaml development configuration file that contained hardcoded credentials and example endpoints - Clean up ~3,500 lines of unused code and configuration to reduce repository maintenance burden Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu>	2026-05-09 20:51:34 +00:00
Yassin Kortam	b5d3a5fc85	feat: add read-replica routing for Prisma DB via DATABASE_URL_READ_REPLICA (#27493 ) - Introduce RoutingPrismaWrapper that transparently routes read operations (find_*, count, group_by, query_raw, query_first) to a reader endpoint while writes remain on the writer, enabling Aurora-style reader/writer endpoint splits - Add IAMEndpoint dataclass and parse_iam_endpoint_from_url() to capture static connection fields from a reader URL so only the IAM token needs to rotate, avoiding the need for separate DATABASE_HOST_READ_REPLICA/etc. env vars - Enhance PrismaWrapper with per-instance knobs (db_url_env_var, iam_endpoint, recreate_uses_datasource, log_prefix) so writer and reader wrappers are independent: the reader writes its fresh URL to DATABASE_URL_READ_REPLICA and passes datasource override to Prisma since Prisma only auto-reads DATABASE_URL - Fix deadlock in PrismaWrapper.__getattr__: when called from inside a running event loop, schedule the token refresh as a background task instead of blocking with run_coroutine_threadsafe + future.result(), which would deadlock the loop thread waiting for a coroutine that needs the loop to run - Fix botocore crash when DATABASE_PORT is unset by defaulting to "5432" in both proxy_cli.py and PrismaWrapper.get_rds_iam_token(); passing None caused botocore to embed the literal string "None" in the presigned URL - Implement graceful reader degradation: reader connect/recreate failures are non-fatal; wrapper sets _reader_unavailable=True and silently routes reads to the writer to keep the proxy serving traffic during transient reader outages - Add PrismaClient.writer_db property so the reconnect smoke-test always validates the writer engine specifically; query_raw on the routing wrapper would route to the reader and not verify the newly-recreated writer - Expose DATABASE_URL_READ_REPLICA in Helm chart (values.yaml + deployment.yaml) via both plain value and secret key reference, and document the field in 
docker-compose.yml - Add 887-line test suite covering routing logic, IAM token refresh paths, reader degradation scenarios, datasource override behavior, and the deadlock regression Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu>	2026-05-08 21:05:50 -07:00
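Note on b5d3a5fc85: the read/write split and the reader degradation described in that commit might look roughly like this. It is a simplified sketch under stated assumptions, not the actual `RoutingPrismaWrapper`: the class `SketchRoutingClient`, its `pick` method, and the `mark_reader_failed` hook are hypothetical, and real Prisma clients are replaced by arbitrary objects. The list of read operations (`find_*`, `count`, `group_by`, `query_raw`, `query_first`) is taken from the commit message.

```python
from typing import Any, Optional

# Read operations named in the commit message, besides the find_* family.
_READ_METHODS = frozenset({"count", "group_by", "query_raw", "query_first"})


def is_read_operation(method_name: str) -> bool:
    """Return True for operations that may be served by the reader endpoint."""
    return method_name.startswith("find_") or method_name in _READ_METHODS


class SketchRoutingClient:
    """Hypothetical router: reads go to the reader, everything else to the writer."""

    def __init__(self, writer: Any, reader: Optional[Any] = None) -> None:
        self._writer = writer
        self._reader = reader
        # With no reader configured, every call is served by the writer.
        self._reader_unavailable = reader is None

    def mark_reader_failed(self) -> None:
        # Reader connect/recreate failures are non-fatal: fall back to the writer.
        self._reader_unavailable = True

    def pick(self, method_name: str) -> Any:
        if is_read_operation(method_name) and not self._reader_unavailable:
            return self._reader
        return self._writer
```

Because `query_raw` counts as a read here, a health probe sent through the router would land on the reader, which is the reason the commit adds a separate `writer_db` property for validating the writer specifically.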
Yassin Kortam	451ce161fc	fix: remove separate health app	2026-05-07 16:04:56 -07:00
Yassin Kortam	dbc8f5a937	helm: skip proxy startup prisma db push when migrations Job is enabled (#27200 ) Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu>	2026-05-05 16:58:53 -07:00
Yassin Kortam	618df94433	helm: increase default probe timeouts, disable debug logging by default (#27237 ) Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu>	2026-05-05 16:58:34 -07:00
CHANGE	87d7e86479	feat(helm): add tpl support to extraContainers and extraInitContainers Wrap toYaml with tpl in deployment and migration job templates so users can reference Helm values (e.g. {{ .Values.image.repository }}) inside extraContainers and extraInitContainers definitions.	2026-04-10 09:41:33 -04:00
Yuneng Jiang	5f63873dca	[Infra] Pin all Docker build dependencies to exact versions Pin every dependency across all Docker builds so upgrades are intentional. Verified by building all 3 production images and diffing pip freeze against known-good v1.83.0-nightly baselines — zero version drift. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-01 00:05:39 -07:00
Chesars	1be6b31e2f	merge: resolve conflicts between main and litellm_oss_staging_03_11_2026	2026-03-12 09:38:31 -03:00
RJ Duffner	0c95d415e1	Add Abilty To Set minReadySeconds From values Files (#23173 ) * Add Abilty To Set minReadySeconds From values Files * typo * uppercase Min as it comes after deployment * Don't use defaults, just omit	2026-03-11 23:29:15 +05:30
Harshit28j	3127d79da8	feat: add strategy to deployment for helmchart	2026-03-10 05:49:46 +05:30
Sean Marsh Glover	4652c73259	feat(proxy): limit concurrent health checks with health_check_concurrency (#20584 ) * staged first pass * black * Update litellm/proxy/health_check.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * simpler * restore cached logo * fix tests for perform_health_check max_concurrency arg * implement pr suggestion * and the helm chart * add configureable resources and probes to the deployment in the helm chart * more helm chart unittests * move some background healthcheck loggin to debug --------- Co-authored-by: Sean Glover <sglover@athenahealth.com> Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>	2026-02-24 08:16:59 -08:00
Cesar Garcia	622983cf89	fix(helm): add OCI annotations so GHCR shows helm pull instead of docker pull (#20617 ) The Helm chart on GHCR displays a `docker pull` command instead of the correct `helm pull oci://` command. This is because the OCI artifact is missing the `org.opencontainers.image.source` annotation that GHCR uses to identify and properly display Helm charts. Changes: - Add OCI annotations to Chart.yaml (source + url) which Helm 3.10+ propagates to the OCI manifest on push - Install explicit Helm v3.20.0 via azure/setup-helm@v4 for reproducible builds and proper OCI annotation support - Remove deprecated HELM_EXPERIMENTAL_OCI env var (OCI is GA since Helm 3.8)	2026-02-12 19:58:16 +05:30
Pragya Sardana	b4a27712a1	Add Init Containers in the community helm chart (#19816 )	2026-01-27 18:10:47 -08:00
Harshit Jain	9084c1d1bd	feat(helm): Enable PreStop hook configuration in values.yaml (#19613 )	2026-01-22 19:28:52 -08:00
R.Sicart	608979c7e9	feat: add support for keda in helm chart (#19337 ) * feat: add support for keda in helm chart Signed-off-by: R.Sicart <roger.sicart@gmail.com> * chore: bump chart version --------- Signed-off-by: R.Sicart <roger.sicart@gmail.com>	2026-01-19 10:38:41 -08:00
Harshit Jain	3ad8fa5422	fix: mount config.yaml as single file in Helm chart (#19146 )	2026-01-15 21:21:13 +05:30
Cesar Garcia	46dd420833	fix: sync Helm chart versioning with production standards and Docker versions (#18868 ) * fix: sync Helm chart versioning with production standards and Docker versions - Update Chart.yaml version from 0.4.10 to 1.0.0 (SemVer 0.x is for development, 1.0+ for production) - Update appVersion from v1.50.2 to v1.80.12 to match current Docker image version - Update workflow defaults from 0.1.0 to 1.0.0 for new chart version scheme - Maintain independent chart versioning per Helm best practices This ensures: - Helm chart follows SemVer production standards (1.x instead of 0.x) - appVersion stays synchronized with Docker/application version - Chart version remains independent for flexibility (can update chart without waiting for app releases) * fix: sync Helm chart appVersion with Docker image tags in release workflow Updates the GitHub workflow to ensure Helm chart appVersion matches the Docker image tags that are actually published: - For stable/rc releases: Uses the workflow input tag (e.g., v1.80.12) - For latest/dev releases: Uses the release_type to match main-{type} tags - Makes 'tag' input required to prevent accidental releases with wrong versions - Simplifies fallback logic by removing git-describe dependency This ensures the chart's appVersion correctly references Docker images that exist, preventing deployment failures from missing image tags. * Update ghcr_deploy.yml	2026-01-12 17:04:59 +05:30
Alexsander Hamir	1544e8f971	feat: Add line_profiler support for performance analysis and fix Windows CRLF issues in Docker builds (#18773 )	2026-01-07 11:36:57 -08:00
Mehmet Can Şakiroğlu	a3503e59c2	Litellm feat helm lifecycle support (#18517 ) * feat(helm): add lifecycle hook support for helm * add tests	2026-01-04 00:22:50 +05:30
Krrish Dholakia	7c2478b70e	docs: replace ghcr link with docker.litellm.ai	2025-12-16 08:35:45 +05:30
expruc	2d112fc8b2	add option to include additional resources to chart (#17627 )	2025-12-07 23:25:57 -08:00
Lukas de Boer	3b8a6ec888	Helm Chart: Add possibility to override command, args and add deployment labels (#17535 ) * Helm Chart: Add possibility to override command, args and also add deployment labels * Helm Chart: Fix helm lint issue * Helm Chart: Fix helm unit tests	2025-12-06 14:01:09 -08:00
Fabian Reinold	c173a4a275	Helm Chart: add ingress-only labels (#17348 ) * feat(helm): add ingress-only labels * feat(helm): add ingress configuration tests * chore(helm): bump chart version	2025-12-02 22:30:54 -08:00
Saar wintrov	777ef628d2	Enhancement(helm): ServiceMonitor template rendering (#17038 ) * Metadata: fix 401 when audio/transcriptions * check if str, CR fixes * Added new helmchart functionality * . * . * adding new tests	2025-11-24 20:53:02 -08:00
tushar8408	5f94b372f8	Migration job labels (#16831 ) * Add dynamic pod labels and annotations to migrations job * Bump chart version to 0.4.8	2025-11-19 09:53:21 -08:00
YutaSaito	645f84c02e	fix: add imagePullSecrets to migrations-job (#15681 )	2025-10-18 13:56:31 -07:00
Krish Dholakia	cf3c18a420	Merge pull request #13855 from edify42/allow-no-db-url feat(helm): Allow no DATABASE_URL to be set on migration job to keep the behaviour same as deployment	2025-09-06 22:02:01 -07:00
Abhinav	b6c26c3365	helm(chart): add optional PodDisruptionBudget for litellm proxy (#14062 ) (#14093 )	2025-09-01 12:21:44 -07:00
Const-antine	f8d1e03450	rework tests	2025-08-28 13:39:09 -04:00
Const-antine	1350336515	fix tests	2025-08-28 13:30:11 -04:00
Const-antine	d3b526041f	better formatting	2025-08-28 13:18:36 -04:00
Const-antine	730e9c90a2	fix formatting	2025-08-28 13:18:33 -04:00
Const-antine	5d973ea06e	update readme	2025-08-28 13:18:26 -04:00
Const-antine	409429ddd6	add new tests	2025-08-28 13:18:23 -04:00
Const-antine	ff4040bbe1	add functionality to mount existing configmap if needed	2025-08-28 13:18:05 -04:00
Jugal D. Bhatt	d63f5f99e9	Enhance database configuration: add support for optional endpointKey in values.yaml and update deployment/migrations job templates to conditionally source DATABASE_HOST from the secret if endpointKey is set. (#13763 )	2025-08-21 14:58:50 -07:00
Ishaan Jaff	f498cf4901	Fix - Ensure Helm chart auto generated master keys follow sk-xxxx format (#13871 ) * docs - master key * fix - auto generate sk-xxx prefixed key * test master key fix * fix master key gen	2025-08-21 14:34:21 -07:00
Ed Kim	c88a13c58b	add unit test which confirms the removal of DATABASE_URL Signed-off-by: Ed Kim <edward.kim@lendi.com.au>	2025-08-21 21:08:18 +10:00
edward kim	418b70b38e	fixes Signed-off-by: edward kim <edward.kim@lendi.com.au>	2025-08-21 17:44:54 +10:00
edward kim	2bd3daa742	fixes the mounting of this only when deployStandalone is true Signed-off-by: edward kim <edward.kim@lendi.com.au>	2025-08-21 17:39:31 +10:00
Mattias Andersson	89f71af4cd	Add possibility to configure resources for migrations-job in Helm chart	2025-08-14 17:08:26 +02:00
unique-jakub	f58807ff6e	Add labels to migrations job template (#13343 ) * set labels on the migration job * update comment to retrigger the pipeline	2025-08-07 09:41:24 -07:00
Jugal D. Bhatt	7cf3b4682a	[Separate Health App] Update Helm Deployment.yaml (#13162 ) * add helm deployment fix * clean deployment	2025-08-01 16:50:23 -07:00
unique-jakub	3edb71e617	allow helm hooks for migrations job (#13174 )	2025-07-31 21:51:07 -07:00
Marvin Huetter	d23a6e3ea4	fix: best practices suggest this to set to true (#12809 ) The order of the specification is important here, k8s will take the last value as truth. Push down to be sure schema update is done by migration job	2025-07-29 15:40:12 -07:00
Anton	f05ec34e11	feat: Add envVars and extraEnvVars support to Helm migrations job (#12591 ) - Add support for envVars (simple key-value pairs) in migrations job - Add support for extraEnvVars (complex environment variable configurations) - Include comprehensive test coverage for both envVars and extraEnvVars - Ensure backward compatibility with existing configurations - Tests verify proper rendering of environment variables in container spec	2025-07-14 22:24:13 -07:00


132 Commits