litellm

mirror of https://github.com/tiennm99/litellm.git synced 2026-06-17 14:48:44 +00:00

Author	SHA1	Message	Date
Shivam Rawat	3bd89f209e	Litellm jwt mapping virtualkeys (#28510 ) * restore an explicit no-match policy * fix(jwt): fix AUTO_REGISTER sentinel bypass, race condition, and inline import comment - AUTO_REGISTER now evicts stale __NO_MAPPING__ sentinel instead of silently returning None when cached under a prior fallback_team_mapping config - Race condition in _auto_register_jwt_mapping: catch P2002 unique-constraint violation on concurrent creates, fetch the winning mapping, proceed cleanly - Added comment on inline generate_key_helper_fn import explaining the circular dependency (key_management_endpoints imports user_api_key_auth at line 51) - 3 new tests: stale sentinel eviction, race condition winner fallback, and the existing auto_register happy path Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(jwt): cache __NO_MAPPING__ sentinel before raising 403 in REJECT mode REJECT mode was raising HTTPException immediately on a DB miss without writing the __NO_MAPPING__ sentinel, causing every subsequent rejected request to re-query the DB. Write the sentinel first so repeated rejections are served from cache within virtual_key_mapping_cache_ttl. Adds test asserting DB is not hit on the second reject after a cache-warm miss. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(jwt): enforce no-match policy when prisma_client is None The early `if prisma_client is None: return None` guard ran before the no-match policy check, silently bypassing REJECT and AUTO_REGISTER — every JWT client fell through to team auth regardless of configuration. Fix: treat prisma_client=None as a definitive DB miss and fall through to the same policy block as a real miss. REJECT now raises 403, AUTO_REGISTER raises 500 with a clear message (can't create keys without a DB), FALLBACK_TEAM_MAPPING returns None unchanged. Adds three tests: REJECT/403 with no DB, FALLBACK returns None with no DB, AUTO_REGISTER/500 with no DB. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(jwt): consistent AUTO_REGISTER on cached sentinel; clean up race orphans Addresses Greptile review on PR #25570 cherry-pick. 1. Inconsistent AUTO_REGISTER when __NO_MAPPING__ sentinel is cached: The cached-sentinel branch silently returned None when prisma_client was None, while the fresh path raised HTTP 500 under the same config. Same request, different access-control outcome depending on cache state. Both paths now raise the same 500. 2. Orphaned virtual keys from race-condition losers: On unique-constraint conflict, generate_key_helper_fn had already persisted an unrestricted virtual key in LiteLLM_VerificationToken with the cleartext in request memory. Under sustained concurrency these accumulated indefinitely. The loser now deletes its orphan before falling back to the winner's mapping; failure to delete is logged but does not fail the request. Also corrects a latent FK bug surfaced while fixing #2: the mapping row was storing the plaintext key in LiteLLM_JWTKeyMapping.token, but that column FKs to the hashed LiteLLM_VerificationToken.token — now hashed at the call site. Tests: - updated test_auto_register_creates_key_and_mapping to assert the hashed token is stored, not the plaintext - updated test_auto_register_race_condition_unique_conflict to assert the orphan is deleted with the correct hashed token - added test_auto_register_raises_500_when_sentinel_cached_and_no_db - added test_auto_register_race_conflict_tolerates_delete_failure Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(jwt): close REJECT bypass when JWT omits the configured claim field A JWT presented without the configured `virtual_key_claim_field` previously returned None at the `claim_value is None` guard before the `unregistered_jwt_client_behavior` check ran. A caller who knows the configured claim-field name could bypass REJECT by simply omitting that field and falling through to team-based JWT auth. Apply the no-match policy on a missing claim: - REJECT → 403 - AUTO_REGISTER → 403 (no stable identity to map; refuse rather than create a sentinel-keyed record) - FALLBACK_TEAM_MAPPING → return None (unchanged, backward-compatible) Adds three tests covering each branch of the missing-claim path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(jwt): AUTO_REGISTER inherits team_id so keys are bounded by team limits Auto-registered virtual keys were created with no team, model, route, rate, or budget constraints — broader access than the standard team-based JWT auth path the same client would have taken. Under AUTO_REGISTER, resolve the team_id from the JWT (via the operator-configured team_id_jwt_field / team_id_default) and stamp it on the new key. Downstream auth then applies the team's budget/models/tpm/rpm/allowed_routes via the existing virtual-key flow. Policy when team_id_jwt_field is configured: - JWT carries team claim → stamp resolved team_id - JWT lacks claim + team_id_default set → stamp default - JWT lacks claim + no default → 403 (refuse to create an unbounded key) When neither team_id_jwt_field nor team_id_default is configured, the operator has explicitly opted out of team-based limits — the auto-created key has no team_id (matches what team-auth would do in the same config). Adds 4 tests covering each branch. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(jwt): make AUTO_REGISTER functional in prod; raise on missing winner Two correctness fixes flagged by Greptile on the AUTO_REGISTER path: 1. generate_key_helper_fn was called without table_name="key". Without that, the helper falls into the user-upsert branch (table_name in (None, "user")) and tries to insert into LiteLLM_UserTable with user_id=None, which hits the NOT NULL @id constraint. AUTO_REGISTER would never have succeeded in production. Now passes table_name="key" explicitly, matching the /key/generate caller. 2. When the race loser refetches the winner's mapping and gets None (winner row concurrently deleted), the previous code returned None — and the caller in _resolve_jwt_to_virtual_key then fell through to less- restrictive team-based JWT auth, silently bypassing the configured AUTO_REGISTER policy. Now raises HTTP 503 so the caller retries against a stable state rather than getting unintended fallback access. Adds one test for the 503 winner-vanishes path. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(jwt): defer AUTO_REGISTER until JWT policy is enforced by auth_builder Closes the JWT policy bypass on the AUTO_REGISTER path flagged by veria-ai. Before: when unregistered_jwt_client_behavior=auto_register and the JWT's claim was unmapped, _resolve_jwt_to_virtual_key validated the JWT signature and then immediately created a virtual key + mapping. JWTAuthManager.auth_builder never ran for the first request (the new key short-circuited the team-auth path), and every subsequent request hit the cached mapping — so custom_validate, RBAC, scope_mappings, and user_allowed_email_domain were never enforced for auto-registered clients. After: _resolve_jwt_to_virtual_key returns a _PendingAutoRegister signal instead of creating the key. The caller in _user_api_key_auth_builder runs JWTAuthManager.auth_builder, then — only on a validated, policy-passing result — calls _auto_register_jwt_mapping with the team_id / user_id from that result. The created key inherits team + user limits from the validated identity, and future cache hits load that already-policy-checked key. Also drops the interim _resolve_inherited_team_id helper that pulled team_id from raw JWT claims — same bypass risk; team_id now comes exclusively from auth_builder. Tests: - Rewrote two existing tests to assert _resolve_jwt_to_virtual_key returns _PendingAutoRegister (no key created yet) for both the fresh-DB-miss and stale-sentinel branches - Added a contract test that _auto_register_jwt_mapping stamps the validated team_id/user_id onto generate_key_helper_fn - Removed four stale team-binding tests that exercised the prior raw-claim helper Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * Update user_api_key_auth.py * fix(jwt): cache proxy-admin AUTO_REGISTER path to avoid repeated DB lookups Cache-miss regression introduced by the deferred-auto-register refactor: when a JWT under AUTO_REGISTER resolved to a proxy admin, the is_proxy_admin early-return in _user_api_key_auth_builder ran before the pending auto-register cache-write block. Result: no cache entry, so every subsequent proxy-admin request re-queried get_jwt_key_mapping_object indefinitely. Fix: write a __JWT_PROXY_ADMIN__ sentinel to user_api_key_cache before the early return when a pending auto-register existed. _resolve_jwt_to_virtual_key treats that sentinel as "skip mapping, fall through to auth_builder", so future requests from the same JWT identity hit the cache instead of the DB. auth_builder still runs full JWT policy on every request — only the mapping DB lookup is short-circuited. Adds one test asserting the sentinel cache-hit returns None without hitting prisma_client.db.litellm_jwtkeymapping.find_first. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(proxy): stamp org context on JWT auto-registered keys AUTO_REGISTER keys were created with team_id and user_id only, so org budget checks were skipped after switching to the key-scoped path. Co-authored-by: Cursor <cursoragent@cursor.com> --------- Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: Cursor <cursoragent@cursor.com>	2026-06-04 19:00:36 -07:00
ryan-crabbe-berri	770fff7058	test(proxy): stop running real-DB tests in GitHub Actions unit jobs (#29700 ) * test(proxy): stop running real-DB tests in GitHub Actions unit jobs GitHub Actions unit jobs were spinning up a Postgres service container, but the only active tests that touched it either used the DB incidentally (a cargo-culted prisma_client.connect()) or were genuine integration tests mislabeled as unit. Mock the incidental ones so the proxy-db job needs no container, and move the tests that genuinely need a database (proxy management behavior, master-key-not-persisted, schema-migration sync) to CircleCI, which is already the real-infrastructure lane. * test(proxy): restore no-unexpected-startup-writes canary in master-key test Greptile noted the hash-match assertion no longer catches other unexpected startup writes (a default key, a rotation artifact). The CircleCI job gives each run a fresh DB, so a clean startup must leave the table empty; add that canary back alongside the precise master-key assertion.	2026-06-04 14:56:02 -07:00
yuneng-jiang	1dbf46665e	test: make custom_tokenizer proxy tests hermetic (#29643 ) test_custom_tokenizer_bug.py loaded Xenova/llama-3-tokenizer from HuggingFace Hub at test time, so it flaked on shared CI runners whenever HF returned 429 Too Many Requests; the surfaced LocalEntryNotFoundError made it look like a connectivity bug. Rewrite the suite to mock the one network boundary (litellm.utils.Tokenizer.from_pretrained) while running the proxy's real extraction-and-selection path. The regression test now asserts the configured identifier from model_info.custom_tokenizer actually reaches from_pretrained and that the response reports the huggingface tokenizer, which the previous llama-3-named test could not distinguish from the default path. A control test pins the no-custom-tokenizer case to the OpenAI tokenizer with from_pretrained asserted unused. Verified by reintroducing the original bug (model_info left unpopulated from the deployment): the regression test fails (from_pretrained called 0 times) while the control stays green.	2026-06-04 12:51:37 -07:00
Sameer Kankute	c7ab9adde5	Litellm oss staging 030626 (#29578 ) * Fix incorrect agent API request example payload structure (#29556) * fix(otel): add litellm_metadata fallback in _get_span_context and _end_proxy_span_from_kwargs (#29427) * fix(otel): add litellm_metadata fallback in _get_span_context and _end_proxy_span_from_kwargs On /v1/messages and other LITELLM_METADATA_ROUTES, the parent OTel span is stored in litellm_params['litellm_metadata'] instead of litellm_params['metadata']. When the request body contains a native 'metadata' field (e.g. Anthropic's {"user_id": "..."}), litellm_params['metadata'] gets overwritten and the parent span is lost, producing orphan root spans with a different trace_id. Add fallback checks to litellm_metadata in: - _get_span_context(): so child spans find the correct parent - _end_proxy_span_from_kwargs(): so the proxy span gets closed Fixes: https://github.com/BerriAI/litellm/issues/27934 * test(otel): tighten assertions per Greptile review - test_span_context_metadata_takes_priority: assert litellm_metadata span is never accessed, proving metadata takes priority - test_span_context_no_parent_when_neither_has_span: assert both ctx and detected_span are None --------- Co-authored-by: shin-berri <shin-laptop@berri.ai> Co-authored-by: yuneng-jiang <yuneng@berri.ai> Co-authored-by: Aneesh-Fiddler <aneeshfiddler@gmail.com> Co-authored-by: Sameer Kankute <sameer@berri.ai> * fix: remove premature end-user budget check from get_end_user_object (#29420) * fix(proxy): remove premature end-user budget check from get_end_user_object Problem: - `_check_end_user_budget()` was called inside `get_end_user_object()` - This caused budget checks to run BEFORE `skip_budget_checks` could be evaluated - Zero-cost models (e.g., local vLLM) were incorrectly blocked when end-users exceeded their budget, even though they should bypass budget checks Solution: - Remove `_check_end_user_budget()` calls from `get_end_user_object()` - Budget enforcement now happens exclusively in `common_checks()` where `skip_budget_checks` context is available - `get_end_user_object()` keeps `route` as optional in function parameter for backwards compatibility and future implementation. * refactor(tests): update budget enforcement tests to reflect changes in get_end_user_object - test_get_end_user_object() verifies data fetching - test_check_end_user_budget() verifies enforcement - test_budget_enforcement_blocks_over_budget_users() integrates _check_end_user_budget() - test_resolve_end_user_reraises_budget_exceeded() is now test_resolve_end_user since no budget exceeded is thrown in get_end_user_object() * Gemini /images/generate and /images/edits billing fixes + add support for size and aspect ratio params (#29534) * Fix Gemini image config mapping * Address Gemini image config review * Format Gemini image generation transform * Fix Gemini image token usage logging * Share Gemini image request helpers * Fix Gemini Imagen model routing * Fixes as per self code review * Fixes per internal code review * Stop gating Imagen imageSize forwarding * Document Gemini image size mapping source * chore: retrigger lint * Clarify Gemini candidate count precedence * Add Inception provider (#29522) * add inception as provider (chat, fim) * linting * seperate test suite for chat and fim * fix test coverage * fix: model hub custom pricing model info (#29293) * Opik user auth key metadata extractors (#28397) * fix: enhance Opik metadata extraction to include user API key auth context fixed after refactoring to extractor logic * test: add unit tests for OPik metadata extraction logic * fix: enhance extract_opik_metadata function to prioritize metadata sources for improved accuracy * fix(ci): clarified comments and edited unit tests * test: add unit tests for OPik metadata extraction with auth and requester overrides * fix(ui): replace fixed favicon.ico with current api get /get_favicon (#29532) Signed-off-by: José Luis Di Biase <josx@interorganic.com.ar> * fix(vertex/gemini): keep tool_call reference when a text-only assistant message follows (#29561) `_gemini_convert_messages_with_history` tracks `last_message_with_tool_calls` so a following tool result can be matched back to its tool call. The assignment was inside a branch guarded by `assistant_msg.get("tool_calls", []) is not None`, which is also True for a text-only assistant message (an empty list is not None). As a result, an assistant message with no tool calls that appears between a tool call and its tool result overwrote the reference, and conversion failed with: Exception: Missing corresponding tool call for tool response message. This shape is common: a model emits a short narration/assistant message after a tool call before the tool result is appended. Only update `last_message_with_tool_calls` when the assistant message actually carries tool_calls (or a function_call). Adds a regression test. Co-authored-by: shin-berri <shin-laptop@berri.ai> Co-authored-by: yuneng-jiang <yuneng@berri.ai> Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com> * Add 1-hour cache write pricing for EU/AU/JP Bedrock Anthropic models (#28572) * fix(thinking): handle None thinking param in is_thinking_enabled (#28598) Squash-merged by litellm-agent from Terrajlz's PR. * feat(helm): support tpl rendering in podAnnotations (#28609) Squash-merged by litellm-agent from devauxbr's PR. * Forward custom_llm_provider through the Responses API bridge (Fixes #28505) (#28575) * Forward custom_llm_provider through the Responses API bridge (Fixes #28505) When a Chat Completions request to a GPT-5.4+ model contains both `tools` and `reasoning_effort`, `completion()` auto-routes through `responses_api_bridge`. The bridge handler called `litellm.responses()` / `litellm.aresponses()` without forwarding the already-resolved `custom_llm_provider`, so the downstream call re-invoked `get_llm_provider()` with `custom_llm_provider=None` and stripped a second provider prefix from a `provider/provider/model` deployment string. For a deployment configured as `openai/openai/openai/gpt-5.5`, the bridge flow sent `openai/gpt-5.5` to the upstream API instead of the correct `openai/openai/gpt-5.5`. Upstream APIs that enforce model-name allow-lists rejected this as `key_model_access_denied`. Fix: pass the locally-resolved `custom_llm_provider` into both the sync `responses()` and async `aresponses()` calls so the downstream `_resolve_model_provider_for_responses` sees an explicit provider and skips the second prefix-strip. New regression test `tests/test_litellm/completion_extras/test_responses_bridge_provider_propagation.py` pins both call sites: each must forward `custom_llm_provider`. * fix(28505): set custom_llm_provider on request_data instead of as duplicate kwarg Greptile flagged that the previous patch passed custom_llm_provider as an explicit kwarg to responses()/aresponses() while request_data already carried it via the spread of sanitized_litellm_params, which would raise TypeError: got multiple values for keyword argument on every real bridge call. Switches to assigning request_data['custom_llm_provider'] before the call so the resolved provider wins over whatever sanitized_litellm_params spread in, without duplicating the kwarg. Updates the regression test to seed request_data with a sentinel custom_llm_provider so it actually exercises the overwrite path (the previous test mocked transform_request with a minimal dict and never hit the conflict). * chore: trigger shin-agent re-eval on retargeted staging base * chore: trigger shin-agent re-eval against updated Greptile state * Add 1-hour cache write pricing for EU/AU/JP Bedrock Anthropic models The 1-hour prompt-cache write tier (`cache_creation_input_token_cost_above_1hr`) was added to the us./global. variants of the Claude 4.5/4.6/4.7 family on Bedrock, but the eu./au./jp. cross-region inference profiles were left without it. AWS Bedrock pricing applies the same +10% regional premium across all geo profiles, so eu./au./jp. should carry the same 1-hour rates as us. (1.6x the 5-minute regional rate). Without these fields, cost tracking on EU/AU/JP Bedrock 1-hour-TTL prompt caching falls back to the 5-minute write rate and undercounts spend by ~60% for European, Australian, and Japanese tenants. Adds the 1-hour tier (and Sonnet 4.5's long-context >200K tier where AWS publishes one) to 14 regional Bedrock entries in both `model_prices_and_context_window.json` and the bundled `model_prices_and_context_window_backup.json`: - eu./au. Opus 4.6 ($11.00 / MTok) - eu./au. Opus 4.7 ($11.00 / MTok) - eu./au./jp. Sonnet 4.6 ($6.60 / MTok) - eu./au./jp. Sonnet 4.5 ($6.60 / MTok regular, $13.20 / MTok LC) - eu./au./jp. Haiku 4.5 ($2.20 / MTok) Also extends `tests/test_litellm/test_bedrock_anthropic_1hr_cache_pricing.py` with a `REGIONAL_EXPECTED` parametrized block covering all 13 new entries plus the existing 1.6x ratio invariant. Note: `eu.anthropic.claude-opus-4-5-20251101-v1:0` carries the wrong 5m rate today (base 6.25e-06 instead of regional 6.875e-06), which would break the 1.6x ratio check. It is intentionally left out of this PR so the scope stays "1-hour cache tier addition" — a separate follow-up should correct the EU 5m rates for Opus 4.5. --------- Co-authored-by: Terrajlz <info@jouleselectrictech.com> Co-authored-by: Bruno Devaux <devaux.br@gmail.com> Co-authored-by: Sameer Kankute <sameer@berri.ai> * Add 1-hour cache write pricing tier for Vertex AI Anthropic models (#28569) * fix(thinking): handle None thinking param in is_thinking_enabled (#28598) Squash-merged by litellm-agent from Terrajlz's PR. * feat(helm): support tpl rendering in podAnnotations (#28609) Squash-merged by litellm-agent from devauxbr's PR. * Forward custom_llm_provider through the Responses API bridge (Fixes #28505) (#28575) * Forward custom_llm_provider through the Responses API bridge (Fixes #28505) When a Chat Completions request to a GPT-5.4+ model contains both `tools` and `reasoning_effort`, `completion()` auto-routes through `responses_api_bridge`. The bridge handler called `litellm.responses()` / `litellm.aresponses()` without forwarding the already-resolved `custom_llm_provider`, so the downstream call re-invoked `get_llm_provider()` with `custom_llm_provider=None` and stripped a second provider prefix from a `provider/provider/model` deployment string. For a deployment configured as `openai/openai/openai/gpt-5.5`, the bridge flow sent `openai/gpt-5.5` to the upstream API instead of the correct `openai/openai/gpt-5.5`. Upstream APIs that enforce model-name allow-lists rejected this as `key_model_access_denied`. Fix: pass the locally-resolved `custom_llm_provider` into both the sync `responses()` and async `aresponses()` calls so the downstream `_resolve_model_provider_for_responses` sees an explicit provider and skips the second prefix-strip. New regression test `tests/test_litellm/completion_extras/test_responses_bridge_provider_propagation.py` pins both call sites: each must forward `custom_llm_provider`. * fix(28505): set custom_llm_provider on request_data instead of as duplicate kwarg Greptile flagged that the previous patch passed custom_llm_provider as an explicit kwarg to responses()/aresponses() while request_data already carried it via the spread of sanitized_litellm_params, which would raise TypeError: got multiple values for keyword argument on every real bridge call. Switches to assigning request_data['custom_llm_provider'] before the call so the resolved provider wins over whatever sanitized_litellm_params spread in, without duplicating the kwarg. Updates the regression test to seed request_data with a sentinel custom_llm_provider so it actually exercises the overwrite path (the previous test mocked transform_request with a minimal dict and never hit the conflict). * chore: trigger shin-agent re-eval on retargeted staging base * chore: trigger shin-agent re-eval against updated Greptile state * Add 1-hour cache write pricing tier for Vertex AI Anthropic models GCP Vertex AI publishes a separate 1-hour cache write column for the Claude family (1.6x the 5-minute write rate, matching the documented Bedrock ratio). LiteLLM's Vertex AI Anthropic entries only carry the 5-minute tier, so any request that uses `cache_control: {"ttl": "1h"}` on Vertex AI Claude is undercounted in cost tracking by ~60%. The runtime side already supports the 1-hour tier — `VertexAIAnthropicConfig` extends `AnthropicConfig`, populating `ephemeral_1h_input_tokens`, and `_calculate_cache_creation_cost` reads `cache_creation_input_token_cost_above_1hr`. Only the price registry was missing data. Adds the field to 19 vertex_ai/claude-* entries across both `model_prices_and_context_window.json` and the bundled `model_prices_and_context_window_backup.json`: - Haiku 4.5 ($1.25 -> $2.00 / MTok) - Sonnet 3.7 / 4 / 4.5 / 4.6 ($3.75 -> $6.00 / MTok) - Opus 4.5 / 4.6 / 4.7 ($6.25 -> $10.00 / MTok) - Opus 4 / 4.1 ($18.75 -> $30.00 / MTok) Adds `tests/test_litellm/test_vertex_anthropic_1hr_cache_pricing.py` mirroring the Bedrock equivalent — pins each (5m, 1h) pair per model and asserts the 1.6x ratio across the family. Fixes #27781. --------- Co-authored-by: Terrajlz <info@jouleselectrictech.com> Co-authored-by: Bruno Devaux <devaux.br@gmail.com> Co-authored-by: Sameer Kankute <sameer@berri.ai> * Fix Gemini multimodal function responses (#29325) Co-authored-by: shin-berri <shin-laptop@berri.ai> Co-authored-by: yuneng-jiang <yuneng@berri.ai> * address greptile review: add _transform_image_usage method and model-map supports_image_size flag - Add _transform_image_usage instance method to GoogleImageGenConfig that delegates to transform_gemini_image_usage, fixing the regression test - Replace hardcoded "2.5-flash" string check in supports_gemini_image_size with a get_model_info lookup on supports_image_size (default true) - Add supports_image_size: false to all gemini-2.5-flash model entries in model_prices_and_context_window.json so capability is controlled via the model map rather than embedded in code * fix test failures: schema validation, mypy type, model info plumbing, pricing test - Add supports_image_size to ModelInfoBase TypedDict so get_model_info surfaces it - Pass supports_image_size through _get_model_info_helper constructor call - Fix supports_gemini_image_size to use value is not False (None means unset, defaults to True) - Add supports_image_size to JSON schema in test_aaamodel_prices_and_context_window_json_is_valid - Correct gemini-3.1-flash-lite pricing assertions in test to match JSON values * Add Azure AI Kimi K2.6 metadata (#27052) * Add Azure AI Kimi K2.6 metadata * Scope Kimi metadata test cost map setup * fall back to substring check for models not in model_prices_and_context_window.json Models like gemini-2.5-flash-image-preview are not in the pricing JSON, so get_model_info raises. Fall back to "2.5-flash" not in model when the JSON has no explicit supports_image_size entry for the model. * fix(inception): don't forward global litellm.api_key to Inception FIM Match the Inception chat config: resolve only an Inception-specific key (param, litellm.inception_key, or INCEPTION_API_KEY) for the text-completion FIM path. The global litellm.api_key (often an OpenAI key) was both leaking to api.inceptionlabs.ai and taking precedence over the configured Inception key when set. * fix(auth): enforce end-user budget on custom-auth path that skips common_checks get_end_user_object() no longer raises BudgetExceededError, so custom-auth deployments with custom_auth_run_common_checks unset (which skip the centralized common_checks gate) stopped enforcing the end-user budget, letting an over-budget end user keep making requests. Re-enforce the budget in _run_post_custom_auth_checks on that path. --------- Signed-off-by: José Luis Di Biase <josx@interorganic.com.ar> Co-authored-by: Isha <72744901+IshaMeera@users.noreply.github.com> Co-authored-by: aneeshsangvikar <aneeshsangvikar@fiddler.ai> Co-authored-by: shin-berri <shin-laptop@berri.ai> Co-authored-by: yuneng-jiang <yuneng@berri.ai> Co-authored-by: Aneesh-Fiddler <aneeshfiddler@gmail.com> Co-authored-by: Suleiman Elkhoury <108065141+suleimanelkhoury@users.noreply.github.com> Co-authored-by: Dmitriy Alergant <93501479+DmitriyAlergant@users.noreply.github.com> Co-authored-by: Yanis Miraoui <yanis.miraoui19@imperial.ac.uk> Co-authored-by: Lovro Seder <vrovro@gmail.com> Co-authored-by: Thomas Mildner <12685945+Thomas-Mildner@users.noreply.github.com> Co-authored-by: José Luis Di Biase <josx@interorganic.com.ar> Co-authored-by: Lai Quang Huy <64073540+1qh@users.noreply.github.com> Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com> Co-authored-by: Filippo Menghi <113345637+Cyberfilo@users.noreply.github.com> Co-authored-by: Terrajlz <info@jouleselectrictech.com> Co-authored-by: Bruno Devaux <devaux.br@gmail.com> Co-authored-by: ZHONG Ziwen <67355585+zzw-math@users.noreply.github.com> Co-authored-by: Emerson Gomes <emerson.gomes@thalesgroup.com> Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com>	2026-06-03 11:01:51 -07:00
Mateo Wang	6d6eda8101	[internal copy of #28008 ] Support MCP OAuth passthrough and issuer-scoped JWT auth (#28356 ) * fix(proxy): point /metrics 401 at the opt-out flag Operators upgrading past `35bbca60b0` (which made /metrics auth default-on) see "Malformed API Key passed in. Ensure Key has 'Bearer ' prefix." with no hint that litellm_settings.require_auth_for_metrics_endpoint: false restores the previous unauthenticated behavior. Append that discovery hint to the existing 401 body so a Prometheus scraper that breaks after upgrade has a clear migration path. No behavior change. * fix(proxy): bound budget reservation per request instead of pinning to remaining headroom reserve_budget_for_request fell back to reserving the entire remaining team/key/user headroom whenever a request omitted max_tokens, which pinned the spend counter at max_budget for the duration of the in-flight request and false-positive-blocked every concurrent or back-to-back request until the success callback reconciled. Surfaced as an integration-test team being budget-blocked at its $2000 cap while DB spend was $0.144. Switch the missing-max_tokens path to a fixed default of 16384 output tokens (mirrors parallel_request_limiter_v3's DEFAULT_MAX_TOKENS_ESTIMATE precedent), and clamp explicit max_tokens at the model's max_output_tokens for reservation accounting only. The outbound request body is unchanged, so providers see whatever the caller actually sent; only the local integer used to compute reservation cost is bounded. This also prevents a hostile max_tokens=999999999 from inflating one request's reservation up to the entire team headroom. For Opus 4.7 (output $25/M, max_output 128K) on a $2000 budget the worst-case per-request reservation drops from "everything left" to $3.20, raising admittable concurrency from 1 to ~625. * fix(proxy): reserve per-image cost for image-generation requests Image-generation routes (dall-e-3, flux, etc.) have no per-token output cost so they fell through to the no-reservation read-time-only path. Concurrent image requests against a depleted budget could all pass common_checks (counter exactly at max_budget passes the strict-`>` gate) and reach the provider before reconciliation caught up. Add per-image reservation in _estimate_request_max_cost_for_model: when the model has a per-image cost field, reserve `n × cost_per_image` upfront. The atomic counter increment serializes concurrent admissions, so the second request sees the post-first-reservation counter and raises BudgetExceededError instead of silently leaking through. Both `output_cost_per_image` and `input_cost_per_image` are honored — naming is inconsistent across providers (OpenAI dall-e-3 uses input_cost_per_image, aiml/dall-e-3 uses output_cost_per_image for the same per-generated-image price). Per-pixel pricing (DALL-E 2 size variants) and TTS/STT routes still fall through to read-time enforcement; those are follow-ups. * fix(proxy): gate image-gen reservation strictly on model mode The previous detection treated any model with input_cost_per_image or output_cost_per_image as image generation. Several chat and embedding models carry those fields to price multimodal vision input, not generated images: - gemini-3.1-pro-preview (mode=chat) has output_cost_per_image=0.00012 alongside input/output token pricing. - azure/gpt-realtime-* (mode=chat) has input_cost_per_image=5e-6. - amazon.titan-embed-image-v1 (mode=embedding) has input_cost_per_image=6e-5. For these models the image-gen branch fired first and reserved a fraction of a cent per request, short-circuiting the token-priced path entirely. Long Gemini chats reserved 1 × $0.00012 instead of the true token cost. Gate strictly on mode in {"image_generation", "image_edit"}. All 197 real image_generation entries and all 31 image_edit entries (Flux Kontext, Stability inpaint/outpaint, etc.) carry the right mode, so the field-presence fallback was unnecessary. Adds regression tests for the chat-model-with-image-cost-field case and for image_edit reservation. * build(packaging): relax core runtime pins to ranges Backport of #27241 onto litellm_1.84.0rc2. The 12 entries in `[project.dependencies]` were exact `==` pins, a side effect of the Poetry -> uv migration. This forces every downstream package that lists litellm as a dependency to downgrade common runtime libraries (openai, pydantic, aiohttp, click, jsonschema, ...) to the exact versions we ship. Switch to lower-bounded ranges with upper bounds where the upstream package is pre-1.0 or has a known breaking-major-version policy. Reproducibility for our Docker proxy and CI continues to come from `uv.lock`, which is regenerated here as a metadata-only diff. Conflict resolution vs upstream merge: - The upstream merge commit also surfaced unrelated context entries (nvidia-riva-client, soundfile/stt-nvidia-riva extra) that exist in staging but not in rc2. Those are not part of #27241's intent and were dropped from the resolution; the rc2 uv.lock keeps its existing entry set, only the 12 specifier strings changed. - `uv lock --check` passes (392 packages resolved, no drift). * build(packaging): raise jinja2 floor to 3.1.6 Our `uv.lock` already resolves jinja2 to 3.1.6, so Docker / CI installs get that version. The `pyproject.toml` floor was lagging at 3.1.0, which means downstream consumers using `--resolution=lowest-direct` or older constraint files can land on 3.1.0-3.1.5 instead of the version we actually test against. Aligns the declared floor with the resolved version so external installers see the same baseline our test matrix exercises. `uv lock` diff is metadata-only (no resolved-version drift). * fix(mcp): forward extra_headers for OpenAPI MCP tools OpenAPI-generated tools only applied static closure headers and BYOK Authorization via ContextVar. Copy MCPServer.extra_headers from the incoming MCP request into _request_extra_headers (set in server.py before local tool dispatch), merge in openapi_to_mcp_generator via a small helper. OAuth2 M2M: do not forward caller Authorization from raw_headers (same rule as _prepare_mcp_server_headers for managed MCP). Adds TestRequestExtraHeaders and clarifies mcp_server_manager registration comment. Fixes #26794 Co-authored-by: Cursor <cursoragent@cursor.com> * refactor(mcp): access has_client_credentials on MCPServer directly Greptile: getattr default was redundant; property exists on MCPServer and mcp_server is non-None inside the extra_headers forwarding block. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(mcp): static headers win over forwarded headers in OpenAPI MCP Match the existing MCP invariant in merge_mcp_headers and the managed MCP path: operator-configured static headers always override caller-forwarded headers on name conflict, with case-insensitive comparison so different casing cannot bypass the precedence. _request_auth_header (BYOK) still overrides Authorization last. Addresses Veria review on PR #27383. Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> * fix(proxy): always merge caller-supplied tags into request metadata Caller-supplied tags (`x-litellm-tags` header, body `tags`, `metadata.tags`) were silently dropped unless the key/team had `metadata.allow_client_tags: true` set. Restore the documented behavior: tags from the request always flow into `metadata.tags` and union with any admin-configured static tags from key/team/project metadata. Removes the `allow_client_tags` opt-in flag from the pre-call pipeline. The flag was only ever read here; it has no schema or endpoint footprint, so leftover values in existing key metadata are inert. Test cleanup mirrors the simplification: drop the three tests that verified the strip-when-not-opted-in path, drop the `allow_client_tags` fixture lines from the merge/union tests. * docs(proxy): refresh stale comments referencing removed tag strip The tag-strip block was removed in the parent commit but two surrounding comments still referenced "tags without opt-in" and "runs AFTER the strip". Update them to describe the remaining user_api_key_* and _pipeline_managed_guardrails strip that the snapshot/merge ordering actually protects against. * chore: reject bare str at file-input sinks to prevent local-file read (#27762) Cherry-pick of #27762 onto litellm_1.84.0rc2. * chore: reject bare str at file-input sinks to prevent local-file read (#27667) * fix: use os.PathLike in ocr sink and check truthy reasoningSummary for bridge - ocr/main.py: widen Path check to os.PathLike for consistency with other sinks - main.py: bridge condition checks truthiness of reasoning_summary, not just None * fix: remove unused pathlib.Path import in ocr/main.py Co-authored-by: yuneng-jiang <yuneng@berri.ai> Co-authored-by: ryan-crabbe-berri <ryan@berri.ai> Co-authored-by: stuxf <70670632+stuxf@users.noreply.github.com> * Strip SERVER_ROOT_PATH before lazy-feature prefix match LazyFeatureMiddleware compared the raw scope path against registered prefixes (e.g. /policies), so requests under a server root path like /api/v1/policies/... never matched, the feature never loaded, and the endpoint returned 404. Strip the configured root path before matching, normalizing trailing slashes and enforcing a component boundary so /api does not falsely match /apiv2. * Cache normalized SERVER_ROOT_PATH at middleware init SERVER_ROOT_PATH is a process-startup env var. Read it once in __init__ instead of calling get_server_root_path() + rstrip on every request that arrives before all lazy features have loaded. * chore(proxy): backport /key/regenerate ownership-rebind + premium-gate guards (#27793) Backport of #27793 onto litellm_1.84.0rc2. A non-admin caller could rebind their own key's user_id via /key/regenerate. _execute_virtual_key_regeneration had org/team guards but no user_id guard, and prepare_key_update_data did not strip the field — it survived model_dump(exclude_unset=True) into the Prisma update. On the next request, _return_user_api_key_auth_obj resolved the rebound user_id against litellm_usertable and returned PROXY_ADMIN whenever the target row's user_role was admin. /key/update had the equivalent guard inline at _validate_update_key_data; extract it to a shared helper _validate_caller_can_change_key_ownership and call from both /key/update and _execute_virtual_key_regeneration. Also tighten the premium gate that allowed the master-key rotation branch to skip the enterprise check. The previous predicate was a field-presence test, not an identity check. Verify the caller actually holds the master key via _is_master_key before allowing the non-premium path. Block explicit-null user_id and empty-string user_id as removal attempts; both 403-reject for non-admin callers. * fix(proxy): expose db status on public /health/readiness Backport of #27866 onto litellm_1.84.0rc2. External readiness probes consumed the legacy detailed payload's `db` field to drive alerting and pod-rotation decisions. Stripping the body to {"status": "healthy"} broke those probes silently — the HTTP code still flipped to 503, but probes checking body.db == "connected" treated the response as healthy. Add `db` back to the unauthenticated payload. The rest of the diagnostic fields (litellm_version, callbacks, cache, log_level) stay behind /health/readiness/details so the recon-leak gate from #26912 holds. Values match the legacy contract: "connected", "disconnected", "Not connected". The 503-on-DB-disconnect behavior from LIT-2607 is preserved. * fix(ui): fetch version + debug flag from /health/readiness/details The proxy moved `litellm_version`, `is_detailed_debug`, and other diagnostic fields off the public `/health/readiness` payload behind an auth-gated `/health/readiness/details` endpoint. The navbar version tag and the detailed-debug-mode banner stopped working because they were still reading those fields from the unauthed response, which no longer contains them. Replace `useHealthReadiness` with a `useHealthReadinessDetails` hook that takes an `accessToken` argument and sends a Bearer header to the auth-gated endpoint. The hook stays disabled while `accessToken` is falsy, so the navbar can keep rendering on the public model hub (where the token is null) without triggering an auth redirect or a 401-loop. * fix(ui): disable retries on readiness/details + cover token forwarding Two small follow-ups on the readiness/details migration: - Set `retry: false` on the query. The payload feeds a passive navbar tag and a debug banner; a 401 from an expired token shouldn't fan out into three retries against the proxy. - Add navbar specs that assert the `accessToken` prop is forwarded into the hook (matches the DebugWarningBanner spec). Without this, the navbar could silently regress to passing `undefined` and the existing tests wouldn't catch it. * chore: update Next.js build artifacts (2026-05-14 03:52 UTC, node v20.20.2) * Merge pull request #27898 from stuxf/chore/banned-params-extra-body-cover chore(proxy): cover extra_body + azure_ad_token in banned-params check (cherry picked from commit `a6a9d8edf0`) * Merge pull request #27801 from stuxf/chore/get-instance-fn-runtime-s3-gate chore(proxy): refuse remote-URL instance-fn loads outside config-file path (cherry picked from commit `e3e5209f51`) * fix: block client-side pricing injection via request body Authenticated clients could supply CustomPricingLiteLLMParams fields (input_cost_per_token, output_cost_per_token, etc.) in the request body. These were forwarded to register_model() in main.py, permanently mutating the shared global litellm.model_cost dict for all users on the instance. Adds all CustomPricingLiteLLMParams fields to _BANNED_REQUEST_BODY_PARAMS so is_request_body_safe() rejects them before they reach completion(). New pricing fields added to CustomPricingLiteLLMParams are auto-covered. Admin opt-in via allow_client_side_credentials or configurable_clientside_auth_params still works as before. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix: block SSRF fields in RAG ingest vector_store config aws_sts_endpoint, aws_web_identity_token, and aws_bedrock_runtime_endpoint in ingest_options.vector_store were passed directly to the Bedrock ingestion class, which reads them into boto3 STS client construction. Any authenticated caller could redirect AssumeRole calls to an attacker-controlled server, leaking the proxy's instance profile credentials. Calls is_request_body_safe() on ingest_options["vector_store"] before forwarding to litellm.aingest(). Same banned-params list and admin opt-in escape hatch (allow_client_side_credentials) as the /chat/completions path. ValueError from the safety check is caught and re-raised as HTTP 400. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix: harden /key/update authorization checks (#27878) * fix: patch Host-header auth bypass in get_request_route Starlette reconstructs request.url from the Host header. A malformed Host like `localhost/?x=1` causes Starlette to build the full URL as `http://localhost/?x=1/health`, which url-parses to path="/". Since "/" is in LiteLLMRoutes.public_routes, all protected routes became reachable without authentication. Fix: read scope["path"] (set by uvicorn from the HTTP request line, not derivable from headers) instead of request.url.path. Sub-path deployments are handled via scope["app_root_path"] / scope["root_path"], mirroring Starlette's own base_url construction logic. Affected variants confirmed fixed: Host: localhost/?x=1 Host: localhost:4000/?x=1 Host: localhost/#test Host: localhost:4000/#test Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * style: reduce comments in route fix Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix: block credential fields in RAG ingest vector_store options Credential fields (vertex_credentials, aws_access_key_id, api_key, etc.) in ingest_options.vector_store are now rejected at the API boundary with a 400 error. Credentials must be configured server-side. Previously any authenticated user could supply a vertex_credentials dict with type=external_account pointing credential_source.file at an arbitrary path (e.g. /proc/1/environ) and token_url at an attacker-controlled server. google-auth's identity_pool.Credentials refresh() would read the file and POST its contents to the attacker. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix: block /key/update self-escalation by assigned users Non-admin users who were assigned a key (created_by != caller) could update any non-budget field — models, rpm_limit, guardrails, etc. — without admin authorization, allowing privilege self-escalation. Gate: only the key creator (created_by == caller) may edit their own key without admin check; budget changes always require admin regardless of creator status. All other callers must pass _check_key_admin_access. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix: block user-controlled api_base in RAG ingest vector_store options A user-supplied api_base in ingest_options.vector_store caused the server to forward its configured provider credentials (Gemini, OpenAI) to an attacker-controlled endpoint via SSRF. Add api_base to the blocked credential params set alongside api_key and the existing credential fields. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix: restrict /utils/transform_request to PROXY_ADMIN and apply body safety check Any authenticated internal_user could POST arbitrary provider config (aws_sts_endpoint, api_base, etc.) to /utils/transform_request and have the server forward its credentials to an attacker-controlled endpoint. - Gate the endpoint on PROXY_ADMIN role (403 for all other roles) - Call is_request_body_safe() to reject banned params even for admins - Convert ValueError from safety check to HTTP 400 Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix: apply banned-param check to /utils/transform_request Without is_request_body_safe(), any authenticated user could pass aws_sts_endpoint, api_base, or aws_web_identity_token to /utils/transform_request and have the server forward its configured provider credentials to an attacker-controlled endpoint during SDK credential resolution. Applies the same banned-param blocklist already used by LLM endpoints. Endpoint remains accessible to all authenticated users. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix: block SSRF via api_base in /prompts/test dotprompt YAML frontmatter Any frontmatter key not in ["model","input","output"] flowed into optional_params and was merged into the LLM call data dict, bypassing is_request_body_safe. An attacker with any bearer key could set api_base in YAML to redirect the outbound LLM request — including the provider API key — to an attacker-controlled host. Fix: call is_request_body_safe on the constructed data dict after optional_params are merged, before invoking ProxyBaseLLMRequestProcessing. ValueError from the banned-param check is surfaced as HTTP 400. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * Update litellm/proxy/rag_endpoints/endpoints.py Co-authored-by: veria-ai[bot] <224490171+veria-ai[bot]@users.noreply.github.com> * fix: coerce nested config strings before banned-param check _NESTED_CONFIG_KEYS descent used isinstance(nested, dict) which silently skipped litellm_embedding_config when delivered as a JSON string via multipart/form-data. Banned params (api_base, aws_sts_endpoint, etc.) nested inside the stringified value were invisible to is_request_body_safe. _NESTED_METADATA_KEYS already used _coerce_metadata_to_dict which parses JSON strings before checking. Apply the same coercion to _NESTED_CONFIG_KEYS. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix: replace substring match with prefix match in is_llm_api_route mapped_pass_through_routes used `_llm_passthrough_route in route` (substring) so any admin-only path whose URL contained a provider name (openai, anthropic, azure, bedrock, etc.) was misclassified as an LLM API route and bypassed the admin gate in non_proxy_admin_allowed_routes_check. Confirmed live: non-admin key could GET /credentials/by_name/openai (read masked provider API key) and DELETE /credentials/openai (delete credential). Fix: use exact match or startswith(prefix + "/") — the same pattern used everywhere else in RouteChecks — so only routes that actually start with a passthrough prefix are allowed through. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix: stabilize PR #27878 test failures - key_management_endpoints: extend can_skip_admin_check to team keys so team members with /key/update permission can update non-budget fields. can_team_member_execute_key_management_endpoint already validates team membership + permission and raises if unauthorized; reaching the admin check on a team key means the caller was authorized. - test: set created_by on mock key in test_update_key_non_budget_fields_allowed_for_internal_user so caller_is_creator resolves correctly (MagicMock default ≠ user_id). - auth_utils.get_request_route: guard against non-dict request.scope (e.g. MagicMock in unit tests) to prevent a MagicMock leaking into UserAPIKeyAuth.request_route and failing Pydantic validation. - ci: assign test_multipart_bypass_repro.py to the proxy-runtime shard in test-unit-proxy-db.yml to satisfy the shard-coverage check. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix(lint): add explicit str() cast in get_request_route for MyPy scope.get() returns Any\|None which MyPy cannot coerce to str implicitly. Wrap both scope.get() calls in str() to satisfy the type checker. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix: guard bare-/ root_path strip + make total_spend migration idempotent auth_utils.get_request_route: when Starlette sets scope["app_root_path"] to "/" (e.g. behind some middleware), the old stripping logic would remove the leading slash from every path ("/team/new" → "team/new"), breaking route matching and causing auth to misclassify protected routes. Skip stripping when root_path is bare "/". migration: add IF NOT EXISTS to total_spend ALTER TABLE so the migration is safe to replay when a prior partial run already created the column. Without this guard, prisma migrate deploy fails on CI DBs that were partially migrated, causing all subsequent DB operations (including /team/new) to 500. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix: require creator still owns key for personal-key bypass in /key/update caller_is_creator now requires both created_by == caller AND user_id == caller. Previously checking only created_by let a demoted admin who originally created a key for another user continue editing non-budget fields on it after reassignment, bypassing _check_key_admin_access. Adds regression test: creator whose key was reassigned is blocked (403). Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix: extract auth checks to fix PLR0915 + broaden max_budget assertion internal_user_endpoints._update_single_user_helper exceeded 50 statements (PLR0915). Extract authorization checks into _check_user_update_authz helper to bring statement count under the limit. test_validate_max_budget: assert "negative" (substring of both the local "cannot be negative" and the CI "non-negative finite number" messages) so the test is stable regardless of which exact wording the function uses. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: veria-ai[bot] <224490171+veria-ai[bot]@users.noreply.github.com> * bump: version 0.4.71 → 0.4.72 * uv lock * feat(mcp): support OAuth passthrough discovery * fix(mcp): support OAuth browser auth * fix(mcp): refine upstream OAuth metadata fallback * feat(proxy): support issuer-scoped JWT auth * fix(mcp): validate oauth callback redirect sink * feat(proxy): support issuer-scoped JWT auth * test(mcp): align trusted proxy fixtures * style(mcp): satisfy black formatting * chore(ui): bump next to 16.2.6 * fix(mcp): address oauth passthrough review findings * test(mcp): split oauth passthrough regressions * fix(interactions): align openapi response fields * security: prevent forwarding litellm api keys to upstream mcp servers - Strip Authorization header from extra_headers for pass-through servers - Pass-through servers (auth_type=None with extra_headers: [Authorization]) must not receive the user's LiteLLM API key - Only OAuth2 M2M and pass-through servers skip Authorization header - Other headers (x-request-id, x-trace-id) are still forwarded normally - Fixes credential leakage / authentication bypass in MCP pass-through mode * fix(interactions): remove steps field not in google openapi spec The steps field was added but is not present in the current Google Interactions OpenAPI specification. Revert to using only the fields that are actually defined in the spec. * fix(mcp): forward Authorization in pass-through when x-litellm-api-key is admission Commit 3753970cc9 widened the Authorization strip to cover all is_oauth_passthrough servers — protecting against the LiteLLM admission key leaking upstream when the caller used Authorization for admission, but also silently stripping legitimate upstream OAuth bearers when the caller used x-litellm-api-key for admission. That broke transparent OAuth pass-through (EAI-506 V5/V6): standards- compliant MCP clients (OpenCode, Claude Code, mcp-inspector) complete PKCE against the upstream IdP and send the resulting token as plain Authorization: Bearer per the MCP spec — with the wider strip in place, that token never reaches the upstream and tools/list returns empty. Narrow the strip: skip Authorization for pass-through servers only when the caller did NOT supply x-litellm-api-key. When x-litellm-api-key is present, admission is unambiguous and Authorization is free to carry the upstream OAuth bearer. The original security guarantee is preserved — a client that sends only Authorization (no x-litellm-api-key) still has it stripped, so the LiteLLM key cannot leak upstream via that path. Tests: - new: forwards Authorization when x-litellm-api-key is present - new: still strips Authorization when only Authorization is present - existing pass-through + M2M tests unchanged Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com> * fix(interactions): align status enum with openapi spec * fix(mcp,jwt): address greptile review concerns - Cache _get_agent_object_permission via user_api_key_cache (sentinel for no-permission rows) so MCP requests from agent keys don't hit the DB on every tool-list / tool-call. - Re-raise HTTPException in handle_sse_mcp so 401 + WWW-Authenticate challenges (and other HTTP errors) propagate to SSE clients instead of being swallowed as 500. - Normalise booleans in _validate_token_response so admin rules written as JSON-style "true" / "false" match upstream responses that return Python True / False. - Treat configured JWT issuer claim mappings as advisory: when a mapped field is absent or empty, leave the normalised claim unset instead of raising, matching the global litellm_jwtauth path. Co-authored-by: Claude <noreply@anthropic.com> * test: replace dall-e-3 with gpt-image-1 in health check and router tests (#27813) OpenAI returns 'The model dall-e-3 does not exist' for the test account, breaking test_openai_img_gen_health_check and test_image_generation. Switch to gpt-image-1, matching the existing TestOpenAIGPTImage1 pattern. (cherry picked from commit `aee58db880`) * fix(tests): drop dall-e-only test classes; route live image tests via gpt-image-1 Second wave of failures from the 2026-05-12 DALL-E shutdown: - tests/image_gen_tests/test_image_edits.py::TestOpenAIImageEditDallE2 and tests/image_gen_tests/test_image_generation.py::TestOpenAIDalle3 are explicitly named for the deprecated models and can't pass; remove. gpt-image-1 coverage already exists in sibling classes. - tests/local_testing/test_router.py image gen tests use dall-e-3 only as a routing example; swap to gpt-image-1. - tests/local_testing/test_custom_callback_input.py image_generation success/failure paths swapped to gpt-image-1. (cherry picked from commit `945b10ded4`) * test(fireworks): replace deprecated llama-v3p3-70b-instruct model Fireworks removed llama-v3p3-70b-instruct from serverless, so every live test using it now fails with NotFoundError ("Model not found, inaccessible, and/or not deployed"). Swap the 6 references (3 files) to the currently-served accounts/fireworks/models/deepseek-v3p1 — the canonical model in Fireworks' current docs examples and present in LiteLLM's cost map. test_get_model_params_fireworks_ai is a pure pricing-heuristic test (no network) asserting the >16b branch, so it uses llama-v3p1-70b- instruct instead to keep the "fireworks-ai-above-16b" assertion and branch coverage intact. (cherry picked from commit `39a1d438f2`) * test(fireworks): mock remaining live smoke tests test_completion_fireworks_ai and test_completion_cost_fireworks_ai made real Fireworks calls and broke whenever Fireworks rotated its serverless catalog (no externally-verifiable model list exists). They also asserted nothing — just printed. Mock the HTTP post and assert real behavior instead: the request is built with the right model/messages and the OpenAI-compatible response parses back; the cost path yields a non-zero cost against the local cost map. No network, no model dependency, stronger than the old smoke checks. (cherry picked from commit `b5db7ed37d`) * fix(tests): replace shut-down gpt-4o-audio-preview with gpt-audio-1.5 (#28281) * fix(tests): replace shut-down gpt-4o-audio-preview with gpt-audio-1.5 OpenAI shut down gpt-4o-audio-preview on 2026-05-07, so the live audio calls in test_stream_chunk_builder_openai_audio_output_usage and test_standard_logging_payload_audio now hard-fail with a model-not-found error on every PR. The error was not "openai-internal", so the except block swallowed it and execution fell through to an unbound completion/response (UnboundLocalError). Switch both tests to gpt-audio-1.5, OpenAI's recommended successor (GA, not deprecated, already present in the litellm cost map so the response_cost assertion still resolves). Also broaden the except to skip with the real error in the reason instead of crashing, so a transient upstream blip can't reintroduce the UnboundLocalError. * fix(tests): narrow audio-test skip to model-not-found, re-raise the rest Address review feedback: an unconditional skip on any exception would silently mask a litellm-internal regression in the audio path (broken param transformation, serialization, bad header) instead of failing CI. Skip only on the upstream-unavailable class (model_not_found / "does not exist" / openai-internal) and re-raise everything else, so genuine regressions still fail loudly. The UnboundLocalError is still fixed because the handler either skips or raises - it never falls through. * fix(tests): add budget_exceeded to expected Interaction status enum Staging added budget_exceeded to the Interaction OpenAPI status enum; the staging merge into this branch picked up the spec change but not the matching test update, so test_status_enum_values failed in CI. Align the test's expected list (exact-match by design) with the live spec. * fix(tests): mock HTTP fetch in test_img_url_token_counter The test parameterized a live third-party image URL (blog.purpureus.net) which now 404s, causing get_image_dimensions to fall through to its base64 decode path and crash with 'not enough values to unpack' on every PR run. Mock safe_get with a tiny 1x1 PNG so the URL branch is still exercised without any network dependency. * fix(tests): swap gpt-4o-audio-preview to gpt-audio-1.5 in test_gpt4o_audio OpenAI shut down gpt-4o-audio-preview on 2026-05-07, so both live tests in test_gpt4o_audio.py (test_audio_output_from_model and test_audio_input_to_model) hard-fail model_not_found on every PR. Swap the hardcoded model to OpenAI's successor gpt-audio-1.5 (same chat-completions audio surface; already in the litellm cost map). Mirror the narrowed-skip pattern from the prior audio fixes: skip on model_not_found / does-not-exist / openai-internal, re-raise everything else so genuine litellm regressions still fail CI loudly. (cherry picked from commit `92de7423ef`) * fix(tests): migrate realtime + rerank tests off shut-down upstream models (#28191) * fix(tests): use gpt-realtime in realtime guardrails test OpenAI shut down gpt-4o-realtime-preview-2024-12-17 on 2026-05-07, so the live OpenAI realtime guardrails integration test now fails with model_not_found (session.created never arrives, _wait_for_event times out). Point OPENAI_REALTIME_URL at the current GA model, gpt-realtime. Scope limited to this test: the pricing-catalog JSON keeps the retired entries intentionally (historical cost calc + separate Azure timeline), and the Azure realtime cost-calc test is unaffected. * fix(tests): mock nvidia_nim rerank instead of hitting EOL'd endpoint NVIDIA reached end-of-life for the hosted nvidia/llama-3.2-nv-rerankqa-1b-v2 rerank API on 2026-05-18 with no published replacement, so the live BaseLLMRerankTest.test_basic_rerank for nvidia_nim now returns HTTP 410 ("Gone"). NVIDIA's hosted catalog rotates on a schedule, so swapping in another live model would only defer the failure. Override test_basic_rerank in TestNvidiaNim to mock the sync/async HTTP transport (same pattern as test_nvidia_nim_rerank_ranking_endpoint in this file) and inject a fake NVIDIA_NIM_API_KEY via monkeypatch. The request/response transformation and cost calculation stay covered offline. Scope limited to nvidia_nim; other BaseLLMRerankTest providers untouched. * fix(tests): migrate remaining realtime tests off shut-down gpt-4o-realtime-preview OpenAI's 2026-05-07 shutdown removed the entire gpt-4o-realtime-preview family, including the undated 'gpt-4o-realtime-preview' alias (not just the dated snapshot fixed earlier). Three live tests still connected with the dead alias and failed with messages_received=1 (an error event instead of session.created): - test_openai_realtime_simple.py: get_model() -> gpt-realtime (drives TestOpenAIRealtime.test_realtime_connection / test_realtime_with_query_params) - test_openai_realtime.py: test_openai_realtime_direct_call_no_intent and test_openai_realtime_direct_call_with_intent -> openai/gpt-realtime (the with_intent test shares the same dead alias even though it was not in the failing set this run) Mocked unit tests (test_realtime_query_params_construction, test_realtime_query_params_use_normalized_model_name) are left as-is: they never hit the network and assert string plumbing only. Also fixes test_text_message_blocked_by_guardrail_no_ai_response, which now connects (the earlier URL swap worked) but tripped a model-wording-brittle assertion. The guardrail flow asks the model to voice the block message verbatim; gpt-4o-realtime-preview complied (output contained 'blocked'), gpt-realtime refuses verbatim-repeat instructions ('I'm sorry, but I can't repeat that message.'). Since the original user message is blocked before it reaches OpenAI, the refusal is still a safe outcome. Assertion #3 now accepts both voicing and refusal, and adds a hard check that the blocked phrase never leaks into AI output. (cherry picked from commit `ce87c411bf`) * fix(model_prices): register mistral/ministral-8b-2512 Mistral's API now returns model='ministral-8b-2512' when 'mistral-tiny' is requested, so test_completion_mistral_api fails with 'This model isn't mapped yet'. Adding the entry so completion_cost can resolve the cost for that response. Author: Claude <noreply@anthropic.com> * fix(mcp,auth): address greptile review concerns - handle_sse_mcp now calls _raise_preemptive_401_for_unauthenticated_servers so SSE clients to pass-through OAuth MCP servers receive the RFC 9728 401 + WWW-Authenticate challenge that the streamable-HTTP path already emits. - get_request_route strips a trailing slash from root_path before length-based prefix removal so non-canonical ASGI root_path values like "/litellm/" don't strip the leading slash from the returned route. - _mcp_oauth_user_api_key_auth's cookie JWT decode now passes options={"verify_aud": False} so a future revision of the UI session JWT containing an aud claim cannot silently downgrade the request to unauthenticated. Co-authored-by: Claude <claude@anthropic.com> * fix(tests): backfill local model_cost into remote-fetched map litellm.model_cost is loaded at import time from LITELLM_MODEL_COST_MAP_URL (pinned to main), so pricing entries that exist only in this branch (e.g. mistral/ministral-8b-2512, freshly added because Mistral's API now returns this id from mistral-tiny) are absent at test time and completion_cost lookups raise 'This model isn't mapped yet'. Backfill the in-tree backup into litellm.model_cost in the local_testing conftest so cassette-driven cost calculations resolve against the entries that ship with the branch under test. Fixes local_testing_part1 failures on test_completion_mistral_api and test_completion_mistral_api_modified_input. * fix(mcp,jwt): address greptile concurrency and code-quality concerns - _apply_issuer_claim_mappings now builds a new dict and reads from the original token, rather than mutating its input. The change is behaviour-preserving (caller passes a fresh jwt.decode result), but avoids the surprise-mutation pattern flagged by greptile. - is_network_error uses isinstance(exc, httpx.TransportError) instead of matching type(exc).__name__ against a hand-maintained string set, so ReadError / WriteError / ProxyError / etc. are also treated as transport-level failures and surfaced as HTTP 502. - fetch_upstream_oauth_protected_resource now coalesces concurrent discovery requests per (server_id, resource_url) through an asyncio.Lock so concurrent .well-known calls share a single upstream fetch + cache write. - Drop the redundant 'if trusted_ranges:' branch in get_mcp_client_ip; it is always true on the path that reaches it (the prior 'if not trusted_ranges:' early-returns). Co-authored-by: Claude <claude@anthropic.com> * fix(jwt,mcp): fall back to global JWKS on unknown issuer; prune fetch locks - handle_jwt._get_configured_issuer now returns None for tokens whose 'iss' is not in the configured issuers list, letting auth_jwt fall through to the legacy JWT_PUBLIC_KEY_URL path instead of hard-raising. This keeps existing tokens from non-configured IdPs working when an operator adds the new 'issuers' list to a live deployment. - discoverable_endpoints._prune_oauth_metadata_cache now also prunes entries in _OAUTH_METADATA_FETCH_LOCKS whose cache entry has been evicted and whose lock isn't currently held, bounding the locks dict to match the cache it guards. Co-authored-by: Claude <claude@anthropic.com> * fix(mcp,auth): restore client_ip in oauth2 target check, drop from delegate check The merge of staging into the PR branch (d42a66adb6) misplaced the client_ip=client_ip kwarg: it landed inside _target_servers_delegate_auth_to_upstream (which never accepted client_ip and isn't called with it), while the sibling _target_servers_use_oauth2 has client_ip in its signature but stopped passing it through to get_mcp_server_by_name. That left ruff flagging F821 on the undefined name and lint failing. Move client_ip back into _target_servers_use_oauth2's lookup (matching the call site that already forwards IPAddressUtils.get_mcp_client_ip) and drop it from _target_servers_delegate_auth_to_upstream so its body matches its signature again. * fix(mcp): respect client ip for delegated auth * fix(auth): address remaining greptile style findings - get_request_route: require root_path to match whole path segments before stripping, so '/apifoo' isn't truncated to 'foo' when root_path='/api'. - get_mcp_client_ip: collapse the two trusted-proxy validation branches into a single is_request_from_trusted_proxy call so the return value drives control flow instead of being discarded for the side-effect warning. Co-authored-by: Claude <claude@anthropic.com> * fix(jwt): strip internal _litellm_* claims in global JWKS auth path Prevents identity spoofing where a token signed by the global JWKS could inject _litellm_jwt_issuer and other _litellm_* claims that downstream getters trust. The issuer-scoped path already strips these via _apply_issuer_claim_mappings; mirror that behavior for the global fallback path. Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(mcp): surface MCPUpstreamAuthError as 401 in SSE/HTTP transport handlers Both handle_sse_mcp and handle_streamable_http_mcp only caught HTTPException to preserve 401 + WWW-Authenticate challenges, but MCPUpstreamAuthError (raised when a pass-through server's upstream rejects a bearer token mid-session) inherits from Exception. It was falling through to the generic handler and surfacing as an opaque 500. Mirror the REST endpoint behavior: translate MCPUpstreamAuthError into an HTTPException(status_code=e.status_code) with the upstream www-authenticate header so standards-compliant MCP clients trigger the upstream OAuth flow. Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(mcp): add upstream auth pre-flight in SSE handler Mirror handle_streamable_http_mcp by calling _check_passthrough_upstream_auth after the cold-start 401 emitter so expired/invalid upstream tokens surface a proper 401 + WWW-Authenticate challenge before the SSE session commits 200 headers, instead of letting list_tools silently return [] when the upstream rejects the token. Co-authored-by: Claude <noreply@anthropic.com> * fix(mcp): tighten cold-start bypass against CSV paths + dedupe upstream auth probe - Return None from _parse_mcp_server_names_from_path for CSV multi-server paths (/mcp/a,b). The regex previously truncated at the first comma and silently passed a single server name to the cold-start gate. - Switch _is_mcp_passthrough_cold_start to all-targets semantics, matching _target_servers_use_oauth2: one non-passthrough target in a co-targeted set must not flip the anonymous-admission bypass open for the others. - Drop the redundant HTTPStatusError block in _extract_upstream_auth_failure - any HTTPStatusError carries a .response, so the preceding generic block already handles 401/403 detection. Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(mcp,tests): sync stubs and cold-start assertions with delegate-check The merge of base-branch _target_servers_delegate_auth_to_upstream into process_mcp_request inserts an additional get_mcp_server_by_name(name) lookup ahead of the cold-start path, which breaks two test patterns: 1. lookup_by_name(name) side-effect stubs in TestMCPDelegateAuthToUpstream are called positionally by the delegate check, then again by the cold-start path with client_ip=... — raising TypeError: unexpected keyword argument 'client_ip'. Accept *_kwargs to match the real signature. 2. TestMCPPassthroughColdStartAdmission assertions count the lookup exactly once with client_ip=..., but the delegate check now adds a positional-only call ahead of it. Switch assert_called_once_with to assert_any_call for the cold-start invocation, and assert client_ip was not* passed for the aggregate /mcp test where cold-start must not fire. Both updates align with CLAUDE.md guidance to keep monkeypatch stubs in sync with the real signature when an optional parameter is added. Co-authored-by: Claude <claude@anthropic.com> * fix(mcp): correct passthrough probe 401 + slashed-name cold start parser - _check_passthrough_upstream_auth now emits 'Bearer resource_metadata="..."' pointing at the gateway's oauth-protected-resource well-known URL, mirroring the pre-emptive 401 path. Pass-through servers don't use the gateway as an authorization server, so the previous 'authorization_uri=' challenge sent clients to the wrong metadata endpoint. - _parse_mcp_server_names_from_path now accepts server names that contain a single slash (e.g. custom_solutions/user_123), mirroring MCPRequestHandler._extract_target_server_names_from_path. Without this, the cold-start bypass missed slashed-name servers and the generic admission error propagated instead of the spec-compliant 401 challenge. - _is_mcp_passthrough_cold_start drops the unused scope parameter from its signature. Co-authored-by: Yassin Kortam <yassin@berri.ai> * style(mcp): format discoverable endpoints * refactor(mcp): dedupe MCPUpstreamAuthError->HTTPException + thread client_ip into delegate-auth gate Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(mcp): handle passthrough OAuth metadata and startup auth errors - discoverable_endpoints: For pass-through MCP servers, when upstream oauth-protected-resource returns a non-200/non-dict response, raise HTTP 502 instead of falling through to default gateway metadata. Falling through would direct MCP clients at the gateway, which is not the authorization server for pass-through configs. - mcp_server_manager: Wrap _get_tools_from_server in startup tool name mapping with try/except. Since _get_tools_from_server now re-raises MCPUpstreamAuthError, an upstream 401 from a pass-through server at startup (when no user token is present) would otherwise abort the loop and leave subsequent servers unmapped. Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(mcp): restrict passthrough probe challenge to OAuth passthrough servers The probe filter previously matched any server with Authorization in extra_headers, including gateway-managed OAuth2 servers. Those would then receive the resource_metadata= WWW-Authenticate challenge meant for pass-through servers, instead of the authorization_uri= challenge pointing at the gateway AS metadata. Use srv.is_oauth_passthrough so only genuine pass-through servers get the resource-metadata challenge. Co-authored-by: Yassin Kortam <yassin@berri.ai> * test(proxy): cover issuer-scoped JWT auth * fix(mcp): use resource metadata for passthrough reauth * fix(mcp,tests): assert cold-start helper directly for aggregate /mcp Threading client_ip into _target_servers_delegate_auth_to_upstream made get_mcp_server_by_name(name, client_ip=...) also fire from the delegate-auth check, so the call_args_list assertion on client_ip-in-kwargs no longer uniquely signals a cold-start lookup. Patch _is_mcp_passthrough_cold_start and assert it is not invoked, which is the actual contract the test is pinning. * fix(mcp,jwt): drop unneeded async helper + suppress misleading unscoped JWT warning - _build_oauth_authorization_server_response: revert to sync (no awaits in body). The function only does dict construction and synchronous registry lookups; async added coroutine creation overhead per discovery call without need. - _build_decode_kwargs: accept has_issuer_config so the global path's 'JWT auth is unscoped' warning is suppressed when LiteLLM_JWTAuth.issuers provides per-issuer scoping. Previously the warning fired spuriously for admins who intentionally use only the new issuers config. * fix(jwt,mcp): clarify issuers fallthrough + add TTL on mcp permission cache - LiteLLM_JWTAuth.issuers docs now state explicitly that unlisted issuers fall back to the global JWT_AUDIENCE/JWT_ISSUER path; the field is additive routing, not an allow-list. Matches actual control flow in handle_jwt.auth_jwt and the regression tests asserting backwards compatibility with the global JWKS path. - MCPRequestHandler._get_{org,agent}_object_permission now pass ttl=DEFAULT_MANAGEMENT_OBJECT_IN_MEMORY_CACHE_TTL on async_set_cache, mirroring the auth_checks.py pattern so the cache TTL is explicit on both DualCache layers. * fix(tests): align merged JWT and MCP cold-start assertions Update the tests carried over from PR #28008 to match the assertions on the staging branch: - tests/test_litellm/proxy/auth/test_handle_jwt.py: unknown issuers now fall back to the legacy JWT_PUBLIC_KEY_URL path (per litellm_feat/v1.84.0-mcp-gateway-jwt-auth's '\''fall back to global JWKS on unknown issuer'\''), and mapped issuer claims that are absent no longer fail closed — they simply leave the normalised LiteLLM internal claim absent. - tests/test_litellm/proxy/_experimental/mcp_server/auth/test_user_api_key_auth_mcp.py: the aggregate '\''/mcp'\'' route still triggers the delegate-auth-to-upstream lookup once for the header-supplied server name; cold-start admission must NOT fire on top of that. Tighten the assertion to assert_called_once_with so a future regression that re-enters cold-start is caught. Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> * fix(jwt): guard litellm_jwtauth access in auth_jwt global path JWTHandler() can be constructed without update_environment() being called (tests do this directly), in which case self.litellm_jwtauth does not exist. Accessing it raises AttributeError before getattr can fall back. Use the same safe pattern other call sites use. * Gate MCP OAuth pass-through on delegate_auth_to_upstream flag Sameer's review on #28356/#28008 flagged that the new pass-through behaviors (preemptive 401 challenges, /.well-known/oauth-protected- resource proxying, upstream 401/403 propagation as MCPUpstreamAuthError, and Authorization-stripping when no x-litellm-api-key is supplied) were implicitly enabled for every server with auth_type=none plus Authorization in extra_headers. Existing users doing static bearer pass-through for non-OAuth reasons would have silently regressed. Make the detection rule explicit: extend the existing delegate_auth_to_upstream flag (previously oauth2-only) to also gate is_oauth_passthrough. Now requires flag + auth_type=None + Authorization in extra_headers, per Sameer's suggested detection rule. The UI toggle now appears for both modes (oauth2 PKCE passthrough and auth_type=none OAuth pass-through) with mode-appropriate copy. Update test fixtures to set the flag where the test intent is to exercise OAuth pass-through behavior, and add negative tests covering the new default-false case. * fix(mcp): route org object_permission lookup through shared auth helpers Replace the bespoke litellm_organizationtable.find_unique + dedicated cache key in _get_org_object_permission with get_org_object + get_object_permission so MCP requests share the same user_api_key_cache entries as the rest of the proxy and no longer fragment org-row caching. * fix(mcp): wrap get_object_permission call in shared try/except Ensure exceptions from get_object_permission in _get_org_object_permission are caught and return None, preserving the original fail-safe semantics. Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(jwt): validate issuer audience at config load + dedicated key-miss exception - Move JWTIssuerConfig audience-required guard into a Pydantic model_validator so misconfiguration fails at startup instead of on the first request. - Replace the string-match `No matching public key found` filter in get_public_key's multi-URL fallback with a dedicated NoMatchingJWTPublicKeyError; only that specific exception triggers continuation, every other error still surfaces. * fix(mcp): admit and forward Authorization for passthrough OAuth return For pass-through MCP servers (auth_type=none with delegate_auth_to_upstream) the RFC 9728 cold-start flow sends the client back with only "Authorization: Bearer <upstream-token>" after upstream OAuth discovery. Previously this path 1) was rejected in process_mcp_request because the oauth2_headers fallback only covered auth_type=oauth2 targets, and 2) had the Authorization header stripped by _prepare_mcp_server_headers when no x-litellm-api-key was present, treating the upstream token as a potential LiteLLM key leak. - Extend the elif oauth2_headers fallback to also admit anonymously when every target is a pass-through server. - Pass user_api_key_auth into _prepare_mcp_server_headers so it can forward Authorization for pass-through servers when admission did not consume the bearer as a LiteLLM key (api_key is unset). Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(mcp): consistent www-authenticate casing + SSE toolset scoping - Normalize the WWW-Authenticate header key emitted by _check_passthrough_upstream_auth to lowercase to match the other 401 emitters in the OAuth pass-through flow. - Mirror the streamable HTTP handler's toolset scoping in handle_sse_mcp: strip client-supplied x-mcp-toolset-id and apply _apply_toolset_scope before _check_passthrough_upstream_auth so the upstream probe list is derived from the fully-authorized server set. - Tighten _has_client_supplied_mcp_auth signature so mcp_server_auth_headers is Optional, matching its caller in process_mcp_request. Co-authored-by: Yassin Kortam <yassin@berri.ai> * security(mcp): strip Authorization in call_tool when LiteLLM admission used legacy header Mirror the OAuth pass-through admission check from _prepare_mcp_server_headers (list-tools path) in _call_regular_mcp_tool (tool-call path): when the server is OAuth pass-through and the caller did not supply x-litellm-api-key, Authorization on the inbound request may itself be the LiteLLM API key — so strip it before forwarding instead of leaking the gateway credential upstream. When x-litellm-api-key is present, admission is unambiguous and Authorization continues to carry the upstream OAuth bearer (transparent pass-through). * refactor(mcp): centralize caller Authorization strip decision Extracted the security-sensitive logic that decides whether the caller's Authorization header is forwarded to (or stripped from) an outgoing MCP request into a single helper, _should_strip_caller_authorization, in mcp_server_manager.py. Previously the same condition was duplicated across _call_regular_mcp_tool (mcp_server_manager.py) and _prepare_mcp_server_headers (server.py). Keeping two copies of this check risked future divergence and credential-leak / broken-passthrough bugs. Both call sites now share the helper, preserving exact behavior. Co-authored-by: Yassin Kortam <yassin@berri.ai> * log MCP OAuth discovery diagnostics for unmatched paths and non-transport upstream errors * fix(jwt): include issuer-normalized team id in get_all_jwt_team_ids The aggregator for team IDs only consulted the issuer-normalized claim for the plural (team_ids) path and fell back to the global config for the singular path. When an operator configures team_id_jwt_field only at the issuer level, get_team_id correctly returned the mapped value but get_all_jwt_team_ids silently dropped it, causing membership reconciliation to disagree with request routing. Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(mcp/jwt): dedupe cold-start path parser; reject conflicting audience flags - _parse_mcp_server_names_from_path now delegates to MCPRequestHandler._extract_target_server_names_from_path so the names used by the cold-start passthrough bypass cannot drift from the names used by downstream routing. - JWTIssuerConfig now rejects the combination of audience and disable_audience_validation=True at validation time instead of silently ignoring the flag. * fix(mcp): restrict passthrough cold-start bypass to 401 only The new elif passthrough cold-start branch reused is_auth_error which matches both 401 and 403. A 403 from user_api_key_auth indicates the LiteLLM key WAS recognized but is forbidden (e.g. over budget / rate limited); falling through to anonymous UserAPIKeyAuth() in that case bypasses spend and rate-limit controls on passthrough servers. Only trigger the cold-start anonymous admission on 401, which is the signal that the bearer is an upstream OAuth token rather than a recognized LiteLLM key. Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(jwt/mcp): warn on unscoped JWT fallback; route agent permission lookup through shared helper - _build_decode_kwargs no longer suppresses the unscoped-fallback warning when LiteLLM_JWTAuth.issuers is set: tokens whose iss does not match any configured issuer still fall through to the global path, and that fallback is itself unscoped when JWT_AUDIENCE/JWT_ISSUER are absent. - _get_agent_object_permission now caches the agent_id -> object_permission_id mapping and delegates the permission lookup to the shared get_object_permission helper, so the agent path reuses the same cache entries as the org / team / key paths. * fix(mcp): fabricate resource_metadata challenge when upstream 401 omits WWW-Authenticate When an upstream pass-through MCP server returns 401 without a WWW-Authenticate header (non-compliant per RFC 7235 §3.1), to_http_exception() now produces a synthetic Bearer challenge pointing at the gateway's standard-pattern oauth-protected-resource well-known endpoint for that server. This keeps MCP clients on the RFC 9728 discovery flow instead of receiving a bare 401 with no recovery hint. * fix(jwt): make _get_decode_options explicitly control verify_iss Previously, _get_decode_options only set verify_aud based on whether audience was provided. The issuer JWT path relied on always passing issuer=issuer_config.issuer to trigger PyJWT's default verify_iss=True, making the helper's behavior implicitly dependent on caller behavior. Now _get_decode_options accepts issuer as well, mirroring the verify_aud handling and matching the dimensions handled by _build_decode_kwargs. Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(mcp): emit absolute resource_metadata URI in fabricated 401 challenge Per RFC 9728 §3.2 the resource_metadata Bearer challenge must be an absolute URI; strict MCP clients reject relative URIs and fail to initiate discovery. MCPUpstreamAuthError.to_http_exception now accepts the gateway base URL and prepends it when the upstream omitted WWW-Authenticate, and all four call sites (streamable HTTP, SSE, and the two REST tool-list paths) supply it. * fix(mcp): correct 403 detail text and remove dead _list_tools_for_single_server duplicate - MCPUpstreamAuthError.to_http_exception() now returns detail='Forbidden' for 403 upstream responses (and 'Unauthorized' for 401), matching the _check_passthrough_upstream_auth pre-flight probe. - Remove the shadowed first definition of _list_tools_for_single_server in rest_endpoints.py; the second definition was the live one and the dead copy was a maintenance trap. Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix: address potential bugs in auth_utils, mcp discoverable endpoints, and mcp auth - auth_utils.get_request_route: return '/' instead of empty string when raw_path exactly equals root_path so downstream route allowlist checks still see a leading slash - discoverable_endpoints.fetch_upstream_oauth_protected_resource: also cache negative results (no upstream metadata) for a shorter TTL so we don't re-fetch on every discovery request and so the per-key fetch lock can be pruned - user_api_key_auth_mcp: guard the oauth2_headers 401 cold-start passthrough bypass with _has_client_supplied_mcp_auth, matching the parallel bypass in the no-Authorization branch so MCP-auth-bearing requests don't silently downgrade to anonymous admission Co-authored-by: Yassin Kortam <yassin@berri.ai> * test(vertex): tolerate transient InternalServerError in google maps tool test test_gemini_google_maps_tool_simple makes live calls to Vertex AI's Google Maps grounding backend, which intermittently returns 500 INTERNAL ("Please retry") — a transient upstream failure, not a LiteLLM bug. The test already passes on RateLimitError; treat InternalServerError the same way so transient Vertex-side failures don't fail CI. * refactor(mcp): drop redundant has_client_credentials filter on passthrough probe is_oauth_passthrough already requires auth_type in (None, MCPAuth.none), which is mutually exclusive with has_client_credentials (auth_type == MCPAuth.oauth2), so the extra guard was always True and only added confusion about whether a server could be both passthrough and M2M. Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix: restore unreachable InternalServerError skip handler in vertex test Co-authored-by: Yassin Kortam <yassin@berri.ai> * feat(mcp): add dedicated oauth_passthrough flag for non-oauth2 pass-through Previously is_oauth_passthrough reused delegate_auth_to_upstream — a flag scoped to oauth2 servers (PKCE bypass) — to gate OAuth pass-through for auth_type=none servers. Overloading it risked regressing existing deployments that set delegate_auth_to_upstream, since the same flag would silently start driving pass-through (discovery proxying, 401 challenges, upstream 401/403 propagation) on non-oauth2 servers. Introduce a separate oauth_passthrough opt-in so the two behaviors never imply each other: - MCPServer.is_oauth_passthrough now requires oauth_passthrough (not delegate_auth_to_upstream). - Persist oauth_passthrough on LiteLLM_MCPServerTable (new column + migration) and wire it through config/DB load and API responses. - UI splits the single toggle into two: "Delegate auth to upstream (PKCE passthrough)" for oauth2 and "OAuth pass-through" for auth_type=none servers forwarding Authorization. Adds backend tests (property, round-trip, and a regression guard that delegate_auth_to_upstream alone never enables pass-through) and UI tests for the toggle split. * fix(mcp): reconcile cold-start bypass with x-mcp-servers header and skip non-absolute WWW-Authenticate fabrication - _parse_mcp_server_names_from_path now fails closed when the x-mcp-servers header introduces any target not present in the path-derived target set, closing a header/path mismatch where the cold-start passthrough bypass could otherwise admit anonymously while the header advertises a non-passthrough server. - MCPUpstreamAuthError.to_http_exception no longer emits a relative resource_metadata URI when base_url is missing; per RFC 9728 3.2 the URI must be absolute, so we skip fabrication entirely rather than send a challenge strict MCP clients will reject. Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(mcp): fabricate path-aware resource_metadata URI for upstream 401 When MCPUpstreamAuthError.to_http_exception fabricates a `WWW-Authenticate: Bearer resource_metadata=...` challenge (because the upstream 401 omitted one), the URL now matches the inbound MCP transport pattern the client originally used: - /mcp/{server_name} -> /.well-known/oauth-protected-resource/mcp/{server_name} - /{server_name}/mcp -> /.well-known/oauth-protected-resource/{server_name}/mcp This mirrors the path-aware behaviour of _get_passthrough_resource_metadata_url in server.py so strict RFC 9728 \xA73.2 clients on legacy routes get a resource_metadata URI aligned with the resource pattern they originally targeted. Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(jwt+mcp): tighten issuer-scoped claim type handling, RFC-quote authorization_uri, surface MCP upstream auth errors, defense-in-depth on decode options - handle_jwt: when an issuer-scoped _litellm_team_ids claim exists but has an unexpected type, return [] instead of falling through to the global team_ids_jwt_field path (different claim semantically). - handle_jwt: _get_decode_options/_decode_jwt_with_public_key now take an explicit disable_audience_validation flag; passing audience=None without it raises, so audience checks can't silently disappear if the model validator is ever bypassed. _auth_jwt_with_issuer forwards the flag from JWTIssuerConfig. - mcp_server: quote the authorization_uri WWW-Authenticate parameter value (RFC 6750 / 9728 auth-param must be quoted-string), matching the pass-through path. - mcp_server: in _fetch_and_filter_server_tools, re-raise MCPUpstreamAuthError so the outer streamable-HTTP handler can surface a proper 401 + WWW-Authenticate challenge instead of returning an empty tool list. Co-authored-by: Yassin Kortam <yassin@berri.ai> * chore(docker): align Dockerfile.non_root/Dockerfile.database to current wolfi-base SHA The older sha256:3258be... pin has been intermittently returning 500/not-found from cgr.dev, breaking the test-server-root-path GitHub Action and the build_docker_database_image CircleCI job. Move both Dockerfiles onto the same sha256:31da65... digest already in use by Dockerfile, gateway/Dockerfile, backend/Dockerfile, and migrations/Dockerfile so the base image is consistent across the repo. * ci(docker): bump wolfi-base pin to current working digest The previously aligned sha256:31da6565f35a... and the older sha256:3258be... both return HTTP 500 from cgr.dev's manifest endpoint, breaking the build_docker_database_image CircleCI job and test-server-root-path GitHub Action. The current 'latest' tag resolves to sha256:5743937d521c... which serves manifests normally, so move docker/Dockerfile.database and docker/Dockerfile.non_root onto that digest. * ci(docker): retry apk add in Dockerfile.database for apk.cgr.dev flakes Mirror the retry-loop pattern from #28888 (which fixed backend/Dockerfile, gateway/Dockerfile, and migrations/Dockerfile) into docker/Dockerfile.database. The build_docker_database_image CI job has been intermittently failing with "remote server returned error (try 'apk update')" when apk.cgr.dev flakes mid-fetch; bumping the wolfi-base SHA doesn't address the mirror, only a retry does. Same explicit-failure form as #28888: exit non-zero on the 3rd miss instead of silently succeeding because `sleep 5` was the last command in the `&& break \|\| sleep 5` chain. * fix(mcp): scope preemptive 401 to toolset-narrowed server set Move _raise_preemptive_401_for_unauthenticated_servers after toolset scoping in both the StreamableHTTP and SSE handlers, and add an optional allowed_server_ids parameter so passthrough/oauth2 servers that the active toolset excludes no longer trigger a spurious 401 challenge. Without this, a client targeting a toolset whose scope excludes a passthrough server could be pushed into an OAuth flow for a server it would be 403'd on immediately after authentication. Co-authored-by: Yassin Kortam <yassin@berri.ai> * revert(docker): drop unrelated Wolfi bump and apk retry loop from MCP/JWT PR These Docker changes are out of scope for the MCP OAuth passthrough + JWT auth work and duplicate the build-reliability fix already merged to litellm_internal_staging in #28888, which adds the same apk retry loop on the componentized backend/gateway/migrations Dockerfiles and also fixes the underlying nodeenv/libatomic root cause. Restoring docker/Dockerfile.database and docker/Dockerfile.non_root to the base so this PR is purely the MCP/JWT change. * fix(mcp): surface upstream 403 challenges from REST tools/list The single-server pass-through path converted an upstream MCPUpstreamAuthError into an HTTPException, but list_tool_rest_api only re-raised 401s; an upstream 403 (valid token, insufficient scope) collapsed into a 200 response with error=unexpected_error, so clients never saw the status or WWW-Authenticate challenge needed to refresh scopes. Let MCPUpstreamAuthError propagate and convert it once in list_tool_rest_api so both 401 and 403 reach the client, while internal access/IP 403s keep the legacy error-dict shape. * fix(mcp): fail closed for IP access control when XFF trusted ranges unset When use_x_forwarded_for is enabled but mcp_trusted_proxy_ranges is not configured, get_mcp_client_ip previously fell back to the direct peer IP. Behind an internal reverse proxy that peer is the proxy's private address, so every external caller was classified as internal and could reach MCP servers with available_on_public_internet=false. Return an empty string in that case so is_internal_ip treats the caller as external. --------- Co-authored-by: Yuneng Jiang <yuneng@berri.ai> Co-authored-by: Milan <milan@berri.ai> Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> Co-authored-by: Krrish Dholakia <krrish+github@berri.ai> Co-authored-by: ryan-crabbe-berri <ryan@berri.ai> Co-authored-by: stuxf <70670632+stuxf@users.noreply.github.com> Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: veria-ai[bot] <224490171+veria-ai[bot]@users.noreply.github.com> Co-authored-by: gym-cmd <186399764+gym-cmd@users.noreply.github.com> Co-authored-by: Artem Dudarev <artem.dudarev@justeattakeaway.com> Co-authored-by: Claude <claude@anthropic.com> Co-authored-by: Yassin Kortam <yassin@berri.ai>	2026-06-02 12:22:04 -07:00
Mateo Wang	581c30f1e8	[internal copy of #29089 ] fix: duplicate claude code traces (#29311 )	2026-05-29 22:23:24 -07:00
yuneng-jiang	5f75be5c1c	chore(ci): merge dev branch (#28801 ) * chore(proxy): route path-dependent call sites through get_request_route Replace direct ``request.url.path`` reads in auth, ACL, routing, and audit-log decisions with ``get_request_route(request)`` — the helper already added in ``auth/auth_utils.py`` that returns the ASGI ``scope["path"]`` with ``root_path`` stripped. Starlette reconstructs ``url.path`` from the Host header; ``scope["path"]`` is uvicorn's parse of the request line and matches what FastAPI dispatches on, so it's the authoritative route for any decision that should agree with the actual handler. Sites: - _experimental/mcp_server/auth/user_api_key_auth_mcp.py - management_endpoints/mcp_management_endpoints.py - vector_store_endpoints/utils.py - pass_through_endpoints/pass_through_endpoints.py - auth/route_checks.py - litellm_pre_call_utils.py - spend_tracking/spend_management_endpoints.py - common_utils/http_parsing_utils.py - management_helpers/utils.py - health_endpoints/_health_endpoints.py Adds regression tests in tests/proxy_unit_tests/test_proxy_routes.py that construct a Request with scope["path"] set to a benign route and the Host header crafted so url.path would resolve differently; each site's decision is asserted against scope["path"]. * chore(proxy): make get_request_route imports lazy at call sites Move the ``from litellm.proxy.auth.auth_utils import get_request_route`` imports added in the prior commit back to the function bodies that use them. The module-level form participates in a long-standing import cycle through ``auth_utils -> _types -> ...`` and was flagged by CodeQL on the PR; the lazy form matches the pattern the proxy already uses for ``user_api_key_auth`` and related helpers elsewhere in these files. Also drop the ``RouteChecks._is_assistants_api_request`` delegation in ``_get_metadata_variable_name`` introduced in the prior commit — the delegation pulled ``RouteChecks`` into the same cycle, and the call site reuses the resolved route for its other branches, so inlining the substring check is both cycle-free and avoids a redundant second ``get_request_route`` call. Comment in test_proxy_routes.py acknowledges that the two MCP table entries exercise ``get_request_route`` directly rather than the full production handler (which needs ASGI scope + MCP state to invoke). --------- Co-authored-by: shin-berri <shin-laptop@berri.ai> Co-authored-by: user <70670632+stuxf@users.noreply.github.com>	2026-05-25 13:44:49 -07:00
Michael-RZ-Berri	3b2ce201d8	encrypt callback_vars in key/team metadata at rest (#27141 ) Co-authored-by: Michael Riad Zaky <michaelr@Michaels-MacBook-Air.local> Co-authored-by: Yuneng Jiang <yuneng@berri.ai>	2026-05-23 12:15:44 -07:00
Sameer Kankute	b7e978a5c3	Litellm oss staging 04 21 2026 2 (#26569 ) * fix(bedrock): use model info lookup for output_config support instead of hardcoded check Replace hardcoded _is_claude_4_6_model() string matching with supports_output_config flag in model_prices_and_context_window.json, accessed via _supports_factory(). This follows the project's established pattern for model capability checks (per AGENTS.md rule #8). Bedrock Invoke now conditionally preserves output_config for models that declare supports_output_config=true (currently Claude 4.6 models), while stripping it for older models to avoid request rejection. Ref: https://github.com/BerriAI/litellm/issues/22797 * fix(vertex_ai): single-flight credential refresh to prevent thundering herd (#26024) * fix(vertex_ai): single-flight credential refresh to prevent thundering herd When GCP credentials expire under high concurrency, all requests simultaneously call credentials.refresh() via asyncify, saturating the 40-thread anyio pool and blocking the proxy for 20+ seconds. This adds: - Per-credential asyncio.Lock in get_access_token_async for single-flight refresh (1 coroutine refreshes, others wait on the lock) - Background refresh when token_state is STALE (usable but near expiry), returning the current token immediately with zero added latency - threading.Lock on the sync get_access_token path - Uses google-auth's TokenState enum (FRESH/STALE/INVALID) instead of reimplementing expiry logic Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: address PR review comments - Use asyncio.create_task() instead of deprecated get_event_loop().create_task() - Track in-flight background refresh tasks to prevent duplicate refreshes when multiple STALE-path callers pass through the lock before the first background task completes - Add token validation in the STALE branch (consistent with FRESH/INVALID) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: lazy-import TokenState to avoid breaking when google-auth is not installed Also extract helper methods to bring get_access_token_async under the PLR0915 statement limit (50). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: apply Black formatting to test file and update uv.lock Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: remove user-provided project_id from log messages (CodeQL log injection) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: avoid leaking token value in error message, log type instead Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: restore uv.lock to match litellm_oss_branch Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: remove project_id from remaining log message (CodeQL log injection) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: remove remaining project_id from log and error messages Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: reuse cached credentials in VertexAIPartnerModels (#26065) * fix: reuse cached credentials in VertexAIPartnerModels instead of creating new VertexLLM per request VertexAIPartnerModels.completion() was creating a throwaway VertexLLM() instance on every call to get an access token, bypassing the credential cache inherited from VertexBase. This caused a fresh token fetch for every single request, adding significant latency overhead. Fix: call super().__init__() to initialize VertexBase's credential cache, and use self._ensure_access_token() instead of a new VertexLLM instance. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: apply same credential caching fix to VertexAIGemmaModels and VertexAIModelGardenModels Same bug as VertexAIPartnerModels: both classes had `pass` in __init__ instead of `super().__init__()`, and created throwaway VertexLLM() instances per request instead of using self._ensure_access_token(). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(fireworks): add glm-5p1 metadata and parallel_tool_calls (#26069) * fix(chatgpt): preserve responses routing and recover empty output (#25403) (#26219) - preserve existing shared backend `mode` when router deployment registration reuses a provider/model key already in `litellm.model_cost` (prevents alias with `mode: chat` from downgrading shared `chatgpt/gpt-5.4` from `responses` to `chat` and triggering 403s on /v1/chat/completions) - teach the ChatGPT Responses parser to recover `response.output_item.done` entries when `response.completed.output` is empty - add defensive /responses -> /chat/completions bridge fallback that reconstructs output items from raw SSE when `raw_response.output` is empty - regression coverage for shared alias routing, empty completed.output parsing, and SSE bridge recovery Closes #25403 Co-authored-by: afoninsky <andrey.afoninsky@gmail.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> * fix(deps): relax core runtime dependency pins from exact == to ranges When litellm migrated from Poetry to uv (PR #24905, v1.83.1), the core dependency specifications in pyproject.toml changed from Poetry bare-version strings (e.g. openai = "2.30.0") to PEP 621 exact pins (openai==2.24.0). Poetry bare-version strings are actually caret ranges (^X.Y.Z == >=X.Y.Z,<X+1), but PEP 621 == is exact. This means every downstream package that installs litellm as a library dependency is now forced to downgrade aiohttp, pydantic, openai, click, and 8 other common packages to exact old versions. Fix: restore range specifiers for the 12 core runtime dependencies. The optional extras (proxy, proxy-runtime, etc.) are consumed primarily by Docker images where exact pins are appropriate and are left unchanged. The uv.lock file continues to provide exact reproducibility for Docker builds and CI. Fixes: #26154 * Add Rubrik as officially-supported guardrail plugin (#25305) * Add Rubrik as officially-supported guardrail plugin Adds tool blocking and batch logging integration with an external Rubrik webhook service. The plugin validates LLM tool calls against a policy service (fail-open on errors) and batch-logs all requests/responses. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Update Rubrik docs: config.yaml as primary, env vars as fallback Restructures the Quick Start to present config.yaml as the recommended approach with tabbed UI, and environment variables as an alternative fallback. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Add Rubrik env vars to config_settings reference Fixes documentation validation by adding RUBRIK_API_KEY, RUBRIK_BATCH_SIZE, RUBRIK_SAMPLING_RATE, and RUBRIK_WEBHOOK_URL to the environment settings reference table. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Add fallback message when blocking service returns empty explanation Prevents whitespace-only violation message when the tool blocking service blocks tools but returns an empty content field. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * feat(ocr): add Reducto parse OCR support (#26068) * feat(ocr): add Reducto parse OCR support * fix(reducto): address OCR review feedback * chore: refresh uv lockfile * Revert "chore: refresh uv lockfile" This reverts commit 47200c0e603275108335aee852d0a96586165337. * Fix failing tests * Fix code qa * Replaced the async client violation * Replaced black formatting * Fix failing tests * Fix failing tests * Fix failing tests * Fix failing tests * Fix tests * Fix vertex ai cred test * Fix test * fix(xai): normalize usage total_tokens for prompt caching xAI can return total_tokens inconsistent with prompt_tokens + completion_tokens when caching is enabled. Align with OpenAI-style usage so shared LLM tests and downstream consumers see coherent totals. Apply to non-streaming responses and streaming usage chunks. Made-with: Cursor * Fix stale Vertex token refresh fallback * Fix OCR zero credit and Bedrock support checks * Fix OCR and Fireworks capability handling * fix: evict completed background refresh tasks from _background_refresh_tasks Completed asyncio.Task objects were never removed from _background_refresh_tasks. In long-running proxies with many distinct credential keys the dict grows indefinitely, retaining references to finished tasks and their results. Fix: - Pop the existing (done) entry before creating a replacement task. - Attach a done_callback to each new task that removes its entry from the dict once the task finishes (success or failure). Tests: - test_background_refresh_task_removed_after_completion: verifies the done-callback cleans up a single entry after the task completes. - test_background_refresh_tasks_no_accumulation_across_many_keys: drives 20 distinct credential keys and confirms the dict is empty after all background refreshes finish. Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com> * fix: guard asyncio.create_task in RubrikLogger.__init__ against missing event loop asyncio.create_task() raises RuntimeError when called outside a running event loop. Wrap the call in a try/except RuntimeError so that RubrikLogger can be instantiated in synchronous contexts (e.g. during startup, testing) without crashing. The periodic_flush background task simply won't start in those cases; it starts normally when the constructor is called inside an event loop. Add a test that verifies instantiation outside an event loop does not raise (does not patch asyncio.create_task). Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com> * fix: preserve async batch and reauth coordination * Fix mypy * Fix xAI usage and Fireworks parallel tool params * Fix Rubrik batch drain and SSE recovery mutation * Fix router mode preservation and Rubrik batch flushing * fix(responses): merge text-only items with output items in SSE recovery When recovering output from raw SSE, OUTPUT_ITEM_DONE and OUTPUT_TEXT_DONE events were treated as mutually exclusive fallbacks. If a stream emitted OUTPUT_ITEM_DONE for some output indices and only OUTPUT_TEXT_DONE for others, the text-only items at the missing indices were silently dropped. Merge both dicts before returning, with OUTPUT_ITEM_DONE entries taking precedence at any shared index (preserving the existing behavior covered by test_transform_response_preserves_output_item_when_text_done_arrives_later). Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> * fix(rubrik): preserve events on batch send failure Previously, _log_batch_to_rubrik swallowed all HTTP errors and exceptions, and the parent flush_queue unconditionally drained the queue afterwards. On Rubrik 5xx responses, network errors, or timeouts the in-flight events were silently dropped without ever being delivered. - Re-raise from _log_batch_to_rubrik so failures surface to the caller. - In CustomBatchLogger.flush_queue, catch exceptions from async_send_batch and leave the queue intact for retry on the next flush. Existing loggers that override flush_queue (e.g. Datadog) or that swallow their own errors inside async_send_batch (e.g. Langsmith, GCS, Argilla) are unaffected. - Tests now assert events are preserved on HTTP errors, network errors, and that mid-flush appended events are also preserved on failure. Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> * fix(chatgpt/responses): strip whitespace before parsing SSE chunks _parse_sse_json_chunk in ChatGPTResponsesAPIConfig passed the raw chunk directly to _strip_sse_data_from_chunk, which only matches the 'data:' prefix at position 0. Chunks with leading whitespace (e.g. ' data: {...}') were returned unchanged and silently failed JSON parsing, dropping the contained event. Mirror the existing fix in LiteLLMResponsesTransformationHandler._parse_raw_sse_chunk by calling chunk.strip() before stripping the SSE prefix. Adds a regression test using whitespace-padded data: lines and verifies that the response.output_item.done payload is recovered into the final ResponsesAPIResponse output. Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> * fix(rubrik): override flush_queue so a single snapshot drives send and drain Previously RubrikLogger relied on CustomBatchLogger.flush_queue, which captured len(self.log_queue) separately from the snapshot taken inside async_send_batch. Although both happen without an intervening await today (so they agree in practice), they are semantically disconnected: a future refactor that adds an await between the two captures, or that changes the async_send_batch contract, could cause the parent to delete a different number of items than were actually sent and trigger duplicate deliveries to Rubrik. Override flush_queue on RubrikLogger so a single snapshot drives both the HTTP POST and the queue truncation. async_send_batch is preserved for direct callers/tests but no longer participates in the canonical flush path. Existing tests (including the one that explicitly invokes the base CustomBatchLogger.flush_queue path) still pass. Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> * fix: register reducto/parse-v3 and reducto/parse-legacy in active model pricing file Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> * fix(bedrock): restore output_config forwarding and black formatting Use model-map lookup with _model_supports_effort_param fallback so Bedrock Invoke keeps output_config for Claude 4.6/4.7 when pricing flags are missing. Revert custom_llm_provider=bedrock for supports_output_config checks, fix allowlist test model, and apply black to xai/vertex files failing lint CI. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(greptile): address remaining review concerns - fireworks: resolve supports_reasoning lookup for short model names by also trying the full accounts/fireworks/models/ path in model_cost - ocr_cost: drop reducto-specific guard in shared utility; treat missing pages_processed as zero cost when no per-page pricing is configured - docs: remove reducto/rubrik markdown stubs from this repo (canonical docs live in litellm-docs) * fix(model_prices): register mistral/ministral-8b-2512 Mistral's API now returns model='ministral-8b-2512' when 'mistral-tiny' is requested. Adding the entry so completion_cost can resolve the cost for that response. * fix(greptile): prune async refresh locks and lazy-start rubrik flush - vertex: back `_async_refresh_locks` with a WeakValueDictionary so a per-key Lock is auto-evicted once no coroutine holds it, preventing unbounded growth in deployments with many credential combinations while keeping single-flight semantics intact. - rubrik: defer the periodic flush task to the first log event when the logger is constructed without a running event loop, so low-traffic batches still get drained instead of being silently stranded by a swallowed RuntimeError. * Remove duplicate supports_max_reasoning_effort key in claude-opus-4-7 entries Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(vertex_ai): stabilize background refresh task tracking - Guard background refresh done_callback with an identity check so a stale callback cannot remove a newer task that already replaced it in the tracking dict (done_callbacks are scheduled via call_soon, so a fresh task can be stored for the same credential key before the old callback fires). - Replace WeakValueDictionary with a regular dict for _async_refresh_locks so the per-key asyncio.Lock identity is stable across concurrent callers; otherwise a lock can be GC'd between two coroutines arriving for the same key, breaking single-flight. Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix: surface OCR pricing gaps and recover OUTPUT_TEXT_DONE in ChatGPT SSE - cost_calculator.ocr_cost: log a warning when pages_processed is reported but no ocr_cost_per_page is configured, instead of silently billing zero via an implicit '(... or 0.0) * pages_processed' fallback. Behavior is preserved (zero cost) so free-tier / unpriced models still work, but configuration gaps are now visible in logs. - ChatGPTResponsesAPIConfig._extract_completed_response_from_sse: also collect response.output_text.done events into a text-only items map and merge them into the recovered output (OUTPUT_ITEM_DONE wins on duplicate output_index), mirroring the LiteLLMResponses handler. This recovers text content when a provider only emits OUTPUT_TEXT_DONE and the final response.completed event has an empty output list. Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(cicd): drop obsolete async refresh locks auto-prune test Commit `dfb2524` intentionally reverted _async_refresh_locks from a WeakValueDictionary back to a regular Dict so the per-key asyncio.Lock identity is stable across concurrent callers — preserving single-flight semantics. The test asserting that the dict shrinks back to 0 after refreshes was added when the WeakValueDictionary backing was still in place; it now contradicts the deliberate design and is failing CI. * fix(rubrik): sanitize proxy_server_request and harden tool_calls parsing Address bugbot review concerns: - Sanitize proxy_server_request before forwarding to the Rubrik webhook. The previous code passed the entire inbound HTTP context (Authorization, Cookie, x-api-key, and the raw request body) through to a third-party endpoint, which exfiltrates proxy credentials and upstream secrets. The new _sanitize_proxy_server_request allowlists only url and method. (Cursor Bugbot HIGH severity #3192354895) - Treat a null choices[0].message.tool_calls as 'all blocked' rather than letting iteration raise and silently fall through the outer except in apply_guardrail (which would fail open). Iterate over a defensive fallback list instead of relying on the dict default. (Cursor Bugbot MEDIUM severity #3192349538) Co-authored-by: Cursor Bugbot <bugbot@cursor.com> * fix: restore Fireworks substring matching and use RLock for Vertex sync refresh - Fireworks _get_model_cost_capability: after exact-key lookups, fall back to substring matching against fireworks_ai/* entries in model_cost so model name variants (e.g. fine-tuned suffixes) continue to inherit capability flags like supports_reasoning. - Vertex vertex_llm_base: replace non-reentrant threading.Lock with RLock on the sync refresh path so the reauthentication retry, which recurses into get_access_token while still holding the lock, does not deadlock when reloaded credentials are also expired. Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(rubrik): collapse BlockedToolsResult dead-code into Optional[str] The `allowed_tools` field on `BlockedToolsResult` was computed in `_extract_blocked_tools` but never read by the only caller — when any tool was blocked the integration unconditionally raised `ModifyResponseException` to reject the full response, never doing partial filtering. Drop the dataclass and return the blocking explanation directly as `Optional[str]` so there's no misleading shape hinting at unused partial-filter capability. Co-authored-by: Greptile <greptile-apps[bot]@users.noreply.github.com> * fix(greptile): prune vertex async refresh lock dict after release Address greptile's open thread on _async_refresh_locks growing unboundedly in high-cardinality deployments. - Add _maybe_prune_async_refresh_lock: drops the per-key Lock from the registry once no coroutine holds it and no coroutine is queued in lock._waiters. The check-then-pop sequence is safe under asyncio's cooperative scheduler — a waiter that arrives after the pop simply creates a fresh lock under the same key, which is fine because the previous batch is already done. - Wrap the slow-path async with lock in a try/finally so the prune runs on every exit (return, exception, reauth retry). - Extract the existing background-refresh task scheduling into _schedule_background_refresh so get_access_token_async stays under ruff's PLR0915 ("Too many statements") limit. No behaviour change. - Regression tests cover both pruning after release (the dict shrinks back to zero after each call) and the safeguard that keeps the lock alive while a waiter is still queued. * fix(greptile): pass explicit bedrock provider to _supports_factory Bedrock Invoke transformation files (chat and messages) called _supports_factory(custom_llm_provider=None, ...) which relies on auto-detection. For short Bedrock model names (e.g. 'anthropic.claude-opus-4-6' without the version suffix) auto-detection fails and the lookup falls back through the exception path. Passing the known 'bedrock' provider explicitly makes the lookup deterministic for all Bedrock model variants, including cross-region inference profile IDs. Co-authored-by: Claude <noreply@anthropic.com> * fix(greptile): warn when OCR cost silently returns 0.0 Address greptile's P2 thread (#3144753707) about ocr_cost silently under-reporting billing when response.usage_info.pages_processed is missing. The credit-priced and unpriced fallback still has to return 0.0 (we don't know how to bill without usage), but emit a warning so the missing-data case is visible in logs instead of disappearing. The per-page-priced branch still raises, preserving the original ValueError signal callers may catch. * fix(greptile): reorder bedrock output_config strip comment labels Swap the # 5a / # 5b step labels so they appear in numerical order within the file. The new output_config-strip block was added with label # 5b above the pre-existing # 5a 'remove custom field from tools' block; rename the new block to # 5a and the pre-existing block to # 5b so the labels match the order of the steps in the file. No behavior change. Co-authored-by: Greptile Reviewer <greptile-apps@users.noreply.github.com> * Fix substring matching specificity and remove mutable Reducto OCR config state - Fireworks: _get_model_cost_capability fallback now picks the longest substring match in model_cost so more specific entries win over less specific ones (instead of returning the first match by insertion order). - Reducto OCR: drop per-request _api_key/_api_base instance attributes on _BaseReductoOCRConfig and instead thread api_key/api_base through transform_ocr_request/async_transform_ocr_request kwargs from the shared OCR HTTP handler. Makes the config safe to share/cache across concurrent requests with different credentials. Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(greptile): drain background refresh + warn on router mode override Address the two new findings from greptile's 19:45 review of the vertex+router surfaces. - vertex_llm_base: when the slow path sees TokenState.INVALID, await any in-flight background refresh task before invoking refresh_auth ourselves. google-auth's Credentials.refresh() is not safe to call concurrently on the same credentials object, and the background task runs outside the per-key lock. After the wait, re-check the cached token so we can short-circuit if the background refresh already restored it. Extracted the helper into _await_in_flight_background_refresh so get_access_token_async stays under ruff's PLR0915 statement budget. - router.py: when alias registration would overwrite the deployment's declared `mode` to keep the shared backend mode stable, emit a verbose_router_logger.warning so the override is visible to operators instead of silently winning. The existing fix (preventing alias registration from downgrading a shared `mode: responses` to chat) is preserved; the warning just surfaces it. * fix(cicd): apply black formatting to vertex_llm_base.py * fix(greptile): guard Reducto upload helpers against missing file_id Raise a clear ValueError when Reducto /upload returns 200 without a file_id key (or with a non-JSON body), instead of letting downstream callers see a confusing KeyError. * fireworks_ai: cache fireworks model_cost index and use hyphen-boundary matching - Build a memoized index of fireworks_ai/* entries from litellm.model_cost, invalidated by (id, len) of the model_cost dict. Avoids re-scanning the full ~30k-entry model_cost dictionary on every get_provider_info call. - Replace plain substring containment with hyphen-aligned boundary matching so a known short model name (e.g. 'some-model') cannot falsely match an unrelated longer query (e.g. 'awesome-model'). Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(greptile): refcount vertex async refresh lock pruning Replace the asyncio.Lock._waiters inspection in _maybe_prune_async_refresh_lock with an explicit refcount so the entry is pruned exactly when no coroutine is holding or waiting on the lock, without depending on any private asyncio internals. * fix(vertex): serialize credentials.refresh() across threads via _sync_refresh_lock refresh_auth is invoked from three call sites that can run on different threads (sync get_access_token, async slow path via asyncify, and the background proactive refresh task). Only the sync path was protected by _sync_refresh_lock, so a concurrent sync + async/background call could invoke google-auth's Credentials.refresh() on the same object from two threads simultaneously, mutating internal credential state. Move the lock acquisition into refresh_auth itself; the lock is an RLock so reentrant acquisition from the sync path remains safe. Co-authored-by: Yassin Kortam <yassin@berri.ai> * refactor(responses): extract shared SSE output-item recovery helpers Both ChatGPTResponsesAPIConfig and LiteLLMResponsesTransformationHandler duplicated the same OUTPUT_ITEM_DONE / OUTPUT_TEXT_DONE recovery algorithm. Move that logic into litellm.responses.sse_output_recovery and have both call sites use the shared helpers, so future fixes apply in one place. Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(greptile): tie fireworks index cache to model_cost mutation generation * fix: address three bug detection findings - rubrik: use 'is not None' check for tool call IDs to allow empty-string IDs - router: indent mode preservation mutation to match warning conditional - responses transformation: add missing 'continue' after OUTPUT_TEXT_DONE handler Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(router): always preserve existing shared backend mode when deployment mode is None Previously the inner guard 'if _deployment_mode is not None' prevented _shared_model_info['mode'] from being set back to the existing shared mode when the deployment mode was None, which then overwrote the shared backend's mode with None via register_model. Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix: address three bug detection findings - vertex_llm_base: guard background refresh's cache write with an identity check so a stale write cannot overwrite a credentials reference replaced by a concurrent reauthentication path. - router: make shared backend mode preservation directional - only preserve when an existing 'responses' mode would be downgraded to 'chat', or when the deployment mode is None (which would otherwise clear the existing mode). Legitimate upgrades now apply. - rubrik: remove unused preserve_events_added_during_flush attribute; RubrikLogger overrides flush_queue, so the base-class flag never applied. Drop the test that exercised the parent path on a Rubrik instance since it does not reflect real flush behavior. Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(veria): scope reducto file IDs to current request + register pricing - Reject reducto:// file IDs sent through the proxy /v1/ocr JSON API. The IDs are not bound to a LiteLLM key, so an authenticated user could submit another user's file ID and receive OCR text via the proxy's shared Reducto credentials. Force fresh uploads (multipart form or inline base64 data URI) so every OCR call is server-mediated and implicitly bound to the originating request. - Add ocr_cost_per_credit=0.015 to reducto/parse-v3 and reducto/parse-legacy in both pricing JSONs so successful Reducto OCR calls debit key/team spend instead of recording zero. * fix(vertex): always overwrite resolved cache key with fresh credentials After reauthentication or fresh load, the resolved (cache_credentials, project_id) cache key may point to stale credentials from a prior load. Skipping the write when the key existed forced the next request to go through a redundant refresh/reauth cycle. Always overwrite so callers using the resolved project_id hit the fresh credentials object. Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(xai): fold reasoning tokens before normalizing usage in streaming chunks The non-streaming transform_response folds xAI's reasoning_tokens into completion_tokens before calling _normalize_openai_compatible_usage_totals, preserving the OpenAI invariant total = prompt + completion. The streaming chunk_parser only ran the normalization, so when xAI streamed usage with reasoning tokens (total = prompt + completion + reasoning), the normalize check (total < prompt + completion) was a no-op and the invariant remained violated. Refactor _fold_reasoning_tokens_into_completion to also accept a raw usage dict (in addition to ModelResponse / Usage) and call it from the streaming chunk_parser before normalization, so streaming and non-streaming paths report usage consistently for reasoning models. Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(greptile): cap SSE content_index padding and use multiset tool-id check * fix(rubrik): apply event_hook default when caller passes None initialize_guardrail always passes event_hook=litellm_params.mode, so setdefault never applied its default. When mode is omitted from the guardrail config, event_hook ended up as None instead of post_call. Use 'or' to fall back to the intended default when the value is None. Co-authored-by: Yassin Kortam <yassin@berri.ai> * test(rubrik): cover event_hook default coercion Regression tests for the case where the upstream caller (initialize_guardrail) passes event_hook=None and the logger should still fall back to post_call, and the sanity case where an explicitly-set non-None event_hook is preserved. * fix: address autofix bugs in chatgpt SSE, vertex token cache, rubrik aclose - chatgpt responses: don't overwrite a meaningful error_message with None when a later RESPONSE_FAILED/ERROR event lacks an error object. - vertex_ai: serve STALE tokens from the lock-free fast path and only schedule a deduplicated background refresh, eliminating per-key lock contention near token expiry. - rubrik: aclose() now closes both async_httpx_client and tool_blocking_client to avoid leaking connections from the dedicated client when the logger shuts down. Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(vertex): drop redundant resolved_project rebind in slow path Reusing resolved_project (typed str from the fast path's tuple unpack) for an Optional[str] assignment tripped mypy. Use project_id directly after the None check. * test(team_members): skip flaky test_add_multiple_members The test creates a team via /team/new, adds a member via /team/member_add, then queries /team/info — and intermittently gets a 404 for a team that was just successfully created and mutated. The basic happy path is already covered by test_add_single_member; we only lose the 10-iteration stress loop. * fix(rubrik): cancel periodic flush task on aclose The aclose() method closed both HTTP clients but did not cancel the periodic flush task. After close, the task would wake up every flush_interval seconds and try to POST via the now-closed async_httpx_client, generating recurring errors. Cancel the task and await its termination before closing the clients. Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(rubrik): coerce None default_on to True at init * fix: tighten SSE done parser + rubrik /v1/messages match Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(bedrock): warn when invoke transformation strips output_config The Bedrock Invoke chat and messages transformations strip output_config when neither supports_output_config nor any supports__reasoning_effort flag is set in the model JSON. This was silent; emit a verbose_logger warning when the strip actually removes a present output_config so newly released models (where the JSON entry hasn't caught up yet) surface a clear log line instead of dropping the effort parameter without notice. fix(rubrik): drop tool_call repr from normalize error to avoid leaking args The TypeError raised in _normalize_tool_calls is caught by apply_guardrail's broad except, which logs the message plus exc_info. Including repr(tc) in the message could expose function arguments (potentially sensitive user data) in the proxy log stream. Type name alone is enough for debugging. * fix: dedupe SSE chunk parser and warn on Fireworks tool drop - Centralize SSE 'data:' chunk parsing in litellm.responses.sse_output_recovery so the ChatGPT Responses transformer and the Responses->Chat-Completions bridge share a single implementation. - Log a warning when get_supported_openai_params drops 'tools' for a fireworks_ai model whose JSON entry sets supports_function_calling=false, so users notice the behavioral change instead of silently losing tools. Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(fireworks_ai): demote per-request tool drop warning to debug Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(veria): cap Rubrik retry queue at 10k events with drop-oldest A persistent Rubrik webhook outage previously let authenticated traffic accumulate prompt/response payloads in the in-memory retry queue without bound. The PR-introduced retry-on-failure behavior in flush_queue() never trims the queue, so under sustained outage and high request volume the proxy can run out of memory. Cap the queue at RUBRIK_MAX_QUEUE_SIZE events (default 10_000) and drop the oldest events when the cap is exceeded. Emit a throttled verbose_logger warning so operators can detect a stuck webhook. * fix(tests): accept either initial event type from xAI realtime xAI's Grok Voice Agent API used to emit 'conversation.created' as the first event over the WebSocket. It has since shipped a fully OpenAI-compatible 'session.created' event (and may still emit the legacy 'conversation.created' on some routes), which breaks the strict-equality assertion in the realtime e2e test: AssertionError: Expected conversation.created, got session.created This is an upstream behavior change, not a regression in our code. Loosen the base realtime test so get_initial_event_type() may return a tuple of acceptable event types, and have the xAI subclass accept both 'conversation.created' and 'session.created'. The OpenAI subclasses keep their single-string contract unchanged. * fix(rubrik): drop RUBRIK_MAX_QUEUE_SIZE env knob, hardcode 10k cap The doc-validation CI scans for os.getenv() calls and requires each key to appear in litellm-docs config_settings.md. Adding the env var here without a matching docs PR fails the docs and code-quality checks, and the extra env-parsing block in __init__ also tripped ruff PLR0915. The hard cap at 10k still bounds memory on a Rubrik webhook outage, which is the actual bug being fixed -- operators don't need to tune this knob to get the safety guarantee. * test(team_members): skip flaky test_duplicate_user_addition Same /team/info 404-after-add_team_member race that already led to test_add_multiple_members being skipped in `dedc4022`. Duplicate-prevention behavior is covered by test_update_team_members_list_duplicate_prevention in tests/test_litellm/proxy/management_endpoints/test_team_endpoints.py, so the e2e proxy variant doesn't add coverage. * fix: bound CustomBatchLogger queue and call super().__init__ in ContextCachingEndpoints Co-authored-by: Yassin Kortam <yassin@berri.ai> * fix(rubrik): distinguish malformed tool-blocking response from transient errors Raise a dedicated _MalformedToolBlockingResponseError when the tool blocking service returns an empty 'choices' list, instead of a bare Exception. Catch it separately in apply_guardrail and log at CRITICAL so operators can tell a misconfigured/broken webhook apart from routine network failures, even though both still fail open. Co-authored-by: Yassin Kortam <yassin@berri.ai> * router: clarify shared backend mode preservation flow Add a blank line and a brief comment before the _backend_alias_cost assignment to make it clear that registration runs unconditionally after the optional mode-preservation mutation. Co-authored-by: Yassin Kortam <yassin@berri.ai> * test(ci): skip chronically flaky test_spend_logs_with_org_id Same write-then-read race against the spend logs DB as test_spend_logs (already skipped above). /spend/logs?request_id=... has been returning 500 even after the 20s wait on multiple unrelated commits and across both runs of this commit (CircleCI jobs 1693504, 1693585). The PR itself does not touch spend logs. Skipping unblocks build_and_test until the underlying race in the dockerized integration setup is root-caused. Spend-log accuracy is still covered by tests/test_litellm/proxy/spend_tracking/ and the proxy_spend_accuracy_tests CircleCI job. --------- Co-authored-by: Kevin Zhao <zkm8093@gmail.com> Co-authored-by: Matthew Lapointe <lapointe683@gmail.com> Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Elon Azoulay <elon.azoulay@gmail.com> Co-authored-by: Krrish Dholakia <krrish+github@berri.ai> Co-authored-by: afoninsky <andrey.afoninsky@gmail.com> Co-authored-by: Tai An <antai12232931@outlook.com> Co-authored-by: Joseph Barker <156112794+seph-barker@users.noreply.github.com> Co-authored-by: Maruti Agarwal <88403147+marutilai@users.noreply.github.com> Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com> Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com> Co-authored-by: Claude <claude@anthropic.com> Co-authored-by: Yassin Kortam <yassin@berri.ai> Co-authored-by: Cursor Bugbot <bugbot@cursor.com> Co-authored-by: Greptile <greptile-apps[bot]@users.noreply.github.com> Co-authored-by: Greptile Reviewer <greptile-apps@users.noreply.github.com>	2026-05-20 21:25:19 -07:00
yuneng-jiang	f99fb5f27f	chore(ci): merge dev branch (#28314 ) * chore(proxy): strict media-type match for form bodies (#27939) * chore(proxy): strict media-type match for form bodies ``_read_request_body`` and ``get_request_body`` routed on ``"form" in content_type`` / ``"multipart/form-data" in content_type``, which match any header containing the literal — ``application/form-json``, ``multiform/anything``, ``application/json; xform=1``. Starlette's ``request.form()`` returns an empty ``FormData`` for any non-canonical type without consuming the body, so the auth-time pre-read saw ``{}`` and skipped the banned-param check while the handler's later ``request.body()`` saw the original JSON payload. Parse the media type per RFC 7231 (substring before ``;``, trimmed, lowercased) and accept only ``application/x-www-form-urlencoded`` and ``multipart/form-data``. Replace both substring sites with the shared ``_is_form_content_type`` helper. Tests pin: case/whitespace/charset variants of the two real types match; ``application/form-json`` and similar substring-match traps fall through to the JSON parse path; real form POSTs continue to route through ``request.form()``. * chore(proxy): extract _is_json_content_type symmetric helper Mirror ``_is_form_content_type`` for the JSON branch of ``get_request_body`` so both classifications share the same media-type normalisation (strip params, trim, lowercase) and any future change to the parsing rules has one place to update. Adds tests for ``_is_json_content_type`` and for ``get_request_body`` covering the canonical JSON / form / unsupported / non-POST paths. * chore(proxy): surface form-parse failures instead of caching empty body Starlette's ``request.form()`` raises ``MultiPartException`` / ``ValueError`` / ``AssertionError`` on malformed multipart input (missing boundary, malformed chunk encoding, etc.). The outer ``except Exception: return {}`` swallowed every form-parse failure and cached an empty parsed body — auth-time pre-reads saw ``{}`` and skipped every banned-param check while a later raw-body re-read in the handler still saw the original payload. Same TOCTOU shape as the substring-match bypass: the auth gate and the handler don't agree on what the body is. Wrap ``request.form()`` in a narrow ``try`` that converts any parse failure to a 400 ``ProxyException``. The outer broad ``except`` is retained for unrelated unexpected errors but no longer covers form-parse-side bypass shapes. Adds a regression test parametrised over the exception classes Starlette can raise from ``request.form()``. * chore(proxy): drop redundant _is_json_content_type test class ``_is_json_content_type`` is a 3-line wrapper around the shared ``_normalize_media_type`` helper. Positive coverage lives in ``TestGetRequestBody.test_json_with_charset_param_parses_as_json``; negative coverage is covered transitively by ``TestIsFormContentType``'s non-form parametrize matrix (anything that isn't a form type falls through to the JSON branch). * chore(proxy): carry ASGI path into WebSocket auth synthetic Request (#27940) ``user_api_key_auth_websocket`` built a synthetic ``Request`` with a two-key scope (``type`` + ``headers``) and set ``request._url = websocket.url``. ``get_request_route`` reads ``scope.get("path", ...)`` and falls back to ``request.url.path`` only when ``path`` is absent. For the WebSocket flow that fallback fires and resolves to the Host-header-derived value (Starlette reconstructs ``websocket.url`` from the Host header), so a malformed Host collapses the resolved route and lets the auth gate compare against the wrong value. Carry the ASGI scope's ``path``, ``root_path``, and ``app_root_path`` into the synthetic scope so the lookup never reaches the fallback on the legitimate path. Regression test pins that the request handed to ``user_api_key_auth`` has ``scope["path"]`` equal to the ASGI scope's path. --------- Co-authored-by: stuxf <70670632+stuxf@users.noreply.github.com>	2026-05-20 17:47:33 -07:00
Sameer Kankute	e59e34bed3	Gemini managed agents support (#28270 ) * Add support for environment variable in interactions api * Add sdk support for gemini create agent * Add agents endpoint support via proxy * Add outputs of each api * Add routing for model and agents param * Remove redundant condition in get_provider_agents_api_config LlmProviders.GEMINI.value is literally the string "gemini", so the second clause of the or was checking the exact same thing as the first. Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com> * fix: forward query-param credentials to list/get/delete/versions Gemini agent endpoints The list_gemini_agents, get_gemini_agent, delete_gemini_agent, and list_gemini_agent_versions endpoints previously constructed a hardcoded data dict with no mechanism to pass provider credentials. Unlike create_gemini_agent (POST, reads litellm_params_template from body), these GET/DELETE endpoints gave no way for multi-tenant callers to supply a per-request api_key or other LiteLLM params. Fix: - Add _merge_query_params_into_data() helper that reads query parameters from the request and merges them into the data dict without overwriting already-set keys (e.g. path params like 'name'). - Support a JSON-encoded litellm_params_template query parameter (matching the POST body pattern) as well as flat key=value pairs (e.g. api_key=AIza...). - Apply the helper in all four affected endpoints. - Add 13 unit tests covering the helper and each endpoint. Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com> * fix: pass model=None for managed agent proxy endpoints to prevent agent name polluting data["model"] Endpoints acreate_agent, aget_agent, adelete_agent, and alist_agent_versions were passing model=<agent_name> to base_process_llm_request. This caused common_processing_pre_call_logic to write the agent name into self.data["model"], which then triggered spurious model-alias mapping, rate-limiting lookups, and logging tied to a non-existent model deployment. The agent name is already carried in data["name"] and is passed correctly to the SDK functions (litellm.interactions.agents.). There is no reason to also set model=<agent_name>; the correct value is model=None for all five managed-agent management routes. Adds tests/test_litellm/proxy/google_endpoints/test_managed_agents_model_param.py to verify all five managed-agent endpoints pass model=None. Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com> fix: address greptile P1/P2 review comments P1 (router.py): Restore fallback/retry support for acreate_interaction and create_interaction. Both were silently moved to _init_interactions_api_endpoints (direct call, no fallbacks). Moved them back to _ageneric_api_call_with_fallbacks so users with configured fallback models keep retry behaviour. P1 security (agents_endpoints.py): Remove flat query-param credential path (e.g. ?api_key=AIza...) from _merge_query_params_into_data. Credentials in URL query strings appear verbatim in server access logs, CDN edge logs, and browser history. Only the JSON-encoded litellm_params_template query param (matching the POST body pattern) is retained. P2 (interactions/http_handler.py): Extract _BaseHTTPHandler with shared _handle_error, _sync_client, and _async_client helpers. InteractionsHTTPHandler now extends _BaseHTTPHandler. The _async_client reads the provider from litellm_params instead of hardcoding GEMINI. P2 (interactions/agents/http_handler.py): AgentsHTTPHandler now extends InteractionsHTTPHandler (which inherits _BaseHTTPHandler) so all shared HTTP infrastructure is reused rather than duplicated. Removes the hardcoded LlmProviders.GEMINI from the async client path. Co-authored-by: Cursor <cursoragent@cursor.com> * fix: address CI failures from greptile review fixes - black: format interactions/agents/main.py and utils.py - tests: update test_gemini_agents_endpoints.py to match new _merge_query_params_into_data behaviour (flat credential params are rejected; only JSON-encoded litellm_params_template is accepted) - ci: add test_gemini_agents_endpoints.py to endpoints-and-responses shard in test-unit-proxy-db.yml so assert-shard-coverage passes - tests: add _initialize_managed_agents_endpoints and _init_managed_agents_api_endpoints test coverage so router_code_coverage passes; also fix TestRouterCreateInteractionRouting to reflect that acreate_interaction now correctly routes through _ageneric_api_call_with_fallbacks (restoring fallback support) Co-authored-by: Cursor <cursoragent@cursor.com> * fix: remove InteractionsHTTPHandler._handle_error override to fix type errors AgentsHTTPHandler extends InteractionsHTTPHandler and calls self._handle_error(provider_config=agents_api_config) where agents_api_config is BaseAgentsAPIConfig. Python MRO resolved _handle_error to InteractionsHTTPHandler._handle_error which expected BaseInteractionsAPIConfig, causing 10 mypy arg-type errors in interactions/agents/http_handler.py. Removing the redundant override lets both classes inherit _BaseHTTPHandler._handle_error (provider_config: Any) which is structurally correct for both config types. Co-authored-by: Cursor <cursoragent@cursor.com> * fix: agent-only interactions and managed agents provider routing Resolve None custom_llm_provider in agents HTTP client lookup and set custom_llm_provider on GenericLiteLLMParams for all agent CRUD paths. Stop mapping agent names to proxy model routing; route interactions through _init_interactions_api_endpoints with fallbacks only when model is set. Consolidate duplicate router elif branches for interaction APIs. Co-authored-by: Cursor <cursoragent@cursor.com> * Fix greptile review * test(agents): add unit tests for managed agents SDK and HTTP handler Adds coverage for the new `litellm.interactions.agents` surface area: - main.py: sync/async entry points (create/list/get/delete/list_versions), provider config lookup, logging-obj helper, async error wrapping - http_handler.py: every CRUD method (sync + async paths), `_is_async` dispatch branches, and provider error mapping through GeminiAgentsConfig - utils.py: get_provider_agents_api_config for supported / unsupported providers Brings patch coverage on these files from <25% to ~100% so codecov/patch is satisfied. Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> * docs(gemini-agents): fix misleading credential-passing examples in GET/DELETE docstrings (#28293) The four GET/DELETE endpoint docstrings (list_gemini_agents, get_gemini_agent, delete_gemini_agent, list_gemini_agent_versions) documented passing per-request credentials as flat query parameters (e.g. ?api_key=AIza...). However, _merge_query_params_into_data only reads the JSON-encoded litellm_params_template query parameter and intentionally ignores flat params (URL query strings appear verbatim in access logs, browser history, and Referer headers). Callers following the documented curl examples would have their credentials silently dropped and hit auth failures against Gemini. Update the examples to use the supported JSON-encoded litellm_params_template query parameter, matching _merge_query_params_into_data's own docstring. Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> * refactor(agents): rename provider-agnostic agent response types Move GeminiAgent{ListResponse,DeleteResult,VersionsResponse} to provider-neutral names (AgentListResponse, AgentDeleteResult, AgentVersionsResponse) so the BaseAgentsAPIConfig interface no longer references Gemini-specific type names. * fix(gemini-agents): close veria-flagged credential-escalation gaps Two high-severity findings from the veria-ai PR review are addressed: 1. api_base override could leak the shared Gemini key GeminiAgentsConfig.validate_environment falls back to GOOGLE_API_KEY / GEMINI_API_KEY when no api_key is supplied. Combined with caller-controlled api_base on the proxy CRUD endpoints, an authenticated user could redirect the outbound request to an attacker-controlled host and capture the operator's shared Gemini key from the x-goog-api-key header. The config now refuses env-fallback whenever api_base is explicitly overridden. 2. Managed-agent CRUD exposed to ordinary LLM keys The new /v1beta/agents routes live in google_routes (i.e. llm_api_routes), so any non-admin LLM key can reach them. Unlike /v1beta/models/...: generateContent these endpoints are NOT model-routed and have no model_list-supplied credentials, so env-fallback would let any LLM key list / create / delete agents inside the operator's Gemini project. Each endpoint now calls _enforce_caller_supplied_provider_key, which requires non-admin callers to supply their own Gemini api_key via litellm_params_template. Proxy admins keep the env-fallback convenience. Tests cover non-admin rejection, admin allow-through, the api_base override guard, and SDK env-fallback when api_base is not overridden. Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com> * test(router): restore strict assert_called_once_with on interactions default-provider test --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com> Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com> Co-authored-by: Mateo Wang <mateo-berri@users.noreply.github.com>	2026-05-19 16:02:03 -07:00
Sameer Kankute	cbdc70d544	fix(managed_batches): convert raw output_file_id to managed ID in CheckBatchCost poller (#27984 ) * fix(managed_batches): convert raw output_file_id to managed ID in CheckBatchCost poller CheckBatchCost bypasses async_post_call_success_hook, causing raw provider output_file_ids to be persisted in LiteLLM_ManagedObjectTable. This fix converts output_file_id and error_file_id to managed base64 IDs before the DB write. Co-authored-by: Cursor <cursoragent@cursor.com> * fix(check_batch_cost): persist managed file before mutating response and propagate team_id - Move setattr after store_unified_file_id so the response only receives the managed ID once the DB record is successfully written. Avoids serializing an orphaned managed ID into file_object when the store call fails. - Populate team_id on the minimal UserAPIKeyAuth from job.team_id so the managed file record is created with the correct team ownership, allowing other team members to access the batch output file via /files/{id}/content. Co-authored-by: Yassin Kortam <yassin@berri.ai> * test(managed_batches): extend test to cover error_file_id conversion Co-authored-by: Cursor <cursoragent@cursor.com> * fix managed file test --------- Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Yassin Kortam <yassin@berri.ai>	2026-05-15 04:41:38 -07:00
Krrish Dholakia	8bbc61e03c	fix: harden /key/update authorization checks (#27878 ) * fix: patch Host-header auth bypass in get_request_route Starlette reconstructs request.url from the Host header. A malformed Host like `localhost/?x=1` causes Starlette to build the full URL as `http://localhost/?x=1/health`, which url-parses to path="/". Since "/" is in LiteLLMRoutes.public_routes, all protected routes became reachable without authentication. Fix: read scope["path"] (set by uvicorn from the HTTP request line, not derivable from headers) instead of request.url.path. Sub-path deployments are handled via scope["app_root_path"] / scope["root_path"], mirroring Starlette's own base_url construction logic. Affected variants confirmed fixed: Host: localhost/?x=1 Host: localhost:4000/?x=1 Host: localhost/#test Host: localhost:4000/#test Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * style: reduce comments in route fix Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix: block credential fields in RAG ingest vector_store options Credential fields (vertex_credentials, aws_access_key_id, api_key, etc.) in ingest_options.vector_store are now rejected at the API boundary with a 400 error. Credentials must be configured server-side. Previously any authenticated user could supply a vertex_credentials dict with type=external_account pointing credential_source.file at an arbitrary path (e.g. /proc/1/environ) and token_url at an attacker-controlled server. google-auth's identity_pool.Credentials refresh() would read the file and POST its contents to the attacker. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix: block /key/update self-escalation by assigned users Non-admin users who were assigned a key (created_by != caller) could update any non-budget field — models, rpm_limit, guardrails, etc. — without admin authorization, allowing privilege self-escalation. Gate: only the key creator (created_by == caller) may edit their own key without admin check; budget changes always require admin regardless of creator status. All other callers must pass _check_key_admin_access. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix: block user-controlled api_base in RAG ingest vector_store options A user-supplied api_base in ingest_options.vector_store caused the server to forward its configured provider credentials (Gemini, OpenAI) to an attacker-controlled endpoint via SSRF. Add api_base to the blocked credential params set alongside api_key and the existing credential fields. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix: restrict /utils/transform_request to PROXY_ADMIN and apply body safety check Any authenticated internal_user could POST arbitrary provider config (aws_sts_endpoint, api_base, etc.) to /utils/transform_request and have the server forward its credentials to an attacker-controlled endpoint. - Gate the endpoint on PROXY_ADMIN role (403 for all other roles) - Call is_request_body_safe() to reject banned params even for admins - Convert ValueError from safety check to HTTP 400 Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix: apply banned-param check to /utils/transform_request Without is_request_body_safe(), any authenticated user could pass aws_sts_endpoint, api_base, or aws_web_identity_token to /utils/transform_request and have the server forward its configured provider credentials to an attacker-controlled endpoint during SDK credential resolution. Applies the same banned-param blocklist already used by LLM endpoints. Endpoint remains accessible to all authenticated users. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix: block SSRF via api_base in /prompts/test dotprompt YAML frontmatter Any frontmatter key not in ["model","input","output"] flowed into optional_params and was merged into the LLM call data dict, bypassing is_request_body_safe. An attacker with any bearer key could set api_base in YAML to redirect the outbound LLM request — including the provider API key — to an attacker-controlled host. Fix: call is_request_body_safe on the constructed data dict after optional_params are merged, before invoking ProxyBaseLLMRequestProcessing. ValueError from the banned-param check is surfaced as HTTP 400. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * Update litellm/proxy/rag_endpoints/endpoints.py Co-authored-by: veria-ai[bot] <224490171+veria-ai[bot]@users.noreply.github.com> * fix: coerce nested config strings before banned-param check _NESTED_CONFIG_KEYS descent used isinstance(nested, dict) which silently skipped litellm_embedding_config when delivered as a JSON string via multipart/form-data. Banned params (api_base, aws_sts_endpoint, etc.) nested inside the stringified value were invisible to is_request_body_safe. _NESTED_METADATA_KEYS already used _coerce_metadata_to_dict which parses JSON strings before checking. Apply the same coercion to _NESTED_CONFIG_KEYS. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix: replace substring match with prefix match in is_llm_api_route mapped_pass_through_routes used `_llm_passthrough_route in route` (substring) so any admin-only path whose URL contained a provider name (openai, anthropic, azure, bedrock, etc.) was misclassified as an LLM API route and bypassed the admin gate in non_proxy_admin_allowed_routes_check. Confirmed live: non-admin key could GET /credentials/by_name/openai (read masked provider API key) and DELETE /credentials/openai (delete credential). Fix: use exact match or startswith(prefix + "/") — the same pattern used everywhere else in RouteChecks — so only routes that actually start with a passthrough prefix are allowed through. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix: stabilize PR #27878 test failures - key_management_endpoints: extend can_skip_admin_check to team keys so team members with /key/update permission can update non-budget fields. can_team_member_execute_key_management_endpoint already validates team membership + permission and raises if unauthorized; reaching the admin check on a team key means the caller was authorized. - test: set created_by on mock key in test_update_key_non_budget_fields_allowed_for_internal_user so caller_is_creator resolves correctly (MagicMock default ≠ user_id). - auth_utils.get_request_route: guard against non-dict request.scope (e.g. MagicMock in unit tests) to prevent a MagicMock leaking into UserAPIKeyAuth.request_route and failing Pydantic validation. - ci: assign test_multipart_bypass_repro.py to the proxy-runtime shard in test-unit-proxy-db.yml to satisfy the shard-coverage check. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix(lint): add explicit str() cast in get_request_route for MyPy scope.get() returns Any\|None which MyPy cannot coerce to str implicitly. Wrap both scope.get() calls in str() to satisfy the type checker. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix: guard bare-/ root_path strip + make total_spend migration idempotent auth_utils.get_request_route: when Starlette sets scope["app_root_path"] to "/" (e.g. behind some middleware), the old stripping logic would remove the leading slash from every path ("/team/new" → "team/new"), breaking route matching and causing auth to misclassify protected routes. Skip stripping when root_path is bare "/". migration: add IF NOT EXISTS to total_spend ALTER TABLE so the migration is safe to replay when a prior partial run already created the column. Without this guard, prisma migrate deploy fails on CI DBs that were partially migrated, causing all subsequent DB operations (including /team/new) to 500. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix: require creator still owns key for personal-key bypass in /key/update caller_is_creator now requires both created_by == caller AND user_id == caller. Previously checking only created_by let a demoted admin who originally created a key for another user continue editing non-budget fields on it after reassignment, bypassing _check_key_admin_access. Adds regression test: creator whose key was reassigned is blocked (403). Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> * fix: extract auth checks to fix PLR0915 + broaden max_budget assertion internal_user_endpoints._update_single_user_helper exceeded 50 statements (PLR0915). Extract authorization checks into _check_user_update_authz helper to bring statement count under the limit. test_validate_max_budget: assert "negative" (substring of both the local "cannot be negative" and the CI "non-negative finite number" messages) so the test is stable regardless of which exact wording the function uses. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: veria-ai[bot] <224490171+veria-ai[bot]@users.noreply.github.com>	2026-05-14 04:16:04 +00:00
yuneng-jiang	e3e5209f51	Merge pull request #27801 from stuxf/chore/get-instance-fn-runtime-s3-gate chore(proxy): refuse remote-URL instance-fn loads outside config-file path	2026-05-13 20:53:54 -07:00
Sameer Kankute	38709ba9bb	feat(proxy): skip disable_background_health_check models on GET /health when flag set (#27716 ) * feat(proxy): skip disable_background_health_check models on GET /health when flag set Co-authored-by: Cursor <cursoragent@cursor.com> * fix comment * fix greptile comments * Fix health check fallback kwargs * Format health endpoint * Harden direct health check kwargs compatibility for monkeypatched perform_health_check Replace substring-based TypeError detection with unexpected-keyword checks and a short retry chain (full kwargs, instrumentation only, filter only, minimal) so partial stubs work regardless of which optional kwarg fails first. Add proxy unit tests for legacy three-arg stubs and single-kwarg variants. Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com> * fix black --------- Co-authored-by: Cursor <cursoragent@cursor.com> Co-authored-by: Sameer Kankute <Sameerlite@users.noreply.github.com>	2026-05-13 09:49:05 -07:00
user	d853d3dcd4	chore(tests): thread config_file_path through s3/gcs custom-logger tests The pre-existing s3:// / gcs:// custom-logger tests called ``get_instance_fn`` without ``config_file_path``, which means the new runtime gate (refuse remote URLs unless invoked from a config-file load) now raises ``ValueError`` before reaching the mocked download paths. Each test was exercising the documented startup config-file load scenario; pass ``config_file_path="/any/path"`` to make that intent explicit and route past the gate. Affected: test_s3_download_success, test_gcs_download_success, test_invalid_url_format, test_download_failure_handling, test_file_cleanup.	2026-05-13 01:13:52 +00:00
harish-berri	8f25942ecf	Litellm key rotation bug (#27756 ) * fix(proxy): resolve cache handling issues in _lookup_deprecated_key - Updated the in-memory cache for deprecated key lookups to store a 3-tuple (active_token_id, cache_expires_at_ts, revoke_at_ts) instead of a 2-tuple, ensuring proper unpacking and backward compatibility. - Removed duplicate cache reads and added logic to handle legacy cache entries gracefully. - Enhanced unit tests to cover scenarios for cache hits, DB misses, and respect for revoke_at timestamps, ensuring robust handling of the grace-period key-rotation feature. * refactor(proxy): streamline cache handling in _lookup_deprecated_key - Simplified the cache retrieval logic by directly unpacking the 3-tuple cache entries, removing the need for backward compatibility checks for 2-tuple entries. - Updated unit tests to ensure that pre-warmed 3-tuple cache entries are served correctly without unnecessary database lookups. * chore(ci): add new unit test for deprecated key grace period - Included `test_deprecated_key_grace_period.py` in the CI workflow to enhance coverage for deprecated key handling scenarios. * fix(proxy): remove unnecessary check for revoke_at in _lookup_deprecated_key - Eliminated the redundant check for None on revoke_at, streamlining the logic for handling deprecated keys in the cache. This change enhances the efficiency of the key lookup process. * test(proxy): add end-to-end tests for deprecated key lookup behavior - Introduced a new test class `TestDeprecatedKeyLookupDbE2E` to validate the behavior of deprecated key lookups against a real Prisma-backed database. - The test ensures that old key hashes resolve correctly and that repeated lookups utilize the in-memory cache without errors. - Cleaned up the `_lookup_deprecated_key` function by removing an unnecessary check for `revoke_at`, enhancing the efficiency of the key lookup process.	2026-05-12 17:16:37 -07:00
Yuneng Jiang	4a78bfcd28	fix(proxy): always merge caller-supplied tags into request metadata Caller-supplied tags (`x-litellm-tags` header, body `tags`, `metadata.tags`) were silently dropped unless the key/team had `metadata.allow_client_tags: true` set. Restore the documented behavior: tags from the request always flow into `metadata.tags` and union with any admin-configured static tags from key/team/project metadata. Removes the `allow_client_tags` opt-in flag from the pre-call pipeline. The flag was only ever read here; it has no schema or endpoint footprint, so leftover values in existing key metadata are inert. Test cleanup mirrors the simplification: drop the three tests that verified the strip-when-not-opted-in path, drop the `allow_client_tags` fixture lines from the merge/union tests.	2026-05-12 14:38:50 -07:00
Tai An	80445299b8	fix(proxy): coerce non-str x-litellm-* header values to avoid httpx TypeError (#27458 ) (#27504 ) Squash-merged by litellm-agent from Anai-Guo's PR.	2026-05-09 20:32:31 +00:00
Milan	fa4c7a2ac6	Add unit tests for virtual-key model max budget Redis flush. Assert _push_in_memory_increments_to_redis runs after async_log_success_event when dual_cache.redis_cache is set, and is skipped when Redis is not configured. Co-authored-by: Cursor <cursoragent@cursor.com>	2026-05-07 01:21:19 +03:00
oss-agent-shin	c8e47dcb43	Fix early proxy request size enforcement (#27311 ) * Add early proxy request size guard Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com> * Address request size review feedback Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com> --------- Co-authored-by: oss-agent-shin <279349115+oss-agent-shin@users.noreply.github.com> Co-authored-by: ishaan-berri <ishaan-berri@users.noreply.github.com>	2026-05-06 12:29:11 -07:00
yuneng-jiang	9a338e1b6b	[Test] Tests: Stop parametrizing API keys into pytest test IDs (#27249 ) Several tests parametrized over (model, api_key, ...) tuples or raw token strings, causing pytest to embed those values in the test ID and print them in CI logs. Refactored each affected test to keep the same coverage without putting key material into parametrize. - audio_tests/test_audio_speech.py: split env-var keys into separate azure/openai test functions sharing a helper; sync_mode parametrize preserved. - audio_tests/test_whisper.py: split into openai_whisper / azure_whisper functions sharing a helper; response_format parametrize preserved. - local_testing/test_embedding.py: single-case parametrize inlined. - proxy_unit_tests/test_user_api_key_auth.py: 5 header parametrize cases split into 5 named tests sharing an _assert helper. - proxy_unit_tests/test_proxy_utils.py: 4 api_key_value cases split into 4 named tests. - test_litellm/proxy/auth/test_user_api_key_auth.py: 5 key-prefix cases (Bearer / Basic / lowercase bearer / raw / AWS SigV4) split into 5 named tests. Verified: black clean; 14 refactored unit tests pass; pytest collects audio/embedding tests with safe IDs (no key material in test IDs).	2026-05-05 17:21:18 -07:00
Ryan Crabbe	01ef723c3f	Merge remote-tracking branch 'origin/litellm_internal_staging' into litellm_fix-ag-not-resolved	2026-05-01 17:21:29 -07:00
Ryan Crabbe	bf4c250d86	fix: gate key access_group override on group's own assignment Replaces the previous intersect-with-team.access_group_ids check, which made the override unreachable in practice (the team-gate fallback already covered every case the intersection allowed). The override now resolves each of the key's access_group_ids via get_access_object and accepts the group only if its assigned_team_ids includes the key's team_id, or its assigned_key_ids includes the key's token. This fulfills the original ask (a key can extend a team's allow-list via a group the admin granted to that team or that specific key) while still rejecting foreign groups referenced by team members of other teams.	2026-05-01 16:29:33 -07:00
Ryan Crabbe	f17d779666	fix: scope key access_group_ids override by team's assigned groups A team member could set any access_group_ids on their key (e.g. a group assigned only to a different team) and override the team's model restriction. Intersect the key's access_group_ids with team_object.access_group_ids in _key_access_group_grants_model so foreign groups are dropped before model expansion. Adds a regression test that asserts expansion is never called for foreign groups.	2026-05-01 15:54:03 -07:00
Yuneng Jiang	650821b538	Merge remote-tracking branch 'origin/litellm_internal_staging' into litellm_fix-config-update-targeted-upserts # Conflicts: # tests/test_litellm/proxy/test_proxy_server.py	2026-05-01 10:38:34 -07:00
harish-berri	7c8fe86fd9	Merge branch 'litellm_internal_staging' into litellm_token_verification_query_opt	2026-04-30 17:25:12 -07:00
yuneng-jiang	15b7386859	Merge pull request #26815 from stuxf/fix/get-image-lfi-ssrf chore(proxy): contain UI_LOGO_PATH / LITELLM_FAVICON_URL on unauthenticated asset endpoints	2026-04-30 17:10:15 -07:00
yuneng-jiang	4ff8f0e901	Merge pull request #26851 from stuxf/codex/fix-callback-env-secret-resolution chore(proxy): block env callback refs in key metadata	2026-04-30 13:11:32 -07:00
yuneng-jiang	aa76ab2df7	Merge pull request #26862 from stuxf/codex/control-field-sanitization chore(proxy): harden request control fields	2026-04-30 13:10:58 -07:00
Michael-RZ-Berri	9637d8c17b	Merge pull request #26802 from BerriAI/litellm_lazyLoadedFrontPage [Feat / Fix] Lazy loaded imports, lazy loaded front page	2026-04-30 13:04:42 -07:00
harish-berri	8df24b5413	Merge branch 'litellm_internal_staging' into litellm_token_verification_query_opt	2026-04-30 12:04:08 -07:00
user	b67a81da47	test(proxy): align favicon remote asset expectations	2026-04-30 11:46:45 -07:00
user	215f538d4f	fix(static-assets): browser-load remote branding assets	2026-04-30 11:30:57 -07:00
user	f48dfdbdd9	fix(proxy): require opt in for audit header fallback	2026-04-30 11:17:04 -07:00
user	db00e674e2	test(proxy): cover control field hardening branches	2026-04-29 23:57:50 -07:00
user	119c70b576	fix(proxy): gate delegated audit attribution	2026-04-29 23:10:14 -07:00
user	842eea0131	chore(proxy): harden request control fields	2026-04-29 22:35:17 -07:00
user	22c01adeb2	chore(proxy): ignore invalid callback metadata rows	2026-04-29 20:05:35 -07:00
user	f2f1e3a0ba	chore(proxy): block env callback refs in key metadata	2026-04-29 19:54:40 -07:00
Michael Riad Zaky	de75cd777e	test_proxy_routes: dedupe lazy force-load to match vector_store test pattern	2026-04-29 17:20:56 -07:00
Michael Riad Zaky	0f8dd28542	lazy-load optional feature routers on first request	2026-04-29 17:20:55 -07:00
user	75d1a0116e	fix(static-assets): use async_safe_get; drop SVG; serve bytes inline on cache miss Three review items addressed: * Veria (Medium): SSRF via redirect. ``fetch_validated_image_bytes`` was calling ``validate_url(url)`` once and then fetching with the default httpx client, so a 3xx to an internal IP would have been followed unvalidated. Switched to ``async_safe_get`` (the existing SSRF primitive used elsewhere in the codebase) which walks each redirect hop, re-validates, and rejects redirects to blocked networks. Default ``litellm.user_url_validation`` is True so protection is on out of the box. * Greptile (P2): SVG can embed JS. Removed ``image/svg+xml`` from the allowed-Content-Type set. The hardcoded response media type (``image/jpeg`` / ``image/x-icon``) means a real SVG body wouldn't render as SVG anyway in modern browsers — the allowlist entry was giving up XSS surface for no actual SVG-rendering benefit. If real SVG support is wanted later, that's a deliberate feature PR with CSP / nosniff bundled. * Greptile (P2): cache-write OSError drops validated bytes. When the upstream fetch succeeded but ``open(cache_path, "wb")`` raised (read-only assets dir), the bytes were discarded and the default logo was served — a silent regression for that deployment. Now serve the validated bytes inline via ``Response(...)`` as a fallback before falling back to default. Tests: - Replaced low-level mocks of ``validate_url`` with mocks of ``async_safe_get`` directly, exercising the helper's contract rather than the SSRF primitive's internals. - New ``test_rejects_svg_content_type`` confirms SVG is blocked. - ``test_get_image_cache_logic`` fixture now sets ``mock_response.is_redirect = False`` so ``async_safe_get`` doesn't treat the Mock's truthy attribute as a redirect. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 21:57:22 +00:00
user	55d393d77d	fix(static-assets): unblock CI — pass headers explicitly + harden + update legacy tests Three CI failures from the previous push, all addressed: * ``lint`` (mypy): ``async_client.get(url, *request_kwargs)`` confused mypy because ``AsyncHTTPHandler.get``'s second positional arg is typed ``bool \| None``. Switched to an explicit branch: ``await async_client.get(rewritten_url, headers={"host": host_header})`` for the HTTP-rewritten case, plain ``get(rewritten_url)`` otherwise. ``proxy-infra`` / ``test_get_image_custom_local_logo_bypasses_cache``: the existing test set ``UI_LOGO_PATH=/app/custom_logo.jpg`` with no ``LITELLM_ASSETS_PATH``, asserting the path was served verbatim. That was the LFI behaviour the new path-containment guard closes. Updated the test to set ``LITELLM_ASSETS_PATH=/app`` so the path is inside an allowed root, and patched the helper's ``realpath`` / ``isfile`` to go along with the mocked filesystem. Test intent (bypass cache when ``UI_LOGO_PATH`` is local) is preserved. * ``auth-and-jwt`` / ``test_get_image_cache_logic``: existing test built a ``Mock`` response without ``headers``, so the new Content-Type check tripped on ``Mock().split(";")[0]``. Two fixes: 1. Set ``mock_response.headers = {"content-type": "image/jpeg"}`` on the test (matches the real upstream contract — a logo CDN always sets a Content-Type). 2. Make ``fetch_validated_image_bytes`` defensive: if the Content-Type header is missing or non-string, treat as non-image and fall back to default. Closes a subtle hole — pre-fix, an upstream that omits Content-Type entirely would have served arbitrary bytes under the ``image/jpeg`` wrapper. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 21:47:41 +00:00
user	bdb00c43cf	fix(spend-tracking): drop orphaned imports; align tests with alias contract CI surfaced two issues from the previous commit: 1. ``general_settings`` and ``master_key`` were still imported at the top of ``get_logging_payload`` but had no remaining users after the master-key hash-detection blocks were removed. Drop the import. 2. ``tests/proxy_unit_tests/test_user_api_key_auth.py::test_x_litellm_api_key`` and ``tests/proxy_unit_tests/test_key_generate_prisma.py::test_master_key_hashing`` asserted ``valid_token.token == hash_token(master_key)`` — the pre-alias behavior. The new contract is ``valid_token.token == LITELLM_PROXY_MASTER_KEY_ALIAS`` (and != ``hash_token(master_key)``), since the master key (and its hash) must not propagate to the verification-token column or any other downstream consumer. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-29 18:53:12 +00:00
harish-berri	1d62ca0e23	Merge branch 'litellm_internal_staging' into litellm_token_verification_query_opt	2026-04-28 17:34:17 -07:00
Krrish Dholakia	fd32f29e39	Revert "lazy-load optional feature routers on first request (#26534 )" (#26727 ) This reverts commit `21ed38971d`.	2026-04-29 00:21:41 +00:00
Michael-RZ-Berri	21ed38971d	lazy-load optional feature routers on first request (#26534 ) Co-authored-by: Michael Riad Zaky <michaelr@Mac.localdomain>	2026-04-28 17:04:40 -07:00
harish-berri	84b6bd60af	update test cases to match new behaviour. The earlier test cases assumed the cache stores a pydantic object	2026-04-28 21:08:46 +00:00
Yuneng Jiang	abbe5d7f85	fix(proxy): /config/update writes only sent sections, drop store_model_in_db gate The endpoint loaded the full merged YAML+DB config and re-saved every top-level section to LiteLLM_Config rows via save_config(), so a UI toggle of one field persisted unrelated YAML state to DB as a side effect. It also rejected every request when store_model_in_db was False — including the request that would flip the flag to True (chicken-and-egg). Replace save_config with targeted per-section upserts: read the existing litellm_config row, merge in the request, upsert just that row. Sections the caller did not send are not touched. Drop the blanket store_model_in_db guard — the endpoint already requires prisma_client, and the startup-side override at proxy_server.py:6491 picks up general_settings.store_model_in_db=True from the DB on next restart.	2026-04-27 14:59:33 -07:00

1 2 3 4 5 ...

482 Commits