test_add_litellm_data_to_request_duplicate_tags tests the request/key
tag merge when tags overlap. The merge requires caller-supplied tags to
flow through — set allow_client_tags=True on the key so the merge path
stays testable under the new default-deny regime.
Silently stripping tags is the worst debugging experience: an admin's
client sends routing tags, they disappear, and the admin cannot tell
why. Emit a warning that names the metadata key the tags came from and
tells the admin exactly which flag to set if this is intentional.
Two pre-existing tests codified the pre-fix behavior where any caller-
supplied metadata.tags would flow through to spend logs and routing:
- test_add_key_or_team_level_spend_logs_metadata_to_request exercised
the request/key/team tag merge. Set allow_client_tags=True on the key
metadata so the merge path is still tested under the new regime.
- test_create_file_with_nested_litellm_metadata asserted that
litellm_metadata[tags] form-data propagated to the handler. Drop the
tag field; the test still proves nested form-parser correctness via
spend_logs_metadata and environment.
VERIA-28 (High) follow-up: tag-based routing and tag budget enforcement
read metadata.tags directly from the request, letting an attacker reach
restricted tag-routed deployments or misattribute spend to a victim
team's tag.
Strip metadata.tags (and litellm_metadata.tags) at the pre-call boundary
unless the caller's key or team metadata opts in with
allow_client_tags=True. Default-deny: existing clients that need to pass
routing tags must have the flag set explicitly on their key or team.
Preserves the tag-routing feature for admins who trust their callers;
closes the injection path for everyone else.
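
A minimal sketch of the default-deny strip described above, including
the warning on strip. The function name, its signature, and the logger
name are hypothetical and are not the proxy's actual code; only the
metadata / litellm_metadata keys and the allow_client_tags flag come
from this change.

```python
import logging
from typing import Any, Dict

logger = logging.getLogger("tag_strip_sketch")  # hypothetical logger name

METADATA_KEYS = ("metadata", "litellm_metadata")


def strip_client_tags(
    data: Dict[str, Any],
    key_metadata: Dict[str, Any],
    team_metadata: Dict[str, Any],
) -> Dict[str, Any]:
    """Drop caller-supplied tags unless the key or team opts in."""
    allowed = (
        key_metadata.get("allow_client_tags") is True
        or team_metadata.get("allow_client_tags") is True
    )
    if allowed:
        return data
    for name in METADATA_KEYS:
        block = data.get(name)
        if isinstance(block, dict) and "tags" in block:
            block.pop("tags")
            # Name the metadata key and the flag so the admin can act on it.
            logger.warning(
                "Stripped client-supplied tags from %r; set "
                "allow_client_tags=True on the key or team metadata "
                "if this is intentional.",
                name,
            )
    return data
```

Other metadata fields are left untouched; only the tags entry is
removed from each metadata key.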
Expand the pre-call metadata strip to also remove user_api_key_metadata
and user_api_key_team_metadata. The proxy writes these fields into
data[_metadata_variable_name] with admin-authoritative values, but only
into that one metadata key; the caller's value in the OTHER metadata
key (metadata vs litellm_metadata) would otherwise persist and be
picked up by _get_admin_metadata, letting a caller supply their own
'admin' config to disable guardrails, opt out of global policies, etc.
VERIA-28 (High): Security Policy and Guardrail Bypass via Unsanitized
Request Metadata.
Add regression test at the proxy boundary verifying the strip, and
extend the guardrail test to cover the post-strip admin-config path.
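
A rough sketch of the expanded strip, assuming it runs before the proxy
injects its own admin-authoritative values. The helper name is
hypothetical; the two field names and the two metadata keys are the ones
named above.

```python
from typing import Any, Dict

ADMIN_ONLY_FIELDS = ("user_api_key_metadata", "user_api_key_team_metadata")
METADATA_KEYS = ("metadata", "litellm_metadata")


def strip_admin_fields(data: Dict[str, Any]) -> Dict[str, Any]:
    """Remove caller-supplied admin fields from BOTH metadata keys."""
    for name in METADATA_KEYS:
        block = data.get(name)
        if isinstance(block, dict):
            for field in ADMIN_ONLY_FIELDS:
                # The proxy re-injects trusted values into one key later;
                # nothing the caller sent may survive in either key.
                block.pop(field, None)
    return data
```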
Greptile P2: _get_admin_metadata used 'litellm_metadata or metadata',
meaning a caller sending a non-empty litellm_metadata would shadow
admin config the proxy had injected into data['metadata']. Admin
exemptions would be silently ignored.
Check both keys and prefer whichever contains admin fields. Add
regression test covering the shadowing scenario.
Include user_api_key_team_metadata alongside user_api_key_metadata in
_get_admin_metadata() so team-level guardrail settings are respected.
Key-level settings take precedence over team-level.
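
The lookup rules above (check both metadata keys, prefer the one that
carries admin fields, key-level over team-level) could look roughly like
this. It is a standalone sketch, not the actual _get_admin_metadata()
body; the function name and the merge-by-dict-unpacking choice are
assumptions.

```python
from typing import Any, Dict

ADMIN_FIELDS = ("user_api_key_metadata", "user_api_key_team_metadata")


def get_admin_metadata(data: Dict[str, Any]) -> Dict[str, Any]:
    """Return merged admin config from whichever metadata key holds it."""
    for name in ("litellm_metadata", "metadata"):
        block = data.get(name)
        # A non-empty block without admin fields must not shadow the other key.
        if isinstance(block, dict) and any(f in block for f in ADMIN_FIELDS):
            team = block.get("user_api_key_team_metadata") or {}
            key = block.get("user_api_key_metadata") or {}
            # Later entries win, so key-level settings override team-level.
            return {**team, **key}
    return {}
```

With this shape, a caller-sent litellm_metadata that lacks admin fields
is skipped and the admin config injected into metadata is still found.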
Remove turn_off_message_logging from _supported_callback_params so it
cannot be set via request metadata. Admin controls logging globally
or via key/team configuration.
Update tests to verify user-injected guardrail flags are ignored while
admin-configured flags are respected.
Extract _get_admin_metadata() in CustomGuardrail to deduplicate metadata
lookup. Hoist tag resolution above the deployment loop in budget limiter.
Update stale comment in tag routing.
Read guardrail control flags (disable_global_guardrails, opted_out_global_guardrails)
from admin-configured key metadata instead of the request body. This ensures
callers cannot override admin security policies.
Fix tag-based routing to enforce strict tag checks regardless of whether the
request includes tags. Fix budget limiter to use the same dynamic metadata
key resolution as the tag router for consistent tag extraction.
Workers in llm_translation_testing have been crashing mid-run with
"Not properly terminated" (OOM), even after bumping resource_class to
xlarge. Reduce xdist workers from 8 to 4 to lower peak memory, and add
--max-worker-restart=5 so a crashed worker is replaced instead of
failing the whole run.
langgraph-prebuilt was previously pulled in as a transitive dependency
of langgraph, so its PyPI license metadata was reported as unknown. Now
that it is explicitly pinned (==1.0.8) to avoid the broken 1.0.9
release, the license checker flags it. It is published under MIT by the
same langchain-ai/langgraph repository as langgraph itself.
langgraph-prebuilt 1.0.9 imports ExecutionInfo and ServerInfo from
langgraph.runtime, but those symbols are not exported until
langgraph 1.1.0. Our pin of langgraph==1.0.10 allows
langgraph-prebuilt<1.1.0,>=1.0.8, and uv resolves to 1.0.9 (the
latest in range), which breaks at import time in every test that
touches langgraph.prebuilt (e.g. tests/pass_through_tests/test_mcp_routes.py):
ImportError: cannot import name 'ExecutionInfo' from 'langgraph.runtime'
Pinning langgraph-prebuilt to 1.0.8 pairs correctly with
langgraph==1.0.10 and restores the import path.
Bedrock rejects clear_thinking_20251015 unless thinking is enabled or adaptive.
Inject minimal extended thinking and interleaved-thinking beta when Claude Code
sends context_management without thinking. Adds unit tests.
Made-with: Cursor
* feat(proxy): add NO_OPENAPI env var to disable /openapi.json endpoint (#25696)
* feat(proxy): add NO_OPENAPI env var to disable /openapi.json endpoint - Fixes #25538
* test(proxy): add tests for _get_openapi_url
---------
Co-authored-by: Progressive-engg <lov.kumari55@gmail.com>
* feat(prometheus): add api_provider label to spend metric (#25693)
* feat(prometheus): add api_provider label to spend metric
Add `api_provider` to `litellm_spend_metric` labels so users can
build Grafana dashboards that break down spend by cloud provider
(e.g. bedrock, anthropic, openai, azure, vertex_ai).
The `api_provider` label already exists in UserAPIKeyLabelValues and
is populated from `standard_logging_payload["custom_llm_provider"]`,
but was not included in the spend metric's label list.
* add api_provider to requests metric + add test
Address review feedback:
- Add api_provider to litellm_requests_metric too (same call-site as
spend metric, keeps label sets in sync)
- Add test_api_provider_in_spend_and_requests_metrics following the
existing pattern in test_prometheus_labels.py
* fix: ensure `litellm_metadata` is attached to `pre_call` guardrail to align with `post_call` guardrail (#25641)
* fix: ensure `litellm_metadata` is attached to pre_call to align with post_call
* refactor: remove unused BaseTranslation._ensure_litellm_metadata
* refactor: module level imports for ensure_litellm_metadata and CodeQL
* fix: update based off of Codex comment
* revert: undo usage of `_guardrail_litellm_metadata`
* feat: add pricing entry for openrouter/google/gemini-3.1-flash-lite-preview (#25610)
* fix(bedrock): skip synthetic tool injection for json_object with no schema (#25740)
When response_format={"type": "json_object"} is sent without a JSON
schema, _create_json_tool_call_for_response_format builds a tool with an
empty schema (properties: {}). The model follows the empty schema and
returns {} instead of the actual JSON the caller asked for.
This patch:
- Skips synthetic json_tool_call injection when no schema is provided.
The model already returns JSON when the prompt asks for it.
- Fixes finish_reason: after _filter_json_mode_tools stripped all
synthetic tool calls, finish_reason stayed "tool_calls" instead of
becoming "stop". Callers (like the OpenAI SDK) misinterpreted this as
a pending tool invocation.
json_schema requests with an explicit schema are unchanged.
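
A simplified sketch of the two decisions in this patch. Both function
names are hypothetical, and the schema lookup assumes the OpenAI-style
response_format layout (json_schema.schema); the real Bedrock
transformation may look for the schema elsewhere as well.

```python
from typing import Any, Dict, List, Optional


def should_inject_json_tool(response_format: Optional[Dict[str, Any]]) -> bool:
    """Inject the synthetic JSON tool only when an explicit schema exists."""
    if not response_format or response_format.get("type") != "json_schema":
        # json_object without a schema: rely on the prompt, inject nothing.
        return False
    schema = (response_format.get("json_schema") or {}).get("schema")
    return bool(schema)


def resolve_finish_reason(finish_reason: str, remaining_tool_calls: List[Any]) -> str:
    """Report "stop" once every synthetic tool call has been filtered out."""
    if finish_reason == "tool_calls" and not remaining_tool_calls:
        return "stop"
    return finish_reason
```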
Co-authored-by: Claude <noreply@anthropic.com>
* fix(utils): allowed_openai_params must not forward unset params as None
`_apply_openai_param_overrides` iterated `allowed_openai_params` and
unconditionally wrote `optional_params[param] = non_default_params.pop(param, None)`
for each entry. If the caller listed a param name but did not actually
send that param in the request, the pop returned `None` and `None` was
still written to `optional_params`. The openai SDK then rejected it as
a top-level kwarg:
AsyncCompletions.create() got an unexpected keyword argument 'enable_thinking'
Reproducer (from #25697):
allowed_openai_params = ["chat_template_kwargs", "enable_thinking"]
body = {"chat_template_kwargs": {"enable_thinking": False}}
Here `enable_thinking` is only present nested inside
`chat_template_kwargs`, so the helper should forward
`chat_template_kwargs` and leave `enable_thinking` alone. Instead it
wrote `optional_params["enable_thinking"] = None`.
Fix: only forward a param if it was actually present in
`non_default_params`. Behavior is unchanged for the happy path (param
sent → still forwarded), and the explicit `None` leakage is gone.
Adds a regression test exercising the helper in isolation so the test
does not depend on any provider-specific `map_openai_params` plumbing.
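
The fixed behavior can be sketched as a standalone function. This is an
illustration of the rule "forward only what was actually sent", not the
real _apply_openai_param_overrides; the signature is an assumption.

```python
from typing import Any, Dict, List


def apply_openai_param_overrides(
    optional_params: Dict[str, Any],
    non_default_params: Dict[str, Any],
    allowed_openai_params: List[str],
) -> Dict[str, Any]:
    """Forward an allowed param only if the caller actually sent it."""
    for param in allowed_openai_params:
        if param in non_default_params:
            optional_params[param] = non_default_params.pop(param)
    return optional_params


# The reproducer from #25697: enable_thinking is only nested, never top-level.
result = apply_openai_param_overrides(
    optional_params={},
    non_default_params={"chat_template_kwargs": {"enable_thinking": False}},
    allowed_openai_params=["chat_template_kwargs", "enable_thinking"],
)
```

Here result contains chat_template_kwargs only; no top-level
enable_thinking key is written.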
Fixes #25697
---------
Co-authored-by: lovek629 <59618812+lovek629@users.noreply.github.com>
Co-authored-by: Progressive-engg <lov.kumari55@gmail.com>
Co-authored-by: Ori Kotek <ori.k@codium.ai>
Co-authored-by: Alexander Grattan <51346343+agrattan0820@users.noreply.github.com>
Co-authored-by: Mohana Siddhartha Chivukula <103447836+iamsiddhu3007@users.noreply.github.com>
Co-authored-by: Amiram Mizne <amiramm@users.noreply.github.com>
Co-authored-by: Claude <noreply@anthropic.com>
Noma v1 resolved application_id from user_api_key_alias when no explicit
value was set (PR #16832). Noma v2 (PR #21400) was rewritten from scratch
and this fallback was not ported, causing all requests from shared LiteLLM
instances to appear as a single generic "litellm" application in the Noma
dashboard — breaking per-user traceability.
Fix: after checking dynamic_params and self.application_id, fall back to
user_api_key_alias from litellm_metadata or metadata. This matches the
pattern used by PromptSecurityGuardrail._resolve_key_alias_from_request_data()
and restores the v1 behavior where each API key gets its own application
entry in the Noma dashboard.
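
The resolution order described in the fix, as a hedged sketch. The
function name and parameters are hypothetical and this is not the Noma
guardrail's code; the generic "litellm" default is taken from the
behavior described above.

```python
from typing import Any, Dict, Optional


def resolve_application_id(
    dynamic_params: Dict[str, Any],
    configured_application_id: Optional[str],
    request_data: Dict[str, Any],
    default: str = "litellm",
) -> str:
    """Explicit value first, then the key alias, then the generic default."""
    explicit = dynamic_params.get("application_id") or configured_application_id
    if explicit:
        return explicit
    for name in ("litellm_metadata", "metadata"):
        block = request_data.get(name)
        if isinstance(block, dict) and block.get("user_api_key_alias"):
            return block["user_api_key_alias"]
    return default
```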
Fixes #25794
Co-authored-by: Brendan Smith-Elion <brendan.smith-elion@arcadia.io>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* fix(ollama): propagate done_reason='length' as finish_reason for max_tokens truncation
Ollama returns done_reason='length' when a response is cut off by num_predict
(the max_tokens limit). Previously, non-streaming responses hardcoded
finish_reason='stop', and streaming used chunk.get('done_reason', 'stop')
which also defaulted to 'stop' when done_reason was absent.
This meant callers (e.g. the Anthropic pass-through adapter, which maps
OpenAI 'length' -> Anthropic 'max_tokens') could never detect truncation,
making stop_reason always appear as 'end_turn' even for cut-off responses.
Fix: read done_reason from the response JSON in the non-streaming path and
use `chunk.get('done_reason') or 'stop'` in the streaming path, so Ollama's
actual done_reason passes through to the caller unchanged.
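
A small illustration of the fallback expression used in the streaming
path. The helper name is hypothetical; the two dict expressions are the
ones quoted above. The `or` form also covers a done_reason that is
present but None, which the two-argument get() does not.

```python
from typing import Any, Dict


def finish_reason_from_chunk(chunk: Dict[str, Any]) -> str:
    """Pass Ollama's done_reason through, defaulting to "stop"."""
    return chunk.get("done_reason") or "stop"


truncated = finish_reason_from_chunk({"done": True, "done_reason": "length"})
missing = finish_reason_from_chunk({"done": True})
explicit_none = finish_reason_from_chunk({"done": True, "done_reason": None})

# For comparison: a two-argument get() returns None when the key exists
# with a None value, because the default only applies to missing keys.
old_style = {"done_reason": None}.get("done_reason", "stop")
```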
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Update test_ollama_chat_transformation.py
* Update litellm/llms/ollama/chat/transformation.py
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
The Vertex AI count-tokens endpoint rejects model names that include
version suffixes (@default, @20251001, etc.) with:
"claude-sonnet-4-6@default is not supported for token counting"
The same model without the suffix ("claude-sonnet-4-6") works correctly.
Strip @suffix from both the model parameter and request_data["model"]
in handle_count_tokens_request before sending to the API.
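
A minimal sketch of the suffix strip, assuming the suffix is everything
from the first "@" onward. The helper names are hypothetical and not
taken from handle_count_tokens_request.

```python
from typing import Any, Dict


def strip_version_suffix(model: str) -> str:
    """Drop a trailing @version suffix such as @default or @20251001."""
    return model.split("@", 1)[0]


def normalize_count_tokens_request(model: str, request_data: Dict[str, Any]):
    """Apply the strip to both the model argument and request_data["model"]."""
    cleaned = dict(request_data)  # avoid mutating the caller's dict
    if isinstance(cleaned.get("model"), str):
        cleaned["model"] = strip_version_suffix(cleaned["model"])
    return strip_version_suffix(model), cleaned
```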
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
MaskedHTTPStatusError constructs a new httpx.Response from the original
error. Two bugs surfaced under real HTTP error responses:
1. The new Response was created without request=, so response.request
raised RuntimeError("The .request property has not been set.") for
any downstream caller (e.g. exception_mapping_utils) that inspected it.
2. The decoded response bytes were passed together with the original
Content-Encoding header. On construction httpx tried to decompress
the already-decoded bytes and raised httpx.DecodingError
("Error -3 while decompressing data: incorrect header check").
Set response.request to the masked Request and strip Content-Encoding
(and the now-stale Content-Length) before rebuilding the Response.
URL/message masking is unchanged; the new request carries the already
masked URL.
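
The header cleanup in the second fix can be sketched without httpx as a
plain dict filter. The helper name is hypothetical; in the real change
the filtered headers are passed to the rebuilt httpx.Response together
with the masked request.

```python
from typing import Dict

# Headers that describe the encoded body and no longer match decoded bytes.
STALE_HEADERS = {"content-encoding", "content-length"}


def headers_for_rebuilt_response(headers: Dict[str, str]) -> Dict[str, str]:
    """Drop headers that would make the client re-decode decoded content."""
    # HTTP header names are case-insensitive, so compare in lowercase.
    return {k: v for k, v in headers.items() if k.lower() not in STALE_HEADERS}
```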
Also update test_logging_key_masking_gemini: the security commit
25f93bed91 moved Gemini API keys from ?key=... URL params to the
x-goog-api-key header, so api_base no longer contains the key.
The projected-spend alert in _update_key_cache read from
existing_spend_obj.litellm_budget_table["soft_budget"], but the nested
dict is never populated for virtual keys (the combined_view SQL maps
budget fields to flat top-level attributes instead). This made the
check dead code — it silently short-circuited on every request, and
when unblocked, crashed update_cache with a Pydantic ValidationError
because _get_projected_spend_over_limit returns a date object but
CallInfo.projected_exceeded_date expects str.
Fixes: read from the flat existing_spend_obj.soft_budget field that IS
populated, and stringify projected_exceeded_date.
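
An illustrative sketch of the two fixes, using a stand-in object rather
than the real key model. The helper names are hypothetical; only the
flat soft_budget attribute and the date-to-str conversion come from the
description above.

```python
from datetime import date
from types import SimpleNamespace
from typing import Any, Optional


def read_soft_budget(spend_obj: Any) -> Optional[float]:
    """Read the flat attribute; the nested budget table may be unset."""
    return getattr(spend_obj, "soft_budget", None)


def projected_date_as_str(projected: Optional[date]) -> Optional[str]:
    """Convert the projected date to str before building the alert payload."""
    return str(projected) if projected is not None else None


# Stand-in for a virtual key row: flat field populated, nested table empty.
key_row = SimpleNamespace(soft_budget=10.0, litellm_budget_table=None)
```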
Also marks team soft budget email alerts as enterprise in docs.
Closes #20324
RestrictedPython (ZPL-2.1, a BSD-style permissive license) was added as
a dependency for the custom_code guardrail sandbox, but the license
checker didn't recognize it. Add it to the authorized packages list.
- vertex_ai_context_caching.py: add explicit Optional[str] annotation on
auth_header so later branches that assign vertex_auth_header (Optional[str])
type-check against the first branch's dict assignment (which already has
type: ignore[assignment]).
- path_utils.py: remove unused pathlib.Path import (F401).
- emulated_handler.py: extract _extract_tool_call_fields,
_resolve_queries_from_args, _execute_file_search_tool_calls, and
_build_follow_up_input helpers to drop aresponses_with_emulated_file_search
below ruff's PLR0915 statement limit. Behavior unchanged.