Replaces the previous intersect-with-team.access_group_ids check, which
made the override unreachable in practice (the team-gate fallback already
covered every case the intersection allowed). The override now resolves
each of the key's access_group_ids via get_access_object and accepts the
group only if its assigned_team_ids includes the key's team_id, or its
assigned_key_ids includes the key's token. This fulfills the original ask
(a key can extend a team's allow-list via a group the admin granted to
that team or that specific key) while still rejecting foreign groups
referenced by team members of other teams.
A team member could set any access_group_ids on their key (e.g. a group
assigned only to a different team) and override the team's model
restriction. Intersect the key's access_group_ids with team_object.access_group_ids
in _key_access_group_grants_model so foreign groups are dropped before
model expansion. Adds a regression test that asserts expansion is never
called for foreign groups.
Three review items addressed:
* **Veria (Medium): SSRF via redirect.** ``fetch_validated_image_bytes``
was calling ``validate_url(url)`` once and then fetching with the
default httpx client, so a 3xx to an internal IP would have been
followed unvalidated. Switched to ``async_safe_get`` (the existing
SSRF primitive used elsewhere in the codebase) which walks each
redirect hop, re-validates, and rejects redirects to blocked
networks. Default ``litellm.user_url_validation`` is True so
protection is on out of the box.
* **Greptile (P2): SVG can embed JS.** Removed ``image/svg+xml`` from
the allowed-Content-Type set. The hardcoded response media type
(``image/jpeg`` / ``image/x-icon``) means a real SVG body wouldn't
render as SVG anyway in modern browsers — the allowlist entry was
giving up XSS surface for no actual SVG-rendering benefit. If real
SVG support is wanted later, that's a deliberate feature PR with CSP
/ nosniff bundled.
* **Greptile (P2): cache-write OSError drops validated bytes.** When
the upstream fetch succeeded but ``open(cache_path, "wb")`` raised
(read-only assets dir), the bytes were discarded and the default
logo was served — a silent regression for that deployment. Now
serve the validated bytes inline via ``Response(...)`` as a fallback
before falling back to default.
Tests:
- Replaced low-level mocks of ``validate_url`` with mocks of
``async_safe_get`` directly, exercising the helper's contract
rather than the SSRF primitive's internals.
- New ``test_rejects_svg_content_type`` confirms SVG is blocked.
- ``test_get_image_cache_logic`` fixture now sets
``mock_response.is_redirect = False`` so ``async_safe_get`` doesn't
treat the Mock's truthy attribute as a redirect.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three CI failures from the previous push, all addressed:
* ``lint`` (mypy): ``async_client.get(url, **request_kwargs)`` confused
mypy because ``AsyncHTTPHandler.get``'s second positional arg is typed
``bool | None``. Switched to an explicit branch:
``await async_client.get(rewritten_url, headers={"host": host_header})``
for the HTTP-rewritten case, plain ``get(rewritten_url)`` otherwise.
* ``proxy-infra`` /
``test_get_image_custom_local_logo_bypasses_cache``: the existing
test set ``UI_LOGO_PATH=/app/custom_logo.jpg`` with no
``LITELLM_ASSETS_PATH``, asserting the path was served verbatim. That
was the LFI behaviour the new path-containment guard closes. Updated
the test to set ``LITELLM_ASSETS_PATH=/app`` so the path is inside an
allowed root, and patched the helper's ``realpath`` / ``isfile`` to
go along with the mocked filesystem. Test intent (bypass cache when
``UI_LOGO_PATH`` is local) is preserved.
* ``auth-and-jwt`` / ``test_get_image_cache_logic``: existing test
built a ``Mock`` response without ``headers``, so the new
Content-Type check tripped on ``Mock().split(";")[0]``. Two fixes:
1. Set ``mock_response.headers = {"content-type": "image/jpeg"}``
on the test (matches the real upstream contract — a logo CDN
always sets a Content-Type).
2. Make ``fetch_validated_image_bytes`` defensive: if the
Content-Type header is missing or non-string, treat as non-image
and fall back to default. Closes a subtle hole — pre-fix, an
upstream that omits Content-Type entirely would have served
arbitrary bytes under the ``image/jpeg`` wrapper.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
CI surfaced two issues from the previous commit:
1. ``general_settings`` and ``master_key`` were still imported at the top
of ``get_logging_payload`` but had no remaining users after the
master-key hash-detection blocks were removed. Drop the import.
2. ``tests/proxy_unit_tests/test_user_api_key_auth.py::test_x_litellm_api_key``
and ``tests/proxy_unit_tests/test_key_generate_prisma.py::test_master_key_hashing``
asserted ``valid_token.token == hash_token(master_key)`` — the
pre-alias behavior. The new contract is
``valid_token.token == LITELLM_PROXY_MASTER_KEY_ALIAS`` (and !=
``hash_token(master_key)``), since the master key (and its hash)
must not propagate to the verification-token column or any other
downstream consumer.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The endpoint loaded the full merged YAML+DB config and re-saved every
top-level section to LiteLLM_Config rows via save_config(), so a UI toggle
of one field persisted unrelated YAML state to DB as a side effect. It
also rejected every request when store_model_in_db was False — including
the request that would flip the flag to True (chicken-and-egg).
Replace save_config with targeted per-section upserts: read the existing
litellm_config row, merge in the request, upsert just that row. Sections
the caller did not send are not touched. Drop the blanket
store_model_in_db guard — the endpoint already requires prisma_client,
and the startup-side override at proxy_server.py:6491 picks up
general_settings.store_model_in_db=True from the DB on next restart.
- test_prepare_key_update_data: replace bare MagicMock with
MagicMock(spec=LiteLLM_VerificationToken) and explicitly set
existing_key_row.metadata = {}, so reserved-field reads return real
values instead of MagicMock-returning-MagicMock. Fixes a regression
surfaced by the new reserved-metadata preservation logic.
- test_key_management_endpoints.py: black-format-only changes from
recent edits.
The routes in `global_spend_tracking_routes` (e.g. /global/spend/report,
/global/spend/teams, /global/spend/keys) return spend aggregated across
every team, customer, and api_key in the proxy. They were included in
`internal_user_routes` and `internal_user_view_only_routes`, so non-admin
roles could read proxy-wide spend.
Drop them from both non-admin route lists. PROXY_ADMIN and
PROXY_ADMIN_VIEW_ONLY access is preserved through their existing branches
in route_checks.py, and the `get_spend_routes` permission opt-in
continues to grant access for keys that need it.
Updates two pre-existing test parametrizations whose expected results
flip from True to False, and adds parametrized coverage over every
route in `global_spend_tracking_routes` for: PROXY_ADMIN_VIEW_ONLY
allowed, INTERNAL_USER blocked, INTERNAL_USER_VIEW_ONLY blocked,
INTERNAL_USER + get_spend_routes permission allowed.
Multiple paths through _user_api_key_auth_builder returned a
UserAPIKeyAuth without running common_checks(): OAuth2 token validation,
OAuth2 proxy header hook, JWT admin shortcut, master_key path,
pass-through custom headers, the /user/auth route, and the
allow_requests_on_db_unavailable fallback. An operator-configured key
model-access list, max_budget, team_blocked flag, or team model scope
was therefore silently skipped on those paths. The HA-fallback token
was worse: it was a full proxy-admin synthetic, so a DB outage granted
full admin to every caller.
Fix three root causes (VERIA-18):
1. Centralize common_checks in the user_api_key_auth wrapper. The
builder paths no longer call it; the wrapper runs it once after the
builder returns, for every path. Introduces _run_centralized_common_checks
which gathers team/user/project/end_user/global_spend context in
parallel via asyncio.gather. Preserves the existing
custom_auth_run_common_checks opt-out for custom-auth deployments.
2. Narrow is_database_connection_error — drop the blanket PrismaError
catch that routed data-layer errors (UniqueViolationError, etc.)
into the HA fallback. Only real connectivity failures plus the
no_db_connection marker now qualify.
3. DB-unavailable fallback issues an INTERNAL_USER token with user_id
DB_UNAVAILABLE_FALLBACK_USER_ID instead of proxy-admin. An outage
can no longer escalate an anonymous caller.
JWT admin / master_key tokens still grant admin via a synthesized
admin user_object (so non_proxy_admin_allowed_routes_check in
common_checks recognizes them); other common_checks branches
(team_blocked, team_model_access) now apply uniformly.
Two model-access gates run per request in `common_checks` and they're
asymmetric: `can_key_call_model` falls back to the key's
`access_group_ids`, but `can_team_access_model` only looks at
`team.models` + `team.access_group_ids`. A key granted a model via its
own access group on a model-restricted team is silently denied at the
team gate.
Wrap `can_team_access_model` in try/except in `common_checks`: on
`team_model_access_denied`, consult a new `_key_access_group_grants_model`
helper that expands `valid_token.access_group_ids` via the existing
`_get_models_from_access_groups` and checks via `_can_object_call_model`.
Re-raise if the key's access groups don't grant the model. Any other
exception propagates unchanged.
Effect: request allowed if `team allows X` OR `key's access group
grants X`, making the two gates symmetric.
Test: add three unit tests for `_key_access_group_grants_model`
covering: group covers model, key has no groups, group resolves but
does not cover model.
Addresses review feedback on the snapshot approach:
1. Class-instance mutable state
The snapshot only covers primitives + collections + None. Class
instances (DualCache, LLMClientCache) weren't reset between tests,
so in-place cache mutations could leak. Can't deepcopy these — they
hold thread locks — but they expose flush_cache(). Collect every
module attribute whose value implements flush_cache() at conftest
import, and invoke it per-test alongside the snapshot restore.
2. Silent skips are now warnings
_snapshot_mutable_state and _restore_mutable_state previously
swallowed exceptions, so if a future attr gained a property without
a setter (or other non-round-trippable state), an isolation gap
would have no signal. Emit warnings.warn on each failure path.
3. Docstring
Explicitly documents what IS and IS NOT reset, and tells authors to
use monkeypatch.setattr() for in-place mutations of instances
without flush_cache() (ProxyLogging, JWTHandler, etc.).
The previous snapshot only tracked list/dict/set values. Tests mutate
scalar module attrs too — master_key, premium_user, prisma_client — and
importlib.reload used to reset those implicitly. Under the snapshot
approach they were leaking between tests, so test_active_callbacks
failed in CI with "No api key passed in." once an earlier test left
master_key set to sk-1234.
Expand the snapshot to cover primitives (str/int/float/bool/bytes/tuple)
and None-valued attributes. Complex object instances are still skipped
to avoid deepcopy issues.
tests/proxy_unit_tests/conftest.py was calling importlib.reload(litellm) in an
autouse function-scoped fixture, which cost ~17s per test because it re-ran
the full litellm __init__ import chain. With 400+ proxy unit tests, this was
the single biggest driver of CI wall time — 18 of the top 20 slowest durations
in a typical run were just the 17s fixture setup.
Replace the reload with a snapshot-and-restore approach: snapshot the mutable
lists/dicts/sets on litellm and litellm.proxy.proxy_server once at conftest
import, then deep-copy that snapshot back before each test. Callback lists,
caches, router state, etc. still get reset between tests, but the expensive
import chain only runs once per worker.
Local measurement on test_proxy_utils.py: 188 tests in 3.50s (previously took
~15 minutes of CI wall time on a single worker).
test_add_litellm_data_to_request_duplicate_tags tests the request/key
tag merge when tags overlap. The merge requires caller-supplied tags to
flow through — set allow_client_tags=True on the key so the merge path
stays testable under the new default-deny regime.
Two pre-existing tests codified the pre-fix behavior where any caller-
supplied metadata.tags would flow through to spend logs and routing:
- test_add_key_or_team_level_spend_logs_metadata_to_request exercised
the request/key/team tag merge. Set allow_client_tags=True on the key
metadata so the merge path is still tested under the new regime.
- test_create_file_with_nested_litellm_metadata asserted that
litellm_metadata[tags] form-data propagated to the handler. Drop the
tag field; the test still proves nested form-parser correctness via
spend_logs_metadata and environment.
* feat(proxy): add NO_OPENAPI env var to disable /openapi.json endpoint (#25696)
* feat(proxy): add NO_OPENAPI env var to disable /openapi.json endpoint - Fixes#25538
* test(proxy): add tests for _get_openapi_url
---------
Co-authored-by: Progressive-engg <lov.kumari55@gmail.com>
* feat(prometheus): add api_provider label to spend metric (#25693)
* feat(prometheus): add api_provider label to spend metric
Add `api_provider` to `litellm_spend_metric` labels so users can
build Grafana dashboards that break down spend by cloud provider
(e.g. bedrock, anthropic, openai, azure, vertex_ai).
The `api_provider` label already exists in UserAPIKeyLabelValues and
is populated from `standard_logging_payload["custom_llm_provider"]`,
but was not included in the spend metric's label list.
* add api_provider to requests metric + add test
Address review feedback:
- Add api_provider to litellm_requests_metric too (same call-site as
spend metric, keeps label sets in sync)
- Add test_api_provider_in_spend_and_requests_metrics following the
existing pattern in test_prometheus_labels.py
* fix: ensure `litellm_metadata` is attached to `pre_call` guardrail to align with `post_call` guardrail (#25641)
* fix: ensure `litellm_metadata` is attached to pre_call to align with post_call
* refactor: remove unused BaseTranslation._ensure_litellm_metadata
* refactor: module level imports for ensure_litellm_metadata and CodeQL
* fix: update based off of Codex comment
* revert: undo usage of `_guardrail_litellm_metadata`
* feat: add pricing entry for openrouter/google/gemini-3.1-flash-lite-preview (#25610)
* fix(bedrock): skip synthetic tool injection for json_object with no schema (#25740)
When response_format={"type": "json_object"} is sent without a JSON
schema, _create_json_tool_call_for_response_format builds a tool with an
empty schema (properties: {}). The model follows the empty schema and
returns {} instead of the actual JSON the caller asked for.
This patch:
- Skips synthetic json_tool_call injection when no schema is provided.
The model already returns JSON when the prompt asks for it.
- Fixes finish_reason: after _filter_json_mode_tools strips all
synthetic tool calls, finish_reason stays "tool_calls" instead of
"stop". Callers (like the OpenAI SDK) misinterpret this as a pending
tool invocation.
json_schema requests with an explicit schema are unchanged.
Co-authored-by: Claude <noreply@anthropic.com>
* fix(utils): allowed_openai_params must not forward unset params as None
`_apply_openai_param_overrides` iterated `allowed_openai_params` and
unconditionally wrote `optional_params[param] = non_default_params.pop(param, None)`
for each entry. If the caller listed a param name but did not actually
send that param in the request, the pop returned `None` and `None` was
still written to `optional_params`. The openai SDK then rejected it as
a top-level kwarg:
AsyncCompletions.create() got an unexpected keyword argument 'enable_thinking'
Reproducer (from #25697):
allowed_openai_params = ["chat_template_kwargs", "enable_thinking"]
body = {"chat_template_kwargs": {"enable_thinking": False}}
Here `enable_thinking` is only present nested inside
`chat_template_kwargs`, so the helper should forward
`chat_template_kwargs` and leave `enable_thinking` alone. Instead it
wrote `optional_params["enable_thinking"] = None`.
Fix: only forward a param if it was actually present in
`non_default_params`. Behavior is unchanged for the happy path (param
sent → still forwarded), and the explicit `None` leakage is gone.
Adds a regression test exercising the helper in isolation so the test
does not depend on any provider-specific `map_openai_params` plumbing.
Fixes#25697
---------
Co-authored-by: lovek629 <59618812+lovek629@users.noreply.github.com>
Co-authored-by: Progressive-engg <lov.kumari55@gmail.com>
Co-authored-by: Ori Kotek <ori.k@codium.ai>
Co-authored-by: Alexander Grattan <51346343+agrattan0820@users.noreply.github.com>
Co-authored-by: Mohana Siddhartha Chivukula <103447836+iamsiddhu3007@users.noreply.github.com>
Co-authored-by: Amiram Mizne <amiramm@users.noreply.github.com>
Co-authored-by: Claude <noreply@anthropic.com>