Commit Graph

416 Commits

Author SHA1 Message Date
ishaan-berri 1c128a86b8 Merge pull request #25256 from BerriAI/litellm_ishaan_april6
Litellm ishaan april6
2026-04-17 16:26:45 -07:00
Ishaan Jaffer e8461b5b97 style: run black formatter on files from main merge 2026-04-17 13:02:59 -07:00
Ishaan Jaffer f31d4faa87 Merge origin/main into litellm_ishaan_april6 2026-04-17 12:36:51 -07:00
Yuneng Jiang f8f356fac3 test: add parametrized tests for api_key value handling in credential check 2026-04-16 16:08:47 -07:00
Yuneng Jiang dafa1bf97c Merge remote-tracking branch 'origin/litellm_internal_staging' into litellm_yj_apr15
# Conflicts:
#	litellm/litellm_core_utils/litellm_logging.py
#	uv.lock
2026-04-16 09:17:20 -07:00
Tim Ren dd4a41951f fix(utils): allowed_openai_params must not forward unset params as None (#25777)
* feat(proxy): add NO_OPENAPI env var to disable /openapi.json endpoint (#25696)

* feat(proxy): add NO_OPENAPI env var to disable /openapi.json endpoint - Fixes #25538

* test(proxy): add tests for _get_openapi_url

---------

Co-authored-by: Progressive-engg <lov.kumari55@gmail.com>

* feat(prometheus): add api_provider label to spend metric (#25693)

* feat(prometheus): add api_provider label to spend metric

Add `api_provider` to `litellm_spend_metric` labels so users can
build Grafana dashboards that break down spend by cloud provider
(e.g. bedrock, anthropic, openai, azure, vertex_ai).

The `api_provider` label already exists in UserAPIKeyLabelValues and
is populated from `standard_logging_payload["custom_llm_provider"]`,
but was not included in the spend metric's label list.

* add api_provider to requests metric + add test

Address review feedback:
- Add api_provider to litellm_requests_metric too (same call-site as
  spend metric, keeps label sets in sync)
- Add test_api_provider_in_spend_and_requests_metrics following the
  existing pattern in test_prometheus_labels.py

* fix: ensure `litellm_metadata` is attached to `pre_call` guardrail to align with `post_call` guardrail (#25641)

* fix: ensure `litellm_metadata` is attached to pre_call to align with post_call

* refactor: remove unused BaseTranslation._ensure_litellm_metadata

* refactor: module level imports for ensure_litellm_metadata and CodeQL

* fix: update based off of Codex comment

* revert: undo usage of `_guardrail_litellm_metadata`

* feat: add pricing entry for openrouter/google/gemini-3.1-flash-lite-preview (#25610)

* fix(bedrock): skip synthetic tool injection for json_object with no schema (#25740)

When response_format={"type": "json_object"} is sent without a JSON
schema, _create_json_tool_call_for_response_format builds a tool with an
empty schema (properties: {}). The model follows the empty schema and
returns {} instead of the actual JSON the caller asked for.

This patch:
- Skips synthetic json_tool_call injection when no schema is provided.
  The model already returns JSON when the prompt asks for it.
- Fixes finish_reason: after _filter_json_mode_tools strips all
  synthetic tool calls, finish_reason stays "tool_calls" instead of
  "stop". Callers (like the OpenAI SDK) misinterpret this as a pending
  tool invocation.

json_schema requests with an explicit schema are unchanged.

Co-authored-by: Claude <noreply@anthropic.com>

* fix(utils): allowed_openai_params must not forward unset params as None

`_apply_openai_param_overrides` iterated `allowed_openai_params` and
unconditionally wrote `optional_params[param] = non_default_params.pop(param, None)`
for each entry. If the caller listed a param name but did not actually
send that param in the request, the pop returned `None` and `None` was
still written to `optional_params`. The openai SDK then rejected it as
a top-level kwarg:

    AsyncCompletions.create() got an unexpected keyword argument 'enable_thinking'

Reproducer (from #25697):

    allowed_openai_params = ["chat_template_kwargs", "enable_thinking"]
    body = {"chat_template_kwargs": {"enable_thinking": False}}

Here `enable_thinking` is only present nested inside
`chat_template_kwargs`, so the helper should forward
`chat_template_kwargs` and leave `enable_thinking` alone. Instead it
wrote `optional_params["enable_thinking"] = None`.

Fix: only forward a param if it was actually present in
`non_default_params`. Behavior is unchanged for the happy path (param
sent → still forwarded), and the explicit `None` leakage is gone.

Adds a regression test exercising the helper in isolation so the test
does not depend on any provider-specific `map_openai_params` plumbing.

Fixes #25697

---------

Co-authored-by: lovek629 <59618812+lovek629@users.noreply.github.com>
Co-authored-by: Progressive-engg <lov.kumari55@gmail.com>
Co-authored-by: Ori Kotek <ori.k@codium.ai>
Co-authored-by: Alexander Grattan <51346343+agrattan0820@users.noreply.github.com>
Co-authored-by: Mohana Siddhartha Chivukula <103447836+iamsiddhu3007@users.noreply.github.com>
Co-authored-by: Amiram Mizne <amiramm@users.noreply.github.com>
Co-authored-by: Claude <noreply@anthropic.com>
2026-04-16 19:04:26 +05:30
user f521e27371 test(gemini): align API key expectations 2026-04-14 23:28:13 +00:00
Darien Kindlund 17e145a083 fix(proxy): use model_group for model_max_budget spend tracking cache key (#25549)
The model_max_budget limiter tracks spend in one code path
(async_log_success_event) and enforces budget limits in another
(is_key_within_model_budget via user_api_key_auth). These two paths
used different model name formats to build cache keys:

- Tracking used standard_logging_payload["model"], which is the
  deployment-level model name (e.g. "vertex_ai/claude-opus-4-6@default")
- Enforcement used request_data["model"], which is the model group
  alias (e.g. "claude-opus-4-6")

Because the cache keys never matched, the enforcement path always read
None for current spend, silently allowing all requests through even
after the budget was exceeded. This affected any provider that decorates
model names with provider prefixes or version suffixes (Vertex AI,
Bedrock, etc.).

Fix: use model_group (the user-facing alias) from StandardLoggingPayload
for spend tracking, falling back to model when model_group is None.
This aligns the tracking cache key with the enforcement cache key.

Fixes the same root cause reported in #15223 and #10052.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 19:37:58 -07:00
Ishaan Jaffer d22a07a9ba Merge remote-tracking branch 'origin/main' into ci-fix-april6-fixes 2026-04-11 12:04:14 -07:00
Ryan Crabbe 3af7de4222 retain ui_routes enum alias for JWT config backwards compatibility 2026-04-10 08:55:32 -07:00
Ryan Crabbe 26e99f22b3 refactor: consolidate route auth for UI and API tokens
Unify UI and API token authorization through the shared RBAC path
and backfill missing routes in role-based route lists.
2026-04-09 21:36:35 -07:00
Krrish Dholakia f42ffed2bd Litellm oss staging 04 02 2026 p1 (#25055)
* fix(vertex_ai): support pluggable (executable) credential_source for WIF auth (#24700)

The WIF credential dispatch in load_auth() only handled identity_pool and
aws credential types. When credential_source.executable was present (used
for Azure Managed Identity via Workload Identity Federation), it fell
through to identity_pool.Credentials which rejected it with MalformedError.

Add dispatch to google.auth.pluggable.Credentials for executable-type
credential sources, following the same pattern as the existing identity_pool
and aws helpers.

Fixes authentication for Azure Container Apps → GCP Vertex AI via WIF
with executable credential sources.

* feat(logging): add component and logger fields to JSON logs for 3rd p… (#24447)

* feat(logging): add component and logger fields to JSON logs for 3rd party filtering

* Let user-supplied extra fields win over auto-generated component/logger, tighten test assertions

* Feat - Add organization into the metrics metadata for org_id & org_alias (#24440)

* Add org_id and org_alias label names to Prometheus metric definitions

* Add user_api_key_org_alias to StandardLoggingUserAPIKeyMetadata

* Populate user_api_key_org_alias in pre-call metadata

* Pass org_id and org_alias into per-request Prometheus metric labels

* Add test for org labels on per-request Prometheus metrics

* chore: resolve test mockdata

* Address review: populate org_alias from DB view, add feature flag, use .get() for org metadata

* Add org labels to failure path and verify flag behavior in test

* Fix test: build flag-off enum_values without org fields

* Gate org labels behind feature flag in get_labels() instead of static metric lists

* Scope org label injection to metrics that carry team context, remove orphaned budget label defs, add test teardown

* Use explicit metric allowlist for org label injection instead of team heuristic

* Fix duplicate org label guard, move _org_label_metrics to class constant

* Reset custom_prometheus_metadata_labels after duplicate label assertion

* fix: emit org labels by default, remove flag, fix missing org_alias in all metadata paths

* fix: emit org labels by default, no opt-in flag required

* fix: write org_alias to metadata unconditionally in proxy_server.py

* fix: 429s from batch creation being converted to 500 (#24703)

* add us gov models (#24660)

* add us gov models

* added max tokens

* Litellm dev 04 02 2026 p1 (#25052)

* fix: replace hardcoded url

* fix: Anthropic web search cost not tracked for Chat Completions

The ModelResponse branch in response_object_includes_web_search_call()
only checked url_citation annotations and prompt_tokens_details, missing
Anthropic's server_tool_use.web_search_requests field. This caused
_handle_web_search_cost() to never fire for Anthropic Claude models.

Also routes vertex_ai/claude-* models to the Anthropic cost calculator
instead of the Gemini one, since Claude on Vertex uses the same
server_tool_use billing structure as the direct Anthropic API.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(anthropic): pass logging_obj to client.post for litellm_overhead_time_ms (#24071)

When LITELLM_DETAILED_TIMING=true, litellm_overhead_time_ms was null for
Anthropic because the handler did not pass logging_obj to client.post(),
so track_llm_api_timing could not set llm_api_duration_ms. Pass
logging_obj=logging_obj at all four post() call sites (make_call,
make_sync_call, acompletion, completion). Add test to ensure make_call
passes logging_obj to client.post.

Made-with: Cursor

* sap - add additional parameters for grounding

- additional parameter for grounding added for the sap provider

* sap - fix models

* (sap) add filtering, masking, translation SAP GEN AI Hub modules

* (sap) add tests and docs for new SAP modules

* (sap) add support of multiple modules config

* (sap) code refactoring

* (sap) rename file

* test(): add safeguard tests

* (sap) update tests

* (sap) update docs, solve merge conflict in transformation.py

* (sap) linter fix

* (sap) Align embedding request transformation with current API

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) mock commit

* (sap) run black formater

* (sap) add literals to models, add negative tests, fix test for tool transformation

* (sap) fix formating

* (sap) fix models

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) commit for rerun bot review

* (sap) minor improve

* (sap) fix after bot review

* (sap) lint fix

* docs(sap): update documentation

* fix(sap): change creds priority

* fix(sap): change creds priority

* fix(sap): fix sap creds unit test

* fix(sap): linter fix

* fix(sap): linter fix

* linter fix

* (sap) update logic of fetching creds, add additional tests

* (sap) clean up code

* (sap) fix after review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) add a possibility to put the service key by both variants

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) update test

* (sap) update service key resolve function

* (sap) run black formater

* (sap) fix validate credentials, add negative tests for credential fetching

* (sap) fix validate credentials, add negative tests for credential fetching

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) lint fix

* (sap) lint fix

* feat: support service_tier in gemini

* chore: add a service_tier field mapping from openai to gemini

* fix: use x-gemini-service-tier header in response

* docs: add service_tier to gemini docs

* chore: add defaut/standard mapping, and some tests

* chore: tidying up some case insensitivity

* chore: remove unnecessary guard

* fix: remove redundant test file

* fix: handle 'auto' case-insensitively

* fix: return service_tier on final steamed chunk

* chore: black

* feat: enable supports_service_tier to gemini models

* Fix get_standard_logging_metadata tests

* Fix test_get_model_info_bedrock_models

* Fix test_get_model_info_bedrock_models

* Fix remaining tests

* Fix mypy issues

* Fix tests

* Fix merge conflicts

* Fix code qa

* Fix code qa

* Fix code qa

* Fix greptile review

---------

Co-authored-by: michelligabriele <gabriele.michelli@icloud.com>
Co-authored-by: Josh <36064836+J-Byron@users.noreply.github.com>
Co-authored-by: mubashir1osmani <mubashir.osmani777@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: milan-berri <milan@berri.ai>
Co-authored-by: Alperen Kömürcü <alperen.koemuercue@sap.com>
Co-authored-by: Vasilisa Parshikova <vasilisa.parshikova@sap.com>
Co-authored-by: Lin Xu <lin.xu03@sap.com>
Co-authored-by: Mark McDonald <macd@google.com>
Co-authored-by: Sameer Kankute <sameer@berri.ai>
2026-04-08 21:37:10 -07:00
Yuneng Jiang 48a68230c8 fix(test): update check_responses_cost tests for _expire_stale_rows
PR #25258 changed _cleanup_stale_managed_objects from update_many to
execute_raw via _expire_stale_rows, but the tests were not updated.
The tests now mock _expire_stale_rows on the instance and assert
update_many calls only for job completion, not stale cleanup.
2026-04-07 10:09:11 -07:00
Otavio Brito 8b0fadf99c [bug-fix]return actual status code - /v1/messages/count_tokens endpoint (#21352)
* return actual status code - /count_tokens endpoint

* Apply suggestion from @greptile-apps[bot]

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* fix greptile suggestion

* rollback file

* add test case

---------

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: ishaan-berri <155045088+ishaan-berri@users.noreply.github.com>
2026-04-06 13:42:24 -07:00
ishaan-berri 61b295238b cherry-pick: tag query fix + MCP metadata support (#25145)
* added support for metadata (#24261)

* added support for metadata

* fix: PR review - meta truthiness, BlobResourceContents mimeType, add Blob+empty meta tests

Made-with: Cursor

* pyproject to .25

* feat(teams): resolve access group models/MCPs/agents in team endpoints

Add access_group_models, access_group_mcp_server_ids, and
access_group_agent_ids to /team/info and /v2/team/list responses.
These fields contain resources inherited from access groups, kept
separate from direct assignments so the UI can distinguish the source.

Backend: _resolve_access_group_resources() helper resolves access
group resources via existing _get_*_from_access_groups() functions.

UI: Teams table and detail view show direct models as blue badges
and access-group-sourced models as green badges.

* perf(teams): single-pass access group resolution + asyncio.gather in list endpoint

- Fetch each access group object once and extract all 3 resource fields
  in a single pass instead of 3 separate calls (3N → N lookups)
- Use asyncio.gather to resolve access groups across teams concurrently
  in list_team_v2 instead of sequential awaits
- Add 5 unit tests for _resolve_access_group_resources

* docs: add default_team_params to config reference and update examples

- Add default_team_params to litellm_settings reference table in
  config_settings.md with all sub-fields documented
- Update self_serve.md and msft_sso.md examples to include
  team_member_permissions, tpm_limit, and rpm_limit
- Fix misleading comment that implied default_team_params only applies
  to SSO auto-created teams — it applies to all /team/new calls

* docs: clarify that models sub-field only applies to SSO auto-created teams

* fix: lazy import get_access_object to break cyclic import + short-circuit all-proxy-models display

- Remove get_access_object from module-level import in team_endpoints.py
  and use a lazy _get_access_object wrapper to avoid cyclic dependency
- Add _prisma_client is None early-exit guard in _resolve_access_group_resources
- Short-circuit UI to show "All Proxy Models" when team.models is empty
  or contains "all-proxy-models", skipping access group model resolution

* add: making organizations a select instead of read only badges

* fix(ui): only send organization_id when changed and use raw initial value

* fix(ui): add paginated team search to usage page filter

Replace the static team dropdown on the usage page with a new
TeamMultiSelect component that uses the paginated v2/team/list
endpoint with debounced server-side search and infinite scroll.

* fix(ui): fix imports and update placeholder for team multi select

* fix(ui): wire team_id filter to key alias dropdown on Virtual Keys tab

The Key Alias dropdown on the Virtual Keys page was showing aliases from
all teams regardless of which team was selected. The team_id was never
passed through the frontend chain to the backend /key/aliases endpoint.

- Backend: add optional team_id query param to /key/aliases endpoint
- networking.tsx: add team_id param to keyAliasesCall
- useKeyAliases: accept and forward team_id to API call and query key
- filter.tsx: pass allFilters context to custom filter components
- PaginatedKeyAliasSelect: read Team ID from allFilters and pass to hook

* fix(tests): correct mock targets in TestResolveAccessGroupResources

Three tests were patching the non-existent `get_access_object` instead
of `_get_access_object` (the lazy-import wrapper), causing AttributeError.
Also added missing `prisma_client` mock so tests get past the early-exit
guard and actually exercise the resolution logic.

* fix: use direct attribute access with or [] fallback in _resolve_access_group_resources

Replace getattr(ag, "field", []) with ag.field or [] for cleaner
access and safe handling if a field is None.

* fix(ui): remove model source legend from team detail view

The blue/green color distinction is self-explanatory; the legend added
visual clutter without providing enough value.

* fix(ui): add missing access_group fields to TeamData.team_info type

The TeamData interface was missing access_group_models,
access_group_mcp_server_ids, and access_group_agent_ids fields,
causing a TypeScript build failure.

* perf(teams): batch-fetch access groups in single DB query

Replace per-ID _resolve_access_group_resources loop with a single
find_many call that deduplicates IDs across all teams. Removes the
N+1 query pattern on cold cache for the team list endpoint.

* refactor(proxy): extract helpers to fix PLR0915 violations

Extract `_apply_non_admin_alias_scope` from `key_aliases`,
`_resolve_team_access_group_resources` from `team_info`, and
`_enforce_list_team_v2_access` from `list_team_v2` to bring each
function under ruff's 50-statement limit. No behavior changes.

* test(ui): update tests to match new team_id / access-group signatures

- useKeyAliases, PaginatedKeyAliasSelect: add trailing `undefined` to
  spy matchers for the new `team_id` param on `useInfiniteKeyAliases`
  and `keyAliasesCall`.
- EntityUsage: mock new `TeamMultiSelect` child so QueryClientProvider
  is not required for team-entity tests.
- ModelsCell: replace the overflow-accordion test with one that
  verifies the new collapse-on-`all-proxy-models` behavior (no
  accordion, single badge).

* fix(ui): send null (not '') for cleared organization_id on team update

AntD <Select allowClear> returns undefined when the user clears the
selection. Coalescing to "" caused the team-update payload to carry
organization_id: "" instead of null, relying on the backend to coerce
it. Send null directly so the intent is explicit at the source.

* poetry

* chore: regen poetry.lock for litellm-proxy-extras 0.4.64 bump

* chore: update Next.js build artifacts (2026-04-04 17:55 UTC, node v22.16.0)

---------

Co-authored-by: shivam <shivam@uni.minerva.edu>
Co-authored-by: Ryan Crabbe <ryan@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>

* Tag query fix (#25094)

* feat(tag-spend): implement separate scheduler job for daily tag spend updates

* fix(docker): add g++ to build dependencies in Dockerfile

* initial test cases. TODO: check scheduler init and test cases in proxy_server related to it

* resolved QPS issue when redis transaction buffer is enabled

* resolving circular import error flagged by greptile

* fix(mypy): use Optional[str] for api_base in PydanticAI provider to match superclass signature

---------

Co-authored-by: Shivam Rawat <shivam@berri.ai>
Co-authored-by: shivam <shivam@uni.minerva.edu>
Co-authored-by: Ryan Crabbe <ryan@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
Co-authored-by: Harish <harishgokul01@gmail.com>
Co-authored-by: Ishaan Jaffer <ishaan@berri.ai>
2026-04-04 16:44:02 -07:00
ishaan-berri 51876292a0 Litellm ishaan april4 2 (#25150)
* feat(router): integrate allowed_fails_policy into health check failures (#24988)

* feat(router): integrate allowed_fails_policy into health check failures

Health check failures now increment the same per-deployment failure
counters used by allowed_fails_policy, so users can control how many
health check failures of each error type are required before a
deployment enters cooldown.

- ahealth_check() preserves the original exception in its return dict
- run_with_timeout() returns a litellm.Timeout on health check timeout
- _perform_health_check() propagates exceptions to unhealthy endpoints
- _write_health_state_to_router_cache() calls _set_cooldown_deployments
  for each unhealthy endpoint that has an exception
- When allowed_fails_policy is set, the binary health check filter is
  bypassed so cooldown is the sole routing exclusion mechanism
- Safety net: if all deployments are in cooldown with
  enable_health_check_routing=True, the cooldown filter is bypassed

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(router): add health_check_ignore_transient_errors flag

When enabled, health check failures with 429 (rate limit) or 408 (timeout)
status codes are skipped from the cooldown pipeline. These are transient
load issues, not broken deployments. Auth errors (401), 404, and 5xx errors
still increment counters and trigger cooldown as before.

Config (general_settings):
  health_check_ignore_transient_errors: true

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(router): also exclude 429/408 from health state cache when ignore_transient_errors set

The previous fix only skipped cooldown counter increments. The health state
cache was still marking 429/408 endpoints as is_healthy=False, causing the
binary health check filter to exclude them from routing.

Now, when health_check_ignore_transient_errors=True, 429/408 endpoints are
also excluded from the unhealthy list passed to build_deployment_health_states(),
so the binary filter treats them as unaffected (not unhealthy).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs(router): add health check driven routing guide

New standalone page covering the full health check routing feature:
allowed_fails_policy integration, health_check_ignore_transient_errors,
architecture SVG, step-by-step setup, and gotchas (TTL, AllowedFails semantics).

Replaces the inline section in health.md with a link to the new page.
Added to the Routing & Load Balancing sidebar.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(health-check-routing): fix three CI failures

- Add "exception" to ILLEGAL_DISPLAY_PARAMS in health_check.py so the
  exception object is stripped before the health endpoint serializes
  results to JSON (fixes TypeError: 'URL' object is not iterable)
- Add allowed_fails_policy = None to FakeRouter stubs in
  test_router_health_check_routing.py (fixes AttributeError)
- Add health_check_ignore_transient_errors to config_settings.md router
  settings reference table (fixes documentation test)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fix litellm/tests/proxy_unit_tests/test_proxy_server.py

* fix(router): address greptile review comments

- Narrow cooldown safety-net bypass: only fires when allowed_fails_policy
  is set (cooldown is health-check driven). Without a policy, cooldowns
  are from real request failures and must not be bypassed.
- Restore cooldown deployments DEBUG log that was accidentally removed.
- Fix test_health TypeError: move exception extraction to a separate
  exceptions_by_model_id dict returned alongside endpoints, so exception
  objects never appear in the endpoint dicts that get JSON-serialized
  by the /health response.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(health-check-routing): properly isolate exceptions from health response

Return exceptions_by_model_id as a separate third value from
_perform_health_check / perform_health_check so exception objects
(which contain non-JSON-serializable httpx URL types) never appear
in the endpoint dicts that get serialized by the /health response.

Callers updated: _health_endpoints.py, shared_health_check_manager.py,
proxy_server.py background loop. All use the exceptions dict only for
cooldown integration, not for display.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(shared-health-check): fix remaining 2-value return sites and update type annotation

* fix(health-check-routing): fix P0 cooldown integration never firing

The cooldown loop was reading endpoint.get("exception") which is always
None because exceptions are now returned via exceptions_by_model_id, not
stored in endpoint dicts. Fixed to use _exceptions.get(model_id).

Also fixes the transient-error filter to use _exceptions instead of
endpoint.get("exception"), and fixes all remaining 2-value return sites
in shared_health_check_manager.py. Tests updated to pass exceptions via
exceptions_by_model_id parameter instead of endpoint dicts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(health-check-routing): fix P1 transient-error filter broken on cache hits

When SharedHealthCheckManager returns cached results, exceptions_by_model_id
is always {} so the transient-error filter defaulted to status 500 for all
endpoints, incorrectly marking 429/408 endpoints as unhealthy.

Fix: store integer exception_status on each unhealthy endpoint dict in
_perform_health_check. _get_endpoint_exception_status() uses the live
exception object when available (direct path) and falls back to the stored
integer (cache-hit path). The integer is JSON-serializable and survives
the shared cache round-trip.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(health-check-routing): gate cooldown loop behind allowed_fails_policy

Without the policy, cooldown is not the routing exclusion mechanism.
Firing _set_cooldown_deployments for all enable_health_check_routing users
was a backwards-incompatible change — 401s would immediately cooldown
deployments that the binary filter would have recovered on the next cycle.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* revert: undo allowed_fails_policy gate on cooldown loop

Cooldown integration via health checks is intentional for all
enable_health_check_routing users, not just those with allowed_fails_policy.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(docs+tests): fix health_check_ignore_transient_errors doc section and test coverage

- Move health_check_ignore_transient_errors from router_settings to
  general_settings in config_settings.md (code reads it from general_settings)
- Remove duplicate enable_health_check_routing / health_check_staleness_threshold
  entries that were incorrectly listed under router_settings
- Replace TestHealthCheckEndpointExceptionPropagation tests with ones that
  exercise the real _perform_health_check code path via mocked ahealth_check,
  verifying exceptions appear in exceptions_by_model_id and NOT in endpoint dicts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(tests+docs): fix tuple unpacking and docs test failures

- Update test mocks that return (healthy, unhealthy) to return
  (healthy, unhealthy, {}) to match the new 3-value signature
- Update test unpackings of perform_shared_health_check to use
  healthy, unhealthy, _ = ...
- Add health_check_ignore_transient_errors to router_settings section
  in config_settings.md (it is a Router constructor param, so the doc
  test requires it there; it also lives in general_settings for proxy use)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fix CodeQL errors

* fix(tests): fix 2-value unpackings of _perform_health_check in test_health_check.py

* fix(tests): fix mock _perform_health_check returning 2-tuple instead of 3

* fix team routing

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add distributed lock for key rotation job (#23364)

* fix: add distributed lock for key rotation job

* fix: address Greptile review feedback on key rotation lock (#23834)

* fix: address Greptile review feedback on key rotation lock

* fix req changes greptile

* feat(proxy): Optional on_error for guardrail pipeline (API / technical failures) (#24831)

* guardrails fallback

* docs

* docs: add LITELLM_KEY_ROTATION_LOCK_TTL_SECONDS to environment variables reference

* fix(mypy): accept Union[Dict, Any] in _get_deployment_order and use typed list to fix min() type error

* fix(mypy): use Optional[str] for api_base in PydanticAI provider to match superclass signature

---------

Co-authored-by: Sameer Kankute <sameer@berri.ai>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Harshit Jain <48647625+Harshit28j@users.noreply.github.com>
Co-authored-by: Shivam Rawat <shivam@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
2026-04-04 23:09:42 +00:00
jayden 9ca1560501 chore: fix test 2026-03-30 19:14:01 -07:00
Krrish Dholakia c7e2bfc577 fix: cleanup tests 2026-03-30 16:24:35 -07:00
Krrish Dholakia 4c00a14ce0 fix: fix ci/cd + handle oidc jwt tokens 2026-03-30 16:12:58 -07:00
Krrish Dholakia 1fb677702d test: update to new vertex ai keys 2026-03-28 20:19:05 -07:00
Krrish Dholakia bc829d51f2 test: test 2026-03-28 19:17:38 -07:00
ryan-crabbe-berri 2eb3c20e76 Merge pull request #24718 from BerriAI/litellm_ryan-march-26
litellm ryan march 26
2026-03-28 09:01:11 -07:00
Ryan Crabbe 8e3755931d test(auth): add regression tests for JWTHandler.is_jwt(None)
Add None-token test cases to both proxy_unit_tests and test_litellm
to cover the guard added in the previous commit. Also add -> bool
return type annotation to is_jwt().
2026-03-27 16:51:08 -07:00
Sameer Kankute 1fac58abb3 fix(tests): reset module-level cache in stale alias bypass tests
Reset _ENABLE_TEAM_STALE_ALIAS_BYPASS to None in both test functions
to ensure test isolation and prevent ordering-dependent failures

Made-with: Cursor
2026-03-27 20:11:28 +05:30
Sameer Kankute 592ac98ddc fix(router): address Greptile P1/P2 review comments
- Add deduplication guard in _update_team_model_index to prevent duplicate indices
- Add wildcard comment in map_team_model for clarity
- Add monkeypatch to test_team_alias_stale_bypass_disabled_by_default for determinism
- Extract _get_team_deployments helper to centralize DB access pattern
- Add clarifying comments for team_public_model_name assignment ordering

Made-with: Cursor
2026-03-27 20:11:28 +05:30
Sameer Kankute 173695f5e0 Fix greptile comments 2026-03-27 20:11:27 +05:30
Sameer Kankute 8ad2068711 Merge pull request #24106 from BerriAI/Sameerlite/pre-ratelimit-bg
fix(polling): check rate limits before creating polling ID
2026-03-20 17:41:24 +05:30
Sameer Kankute 66f97a00a4 fix(test): rewrite polling pre-call guard test to call responses_api() directly
Previously the test called common_processing_pre_call_logic in isolation,
making generate_polling_id.assert_not_called() vacuously true. Now the test
calls responses_api() end-to-end so it actually verifies that a rate-limited
request never receives a polling ID.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-19 14:30:29 +05:30
Sameer Kankute c12717f494 fix: address Greptile review comments
- Guard logging_obj for None when skip_pre_call_logic=True: raise ValueError
  if litellm_logging_obj not in data, preventing AttributeError downstream
- Add model=None to common_processing_pre_call_logic call in endpoints.py
  to match style of other call sites
- Add test verifying rate-limited request never receives polling ID
2026-03-19 14:10:58 +05:30
Sameer Kankute 4dc645fc33 feat(polling): check rate limits before creating polling ID
Move pre-call checks (rate limits, guardrails, budget) to run BEFORE
polling ID creation in the background streaming flow. This prevents the
edge case where a rate-limited request receives a polling ID that
immediately fails.

Changes:
- Add skip_pre_call_logic parameter to base_process_llm_request to allow
  skipping pre-call checks (avoiding double-counting of RPM/parallel requests)
- Run common_processing_pre_call_logic before generating polling ID in the
  responses API endpoint. If rate limits/guardrails fail, return error
  immediately without creating a polling ID
- Background streaming task passes skip_pre_call_logic=True to avoid re-running
  pre-call checks that were already done before polling ID creation
- Add tests verifying skip_pre_call_logic parameter works correctly

Fixes the edge case where polling_via_cache would return a polling ID
for a request that immediately fails due to rate limiting.
2026-03-19 13:59:59 +05:30
Alexey 71b687e00a fix(proxy): sync normalized call_type into model_call_details for proxy-only errors 2026-03-18 23:49:07 +03:00
Krish Dholakia 244bdffd1b Merge pull request #23509 from michelligabriele/fix/pass-through-duplicate-failure-logs
fix(proxy): prevent duplicate callback logs for pass-through endpoint failures
2026-03-18 11:57:50 -07:00
Xianzong Xie cb88836486 Add incomplete response error propagation test
Committed-By-Agent: codex
Co-authored-by: codex <noreply@openai.com>
2026-03-17 11:39:12 -07:00
yuneng-jiang 4fc0975d22 Fix flaky e2e batch test: set batch_processed=True on completion in retrieve_batch
The retrieve_batch endpoint sets batch status to "complete" but never set
batch_processed=True, permanently blocking file deletion. CheckBatchCost
(the safety net) also excluded completed batches from its primary query,
so batch_processed was never set by either path.

Three fixes:
1. update_batch_in_database sets batch_processed=True when status reaches
   "complete", with old-schema fallback retry
2. CheckBatchCost primary query no longer excludes complete/completed
   (batch_processed=False filter prevents reprocessing)
3. retrieve_batch early-return now includes "complete" (DB-normalized
   spelling) to avoid unnecessary provider re-polls

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-15 18:18:32 -07:00
xianzongxie-stripe 81474c17fe Handle response.failed, response.incomplete, and response.cancelled (#23492)
* Handle response.failed, response.incomplete, and response.cancelled terminal events in background streaming

Previously the background streaming task only handled response.completed and
hardcoded the final status to "completed". This missed three other terminal
event types from the OpenAI streaming spec, causing failed/incomplete/cancelled
responses to be incorrectly marked as completed.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Committed-By-Agent: claude

* Remove unused terminal_response_data variable

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Committed-By-Agent: claude

* Address code review: derive fallback status from event type, rewrite tests as integration tests

1. Replace hardcoded "completed" fallback in response_data.get("status")
   with _event_to_status lookup so that response.incomplete and
   response.cancelled events get the correct fallback if the response
   body ever omits the status field.

2. Replace duplicated-logic unit tests with integration tests that
   exercise background_streaming_task directly using mocked streaming
   responses and assert on the final update_state call arguments.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Committed-By-Agent: claude

* Remove dead mock_processor and unused mock_response parameter from test helper

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Committed-By-Agent: claude

* Remove FastAPI and UserAPIKeyAuth imports from test file

These types were only used as Mock(spec=...) arguments. Drop the spec
constraints and remove the top-level imports to avoid pulling FastAPI
into test files outside litellm/proxy/.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Committed-By-Agent: claude

* Log warning when streaming response has no body_iterator

If base_process_llm_request returns a non-streaming response (no
body_iterator), log a warning since this likely indicates a
misconfiguration or provider error rather than a successful completion.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Committed-By-Agent: claude

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-13 23:02:09 -07:00
yuneng-jiang 0e44c460b5 Merge pull request #23584 from BerriAI/litellm_release_day_03_12_2026
[Infra] Merge Release Day Branch with Main
2026-03-13 14:31:27 -07:00
Ishaan Jaff 1b96064600 fix(proxy): prevent OOM/Prisma connection loss from unbounded managed-object poll (#23472)
* fix(proxy): cap managed-object poll size + expire stale rows + kill-switch flag to prevent OOM/Prisma connection loss

* fix(constants): simplify PROXY_BATCH_POLLING_ENABLED readability

* docs+test: document new polling env vars, add pagination+stale-cleanup tests

* fix: exclude stale_expired from batch poll queries; fix update_many assertions in tests

* fix: scope stale cleanup to file_purpose, fix file_object mocks, add CheckBatchCost tests

* fix: avoid duplicate cost logging in fallback path; guard integer constants against zero/negative values

* fix: cache _has_batch_processed_column; guard cleanup from aborting poll; narrow fallback except

* fix: add complete/completed to primary query not_in; fix vacuous test assertion

- Primary find_many was missing "complete" and "completed" in its not_in
  filter, creating asymmetry with the fallback query. A job whose status
  was set to "complete" but whose batch_processed flag update failed would
  be silently re-fetched and re-processed every cycle, emitting duplicate
  cost logs.

- test_fallback_completion_update_omits_batch_processed patched
  _is_base64_encoded_unified_file_id to return None, causing an immediate
  continue — so update() was never called and the assertion looped over an
  empty list (vacuously true). Rewrote the test to mock the full
  completion pipeline, verify update() is called exactly once, and assert
  batch_processed is absent from the update data.

- Added symmetric test (primary path) proving batch_processed IS included
  when the column exists.

Made-with: Cursor
2026-03-13 11:01:40 -07:00
yuneng-jiang 2b71b0fb25 Revert "QA: improve gpt-5.4 code/bugs" 2026-03-13 10:15:47 -07:00
yuneng-jiang 8dc198eccf Merge pull request #23535 from Sameerlite/litellm_improve_qa-5.4
QA:  improve gpt-5.4 code/bugs
2026-03-13 09:37:30 -07:00
yuneng-jiang 3489d1dbef fix(tests): update outdated model names in wildcard model tests
The expected model names in test_get_known_models_from_wildcard were
removed from the model registry (claude-3-5-haiku-20241022, gemini-1.5-flash,
gemini-1.5-pro). Updated to current model names.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-12 22:53:31 -07:00
michelligabriele a4f94b241b fix(proxy): prevent duplicate callback logs for pass-through endpoint failures
Pass-through endpoint failures fired both async_failure_handler and
async_post_call_failure_hook, causing duplicate logs in callback
integrations. Add pass-through guards to the failure path, matching
the existing success path behavior.
2026-03-13 04:32:36 +01:00
Emerson Gomes 92d39c308c fix(gemini): preserve toolConfig on native generate_content (#23493) 2026-03-12 17:48:09 -07:00
Ishaan Jaff 28c33f53a3 CircleCI test stability (#23055)
* fix: resolve ruff lint errors and mypy type error

- Remove unused import get_user_credential (F401)
- Add noqa: PLR0915 for 3 large functions exceeding 50 statements
- Cast result_data['q'] to str for _append_domain_filters (mypy arg-type)

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: add /vertex_ai/live to supported endpoints and azure gpt-5.1 reasoning flags

- Add /vertex_ai/live to JSON schema validation enum in test_utils.py
- Add supports_none_reasoning_effort=true to 10 azure/gpt-5.1 model entries
  (matching the OpenAI gpt-5.1 behavior)

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: handle non-string team_alias/key_alias in PolicyMatchContext

Prevent Pydantic validation errors when team_alias or key_alias are not
proper strings (e.g. MagicMock in tests). Only pass values that are
actually strings; default to None otherwise.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: initialize jwt_handler.litellm_jwtauth in JWT test

The test_jwt_non_admin_team_route_access test was failing because
user_api_key_auth now accesses jwt_handler.litellm_jwtauth.virtual_key_claim_field
before reaching the mocked JWTAuthManager.auth_builder. Initialize the
jwt_handler with a default LiteLLM_JWTAuth object.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: add missing mock attributes to MCP server test

The test_add_update_server_fallback_to_server_id test was failing because
MagicMock auto-creates attributes when accessed. build_mcp_server_from_table
accesses many fields via getattr(), which on a MagicMock returns another
MagicMock instead of None, causing Pydantic validation errors in MCPServer.

Explicitly set all required mock attributes.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: update UI tests for leftnav, navbar, and KeyLifecycleSettings

- leftnav: Add mock for useTeams hook, add isUserTeamAdminForAnyTeam to
  roles mock, update topLevelLabels to match current component menu items
- navbar: Add mocks for useDisableBouncingIcon, BlogDropdown, UserDropdown,
  and serverRootPath. Update test to work with the new component structure.
- KeyLifecycleSettings: Fix placeholder and tooltip assertions to match
  actual component behavior

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: update health check test assertion from 'connected' to 'healthy'

The /health/readiness endpoint now returns {"status": "healthy"} with the
DB status in a separate field, instead of the previous {"status": "connected"}.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: clear litellm.api_key in OpenRouter validate_environment test

The test_validate_environment_raises_without_key test was failing because
litellm.api_key may be set globally in the test environment. Clear it
along with OPENROUTER_API_KEY and OR_API_KEY env vars using monkeypatch.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: patch HTTPHandler class-level in VLLM embedding test

The test_encoding_format_not_sent_in_actual_request test was patching
client.post on an instance, but the handler uses the class method.
Patch HTTPHandler.post at class level, add caching=False to prevent
cache hits, and remove broad try/except that hid errors.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: make test_redaction_responses_api_stream resilient to async callback timing

Replace fixed 1s sleep with polling wait for async_log_success_event.
Streaming success handler runs via asyncio.create_task; 1s was insufficient
in CI. Add 0.5s initial sleep for event loop to schedule the task, then
poll up to 10s for the callback to fire.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: update dompurify and svgo to fix security CVEs

- CVE-2026-0540: dompurify XSS vulnerability - fix by upgrading to 3.3.2+
- CVE-2026-29074: svgo DoS via entity expansion - fix by upgrading to 3.3.3+

Added npm overrides in docs/my-website/package.json and regenerated
package-lock.json.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: remove unused json import in config_override_endpoints.py

Ruff F401: json is imported but unused (safe_json_loads/safe_dumps
are used instead)

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: add missing MCP mock attributes and provider documentation entries

- Add missing mock attributes to test_add_update_server_with_alias and
  test_add_update_server_without_alias (same fix as fallback test)
- Add bedrock_mantle and searchapi to provider_endpoints_support.json
- Remove unused json import from config_override_endpoints.py

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: override _supports_reasoning_effort_level for Azure gpt5_series prefix

The Azure GPT-5 config uses 'gpt5_series/' as a routing prefix, but
_supports_factory(model='gpt5_series/gpt-5.1') fails to resolve because
'gpt5_series' is not a recognized provider. Override the method to strip
the prefix and prepend 'azure/' for correct model info lookup.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: accept both 'healthy' and 'connected' in health check test

The test_health_and_chat_completion test runs against both source builds
(which return 'healthy') and pip-installed versions (which may return
'connected'). Accept both values.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: mock extract_mcp_auth_context in streamable HTTP MCP handler test

The handle_streamable_http_mcp function now calls extract_mcp_auth_context
before session_manager.handle_request, but the test didn't mock it. The
auth extraction fails with the minimal mock scope, preventing
handle_request from being called. Also relax assertion to not check
exact args since the send wrapper may be modified by debug injection.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: add test for _combine_fallback_usage to satisfy router code coverage

The router_code_coverage.py check requires all functions in router.py
to be called in test files. Add a basic test for _combine_fallback_usage.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: add @log_guardrail_information decorator to CrowdStrike AIDR guardrail

The check_guardrail_apply_decorator.py CI check requires all guardrail
apply_guardrail methods to have the @log_guardrail_information decorator.
The CrowdStrike AIDR handler was missing it.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: document PRISMA_RECONNECT_ESCALATION_THRESHOLD and REDIS_CLUSTER_NODES env keys

Add missing environment variable documentation to config_settings.md
to satisfy the test_env_keys.py CI check.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: document enforced_file_expires_after and enforced_batch_output_expires_after in new_team docstring

The test_api_docs.py CI check validates that all Pydantic model fields
are documented in the function docstring. Add missing parameter docs
for enforced_file_expires_after and enforced_batch_output_expires_after.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: regenerate poetry.lock to match pyproject.toml

The poetry.lock file was out of sync with pyproject.toml, causing
proxy_e2e_azure_batches_tests to fail during dependency installation.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: set master_key=None in test_create_file_with_deep_nested_litellm_metadata

The test was missing the master_key monkeypatch that other tests in the
same file set. In CI with parallel execution (-n 4), another test may
set master_key to a non-None value, causing auth failures (500) when
the test sends 'Bearer test-key'.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: document enforced_*_expires_after in update_team docstring too

Same missing params as new_team - also needed in update_team docstring
for the test_api_docs.py CI check to pass.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: use get_async_httpx_client in a2a_protocol and add master_key monkeypatch to files tests

- Replace httpx.AsyncClient() with get_async_httpx_client() in a2a_protocol/main.py
  to satisfy the ensure_async_clients_test CI check
- Add httpxSpecialProvider.A2AProvider enum value
- Add master_key=None monkeypatch to test_managed_files_with_loadbalancing

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: remove unused httpx import from a2a_protocol/main.py

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: use cache-key-only param for A2A extra_headers to avoid AsyncHTTPHandler init error

The 'extra_headers' key in params was being passed to AsyncHTTPHandler.__init__()
which doesn't accept it. Use 'disable_aiohttp_transport' as the cache-key-only
param since it's explicitly filtered out before reaching the constructor.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: add additionalProperties:false and resolve $defs/$ref in Anthropic output_format schemas

Anthropic API now requires additionalProperties=false for all object-type
schemas in output_format. Also resolve $defs/$ref references by inlining
them using unpack_defs before sending to Anthropic, since Anthropic
doesn't support external schema references.

Fixes: llm_translation_testing Anthropic JSON schema failures

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: allowlist CVE-2026-2297 and GHSA-qffp-2rhf-9h96 in security scans

- CVE-2026-2297: Python 3.13 SourcelessFileLoader audit hook bypass,
  no fix available in base image
- GHSA-qffp-2rhf-9h96: tar hardlink path traversal, from nodejs_wheel
  bundled npm, not used in application runtime code

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: isolate files endpoint tests from shared proxy state in CI parallel execution

Override user_api_key_auth dependency to return a fixed UserAPIKeyAuth
with PROXY_ADMIN role, avoiding auth lookups via prisma_client,
user_api_key_cache, or master_key. Set prisma_client=None to prevent
DB state contamination. Use try/finally to clean up dependency overrides.

Fixes persistent test_create_file_with_deep_nested_litellm_metadata and
test_managed_files_with_loadbalancing 500 errors in CI with -n 4.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: apply same auth override to test_managed_files_with_loadbalancing

Same CI parallel execution fix as test_create_file_with_deep_nested -
override user_api_key_auth dependency and set prisma_client=None.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
2026-03-07 15:19:39 -08:00
Harshit Jain 07cb6d5bec Merge pull request #22372 from BerriAI/litellm_jwt_vkey_map
Litellm jwt vkey map
2026-03-05 06:24:49 +05:30
Harshit28j 2f15686ea2 fix: address greptile feedback - redact hashed tokens, proper error codes, add tests
- Remove token field from JWTKeyMappingResponse to prevent hashed key exposure
- Use _to_response() helper on all CRUD endpoints to control returned fields
- Return 409 for unique constraint violations, 400 for FK violations, 404 for not found
- Add response_model to endpoint decorators
- Add 8 new unit tests covering error handling and token redaction

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-05 03:46:03 +05:30
Julio Quinteros Pro 740cdc5c20 fix: use real State object in mock_request to fix _safe_get_request_headers
The _safe_get_request_headers caching (commit e7175a52) uses
request.state._cached_headers. With Mock(spec=Request), getattr on
state returns a Mock (truthy), causing RedactedDict to receive a Mock
instead of a dict. Using a real starlette State object fixes this.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 20:05:20 -03:00
Ishaan Jaff 29e3fd5d79 [Release Fix] (#22411)
* fix(lint): suppress PLR0915 for 3 complex methods that exceed 50-statement limit

- streaming_iterator.py: _process_event (84 statements)
- transformation.py: translate_messages_to_responses_input (51 statements)
- transformation.py: transform_realtime_response (54 statements)

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(mypy): resolve type errors in public_endpoints, user_api_key_auth, common_utils, transformation

- public_endpoints.py: fix _cached_endpoints type annotation
- user_api_key_auth.py: accept Optional[str] for end_user_id parameter
- common_utils.py: add NewProjectRequest/UpdateProjectRequest to Union type
- transformation.py: add ChatCompletionRedactedThinkingBlock and list[Any] to content type

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(proxy-extras): bump version to 0.4.50 and sync schema

- Bump litellm-proxy-extras from 0.4.49 to 0.4.50
- Sync schema.prisma with main proxy schema
- Includes new LiteLLM_ClaudeCodePluginTable model
- Includes new @@index([startTime, request_id]) on SpendLogs
- Update version references in requirements.txt and pyproject.toml

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(router): use string id in test_add_deployment and add defensive str() in register_model

- Change test to use string '100' instead of int 100 for model_info.id
- Add str() conversion in register_model to prevent AttributeError on non-string keys

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(security): update minimatch to 10.2.4 to fix CVE-2026-27903 and CVE-2026-27904

- Run npm audit fix in docs/my-website
- Updates minimatch from 10.2.1 to 10.2.4 (fixes HIGH severity ReDoS vulnerabilities)

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(test): update realtime guardrail test assertions to match actual guardrail behavior

- test_text_message_blocked_by_guardrail_no_ai_response: allow guardrail's own block
  message text in response.done (previously expected empty content)
- test_voice_transcript_blocked_by_guardrail: allow guardrail to send response.cancel
  + block message + response.create flow (previously expected no response.create)

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: revert proxy-extras version in requirements.txt and pyproject.toml

The litellm-proxy-extras 0.4.50 is not published to PyPI yet, so consumer
references must stay at 0.4.49. Only the source package pyproject.toml
should be bumped to 0.4.50 for the publish_proxy_extras CI job.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: make transcript delta check optional in voice guardrail test

The guardrail sends an error event (guardrail_violation) when blocking
voice transcripts; it does not always produce transcript deltas. Remove
the assertion requiring response.audio_transcript.delta since the error
event is the primary signal that blocked content was handled.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* Add missing env keys to documentation: LITELLM_MAX_STREAMING_DURATION_SECONDS and LITELLM_USE_CHAT_COMPLETIONS_URL_FOR_ANTHROPIC_MESSAGES

These two environment variables were used in code but not documented in the
environment variables reference section of config_settings.md, causing the
test_env_keys.py CI test to fail.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* Fix 13 mypy type errors across 6 files

- in_flight_requests_middleware.py: Fix type: ignore error codes from
  [union-attr] to [attr-defined], add [arg-type] for Gauge **kwargs
- transformation.py: Add [assignment] ignore for output_format reassignment,
  add fallback empty string for tool use id to fix arg-type
- responses/main.py: Remove redundant type annotation on second
  secret_fields assignment to fix no-redef
- streaming_iterator.py: Add [assignment] ignores for intermediate
  cache token assignments
- handler.py: Add [typeddict-item] ignore for AnthropicMessagesRequest
  construction from dict
- public_endpoints.py: Add [arg-type] ignore for _load_endpoints()
  return type mismatch with SupportedEndpoint model

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: add auth overrides to spend tracking tests, fix realtime guardrail assertion, update UI minimatch

- Add app.dependency_overrides for user_api_key_auth in 4 spend tracking tests
  that were returning 401 Unauthorized (error_code, error_message,
  error_code_and_key_alias, key_hash)
- Fix realtime guardrail test to check ANY error event for guardrail_violation
  instead of just the first (OpenAI may send its own errors first)
- Update ui/litellm-dashboard/package-lock.json to fix minimatch vulnerability

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* Fix failing MCP e2e and create_mcp_server UI tests

Test 1 (test_independent_clients_no_shared_session):
- Add allow_all_keys: true to MCP servers in test config. With master_key
  and no DB, get_allowed_mcp_servers returned empty, causing 0 tools and
  403 on tool calls. allow_all_keys bypasses per-key restrictions.
- Add asyncio.sleep(0.5) between client connections to allow MCP SDK
  TaskGroup cleanup and avoid ExceptionGroup on connection close (MCP #915).

Test 2 (create_mcp_server 'auth value is provided'):
- Use userEvent.setup({ delay: null }) for instant keystrokes to avoid
  timeout from default typing delay on CI.
- Increase per-test timeout to 15000ms for CI environments.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: stabilize proxy unit tests for parallel execution

- test_response_polling_handler: add xdist_group to prevent heavy import OOM
- test_db_schema_migration: use temp dir for worker isolation, sync schema.prisma index
- test_custom_tokenizer_bug: use lighter tokenizer to prevent OOM in parallel

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: add auth overrides to more spend tracking and model info tests

- Fix test_ui_view_spend_logs_pagination missing auth override (401)
- Fix test_view_spend_tags missing auth override (401)
- Fix test_view_spend_tags_no_database missing auth override (401)
- Fix test_empty_model_list.py to use app.dependency_overrides instead of patch()
  for FastAPI dependency injection auth

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(test): use patch.object for aiohttp transport test to work in parallel execution

The @patch decorator was not intercepting the static method call in parallel
xdist workers. Using patch.object on the directly-imported class is more reliable.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(security): update minimatch from 10.2.1 to 10.2.4 in Dockerfile

The Docker image was explicitly pinning minimatch@10.2.1 which has HIGH
severity ReDoS vulnerabilities (GHSA-7r86-cg39-jmmj, GHSA-23c5-xmqv-rm74).
Update to 10.2.4 which includes fixes for both CVEs.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(ui): prevent MCP and TeamInfo test timeouts on CI

- Add userEvent.setup({ delay: null }) to all tests using userEvent in both files
- Add timeout: 15000 to tests with significant user interaction (typing, multiple clicks)
- Fixes: create_mcp_server Bearer Token test, TeamInfo cancel button test

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: stabilize parallel test execution and aiohttp transport test

- test_aiohttp_handler: rewrite transport test to not rely on static method mock
  (consistently fails in parallel xdist workers)
- test_proxy_cli: add xdist_group to prevent timeout during heavy imports
- test_swagger_chat_completions: add xdist_group to prevent timeout

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(security): add serialize-javascript override to fix GHSA-5c6j-r48x-rmvq

Add npm override for serialize-javascript>=7.0.3 in docs/my-website
to fix HIGH severity RCE vulnerability via RegExp.flags.
Also bump minimatch override to >=10.2.4.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* Fix flaky tests: remove broken Vertex model, add retries for Anthropic

- Remove vertex_ai/meta/llama-4-scout-17b-16e-instruct-maas from
  test_partner_models_httpx_streaming - consistently returns 400 BadRequest
- Add @pytest.mark.flaky(retries=6, delay=10) to test_function_call_parsing
  for transient Anthropic API overload errors
- Add @pytest.mark.flaky(retries=6, delay=10) to test_openai_stream_options_call
  for transient Anthropic InternalServerError

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(ci): add xdist_group(proxy_heavy) to prevent OOM in parallel proxy tests

- Add pytestmark = pytest.mark.xdist_group('proxy_heavy') to test_proxy_utils.py
- Change test_db_schema_migration.py from schema_migration to proxy_heavy group
- Add @pytest.mark.xdist_group('proxy_heavy') to test_proxy_server.py::test_health

Groups heavy proxy tests to run on same worker, avoiding worker OOM crashes.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* Fix vertex AI qwen global endpoint test to mock vertexai module import

The test_vertex_ai_qwen_global_endpoint_url test was failing because the
VertexAIPartnerModels.completion() method tries to 'import vertexai' before
any of the mocked code runs. In environments without google-cloud-aiplatform
installed, this import fails with a VertexAIError(status_code=400).

Fix by:
- Adding patch.dict('sys.modules', {'vertexai': MagicMock()}) to mock the
  vertexai module import
- Adding vertex_ai_location parameter to the acompletion call for completeness

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(ci): add xdist_group to health endpoint and watsonx tests for parallel stability

- test_health_liveliness_endpoint: add xdist_group('proxy_health') to prevent timeout
- test_watsonx_gpt_oss tests: add xdist_group('watsonx_heavy') to prevent mock interference

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(test): pre-populate WatsonX IAM token cache to prevent parallel test interference

The watsonx prompt transformation test was failing in parallel execution because
litellm.module_level_client.post mock was being interfered with by other tests.
Pre-populating the IAM token cache avoids the HTTP call entirely.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(test): add spend data polling with retries for e2e pass-through tests

- test_vertex_with_spend.test.js: Replace 15s fixed wait with polling loop
  (up to 6 attempts, 10s apart) for spend data to appear in DB
- Increase test timeout from 25s to 90s to accommodate polling
- base_anthropic_messages_tool_search_test.py: Add flaky(retries=3) for
  streaming test that depends on live Anthropic API

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(ci): reduce parallel workers from 8 to 4 for proxy tests to prevent OOM

- litellm_proxy_unit_testing_part2: -n 8 -> -n 4
- litellm_mapped_tests_proxy_part2: -n 8 -> -n 4, timeout 60 -> 120
- Worker crashes consistently caused by too many parallel proxy tests
  each loading the full FastAPI app and heavy dependency tree

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(db): add migration for SpendLogs composite index (startTime, request_id)

The @@index([startTime, request_id]) was added to schema.prisma but had no
corresponding migration. This caused test_aaaasschema_migration_check to fail
because prisma migrate diff detected the missing index.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(db): add migration for MCP available_on_public_internet default change to true

The schema.prisma changed the default for available_on_public_internet from
false to true, but no migration was created. This caused the schema migration
test to detect drift.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(test): increase server wait time and add retry to flaky external API tests

- test_basic_python_version.py: increase server startup wait from 60s to 90s
  for slower CI environments (fixes installing_litellm_on_python_3_13)
- test_a2a_agent.py: add flaky(retries=3, delay=5) for non-streaming test
  that depends on live A2A agent endpoint

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(test): add flaky retries to all intermittent external API tests for 0-fail CI

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(test): add auth overrides to file endpoint tests that return 500

The test_target_storage tests were getting 500 because the FastAPI auth
dependency wasn't overridden. Added app.dependency_overrides for proper
auth bypass in test environment.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
2026-02-28 09:46:35 -08:00
Harshit28j 465adce872 feat reaq changes 2026-02-28 13:40:53 +05:30
Harshit28j 9dc085694c feat: jwt mapping vkeyv 2026-02-28 13:29:46 +05:30
milan-berri 3e60ca3682 fix: populate user_id and user_info for admin users in /user/info (#22239)
* fix: populate user_id and user_info for admin users in /user/info endpoint

Fixes #22179

When admin users call /user/info without a user_id parameter, the endpoint
was returning null for both user_id and user_info fields. This broke
budgeting tooling that relies on /user/info to look up current budget and spend.

Changes:
- Modified _get_user_info_for_proxy_admin() to accept user_api_key_dict parameter
- Added logic to fetch admin's own user info from database
- Updated function to return admin's user_id and user_info instead of null
- Updated unit test to verify admin user_id is populated

The fix ensures admin users get their own user information just like regular users.

* test: make mock get_data signature match real method

- Updated MockPrismaClientDB.get_data() to accept all parameters that the real method accepts
- Makes mock more robust against future refactors
- Added datetime and Union imports
- Mock now returns None when user_id is not provided
2026-02-27 19:12:16 -08:00