Commit Graph

7349 Commits

Author SHA1 Message Date
Sameer Kankute b08445837b fix(logging): preserve ModelResponse choices format in redacted standard_logging_object + add Charity Engine provider endpoint
- Fix perform_redaction to handle dict representation of ModelResponse (from model_dump())
- Preserve full choices structure when redacting, redact content/audio in place
- Add _redact_standard_logging_object helper for standard_logging_object field
- Update test_logging_redaction_e2e_test assertions to expect choices format
- Add charity_engine to provider_endpoints_support.json

Fixes: test_standard_logging_payload, test_standard_logging_payload_audio
Made-with: Cursor
2026-03-10 10:22:57 +05:30
Sameer Kankute c1b860b3c1 Revert "fix: strip empty text content blocks in /v1/messages endpoint (#23097)"
This reverts commit 2c738cc939.
2026-03-10 09:53:19 +05:30
Krish Dholakia dd6f0d6c55 fix: forward recognized OpenAI params from kwargs in completion() (#23224)
Any param in DEFAULT_CHAT_COMPLETION_PARAM_VALUES that arrives via
completion(**kwargs) is now automatically forwarded to
get_optional_params(), even if it's not a named parameter of
completion().

Previously, get_non_default_completion_params() excluded params in
OPENAI_CHAT_COMPLETION_PARAMS (assuming they'd be forwarded via the
named-param path), while optional_param_args only contained explicitly
named params. Params like 'store' that were in the known-params list
but not named params fell through both paths and were silently dropped.

The fix adds a 7-line loop after building optional_param_args that
forwards any kwargs present in DEFAULT_CHAT_COMPLETION_PARAM_VALUES.
This means new OpenAI params only need to be added to the constants
dict — no boilerplate changes to 3+ function signatures required.

Fixes #23087

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
2026-03-09 20:56:27 -07:00
tristanolive 30b82c3a0c feat(charity_engine): add Charity Engine provider (#23223)
* feat(charity_engine): add Charity Engine provider

Charity Engine is a crowdsourced distributed computing platform that
donates processing power to charitable causes. Its inference API
provides OpenAI-compatible chat, completions, and embeddings endpoints.

* test(charity_engine): add provider config and resolution tests

Verify JSONProviderRegistry config, provider list membership,
model routing for charity_engine/<model>, and Router compatibility.

* feat(charity_engine): add Charity Engine to LlmProviders enum

Enables provider_list membership and LlmProviders.CHARITY_ENGINE
resolution required by the provider and test suite.

* fix(charity_engine): remove api_base_env to fix non-deterministic test

The CHARITY_ENGINE_API_BASE env var could override the base_url in CI,
causing test_charity_engine_provider_resolution to fail intermittently.

* fix(charity_engine): remove trailing slash from base_url
2026-03-09 20:46:43 -07:00
Maxwell Calkin 2c738cc939 fix: strip empty text content blocks in /v1/messages endpoint (#23097)
Claude's API returns assistant messages with empty text blocks
({"type": "text", "text": ""}) alongside tool_use blocks during
multi-turn tool-use conversations. These blocks are rejected when
sent back to the API with "text content blocks must be non-empty".

Sanitization already exists for other code paths (/v1/chat/completions
for both Anthropic and Bedrock), but NOT for the /v1/messages native
path. This adds the same treatment by stripping empty text blocks
from messages in async_anthropic_messages_handler before they are
forwarded to the provider.

Fixes #22930
2026-03-09 19:51:25 -07:00
Krish Dholakia 9500fc18d1 Fix TypeError: LiteLLM_Params.__init__() got multiple values for argument 'self' (#23220)
The bug occurred when user data inadvertently contained reserved Python
keywords like 'self', 'params', or '__class__' as keys. When such a dict
was unpacked via **kwargs to LiteLLM_Params() or GenericLiteLLMParams(),
Python raised TypeError because 'self' was passed both implicitly and
as a keyword argument.

The fix:
- Add a Pydantic model_validator(mode='before') to GenericLiteLLMParams
  that filters out reserved keys ('self', 'params', '__class__') before
  validation
- Move the max_retries str-to-int conversion into the same validator
- Remove the custom __init__ methods from both GenericLiteLLMParams and
  LiteLLM_Params, since the validator now handles the preprocessing
- Clean up unused VERTEX_CREDENTIALS_TYPES import

This fix applies to all classes that inherit from GenericLiteLLMParams,
including LiteLLM_Params and updateLiteLLMParams.

Added comprehensive tests in tests/test_litellm/test_litellm_params_reserved_keys.py

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
2026-03-09 19:33:52 -07:00
yuneng-jiang c1d042c2a3 Fix flaky test_stream_chunk_builder_openai_audio_output_usage
The test calls OpenAI's gpt-4o-audio-preview model which sometimes
doesn't return usage data in the streaming response. Fixed by:
- Adding @pytest.mark.flaky(retries=5, delay=2) for retry handling
- Fixing usage_obj loop to check chunk.usage is not None
- Skipping gracefully when OpenAI doesn't return usage data

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 17:18:00 -07:00
yuneng-jiang 6fe82d3886 Merge pull request #23211 from BerriAI/litellm_/sharp-keller
[Fix] Skills API test failing with duplicate skill name 500
2026-03-09 17:13:29 -07:00
yuneng-jiang af8f91ef66 [Fix] Use unique skill names in Skills API test to avoid duplicate-name 500s
The test_create_skill test was consistently failing in CI with a 500 from
Anthropic because the SKILL.md frontmatter always used the same hardcoded
name (test-skill-litellm). Since test_delete_skill is permanently skipped,
skills accumulate in the CI account, and re-creating with a duplicate name
triggers an Internal Server Error on Anthropic's side.

Fix: pass a timestamp-based unique_suffix to create_skill_zip so each run
produces a distinct skill name in the zip's SKILL.md frontmatter.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 17:09:15 -07:00
yuneng-jiang 4c3f873bde Merge pull request #23198 from BerriAI/litellm_fix_nova_pro_max_tokens
[Fix] Claude Agent SDK E2E Test Nova Pro max_tokens Limit
2026-03-09 15:54:00 -07:00
yuneng-jiang d719c8a53c Merge branch 'main' into litellm_fix_nova_pro_max_tokens 2026-03-09 15:47:53 -07:00
yuneng-jiang 2a836c7103 Fix Claude Agent SDK E2E test for Nova Pro max_tokens limit
The Claude Agent SDK sends max_tokens=32000 for unrecognized model names
(like "bedrock-nova-pro"), which exceeds Nova Pro's 10,000 limit. Enable
modify_params in the test proxy config so LiteLLM clamps max_tokens to the
model's actual limit. Also swap nova-premier to nova-pro since premier
requires provisioned throughput unavailable in CI.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 15:45:24 -07:00
yuneng-jiang ffd1eb18e0 Merge remote main and resolve conflicts
Kept our sync test fix, accepted upstream's xdist_group marker on
the async handler test.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 15:34:50 -07:00
yuneng-jiang 74ed6a16ac Fix flaky test_watsonx_gpt_oss_prompt_transformation
The test was flaky under pytest-xdist parallel execution because it used
async acompletion (which runs completion() in a thread pool via
run_in_executor) and relied on shared global state (known_tokenizer_config,
iam_token_cache, module_level_client) that could be modified by other tests
running in parallel. Failures were silently swallowed by a broad try/except,
causing mock_post.call_count to remain 0.

Fix:
- Convert from async acompletion to sync completion, matching every other
  test in the file. The test's intent is verifying prompt transformation,
  not async behavior.
- Use monkeypatch.setitem for known_tokenizer_config to ensure proper
  teardown isolation.
- Remove unnecessary mock layers (async template fetchers, iam_token_cache
  pre-population, mock completion response) that were only needed for the
  async code path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 15:32:30 -07:00
yuneng-jiang 29ca052064 Merge remote main, resolve conflict keeping new unit tests
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 15:20:20 -07:00
yuneng-jiang b7ac688b2b Replace SearXNG integration tests with unit tests for request/response transformation
The SearXNG search tests were failing in CI because they depend on a live
SearXNG instance that returns results. Since this provider is used by a
very small subset of customers, replace the flaky integration tests with
deterministic unit tests that validate request payloads, URL construction,
response parsing, and header configuration without requiring external infra.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 15:13:58 -07:00
yuneng-jiang 8ecac84789 Revert "feat(proxy): add Prisma DB pool and engine health metrics to Promethe…"
This reverts commit 0bb26c3f1b.
2026-03-09 14:55:11 -07:00
yuneng-jiang be9d1798b2 Merge pull request #23182 from BerriAI/litellm_/exciting-swanson
[Fix] Model pricing schema test missing output_cost_per_image_token_batches
2026-03-09 14:26:21 -07:00
yuneng-jiang 379ce1aae5 [Fix] Add output_cost_per_image_token_batches to model pricing schema test
The gemini-3.1-flash-image-preview model introduced a new pricing field
that was missing from the test's validation schema and cost_fields list.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 14:17:52 -07:00
michelligabriele c47f77a348 fix(agentcore): handle JSON responses from agents using sync return (#23165)
* fix(agentcore): handle JSON responses from agents using sync return

BedrockAgentCoreApp agents that use synchronous `return` (instead of
async `yield`) respond with Content-Type: application/json instead of
text/event-stream. The streaming parser only handles SSE format, silently
discarding the JSON body and returning empty content to the client.

This adds Content-Type detection in both sync and async streaming
wrappers — when application/json is received, the response is parsed
and converted to a single-chunk stream. Also extends _parse_json_response
with a fallback chain supporting multiple agent response schemas (standard
AgentCore, Strands framework, plain string, raw JSON fallback).

* fix(agentcore): add dict-type guard to _parse_json_response

Prevent AttributeError when json.loads() returns a non-dict
(e.g. JSON array or primitive) by adding an isinstance check
at the top of _parse_json_response. Non-dict values fall back
to raw JSON string content.

* fix(agentcore): handle malformed JSON and split streaming chunks

- Wrap json.loads() in try/except in both sync and async streaming
  wrappers so malformed JSON bodies raise a structured BedrockError
  instead of a raw JSONDecodeError
- Split the JSON-fallback streaming path into two chunks (content
  chunk with finish_reason=None, then stop sentinel with empty delta)
  to match the SSE path convention

* fix(agentcore): catch IO errors in streaming JSON path + async error test

- Broaden except clause to catch both json.JSONDecodeError and IO-level
  exceptions (httpx.ReadError, etc.) from response.read()/aread(), so
  all failures surface as structured BedrockError
- Add async malformed-JSON test to mirror the sync test coverage
2026-03-09 10:22:36 -07:00
Aarish Alam e21b06265a fix fkey violation on deleting user (#23115) 2026-03-09 08:53:11 -07:00
ohadgur 0bb26c3f1b feat(proxy): add Prisma DB pool and engine health metrics to Prometheus (#22655)
* feat(proxy): add Prisma DB pool and engine health metrics to Prometheus

Add a PrismaMetricsCollector that periodically queries pg_stat_activity
and the Prisma engine process to expose connection pool and engine health
as Prometheus gauges/counters. Auto-enabled when prometheus_system is in
service_callback.

New metrics:
- litellm_db_pool_active_connections (Gauge)
- litellm_db_pool_idle_connections (Gauge)
- litellm_db_pool_total_connections (Gauge)
- litellm_db_pool_waiting_connections (Gauge)
- litellm_db_engine_up (Gauge)
- litellm_db_engine_restarts_total (Counter)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address Greptile review feedback

- Only increment engine_restarts counter on heavy reconnects (engine
  actually dead), not lightweight network-blip reconnects
- Fix potential KeyError in _get_or_create_gauge/counter fallback path
  when REGISTRY._names_to_collectors is absent
- Rename litellm_db_pool_waiting_connections to
  litellm_db_pool_lock_waiting_connections to clarify it measures lock
  contention, not pool slot queuing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: warn when prometheus_system enabled but watchdog disabled

Log a warning when users have prometheus_system in service_callback
but PRISMA_HEALTH_WATCHDOG_ENABLED=false, since DB pool and engine
metrics won't be collected in that configuration.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* ci: retrigger CI checks

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor: use labeled gauge for DB pool connection metrics

Replace 3 separate pool gauges (active, idle, total) with a single
`litellm_db_pool_connections` gauge using a `state` label. This is more
Prometheus-idiomatic and exposes all pg_stat_activity states (active,
idle, idle in transaction, etc.) without ambiguity about what "total"
includes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: address Greptile review — stale labels and fallback re-registration

- Zero out known pg_stat_activity states that are absent from the current
  query result, preventing stale gauge values from persisting.
- Simplify _get_or_create_gauge/counter by removing the fallback loop
  that could re-register an already-registered metric (ValueError).
- Add test for stale label clearing across collection cycles.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: include "unknown" in _PG_STATES for stale label clearing

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: collect immediately on start and consolidate into single query

- Move sleep to end of loop so metrics appear on /metrics immediately
  after startup instead of after a 30s delay.
- Combine pool state and lock waiting queries into a single SQL query
  using conditional aggregation, halving per-cycle DB overhead.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: prevent tight spin loop on collection error

Move asyncio.sleep outside the try/except so it always executes even
when _collect_engine_health() or _collect_pool_metrics() raises.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: add multiprocess_mode to _get_or_create_gauge initialization

- Include `multiprocess_mode` parameter to properly support multiprocessing in Gauge creation.
- Ensure consistent behavior for labeled and unlabeled Gauges.

* fix: handle invalid env var and document watchdog prerequisite

- Add try/except ValueError for PRISMA_METRICS_COLLECTION_INTERVAL_SECONDS
  to prevent proxy startup crash on non-numeric values (e.g. "30s")
- Document that DB metrics require both prometheus_system callback and
  PRISMA_HEALTH_WATCHDOG_ENABLED=true

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: use defensive null coalescing for query_raw row values

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add invalid env var fallback test and fix mock signature

- Add test for non-numeric PRISMA_METRICS_COLLECTION_INTERVAL_SECONDS
- Add **kwargs to mock _patched_get_or_create_gauge for forward compat

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 08:49:46 -07:00
milan-berri df2e1bca46 feat: allow JWT and OAuth2 auth to coexist on the same instance (#23153)
When both enable_jwt_auth and enable_oauth2_auth are True, the proxy now
routes tokens based on their format:
- JWT tokens (3 dot-separated parts) -> JWT auth handler
- Opaque tokens -> OAuth2 auth handler

This enables using JWT for human users and OAuth2 for M2M (machine) clients
on the same LiteLLM instance. Previously, enabling OAuth2 would intercept
all tokens on LLM API routes before JWT auth could run.

When only one auth method is enabled, behavior is unchanged (backward compatible).
2026-03-09 08:41:27 -07:00
Ihsan Soydemir b1a6ba7711 feat(search): add Serper (serper.dev) as search provider (#23112)
* Add Serper (serper.dev) as a new search provider

* Add @greptileai fixes
2026-03-09 08:40:37 -07:00
Joe Reyna 36e04b6efe fix(tests): restore litellm_params=None on mock agent in a2a invoke test (#23125) 2026-03-09 07:16:02 -07:00
Joe Reyna 0bc1bd6871 fix(tests): use AsyncMock for prisma find_unique in agent get-by-id test (#23122) 2026-03-09 07:13:50 -07:00
Giulio Leone 556c64875e fix(models): set gpt-5.4-pro mode to responses instead of chat
gpt-5.4-pro and gpt-5.4-pro-2026-03-05 do not support the
/v1/chat/completions endpoint — OpenAI returns a 404 with
"This is not a chat model". These models are responses-only,
like o3-pro and o1-pro.

Changes:
- Set mode from "chat" to "responses" for both model entries
- Update supported_endpoints to ["/v1/responses", "/v1/batch"]
- Add regression test for responses API bridge routing

Fixes BerriAI/litellm#23014
2026-03-09 12:10:08 +01:00
Sameer Kankute a8301d5614 Fix: varaitions endpoint geting 401 2026-03-09 12:51:21 +05:30
Sameer Kankute 4b1929ce93 Fix mistral ocr failing test 2026-03-09 11:29:33 +05:30
Sameer Kankute b20c0afb64 Fix test_anthropic_messages_openai_model_streaming_cost_injection & openrouter image gen 2026-03-09 11:29:04 +05:30
yuneng-jiang 3a1ac964f7 fix: pass organization_ids=None in get_users test calls
When calling get_users() directly (not via FastAPI), Query() defaults
are not resolved. Pass organization_ids=None explicitly to avoid
'Query' object has no attribute 'split' error.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-07 23:15:16 -08:00
yuneng-jiang ac5128493e fix: repair test regressions from org admin auth changes
- test_get_users_*: pass proxy admin user_api_key_dict since get_users
  now calls _authorize_user_list_request which checks user_role
- test_validate_team_member_add_permissions_non_admin: set
  organization_id on mock team since _is_user_org_admin_for_team
  accesses it

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-07 23:10:53 -08:00
yuneng-jiang 8f33983389 Merge remote-tracking branch 'origin' into litellm_org_admin_add_user_e2e 2026-03-07 22:58:57 -08:00
yuneng-jiang ce317148b9 feat: org admin access to team management — backend auth, UI visibility, tests
- Add _is_user_org_admin_for_team() reusable helper to common_utils.py
- Grant org admins access to /team/list, /team/info, /team/member_add,
  /team/member_delete, /team/member_update, /team/model/add,
  /team/model/delete, /team/permissions_list, /team/permissions_update
- Make validate_membership async with org admin fallback
- Add /user/list to self_managed_routes (endpoint handles own auth)
- UI: org admins see Members, Member Permissions, Settings tabs in team view
- UI: CreateUserButton uses useOrganizations() for org dropdown
- UI: org admin delete-member respects disable_team_admin_delete_team_user
- Add 16 unit tests for _is_user_org_admin_for_team, validate_membership,
  _user_is_org_admin route check, and privilege escalation prevention

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-07 22:43:09 -08:00
yuneng-jiang 8bf3c0c67f fix: org admin invite user — multi-org selector, organizations list in POST body, auth check
- Thread org objects {organization_id, organization_alias} instead of bare IDs from
  users/page.tsx → view_users.tsx → CreateUserButton so the selector can show aliases
- Replace single-select org dropdown with multi-select; always shown when organizationIds
  is non-null; disabled/pre-selected for single-org admins; displays "Alias (id)"
- handleCreate: maps organization_ids → organizations before POST, removes redundant
  organizationMemberAddCall (backend _add_user_to_organizations handles it)
- _user_is_org_admin: also checks organizations list field in addition to singular
  organization_id so /user/new succeeds for org admins
- Add 5 backend unit tests for _user_is_org_admin and 2 frontend tests for new form behavior

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-07 20:34:12 -08:00
yuneng-jiang 67884c279a fix: allow any authenticated user to call /user/available_roles
Org admins and team admins opening the invite-user modal could not see
the 4 global proxy roles because GET /user/available_roles has no
request body, so the org-admin route check (which requires
organization_id in the payload) always returned False and blocked them.

Add /user/available_roles to self_managed_routes so the route-access
check passes for any authenticated user. The endpoint's existing
Depends(user_api_key_auth) still requires a valid API key.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-07 20:11:35 -08:00
Giulio Leone 1c3787264b fix(bedrock): strip output_config from Bedrock Invoke requests (#23042)
* fix(bedrock): strip output_config from Bedrock Invoke requests

Bedrock Invoke API does not support the output_config parameter
(added to Anthropic Messages API). Requests with output_config cause
400 errors: 'extraneous key [output_config] is not permitted'.

Strip output_config in both Bedrock Invoke transformation layers
(messages and chat), consistent with how output_format is already
handled and how VertexAI strips both parameters.

Fixes: https://github.com/BerriAI/litellm/issues/22797

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

* test(bedrock): add output_config test for chat/invoke path

Addresses review feedback — the chat/invoke_transformations path now has
symmetric test coverage matching the messages/invoke_transformations path.

Fixes: https://github.com/BerriAI/litellm/issues/22797

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

---------

Co-authored-by: giulio-leone <6887247+giulio-leone@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
2026-03-07 19:33:53 -08:00
Krish Dholakia cf439c269c Agents - add max budget + tpm/rpm limiting per agent AND per agent session (#22849)
* feat: enforce x-litellm-trace-id in header, if required

* feat: update spend for agent

* refactor: update agent table to follow similar format as other entities - also add a spend column - allows us to see spend of an agent

* fix: cleanup ui

* feat: return spend on agent endpoints

* feat: scope pr

* feat(agents/): support budgets + rate limiting on agents + agent sessions

* fix: address PR review feedback

- Add missing tpm_limit, rpm_limit, session_tpm_limit, session_rpm_limit
  columns to root schema.prisma to match proxy and extras schemas
- Add backwards-compatible fallback to key metadata for max_iterations
  so existing users don't silently lose enforcement

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: qa'ed RPM limiting on agents

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 19:12:42 -08:00
Krish Dholakia 03ca98123f Agents health checks (#23044)
* feat: add health check toggle to agents page

Backend:
- Add health_check query parameter to GET /v1/agents endpoint
- When health_check=true, performs concurrent GET requests to each agent's
  URL and filters out agents with unreachable URLs (5s timeout)
- Agents returning HTTP <500 are considered healthy; 5xx and connection
  errors mark agents as unhealthy

UI:
- Add Health Check toggle (Switch) to agents panel header
- Toggle triggers re-fetch with health_check=true, filtering the agent list
- Icon color changes (green/gray) to indicate toggle state
- Tooltip explains behavior: 'only agents with reachable URLs are shown'

Networking:
- Update getAgentsList to accept optional healthCheck boolean parameter

Tests:
- Backend: 9 new tests covering health check filtering, _check_agent_url_health
  helper (no URL, 200, 404, 500, connection error cases)
- UI: 3 new tests verifying toggle renders, initial fetch without health check,
  and fetch with health check after toggle click

Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>

* fix: fix greptile comment re: security issue

* fix: fix based on greptile feedback

* fix: align health check tests with implementation

- Rename test_should_return_unhealthy_when_no_url to
  test_should_return_healthy_when_no_url (implementation returns
  healthy=True for agents without a URL)
- Patch get_async_httpx_client instead of httpx.AsyncClient so mocks
  actually intercept the HTTP calls made by _check_agent_url_health
- Remove unnecessary __aenter__/__aexit__ context-manager mocks

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* chore: undo _experimental/out renames from cherry-pick

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Update litellm/proxy/agent_endpoints/endpoints.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
2026-03-07 18:32:47 -08:00
Krish Dholakia e7714f0ce6 Fix CVEs: bump tar/minimatch/pypdf + harden Docker SBOM patching (#23082)
* fix(docker): bump tar/minimatch/pypdf for CVE fixes + harden SBOM patching

- Bump tar 7.5.8→7.5.10, minimatch 10.2.1→10.2.4, pypdf 6.6.2→6.7.3
- Add sed-based SBOM metadata patching with properly indented find/sed
- Add npm package manager cleanup (apk del / apt-get purge) to remove
  stale SBOM entries from image scanners
- Scope || true to only apk del via brace grouping { ... || true; }
- Guard npm root -g with non-empty assertion to prevent silent failures
- Scope minimatch sed regex to ^10.x to avoid matching other major versions

Addresses: CVE-2026-27903, CVE-2026-27904, GHSA-qffp-2rhf-9h96, CVE-2026-27888

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(docker): scope find to /usr/local/lib /usr/lib, drop autoremove

- Replace `find /` with `find /usr/local/lib /usr/lib` to avoid
  traversing /proc, /sys, /dev during SBOM metadata patching
- Remove `apt-get autoremove -y` from Debian-based Dockerfiles to
  prevent nodejs from being removed as an auto-installed dependency

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-07 18:31:27 -08:00
yuneng-jiang 9e531195ec Merge pull request #23057 from BerriAI/litellm_fix_user_filter_scope
[Fix] User Filter Scope - Make Org-Scoping Opt-In
2026-03-07 18:22:26 -08:00
Ishaan Jaff fc81edc4c4 revert: undo PR #22589 and follow-up vertex anyOf fixes (#23083)
* Revert "fix(vertex): drop bare {} schemas from anyOf before adding nullable=True (#23060)"

This reverts commit 3ad9a536d3.

* Revert "Merge pull request #22589 from Chesars/fix/vertex-preserve-any-type-schema"

This reverts commit da941e4261, reversing
changes made to f77f28a5f8.
2026-03-07 17:49:49 -08:00
yuneng-jiang 906288a1b2 Merge pull request #23065 from BerriAI/litellm_fix_team_scoped_virtual_keys
fix: scoping virtual keys in the teams view to be applying the team filter
2026-03-07 17:43:25 -08:00
Ishaan Jaff 2b8db87a35 fix(pass_through): inject cost into Anthropic streaming chunks + fix SSE parsing in tests (#23078)
streaming_handler.py: EndpointType.ANTHROPIC was missing from the cost
injection block — only VERTEX_AI was handled, so Anthropic passthrough
streaming never got cost injected into message_delta chunks even with
include_cost_in_streaming_usage: true.

test_anthropic_passthrough.py: AnthropicResponsesStreamWrapper yields
full multi-line SSE frames as single bytes objects (e.g.
"event: message_delta\ndata: {...}\n\n"). The tests were checking
startswith('data: ') on the whole chunk, which starts with 'event:',
so every message_delta event was silently skipped. Fix: split each chunk
by \n before checking for the data: prefix. Also removes the
@pytest.mark.skip added with wrong diagnosis on the OpenAI model test.
2026-03-07 17:27:51 -08:00
Ishaan Jaff a30b71c946 fix(tests): generate square PNG in image_url fixture for DALL-E 2 variation test (#23073)
DALL-E 2 create_variation requires a square PNG. The old fixture fetched
the LiteLLM logo from S3 which is non-square, causing API rejections.
Replace with a programmatically-generated 1024x1024 RGBA PNG via Pillow.
2026-03-07 16:58:27 -08:00
Ishaan Jaff 34984d22ae fix(test): update openrouter image generation assertion for gemini-2.5-flash-image (#23070)
* fix(anthropic/skills): remove ?beta=true query param from Skills API URLs

Beta access is controlled via the anthropic-beta header (already set
to skills-2025-10-02), not a URL query param. The spurious ?beta=true
was causing 500 errors from Anthropic's server.

* fix(test): update openrouter image generation assertion to accept any image format

gemini-2.5-flash-image returns JPEG, not PNG. The assertion was hardcoded
to png after the model was swapped from gemini-2.5-flash-image-preview
(which returned PNG) in commit 34e8e972.
2026-03-07 16:52:04 -08:00
Ishaan Jaff 66c822435e fix(ci): image variation openai sdk 2.24.0 compat + swap bedrock nova-premier to nova-pro (#23066)
* fix(ci): fix image variation test for openai sdk 2.24.0 and swap nova-premier to nova-pro

image_gen_tests: openai==2.24.0 (bumped Feb 25) requires BytesIO objects to have
a .name attribute for MIME type detection in multipart uploads. Add .name to the
fixture so create_variation works. Also guard with OPENAI_API_KEY skipif.

proxy_e2e_anthropic_messages_tests: nova-premier requires provisioned throughput
not available via standard on-demand cross-region inference on the CI account.
Swap to nova-pro which uses standard inference profiles.

* fix: remove skipif, keep only .name fix for openai sdk compat
2026-03-07 16:41:54 -08:00
Ryan Crabbe 2cd0c767ee fix: regression test 2026-03-07 16:40:29 -08:00
Ryan Crabbe daf7c0c3a8 fix: scoping virtual keys in the teams view to be applying the team filter globally instead of an or branch 2026-03-07 16:23:12 -08:00
Ishaan Jaff e8a7116899 fix(tests): fix repeating chunk and audio usage streaming tests (#23061)
- Replace ModelResponse(stream=True) with ModelResponseStream in
  test_unit_test_custom_stream_wrapper_repeating_chunk — stream=True
  stores delta as a plain dict causing AttributeError in CustomStreamWrapper
- Accept MidStreamFallbackError alongside InternalServerError in the
  repeating-chunk safety check assertion
- Add @pytest.mark.flaky(retries=3) to the live OpenAI audio output
  usage test
2026-03-07 16:18:51 -08:00