Commit Graph

7561 Commits

Author SHA1 Message Date
Cursor Agent e242356570 fix(ci): fix ruff lint errors and 9 failing unit tests on main
Lint fixes (check_code_and_doc_quality job):
- Remove unused variable reasoning_effort in gpt_5_transformation.py (F841)
- Remove unused timezone imports in mcp_server rest_endpoints.py and server.py (F401)
- Remove unused ProxyBaseLLMRequestProcessing import in realtime endpoints.py (F401)
- Add BaseRealtimeHTTPConfig to TYPE_CHECKING block in utils.py (F821)
- Add PLR0915 per-file-ignore for mcp_server/rest_endpoints.py in ruff.toml

Test fixes (litellm_mapped_tests_llms job):
- Gemini video cost tests: pass explicit model_info to video_generation_cost()
  instead of relying on gemini/veo-3.0-generate-preview being in model_prices JSON
- Anthropic max_tokens tests: mock get_max_tokens() to return expected values
  instead of depending on claude-3-5-sonnet-20241022 being in model_prices JSON
- Vertex AI pydantic obj test: update from removed gemini-1.5-pro to gemini-2.5-flash,
  update expected request body to use response_json_schema format
- Vertex AI/Bedrock file_content integration tests: update mocks to target
  base_llm_http_handler.retrieve_file_content (the new code path via
  ProviderConfigManager) instead of the old vertex_ai_files_instance/
  bedrock_files_instance paths

Co-authored-by: yuneng-jiang <yuneng-jiang@users.noreply.github.com>
2026-03-12 19:58:43 +00:00
Chesars 690ad4c45b fix(openai): drop all reasoning_effort for gpt-5.4 + tools, including 'none'
OpenAI rejects any reasoning_effort (even 'none') with tools in
/v1/chat/completions for gpt-5.4. Update the guard to drop reasoning_effort
regardless of value. Add docs explaining the auto-drop behavior.
2026-03-12 16:22:40 -03:00
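The guard described in the commit above can be sketched roughly as follows. This is an illustrative sketch only: `drop_reasoning_effort_if_tools` is a hypothetical name, and the prefix check on the model name is an assumption, not the actual LiteLLM implementation.

```python
from typing import Any, Dict


def drop_reasoning_effort_if_tools(model: str, params: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of params without reasoning_effort when tools are present.

    Hypothetical helper: the model-name check and parameter names are
    assumptions for illustration, not LiteLLM's real guard.
    """
    cleaned = dict(params)
    if model.startswith("gpt-5.4") and cleaned.get("tools"):
        # Drop the key whatever its value is, including "none".
        cleaned.pop("reasoning_effort", None)
    return cleaned
```

Returning a copy keeps the caller's dictionary untouched, so the original request can still be logged as it was received.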
Cesar Garcia ec763784e0 Merge branch 'main' into litellm_oss_staging_03_11_2026 2026-03-12 16:21:28 -03:00
Joe Reyna f2f843448e Merge pull request #23414 from joereyna/fix/pass-through-server-root-path
fix: strip SERVER_ROOT_PATH prefix before checking mapped pass-through routes
2026-03-12 11:57:13 -07:00
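A minimal sketch of the prefix stripping this fix describes, assuming the root path is a plain string such as `/api/v1`. The function name `strip_root_path` is hypothetical and is not the helper used in the repository.

```python
def strip_root_path(route: str, root_path: str) -> str:
    """Remove a configured root-path prefix from a request route.

    Hypothetical sketch: only strips when the prefix ends on a path
    boundary, so "/api/v10/x" is not mistaken for "/api/v1" + "0/x".
    """
    root = root_path.rstrip("/")
    if not root:
        # No root path configured (empty or just "/"): nothing to strip.
        return route
    if route == root:
        return "/"
    if route.startswith(root + "/"):
        return route[len(root):]
    return route
```

With the prefix removed, the remaining route can be compared against the mapped pass-through routes as if no root path were configured.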
Cesar Garcia e01d722803 Merge branch 'main' into litellm_oss_staging_03_11_2026 2026-03-12 13:53:14 -03:00
Sameer Kankute d507f840d3 Merge pull request #23432 from BerriAI/litellm_azure-model-router-show-actual-model
feat(azure_ai): show actual model used in Azure Model Router response
2026-03-12 22:18:50 +05:30
Sameer Kankute d1a99f571e Merge pull request #23446 from BerriAI/litellm_add_webrtc_support
[Feat] Add WebRTC support
2026-03-12 22:16:55 +05:30
Chesars 4e6e1d8de8 merge: resolve conflicts with upstream staging (bedrock + mcp tests)
Keep both sets of tests: upstream's OAuth2 token injection test and
our case-insensitive tool matching tests. Use upstream's version of
the bedrock output_config test (more comprehensive).
2026-03-12 13:40:16 -03:00
Chesars feed274aa3 Reapply "feat: add model_cost aliases expansion support"
This reverts commit 3d2df7e8b5.
2026-03-12 13:36:57 -03:00
michelligabriele 7c5e2e8389 fix(proxy): make async_post_call_response_headers_hook consistent across all endpoints (#22985)
* fix(proxy): make async_post_call_response_headers_hook consistent across all endpoints

The response headers hook had 5 gaps that prevented callbacks from
reliably extracting routing metadata across endpoint types:

1. Hook never fired for /audio/transcriptions (endpoint bypasses
   base_process_llm_request)
2. custom_llm_provider not accessible in hook data for any endpoint
3. custom_llm_provider not stamped in ResponsesAPIResponse._hidden_params
   (unlike chat completions)
4. model_info under inconsistent keys (metadata vs litellm_metadata)
5. request_headers always None at all call sites

This adds a litellm_call_info parameter to the hook that normalizes
routing metadata (custom_llm_provider, model_info, api_base, model_id)
regardless of endpoint type. Also stamps custom_llm_provider on
Responses API responses, adds the hook call to the transcription
handler, and passes request_headers at all call sites.

Supersedes PR #21385.

* fix(proxy): address review feedback — safer backwards compat and None guards

- Replace try/except TypeError with inspect.signature() check for
  litellm_call_info backwards compatibility. This avoids masking real
  TypeErrors inside callback implementations and prevents double
  invocation with inconsistent parameters.

- Use (data.get("key") or {}) instead of data.get("key", {}) to guard
  against keys that exist with an explicit None value, which would
  cause AttributeError on the subsequent .get() call.
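
The difference between the two lookups can be shown with a small sketch. The key names `metadata` and `model_info` are taken from the message above; the helper `get_model_info` is hypothetical.

```python
from typing import Any, Dict, Optional


def get_model_info(data: Dict[str, Any]) -> Optional[Any]:
    """Read data["metadata"]["model_info"], tolerating an explicit None."""
    # data.get("metadata", {}) would return None here when the key exists
    # with a None value; "or {}" falls back to an empty dict instead.
    metadata = data.get("metadata") or {}
    return metadata.get("model_info")


# The default passed to dict.get is only used when the key is absent.
unsafe = {"metadata": None}.get("metadata", {})
```

Here `unsafe` is `None`, so a chained `.get()` on it would raise `AttributeError`, while `get_model_info` returns `None` without raising.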

* fix(proxy): cache inspect.signature result for callback compat check

Move the inspect.signature() call into a module-level helper with a
dict cache keyed by callback identity. Avoids repeated introspection
per request per callback in the hot path.

* fix(proxy): use class identity for signature cache key

Key the _CALLBACK_ACCEPTS_CALL_INFO cache by id(type(cb)) instead of
id(cb) to avoid stale entries from Python address reuse after GC.
All instances of the same callback class share the same method
signature, so class identity is both safer and more cache-efficient.
2026-03-12 08:51:00 -07:00
joereyna 1af7f11dae fix: extract normalize_route_for_root_path to deduplicate root-path stripping; fix mock target 2026-03-12 08:16:00 -07:00
Cesar Garcia 6bd7cd7573 Merge branch 'main' into litellm_oss_staging_03_11_2026 2026-03-12 10:43:08 -03:00
Sameer Kankute 291e6e1841 Merge pull request #23435 from BerriAI/litellm_vector-store-retrieve-list-update-delete
Add vector store retrieve list update delete
2026-03-12 19:08:39 +05:30
Sameer Kankute 4f5b6ae556 Merge pull request #23448 from BerriAI/litellm_cicd_1203126
Litellm cicd 1203126
2026-03-12 19:07:33 +05:30
Sameer Kankute bb451cfcb0 address greptile review feedback (greploop iteration 2)
- Thread api_version through HTTP handlers to Azure realtime endpoints
- Make expires_at optional in RealtimeClientSecretResponse
- Fix test token expiry times to be in the future
- Populate user_id and team_id in minimal_auth for spend tracking

Made-with: Cursor
2026-03-12 18:53:22 +05:30
Sameer Kankute f5be79419c Fix test_claude_agent_sdk_streaming 2026-03-12 18:36:04 +05:30
Sameer Kankute 982f3917c5 Fix test_standard_logging_payload 2026-03-12 18:35:01 +05:30
Sameer Kankute 15d873e204 Fix update deprecated model test 2026-03-12 18:34:20 +05:30
Sameer Kankute 374c35a6b7 Fix update deprecated model test 2026-03-12 18:34:15 +05:30
Sameer Kankute 0f91a4f9da Fix test_get_tools_for_single_server 2026-03-12 18:33:14 +05:30
Sameer Kankute 412a283569 Revert "fix(vertex): skip harmful schema transforms for Gemini 2.0+ tool parameters"
This reverts commit a9c3095cc5.
2026-03-12 18:26:11 +05:30
Chesars 1be6b31e2f merge: resolve conflicts between main and litellm_oss_staging_03_11_2026 2026-03-12 09:38:31 -03:00
Sameer Kankute 7778af6c78 Add tests 2026-03-12 17:54:57 +05:30
Sameer Kankute 36ec80d90c Fix azure model router 2026-03-12 12:40:37 +05:30
Joe Reyna 2848d5607f Merge pull request #23417 from joereyna/fix/vertex-batch-cost-model-name
fix: update stale model name in vertex AI batch cost calculation test
2026-03-11 23:47:11 -07:00
Sameer Kankute 5927345eab Add get, list and delete for vector store endpoints 2026-03-12 12:09:51 +05:30
Sameer Kankute 18a05f7a40 feat(vector-stores): add retrieve/list/update/delete handlers
- Add vector_store_retrieve/list/update/delete handlers in llm_http_handler
- Fix AsyncHTTPHandler.get() timeout arg (not supported)
- Fix update/delete URL (api_base already includes /vector_stores)
- Clean metadata for update to avoid UserAPIKeyAuth JSON serialization

Made-with: Cursor
2026-03-12 11:58:44 +05:30
Sameer Kankute 5b83aae715 feat(azure_ai): show actual model used in Azure Model Router response
- Azure Model Router transform_response: let parent extract actual model from raw response
- common_request_processing: skip model override for Azure Model Router requests
- proxy_server: skip streaming chunk model restamp for Azure Model Router
- Add _is_azure_model_router_request helper
- Add tests for non-streaming and streaming

Made-with: Cursor
2026-03-12 11:41:19 +05:30
Joe Reyna c4aa15b4e2 Merge pull request #23418 from joereyna/fix/gemini-passthrough-stale-model-name
fix: update stale gemini-1.5-flash model name in passthrough logging handler test
2026-03-11 22:55:19 -07:00
Ishaan Jaff 19db79db17 fix(mcp): OAuth2 chat connect - tools fetch, auth, and status fixes (#23406)
* fix(mcp): OAuth2 chat connect - tools fetch, auth flow, and status fixes

- schema.prisma: add missing MCP table fields (approval_status, submitted_by, submitted_at, reviewed_at, review_notes) to prevent destructive migrations
- rest_endpoints.py: inject user OAuth token via extra_headers for OAuth2 servers so tools list is populated; add server name->UUID resolution so MCPConnectPicker name lookups work
- mcp_registry.json: fix Atlassian defaults (transport: http, url: .../v1/mcp)
- ChatPage.tsx: read mcpOauthReturn param to init sidebarView="apps" on OAuth return, clean up param after mount
- MCPAppsPanel.tsx: auto-add OAuth2 servers to selectedServers when credential detected; onConnect also enables server for chat; disconnect removes from selectedServers
- mcp_servers.tsx: sort servers by created_at DESC
- useUserMcpOAuthFlow.tsx: append mcpOauthReturn=apps to return URL so Apps panel is mounted on return

* address greptile review feedback (greploop iteration 1)

* fix(mcp): inject stored OAuth2 token when fetching tools via /responses API

When a user has connected an OAuth2 MCP server (e.g. Atlassian) and then
uses the /responses endpoint with that server, tool listing was failing
because the stored per-user OAuth token was never injected.

Two fixes:
1. server.py: add _get_user_oauth_extra_headers_from_db() helper; call it
   in _get_tools_from_mcp_servers when oauth2_headers is None for an OAuth2
   server, falling back to the user's stored token in LiteLLM_MCPUserCredentials
2. litellm_proxy_mcp_handler.py: also intercept MCP tools whose server_url
   matches */mcp/<server_name> (e.g. http://localhost:4000/mcp/atlassian_test)
   by rewriting them to litellm_proxy/mcp/<server_name> so they go through
   the internal handler (and get the OAuth token injected) instead of being
   forwarded to OpenAI raw where localhost is unreachable
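
The rewrite in item 2 can be sketched as below. The regular expression and the name `rewrite_proxy_mcp_tool` are assumptions made for illustration; the actual matching rule in litellm_proxy_mcp_handler.py may differ.

```python
import re
from typing import Any, Dict

# Assumed pattern: any http(s) host followed by /mcp/<server_name>.
_MCP_URL = re.compile(r"^https?://[^/]+/mcp/(?P<name>[^/?#]+)/?$")


def rewrite_proxy_mcp_tool(tool: Dict[str, Any]) -> Dict[str, Any]:
    """Point a tool whose server_url targets the proxy at the internal handler."""
    match = _MCP_URL.match(tool.get("server_url") or "")
    if match is None:
        return tool
    # Build a new dict so the caller's tool definition is not mutated.
    return {**tool, "server_url": f"litellm_proxy/mcp/{match.group('name')}"}
```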

* address greptile review feedback (greploop iteration 2)

* test(mcp): add unit test for OAuth2 token injection in _get_tools_from_mcp_servers

Verifies that when _get_tools_from_mcp_servers is called for an OAuth2 MCP
server without oauth2_headers in the request, the implementation:
- calls _prefetch_oauth_creds_for_user once (not per-server) to avoid N+1 queries
- passes the stored token as extra_headers={"Authorization": "Bearer ..."} to
  _get_tools_from_server so the upstream OAuth2 MCP server authenticates correctly

* address greptile review feedback (greploop iteration 3)

* address greptile review feedback (greploop iteration 4)

* address greptile review feedback (greploop iteration 5)

* redesign credentials table to use Tremor table layout matching Keys page

* fix: /server/oauth authorize 422 - make client_id optional, fall back to real DB server

* fix: mcp_token client_id optional, resolve from server record

* fix: look up real server by UUID (get_mcp_server_by_id) before falling back to name

* Update litellm/responses/mcp/litellm_proxy_mcp_handler.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* fix: address greptile feedback - client_id guards, dict spread, helper refactor, tests

- mcp_management_endpoints: raise 400 when resolved_client_id is empty in
  mcp_authorize and mcp_token instead of forwarding "" to upstream
- litellm_proxy_mcp_handler: use {**tool, "server_url": ...} spread instead
  of dict(tool) + mutation for shallow copy safety
- rest_endpoints: extract _oauth2_server_ids set comprehension to a named
  _get_oauth2_server_ids() helper for clarity; add Set to typing imports
- test_rest_endpoints: add tests for name→UUID resolution path,
  access-denied when resolved UUID not in allowed list, and OAuth2 user
  token injection for single-server requests; fix fake_get_tools signature
  to accept extra_headers kwarg

---------

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
2026-03-11 22:07:02 -07:00
Alvin Tang 2b7b7d3086 fix(snowflake): transform string tool_choice to object format (#23318)
Snowflake's Cortex LLM API (like Anthropic) requires tool_choice as an
object with a "type" field, not as a bare string. Passing tool_choice="auto"
(or "required"/"none") results in error 390142 "invalid payload".

This fix transforms OpenAI string tool_choice values to the Snowflake
object format:
- "auto"     -> {"type": "auto"}
- "required" -> {"type": "any"}  (Snowflake/Anthropic convention)
- "none"     -> {"type": "none"}

The dict-to-dict transformation for specific function tool choices
({"type": "function", "function": {"name": "..."}} -> {"type": "tool",
"name": [...]}) remains unchanged.

Fixes #23284

Co-authored-by: gambletan <tan@echooo.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>
2026-03-11 21:29:59 -07:00
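The string-to-object mapping listed in the Snowflake fix above can be sketched as follows. The function name `transform_tool_choice` is hypothetical, and dict-valued choices are passed through untouched here because their transformation is outside the scope of this sketch.

```python
from typing import Any, Dict, Union

# Mapping taken from the commit message; "required" becomes "any".
_STRING_TOOL_CHOICE: Dict[str, Dict[str, str]] = {
    "auto": {"type": "auto"},
    "required": {"type": "any"},
    "none": {"type": "none"},
}


def transform_tool_choice(
    tool_choice: Union[str, Dict[str, Any]],
) -> Dict[str, Any]:
    """Convert an OpenAI-style string tool_choice into object form."""
    if isinstance(tool_choice, str):
        mapped = _STRING_TOOL_CHOICE.get(tool_choice)
        if mapped is None:
            raise ValueError(f"unsupported tool_choice: {tool_choice!r}")
        # Return a copy so callers cannot mutate the shared mapping.
        return dict(mapped)
    return tool_choice
```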
yuneng-jiang 626d120873 Merge pull request #23425 from BerriAI/cursor/litellm-ci-stability-4513
[Infra] CI/CD Fixes
2026-03-11 21:08:16 -07:00
Sameer Kankute 49d653c3aa Revert "chore: cleanup deprecated models from pricing JSON" 2026-03-12 09:27:40 +05:30
yuneng-jiang ce80e16755 Merge pull request #23419 from BerriAI/litellm_audit_log_admin_viewer
[Feature] Allow Admin Viewers to Access Audit Logs
2026-03-11 20:40:48 -07:00
Cursor Agent d5fc63f63f fix(ci): fix deprecated model refs and schema validation in unit tests
- Replace gemini-pro with gemini-3-pro-preview in test_cost_discount_vertex_ai
  (gemini-pro removed from cost map)
- Replace github/claude-3-5-sonnet-latest with github/claude-3-7-sonnet-20250219
  in test_supports_function_calling_github_anthropic_alias (model removed)
- Add supports_multimodal, uses_embed_content, input/output_cost_per_token_above_256k_tokens
  to JSON schema in test_utils.py (new properties added to model cost map)

Co-authored-by: yuneng-jiang <yuneng-jiang@users.noreply.github.com>
2026-03-12 03:28:24 +00:00
Cursor Agent aacc7b18f8 fix(ci): add missing provider docs, fix deprecated model refs in cost tests
- Add black_forest_labs and charity_engine to provider_endpoints_support.json
  (fixes check_code_and_doc_quality job)
- Replace o1-mini with o1 in test_reasoning_tokens_no_price_set (model removed
  from cost map)
- Replace gemini-2.5-pro-exp-03-25 with gemini-2.5-pro in
  test_generic_cost_per_token_above_200k_tokens (model removed from cost map)
- Fix test_get_cost_for_anthropic_web_search to use claude-3-7-sonnet-20250219
  with custom_llm_provider='anthropic' so web search cost is computed correctly

Co-authored-by: yuneng-jiang <yuneng-jiang@users.noreply.github.com>
2026-03-12 03:11:29 +00:00
yuneng-jiang 76cff9ae0e Allow proxy_admin_viewer to access audit log endpoints
Add /audit and /audit/{id} to admin_viewer_routes so read-only admins
can view audit logs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 20:07:51 -07:00
joereyna 5c20617a21 fix: mock completion_cost in routing test and restore helper consistency
- Mock litellm.completion_cost in test_pass_through_success_handler_gemini_routing
  to decouple it from model_prices_and_context_window.json; prevents the same
  breakage if gemini-2.0-flash is ever removed from the pricing map
- Revert _create_passthrough_logging_payload URL back to gemini-1.5-flash to
  eliminate inconsistency with the other tests that use gemini-1.5-flash explicitly
2026-03-11 20:05:40 -07:00
yuneng-jiang 5d7bfe6260 Merge pull request #23407 from BerriAI/cursor/litellm-ci-stability-4513
LiteLLM CI stability
2026-03-11 20:03:07 -07:00
ryan-crabbe 6229e73dfc Merge pull request #23415 from BerriAI/litellm_style-created-by-not-uuid
style: make virtual keys tables' created by not a UUID
2026-03-11 19:53:39 -07:00
Joe Reyna 36819ffb6f fix: null AWS SigV4 fields on MagicMock in TestTemporaryMCPSessionEndpoints (#23408)
* fix(test): null AWS SigV4 fields on MagicMock in test_inherit_credentials_from_existing_server

* fix(test): null AWS SigV4 fields on MagicMock in test_add_session_mcp_server_caches_and_redacts_credentials
2026-03-11 19:46:08 -07:00
Ryan Crabbe dfda7c10fc fix: set created_by on mock keys in test_list_keys_with_expand_user 2026-03-11 19:45:04 -07:00
joereyna 59778f3ce7 fix: update stale gemini-1.5-flash model name to gemini-2.0-flash in passthrough logging handler test 2026-03-11 19:30:41 -07:00
joereyna 0bd6cab7db fix(test): update stale gemini-1.5-flash-001 model name to gemini-2.0-flash-001 in batch cost test 2026-03-11 19:20:55 -07:00
Ryan Crabbe 4973311070 test: add tests for created_by_user expansion in key list 2026-03-11 19:16:09 -07:00
Shivam Rawat 4e7003afef Fix 404 when fetching config-based MCP servers by ID (#22711)
* fixed mcp api

* added non-admin test

* resolved greptile comment

* fix: add IP filtering to get_mcp_server_by_id path in fetch_mcp_server

Apply _is_server_accessible_from_ip check after get_mcp_server_by_id lookup
to prevent external callers from accessing MCP servers configured with
available_on_public_internet=False when they know the server_id.

Made-with: Cursor
2026-03-11 18:30:30 -07:00
Cursor Agent 49dc391a46 fix(ci): remove unused is_expired variable (ruff F841) and handle ModelDeprecated in image gen test
- Remove dead code: is_expired was assigned but never used in
  mcp_management_endpoints.py (the raw expires_at timestamp is passed
  directly to the client per existing comment)
- Handle Azure DALL-E 3 ModelDeprecated (HTTP 410) error gracefully in
  base_image_generation_test.py so CI doesn't fail on deprecated model
  deployments

Co-authored-by: yuneng-jiang <yuneng-jiang@users.noreply.github.com>
2026-03-12 01:27:42 +00:00
yuneng-jiang 56557bfae1 Merge pull request #23400 from BerriAI/litellm_/elated-hoover
[Fix] Replace deprecated models in tests
2026-03-11 17:25:37 -07:00
yuneng-jiang 82de82f1b6 Fix test_completion_cost_prompt_caching gemini parametrization
gemini/gemini-2.5-flash lacks cache_creation_input_token_cost in the
model cost map, causing a TypeError when the test multiplies
cache_creation_input_tokens by None. Use claude-haiku-4-5 instead,
which has the required prompt caching cost fields.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 17:12:15 -07:00
yuneng-jiang c9f7075690 Replace additional deprecated models across test files
- tests/local_testing/test_completion_cost.py:
  - claude-3-5-sonnet-20240620 -> claude-sonnet-4-6
  - gemini/gemini-1.5-flash-001 -> gemini/gemini-2.5-flash

- tests/test_litellm/test_utils.py:
  - claude-3-5-sonnet-20240620 -> claude-sonnet-4-6 (VertexAI config test, proxy tests)
  - gemini-1.5-pro -> gemini-2.5-pro (pre_process_non_default_params)
  - gemini/gemini-1.5-pro -> gemini/gemini-2.5-pro (proxy tests)

- tests/litellm_utils_tests/test_utils.py:
  - claude-3-opus-20240229 -> claude-sonnet-4-6 (trimming, vision tests)
  - gemini-pro -> gemini-2.5-pro (function calling test)
  - gemini-pro-vision -> gemini-2.5-flash (vision test)
  - gemini-1.5-pro -> gemini-2.5-pro (response schema test)
  - gemini/gemini-1.5-flash -> gemini/gemini-2.5-flash (function calling test)
  - gemini-1.5-pro -> gemini-2.5-pro (vision gemini test)
  - gpt-4-vision-preview -> gpt-4o (vision test)

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 17:03:54 -07:00