Commit Graph

37248 Commits

Author SHA1 Message Date
Yuneng Jiang dafa1bf97c Merge remote-tracking branch 'origin/litellm_internal_staging' into litellm_yj_apr15
# Conflicts:
#	litellm/litellm_core_utils/litellm_logging.py
#	uv.lock
2026-04-16 09:17:20 -07:00
Sameer Kankute 26937a2146 Merge pull request #25831 from BerriAI/litellm_oss_staging_04_15_2026_p1
litellm oss staging 04/15/2026
2026-04-16 19:53:00 +05:30
Sameer Kankute 4b5c86b8a1 Fix code qa 2026-04-16 19:29:08 +05:30
Sameer Kankute baf19b4413 Fix import error 2026-04-16 19:16:49 +05:30
waani d9a8a8a42e fix(credentials): sync in-memory credential_list after update (#25758) 2026-04-16 19:04:26 +05:30
Tim Ren dd4a41951f fix(utils): allowed_openai_params must not forward unset params as None (#25777)
* feat(proxy): add NO_OPENAPI env var to disable /openapi.json endpoint (#25696)

* feat(proxy): add NO_OPENAPI env var to disable /openapi.json endpoint - Fixes #25538

* test(proxy): add tests for _get_openapi_url

---------

Co-authored-by: Progressive-engg <lov.kumari55@gmail.com>

* feat(prometheus): add api_provider label to spend metric (#25693)

* feat(prometheus): add api_provider label to spend metric

Add `api_provider` to `litellm_spend_metric` labels so users can
build Grafana dashboards that break down spend by cloud provider
(e.g. bedrock, anthropic, openai, azure, vertex_ai).

The `api_provider` label already exists in UserAPIKeyLabelValues and
is populated from `standard_logging_payload["custom_llm_provider"]`,
but was not included in the spend metric's label list.

* add api_provider to requests metric + add test

Address review feedback:
- Add api_provider to litellm_requests_metric too (same call-site as
  spend metric, keeps label sets in sync)
- Add test_api_provider_in_spend_and_requests_metrics following the
  existing pattern in test_prometheus_labels.py

* fix: ensure `litellm_metadata` is attached to `pre_call` guardrail to align with `post_call` guardrail (#25641)

* fix: ensure `litellm_metadata` is attached to pre_call to align with post_call

* refactor: remove unused BaseTranslation._ensure_litellm_metadata

* refactor: module level imports for ensure_litellm_metadata and CodeQL

* fix: update based on Codex comment

* revert: undo usage of `_guardrail_litellm_metadata`

* feat: add pricing entry for openrouter/google/gemini-3.1-flash-lite-preview (#25610)

* fix(bedrock): skip synthetic tool injection for json_object with no schema (#25740)

When response_format={"type": "json_object"} is sent without a JSON
schema, _create_json_tool_call_for_response_format builds a tool with an
empty schema (properties: {}). The model follows the empty schema and
returns {} instead of the actual JSON the caller asked for.

This patch:
- Skips synthetic json_tool_call injection when no schema is provided.
  The model already returns JSON when the prompt asks for it.
- Fixes finish_reason: after _filter_json_mode_tools strips all
  synthetic tool calls, finish_reason stays "tool_calls" instead of
  "stop". Callers (like the OpenAI SDK) misinterpret this as a pending
  tool invocation.

json_schema requests with an explicit schema are unchanged.

Co-authored-by: Claude <noreply@anthropic.com>

* fix(utils): allowed_openai_params must not forward unset params as None

`_apply_openai_param_overrides` iterated `allowed_openai_params` and
unconditionally wrote `optional_params[param] = non_default_params.pop(param, None)`
for each entry. If the caller listed a param name but did not actually
send that param in the request, the pop returned `None` and `None` was
still written to `optional_params`. The openai SDK then rejected it as
a top-level kwarg:

    AsyncCompletions.create() got an unexpected keyword argument 'enable_thinking'

Reproducer (from #25697):

    allowed_openai_params = ["chat_template_kwargs", "enable_thinking"]
    body = {"chat_template_kwargs": {"enable_thinking": False}}

Here `enable_thinking` is only present nested inside
`chat_template_kwargs`, so the helper should forward
`chat_template_kwargs` and leave `enable_thinking` alone. Instead it
wrote `optional_params["enable_thinking"] = None`.

Fix: only forward a param if it was actually present in
`non_default_params`. Behavior is unchanged for the happy path (param
sent → still forwarded), and the explicit `None` leakage is gone.

Adds a regression test exercising the helper in isolation so the test
does not depend on any provider-specific `map_openai_params` plumbing.

Fixes #25697

---------

Co-authored-by: lovek629 <59618812+lovek629@users.noreply.github.com>
Co-authored-by: Progressive-engg <lov.kumari55@gmail.com>
Co-authored-by: Ori Kotek <ori.k@codium.ai>
Co-authored-by: Alexander Grattan <51346343+agrattan0820@users.noreply.github.com>
Co-authored-by: Mohana Siddhartha Chivukula <103447836+iamsiddhu3007@users.noreply.github.com>
Co-authored-by: Amiram Mizne <amiramm@users.noreply.github.com>
Co-authored-by: Claude <noreply@anthropic.com>
2026-04-16 19:04:26 +05:30
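
The last fix in the squash above (forward a param only when it is actually present in `non_default_params`) can be sketched as follows. This is an illustration under assumptions, not the library's code: `apply_param_overrides` and its signature are hypothetical stand-ins for the `_apply_openai_param_overrides` helper named in the message.

```python
from typing import Any, Dict, List


def apply_param_overrides(
    optional_params: Dict[str, Any],
    non_default_params: Dict[str, Any],
    allowed_openai_params: List[str],
) -> Dict[str, Any]:
    """Hypothetical stand-in for the helper described in the commit message."""
    for param in allowed_openai_params:
        # Forward only what the caller actually sent. An allow-listed but
        # absent param must not be written to optional_params as None.
        if param in non_default_params:
            optional_params[param] = non_default_params.pop(param)
    return optional_params


# Reproducer shape from the commit message: enable_thinking appears only
# nested inside chat_template_kwargs, never as a top-level key.
body = {"chat_template_kwargs": {"enable_thinking": False}}
result = apply_param_overrides(
    {}, dict(body), ["chat_template_kwargs", "enable_thinking"]
)
```

With the membership check in place, `result` carries `chat_template_kwargs` and has no top-level `enable_thinking` key at all.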
Brendan Smith-Elion 265a960472 fix(noma-v2): fall back to key_alias for application_id in Noma dashboard (#25795)
Noma v1 resolved application_id from user_api_key_alias when no explicit
value was set (PR #16832). Noma v2 (PR #21400) was rewritten from scratch
and this fallback was not ported, causing all requests from shared LiteLLM
instances to appear as a single generic "litellm" application in the Noma
dashboard — breaking per-user traceability.

Fix: after checking dynamic_params and self.application_id, fall back to
user_api_key_alias from litellm_metadata or metadata. This matches the
pattern used by PromptSecurityGuardrail._resolve_key_alias_from_request_data()
and restores the v1 behavior where each API key gets its own application
entry in the Noma dashboard.

Fixes #25794

Co-authored-by: Brendan Smith-Elion <brendan.smith-elion@arcadia.io>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-16 19:04:24 +05:30
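
The lookup order described in the Noma fix above (dynamic params, then the configured value, then the key alias from `litellm_metadata` or `metadata`) might look roughly like this. The function name, its parameters, and the `"application_id"` key inside `dynamic_params` are assumptions made for the sketch; only `user_api_key_alias`, `litellm_metadata`, and `metadata` come from the commit message.

```python
from typing import Any, Dict, Optional


def resolve_application_id(
    dynamic_params: Dict[str, Any],
    configured_application_id: Optional[str],
    request_data: Dict[str, Any],
) -> Optional[str]:
    """Hypothetical resolver illustrating the fallback chain."""
    # 1. A per-request dynamic parameter wins.
    explicit = dynamic_params.get("application_id")
    if explicit:
        return explicit
    # 2. Then the value configured on the guardrail instance.
    if configured_application_id:
        return configured_application_id
    # 3. Finally fall back to the key alias, checking litellm_metadata
    #    before metadata.
    for field in ("litellm_metadata", "metadata"):
        metadata = request_data.get(field)
        if isinstance(metadata, dict):
            alias = metadata.get("user_api_key_alias")
            if alias:
                return alias
    return None
```

Returning `None` when nothing matches leaves the choice of a generic default to the caller.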
Jared Everett 3cbb36aa13 fix(ollama): propagate done_reason='length' as finish_reason for max_tokens truncation (#25824)
* fix(ollama): propagate done_reason='length' as finish_reason for max_tokens truncation

Ollama returns done_reason='length' when a response is cut off by num_predict
(the max_tokens limit). Previously, non-streaming responses hardcoded
finish_reason='stop', and streaming used chunk.get('done_reason', 'stop')
which also defaulted to 'stop' when done_reason was absent.

This meant callers (e.g. the Anthropic pass-through adapter, which maps
OpenAI 'length' -> Anthropic 'max_tokens') could never detect truncation,
making stop_reason always appear as 'end_turn' even for cut-off responses.

Fix: read done_reason from the response JSON in the non-streaming path and
use `chunk.get('done_reason') or 'stop'` in the streaming path, so Ollama's
actual done_reason passes through to the caller unchanged.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Update test_ollama_chat_transformation.py

* Update litellm/llms/ollama/chat/transformation.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
2026-04-16 19:03:41 +05:30
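
The two spellings quoted in the Ollama fix above differ in one case that is easy to miss: `dict.get(key, default)` applies the default only when the key is missing, while `dict.get(key) or default` also covers a key that is present but `None` or empty. The helper names below are hypothetical and the chunk dicts are made-up examples; whether Ollama ever sends such a value is not stated in the commit.

```python
from typing import Any, Dict, Optional


def finish_reason_with_default(chunk: Dict[str, Any]) -> Optional[str]:
    # The default is used only when "done_reason" is absent from the dict.
    return chunk.get("done_reason", "stop")


def finish_reason_with_or(chunk: Dict[str, Any]) -> str:
    # `or` also replaces a present-but-falsy value (None, "").
    return chunk.get("done_reason") or "stop"


truncated = {"done": True, "done_reason": "length"}
null_reason = {"done": True, "done_reason": None}
missing = {"done": True}
```

Both helpers pass `"length"` through for `truncated` and return `"stop"` for `missing`; only the `or` form returns `"stop"` for `null_reason`.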
Darien Kindlund 6b2973b29a fix(vertex): strip version suffix from model name in count_tokens requests (#25800)
The Vertex AI count-tokens endpoint rejects model names that include
version suffixes (@default, @20251001, etc.) with:
"claude-sonnet-4-6@default is not supported for token counting"

The same model without the suffix ("claude-sonnet-4-6") works correctly.

Strip @suffix from both the model parameter and request_data["model"]
in handle_count_tokens_request before sending to the API.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 19:03:40 +05:30
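
A minimal sketch of the suffix stripping described in the Vertex fix above. `strip_version_suffix` is a hypothetical helper name; the commit only says the `@suffix` is removed inside `handle_count_tokens_request`.

```python
def strip_version_suffix(model: str) -> str:
    """Drop a trailing "@version" marker from a model name, if present."""
    # Keep everything before the first "@"; names without "@" pass through.
    return model.split("@", 1)[0]


request_data = {"model": "claude-sonnet-4-6@default"}
request_data["model"] = strip_version_suffix(request_data["model"])
```

The same call would be applied to both the `model` argument and `request_data["model"]`, since the message says both are cleaned before the request is sent.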
ryan-crabbe-berri ed0138b50e Merge pull request #25812 from BerriAI/litellm_fix-invalidate-orgs-on-team-mutation
fix(ui): invalidate org queries after team mutations
2026-04-15 22:51:20 -07:00
ryan-crabbe-berri 18c93e0ccd Merge pull request #25809 from BerriAI/litellm_fix_tool_test_panel_bool_rendering
fix(ui): use antd Select for MCP ToolTestPanel bool inputs
2026-04-15 22:50:57 -07:00
ryan-crabbe-berri cf4f0516be Merge pull request #25806 from BerriAI/litellm_fix_guardrail_optional_params_bool_rendering
fix(ui): render guardrail optional_params bool defaults in Select
2026-04-15 22:50:20 -07:00
Ryan Crabbe 96415a5ac2 Merge remote-tracking branch 'origin/litellm_internal_staging' into litellm_fix-invalidate-orgs-on-team-mutation 2026-04-15 22:41:38 -07:00
Ryan Crabbe 83095c24c6 Merge remote-tracking branch 'origin/litellm_internal_staging' into litellm_fix_tool_test_panel_bool_rendering 2026-04-15 22:41:09 -07:00
Ryan Crabbe bbf204e602 Merge remote-tracking branch 'origin/litellm_internal_staging' into litellm_fix_guardrail_optional_params_bool_rendering 2026-04-15 22:40:37 -07:00
ryan-crabbe-berri 2dd060b4e4 Merge pull request #25838 from BerriAI/litellm_fix-virtual-key-projected-spend-alert
fix(proxy): fix virtual key projected-spend soft budget alerts
2026-04-15 22:22:33 -07:00
Yuneng Jiang c8cfc5de21 fix(httpx): set response.request and strip content-encoding in MaskedHTTPStatusError
MaskedHTTPStatusError constructs a new httpx.Response from the original
error. Two bugs surfaced under real HTTP error responses:

1. The new Response was created without request=, so response.request
   raised RuntimeError("The .request property has not been set.") for
   any downstream caller (e.g. exception_mapping_utils) that inspected it.

2. The decoded response bytes were passed together with the original
   Content-Encoding header. On construction httpx tried to decompress
   the already-decoded bytes and raised httpx.DecodingError
   ("Error -3 while decompressing data: incorrect header check").

Set response.request to the masked Request and strip Content-Encoding
(and the now-stale Content-Length) before rebuilding the Response.
URL/message masking is unchanged; the new request carries the already
masked URL.

Also update test_logging_key_masking_gemini: the security commit
25f93bed91 moved Gemini API keys from ?key=... URL params to the
x-goog-api-key header, so api_base no longer contains the key.
2026-04-15 22:03:48 -07:00
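
The header half of the MaskedHTTPStatusError fix above can be illustrated without httpx itself. The sketch assumes headers are available as a plain string mapping; `headers_for_decoded_body` is a hypothetical name. The other half of the fix, passing `request=` when constructing the new `httpx.Response`, is not shown here.

```python
from typing import Dict, Mapping

# Headers that describe the original wire encoding rather than the
# already-decoded body being re-wrapped.
_STALE_HEADERS = frozenset({"content-encoding", "content-length"})


def headers_for_decoded_body(headers: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of headers that is safe to pair with decoded bytes."""
    # HTTP header names are case-insensitive, so compare in lowercase.
    return {k: v for k, v in headers.items() if k.lower() not in _STALE_HEADERS}


original = {
    "Content-Type": "application/json",
    "Content-Encoding": "gzip",
    "Content-Length": "128",
}
cleaned = headers_for_decoded_body(original)
```

Dropping `Content-Length` along with `Content-Encoding` matters because the decoded body is generally a different size from the compressed one.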
Ryan Crabbe f639769ca9 fix(proxy): use flat soft_budget field for virtual key projected-spend alerts
The projected-spend alert in _update_key_cache read from
existing_spend_obj.litellm_budget_table["soft_budget"], but the nested
dict is never populated for virtual keys (the combined_view SQL maps
budget fields to flat top-level attributes instead). This made the
check dead code — it silently short-circuited on every request, and
when unblocked, crashed update_cache with a Pydantic ValidationError
because _get_projected_spend_over_limit returns a date object but
CallInfo.projected_exceeded_date expects str.

Fixes: read from the flat existing_spend_obj.soft_budget field that IS
populated, and stringify projected_exceeded_date.

Also marks team soft budget email alerts as enterprise in docs.

Closes #20324
2026-04-15 21:38:18 -07:00
Yuneng Jiang 070374d03a fix(ci): authorize RestrictedPython in liccheck.ini
RestrictedPython (ZPL-2.1, a BSD-style permissive license) was added as
a dependency for the custom_code guardrail sandbox, but the license
checker didn't recognize it. Add to authorized packages list.
2026-04-15 21:20:40 -07:00
Yuneng Jiang fdeeed6df8 fix(ci): resolve mypy and ruff lint failures
- vertex_ai_context_caching.py: add explicit Optional[str] annotation on
  auth_header so later branches that assign vertex_auth_header (Optional[str])
  type-check against the first branch's dict assignment (which already has
  type: ignore[assignment]).
- path_utils.py: remove unused pathlib.Path import (F401).
- emulated_handler.py: extract _extract_tool_call_fields,
  _resolve_queries_from_args, _execute_file_search_tool_calls, and
  _build_follow_up_input helpers to drop aresponses_with_emulated_file_search
  below ruff's PLR0915 statement limit. Behavior unchanged.
2026-04-15 21:12:51 -07:00
yuneng-jiang be1b802501 Merge pull request #25834 from stuxf/fix/path-traversal-guardrail-yaml
fix(proxy): add shared path utilities, prevent directory traversal
2026-04-15 21:01:48 -07:00
yuneng-jiang 0c8b83c0a1 Merge pull request #25827 from stuxf/fix/outbound-host-validation
fix(proxy): harden request parameter handling
2026-04-15 20:57:45 -07:00
user a4faacecaf style: move import os to module level, fix import ordering 2026-04-16 03:29:55 +00:00
user a44c4d0f27 style: fix import ordering in prompt_endpoints 2026-04-16 03:26:31 +00:00
user c2b3b62996 test: add unit tests for path_utils safe_join and safe_filename 2026-04-16 03:25:42 +00:00
user 278e3f4a6b refactor: harden path utils, move imports to module level
Add null byte rejection to safe_join and safe_filename. Normalize
backslash separators in safe_filename for cross-platform safety.
Include resolved path in ValueError for debugging. Move imports
to module level per project conventions.
2026-04-16 03:15:04 +00:00
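
One way helpers with the behavior described above could be written is sketched below. This is not the code in proxy/common_utils/path_utils.py: the function names match the commit message, but the signatures and bodies are assumptions that cover only the stated behavior (null byte rejection, backslash normalization, resolved path in the error).

```python
import os


def safe_filename(name: str) -> str:
    """Reduce a user-supplied name to a single path component."""
    if "\x00" in name:
        raise ValueError("filename contains a null byte")
    # Treat backslashes as separators so Windows-style input is handled
    # the same way on every platform.
    base = os.path.basename(name.replace("\\", "/"))
    if base in ("", ".", ".."):
        raise ValueError(f"invalid filename: {name!r}")
    return base


def safe_join(base_dir: str, *parts: str) -> str:
    """Join parts under base_dir, rejecting results that escape it."""
    if any("\x00" in part for part in parts):
        raise ValueError("path component contains a null byte")
    base = os.path.realpath(base_dir)
    resolved = os.path.realpath(os.path.join(base, *parts))
    # The resolved path must still live under the resolved base directory.
    if os.path.commonpath([base, resolved]) != base:
        raise ValueError(f"path escapes base directory: {resolved}")
    return resolved
```

Comparing resolved paths with `os.path.commonpath` rather than a string prefix avoids accepting a sibling such as `/data/base_evil` for a base of `/data/base`.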
user 9691649606 fix(proxy): add shared path utilities, prevent directory traversal
Add safe_join() and safe_filename() in proxy/common_utils/path_utils.py
for constructing filesystem paths from user-controlled inputs. Apply to
guardrail category YAML endpoint and dotprompt file converter.
2026-04-16 03:11:50 +00:00
ishaan-berri 0b7335201b Merge pull request #25699 from BerriAI/litellm_ishaan_april14
Litellm ishaan april14
2026-04-15 19:01:06 -07:00
Ishaan Jaffer def9c4ec47 chore: merge litellm_internal_staging, resolve uv.lock conflict 2026-04-15 18:51:19 -07:00
Ishaan Jaffer 26136708bb chore: trigger CI re-evaluation 2026-04-15 18:48:13 -07:00
ishaan-berri ae2aba0e15 Merge pull request #25622 from Sameerlite/litellm_docs_cost_discrepancy_guide
docs(troubleshoot): cost discrepancy debugging guide
2026-04-15 18:43:15 -07:00
ishaan-berri a588f76789 Litellm ishaan april15 2 (#25828)
* [Test] Add Azure async chat completion timeout test. WIP

* Capture TTFT for /v1/messages streaming responses

The pass-through streaming path for /v1/messages (Anthropic, Bedrock,
Vertex AI, Azure AI, Minimax) logged completion_start_time only after
the entire stream finished. async_success_handler then fell back to
end_time, making TTFT equal to total duration or null in the UI and
Prometheus.

Record the timestamp of the first chunk in async_sse_wrapper and
propagate it to model_call_details before the logging handler runs,
so gen_ai.response.time_to_first_token reflects the real first-chunk
latency.

Fixes #25598

* [Refactor] Implement timeout resolution logic in completion function

Fetch `request_timeout` from litellm_settings

* remove stale test case

* remove extra print statement

* default request timeout value in constants to 600s to match timeout defaults handled in the proxy

* fix request timeout if using default value from constants.py

* update code structure, test cases

* only override if the global timeout is set to 6000s

* update code structure, move hard-coded values to constants, and make the resolve function readable by moving fallback logic to a separate function

* replace hard-coded timeout values with the defined default constants

---------

Co-authored-by: harish876 <harishgokul01@gmail.com>
Co-authored-by: Joaquin Hui Gomez <joaquinhuigomez@users.noreply.github.com>
2026-04-15 18:42:23 -07:00
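
The TTFT change in the squash above (stamp the first chunk's arrival instead of falling back to the end time) can be sketched with a small async wrapper. The wrapper name, the `call_details` dict, and the fake stream are assumptions for illustration; the commit's own `async_sse_wrapper` and `model_call_details` are not reproduced here.

```python
import asyncio
import time
from typing import Any, AsyncIterator, Dict


async def record_first_chunk_time(
    stream: AsyncIterator[bytes], call_details: Dict[str, Any]
) -> AsyncIterator[bytes]:
    """Pass chunks through unchanged, noting when the first one arrives."""
    async for chunk in stream:
        # Stamp only once, on the first chunk.
        if "completion_start_time" not in call_details:
            call_details["completion_start_time"] = time.monotonic()
        yield chunk


async def _fake_stream() -> AsyncIterator[bytes]:
    # Stand-in for a provider's SSE stream.
    for part in (b"event: a\n\n", b"event: b\n\n"):
        await asyncio.sleep(0.01)
        yield part


async def _demo() -> Dict[str, Any]:
    details: Dict[str, Any] = {"start_time": time.monotonic()}
    chunks = [c async for c in record_first_chunk_time(_fake_stream(), details)]
    details["end_time"] = time.monotonic()
    details["chunks"] = chunks
    return details


details = asyncio.run(_demo())
ttft = details["completion_start_time"] - details["start_time"]
```

Because the timestamp is written while the stream is still being consumed, a logging handler that runs afterwards can read it instead of substituting the end time.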
user 47214be317 fix(proxy): harden request parameter handling
Tighten validation of request body parameters in the proxy routing
layer. Use context variables for internal call state management
instead of passing flags through request kwargs. Clean up metadata
handling at the proxy boundary.
2026-04-16 01:38:12 +00:00
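
The commit above does not say which state moved into context variables, so the following is only a generic sketch of the technique it names: keeping an internal-call flag in a `contextvars.ContextVar` rather than in request kwargs. Every name here (`_is_internal_call`, `internal_call`, `handle_request`) is hypothetical.

```python
import contextvars
from contextlib import contextmanager
from typing import Any, Dict, Iterator

# Hypothetical flag; the real variable names are not given in the commit.
_is_internal_call = contextvars.ContextVar("is_internal_call", default=False)


@contextmanager
def internal_call() -> Iterator[None]:
    """Mark the enclosed code as an internal call, then restore the flag."""
    token = _is_internal_call.set(True)
    try:
        yield
    finally:
        _is_internal_call.reset(token)


def handle_request(body: Dict[str, Any]) -> bool:
    # The flag is read from context, so a client-supplied key in the
    # request body cannot switch it on.
    return _is_internal_call.get()


spoofed = handle_request({"is_internal_call": True})
with internal_call():
    trusted = handle_request({})
after = handle_request({})
```

Only code that runs inside `internal_call()` sees the flag set, and the value is restored on exit even if the wrapped code raises.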
Ishaan Jaffer 9977e63e3c Merge remote-tracking branch 'origin/main' into worktree-foamy-jumping-coral 2026-04-15 18:29:55 -07:00
ishaan-berri 10131374ee Merge pull request #25813 from BerriAI/litellm_ishaan_april15
Litellm ishaan april15
2026-04-15 18:29:22 -07:00
ishaan-berri 7a6b7ade03 Merge pull request #25807 from BerriAI/litellm_fix_provider_headers_in_logging
fix(logging): preserve provider response headers in StandardLoggingPayload
2026-04-15 18:29:03 -07:00
Ishaan Jaffer 537e72c742 style: black format test_mcp_server.py 2026-04-15 18:19:21 -07:00
Ishaan Jaffer fcd71e0026 style: black format test_mcp_server_manager.py 2026-04-15 18:19:17 -07:00
Ishaan Jaffer f768946549 style: black format test_anthropic_common_utils.py 2026-04-15 18:19:12 -07:00
Ishaan Jaffer 9a154a3be7 style: black format test_mcp_sigv4_auth.py 2026-04-15 18:19:08 -07:00
Ishaan Jaffer c8a0fe193f style: black format test_unit_test_caching.py 2026-04-15 18:19:04 -07:00
Ishaan Jaffer 93a90a53be style: black format test_mcp_client.py 2026-04-15 18:19:01 -07:00
Ishaan Jaffer f2a1dbe7c9 style: black format test_health_check_max_tokens.py 2026-04-15 18:18:56 -07:00
Ishaan Jaffer 3847a59d79 style: black format test_model_param_helper.py 2026-04-15 18:18:52 -07:00
Ishaan Jaffer 13952b0b1b style: black format types/mcp_server/mcp_server_manager.py 2026-04-15 18:18:48 -07:00
Ishaan Jaffer 107003a713 style: black format model_param_helper.py 2026-04-15 18:18:45 -07:00
Ishaan Jaffer e5adafc768 style: black format anthropic_messages transformation.py 2026-04-15 18:18:41 -07:00
Ishaan Jaffer 0acd05207b style: black format health_check.py 2026-04-15 18:18:36 -07:00
Ishaan Jaffer 65061b1e3c style: black format mcp server.py 2026-04-15 18:18:33 -07:00
Ishaan Jaffer d8dbb46dcf style: black format mcp_server_manager.py 2026-04-15 18:18:29 -07:00