mirror of
https://github.com/tiennm99/litellm.git
synced 2026-06-17 18:48:36 +00:00
5e2d75d75db745d5ef8164ff29ed10bdbd4877a6
34 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
492891cad8 |
CI: copy of #25177 (OCI GenAI: embeddings, streaming/reasoning fixes, model catalog) (#28223)
* fix(opentelemetry): JSON-serialize dict metadata fields for OTEL span attributes (#27451) (#27455)
Squash-merged by litellm-agent from Anai-Guo's PR.
* feat(dashscope): add embeddings and reranks(qwen3-rerank) support via OpenAI-compatible endpoint (#27508)
Squash-merged by litellm-agent from yimao's PR.
* fix(vertex_ai/gemini): raise BadRequestError when image_url or url fi… (#24550)
Squash-merged by litellm-agent from krisxia0506's PR.
* fix(vertex_ai): raise error on mid-stream 429/error chunks instead of silently swallowing (#23711)
Squash-merged by litellm-agent from krisxia0506's PR.
* fix: raise BadRequestError for file content blocks missing 'file' sub… (#24503)
Squash-merged by litellm-agent from krisxia0506's PR.
* Fix Gemini MIME detection for extensionless GCS URIs (#27278)
Squash-merged by litellm-agent from krisxia0506's PR.
* fix(vertex_ai/partner_models): drop unused vertexai SDK gate from count_tokens (closes #28084) (#28107)
Squash-merged by litellm-agent from voidborne-d's PR.
* feat(chart): add support for autoscaling behavior in HPA (#27990)
Squash-merged by litellm-agent from FabrizioCafolla's PR.
* feat(proxy): add blocked flag to models for pause/resume from the UI (#27927)
Squash-merged by litellm-agent from Cyberfilo's PR.
* fix: pass socket timeouts to Redis cluster clients (#27920)
Squash-merged by litellm-agent from tomdee's PR.
* Fix/cache token (#28009)
Squash-merged by litellm-agent from escon1004's PR.
* fix(deepseek): forward reasoning_content in multi-turn thinking mode conversations (#28080)
Squash-merged by litellm-agent from Divyansh8321's PR.
* fix(guardrails): return HTTP 400 instead of 500 for blocked requests (#27617)
* fix: reset org and tag budgets (#27326)
* reset org budgets
* reset tag budgets
---------
Co-authored-by: Michael Riad Zaky <michaelr@Mac.localdomain>
* fix(ui): omit allowed_routes from key edit save when unchanged (#27553)
* fix(ui): omit allowed_routes from key edit save when unchanged
When a team admin opens Edit Settings on a key with key_type=AI APIs and
saves without changing anything, the UI re-sends the existing allowed_routes
value, which the backend's _check_allowed_routes_caller_permission gate
rejects for non-proxy-admins (LIT-2681).
Strip allowed_routes from the patch in handleSubmit when it deep-equals the
original keyData.allowed_routes. The backend treats absence as "leave alone,"
so no-op saves now succeed for non-admins. Admins explicitly editing the
field still send the new value.
* fix(ui): order-insensitive allowed_routes diff + cover null-original case
Address Greptile review:
- Switch the "is allowed_routes unchanged" check to a Set-based comparison so
a server-side reorder of the array doesn't register as a user edit and
re-trigger LIT-2681.
- Add two regression tests: (1) keyData.allowed_routes is null and the form
is untouched — patch should strip the field; (2) server returned routes in
a different order than the user originally entered — patch should still
recognize the value as unchanged.
* chore(ui): strip ticket refs and tighten comments in key edit fix
- Remove internal-tracker references from in-code comments
- Tighten the WHY comment in handleSubmit to two lines
- Drop redundant test-block comments — test names already describe the case
* fix(ui): annotate Set<string> generic in allowed_routes diff to fix tsc
* fix(guardrails): return HTTP 400 instead of 500 for guardrail-blocked requests
GuardrailRaisedException and BlockedPiiEntityError both lacked a
status_code attribute. When these exceptions reached the proxy
exception handler (getattr(e, 'status_code', 500)), the fallback
defaulted to HTTP 500 — making intentional guardrail blocks
indistinguishable from server errors and causing unnecessary client
retries.
Changes:
- Add status_code=400 (keyword-only) to GuardrailRaisedException
- Add status_code=400 (keyword-only) to BlockedPiiEntityError
- Update _is_guardrail_intervention() to recognize both exceptions
so downstream loggers record 'guardrail_intervened' instead of
'guardrail_failed_to_respond'
- Add 6 unit tests for default/custom status codes and getattr pattern
- Strengthen existing blocked-action test with status_code assertion
Fixes #24348
---------
Co-authored-by: Michael-RZ-Berri <michael@berri.ai>
Co-authored-by: Michael Riad Zaky <michaelr@Mac.localdomain>
Co-authored-by: ryan-crabbe-berri <ryan@berri.ai>
Co-authored-by: Krrish Dholakia <krrish+github@berri.ai>
* fix(router/proxy): address Greptile P1+P2 review comments on PR #28161
- router: raise ServiceUnavailableError (503) instead of RouterRateLimitErrorBasic (429)
when a specifically-addressed deployment is administratively blocked; 429 misleads
retry-enabled clients into spinning forever against a paused model
- proxy_server: compute get_fully_blocked_model_names() once before both branches in
model_list() instead of duplicating the call in each branch
- deepseek: upgrade silent debug log to warning when injecting placeholder
reasoning_content so callers are clearly notified of degraded multi-turn quality
- tests: update two blocked-deployment assertions to expect ServiceUnavailableError
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix: address bug detection findings (cache token order, mutable defaults)
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix: address bugs in async pass-through, anthropic cache token detection, rerank tests
- async_get_available_deployment_for_pass_through: enforce blocked check on specific deployments
- cost_calculator: detect anthropic-style usage by attribute presence (not truthiness) to avoid mixing OpenAI cached_tokens into anthropic normalization when read=0
- dashscope rerank tests: pass request to httpx.Response constructions for consistency
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix code qa
* fix(vertex_ai/gemini): strip MIME parameters from GCS contentType
GCS object metadata's contentType field can include parameters such as
'text/html; charset=utf-8'. Strip them in _apply_gemini_mime_type_aliases
so downstream get_file_extension_from_mime_type sees a bare MIME type.
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(vertex_ai/gemini): clarify mime-type error message string concatenation
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* feat(oci): add embeddings, fix streaming/reasoning, expand model catalog
- Add OCIEmbedConfig with full Cohere embed support (7 models, batch up to 96)
- Fix sync streaming: split SSE events on \n\n before JSON parsing
- Fix reasoning models (Gemini 2.5, xAI Grok): make completionTokens and message
optional in OCIResponseChoice to handle max_tokens exhausted on reasoning
- Fix compartment_id resolution in chat transform to use resolve_oci_credentials
- Fix tool call id: make OCIToolCall.id optional, generate UUID fallback for
providers (Google via OCI) that omit it
- Add OCI_KEY env var support for inline PEM keys
- Fix datetime.utcnow() deprecation in request signing
- Expand model catalog: 29 OCI models including Llama 4, Gemini 2.5, xAI Grok,
Cohere Command A, and all Cohere embed variants
- Add 37 live integration tests: sync/async completions for Meta/Google/xAI/Cohere,
sync/async embeddings, tool use across all vendors, streaming, env var auth
- Add 23 embed unit tests covering all transform and validation paths
* fix(oci): remove dead OCI elif branch in utils.py, align async split_chunks with sync version
* test(oci): add unit tests for split_chunks fix and no-duplicate-OCI-branch guard
* fix(oci): address remaining bugs from issue #25082 — streaming signed body, Cohere stop sequences, hardcoded defaults
- Bug 1: sync and async streaming paths now use signed_json_body when provided
instead of re-serializing data with json.dumps() — the OCI RSA-SHA256 signature
covers the exact request body bytes, so re-serializing produces an invalid sig
- Bug 3: Cohere stop sequences now map to 'stopSequences' (was incorrectly 'stop')
- Bug 4: removed hardcoded Cohere defaults (maxTokens=600, temperature=1, topK=0,
topP=0.75, frequencyPenalty=0) that silently overrode user intent on every call
- Added 6 unit tests covering all three fixes
* fix(oci): comprehensive code quality pass — bugs, tests, schema accuracy
- Fix Cohere tool call IDs (was always call_0; now UUID per call)
- Fix TOOL_CALL finish reason mapping in both sync and streaming paths
- Fix Cohere stop parameter mapping (stop → stopSequences)
- Remove hardcoded Cohere defaults (maxTokens/topK/topP/frequencyPenalty)
- Fix content[0] safety guard against empty content arrays
- Fix streaming signed body used consistently (not re-serialized)
- Raise OCIError (not bare Exception/ValueError) throughout
- Centralize OCI_API_VERSION constant; import uuid at module level
- Fix embed get_complete_url to strip trailing slashes from api_base
- Fix OCIEmbedResponse schema: add inputTextTokenCounts (actual OCI field)
- Fix embed usage computed from inputTextTokenCounts (sum of per-input counts)
- Fix Cohere toolCallId included in tool result messages
- Add OCIToolCall.id as Optional (absent in Google/xAI streaming chunks)
- Update tests to reflect correct behavior (no hardcoded defaults, UUID ids,
deferred credential validation, OCIError vs ValueError, real response schema)
* test(oci): move integration tests to tests/llm_translation/
Addresses greptile P1: tests/test_litellm/ is for mock-only unit tests
(make test-unit target). Real-network OCI tests now live in the correct
location alongside other provider integration tests.
* fix(oci): align types and transformation with official OCI SDK
- Remove OCIVendors.GEMINI — apiFormat="GEMINI" is invalid; all non-Cohere
models use apiFormat="GENERIC"
- Add toolChoice, logitBias, logProbs to OCIChatRequestPayload so params
present in the mapping are no longer silently dropped by Pydantic
- Exclude n→numGenerations from Cohere param map (not a Cohere API field)
- Fix CohereToolResult: change callId/result to call/outputs matching
the OCI SDK's CohereToolResult structure
- Fix CohereToolMessage: replace non-existent toolCallId with toolResults
list; update adapt_messages_to_cohere_standard to build proper tool-result
history entries by resolving tool call name+params from preceding assistant
messages
- Map generic-model stream finish reasons to OpenAI convention
(COMPLETE→stop, MAX_TOKENS→length, TOOL_CALLS→tool_calls), consistent
with the existing Cohere streaming path
- Add optional id field to OCIEmbedResponse so valid API responses
carrying an id are not rejected by the Pydantic model
* fix(oci): use 'output' key in Cohere tool result outputs (matches reference impl)
* fix(oci): port schema/type utilities from langchain-oracle reference impl
- Add resolve_oci_schema_refs: inline $ref/$defs — OCI rejects JSON Schema refs
- Add resolve_oci_schema_anyof: flatten Optional[T] anyOf (Pydantic v2 emits these)
- Add sanitize_oci_schema: strip title, normalise null types, ensure array items
- Add OCI_JSON_TO_PYTHON_TYPES: Cohere expects Python type names (str/int/float),
not JSON Schema names (string/integer/number)
- Add enrich_cohere_param_description: embed enum/format/range/pattern constraints
into description since CohereParameterDefinition has no dedicated fields
- Apply all of the above in adapt_tool_definitions_to_cohere_standard and
adapt_tool_definition_to_oci_standard
- Fix toolChoice conversion: map OpenAI string ('auto','none','required') to OCI
dict form ({"type":"AUTO"} etc.) — the API rejects plain strings
- Update unit test expectations to match correct Python type names and enriched
descriptions
* refactor(oci): split transformation.py into cohere.py and generic.py
transformation.py was 1 243 lines doing too many jobs. Split along the
same boundaries as the langchain-oracle reference (providers/cohere.py,
providers/generic.py):
chat/cohere.py — Cohere message/tool building, response + stream parsing
chat/generic.py — Generic message/tool building, response + stream parsing
transformation.py — thin OCIChatConfig orchestrator + OCIStreamWrapper
Public symbols (OCIChatConfig, OCIStreamWrapper, adapt_messages_to_*,
OCIRequestWrapper, version, …) remain importable from transformation.py
for backward compatibility. OCIStreamWrapper gains delegating shims for
_handle_cohere_stream_chunk and _handle_generic_stream_chunk so existing
test call sites keep working unchanged.
transformation.py: 1 243 → 620 lines
* refactor(oci): principal-level code quality pass
- Remove _extract_text_content duplication — single definition in cohere.py,
imported where needed; instance method on OCIChatConfig eliminated
- Move cryptography imports to module level with _CRYPTOGRAPHY_AVAILABLE flag
and _require_cryptography() guard; no more re-import on every signing call
- Move litellm version import to module level via litellm._version; remove
inline import inside validate_oci_environment
- sign_with_manual_credentials now returns Tuple[dict, bytes] matching
sign_with_oci_signer — asymmetry eliminated, Optional[bytes] guards removed
throughout stream wrappers (signed_json_body: bytes = b"")
- Rename _openai_to_oci_cohere_param_map → openai_to_oci_cohere_param_map
for consistency with openai_to_oci_generic_param_map
- Remove double-key bug in map_openai_params where responseFormat was stored
under both OCI and OpenAI key names simultaneously
- Remove delegating shims (adapt_messages_to_cohere_standard,
adapt_tool_definitions_to_cohere_standard, _handle_generic_stream_chunk)
from OCIChatConfig/OCIStreamWrapper; tests now import directly from
cohere.py and generic.py where symbols live
- Trim __all__ to 7 genuine public symbols; remove the 13-symbol list that
existed only to support test imports
- Collapse per-model integration test classes into pytest.mark.parametrize;
CHAT_MODELS list is the single source of truth for model-specific config
- Black + Ruff clean across all OCI files
* fix(oci): address PR review findings
- types/llms/oci.py: add "TOOL_CALL" to CohereChatResponse.finishReason
Literal so Pydantic does not raise ValidationError on non-streaming
Cohere tool-use calls (Greptile P1)
- test_oci_cohere_tool_calls.py: add test covering TOOL_CALL finish reason
- model_prices_and_context_window.json: remove 6 duplicate oci/cohere.embed-*
keys that were silently overridden by the more complete entries already
present in the file (Greptile P1)
- common_utils.py: move OCI_API_VERSION here from chat/transformation.py
so embed/transformation.py does not need to import chat/transformation;
change Protocol stub body from ... to pass (CodeQL "statement no effect");
add comment to sha256_base64 clarifying it implements OCI HTTP signing
spec, not password hashing (CodeQL false positive)
- chat/transformation.py: import CustomStreamWrapper from
litellm_core_utils.streaming_handler instead of litellm.utils to reduce
import cycle depth (CodeQL cyclic import)
- chat/cohere.py, chat/generic.py: import Usage and
ChatCompletionMessageToolCall from litellm.types.utils instead of
litellm.utils for the same reason
- embed/transformation.py: import OCI_API_VERSION from common_utils
instead of chat/transformation (removes the embed→chat import edge)
* test(oci): add unit tests to improve patch coverage
- test_oci_common_utils.py (new): covers sha256_base64, build_signature_string,
OCIRequestWrapper.path_url, resolve_oci_credentials, get_oci_base_url,
validate_oci_environment, sign_with_oci_signer error paths, sign_oci_request
routing, load_private_key_from_file error paths, resolve_oci_schema_refs
(including circular ref and external $ref), resolve_oci_schema_anyof,
sanitize_oci_schema (all branches), enrich_cohere_param_description
- test_oci_generic_chat.py (new): covers content-message error paths (non-dict
item, unsupported type, non-string text, invalid image_url), tool-call
validation error paths, adapt_messages_to_generic_oci_standard error paths,
handle_generic_response (None message, text content, tool calls),
handle_generic_stream_chunk (finish reasons, streaming tool calls),
OCIStreamWrapper non-string chunk error
- test_oci_chat_transformation.py: add error paths for validate_environment
(empty messages), transform_request (missing compartment_id, Cohere without
user messages), transform_response (error key), map_openai_params
(unsupported param with and without drop_params), tool_choice string mapping
- test_oci_cohere_tool_calls.py: add edge cases for stream chunk finish
reasons (TOOL_CALL, MAX_TOKENS, unknown), _extract_text_content with
non-dict list items and non-string input,
adapt_messages_to_cohere_standard with malformed JSON tool arguments
* fix(oci): rename supports_streaming to supports_native_streaming in model prices
The JSON schema for model_prices_and_context_window.json uses
`supports_native_streaming` (not `supports_streaming`) and has
`additionalProperties: false`. Rename the field across all OCI
entries to pass the schema validation test.
* test(oci): add 67 tests targeting uncovered happy paths for coverage
Boost patch coverage on the four lowest-coverage OCI files:
- common_utils.py: sign_with_manual_credentials (oci_key / oci_key_file
paths), sign_oci_request routing, _require_cryptography
- generic.py: adapt_messages_to_generic_oci_standard (all roles),
adapt_tool_definition_to_oci_standard, adapt_tools_to_openai_standard,
handle_generic_stream_chunk text/finish-reason paths
- cohere.py: _extract_text_content, adapt_messages_to_cohere_standard
(all roles including tool results), handle_cohere_response /
handle_cohere_stream_chunk all finish-reason branches
- transformation.py: get_vendor_from_model, OCIChatConfig._get_optional_params
(toolChoice string→dict, responseFormat, tools for both vendors),
transform_request for GENERIC model, get_sync/async_custom_stream_wrapper
with mocked HTTP, OCIStreamWrapper.chunk_creator happy paths
* fix(oci): suppress CodeQL false positive on sha256_base64 (OCI HTTP signing, not password hashing)
* fix(oci): remove 6 duplicate model price entries and reconcile conflicting values
Six OCI chat model keys appeared twice in model_prices_and_context_window.json
with conflicting pricing/context data (JSON parsers silently discard the first).
Remove the first-occurrence entries and update the surviving entries:
- meta.llama-4-maverick / llama-4-scout: keep updated entries (free preview
pricing, larger context windows, vision support)
- meta.llama-3.1-70b: keep original pricing, restore supports_native_streaming
- google.gemini-2.5-{flash,pro,flash-lite}: keep OCI pricing page values,
restore supports_native_streaming
* fix(oci): route GPT-5 family to maxCompletionTokens
GPT-5 / GPT-5-mini / GPT-5-nano / GPT-5.5 on OCI reject "maxTokens"
with HTTP 400:
Invalid 'maxTokens': Unsupported parameter: 'maxTokens' is not
supported with this model. Use 'maxCompletionTokens' instead.
(Same convention as OpenAI's reasoning-API contract.)
Add a model-aware rename in OCIChatConfig._get_optional_params so the
request payload uses maxCompletionTokens when the model id starts with
openai.gpt-5. Regular Llama / Cohere / Gemini / GPT-4.x continue to use
maxTokens unchanged.
Also widen OCIChatRequestPayload to carry the new optional field so it
survives Pydantic serialization.
Verified live against OCI us-chicago-1:
- openai.gpt-5, gpt-5-mini, gpt-5-nano, gpt-5.5 all return 200
- Full feature sweep on gpt-5.5 (basic, system, multi-turn, streaming,
tools, usage) all green
- meta.llama-3.3-70b-instruct still uses maxTokens (no regression)
4 new unit tests cover the helper, the routing in both pre- and
post-translation states, and Pydantic serialization.
* ci(oci): fix CI failures — black formatting + recursive_detector ignore
- Run black on litellm/llms/oci/common_utils.py + 3 OCI test files
that drifted out of black-compliance during the rebase.
- Add the three bounded recursive functions in oci/common_utils.py
(`_resolve`, `resolve_oci_schema_anyof`, `sanitize_oci_schema`) to
the recursive_detector IGNORE_FUNCTIONS list. All three are bounded:
`_resolve` uses a `resolving_stack` cycle guard; the other two are
bounded by JSON-schema tree depth (no cycles in well-formed input),
matching the pattern of the existing OCI/Vertex schema walkers
already on the list.
* fix(oci): silence MyPy errors in cohere.py — typed-dict access
Two errors flagged by `lint` CI:
llms/oci/chat/cohere.py:73: "object" has no attribute "__iter__"
llms/oci/chat/cohere.py:119: No overload variant of "get" of "dict"
matches argument types "object", "CohereToolCall"
Both stem from `msg.get("tool_calls")` / `msg.get("tool_call_id")`
returning `object` per the AllMessageValues TypedDict union. Bind to
`Any` locally for the iteration and coerce the lookup key with `str()`,
removing the now-unused `# type: ignore` on those lines.
No behaviour change — pure type-narrowing for the type checker.
* fix(oci): silence CodeQL py/weak-sensitive-data-hashing on sha256_base64
CodeQL's taint analysis traces request bodies back to environment-loaded
secrets and flags `hashlib.sha256(body).digest()` as
`py/weak-sensitive-data-hashing` — even though SHA-256 is the algorithm
mandated by the OCI HTTP request signing spec for the
`x-content-sha256` header (not a password/secret hash).
The previous suppression used legacy `# lgtm[...]` syntax which the
modern CodeQL action ignores. Switch to Python's standard
`hashlib.sha256(..., usedforsecurity=False)` (Python 3.9+) which CodeQL
honours as a non-security declaration. Behaviour unchanged.
* feat(oci): add reasoning_effort passthrough — only true missing primitive
OCI's GenericChatRequest exposes a reasoningEffort field
(NONE/MINIMAL/LOW/MEDIUM/HIGH) that's the single biggest cost knob for
reasoning-capable models on the service:
- GPT-5 family
- Gemini 2.5
- Grok reasoning variants (3-mini, 4-fast, 4.20)
- Cohere Command-A-Reasoning
Setting reasoning_effort=LOW typically cuts reasoning-token spend 5-10×
vs the default. Without exposing this, litellm users had no way to tune
cost-vs-quality on these models.
The other GenericChatRequest fields (verbosity, parallel_tool_calls,
logit_bias, n, metadata, web_search_options, prediction) are not
exposed because they are not missing primitives — they either duplicate
prompt-engineering, framework-level controls, or are too niche to
justify the maintenance surface. We only ship what users genuinely
can't accomplish another way.
Excluded from the Cohere v1 param map: CohereChatRequest has no
reasoningEffort field, and Cohere reasoning models
(cohere.command-a-reasoning) use COHEREV2 which is a separate request
type not covered by this PR.
Verified live: GPT-5.5 + reasoning_effort="HIGH" sends
{"reasoningEffort": "HIGH"} on the wire and OCI accepts the request.
* feat(oci): reasoning_effort + reasoning_tokens for OCI GenAI
Three small additions for OCI reasoning models, requested by users
testing the PR in production fork builds:
1. **reasoning_effort param mapping (GENERIC vendors).** OCI expects
uppercase levels ("LOW"/"MEDIUM"/"HIGH"/"NONE") on `reasoningEffort`,
but OpenAI-compatible clients send lowercase. Mapped + uppercased in
`_get_optional_params`. Marked unsupported on Cohere V1/V2 since OCI
Cohere has no reasoning models (avoids Pydantic validation failure
on CohereChatRequest).
2. **"disable" → "NONE" mapping.** OpenAI uses "disable" to turn off
reasoning; OCI uses "NONE". Without this, callers get a 400.
3. **reasoning_tokens propagated to Usage.** OCI returns
`completionTokensDetails.reasoningTokens` but it wasn't being passed
to LiteLLM's Usage object. Now flows through to
`Usage.completion_tokens_details.reasoning_tokens` so callers can
track reasoning token consumption for cost/observability.
Tests: 7 new unit tests in TestOCIReasoningEffort covering upper/lower
case, "disable"→"NONE", Cohere drop/raise paths, and reasoning_tokens
extraction (with and without completionTokensDetails). 5 new live
integration tests against xai.grok-3-mini in us-chicago-1 verifying the
full request/response loop end-to-end. Existing
test_transform_response_simple_text assertion that
completion_tokens_details was None has been updated to assert
reasoning_tokens flows through.
Verified live on xai.grok-3-mini: reasoning_effort=low → OCI accepts
"LOW", returns reasoningTokens=316 in usage. reasoning_effort=disable
→ OCI accepts "NONE". Full suite: 370/370 unit + 51/51 integration.
* fix(codeql): re-scope py/weak-sensitive-data-hashing exclusion to OCI signing file
CodeQL's taint analysis re-fires the `py/weak-sensitive-data-hashing`
alert at `litellm/llms/oci/common_utils.py:103` whenever upstream code
paths into the OCI signing module change (touching `transformation.py`
opens new flow paths that CodeQL re-evaluates from scratch). The
`hashlib.sha256(..., usedforsecurity=False)` declaration silences the
direct-call form of the query but not the taint-flow form.
SHA-256 here is mandated by the OCI HTTP signing specification for the
x-content-sha256 content-integrity header — not for password storage:
https://docs.oracle.com/en-us/iaas/Content/API/Concepts/signingrequests.htm
CodeQL has no per-query path filter and GitHub Code Scanning ignores
inline lgtm/codeql comments, so path-ignoring this single ~560-line
signing utility file is the narrowest available suppression. All other
files retain full coverage of py/weak-sensitive-data-hashing — including
litellm/proxy/utils.py where the rule legitimately applies.
This restores the NEUTRAL CodeQL state the PR had on prior commits
(see `2111c98af7` for the same approach on the previous branch
evolution that the cherry-pick was rebased onto a different baseline).
* fix(oci): drop duplicate text on Cohere streaming terminal chunk
OCI Cohere's terminal SSE event re-sends the full assembled response in
`text` alongside a populated `chatHistory`. Emitting that text as another
delta concatenates the entire response onto the already-streamed output
(e.g. "How can I help?How can I help?").
Use `chatHistory is not None` as the discriminator for the consolidated
terminal event — `finishReason` is a weaker signal that could in principle
appear on a non-consolidated chunk. The two coincide today; this preserves
correctness if OCI ever ships finishReason on an incremental chunk.
Adds a live-OCI integration regression test that compares streamed vs
non-streamed length and asserts the response prefix appears only once.
Verified to fail under the previous code with the exact reported
reproduction: 'Hello! How can I help you today?Hello! How can I help you today?'.
Reported by @gotsysdba on PR #25177.
* fix(oci): buffer SSE stream across HTTP read boundaries
The old split_chunks helper split each individual HTTP read on "\n\n",
which assumed SSE event boundaries always aligned with read boundaries.
In practice the OCI streaming endpoint delivers events that may:
- straddle two reads (chunk_creator gets a truncated JSON and crashes)
- arrive separated by a single "\n" instead of "\n\n"
- share a read with multiple complete events
Replace the inline split with module-level helpers _iter_sse_events
(sync) / _aiter_sse_events (async) that maintain a buffer across reads,
split on any newline, and yield only complete "data:" lines.
Add 25 regression tests covering event-split-across-reads, tiny-chunk
reads, single-newline separators, keepalive/comment lines, trailing
partial events flushed at EOF, "\r\n" line endings, and an end-to-end
smoke test that feeds an awkwardly-chopped payload through the splitter
into OCIStreamWrapper.chunk_creator.
Reported by John Lathouwers.
* test(oci): repoint TestOCIKeyNormalization to sign_with_manual_credentials
The signing helper moved from OCIChatConfig._sign_with_manual_credentials
to a module-level sign_with_manual_credentials in common_utils.py. Four
tests in TestOCIKeyNormalization still called the old method:
- 2 failed outright with AttributeError
- 2 passed by accident because they used pytest.raises(Exception),
which happily caught the AttributeError instead of exercising the
intended OCIError path
Repoint all four to the new module-level function so they exercise the
actual oci_key type-validation branch.
* fix(oci): validate oci_region before URL interpolation to prevent SSRF
Anchor oci_region to ^[a-z][a-z0-9-]{0,30}[a-z0-9]$ inside get_oci_base_url
so user-supplied regions that would redirect the signed request to an
attacker-controlled host (e.g. 'evil.com/#') fail with HTTP 400 before
the URL or signature is built. Empty string still falls back to the
us-ashburn-1 default, so existing callers are unaffected.
* test(audio): skip when gpt-4o-audio-preview is unavailable upstream
OpenAI retired `gpt-4o-audio-preview` (404 model_not_found in CI as of
2026-05-19), and the existing try/except in these tests only re-raised
on 'openai-internal' errors. Other exceptions were silently swallowed,
so the next line ran with an unbound `response`/`completion` and
failed with an unrelated UnboundLocalError that masked the real cause.
Extend the skip condition to also cover model_not_found / 'does not exist'
so the suite reports the upstream outage cleanly, matching the pattern
used in
|
||
|
|
4d92bc8b86 |
fix(vector-stores): re-raise HTTPException from get_vector_store_info; allowlist recursion
Two issues from the previous push's review:
1. **Greptile P1**: ``get_vector_store_info`` had the same catch-all
``except Exception`` pattern as ``update_vector_store``, so the
HTTPException(403/404) raised by both the in-memory access check and
the new ``_fetch_and_authorize_vector_store`` helper was rewritten as
500. Mirror the ``except HTTPException: raise`` guard from
``update_vector_store``.
2. **code-quality CI** (``tests/code_coverage_tests/recursive_detector.py``)
flagged ``_redact_sensitive_litellm_params`` as an unallowlisted
recursive function. Match the convention of other allowlisted
helpers ("max depth set"): bound recursion at depth 10 (well above
any plausible nesting level for real ``litellm_params`` payloads),
return the redaction sentinel on overflow, and add the function
name to ``IGNORE_FUNCTIONS``.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
||
|
|
e7f4e77af0 |
Bound _get_masked_values recursion depth
Add _depth/_max_depth guards (default 20) so the nested dict masking cannot run away, and allowlist the function in the recursive_detector CI check alongside the other bounded recursive helpers. |
||
|
|
a93c069dd5 |
[Fix] Add max_depth guard to BFL _read_image_bytes recursive function
Use the standard depth/max_depth pattern with DEFAULT_MAX_RECURSE_DEPTH to guard the recursive list-unwrapping in _read_image_bytes, matching the existing pattern used by _read_all_bytes in vertex_imagen. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
add3183308 | IGNORE_FUNCTIONS | ||
|
|
ebce0e5f8c | [Release - 02/10/2026] v1.81.10-nightly | ||
|
|
7056d9984e |
Custom Code Guardrails UI Playground (#20377)
* feat(guardrails/): allow custom code execution for guardrails first step in allowing teams to submit custom code for guardrails * feat: custom_code_guardrail.md support passing custom code for guardrails * feat: initial commit adding ui for custom code guardrails allows users to write guardrails based on custom code * feat: expose new test custom code guardrail endpoint allows ui testing playground to sanity check if guardrail is working as expected * fix: fix linting errors * fix: fix max recursion check * fix: fix linting error |
||
|
|
9ed11c5cdf |
[Feat] Allow calling A2A agents through LiteLLM /chat/completions API (#20358)
* init A2AConfig * add transform files * feat: A2A * feat A2AConfig * fix get_secret_str * init: A2AConfig * init A2AConfig common utils * A2AConfig * test_a2a_completion_async_non_streaming * fix * Update litellm/main.py Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> * add multi part conversation support * extract_text_from_a2a_message --------- Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> |
||
|
|
c23e4b87dc |
[Feat] New LiteLLM Policy engine - create policies to manage guardrails, conditions - permissions per Key, Team (#19612)
* init PolicyMatcher * TestPolicyMatcherGetMatchingPolicies * TestPolicyMatcherGetMatchingPolicies * feat: init PolicyResolver * init resolver types * init policy from config * inint PolicyValidator * validate policy * init Architecture Diagram * test_add_guardrails_from_policy_engine * init _init_policy_engine * test updates * test fixws * new attachment config * simplify types * TestPolicyResolverInheritance * fix policy resolver * fix policies * fix applied policy * docs fix * docs fix * fix linting + QA checks * fix linting + QA fixes * test fixes |
||
|
|
e3fe02148d | _mask_sequence | ||
|
|
a7da4833da | [Fix] CI/CD - check_code_and_doc_quality (#18560) | ||
|
|
892d7e8d70 |
[Fix] CI/CD - Fix Bedrock tool calling test failures with non-serializable objects and internal parameters (#17930)
* fix(bedrock): filter non-serializable objects from request params - Enhanced filter_exceptions_from_params() to filter callable objects (functions) and Logging objects - Applied filtering in Bedrock's _prepare_request_params() before deepcopy - Applied filtering to additional_request_params before JSON serialization - Prevents TypeError during deepcopy (APIConnectionError objects) and JSON serialization (functions, Logging objects) - Fixes test_bedrock_tool_calling test failures Root cause: MCP-related functions (handle_chat_completion_with_mcp, completion_callable) and litellm_logging_obj were incorrectly added to optional_params via add_provider_specific_params_to_optional_params(), which then ended up in additional_request_params. These objects should be in litellm_params, not optional_params. * fix(bedrock): filter internal MCP parameters from API requests Filter out LiteLLM internal/MCP-related parameters (skip_mcp_handler, mcp_handler_context, _skip_mcp_handler) from additional_request_params before sending to Bedrock API to prevent 'extraneous key' errors. - Added filter_internal_params() helper function in core_helpers.py - Applied filtering in Bedrock's _prepare_request_params() method - Fixes test_bedrock_completion.py::test_bedrock_tool_calling * fix: mypy type error * fix: add filter_exceptions_from_params to recursive function ignore list - Add filter_exceptions_from_params to IGNORE_FUNCTIONS in recursive_detector.py - Function is safe: has max_depth parameter (default 20) to prevent infinite recursion |
||
|
|
b39c21d90c |
fix: Add _delete_nested_value_custom to recursive function ignore list
The _delete_nested_value_custom function is recursive but has bounded depth (limited by the number of path segments), preventing infinite recursion. This is necessary for nested field removal in additional_drop_params. |
||
|
|
5c192a23c3 |
[Feat] Add new RAG API on LiteLLM AI Gateway (#17109)
* init RAG api types * add RAG endpoints * init main.py for RAG ingest API * init RecursiveCharacterTextSplitter * add BaseRAGIngestion * fix OpenAIRAGIngestion * fix img handler * init OpenAIRAGIngestion * init BedrockRAGIngestion * init BedrockRAGIngestion * init rag tests * init BedrockVectorStoreOptions * implement BedrockRAGIngestion * add BaseRAGAPI * add endpoint for RAG ingest * add ingest RAG endpoints * add test doc * add parse_rag_ingest_request * update endpoints * docs add docs for new RAG API * fix qa check * fix linting * docs ficx * docs * add max depth checks * docs anthropic |
||
|
|
4b25398afe |
[Infra] CI/CD Fixes (#16937)
* Attempt CI/CD Fix * Adding test for coverage * Adding max depth to copilot and vertex * Fixing mypy lint and docker database * Fixing UI build issues * Update playwright test |
||
|
|
d32890ba55 | fix _redact_base64 | ||
|
|
f1699d4cbf | fix IGNORE_FUNCTIONS | ||
|
|
4c7b3e10ed |
[Bug Fix] Gemini Tool Calling - fix gemini empty enum property (#14155)
* fix: _convert_schema_types * fix recursive detector * test_convert_schema_types_type_array_conversion * fix: DEFAULT_NUM_WORKERS_LITELLM_PROXY * add _fix_enum_empty_strings * test_tool_call_with_empty_enum_property * test_fix_enum_empty_strings * fix _fix_enum_empty_strings |
||
|
|
2331fb45d5 |
[Bug]: Gemini 2.5 Pro – schema validation fails with OpenAI-style type arrays in tools (#14154)
* fix: _convert_schema_types * fix recursive detector * test_convert_schema_types_type_array_conversion * fix: DEFAULT_NUM_WORKERS_LITELLM_PROXY |
||
|
|
621b3dca7b |
[Bug Fix] Mistral Tool Calling - Grammar error: at 3(11): failed to compile JSON schema (#13389)
* test_claude_tool_use_with_gemini * add _remove_json_schema_refs * add _clean_tool_schema_for_mistral * fixes mistral tool calls * _remove_json_schema_refs * fix - vertex, remove hardcoded test |
||
|
|
e7f1fa26ab |
UI - Azure Content Guardrails (#12341)
* build(model_prices_and_context_window.json): remove 'supports_tool_choice' for specific mistral models Closes https://github.com/BerriAI/litellm/issues/11750 * feat: initial commit adding cleaner ui for azure text moderation guardrails * feat(guardrail_endpoints.py): add discoverable guardrail configs and improve converting base model to dict with types * fix(guardrail_provider_fields.tsx): render from api endpoint correctly * fix(guardrail_provider_fields.tsx): cleanup * refactor(guardrail_endpoints.py): refactor to handle dictionaries with literal - allows multiselect * feat(ui/): render dictionary with known keys correctly * feat(ui/): render optional params on separate page * style(ui/): style improvements to rendering optional params on the UI * feat(azure/prompt_shield.py): add azure prompt shield back on UI * fix(add_guardrail_form.tsx): fix form to handle updated api * fix(guardrail_optional_params.tsx): ensure values are nested correctly for writing to api * fix: fix linting error * feat(text_moderation.py): handle str to int conversion * fix(guardrail_info.tsx): only render pii settings if guardrail is presidio * fix(guardrail_info.tsx): add guardrail specific fields to update settings allows updating guardrail fields (e.g. severity threshold) post-create * fix(guardrail_endpoints.py): set guardrail_id in guardrail object ensures duplicate objects not created on guardrail update * fix(guardrail_endpoints.py): allow provider specific fields to be updated on patch update * refactor(guardrail_endpoints.py): remove duplicate info endpoint * fix(guardrail_endpoints.py): mask sensitive keys on returning via guardrail `/info` Prevent leaking keys * fix(guardrail_optional_params.tsx): return numerical input when numerical component used fixes issue where output was a str * fix(guardrail_optional_params.tsx): render dict keys correctly * fix(text_moderation.py): fix severity by category check * fix(proxy/utils.py): check if guardrail should run for post call streaming hook Prevents invalid guardrails from running if not requested * test: fix import * fix: fix linting error * test: update test * fix: fix tests * fix: fix code qa errors * fix(guardrail_endpoints.py): set max depth for function * test: update recursive_detector.py * test: update list * build: merge main * fix: fix ruff check errors |
||
|
|
f1c7024e70 |
[Feat] Add Bridge from generateContent <> /chat/completions (#12081)
* add GenerateContentToCompletionHandler * working - non streaming bridge * add GoogleGenAIAdapter * add google gen ai adapter * working streaming bridge * working streaming usage for adapter * tool calling transform for generate content to openai * fixes for accumulating tool calls * fix code qa checks * Best Practices for Production * fix code qa checks * test_streaming_partial_tool_calls_accumulation * linting fixes * add supported_openai_chat_completion_params * fix translate_generate_content_to_completion * test_google_genai_adapter.py |
||
|
|
06484f6e5a |
Xai, VertexAI, Google AI Studio - live web search support in OpenAI format (#11251)
* build(model_prices_and_context_window.json): fix 'supports_web_search' flag - openai only supports it on 2 models - gpt-4o-search-preview and gpt-4o-mini-search-preview
* feat(xai/chat): add xai web search options param support
* test: add max tokens to test
xai output very verbose
* build(xai/): add web search support for all xai models
* build(model_prices_and_cost.json): add gemini-2.0 supports web search
* feat(gemini/): map openai 'web_search_options' to google's 'googlesearch' tool
* build(model_prices_and_context_window.json): add supports_web_search for vertex_ai/gemini-2 models
* fix: fix circular reference error
* fix(convert_dict_to_response.py): handle scenario where xai returns finish reason as 'stop' for tool calls
* fix: reduce function size
* fix: import session handling
* Revert "fix: import session handling"
This reverts commit
|
||
|
|
fdfef04d93 |
Gemini Multimodal Live API support (#10841)
* fix: initial commit * refactor(gemini/realtime/transformation.py): initial instrumentation * fix: fix default api base if not set * feat(gemini/): passes initial user request to backend * feat(realtime_streaming.py): support transforming message input before sending it enables gemini realtime streaming to be sent in correct format * feat: initial working commit of setup message being sent + working * fix(gemini/): initial commit supporting realtime response transformation * feat(gemini/realtime): transform session.created event correctly * fix(realtime_streaming.py): more gemini/realtime response mapping - handle new message * test(gemini/realtime/test_): add more unit tests * feat(gemini/realtime): handles consecutive deltas * feat(gemini/realtime): support openai 'response.text.done' event * feat(gemini/realtime): add openai 'response.text.done' and 'response.content_part.done' event support * fix(gemini/realtime): add openai 'response.done' event support unified realtime api support * fix: fix linting errors * fix: fix linting error * fix: fix linting error * fix: handle infinite loop * fix: fix linting error * fix: fix recursive detector * fix: fix file * fix: fix linting error * fix: fix linting error |
||
|
|
7210b713dc |
Add target model name validation (#10722)
* fix(auth_checks.py): enforce auth checks on target model names ensures user has access to models they are trying to call * test(test_auth_utils.py): add unit tests for auth check * fix(exception_mapping_utils.py): handle mistral 429 exception * fix: fix linting error * fix(auth_checks.py): add max fallback depth |
||
|
|
bf9382a182 |
Handle more gemini tool calling edge cases + support bedrock 'stable-image-core' (#10351)
* test(test_amazing_vertex_completion.py): try to repro https://github.com/BerriAI/litellm/issues/10319 * fix(common_utils.py): handle edge case on tools Fixes https://github.com/BerriAI/litellm/issues/10319 * test: add unit testing for infinite loops * fix(amazon_stability3_transformation.py): support 'stable-image-core' transformation Fixes https://github.com/BerriAI/litellm/issues/8488 * test: add unit testing for stable image core model * test: update test |
||
|
|
fdfa1108a6 |
Add property ordering for vertex ai schema (#9828) + Fix combining multiple tool calls (#10040)
* fix #9783: Retain schema field ordering for google gemini and vertex (#9828) * test: update test * refactor(groq.py): initial commit migrating groq to base_llm_http_handler * fix(streaming_chunk_builder_utils.py): fix how tool content is combined Fixes https://github.com/BerriAI/litellm/issues/10034 * fix(vertex_ai/common_utils.py): prevent infinite loop in helper function * fix(groq/chat/transformation.py): handle groq streaming errors correctly * fix(groq/chat/transformation.py): handle max_retries --------- Co-authored-by: Adrian Lyjak <adrian@chatmeter.com> |
||
|
|
87733c8193 |
Fix anthropic prompt caching cost calc + trim logged message in db (#9838)
* fix(spend_tracking_utils.py): prevent logging entire mp4 files to db Fixes https://github.com/BerriAI/litellm/issues/9732 * fix(anthropic/chat/transformation.py): Fix double counting cache creation input tokens Fixes https://github.com/BerriAI/litellm/issues/9812 * refactor(anthropic/chat/transformation.py): refactor streaming to use same usage calculation block as non-streaming reduce errors * fix(bedrock/chat/converse_transformation.py): don't increment prompt tokens with cache_creation_input_tokens * build: remove redisvl from requirements.txt (temporary) * fix(spend_tracking_utils.py): handle circular references * test: update code cov test * test: update test |
||
|
|
8e3c7b2de0 |
fix(vertex_ai.py): move to only passing in accepted keys by vertex ai response schema (#8992)
* fix(vertex_ai.py): common_utils.py move to only passing in accepted keys by vertex ai prevent json schema compatible keys like $id, and $comment from causing vertex ai openapi calls to fail * fix(test_vertex.py): add testing to ensure only accepted schema params passed in * fix(common_utils.py): fix linting error * test: update test * test: accept function |
||
|
|
1f2bbda11d | Add recursion depth to convert_anyof_null_to_nullable, constants.py. Fix recursive_detector.py raise error state | ||
|
|
fff15543d9 |
(UI + Proxy) Cache Health Check Page - Cleanup/Improvements (#8665)
* fixes for redis cache ping serialization * fix cache ping check * fix cache health check ui * working error details on ui * ui expand / collapse error * move cache health check to diff file * fix displaying error from cache health check * ui allow copying errors * ui cache health fixes * show redis details * clean up cache health page * ui polish fixes * fix error handling on cache health page * fix redis_cache_params on cache ping response * error handling * cache health ping response * fx error response from cache ping * parsedLitellmParams * fix cache health check * fix cache health page * cache safely handle json dumps issues * test caching routes * test_primitive_types * fix caching routes * litellm_mapped_tests * fix pytest-mock * fix _serialize * fix linting on safe dumps * test_default_max_depth * pip install "pytest-mock==3.12.0" * litellm_mapped_tests_coverage * add readme on new litellm test dir |
||
|
|
f651d51f26 |
Litellm dev 02 07 2025 p2 (#8377)
* fix(caching_routes.py): mask redis password on `/cache/ping` route * fix(caching_routes.py): fix linting erro * fix(caching_routes.py): fix linting error on caching routes * fix: fix test - ignore mask_dict - has a breakpoint * fix(azure.py): add timeout param + elapsed time in azure timeout error * fix(http_handler.py): add elapsed time to http timeout request makes it easier to debug how long request took before failing |
||
|
|
53a3ea3d06 |
(Refactor) Langfuse - remove prepare_metadata, langfuse python SDK now handles non-json serializable objects (#7925)
* test_langfuse_logging_completion_with_langfuse_metadata * fix litellm - remove prepare metadata * test_langfuse_logging_with_non_serializable_metadata * detailed e2e langfuse metadata tests * clean up langfuse logging * fix langfuse * remove unused imports * fix code qa checks * fix _prepare_metadata |
||
|
|
dd385410df |
(Code quality) - Ban recursive functions in codebase (#7910)
* code qa add RecursiveFunctionFinder * test_recursive_detector * RecursiveFunctionFinder * fix check * recursive_detector |