mirror of
https://github.com/tiennm99/litellm.git
synced 2026-06-17 20:48:32 +00:00
8f41fdbd218341fa4fec33e6df2d6a4e8cd65a8f
11 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
492891cad8 |
CI: copy of #25177 (OCI GenAI: embeddings, streaming/reasoning fixes, model catalog) (#28223)
* fix(opentelemetry): JSON-serialize dict metadata fields for OTEL span attributes (#27451) (#27455)
Squash-merged by litellm-agent from Anai-Guo's PR.
* feat(dashscope): add embeddings and reranks(qwen3-rerank) support via OpenAI-compatible endpoint (#27508)
Squash-merged by litellm-agent from yimao's PR.
* fix(vertex_ai/gemini): raise BadRequestError when image_url or url fi… (#24550)
Squash-merged by litellm-agent from krisxia0506's PR.
* fix(vertex_ai): raise error on mid-stream 429/error chunks instead of silently swallowing (#23711)
Squash-merged by litellm-agent from krisxia0506's PR.
* fix: raise BadRequestError for file content blocks missing 'file' sub… (#24503)
Squash-merged by litellm-agent from krisxia0506's PR.
* Fix Gemini MIME detection for extensionless GCS URIs (#27278)
Squash-merged by litellm-agent from krisxia0506's PR.
* fix(vertex_ai/partner_models): drop unused vertexai SDK gate from count_tokens (closes #28084) (#28107)
Squash-merged by litellm-agent from voidborne-d's PR.
* feat(chart): add support for autoscaling behavior in HPA (#27990)
Squash-merged by litellm-agent from FabrizioCafolla's PR.
* feat(proxy): add blocked flag to models for pause/resume from the UI (#27927)
Squash-merged by litellm-agent from Cyberfilo's PR.
* fix: pass socket timeouts to Redis cluster clients (#27920)
Squash-merged by litellm-agent from tomdee's PR.
* Fix/cache token (#28009)
Squash-merged by litellm-agent from escon1004's PR.
* fix(deepseek): forward reasoning_content in multi-turn thinking mode conversations (#28080)
Squash-merged by litellm-agent from Divyansh8321's PR.
* fix(guardrails): return HTTP 400 instead of 500 for blocked requests (#27617)
* fix: reset org and tag budgets (#27326)
* reset org budgets
* reset tag budgets
---------
Co-authored-by: Michael Riad Zaky <michaelr@Mac.localdomain>
* fix(ui): omit allowed_routes from key edit save when unchanged (#27553)
* fix(ui): omit allowed_routes from key edit save when unchanged
When a team admin opens Edit Settings on a key with key_type=AI APIs and
saves without changing anything, the UI re-sends the existing allowed_routes
value, which the backend's _check_allowed_routes_caller_permission gate
rejects for non-proxy-admins (LIT-2681).
Strip allowed_routes from the patch in handleSubmit when it deep-equals the
original keyData.allowed_routes. The backend treats absence as "leave alone,"
so no-op saves now succeed for non-admins. Admins explicitly editing the
field still send the new value.
* fix(ui): order-insensitive allowed_routes diff + cover null-original case
Address Greptile review:
- Switch the "is allowed_routes unchanged" check to a Set-based comparison so
a server-side reorder of the array doesn't register as a user edit and
re-trigger LIT-2681.
- Add two regression tests: (1) keyData.allowed_routes is null and the form
is untouched — patch should strip the field; (2) server returned routes in
a different order than the user originally entered — patch should still
recognize the value as unchanged.
* chore(ui): strip ticket refs and tighten comments in key edit fix
- Remove internal-tracker references from in-code comments
- Tighten the WHY comment in handleSubmit to two lines
- Drop redundant test-block comments — test names already describe the case
* fix(ui): annotate Set<string> generic in allowed_routes diff to fix tsc
* fix(guardrails): return HTTP 400 instead of 500 for guardrail-blocked requests
GuardrailRaisedException and BlockedPiiEntityError both lacked a
status_code attribute. When these exceptions reached the proxy
exception handler (getattr(e, 'status_code', 500)), the fallback
defaulted to HTTP 500 — making intentional guardrail blocks
indistinguishable from server errors and causing unnecessary client
retries.
Changes:
- Add status_code=400 (keyword-only) to GuardrailRaisedException
- Add status_code=400 (keyword-only) to BlockedPiiEntityError
- Update _is_guardrail_intervention() to recognize both exceptions
so downstream loggers record 'guardrail_intervened' instead of
'guardrail_failed_to_respond'
- Add 6 unit tests for default/custom status codes and getattr pattern
- Strengthen existing blocked-action test with status_code assertion
Fixes #24348
---------
Co-authored-by: Michael-RZ-Berri <michael@berri.ai>
Co-authored-by: Michael Riad Zaky <michaelr@Mac.localdomain>
Co-authored-by: ryan-crabbe-berri <ryan@berri.ai>
Co-authored-by: Krrish Dholakia <krrish+github@berri.ai>
* fix(router/proxy): address Greptile P1+P2 review comments on PR #28161
- router: raise ServiceUnavailableError (503) instead of RouterRateLimitErrorBasic (429)
when a specifically-addressed deployment is administratively blocked; 429 misleads
retry-enabled clients into spinning forever against a paused model
- proxy_server: compute get_fully_blocked_model_names() once before both branches in
model_list() instead of duplicating the call in each branch
- deepseek: upgrade silent debug log to warning when injecting placeholder
reasoning_content so callers are clearly notified of degraded multi-turn quality
- tests: update two blocked-deployment assertions to expect ServiceUnavailableError
Co-authored-by: Cursor <cursoragent@cursor.com>
* fix: address bug detection findings (cache token order, mutable defaults)
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix: address bugs in async pass-through, anthropic cache token detection, rerank tests
- async_get_available_deployment_for_pass_through: enforce blocked check on specific deployments
- cost_calculator: detect anthropic-style usage by attribute presence (not truthiness) to avoid mixing OpenAI cached_tokens into anthropic normalization when read=0
- dashscope rerank tests: pass request to httpx.Response constructions for consistency
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix code qa
* fix(vertex_ai/gemini): strip MIME parameters from GCS contentType
GCS object metadata's contentType field can include parameters such as
'text/html; charset=utf-8'. Strip them in _apply_gemini_mime_type_aliases
so downstream get_file_extension_from_mime_type sees a bare MIME type.
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* fix(vertex_ai/gemini): clarify mime-type error message string concatenation
Co-authored-by: Yassin Kortam <yassin@berri.ai>
* feat(oci): add embeddings, fix streaming/reasoning, expand model catalog
- Add OCIEmbedConfig with full Cohere embed support (7 models, batch up to 96)
- Fix sync streaming: split SSE events on \n\n before JSON parsing
- Fix reasoning models (Gemini 2.5, xAI Grok): make completionTokens and message
optional in OCIResponseChoice to handle max_tokens exhausted on reasoning
- Fix compartment_id resolution in chat transform to use resolve_oci_credentials
- Fix tool call id: make OCIToolCall.id optional, generate UUID fallback for
providers (Google via OCI) that omit it
- Add OCI_KEY env var support for inline PEM keys
- Fix datetime.utcnow() deprecation in request signing
- Expand model catalog: 29 OCI models including Llama 4, Gemini 2.5, xAI Grok,
Cohere Command A, and all Cohere embed variants
- Add 37 live integration tests: sync/async completions for Meta/Google/xAI/Cohere,
sync/async embeddings, tool use across all vendors, streaming, env var auth
- Add 23 embed unit tests covering all transform and validation paths
* fix(oci): remove dead OCI elif branch in utils.py, align async split_chunks with sync version
* test(oci): add unit tests for split_chunks fix and no-duplicate-OCI-branch guard
* fix(oci): address remaining bugs from issue #25082 — streaming signed body, Cohere stop sequences, hardcoded defaults
- Bug 1: sync and async streaming paths now use signed_json_body when provided
instead of re-serializing data with json.dumps() — the OCI RSA-SHA256 signature
covers the exact request body bytes, so re-serializing produces an invalid sig
- Bug 3: Cohere stop sequences now map to 'stopSequences' (was incorrectly 'stop')
- Bug 4: removed hardcoded Cohere defaults (maxTokens=600, temperature=1, topK=0,
topP=0.75, frequencyPenalty=0) that silently overrode user intent on every call
- Added 6 unit tests covering all three fixes
* fix(oci): comprehensive code quality pass — bugs, tests, schema accuracy
- Fix Cohere tool call IDs (was always call_0; now UUID per call)
- Fix TOOL_CALL finish reason mapping in both sync and streaming paths
- Fix Cohere stop parameter mapping (stop → stopSequences)
- Remove hardcoded Cohere defaults (maxTokens/topK/topP/frequencyPenalty)
- Fix content[0] safety guard against empty content arrays
- Fix streaming signed body used consistently (not re-serialized)
- Raise OCIError (not bare Exception/ValueError) throughout
- Centralize OCI_API_VERSION constant; import uuid at module level
- Fix embed get_complete_url to strip trailing slashes from api_base
- Fix OCIEmbedResponse schema: add inputTextTokenCounts (actual OCI field)
- Fix embed usage computed from inputTextTokenCounts (sum of per-input counts)
- Fix Cohere toolCallId included in tool result messages
- Add OCIToolCall.id as Optional (absent in Google/xAI streaming chunks)
- Update tests to reflect correct behavior (no hardcoded defaults, UUID ids,
deferred credential validation, OCIError vs ValueError, real response schema)
* test(oci): move integration tests to tests/llm_translation/
Addresses greptile P1: tests/test_litellm/ is for mock-only unit tests
(make test-unit target). Real-network OCI tests now live in the correct
location alongside other provider integration tests.
* fix(oci): align types and transformation with official OCI SDK
- Remove OCIVendors.GEMINI — apiFormat="GEMINI" is invalid; all non-Cohere
models use apiFormat="GENERIC"
- Add toolChoice, logitBias, logProbs to OCIChatRequestPayload so params
present in the mapping are no longer silently dropped by Pydantic
- Exclude n→numGenerations from Cohere param map (not a Cohere API field)
- Fix CohereToolResult: change callId/result to call/outputs matching
the OCI SDK's CohereToolResult structure
- Fix CohereToolMessage: replace non-existent toolCallId with toolResults
list; update adapt_messages_to_cohere_standard to build proper tool-result
history entries by resolving tool call name+params from preceding assistant
messages
- Map generic-model stream finish reasons to OpenAI convention
(COMPLETE→stop, MAX_TOKENS→length, TOOL_CALLS→tool_calls), consistent
with the existing Cohere streaming path
- Add optional id field to OCIEmbedResponse so valid API responses
carrying an id are not rejected by the Pydantic model
* fix(oci): use 'output' key in Cohere tool result outputs (matches reference impl)
* fix(oci): port schema/type utilities from langchain-oracle reference impl
- Add resolve_oci_schema_refs: inline $ref/$defs — OCI rejects JSON Schema refs
- Add resolve_oci_schema_anyof: flatten Optional[T] anyOf (Pydantic v2 emits these)
- Add sanitize_oci_schema: strip title, normalise null types, ensure array items
- Add OCI_JSON_TO_PYTHON_TYPES: Cohere expects Python type names (str/int/float),
not JSON Schema names (string/integer/number)
- Add enrich_cohere_param_description: embed enum/format/range/pattern constraints
into description since CohereParameterDefinition has no dedicated fields
- Apply all of the above in adapt_tool_definitions_to_cohere_standard and
adapt_tool_definition_to_oci_standard
- Fix toolChoice conversion: map OpenAI string ('auto','none','required') to OCI
dict form ({"type":"AUTO"} etc.) — the API rejects plain strings
- Update unit test expectations to match correct Python type names and enriched
descriptions
* refactor(oci): split transformation.py into cohere.py and generic.py
transformation.py was 1 243 lines doing too many jobs. Split along the
same boundaries as the langchain-oracle reference (providers/cohere.py,
providers/generic.py):
chat/cohere.py — Cohere message/tool building, response + stream parsing
chat/generic.py — Generic message/tool building, response + stream parsing
transformation.py — thin OCIChatConfig orchestrator + OCIStreamWrapper
Public symbols (OCIChatConfig, OCIStreamWrapper, adapt_messages_to_*,
OCIRequestWrapper, version, …) remain importable from transformation.py
for backward compatibility. OCIStreamWrapper gains delegating shims for
_handle_cohere_stream_chunk and _handle_generic_stream_chunk so existing
test call sites keep working unchanged.
transformation.py: 1 243 → 620 lines
* refactor(oci): principal-level code quality pass
- Remove _extract_text_content duplication — single definition in cohere.py,
imported where needed; instance method on OCIChatConfig eliminated
- Move cryptography imports to module level with _CRYPTOGRAPHY_AVAILABLE flag
and _require_cryptography() guard; no more re-import on every signing call
- Move litellm version import to module level via litellm._version; remove
inline import inside validate_oci_environment
- sign_with_manual_credentials now returns Tuple[dict, bytes] matching
sign_with_oci_signer — asymmetry eliminated, Optional[bytes] guards removed
throughout stream wrappers (signed_json_body: bytes = b"")
- Rename _openai_to_oci_cohere_param_map → openai_to_oci_cohere_param_map
for consistency with openai_to_oci_generic_param_map
- Remove double-key bug in map_openai_params where responseFormat was stored
under both OCI and OpenAI key names simultaneously
- Remove delegating shims (adapt_messages_to_cohere_standard,
adapt_tool_definitions_to_cohere_standard, _handle_generic_stream_chunk)
from OCIChatConfig/OCIStreamWrapper; tests now import directly from
cohere.py and generic.py where symbols live
- Trim __all__ to 7 genuine public symbols; remove the 13-symbol list that
existed only to support test imports
- Collapse per-model integration test classes into pytest.mark.parametrize;
CHAT_MODELS list is the single source of truth for model-specific config
- Black + Ruff clean across all OCI files
* fix(oci): address PR review findings
- types/llms/oci.py: add "TOOL_CALL" to CohereChatResponse.finishReason
Literal so Pydantic does not raise ValidationError on non-streaming
Cohere tool-use calls (Greptile P1)
- test_oci_cohere_tool_calls.py: add test covering TOOL_CALL finish reason
- model_prices_and_context_window.json: remove 6 duplicate oci/cohere.embed-*
keys that were silently overridden by the more complete entries already
present in the file (Greptile P1)
- common_utils.py: move OCI_API_VERSION here from chat/transformation.py
so embed/transformation.py does not need to import chat/transformation;
change Protocol stub body from ... to pass (CodeQL "statement no effect");
add comment to sha256_base64 clarifying it implements OCI HTTP signing
spec, not password hashing (CodeQL false positive)
- chat/transformation.py: import CustomStreamWrapper from
litellm_core_utils.streaming_handler instead of litellm.utils to reduce
import cycle depth (CodeQL cyclic import)
- chat/cohere.py, chat/generic.py: import Usage and
ChatCompletionMessageToolCall from litellm.types.utils instead of
litellm.utils for the same reason
- embed/transformation.py: import OCI_API_VERSION from common_utils
instead of chat/transformation (removes the embed→chat import edge)
* test(oci): add unit tests to improve patch coverage
- test_oci_common_utils.py (new): covers sha256_base64, build_signature_string,
OCIRequestWrapper.path_url, resolve_oci_credentials, get_oci_base_url,
validate_oci_environment, sign_with_oci_signer error paths, sign_oci_request
routing, load_private_key_from_file error paths, resolve_oci_schema_refs
(including circular ref and external $ref), resolve_oci_schema_anyof,
sanitize_oci_schema (all branches), enrich_cohere_param_description
- test_oci_generic_chat.py (new): covers content-message error paths (non-dict
item, unsupported type, non-string text, invalid image_url), tool-call
validation error paths, adapt_messages_to_generic_oci_standard error paths,
handle_generic_response (None message, text content, tool calls),
handle_generic_stream_chunk (finish reasons, streaming tool calls),
OCIStreamWrapper non-string chunk error
- test_oci_chat_transformation.py: add error paths for validate_environment
(empty messages), transform_request (missing compartment_id, Cohere without
user messages), transform_response (error key), map_openai_params
(unsupported param with and without drop_params), tool_choice string mapping
- test_oci_cohere_tool_calls.py: add edge cases for stream chunk finish
reasons (TOOL_CALL, MAX_TOKENS, unknown), _extract_text_content with
non-dict list items and non-string input,
adapt_messages_to_cohere_standard with malformed JSON tool arguments
* fix(oci): rename supports_streaming to supports_native_streaming in model prices
The JSON schema for model_prices_and_context_window.json uses
`supports_native_streaming` (not `supports_streaming`) and has
`additionalProperties: false`. Rename the field across all OCI
entries to pass the schema validation test.
* test(oci): add 67 tests targeting uncovered happy paths for coverage
Boost patch coverage on the four lowest-coverage OCI files:
- common_utils.py: sign_with_manual_credentials (oci_key / oci_key_file
paths), sign_oci_request routing, _require_cryptography
- generic.py: adapt_messages_to_generic_oci_standard (all roles),
adapt_tool_definition_to_oci_standard, adapt_tools_to_openai_standard,
handle_generic_stream_chunk text/finish-reason paths
- cohere.py: _extract_text_content, adapt_messages_to_cohere_standard
(all roles including tool results), handle_cohere_response /
handle_cohere_stream_chunk all finish-reason branches
- transformation.py: get_vendor_from_model, OCIChatConfig._get_optional_params
(toolChoice string→dict, responseFormat, tools for both vendors),
transform_request for GENERIC model, get_sync/async_custom_stream_wrapper
with mocked HTTP, OCIStreamWrapper.chunk_creator happy paths
* fix(oci): suppress CodeQL false positive on sha256_base64 (OCI HTTP signing, not password hashing)
* fix(oci): remove 6 duplicate model price entries and reconcile conflicting values
Six OCI chat model keys appeared twice in model_prices_and_context_window.json
with conflicting pricing/context data (JSON parsers silently discard the first).
Remove the first-occurrence entries and update the surviving entries:
- meta.llama-4-maverick / llama-4-scout: keep updated entries (free preview
pricing, larger context windows, vision support)
- meta.llama-3.1-70b: keep original pricing, restore supports_native_streaming
- google.gemini-2.5-{flash,pro,flash-lite}: keep OCI pricing page values,
restore supports_native_streaming
* fix(oci): route GPT-5 family to maxCompletionTokens
GPT-5 / GPT-5-mini / GPT-5-nano / GPT-5.5 on OCI reject "maxTokens"
with HTTP 400:
Invalid 'maxTokens': Unsupported parameter: 'maxTokens' is not
supported with this model. Use 'maxCompletionTokens' instead.
(Same convention as OpenAI's reasoning-API contract.)
Add a model-aware rename in OCIChatConfig._get_optional_params so the
request payload uses maxCompletionTokens when the model id starts with
openai.gpt-5. Regular Llama / Cohere / Gemini / GPT-4.x continue to use
maxTokens unchanged.
Also widen OCIChatRequestPayload to carry the new optional field so it
survives Pydantic serialization.
Verified live against OCI us-chicago-1:
- openai.gpt-5, gpt-5-mini, gpt-5-nano, gpt-5.5 all return 200
- Full feature sweep on gpt-5.5 (basic, system, multi-turn, streaming,
tools, usage) all green
- meta.llama-3.3-70b-instruct still uses maxTokens (no regression)
4 new unit tests cover the helper, the routing in both pre- and
post-translation states, and Pydantic serialization.
* ci(oci): fix CI failures — black formatting + recursive_detector ignore
- Run black on litellm/llms/oci/common_utils.py + 3 OCI test files
that drifted out of black-compliance during the rebase.
- Add the three bounded recursive functions in oci/common_utils.py
(`_resolve`, `resolve_oci_schema_anyof`, `sanitize_oci_schema`) to
the recursive_detector IGNORE_FUNCTIONS list. All three are bounded:
`_resolve` uses a `resolving_stack` cycle guard; the other two are
bounded by JSON-schema tree depth (no cycles in well-formed input),
matching the pattern of the existing OCI/Vertex schema walkers
already on the list.
* fix(oci): silence MyPy errors in cohere.py — typed-dict access
Two errors flagged by `lint` CI:
llms/oci/chat/cohere.py:73: "object" has no attribute "__iter__"
llms/oci/chat/cohere.py:119: No overload variant of "get" of "dict"
matches argument types "object", "CohereToolCall"
Both stem from `msg.get("tool_calls")` / `msg.get("tool_call_id")`
returning `object` per the AllMessageValues TypedDict union. Bind to
`Any` locally for the iteration and coerce the lookup key with `str()`,
removing the now-unused `# type: ignore` on those lines.
No behaviour change — pure type-narrowing for the type checker.
* fix(oci): silence CodeQL py/weak-sensitive-data-hashing on sha256_base64
CodeQL's taint analysis traces request bodies back to environment-loaded
secrets and flags `hashlib.sha256(body).digest()` as
`py/weak-sensitive-data-hashing` — even though SHA-256 is the algorithm
mandated by the OCI HTTP request signing spec for the
`x-content-sha256` header (not a password/secret hash).
The previous suppression used legacy `# lgtm[...]` syntax which the
modern CodeQL action ignores. Switch to Python's standard
`hashlib.sha256(..., usedforsecurity=False)` (Python 3.9+) which CodeQL
honours as a non-security declaration. Behaviour unchanged.
* feat(oci): add reasoning_effort passthrough — only true missing primitive
OCI's GenericChatRequest exposes a reasoningEffort field
(NONE/MINIMAL/LOW/MEDIUM/HIGH) that's the single biggest cost knob for
reasoning-capable models on the service:
- GPT-5 family
- Gemini 2.5
- Grok reasoning variants (3-mini, 4-fast, 4.20)
- Cohere Command-A-Reasoning
Setting reasoning_effort=LOW typically cuts reasoning-token spend 5-10×
vs the default. Without exposing this, litellm users had no way to tune
cost-vs-quality on these models.
The other GenericChatRequest fields (verbosity, parallel_tool_calls,
logit_bias, n, metadata, web_search_options, prediction) are not
exposed because they are not missing primitives — they either duplicate
prompt-engineering, framework-level controls, or are too niche to
justify the maintenance surface. We only ship what users genuinely
can't accomplish another way.
Excluded from the Cohere v1 param map: CohereChatRequest has no
reasoningEffort field, and Cohere reasoning models
(cohere.command-a-reasoning) use COHEREV2 which is a separate request
type not covered by this PR.
Verified live: GPT-5.5 + reasoning_effort="HIGH" sends
{"reasoningEffort": "HIGH"} on the wire and OCI accepts the request.
* feat(oci): reasoning_effort + reasoning_tokens for OCI GenAI
Three small additions for OCI reasoning models, requested by users
testing the PR in production fork builds:
1. **reasoning_effort param mapping (GENERIC vendors).** OCI expects
uppercase levels ("LOW"/"MEDIUM"/"HIGH"/"NONE") on `reasoningEffort`,
but OpenAI-compatible clients send lowercase. Mapped + uppercased in
`_get_optional_params`. Marked unsupported on Cohere V1/V2 since OCI
Cohere has no reasoning models (avoids Pydantic validation failure
on CohereChatRequest).
2. **"disable" → "NONE" mapping.** OpenAI uses "disable" to turn off
reasoning; OCI uses "NONE". Without this, callers get a 400.
3. **reasoning_tokens propagated to Usage.** OCI returns
`completionTokensDetails.reasoningTokens` but it wasn't being passed
to LiteLLM's Usage object. Now flows through to
`Usage.completion_tokens_details.reasoning_tokens` so callers can
track reasoning token consumption for cost/observability.
Tests: 7 new unit tests in TestOCIReasoningEffort covering upper/lower
case, "disable"→"NONE", Cohere drop/raise paths, and reasoning_tokens
extraction (with and without completionTokensDetails). 5 new live
integration tests against xai.grok-3-mini in us-chicago-1 verifying the
full request/response loop end-to-end. Existing
test_transform_response_simple_text assertion that
completion_tokens_details was None has been updated to assert
reasoning_tokens flows through.
Verified live on xai.grok-3-mini: reasoning_effort=low → OCI accepts
"LOW", returns reasoningTokens=316 in usage. reasoning_effort=disable
→ OCI accepts "NONE". Full suite: 370/370 unit + 51/51 integration.
* fix(codeql): re-scope py/weak-sensitive-data-hashing exclusion to OCI signing file
CodeQL's taint analysis re-fires the `py/weak-sensitive-data-hashing`
alert at `litellm/llms/oci/common_utils.py:103` whenever upstream code
paths into the OCI signing module change (touching `transformation.py`
opens new flow paths that CodeQL re-evaluates from scratch). The
`hashlib.sha256(..., usedforsecurity=False)` declaration silences the
direct-call form of the query but not the taint-flow form.
SHA-256 here is mandated by the OCI HTTP signing specification for the
x-content-sha256 content-integrity header — not for password storage:
https://docs.oracle.com/en-us/iaas/Content/API/Concepts/signingrequests.htm
CodeQL has no per-query path filter and GitHub Code Scanning ignores
inline lgtm/codeql comments, so path-ignoring this single ~560-line
signing utility file is the narrowest available suppression. All other
files retain full coverage of py/weak-sensitive-data-hashing — including
litellm/proxy/utils.py where the rule legitimately applies.
This restores the NEUTRAL CodeQL state the PR had on prior commits
(see `2111c98af7` for the same approach on the previous branch
evolution that the cherry-pick was rebased onto a different baseline).
* fix(oci): drop duplicate text on Cohere streaming terminal chunk
OCI Cohere's terminal SSE event re-sends the full assembled response in
`text` alongside a populated `chatHistory`. Emitting that text as another
delta concatenates the entire response onto the already-streamed output
(e.g. "How can I help?How can I help?").
Use `chatHistory is not None` as the discriminator for the consolidated
terminal event — `finishReason` is a weaker signal that could in principle
appear on a non-consolidated chunk. The two coincide today; this preserves
correctness if OCI ever ships finishReason on an incremental chunk.
Adds a live-OCI integration regression test that compares streamed vs
non-streamed length and asserts the response prefix appears only once.
Verified to fail under the previous code with the exact reported
reproduction: 'Hello! How can I help you today?Hello! How can I help you today?'.
Reported by @gotsysdba on PR #25177.
* fix(oci): buffer SSE stream across HTTP read boundaries
The old split_chunks helper split each individual HTTP read on "\n\n",
which assumed SSE event boundaries always aligned with read boundaries.
In practice the OCI streaming endpoint delivers events that may:
- straddle two reads (chunk_creator gets a truncated JSON and crashes)
- arrive separated by a single "\n" instead of "\n\n"
- share a read with multiple complete events
Replace the inline split with module-level helpers _iter_sse_events
(sync) / _aiter_sse_events (async) that maintain a buffer across reads,
split on any newline, and yield only complete "data:" lines.
Add 25 regression tests covering event-split-across-reads, tiny-chunk
reads, single-newline separators, keepalive/comment lines, trailing
partial events flushed at EOF, "\r\n" line endings, and an end-to-end
smoke test that feeds an awkwardly-chopped payload through the splitter
into OCIStreamWrapper.chunk_creator.
Reported by John Lathouwers.
* test(oci): repoint TestOCIKeyNormalization to sign_with_manual_credentials
The signing helper moved from OCIChatConfig._sign_with_manual_credentials
to a module-level sign_with_manual_credentials in common_utils.py. Four
tests in TestOCIKeyNormalization still called the old method:
- 2 failed outright with AttributeError
- 2 passed by accident because they used pytest.raises(Exception),
which happily caught the AttributeError instead of exercising the
intended OCIError path
Repoint all four to the new module-level function so they exercise the
actual oci_key type-validation branch.
* fix(oci): validate oci_region before URL interpolation to prevent SSRF
Anchor oci_region to ^[a-z][a-z0-9-]{0,30}[a-z0-9]$ inside get_oci_base_url
so user-supplied regions that would redirect the signed request to an
attacker-controlled host (e.g. 'evil.com/#') fail with HTTP 400 before
the URL or signature is built. Empty string still falls back to the
us-ashburn-1 default, so existing callers are unaffected.
* test(audio): skip when gpt-4o-audio-preview is unavailable upstream
OpenAI retired `gpt-4o-audio-preview` (404 model_not_found in CI as of
2026-05-19), and the existing try/except in these tests only re-raised
on 'openai-internal' errors. Other exceptions were silently swallowed,
so the next line ran with an unbound `response`/`completion` and
failed with an unrelated UnboundLocalError that masked the real cause.
Extend the skip condition to also cover model_not_found / 'does not exist'
so the suite reports the upstream outage cleanly, matching the pattern
used in
|
||
|
|
d949085310 |
Merge pull request #24697 from BerriAI/litellm_codeql_gha
[Infra] Improve CodeQL scanning coverage and schedule |
||
|
|
ec4273ed8b |
[Infra] Improve CodeQL scanning coverage and schedule
Switch query suite from security-extended to security-and-quality to match the default GitHub Advanced Security setup. Run scheduled scans daily instead of weekly. Remove paths-ignore for _experimental/out so build artifacts are also scanned. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
dfb543369b | fix: address zizmor comments | ||
|
|
b90a0af0d7 | remove extra @ | ||
|
|
f86b240d7e | pin github scripts + remove unused | ||
|
|
1310a275d2 |
ci: narrow codeql guard to schedule-only
Use event_name check so push/PR-triggered CodeQL scans still run on forks — only the scheduled run is skipped. |
||
|
|
91bc095e18 |
ci: skip scheduled workflows on forks
Add `if: github.repository == 'BerriAI/litellm'` guard to scheduled jobs in stale.yml, codeql.yml, and create_daily_staging_branch.yml. This matches the existing pattern in auto_update_price_and_context_window.yml and prevents these workflows from running unnecessarily on fork repositories. |
||
|
|
40210ce750 | fix(codeql): remove ruby from language matrix (#23227) | ||
|
|
d7340b595b |
Update .github/workflows/codeql.yml
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> |
||
|
|
53f3123030 |
fix(ci): add custom CodeQL workflow to replace expensive default setup
The default CodeQL setup runs all 45 Python security queries against the entire codebase. Two queries (CleartextLogging, PolynomialReDoS) produce result sets > 2 GiB, causing 49+ minute runs that fail and block CI. - Add custom workflow with 30-minute timeout and concurrency limits - Exclude py/clear-text-logging-sensitive-data (CWE-312) - Exclude py/polynomial-redos (CWE-730) - Skip scanning tests/, docs/, and UI build output NOTE: The Default Setup must be disabled in repo Settings > Code security before merging, otherwise both will run simultaneously. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> |