Commit Graph

5569 Commits

Author SHA1 Message Date
Sameer Kankute 11dbae85d1 Merge pull request #19390 from BerriAI/litellm_consistent_id_streaming_responses
Fix: ID mismatch between text-start and text-delta
2026-01-20 20:46:34 +05:30
Sameer Kankute 9e1275b76c Merge branch 'main' into litellm_staging_01_19_2026 2026-01-20 19:19:36 +05:30
Sameer Kankute a3c1f4758d Merge branch 'main' into litellm_consistent_id_streaming_responses 2026-01-20 19:02:23 +05:30
Sameer Kankute e69c12b6db Merge pull request #19396 from BerriAI/litellm_responses_route_fix
Fix for Prometheus Metric Cardinality Issue with /responses Endpoint
2026-01-20 19:01:18 +05:30
Sameer Kankute 3cc19c56ba Revert "feat: Add Redis-based migration lock with bug fixes (#19261)"
This reverts commit 98e87c3e67.
2026-01-20 18:46:26 +05:30
Sameer Kankute dc3ee63359 fix: test_env_keys 2026-01-20 18:37:56 +05:30
Sameer Kankute 2153db5e64 fix: test_convert_to_bedrock_format_post_call_streaming_hook 2026-01-20 18:27:36 +05:30
Sameer Kankute 8b24720638 fix: test_standard_logging_payload_includes_guardrail_information 2026-01-20 18:21:32 +05:30
Sameer Kankute cd96c8cbb0 Fix: test_aaaaazure_tenant_id_auth 2026-01-20 17:39:08 +05:30
Sameer Kankute f0785d5a51 Fix: test_supported_params_limited_to_docs 2026-01-20 17:26:40 +05:30
Sameer Kankute dd6b35e825 Merge pull request #19401 from BerriAI/main
Merge main 01 20 2026
2026-01-20 16:45:08 +05:30
Sameer Kankute deb9142117 Merge pull request #19400 from BerriAI/main
merge main in 19/1 staging
2026-01-20 16:45:01 +05:30
Sameer Kankute 5f80e8d5e8 Fix for Prometheus Metric Cardinality Issue with /responses Endpoint 2026-01-20 15:28:09 +05:30
Sameer Kankute f945fd9a84 Fix: ID mismatch between text-start and text-delta 2026-01-20 11:15:37 +05:30
Sameer Kankute 3eb3594ab7 Merge pull request #19346 from Chesars/fix/drop-params-prompt-cache-key-19225
fix: drop_params not dropping prompt_cache_key for non-OpenAI providers
2026-01-20 10:15:45 +05:30
Sameer Kankute 931998f170 Merge pull request #19266 from VedantMadane/fix-prompt-caching-string-content
Fix extract_cacheable_prefix to handle string content with message-level cache_control
2026-01-20 10:11:48 +05:30
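The fix above concerns messages whose `content` is a plain string while `cache_control` sits on the message itself. The sketch below shows one way to find a cacheable prefix that covers both marker placements. It assumes Anthropic-style `cache_control` markers. `has_cache_control` and `extract_cacheable_prefix_sketch` are hypothetical helpers, not LiteLLM's `extract_cacheable_prefix`.

```python
from typing import Any, Dict, List

Message = Dict[str, Any]


def has_cache_control(message: Message) -> bool:
    # Message-level marker: {"role": ..., "content": "text", "cache_control": {...}}
    if "cache_control" in message:
        return True
    content = message.get("content")
    # Block-level markers can only exist when content is a list of parts;
    # a plain string has nowhere to carry one, so it must not crash or be skipped.
    if isinstance(content, list):
        return any(isinstance(part, dict) and "cache_control" in part for part in content)
    return False


def extract_cacheable_prefix_sketch(messages: List[Message]) -> List[Message]:
    # The cacheable prefix runs up to and including the last marked message.
    last_marked = -1
    for i, msg in enumerate(messages):
        if has_cache_control(msg):
            last_marked = i
    return messages[: last_marked + 1]
```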
victorigualada 7d6d419a67 fix: preserve tool output ordering for gemini in responses bridge (#19360)
* fix: preserve tool output ordering for gemini in responses bridge

- Keep function_call_output adjacent to its function_call when building chat messages
- Normalize function_call_output.output lists (input_* parts) into tool message content

* fix test

* small improvements
2026-01-19 20:37:59 -08:00
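The two bullet points in #19360 can be sketched as follows: move each `function_call_output` so that it sits directly after the `function_call` with the same `call_id`, and flatten list-valued outputs into text. The item shapes follow the Responses API input format. The helper names are hypothetical, and joining the `input_*` text parts with no separator is an assumption.

```python
from typing import Any, Dict, List

Item = Dict[str, Any]


def output_to_text(output: Any) -> str:
    # function_call_output.output may be a string or a list of parts such as
    # {"type": "input_text", "text": "..."}; keep only the text of those parts.
    if isinstance(output, str):
        return output
    if isinstance(output, list):
        return "".join(p.get("text", "") for p in output if isinstance(p, dict))
    return str(output)


def order_tool_outputs(items: List[Item]) -> List[Item]:
    call_ids = {it["call_id"] for it in items if it.get("type") == "function_call"}
    outputs = {it["call_id"]: it for it in items if it.get("type") == "function_call_output"}
    ordered: List[Item] = []
    for it in items:
        kind = it.get("type")
        if kind == "function_call_output" and it["call_id"] in call_ids:
            continue  # re-emitted directly after its matching call below
        ordered.append(it)
        if kind == "function_call" and it["call_id"] in outputs:
            ordered.append(outputs[it["call_id"]])
    return ordered
```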
victorigualada 581d086c20 fix(responses): stream tool call events in completion bridge (#19368)
Emit Responses API streaming events for tool calls when the underlying chat stream contains tool_call deltas, and recover tool calls into the stream when they only appear in the final response.
2026-01-19 20:29:50 -08:00
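Emitting tool-call events from a chat stream requires reassembling `tool_calls` deltas. In OpenAI-style streams, the first fragment for each call carries its `index`, `id`, and function name, and later fragments only append to `arguments`. Below is a minimal sketch of that accumulation, with a fallback to tool calls that appear only in the final response. The plain-dict delta shape and both function names are assumptions made for illustration.

```python
from typing import Any, Dict, List, Optional


def accumulate_tool_calls(deltas: List[Dict[str, Any]]) -> List[Dict[str, str]]:
    calls: Dict[int, Dict[str, str]] = {}
    for delta in deltas:
        for tc in delta.get("tool_calls") or []:
            # One slot per tool-call index; fragments for the same index are merged.
            slot = calls.setdefault(tc["index"], {"id": "", "name": "", "arguments": ""})
            if tc.get("id"):
                slot["id"] = tc["id"]
            fn = tc.get("function") or {}
            if fn.get("name"):
                slot["name"] = fn["name"]
            slot["arguments"] += fn.get("arguments") or ""
    return [calls[i] for i in sorted(calls)]


def tool_calls_for_events(
    deltas: List[Dict[str, Any]],
    final_tool_calls: Optional[List[Dict[str, str]]] = None,
) -> List[Dict[str, str]]:
    streamed = accumulate_tool_calls(deltas)
    # Fallback: some streams never emit tool_call deltas and only report the
    # calls on the final response, so recover them from there.
    return streamed if streamed else list(final_tool_calls or [])
```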
Sameer Kankute 2ae308028d Merge pull request #18787 from aproorg/fix/bedrock-thinking-tool-call-2
fix(bedrock): handle thinking with tool calls for Claude 4 models
2026-01-20 09:43:07 +05:30
YutaSaito 00814d4d90 Merge pull request #19379 from BerriAI/litellm_feat_mcp_version_up
[feat] mcp version up
2026-01-20 13:09:29 +09:00
Yuta Saito ab11ceff32 tests: patch MCP client mocks via module alias to avoid real network calls 2026-01-20 12:31:27 +09:00
이명현 0dfc3fad5a Fix: bedrock invoke claude 4 optional params #19318 (#19381) 2026-01-19 19:14:58 -08:00
Ryan Malloy 58c8c2b7b1 fix: HTTP client memory leaks in Presidio, OpenAI, and Gemini (#19190)
* fix: prevent HTTP client memory leaks in Presidio and OpenAI wrappers

Fixes multiple memory leak issues reported in #14540 and related tickets:

**Presidio Guardrail Fix (#14540)**
- Problem: Every guardrail check created a new aiohttp.ClientSession
- Impact: High-traffic proxies accumulated thousands of unclosed sessions
- Solution: Share a single session across all guardrail checks
  - Added `self._http_session` instance variable
  - Lazy session creation via `_get_http_session()`
  - Proper cleanup via `_close_http_session()` and `__del__()`
- Files: litellm/proxy/guardrails/guardrail_hooks/presidio.py

**OpenAI HTTP Client Caching (#14540)**
- Problem: `_get_async_http_client()` created new httpx.AsyncClient on each call
- Impact: OpenAI/Azure completions bypassed client caching system
- Solution: Route through `get_async_httpx_client()` for TTL-based caching
  - Caches clients by provider and SSL config
  - Fallback to direct creation if caching fails
  - Applied to both async and sync client methods
- Files: litellm/llms/openai/common_utils.py

**Test Script**
- Added validation script to demonstrate fixes
- Counts file descriptors and unclosed session objects
- Files: test_oom_fixes.py

Related issues: #14384, #13251, #12443

* fix(oom): prevent memory leaks in Presidio guardrails and OpenAI client creation

Fixes two high-impact memory leaks:

1. Presidio Guardrail Session Leak (issue #14540)
   - Problem: Created new aiohttp.ClientSession on every guardrail check
   - Impact: Runs on EVERY proxy request when PII masking enabled
   - Fix: Shared session pattern with lifecycle management
   - Files: litellm/proxy/guardrails/guardrail_hooks/presidio.py

2. OpenAI HTTP Client Cache Bypass (issue #14540)
   - Problem: _get_async_http_client() created new httpx.AsyncClient, bypassing TTL cache
   - Impact: Every completion created new client with own connection pool
   - Fix: Route through get_async_httpx_client() for proper caching
   - Critical: Include SSL config in cache key for correctness
   - Files: litellm/llms/openai/common_utils.py
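The caching fix described above can be sketched as a TTL cache keyed by provider plus SSL configuration. Including the SSL setting in the key keeps a client built with verification disabled from being reused for a request that requires it. `TTLClientCache` and `cache_key` are hypothetical stand-ins for `get_async_httpx_client()`. The TTL counts from creation, and a real implementation would also close evicted clients.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLClientCache:
    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_create(self, key: Hashable, factory: Callable[[], Any]) -> Any:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self.ttl:
            return entry[1]  # reuse the pooled client instead of building a new one
        client = factory()
        self._entries[key] = (now, client)
        return client


def cache_key(provider: str, ssl_verify: Any) -> Tuple[str, str]:
    # SSL config is part of the key: verify=True and verify=False get separate clients.
    return (provider, repr(ssl_verify))
```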

Validation:
- Presidio: 100 requests → 0 new sessions (was 100)
- OpenAI: 100 calls → 1 unique client (was 100)
- test_oom_fixes.py: Automated validation script

* fix(oom): resolve Gemini aiohttp session leak (issue #12443)

Fixes persistent "Unclosed client session" warnings when using Gemini models.

Root Causes:
1. Broken atexit cleanup - get_event_loop() fails at exit time
2. On-demand session creation without reliable cleanup

Changes:

1. Fixed atexit Cleanup (async_client_cleanup.py)
   - OLD: Used get_event_loop() which fails when loop is closed
   - NEW: Always create fresh event loop at exit time
   - Ensures cleanup runs successfully even when main loop is closed

2. Added __del__ Cleanup (aiohttp_handler.py)
   - Defense-in-depth: cleanup on garbage collection
   - Handles abnormal termination cases
   - Similar pattern to Presidio guardrail fix

3. Enhanced Cleanup Scope (async_client_cleanup.py)
   - Now closes global base_llm_aiohttp_handler instance
   - Previously only checked cache, missed module-level handler

Validation:
- Test 1: __del__ cleanup → 0 sessions leaked ✓
- Test 2: atexit cleanup → 0 sessions leaked ✓
- test_gemini_session_leak.py: Automated validation

Related: #14540 (broader OOM issue tracking)

* fix(types): use LlmProviders enum for get_async_httpx_client

MyPy was failing because the llm_provider parameter expects Union[LlmProviders, httpxSpecialProvider], not a string.

Changed from string "openai" to LlmProviders.OPENAI enum value.

* test: move validation tests to proper CI directories

- Move test_oom_fixes.py to tests/test_litellm/llms/
- Move test_gemini_session_leak.py to tests/test_litellm/llms/custom_httpx/
- Fix pytest warning: use pytest.skip() instead of return True

This ensures CI actually runs our OOM fix validation tests.

* fix(oom): add asyncio.Lock to prevent race conditions in Presidio session creation

- Make _get_http_session() async with asyncio.Lock protection
- Prevents multiple concurrent requests from creating orphaned sessions
- Add concurrent load test (50 parallel requests) to validate fix
- Test confirms only 1 session created under concurrent load

Critical fix: Previous implementation had race condition where
concurrent guardrail checks could create multiple sessions,
defeating the shared session pattern and causing memory leaks.

* fix(presidio): eliminate race condition in session lock initialization

Move asyncio.Lock creation from lazy initialization in _get_http_session()
to __init__. The previous lazy init had a race condition where concurrent
coroutines could both see _session_lock as None, both create locks, and
end up with different lock instances - defeating the synchronization.

asyncio.Lock() can be safely created without an event loop; it only
requires one when awaited.
2026-01-19 19:02:55 -08:00
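The final Presidio design in this commit (a shared session, an `asyncio.Lock` created in `__init__`, and a re-check inside the lock) can be sketched as below. `FakeSession` replaces `aiohttp.ClientSession` so the example runs without aiohttp, and `GuardrailClient` is a hypothetical class; only the method names `_get_http_session` and `_close_http_session` come from the commit message.

```python
import asyncio
from typing import Optional


class FakeSession:
    """Stand-in for aiohttp.ClientSession; counts how many were created."""

    created = 0

    def __init__(self) -> None:
        FakeSession.created += 1
        self.closed = False

    async def close(self) -> None:
        self.closed = True


class GuardrailClient:
    def __init__(self) -> None:
        self._http_session: Optional[FakeSession] = None
        # Created eagerly; on Python 3.10+ a Lock binds to a loop only on first use.
        self._session_lock = asyncio.Lock()

    async def _get_http_session(self) -> FakeSession:
        if self._http_session is not None and not self._http_session.closed:
            return self._http_session  # fast path, no locking
        async with self._session_lock:
            # Re-check: another coroutine may have created it while we waited.
            if self._http_session is None or self._http_session.closed:
                await asyncio.sleep(0)  # simulate a yield point during setup
                self._http_session = FakeSession()
            return self._http_session

    async def _close_http_session(self) -> None:
        if self._http_session is not None and not self._http_session.closed:
            await self._http_session.close()
        self._http_session = None


async def main() -> int:
    client = GuardrailClient()
    sessions = await asyncio.gather(*(client._get_http_session() for _ in range(50)))
    await client._close_http_session()
    return len({id(s) for s in sessions})


print(asyncio.run(main()))  # prints 1: all 50 concurrent callers share one session
```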
南辰燏炚 004bde2c45 feat (volcengine) : Support Volcengine responses api (#18508)
* Add Volcengine responses adapter

* fix llms/volcengine/responses/transformation.py:507:9: F841 Local variable `origin` is assigned to but never used

fix llms/volcengine/responses/transformation.py:95: error: Argument "headers" to "VolcEngineError" has incompatible type

add more supported optional params

removed redundant manual logging/utils fallbacks so litellm/__init__.py uses the registry only.
2026-01-19 19:02:29 -08:00
Emerson Gomes 13d887a275 Fix queue persistence to Redis (#19304)
* Fix queue persistence to Redis

* add test
2026-01-19 19:01:34 -08:00
Ishaan Jaff 818913ee23 [Fix] Fix Pass through routes to work with server root path (#19383)
* test_build_full_path_with_root_default

* fix pt feat
2026-01-19 18:28:55 -08:00
Yuta Saito ec7bf0ff1a Merge remote-tracking branch 'upstream/main' into litellm_feat_mcp_version_up 2026-01-20 09:52:38 +09:00
Yuta Saito 51cf782292 chore: switch experimental client to streamable_http_client API 2026-01-20 07:37:50 +09:00
Yuta Saito 05d9fb6fd6 feat: SEP-986 2026-01-20 07:24:39 +09:00
Ishaan Jaff a82467d679 [Feat] - Add self hosted Claude Code Plugin Marketplace (#19378)
* init schema

* init endpoints

* fix: claude_code_marketplace_router

* refactor

* fix: claude_code_marketplace_router

* claude_code_marketplace_router
2026-01-19 14:05:47 -08:00
0x1f99d 1cce718551 fix(bedrock): deduplicate tool calls in assistant history (#15178) (#19324)
* fix: Avoid attaching tool calls when a call_id already exists

* fix: Prevent MCP responses from reviving past tool calls via previous_response_id

* test: Parametrize MCP streaming test to cover OpenAI and Anthropic models

* test: Fail MCP streaming test when LiteLLM logs errors during follow-up calls

* test: Let MCP tool-execution mock accept new kwargs for streaming tests

* chore: fix lint error

* docs: Add Google Workload Identity Federation (WIF) documentation to Vertex AI (#19320)

- Added new section documenting WIF support for Vertex AI authentication
- Included SDK and Proxy configuration examples
- Added sample WIF credentials file format for AWS federation
- Mentioned LLM Credentials UI as an alternative for credential management
- Added link to Google Cloud WIF documentation

Co-authored-by: Cursor Agent <cursoragent@cursor.com>

* fix(bedrock): deduplicate tool calls in assistant history (#15178)

* fix(types): add missing Set import to factory.py

---------

Co-authored-by: Yuta Saito <uc4w6c@bma.biglobe.ne.jp>
Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: YutaSaito <36355491+uc4w6c@users.noreply.github.com>
2026-01-19 10:56:49 -08:00
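The core of "avoid attaching tool calls when a call_id already exists" can be sketched as a single pass over the assistant history that remembers the tool-call ids already seen. This assumes OpenAI-format messages (`tool_calls[*].id`) before conversion to Bedrock. `dedupe_tool_calls` is a hypothetical helper, and dropping an emptied `tool_calls` key is a design choice made here.

```python
from typing import Any, Dict, List, Set


def dedupe_tool_calls(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    seen: Set[str] = set()
    result: List[Dict[str, Any]] = []
    for msg in messages:
        if msg.get("role") != "assistant" or not msg.get("tool_calls"):
            result.append(msg)
            continue
        unique = []
        for tc in msg["tool_calls"]:
            if tc["id"] in seen:
                continue  # this call_id was already attached earlier in the history
            seen.add(tc["id"])
            unique.append(tc)
        new_msg = dict(msg)  # copy so the caller's history is not mutated
        if unique:
            new_msg["tool_calls"] = unique
        else:
            new_msg.pop("tool_calls", None)
        result.append(new_msg)
    return result
```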
Cesar Garcia d30c25af21 feat(gemini): use responseJsonSchema for Gemini 2.0+ models (#19314)
* feat(gemini): add opt-in support for responseJsonSchema

Add support for Gemini's native responseJsonSchema parameter which uses
standard JSON Schema format instead of OpenAPI-style responseSchema.

Benefits of responseJsonSchema (Gemini 2.0+ only):
- Standard JSON Schema format (lowercase types)
- Supports additionalProperties for stricter validation
- Better compatibility with Pydantic's model_json_schema()
- No propertyOrdering required

Usage:
```python
response_format={
    "type": "json_schema",
    "json_schema": {"schema": {...}},
    "use_json_schema": True  # opt-in
}
```

This is backwards compatible - existing code continues to use
responseSchema by default.

Closes #16340

* docs: add documentation for use_json_schema parameter

Document the new use_json_schema option for Gemini 2.0+ models
in the JSON Mode documentation.

* refactor(gemini): use responseJsonSchema by default for Gemini 2.0+

Remove opt-in flag `use_json_schema` and automatically detect model version:
- Gemini 2.0+: uses responseJsonSchema (standard JSON Schema, supports additionalProperties)
- Gemini 1.5: uses responseSchema (OpenAPI format, legacy)

This follows LiteLLM's philosophy of abstracting provider differences -
users write the same code regardless of model version.

* test(vertex): update json_schema tests to accept both responseSchema formats

Gemini 2.x+ uses responseJsonSchema while Gemini 1.x uses responseSchema.
Update tests to accept both formats since litellm now auto-selects based
on model version.
2026-01-19 10:45:37 -08:00
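The version-based selection described above can be sketched as a lookup on the major version in the model name. The regex parsing is an assumption and not necessarily how LiteLLM detects versions. Names that do not match fall back to the legacy `responseSchema`, which is also an assumption. `gemini_schema_field` and `build_generation_config` are hypothetical helpers.

```python
import re
from typing import Any, Dict


def gemini_schema_field(model: str) -> str:
    # Gemini 2.0+ -> standard JSON Schema; 1.x or unrecognized -> OpenAPI-style schema.
    match = re.search(r"gemini-(\d+)", model)
    if match and int(match.group(1)) >= 2:
        return "responseJsonSchema"
    return "responseSchema"


def build_generation_config(model: str, schema: Dict[str, Any]) -> Dict[str, Any]:
    return {"responseMimeType": "application/json", gemini_schema_field(model): schema}
```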
Cesar Garcia 57b1d99b44 feat(azure): add support for Azure OpenAI v1 API (#19313)
* feat(azure): add support for Azure OpenAI v1 API

When api_version is 'v1', 'latest', or 'preview', use the standard
OpenAI client instead of AzureOpenAI client with base_url pointing
to /openai/v1/ endpoint.

This follows Microsoft's documentation for the new v1 API format:
https://learn.microsoft.com/en-us/azure/ai-services/openai/reference#api-specs

Changes:
- Add OpenAI/AsyncOpenAI imports to common_utils.py and azure.py
- Modify get_azure_openai_client() to detect v1 API versions and
  create appropriate client type
- Update isinstance checks and type hints to accept both client types
- Add unit tests for v1 API client creation

* fix(azure): fix MyPy type errors for v1 API support

- Add type: ignore for AsyncOpenAI constructor
- Update type hints in files/handler.py and batches/handler.py
- Add OpenAI/AsyncOpenAI to Union types for client parameters
- Update isinstance checks to include OpenAI/AsyncOpenAI

* fix(azure): update type hints in files and batches handlers for v1 API

Update async method signatures to accept Union[AsyncAzureOpenAI, AsyncOpenAI]
to fix mypy errors when using v1 API client.
2026-01-19 10:44:38 -08:00
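The branching described in #19313 can be sketched as a function that returns which client class to build and which base URL to use. It returns class names rather than constructing `OpenAI` or `AzureOpenAI` objects, so it runs without the `openai` package. `pick_azure_client` is a hypothetical helper.

```python
from typing import Optional, Tuple

V1_API_VERSIONS = {"v1", "latest", "preview"}


def pick_azure_client(api_base: str, api_version: Optional[str]) -> Tuple[str, str]:
    if api_version in V1_API_VERSIONS:
        # v1 API: plain OpenAI client pointed at the /openai/v1/ endpoint
        return "OpenAI", api_base.rstrip("/") + "/openai/v1/"
    # Dated api_version (e.g. "2024-10-21"): classic AzureOpenAI client
    return "AzureOpenAI", api_base
```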
Cesar Garcia 4ad5de10cb fix(realtime): disable SSL for ws:// WebSocket connections (#19345)
When using http:// api_base (converted to ws://), the websockets library
throws "ssl argument is incompatible with a ws:// URI". Only pass SSL
context for secure wss:// connections.

Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>
2026-01-19 10:37:41 -08:00
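The rule in #19345 is to build an SSL context only for `wss://`. It can be sketched with the standard library alone. `to_ws_url` and `ssl_for` are hypothetical helpers. When calling `websockets.connect`, pass `ssl=` only when `ssl_for(url)` is not `None`, because supplying an SSL context for a `ws://` URI is what triggers the error quoted above.

```python
import ssl
from typing import Optional
from urllib.parse import urlparse


def to_ws_url(api_base: str) -> str:
    # https -> wss (TLS), http -> ws (plaintext)
    if api_base.startswith("https://"):
        return "wss://" + api_base[len("https://"):]
    if api_base.startswith("http://"):
        return "ws://" + api_base[len("http://"):]
    return api_base


def ssl_for(url: str) -> Optional[ssl.SSLContext]:
    # Only secure wss:// connections get an SSL context.
    return ssl.create_default_context() if urlparse(url).scheme == "wss" else None
```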
Ishaan Jaff e817aa713e [Fix] Claude Code x Bedrock Invoke fails with advanced-tool-use-2025-11-20 (#19373)
* _filter_unsupported_beta_headers_for_bedrock

* test_bedrock_sonnet_4_5_with_advanced_tool_use_beta_header
2026-01-19 10:16:18 -08:00
Benedikt Óskarsson 406cdbe321 Merge branch 'litellm_staging_01_19_2026' into fix/bedrock-thinking-tool-call-2 2026-01-19 15:18:47 +00:00
Sameer Kankute ff7bb59824 Merge branch 'main' into litellm_fix_streaming_test 2026-01-19 19:43:16 +05:30
Sameer Kankute c9de4776bc Fix test_process_chunk_exception_calls_handle_failure_once 2026-01-19 19:39:12 +05:30
Sameer Kankute d6baa9a4ba Merge pull request #19234 from BerriAI/litellm_staging_01_16_2026
Litellm staging 01 16 2026
2026-01-19 19:34:53 +05:30
Harshit Jain 1dc2d2ddac fix(utils.py): correctly extract messages from google genai contents (#19156)
* fix(utils.py): correctly extract messages from google genai contents

* refactor use shared utilities
2026-01-19 06:00:23 -08:00
Harshit Jain 98e87c3e67 feat: Add Redis-based migration lock with bug fixes (#19261) 2026-01-19 05:57:24 -08:00
Harshit Jain fe92f4af9c fix(langfuse_otel): ignore service logs and fix callback shadowing (#19298)
* fix(langfuse_otel): ignore service logs and fix callback shadowing

* add test cases for service logger
2026-01-19 05:53:47 -08:00
Cesar Garcia b49f0a91e4 fix(responses): resolve deepcopy error with tool_choice ValidatorIterator (#17192) (#17205)
Replace copy.deepcopy with model_dump + model_validate in streaming
iterator logging to handle Pydantic ValidatorIterator objects that
cannot be pickled when tool_choice uses allowed_tools mode.

Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>
2026-01-19 05:44:20 -08:00
Sameer Kankute daf70f7221 Merge pull request #19329 from BerriAI/litellm_vector_store_sync
Fix: vector store sync issues
2026-01-19 19:11:48 +05:30
Manuel Schweigert 29adf34313 Add ChatGPT subscription support and responses bridge (#19030)
* Add ChatGPT subscription support and responses bridge

* Fix typing import for responses bridge

* Guard device code timestamp parsing

* add /v1/messages endpoint to chatgpt model
2026-01-19 05:37:45 -08:00
Sameer Kankute 574391c118 Revert "Fix audio cost per second override (#19158)"
This reverts commit 2a0f87bde0.
2026-01-19 18:51:08 +05:30
Jón Levy 5db0e3289a fix(agentcore): simplify agentcore streaming (#17141)
* fix(agentcore): simplify agentcore streaming

* fix(agentcore): move CustomStreamWrapper import to module level

The deferred imports inside streaming methods caused initialization delays
during health check requests, leading to timeouts in ECS deployments.

- Move CustomStreamWrapper import to module-level (line 19)
- Remove deferred imports from get_sync_custom_stream_wrapper (line 588)
- Remove deferred import from get_async_custom_stream_wrapper (line 747)
- Remove from TYPE_CHECKING block to use actual import

This ensures the import happens at module load time rather than during
first request processing, preventing health check endpoint blocking.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>

* test(agentcore): ensure sync response

* chore: upgrade boto3 to 1.40.76 in pyproject.toml

* chore: added taplo.toml

* fix(types): correct annotation type hint for MyPy compatibility

Update _convert_annotations_to_chat_format return type from
Dict[str, Any] to ChatCompletionAnnotation TypedDict to match
the Message class's expected type signature.

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: Benedikt Óskarsson <bensi94@hotmail.com>
2026-01-19 05:20:24 -08:00
Harshit Jain 6cd4b3603f fix(router): prevent retrying 4xx client errors (#19275) 2026-01-19 05:18:35 -08:00
Benedikt Óskarsson f09cae2107 Merge branch 'main' into fix/bedrock-thinking-tool-call-2 2026-01-19 13:08:17 +00:00
Sameer Kankute 480fa13b1d Merge pull request #19343 from BerriAI/litellm_anthropic_header_fix_19_jan
Fix: anthropic-beta is getting overridden and set to anthropic-beta
2026-01-19 18:24:46 +05:30