litellm

mirror of https://github.com/tiennm99/litellm.git synced 2026-07-05 15:08:18 +00:00

Author	SHA1	Message	Date
Sameer Kankute	11dbae85d1	Merge pull request #19390 from BerriAI/litellm_consistent_id_streaming_responses Fix: ID mismatch between text-start and text-delta	2026-01-20 20:46:34 +05:30
Sameer Kankute	9e1275b76c	Merge branch 'main' into litellm_staging_01_19_2026	2026-01-20 19:19:36 +05:30
Sameer Kankute	a3c1f4758d	Merge branch 'main' into litellm_consistent_id_streaming_responses	2026-01-20 19:02:23 +05:30
Sameer Kankute	e69c12b6db	Merge pull request #19396 from BerriAI/litellm_responses_route_fix Fix for Prometheus Metric Cardinality Issue with /responses Endpoint	2026-01-20 19:01:18 +05:30
Sameer Kankute	3cc19c56ba	Revert "feat: Add Redis-based migration lock with bug fixes (#19261 )" This reverts commit `98e87c3e67`.	2026-01-20 18:46:26 +05:30
Sameer Kankute	dc3ee63359	fix: test_env_keys	2026-01-20 18:37:56 +05:30
Sameer Kankute	2153db5e64	fix: test_convert_to_bedrock_format_post_call_streaming_hook	2026-01-20 18:27:36 +05:30
Sameer Kankute	8b24720638	fix: test_standard_logging_payload_includes_guardrail_information	2026-01-20 18:21:32 +05:30
Sameer Kankute	cd96c8cbb0	Fix:test_aaaaazure_tenant_id_auth	2026-01-20 17:39:08 +05:30
Sameer Kankute	f0785d5a51	Fix:test_supported_params_limited_to_docs	2026-01-20 17:26:40 +05:30
Sameer Kankute	dd6b35e825	Merge pull request #19401 from BerriAI/main Merge main 01 20 2026	2026-01-20 16:45:08 +05:30
Sameer Kankute	deb9142117	Merge pull request #19400 from BerriAI/main merge main iin 19/1 staging	2026-01-20 16:45:01 +05:30
Sameer Kankute	5f80e8d5e8	Fix for Prometheus Metric Cardinality Issue with /responses Endpoint	2026-01-20 15:28:09 +05:30
Sameer Kankute	f945fd9a84	Fix: ID mismatch between text-start and text-delta	2026-01-20 11:15:37 +05:30
Sameer Kankute	3eb3594ab7	Merge pull request #19346 from Chesars/fix/drop-params-prompt-cache-key-19225 fix: drop_params not dropping prompt_cache_key for non-OpenAI providers	2026-01-20 10:15:45 +05:30
Sameer Kankute	931998f170	Merge pull request #19266 from VedantMadane/fix-prompt-caching-string-content Fix extract_cacheable_prefix to handle string content with message-level cache_control	2026-01-20 10:11:48 +05:30
victorigualada	7d6d419a67	fix: preserve tool output ordering for gemini in responses bridge (#19360 ) * fix: preserve tool output ordering for gemini in responses bridge - Keep function_call_output adjacent to its function_call when building chat messages - Normalize function_call_output.output lists (input_* parts) into tool message content * fix test * small improvements	2026-01-19 20:37:59 -08:00
victorigualada	581d086c20	fix(responses): stream tool call events in completion bridge (#19368 ) Emit Responses API streaming events for tool calls when the underlying chat stream contains tool_call deltas, and recover tool calls into the stream when they only appear in the final response.	2026-01-19 20:29:50 -08:00
Sameer Kankute	2ae308028d	Merge pull request #18787 from aproorg/fix/bedrock-thinking-tool-call-2 fix(bedrock): handle thinking with tool calls for Claude 4 models	2026-01-20 09:43:07 +05:30
YutaSaito	00814d4d90	Merge pull request #19379 from BerriAI/litellm_feat_mcp_version_up [feat] mcp version up	2026-01-20 13:09:29 +09:00
Yuta Saito	ab11ceff32	tests: patch MCP client mocks via module alias to avoid real network calls	2026-01-20 12:31:27 +09:00
이명현	0dfc3fad5a	Fix: bedrock invoke claude 4 optional params #19318 (#19381 )	2026-01-19 19:14:58 -08:00
Ryan Malloy	58c8c2b7b1	fix: HTTP client memory leaks in Presidio, OpenAI, and Gemini (#19190 ) * fix: prevent HTTP client memory leaks in Presidio and OpenAI wrappers Fixes multiple memory leak issues reported in #14540 and related tickets: Presidio Guardrail Fix (#14540) - Problem: Every guardrail check created a new aiohttp.ClientSession - Impact: High-traffic proxies accumulated thousands of unclosed sessions - Solution: Share a single session across all guardrail checks - Added `self._http_session` instance variable - Lazy session creation via `_get_http_session()` - Proper cleanup via `_close_http_session()` and `__del__()` - Files: litellm/proxy/guardrails/guardrail_hooks/presidio.py OpenAI HTTP Client Caching (#14540) - Problem: `_get_async_http_client()` created new httpx.AsyncClient on each call - Impact: OpenAI/Azure completions bypassed client caching system - Solution: Route through `get_async_httpx_client()` for TTL-based caching - Caches clients by provider and SSL config - Fallback to direct creation if caching fails - Applied to both async and sync client methods - Files: litellm/llms/openai/common_utils.py Test Script - Added validation script to demonstrate fixes - Counts file descriptors and unclosed session objects - Files: test_oom_fixes.py Related issues: #14384, #13251, #12443 * fix(oom): prevent memory leaks in Presidio guardrails and OpenAI client creation Fixes two high-impact memory leaks: 1. Presidio Guardrail Session Leak (issue #14540) - Problem: Created new aiohttp.ClientSession on every guardrail check - Impact: Runs on EVERY proxy request when PII masking enabled - Fix: Shared session pattern with lifecycle management - Files: litellm/proxy/guardrails/guardrail_hooks/presidio.py 2. OpenAI HTTP Client Cache Bypass (issue #14540) - Problem: _get_async_http_client() created new httpx.AsyncClient, bypassing TTL cache - Impact: Every completion created new client with own connection pool - Fix: Route through get_async_httpx_client() for proper caching - Critical: Include SSL config in cache key for correctness - Files: litellm/llms/openai/common_utils.py Validation: - Presidio: 100 requests → 0 new sessions (was 100) - OpenAI: 100 calls → 1 unique client (was 100) - test_oom_fixes.py: Automated validation script * fix(oom): resolve Gemini aiohttp session leak (issue #12443) Fixes persistent "Unclosed client session" warnings when using Gemini models. Root Causes: 1. Broken atexit cleanup - get_event_loop() fails at exit time 2. On-demand session creation without reliable cleanup Changes: 1. Fixed atexit Cleanup (async_client_cleanup.py) - OLD: Used get_event_loop() which fails when loop is closed - NEW: Always create fresh event loop at exit time - Ensures cleanup runs successfully even when main loop is closed 2. Added __del__ Cleanup (aiohttp_handler.py) - Defense-in-depth: cleanup on garbage collection - Handles abnormal termination cases - Similar pattern to Presidio guardrail fix 3. Enhanced Cleanup Scope (async_client_cleanup.py) - Now closes global base_llm_aiohttp_handler instance - Previously only checked cache, missed module-level handler Validation: - Test 1: __del__ cleanup → 0 sessions leaked ✓ - Test 2: atexit cleanup → 0 sessions leaked ✓ - test_gemini_session_leak.py: Automated validation Related: #14540 (broader OOM issue tracking) * fix(types): use LlmProviders enum for get_async_httpx_client MyPy was failing because llm_provider parameter expects Union[LlmProviders, httpxSpecialProvider], not a string. Changed from string "openai" to LlmProviders.OPENAI enum value. * test: move validation tests to proper CI directories - Move test_oom_fixes.py to tests/test_litellm/llms/ - Move test_gemini_session_leak.py to tests/test_litellm/llms/custom_httpx/ - Fix pytest warning: use pytest.skip() instead of return True This ensures CI actually runs our OOM fix validation tests. * fix(oom): add asyncio.Lock to prevent race conditions in Presidio session creation - Make _get_http_session() async with asyncio.Lock protection - Prevents multiple concurrent requests from creating orphaned sessions - Add concurrent load test (50 parallel requests) to validate fix - Test confirms only 1 session created under concurrent load Critical fix: Previous implementation had race condition where concurrent guardrail checks could create multiple sessions, defeating the shared session pattern and causing memory leaks. * fix(presidio): eliminate race condition in session lock initialization Move asyncio.Lock creation from lazy initialization in _get_http_session() to __init__. The previous lazy init had a race condition where concurrent coroutines could both see _session_lock as None, both create locks, and end up with different lock instances - defeating the synchronization. asyncio.Lock() can be safely created without an event loop; it only requires one when awaited.	2026-01-19 19:02:55 -08:00
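The shared-session pattern described in commit 58c8c2b7b1 can be sketched roughly as follows. This is an illustration under stated assumptions, not the code from presidio.py: `FakeSession` is a hypothetical stand-in for `aiohttp.ClientSession` so the sketch runs with the standard library only, and `SharedSessionGuardrail` and `run_concurrent_checks` are hypothetical names. The attribute and method names `_http_session`, `_session_lock`, `_get_http_session()` and `_close_http_session()` are taken from the commit message.

```python
import asyncio


class FakeSession:
    """Hypothetical stand-in for aiohttp.ClientSession (stdlib-only sketch)."""

    created = 0  # counts how many sessions were ever constructed

    def __init__(self) -> None:
        FakeSession.created += 1
        self.closed = False

    async def close(self) -> None:
        self.closed = True


class SharedSessionGuardrail:
    """Hypothetical guardrail that reuses one session across all checks."""

    def __init__(self) -> None:
        self._http_session = None
        # The lock is created eagerly in __init__, so concurrent coroutines
        # can never end up holding two different lock instances.
        self._session_lock = asyncio.Lock()

    async def _get_http_session(self) -> FakeSession:
        async with self._session_lock:
            if self._http_session is None or self._http_session.closed:
                # Yield to the event loop to mimic async setup work while
                # other coroutines are waiting on the lock.
                await asyncio.sleep(0)
                self._http_session = FakeSession()
            return self._http_session

    async def _close_http_session(self) -> None:
        async with self._session_lock:
            if self._http_session is not None and not self._http_session.closed:
                await self._http_session.close()
            self._http_session = None


async def run_concurrent_checks(n: int = 50) -> int:
    """Run n concurrent session lookups; return how many sessions were created."""
    FakeSession.created = 0
    guardrail = SharedSessionGuardrail()
    sessions = await asyncio.gather(
        *(guardrail._get_http_session() for _ in range(n))
    )
    # Every caller should have received the same session object.
    assert all(s is sessions[0] for s in sessions)
    await guardrail._close_http_session()
    return FakeSession.created
```

Calling `asyncio.run(run_concurrent_checks(50))` mirrors the 50-request concurrent load test mentioned in the commit: with the lock in place, only the first coroutine constructs a session and the rest reuse it.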
南辰燏炚	004bde2c45	feat (volcengine) : Support Volcengine responses api (#18508 ) * Add Volcengine responses adapter * fix llms/volcengine/responses/transformation.py:507:9: F841 Local variable `origin` is assigned to but never used fix llms/volcengine/responses/transformation.py:95: error: Argument "headers" to "VolcEngineError" has incompatible type add more supported optional params removed redundant manual logging/utils fallbacks so litellm/__init__.py uses the registry only.	2026-01-19 19:02:29 -08:00
Emerson Gomes	13d887a275	Fix queue persistence to Redis (#19304 ) * Fix queue persistence to Redis * add test	2026-01-19 19:01:34 -08:00
Ishaan Jaff	818913ee23	[Fix] Fix Pass through routes to work with server root path (#19383 ) * test_build_full_path_with_root_default * fix pt feat	2026-01-19 18:28:55 -08:00
Yuta Saito	ec7bf0ff1a	Merge remote-tracking branch 'upstream/main' into litellm_feat_mcp_version_up	2026-01-20 09:52:38 +09:00
Yuta Saito	51cf782292	chore: switch experimental client to streamable_http_client API	2026-01-20 07:37:50 +09:00
Yuta Saito	05d9fb6fd6	feat: SEP-986	2026-01-20 07:24:39 +09:00
Ishaan Jaff	a82467d679	[Feat] - Add self hosted Claude Code Plugin Marketplace (#19378 ) * init schema * init endpoints * fix: claude_code_marketplace_router * refactor * fix: claude_code_marketplace_router * claude_code_marketplace_router	2026-01-19 14:05:47 -08:00
0x1f99d	1cce718551	fix(bedrock): deduplicate tool calls in assistant history (#15178 ) (#19324 ) * fix: Avoid attaching tool calls when a call_id already exists * fix: Prevent MCP responses from reviving past tool calls via previous_response_id * test: Parametrize MCP streaming test to cover OpenAI and Anthropic models * test: Fail MCP streaming test when LiteLLM logs errors during follow-up calls * test: Let MCP tool-execution mock accept new kwargs for streaming tests * chore: fix lint error * docs: Add Google Workload Identity Federation (WIF) documentation to Vertex AI (#19320) - Added new section documenting WIF support for Vertex AI authentication - Included SDK and Proxy configuration examples - Added sample WIF credentials file format for AWS federation - Mentioned LLM Credentials UI as an alternative for credential management - Added link to Google Cloud WIF documentation Co-authored-by: Cursor Agent <cursoragent@cursor.com> * fix(bedrock): deduplicate tool calls in assistant history (#15178) * fix(types): add missing Set import to factory.py --------- Co-authored-by: Yuta Saito <uc4w6c@bma.biglobe.ne.jp> Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com> Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: YutaSaito <36355491+uc4w6c@users.noreply.github.com>	2026-01-19 10:56:49 -08:00
Cesar Garcia	d30c25af21	feat(gemini): use responseJsonSchema for Gemini 2.0+ models (#19314 ) * feat(gemini): add opt-in support for responseJsonSchema Add support for Gemini's native responseJsonSchema parameter which uses standard JSON Schema format instead of OpenAPI-style responseSchema. Benefits of responseJsonSchema (Gemini 2.0+ only): - Standard JSON Schema format (lowercase types) - Supports additionalProperties for stricter validation - Better compatibility with Pydantic's model_json_schema() - No propertyOrdering required Usage: ```python response_format={ "type": "json_schema", "json_schema": {"schema": {...}}, "use_json_schema": True # opt-in } ``` This is backwards compatible - existing code continues to use responseSchema by default. Closes #16340 * docs: add documentation for use_json_schema parameter Document the new use_json_schema option for Gemini 2.0+ models in the JSON Mode documentation. * refactor(gemini): use responseJsonSchema by default for Gemini 2.0+ Remove opt-in flag `use_json_schema` and automatically detect model version: - Gemini 2.0+: uses responseJsonSchema (standard JSON Schema, supports additionalProperties) - Gemini 1.5: uses responseSchema (OpenAPI format, legacy) This follows LiteLLM's philosophy of abstracting provider differences - users write the same code regardless of model version. * test(vertex): update json_schema tests to accept both responseSchema formats Gemini 2.x+ uses responseJsonSchema while Gemini 1.x uses responseSchema. Update tests to accept both formats since litellm now auto-selects based on model version.	2026-01-19 10:45:37 -08:00
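The version-based selection described in commit d30c25af21 might look something like the sketch below. The function name `schema_field_for` and the regex-based version parsing are assumptions made for illustration; the commit message does not say how LiteLLM detects the model version. Only the two field names and the 2.0 cutoff come from the commit message.

```python
import re


def schema_field_for(model: str) -> str:
    """Pick the Gemini schema field for a model name (hypothetical helper).

    Gemini 2.0+ gets responseJsonSchema (standard JSON Schema); older or
    unrecognized models fall back to responseSchema (OpenAPI-style).
    """
    # Assumed naming convention: "gemini-<major>[.<minor>]-..." somewhere in the name.
    match = re.search(r"gemini-(\d+(?:\.\d+)?)", model)
    if match and float(match.group(1)) >= 2.0:
        return "responseJsonSchema"
    return "responseSchema"
```

Falling back to `responseSchema` when no version can be parsed is a design choice of this sketch: it keeps the legacy behavior for model names it does not understand.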
Cesar Garcia	57b1d99b44	feat(azure): add support for Azure OpenAI v1 API (#19313 ) * feat(azure): add support for Azure OpenAI v1 API When api_version is 'v1', 'latest', or 'preview', use the standard OpenAI client instead of AzureOpenAI client with base_url pointing to /openai/v1/ endpoint. This follows Microsoft's documentation for the new v1 API format: https://learn.microsoft.com/en-us/azure/ai-services/openai/reference#api-specs Changes: - Add OpenAI/AsyncOpenAI imports to common_utils.py and azure.py - Modify get_azure_openai_client() to detect v1 API versions and create appropriate client type - Update isinstance checks and type hints to accept both client types - Add unit tests for v1 API client creation * fix(azure): fix MyPy type errors for v1 API support - Add type: ignore for AsyncOpenAI constructor - Update type hints in files/handler.py and batches/handler.py - Add OpenAI/AsyncOpenAI to Union types for client parameters - Update isinstance checks to include OpenAI/AsyncOpenAI * fix(azure): update type hints in files and batches handlers for v1 API Update async method signatures to accept Union[AsyncAzureOpenAI, AsyncOpenAI] to fix mypy errors when using v1 API client.	2026-01-19 10:44:38 -08:00
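The client selection described in commit 57b1d99b44 can be illustrated with a small sketch. `select_azure_client` and `V1_API_VERSIONS` are hypothetical names, and the function only returns a description of the choice rather than constructing a real client; the actual `get_azure_openai_client()` may be structured differently. The three `api_version` values and the `/openai/v1/` path are taken from the commit message.

```python
from typing import Optional, Tuple

# api_version values that the commit message says select the v1 API.
V1_API_VERSIONS = {"v1", "latest", "preview"}


def select_azure_client(api_base: str, api_version: Optional[str]) -> Tuple[str, str]:
    """Return (client_kind, base_url) for an Azure OpenAI resource (hypothetical helper)."""
    if api_version in V1_API_VERSIONS:
        # v1 API: standard OpenAI client pointed at the /openai/v1/ endpoint.
        return "OpenAI", api_base.rstrip("/") + "/openai/v1/"
    # Dated api_version (or none): keep using the AzureOpenAI client.
    return "AzureOpenAI", api_base
```

For example, `select_azure_client("https://example.openai.azure.com/", "v1")` yields the `OpenAI` client kind with a base URL ending in `/openai/v1/`, while a dated `api_version` leaves the `AzureOpenAI` path unchanged.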
Cesar Garcia	4ad5de10cb	fix(realtime): disable SSL for ws:// WebSocket connections (#19345 ) When using http:// api_base (converted to ws://), the websockets library throws "ssl argument is incompatible with a ws:// URI". Only pass SSL context for secure wss:// connections. Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>	2026-01-19 10:37:41 -08:00
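The rule in commit 4ad5de10cb, passing an SSL context only for secure WebSocket URLs, can be sketched with the standard library alone. The helper names `to_websocket_url` and `ssl_context_for` are hypothetical and are not taken from the LiteLLM source.

```python
import ssl
from typing import Optional
from urllib.parse import urlsplit


def to_websocket_url(api_base: str) -> str:
    """Map http:// to ws:// and https:// to wss://; leave other URLs unchanged."""
    if api_base.startswith("https://"):
        return "wss://" + api_base[len("https://"):]
    if api_base.startswith("http://"):
        return "ws://" + api_base[len("http://"):]
    return api_base


def ssl_context_for(url: str) -> Optional[ssl.SSLContext]:
    """Return an SSL context for wss:// URLs and None for plain ws:// URLs."""
    if urlsplit(url).scheme == "wss":
        return ssl.create_default_context()
    return None
```

A caller would then add the `ssl` argument to its WebSocket connect call only when `ssl_context_for(url)` returns a context, which avoids the "ssl argument is incompatible with a ws:// URI" error quoted in the commit message.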
Ishaan Jaff	e817aa713e	[Fix] Claude Code x Bedrock Invoke fails with `advanced-tool-use-2025-11-20` (#19373 ) * _filter_unsupported_beta_headers_for_bedrock * test_bedrock_sonnet_4_5_with_advanced_tool_use_beta_header	2026-01-19 10:16:18 -08:00
Benedikt Óskarsson	406cdbe321	Merge branch 'litellm_staging_01_19_2026' into fix/bedrock-thinking-tool-call-2	2026-01-19 15:18:47 +00:00
Sameer Kankute	ff7bb59824	Merge branch 'main' into litellm_fix_streaming_test	2026-01-19 19:43:16 +05:30
Sameer Kankute	c9de4776bc	Fix test_process_chunk_exception_calls_handle_failure_once	2026-01-19 19:39:12 +05:30
Sameer Kankute	d6baa9a4ba	Merge pull request #19234 from BerriAI/litellm_staging_01_16_2026 Litellm staging 01 16 2026	2026-01-19 19:34:53 +05:30
Harshit Jain	1dc2d2ddac	fix(utils.py): correctly extract messages from google genai contents (#19156 ) * fix(utils.py): correctly extract messages from google genai contents * refactor use shared utilities	2026-01-19 06:00:23 -08:00
Harshit Jain	98e87c3e67	feat: Add Redis-based migration lock with bug fixes (#19261 )	2026-01-19 05:57:24 -08:00
Harshit Jain	fe92f4af9c	fix(langfuse_otel): ignore service logs and fix callback shadowing (#19298 ) * fix(langfuse_otel): ignore service logs and fix callback shadowing * add test cases for service logger	2026-01-19 05:53:47 -08:00
Cesar Garcia	b49f0a91e4	fix(responses): resolve deepcopy error with tool_choice ValidatorIterator (#17192 ) (#17205 ) Replace copy.deepcopy with model_dump + model_validate in streaming iterator logging to handle Pydantic ValidatorIterator objects that cannot be pickled when tool_choice uses allowed_tools mode. Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>	2026-01-19 05:44:20 -08:00
Sameer Kankute	daf70f7221	Merge pull request #19329 from BerriAI/litellm_vector_store_sync Fix: vector store sync issues	2026-01-19 19:11:48 +05:30
Manuel Schweigert	29adf34313	Add ChatGPT subscription support and responses bridge (#19030 ) * Add ChatGPT subscription support and responses bridge * Fix typing import for responses bridge * Guard device code timestamp parsing * add /v1/messages endpoint to chatgpt model	2026-01-19 05:37:45 -08:00
Sameer Kankute	574391c118	Revert "Fix audio cost per second override (#19158 )" This reverts commit `2a0f87bde0`.	2026-01-19 18:51:08 +05:30
Jón Levy	5db0e3289a	fix(agentcore): simplify agentcore streaming (#17141 ) * fix(agentcore): simplify agentcore streaming * fix(agentcore): move CustomStreamWrapper import to module level The deferred imports inside streaming methods caused initialization delays during health check requests, leading to timeouts in ECS deployments. - Move CustomStreamWrapper import to module-level (line 19) - Remove deferred imports from get_sync_custom_stream_wrapper (line 588) - Remove deferred import from get_async_custom_stream_wrapper (line 747) - Remove from TYPE_CHECKING block to use actual import This ensures the import happens at module load time rather than during first request processing, preventing health check endpoint blocking. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com> * test(agentcore): ensure sync response * chore: upgrade boto3 to 1.40.76 in pyproject.toml * chore: added taplo.toml * fix(types): correct annotation type hint for MyPy compatibility Update _convert_annotations_to_chat_format return type from Dict[str, Any] to ChatCompletionAnnotation TypedDict to match the Message class's expected type signature. Co-Authored-By: Claude <noreply@anthropic.com> --------- Co-authored-by: Claude <noreply@anthropic.com> Co-authored-by: Benedikt Óskarsson <bensi94@hotmail.com>	2026-01-19 05:20:24 -08:00
Harshit Jain	6cd4b3603f	fix(router): prevent retrying 4xx client errors (#19275 )	2026-01-19 05:18:35 -08:00
Benedikt Óskarsson	f09cae2107	Merge branch 'main' into fix/bedrock-thinking-tool-call-2	2026-01-19 13:08:17 +00:00
Sameer Kankute	480fa13b1d	Merge pull request #19343 from BerriAI/litellm_anthropic_header_fix_19_jan Fix: anthropic-beta is getting overriden and set to anthropic-beta	2026-01-19 18:24:46 +05:30

5569 Commits