litellm

mirror of https://github.com/tiennm99/litellm.git synced 2026-06-28 15:10:00 +00:00

Author	SHA1	Message	Date
Derek Duenas	bbaf0af907	Grayswan guardrail passthrough on flagged (#16891 ) * attempt to implement the passthrough feature * Formatting and small change * Fix formatting * Format test file --------- Co-authored-by: Xiaohan Fu <xiaohan@grayswan.ai>	2025-11-21 20:01:35 -08:00
Dima-Mediator	a0d4d0b304	Gemini models: capture image_tokens and support cost_per_output_image_token in costs calculations (#16912 )	2025-11-21 19:59:24 -08:00
Alexsander Hamir	6e70c279f8	[Fix] - Router's Cache: Fix routing for requests with same cacheable prefix but different user messages (#16951 ) * fix(router): use cacheable prefix for prompt caching cache keys Fix issue where requests with same cacheable prefix but different user messages were routing to different deployments, preventing cached token reuse. The cache key now correctly includes only the cacheable prefix (up to and including the last cache_control block) instead of the entire messages array. ## New Functions ### extract_cacheable_prefix() Static method that extracts the cacheable prefix from messages for prompt caching. The cacheable prefix is defined as everything UP TO AND INCLUDING the LAST content block (across all messages) that has cache_control with type "ephemeral". This includes ALL blocks before the last cacheable block (even if they don't have cache_control themselves). - Finds the last content block with cache_control across all messages - Returns all messages and content blocks up to and including that last cacheable block - Excludes everything after the last cacheable block (including user messages that come after) - Returns empty list if no cacheable blocks are found ## Changed Functions ### get_prompt_caching_cache_key() Modified to use the cacheable prefix instead of the full messages array when generating cache keys. This ensures that requests with the same cacheable prefix but different user messages generate the same cache key, enabling proper routing to the same deployment. - Now calls extract_cacheable_prefix() to get only cacheable content - Returns None if no cacheable prefix is found (can't generate key) - Cache key is now based on cacheable prefix only, not full messages ### async_get_model_id() Completely refactored to use the cacheable prefix directly instead of the previous workaround that checked progressively shorter message slices. The previous implementation was inefficient and unreliable. - Removed progressive message slicing logic (messages[:-1], messages[:-2], etc.) - Now uses single direct cache lookup with cacheable prefix-based key - More efficient (1 lookup instead of up to 4) - More reliable (uses correct cache key based on cacheable prefix) - Returns None if no cacheable prefix found ### add_model_id() Added None check for cache_key to prevent caching when no cacheable prefix is found. This ensures we don't attempt to cache when there's no meaningful cache key to use. - Added guard: returns early if cache_key is None - Prevents attempting to cache when no cacheable prefix exists ### async_add_model_id() Added None check for cache_key to prevent caching when no cacheable prefix is found. Matches the behavior of add_model_id() for consistency. - Added guard: returns early if cache_key is None - Prevents attempting to cache when no cacheable prefix exists ### get_model_id() Added None check for cache_key to handle cases where no cacheable prefix is found. Ensures consistent behavior across all cache methods. - Added guard: returns None if cache_key is None - Prevents calling get_cache() with None key ## Test ### test_router_prompt_caching_same_cacheable_prefix_routes_to_same_deployment() New end-to-end test that validates the fix. Tests that requests with the same cacheable prefix (system blocks with cache_control) but different user messages: 1. Generate the same cache key 2. Successfully perform cache lookup 3. Route to the same deployment This test reproduces the exact scenario from the user's bug report where three requests with different user messages should route to the same deployment but were previously routing to different ones. Fixes issue where cached tokens couldn't be reused because requests were routed to different providers due to different cache keys. * fix(router): use cast() for proper type handling in extract_cacheable_prefix Replace type annotation with type: ignore comment with proper cast() from typing module, matching the pattern used throughout the codebase for creating modified AllMessageValues dictionaries.	2025-11-21 19:13:40 -08:00
yuneng-jiang	b074c79734	Allow partial matches for user id in user table (#16952 )	2025-11-21 19:12:16 -08:00
yuneng-jiang	6881594632	[Fix] Exclude litellm_credential_name from Sensitive Data Masker (Updated) (#16958 ) * Exclude litellm_credential_name from sensitive masker * Adding missing file	2025-11-21 19:09:48 -08:00
Justin Tahara	703f619e08	feat(bedrock): Add Claude 4.5 to US Gov Cloud (#16957 ) * feat(bedrock): Add Claude 4.5 to US Gov Cloud * Adding west and tests	2025-11-21 19:06:26 -08:00
Mubashir Osmani	db58f6aeb1	fix: arize phoenix logging (#16301 ) * arize phx * fix arize integration * traces to specific project name * fix * look for http endpoint	2025-11-21 18:46:18 -08:00
yuneng-jiang	eb48d5cc42	Revert "Exclude litellm_credential_name from sensitive masker (#16950 )" (#16956 ) This reverts commit `5cfacb96e6`.	2025-11-21 18:09:54 -08:00
Ishaan Jaffer	1f36fad94b	TestDockerModelRunnerIntegration	2025-11-21 17:39:33 -08:00
Ishaan Jaffer	3296ffd3ca	test fixes	2025-11-21 17:38:20 -08:00
Ishaan Jaffer	8b8b31ecd8	fix img gen	2025-11-21 17:18:48 -08:00
Ishaan Jaffer	6439aed3ac	snowflake test fix	2025-11-21 17:12:55 -08:00
Ishaan Jaffer	e7a32c1e8f	docker test fixes	2025-11-21 16:52:58 -08:00
yuneng-jiang	5cfacb96e6	Exclude litellm_credential_name from sensitive masker (#16950 )	2025-11-21 16:40:17 -08:00
Ishaan Jaffer	69da15e65e	test_fal_ai_image_generation_basic	2025-11-21 16:23:41 -08:00
Ishaan Jaffer	4a9f163db1	TestPromptVersionsEndpoint	2025-11-21 16:21:13 -08:00
Ishaan Jaffer	4e8f1d0143	fix prompt manager	2025-11-21 16:17:44 -08:00
Ishaan Jaffer	2226450437	test_ensure_initialize_azure_sdk_client_always_used	2025-11-21 16:15:35 -08:00
yuneng-jiang	0abfb07ab8	Remove UI Session Token from user/info return (#16851 )	2025-11-21 16:11:58 -08:00
Ishaan Jaff	8e318dd06c	[Feat] New LLM Provider - Docker Model Runner (#16948 ) * add DOCKER_MODEL_RUNNER * add DockerModelRunnerChatConfig Transorm * add docker_model_runner * add docker_model_runner * docs docker model runner * add DockerModelRunnerChatConfig * add docker_model_runner to providers * test_completion_hits_correct_url_and_body * fix sidebar * TestDockerModelRunnerIntegration * test_completion_with_custom_engine_and_host * docs docker model runner * docs fix	2025-11-21 16:09:32 -08:00
Eiliya	d88580fa28	fix(gemini-video): inherit BaseVideoConfig to enable async content response (#16875 ) This fix addresses the same issue that was resolved for OpenAI video in PR #16708. The GeminiVideoConfig class was importing BaseVideoConfig only within TYPE_CHECKING, causing it to be 'Any' at runtime. This prevented the async_transform_video_content_response method from being available during video content downloads. Changes: - Moved BaseVideoConfig import from TYPE_CHECKING to top-level imports - Added test_gemini_video_config_has_async_transform() to verify the fix - Ensures GeminiVideoConfig properly inherits BaseVideoConfig at runtime Fixes video generation errors for Gemini Veo models: 'GeminiVideoConfig' object has no attribute 'async_transform_video_content_response'	2025-11-21 16:01:21 -08:00
Ishaan Jaff	ed6c3b4c86	[Bug Fix]: Search APIs - error in firecrawl-search "Invalid request body" (#16943 ) * add search_tool_name in litellm params * test_search_tool_name_in_all_litellm_params * bump config	2025-11-21 14:56:19 -08:00
Ishaan Jaff	6ae22908b7	[Feat] Prompt Versioning - Allow specifying prompt version in code (#16929 ) * add _get_prompt_data_from_dotprompt_content * fix pre call hook for prompt template * fix: get_latest_version_prompt_id * fix get_latest_version_prompt_id * test_get_latest_version_prompt_id	2025-11-21 13:58:49 -08:00
yuneng-jiang	4b25398afe	[Infra] CI/CD Fixes (#16937 ) * Attempt CI/CD Fix * Adding test for coverage * Adding max depth to copilot and vertex * Fixing mypy lint and docker database * Fixing UI build issues * Update playwright test	2025-11-21 13:58:19 -08:00
colinlin-stripe	f9d8eeaf8e	[stripe] gemini 3 thought signatures in tool call id (#16895 ) * though signature tool call id * [stripe] refactor and tests * [stripe] remove md and move to factory * [stripe] remove redudant test * [stripe] ran black formatting * [stripe] add thought signature docs * [stripe] remove unused import	2025-11-21 13:44:53 -08:00
Ishaan Jaff	97d9da93e0	[Feat] Prompt Management - Allow viewing version history (#16901 ) * TestPromptRequest * add prompts/test endpoint for testing prompt * TestPromptTestEndpoint * feat: working v1 of this ui * workig prompt endpoints * add chat ui for prompts * add conversation panel * add init chat ui * allow clicking edit prompt * fix use get_base_prompt_id * add endpoints for viewing prompt versions * TestPromptVersioning * add getPromptVersions * add VersionHistorySidePanel * allow viewing version history * add version history	2025-11-21 08:54:52 -08:00
Ishaan Jaff	41566722af	[Feat] UI - Prompt Management - Allow testing prompts with Chat UI (#16898 ) * TestPromptRequest * add prompts/test endpoint for testing prompt * TestPromptTestEndpoint * feat: working v1 of this ui * workig prompt endpoints * add chat ui for prompts * add conversation panel * add init chat ui	2025-11-21 08:53:18 -08:00
YutaSaito	041ac054b6	feat: allow custom violation message for tool-permission guardrail (#16916 )	2025-11-21 08:52:01 -08:00
yuneng-jiang	7225fc066f	Fix key model alias (#16896 )	2025-11-20 16:05:49 -08:00
YutaSaito	93affcb732	[Feat] mcp resources support (#16800 ) * feat: mcp prompts support * feat: mcp resources support	2025-11-20 14:53:44 -08:00
Ishaan Jaff	57544f1662	[Feat] Adds IAM role assumption support for AWS Secret Manager (#16887 ) * add AWS fields for KeyManagementSettings * docs IAM roles * use aws iam auth on secret manager v2 * fix: load_aws_secret_manager * test_secret_manager_with_iam_role_settings	2025-11-20 12:38:48 -08:00
Sameer Kankute	e5948770dd	Fix audio transcription cost tracking (#16478 )	2025-11-19 20:29:39 -08:00
Sameer Kankute	c3143e388e	Add thought signature support to v1/messages api (#16812 ) * Add thought signature support to v1/messages api * update the thinking level handling logic * update the thinking level handling logic * Add streaming support * fix intalling litellm error	2025-11-19 20:24:31 -08:00
Sebastian	cb843684b8	fix(vertex_ai): add includeThoughts=True for Gemini 3 reasoning_effort (#16838 ) Gemini 3 models require 'includeThoughts: True' in the thinkingConfig to return the actual thought text. Previously, using reasoning_effort set the 'thinkingLevel' but missed the boolean flag, resulting in empty reasoning_content. This fix: 1. Updates `_map_reasoning_effort_to_thinking_level` to include `includeThoughts: True` for low/medium/high. 2. Adds unit tests to verify the config mapping.	2025-11-19 19:14:42 -08:00
Alex Huang	3b6f3e48cb	Fix optional param mapping (#16852 ) * Direct string check instead of tuple string inclusion check * Add test	2025-11-19 19:10:04 -08:00
Cesar Garcia	7d5cb8ebb2	fix(gemini): Add reasoning_content to streaming responses with tools (#16854 ) Fixes #16805 When using Gemini models (2.5/3.0) with streaming + tools enabled, the reasoning_content field was missing from stream chunks, even though thinking_blocks were present in non-streaming responses. Changes: - Convert thinking_blocks to reasoning_content for streaming responses - Extract "thinking" field from each thinking_block - Concatenate multiple thinking parts with newlines - Assign to reasoning_content in chat_completion_message for streaming Testing: - Added test_streaming_chunk_with_tool_calls_includes_reasoning_content - Test verifies reasoning_content appears with tool calls in streaming - All 39 existing Gemini tests pass	2025-11-19 19:09:37 -08:00
Nigel Kukard	c5c563c302	fix: fixed openai conversion from responses to completions (#16864 ) - Fix blank function name in completions response when using native function calling - Fix Enum name being used instead of Enum value for comparison in chunk conversion - Added additional tests to cover changes Thanks to @mcowger for the invaluable assitance with figuring this issue out! Fixed #16863	2025-11-19 19:02:52 -08:00
Sameer Kankute	6fc7397dde	Add Vertex AI Image Edit Support (#16828 ) * Add vertex ai image edit support * Fix lint errors	2025-11-19 18:39:28 -08:00
idola9	e1005cb9d3	Use LiteLLM key alias as fallback Noma applicationId in NomaGuardrail (#16832 ) * Use auth key name if there are no app id in in headers or in extra_data * use key alias instead of key name * Fix * last priority key alias * Fix * Add tests	2025-11-19 17:44:23 -08:00
Sameer Kankute	9622829fa1	Fix vector store create issue (#16804 )	2025-11-19 16:53:20 -08:00
Ishaan Jaff	c7cf18cf67	[Feat] Prompt Management - Allow storing prompt version in DB (#16848 ) * test_dotprompt_auto_detection_with_model_only * fix _auto_detect_prompt_management_logger * test_dotprompt_with_prompt_version * add v1, v2 tests * add _compile_prompt_helper * fix _compile_prompt_helper * test_dotprompt_with_prompt_version * test_dotprompt_with_prompt_version, test_get_prompt_with_version * add version in schema * feat add _get_prompt_spec_for_db_prompt * add _get_prompt_spec_for_db_prompt * feat add _get_prompt_spec_for_db_prompt * update prompt table * add version in prompt DB * test_get_prompt_spec_for_db_prompt_with_versions	2025-11-19 13:19:56 -08:00
Naki	98dd866b26	feat(github-copilot): Add Responses API support for gpt-5.1-codex model (#16845 ) - Implement GithubCopilotResponsesAPIConfig for /responses endpoint - Add support for models requiring responses API (e.g., gpt-5.1-codex) - Auto-detect vision requests and set X-Initiator header - Follow OpenAI Responses API compatibility pattern - Add comprehensive unit tests (16 tests passing) Fixes #16820	2025-11-19 13:17:19 -08:00
Ishaan Jaff	3ebe489082	[Feat] Prompt Management - Add support for versioning prompts (#16836 ) * test_dotprompt_auto_detection_with_model_only * fix _auto_detect_prompt_management_logger * test_dotprompt_with_prompt_version * add v1, v2 tests * add _compile_prompt_helper * fix _compile_prompt_helper * test_dotprompt_with_prompt_version * test_dotprompt_with_prompt_version, test_get_prompt_with_version	2025-11-19 13:16:03 -08:00
Ishaan Jaff	1f8fe007a1	[Feat] Prompt Management - Allow specifying just prompt_id in a request to a model (#16834 ) * test_dotprompt_auto_detection_with_model_only * fix _auto_detect_prompt_management_logger * test_dotprompt_auto_detection_with_model_only	2025-11-19 10:20:58 -08:00
Alan Ponnachan	b92cc2b2f9	fix(bedrock): Ensure consistent chunk IDs in Bedrock streaming responses (#16596 ) * ensure consistent chunk IDs in streaming responses * use native conversationId for consistent stream chunk IDs	2025-11-18 20:37:21 -08:00
Sameer Kankute	34cc532d8d	Make sure that user inherits team permissions (#16639 )	2025-11-18 20:14:42 -08:00
Alexsander Hamir	2e007505da	fix(spend-logs): trim logged response strings (#16654 ) * fix(spend-logs): trim logged response strings - route spend-log responses through the existing string sanitizer so oversized base64/text fields are truncated before persistence - add unit tests covering the truncation path and the feature flag Note: embeddings-specific truncation (numeric vectors) is still pending and will be handled separately. * remove unnecessary comment * add: sanitization unit test for embeddings * fix: simplify sanatization logic I overcomplicated a simple change for lack of understanding, fixed.	2025-11-18 20:12:49 -08:00
Sameer Kankute	9e93d65ee2	Add extra_body support for response api params from chat completion (#16765 )	2025-11-18 20:05:47 -08:00
Cesar Garcia	5e70c78b94	fix(cost-tracking): support base_model lookup in litellm_metadata for Responses API (#16778 ) Cost tracking was failing for Responses API when using custom deployment names with base_model configuration. The issue occurred because: - Chat Completions API stores model_info in 'metadata' - Responses API stores model_info in 'litellm_metadata' - Cost calculator only checked 'metadata', missing Responses API costs Changes: - Updated _get_base_model_from_metadata() to check both metadata locations - Added comprehensive unit tests covering all scenarios - Maintains backward compatibility (metadata takes precedence) Fixes #16772	2025-11-18 19:53:18 -08:00
yuneng-jiang	fe05e33723	Fix e2e ui playwright test (#16799 )	2025-11-18 17:56:40 -08:00

1 2 3 4 5 ...

4222 Commits