litellm

mirror of https://github.com/tiennm99/litellm.git synced 2026-06-28 17:08:40 +00:00

Author	SHA1	Message	Date
Sameer Kankute	7660f39fdb	fix(file_search): promote DB helper, suppress sub-call billing, add queries-plural test - Promote _fetch_managed_vector_stores_by_uuids from @staticmethod to a module-level async helper get_managed_vector_store_rows_by_uuids, following the same standalone helper pattern as get_team_object / get_key_object so the hot-path DB read is a named importable function rather than an inline prisma_client.db.* call - Pass no-log=True to both inner _call_aresponses sub-calls so they do not fire independent billing/monitoring callbacks; cost is accumulated in the synthesized response's _hidden_params for the outer responses() call - Add test_H11b covering the primary queries (plural array) function-tool schema, complementing H11 which exercises only the backward-compat singular query path Made-with: Cursor	2026-03-18 11:38:49 +05:30
Sameer Kankute	76176f2a64	fix(file_search): restore should_use_emulated helper, fix dedup, extract DB helper, clean docstring - Re-add should_use_emulated_file_search() to emulated_handler.py so H5/H6/H7/H13 tests don't fail with ImportError - Remove per-file-id deduplication from _build_search_results_for_include so all chunks are returned (matching OpenAI native file_search behaviour); update test_H14 to assert 2 results - Extract raw prisma DB query in check_vector_store_ids_access into a static _fetch_managed_vector_stores_by_uuids helper so the hot request path uses a named, testable function instead of an inline prisma_client.db.* call - Remove developer-local path from test module docstring Made-with: Cursor	2026-03-18 11:26:27 +05:30
Sameer Kankute	1ff7c70011	fix(file_search): serialize first_response output items to dicts for follow-up input Pydantic model instances (ResponseFunctionToolCall, etc.) from first_response.output were included raw in follow_up_input; the transformation layer expects plain dicts and called .get() on them, raising AttributeError. Serialize via model_dump(exclude_none=True). Made-with: Cursor	2026-03-18 10:12:13 +05:30
Sameer Kankute	dc7b7f852d	fix(file_search): address greptile review — dead code, follow-up context, cost tracking - Remove dead `should_use_emulated_file_search` (main.py uses its own inline guard) - Remove dead `fallback_vector_store_ids` param from `_run_vector_searches` - Include all first_response.output items in follow_up_input so text blocks/reasoning from providers like Anthropic aren't dropped from conversation context - Accumulate first provider call's response_cost into synthesized _hidden_params so billing callbacks see the total cost of both emulated-flow LLM calls - Remove broad tools=[] filter from transformation.py (backward-incompatible); the follow-up call already passes tools=None which is filtered by the v is not None guard Made-with: Cursor	2026-03-18 10:10:29 +05:30
Sameer Kankute	8b7eac5dc9	Fix doc	2026-03-17 18:10:24 +05:30
Sameer Kankute	464ac7be12	Fix doc	2026-03-17 18:08:07 +05:30
Sameer Kankute	5692db8123	fix(file_search): address latest greptile feedback Strip internal logging ids from emulated sub-calls, dedupe included search_results by file_id, clean unused imports, and add unit coverage for dedupe behavior. Made-with: Cursor	2026-03-17 15:33:11 +05:30
Sameer Kankute	77a5093ce2	fix(file_search): preserve emulated response params and hidden metadata Forward explicit responses() params on emulated file search calls and preserve hidden params on synthesized responses so callback billing/logging context is retained. Made-with: Cursor	2026-03-17 15:20:56 +05:30
Sameer Kankute	729f7d48eb	fix(file_search): address greptile review on follow-up calls and tests Include all function_call items when building emulated follow-up input and update tests to assert real emulated routing + Responses-format function tool structure. Made-with: Cursor	2026-03-17 15:10:46 +05:30
Sameer Kankute	e22d9031e0	docs(response_api): move file_search details to dedicated tutorial Replace inline file_search documentation in response_api.md with a canonical link and add the new tutorial to sidebars so users discover the usage-first guide. Made-with: Cursor	2026-03-17 14:59:55 +05:30
Sameer Kankute	82c2dce6b9	docs(file_search): streamline guide with usage tabs, architecture, and Q&A Replace duplicate path-by-path sections with a single usage-first doc format that includes SDK/Proxy tabs, an architecture diagram, and a focused Q&A section. Made-with: Cursor	2026-03-17 14:54:53 +05:30
Sameer Kankute	e6d5e3af02	fix(responses): avoid sending empty tools list in follow-up turns Drop tools=[] from transformed chat-completion requests so providers like Anthropic return normal assistant text after tool_result turns. Made-with: Cursor	2026-03-17 14:36:38 +05:30
Sameer Kankute	289f698a3c	fix(responses): align emulated file_search output and multi-query behavior Ensure non-OpenAI emulated file_search matches native Responses output by populating search_results (when requested), fixing TypedDict field access, and supporting multi-query searches from tool calls. Made-with: Cursor	2026-03-17 14:36:31 +05:30
Sameer Kankute	1d6c55de50	docs: add e2e testing tutorial for file_search Responses API Covers both paths: - Native passthrough (OpenAI/Azure): create vector store, run via SDK and proxy - Emulated fallback (Anthropic/any): register managed store, run via SDK and proxy Includes output format validation script and troubleshooting section. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-03-17 11:45:08 +05:30
Sameer Kankute	c735251570	feat(responses): file_search support — Phase 1 native passthrough + Phase 2 emulated fallback Phase 1 (native passthrough): - _decode_vector_store_ids_in_tools(): decode LiteLLM-managed unified vector_store_ids to provider-native IDs in file_search tools - Split update_responses_tools_with_model_file_ids() into decode pass (always runs) + code_interpreter mapping pass (guarded) - BaseResponsesAPIConfig.supports_native_file_search() → False by default; OpenAIResponsesAPIConfig overrides to True - ManagedFiles.async_pre_call_hook(): batch team-level access check for unified vector_store_ids in file_search tools (no N+1) - Docs: file_search section in response_api.md Phase 2 (emulated fallback for non-native providers): - litellm/responses/file_search/emulated_handler.py: converts file_search tool → function tool, intercepts tool call, runs asearch(), makes follow-up call, synthesizes OpenAI-format output (file_search_call + message + file_citation annotations) - responses/main.py: routes to emulated handler when provider doesn't support file_search natively Tests: 41 unit tests across 8 families (A-H) in test_file_search_responses.py Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>	2026-03-17 11:41:44 +05:30
yuneng-jiang	278c9babc6	[Infra] Merging RC Branch with Main (#23786) * fix(test): add missing mocks for test_streamable_http_mcp_handler_mock The test was missing mocks for extract_mcp_auth_context and set_auth_context, causing the handler to fail silently in the except block instead of reaching session_manager.handle_request. This mirrors the fix already applied to the sibling test_sse_mcp_handler_mock. Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix(ci): route OpenAI models through chat completions in pass-through tests The test_anthropic_messages_openai_model_streaming_cost_injection test fails because the OpenAI Responses API returns 400 for requests routed through the Anthropic Messages endpoint. Setting LITELLM_USE_CHAT_COMPLETIONS_URL_FOR_ANTHROPIC_MESSAGES=true routes OpenAI models through the stable chat completions path instead. Cost injection still works since it happens at the proxy level. Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix(ci): fix assemblyai custom auth and router wildcard test flakiness 1. custom_auth_basic.py: Add user_role='proxy_admin' so the custom auth user can access management endpoints like /key/generate. The test test_assemblyai_transcribe_with_non_admin_key was hidden behind an earlier -x failure and was never reached before. 2. test_router_utils.py: Add flaky(retries=3) and increase sleep from 1s to 2s for test_router_get_model_group_usage_wildcard_routes. The async callback needs time to write usage to cache, and 1s is insufficient on slower CI hardware. 
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * ci: retrigger CI pipeline Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix(mypy): use LitellmUserRoles enum instead of raw string in custom_auth_basic Fixes mypy error: Argument 'user_role' has incompatible type 'str'; expected 'LitellmUserRoles \| None' Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> * fix: don't close HTTP/SDK clients on LLMClientCache eviction (#22926) * fix: don't close HTTP/SDK clients on LLMClientCache eviction Removing the _remove_key override that eagerly called aclose()/close() on evicted clients. Evicted clients may still be held by in-flight streaming requests; closing them causes: RuntimeError: Cannot send a request, as the client has been closed. This is a regression from commit `fb72979432`. Clients that are no longer referenced will be garbage-collected naturally. Explicit shutdown cleanup happens via close_litellm_async_clients(). Fixes production crashes after the 1-hour cache TTL expires. * test: update LLMClientCache unit tests for no-close-on-eviction behavior Flip the assertions: evicted clients must NOT be closed. Replace test_remove_key_closes_async_client → test_remove_key_does_not_close_async_client and equivalents for sync/eviction paths. Add test_remove_key_removes_plain_values for non-client cache entries. Remove test_background_tasks_cleaned_up_after_completion (no more _background_tasks). Remove test_remove_key_no_event_loop variant that depended on old behavior. * test: add e2e tests for OpenAI SDK client surviving cache eviction Add two new e2e tests using real AsyncOpenAI clients: - test_evicted_openai_sdk_client_stays_usable: verifies size-based eviction doesn't close the client - test_ttl_expired_openai_sdk_client_stays_usable: verifies TTL expiry eviction doesn't close the client Both tests sleep after eviction so any create_task()-based close would have time to run, making the regression detectable. 
Also expand the module docstring to explain why the sleep is required. * docs(AGENTS.md): add rule — never close HTTP/SDK clients on cache eviction * docs(CLAUDE.md): add HTTP client cache safety guideline * [Fix] Install bsdmainutils for column command in security scans The security_scans.sh script uses `column` to format vulnerability output, but the package wasn't installed in the CI environment. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: handle string callback values in prometheus multiproc setup When callbacks are configured as a plain string (e.g., `callbacks: "my_callback"`) instead of a list, the proxy crashes on startup with: TypeError: can only concatenate str (not "list") to str Normalize each callback setting to a list before concatenating. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * bump: version 1.82.2 → 1.82.3 * fix(test): update test_startup_fails_when_db_setup_fails for opt-in enforcement The --enforce_prisma_migration_check flag is now required to trigger sys.exit(1) on DB migration failure, after #23675 flipped the default behavior to warn-and-continue. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(cost_calculator): use model name for per-request custom pricing when router_model_id has no pricing When custom pricing is passed as per-request kwargs (input_cost_per_token/output_cost_per_token), completion() registers pricing under the model name, but _select_model_name_for_cost_calc was selecting the router deployment hash (which has no pricing data), causing response_cost to be 0.0. Now checks whether the router_model_id entry actually has pricing before preferring it. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Cursor Agent <cursoragent@cursor.com> Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com> Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-16 15:32:20 -07:00
Sameer Kankute	3dccdde9c8	Merge pull request #23686 from BerriAI/litellm_oss_staging_03_14_2026 Litellm oss staging 03 14 2026	2026-03-16 20:00:17 +05:30
Sameer Kankute	71dfd0115c	Merge pull request #23737 from BerriAI/litellm_create-character-endpoint-fixes [Feat] Add create character endpoints and other new videos Endpoints	2026-03-16 19:53:35 +05:30
Sameer Kankute	1a6eb016bf	fix(critical): remove @abstractmethod from video character/edit/extension methods Convert all 8 new video methods from @abstractmethod to concrete implementations that raise NotImplementedError. This prevents breaking external third-party BaseVideoConfig subclasses at import time. Methods affected: - transform_video_create_character_request/response - transform_video_get_character_request/response - transform_video_edit_request/response - transform_video_extension_request/response External integrators can now upgrade without instantiation errors; NotImplementedError is only raised when operations are actually called on unsupported providers. This restores backward compatibility with the project's policy. Made-with: Cursor	2026-03-16 19:48:28 +05:30
Sameer Kankute	ee24abe86e	fix(test): skip new video character endpoints in Azure SDK initialization test Add avideo_create_character, avideo_get_character, avideo_edit, and avideo_extension to the skip condition since Azure video calls don't use initialize_azure_sdk_client. Tests now properly skip with expected behavior instead of failing: - test_ensure_initialize_azure_sdk_client_always_used[avideo_create_character] ✓ - test_ensure_initialize_azure_sdk_client_always_used[avideo_get_character] ✓ - test_ensure_initialize_azure_sdk_client_always_used[avideo_edit] ✓ - test_ensure_initialize_azure_sdk_client_always_used[avideo_extension] ✓ Made-with: Cursor	2026-03-16 19:45:57 +05:30
Sameer Kankute	1255382fb7	Fix docs	2026-03-16 19:39:22 +05:30
Sameer Kankute	32842a52bc	Fix docs	2026-03-16 19:33:23 +05:30
Sameer Kankute	c1179b835d	docs: add edit/extension curl examples and managed ID explanation - Add curl examples for avideo_edit and avideo_extension APIs - Explain how LiteLLM encodes/decodes managed character IDs - Show metadata included in character IDs (provider, model_id) - Detail transparent router-first routing benefits Made-with: Cursor	2026-03-16 19:27:15 +05:30
Sameer Kankute	48e0f59520	docs: add concise blog post on reusable video characters - Clear examples for SDK and proxy usage - Feature highlights: router support, encoding, error handling - Best practices for character uploads and prompting - Available from LiteLLM v1.83.0+ - Troubleshooting guide for common issues Made-with: Cursor	2026-03-16 19:24:19 +05:30
Sameer Kankute	2ec4ce178c	fix(routing): include avideo_create_character and avideo_get_character in router-first routing Add avideo_create_character and avideo_get_character to the list of video endpoints that use router-first routing when a model is provided (either from decoded IDs or target_model_names). Previously only avideo_edit and avideo_extension were in the router-first block. This ensures both character endpoints benefit from multi-deployment load balancing and model resolution, making them consistent with the other video operations. This allows: - avideo_create_character: Router picks among multiple deployments when target_model_names is set - avideo_get_character: Router assists with multi-model environments for consistency Made-with: Cursor	2026-03-16 19:21:18 +05:30
Sameer Kankute	ddf62e0651	fix(critical): add HTTP error checks before parsing response bodies in video handlers Add response.raise_for_status() before transform_*_response() calls in all eight video character/edit/extension handler methods (sync and async): - video_create_character_handler / async_video_create_character_handler - video_get_character_handler / async_video_get_character_handler - video_edit_handler / async_video_edit_handler - video_extension_handler / async_video_extension_handler Without these checks, httpx does not raise on 4xx/5xx responses, so provider errors (e.g., 401 Unauthorized) pass directly to Pydantic model constructors, causing ValidationError instead of meaningful HTTP errors. The raise_for_status() ensures the exception handler receives proper HTTPStatusError for translation into actionable messages. Made-with: Cursor	2026-03-16 19:20:03 +05:30
Sameer Kankute	1ccf67dd93	fix(greptile-review): address backward compatibility and code quality issues - Remove duplicate DecodedCharacterId TypedDict from litellm/types/videos/main.py - Remove dead LITELLM_MANAGED_VIDEO_CHARACTER_COMPLETE_STR constant from litellm/types/utils.py - Add FastAPI Form validation for name field in video_create_character endpoint Made-with: Cursor	2026-03-16 19:17:06 +05:30
Sameer Kankute	b796ee9f03	Merge pull request #23530 from Sameerlite/litellm_preserve-final-streaming-attributes fix(streaming): preserve custom attributes on final stream chunk	2026-03-16 19:12:41 +05:30
Sameer Kankute	0bbdd2a249	Merge pull request #23715 from BerriAI/litellm_anthropic_beta_header_order Refactor: Filtering beta header after transformation	2026-03-16 19:07:08 +05:30
Sameer Kankute	10d5475ce8	Merge pull request #23547 from Sameerlite/litellm_blog-webrtc docs(blog): add WebRTC blog post link	2026-03-16 19:06:32 +05:30
Sameer Kankute	ab377f396e	Merge pull request #23718 from BerriAI/litellm_fix_vertex_ai_batch Fix: Vertex ai Batch Output File Download Fails with 500	2026-03-16 19:05:49 +05:30
Sameer Kankute	9beec825d4	Merge branch 'main' into litellm_create-character-endpoint-fixes	2026-03-16 17:58:16 +05:30
Sameer Kankute	430f3ac429	Add new videos docs	2026-03-16 17:57:14 +05:30
Sameer Kankute	14a691ffd5	Add new videos transformation	2026-03-16 17:56:21 +05:30
Sameer Kankute	8dab5dec88	Add new videos endpoints routing and init	2026-03-16 17:54:35 +05:30
Sameer Kankute	c33889200a	Add new videos endpoints	2026-03-16 17:54:03 +05:30
Sameer Kankute	79c787b85d	Add new videos endpoints	2026-03-16 17:53:54 +05:30
Sameer Kankute	94405b6218	fix(types): use direct FileTypes import in video schemas Avoid the temporary Any alias and use a concrete FileTypes import compatible with type checks. Made-with: Cursor	2026-03-16 16:13:11 +05:30
Sameer Kankute	4a7ef7b1d2	fix(video): enforce character endpoint video MIME handling Use typed character response models and video multipart helpers so /videos/characters forwards uploaded MP4 files with video/* content type. Made-with: Cursor	2026-03-16 16:12:07 +05:30
Sameer Kankute	61519d6c65	fix(video): decode managed character ids robustly Support missing base64 padding in managed character/video IDs so copied encoded IDs still decode to the original upstream character ID. Made-with: Cursor	2026-03-16 16:11:21 +05:30
yuneng-jiang	58e74a631c	Merge pull request #23721 from BerriAI/litellm_ci_optimize [Infra] Optimize CI Pipeline	2026-03-16 01:04:55 -07:00
yuneng-jiang	8f56ddb9c6	Merge remote main into litellm_ci_optimize Resolved conflict in test_claude_agent_sdk.py by keeping main's additions. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-16 00:50:22 -07:00
yuneng-jiang	9cec81a087	[Fix] Revert proxy unit test groupings to prevent xdist state pollution Part1 had 4 test files combined (was originally 2), causing cross-file state pollution under xdist. Reverted to original grouping. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-16 00:48:56 -07:00
yuneng-jiang	ccfe4b57d5	[Fix] Restore unconditional importlib.reload for llm_translation conftest The xdist-conditional reload (manual reset in xdist mode) was missing attributes that importlib.reload resets, causing Azure connection errors. The original conftest used importlib.reload unconditionally (even under xdist) and that worked on main. Restore that behavior. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-16 00:35:02 -07:00
yuneng-jiang	2372427dbc	[Fix] Remove xdist from caching_unit_tests to fix GCS cache test failures GCS cache tests (test_gcs_cache_unit_tests.py) rely on module-level state (vertex_chat_completion singleton, credential caches) that importlib.reload resets but the xdist-safe function-scoped fixture does not. Removing -n 4 from this job restores single-process execution where module reload properly resets all state before each test, while CI-level parallelism (parallelism: 2) still splits test files across nodes. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-16 00:23:04 -07:00
yuneng-jiang	f434cdbdce	[Fix] Remove flush_cache from llm_translation conftest to prevent connection churn The old conftest never flushed HTTP client cache. Adding flush_cache() before every test forces new TCP connections to external APIs, causing transient connection failures under xdist parallelism. Global state isolation is already handled by _SCALAR_DEFAULTS reset. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-16 00:20:40 -07:00
yuneng-jiang	acfaea9d25	[Fix] Reset api_base/api_key in xdist conftest to prevent cross-test leakage test_rerank.py sets litellm.api_base = "http://localhost:4000" which leaked to all subsequent tests on the same xdist worker, causing connection failures across every provider (Cohere, Azure, OpenAI, etc.). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-15 23:55:44 -07:00
Sameer Kankute	22b333cae6	Fix downloading vertex ai files	2026-03-16 12:08:06 +05:30
yuneng-jiang	5db6aef834	[Fix] Restore xdist test isolation: capture true defaults and poll cooldowns The revert of `9711e3adfe` left xdist tests without proper state isolation. Module-level assignments like `litellm.num_retries = 3` in 12+ test files pollute shared globals, and the fixture was saving/restoring contaminated values instead of resetting to true defaults. - Capture true litellm defaults at conftest import time and reset before each test (local_testing + llm_translation) - Make llm_translation/conftest.py xdist-safe (skip reload under xdist, add state isolation) - Replace asyncio.sleep(2) with polling in cooldown handler tests Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-15 23:33:21 -07:00
Krish Dholakia	cd37ee1459	fix: make db migration failure exit opt-in via --enforce_prisma_migration_check (#23675) * fix: improve db migration failure messaging and fix pyright errors in proxy_cli - Clarify --skip_db_migration_check messaging so users know how to opt into warn-and-continue behavior when database setup fails - Fix pyright reportArgumentType error by casting get_secret result to str - Fix pyright reportPossiblyUnboundVariable by initializing litellm_settings Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix: replace --skip_db_migration_check with --enforce_prisma_migration_check Flip the default behavior: database migration failures now warn and continue by default. Only when --enforce_prisma_migration_check (or ENFORCE_PRISMA_MIGRATION_CHECK=true) is explicitly set will the proxy exit on migration failure. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>	2026-03-15 23:21:23 -07:00
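
Note on commit 61519d6c65 (decode managed character ids robustly): the message describes accepting encoded IDs whose base64 padding was lost when copied. The sketch below shows one common way to do that. It is not taken from the LiteLLM source; the helper name `decode_padded`, the use of URL-safe base64, and the payload layout are all assumptions made for illustration.

```python
import base64


def decode_padded(encoded: str) -> str:
    """Decode URL-safe base64 text, restoring any '=' padding that was stripped.

    Hypothetical helper; the real LiteLLM implementation may differ.
    """
    # Valid base64 text has a length that is a multiple of 4.
    # (-len % 4) is the number of '=' characters needed to reach that length.
    padded = encoded + "=" * (-len(encoded) % 4)
    return base64.urlsafe_b64decode(padded).decode("utf-8")


# Assumed payload layout, for illustration only.
original = "character:provider=openai;id=char_123"

# Simulate an ID that lost its trailing padding when copied.
token = base64.urlsafe_b64encode(original.encode("utf-8")).decode("ascii").rstrip("=")

assert decode_padded(token) == original
```

Re-padding before decoding keeps already-padded IDs working too, since an input whose length is a multiple of 4 receives no extra characters.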


35578 Commits
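
Note on commit 1a6eb016bf (remove @abstractmethod from video character/edit/extension methods): the message explains that adding new abstract methods to a base class breaks existing third-party subclasses, while concrete methods that raise NotImplementedError fail only when called. A minimal sketch of that difference follows. The class and method names here are hypothetical stand-ins, not the real BaseVideoConfig API.

```python
from abc import ABC, abstractmethod


class StrictBase(ABC):
    """Hypothetical base class where the newly added method is abstract."""

    @abstractmethod
    def create(self, prompt: str) -> dict: ...

    @abstractmethod
    def edit(self, prompt: str) -> dict: ...  # newly added, abstract


class LenientBase(ABC):
    """Hypothetical base class where the newly added method is concrete."""

    @abstractmethod
    def create(self, prompt: str) -> dict: ...

    def edit(self, prompt: str) -> dict:  # newly added, concrete default
        raise NotImplementedError("edit is not supported by this provider")


# Third-party subclasses written before `edit` existed.
class ThirdPartyStrict(StrictBase):
    def create(self, prompt: str) -> dict:
        return {"prompt": prompt}


class ThirdPartyLenient(LenientBase):
    def create(self, prompt: str) -> dict:
        return {"prompt": prompt}


# The strict variant can no longer be instantiated at all.
try:
    ThirdPartyStrict()
    strict_instantiable = True
except TypeError:
    strict_instantiable = False

# The lenient variant still works; only the unsupported call fails.
config = ThirdPartyLenient()
result = config.create("a cat")
try:
    config.edit("a cat")
    edit_supported = True
except NotImplementedError:
    edit_supported = False
```

With the concrete default, existing integrations keep working after an upgrade and only see an error if they call an operation their provider does not implement.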