Commit Graph

35578 Commits

Author SHA1 Message Date
Sameer Kankute 7660f39fdb fix(file_search): promote DB helper, suppress sub-call billing, add queries-plural test
- Promote _fetch_managed_vector_stores_by_uuids from @staticmethod to a module-level
  async helper get_managed_vector_store_rows_by_uuids, following the same standalone
  helper pattern as get_team_object / get_key_object so the hot-path DB read is a
  named importable function rather than an inline prisma_client.db.* call
- Pass no-log=True to both inner _call_aresponses sub-calls so they do not fire
  independent billing/monitoring callbacks; cost is accumulated in the synthesized
  response's _hidden_params for the outer responses() call
- Add test_H11b covering the primary queries (plural array) function-tool schema,
  complementing H11 which exercises only the backward-compat singular query path

Made-with: Cursor
2026-03-18 11:38:49 +05:30
Sameer Kankute 76176f2a64 fix(file_search): restore should_use_emulated helper, fix dedup, extract DB helper, clean docstring
- Re-add should_use_emulated_file_search() to emulated_handler.py so H5/H6/H7/H13 tests don't fail with ImportError
- Remove per-file-id deduplication from _build_search_results_for_include so all chunks are returned (matching OpenAI native file_search behaviour); update test_H14 to assert 2 results
- Extract raw prisma DB query in check_vector_store_ids_access into a static _fetch_managed_vector_stores_by_uuids helper so the hot request path uses a named, testable function instead of an inline prisma_client.db.* call
- Remove developer-local path from test module docstring

Made-with: Cursor
2026-03-18 11:26:27 +05:30
Sameer Kankute 1ff7c70011 fix(file_search): serialize first_response output items to dicts for follow-up input
Pydantic model instances (ResponseFunctionToolCall, etc.) from first_response.output
were included raw in follow_up_input; the transformation layer expects plain dicts and
called .get() on them, raising AttributeError. Serialize via model_dump(exclude_none=True).

Made-with: Cursor
2026-03-18 10:12:13 +05:30
Sameer Kankute dc7b7f852d fix(file_search): address greptile review — dead code, follow-up context, cost tracking
- Remove dead `should_use_emulated_file_search` (main.py uses its own inline guard)
- Remove dead `fallback_vector_store_ids` param from `_run_vector_searches`
- Include all first_response.output items in follow_up_input so text blocks/reasoning
  from providers like Anthropic aren't dropped from conversation context
- Accumulate first provider call's response_cost into synthesized _hidden_params so
  billing callbacks see the total cost of both emulated-flow LLM calls
- Remove broad tools=[] filter from transformation.py (backward-incompatible); the
  follow-up call already passes tools=None which is filtered by the v is not None guard

Made-with: Cursor
2026-03-18 10:10:29 +05:30
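The last bullet above relies on a None-only filter: a parameter passed as None is dropped, while an explicit empty list is kept. A minimal sketch of that kind of guard follows; `drop_none_values` and the parameter dicts are hypothetical and are not taken from transformation.py.

```python
from typing import Any, Dict


def drop_none_values(params: Dict[str, Any]) -> Dict[str, Any]:
    """Keep every parameter whose value is not None (empty lists survive)."""
    return {k: v for k, v in params.items() if v is not None}


# Follow-up call passes tools=None, so the key disappears from the request.
follow_up = drop_none_values({"model": "some-model", "tools": None})

# A caller that explicitly sends tools=[] keeps it, preserving old behaviour.
explicit_empty = drop_none_values({"model": "some-model", "tools": []})
```

Filtering only on None keeps the guard narrow, which is why a separate `tools=[]` filter is not needed for the follow-up call.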
Sameer Kankute 8b7eac5dc9 Fix doc 2026-03-17 18:10:24 +05:30
Sameer Kankute 464ac7be12 Fix doc 2026-03-17 18:08:07 +05:30
Sameer Kankute 5692db8123 fix(file_search): address latest greptile feedback
Strip internal logging ids from emulated sub-calls, dedupe included search_results by file_id, clean unused imports, and add unit coverage for dedupe behavior.

Made-with: Cursor
2026-03-17 15:33:11 +05:30
Sameer Kankute 77a5093ce2 fix(file_search): preserve emulated response params and hidden metadata
Forward explicit responses() params on emulated file search calls and preserve hidden params on synthesized responses so callback billing/logging context is retained.

Made-with: Cursor
2026-03-17 15:20:56 +05:30
Sameer Kankute 729f7d48eb fix(file_search): address greptile review on follow-up calls and tests
Include all function_call items when building emulated follow-up input and update tests to assert real emulated routing + Responses-format function tool structure.

Made-with: Cursor
2026-03-17 15:10:46 +05:30
Sameer Kankute e22d9031e0 docs(response_api): move file_search details to dedicated tutorial
Replace inline file_search documentation in response_api.md with a canonical link and add the new tutorial to sidebars so users discover the usage-first guide.

Made-with: Cursor
2026-03-17 14:59:55 +05:30
Sameer Kankute 82c2dce6b9 docs(file_search): streamline guide with usage tabs, architecture, and Q&A
Replace duplicate path-by-path sections with a single usage-first doc format that includes SDK/Proxy tabs, an architecture diagram, and a focused Q&A section.

Made-with: Cursor
2026-03-17 14:54:53 +05:30
Sameer Kankute e6d5e3af02 fix(responses): avoid sending empty tools list in follow-up turns
Drop tools=[] from transformed chat-completion requests so providers like Anthropic return normal assistant text after tool_result turns.

Made-with: Cursor
2026-03-17 14:36:38 +05:30
Sameer Kankute 289f698a3c fix(responses): align emulated file_search output and multi-query behavior
Ensure non-OpenAI emulated file_search matches native Responses output by populating search_results (when requested), fixing TypedDict field access, and supporting multi-query searches from tool calls.

Made-with: Cursor
2026-03-17 14:36:31 +05:30
Sameer Kankute 1d6c55de50 docs: add e2e testing tutorial for file_search Responses API
Covers both paths:
- Native passthrough (OpenAI/Azure): create vector store, run via SDK and proxy
- Emulated fallback (Anthropic/any): register managed store, run via SDK and proxy

Includes output format validation script and troubleshooting section.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-03-17 11:45:08 +05:30
Sameer Kankute c735251570 feat(responses): file_search support — Phase 1 native passthrough + Phase 2 emulated fallback
Phase 1 (native passthrough):
- _decode_vector_store_ids_in_tools(): decode LiteLLM-managed unified
  vector_store_ids to provider-native IDs in file_search tools
- Split update_responses_tools_with_model_file_ids() into decode pass
  (always runs) + code_interpreter mapping pass (guarded)
- BaseResponsesAPIConfig.supports_native_file_search() → False by default;
  OpenAIResponsesAPIConfig overrides to True
- ManagedFiles.async_pre_call_hook(): batch team-level access check for
  unified vector_store_ids in file_search tools (no N+1)
- Docs: file_search section in response_api.md

Phase 2 (emulated fallback for non-native providers):
- litellm/responses/file_search/emulated_handler.py: converts file_search
  tool → function tool, intercepts tool call, runs asearch(), makes
  follow-up call, synthesizes OpenAI-format output (file_search_call +
  message + file_citation annotations)
- responses/main.py: routes to emulated handler when provider doesn't
  support file_search natively

Tests: 41 unit tests across 8 families (A-H) in test_file_search_responses.py

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
2026-03-17 11:41:44 +05:30
yuneng-jiang 278c9babc6 [Infra] Merging RC Branch with Main (#23786)
* fix(test): add missing mocks for test_streamable_http_mcp_handler_mock

The test was missing mocks for extract_mcp_auth_context and set_auth_context,
causing the handler to fail silently in the except block instead of reaching
session_manager.handle_request. This mirrors the fix already applied to the
sibling test_sse_mcp_handler_mock.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(ci): route OpenAI models through chat completions in pass-through tests

The test_anthropic_messages_openai_model_streaming_cost_injection test fails
because the OpenAI Responses API returns 400 for requests routed through the
Anthropic Messages endpoint. Setting LITELLM_USE_CHAT_COMPLETIONS_URL_FOR_ANTHROPIC_MESSAGES=true
routes OpenAI models through the stable chat completions path instead.
Cost injection still works since it happens at the proxy level.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(ci): fix assemblyai custom auth and router wildcard test flakiness

1. custom_auth_basic.py: Add user_role='proxy_admin' so the custom auth
   user can access management endpoints like /key/generate. The test
   test_assemblyai_transcribe_with_non_admin_key was hidden behind an
   earlier -x failure and was never reached before.

2. test_router_utils.py: Add flaky(retries=3) and increase sleep from 1s
   to 2s for test_router_get_model_group_usage_wildcard_routes. The async
   callback needs time to write usage to cache, and 1s is insufficient on
   slower CI hardware.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* ci: retrigger CI pipeline

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(mypy): use LitellmUserRoles enum instead of raw string in custom_auth_basic

Fixes mypy error: Argument 'user_role' has incompatible type 'str'; expected 'LitellmUserRoles | None'

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: don't close HTTP/SDK clients on LLMClientCache eviction (#22926)

* fix: don't close HTTP/SDK clients on LLMClientCache eviction

Removing the _remove_key override that eagerly called aclose()/close()
on evicted clients. Evicted clients may still be held by in-flight
streaming requests; closing them causes:

  RuntimeError: Cannot send a request, as the client has been closed.

This is a regression from commit fb72979432. Clients that are no longer
referenced will be garbage-collected naturally. Explicit shutdown cleanup
happens via close_litellm_async_clients().

Fixes production crashes after the 1-hour cache TTL expires.
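The eviction behaviour described above can be illustrated with a small stand-in cache. This is a sketch under assumptions: `FakeClient` and `ClientCache` are hypothetical and much simpler than LLMClientCache; the only point is that eviction drops the cache's reference without calling close().

```python
from collections import OrderedDict
from typing import Optional


class FakeClient:
    """Hypothetical client that fails once closed, like a closed HTTP client."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True

    def send(self) -> str:
        if self.closed:
            raise RuntimeError("Cannot send a request, as the client has been closed.")
        return "ok"


class ClientCache:
    """Size-bounded cache that never closes the values it evicts."""

    def __init__(self, max_size: int) -> None:
        self.max_size = max_size
        self._items: "OrderedDict[str, FakeClient]" = OrderedDict()

    def set(self, key: str, client: FakeClient) -> None:
        self._items[key] = client
        self._items.move_to_end(key)
        while len(self._items) > self.max_size:
            # Drop the oldest reference only; an in-flight request may still hold it.
            self._items.popitem(last=False)

    def get(self, key: str) -> Optional[FakeClient]:
        return self._items.get(key)


cache = ClientCache(max_size=1)
in_flight = FakeClient()
cache.set("a", in_flight)
cache.set("b", FakeClient())  # evicts "a" from the cache

# The evicted client is still usable by the code that holds a reference to it.
result = in_flight.send()
```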

* test: update LLMClientCache unit tests for no-close-on-eviction behavior

Flip the assertions: evicted clients must NOT be closed. Replace
test_remove_key_closes_async_client → test_remove_key_does_not_close_async_client
and equivalents for sync/eviction paths.

Add test_remove_key_removes_plain_values for non-client cache entries.
Remove test_background_tasks_cleaned_up_after_completion (no more _background_tasks).
Remove test_remove_key_no_event_loop variant that depended on old behavior.

* test: add e2e tests for OpenAI SDK client surviving cache eviction

Add two new e2e tests using real AsyncOpenAI clients:
- test_evicted_openai_sdk_client_stays_usable: verifies size-based eviction
  doesn't close the client
- test_ttl_expired_openai_sdk_client_stays_usable: verifies TTL expiry
  eviction doesn't close the client

Both tests sleep after eviction so any create_task()-based close would
have time to run, making the regression detectable.

Also expand the module docstring to explain why the sleep is required.

* docs(AGENTS.md): add rule — never close HTTP/SDK clients on cache eviction

* docs(CLAUDE.md): add HTTP client cache safety guideline

* [Fix] Install bsdmainutils for column command in security scans

The security_scans.sh script uses `column` to format vulnerability
output, but the package wasn't installed in the CI environment.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: handle string callback values in prometheus multiproc setup

When callbacks are configured as a plain string (e.g., `callbacks: "my_callback"`)
instead of a list, the proxy crashes on startup with:
  TypeError: can only concatenate str (not "list") to str

Normalize each callback setting to a list before concatenating.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
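The normalization described above could look roughly like this. It is a minimal sketch, not the proxy's actual code: `normalize_callbacks` and the two setting names are hypothetical.

```python
from typing import List, Optional, Union

CallbackSetting = Optional[Union[str, List[str]]]


def normalize_callbacks(value: CallbackSetting) -> List[str]:
    """Return the setting as a list, whether it was unset, a string, or a list."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return list(value)


# Hypothetical settings: one given as a plain string, one as a list.
callbacks = "my_callback"
success_callbacks = ["prometheus"]

# Concatenating the raw values (str + list) would raise TypeError;
# normalizing each one first makes the concatenation safe.
all_callbacks = normalize_callbacks(callbacks) + normalize_callbacks(success_callbacks)
```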

* bump: version 1.82.2 → 1.82.3

* fix(test): update test_startup_fails_when_db_setup_fails for opt-in enforcement

The --enforce_prisma_migration_check flag is now required to trigger
sys.exit(1) on DB migration failure, after #23675 flipped the default
behavior to warn-and-continue.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(cost_calculator): use model name for per-request custom pricing when router_model_id has no pricing

When custom pricing is passed as per-request kwargs (input_cost_per_token/output_cost_per_token),
completion() registers pricing under the model name, but _select_model_name_for_cost_calc was
selecting the router deployment hash (which has no pricing data), causing response_cost to be 0.0.

Now checks whether the router_model_id entry actually has pricing before preferring it.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
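The selection rule above can be sketched as follows. The cost map, `has_pricing`, and `select_name_for_cost` are hypothetical stand-ins, not the real _select_model_name_for_cost_calc; the sketch only shows the "prefer the router ID only if it has pricing" check.

```python
from typing import Dict, Optional

PRICING_KEYS = ("input_cost_per_token", "output_cost_per_token")

# Hypothetical cost map: the deployment hash is registered but has no pricing.
model_cost: Dict[str, Dict[str, float]] = {
    "my-model": {"input_cost_per_token": 1e-06, "output_cost_per_token": 2e-06},
    "deployment-hash-abc": {},
    "deployment-hash-priced": {"input_cost_per_token": 3e-06},
}


def has_pricing(name: str) -> bool:
    """True if the entry exists and carries at least one pricing field."""
    entry = model_cost.get(name, {})
    return any(key in entry for key in PRICING_KEYS)


def select_name_for_cost(model: str, router_model_id: Optional[str]) -> str:
    """Prefer the router deployment ID only when it actually has pricing data."""
    if router_model_id is not None and has_pricing(router_model_id):
        return router_model_id
    return model


selected = select_name_for_cost("my-model", "deployment-hash-abc")
selected_priced = select_name_for_cost("my-model", "deployment-hash-priced")
```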

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-16 15:32:20 -07:00
Sameer Kankute 3dccdde9c8 Merge pull request #23686 from BerriAI/litellm_oss_staging_03_14_2026
Litellm oss staging 03 14 2026
2026-03-16 20:00:17 +05:30
Sameer Kankute 71dfd0115c Merge pull request #23737 from BerriAI/litellm_create-character-endpoint-fixes
[Feat] Add create character endpoints and other new videos Endpoints
2026-03-16 19:53:35 +05:30
Sameer Kankute 1a6eb016bf fix(critical): remove @abstractmethod from video character/edit/extension methods
Convert all 8 new video methods from @abstractmethod to concrete implementations
that raise NotImplementedError. This prevents breaking external third-party
BaseVideoConfig subclasses at instantiation time.

Methods affected:
- transform_video_create_character_request/response
- transform_video_get_character_request/response
- transform_video_edit_request/response
- transform_video_extension_request/response

External integrators can now upgrade without instantiation errors; NotImplementedError
is only raised when operations are actually called on unsupported providers.

This restores backward compatibility, in line with the project's policy.

Made-with: Cursor
2026-03-16 19:48:28 +05:30
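The difference between an abstract method and a concrete method that raises NotImplementedError can be shown with a simplified stand-in. `BaseConfig`, `ThirdPartyConfig`, and the method names below are hypothetical and are not the real BaseVideoConfig API.

```python
from abc import ABC, abstractmethod


class BaseConfig(ABC):
    """Stand-in for a provider config base class (hypothetical, simplified)."""

    @abstractmethod
    def transform_request(self, prompt: str) -> dict:
        """Existing abstract method that every subclass already implements."""

    # New optional operation: concrete, so older subclasses still instantiate.
    def transform_edit_request(self, video_id: str) -> dict:
        raise NotImplementedError("video edit is not supported by this provider")


class ThirdPartyConfig(BaseConfig):
    """External subclass written before transform_edit_request existed."""

    def transform_request(self, prompt: str) -> dict:
        return {"prompt": prompt}


# Instantiation succeeds; it would raise TypeError if the new method were abstract.
config = ThirdPartyConfig()
request = config.transform_request("a cat")

# The error surfaces only when the unsupported operation is actually called.
try:
    config.transform_edit_request("video_123")
    edit_supported = True
except NotImplementedError:
    edit_supported = False
```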
Sameer Kankute ee24abe86e fix(test): skip new video character endpoints in Azure SDK initialization test
Add avideo_create_character, avideo_get_character, avideo_edit, and avideo_extension
to the skip condition since Azure video calls don't use initialize_azure_sdk_client.

Tests now properly skip with expected behavior instead of failing:
- test_ensure_initialize_azure_sdk_client_always_used[avideo_create_character] ✓
- test_ensure_initialize_azure_sdk_client_always_used[avideo_get_character] ✓
- test_ensure_initialize_azure_sdk_client_always_used[avideo_edit] ✓
- test_ensure_initialize_azure_sdk_client_always_used[avideo_extension] ✓

Made-with: Cursor
2026-03-16 19:45:57 +05:30
Sameer Kankute 1255382fb7 Fix docs 2026-03-16 19:39:22 +05:30
Sameer Kankute 32842a52bc Fix docs 2026-03-16 19:33:23 +05:30
Sameer Kankute c1179b835d docs: add edit/extension curl examples and managed ID explanation
- Add curl examples for avideo_edit and avideo_extension APIs
- Explain how LiteLLM encodes/decodes managed character IDs
- Show metadata included in character IDs (provider, model_id)
- Detail transparent router-first routing benefits

Made-with: Cursor
2026-03-16 19:27:15 +05:30
Sameer Kankute 48e0f59520 docs: add concise blog post on reusable video characters
- Clear examples for SDK and proxy usage
- Feature highlights: router support, encoding, error handling
- Best practices for character uploads and prompting
- Available from LiteLLM v1.83.0+
- Troubleshooting guide for common issues

Made-with: Cursor
2026-03-16 19:24:19 +05:30
Sameer Kankute 2ec4ce178c fix(routing): include avideo_create_character and avideo_get_character in router-first routing
Add avideo_create_character and avideo_get_character to the list of video endpoints
that use router-first routing when a model is provided (either from decoded IDs or
target_model_names).

Previously only avideo_edit and avideo_extension were in the router-first block.
This ensures both character endpoints benefit from multi-deployment load balancing
and model resolution, making them consistent with the other video operations.

This allows:
- avideo_create_character: Router picks among multiple deployments when target_model_names is set
- avideo_get_character: Router assists with multi-model environments for consistency

Made-with: Cursor
2026-03-16 19:21:18 +05:30
Sameer Kankute ddf62e0651 fix(critical): add HTTP error checks before parsing response bodies in video handlers
Add response.raise_for_status() before transform_*_response() calls in all eight
video character/edit/extension handler methods (sync and async):

- video_create_character_handler / async_video_create_character_handler
- video_get_character_handler / async_video_get_character_handler
- video_edit_handler / async_video_edit_handler
- video_extension_handler / async_video_extension_handler

Without these checks, httpx does not raise on 4xx/5xx responses, so provider
errors (e.g., 401 Unauthorized) pass directly to Pydantic model constructors,
causing ValidationError instead of meaningful HTTP errors. The raise_for_status()
ensures the exception handler receives proper HTTPStatusError for translation into
actionable messages.

Made-with: Cursor
2026-03-16 19:20:03 +05:30
Sameer Kankute 1ccf67dd93 fix(greptile-review): address backward compatibility and code quality issues
- Remove duplicate DecodedCharacterId TypedDict from litellm/types/videos/main.py
- Remove dead LITELLM_MANAGED_VIDEO_CHARACTER_COMPLETE_STR constant from litellm/types/utils.py
- Add FastAPI Form validation for name field in video_create_character endpoint

Made-with: Cursor
2026-03-16 19:17:06 +05:30
Sameer Kankute b796ee9f03 Merge pull request #23530 from Sameerlite/litellm_preserve-final-streaming-attributes
fix(streaming): preserve custom attributes on final stream chunk
2026-03-16 19:12:41 +05:30
Sameer Kankute 0bbdd2a249 Merge pull request #23715 from BerriAI/litellm_anthropic_beta_header_order
Refactor: Filtering beta header after transformation
2026-03-16 19:07:08 +05:30
Sameer Kankute 10d5475ce8 Merge pull request #23547 from Sameerlite/litellm_blog-webrtc
docs(blog): add WebRTC blog post link
2026-03-16 19:06:32 +05:30
Sameer Kankute ab377f396e Merge pull request #23718 from BerriAI/litellm_fix_vertex_ai_batch
Fix: Vertex AI Batch Output File Download Fails with 500
2026-03-16 19:05:49 +05:30
Sameer Kankute 9beec825d4 Merge branch 'main' into litellm_create-character-endpoint-fixes 2026-03-16 17:58:16 +05:30
Sameer Kankute 430f3ac429 Add new videos docs 2026-03-16 17:57:14 +05:30
Sameer Kankute 14a691ffd5 Add new videos transformation 2026-03-16 17:56:21 +05:30
Sameer Kankute 8dab5dec88 Add new videos endpoints routing and init 2026-03-16 17:54:35 +05:30
Sameer Kankute c33889200a Add new videos endpoints 2026-03-16 17:54:03 +05:30
Sameer Kankute 79c787b85d Add new videos endpoints 2026-03-16 17:53:54 +05:30
Sameer Kankute 94405b6218 fix(types): use direct FileTypes import in video schemas
Avoid the temporary Any alias and use a concrete FileTypes import compatible with type checks.

Made-with: Cursor
2026-03-16 16:13:11 +05:30
Sameer Kankute 4a7ef7b1d2 fix(video): enforce character endpoint video MIME handling
Use typed character response models and video multipart helpers so /videos/characters forwards uploaded MP4 files with video/* content type.

Made-with: Cursor
2026-03-16 16:12:07 +05:30
Sameer Kankute 61519d6c65 fix(video): decode managed character ids robustly
Support missing base64 padding in managed character/video IDs so copied encoded IDs still decode to the original upstream character ID.

Made-with: Cursor
2026-03-16 16:11:21 +05:30
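Tolerating missing base64 padding usually means re-adding the stripped '=' characters before decoding. The sketch below assumes URL-safe base64 and an invented payload; the real managed-ID layout and the helper name `decode_with_padding` are assumptions, not taken from the commit.

```python
import base64


def decode_with_padding(encoded: str) -> str:
    """Decode a base64 string even if trailing '=' padding was stripped."""
    # Valid base64 input has a length that is a multiple of 4; add back what is missing.
    padding = "=" * (-len(encoded) % 4)
    return base64.urlsafe_b64decode(encoded + padding).decode("utf-8")


# Hypothetical payload; the real encoded ID format is not shown in the commit.
original = "provider:openai;character_id:char_123"
encoded = base64.urlsafe_b64encode(original.encode("utf-8")).decode("ascii")

# Simulate an ID that lost its padding when it was copied.
stripped = encoded.rstrip("=")
```

Calling `decode_with_padding(stripped)` and `decode_with_padding(encoded)` should both return the original string, since the padding length is recomputed from the input length.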
yuneng-jiang 58e74a631c Merge pull request #23721 from BerriAI/litellm_ci_optimize
[Infra] Optimize CI Pipeline
2026-03-16 01:04:55 -07:00
yuneng-jiang 8f56ddb9c6 Merge remote main into litellm_ci_optimize
Resolved conflict in test_claude_agent_sdk.py by keeping main's additions.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-16 00:50:22 -07:00
yuneng-jiang 9cec81a087 [Fix] Revert proxy unit test groupings to prevent xdist state pollution
Part1 had 4 test files combined (was originally 2), causing cross-file
state pollution under xdist. Reverted to original grouping.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-16 00:48:56 -07:00
yuneng-jiang ccfe4b57d5 [Fix] Restore unconditional importlib.reload for llm_translation conftest
The xdist-conditional reload (manual reset in xdist mode) was missing
attributes that importlib.reload resets, causing Azure connection errors.
The original conftest used importlib.reload unconditionally (even under
xdist) and that worked on main. Restore that behavior.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-16 00:35:02 -07:00
yuneng-jiang 2372427dbc [Fix] Remove xdist from caching_unit_tests to fix GCS cache test failures
GCS cache tests (test_gcs_cache_unit_tests.py) rely on module-level state
(vertex_chat_completion singleton, credential caches) that importlib.reload
resets but the xdist-safe function-scoped fixture does not. Removing -n 4
from this job restores single-process execution where module reload properly
resets all state before each test, while CI-level parallelism (parallelism: 2)
still splits test files across nodes.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-16 00:23:04 -07:00
yuneng-jiang f434cdbdce [Fix] Remove flush_cache from llm_translation conftest to prevent connection churn
The old conftest never flushed HTTP client cache. Adding flush_cache() before
every test forces new TCP connections to external APIs, causing transient
connection failures under xdist parallelism. Global state isolation is already
handled by _SCALAR_DEFAULTS reset.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-16 00:20:40 -07:00
yuneng-jiang acfaea9d25 [Fix] Reset api_base/api_key in xdist conftest to prevent cross-test leakage
test_rerank.py sets litellm.api_base = "http://localhost:4000" which leaked
to all subsequent tests on the same xdist worker, causing connection failures
across every provider (Cohere, Azure, OpenAI, etc.).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-15 23:55:44 -07:00
Sameer Kankute 22b333cae6 Fix downloading Vertex AI files 2026-03-16 12:08:06 +05:30
yuneng-jiang 5db6aef834 [Fix] Restore xdist test isolation: capture true defaults and poll cooldowns
The revert of 9711e3adfe left xdist tests without proper state isolation.
Module-level assignments like `litellm.num_retries = 3` in 12+ test files
pollute shared globals, and the fixture was saving/restoring contaminated
values instead of resetting to true defaults.

- Capture true litellm defaults at conftest import time and reset before
  each test (local_testing + llm_translation)
- Make llm_translation/conftest.py xdist-safe (skip reload under xdist,
  add state isolation)
- Replace asyncio.sleep(2) with polling in cooldown handler tests

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-15 23:33:21 -07:00
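Two of the ideas above, snapshotting defaults at import time and polling instead of a fixed sleep, can be sketched with the standard library. Everything here is a stand-in: `settings` is a SimpleNamespace rather than the litellm module, and `reset_to_defaults` and `wait_until` are hypothetical helper names.

```python
import asyncio
import time
from types import SimpleNamespace
from typing import Callable

# Stand-in for a module with mutable globals (hypothetical).
settings = SimpleNamespace(num_retries=None, api_base=None)

# Snapshot taken once, before any test has a chance to mutate the globals.
TRUE_DEFAULTS = dict(vars(settings))


def reset_to_defaults() -> None:
    """Restore the import-time values, not whatever the previous test left behind."""
    for name, value in TRUE_DEFAULTS.items():
        setattr(settings, name, value)


async def wait_until(condition: Callable[[], bool], timeout: float = 2.0,
                     interval: float = 0.01) -> bool:
    """Poll until condition() is true or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if condition():
            return True
        await asyncio.sleep(interval)
    return condition()


async def demo() -> bool:
    state = {"cooled_down": False}

    async def background_work() -> None:
        await asyncio.sleep(0.05)
        state["cooled_down"] = True

    task = asyncio.create_task(background_work())
    ok = await wait_until(lambda: state["cooled_down"])
    await task
    return ok


# A test file pollutes the shared global; the reset brings back the true default.
settings.num_retries = 3
reset_to_defaults()

polled_ok = asyncio.run(demo())
```

Polling returns as soon as the condition holds, so the test is both faster on quick machines and more tolerant on slow ones than a fixed sleep.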
Krish Dholakia cd37ee1459 fix: make db migration failure exit opt-in via --enforce_prisma_migration_check (#23675)
* fix: improve db migration failure messaging and fix pyright errors in proxy_cli

- Clarify --skip_db_migration_check messaging so users know how to opt
  into warn-and-continue behavior when database setup fails
- Fix pyright reportArgumentType error by casting get_secret result to str
- Fix pyright reportPossiblyUnboundVariable by initializing litellm_settings

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: replace --skip_db_migration_check with --enforce_prisma_migration_check

Flip the default behavior: database migration failures now warn and
continue by default. Only when --enforce_prisma_migration_check (or
ENFORCE_PRISMA_MIGRATION_CHECK=true) is explicitly set will the proxy
exit on migration failure.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
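The opt-in rule above reduces to a small decision function. This is a sketch only: `should_exit_on_migration_failure` is a hypothetical name, and how the real proxy parses the environment variable (case, accepted values) is an assumption.

```python
from typing import Mapping


def should_exit_on_migration_failure(enforce_flag: bool, env: Mapping[str, str]) -> bool:
    """Exit only when enforcement was explicitly requested via flag or env var."""
    env_value = env.get("ENFORCE_PRISMA_MIGRATION_CHECK", "")
    # Assumption: only the literal value "true" (case-insensitive) enables enforcement.
    return enforce_flag or env_value.strip().lower() == "true"


# Default: neither the flag nor the env var is set, so the proxy warns and continues.
default_behavior = should_exit_on_migration_failure(False, {})

# Opt-in through the environment variable.
env_opt_in = should_exit_on_migration_failure(
    False, {"ENFORCE_PRISMA_MIGRATION_CHECK": "true"}
)

# Opt-in through the CLI flag.
flag_opt_in = should_exit_on_migration_failure(True, {})
```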

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-15 23:21:23 -07:00