- Promote _fetch_managed_vector_stores_by_uuids from @staticmethod to a module-level
async helper get_managed_vector_store_rows_by_uuids, following the same standalone
helper pattern as get_team_object / get_key_object so the hot-path DB read is a
named importable function rather than an inline prisma_client.db.* call
- Pass no-log=True to both inner _call_aresponses sub-calls so they do not fire
independent billing/monitoring callbacks; cost is accumulated in the synthesized
response's _hidden_params for the outer responses() call
- Add test_H11b covering the primary queries (plural array) function-tool schema,
complementing H11 which exercises only the backward-compat singular query path
Made-with: Cursor
- Re-add should_use_emulated_file_search() to emulated_handler.py so H5/H6/H7/H13 tests don't fail with ImportError
- Remove per-file-id deduplication from _build_search_results_for_include so all chunks are returned (matching OpenAI native file_search behaviour); update test_H14 to assert 2 results
- Extract raw prisma DB query in check_vector_store_ids_access into a static _fetch_managed_vector_stores_by_uuids helper so the hot request path uses a named, testable function instead of an inline prisma_client.db.* call
- Remove developer-local path from test module docstring
Made-with: Cursor
Pydantic model instances (ResponseFunctionToolCall, etc.) from first_response.output
were included raw in follow_up_input; the transformation layer expects plain dicts and
called .get() on them, raising AttributeError. Serialize via model_dump(exclude_none=True).
Made-with: Cursor
- Remove dead `should_use_emulated_file_search` (main.py uses its own inline guard)
- Remove dead `fallback_vector_store_ids` param from `_run_vector_searches`
- Include all first_response.output items in follow_up_input so text blocks/reasoning
from providers like Anthropic aren't dropped from conversation context
- Accumulate first provider call's response_cost into synthesized _hidden_params so
billing callbacks see the total cost of both emulated-flow LLM calls
- Remove broad tools=[] filter from transformation.py (backward-incompatible); the
follow-up call already passes tools=None which is filtered by the v is not None guard
Made-with: Cursor
Strip internal logging ids from emulated sub-calls, dedupe included search_results by file_id, clean unused imports, and add unit coverage for dedupe behavior.
Made-with: Cursor
Forward explicit responses() params on emulated file search calls and preserve hidden params on synthesized responses so callback billing/logging context is retained.
Made-with: Cursor
Include all function_call items when building emulated follow-up input and update tests to assert real emulated routing + Responses-format function tool structure.
Made-with: Cursor
Replace inline file_search documentation in response_api.md with a canonical link and add the new tutorial to sidebars so users discover the usage-first guide.
Made-with: Cursor
Replace duplicate path-by-path sections with a single usage-first doc format that includes SDK/Proxy tabs, an architecture diagram, and a focused Q&A section.
Made-with: Cursor
Drop tools=[] from transformed chat-completion requests so providers like Anthropic return normal assistant text after tool_result turns.
Made-with: Cursor
Covers both paths:
- Native passthrough (OpenAI/Azure): create vector store, run via SDK and proxy
- Emulated fallback (Anthropic/any): register managed store, run via SDK and proxy
Includes output format validation script and troubleshooting section.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
* fix(test): add missing mocks for test_streamable_http_mcp_handler_mock
The test was missing mocks for extract_mcp_auth_context and set_auth_context,
causing the handler to fail silently in the except block instead of reaching
session_manager.handle_request. This mirrors the fix already applied to the
sibling test_sse_mcp_handler_mock.
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
* fix(ci): route OpenAI models through chat completions in pass-through tests
The test_anthropic_messages_openai_model_streaming_cost_injection test fails
because the OpenAI Responses API returns 400 for requests routed through the
Anthropic Messages endpoint. Setting LITELLM_USE_CHAT_COMPLETIONS_URL_FOR_ANTHROPIC_MESSAGES=true
routes OpenAI models through the stable chat completions path instead.
Cost injection still works since it happens at the proxy level.
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
* fix(ci): fix assemblyai custom auth and router wildcard test flakiness
1. custom_auth_basic.py: Add user_role='proxy_admin' so the custom auth
user can access management endpoints like /key/generate. The test
test_assemblyai_transcribe_with_non_admin_key was hidden behind an
earlier -x failure and was never reached before.
2. test_router_utils.py: Add flaky(retries=3) and increase sleep from 1s
to 2s for test_router_get_model_group_usage_wildcard_routes. The async
callback needs time to write usage to cache, and 1s is insufficient on
slower CI hardware.
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
* ci: retrigger CI pipeline
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
* fix(mypy): use LitellmUserRoles enum instead of raw string in custom_auth_basic
Fixes mypy error: Argument 'user_role' has incompatible type 'str'; expected 'LitellmUserRoles | None'
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
* fix: don't close HTTP/SDK clients on LLMClientCache eviction (#22926)
* fix: don't close HTTP/SDK clients on LLMClientCache eviction
Removing the _remove_key override that eagerly called aclose()/close()
on evicted clients. Evicted clients may still be held by in-flight
streaming requests; closing them causes:
RuntimeError: Cannot send a request, as the client has been closed.
This is a regression from commit fb72979432. Clients that are no longer
referenced will be garbage-collected naturally. Explicit shutdown cleanup
happens via close_litellm_async_clients().
Fixes production crashes after the 1-hour cache TTL expires.
* test: update LLMClientCache unit tests for no-close-on-eviction behavior
Flip the assertions: evicted clients must NOT be closed. Replace
test_remove_key_closes_async_client → test_remove_key_does_not_close_async_client
and equivalents for sync/eviction paths.
Add test_remove_key_removes_plain_values for non-client cache entries.
Remove test_background_tasks_cleaned_up_after_completion (no more _background_tasks).
Remove test_remove_key_no_event_loop variant that depended on old behavior.
* test: add e2e tests for OpenAI SDK client surviving cache eviction
Add two new e2e tests using real AsyncOpenAI clients:
- test_evicted_openai_sdk_client_stays_usable: verifies size-based eviction
doesn't close the client
- test_ttl_expired_openai_sdk_client_stays_usable: verifies TTL expiry
eviction doesn't close the client
Both tests sleep after eviction so any create_task()-based close would
have time to run, making the regression detectable.
Also expand the module docstring to explain why the sleep is required.
* docs(AGENTS.md): add rule — never close HTTP/SDK clients on cache eviction
* docs(CLAUDE.md): add HTTP client cache safety guideline
* [Fix] Install bsdmainutils for column command in security scans
The security_scans.sh script uses `column` to format vulnerability
output, but the package wasn't installed in the CI environment.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: handle string callback values in prometheus multiproc setup
When callbacks are configured as a plain string (e.g., `callbacks: "my_callback"`)
instead of a list, the proxy crashes on startup with:
TypeError: can only concatenate str (not "list") to str
Normalize each callback setting to a list before concatenating.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* bump: version 1.82.2 → 1.82.3
* fix(test): update test_startup_fails_when_db_setup_fails for opt-in enforcement
The --enforce_prisma_migration_check flag is now required to trigger
sys.exit(1) on DB migration failure, after #23675 flipped the default
behavior to warn-and-continue.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(cost_calculator): use model name for per-request custom pricing when router_model_id has no pricing
When custom pricing is passed as per-request kwargs (input_cost_per_token/output_cost_per_token),
completion() registers pricing under the model name, but _select_model_name_for_cost_calc was
selecting the router deployment hash (which has no pricing data), causing response_cost to be 0.0.
Now checks whether the router_model_id entry actually has pricing before preferring it.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Convert all 8 new video methods from @abstractmethod to concrete implementations
that raise NotImplementedError. This prevents breaking external third-party
BaseVideoConfig subclasses at import time.
Methods affected:
- transform_video_create_character_request/response
- transform_video_get_character_request/response
- transform_video_edit_request/response
- transform_video_extension_request/response
External integrators can now upgrade without instantiation errors; NotImplementedError
is only raised when operations are actually called on unsupported providers.
This restores backward compatibility with the project's policy.
Made-with: Cursor
Add avideo_create_character, avideo_get_character, avideo_edit, and avideo_extension
to the skip condition since Azure video calls don't use initialize_azure_sdk_client.
Tests now properly skip with expected behavior instead of failing:
- test_ensure_initialize_azure_sdk_client_always_used[avideo_create_character] ✓
- test_ensure_initialize_azure_sdk_client_always_used[avideo_get_character] ✓
- test_ensure_initialize_azure_sdk_client_always_used[avideo_edit] ✓
- test_ensure_initialize_azure_sdk_client_always_used[avideo_extension] ✓
Made-with: Cursor
- Add curl examples for avideo_edit and avideo_extension APIs
- Explain how LiteLLM encodes/decodes managed character IDs
- Show metadata included in character IDs (provider, model_id)
- Detail transparent router-first routing benefits
Made-with: Cursor
- Clear examples for SDK and proxy usage
- Feature highlights: router support, encoding, error handling
- Best practices for character uploads and prompting
- Available from LiteLLM v1.83.0+
- Troubleshooting guide for common issues
Made-with: Cursor
Add avideo_create_character and avideo_get_character to the list of video endpoints
that use router-first routing when a model is provided (either from decoded IDs or
target_model_names).
Previously only avideo_edit and avideo_extension were in the router-first block.
This ensures both character endpoints benefit from multi-deployment load balancing
and model resolution, making them consistent with the other video operations.
This allows:
- avideo_create_character: Router picks among multiple deployments when target_model_names is set
- avideo_get_character: Router assists with multi-model environments for consistency
Made-with: Cursor
Add response.raise_for_status() before transform_*_response() calls in all eight
video character/edit/extension handler methods (sync and async):
- video_create_character_handler / async_video_create_character_handler
- video_get_character_handler / async_video_get_character_handler
- video_edit_handler / async_video_edit_handler
- video_extension_handler / async_video_extension_handler
Without these checks, httpx does not raise on 4xx/5xx responses, so provider
errors (e.g., 401 Unauthorized) pass directly to Pydantic model constructors,
causing ValidationError instead of meaningful HTTP errors. The raise_for_status()
ensures the exception handler receives proper HTTPStatusError for translation into
actionable messages.
Made-with: Cursor
- Remove duplicate DecodedCharacterId TypedDict from litellm/types/videos/main.py
- Remove dead LITELLM_MANAGED_VIDEO_CHARACTER_COMPLETE_STR constant from litellm/types/utils.py
- Add FastAPI Form validation for name field in video_create_character endpoint
Made-with: Cursor
Use typed character response models and video multipart helpers so /videos/characters forwards uploaded MP4 files with video/* content type.
Made-with: Cursor
Support missing base64 padding in managed character/video IDs so copied encoded IDs still decode to the original upstream character ID.
Made-with: Cursor
Part1 had 4 test files combined (was originally 2), causing cross-file
state pollution under xdist. Reverted to original grouping.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The xdist-conditional reload (manual reset in xdist mode) was missing
attributes that importlib.reload resets, causing Azure connection errors.
The original conftest used importlib.reload unconditionally (even under
xdist) and that worked on main. Restore that behavior.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
GCS cache tests (test_gcs_cache_unit_tests.py) rely on module-level state
(vertex_chat_completion singleton, credential caches) that importlib.reload
resets but the xdist-safe function-scoped fixture does not. Removing -n 4
from this job restores single-process execution where module reload properly
resets all state before each test, while CI-level parallelism (parallelism: 2)
still splits test files across nodes.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The old conftest never flushed HTTP client cache. Adding flush_cache() before
every test forces new TCP connections to external APIs, causing transient
connection failures under xdist parallelism. Global state isolation is already
handled by _SCALAR_DEFAULTS reset.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
test_rerank.py sets litellm.api_base = "http://localhost:4000" which leaked
to all subsequent tests on the same xdist worker, causing connection failures
across every provider (Cohere, Azure, OpenAI, etc.).
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The revert of 9711e3adfe left xdist tests without proper state isolation.
Module-level assignments like `litellm.num_retries = 3` in 12+ test files
pollute shared globals, and the fixture was saving/restoring contaminated
values instead of resetting to true defaults.
- Capture true litellm defaults at conftest import time and reset before
each test (local_testing + llm_translation)
- Make llm_translation/conftest.py xdist-safe (skip reload under xdist,
add state isolation)
- Replace asyncio.sleep(2) with polling in cooldown handler tests
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: improve db migration failure messaging and fix pyright errors in proxy_cli
- Clarify --skip_db_migration_check messaging so users know how to opt
into warn-and-continue behavior when database setup fails
- Fix pyright reportArgumentType error by casting get_secret result to str
- Fix pyright reportPossiblyUnboundVariable by initializing litellm_settings
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: replace --skip_db_migration_check with --enforce_prisma_migration_check
Flip the default behavior: database migration failures now warn and
continue by default. Only when --enforce_prisma_migration_check (or
ENFORCE_PRISMA_MIGRATION_CHECK=true) is explicitly set will the proxy
exit on migration failure.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>