* feat(router): integrate allowed_fails_policy into health check failures (#24988)
* feat(router): integrate allowed_fails_policy into health check failures
Health check failures now increment the same per-deployment failure
counters used by allowed_fails_policy, so users can control how many
health check failures of each error type are required before a
deployment enters cooldown.
- ahealth_check() preserves the original exception in its return dict
- run_with_timeout() returns a litellm.Timeout on health check timeout
- _perform_health_check() propagates exceptions to unhealthy endpoints
- _write_health_state_to_router_cache() calls _set_cooldown_deployments
for each unhealthy endpoint that has an exception
- When allowed_fails_policy is set, the binary health check filter is
bypassed so cooldown is the sole routing exclusion mechanism
- Safety net: if all deployments are in cooldown with
enable_health_check_routing=True, the cooldown filter is bypassed
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(router): add health_check_ignore_transient_errors flag
When enabled, health check failures with 429 (rate limit) or 408 (timeout)
status codes are skipped from the cooldown pipeline. These are transient
load issues, not broken deployments. Auth errors (401), 404, and 5xx errors
still increment counters and trigger cooldown as before.
Config (general_settings):
health_check_ignore_transient_errors: true
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(router): also exclude 429/408 from health state cache when ignore_transient_errors set
The previous fix only skipped cooldown counter increments. The health state
cache was still marking 429/408 endpoints as is_healthy=False, causing the
binary health check filter to exclude them from routing.
Now, when health_check_ignore_transient_errors=True, 429/408 endpoints are
also excluded from the unhealthy list passed to build_deployment_health_states(),
so the binary filter treats them as unaffected (not unhealthy).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs(router): add health check driven routing guide
New standalone page covering the full health check routing feature:
allowed_fails_policy integration, health_check_ignore_transient_errors,
architecture SVG, step-by-step setup, and gotchas (TTL, AllowedFails semantics).
Replaces the inline section in health.md with a link to the new page.
Added to the Routing & Load Balancing sidebar.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(health-check-routing): fix three CI failures
- Add "exception" to ILLEGAL_DISPLAY_PARAMS in health_check.py so the
exception object is stripped before the health endpoint serializes
results to JSON (fixes TypeError: 'URL' object is not iterable)
- Add allowed_fails_policy = None to FakeRouter stubs in
test_router_health_check_routing.py (fixes AttributeError)
- Add health_check_ignore_transient_errors to config_settings.md router
settings reference table (fixes documentation test)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Fix litellm/tests/proxy_unit_tests/test_proxy_server.py
* fix(router): address greptile review comments
- Narrow cooldown safety-net bypass: only fires when allowed_fails_policy
is set (cooldown is health-check driven). Without a policy, cooldowns
are from real request failures and must not be bypassed.
- Restore cooldown deployments DEBUG log that was accidentally removed.
- Fix test_health TypeError: move exception extraction to a separate
exceptions_by_model_id dict returned alongside endpoints, so exception
objects never appear in the endpoint dicts that get JSON-serialized
by the /health response.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(health-check-routing): properly isolate exceptions from health response
Return exceptions_by_model_id as a separate third value from
_perform_health_check / perform_health_check so exception objects
(which contain non-JSON-serializable httpx URL types) never appear
in the endpoint dicts that get serialized by the /health response.
Callers updated: _health_endpoints.py, shared_health_check_manager.py,
proxy_server.py background loop. All use the exceptions dict only for
cooldown integration, not for display.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(shared-health-check): fix remaining 2-value return sites and update type annotation
* fix(health-check-routing): fix P0 cooldown integration never firing
The cooldown loop was reading endpoint.get("exception") which is always
None because exceptions are now returned via exceptions_by_model_id, not
stored in endpoint dicts. Fixed to use _exceptions.get(model_id).
Also fixes the transient-error filter to use _exceptions instead of
endpoint.get("exception"), and fixes all remaining 2-value return sites
in shared_health_check_manager.py. Tests updated to pass exceptions via
exceptions_by_model_id parameter instead of endpoint dicts.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(health-check-routing): fix P1 transient-error filter broken on cache hits
When SharedHealthCheckManager returns cached results, exceptions_by_model_id
is always {} so the transient-error filter defaulted to status 500 for all
endpoints, incorrectly marking 429/408 endpoints as unhealthy.
Fix: store integer exception_status on each unhealthy endpoint dict in
_perform_health_check. _get_endpoint_exception_status() uses the live
exception object when available (direct path) and falls back to the stored
integer (cache-hit path). The integer is JSON-serializable and survives
the shared cache round-trip.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(health-check-routing): gate cooldown loop behind allowed_fails_policy
Without the policy, cooldown is not the routing exclusion mechanism.
Firing _set_cooldown_deployments for all enable_health_check_routing users
was a backwards-incompatible change — 401s would immediately cooldown
deployments that the binary filter would have recovered on the next cycle.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* revert: undo allowed_fails_policy gate on cooldown loop
Cooldown integration via health checks is intentional for all
enable_health_check_routing users, not just those with allowed_fails_policy.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(docs+tests): fix health_check_ignore_transient_errors doc section and test coverage
- Move health_check_ignore_transient_errors from router_settings to
general_settings in config_settings.md (code reads it from general_settings)
- Remove duplicate enable_health_check_routing / health_check_staleness_threshold
entries that were incorrectly listed under router_settings
- Replace TestHealthCheckEndpointExceptionPropagation tests with ones that
exercise the real _perform_health_check code path via mocked ahealth_check,
verifying exceptions appear in exceptions_by_model_id and NOT in endpoint dicts
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(tests+docs): fix tuple unpacking and docs test failures
- Update test mocks that return (healthy, unhealthy) to return
(healthy, unhealthy, {}) to match the new 3-value signature
- Update test unpackings of perform_shared_health_check to use
healthy, unhealthy, _ = ...
- Add health_check_ignore_transient_errors to router_settings section
in config_settings.md (it is a Router constructor param, so the doc
test requires it there; it also lives in general_settings for proxy use)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Fix CodeQL errors
* fix(tests): fix 2-value unpackings of _perform_health_check in test_health_check.py
* fix(tests): fix mock _perform_health_check returning 2-tuple instead of 3
* fix team routing
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: add distributed lock for key rotation job (#23364)
* fix: add distributed lock for key rotation job
* fix: address Greptile review feedback on key rotation lock (#23834)
* fix: address Greptile review feedback on key rotation lock
* fix req changes greptile
* feat(proxy): Optional on_error for guardrail pipeline (API / technical failures) (#24831)
* guardrails fallback
* docs
* docs: add LITELLM_KEY_ROTATION_LOCK_TTL_SECONDS to environment variables reference
* fix(mypy): accept Union[Dict, Any] in _get_deployment_order and use typed list to fix min() type error
* fix(mypy): use Optional[str] for api_base in PydanticAI provider to match superclass signature
---------
Co-authored-by: Sameer Kankute <sameer@berri.ai>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Harshit Jain <48647625+Harshit28j@users.noreply.github.com>
Co-authored-by: Shivam Rawat <shivam@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
* fix(lint): suppress PLR0915 for 3 complex methods that exceed 50-statement limit
- streaming_iterator.py: _process_event (84 statements)
- transformation.py: translate_messages_to_responses_input (51 statements)
- transformation.py: transform_realtime_response (54 statements)
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
* fix(mypy): resolve type errors in public_endpoints, user_api_key_auth, common_utils, transformation
- public_endpoints.py: fix _cached_endpoints type annotation
- user_api_key_auth.py: accept Optional[str] for end_user_id parameter
- common_utils.py: add NewProjectRequest/UpdateProjectRequest to Union type
- transformation.py: add ChatCompletionRedactedThinkingBlock and list[Any] to content type
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
* fix(proxy-extras): bump version to 0.4.50 and sync schema
- Bump litellm-proxy-extras from 0.4.49 to 0.4.50
- Sync schema.prisma with main proxy schema
- Includes new LiteLLM_ClaudeCodePluginTable model
- Includes new @@index([startTime, request_id]) on SpendLogs
- Update version references in requirements.txt and pyproject.toml
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
* fix(router): use string id in test_add_deployment and add defensive str() in register_model
- Change test to use string '100' instead of int 100 for model_info.id
- Add str() conversion in register_model to prevent AttributeError on non-string keys
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
* fix(security): update minimatch to 10.2.4 to fix CVE-2026-27903 and CVE-2026-27904
- Run npm audit fix in docs/my-website
- Updates minimatch from 10.2.1 to 10.2.4 (fixes HIGH severity ReDoS vulnerabilities)
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
* fix(test): update realtime guardrail test assertions to match actual guardrail behavior
- test_text_message_blocked_by_guardrail_no_ai_response: allow guardrail's own block
message text in response.done (previously expected empty content)
- test_voice_transcript_blocked_by_guardrail: allow guardrail to send response.cancel
+ block message + response.create flow (previously expected no response.create)
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
* fix: revert proxy-extras version in requirements.txt and pyproject.toml
The litellm-proxy-extras 0.4.50 is not published to PyPI yet, so consumer
references must stay at 0.4.49. Only the source package pyproject.toml
should be bumped to 0.4.50 for the publish_proxy_extras CI job.
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
* fix: make transcript delta check optional in voice guardrail test
The guardrail sends an error event (guardrail_violation) when blocking
voice transcripts; it does not always produce transcript deltas. Remove
the assertion requiring response.audio_transcript.delta since the error
event is the primary signal that blocked content was handled.
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
* Add missing env keys to documentation: LITELLM_MAX_STREAMING_DURATION_SECONDS and LITELLM_USE_CHAT_COMPLETIONS_URL_FOR_ANTHROPIC_MESSAGES
These two environment variables were used in code but not documented in the
environment variables reference section of config_settings.md, causing the
test_env_keys.py CI test to fail.
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
* Fix 13 mypy type errors across 6 files
- in_flight_requests_middleware.py: Fix type: ignore error codes from
[union-attr] to [attr-defined], add [arg-type] for Gauge **kwargs
- transformation.py: Add [assignment] ignore for output_format reassignment,
add fallback empty string for tool use id to fix arg-type
- responses/main.py: Remove redundant type annotation on second
secret_fields assignment to fix no-redef
- streaming_iterator.py: Add [assignment] ignores for intermediate
cache token assignments
- handler.py: Add [typeddict-item] ignore for AnthropicMessagesRequest
construction from dict
- public_endpoints.py: Add [arg-type] ignore for _load_endpoints()
return type mismatch with SupportedEndpoint model
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
* fix: add auth overrides to spend tracking tests, fix realtime guardrail assertion, update UI minimatch
- Add app.dependency_overrides for user_api_key_auth in 4 spend tracking tests
that were returning 401 Unauthorized (error_code, error_message,
error_code_and_key_alias, key_hash)
- Fix realtime guardrail test to check ANY error event for guardrail_violation
instead of just the first (OpenAI may send its own errors first)
- Update ui/litellm-dashboard/package-lock.json to fix minimatch vulnerability
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
* Fix failing MCP e2e and create_mcp_server UI tests
Test 1 (test_independent_clients_no_shared_session):
- Add allow_all_keys: true to MCP servers in test config. With master_key
and no DB, get_allowed_mcp_servers returned empty, causing 0 tools and
403 on tool calls. allow_all_keys bypasses per-key restrictions.
- Add asyncio.sleep(0.5) between client connections to allow MCP SDK
TaskGroup cleanup and avoid ExceptionGroup on connection close (MCP #915).
Test 2 (create_mcp_server 'auth value is provided'):
- Use userEvent.setup({ delay: null }) for instant keystrokes to avoid
timeout from default typing delay on CI.
- Increase per-test timeout to 15000ms for CI environments.
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
* fix: stabilize proxy unit tests for parallel execution
- test_response_polling_handler: add xdist_group to prevent heavy import OOM
- test_db_schema_migration: use temp dir for worker isolation, sync schema.prisma index
- test_custom_tokenizer_bug: use lighter tokenizer to prevent OOM in parallel
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
* fix: add auth overrides to more spend tracking and model info tests
- Fix test_ui_view_spend_logs_pagination missing auth override (401)
- Fix test_view_spend_tags missing auth override (401)
- Fix test_view_spend_tags_no_database missing auth override (401)
- Fix test_empty_model_list.py to use app.dependency_overrides instead of patch()
for FastAPI dependency injection auth
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
* fix(test): use patch.object for aiohttp transport test to work in parallel execution
The @patch decorator was not intercepting the static method call in parallel
xdist workers. Using patch.object on the directly-imported class is more reliable.
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
* fix(security): update minimatch from 10.2.1 to 10.2.4 in Dockerfile
The Docker image was explicitly pinning minimatch@10.2.1 which has HIGH
severity ReDoS vulnerabilities (GHSA-7r86-cg39-jmmj, GHSA-23c5-xmqv-rm74).
Update to 10.2.4 which includes fixes for both CVEs.
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
* fix(ui): prevent MCP and TeamInfo test timeouts on CI
- Add userEvent.setup({ delay: null }) to all tests using userEvent in both files
- Add timeout: 15000 to tests with significant user interaction (typing, multiple clicks)
- Fixes: create_mcp_server Bearer Token test, TeamInfo cancel button test
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
* fix: stabilize parallel test execution and aiohttp transport test
- test_aiohttp_handler: rewrite transport test to not rely on static method mock
(consistently fails in parallel xdist workers)
- test_proxy_cli: add xdist_group to prevent timeout during heavy imports
- test_swagger_chat_completions: add xdist_group to prevent timeout
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
* fix(security): add serialize-javascript override to fix GHSA-5c6j-r48x-rmvq
Add npm override for serialize-javascript>=7.0.3 in docs/my-website
to fix HIGH severity RCE vulnerability via RegExp.flags.
Also bump minimatch override to >=10.2.4.
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
* Fix flaky tests: remove broken Vertex model, add retries for Anthropic
- Remove vertex_ai/meta/llama-4-scout-17b-16e-instruct-maas from
test_partner_models_httpx_streaming - consistently returns 400 BadRequest
- Add @pytest.mark.flaky(retries=6, delay=10) to test_function_call_parsing
for transient Anthropic API overload errors
- Add @pytest.mark.flaky(retries=6, delay=10) to test_openai_stream_options_call
for transient Anthropic InternalServerError
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
* fix(ci): add xdist_group(proxy_heavy) to prevent OOM in parallel proxy tests
- Add pytestmark = pytest.mark.xdist_group('proxy_heavy') to test_proxy_utils.py
- Change test_db_schema_migration.py from schema_migration to proxy_heavy group
- Add @pytest.mark.xdist_group('proxy_heavy') to test_proxy_server.py::test_health
Groups heavy proxy tests to run on same worker, avoiding worker OOM crashes.
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
* Fix vertex AI qwen global endpoint test to mock vertexai module import
The test_vertex_ai_qwen_global_endpoint_url test was failing because the
VertexAIPartnerModels.completion() method tries to 'import vertexai' before
any of the mocked code runs. In environments without google-cloud-aiplatform
installed, this import fails with a VertexAIError(status_code=400).
Fix by:
- Adding patch.dict('sys.modules', {'vertexai': MagicMock()}) to mock the
vertexai module import
- Adding vertex_ai_location parameter to the acompletion call for completeness
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
* fix(ci): add xdist_group to health endpoint and watsonx tests for parallel stability
- test_health_liveliness_endpoint: add xdist_group('proxy_health') to prevent timeout
- test_watsonx_gpt_oss tests: add xdist_group('watsonx_heavy') to prevent mock interference
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
* fix(test): pre-populate WatsonX IAM token cache to prevent parallel test interference
The watsonx prompt transformation test was failing in parallel execution because
litellm.module_level_client.post mock was being interfered with by other tests.
Pre-populating the IAM token cache avoids the HTTP call entirely.
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
* fix(test): add spend data polling with retries for e2e pass-through tests
- test_vertex_with_spend.test.js: Replace 15s fixed wait with polling loop
(up to 6 attempts, 10s apart) for spend data to appear in DB
- Increase test timeout from 25s to 90s to accommodate polling
- base_anthropic_messages_tool_search_test.py: Add flaky(retries=3) for
streaming test that depends on live Anthropic API
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
* fix(ci): reduce parallel workers from 8 to 4 for proxy tests to prevent OOM
- litellm_proxy_unit_testing_part2: -n 8 -> -n 4
- litellm_mapped_tests_proxy_part2: -n 8 -> -n 4, timeout 60 -> 120
- Worker crashes consistently caused by too many parallel proxy tests
each loading the full FastAPI app and heavy dependency tree
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
* fix(db): add migration for SpendLogs composite index (startTime, request_id)
The @@index([startTime, request_id]) was added to schema.prisma but had no
corresponding migration. This caused test_aaaasschema_migration_check to fail
because prisma migrate diff detected the missing index.
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
* fix(db): add migration for MCP available_on_public_internet default change to true
The schema.prisma changed the default for available_on_public_internet from
false to true, but no migration was created. This caused the schema migration
test to detect drift.
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
* fix(test): increase server wait time and add retry to flaky external API tests
- test_basic_python_version.py: increase server startup wait from 60s to 90s
for slower CI environments (fixes installing_litellm_on_python_3_13)
- test_a2a_agent.py: add flaky(retries=3, delay=5) for non-streaming test
that depends on live A2A agent endpoint
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
* fix(test): add flaky retries to all intermittent external API tests for 0-fail CI
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
* fix(test): add auth overrides to file endpoint tests that return 500
The test_target_storage tests were getting 500 because the FastAPI auth
dependency wasn't overridden. Added app.dependency_overrides for proper
auth bypass in test environment.
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
---------
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
* staged first pass
* black
* Update litellm/proxy/health_check.py
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* simpler
* restore cached logo
* fix tests for perform_health_check max_concurrency arg
* implement pr suggestion
* and the helm chart
* add configureable resources and probes to the deployment in the helm chart
* more helm chart unittests
* move some background healthcheck loggin to debug
---------
Co-authored-by: Sean Glover <sglover@athenahealth.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Add @pytest.mark.skip to all test functions that use the real `prisma_client`
fixture (requiring an external PostgreSQL connection) across 7 test files.
Files updated:
- tests/proxy_unit_tests/test_proxy_server.py (5 tests)
- tests/proxy_admin_ui_tests/test_key_management.py (11 tests)
- tests/proxy_admin_ui_tests/test_role_based_access.py (5 tests)
- tests/proxy_admin_ui_tests/test_usage_endpoints.py (3 tests)
- tests/local_testing/test_blocked_user_list.py (2 tests)
- tests/local_testing/test_add_update_models.py (1 test)
- tests/local_testing/test_update_spend.py (1 test)
Total: 28 new skip markers added.
Note: tests using mock_prisma_client (properly mocked) are unaffected.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Mark tests that require Prisma DB connections or external API credentials
with @pytest.mark.skip / @pytest.mark.skipif so they don't block CI runs
when the infrastructure is unavailable.
Tests skipped:
- test_create_user_default_budget (Prisma DB)
- test_gemini_pass_through_endpoint (GEMINI_API_KEY / GOOGLE_API_KEY)
- test_vertex_ai_gemini_token_counting_with_contents (Google API creds)
- test_new/update/delete/info_project (Prisma DB)
- test_create/list/get/delete_skill_sdk (Prisma DB)
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
RedisCache() without arguments fails at construction with
"ValueError: Either 'host' or 'url' must be specified for redis."
The actual Redis connection is irrelevant since async_set_cache is mocked.
Unlike test_get_team_redis which uses client_no_auth (which sets REDIS_HOST
via fake_env_vars), test_team_update_redis has no fixture setting that env var.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Refactor proxy embeddings to use shared processor
- allow ProxyBaseLLMRequestProcessing to accept the aembedding route so embeddings requests reuse the base pipeline hooks
- route embeddings requests through base_process_llm_request, sharing logging, hook execution, retries, and header handling with chat/responses
- tighten token array decoding logic by using router deployment lookups and the unified error handler
* Fix: Correctly process embedding requests with token arrays
The `test_embedding_input_array_of_tokens` test was failing due to a regression that caused embedding requests with token arrays to be processed incorrectly. This prevented the `aembedding` function from being called as expected.
This was caused by a combination of three distinct issues:
1. In `litellm/proxy/common_request_processing.py`, the `function_setup` utility was called with `aembedding` as the `original_function` for embedding routes. This has been corrected to `embedding` to ensure proper request setup.
2. In `litellm/proxy/proxy_server.py`, a `TypeError` occurred because the `get_deployment` method was called with the `model_name` keyword argument instead of the expected `model_id`. This has been corrected. Additionally, the check for token arrays was improved to validate that all elements in the input subarray are integers.
3. In `litellm/proxy/litellm_pre_call_utils.py`, the check for the `enforced_params` enterprise feature was too strict. It blocked valid requests even when the `enforced_params` list was empty. The condition has been adjusted to trigger the check only for non-empty lists.
Finally, the `test_embedding_input_array_of_tokens` assertion was updated to be more robust. The previous `assert_called_once_with` was overly strict, causing failures when unrelated internal parameters were added to the function call. The test now first asserts that `aembedding` is called and then separately verifies the `model` and `input` arguments. This makes the test more resilient to future changes without sacrificing its ability to catch regressions.
* test: align proxy embedding assertions
Update the embedding proxy test to match the new request pipeline: keep the data the proxy builds, expect the extra control kwargs, let the post-call hook return the actual response, and assert the normalized 'embeddings' hook type. This proves the refactor still forwards metadata and returns the mocked payload.
* Update proxy exception test
The proxy now forwards additional kwargs (request_timeout, litellm_call_id, litellm_logging_obj) to llm_router.aembedding. The test needs to accept these to match the real call signature and keep validating the error path instead of the kwargs list.
* testing: unsure of this change
I don't remember why I changed this, will revert and see if any tests fail since the manual test isn't failing without it.
* fix: remove unrelated change
This change was not related to the embeddings refactor and actually belonged to a different branch.
* Fix embeddings endpoint call_type to use valid CallTypes enum value
Fixed bug where the `/embeddings` endpoint was passing `call_type="embeddings"`
to guardrail hooks, but "embeddings" is not a valid value in the CallTypes enum.
Changed to use `call_type="aembedding"` (async embedding) which is the correct
CallTypes enum value and matches the route_type used in the same function.
Added unit tests to verify:
- "embeddings" is not a valid CallTypes enum value
- "aembedding" is the correct valid value
- The fix prevents ValueError when guardrails are enabled
Fixes#16240
* Inline embeddings call type regression check
* Ensure embedding test preserves proxy metadata
* fix: use fastuuid helper across the codebase
First batch of changes, simple drop in replacement.
* second batch of changes
* fixed: script mistake on helper file
* disable background health checks for specific models
* test_background_health_check_skip_disabled_models
* Disable Background Health Checks For Specific Models
* fix(db_spend_update_writer.py): fix db query
* fix(litellm_pre_call_utils.py): support passing anthropic-beta headers when 'forward_client_headers_to_llm_api' is True
allows user to pass along extra headers to vertex ai anthropic models
* docs(config_settings.md): update docs
commit 440bc027251d8180174d762d83d271d0f7b68cc5
Author: Krrish Dholakia <krrishdholakia@gmail.com>
Date: Fri Jun 20 23:04:11 2025 -0700
fix: fix check
commit 89a7451cb9ee26ff9f642335714dcc6f449d1fc2
Author: Krrish Dholakia <krrishdholakia@gmail.com>
Date: Fri Jun 20 22:42:30 2025 -0700
fix: fix test
commit 1322e3b3497e5d334fdcaa18f0cf7a98ea758df4
Author: Krrish Dholakia <krrishdholakia@gmail.com>
Date: Fri Jun 20 20:52:40 2025 -0700
style: add more tooltips
commit 172738b98b7864aabcacf3334a394098b300283f
Author: Krrish Dholakia <krrishdholakia@gmail.com>
Date: Fri Jun 20 20:51:09 2025 -0700
feat(team_member_view.tsx): add a tooltip
commit 895eb28deb9127985e30b5e859e5bca8530951c9
Author: Krrish Dholakia <krrishdholakia@gmail.com>
Date: Fri Jun 20 18:46:49 2025 -0700
fix(teams.tsx): support setting team member budget on create
commit 003cc54a6dd0f65030c4f39a8487adc771b62e11
Author: Krrish Dholakia <krrishdholakia@gmail.com>
Date: Fri Jun 20 18:40:49 2025 -0700
fix(team_member_view.tsx): style improvements
commit a627a044f21df788f80d92a4081212072be91632
Author: Krrish Dholakia <krrishdholakia@gmail.com>
Date: Fri Jun 20 18:40:01 2025 -0700
fix(team_member_view.tsx): handle scientific notation in string
commit c5a3b7bd8419f6394e1b490849555d02d473baed
Author: Krrish Dholakia <krrishdholakia@gmail.com>
Date: Fri Jun 20 18:34:25 2025 -0700
feat(team_membership_view.tsx): show team member spend + max budget on UI
commit e986d12ad5b07c676f4cac5e16745939d7473dee
Author: Krrish Dholakia <krrishdholakia@gmail.com>
Date: Fri Jun 20 18:28:06 2025 -0700
feat(team_member_view.tsx): show team member spend + budget on team info
commit 8e398607b25f8a8f0bab41964810b5dd27c5e3f2
Author: Krrish Dholakia <krrishdholakia@gmail.com>
Date: Fri Jun 20 18:18:16 2025 -0700
feat(team_info.tsx): show team member budget on team info
commit 1f56886b5913dafefc0c00fbe741c0c9c01144a6
Author: Krrish Dholakia <krrishdholakia@gmail.com>
Date: Fri Jun 20 18:15:30 2025 -0700
feat(team_endpoints.py): get team budget table on team info
allows user to see max budget set for team members
commit 0a4320bbfa406c24ad32a420f82152da7bdd7323
Author: Krrish Dholakia <krrishdholakia@gmail.com>
Date: Fri Jun 20 18:10:06 2025 -0700
feat(team_endpoints.py): return team member budget on team info
allows ui to display this to admin / team member
commit 6a4e29f87b333ae9977e8f878960e63becd89150
Author: Krrish Dholakia <krrishdholakia@gmail.com>
Date: Fri Jun 20 17:57:20 2025 -0700
fix(team_endpoints.py): support updating team budget on UI
commit 53f0fff34032977433dfe6935ce0a684a4141fd8
Author: Krrish Dholakia <krrishdholakia@gmail.com>
Date: Fri Jun 20 17:38:17 2025 -0700
feat(proxy/_types.py): return team member spend
update pydantic object to include spend
Allows showing spend of team member within team on UI
commit ef2a1a43ecf7fecfb904042cbf47b3d56246edcb
Author: Krrish Dholakia <krrishdholakia@gmail.com>
Date: Fri Jun 20 16:31:42 2025 -0700
feat(team_endpoints.py): support 'team_member_budget' param on `/team/update`
enables budget working across all team members
commit 512999f1249b00a02a30f049a0cfa36e829ff989
Author: Krrish Dholakia <krrishdholakia@gmail.com>
Date: Fri Jun 20 16:20:04 2025 -0700
test: add unit tests for default team member budget
commit 90fa3f61a2d63e12b9f3e1da9775f5c8b7294b5f
Author: Krrish Dholakia <krrishdholakia@gmail.com>
Date: Fri Jun 20 15:37:51 2025 -0700
feat(team_endpoints.py): support using default team member budget id, if set
allows all team members to use the same budget id
commit acef5324b1a0935a482c71060f610c3d8823e8c3
Author: Krrish Dholakia <krrishdholakia@gmail.com>
Date: Fri Jun 20 15:22:30 2025 -0700
feat(team_endpoints.py): support `team_member_budget` param on `/team/new`
Allow creating 1 budget for all users within team (makes it easier to increase/reduce budget if needed for all team members)
commit 2e867ac70fbd8768e7c27cf3b078e6dc10e566b9
Author: Krrish Dholakia <krrishdholakia@gmail.com>
Date: Fri Jun 20 13:45:06 2025 -0700
fix(ui_sso.py): ensure user is added to team, if set via default internal settings
allows users signed up via SSO to be added to default team
* fix(internal_user_endpoints.py): support user with `+` in email on user info
ensures user is correctly parsed from input
* fix(factory.py): support vertex function call args as None
handles empty string in args for vertex gemini calls
* docs(langfuse_integration.md): pin langfuse sdk version on docs
* fix(vertex_ai/): return empty dict, instead of none when empty string given
* refactor: reduce function size
* fix: fix linting errors
* fix: revert check
* fix(internal_user_endpoints.py): fix check
* test: update tests
* test: update tests
* feat(handle_jwt.py): map user to team when added via jwt auth
makes it easy to ensure user belongs to team
* test: test_openai_image_edit_litellm_sdk
* use n 4 for mapped tests (#11109)
* Fix/background health check (#10887)
* fix: improve health check logic by deep copying model list on each iteration
* test: add async test for background health check reflecting model list changes
* fix: validate health check interval before executing background health check
* fix: specify type for health check results dictionary
* fix(user_api_key_auth.py): handle user custom auth set with no custom settings
* bump: version 0.1.21 → 0.2.0
* ci(config.yml): run enterprise and litellm tests separately
* fix: fix linting error
* docs: add missing docs
* [Feat] Add content policy violation error mapping for image editd (#11113)
* feat: add image edit mapping for content policy violations
* test fix
* Expose `/list` and `/info` endpoints for Audit Log events (#11102)
* feat(audit_logging_endpoints.py): expose list endpoint to show all audit logs
make it easier for user to retrieve individual endpoints
* feat(enterprise/): add audit logging endpoint
* feat(audit_logging_endpoints.py): expose new GET `/audit/{id}` endpoint
make it easier to retrieve view individual audit logs
* feat(key_management_event_hooks.py): correctly show the key of the user who initiated the change
* fix(key_management_event_hooks.py): add key rotations as an audit log event
'
* test(test_audit_logging_endpoints.py): add simple unit testing for audit log endpoint
* fix: testing fixes
* fix: fix ruff check
* [Feat] Use aiohttp transport by default - 97% lower median latency (#11097)
* fix: add flag for disabling use_aiohttp_transport
* feat: add _create_async_transport
* feat: fixes for transport
* add httpx-aiohttp
* feat: fixes for transport
* refactor: fixes for transport
* build: fix deps
* fixes: test fixes
* fix: ensure aiohttp does not auto set content type
* test: test fixes
* feat: add LiteLLMAiohttpTransport
* fix: fixes for responses API handling
* test: fixes for responses API handling
* test: fixes for responses API handling
* feat: fixes for transport
* fix: base embedding handler
* test: test_async_http_handler_force_ipv4
* test: fix failing deepeval test
* fix: add YARL for bedrock urls
* fix: issues with transport
* fix: comment out linting issues
* test fix
* test: XAI is unstable
* test: fixes for using respx
* test: XAI fixes
* test: XAI fixes
* test: infinity testing fixes
* docs(config_settings.md): document param
* test: test_openai_image_edit_litellm_sdk
* test: remove deprecated test
* bump respx==0.22.0
* test: test_xai_message_name_filtering
* test: fix anthropic test after bumping httpx
* use n 4 for mapped tests (#11109)
* fix: use 1 session per event loop
* test: test_client_session_helper
* fix: linting error
* fix: resolving GET requests on httpx 0.28.1
* test fixes proxy unit tests
* fix: add ssl verify settings
* fix: proxy unit tests
* fix: refactor
* tests: basic unit tests for aiohttp transports
* tests: fixes xai
---------
Co-authored-by: Krrish Dholakia <krrishdholakia@gmail.com>
* test: cleanup redundant test
---------
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: JuHyun Bae <jhyun0408@nate.com>
* fix: improve health check logic by deep copying model list on each iteration
* test: add async test for background health check reflecting model list changes
* fix: validate health check interval before executing background health check
* fix: specify type for health check results dictionary
* fix(key_management_endpoints.py): initial commit with logic to get all keys for teams user is an admin for
* fix(key_managements_endpoints.py): return all keys for teams user is an admin for
* fix(key_management_endpoints.py): add query param to ensure user opts into seeing all team keys (not just their own)
* fix(regenerate_key_modal.tsx): fix key regenerate
* fix(proxy_server.py): fix model metrics check on none api base
* test(test_key_generate_prisma.py): remove redundant test
* test(test_proxy_utils.py): add unit test covering new management endpoint helper util
* fix: fix test
* test(test_proxy_server.py): fix test
* test: add more unit testing for team member add
* fix(team_endpoints.py): add validation check to prevent same user from being added to team again
prevents duplicates
* fix(team_endpoints.py): raise error if `/team/member_delete` called on member that's not in team
prevent being able to call delete on same member multiple times
* test: update initial tests
* test: fix test
* test: update test to handle no member duplication
* docs(reliability.md): add doc on disabling fallbacks per request
* feat(litellm_pre_call_utils.py): support reading request timeout from request headers - new `x-litellm-timeout` param
Allows setting dynamic model timeouts from vercel's AI sdk
* test(test_proxy_server.py): add simple unit test for reading request timeout
* test(test_fallbacks.py): add e2e test to confirm timeout passed in request headers is correctly read
* feat(main.py): support passing metadata to openai in preview
Resolves https://github.com/BerriAI/litellm/issues/6022#issuecomment-2616119371
* fix(main.py): fix passing openai metadata
* docs(request_headers.md): document new request headers
* build: Merge branch 'main' into litellm_dev_01_27_2025_p3
* test: loosen test
* fix(internal_user_endpoints.py): fix team list sort - handle team_alias being set + None
* fix(key_management_endpoints.py): allow team admin to create key for member via admin ui
Fixes https://github.com/BerriAI/litellm/issues/7482
* fix(proxy_server.py): allow querying info on specific model group via `/model_group/info`
allows client-side user to get model info from proxy
* fix(proxy_server.py): add docstring on `/model_group/info` showing how to filter by model name
* test(test_proxy_utils.py): add unit test for returning model group info filtered
* fix(proxy_server.py): fix query param
* fix(test_Get_model_info.py): handle no whitelisted bedrock modells
* fix get_standard_logging_object_payload
* fix async_post_call_failure_hook
* fix post_call_failure_hook
* fix change
* fix _is_proxy_only_error
* fix async_post_call_failure_hook
* fix getting request body
* remove redundant code
* use a well named original function name for auth errors
* fix logging auth fails on DD
* fix using request body
* use helper for _handle_logging_proxy_only_error