Files
litellm/tests/litellm_utils_tests
ishaan-berri 51876292a0 Litellm ishaan april4 2 (#25150)
* feat(router): integrate allowed_fails_policy into health check failures (#24988)

* feat(router): integrate allowed_fails_policy into health check failures

Health check failures now increment the same per-deployment failure
counters used by allowed_fails_policy, so users can control how many
health check failures of each error type are required before a
deployment enters cooldown.

- ahealth_check() preserves the original exception in its return dict
- run_with_timeout() returns a litellm.Timeout on health check timeout
- _perform_health_check() propagates exceptions to unhealthy endpoints
- _write_health_state_to_router_cache() calls _set_cooldown_deployments
  for each unhealthy endpoint that has an exception
- When allowed_fails_policy is set, the binary health check filter is
  bypassed so cooldown is the sole routing exclusion mechanism
- Safety net: if all deployments are in cooldown with
  enable_health_check_routing=True, the cooldown filter is bypassed

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(router): add health_check_ignore_transient_errors flag

When enabled, health check failures with 429 (rate limit) or 408 (timeout)
status codes are skipped from the cooldown pipeline. These are transient
load issues, not broken deployments. Auth errors (401), 404, and 5xx errors
still increment counters and trigger cooldown as before.

Config (general_settings):
  health_check_ignore_transient_errors: true

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(router): also exclude 429/408 from health state cache when ignore_transient_errors set

The previous fix only skipped cooldown counter increments. The health state
cache was still marking 429/408 endpoints as is_healthy=False, causing the
binary health check filter to exclude them from routing.

Now, when health_check_ignore_transient_errors=True, 429/408 endpoints are
also excluded from the unhealthy list passed to build_deployment_health_states(),
so the binary filter treats them as unaffected (not unhealthy).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs(router): add health check driven routing guide

New standalone page covering the full health check routing feature:
allowed_fails_policy integration, health_check_ignore_transient_errors,
architecture SVG, step-by-step setup, and gotchas (TTL, AllowedFails semantics).

Replaces the inline section in health.md with a link to the new page.
Added to the Routing & Load Balancing sidebar.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(health-check-routing): fix three CI failures

- Add "exception" to ILLEGAL_DISPLAY_PARAMS in health_check.py so the
  exception object is stripped before the health endpoint serializes
  results to JSON (fixes TypeError: 'URL' object is not iterable)
- Add allowed_fails_policy = None to FakeRouter stubs in
  test_router_health_check_routing.py (fixes AttributeError)
- Add health_check_ignore_transient_errors to config_settings.md router
  settings reference table (fixes documentation test)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fix litellm/tests/proxy_unit_tests/test_proxy_server.py

* fix(router): address greptile review comments

- Narrow cooldown safety-net bypass: only fires when allowed_fails_policy
  is set (cooldown is health-check driven). Without a policy, cooldowns
  are from real request failures and must not be bypassed.
- Restore cooldown deployments DEBUG log that was accidentally removed.
- Fix test_health TypeError: move exception extraction to a separate
  exceptions_by_model_id dict returned alongside endpoints, so exception
  objects never appear in the endpoint dicts that get JSON-serialized
  by the /health response.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(health-check-routing): properly isolate exceptions from health response

Return exceptions_by_model_id as a separate third value from
_perform_health_check / perform_health_check so exception objects
(which contain non-JSON-serializable httpx URL types) never appear
in the endpoint dicts that get serialized by the /health response.

Callers updated: _health_endpoints.py, shared_health_check_manager.py,
proxy_server.py background loop. All use the exceptions dict only for
cooldown integration, not for display.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(shared-health-check): fix remaining 2-value return sites and update type annotation

* fix(health-check-routing): fix P0 cooldown integration never firing

The cooldown loop was reading endpoint.get("exception") which is always
None because exceptions are now returned via exceptions_by_model_id, not
stored in endpoint dicts. Fixed to use _exceptions.get(model_id).

Also fixes the transient-error filter to use _exceptions instead of
endpoint.get("exception"), and fixes all remaining 2-value return sites
in shared_health_check_manager.py. Tests updated to pass exceptions via
exceptions_by_model_id parameter instead of endpoint dicts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(health-check-routing): fix P1 transient-error filter broken on cache hits

When SharedHealthCheckManager returns cached results, exceptions_by_model_id
is always {} so the transient-error filter defaulted to status 500 for all
endpoints, incorrectly marking 429/408 endpoints as unhealthy.

Fix: store integer exception_status on each unhealthy endpoint dict in
_perform_health_check. _get_endpoint_exception_status() uses the live
exception object when available (direct path) and falls back to the stored
integer (cache-hit path). The integer is JSON-serializable and survives
the shared cache round-trip.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(health-check-routing): gate cooldown loop behind allowed_fails_policy

Without the policy, cooldown is not the routing exclusion mechanism.
Firing _set_cooldown_deployments for all enable_health_check_routing users
was a backwards-incompatible change — 401s would immediately cooldown
deployments that the binary filter would have recovered on the next cycle.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* revert: undo allowed_fails_policy gate on cooldown loop

Cooldown integration via health checks is intentional for all
enable_health_check_routing users, not just those with allowed_fails_policy.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(docs+tests): fix health_check_ignore_transient_errors doc section and test coverage

- Move health_check_ignore_transient_errors from router_settings to
  general_settings in config_settings.md (code reads it from general_settings)
- Remove duplicate enable_health_check_routing / health_check_staleness_threshold
  entries that were incorrectly listed under router_settings
- Replace TestHealthCheckEndpointExceptionPropagation tests with ones that
  exercise the real _perform_health_check code path via mocked ahealth_check,
  verifying exceptions appear in exceptions_by_model_id and NOT in endpoint dicts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(tests+docs): fix tuple unpacking and docs test failures

- Update test mocks that return (healthy, unhealthy) to return
  (healthy, unhealthy, {}) to match the new 3-value signature
- Update test unpackings of perform_shared_health_check to use
  healthy, unhealthy, _ = ...
- Add health_check_ignore_transient_errors to router_settings section
  in config_settings.md (it is a Router constructor param, so the doc
  test requires it there; it also lives in general_settings for proxy use)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fix CodeQL errors

* fix(tests): fix 2-value unpackings of _perform_health_check in test_health_check.py

* fix(tests): fix mock _perform_health_check returning 2-tuple instead of 3

* fix team routing

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add distributed lock for key rotation job (#23364)

* fix: add distributed lock for key rotation job

* fix: address Greptile review feedback on key rotation lock (#23834)

* fix: address Greptile review feedback on key rotation lock

* fix req changes greptile

* feat(proxy): Optional on_error for guardrail pipeline (API / technical failures) (#24831)

* guardrails fallback

* docs

* docs: add LITELLM_KEY_ROTATION_LOCK_TTL_SECONDS to environment variables reference

* fix(mypy): accept Union[Dict, Any] in _get_deployment_order and use typed list to fix min() type error

* fix(mypy): use Optional[str] for api_base in PydanticAI provider to match superclass signature

---------

Co-authored-by: Sameer Kankute <sameer@berri.ai>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Harshit Jain <48647625+Harshit28j@users.noreply.github.com>
Co-authored-by: Shivam Rawat <shivam@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
2026-04-04 23:09:42 +00:00
..
2026-03-30 18:14:44 -07:00