mirror of
https://github.com/tiennm99/litellm.git
synced 2026-06-18 03:31:23 +00:00
51876292a0
* feat(router): integrate allowed_fails_policy into health check failures (#24988)

* feat(router): integrate allowed_fails_policy into health check failures

Health check failures now increment the same per-deployment failure counters used by allowed_fails_policy, so users can control how many health check failures of each error type are required before a deployment enters cooldown.

- ahealth_check() preserves the original exception in its return dict
- run_with_timeout() returns a litellm.Timeout on health check timeout
- _perform_health_check() propagates exceptions to unhealthy endpoints
- _write_health_state_to_router_cache() calls _set_cooldown_deployments for each unhealthy endpoint that has an exception
- When allowed_fails_policy is set, the binary health check filter is bypassed so cooldown is the sole routing exclusion mechanism
- Safety net: if all deployments are in cooldown with enable_health_check_routing=True, the cooldown filter is bypassed

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(router): add health_check_ignore_transient_errors flag

When enabled, health check failures with 429 (rate limit) or 408 (timeout) status codes are skipped from the cooldown pipeline. These are transient load issues, not broken deployments. Auth errors (401), 404, and 5xx errors still increment counters and trigger cooldown as before.

Config (general_settings):
  health_check_ignore_transient_errors: true

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(router): also exclude 429/408 from health state cache when ignore_transient_errors set

The previous fix only skipped cooldown counter increments. The health state cache was still marking 429/408 endpoints as is_healthy=False, causing the binary health check filter to exclude them from routing.

Now, when health_check_ignore_transient_errors=True, 429/408 endpoints are also excluded from the unhealthy list passed to build_deployment_health_states(), so the binary filter treats them as unaffected (not unhealthy).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs(router): add health check driven routing guide

New standalone page covering the full health check routing feature: allowed_fails_policy integration, health_check_ignore_transient_errors, architecture SVG, step-by-step setup, and gotchas (TTL, AllowedFails semantics). Replaces the inline section in health.md with a link to the new page. Added to the Routing & Load Balancing sidebar.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(health-check-routing): fix three CI failures

- Add "exception" to ILLEGAL_DISPLAY_PARAMS in health_check.py so the exception object is stripped before the health endpoint serializes results to JSON (fixes TypeError: 'URL' object is not iterable)
- Add allowed_fails_policy = None to FakeRouter stubs in test_router_health_check_routing.py (fixes AttributeError)
- Add health_check_ignore_transient_errors to config_settings.md router settings reference table (fixes documentation test)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fix litellm/tests/proxy_unit_tests/test_proxy_server.py

* fix(router): address greptile review comments

- Narrow cooldown safety-net bypass: only fires when allowed_fails_policy is set (cooldown is health-check driven). Without a policy, cooldowns are from real request failures and must not be bypassed.
- Restore cooldown deployments DEBUG log that was accidentally removed.
- Fix test_health TypeError: move exception extraction to a separate exceptions_by_model_id dict returned alongside endpoints, so exception objects never appear in the endpoint dicts that get JSON-serialized by the /health response.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(health-check-routing): properly isolate exceptions from health response

Return exceptions_by_model_id as a separate third value from _perform_health_check / perform_health_check so exception objects (which contain non-JSON-serializable httpx URL types) never appear in the endpoint dicts that get serialized by the /health response.

Callers updated: _health_endpoints.py, shared_health_check_manager.py, proxy_server.py background loop. All use the exceptions dict only for cooldown integration, not for display.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(shared-health-check): fix remaining 2-value return sites and update type annotation

* fix(health-check-routing): fix P0 cooldown integration never firing

The cooldown loop was reading endpoint.get("exception") which is always None because exceptions are now returned via exceptions_by_model_id, not stored in endpoint dicts. Fixed to use _exceptions.get(model_id).

Also fixes the transient-error filter to use _exceptions instead of endpoint.get("exception"), and fixes all remaining 2-value return sites in shared_health_check_manager.py.

Tests updated to pass exceptions via exceptions_by_model_id parameter instead of endpoint dicts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(health-check-routing): fix P1 transient-error filter broken on cache hits

When SharedHealthCheckManager returns cached results, exceptions_by_model_id is always {} so the transient-error filter defaulted to status 500 for all endpoints, incorrectly marking 429/408 endpoints as unhealthy.

Fix: store integer exception_status on each unhealthy endpoint dict in _perform_health_check. _get_endpoint_exception_status() uses the live exception object when available (direct path) and falls back to the stored integer (cache-hit path). The integer is JSON-serializable and survives the shared cache round-trip.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(health-check-routing): gate cooldown loop behind allowed_fails_policy

Without the policy, cooldown is not the routing exclusion mechanism. Firing _set_cooldown_deployments for all enable_health_check_routing users was a backwards-incompatible change: 401s would immediately cool down deployments that the binary filter would have recovered on the next cycle.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* revert: undo allowed_fails_policy gate on cooldown loop

Cooldown integration via health checks is intentional for all enable_health_check_routing users, not just those with allowed_fails_policy.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(docs+tests): fix health_check_ignore_transient_errors doc section and test coverage

- Move health_check_ignore_transient_errors from router_settings to general_settings in config_settings.md (code reads it from general_settings)
- Remove duplicate enable_health_check_routing / health_check_staleness_threshold entries that were incorrectly listed under router_settings
- Replace TestHealthCheckEndpointExceptionPropagation tests with ones that exercise the real _perform_health_check code path via mocked ahealth_check, verifying exceptions appear in exceptions_by_model_id and NOT in endpoint dicts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(tests+docs): fix tuple unpacking and docs test failures

- Update test mocks that return (healthy, unhealthy) to return (healthy, unhealthy, {}) to match the new 3-value signature
- Update test unpackings of perform_shared_health_check to use healthy, unhealthy, _ = ...
- Add health_check_ignore_transient_errors to router_settings section in config_settings.md (it is a Router constructor param, so the doc test requires it there; it also lives in general_settings for proxy use)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fix CodeQL errors

* fix(tests): fix 2-value unpackings of _perform_health_check in test_health_check.py

* fix(tests): fix mock _perform_health_check returning 2-tuple instead of 3

* fix team routing

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add distributed lock for key rotation job (#23364)

* fix: add distributed lock for key rotation job

* fix: address Greptile review feedback on key rotation lock (#23834)

* fix: address Greptile review feedback on key rotation lock

* fix req changes greptile

* feat(proxy): Optional on_error for guardrail pipeline (API / technical failures) (#24831)

* guardrails fallback

* docs

* docs: add LITELLM_KEY_ROTATION_LOCK_TTL_SECONDS to environment variables reference

* fix(mypy): accept Union[Dict, Any] in _get_deployment_order and use typed list to fix min() type error

* fix(mypy): use Optional[str] for api_base in PydanticAI provider to match superclass signature

---------

Co-authored-by: Sameer Kankute <sameer@berri.ai>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Harshit Jain <48647625+Harshit28j@users.noreply.github.com>
Co-authored-by: Shivam Rawat <shivam@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
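
Illustrative sketch (not the code from this commit): the status lookup and transient-error filter described in the "fix P1 transient-error filter broken on cache hits" entry could look roughly like the following. The function names, the "model_id" key, and the FakeRateLimitError class are hypothetical stand-ins chosen for this example; only the exception_status field, the 429/408 codes, and the default of 500 come from the message above.

```python
from typing import Any, Dict, List

# Status codes the commit message treats as transient (rate limit, timeout).
TRANSIENT_STATUS_CODES = {408, 429}


def get_endpoint_exception_status(
    endpoint: Dict[str, Any],
    exceptions_by_model_id: Dict[str, Exception],
) -> int:
    """Hypothetical helper: prefer the live exception, then the stored integer."""
    model_id = endpoint.get("model_id")  # assumed key name, for illustration only
    exc = exceptions_by_model_id.get(model_id) if model_id is not None else None
    live_status = getattr(exc, "status_code", None)
    if isinstance(live_status, int):
        return live_status  # direct path: exception object is available
    stored_status = endpoint.get("exception_status")
    if isinstance(stored_status, int):
        return stored_status  # cache-hit path: only the integer survived
    return 500  # nothing known about the failure


def filter_unhealthy_endpoints(
    unhealthy: List[Dict[str, Any]],
    exceptions_by_model_id: Dict[str, Exception],
    ignore_transient_errors: bool,
) -> List[Dict[str, Any]]:
    """Drop 429/408 endpoints from the unhealthy list when the flag is set."""
    if not ignore_transient_errors:
        return list(unhealthy)
    return [
        ep
        for ep in unhealthy
        if get_endpoint_exception_status(ep, exceptions_by_model_id)
        not in TRANSIENT_STATUS_CODES
    ]


class FakeRateLimitError(Exception):
    """Stand-in for a provider error that carries an HTTP status code."""

    status_code = 429


if __name__ == "__main__":
    endpoints = [
        {"model_id": "a"},                           # live 429 exception
        {"model_id": "b", "exception_status": 408},  # cached timeout
        {"model_id": "c", "exception_status": 401},  # cached auth error
        {"model_id": "d"},                           # unknown, treated as 500
    ]
    live = {"a": FakeRateLimitError("rate limited")}
    kept = filter_unhealthy_endpoints(endpoints, live, ignore_transient_errors=True)
    print([ep["model_id"] for ep in kept])  # ['c', 'd']
```

Storing a plain integer on the endpoint dict keeps the cached payload JSON-serializable, which is why the sketch falls back to it only when no live exception is present.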