- New troubleshoot page and blog post with step-by-step comparison workflow
- Screenshots under static/img/cost-discrepancy-debug
- Link from spend tracking; sidebar entry under Troubleshooting
- Flowchart SVG: Path B connectors below box; clarify LiteLLM schedules customer calls when stuck
Made-with: Cursor
* feat(guardrails): optional skip system message in unified guardrail inputs
Made-with: Cursor
* feat(dashboard): skip_system_message_in_guardrail in guardrail UI
Add a tri-state control (inherit / yes / no) when creating or editing
guardrails so admins can set litellm_params.skip_system_message_in_guardrail
without YAML. Table edit merges existing litellm_params before PUT to avoid
wiping content-filter and other provider fields.
Document the dashboard flow in the guardrails quick start with a screenshot.
Made-with: Cursor
* fix(guardrails): type structured_messages as AllMessageValues for mypy
Use AllMessageValues in openai_messages_without_system and cast adapter
request messages so GenericGuardrailAPIInputs matches TypedDict.
Made-with: Cursor
* Add PromptGuard guardrail integration
Add PromptGuard as a first-class guardrail vendor in LiteLLM's proxy,
supporting prompt injection detection, PII redaction, topic filtering,
entity blocklists, and hallucination detection via PromptGuard's
/api/v1/guard API endpoint.
Backend:
- Add PROMPTGUARD to SupportedGuardrailIntegrations enum
- Implement PromptGuardGuardrail (CustomGuardrail subclass) with
apply_guardrail handling allow/block/redact decisions
- Add Pydantic config model with api_key, api_base, ui_friendly_name
- Auto-discovered via guardrail_hooks/promptguard/__init__.py registries
Frontend:
- Add PromptGuard partner card to Guardrail Garden with eval scores
- Add preset configuration for quick setup
- Add logo to guardrailLogoMap
Tests:
- 30 unit tests covering configuration, allow/block/redact actions,
request payload construction, error handling, config model, and
registry wiring
* Fix redact path and init ordering per review feedback
- P1: Update structured_messages (not just texts) when PromptGuard
returns a redact decision, so PII redaction is effective for the
primary LLM message path
- P2: Validate credentials before allocating the HTTPX client so
resources aren't acquired if PromptGuardMissingCredentials is raised
- Add tests for structured_messages redaction and texts-only redaction
* Harden PromptGuard integration: fail-open, event hooks, images, docs
- Add block_on_error config (default fail-closed, configurable fail-open)
- Declare supported_event_hooks (pre_call, post_call) like other vendors
- Forward images from GenericGuardrailAPIInputs to PromptGuard API
- Wrap API call in try/except for resilient error handling
- Add comprehensive documentation page with config examples
- Register docs page in sidebar alongside other guardrail providers
- Expand test suite from 32 to 40 tests covering new functionality
* Fix dict[str, Any] -> Dict[str, Any] for Python 3.8 compat
* Address remaining Greptile feedback: timeout, redact guard
- Add explicit 10s timeout to async_handler.post() to prevent
indefinite hangs when PromptGuard API is unresponsive
- Guard redact path: only update inputs["texts"] when the key
was originally present, avoiding phantom key injection
- Add test: redact with structured_messages only does not create
texts key (41 tests total)
* Fix CI lint: black formatting, add PromptGuardConfigModel to LitellmParams
- Reformat promptguard.py to match CI black version (parenthesization)
- Add PromptGuardConfigModel as base class of LitellmParams for proper
Pydantic schema validation, consistent with all other guardrail vendors
- Use litellm_params.block_on_error directly (now a typed field)
* Address Greptile review: redact path, null decision, error context
- P1: Filter _extract_texts_from_messages to user-role messages only,
preventing system/assistant content from being injected into texts
- P1: Strengthen test_redact_updates_structured_messages assertion from
weak `in` check to strict equality, catching the injection bug
- P2: Use `result.get("decision") or "allow"` to handle explicit null
decision values (not just absent keys)
- P2: Wrap bare exception re-raise in GuardrailRaisedException so the
caller knows which guardrail failed (block_on_error=True path)
- P2: Add static Promptguard entry in guardrail_provider_map so the
preset works before populateGuardrailProviderMap is called
- Add test for explicit null decision treated as allow
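The null-decision handling described above can be sketched as follows. This is a minimal illustration, not the integration's actual code: `resolve_decision` is a hypothetical helper name, and only the `result.get("decision") or "allow"` expression comes from the commit text.

```python
from typing import Any, Dict


def resolve_decision(result: Dict[str, Any]) -> str:
    """Return the guardrail decision, treating a missing or null value as "allow".

    `resolve_decision` is a hypothetical name used for illustration only.
    """
    # result.get("decision", "allow") would return None for {"decision": None},
    # because the key is present. `or` covers both the absent and the null case.
    return result.get("decision") or "allow"
```

The difference from a `.get()` default only shows up when the API sends the key with a JSON `null` value.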
* Fix black formatting: collapse f-string in error message
* fix(vertex_ai): support pluggable (executable) credential_source for WIF auth (#24700)
The WIF credential dispatch in load_auth() only handled identity_pool and
aws credential types. When credential_source.executable was present (used
for Azure Managed Identity via Workload Identity Federation), it fell
through to identity_pool.Credentials which rejected it with MalformedError.
Add dispatch to google.auth.pluggable.Credentials for executable-type
credential sources, following the same pattern as the existing identity_pool
and aws helpers.
Fixes authentication for Azure Container Apps → GCP Vertex AI via WIF
with executable credential sources.
* feat(logging): add component and logger fields to JSON logs for 3rd p… (#24447)
* feat(logging): add component and logger fields to JSON logs for 3rd party filtering
* Let user-supplied extra fields win over auto-generated component/logger, tighten test assertions
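A rough sketch of the precedence rule named above, where user-supplied extra fields override the auto-generated `component` and `logger` values. The function name and the way `component` is derived from the logger name are assumptions made for illustration; the commit text does not say how the real formatter computes them.

```python
from typing import Any, Dict, Optional


def build_log_record(
    message: str, logger_name: str, extra: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """Merge auto-generated fields with user extras; user extras win on conflict."""
    # Assumption: component is the first dotted segment of the logger name.
    auto_fields = {"component": logger_name.split(".")[0], "logger": logger_name}
    # Later entries in a dict display override earlier ones, so `extra` goes last.
    return {"message": message, **auto_fields, **(extra or {})}
```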
* Feat - Add organization into the metrics metadata for org_id & org_alias (#24440)
* Add org_id and org_alias label names to Prometheus metric definitions
* Add user_api_key_org_alias to StandardLoggingUserAPIKeyMetadata
* Populate user_api_key_org_alias in pre-call metadata
* Pass org_id and org_alias into per-request Prometheus metric labels
* Add test for org labels on per-request Prometheus metrics
* chore: resolve test mockdata
* Address review: populate org_alias from DB view, add feature flag, use .get() for org metadata
* Add org labels to failure path and verify flag behavior in test
* Fix test: build flag-off enum_values without org fields
* Gate org labels behind feature flag in get_labels() instead of static metric lists
* Scope org label injection to metrics that carry team context, remove orphaned budget label defs, add test teardown
* Use explicit metric allowlist for org label injection instead of team heuristic
* Fix duplicate org label guard, move _org_label_metrics to class constant
* Reset custom_prometheus_metadata_labels after duplicate label assertion
* fix: emit org labels by default, remove flag, fix missing org_alias in all metadata paths
* fix: emit org labels by default, no opt-in flag required
* fix: write org_alias to metadata unconditionally in proxy_server.py
* fix: 429s from batch creation being converted to 500 (#24703)
* add us gov models (#24660)
* add us gov models
* added max tokens
* Litellm dev 04 02 2026 p1 (#25052)
* fix: replace hardcoded url
* fix: Anthropic web search cost not tracked for Chat Completions
The ModelResponse branch in response_object_includes_web_search_call()
only checked url_citation annotations and prompt_tokens_details, missing
Anthropic's server_tool_use.web_search_requests field. This caused
_handle_web_search_cost() to never fire for Anthropic Claude models.
Also routes vertex_ai/claude-* models to the Anthropic cost calculator
instead of the Gemini one, since Claude on Vertex uses the same
server_tool_use billing structure as the direct Anthropic API.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* fix(anthropic): pass logging_obj to client.post for litellm_overhead_time_ms (#24071)
When LITELLM_DETAILED_TIMING=true, litellm_overhead_time_ms was null for
Anthropic because the handler did not pass logging_obj to client.post(),
so track_llm_api_timing could not set llm_api_duration_ms. Pass
logging_obj=logging_obj at all four post() call sites (make_call,
make_sync_call, acompletion, completion). Add test to ensure make_call
passes logging_obj to client.post.
Made-with: Cursor
* sap - add additional parameters for grounding
- additional parameter for grounding added for the sap provider
* sap - fix models
* (sap) add filtering, masking, translation SAP GEN AI Hub modules
* (sap) add tests and docs for new SAP modules
* (sap) add support of multiple modules config
* (sap) code refactoring
* (sap) rename file
* test(): add safeguard tests
* (sap) update tests
* (sap) update docs, solve merge conflict in transformation.py
* (sap) linter fix
* (sap) Align embedding request transformation with current API
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) mock commit
* (sap) run black formatter
* (sap) add literals to models, add negative tests, fix test for tool transformation
* (sap) fix formatting
* (sap) fix models
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) commit for rerun bot review
* (sap) minor improve
* (sap) fix after bot review
* (sap) lint fix
* docs(sap): update documentation
* fix(sap): change creds priority
* fix(sap): change creds priority
* fix(sap): fix sap creds unit test
* fix(sap): linter fix
* fix(sap): linter fix
* linter fix
* (sap) update logic of fetching creds, add additional tests
* (sap) clean up code
* (sap) fix after review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) add a possibility to put the service key by both variants
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) update test
* (sap) update service key resolve function
* (sap) run black formatter
* (sap) fix validate credentials, add negative tests for credential fetching
* (sap) fix validate credentials, add negative tests for credential fetching
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) lint fix
* (sap) lint fix
* feat: support service_tier in gemini
* chore: add a service_tier field mapping from openai to gemini
* fix: use x-gemini-service-tier header in response
* docs: add service_tier to gemini docs
* chore: add default/standard mapping, and some tests
* chore: tidying up some case insensitivity
* chore: remove unnecessary guard
* fix: remove redundant test file
* fix: handle 'auto' case-insensitively
* fix: return service_tier on final streamed chunk
* chore: black
* feat: enable supports_service_tier to gemini models
* Fix get_standard_logging_metadata tests
* Fix test_get_model_info_bedrock_models
* Fix test_get_model_info_bedrock_models
* Fix remaining tests
* Fix mypy issues
* Fix tests
* Fix merge conflicts
* Fix code qa
* Fix code qa
* Fix code qa
* Fix greptile review
---------
Co-authored-by: michelligabriele <gabriele.michelli@icloud.com>
Co-authored-by: Josh <36064836+J-Byron@users.noreply.github.com>
Co-authored-by: mubashir1osmani <mubashir.osmani777@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: milan-berri <milan@berri.ai>
Co-authored-by: Alperen Kömürcü <alperen.koemuercue@sap.com>
Co-authored-by: Vasilisa Parshikova <vasilisa.parshikova@sap.com>
Co-authored-by: Lin Xu <lin.xu03@sap.com>
Co-authored-by: Mark McDonald <macd@google.com>
Co-authored-by: Sameer Kankute <sameer@berri.ai>
Allow JWT tokens matching routing_overrides to use OAuth2 introspection without enabling global OAuth2 while keeping OAuth2 routing limited to LLM/info routes. Add regression coverage for management-route boundary and tighten opaque-token assertions; update docs to reflect selective-mode route scope.
Made-with: Cursor
Pin all cosign public key references to the immutable commit hash
(0112e53) that first introduced the key, instead of fetching it from
the release tag. This addresses the concern that an attacker with push
access could replace the key on main/tags and re-sign tampered images.
Docs now show two verification methods: commit hash (recommended) and
release tag (convenience), with explanation of why the hash is stronger.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: batch-limit stale managed object cleanup to prevent 300K row UPDATE (#25257)
* Add STALE_OBJECT_CLEANUP_BATCH_SIZE constant
Configurable batch limit (default 1000) for stale managed object cleanup,
preventing unbounded UPDATE queries from hitting 300K+ rows at once.
* Batch-limit stale managed object cleanup with single bounded SQL query
Two fixes to _cleanup_stale_managed_objects:
1. Replace unbounded update_many with a single execute_raw using a
subquery LIMIT, capping each poll cycle to STALE_OBJECT_CLEANUP_BATCH_SIZE
rows. Zero rows loaded into Python memory — everything stays in Postgres.
Uses the same PostgreSQL raw-SQL pattern as spend_log_cleanup.py
(the proxy requires PostgreSQL per schema.prisma).
2. Extract _expire_stale_rows as a separate method for testability.
Keeps the file_purpose='response' filter to avoid incorrectly expiring
long-running batch or fine-tune jobs that legitimately exceed the
staleness cutoff.
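The bounded-update pattern described above (an UPDATE whose target rows come from a subquery with LIMIT) can be illustrated with a self-contained sketch. It uses SQLite so it runs without a server, whereas the commit uses PostgreSQL raw SQL. The table name, column names, and status values are hypothetical; only the subquery-LIMIT idea and the `file_purpose='response'` filter come from the commit text.

```python
import sqlite3


def expire_stale_rows(conn: sqlite3.Connection, cutoff: int, batch_size: int = 1000) -> int:
    """Expire at most `batch_size` stale 'response' rows; return the number updated."""
    cur = conn.execute(
        """
        UPDATE managed_objects
           SET status = 'stale_expired'
         WHERE id IN (
               SELECT id
                 FROM managed_objects
                WHERE file_purpose = 'response'
                  AND status = 'in_progress'
                  AND created_at < ?
                ORDER BY created_at
                LIMIT ?
         )
        """,
        (cutoff, batch_size),
    )
    conn.commit()
    return cur.rowcount


def make_demo_db() -> sqlite3.Connection:
    """Build an in-memory table with five stale 'response' rows and one 'batch' row."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE managed_objects ("
        "id INTEGER PRIMARY KEY, file_purpose TEXT, status TEXT, created_at INTEGER)"
    )
    rows = [("response", "in_progress", t) for t in range(5)]
    rows.append(("batch", "in_progress", 0))  # must never be expired by the cleanup
    conn.executemany(
        "INSERT INTO managed_objects (file_purpose, status, created_at) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()
    return conn
```

Calling `expire_stale_rows` once per poll cycle touches at most `batch_size` rows each time, so a large backlog drains over several cycles instead of in one large UPDATE.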
* docs: add STALE_OBJECT_CLEANUP_BATCH_SIZE to env vars reference
* test: remove deprecated embed-english-v2.0 cohere embedding tests
* docs(blog): add cosign Docker image verification instructions
Add steps for verifying Docker images with cosign to three security blog posts:
CI/CD v2, Security Townhall, and Security Update.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs(proxy): add cosign verification to Docker/Helm/Terraform deploy page
Add image signature verification steps to the main deployment doc so
users pulling Docker images know how to verify them with cosign.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: fixes
* Update index.md
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* [Docs] Scope cosign signing docs to GHCR and specify starting version
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* [Docs] Add starting version callout to ci_cd_v2 blog post
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Krrish Dholakia <krrish+github@berri.ai>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Litellm ishaan march23 - MCP Toolsets + GCP Caching fix (#25146)
* feat(mcp): MCP Toolsets — curated tool subsets from one or more MCP servers (#24335)
* feat(mcp): add LiteLLM_MCPToolsetTable and mcp_toolsets to ObjectPermissionTable
* feat(mcp): add prisma migration for MCPToolset table
* feat(mcp): add MCPToolset Python types
* feat(mcp): add toolset_db.py with CRUD helpers for MCPToolset
* feat(mcp): add toolset CRUD endpoints to mcp_management_endpoints
* fix(mcp): skip allow_all_keys servers when explicit mcp_servers permission is set (toolset scope fix)
* feat(mcp): add _apply_toolset_scope and toolset route handling in server.py
* fix(mcp): resolve toolset names in responses API before fetching tools
* feat(mcp): add mcp_toolsets field to LiteLLM_ObjectPermissionTable type
* feat(mcp): register LiteLLM_MCPToolsetTable in prisma client initialization
* feat(mcp): validate mcp_toolsets in key-vs-team permission check
* feat(mcp): register toolset routes in proxy_server.py
* feat(mcp): add MCPToolset and MCPToolsetTool TypeScript types
* feat(mcp): add fetchMCPToolsets, createMCPToolset, updateMCPToolset, deleteMCPToolset API functions
* feat(mcp): add useMCPToolsets React Query hook
* feat(mcp): add toolsets (purple) as third option type in MCPServerSelector
* feat(mcp): extract toolsets from combined MCP field in key form
* feat(mcp): extract toolsets from combined MCP field in team form
* feat(mcp): show toolsets section in MCPServerPermissions read view
* feat(mcp): pass mcp_toolsets through object_permissions_view
* feat(mcp): add MCPToolsetsTab component for creating and managing toolsets
* feat(mcp): add Toolsets tab to mcp_servers.tsx
* feat(mcp): pass mcpToolsets to playground chat and responses API calls
* feat(mcp): generate correct server_url for toolsets in playground API calls
* docs(mcp): add MCP Toolsets documentation
* docs(mcp): add mcp_toolsets to sidebar
* fix(mcp): replace x-mcp-toolset-id header with ContextVar to prevent client forgery
* fix(mcp): use ContextVar + StreamingResponse for toolset MCP routes (fixes SSE streaming)
* fix(mcp): cache toolset permission lookups to avoid per-request DB calls
* test(mcp): add tests for toolset scope enforcement, ContextVar isolation, and access control
* fix(mcp): cache toolset name lookups in MCPServerManager to avoid per-request DB calls
* fix(mcp): prevent body_iter deadlock + use cached toolset lookup in responses API
- _stream_mcp_asgi_response: add done callback to handler_task that puts
the EOF sentinel on body_queue when the task exits, preventing body_iter
from hanging forever if the handler raises after headers are sent.
- litellm_proxy_mcp_handler: replace raw get_mcp_toolset_by_name() DB call
with global_mcp_server_manager.get_toolset_by_name_cached() so toolset
resolution uses the 60s TTL cache added for this purpose instead of
hitting the DB on every responses-API request.
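The done-callback idea in the first bullet above can be sketched in isolation. This is a simplified illustration under assumptions: `collect_body` and `_EOF` are hypothetical names, the queue is unbounded, and the callback always enqueues the sentinel (a later commit in this log narrows it to error/cancel cases).

```python
import asyncio
from typing import Awaitable, Callable, List, Optional, Tuple

_EOF = object()  # sentinel that marks the end of the body stream


async def collect_body(
    handler: Callable[[asyncio.Queue], Awaitable[None]],
) -> Tuple[List[bytes], Optional[BaseException]]:
    """Drain chunks produced by `handler`, stopping even if the handler raises."""
    queue: asyncio.Queue = asyncio.Queue()
    task = asyncio.create_task(handler(queue))
    # When the task finishes for any reason, enqueue EOF so the loop below
    # cannot block forever on queue.get().
    task.add_done_callback(lambda _t: queue.put_nowait(_EOF))

    chunks: List[bytes] = []
    while True:
        item = await queue.get()
        if item is _EOF:
            break
        chunks.append(item)

    # The task is done once EOF arrives; retrieve its exception, if any.
    error = None if task.cancelled() else task.exception()
    return chunks, error
```

Without the callback, a handler that raises after producing some chunks would leave the consumer waiting on an empty queue indefinitely.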
* fix(mcp): toolset access control, asyncio fix, and real unit tests
- server.py: _apply_toolset_scope now enforces that non-admin keys must
have the requested toolset_id in their mcp_toolsets grant list;
admin keys always bypass the check.
- mcp_management_endpoints.py: three access-control fixes:
* fetch_mcp_toolsets: non-admin keys with mcp_toolsets=None now
return [] instead of all toolsets (only admins get 'all' when
the field is absent)
* fetch_mcp_toolset: non-admin keys that haven't been granted the
requested toolset_id now get 403 instead of the full result
* add_mcp_toolset: duplicate toolset_name now returns 409 Conflict
instead of an opaque 500
- proxy_server.py: use asyncio.get_running_loop() instead of
get_event_loop() inside an already-running coroutine (Python 3.10+).
- test_mcp_toolset_scope.py: replace four hollow tests that only
asserted local variable properties with real tests that call the
production fetch_mcp_toolsets() and handle_streamable_http_mcp()
functions with mocked dependencies.
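For the `asyncio.get_running_loop()` change listed above, a minimal sketch of the pattern inside a coroutine follows. The `schedule_result` function is a hypothetical example, not code from `proxy_server.py`.

```python
import asyncio


async def schedule_result() -> str:
    """Obtain the loop from inside a running coroutine and use it to resolve a future."""
    # get_running_loop() returns the loop that is executing this coroutine and
    # raises RuntimeError if no loop is running, so it never creates one implicitly.
    loop = asyncio.get_running_loop()
    future = loop.create_future()
    loop.call_soon(future.set_result, "done")
    return await future
```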
* fix(mcp): add mcp_toolsets to ObjectPermissionBase, fix multi-toolset overwrite, fix delete 404, allow standalone key toolsets
* fix(mcp): add auth check on toolset resolution in responses API; union mcp_servers in _merge_toolset_permissions
* fix(mcp): handle RecordNotFoundError in update_mcp_toolset; union direct servers with toolset servers
* fix(mcp): use _user_has_admin_view; deny None mcp_toolsets for non-admin; use direct RecordNotFoundError import; fix docstring
* fix(mcp): add @default(now()) to MCPToolsetTable.updated_at; fix test for non-admin toolset access
* fix: use UniqueViolationError import; guard _ensure_eof for error/cancel only
* fix(mcp): preserve mcp_access_groups in toolset scope, use shared Redis cache for toolset perms
- Remove mcp_access_groups=[] from _apply_toolset_scope (server.py) and the
responses API toolset path (litellm_proxy_mcp_handler.py). A key's access-group
grants remain valid even when the request is scoped to a single toolset; clearing
them silently revoked legitimate entitlements.
- Switch resolve_toolset_tool_permissions and get_toolset_by_name_cached to use
user_api_key_cache (Redis-backed DualCache in production) instead of per-instance
in-memory dicts. Cache entries are now shared across workers, eliminating the
per-worker stale-toolset-permission window flagged as a P1 by Greptile.
- Use union merge (set union of tool names per server) when applying toolset
permissions in the responses API path so direct-server tool restrictions are not
overwritten by toolset permissions.
* fix(mcp): return 404 when edit_mcp_toolset target does not exist
* fix(mcp): align mcp_toolsets default to None in LiteLLM_ObjectPermissionTable
* fix(mcp): admin toolset visibility, in-place tool name mutation, test helper coercion
* fix(mcp): treat None/[] team mcp_toolsets as no restriction in key validation
* fix(mcp): allow_all_keys backward compat, blocked_tools API write-path, efficient startup query
* fix(mcp): use _mcp_active_toolset_id ContextVar to detect toolset scope, avoiding DB-default false-positive
* fix(mcp): remove dead toolset cache stubs, log invalidation failures, align schema updated_at defaults
* fix(mcp): deserialise MCPToolset from Redis cache hit, replace fastapi import in test
* fix(mcp): evict name-cache on toolset mutation, 409 on rename conflict, warning-level list errors
* fix(redis): regenerate GCP IAM token per connection for async cluster (#24426)
* fix(redis): regenerate GCP IAM token per connection for async cluster clients
Async RedisCluster was generating the IAM token once at startup and
storing it as a static password. After the 1-hour GCP token TTL, any
new connection (including to newly-discovered cluster nodes) would fail
to authenticate.
Fix: introduce GCPIAMCredentialProvider that implements redis-py's
CredentialProvider protocol. It calls _generate_gcp_iam_access_token()
on every new connection, matching what the sync redis_connect_func
already does. async_redis.RedisCluster accepts a credential_provider
kwarg which is invoked per-connection.
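The per-connection token idea can be sketched without a Redis dependency. The class below is a hypothetical duck-typed stand-in, not the `GCPIAMCredentialProvider` from this commit: it mirrors the `get_credentials()` / `get_credentials_async()` method shape of redis-py's `CredentialProvider`, and both the `token_factory` argument and the `"default"` username are assumptions for illustration.

```python
from typing import Callable, Tuple


class PerConnectionTokenProvider:
    """Return a freshly generated token every time credentials are requested."""

    def __init__(self, token_factory: Callable[[], str], username: str = "default"):
        # token_factory is a hypothetical callable that mints a new access token.
        self._token_factory = token_factory
        self._username = username

    def get_credentials(self) -> Tuple[str, str]:
        # Called once per new connection, so an expired token is never reused.
        return (self._username, self._token_factory())

    async def get_credentials_async(self) -> Tuple[str, str]:
        return self.get_credentials()
```

The contrast with the bug described above is that a static password is computed once at startup, while this object is asked again for every connection.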
* refactor(redis): move GCPIAMCredentialProvider to its own file
Extract GCPIAMCredentialProvider and _generate_gcp_iam_access_token
into litellm/_redis_credential_provider.py. _redis.py imports them
from there, keeping the public API unchanged.
* fix: address Greptile review issues
- GCPIAMCredentialProvider now inherits from redis.credentials.CredentialProvider
so redis-py's async path calls get_credentials_async() properly
- move _redis_credential_provider import to top of _redis.py (PEP 8)
- remove dead else-branch that silently no-oped (gcp_service_account from
redis_kwargs.get() was always None since it's popped by _get_redis_client_logic)
- remove mid-function 'from litellm import get_secret_str' inline import
- remove unused 'call' import from test_redis.py
* chore: retrigger CI/review
* chore: sync schema.prisma copies from root
* chore: sync schema.prisma copies from root
* fix(proxy_server): use bounded asyncio.Queue with maxsize to prevent unbounded growth
* fix(a2a/pydantic_ai): make api_base Optional to match base class signature
* fix(a2a/pydantic_ai): make api_base Optional in handler and guard against None
* fix(mcp): remove unused get_all_mcp_servers import
* fix(mcp): remove unused MCPToolset import
* refactor(mcp): extract toolset permission logic to reduce statement count below PLR0915 limit
* fix(tests): update reload_servers_from_database tests to mock prisma directly
---------
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* fix(toolset_db): lazy-import prisma to avoid ImportError when prisma not installed
* fix(tests): update UI tests for toolset tab and updated empty state text
* fix(tests): add get_mcp_server_by_name to fake_manager stub
---------
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* feat(router): integrate allowed_fails_policy into health check failures (#24988)
* feat(router): integrate allowed_fails_policy into health check failures
Health check failures now increment the same per-deployment failure
counters used by allowed_fails_policy, so users can control how many
health check failures of each error type are required before a
deployment enters cooldown.
- ahealth_check() preserves the original exception in its return dict
- run_with_timeout() returns a litellm.Timeout on health check timeout
- _perform_health_check() propagates exceptions to unhealthy endpoints
- _write_health_state_to_router_cache() calls _set_cooldown_deployments
for each unhealthy endpoint that has an exception
- When allowed_fails_policy is set, the binary health check filter is
bypassed so cooldown is the sole routing exclusion mechanism
- Safety net: if all deployments are in cooldown with
enable_health_check_routing=True, the cooldown filter is bypassed
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* feat(router): add health_check_ignore_transient_errors flag
When enabled, health check failures with 429 (rate limit) or 408 (timeout)
status codes are skipped from the cooldown pipeline. These are transient
load issues, not broken deployments. Auth errors (401), 404, and 5xx errors
still increment counters and trigger cooldown as before.
Config (general_settings):
health_check_ignore_transient_errors: true
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
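The rule described for `health_check_ignore_transient_errors` can be summarized in a small sketch. The function and constant names are hypothetical; only the 429/408 status codes and the flag's behavior come from the commit text.

```python
TRANSIENT_STATUS_CODES = frozenset({429, 408})  # rate limit, timeout


def counts_toward_cooldown(status_code: int, ignore_transient_errors: bool) -> bool:
    """Return True if a failed health check should increment cooldown counters."""
    if ignore_transient_errors and status_code in TRANSIENT_STATUS_CODES:
        return False  # transient load issue, not a broken deployment
    return True  # 401, 404, 5xx and others still count
```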
* fix(router): also exclude 429/408 from health state cache when ignore_transient_errors set
The previous fix only skipped cooldown counter increments. The health state
cache was still marking 429/408 endpoints as is_healthy=False, causing the
binary health check filter to exclude them from routing.
Now, when health_check_ignore_transient_errors=True, 429/408 endpoints are
also excluded from the unhealthy list passed to build_deployment_health_states(),
so the binary filter treats them as unaffected (not unhealthy).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* docs(router): add health check driven routing guide
New standalone page covering the full health check routing feature:
allowed_fails_policy integration, health_check_ignore_transient_errors,
architecture SVG, step-by-step setup, and gotchas (TTL, AllowedFails semantics).
Replaces the inline section in health.md with a link to the new page.
Added to the Routing & Load Balancing sidebar.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(health-check-routing): fix three CI failures
- Add "exception" to ILLEGAL_DISPLAY_PARAMS in health_check.py so the
exception object is stripped before the health endpoint serializes
results to JSON (fixes TypeError: 'URL' object is not iterable)
- Add allowed_fails_policy = None to FakeRouter stubs in
test_router_health_check_routing.py (fixes AttributeError)
- Add health_check_ignore_transient_errors to config_settings.md router
settings reference table (fixes documentation test)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Fix litellm/tests/proxy_unit_tests/test_proxy_server.py
* fix(router): address greptile review comments
- Narrow cooldown safety-net bypass: only fires when allowed_fails_policy
is set (cooldown is health-check driven). Without a policy, cooldowns
are from real request failures and must not be bypassed.
- Restore cooldown deployments DEBUG log that was accidentally removed.
- Fix test_health TypeError: move exception extraction to a separate
exceptions_by_model_id dict returned alongside endpoints, so exception
objects never appear in the endpoint dicts that get JSON-serialized
by the /health response.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(health-check-routing): properly isolate exceptions from health response
Return exceptions_by_model_id as a separate third value from
_perform_health_check / perform_health_check so exception objects
(which contain non-JSON-serializable httpx URL types) never appear
in the endpoint dicts that get serialized by the /health response.
Callers updated: _health_endpoints.py, shared_health_check_manager.py,
proxy_server.py background loop. All use the exceptions dict only for
cooldown integration, not for display.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(shared-health-check): fix remaining 2-value return sites and update type annotation
* fix(health-check-routing): fix P0 cooldown integration never firing
The cooldown loop was reading endpoint.get("exception") which is always
None because exceptions are now returned via exceptions_by_model_id, not
stored in endpoint dicts. Fixed to use _exceptions.get(model_id).
Also fixes the transient-error filter to use _exceptions instead of
endpoint.get("exception"), and fixes all remaining 2-value return sites
in shared_health_check_manager.py. Tests updated to pass exceptions via
exceptions_by_model_id parameter instead of endpoint dicts.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(health-check-routing): fix P1 transient-error filter broken on cache hits
When SharedHealthCheckManager returns cached results, exceptions_by_model_id
is always {} so the transient-error filter defaulted to status 500 for all
endpoints, incorrectly marking 429/408 endpoints as unhealthy.
Fix: store integer exception_status on each unhealthy endpoint dict in
_perform_health_check. _get_endpoint_exception_status() uses the live
exception object when available (direct path) and falls back to the stored
integer (cache-hit path). The integer is JSON-serializable and survives
the shared cache round-trip.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
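The fallback order described above (live exception first, stored integer second, 500 last) might look roughly like this. It is an illustrative sketch only: the function name differs from the real `_get_endpoint_exception_status()`, and reading a `status_code` attribute from the exception is an assumption.

```python
from typing import Any, Dict, Optional


def endpoint_exception_status(
    endpoint: Dict[str, Any],
    exception: Optional[BaseException] = None,
    default: int = 500,
) -> int:
    """Resolve the HTTP status for an unhealthy endpoint."""
    if exception is not None:
        # Direct path: the live exception object is available.
        return getattr(exception, "status_code", default)
    # Cache-hit path: only the JSON-serializable integer survived the round-trip.
    return endpoint.get("exception_status", default)
```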
* fix(health-check-routing): gate cooldown loop behind allowed_fails_policy
Without the policy, cooldown is not the routing exclusion mechanism.
Firing _set_cooldown_deployments for all enable_health_check_routing users
was a backwards-incompatible change — 401s would immediately cooldown
deployments that the binary filter would have recovered on the next cycle.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* revert: undo allowed_fails_policy gate on cooldown loop
Cooldown integration via health checks is intentional for all
enable_health_check_routing users, not just those with allowed_fails_policy.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(docs+tests): fix health_check_ignore_transient_errors doc section and test coverage
- Move health_check_ignore_transient_errors from router_settings to
general_settings in config_settings.md (code reads it from general_settings)
- Remove duplicate enable_health_check_routing / health_check_staleness_threshold
entries that were incorrectly listed under router_settings
- Replace TestHealthCheckEndpointExceptionPropagation tests with ones that
exercise the real _perform_health_check code path via mocked ahealth_check,
verifying exceptions appear in exceptions_by_model_id and NOT in endpoint dicts
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(tests+docs): fix tuple unpacking and docs test failures
- Update test mocks that return (healthy, unhealthy) to return
(healthy, unhealthy, {}) to match the new 3-value signature
- Update test unpackings of perform_shared_health_check to use
healthy, unhealthy, _ = ...
- Add health_check_ignore_transient_errors to router_settings section
in config_settings.md (it is a Router constructor param, so the doc
test requires it there; it also lives in general_settings for proxy use)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Fix CodeQL errors
* fix(tests): fix 2-value unpackings of _perform_health_check in test_health_check.py
* fix(tests): fix mock _perform_health_check returning 2-tuple instead of 3
* fix team routing
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: add distributed lock for key rotation job (#23364)
* fix: add distributed lock for key rotation job
* fix: address Greptile review feedback on key rotation lock (#23834)
* fix: address Greptile review feedback on key rotation lock
* fix: address requested changes from Greptile
* feat(proxy): Optional on_error for guardrail pipeline (API / technical failures) (#24831)
* guardrails fallback
* docs
* docs: add LITELLM_KEY_ROTATION_LOCK_TTL_SECONDS to environment variables reference
* fix(mypy): accept Union[Dict, Any] in _get_deployment_order and use typed list to fix min() type error
* fix(mypy): use Optional[str] for api_base in PydanticAI provider to match superclass signature
---------
Co-authored-by: Sameer Kankute <sameer@berri.ai>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Harshit Jain <48647625+Harshit28j@users.noreply.github.com>
Co-authored-by: Shivam Rawat <shivam@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
* fix(pricing): add unversioned vertex_ai/claude-haiku-4-5 entry
Missing unversioned entry causes cost tracking to return $0.00 for
all requests using vertex_ai/claude-haiku-4-5. All other Vertex AI
Claude models have both versioned and unversioned entries.
* fix(router): skip misleading tags error when no candidates (e.g. cooldown)
Return early from get_deployments_for_tag when healthy_deployments is empty so
tag-based routing does not raise no_deployments_with_tag_routing after cooldown
filters all deployments. Adds regression test.
Made-with: Cursor
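
A rough sketch of the early return, assuming deployments are plain dicts with an optional "tags" list. The function name and matching rule are hypothetical and simplified; only the error name comes from the commit message.

```python
from typing import Any, Dict, List, Optional


def filter_deployments_by_tag(
    healthy_deployments: List[Dict[str, Any]],
    request_tags: Optional[List[str]],
) -> List[Dict[str, Any]]:
    # Nothing survived earlier filters (e.g. cooldown): return early so the
    # caller reports "no healthy deployments" instead of a tag mismatch.
    if not healthy_deployments:
        return healthy_deployments
    if not request_tags:
        return healthy_deployments
    wanted = set(request_tags)
    matched = [d for d in healthy_deployments if wanted & set(d.get("tags", []))]
    if not matched:
        raise ValueError("no_deployments_with_tag_routing")
    return matched
```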
* feat(oci): add embedding support and update model catalog
- Add OCIEmbeddingConfig for OCI GenAI embedding models
- Add 16 new chat models (Cohere, Meta Llama, xAI Grok, Google Gemini)
- Add 8 embedding models (Cohere embed v3.0, v4.0)
- Update documentation with embedding examples
- Update pricing for all new models
* test(oci): add unit tests for OCI embedding support
- 17 unit tests covering OCIEmbeddingConfig
- Tests for URL generation, param mapping, request/response transform
- Tests for model pricing JSON completeness
* style(oci): format with black and ruff
* fix(oci): correct embedding request body format
OCI embedText API expects inputs, truncate, and inputType at the
top level of the request body, not nested under embedTextDetails.
Fixed transformation and updated tests accordingly.
Verified with real OCI API: 3/3 embedding models working.
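
A minimal sketch of the corrected body shape. Only the three field names come from the message above; the helper name and the sample values are illustrative assumptions, and any other fields the API requires are omitted.

```python
from typing import Any, Dict, List


def build_embed_text_body(
    inputs: List[str], input_type: str, truncate: str
) -> Dict[str, Any]:
    # All three fields sit at the top level of the request body;
    # there is no embedTextDetails wrapper.
    return {"inputs": list(inputs), "truncate": truncate, "inputType": input_type}


# Sample values are assumptions for illustration only.
body = build_embed_text_body(["hello"], input_type="SEARCH_DOCUMENT", truncate="END")
```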
* docs: clarify tag routing early return and test intent
Made-with: Cursor
* fix(oci): address code review findings from Greptile
- P1: Fix signing URL mismatch with custom api_base by accepting
api_base parameter in transform_embedding_request
- P2: Remove encoding_format from supported params (OCI does not
support it, was silently dropped)
- P2: Raise ValueError for token-array inputs instead of silently
converting to string representation
- Add test for token-list rejection
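
The token-array rejection could look roughly like the sketch below. The function name and error text are hypothetical; the point is that non-string inputs raise instead of being stringified.

```python
from typing import Any, List


def normalize_embedding_input(value: Any) -> List[str]:
    # A single string becomes a one-element batch.
    if isinstance(value, str):
        return [value]
    # A list is accepted only if every item is a string.
    if isinstance(value, list) and all(isinstance(item, str) for item in value):
        return value
    # Token arrays (lists of ints, or lists of lists) are rejected loudly
    # rather than silently converted to their string representation.
    raise ValueError("token-array inputs are not supported; pass str or list[str]")
```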
* fix(mcp): add STS AssumeRole support for MCP SigV4 authentication
MCPSigV4Auth only supported static AWS credentials or the boto3 default
credential chain. Production Kubernetes environments typically authenticate
via IAM role assumption (sts:AssumeRole), which was not possible.
Add aws_role_name and aws_session_name parameters to the MCP SigV4 auth
stack. When aws_role_name is provided, MCPSigV4Auth calls sts:AssumeRole
to obtain temporary credentials before signing requests. Explicit keys,
if also provided, are used as the source identity for the STS call;
otherwise ambient credentials (pod role, instance profile) are used.
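
A sketch of the credential flow described above, not the MCPSigV4Auth implementation. Both helper names are hypothetical, and the STS client is passed in so the logic can be exercised without AWS access. The call shape (RoleArn, RoleSessionName, and the Credentials keys) follows the STS AssumeRole API.

```python
from typing import Any, Dict, Optional


def sts_client_kwargs(
    access_key: Optional[str], secret_key: Optional[str]
) -> Dict[str, str]:
    # Explicit keys become the source identity for the STS call; with no
    # kwargs the SDK falls back to ambient credentials (pod role, profile).
    if access_key and secret_key:
        return {
            "aws_access_key_id": access_key,
            "aws_secret_access_key": secret_key,
        }
    return {}


def assume_role_credentials(
    sts_client: Any, role_arn: str, session_name: str
) -> Dict[str, str]:
    # Exchange the source identity for temporary credentials.
    response = sts_client.assume_role(
        RoleArn=role_arn, RoleSessionName=session_name
    )
    creds = response["Credentials"]
    return {
        "access_key": creds["AccessKeyId"],
        "secret_key": creds["SecretAccessKey"],
        "token": creds["SessionToken"],
    }
```

With boto3 installed, the client would be built as boto3.client("sts", **sts_client_kwargs(key, secret)).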
* fix: stop logging credential values and add missing redaction patterns
Replaces raw credential values in debug/error log messages with
boolean presence checks or type names. Adds PEM block, GCP token,
JWT, SAS token, and service-account blob patterns to the redaction
filter. Fixes private_key pattern to capture full PEM blocks instead
of stopping at the first whitespace.
Addresses: Vertex AI credential JSON (including RSA private key)
being logged to stderr on health check failures.
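
To illustrate the two ideas, a small sketch under stated assumptions: the regex and helper names below are hypothetical and are not the patterns shipped in the redaction filter.

```python
import re

# Match a whole PEM block, newlines included, instead of stopping at the
# first whitespace character.
PEM_BLOCK = re.compile(
    r"-----BEGIN [A-Z ]+-----.*?-----END [A-Z ]+-----",
    re.DOTALL,
)


def redact(text: str) -> str:
    return PEM_BLOCK.sub("[REDACTED]", text)


def describe_credential(value: object) -> str:
    # Log presence and type only, never the value itself.
    return f"present={value is not None} type={type(value).__name__}"
```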
* fix: log only field names for UserAPIKeyAuth, not full object
* style: apply black formatting to experimental_mcp_client/client.py
* style: fix black/isort formatting and mypy error in proxy_server.py
- Fix black formatting in experimental_mcp_client/client.py (done in prev commit)
- Fix black/isort formatting in key_management_endpoints.py, proxy_server.py, transformation.py
- Fix mypy: iterate over optional list safely (access_group_ids or []) in proxy_server.py
* fix(test): patch check_migration.verbose_logger directly to fix xdist ordering issue
When test_proxy_cli.py tests run before test_check_migration.py in the same
xdist worker, litellm.proxy.db.check_migration is already in sys.modules.
Patching litellm._logging.verbose_logger has no effect on the already-bound
reference. Patch the correct target (check_migration.verbose_logger) and
import the module before patching so the order doesn't matter.
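
The binding issue can be reproduced with two throwaway modules. This is a sketch only: fake_logging and fake_check_migration are hypothetical stand-ins for the real modules.

```python
import types
from unittest.mock import MagicMock, patch

# Hypothetical stand-in for the module that defines the logger.
logging_mod = types.ModuleType("fake_logging")
logging_mod.verbose_logger = MagicMock(name="real_logger")

# Hypothetical stand-in for the module that imported the logger by name.
consumer = types.ModuleType("fake_check_migration")
# Equivalent of `from fake_logging import verbose_logger` at import time.
consumer.verbose_logger = logging_mod.verbose_logger
exec("def run():\n    verbose_logger.info('checking')", consumer.__dict__)

# Patching the defining module leaves the consumer's bound name untouched.
with patch.object(logging_mod, "verbose_logger") as wrong_target:
    consumer.run()
    wrong_target_called = wrong_target.info.called

# Patching the name where it is looked up intercepts the call.
with patch.object(consumer, "verbose_logger") as right_target:
    consumer.run()
    right_target_called = right_target.info.called
```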
* fix(mypy): make api_base Optional in PydanticAIProviderConfig to match base class signature
---------
Co-authored-by: Ihsan Soydemir <soydemir.ihsan@gmail.com>
Co-authored-by: Milan <milan@berri.ai>
Co-authored-by: Daniel Gandolfi <danielgandolfi@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: michelligabriele <gabriele.michelli@icloud.com>
Co-authored-by: user <70670632+stuxf@users.noreply.github.com>
Co-authored-by: Ishaan Jaffer <ishaanjaffer0324@gmail.com>
* Litellm ishaan april1 (#25103)
* fix(proxy): enforce upperbound key params on key/update and add custom_key_update hook
The /key/update endpoint did not enforce upperbound_key_generate_params,
allowing users to bypass configured limits (tpm_limit, rpm_limit,
max_budget, duration, budget_duration) by updating an existing key
instead of generating a new one.
Extract the upperbound enforcement logic from _common_key_generation_helper()
into a standalone _enforce_upperbound_key_params() function and call it from
both the generate and update paths. For updates, None values are skipped
(not filled with defaults) since they mean "don't change this field".
Also adds a custom_key_update config option and user_custom_key_update global,
mirroring the existing custom_key_generate pattern, so custom key validation
logic can fire during key updates as well.
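
A simplified sketch of the shared check, not the body of _enforce_upperbound_key_params. The function name is hypothetical, only numeric fields are handled (duration strings would need parsing first), and the generate-path default is assumed to be the limit itself.

```python
from typing import Any, Dict


def enforce_upperbound(
    requested: Dict[str, Any], upperbound: Dict[str, float], is_update: bool
) -> Dict[str, Any]:
    result = dict(requested)
    for field, limit in upperbound.items():
        value = result.get(field)
        if value is None:
            # Generate: fill in a default (assumed here to be the limit).
            # Update: None means "don't change this field", so skip it.
            if not is_update:
                result[field] = limit
            continue
        if value > limit:
            raise ValueError(f"{field}={value} exceeds upper bound {limit}")
    return result
```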
* fix(proxy): invoke custom_key_update hook in bulk update path
The user_custom_key_update hook was only called in update_key_fn
(single key update) but not in _process_single_key_update (bulk
update path), allowing custom validation to be bypassed via the
/key/update/bulk endpoint. Mirror the hook invocation in both paths.
* fix(proxy): pass UpdateKeyRequest to hook in bulk path, not BulkUpdateKeyRequestItem
Move the custom_key_update hook invocation to after UpdateKeyRequest
is constructed so the hook receives the same type in both single and
bulk update paths. Previously the bulk path passed
BulkUpdateKeyRequestItem (5 fields only), which would cause
AttributeError for hooks accessing fields like tpm_limit or models.
* fix(bedrock): promote cache usage to message_delta for Claude Code (#24850)
Ensure Bedrock/Anthropic-compatible streaming exposes cache usage where Claude Code reads it by promoting message_stop usage onto message_delta and preserving usage fields in fake-streamed message_delta events.
Made-with: Cursor
* fix(search): Support self-hosted Firecrawl response format in search transform (#24866)
The `transform_search_response` method only handled Firecrawl Cloud (v2)
response format where `data` is a dict with `web`/`news` keys. Self-hosted
Firecrawl (v1) returns `data` as a flat list of result objects, causing an
`AttributeError: 'list' object has no attribute 'get'`.
Detect the response format by checking if `data` is a list (self-hosted)
or dict (cloud) and handle both cases.
Cloud format: {"data": {"web": [...], "news": [...]}}
Self-hosted: {"success": true, "data": [{"url": "...", "title": "...", ...}]}
Co-authored-by: Synergy <synergyoclaw@gmail.com>
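
The format check can be sketched as follows. The helper name is hypothetical and the real transform_search_response does more than this; the two payload shapes are the ones quoted above.

```python
from typing import Any, Dict, List


def extract_results(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    data = response.get("data")
    # Self-hosted: data is already a flat list of result objects.
    if isinstance(data, list):
        return data
    # Cloud: data is a dict keyed by result type.
    if isinstance(data, dict):
        return list(data.get("web", [])) + list(data.get("news", []))
    return []
```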
* feat: add environment and user tracking to prompt management (#24855)
* feat: add environment and user tracking to prompt management
- Add environment (development/staging/production) and created_by columns to LiteLLM_PromptTable
- Update unique constraint to [prompt_id, version, environment]
- All CRUD endpoints support environment filtering and user tracking
- Redesigned prompt detail page with environment tabs and version history
- UI: environment filter on list page, environment selector in editor
- 8 new tests for environment and user tracking
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: Black formatting and add environments to PromptInfoResponse TypeScript type
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: address Greptile review findings
- P1: delete_prompt scopes in-memory cleanup to environment when provided
- P2: dotprompt_content parsed directly regardless of environment flag
- P2: use distinct for environments query
- P2: fix double-fetch on initial mount in prompt_info.tsx
- fix: remove unsupported select kwarg from find_many
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: address remaining Greptile review comments
- Remove unused useCallback import (index.tsx)
- Remove unused ENV_COLORS variable (prompt_info.tsx)
- P1: in-memory fallback in get_prompt_versions now respects environment filter
- P1: reset selectedEnv when promptId changes to avoid stale state
- Cyclic imports are pre-existing pattern, not introduced by this PR
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: scope patch_prompt to environment using primary key
- Add environment query param to patch_prompt endpoint
- Look up target row by composite key (prompt_id + version + environment)
- Update by primary key (id) to target exactly one row
- Fixes Greptile finding: patch with multiple environments no longer ambiguous
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: use actual start_time for failed request spend logs (#24906)
async_post_call_failure_hook set both start_time and end_time to
datetime.now(), making all failed requests show duration=0. Use the
actual start_time from litellm_logging_obj instead, so spend logs
reflect the real request duration on timeout and other failures.
Fixes #24888
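
A small sketch of the timing fix, assuming the logging object exposes the request's start time as a datetime. The helper name and the fallback to the end time are assumptions made for illustration.

```python
from datetime import datetime
from typing import Optional, Tuple


def failure_log_times(
    request_start: Optional[datetime], now: Optional[datetime] = None
) -> Tuple[datetime, datetime]:
    end_time = now if now is not None else datetime.now()
    # Use the real start time when it is known; only fall back to the end
    # time (duration 0) when nothing was recorded.
    start_time = request_start if request_start is not None else end_time
    return start_time, end_time
```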
* feat(bedrock): add nova canvas image edit support (#24869)
* feat(bedrock): add nova canvas image edit support
* fix(bedrock): support PathLike inputs for nova image edit
* chore: sync schema.prisma copies from root
* fix(mypy): correct type-ignore code for delta_usage arg-type
* fix(mypy): cast status_code to str, suppress intentional str yield
* fix(lint): extract _create_content_block_chunks to fix PLR0915
* fix(lint): extract helpers to fix PLR0915 in prompt endpoints
---------
Co-authored-by: michelligabriele <gabriele.michelli@icloud.com>
Co-authored-by: Sameer Kankute <sameer@berri.ai>
Co-authored-by: redhelix <amin.lalji@gmail.com>
Co-authored-by: Synergy <synergyoclaw@gmail.com>
Co-authored-by: Talha Anwar <37379131+talhaanwarch@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: madhu19991 <madhu@thunkai.com>
Co-authored-by: Srikanth @adobe <devarakondasrikanth@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* fix(test): update model armor streaming test to handle string or int error code
---------
Co-authored-by: michelligabriele <gabriele.michelli@icloud.com>
Co-authored-by: Sameer Kankute <sameer@berri.ai>
Co-authored-by: redhelix <amin.lalji@gmail.com>
Co-authored-by: Synergy <synergyoclaw@gmail.com>
Co-authored-by: Talha Anwar <37379131+talhaanwarch@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: madhu19991 <madhu@thunkai.com>
Co-authored-by: Srikanth @adobe <devarakondasrikanth@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* fix(proxy): enforce key-level model allowlist for custom auth
custom_auth_run_common_checks only runs common_checks (team/user/project model checks).
Custom auth now also enforces key-level model restrictions via can_key_call_model.
Move the custom-auth key-access regression tests to test_user_api_key_auth.py and keep test_custom_auth_end_user_budget.py focused on end-user budget behavior.
Made-with: Cursor
* fix(proxy): gate custom-auth key model checks behind opt-in
Keep key-level model allowlist enforcement in custom auth behind `custom_auth_run_common_checks` to preserve backwards compatibility, and update tests to verify default non-enforcement and opt-in enforcement behavior.
Made-with: Cursor
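
The opt-in gate amounts to something like the sketch below. The function is hypothetical and returns a bool where the real check may raise; treating an empty allowlist as unrestricted is an assumption.

```python
from typing import Any, Dict, List


def key_may_call_model(
    model: str, key_models: List[str], general_settings: Dict[str, Any]
) -> bool:
    # Default behavior is unchanged: without the opt-in flag the key-level
    # allowlist is not enforced for custom auth.
    if not general_settings.get("custom_auth_run_common_checks", False):
        return True
    # Assumption: an empty allowlist means the key is unrestricted.
    if not key_models:
        return True
    return model in key_models
```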
* test(proxy): isolate custom auth default check from shared settings state
Patch `proxy_server.general_settings` to an empty dict in the default custom-auth key-access test so it remains deterministic under shared module state.
Made-with: Cursor
* test(proxy): strengthen custom auth post-check assertions
Tighten custom auth regression tests by asserting exact can_key_call_model args and remove an unused common_checks mock from the default behavior path.
Made-with: Cursor
* fix(agentcore): parse A2A JSON-RPC responses in AgentCore provider
* fix(prompt-templates): ensure_alternating_roles handles tool-call chains
* feat(auth): add JWT claim routing overrides for OAuth2 validation
Made-with: Cursor
* docs(auth): document JWT-to-OAuth2 routing overrides
Add generic docs for running JWT and OAuth2 together, including routing_overrides YAML examples and list-based selector behavior for iss/client_id/aud.
Made-with: Cursor
---------
Co-authored-by: Milan <milan@berri.ai>
Co-authored-by: michelligabriele <gabriele.michelli@icloud.com>
- Add default_team_params to litellm_settings reference table in
config_settings.md with all sub-fields documented
- Update self_serve.md and msft_sso.md examples to include
team_member_permissions, tpm_limit, and rpm_limit
- Fix misleading comment that implied default_team_params only applies
to SSO auto-created teams; it applies to all /team/new calls
* update bedrock models in tests
* updated more tests and model_prices_and_context_window
* fix model id and pricing
* replace more sonnet models
* update tests
* git push
* update pricing
* flaky total cost
* monkey patch
* relax the cost change
* fix and revert some changes
* revert the pricing
* chore: move cost/pricing changes to bedrock-cost-fixes branch
* chore: split Bedrock file-api beta stripping to separate branch
Removes strip_unsupported_file_api_betas_for_bedrock_invoke from this branch;
see litellm_bedrock_invoke_strip_file_api_betas for that fix.
Made-with: Cursor
Remove @neondatabase/api-client and neonctl to address CVE-2026-25639
(axios supply chain vulnerability). Pin all JS dependencies to exact
versions across all package.json files to prevent future supply chain
attacks via semver range resolution.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
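
One way to check that a package.json stays pinned is sketched below. It is a hypothetical helper, not part of this change, and it treats anything that is not a plain x.y.z version (with optional pre-release or build suffix) as unpinned.

```python
import json
import re
from typing import Dict

# Exact versions only: 1.2.3, 1.2.3-beta.1, 1.2.3+build.5
EXACT_VERSION = re.compile(r"^\d+\.\d+\.\d+(?:[-+][0-9A-Za-z.-]+)*$")


def unpinned_dependencies(package_json_text: str) -> Dict[str, str]:
    pkg = json.loads(package_json_text)
    found: Dict[str, str] = {}
    for section in ("dependencies", "devDependencies"):
        for name, spec in pkg.get(section, {}).items():
            # Ranges such as ^1.0.0, ~2.1.0, or "latest" are reported.
            if not EXACT_VERSION.match(spec):
                found[name] = spec
    return found
```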