- Fix perform_redaction to handle dict representation of ModelResponse (from model_dump())
- Preserve full choices structure when redacting, redact content/audio in place
- Add _redact_standard_logging_object helper for standard_logging_object field
- Update test_logging_redaction_e2e_test assertions to expect choices format
- Add charity_engine to provider_endpoints_support.json
Fixes: test_standard_logging_payload, test_standard_logging_payload_audio
Made-with: Cursor
Any param in DEFAULT_CHAT_COMPLETION_PARAM_VALUES that arrives via
completion(**kwargs) is now automatically forwarded to
get_optional_params(), even if it's not a named parameter of
completion().
Previously, get_non_default_completion_params() excluded params in
OPENAI_CHAT_COMPLETION_PARAMS (assuming they'd be forwarded via the
named-param path), while optional_param_args only contained explicitly
named params. Params like 'store' that were in the known-params list
but not named params fell through both paths and were silently dropped.
The fix adds a 7-line loop after building optional_param_args that
forwards any kwargs present in DEFAULT_CHAT_COMPLETION_PARAM_VALUES.
This means new OpenAI params only need to be added to the constants
dict — no boilerplate changes to 3+ function signatures required.
Fixes#23087
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
* feat(charity_engine): add Charity Engine provider
Charity Engine is a crowdsourced distributed computing platform that
donates processing power to charitable causes. Its inference API
provides OpenAI-compatible chat, completions, and embeddings endpoints.
* test(charity_engine): add provider config and resolution tests
Verify JSONProviderRegistry config, provider list membership,
model routing for charity_engine/<model>, and Router compatibility.
* feat(charity_engine): add Charity Engine to LlmProviders enum
Enables provider_list membership and LlmProviders.CHARITY_ENGINE
resolution required by the provider and test suite.
* fix(charity_engine): remove api_base_env to fix non-deterministic test
The CHARITY_ENGINE_API_BASE env var could override the base_url in CI,
causing test_charity_engine_provider_resolution to fail intermittently.
* fix(charity_engine): remove trailing slash from base_url
Claude's API returns assistant messages with empty text blocks
({"type": "text", "text": ""}) alongside tool_use blocks during
multi-turn tool-use conversations. These blocks are rejected when
sent back to the API with "text content blocks must be non-empty".
Sanitization already exists for other code paths (/v1/chat/completions
for both Anthropic and Bedrock), but NOT for the /v1/messages native
path. This adds the same treatment by stripping empty text blocks
from messages in async_anthropic_messages_handler before they are
forwarded to the provider.
Fixes#22930
The bug occurred when user data inadvertently contained reserved Python
keywords like 'self', 'params', or '__class__' as keys. When such a dict
was unpacked via **kwargs to LiteLLM_Params() or GenericLiteLLMParams(),
Python raised TypeError because 'self' was passed both implicitly and
as a keyword argument.
The fix:
- Add a Pydantic model_validator(mode='before') to GenericLiteLLMParams
that filters out reserved keys ('self', 'params', '__class__') before
validation
- Move the max_retries str-to-int conversion into the same validator
- Remove the custom __init__ methods from both GenericLiteLLMParams and
LiteLLM_Params, since the validator now handles the preprocessing
- Clean up unused VERTEX_CREDENTIALS_TYPES import
This fix applies to all classes that inherit from GenericLiteLLMParams,
including LiteLLM_Params and updateLiteLLMParams.
Added comprehensive tests in tests/test_litellm/test_litellm_params_reserved_keys.py
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
The test calls OpenAI's gpt-4o-audio-preview model which sometimes
doesn't return usage data in the streaming response. Fixed by:
- Adding @pytest.mark.flaky(retries=5, delay=2) for retry handling
- Fixing usage_obj loop to check chunk.usage is not None
- Skipping gracefully when OpenAI doesn't return usage data
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The test_create_skill test was consistently failing in CI with a 500 from
Anthropic because the SKILL.md frontmatter always used the same hardcoded
name (test-skill-litellm). Since test_delete_skill is permanently skipped,
skills accumulate in the CI account, and re-creating with a duplicate name
triggers an Internal Server Error on Anthropic's side.
Fix: pass a timestamp-based unique_suffix to create_skill_zip so each run
produces a distinct skill name in the zip's SKILL.md frontmatter.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The Claude Agent SDK sends max_tokens=32000 for unrecognized model names
(like "bedrock-nova-pro"), which exceeds Nova Pro's 10,000 limit. Enable
modify_params in the test proxy config so LiteLLM clamps max_tokens to the
model's actual limit. Also swap nova-premier to nova-pro since premier
requires provisioned throughput unavailable in CI.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The test was flaky under pytest-xdist parallel execution because it used
async acompletion (which runs completion() in a thread pool via
run_in_executor) and relied on shared global state (known_tokenizer_config,
iam_token_cache, module_level_client) that could be modified by other tests
running in parallel. Failures were silently swallowed by a broad try/except,
causing mock_post.call_count to remain 0.
Fix:
- Convert from async acompletion to sync completion, matching every other
test in the file. The test's intent is verifying prompt transformation,
not async behavior.
- Use monkeypatch.setitem for known_tokenizer_config to ensure proper
teardown isolation.
- Remove unnecessary mock layers (async template fetchers, iam_token_cache
pre-population, mock completion response) that were only needed for the
async code path.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The SearXNG search tests were failing in CI because they depend on a live
SearXNG instance that returns results. Since this provider is used by a
very small subset of customers, replace the flaky integration tests with
deterministic unit tests that validate request payloads, URL construction,
response parsing, and header configuration without requiring external infra.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The gemini-3.1-flash-image-preview model introduced a new pricing field
that was missing from the test's validation schema and cost_fields list.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(agentcore): handle JSON responses from agents using sync return
BedrockAgentCoreApp agents that use synchronous `return` (instead of
async `yield`) respond with Content-Type: application/json instead of
text/event-stream. The streaming parser only handles SSE format, silently
discarding the JSON body and returning empty content to the client.
This adds Content-Type detection in both sync and async streaming
wrappers — when application/json is received, the response is parsed
and converted to a single-chunk stream. Also extends _parse_json_response
with a fallback chain supporting multiple agent response schemas (standard
AgentCore, Strands framework, plain string, raw JSON fallback).
* fix(agentcore): add dict-type guard to _parse_json_response
Prevent AttributeError when json.loads() returns a non-dict
(e.g. JSON array or primitive) by adding an isinstance check
at the top of _parse_json_response. Non-dict values fall back
to raw JSON string content.
* fix(agentcore): handle malformed JSON and split streaming chunks
- Wrap json.loads() in try/except in both sync and async streaming
wrappers so malformed JSON bodies raise a structured BedrockError
instead of a raw JSONDecodeError
- Split the JSON-fallback streaming path into two chunks (content
chunk with finish_reason=None, then stop sentinel with empty delta)
to match the SSE path convention
* fix(agentcore): catch IO errors in streaming JSON path + async error test
- Broaden except clause to catch both json.JSONDecodeError and IO-level
exceptions (httpx.ReadError, etc.) from response.read()/aread(), so
all failures surface as structured BedrockError
- Add async malformed-JSON test to mirror the sync test coverage
* feat(proxy): add Prisma DB pool and engine health metrics to Prometheus
Add a PrismaMetricsCollector that periodically queries pg_stat_activity
and the Prisma engine process to expose connection pool and engine health
as Prometheus gauges/counters. Auto-enabled when prometheus_system is in
service_callback.
New metrics:
- litellm_db_pool_active_connections (Gauge)
- litellm_db_pool_idle_connections (Gauge)
- litellm_db_pool_total_connections (Gauge)
- litellm_db_pool_waiting_connections (Gauge)
- litellm_db_engine_up (Gauge)
- litellm_db_engine_restarts_total (Counter)
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: address Greptile review feedback
- Only increment engine_restarts counter on heavy reconnects (engine
actually dead), not lightweight network-blip reconnects
- Fix potential KeyError in _get_or_create_gauge/counter fallback path
when REGISTRY._names_to_collectors is absent
- Rename litellm_db_pool_waiting_connections to
litellm_db_pool_lock_waiting_connections to clarify it measures lock
contention, not pool slot queuing
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: warn when prometheus_system enabled but watchdog disabled
Log a warning when users have prometheus_system in service_callback
but PRISMA_HEALTH_WATCHDOG_ENABLED=false, since DB pool and engine
metrics won't be collected in that configuration.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* ci: retrigger CI checks
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor: use labeled gauge for DB pool connection metrics
Replace 3 separate pool gauges (active, idle, total) with a single
`litellm_db_pool_connections` gauge using a `state` label. This is more
Prometheus-idiomatic and exposes all pg_stat_activity states (active,
idle, idle in transaction, etc.) without ambiguity about what "total"
includes.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: address Greptile review — stale labels and fallback re-registration
- Zero out known pg_stat_activity states that are absent from the current
query result, preventing stale gauge values from persisting.
- Simplify _get_or_create_gauge/counter by removing the fallback loop
that could re-register an already-registered metric (ValueError).
- Add test for stale label clearing across collection cycles.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: include "unknown" in _PG_STATES for stale label clearing
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: collect immediately on start and consolidate into single query
- Move sleep to end of loop so metrics appear on /metrics immediately
after startup instead of after a 30s delay.
- Combine pool state and lock waiting queries into a single SQL query
using conditional aggregation, halving per-cycle DB overhead.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: prevent tight spin loop on collection error
Move asyncio.sleep outside the try/except so it always executes even
when _collect_engine_health() or _collect_pool_metrics() raises.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: add multiprocess_mode to _get_or_create_gauge initialization
- Include `multiprocess_mode` parameter to properly support multiprocessing in Gauge creation.
- Ensure consistent behavior for labeled and unlabeled Gauges.
* fix: handle invalid env var and document watchdog prerequisite
- Add try/except ValueError for PRISMA_METRICS_COLLECTION_INTERVAL_SECONDS
to prevent proxy startup crash on non-numeric values (e.g. "30s")
- Document that DB metrics require both prometheus_system callback and
PRISMA_HEALTH_WATCHDOG_ENABLED=true
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: use defensive null coalescing for query_raw row values
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* test: add invalid env var fallback test and fix mock signature
- Add test for non-numeric PRISMA_METRICS_COLLECTION_INTERVAL_SECONDS
- Add **kwargs to mock _patched_get_or_create_gauge for forward compat
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
When both enable_jwt_auth and enable_oauth2_auth are True, the proxy now
routes tokens based on their format:
- JWT tokens (3 dot-separated parts) -> JWT auth handler
- Opaque tokens -> OAuth2 auth handler
This enables using JWT for human users and OAuth2 for M2M (machine) clients
on the same LiteLLM instance. Previously, enabling OAuth2 would intercept
all tokens on LLM API routes before JWT auth could run.
When only one auth method is enabled, behavior is unchanged (backward compatible).
gpt-5.4-pro and gpt-5.4-pro-2026-03-05 do not support the
/v1/chat/completions endpoint — OpenAI returns a 404 with
"This is not a chat model". These models are responses-only,
like o3-pro and o1-pro.
Changes:
- Set mode from "chat" to "responses" for both model entries
- Update supported_endpoints to ["/v1/responses", "/v1/batch"]
- Add regression test for responses API bridge routing
FixesBerriAI/litellm#23014
When calling get_users() directly (not via FastAPI), Query() defaults
are not resolved. Pass organization_ids=None explicitly to avoid
'Query' object has no attribute 'split' error.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- test_get_users_*: pass proxy admin user_api_key_dict since get_users
now calls _authorize_user_list_request which checks user_role
- test_validate_team_member_add_permissions_non_admin: set
organization_id on mock team since _is_user_org_admin_for_team
accesses it
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Thread org objects {organization_id, organization_alias} instead of bare IDs from
users/page.tsx → view_users.tsx → CreateUserButton so the selector can show aliases
- Replace single-select org dropdown with multi-select; always shown when organizationIds
is non-null; disabled/pre-selected for single-org admins; displays "Alias (id)"
- handleCreate: maps organization_ids → organizations before POST, removes redundant
organizationMemberAddCall (backend _add_user_to_organizations handles it)
- _user_is_org_admin: also checks organizations list field in addition to singular
organization_id so /user/new succeeds for org admins
- Add 5 backend unit tests for _user_is_org_admin and 2 frontend tests for new form behavior
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Org admins and team admins opening the invite-user modal could not see
the 4 global proxy roles because GET /user/available_roles has no
request body, so the org-admin route check (which requires
organization_id in the payload) always returned False and blocked them.
Add /user/available_roles to self_managed_routes so the route-access
check passes for any authenticated user. The endpoint's existing
Depends(user_api_key_auth) still requires a valid API key.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* feat: enforce x-litellm-trace-id in header, if required
* feat: update spend for agent
* refactor: update agent table to follow similar format as other entities - also add a spend column - allows us to see spend of an agent
* fix: cleanup ui
* feat: return spend on agent endpoints
* feat: scope pr
* feat(agents/): support budgets + rate limiting on agents + agent sessions
* fix: address PR review feedback
- Add missing tpm_limit, rpm_limit, session_tpm_limit, session_rpm_limit
columns to root schema.prisma to match proxy and extras schemas
- Add backwards-compatible fallback to key metadata for max_iterations
so existing users don't silently lose enforcement
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: qa'ed RPM limiting on agents
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* feat: add health check toggle to agents page
Backend:
- Add health_check query parameter to GET /v1/agents endpoint
- When health_check=true, performs concurrent GET requests to each agent's
URL and filters out agents with unreachable URLs (5s timeout)
- Agents returning HTTP <500 are considered healthy; 5xx and connection
errors mark agents as unhealthy
UI:
- Add Health Check toggle (Switch) to agents panel header
- Toggle triggers re-fetch with health_check=true, filtering the agent list
- Icon color changes (green/gray) to indicate toggle state
- Tooltip explains behavior: 'only agents with reachable URLs are shown'
Networking:
- Update getAgentsList to accept optional healthCheck boolean parameter
Tests:
- Backend: 9 new tests covering health check filtering, _check_agent_url_health
helper (no URL, 200, 404, 500, connection error cases)
- UI: 3 new tests verifying toggle renders, initial fetch without health check,
and fetch with health check after toggle click
Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>
* fix: fix greptile comment re: security issue
* fix: fix based on greptile feedback
* fix: align health check tests with implementation
- Rename test_should_return_unhealthy_when_no_url to
test_should_return_healthy_when_no_url (implementation returns
healthy=True for agents without a URL)
- Patch get_async_httpx_client instead of httpx.AsyncClient so mocks
actually intercept the HTTP calls made by _check_agent_url_health
- Remove unnecessary __aenter__/__aexit__ context-manager mocks
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* chore: undo _experimental/out renames from cherry-pick
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Update litellm/proxy/agent_endpoints/endpoints.py
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
---------
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* fix(docker): bump tar/minimatch/pypdf for CVE fixes + harden SBOM patching
- Bump tar 7.5.8→7.5.10, minimatch 10.2.1→10.2.4, pypdf 6.6.2→6.7.3
- Add sed-based SBOM metadata patching with properly indented find/sed
- Add npm package manager cleanup (apk del / apt-get purge) to remove
stale SBOM entries from image scanners
- Scope || true to only apk del via brace grouping { ... || true; }
- Guard npm root -g with non-empty assertion to prevent silent failures
- Scope minimatch sed regex to ^10.x to avoid matching other major versions
Addresses: CVE-2026-27903, CVE-2026-27904, GHSA-qffp-2rhf-9h96, CVE-2026-27888
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(docker): scope find to /usr/local/lib /usr/lib, drop autoremove
- Replace `find /` with `find /usr/local/lib /usr/lib` to avoid
traversing /proc, /sys, /dev during SBOM metadata patching
- Remove `apt-get autoremove -y` from Debian-based Dockerfiles to
prevent nodejs from being removed as an auto-installed dependency
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* Revert "fix(vertex): drop bare {} schemas from anyOf before adding nullable=True (#23060)"
This reverts commit 3ad9a536d3.
* Revert "Merge pull request #22589 from Chesars/fix/vertex-preserve-any-type-schema"
This reverts commit da941e4261, reversing
changes made to f77f28a5f8.
streaming_handler.py: EndpointType.ANTHROPIC was missing from the cost
injection block — only VERTEX_AI was handled, so Anthropic passthrough
streaming never got cost injected into message_delta chunks even with
include_cost_in_streaming_usage: true.
test_anthropic_passthrough.py: AnthropicResponsesStreamWrapper yields
full multi-line SSE frames as single bytes objects (e.g.
"event: message_delta\ndata: {...}\n\n"). The tests were checking
startswith('data: ') on the whole chunk, which starts with 'event:',
so every message_delta event was silently skipped. Fix: split each chunk
by \n before checking for the data: prefix. Also removes the
@pytest.mark.skip added with wrong diagnosis on the OpenAI model test.
DALL-E 2 create_variation requires a square PNG. The old fixture fetched
the LiteLLM logo from S3 which is non-square, causing API rejections.
Replace with a programmatically-generated 1024x1024 RGBA PNG via Pillow.
* fix(anthropic/skills): remove ?beta=true query param from Skills API URLs
Beta access is controlled via the anthropic-beta header (already set
to skills-2025-10-02), not a URL query param. The spurious ?beta=true
was causing 500 errors from Anthropic's server.
* fix(test): update openrouter image generation assertion to accept any image format
gemini-2.5-flash-image returns JPEG, not PNG. The assertion was hardcoded
to png after the model was swapped from gemini-2.5-flash-image-preview
(which returned PNG) in commit 34e8e972.
* fix(ci): fix image variation test for openai sdk 2.24.0 and swap nova-premier to nova-pro
image_gen_tests: openai==2.24.0 (bumped Feb 25) requires BytesIO objects to have
a .name attribute for MIME type detection in multipart uploads. Add .name to the
fixture so create_variation works. Also guard with OPENAI_API_KEY skipif.
proxy_e2e_anthropic_messages_tests: nova-premier requires provisioned throughput
not available via standard on-demand cross-region inference on the CI account.
Swap to nova-pro which uses standard inference profiles.
* fix: remove skipif, keep only .name fix for openai sdk compat
- Replace ModelResponse(stream=True) with ModelResponseStream in
test_unit_test_custom_stream_wrapper_repeating_chunk — stream=True
stores delta as a plain dict causing AttributeError in CustomStreamWrapper
- Accept MidStreamFallbackError alongside InternalServerError in the
repeating-chunk safety check assertion
- Add @pytest.mark.flaky(retries=3) to the live OpenAI audio output
usage test