mirror of
https://github.com/tiennm99/litellm.git
synced 2026-06-17 22:48:35 +00:00
386f334feef5808074de4f201d7d511a6f3acabe
28 Commits
| Author | SHA1 | Message | Date | |
|---|---|---|---|---|
|
|
386f334fee |
Prompt Compression - add it to the proxy (#25729)
* refactor: new agentic loop event hook
simplifies how to create logic for tool based multi llm calls
* fix: compress - make it work on anthropic input as well
* fix(compress.py): working prompt compression for claude code
ensures claude code messages can run through proxy easily
* docs: add agentic loop hook guide
* docs: add agentic_loop_hook to sidebar
* fix: fix multiple arguments error
* fix: fix tool call loop for compression on streaming /v1/messages
* fix: fix linting errors
* fix: fix ci/cd errors
* feat(litellm_pre_call_utils.py): use claude code session for litellm session id
allows claude code logs to be stitched together, making it easy to know they were all part of the same conversation
* fix: suppress incorrect mypy warning re: module
* revert: drop PR's changes to litellm/proxy/_experimental/out/
Restores the 34 HTML files under _experimental/out/ to their pre-PR paths (X/index.html -> X.html). All renames are R100 (content unchanged); no other files are touched.
* fix: address greptile review comments on PR #25729
- Skip ``kwargs["tools"] = []`` injection when compression is a no-op — Anthropic Messages rejects empty tool arrays on requests that did not originally declare tools.
- Move agentic-loop safety guards (fingerprint cycle / max depth) out of the per-callback try/except so they propagate instead of being swallowed by the generic exception handler. Extracted _check_agentic_loop_safety.
- Gate generic ``x-<vendor>-session-id`` capture behind the LITELLM_CAPTURE_VENDOR_SESSION_HEADERS env var (off by default) to preserve backwards compatibility; explicit x-litellm-* headers are unaffected.
- Fix monkeypatch target in pre-call-hook test to patch the actual module-level binding (litellm.integrations.compression_interception.handler.compress).
- Add regression tests for empty-tools skip and opt-in session capture.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* revert: drop LITELLM_CAPTURE_VENDOR_SESSION_HEADERS flag
Generic x-<vendor>-session-id header capture is a new feature and only runs *after* the explicit x-litellm-trace-id / x-litellm-session-id checks, so it does not change behavior for any existing caller that was already using the LiteLLM headers — no backwards-incompatibility to gate.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* refactor(compress): replace input_type with CallTypes call_type
Drop the bespoke ``CompressionInputType`` literal and use the existing ``litellm.types.utils.CallTypes`` enum instead. ``litellm.compress()`` now takes ``call_type: Union[CallTypes, str]`` (default ``CallTypes.completion``) — no new concept to learn, and the enum is already the way the rest of the codebase talks about request shapes.
Supported values: ``completion`` / ``acompletion`` (OpenAI chat-completions shape) and ``anthropic_messages`` (Anthropic structured content blocks).
Updated: compress(), the compression_interception handler, tests, docs, and the two eval scripts.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |
||
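The agentic-loop safety guards mentioned in the commit above (fingerprint cycle and max depth) can be illustrated with a minimal sketch. This is not the code from the commit: `AgenticLoopError`, `fingerprint`, `check_loop_safety`, and the default `max_depth` are hypothetical names and values chosen for illustration. The point is only that the check raises on its own, outside any broad exception handler, so the error can propagate.

```python
import hashlib
import json


class AgenticLoopError(RuntimeError):
    """Hypothetical error type: the tool loop must stop."""


def fingerprint(tool_name: str, arguments: dict) -> str:
    # Stable hash of a tool call; sort_keys makes equal dicts hash equally.
    payload = json.dumps({"tool": tool_name, "args": arguments}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def check_loop_safety(
    seen: set, tool_name: str, arguments: dict, depth: int, max_depth: int = 5
) -> None:
    # Guard 1: stop once the loop has gone too deep (max_depth is an assumed default).
    if depth >= max_depth:
        raise AgenticLoopError(f"max depth {max_depth} reached")
    # Guard 2: stop if the exact same tool call was already made (a cycle).
    fp = fingerprint(tool_name, arguments)
    if fp in seen:
        raise AgenticLoopError("repeated tool call detected")
    seen.add(fp)
```

A caller would invoke `check_loop_safety` before running per-callback logic, keeping it outside any `try/except Exception` block that would otherwise swallow the error.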
|
|
e8461b5b97 | style: run black formatter on files from main merge | ||
|
|
cb8fc480e6 |
Merge pull request #25732 from harish876/health-check-oom
Optimize database query to prevent OOM errors during health checks |
||
|
|
d20c70f24c |
Optimize database query which fetches latest model_id, model_name pairs and dedupes them in memory.
Current fix includes:
- Updates test case
- Optimized query with docstring. The change leverages deduplication and sorting logic from SQL
- Added a bench script to differentiate peak memory usage before and after |
||
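The idea in the commit above, letting SQL deduplicate and sort instead of loading every row into memory, can be sketched with the standard-library `sqlite3` module. The table name `health_checks`, its columns, and the sample rows are assumptions made for this example. The project's actual schema, database, and query are not shown in the commit message.

```python
import sqlite3

# In-memory database with a hypothetical health-check table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE health_checks (model_id TEXT, model_name TEXT, checked_at INTEGER)"
)
conn.executemany(
    "INSERT INTO health_checks VALUES (?, ?, ?)",
    [
        ("m1", "gpt-4o", 1),
        ("m1", "gpt-4o", 2),
        ("m2", "claude", 1),
        ("m2", "claude", 3),
    ],
)


def latest_pairs(connection: sqlite3.Connection) -> list:
    # GROUP BY collapses duplicates in the database; only one row per
    # (model_id, model_name) pair is ever returned to Python.
    query = (
        "SELECT model_id, model_name, MAX(checked_at) AS latest "
        "FROM health_checks "
        "GROUP BY model_id, model_name "
        "ORDER BY latest DESC"
    )
    return connection.execute(query).fetchall()
```

Peak memory then scales with the number of distinct pairs, not with the number of stored check rows.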
|
|
0e43050a01 |
Merge pull request #25650 from BerriAI/litellm_dev_04_13_2026_p1
feat: add litellm.compress() — BM25-based prompt compression with ret… |
||
|
|
26c7412339 |
feat: add litellm.compress() — BM25-based prompt compression with retrieval tool (#25637)
* feat: add litellm.compress() for BM25-based context compression Adds a compress() utility that reduces context size for LLM calls using BM25 relevance scoring (with optional semantic embeddings via litellm.embedding()). Messages below a token threshold pass through unchanged; messages above are scored, ranked, and the lowest-relevance ones replaced with stubs. Originals are cached and a retrieval tool is injected so the model can recover dropped content on demand. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(compress): truncate high-scoring messages instead of fully stubbing them When a relevant message was too large to fit in the token budget it was replaced with a stub, leaving the LLM with no real content to work with. Now the highest-scoring overflow message is truncated (first 70% + last 30% of words) to fill the remaining budget, so the LLM always receives actual content rather than just a retrieval pointer. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(bm25): add prefix expansion so query terms match inflected doc tokens "cook" now matches "cooking", "auth" matches "authentication", etc. Without this, short query terms scored 0 against longer inflected forms in documents, causing the wrong message to be kept. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * test: add routing correctness test and eval harness for litellm.compress() - test_simple_compression: parametrized test verifying BM25 routes the right message based on query ("How to cook?" 
keeps cooking, "Fix auth" keeps auth content) - eval_compression.py: end-to-end eval harness comparing baseline vs compressed model performance on HumanEval-style coding problems Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat(eval): add SWE-bench Lite compression eval harness Uses princeton-nlp/SWE-bench_Lite_bm25_27K which bundles ~27k tokens of BM25-retrieved repo context per problem — large enough to meaningfully stress litellm.compress() without Docker or GitHub API calls. Proxy eval metrics (no test runner needed): - has_diff: model produced a valid unified diff - file_overlap: fraction of gold-patch files in generated patch - exact_file_match: generated patch touches exactly the right files Run: python tests/eval_swe_bench.py --model gpt-4o --problems 10 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(eval): robust dataset loading + sys.path fix for worktree imports - Add HuggingFace API fallback so the SWE-bench loader doesn't need the `datasets` library (avoids pyarrow/numpy binary compat issues) - Insert repo root into sys.path so compression module resolves from worktrees - Use direct import of litellm_compress to avoid __getattr__ issues Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * improve compression quality: line-based truncation, multi-message budget, 70% default target - Switch truncate_message from word-based to line-based splitting to preserve code structure (function boundaries, indentation) - Allow multiple messages to be truncated instead of burning entire budget on one overflow message - Raise default compression target from 50% to 70% of trigger for better quality/cost tradeoff - Add --compression-target CLI arg to SWE-bench eval harness - Move tests to canonical locations (tests/test_litellm/, scripts/) - Add docs page and sidebar entries for compress() Eval results (5 problems, Opus, trigger=10k): Hunk overlap delta improved from -0.417 to -0.221 Content similarity now matches baseline (+0.006) Cost 
savings: 72% Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * docs: add SWE-bench performance results to compress() docs Include benchmark table from Opus eval (5 problems, trigger=10k) showing 72% cost savings with file-level quality fully preserved. Add metric explanations and eval runner examples. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(eval): use tolerance-based hunk overlap metric The exact line-number matching was too brittle — LLM-generated patches often target the right code region but with slightly offset line numbers. Switch to hunk-level overlap with a 10-line tolerance window so nearby edits count as matches. This better reflects actual patch quality. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * feat: add compression_interception callback for LiteLLM Proxy Add a proxy callback that automatically compresses incoming /v1/messages payloads above a configurable token threshold, runs the retrieval tool loop server-side, and returns the final response. This brings compress() support to proxy deployments (e.g. Claude Code via /v1/messages). - New callback: litellm/integrations/compression_interception/ - Proxy config: compression_interception_params in litellm_settings - Support for input_type param in compress() (openai vs anthropic) - Docs: proxy setup instructions with YAML config example - Tests: 139-line unit test suite for the interception handler Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Revert "feat: add compression_interception callback for LiteLLM Proxy" This reverts commit 72bd5cb152ca1df07f14a14e14a2816e188874a8. --------- Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |
||
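Two techniques named in the commit above, line-based head/tail truncation and prefix expansion for query terms, can be sketched as follows. This is an illustration under assumptions, not the `litellm.compress()` implementation: the function names, the omission marker, and the use of the 70/30 split as a `head_ratio` parameter are hypothetical.

```python
def truncate_lines(text: str, max_lines: int, head_ratio: float = 0.7) -> str:
    """Keep the first ~70% and last ~30% of a line budget, dropping the middle."""
    lines = text.splitlines()
    if len(lines) <= max_lines:
        return text  # already within budget, pass through unchanged
    head_count = int(max_lines * head_ratio)
    tail_count = max_lines - head_count
    head = lines[:head_count]
    tail = lines[-tail_count:] if tail_count > 0 else []
    omitted = len(lines) - head_count - tail_count
    # The marker line is extra: the result has max_lines content lines plus one marker.
    return "\n".join(head + [f"[... {omitted} lines omitted ...]"] + tail)


def prefix_match(query_term: str, doc_token: str) -> bool:
    """Prefix expansion: a short query term matches longer inflected tokens."""
    return doc_token.lower().startswith(query_term.lower())
```

Splitting on lines instead of words keeps indentation and function boundaries intact, which is the reason the commit gives for the switch.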
|
|
a6c30b30bf |
build: migrate packaging, CI, and Docker from Poetry to uv (#25007)
* build: migrate packaging metadata to uv * ci: move automation and local tooling to uv * docker: migrate image builds and runtime setup to uv * docs: update install and deployment guidance for uv * chore: align auxiliary scripts and tests with uv * test: harden test_litellm isolation * fix: keep release and health check images self-contained * build: pin uv tooling and health check deps * test: isolate bedrock image request formatting from suite state * test: cover sandbox executor requirements flow * ci: fix circleci no-op command steps * ci: fix circleci publish workflow parsing * fix: stabilize remaining uv migration CI checks * ci: increase matrix test timeout headroom * fix: restore published docker and license coverage * fix: restore proxy runtime build parity * fix: restore proxy extras parity and venv migrations * ci: persist uv path across circleci steps * fix: keep psycopg binary in default test env * docker: preserve prisma cache across stages * test: run local proxy checks through uv python * build: restore runtime deps moved into ci * build: refresh uv lock after upstream merge * fix: restore module import in test_check_migration after merge The conflict resolution imported only the function but the test body references check_migration as a module throughout. 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix: revert dependency promotions, remove nodejs-wheel-binaries, fix Docker layer caching - Move google-generativeai, Pillow, tenacity back to ci group (they are lazily imported and bloat the base SDK install needlessly) - Remove nodejs-wheel-binaries from extra_proxy and proxy-dev (redundant in Docker where system Node.js is already installed via apk) - Remove all nodejs-wheel node replacement and venv npm patching blocks from Dockerfiles since the wheel is no longer installed - Add --no-default-groups to CodSpeed benchmark workflow so the benchmark environment matches the old minimal pip install footprint - Apply standard uv two-phase Docker pattern: copy metadata first, install deps (cached layer), then copy source and install project - Replace CircleCI enterprise no-op with proper uv sync command Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * chore: regenerate uv.lock after removing nodejs-wheel-binaries Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(ci): use cache/restore instead of cache to prevent cache poisoning The old workflow used actions/cache/restore (read-only). The uv migration changed it to actions/cache (read-write), which zizmor flags as a cache poisoning risk. Restore the safer read-only variant. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(ci): disable setup-uv built-in cache to silence cache-poisoning alert The setup-uv action enables caching by default, which zizmor flags as a cache poisoning risk. Disable it since we already use a read-only cache/restore step. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(ci): disable setup-uv cache in publish workflow Silences zizmor cache-poisoning alert. Publishing workflow runs infrequently on protected branches so caching adds no real benefit. 
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(test): remove duplicate verbose_logger mock in test_check_migration The logger was patched twice — first via mocker.patch() then via mocker.patch.object(autospec=True). The second call fails because autospec cannot inspect an already-mocked attribute. Remove the redundant first patch. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * fix(ci): free disk space before Docker build in test-server-root-path The Dockerfile.non_root build ran out of disk on the CI runner. Remove Android SDK, .NET, Boost, and GHC toolchains (~12GB) to free space. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
51af6fedb3 |
[Infra] Harden supply chain: remove unused scripts, add pip binary-only install
Remove ci_cd/publish-proxy-extras.sh (dead, unreferenced PyPI publish script) and .pre-commit-config.yaml (pulls external repos from GitHub on git commit). Add --only-binary :all: to scripts/install.sh to prevent execution of malicious setup.py during pip install. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
5f63873dca |
[Infra] Pin all Docker build dependencies to exact versions
Pin every dependency across all Docker builds so upgrades are intentional. Verified by building all 3 production images and diffing pip freeze against known-good v1.83.0-nightly baselines — zero version drift. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> |
||
|
|
8e61b32b8e |
[Staging] - Ishaan March 17th (#23903)
* feat(xai): add grok-4.20 beta 2 models with pricing (#23900)
Add three grok-4.20 beta 2 model variants from xAI:
- grok-4.20-multi-agent-beta-0309 (reasoning + multi-agent)
- grok-4.20-beta-0309-reasoning (reasoning)
- grok-4.20-beta-0309-non-reasoning
Pricing (from https://docs.x.ai/docs/models):
- Input: $2.00/1M tokens ($0.20/1M cached)
- Output: $6.00/1M tokens
- Context: 2M tokens
All variants support vision, function calling, tool choice, and web search.
Closes LIT-2171
* docs: add Quick Install section for litellm --setup wizard (#23905)
* docs: add Quick Install section for litellm --setup wizard
* docs: clarify setup wizard is for local/beginner use
* feat(setup): interactive setup wizard + install.sh (#23644)
* feat(setup): add interactive setup wizard + install.sh
Adds `litellm --setup` — a Claude Code-style TUI onboarding wizard that
guides users through provider selection, API key entry, and proxy config
generation, then optionally starts the proxy immediately.
- litellm/setup_wizard.py: wizard with ASCII art, numbered provider menu
(OpenAI, Anthropic, Azure, Gemini, Bedrock, Ollama), API key prompts,
port/master-key config, and litellm_config.yaml generation
- litellm/proxy/proxy_cli.py: adds --setup flag that invokes the wizard
- scripts/install.sh: curl-installable script (detect OS/Python, pip
install litellm[proxy], launch wizard)
Usage:
curl -fsSL https://raw.githubusercontent.com/BerriAI/litellm/main/scripts/install.sh | sh
litellm --setup
* fix(install.sh): remove orange color, add LITELLM_BRANCH env var for branch installs
* fix(install.sh): install from git branch so --setup is available for QA
* fix(install.sh): remove stale LITELLM_BRANCH reference that caused unbound variable error
* fix(install.sh): force-reinstall from git to bypass cached PyPI version
* fix(install.sh): show pip progress bar during install
* fix(install.sh): always launch wizard via $PYTHON_BIN -m litellm, not PATH binary
* fix(install.sh): use litellm.proxy.proxy_cli module (no __main__.py exists)
* fix(install.sh): suppress RuntimeWarning from module invocation
* fix(install.sh): use Python bin-dir litellm binary to avoid CWD sys.path shadowing
* fix(install.sh): use sysconfig.get_path('scripts') to find pip-installed litellm binary
* fix(install.sh): redirect stdin from /dev/tty on exec so wizard gets terminal, not exhausted pipe
* fix(install.sh): warn about git clone duration, drop --no-cache-dir so re-runs are faster
* feat(setup_wizard): arrow-key selector, updated model names
* fix(setup_wizard): use sysconfig binary to start proxy, not python -m litellm
* feat(setup_wizard): credential validation after key entry + clear next-steps after proxy start
* style(install.sh): show git clone warning in blue
* refactor(setup_wizard): class with static methods, use check_valid_key from litellm.utils
* address greptile review: fix yaml escaping, port validation, display name collisions, tests
- setup_wizard.py: add _yaml_escape() for safe YAML embedding of API keys
- setup_wizard.py: add _styled_input() with readline ANSI ignore markers
- setup_wizard.py: change DIVIDER to _divider() fn to avoid import-time color capture
- setup_wizard.py: validate port range 1-65535, initialize before loop
- setup_wizard.py: qualify azure display names (azure-gpt-4o) to avoid collision with openai
- setup_wizard.py: work on env_copy in _build_config to avoid mutating caller's dict
- setup_wizard.py: skip model_list entries for providers with no credentials
- setup_wizard.py: prompt for azure deployment name
- setup_wizard.py: wrap os.execlp in try/except with friendly fallback
- setup_wizard.py: wrap config write in try/except OSError
- setup_wizard.py: fix _validate_and_report to use two print lines (no \r overwrite)
- setup_wizard.py: add .gitignore tip next to key storage notice
- setup_wizard.py: fix run_setup_wizard() return type annotation to None
- scripts/install.sh: drop pipefail (not supported by dash on Ubuntu when invoked as sh)
- scripts/install.sh: use litellm[proxy] from PyPI (not hardcoded dev branch)
- scripts/install.sh: guard /dev/tty read with -r check for Docker/CI compat
- scripts/install.sh: remove --force-reinstall to avoid downgrading dependencies
- tests/test_litellm/test_setup_wizard.py: 13 unit tests for _build_config and _yaml_escape
* style: black format setup_wizard.py
* fix: address remaining greptile issues - Windows compat, YAML quoting, credential flow
- guard termios/tty imports with try/except ImportError for Windows compat
- quote master_key as YAML double-quoted scalar (same as env vars)
- remove unused port param from _build_config signature
- _validate_and_report now returns the final key so re-entered creds are stored
- add test for master_key YAML quoting
* fix: add --port to suggested command, guard /dev/tty exec in install.sh
* fix: quote api_base in YAML, skip azure if no deployment, only redraw on state change
* fix: address greptile review comments
- _yaml_escape: add control character escaping (\n, \r, \t)
- test: fix tautological assertion in test_build_config_azure_no_deployment_skipped
- test: add tests for control character escaping in _yaml_escape
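The escaping described in the review fixes above (safe embedding of API keys in a YAML double-quoted scalar, including `\n`, `\r`, `\t`) can be sketched as below. This is a minimal illustration, not the wizard's `_yaml_escape`. The names `yaml_escape` and `yaml_quote` are chosen for this example.

```python
def yaml_escape(value: str) -> str:
    """Escape a string for use inside a YAML double-quoted scalar."""
    return (
        value.replace("\\", "\\\\")  # backslash first, so later escapes are not doubled
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
        .replace("\t", "\\t")
    )


def yaml_quote(value: str) -> str:
    """Wrap the escaped value in double quotes."""
    return f'"{yaml_escape(value)}"'
```

The order matters: if the backslash replacement ran last, the backslashes introduced by the other replacements would be escaped a second time.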
* feat(ui): remove Chat UI page link and banner from sidebar and playground (#23908)
* feat(guardrails): MCPJWTSigner - built-in guardrail for zero trust MCP auth (#23897)
* Allow pre_mcp_call guardrail hooks to mutate outbound MCP headers
* Enhance MCPServerManager to support hook-modified arguments and extra headers. Update tests to validate argument mutation and header injection behavior, including warnings for OpenAPI-backed servers when headers are present.
* Refactor MCPServerManager to raise HTTPException for extra headers in OpenAPI-backed servers. Update tests to reflect this change, ensuring proper exception handling instead of logging warnings.
* feat(guardrails): add MCPJWTSigner built-in guardrail for zero trust MCP auth
Signs outbound MCP tool calls with a LiteLLM-issued RS256 JWT so MCP servers
can trust a single signing authority instead of every upstream IdP.
Enable in config.yaml:
guardrails:
  - guardrail_name: mcp-jwt-signer
    litellm_params:
      guardrail: mcp_jwt_signer
      mode: pre_mcp_call
      default_on: true
JWT carries sub (user_id), act.sub (team_id, RFC 8693), tool-level scope, iss,
aud, iat/exp/nbf. RSA-2048 keypair auto-generated at startup unless
MCP_JWT_SIGNING_KEY env var is set.
Adds /.well-known/jwks.json endpoint and jwks_uri to /.well-known/openid-configuration
so MCP servers can verify LiteLLM-issued tokens via OIDC discovery.
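The claim set listed above (sub, act.sub, scope, iss, aud, iat/exp/nbf) can be sketched as a plain dictionary builder. Signing with RS256 is left out because it needs a cryptography library. The function name, the default TTL, and the exact scope string format are assumptions for illustration; the commit message only states that the scope is tool-level.

```python
import time
from typing import Optional


def build_claims(
    user_id: str,
    team_id: Optional[str],
    tool_name: str,
    issuer: str,
    audience: str,
    ttl_seconds: int = 300,  # assumed default, not taken from the source
    now: Optional[int] = None,
) -> dict:
    if ttl_seconds <= 0:
        raise ValueError("ttl_seconds must be > 0")
    issued_at = int(time.time()) if now is None else now
    claims = {
        "iss": issuer,
        "aud": audience,
        "sub": user_id,
        # Hypothetical scope format: grants the call on one tool only.
        "scope": f"mcp:tools/call mcp:tools/{tool_name}:call",
        "iat": issued_at,
        "nbf": issued_at,
        "exp": issued_at + ttl_seconds,
    }
    if team_id:
        # RFC 8693 actor claim: the team acts on behalf of the subject.
        claims["act"] = {"sub": team_id}
    return claims
```

An MCP server that verifies the token against the JWKS endpoint would then read `sub` and `act.sub` to decide who is calling and on whose behalf.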
* Update MCPServerManager to raise HTTPException with status code 400 for extra headers in OpenAPI-backed servers. Adjust tests to verify the correct status code and exception message.
* fix: address P1 issues in MCPJWTSigner
- OpenAPI servers: warn + skip header injection instead of 500
- JWKS Cache-Control: 5min for auto-generated keys, 1h for persistent
- sub claim: fallback to apikey:{token_hash} for anonymous callers
- ttl_seconds: validate > 0 at init time
* docs: add MCP zero trust auth guide with architecture diagram
* docs: add FastMCP JWT verification guide to zero trust doc
* fix: address remaining Greptile review issues (round 2)
- mcp_server_manager: warn when hook Authorization overwrites existing header
- __init__: remove _mcp_jwt_signer_instance from __all__ (private internal)
- discoverable_endpoints: copy dict instead of mutating in-place on OIDC augmentation
- test docstring: reflect warn-and-continue behavior for OpenAPI servers
- test: update scope assertions for least-privilege (no mcp:tools/list on tool-call JWTs)
* fix: address Greptile round 3 feedback
- initialize_guardrail: validate mode='pre_mcp_call' at init time — misconfigured
mode silently bypasses JWT injection, which is a zero-trust bypass
- _build_claims: remove duplicate inline 'import re' (module-level import already present)
- _types.py: add TODO comment explaining jwt_claims is forward-compat plumbing
for a follow-up PR that will forward upstream IdP claims into outbound MCP JWTs
* feat(mcp_jwt_signer): add verify+re-sign, claim ops, two-token model, configurable scopes
Addresses all missing pieces from the scoping doc review:
FR-5 (Verify + re-sign): MCPJWTSigner now accepts access_token_discovery_uri
and token_introspection_endpoint. When set, the incoming Bearer token is
extracted from raw_headers (threaded through pre_call_tool_check), verified
against the IdP's JWKS (JWT) or introspected (opaque), and only re-signed if
valid. Falls back to user_api_key_dict.jwt_claims for LiteLLM JWT-auth mode.
FR-12 (Configurable end-user identity mapping): end_user_claim_sources
ordered list drives sub resolution — sources: token:<claim>, litellm:user_id,
litellm:email, litellm:end_user_id, litellm:team_id.
FR-13 (Claim operations): add_claims (insert-if-absent), set_claims (always
override), remove_claims (delete) applied in that order.
FR-14 (Two-token model): channel_token_audience + channel_token_ttl issue a
second JWT injected as x-mcp-channel-token: Bearer <token>.
FR-15 (Incoming claim validation): required_claims raises HTTP 403 when any
listed claim is absent; optional_claims passes listed claims from verified
token into the outbound JWT.
FR-9 (Debug headers): debug_headers: true emits x-litellm-mcp-debug with kid,
sub, iss, exp, scope.
FR-10 (Configurable scopes): allowed_scopes replaces auto-generation. Also
fixed: tool-call JWTs no longer grant mcp:tools/list (overpermission).
P1 fixes:
- proxy/utils.py: _convert_mcp_hook_response_to_kwargs merges rather than
replaces extra_headers, preserving headers from prior guardrails.
- mcp_server_manager.py: warns when hook injects Authorization alongside a
server-configured authentication_token (previously silent).
- mcp_server_manager.py: pre_call_tool_check now accepts raw_headers and
extracts incoming_bearer_token so FR-5 verification has the raw token.
- proxy/utils.py: remove stray inline import inspect inside loop (pre-existing
lint error, now cleaned up).
Tests: 43 passing (28 new tests covering all FR flags + P1 fixes).
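The claim operations from FR-13 above, applied in the stated order (add_claims, then set_claims, then remove_claims), can be sketched as follows. The function name and signature are hypothetical; only the three operation names and their semantics come from the commit message.

```python
from typing import Optional


def apply_claim_operations(
    claims: dict,
    add_claims: Optional[dict] = None,
    set_claims: Optional[dict] = None,
    remove_claims: Optional[list] = None,
) -> dict:
    result = dict(claims)  # work on a copy, leave the caller's dict untouched
    # add_claims: insert only if the claim is absent
    for key, value in (add_claims or {}).items():
        result.setdefault(key, value)
    # set_claims: always override
    result.update(set_claims or {})
    # remove_claims: delete if present
    for key in remove_claims or []:
        result.pop(key, None)
    return result
```

Because removal runs last, a claim named in both `set_claims` and `remove_claims` ends up absent.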
* feat(mcp_jwt_signer): add verify+re-sign, claim ops, two-token model, configurable scopes (core)
Remaining files from the FR implementation:
mcp_jwt_signer.py — full rewrite with all new params:
FR-5: access_token_discovery_uri, token_introspection_endpoint,
verify_issuer, verify_audience + _verify_incoming_jwt(),
_introspect_opaque_token()
FR-12: end_user_claim_sources ordered resolution chain
FR-13: add_claims, set_claims, remove_claims
FR-14: channel_token_audience, channel_token_ttl → x-mcp-channel-token
FR-15: required_claims (raises 403), optional_claims (passthrough)
FR-9: debug_headers → x-litellm-mcp-debug
FR-10: allowed_scopes; tool-call JWTs no longer over-grant tools/list
mcp_server_manager.py:
- pre_call_tool_check gains raw_headers param to extract incoming_bearer_token
- Silent Authorization override warning fixed: now fires when server has
authentication_token AND hook injects Authorization
tests/test_mcp_jwt_signer.py:
28 new tests covering all FR flags + P1 fixes (43 total, all passing)
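The ordered resolution chain behind `end_user_claim_sources` (FR-12 above) can be sketched as a first-match lookup. The source prefixes `token:` and `litellm:` come from the commit message. The function name and the shape of the two input dictionaries are assumptions for this example.

```python
from typing import Optional


def resolve_subject(
    sources: list, token_claims: dict, litellm_context: dict
) -> Optional[str]:
    """Return the first non-empty value found by walking sources in order."""
    for source in sources:
        kind, _, name = source.partition(":")
        if kind == "token":
            value = token_claims.get(name)      # e.g. "token:email"
        elif kind == "litellm":
            value = litellm_context.get(name)   # e.g. "litellm:user_id"
        else:
            value = None                        # unknown source kinds are skipped
        if value:
            return str(value)
    return None
```

With `["token:sub", "litellm:user_id"]`, a verified token's `sub` wins when present, and the LiteLLM user id is used only as a fallback.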
* fix(mcp_jwt_signer): address pre-landing review issues
- Remove stale TODO comment on UserAPIKeyAuth.jwt_claims — the field is
already populated and consumed by MCPJWTSigner in the same PR
- Fix _get_oidc_discovery to only cache the OIDC discovery doc when
jwks_uri is present; a malformed/empty doc now retries on the next
request instead of being permanently cached until proxy restart
- Add FR-5 test coverage for _fetch_jwks (cache hit/miss),
_get_oidc_discovery (cache/no-cache on bad doc), _verify_incoming_jwt
(valid token, expired token), _introspect_opaque_token (active,
inactive, no endpoint), and the end-to-end 401 hook path — 53 tests
total, all passing
* docs(mcp_zero_trust): rewrite as use-case guide covering all new JWT signer features
Add scenario-driven sections for each new config area:
- Verify+re-sign with Okta/Azure AD (access_token_discovery_uri,
end_user_claim_sources, token_introspection_endpoint)
- Enforcing caller attributes with required_claims / optional_claims
- Adding metadata via add_claims / set_claims / remove_claims
- Two-token model for AWS Bedrock AgentCore Gateway
(channel_token_audience / channel_token_ttl)
- Controlling scopes with allowed_scopes
- Debugging JWT rejections with debug_headers
Update JWT claims table to reflect configurable sub (end_user_claim_sources)
* fix(mcp_jwt_signer): wire all config.yaml params through initialize_guardrail
The factory was only passing issuer/audience/ttl_seconds to MCPJWTSigner.
All FR-5/9/10/12/13/14/15 params (access_token_discovery_uri,
end_user_claim_sources, add/set/remove_claims, channel_token_audience,
required/optional_claims, debug_headers, allowed_scopes, etc.) were
silently dropped, making every advertised advanced feature non-functional
when loaded from config.yaml.
Add regression test that asserts every param is wired through correctly.
* docs(mcp_zero_trust): add hero image
* docs(mcp_zero_trust): apply Linear-style edits
- Lead with the problem (unsigned direct calls bypass access controls)
- Shorter statement section headers instead of question-form headers
- Move diagram/OIDC discovery block after the reader is bought in
- Add 'read further only if you need to' callout after basic setup
- Two-token section now opens from the user problem not product jargon
- Add concrete 403 error response example in required_claims section
- Debug section opens from the symptom (MCP server returning 401)
- Lowercase claims reference header for consistency
* fix(mcp_jwt_signer): fix algorithm confusion attack + add OIDC discovery 24h TTL
- Remove alg from unverified JWT header; use signing_jwk.algorithm_name from JWKS key instead.
Reading alg from attacker-controlled headers enables alg:none / HS256 confusion attacks.
- Add _oidc_discovery_fetched_at timestamp and _OIDC_DISCOVERY_TTL = 86400 (24h).
Without a TTL the cached discovery doc never refreshes, so IdP key rotation is invisible.
---------
Co-authored-by: Noah Nistler <60981020+noahnistler@users.noreply.github.com>
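The caching rules described in the entries above (refresh the OIDC discovery document after a 24h TTL, and cache it only when `jwks_uri` is present) can be sketched with an injectable clock. The class name, constructor parameters, and the `fetch` callable are hypothetical; only the 86400-second TTL and the `jwks_uri` condition come from the commit messages.

```python
import time
from typing import Callable, Optional

OIDC_DISCOVERY_TTL = 86400  # 24 hours, as stated in the commit message


class DiscoveryCache:
    def __init__(
        self,
        fetch: Callable[[], dict],
        ttl: float = OIDC_DISCOVERY_TTL,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._fetch = fetch
        self._ttl = ttl
        self._clock = clock
        self._doc: Optional[dict] = None
        self._fetched_at = 0.0

    def get(self) -> dict:
        now = self._clock()
        # Serve from cache only while the entry is younger than the TTL.
        if self._doc is not None and now - self._fetched_at < self._ttl:
            return self._doc
        doc = self._fetch()
        # Cache only usable documents; a malformed one is retried next call.
        if doc.get("jwks_uri"):
            self._doc = doc
            self._fetched_at = now
        return doc
```

Passing the clock as a parameter lets a test advance time without sleeping.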
* fix(ci): stabilize CI - formatting, type errors, test polling, security CVEs, router bug, batch resolution
Fix 1: Run Black formatter on 35 files
Fix 2: Fix MyPy type errors:
- setup_wizard.py: add type annotation for 'selected' set variable
- user_api_key_auth.py: remove redundant type annotation on jwt_claims reassignment
Fix 3: Fix spend accuracy test burst 2 polling to wait for expected total
spend instead of just 'any increase' from burst 2
Fix 4: Bump Next.js 16.1.6 -> 16.1.7 to fix CVE-2026-27978, CVE-2026-27979,
CVE-2026-27980, CVE-2026-29057
Fix 5: Fix router _pre_call_checks model variable being overwritten inside
loop, causing wrong model lookups on subsequent deployments. Use local
_deployment_model variable instead.
Fix 6: Add missing resolve_output_file_ids_to_unified call in batch retrieve
non-terminal-to-terminal path (matching the terminal path behavior)
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
* chore: regenerate poetry.lock to sync with pyproject.toml
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
* fix: format merged files from main and regenerate poetry.lock
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
* fix(mypy): annotate jwt_claims as Optional[dict] to fix type incompatibility
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
* fix(ci): update router region test to use gpt-4.1-mini (fix flaky model lookup)
Replace deprecated gpt-3.5-turbo-1106 with gpt-4.1-mini + mock_response in
test_router_region_pre_call_check, following the same pattern used in commit
|
||
|
|
1f412bc6d8 |
[Feat] Add Tool Policies for AI Gateway (#22732)
* fix: fix ui render
* fix: fix minor bugs
* refactor: use prisma functions instead of raw sql (safer)
* fix(add-new-tiles-to-tool-policies): allow developer to see what's available
* feat: ensure tool allowlist runs correctly for tool names + mcp's
* refactor: more ui improvements
* feat: working key tool blocking
* feat(tools): show tool logs
* refactor: backend code improvements
* refactor: improve log viewer for tools
* fix: address PR review feedback for tool access control
- Add missing blocked_tools column to root schema.prisma (schema drift)
- Invalidate ToolPolicyRegistry after policy mutations so changes take effect immediately
- Remove dead code: unused get_effective_policies, get_tool_policies_cached, and helpers
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: race condition in permission resolution and remove duplicate allowlist check
- Use atomic update_many with object_permission_id=None to prevent concurrent requests from creating orphaned permission rows and losing tool blocks
- Remove duplicate allowed_tools enforcement from guardrail (already enforced in auth layer via check_tools_allowlist)
- Move inline uuid import to module level
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* update to account for userAgent
* UI - Add ToolDetails
* input/output policy
* LiteLLM_PolicyAttachmentTable
* LiteLLM_PolicyAttachmentTable
* fix: add _enqueue_tool_registry_upsert
* fix: tool mgmt endpoints
* tool mgmt endpoints
* Update tests/test_litellm/proxy/db/test_tool_registry_writer.py
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Update tests/test_litellm/proxy/db/test_tool_registry_writer.py
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* Update tests/test_litellm/proxy/db/test_tool_registry_writer.py
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
* fix: sync root schema.prisma and fix test_tool_registry_writer for input/output policy
- Migrate root schema.prisma LiteLLM_ToolTable from call_policy to input_policy/output_policy, add missing user_agent and last_used_at columns (now consistent with litellm/proxy/schema.prisma and litellm-proxy-extras)
- Fix SpendLogToolIndex comment across all three schema files
- Fix all call_policy references in test_tool_registry_writer.py: swapped update_tool_policy arguments, wrong get_tools_by_names return type assertions, _mock_tool_row setting call_policy instead of input_policy
Addresses Greptile review feedback on PR #22732.
Made-with: Cursor
---------
Co-authored-by: Krrish Dholakia <krrishdholakia@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com> |
||
|
|
67f90254ed |
feat(guardrails): team-based guardrail registration and approval workflow (#22459)
* feat(guardrails): team-based guardrail registration and approval workflow
Add team-based guardrail submission system where teams can register Generic Guardrail API guardrails for admin review. Includes:
- POST /guardrails/register endpoint for team-scoped submissions
- Admin review endpoints (list/get/approve/reject submissions)
- Team Guardrails tab in the UI dashboard
- extra_headers support for forwarding client headers to guardrail APIs
- Prisma schema migration for status, submitted_at, reviewed_at fields
- Documentation for team-based guardrails and static/dynamic headers
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(guardrails): address review feedback - SSRF, silent failure, redundant query
- Validate api_base URL scheme (http/https only) and hostname in register_guardrail to prevent SSRF via team submissions
- Return warning field in approve response when in-memory initialization fails so admins know the guardrail won't work until next sync cycle
- Eliminate redundant DB query in list_guardrail_submissions by fetching all team guardrails once and deriving both filtered list and summary counts from the single result set
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(guardrails): add pending_review status guard to reject endpoint
Prevent rejecting already-active or already-rejected guardrails, which would create a DB/memory inconsistency (active in memory but rejected in DB). Now mirrors the approve endpoint's status check.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
12c4876891 |
Agents - assign tools (#22064)
* feat(proxy): add max_iterations limiter for agent session loops (#22058)
Adds a new proxy hook that enforces a per-session cap on the number of LLM calls an agentic loop can make. Callers send a session_id with each request, and the hook counts calls per session, returning 429 when the configured max_iterations limit is exceeded.
- Uses Redis Lua script for atomic increment (multi-instance safe)
- Falls back to in-memory cache when Redis unavailable
- Follows parallel_request_limiter_v3 pattern
- Configurable via key metadata: {"max_iterations": 25}
- Session counters auto-expire via TTL (default 1hr)
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* feat: add new code execution dataset
* feat(agent_endpoints/): allow giving agents keys
* fix: ui fixes
* feat: allow assigning mcp servers to agents
* fix: eliminate duplicate DB queries in MCP agent auth and N+1 in agent listing (#22110)
- Extract _get_agent_object_permission helper so _get_allowed_mcp_servers_for_agent and _get_agent_tool_permissions_for_server share a single DB fetch instead of each independently querying the same agent row (was 1+N queries per MCP request)
- Use include={"object_permission": True} on find_many in get_all_agents_from_db to eagerly load permissions in one query instead of N+1
- Use include={"object_permission": True} on create/update/find_unique in all agent CRUD operations, removing attach_object_permission_to_dict follow-up calls
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> |
||
|
|
e0ddb2a525 | fix: guard print_aggregate against empty latencies | ||
|
|
95d9514054 | fix: add auth headers and empty latencies guard to benchmark script | ||
|
|
94b76ea9ad |
feat: add network_mock transport for benchmarking proxy overhead without real API calls
Intercepts at httpx transport layer so the full proxy path (auth, routing,
OpenAI SDK, response transformation) is exercised with zero-latency responses.
Activated via `litellm_settings: { network_mock: true }` in proxy config.
|
||
|
|
7f81dea8b3 | Add custom auth header support and increase default prompt size to 100k chars (#19436) | ||
|
|
270b41b0f4 | Simplify file comments (#19382) | ||
|
|
0cd7763d5f |
Add health check scripts and parallel execution support (#19295)
- Add health_check_client.py for monitoring model availability
- Add health_check_client_README.md with usage documentation
- Add health_check_requirements.txt for dependencies
- Add run_parallel_health_checks.ps1 (PowerShell version)
- Add run_parallel_health_checks.sh (Bash version)
- Organize all scripts under scripts/health_check/ directory |
||
|
|
07fe9e8604 |
implement failopen option default to True on grayswan guardrail (#18266)
* implement failopen option default to True
* introduce a config to set the timeout limit (default to 30) |
||
|
|
b635f92d90 | Add benchmark_proxy_vs_provider.py script to scripts directory with usage examples (#17889) | ||
|
|
762b429d6c | enhance: create_litellm_branch tool to be more robust (#17874) | ||
|
|
a7ad8a36a4 |
chore: cleanup unused scripts and fix misplaced test file (#17611)
Remove scripts/ directory containing unused development/debug scripts:
- mock_ibm_guardrails_server.py
- test_groq_streaming_issue.py (debug for #12660)
- test_mock_ibm_guardrails.py
- update_readme_providers_table.py
Move misplaced test file to correct location:
- test_litellm/ -> tests/test_litellm/ (from PR #17221) |
||
|
|
c44e075b2d |
feat: add script to create branches with litellm_ prefix (#17606)
Add utility scripts to create branches with litellm_ prefix from contributor branches. This helps maintain consistent branch naming conventions for CI/CD.
- scripts/create_litellm_branch.sh (Bash for macOS/Linux)
- scripts/create_litellm_branch.ps1 (PowerShell for Windows)
Usage:
./scripts/create_litellm_branch.sh [source_branch] [new_branch_name]
./scripts/create_litellm_branch.ps1 [source_branch] [new_branch_name]
Features:
- Auto-prefixes branch names with litellm_
- Handles existing branches gracefully
- Validates branch names
- Supports local and remote source branches |
||
|
|
d35d9008c9 | Ensure detector-id is passed as header to IBM detector server (#16649) | ||
|
|
0428229032 |
[Docs] readme fixes add supported providers (#16109)
* add provider test
* docs readme.md
* docs providers
* order providers
* test_providers_alphabetically_ordered
* docs endpoint
* fix config
* add ENDPOINT_COLUMNS
* add provider endpoints
* docs fix |
||
|
|
ddacaf6c32 |
(feat) Organizations: allow org admins to create teams on UI + (feat) IBM Guardrails (#15924)
* fix(oldteams.tsx): allow org admin to create team on ui
* fix(oldteams.tsx): show org admin a dropdown of allowed orgs for team creation
* docs(access_control.md): cleanup doc
* feat(ibm_guardrails/): initial commit adding support for ibm guardrails on litellm
allows user to use self-hosted ibm guardrails
* feat(ibm_detector.py): working detector
* docs(ibm_guardrails.md): document new ibm guardrails
* fix: fix linting errors |
||
|
|
000ecad4e2 |
Fix Groq streaming ASCII encoding issue
Replace iter_lines()/aiter_lines() with iter_text()/aiter_text() using explicit UTF-8 encoding to handle non-ASCII characters like µ in streaming responses.
- Added utf8_iter_lines() and utf8_aiter_lines() helper functions
- Ensures proper UTF-8 decoding of streaming response content
- Added comprehensive tests for Unicode character handling
Fixes #12660 |
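The idea behind the commit above, decoding a streamed body as UTF-8 before splitting it into lines, can be sketched roughly as follows. This is an illustration only, not the helper added in the commit: the function name `iter_utf8_lines_from_chunks` is hypothetical, and it assumes the stream arrives as raw byte chunks rather than through the HTTP client's `iter_text()`.

```python
import codecs
from typing import Iterable, Iterator


def iter_utf8_lines_from_chunks(chunks: Iterable[bytes]) -> Iterator[str]:
    """Yield lines from a stream of byte chunks, always decoding as UTF-8.

    Hypothetical helper for illustration; the real utf8_iter_lines() in the
    commit may be implemented differently.
    """
    # An incremental decoder keeps a partial multi-byte character buffered
    # until the rest of it arrives in a later chunk.
    decoder = codecs.getincrementaldecoder("utf-8")(errors="replace")
    buffer = ""
    for chunk in chunks:
        buffer += decoder.decode(chunk)
        # Emit every complete line currently in the buffer.
        while "\n" in buffer:
            line, buffer = buffer.split("\n", 1)
            yield line.rstrip("\r")
    # Flush whatever the decoder still holds, then any trailing partial line.
    buffer += decoder.decode(b"", final=True)
    if buffer:
        yield buffer.rstrip("\r")


if __name__ == "__main__":
    # The two bytes of "µ" (0xC2 0xB5) are deliberately split across chunks.
    stream = [b"data: 5 \xc2", b"\xb5s\ndata: done\n"]
    for text_line in iter_utf8_lines_from_chunks(stream):
        print(text_line)
```

The incremental decoder is the point of the sketch: decoding each chunk on its own would break whenever a multi-byte character such as µ straddles a chunk boundary.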