Commit Graph

32 Commits

Author SHA1 Message Date
Yassin Kortam 950074eea2 fix: atomic TPM rate limit (#27001)
Co-authored-by: Yassin Kortam <yassinkortam@g.ucla.edu>
2026-05-05 16:58:07 -07:00
Krrish Dholakia b6fc75b3ce Merge branch 'litellm_internal_staging' into litellm_adaptive_routing 2026-04-20 15:28:08 -07:00
Krrish Dholakia 386f334fee Prompt Compression - add it to the proxy (#25729)
* refactor: new agentic loop event hook

simplifies how to create logic for tool based multi llm calls

* fix: compress - make it work on anthropic input as well

* fix(compress.py): working prompt compression for claude code

ensures claude code messages can run through proxy easily

* docs: add agentic loop hook guide

* docs: add agentic_loop_hook to sidebar

* fix: fix multiple arguments error

* fix: fix tool call loop for compression on streaming /v1/messages

* fix: fix linting errors

* fix: fix ci/cd errors

* feat(litellm_pre_call_utils.py): use claude code session for litellm session id

allows claude code logs to be stitched together, making it easy to know they were all part of the same conversation

* fix: suppress incorrect mypy warning rE: module

* revert: drop PR's changes to litellm/proxy/_experimental/out/

Restores the 34 HTML files under _experimental/out/ to their pre-PR
paths (X/index.html -> X.html). All renames are R100 (content
unchanged); no other files are touched.

* fix: address greptile review comments on PR #25729

- Skip ``kwargs["tools"] = []`` injection when compression is a no-op —
  Anthropic Messages rejects empty tool arrays on requests that did not
  originally declare tools.
- Move agentic-loop safety guards (fingerprint cycle / max depth) out of
  the per-callback try/except so they propagate instead of being swallowed
  by the generic exception handler. Extracted _check_agentic_loop_safety.
- Gate generic ``x-<vendor>-session-id`` capture behind the
  LITELLM_CAPTURE_VENDOR_SESSION_HEADERS env var (off by default) to
  preserve backwards compatibility; explicit x-litellm-* headers are
  unaffected.
- Fix monkeypatch target in pre-call-hook test to patch the actual
  module-level binding
  (litellm.integrations.compression_interception.handler.compress).
- Add regression tests for empty-tools skip and opt-in session capture.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* revert: drop LITELLM_CAPTURE_VENDOR_SESSION_HEADERS flag

Generic x-<vendor>-session-id header capture is a new feature and only
runs *after* the explicit x-litellm-trace-id / x-litellm-session-id
checks, so it does not change behavior for any existing caller that was
already using the LiteLLM headers — no backwards-incompatibility to gate.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* refactor(compress): replace input_type with CallTypes call_type

Drop the bespoke ``CompressionInputType`` literal and use the existing
``litellm.types.utils.CallTypes`` enum instead.  ``litellm.compress()``
now takes ``call_type: Union[CallTypes, str]`` (default
``CallTypes.completion``) — no new concept to learn, and the enum is
already the way the rest of the codebase talks about request shapes.

Supported values: ``completion`` / ``acompletion`` (OpenAI chat-completions
shape) and ``anthropic_messages`` (Anthropic structured content blocks).

Updated: compress(), the compression_interception handler, tests, docs,
and the two eval scripts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-20 15:08:00 -07:00
Krrish Dholakia 924fa6a3bc feat: commit new adaptive routing 2026-04-18 21:29:39 -07:00
Krrish Dholakia dd4a1d2be2 feat: add adaptive routing to litellm
allow model routing to improve based on conversation signals

ensures router is picking best model for task
2026-04-18 16:35:17 -07:00
Ishaan Jaffer e8461b5b97 style: run black formatter on files from main merge 2026-04-17 13:02:59 -07:00
ishaan-berri cb8fc480e6 Merge pull request #25732 from harish876/health-check-oom
Optimize database query to prevent OOM errors during health checks
2026-04-15 18:13:11 -07:00
harish876 d20c70f24c Optimize database query which fetches latest model_id, model_name pairs and dedupes them in memory.
Current fix includes
 - Updates test case
 - Optimized query with docstring. The change leverages deduplication and sorting logic from SQL
 - Added a bench script to differentiate peak memory usage before and after
2026-04-15 00:54:37 +00:00
ishaan-berri 0e43050a01 Merge pull request #25650 from BerriAI/litellm_dev_04_13_2026_p1
feat: add litellm.compress() — BM25-based prompt compression with ret…
2026-04-14 12:24:47 -07:00
Krrish Dholakia 26c7412339 feat: add litellm.compress() — BM25-based prompt compression with retrieval tool (#25637)
* feat: add litellm.compress() for BM25-based context compression

Adds a compress() utility that reduces context size for LLM calls using
BM25 relevance scoring (with optional semantic embeddings via
litellm.embedding()). Messages below a token threshold pass through
unchanged; messages above are scored, ranked, and the lowest-relevance
ones replaced with stubs. Originals are cached and a retrieval tool is
injected so the model can recover dropped content on demand.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(compress): truncate high-scoring messages instead of fully stubbing them

When a relevant message was too large to fit in the token budget it was
replaced with a stub, leaving the LLM with no real content to work with.
Now the highest-scoring overflow message is truncated (first 70% + last 30%
of words) to fill the remaining budget, so the LLM always receives actual
content rather than just a retrieval pointer.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(bm25): add prefix expansion so query terms match inflected doc tokens

"cook" now matches "cooking", "auth" matches "authentication", etc.
Without this, short query terms scored 0 against longer inflected forms
in documents, causing the wrong message to be kept.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* test: add routing correctness test and eval harness for litellm.compress()

- test_simple_compression: parametrized test verifying BM25 routes the
  right message based on query ("How to cook?" keeps cooking, "Fix auth"
  keeps auth content)
- eval_compression.py: end-to-end eval harness comparing baseline vs
  compressed model performance on HumanEval-style coding problems

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(eval): add SWE-bench Lite compression eval harness

Uses princeton-nlp/SWE-bench_Lite_bm25_27K which bundles ~27k tokens of
BM25-retrieved repo context per problem — large enough to meaningfully
stress litellm.compress() without Docker or GitHub API calls.

Proxy eval metrics (no test runner needed):
  - has_diff: model produced a valid unified diff
  - file_overlap: fraction of gold-patch files in generated patch
  - exact_file_match: generated patch touches exactly the right files

Run: python tests/eval_swe_bench.py --model gpt-4o --problems 10

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(eval): robust dataset loading + sys.path fix for worktree imports

- Add HuggingFace API fallback so the SWE-bench loader doesn't need
  the `datasets` library (avoids pyarrow/numpy binary compat issues)
- Insert repo root into sys.path so compression module resolves
  from worktrees
- Use direct import of litellm_compress to avoid __getattr__ issues

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* improve compression quality: line-based truncation, multi-message budget, 70% default target

- Switch truncate_message from word-based to line-based splitting to
  preserve code structure (function boundaries, indentation)
- Allow multiple messages to be truncated instead of burning entire
  budget on one overflow message
- Raise default compression target from 50% to 70% of trigger for
  better quality/cost tradeoff
- Add --compression-target CLI arg to SWE-bench eval harness
- Move tests to canonical locations (tests/test_litellm/, scripts/)
- Add docs page and sidebar entries for compress()

Eval results (5 problems, Opus, trigger=10k):
  Hunk overlap delta improved from -0.417 to -0.221
  Content similarity now matches baseline (+0.006)
  Cost savings: 72%

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add SWE-bench performance results to compress() docs

Include benchmark table from Opus eval (5 problems, trigger=10k)
showing 72% cost savings with file-level quality fully preserved.
Add metric explanations and eval runner examples.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(eval): use tolerance-based hunk overlap metric

The exact line-number matching was too brittle — LLM-generated patches
often target the right code region but with slightly offset line numbers.
Switch to hunk-level overlap with a 10-line tolerance window so nearby
edits count as matches. This better reflects actual patch quality.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add compression_interception callback for LiteLLM Proxy

Add a proxy callback that automatically compresses incoming /v1/messages
payloads above a configurable token threshold, runs the retrieval tool
loop server-side, and returns the final response. This brings compress()
support to proxy deployments (e.g. Claude Code via /v1/messages).

- New callback: litellm/integrations/compression_interception/
- Proxy config: compression_interception_params in litellm_settings
- Support for input_type param in compress() (openai vs anthropic)
- Docs: proxy setup instructions with YAML config example
- Tests: 139-line unit test suite for the interception handler

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Revert "feat: add compression_interception callback for LiteLLM Proxy"

This reverts commit 72bd5cb152ca1df07f14a14e14a2816e188874a8.

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-13 12:23:54 -07:00
stuxf a6c30b30bf build: migrate packaging, CI, and Docker from Poetry to uv (#25007)
* build: migrate packaging metadata to uv

* ci: move automation and local tooling to uv

* docker: migrate image builds and runtime setup to uv

* docs: update install and deployment guidance for uv

* chore: align auxiliary scripts and tests with uv

* test: harden test_litellm isolation

* fix: keep release and health check images self-contained

* build: pin uv tooling and health check deps

* test: isolate bedrock image request formatting from suite state

* test: cover sandbox executor requirements flow

* ci: fix circleci no-op command steps

* ci: fix circleci publish workflow parsing

* fix: stabilize remaining uv migration CI checks

* ci: increase matrix test timeout headroom

* fix: restore published docker and license coverage

* fix: restore proxy runtime build parity

* fix: restore proxy extras parity and venv migrations

* ci: persist uv path across circleci steps

* fix: keep psycopg binary in default test env

* docker: preserve prisma cache across stages

* test: run local proxy checks through uv python

* build: restore runtime deps moved into ci

* build: refresh uv lock after upstream merge

* fix: restore module import in test_check_migration after merge

The conflict resolution imported only the function but the test body
references check_migration as a module throughout.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: revert dependency promotions, remove nodejs-wheel-binaries, fix Docker layer caching

- Move google-generativeai, Pillow, tenacity back to ci group (they are
  lazily imported and bloat the base SDK install needlessly)
- Remove nodejs-wheel-binaries from extra_proxy and proxy-dev (redundant
  in Docker where system Node.js is already installed via apk)
- Remove all nodejs-wheel node replacement and venv npm patching blocks
  from Dockerfiles since the wheel is no longer installed
- Add --no-default-groups to CodSpeed benchmark workflow so the benchmark
  environment matches the old minimal pip install footprint
- Apply standard uv two-phase Docker pattern: copy metadata first, install
  deps (cached layer), then copy source and install project
- Replace CircleCI enterprise no-op with proper uv sync command

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: regenerate uv.lock after removing nodejs-wheel-binaries

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(ci): use cache/restore instead of cache to prevent cache poisoning

The old workflow used actions/cache/restore (read-only). The uv migration
changed it to actions/cache (read-write), which zizmor flags as a cache
poisoning risk. Restore the safer read-only variant.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(ci): disable setup-uv built-in cache to silence cache-poisoning alert

The setup-uv action enables caching by default, which zizmor flags as a
cache poisoning risk. Disable it since we already use a read-only
cache/restore step.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(ci): disable setup-uv cache in publish workflow

Silences zizmor cache-poisoning alert. Publishing workflow runs
infrequently on protected branches so caching adds no real benefit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(test): remove duplicate verbose_logger mock in test_check_migration

The logger was patched twice — first via mocker.patch() then via
mocker.patch.object(autospec=True). The second call fails because
autospec cannot inspect an already-mocked attribute. Remove the
redundant first patch.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(ci): free disk space before Docker build in test-server-root-path

The Dockerfile.non_root build ran out of disk on the CI runner. Remove
Android SDK, .NET, Boost, and GHC toolchains (~12GB) to free space.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 11:46:23 -07:00
Yuneng Jiang 51af6fedb3 [Infra] Harden supply chain: remove unused scripts, add pip binary-only install
Remove ci_cd/publish-proxy-extras.sh (dead, unreferenced PyPI publish script)
and .pre-commit-config.yaml (pulls external repos from GitHub on git commit).
Add --only-binary :all: to scripts/install.sh to prevent execution of
malicious setup.py during pip install.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 14:13:57 -07:00
Yuneng Jiang 5f63873dca [Infra] Pin all Docker build dependencies to exact versions
Pin every dependency across all Docker builds so upgrades are intentional.
Verified by building all 3 production images and diffing pip freeze against
known-good v1.83.0-nightly baselines — zero version drift.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 00:05:39 -07:00
Ishaan Jaff 8e61b32b8e [Staging] - Ishaan March 17th (#23903)
* feat(xai): add grok-4.20 beta 2 models with pricing (#23900)

Add three grok-4.20 beta 2 model variants from xAI:
- grok-4.20-multi-agent-beta-0309 (reasoning + multi-agent)
- grok-4.20-beta-0309-reasoning (reasoning)
- grok-4.20-beta-0309-non-reasoning

Pricing (from https://docs.x.ai/docs/models):
- Input: $2.00/1M tokens ($0.20/1M cached)
- Output: $6.00/1M tokens
- Context: 2M tokens

All variants support vision, function calling, tool choice, and web search.
Closes LIT-2171

* docs: add Quick Install section for litellm --setup wizard (#23905)

* docs: add Quick Install section for litellm --setup wizard

* docs: clarify setup wizard is for local/beginner use

* feat(setup): interactive setup wizard + install.sh (#23644)

* feat(setup): add interactive setup wizard + install.sh

Adds `litellm --setup` — a Claude Code-style TUI onboarding wizard that
guides users through provider selection, API key entry, and proxy config
generation, then optionally starts the proxy immediately.

- litellm/setup_wizard.py: wizard with ASCII art, numbered provider menu
  (OpenAI, Anthropic, Azure, Gemini, Bedrock, Ollama), API key prompts,
  port/master-key config, and litellm_config.yaml generation
- litellm/proxy/proxy_cli.py: adds --setup flag that invokes the wizard
- scripts/install.sh: curl-installable script (detect OS/Python, pip
  install litellm[proxy], launch wizard)

Usage:
  curl -fsSL https://raw.githubusercontent.com/BerriAI/litellm/main/scripts/install.sh | sh
  litellm --setup

* fix(install.sh): remove orange color, add LITELLM_BRANCH env var for branch installs

* fix(install.sh): install from git branch so --setup is available for QA

* fix(install.sh): remove stale LITELLM_BRANCH reference that caused unbound variable error

* fix(install.sh): force-reinstall from git to bypass cached PyPI version

* fix(install.sh): show pip progress bar during install

* fix(install.sh): always launch wizard via $PYTHON_BIN -m litellm, not PATH binary

* fix(install.sh): use litellm.proxy.proxy_cli module (no __main__.py exists)

* fix(install.sh): suppress RuntimeWarning from module invocation

* fix(install.sh): use Python bin-dir litellm binary to avoid CWD sys.path shadowing

* fix(install.sh): use sysconfig.get_path('scripts') to find pip-installed litellm binary

* fix(install.sh): redirect stdin from /dev/tty on exec so wizard gets terminal, not exhausted pipe

* fix(install.sh): warn about git clone duration, drop --no-cache-dir so re-runs are faster

* feat(setup_wizard): arrow-key selector, updated model names

* fix(setup_wizard): use sysconfig binary to start proxy, not python -m litellm

* feat(setup_wizard): credential validation after key entry + clear next-steps after proxy start

* style(install.sh): show git clone warning in blue

* refactor(setup_wizard): class with static methods, use check_valid_key from litellm.utils

* address greptile review: fix yaml escaping, port validation, display name collisions, tests

- setup_wizard.py: add _yaml_escape() for safe YAML embedding of API keys
- setup_wizard.py: add _styled_input() with readline ANSI ignore markers
- setup_wizard.py: change DIVIDER to _divider() fn to avoid import-time color capture
- setup_wizard.py: validate port range 1-65535, initialize before loop
- setup_wizard.py: qualify azure display names (azure-gpt-4o) to avoid collision with openai
- setup_wizard.py: work on env_copy in _build_config to avoid mutating caller's dict
- setup_wizard.py: skip model_list entries for providers with no credentials
- setup_wizard.py: prompt for azure deployment name
- setup_wizard.py: wrap os.execlp in try/except with friendly fallback
- setup_wizard.py: wrap config write in try/except OSError
- setup_wizard.py: fix _validate_and_report to use two print lines (no \r overwrite)
- setup_wizard.py: add .gitignore tip next to key storage notice
- setup_wizard.py: fix run_setup_wizard() return type annotation to None
- scripts/install.sh: drop pipefail (not supported by dash on Ubuntu when invoked as sh)
- scripts/install.sh: use litellm[proxy] from PyPI (not hardcoded dev branch)
- scripts/install.sh: guard /dev/tty read with -r check for Docker/CI compat
- scripts/install.sh: remove --force-reinstall to avoid downgrading dependencies
- tests/test_litellm/test_setup_wizard.py: 13 unit tests for _build_config and _yaml_escape

* style: black format setup_wizard.py

* fix: address remaining greptile issues - Windows compat, YAML quoting, credential flow

- guard termios/tty imports with try/except ImportError for Windows compat
- quote master_key as YAML double-quoted scalar (same as env vars)
- remove unused port param from _build_config signature
- _validate_and_report now returns the final key so re-entered creds are stored
- add test for master_key YAML quoting

* fix: add --port to suggested command, guard /dev/tty exec in install.sh

* fix: quote api_base in YAML, skip azure if no deployment, only redraw on state change

* fix: address greptile review comments

- _yaml_escape: add control character escaping (\n, \r, \t)
- test: fix tautological assertion in test_build_config_azure_no_deployment_skipped
- test: add tests for control character escaping in _yaml_escape

* feat(ui): remove Chat UI page link and banner from sidebar and playground (#23908)

* feat(guardrails): MCPJWTSigner - built-in guardrail for zero trust MCP auth (#23897)

* Allow pre_mcp_call guardrail hooks to mutate outbound MCP headers

* Enhance MCPServerManager to support hook-modified arguments and extra headers. Update tests to validate argument mutation and header injection behavior, including warnings for OpenAPI-backed servers when headers are present.

* Refactor MCPServerManager to raise HTTPException for extra headers in OpenAPI-backed servers. Update tests to reflect this change, ensuring proper exception handling instead of logging warnings.

* Allow pre_mcp_call guardrail hooks to mutate outbound MCP headers

* Enhance MCPServerManager to support hook-modified arguments and extra headers. Update tests to validate argument mutation and header injection behavior, including warnings for OpenAPI-backed servers when headers are present.

* Refactor MCPServerManager to raise HTTPException for extra headers in OpenAPI-backed servers. Update tests to reflect this change, ensuring proper exception handling instead of logging warnings.

* feat(guardrails): add MCPJWTSigner built-in guardrail for zero trust MCP auth

Signs outbound MCP tool calls with a LiteLLM-issued RS256 JWT so MCP servers
can trust a single signing authority instead of every upstream IdP.

Enable in config.yaml:
  guardrails:
    - guardrail_name: mcp-jwt-signer
      litellm_params:
        guardrail: mcp_jwt_signer
        mode: pre_mcp_call
        default_on: true

JWT carries sub (user_id), act.sub (team_id, RFC 8693), tool-level scope, iss,
aud, iat/exp/nbf. RSA-2048 keypair auto-generated at startup unless
MCP_JWT_SIGNING_KEY env var is set.

Adds /.well-known/jwks.json endpoint and jwks_uri to /.well-known/openid-configuration
so MCP servers can verify LiteLLM-issued tokens via OIDC discovery.

* Update MCPServerManager to raise HTTPException with status code 400 for extra headers in OpenAPI-backed servers. Adjust tests to verify the correct status code and exception message.

* fix: address P1 issues in MCPJWTSigner

- OpenAPI servers: warn + skip header injection instead of 500
- JWKS Cache-Control: 5min for auto-generated keys, 1h for persistent
- sub claim: fallback to apikey:{token_hash} for anonymous callers
- ttl_seconds: validate > 0 at init time

* docs: add MCP zero trust auth guide with architecture diagram

* docs: add FastMCP JWT verification guide to zero trust doc

* fix: address remaining Greptile review issues (round 2)

- mcp_server_manager: warn when hook Authorization overwrites existing header
- __init__: remove _mcp_jwt_signer_instance from __all__ (private internal)
- discoverable_endpoints: copy dict instead of mutating in-place on OIDC augmentation
- test docstring: reflect warn-and-continue behavior for OpenAPI servers
- test: update scope assertions for least-privilege (no mcp:tools/list on tool-call JWTs)

* fix: address Greptile round 3 feedback

- initialize_guardrail: validate mode='pre_mcp_call' at init time — misconfigured
  mode silently bypasses JWT injection, which is a zero-trust bypass
- _build_claims: remove duplicate inline 'import re' (module-level import already present)
- _types.py: add TODO comment explaining jwt_claims is forward-compat plumbing
  for a follow-up PR that will forward upstream IdP claims into outbound MCP JWTs

* feat(mcp_jwt_signer): add verify+re-sign, claim ops, two-token model, configurable scopes

Addresses all missing pieces from the scoping doc review:

FR-5 (Verify + re-sign): MCPJWTSigner now accepts access_token_discovery_uri
and token_introspection_endpoint.  When set, the incoming Bearer token is
extracted from raw_headers (threaded through pre_call_tool_check), verified
against the IdP's JWKS (JWT) or introspected (opaque), and only re-signed if
valid.  Falls back to user_api_key_dict.jwt_claims for LiteLLM JWT-auth mode.

FR-12 (Configurable end-user identity mapping): end_user_claim_sources
ordered list drives sub resolution — sources: token:<claim>, litellm:user_id,
litellm:email, litellm:end_user_id, litellm:team_id.

FR-13 (Claim operations): add_claims (insert-if-absent), set_claims (always
override), remove_claims (delete) applied in that order.

FR-14 (Two-token model): channel_token_audience + channel_token_ttl issue a
second JWT injected as x-mcp-channel-token: Bearer <token>.

FR-15 (Incoming claim validation): required_claims raises HTTP 403 when any
listed claim is absent; optional_claims passes listed claims from verified
token into the outbound JWT.

FR-9 (Debug headers): debug_headers: true emits x-litellm-mcp-debug with kid,
sub, iss, exp, scope.

FR-10 (Configurable scopes): allowed_scopes replaces auto-generation.  Also
fixed: tool-call JWTs no longer grant mcp:tools/list (overpermission).

P1 fixes:
- proxy/utils.py: _convert_mcp_hook_response_to_kwargs merges rather than
  replaces extra_headers, preserving headers from prior guardrails.
- mcp_server_manager.py: warns when hook injects Authorization alongside a
  server-configured authentication_token (previously silent).
- mcp_server_manager.py: pre_call_tool_check now accepts raw_headers and
  extracts incoming_bearer_token so FR-5 verification has the raw token.
- proxy/utils.py: remove stray inline import inspect inside loop (pre-existing
  lint error, now cleaned up).

Tests: 43 passing (28 new tests covering all FR flags + P1 fixes).

* feat(mcp_jwt_signer): add verify+re-sign, claim ops, two-token model, configurable scopes (core)

Remaining files from the FR implementation:

mcp_jwt_signer.py — full rewrite with all new params:
  FR-5:  access_token_discovery_uri, token_introspection_endpoint,
         verify_issuer, verify_audience + _verify_incoming_jwt(),
         _introspect_opaque_token()
  FR-12: end_user_claim_sources ordered resolution chain
  FR-13: add_claims, set_claims, remove_claims
  FR-14: channel_token_audience, channel_token_ttl → x-mcp-channel-token
  FR-15: required_claims (raises 403), optional_claims (passthrough)
  FR-9:  debug_headers → x-litellm-mcp-debug
  FR-10: allowed_scopes; tool-call JWTs no longer over-grant tools/list

mcp_server_manager.py:
  - pre_call_tool_check gains raw_headers param to extract incoming_bearer_token
  - Silent Authorization override warning fixed: now fires when server has
    authentication_token AND hook injects Authorization

tests/test_mcp_jwt_signer.py:
  28 new tests covering all FR flags + P1 fixes (43 total, all passing)

* fix(mcp_jwt_signer): address pre-landing review issues

- Remove stale TODO comment on UserAPIKeyAuth.jwt_claims — the field is
  already populated and consumed by MCPJWTSigner in the same PR
- Fix _get_oidc_discovery to only cache the OIDC discovery doc when
  jwks_uri is present; a malformed/empty doc now retries on the next
  request instead of being permanently cached until proxy restart
- Add FR-5 test coverage for _fetch_jwks (cache hit/miss),
  _get_oidc_discovery (cache/no-cache on bad doc), _verify_incoming_jwt
  (valid token, expired token), _introspect_opaque_token (active,
  inactive, no endpoint), and the end-to-end 401 hook path — 53 tests
  total, all passing

* docs(mcp_zero_trust): rewrite as use-case guide covering all new JWT signer features

Add scenario-driven sections for each new config area:
- Verify+re-sign with Okta/Azure AD (access_token_discovery_uri,
  end_user_claim_sources, token_introspection_endpoint)
- Enforcing caller attributes with required_claims / optional_claims
- Adding metadata via add_claims / set_claims / remove_claims
- Two-token model for AWS Bedrock AgentCore Gateway
  (channel_token_audience / channel_token_ttl)
- Controlling scopes with allowed_scopes
- Debugging JWT rejections with debug_headers

Update JWT claims table to reflect configurable sub (end_user_claim_sources)

* fix(mcp_jwt_signer): wire all config.yaml params through initialize_guardrail

The factory was only passing issuer/audience/ttl_seconds to MCPJWTSigner.
All FR-5/9/10/12/13/14/15 params (access_token_discovery_uri,
end_user_claim_sources, add/set/remove_claims, channel_token_audience,
required/optional_claims, debug_headers, allowed_scopes, etc.) were
silently dropped, making every advertised advanced feature non-functional
when loaded from config.yaml.

Add regression test that asserts every param is wired through correctly.

* docs(mcp_zero_trust): add hero image

* docs(mcp_zero_trust): apply Linear-style edits

- Lead with the problem (unsigned direct calls bypass access controls)
- Shorter statement section headers instead of question-form headers
- Move diagram/OIDC discovery block after the reader is bought in
- Add 'read further only if you need to' callout after basic setup
- Two-token section now opens from the user problem not product jargon
- Add concrete 403 error response example in required_claims section
- Debug section opens from the symptom (MCP server returning 401)
- Lowercase claims reference header for consistency

* fix(mcp_jwt_signer): fix algorithm confusion attack + add OIDC discovery 24h TTL

- Remove alg from unverified JWT header; use signing_jwk.algorithm_name from JWKS key instead.
  Reading alg from attacker-controlled headers enables alg:none / HS256 confusion attacks.
- Add _oidc_discovery_fetched_at timestamp and _OIDC_DISCOVERY_TTL = 86400 (24h).
  Without a TTL the cached discovery doc never refreshes, so IdP key rotation is invisible.

---------

Co-authored-by: Noah Nistler <60981020+noahnistler@users.noreply.github.com>

* fix(ci): stabilize CI - formatting, type errors, test polling, security CVEs, router bug, batch resolution

Fix 1: Run Black formatter on 35 files
Fix 2: Fix MyPy type errors:
  - setup_wizard.py: add type annotation for 'selected' set variable
  - user_api_key_auth.py: remove redundant type annotation on jwt_claims reassignment
Fix 3: Fix spend accuracy test burst 2 polling to wait for expected total
  spend instead of just 'any increase' from burst 2
Fix 4: Bump Next.js 16.1.6 -> 16.1.7 to fix CVE-2026-27978, CVE-2026-27979,
  CVE-2026-27980, CVE-2026-29057
Fix 5: Fix router _pre_call_checks model variable being overwritten inside
  loop, causing wrong model lookups on subsequent deployments. Use local
  _deployment_model variable instead.
Fix 6: Add missing resolve_output_file_ids_to_unified call in batch retrieve
  non-terminal-to-terminal path (matching the terminal path behavior)

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* chore: regenerate poetry.lock to sync with pyproject.toml

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: format merged files from main and regenerate poetry.lock

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(mypy): annotate jwt_claims as Optional[dict] to fix type incompatibility

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(ci): update router region test to use gpt-4.1-mini (fix flaky model lookup)

Replace deprecated gpt-3.5-turbo-1106 with gpt-4.1-mini + mock_response in
test_router_region_pre_call_check, following the same pattern used in commit
717d37cc5b for test_router_context_window_check_pre_call_check_out_group.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* ci: retry flaky logging_testing (async event loop race condition)

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(ci): aggregate all mock calls in langfuse e2e test to fix race condition

The _verify_langfuse_call helper only inspected the last mock call
(mock_post.call_args), but the Langfuse SDK may split trace-create and
generation-create events across separate HTTP flush cycles. This caused
an IndexError when the last call's batch contained only one event type.

Fix: iterate over mock_post.call_args_list to collect batch items from
ALL calls. Also add a safety assertion after filtering by trace_id and
mark all langfuse e2e tests with @pytest.mark.flaky(retries=3) as an
extra safety net for any residual timing issues.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(ci): black formatting + update OpenAPI compliance tests for spec changes

- Apply Black 26.x formatting to litellm_logging.py (parenthesized style)
- Update test_input_types_match_spec to follow $ref to InteractionsInput schema
  (Google updated their OpenAPI spec to use $ref instead of inline oneOf)
- Update test_content_schema_uses_discriminator to handle discriminator without
  explicit mapping (Google removed the mapping key from Content discriminator)

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* revert: undo incorrect Black 26.x formatting on litellm_logging.py

The file was correctly formatted for Black 23.12.1 (the version pinned
in pyproject.toml). The previous commit applied Black 26.x formatting
which was incompatible with the CI's Black version.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(ci): deduplicate and sort langfuse batch events after aggregation

The Langfuse SDK may send the same event (e.g., trace-create) in
multiple flush cycles, causing duplicates when we aggregate from all
mock calls. After filtering by trace_id, deduplicate by keeping only
the first event of each type, then sort to ensure trace-create is at
index 0 and generation-create at index 1.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

---------

Co-authored-by: Noah Nistler <60981020+noahnistler@users.noreply.github.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
2026-03-18 15:09:01 -07:00
Ishaan Jaff 1f412bc6d8 [Feat] Add Tool Policies for AI Gateway (#22732)
* fix: fix ui render

* fix: fix minor bugs

* refactor: use prisma functions instead of raw sql (safer)

* fix(add-new-tiles-to-tool-policies): allow developer to see what's available

* feat: ensure tool allowlist runs correctly for tool names + mcp's

* refactor: more ui improvements

* feat: working key tool blocking

* feat(tools): show tool logs

* refactor: backend code improvements

* refactor: improve log viewer for tools

* fix: address PR review feedback for tool access control

- Add missing blocked_tools column to root schema.prisma (schema drift)
- Invalidate ToolPolicyRegistry after policy mutations so changes take effect immediately
- Remove dead code: unused get_effective_policies, get_tool_policies_cached, and helpers

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: race condition in permission resolution and remove duplicate allowlist check

- Use atomic update_many with object_permission_id=None to prevent concurrent
  requests from creating orphaned permission rows and losing tool blocks
- Remove duplicate allowed_tools enforcement from guardrail (already enforced
  in auth layer via check_tools_allowlist)
- Move inline uuid import to module level

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* update to account for  userAgent

* UI - Add ToolDetails

* input/output policy

* LiteLLM_PolicyAttachmentTable

* LiteLLM_PolicyAttachmentTable

* fix: add _enqueue_tool_registry_upsert

* fix: tool mgmt endpoints

* tool mgmt endpoints

* Update tests/test_litellm/proxy/db/test_tool_registry_writer.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Update tests/test_litellm/proxy/db/test_tool_registry_writer.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Update tests/test_litellm/proxy/db/test_tool_registry_writer.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* fix: sync root schema.prisma and fix test_tool_registry_writer for input/output policy

- Migrate root schema.prisma LiteLLM_ToolTable from call_policy to
  input_policy/output_policy, add missing user_agent and last_used_at columns
  (now consistent with litellm/proxy/schema.prisma and litellm-proxy-extras)
- Fix SpendLogToolIndex comment across all three schema files
- Fix all call_policy references in test_tool_registry_writer.py:
  swapped update_tool_policy arguments, wrong get_tools_by_names return type
  assertions, _mock_tool_row setting call_policy instead of input_policy

Addresses Greptile review feedback on PR #22732.

Made-with: Cursor

---------

Co-authored-by: Krrish Dholakia <krrishdholakia@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
2026-03-03 20:22:20 -08:00
Krish Dholakia 67f90254ed feat(guardrails): team-based guardrail registration and approval workflow (#22459)
* feat(guardrails): team-based guardrail registration and approval workflow

Add team-based guardrail submission system where teams can register
Generic Guardrail API guardrails for admin review. Includes:

- POST /guardrails/register endpoint for team-scoped submissions
- Admin review endpoints (list/get/approve/reject submissions)
- Team Guardrails tab in the UI dashboard
- extra_headers support for forwarding client headers to guardrail APIs
- Prisma schema migration for status, submitted_at, reviewed_at fields
- Documentation for team-based guardrails and static/dynamic headers

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(guardrails): address review feedback - SSRF, silent failure, redundant query

- Validate api_base URL scheme (http/https only) and hostname in
  register_guardrail to prevent SSRF via team submissions
- Return warning field in approve response when in-memory initialization
  fails so admins know the guardrail won't work until next sync cycle
- Eliminate redundant DB query in list_guardrail_submissions by fetching
  all team guardrails once and deriving both filtered list and summary
  counts from the single result set

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(guardrails): add pending_review status guard to reject endpoint

Prevent rejecting already-active or already-rejected guardrails, which
would create a DB/memory inconsistency (active in memory but rejected
in DB). Now mirrors the approve endpoint's status check.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 22:06:49 -08:00
Krish Dholakia 12c4876891 Agents - assign tools (#22064)
* feat(proxy): add max_iterations limiter for agent session loops (#22058)

Adds a new proxy hook that enforces a per-session cap on the number of
LLM calls an agentic loop can make. Callers send a session_id with each
request, and the hook counts calls per session, returning 429 when the
configured max_iterations limit is exceeded.

- Uses Redis Lua script for atomic increment (multi-instance safe)
- Falls back to in-memory cache when Redis unavailable
- Follows parallel_request_limiter_v3 pattern
- Configurable via key metadata: {"max_iterations": 25}
- Session counters auto-expire via TTL (default 1hr)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add new code execution dataset

* feat(agent_endpoints/): allow giving agents keys

* fix: ui fixes

* feat: allow assigning mcp servers to agents

* fix: eliminate duplicate DB queries in MCP agent auth and N+1 in agent listing (#22110)

- Extract _get_agent_object_permission helper so _get_allowed_mcp_servers_for_agent
  and _get_agent_tool_permissions_for_server share a single DB fetch instead of
  each independently querying the same agent row (was 1+N queries per MCP request)
- Use include={"object_permission": True} on find_many in get_all_agents_from_db
  to eagerly load permissions in one query instead of N+1
- Use include={"object_permission": True} on create/update/find_unique in all
  agent CRUD operations, removing attach_object_permission_to_dict follow-up calls

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-25 11:44:30 -08:00
Ryan Crabbe e0ddb2a525 fix: guard print_aggregate against empty latencies 2026-02-23 10:06:10 -08:00
Ryan Crabbe 95d9514054 fix: add auth headers and empty latencies guard to benchmark script 2026-02-23 09:58:40 -08:00
Ryan Crabbe 94b76ea9ad feat: add network_mock transport for benchmarking proxy overhead without real API calls
Intercepts at httpx transport layer so the full proxy path (auth, routing,
OpenAI SDK, response transformation) is exercised with zero-latency responses.
Activated via `litellm_settings: { network_mock: true }` in proxy config.
2026-02-21 17:52:39 -08:00
Alexsander Hamir 7f81dea8b3 Add custom auth header support and increase default prompt size to 100k chars (#19436) 2026-01-20 13:25:12 -08:00
Alexsander Hamir 270b41b0f4 Simplify file comments (#19382) 2026-01-19 17:01:38 -08:00
Alexsander Hamir 0cd7763d5f Add health check scripts and parallel execution support (#19295)
- Add health_check_client.py for monitoring model availability
- Add health_check_client_README.md with usage documentation
- Add health_check_requirements.txt for dependencies
- Add run_parallel_health_checks.ps1 (PowerShell version)
- Add run_parallel_health_checks.sh (Bash version)
- Organize all scripts under scripts/health_check/ directory
2026-01-19 08:38:38 -08:00
Xiaohan Fu 07fe9e8604 implement failopen option default to True on grayswan guardrail (#18266)
* implement failopen option default to True

* introduce a config to set the timeout limit (default to 30)
2026-01-06 15:17:05 +05:30
Alexsander Hamir b635f92d90 Add benchmark_proxy_vs_provider.py script to scripts directory with usage examples (#17889) 2025-12-12 11:26:34 -08:00
Alexsander Hamir 762b429d6c enhance: create_litellm_branch tool to be more robust (#17874) 2025-12-12 05:35:50 -08:00
Cesar Garcia a7ad8a36a4 chore: cleanup unused scripts and fix misplaced test file (#17611)
Remove scripts/ directory containing unused development/debug scripts:
- mock_ibm_guardrails_server.py
- test_groq_streaming_issue.py (debug for #12660)
- test_mock_ibm_guardrails.py
- update_readme_providers_table.py

Move misplaced test file to correct location:
- test_litellm/ -> tests/test_litellm/ (from PR #17221)
2025-12-08 19:00:55 -08:00
Alexsander Hamir c44e075b2d feat: add script to create branches with litellm_ prefix (#17606)
Add utility scripts to create branches with litellm_ prefix from contributor branches.
This helps maintain consistent branch naming conventions for CI/CD.

- scripts/create_litellm_branch.sh (Bash for macOS/Linux)
- scripts/create_litellm_branch.ps1 (PowerShell for Windows)

Usage:
  ./scripts/create_litellm_branch.sh [source_branch] [new_branch_name]
  ./scripts/create_litellm_branch.ps1 [source_branch] [new_branch_name]

Features:
- Auto-prefixes branch names with litellm_
- Handles existing branches gracefully
- Validates branch names
- Supports local and remote source branches
2025-12-06 10:41:39 -08:00
Rob Geada d35d9008c9 Ensure detector-id is passed as header to IBM detector server (#16649) 2025-11-14 19:35:49 -08:00
Ishaan Jaff 0428229032 [Docs] readme fixes add supported providers (#16109)
* add provider test

* docs readme.md

* docs providers

* order providers

* test_providers_alphabetically_ordered

* docs endpoint

* fix config

* add ENDPOINT_COLUMNS

* add provider endpoints

* docs fix
2025-10-30 17:23:20 -07:00
Krish Dholakia ddacaf6c32 (feat) Organizations: allow org admins to create teams on UI + (feat) IBM Guardrails (#15924)
* fix(oldteams.tsx): allow org admin to create team on ui

* fix(oldteams.tsx): show org admin a dropdown of allowed orgs for team creation

* docs(access_control.md): cleanup doc

* feat(ibm_guardrails/): initial commit adding support for ibm guardrails on litellm

allows user to use self-hosted ibm guardrails

* feat(ibm_detector.py): working detector

* docs(ibm_guardrails.md): document new ibm guardrails

* fix: fix linting errors
2025-10-25 11:13:39 -07:00
Cole McIntosh 000ecad4e2 Fix Groq streaming ASCII encoding issue
Replace iter_lines()/aiter_lines() with iter_text()/aiter_text() using explicit
UTF-8 encoding to handle non-ASCII characters like µ in streaming responses.

- Added utf8_iter_lines() and utf8_aiter_lines() helper functions
- Ensures proper UTF-8 decoding of streaming response content
- Added comprehensive tests for Unicode character handling

Fixes #12660
2025-08-16 08:32:22 -05:00