The previous v1.83.3 changelog was generated against v1.83.0-nightly and
missed ~3 weeks of work. This regenerates it against the previous stable
release and restructures the LLM API Endpoints section to group by API
type (Responses, Batch, Count Tokens, Video Generation, Pass-Through,
etc.) matching the convention used in v1.82.3, v1.82.0, and v1.81.14.
Adds ~25 previously uncited PRs, cross-section duplications for
cross-cutting changes, and a verified first-time-contributors list.
- Add 8 content PRs that merged directly to the release branch outside the listed staging PRs: #23769 (Ramp callback), #25252 (JWT OAuth2 override), #25254 (AWS GovCloud mode), #25258 (batch-limit cleanup), #25334 (router custom_llm_provider), #25345 (Triton embeddings), #25347 (tag-based routing), #25358 (Baseten pricing attribution)
- Add @kedarthakkar to new contributors (first-ever PR via #23769)
- Update RELEASE_NOTES_GENERATION_INSTRUCTIONS: require walking git log range between release tags in addition to staging PRs, and verify new-contributor status per author rather than trusting the GH release body floor
* feat: add litellm.compress() for BM25-based context compression
Adds a compress() utility that reduces context size for LLM calls using
BM25 relevance scoring (with optional semantic embeddings via
litellm.embedding()). Messages below a token threshold pass through
unchanged; messages above are scored, ranked, and the lowest-relevance
ones replaced with stubs. Originals are cached and a retrieval tool is
injected so the model can recover dropped content on demand.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(compress): truncate high-scoring messages instead of fully stubbing them
When a relevant message was too large to fit in the token budget it was
replaced with a stub, leaving the LLM with no real content to work with.
Now the highest-scoring overflow message is truncated (first 70% + last 30%
of words) to fill the remaining budget, so the LLM always receives actual
content rather than just a retrieval pointer.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(bm25): add prefix expansion so query terms match inflected doc tokens
"cook" now matches "cooking", "auth" matches "authentication", etc.
Without this, short query terms scored 0 against longer inflected forms
in documents, causing the wrong message to be kept.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* test: add routing correctness test and eval harness for litellm.compress()
- test_simple_compression: parametrized test verifying BM25 routes the
right message based on query ("How to cook?" keeps cooking, "Fix auth"
keeps auth content)
- eval_compression.py: end-to-end eval harness comparing baseline vs
compressed model performance on HumanEval-style coding problems
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat(eval): add SWE-bench Lite compression eval harness
Uses princeton-nlp/SWE-bench_Lite_bm25_27K which bundles ~27k tokens of
BM25-retrieved repo context per problem — large enough to meaningfully
stress litellm.compress() without Docker or GitHub API calls.
Proxy eval metrics (no test runner needed):
- has_diff: model produced a valid unified diff
- file_overlap: fraction of gold-patch files in generated patch
- exact_file_match: generated patch touches exactly the right files
Run: python tests/eval_swe_bench.py --model gpt-4o --problems 10
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(eval): robust dataset loading + sys.path fix for worktree imports
- Add HuggingFace API fallback so the SWE-bench loader doesn't need
the `datasets` library (avoids pyarrow/numpy binary compat issues)
- Insert repo root into sys.path so compression module resolves
from worktrees
- Use direct import of litellm_compress to avoid __getattr__ issues
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* improve compression quality: line-based truncation, multi-message budget, 70% default target
- Switch truncate_message from word-based to line-based splitting to
preserve code structure (function boundaries, indentation)
- Allow multiple messages to be truncated instead of burning entire
budget on one overflow message
- Raise default compression target from 50% to 70% of trigger for
better quality/cost tradeoff
- Add --compression-target CLI arg to SWE-bench eval harness
- Move tests to canonical locations (tests/test_litellm/, scripts/)
- Add docs page and sidebar entries for compress()
Eval results (5 problems, Opus, trigger=10k):
Hunk overlap delta improved from -0.417 to -0.221
Content similarity now matches baseline (+0.006)
Cost savings: 72%
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* docs: add SWE-bench performance results to compress() docs
Include benchmark table from Opus eval (5 problems, trigger=10k)
showing 72% cost savings with file-level quality fully preserved.
Add metric explanations and eval runner examples.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(eval): use tolerance-based hunk overlap metric
The exact line-number matching was too brittle — LLM-generated patches
often target the right code region but with slightly offset line numbers.
Switch to hunk-level overlap with a 10-line tolerance window so nearby
edits count as matches. This better reflects actual patch quality.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* feat: add compression_interception callback for LiteLLM Proxy
Add a proxy callback that automatically compresses incoming /v1/messages
payloads above a configurable token threshold, runs the retrieval tool
loop server-side, and returns the final response. This brings compress()
support to proxy deployments (e.g. Claude Code via /v1/messages).
- New callback: litellm/integrations/compression_interception/
- Proxy config: compression_interception_params in litellm_settings
- Support for input_type param in compress() (openai vs anthropic)
- Docs: proxy setup instructions with YAML config example
- Tests: 139-line unit test suite for the interception handler
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* Revert "feat: add compression_interception callback for LiteLLM Proxy"
This reverts commit 72bd5cb152ca1df07f14a14e14a2816e188874a8.
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
The blog CSS selectors for dark mode used descendant selectors like
[data-theme='dark'] .blog-wrapper which never matched because both
data-theme and .blog-wrapper are applied to the same <html> element
by Docusaurus. Fixed by using compound selectors (no space):
[data-theme='dark'].blog-wrapper.
Also added missing dark-mode overrides for:
- pre/code blocks in blog posts
- link colors in blog posts
- marquee items, separators, and labels on blog list page
- pagination links on blog list page
- meta text and author separators on blog list page
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Krrish Dholakia <krrish-berri-2@users.noreply.github.com>
- Add LITELLM_OIDC_ALLOWED_CREDENTIAL_DIRS to the environment variables
reference so the documentation test passes.
- Annotate the values variable in _reject_os_environ_references so it
accepts both dict.values() and list iterables.
- Log a warning when dropping callback params that carry os.environ/
references so operators notice the misconfiguration.
- Require absolute paths in oidc/file/ and correct the documented
example to use the leading-slash form.
- Drop the unused return value from _reject_os_environ_references.
- Reject os.environ/ references supplied via /health/test_connection
request params instead of resolving them; config-sourced values are
already resolved before reaching the endpoint.
- Skip os.environ/ references in dynamic callback params loaded from
per-request metadata.
- Constrain oidc/file/ to an allowed credential directory allowlist
(defaults to /var/run/secrets and /run/secrets, overridable via
LITELLM_OIDC_ALLOWED_CREDENTIAL_DIRS).
* feat(guardrails): optional skip system message in unified guardrail inputs
Made-with: Cursor
* feat(dashboard): skip_system_message_in_guardrail in guardrail UI
Add a tri-state control (inherit / yes / no) when creating or editing
guardrails so admins can set litellm_params.skip_system_message_in_guardrail
without YAML. Table edit merges existing litellm_params before PUT to avoid
wiping content-filter and other provider fields.
Document the dashboard flow in the guardrails quick start with a screenshot.
Made-with: Cursor
* fix(guardrails): type structured_messages as AllMessageValues for mypy
Use AllMessageValues in openai_messages_without_system and cast adapter
request messages so GenericGuardrailAPIInputs matches TypedDict.
Made-with: Cursor
* build: migrate packaging metadata to uv
* ci: move automation and local tooling to uv
* docker: migrate image builds and runtime setup to uv
* docs: update install and deployment guidance for uv
* chore: align auxiliary scripts and tests with uv
* test: harden test_litellm isolation
* fix: keep release and health check images self-contained
* build: pin uv tooling and health check deps
* test: isolate bedrock image request formatting from suite state
* test: cover sandbox executor requirements flow
* ci: fix circleci no-op command steps
* ci: fix circleci publish workflow parsing
* fix: stabilize remaining uv migration CI checks
* ci: increase matrix test timeout headroom
* fix: restore published docker and license coverage
* fix: restore proxy runtime build parity
* fix: restore proxy extras parity and venv migrations
* ci: persist uv path across circleci steps
* fix: keep psycopg binary in default test env
* docker: preserve prisma cache across stages
* test: run local proxy checks through uv python
* build: restore runtime deps moved into ci
* build: refresh uv lock after upstream merge
* fix: restore module import in test_check_migration after merge
The conflict resolution imported only the function but the test body
references check_migration as a module throughout.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix: revert dependency promotions, remove nodejs-wheel-binaries, fix Docker layer caching
- Move google-generativeai, Pillow, tenacity back to ci group (they are
lazily imported and bloat the base SDK install needlessly)
- Remove nodejs-wheel-binaries from extra_proxy and proxy-dev (redundant
in Docker where system Node.js is already installed via apk)
- Remove all nodejs-wheel node replacement and venv npm patching blocks
from Dockerfiles since the wheel is no longer installed
- Add --no-default-groups to CodSpeed benchmark workflow so the benchmark
environment matches the old minimal pip install footprint
- Apply standard uv two-phase Docker pattern: copy metadata first, install
deps (cached layer), then copy source and install project
- Replace CircleCI enterprise no-op with proper uv sync command
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* chore: regenerate uv.lock after removing nodejs-wheel-binaries
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(ci): use cache/restore instead of cache to prevent cache poisoning
The old workflow used actions/cache/restore (read-only). The uv migration
changed it to actions/cache (read-write), which zizmor flags as a cache
poisoning risk. Restore the safer read-only variant.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(ci): disable setup-uv built-in cache to silence cache-poisoning alert
The setup-uv action enables caching by default, which zizmor flags as a
cache poisoning risk. Disable it since we already use a read-only
cache/restore step.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(ci): disable setup-uv cache in publish workflow
Silences zizmor cache-poisoning alert. Publishing workflow runs
infrequently on protected branches so caching adds no real benefit.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(test): remove duplicate verbose_logger mock in test_check_migration
The logger was patched twice — first via mocker.patch() then via
mocker.patch.object(autospec=True). The second call fails because
autospec cannot inspect an already-mocked attribute. Remove the
redundant first patch.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* fix(ci): free disk space before Docker build in test-server-root-path
The Dockerfile.non_root build ran out of disk on the CI runner. Remove
Android SDK, .NET, Boost, and GHC toolchains (~12GB) to free space.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
* Add PromptGuard guardrail integration
Add PromptGuard as a first-class guardrail vendor in LiteLLM's proxy,
supporting prompt injection detection, PII redaction, topic filtering,
entity blocklists, and hallucination detection via PromptGuard's
/api/v1/guard API endpoint.
Backend:
- Add PROMPTGUARD to SupportedGuardrailIntegrations enum
- Implement PromptGuardGuardrail (CustomGuardrail subclass) with
apply_guardrail handling allow/block/redact decisions
- Add Pydantic config model with api_key, api_base, ui_friendly_name
- Auto-discovered via guardrail_hooks/promptguard/__init__.py registries
Frontend:
- Add PromptGuard partner card to Guardrail Garden with eval scores
- Add preset configuration for quick setup
- Add logo to guardrailLogoMap
Tests:
- 30 unit tests covering configuration, allow/block/redact actions,
request payload construction, error handling, config model, and
registry wiring
* Fix redact path and init ordering per review feedback
- P1: Update structured_messages (not just texts) when PromptGuard
returns a redact decision, so PII redaction is effective for the
primary LLM message path
- P2: Validate credentials before allocating the HTTPX client so
resources aren't acquired if PromptGuardMissingCredentials is raised
- Add tests for structured_messages redaction and texts-only redaction
* Harden PromptGuard integration: fail-open, event hooks, images, docs
- Add block_on_error config (default fail-closed, configurable fail-open)
- Declare supported_event_hooks (pre_call, post_call) like other vendors
- Forward images from GenericGuardrailAPIInputs to PromptGuard API
- Wrap API call in try/except for resilient error handling
- Add comprehensive documentation page with config examples
- Register docs page in sidebar alongside other guardrail providers
- Expand test suite from 32 to 40 tests covering new functionality
* Fix dict[str, Any] -> Dict[str, Any] for Python 3.8 compat
* Address remaining Greptile feedback: timeout, redact guard
- Add explicit 10s timeout to async_handler.post() to prevent
indefinite hangs when PromptGuard API is unresponsive
- Guard redact path: only update inputs["texts"] when the key
was originally present, avoiding phantom key injection
- Add test: redact with structured_messages only does not create
texts key (41 tests total)
* Fix CI lint: black formatting, add PromptGuardConfigModel to LitellmParams
- Reformat promptguard.py to match CI black version (parenthesization)
- Add PromptGuardConfigModel as base class of LitellmParams for proper
Pydantic schema validation, consistent with all other guardrail vendors
- Use litellm_params.block_on_error directly (now a typed field)
* Address Greptile review: redact path, null decision, error context
- P1: Filter _extract_texts_from_messages to user-role messages only,
preventing system/assistant content from being injected into texts
- P1: Strengthen test_redact_updates_structured_messages assertion from
weak `in` check to strict equality, catching the injection bug
- P2: Use `result.get("decision") or "allow"` to handle explicit null
decision values (not just absent keys)
- P2: Wrap bare exception re-raise in GuardrailRaisedException so the
caller knows which guardrail failed (block_on_error=True path)
- P2: Add static Promptguard entry in guardrail_provider_map so the
preset works before populateGuardrailProviderMap is called
- Add test for explicit null decision treated as allow
* Fix black formatting: collapse f-string in error message
* fix(vertex_ai): support pluggable (executable) credential_source for WIF auth (#24700)
The WIF credential dispatch in load_auth() only handled identity_pool and
aws credential types. When credential_source.executable was present (used
for Azure Managed Identity via Workload Identity Federation), it fell
through to identity_pool.Credentials which rejected it with MalformedError.
Add dispatch to google.auth.pluggable.Credentials for executable-type
credential sources, following the same pattern as the existing identity_pool
and aws helpers.
Fixes authentication for Azure Container Apps → GCP Vertex AI via WIF
with executable credential sources.
* feat(logging): add component and logger fields to JSON logs for 3rd p… (#24447)
* feat(logging): add component and logger fields to JSON logs for 3rd party filtering
* Let user-supplied extra fields win over auto-generated component/logger, tighten test assertions
* Feat - Add organization into the metrics metadata for org_id & org_alias (#24440)
* Add org_id and org_alias label names to Prometheus metric definitions
* Add user_api_key_org_alias to StandardLoggingUserAPIKeyMetadata
* Populate user_api_key_org_alias in pre-call metadata
* Pass org_id and org_alias into per-request Prometheus metric labels
* Add test for org labels on per-request Prometheus metrics
* chore: resolve test mockdata
* Address review: populate org_alias from DB view, add feature flag, use .get() for org metadata
* Add org labels to failure path and verify flag behavior in test
* Fix test: build flag-off enum_values without org fields
* Gate org labels behind feature flag in get_labels() instead of static metric lists
* Scope org label injection to metrics that carry team context, remove orphaned budget label defs, add test teardown
* Use explicit metric allowlist for org label injection instead of team heuristic
* Fix duplicate org label guard, move _org_label_metrics to class constant
* Reset custom_prometheus_metadata_labels after duplicate label assertion
* fix: emit org labels by default, remove flag, fix missing org_alias in all metadata paths
* fix: emit org labels by default, no opt-in flag required
* fix: write org_alias to metadata unconditionally in proxy_server.py
* fix: 429s from batch creation being converted to 500 (#24703)
* add us gov models (#24660)
* add us gov models
* added max tokens
* Litellm dev 04 02 2026 p1 (#25052)
* fix: replace hardcoded url
* fix: Anthropic web search cost not tracked for Chat Completions
The ModelResponse branch in response_object_includes_web_search_call()
only checked url_citation annotations and prompt_tokens_details, missing
Anthropic's server_tool_use.web_search_requests field. This caused
_handle_web_search_cost() to never fire for Anthropic Claude models.
Also routes vertex_ai/claude-* models to the Anthropic cost calculator
instead of the Gemini one, since Claude on Vertex uses the same
server_tool_use billing structure as the direct Anthropic API.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
* fix(anthropic): pass logging_obj to client.post for litellm_overhead_time_ms (#24071)
When LITELLM_DETAILED_TIMING=true, litellm_overhead_time_ms was null for
Anthropic because the handler did not pass logging_obj to client.post(),
so track_llm_api_timing could not set llm_api_duration_ms. Pass
logging_obj=logging_obj at all four post() call sites (make_call,
make_sync_call, acompletion, completion). Add test to ensure make_call
passes logging_obj to client.post.
Made-with: Cursor
* sap - add additional parameters for grounding
- additional parameter for grounding added for the sap provider
* sap - fix models
* (sap) add filtering, masking, translation SAP GEN AI Hub modules
* (sap) add tests and docs for new SAP modules
* (sap) add support of multiple modules config
* (sap) code refactoring
* (sap) rename file
* test(): add safeguard tests
* (sap) update tests
* (sap) update docs, solve merge conflict in transformation.py
* (sap) linter fix
* (sap) Align embedding request transformation with current API
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) mock commit
* (sap) run black formater
* (sap) add literals to models, add negative tests, fix test for tool transformation
* (sap) fix formating
* (sap) fix models
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) commit for rerun bot review
* (sap) minor improve
* (sap) fix after bot review
* (sap) lint fix
* docs(sap): update documentation
* fix(sap): change creds priority
* fix(sap): change creds priority
* fix(sap): fix sap creds unit test
* fix(sap): linter fix
* fix(sap): linter fix
* linter fix
* (sap) update logic of fetching creds, add additional tests
* (sap) clean up code
* (sap) fix after review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) add a possibility to put the service key by both variants
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) update test
* (sap) update service key resolve function
* (sap) run black formater
* (sap) fix validate credentials, add negative tests for credential fetching
* (sap) fix validate credentials, add negative tests for credential fetching
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) fix after bot review
* (sap) lint fix
* (sap) lint fix
* feat: support service_tier in gemini
* chore: add a service_tier field mapping from openai to gemini
* fix: use x-gemini-service-tier header in response
* docs: add service_tier to gemini docs
* chore: add defaut/standard mapping, and some tests
* chore: tidying up some case insensitivity
* chore: remove unnecessary guard
* fix: remove redundant test file
* fix: handle 'auto' case-insensitively
* fix: return service_tier on final steamed chunk
* chore: black
* feat: enable supports_service_tier to gemini models
* Fix get_standard_logging_metadata tests
* Fix test_get_model_info_bedrock_models
* Fix test_get_model_info_bedrock_models
* Fix remaining tests
* Fix mypy issues
* Fix tests
* Fix merge conflicts
* Fix code qa
* Fix code qa
* Fix code qa
* Fix greptile review
---------
Co-authored-by: michelligabriele <gabriele.michelli@icloud.com>
Co-authored-by: Josh <36064836+J-Byron@users.noreply.github.com>
Co-authored-by: mubashir1osmani <mubashir.osmani777@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: milan-berri <milan@berri.ai>
Co-authored-by: Alperen Kömürcü <alperen.koemuercue@sap.com>
Co-authored-by: Vasilisa Parshikova <vasilisa.parshikova@sap.com>
Co-authored-by: Lin Xu <lin.xu03@sap.com>
Co-authored-by: Mark McDonald <macd@google.com>
Co-authored-by: Sameer Kankute <sameer@berri.ai>