Commit Graph

6111 Commits

Author SHA1 Message Date
yuneng-jiang 5c1f7d99bf Merge pull request #25731 from BerriAI/docs_guardrail
fallbacks image
2026-04-14 18:13:12 -07:00
shivam 65ce89dc67 update 2026-04-14 18:02:41 -07:00
shivam 19629004f5 fallbacks image 2026-04-14 17:58:11 -07:00
Yuneng Jiang 05ad48236f [Docs] Regenerate v1.83.3-stable release notes from v1.82.3-stable baseline
The previous v1.83.3 changelog was generated against v1.83.0-nightly and
missed ~3 weeks of work. This regenerates it against the previous stable
release and restructures the LLM API Endpoints section to group by API
type (Responses, Batch, Count Tokens, Video Generation, Pass-Through,
etc.) matching the convention used in v1.82.3, v1.82.0, and v1.81.14.
Adds ~25 previously uncited PRs, cross-section duplications for
cross-cutting changes, and a verified first-time-contributors list.
2026-04-14 17:19:42 -07:00
Ryan Crabbe 3aae15f5d8 [Docs] Use GitHub avatar for Ryan Crabbe in release notes
Replace the expiring LinkedIn CDN image URL with a stable GitHub
avatar URL for v1.83.3 and v1.83.7.rc.1 release notes.
2026-04-14 16:22:07 -07:00
Yuneng Jiang 966be2982a [Docs] Add missed content PRs to v1.83.7.rc.1 and update runbook
- Add 8 content PRs that merged directly to the release branch outside the listed staging PRs: #23769 (Ramp callback), #25252 (JWT OAuth2 override), #25254 (AWS GovCloud mode), #25258 (batch-limit cleanup), #25334 (router custom_llm_provider), #25345 (Triton embeddings), #25347 (tag-based routing), #25358 (Baseten pricing attribution)
- Add @kedarthakkar to new contributors (first-ever PR via #23769)
- Update RELEASE_NOTES_GENERATION_INSTRUCTIONS: require walking git log range between release tags in addition to staging PRs, and verify new-contributor status per author rather than trusting the GH release body floor
2026-04-14 16:13:09 -07:00
Yuneng Jiang 4a1da629fa [Fix] Correct pip install versions for v1.83.3-stable and v1.83.7.rc.1 docs
PyPI publishes 1.83.3 and 1.83.7 (no .post1 / rc1 suffixes) — align the pip install commands with the actual published versions.
2026-04-14 16:00:27 -07:00
Yuneng Jiang 8eec2c69b7 [Docs] Add release notes for v1.83.3-stable and v1.83.7.rc.1
- Retitle existing v1.83.3 preview file to v1.83.3-stable (same commit)
- Add new v1.83.7.rc.1 preview release notes
- Update RELEASE_NOTES_GENERATION_INSTRUCTIONS runbook with guidance on resolving staging PRs to their underlying commits
2026-04-14 15:58:13 -07:00
ishaan-berri 0e43050a01 Merge pull request #25650 from BerriAI/litellm_dev_04_13_2026_p1
feat: add litellm.compress() — BM25-based prompt compression with ret…
2026-04-14 12:24:47 -07:00
Sameer Kankute 1a9a31e4a2 Merge pull request #25665 from BerriAI/litellm_oss_staging_04_13_2026_p1
litellm oss staging 04/13/2026
2026-04-14 23:50:08 +05:30
Jonas Neubert e724e5e07d add NO_OPENAPI env var to disable /openapi.json endpoint (#25547) 2026-04-14 23:37:49 +05:30
Ashton Sidhu 6343148c95 Hiddenlayer Integration: Add V2 Integration (#22708)
* Serialize error message to a string; only scan last message

* Update litellm/proxy/guardrails/guardrail_hooks/hiddenlayer/hiddenlayer.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Add v2 of hiddenlayer guardrail implementation

* Update litellm/proxy/guardrails/guardrail_hooks/hiddenlayer/hiddenlayer.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Fix potential header issue

* linting

* Add image support

---------

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
2026-04-14 23:37:49 +05:30
ishaan-berri 4a71583951 Merge pull request #25348 from BerriAI/litellm_gemini-veo-video-resolution-pricing2
feat(gemini): Veo Lite pricing, video resolution usage and tiered cost
2026-04-14 10:23:22 -07:00
yuneng-jiang 8427534f13 Merge pull request #25647 from BerriAI/litellm_yj_apr_11
[Infra] Merge dev branch with main
2026-04-13 17:28:38 -07:00
yuneng-jiang a306092d47 Merge pull request #25463 from BerriAI/litellm_oss_staging_04_09_2026
Litellm oss staging 04 09 2026
2026-04-13 17:25:53 -07:00
ishaan-berri 548225ef31 Merge pull request #25586 from BerriAI/litellm_ishaan_april11
Litellm ishaan april11
2026-04-13 14:55:50 -07:00
Krrish Dholakia 26c7412339 feat: add litellm.compress() — BM25-based prompt compression with retrieval tool (#25637)
* feat: add litellm.compress() for BM25-based context compression

Adds a compress() utility that reduces context size for LLM calls using
BM25 relevance scoring (with optional semantic embeddings via
litellm.embedding()). Messages below a token threshold pass through
unchanged; messages above are scored, ranked, and the lowest-relevance
ones replaced with stubs. Originals are cached and a retrieval tool is
injected so the model can recover dropped content on demand.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(compress): truncate high-scoring messages instead of fully stubbing them

When a relevant message was too large to fit in the token budget it was
replaced with a stub, leaving the LLM with no real content to work with.
Now the highest-scoring overflow message is truncated (first 70% + last 30%
of words) to fill the remaining budget, so the LLM always receives actual
content rather than just a retrieval pointer.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
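
The head/tail truncation described in this fix can be sketched roughly as follows. This is an illustrative sketch, not the code from the PR: the function name `truncate_head_tail`, the `[...]` marker, and the use of a word count as the budget (the commit talks about a token budget) are all assumptions.

```python
def truncate_head_tail(text: str, budget_words: int, head_percent: int = 70) -> str:
    """Keep the first ~70% and last ~30% of a word budget, dropping the middle.

    Hypothetical helper: words stand in for tokens to keep the sketch
    dependency-free.
    """
    words = text.split()
    if budget_words <= 0:
        return ""
    if len(words) <= budget_words:
        # Already within budget: pass through unchanged.
        return text
    head = budget_words * head_percent // 100  # integer math avoids float rounding
    tail = budget_words - head
    kept_head = words[:head]
    kept_tail = words[len(words) - tail:] if tail > 0 else []
    # The marker shows the reader (or the model) where content was dropped.
    return " ".join(kept_head + ["[...]"] + kept_tail)
```

With a 20-word message and a budget of 10, the first 7 and last 3 words survive. A later commit in this same PR replaces word-based splitting with line-based splitting.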

* fix(bm25): add prefix expansion so query terms match inflected doc tokens

"cook" now matches "cooking", "auth" matches "authentication", etc.
Without this, short query terms scored 0 against longer inflected forms
in documents, causing the wrong message to be kept.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
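
A minimal sketch of prefix expansion, assuming query terms are expanded against the document vocabulary before scoring. The name `expand_query_terms` and the `min_prefix_len` guard are hypothetical and not taken from the PR.

```python
from typing import Dict, Iterable, Set


def expand_query_terms(
    query_terms: Iterable[str],
    doc_vocab: Set[str],
    min_prefix_len: int = 3,  # assumption: skip very short prefixes such as "a"
) -> Dict[str, Set[str]]:
    """Map each query term to the document tokens it should match.

    A query term matches itself plus any document token that starts with it,
    so "cook" also matches "cooking".
    """
    expanded: Dict[str, Set[str]] = {}
    for term in query_terms:
        matches = {term}
        if len(term) >= min_prefix_len:
            matches.update(tok for tok in doc_vocab if tok.startswith(term))
        expanded[term] = matches
    return expanded
```

A scorer could then sum term frequencies over every token in `expanded[term]` instead of looking up the exact term only.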

* test: add routing correctness test and eval harness for litellm.compress()

- test_simple_compression: parametrized test verifying BM25 routes the
  right message based on query ("How to cook?" keeps cooking, "Fix auth"
  keeps auth content)
- eval_compression.py: end-to-end eval harness comparing baseline vs
  compressed model performance on HumanEval-style coding problems

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat(eval): add SWE-bench Lite compression eval harness

Uses princeton-nlp/SWE-bench_Lite_bm25_27K which bundles ~27k tokens of
BM25-retrieved repo context per problem — large enough to meaningfully
stress litellm.compress() without Docker or GitHub API calls.

Proxy eval metrics (no test runner needed):
  - has_diff: model produced a valid unified diff
  - file_overlap: fraction of gold-patch files in generated patch
  - exact_file_match: generated patch touches exactly the right files

Run: python tests/eval_swe_bench.py --model gpt-4o --problems 10

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
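
The three proxy metrics listed above could be computed along these lines. This is a sketch under stated assumptions, not the harness code: `files_in_patch` and `file_metrics` are hypothetical names, and `has_diff` is approximated as "at least one `+++ b/<path>` header was found" rather than full unified-diff validation.

```python
import re
from typing import Dict, Set, Union


def files_in_patch(patch: str) -> Set[str]:
    """Collect file paths from the '+++ b/<path>' headers of a unified diff."""
    return set(re.findall(r"^\+\+\+ b/(\S+)", patch, flags=re.MULTILINE))


def file_metrics(gold_patch: str, generated_patch: str) -> Dict[str, Union[bool, float]]:
    gold = files_in_patch(gold_patch)
    generated = files_in_patch(generated_patch)
    # Fraction of gold-patch files that the generated patch also touches.
    overlap = len(gold & generated) / len(gold) if gold else 0.0
    return {
        "has_diff": bool(generated),
        "file_overlap": overlap,
        "exact_file_match": bool(gold) and gold == generated,
    }
```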

* fix(eval): robust dataset loading + sys.path fix for worktree imports

- Add HuggingFace API fallback so the SWE-bench loader doesn't need
  the `datasets` library (avoids pyarrow/numpy binary compat issues)
- Insert repo root into sys.path so compression module resolves
  from worktrees
- Use direct import of litellm_compress to avoid __getattr__ issues

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* improve compression quality: line-based truncation, multi-message budget, 70% default target

- Switch truncate_message from word-based to line-based splitting to
  preserve code structure (function boundaries, indentation)
- Allow multiple messages to be truncated instead of burning entire
  budget on one overflow message
- Raise default compression target from 50% to 70% of trigger for
  better quality/cost tradeoff
- Add --compression-target CLI arg to SWE-bench eval harness
- Move tests to canonical locations (tests/test_litellm/, scripts/)
- Add docs page and sidebar entries for compress()

Eval results (5 problems, Opus, trigger=10k):
  Hunk overlap delta improved from -0.417 to -0.221
  Content similarity now matches baseline (+0.006)
  Cost savings: 72%

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* docs: add SWE-bench performance results to compress() docs

Include benchmark table from Opus eval (5 problems, trigger=10k)
showing 72% cost savings with file-level quality fully preserved.
Add metric explanations and eval runner examples.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(eval): use tolerance-based hunk overlap metric

The exact line-number matching was too brittle — LLM-generated patches
often target the right code region but with slightly offset line numbers.
Switch to hunk-level overlap with a 10-line tolerance window so nearby
edits count as matches. This better reflects actual patch quality.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
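
A tolerance-based hunk overlap can be sketched as below. It is a simplification for illustration: it compares only the old-file start line of each `@@` header and ignores which file a hunk belongs to, which the real metric may well handle differently. The names `hunk_starts` and `hunk_overlap` are hypothetical.

```python
import re
from typing import List

# Matches unified-diff hunk headers such as "@@ -100,5 +100,6 @@".
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,\d+)? \+\d+(?:,\d+)? @@", re.MULTILINE)


def hunk_starts(patch: str) -> List[int]:
    """Return the old-file start line of every hunk in the patch."""
    return [int(m.group(1)) for m in HUNK_RE.finditer(patch)]


def hunk_overlap(gold_patch: str, generated_patch: str, tolerance: int = 10) -> float:
    """Fraction of gold hunks with a generated hunk within `tolerance` lines."""
    gold = hunk_starts(gold_patch)
    generated = hunk_starts(generated_patch)
    if not gold:
        return 0.0
    matched = sum(
        1 for g in gold if any(abs(g - s) <= tolerance for s in generated)
    )
    return matched / len(gold)
```

An edit that starts 5 lines away from the gold hunk counts as a match here, whereas exact line-number matching would score it as a miss.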

* feat: add compression_interception callback for LiteLLM Proxy

Add a proxy callback that automatically compresses incoming /v1/messages
payloads above a configurable token threshold, runs the retrieval tool
loop server-side, and returns the final response. This brings compress()
support to proxy deployments (e.g. Claude Code via /v1/messages).

- New callback: litellm/integrations/compression_interception/
- Proxy config: compression_interception_params in litellm_settings
- Support for input_type param in compress() (openai vs anthropic)
- Docs: proxy setup instructions with YAML config example
- Tests: 139-line unit test suite for the interception handler

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Revert "feat: add compression_interception callback for LiteLLM Proxy"

This reverts commit 72bd5cb152ca1df07f14a14e14a2816e188874a8.

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-13 12:23:54 -07:00
Krrish Dholakia d319cd8cc6 fix: blog dark mode - text invisible on dark background (#25620)
The blog CSS selectors for dark mode used descendant selectors like
[data-theme='dark'] .blog-wrapper which never matched because both
data-theme and .blog-wrapper are applied to the same <html> element
by Docusaurus. Fixed by using compound selectors (no space):
[data-theme='dark'].blog-wrapper.

Also added missing dark-mode overrides for:
- pre/code blocks in blog posts
- link colors in blog posts
- marquee items, separators, and labels on blog list page
- pagination links on blog list page
- meta text and author separators on blog list page

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Krrish Dholakia <krrish-berri-2@users.noreply.github.com>
2026-04-13 09:08:57 -07:00
Sameer Kankute fa605d85c0 Merge pull request #25616 from BerriAI/main
merge main
2026-04-13 08:43:43 +05:30
Yuneng Jiang 41849a540d document new env var and fix type hint
- Add LITELLM_OIDC_ALLOWED_CREDENTIAL_DIRS to the environment variables
  reference so the documentation test passes.
- Annotate the values variable in _reject_os_environ_references so it
  accepts both dict.values() and list iterables.
2026-04-11 22:17:32 -07:00
Yuneng Jiang 6baee0dfcb address review feedback
- Log a warning when dropping callback params that carry os.environ/
  references so operators notice the misconfiguration.
- Require absolute paths in oidc/file/ and correct the documented
  example to use the leading-slash form.
- Drop the unused return value from _reject_os_environ_references.
2026-04-11 21:52:39 -07:00
Yuneng Jiang 06a0d4498a fix: tighten handling of environment references in request parameters
- Reject os.environ/ references supplied via /health/test_connection
  request params instead of resolving them; config-sourced values are
  already resolved before reaching the endpoint.
- Skip os.environ/ references in dynamic callback params loaded from
  per-request metadata.
- Constrain oidc/file/ to an allowed credential directory allowlist
  (defaults to /var/run/secrets and /run/secrets, overridable via
  LITELLM_OIDC_ALLOWED_CREDENTIAL_DIRS).
2026-04-11 21:41:41 -07:00
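
The `oidc/file/` directory allowlist from commit 06a0d4498a might look roughly like the sketch below. Only the environment variable name and the two default directories come from the commit message. The helper names, the comma-separated format of the variable, and the use of `os.path.realpath` for containment checks are assumptions.

```python
import os
from typing import Mapping, Optional, Sequence, Tuple

DEFAULT_ALLOWED_DIRS: Tuple[str, ...] = ("/var/run/secrets", "/run/secrets")


def allowed_credential_dirs(env: Optional[Mapping[str, str]] = None) -> Tuple[str, ...]:
    """Read the allowlist override, assuming a comma-separated value."""
    env = os.environ if env is None else env
    raw = env.get("LITELLM_OIDC_ALLOWED_CREDENTIAL_DIRS", "")
    dirs = tuple(d.strip() for d in raw.split(",") if d.strip())
    return dirs or DEFAULT_ALLOWED_DIRS


def is_allowed_credential_path(path: str, allowed_dirs: Sequence[str]) -> bool:
    """True only for absolute paths that resolve inside an allowed directory."""
    if not os.path.isabs(path):
        return False  # relative paths are rejected outright
    resolved = os.path.realpath(path)  # collapses ".." segments and symlinks
    for directory in allowed_dirs:
        base = os.path.realpath(directory)
        if os.path.commonpath([resolved, base]) == base:
            return True
    return False
```

Resolving the path before comparing is what stops a value such as `/var/run/secrets/../../../etc/passwd` from passing a naive prefix check.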
ishaan-berri fdd7500904 blog: add back arrow to blog post pages (#25587)
* blog: add back arrow to post pages

* blog: style back arrow — fixed top-left below navbar
2026-04-11 19:15:45 -07:00
ishaan-berri 1edf41c26f Merge pull request #25585 from BerriAI/litellm_dev_04_11_2026_p1
Litellm dev 04 11 2026 p1
2026-04-11 18:46:57 -07:00
ishaan-berri 329a526b9d Merge pull request #25579 from BerriAI/feat/anthropic-advisor-tool
feat(advisor): advisor tool orchestration loop for non-Anthropic providers
2026-04-11 18:32:44 -07:00
Ishaan Jaffer dd87f3be5b docs(advisor): move supported providers to top, focus how it works on litellm native loop 2026-04-11 18:27:18 -07:00
Ishaan Jaffer a8bc7bfcd4 docs(advisor): add how it works section with mermaid diagram + non-native provider table 2026-04-11 18:23:33 -07:00
Ishaan Jaffer 35f4b47ff8 apply content guidelines: scale/resilience narrative, FAQ, Key Takeaways, Conclusion CTA 2026-04-11 18:12:32 -07:00
Ishaan Jaffer 14eed24471 add redis circuit breaker blog post with React diagrams 2026-04-11 18:02:59 -07:00
Ishaan Jaffer 8e616ecdf4 add BlogPostPage swizzle: hide sidebar, add hiring CTA on every post 2026-04-11 18:02:56 -07:00
Ishaan Jaffer dac44fb443 blog list styles: clean typography, marquee animation, hero layout 2026-04-11 18:02:52 -07:00
Ishaan Jaffer 85cb7db8b9 blog list page: Ramp-style flat list with hero, provider marquee, hiring CTA 2026-04-11 18:02:48 -07:00
Ishaan Jaffer 05d516482f restyle blog list page to match engineering blog aesthetic 2026-04-11 18:02:44 -07:00
Krrish Dholakia e08e3bf748 docs: clarify how to get benchmarking script 2026-04-11 17:31:03 -07:00
Krrish Dholakia 12bca649fc docs: refactor benchmarking docs to be clearer 2026-04-11 17:30:09 -07:00
Yuneng Jiang 909247785e Merge remote-tracking branch 'origin' into litellm_internal_staging_04_11_2026 2026-04-11 15:41:03 -07:00
Sameer Kankute c13be44e44 feat(guardrails): optional skip system message in unified guardrail inputs (#25481)
* feat(guardrails): optional skip system message in unified guardrail inputs

Made-with: Cursor

* feat(dashboard): skip_system_message_in_guardrail in guardrail UI

Add a tri-state control (inherit / yes / no) when creating or editing
guardrails so admins can set litellm_params.skip_system_message_in_guardrail
without YAML. Table edit merges existing litellm_params before PUT to avoid
wiping content-filter and other provider fields.

Document the dashboard flow in the guardrails quick start with a screenshot.

Made-with: Cursor

* fix(guardrails): type structured_messages as AllMessageValues for mypy

Use AllMessageValues in openai_messages_without_system and cast adapter
request messages so GenericGuardrailAPIInputs matches TypedDict.

Made-with: Cursor
2026-04-11 08:53:24 -07:00
Yuneng Jiang 9a0487553d Merge remote-tracking branch 'origin' into litellm_oss_staging_04_09_2026 2026-04-10 16:41:27 -07:00
ishaan-berri 831083b565 Merge pull request #25525 from BerriAI/feat/anthropic-advisor-tool
feat(anthropic): support advisor_20260301 tool type
2026-04-10 16:39:34 -07:00
Krrish Dholakia 4e12d3c562 docs: document april townhall announcements (#25537)
* docs: document april townhall announcements

* docs: cleanup blog post
2026-04-10 16:12:06 -07:00
Ishaan Jaffer d6e2a74c0f docs: move advisor tool doc to completion/ guides section in sidebar 2026-04-10 15:08:25 -07:00
Ishaan Jaffer ed973c049f docs: add Advisor Tool documentation page 2026-04-10 13:15:54 -07:00
Yuneng Jiang a889dea8cc [Docs] Add missing MCP per-user token env vars to config_settings
MCP_PER_USER_TOKEN_DEFAULT_TTL and MCP_PER_USER_TOKEN_EXPIRY_BUFFER_SECONDS
were added in #25441 but not documented, causing test_env_keys.py to fail.
2026-04-09 23:58:36 -07:00
Krrish Dholakia a6d81e1575 docs: add Docker Image Security Guide for cosign verification and deployment best practices (#25439)
- New doc page covering all signed image variants, verification commands,
  CI/CD enforcement (K8s Sigstore Policy Controller, GCP Binary Authorization,
  AWS/EKS, GitHub Actions), digest pinning, and safe upgrade patterns
- Added to sidebar under Setup & Deployment
- Cross-linked from the existing deploy.md cosign section

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Krrish Dholakia <krrish-berri-2@users.noreply.github.com>
2026-04-09 23:58:35 -07:00
Yuneng Jiang ce0b57b4ff [Docs] Add missing MCP per-user token env vars to config_settings
MCP_PER_USER_TOKEN_DEFAULT_TTL and MCP_PER_USER_TOKEN_EXPIRY_BUFFER_SECONDS
were added in #25441 but not documented, causing test_env_keys.py to fail.
2026-04-09 21:04:34 -07:00
Krrish Dholakia 3a6db708ce docs: add Docker Image Security Guide for cosign verification and deployment best practices (#25439)
- New doc page covering all signed image variants, verification commands,
  CI/CD enforcement (K8s Sigstore Policy Controller, GCP Binary Authorization,
  AWS/EKS, GitHub Actions), digest pinning, and safe upgrade patterns
- Added to sidebar under Setup & Deployment
- Cross-linked from the existing deploy.md cosign section

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Krrish Dholakia <krrish-berri-2@users.noreply.github.com>
2026-04-09 11:50:15 -07:00
stuxf a6c30b30bf build: migrate packaging, CI, and Docker from Poetry to uv (#25007)
* build: migrate packaging metadata to uv

* ci: move automation and local tooling to uv

* docker: migrate image builds and runtime setup to uv

* docs: update install and deployment guidance for uv

* chore: align auxiliary scripts and tests with uv

* test: harden test_litellm isolation

* fix: keep release and health check images self-contained

* build: pin uv tooling and health check deps

* test: isolate bedrock image request formatting from suite state

* test: cover sandbox executor requirements flow

* ci: fix circleci no-op command steps

* ci: fix circleci publish workflow parsing

* fix: stabilize remaining uv migration CI checks

* ci: increase matrix test timeout headroom

* fix: restore published docker and license coverage

* fix: restore proxy runtime build parity

* fix: restore proxy extras parity and venv migrations

* ci: persist uv path across circleci steps

* fix: keep psycopg binary in default test env

* docker: preserve prisma cache across stages

* test: run local proxy checks through uv python

* build: restore runtime deps moved into ci

* build: refresh uv lock after upstream merge

* fix: restore module import in test_check_migration after merge

The conflict resolution imported only the function but the test body
references check_migration as a module throughout.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: revert dependency promotions, remove nodejs-wheel-binaries, fix Docker layer caching

- Move google-generativeai, Pillow, tenacity back to ci group (they are
  lazily imported and bloat the base SDK install needlessly)
- Remove nodejs-wheel-binaries from extra_proxy and proxy-dev (redundant
  in Docker where system Node.js is already installed via apk)
- Remove all nodejs-wheel node replacement and venv npm patching blocks
  from Dockerfiles since the wheel is no longer installed
- Add --no-default-groups to CodSpeed benchmark workflow so the benchmark
  environment matches the old minimal pip install footprint
- Apply standard uv two-phase Docker pattern: copy metadata first, install
  deps (cached layer), then copy source and install project
- Replace CircleCI enterprise no-op with proper uv sync command

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: regenerate uv.lock after removing nodejs-wheel-binaries

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(ci): use cache/restore instead of cache to prevent cache poisoning

The old workflow used actions/cache/restore (read-only). The uv migration
changed it to actions/cache (read-write), which zizmor flags as a cache
poisoning risk. Restore the safer read-only variant.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(ci): disable setup-uv built-in cache to silence cache-poisoning alert

The setup-uv action enables caching by default, which zizmor flags as a
cache poisoning risk. Disable it since we already use a read-only
cache/restore step.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(ci): disable setup-uv cache in publish workflow

Silences zizmor cache-poisoning alert. Publishing workflow runs
infrequently on protected branches so caching adds no real benefit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(test): remove duplicate verbose_logger mock in test_check_migration

The logger was patched twice — first via mocker.patch() then via
mocker.patch.object(autospec=True). The second call fails because
autospec cannot inspect an already-mocked attribute. Remove the
redundant first patch.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(ci): free disk space before Docker build in test-server-root-path

The Dockerfile.non_root build ran out of disk on the CI runner. Remove
Android SDK, .NET, Boost, and GHC toolchains (~12GB) to free space.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 11:46:23 -07:00
Abhijoy Sarkar c688d9d6bc Add PromptGuard guardrail integration (#24268)
* Add PromptGuard guardrail integration

Add PromptGuard as a first-class guardrail vendor in LiteLLM's proxy,
supporting prompt injection detection, PII redaction, topic filtering,
entity blocklists, and hallucination detection via PromptGuard's
/api/v1/guard API endpoint.

Backend:
- Add PROMPTGUARD to SupportedGuardrailIntegrations enum
- Implement PromptGuardGuardrail (CustomGuardrail subclass) with
  apply_guardrail handling allow/block/redact decisions
- Add Pydantic config model with api_key, api_base, ui_friendly_name
- Auto-discovered via guardrail_hooks/promptguard/__init__.py registries

Frontend:
- Add PromptGuard partner card to Guardrail Garden with eval scores
- Add preset configuration for quick setup
- Add logo to guardrailLogoMap

Tests:
- 30 unit tests covering configuration, allow/block/redact actions,
  request payload construction, error handling, config model, and
  registry wiring

* Fix redact path and init ordering per review feedback

- P1: Update structured_messages (not just texts) when PromptGuard
  returns a redact decision, so PII redaction is effective for the
  primary LLM message path
- P2: Validate credentials before allocating the HTTPX client so
  resources aren't acquired if PromptGuardMissingCredentials is raised
- Add tests for structured_messages redaction and texts-only redaction

* Harden PromptGuard integration: fail-open, event hooks, images, docs

- Add block_on_error config (default fail-closed, configurable fail-open)
- Declare supported_event_hooks (pre_call, post_call) like other vendors
- Forward images from GenericGuardrailAPIInputs to PromptGuard API
- Wrap API call in try/except for resilient error handling
- Add comprehensive documentation page with config examples
- Register docs page in sidebar alongside other guardrail providers
- Expand test suite from 32 to 40 tests covering new functionality

* Fix dict[str, Any] -> Dict[str, Any] for Python 3.8 compat

* Address remaining Greptile feedback: timeout, redact guard

- Add explicit 10s timeout to async_handler.post() to prevent
  indefinite hangs when PromptGuard API is unresponsive
- Guard redact path: only update inputs["texts"] when the key
  was originally present, avoiding phantom key injection
- Add test: redact with structured_messages only does not create
  texts key (41 tests total)

* Fix CI lint: black formatting, add PromptGuardConfigModel to LitellmParams

- Reformat promptguard.py to match CI black version (parenthesization)
- Add PromptGuardConfigModel as base class of LitellmParams for proper
  Pydantic schema validation, consistent with all other guardrail vendors
- Use litellm_params.block_on_error directly (now a typed field)

* Address Greptile review: redact path, null decision, error context

- P1: Filter _extract_texts_from_messages to user-role messages only,
  preventing system/assistant content from being injected into texts
- P1: Strengthen test_redact_updates_structured_messages assertion from
  weak `in` check to strict equality, catching the injection bug
- P2: Use `result.get("decision") or "allow"` to handle explicit null
  decision values (not just absent keys)
- P2: Wrap bare exception re-raise in GuardrailRaisedException so the
  caller knows which guardrail failed (block_on_error=True path)
- P2: Add static Promptguard entry in guardrail_provider_map so the
  preset works before populateGuardrailProviderMap is called
- Add test for explicit null decision treated as allow
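
The difference between the two `dict.get` forms mentioned in the P2 item above is easy to miss, so here is a small self-contained illustration. The function names are hypothetical; only the expression `result.get("decision") or "allow"` comes from the commit message.

```python
from typing import Any, Dict, Optional


def resolve_decision(result: Dict[str, Any]) -> str:
    # Falls back to "allow" when the key is absent OR explicitly null.
    return result.get("decision") or "allow"


def resolve_decision_default_only(result: Dict[str, Any]) -> Optional[str]:
    # The default is used only when the key is absent; an explicit
    # None stored under "decision" is returned as-is.
    return result.get("decision", "allow")
```

For a response body of `{"decision": null}`, the first form yields `"allow"` while the second yields `None`.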

* Fix black formatting: collapse f-string in error message
2026-04-09 08:12:24 -07:00
michelligabriele cd9c511df6 feat(proxy): add credential overrides per team/project via model_config metadata (#24438) 2026-04-09 07:22:27 -07:00
Krrish Dholakia f42ffed2bd Litellm oss staging 04 02 2026 p1 (#25055)
* fix(vertex_ai): support pluggable (executable) credential_source for WIF auth (#24700)

The WIF credential dispatch in load_auth() only handled identity_pool and
aws credential types. When credential_source.executable was present (used
for Azure Managed Identity via Workload Identity Federation), it fell
through to identity_pool.Credentials which rejected it with MalformedError.

Add dispatch to google.auth.pluggable.Credentials for executable-type
credential sources, following the same pattern as the existing identity_pool
and aws helpers.

Fixes authentication for Azure Container Apps → GCP Vertex AI via WIF
with executable credential sources.

* feat(logging): add component and logger fields to JSON logs for 3rd p… (#24447)

* feat(logging): add component and logger fields to JSON logs for 3rd party filtering

* Let user-supplied extra fields win over auto-generated component/logger, tighten test assertions
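
One way to let user-supplied fields win over the auto-generated ones is to rely on dict merge order, as in this sketch. It is not the PR's code: `build_log_payload` is a hypothetical helper, and deriving `component` from the first segment of the logger name is an assumption made only for illustration.

```python
import json
import logging
from typing import Any, Dict


def build_log_payload(record: logging.LogRecord, extra: Dict[str, Any]) -> str:
    """Serialize a record to JSON with component/logger fields added."""
    auto = {
        "component": record.name.split(".")[0],  # assumed derivation
        "logger": record.name,
    }
    # Later keys override earlier ones, so anything in `extra` wins over `auto`.
    payload = {"message": record.getMessage(), **auto, **extra}
    return json.dumps(payload)
```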

* Feat - Add organization into the metrics metadata for org_id & org_alias (#24440)

* Add org_id and org_alias label names to Prometheus metric definitions

* Add user_api_key_org_alias to StandardLoggingUserAPIKeyMetadata

* Populate user_api_key_org_alias in pre-call metadata

* Pass org_id and org_alias into per-request Prometheus metric labels

* Add test for org labels on per-request Prometheus metrics

* chore: resolve test mockdata

* Address review: populate org_alias from DB view, add feature flag, use .get() for org metadata

* Add org labels to failure path and verify flag behavior in test

* Fix test: build flag-off enum_values without org fields

* Gate org labels behind feature flag in get_labels() instead of static metric lists

* Scope org label injection to metrics that carry team context, remove orphaned budget label defs, add test teardown

* Use explicit metric allowlist for org label injection instead of team heuristic

* Fix duplicate org label guard, move _org_label_metrics to class constant

* Reset custom_prometheus_metadata_labels after duplicate label assertion

* fix: emit org labels by default, remove flag, fix missing org_alias in all metadata paths

* fix: emit org labels by default, no opt-in flag required

* fix: write org_alias to metadata unconditionally in proxy_server.py

* fix: 429s from batch creation being converted to 500 (#24703)

* add us gov models (#24660)

* add us gov models

* added max tokens

* Litellm dev 04 02 2026 p1 (#25052)

* fix: replace hardcoded url

* fix: Anthropic web search cost not tracked for Chat Completions

The ModelResponse branch in response_object_includes_web_search_call()
only checked url_citation annotations and prompt_tokens_details, missing
Anthropic's server_tool_use.web_search_requests field. This caused
_handle_web_search_cost() to never fire for Anthropic Claude models.

Also routes vertex_ai/claude-* models to the Anthropic cost calculator
instead of the Gemini one, since Claude on Vertex uses the same
server_tool_use billing structure as the direct Anthropic API.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(anthropic): pass logging_obj to client.post for litellm_overhead_time_ms (#24071)

When LITELLM_DETAILED_TIMING=true, litellm_overhead_time_ms was null for
Anthropic because the handler did not pass logging_obj to client.post(),
so track_llm_api_timing could not set llm_api_duration_ms. Pass
logging_obj=logging_obj at all four post() call sites (make_call,
make_sync_call, acompletion, completion). Add test to ensure make_call
passes logging_obj to client.post.

Made-with: Cursor

* sap - add additional parameters for grounding

- additional parameter for grounding added for the sap provider

* sap - fix models

* (sap) add filtering, masking, translation SAP GEN AI Hub modules

* (sap) add tests and docs for new SAP modules

* (sap) add support of multiple modules config

* (sap) code refactoring

* (sap) rename file

* test(): add safeguard tests

* (sap) update tests

* (sap) update docs, solve merge conflict in transformation.py

* (sap) linter fix

* (sap) Align embedding request transformation with current API

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) mock commit

* (sap) run black formatter

* (sap) add literals to models, add negative tests, fix test for tool transformation

* (sap) fix formatting

* (sap) fix models

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) commit for rerun bot review

* (sap) minor improve

* (sap) fix after bot review

* (sap) lint fix

* docs(sap): update documentation

* fix(sap): change creds priority

* fix(sap): change creds priority

* fix(sap): fix sap creds unit test

* fix(sap): linter fix

* fix(sap): linter fix

* linter fix

* (sap) update logic of fetching creds, add additional tests

* (sap) clean up code

* (sap) fix after review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) add a possibility to put the service key by both variants

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) update test

* (sap) update service key resolve function

* (sap) run black formatter

* (sap) fix validate credentials, add negative tests for credential fetching

* (sap) fix validate credentials, add negative tests for credential fetching

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) lint fix

* (sap) lint fix

* feat: support service_tier in gemini

* chore: add a service_tier field mapping from openai to gemini

* fix: use x-gemini-service-tier header in response

* docs: add service_tier to gemini docs

* chore: add default/standard mapping, and some tests

* chore: tidying up some case insensitivity

* chore: remove unnecessary guard

* fix: remove redundant test file

* fix: handle 'auto' case-insensitively

* fix: return service_tier on final streamed chunk

* chore: black

* feat: enable supports_service_tier to gemini models

* Fix get_standard_logging_metadata tests

* Fix test_get_model_info_bedrock_models

* Fix test_get_model_info_bedrock_models

* Fix remaining tests

* Fix mypy issues

* Fix tests

* Fix merge conflicts

* Fix code qa

* Fix code qa

* Fix code qa

* Fix greptile review

---------

Co-authored-by: michelligabriele <gabriele.michelli@icloud.com>
Co-authored-by: Josh <36064836+J-Byron@users.noreply.github.com>
Co-authored-by: mubashir1osmani <mubashir.osmani777@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: milan-berri <milan@berri.ai>
Co-authored-by: Alperen Kömürcü <alperen.koemuercue@sap.com>
Co-authored-by: Vasilisa Parshikova <vasilisa.parshikova@sap.com>
Co-authored-by: Lin Xu <lin.xu03@sap.com>
Co-authored-by: Mark McDonald <macd@google.com>
Co-authored-by: Sameer Kankute <sameer@berri.ai>
2026-04-08 21:37:10 -07:00