Commit Graph

6085 Commits

Author SHA1 Message Date
Sameer Kankute 3fdd67ff23 Delete docs/my-website/blog/debug_cost_discrepancy/index.md 2026-04-15 21:35:05 +05:30
Sameer Kankute 639135e365 Update docs/my-website/blog/debug_cost_discrepancy/index.md
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
2026-04-13 11:33:24 +05:30
Sameer Kankute 5e830e0d55 docs(troubleshoot): add cost discrepancy debugging guide
- New troubleshoot page and blog post with step-by-step comparison workflow
- Screenshots under static/img/cost-discrepancy-debug
- Link from spend tracking; sidebar entry under Troubleshooting
- Flowchart SVG: Path B connectors below box; clarify LiteLLM schedules customer calls when stuck

Made-with: Cursor
2026-04-13 11:27:16 +05:30
Sameer Kankute fa605d85c0 Merge pull request #25616 from BerriAI/main
merge main
2026-04-13 08:43:43 +05:30
ishaan-berri fdd7500904 blog: add back arrow to blog post pages (#25587)
* blog: add back arrow to post pages

* blog: style back arrow — fixed top-left below navbar
2026-04-11 19:15:45 -07:00
ishaan-berri 1edf41c26f Merge pull request #25585 from BerriAI/litellm_dev_04_11_2026_p1
Litellm dev 04 11 2026 p1
2026-04-11 18:46:57 -07:00
Ishaan Jaffer 35f4b47ff8 apply content guidelines: scale/resilience narrative, FAQ, Key Takeaways, Conclusion CTA 2026-04-11 18:12:32 -07:00
Ishaan Jaffer 14eed24471 add redis circuit breaker blog post with React diagrams 2026-04-11 18:02:59 -07:00
Ishaan Jaffer 8e616ecdf4 add BlogPostPage swizzle: hide sidebar, add hiring CTA on every post 2026-04-11 18:02:56 -07:00
Ishaan Jaffer dac44fb443 blog list styles: clean typography, marquee animation, hero layout 2026-04-11 18:02:52 -07:00
Ishaan Jaffer 85cb7db8b9 blog list page: Ramp-style flat list with hero, provider marquee, hiring CTA 2026-04-11 18:02:48 -07:00
Ishaan Jaffer 05d516482f restyle blog list page to match engineering blog aesthetic 2026-04-11 18:02:44 -07:00
Krrish Dholakia e08e3bf748 docs: clarify how to get benchmarking script 2026-04-11 17:31:03 -07:00
Krrish Dholakia 12bca649fc docs: refactor benchmarking docs to be clearer 2026-04-11 17:30:09 -07:00
Yuneng Jiang 909247785e Merge remote-tracking branch 'origin' into litellm_internal_staging_04_11_2026 2026-04-11 15:41:03 -07:00
Sameer Kankute c13be44e44 feat(guardrails): optional skip system message in unified guardrail inputs (#25481)
* feat(guardrails): optional skip system message in unified guardrail inputs

Made-with: Cursor

* feat(dashboard): skip_system_message_in_guardrail in guardrail UI

Add a tri-state control (inherit / yes / no) when creating or editing
guardrails so admins can set litellm_params.skip_system_message_in_guardrail
without YAML. Table edit merges existing litellm_params before PUT to avoid
wiping content-filter and other provider fields.

Document the dashboard flow in the guardrails quick start with a screenshot.

Made-with: Cursor

* fix(guardrails): type structured_messages as AllMessageValues for mypy

Use AllMessageValues in openai_messages_without_system and cast adapter
request messages so GenericGuardrailAPIInputs matches TypedDict.

Made-with: Cursor
2026-04-11 08:53:24 -07:00
ishaan-berri 831083b565 Merge pull request #25525 from BerriAI/feat/anthropic-advisor-tool
feat(anthropic): support advisor_20260301 tool type
2026-04-10 16:39:34 -07:00
Krrish Dholakia 4e12d3c562 docs: document april townhall announcements (#25537)
* docs: document april townhall announcements

* docs: cleanup blog post
2026-04-10 16:12:06 -07:00
Ishaan Jaffer d6e2a74c0f docs: move advisor tool doc to completion/ guides section in sidebar 2026-04-10 15:08:25 -07:00
Ishaan Jaffer ed973c049f docs: add Advisor Tool documentation page 2026-04-10 13:15:54 -07:00
Yuneng Jiang ce0b57b4ff [Docs] Add missing MCP per-user token env vars to config_settings
MCP_PER_USER_TOKEN_DEFAULT_TTL and MCP_PER_USER_TOKEN_EXPIRY_BUFFER_SECONDS
were added in #25441 but not documented, causing test_env_keys.py to fail.
2026-04-09 21:04:34 -07:00
Krrish Dholakia 3a6db708ce docs: add Docker Image Security Guide for cosign verification and deployment best practices (#25439)
- New doc page covering all signed image variants, verification commands,
  CI/CD enforcement (K8s Sigstore Policy Controller, GCP Binary Authorization,
  AWS/EKS, GitHub Actions), digest pinning, and safe upgrade patterns
- Added to sidebar under Setup & Deployment
- Cross-linked from the existing deploy.md cosign section

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Krrish Dholakia <krrish-berri-2@users.noreply.github.com>
2026-04-09 11:50:15 -07:00
Abhijoy Sarkar c688d9d6bc Add PromptGuard guardrail integration (#24268)
* Add PromptGuard guardrail integration

Add PromptGuard as a first-class guardrail vendor in LiteLLM's proxy,
supporting prompt injection detection, PII redaction, topic filtering,
entity blocklists, and hallucination detection via PromptGuard's
/api/v1/guard API endpoint.

Backend:
- Add PROMPTGUARD to SupportedGuardrailIntegrations enum
- Implement PromptGuardGuardrail (CustomGuardrail subclass) with
  apply_guardrail handling allow/block/redact decisions
- Add Pydantic config model with api_key, api_base, ui_friendly_name
- Auto-discovered via guardrail_hooks/promptguard/__init__.py registries

Frontend:
- Add PromptGuard partner card to Guardrail Garden with eval scores
- Add preset configuration for quick setup
- Add logo to guardrailLogoMap

Tests:
- 30 unit tests covering configuration, allow/block/redact actions,
  request payload construction, error handling, config model, and
  registry wiring

* Fix redact path and init ordering per review feedback

- P1: Update structured_messages (not just texts) when PromptGuard
  returns a redact decision, so PII redaction is effective for the
  primary LLM message path
- P2: Validate credentials before allocating the HTTPX client so
  resources aren't acquired if PromptGuardMissingCredentials is raised
- Add tests for structured_messages redaction and texts-only redaction

* Harden PromptGuard integration: fail-open, event hooks, images, docs

- Add block_on_error config (default fail-closed, configurable fail-open)
- Declare supported_event_hooks (pre_call, post_call) like other vendors
- Forward images from GenericGuardrailAPIInputs to PromptGuard API
- Wrap API call in try/except for resilient error handling
- Add comprehensive documentation page with config examples
- Register docs page in sidebar alongside other guardrail providers
- Expand test suite from 32 to 40 tests covering new functionality

* Fix dict[str, Any] -> Dict[str, Any] for Python 3.8 compat

* Address remaining Greptile feedback: timeout, redact guard

- Add explicit 10s timeout to async_handler.post() to prevent
  indefinite hangs when PromptGuard API is unresponsive
- Guard redact path: only update inputs["texts"] when the key
  was originally present, avoiding phantom key injection
- Add test: redact with structured_messages only does not create
  texts key (41 tests total)

* Fix CI lint: black formatting, add PromptGuardConfigModel to LitellmParams

- Reformat promptguard.py to match CI black version (parenthesization)
- Add PromptGuardConfigModel as base class of LitellmParams for proper
  Pydantic schema validation, consistent with all other guardrail vendors
- Use litellm_params.block_on_error directly (now a typed field)

* Address Greptile review: redact path, null decision, error context

- P1: Filter _extract_texts_from_messages to user-role messages only,
  preventing system/assistant content from being injected into texts
- P1: Strengthen test_redact_updates_structured_messages assertion from
  weak `in` check to strict equality, catching the injection bug
- P2: Use `result.get("decision") or "allow"` to handle explicit null
  decision values (not just absent keys)
- P2: Wrap bare exception re-raise in GuardrailRaisedException so the
  caller knows which guardrail failed (block_on_error=True path)
- P2: Add static Promptguard entry in guardrail_provider_map so the
  preset works before populateGuardrailProviderMap is called
- Add test for explicit null decision treated as allow

* Fix black formatting: collapse f-string in error message
2026-04-09 08:12:24 -07:00
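
Note on the null-decision fix in c688d9d6bc: `dict.get(key, default)` falls back only when the key is absent, while `dict.get(key) or default` also covers an explicit null. A minimal sketch of that distinction, assuming a response dict with a "decision" key as the commit message describes; the helper name `resolve_decision` is hypothetical and not taken from the LiteLLM source.

```python
from typing import Any, Dict


def resolve_decision(result: Dict[str, Any]) -> str:
    """Return the guardrail decision, treating a missing or null value as "allow"."""
    # `or` falls back for None (and other falsy values such as ""),
    # not only when the key is absent.
    return result.get("decision") or "allow"


absent = resolve_decision({})
explicit_null = resolve_decision({"decision": None})
blocked = resolve_decision({"decision": "block"})

# For comparison: a plain default does not cover an explicit null.
plain_default = {"decision": None}.get("decision", "allow")
```

With a plain default, an API that returns `"decision": null` would hand `None` to the caller, which is the case the commit's added test targets.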
michelligabriele cd9c511df6 feat(proxy): add credential overrides per team/project via model_config metadata (#24438) 2026-04-09 07:22:27 -07:00
Krrish Dholakia f42ffed2bd Litellm oss staging 04 02 2026 p1 (#25055)
* fix(vertex_ai): support pluggable (executable) credential_source for WIF auth (#24700)

The WIF credential dispatch in load_auth() only handled identity_pool and
aws credential types. When credential_source.executable was present (used
for Azure Managed Identity via Workload Identity Federation), it fell
through to identity_pool.Credentials which rejected it with MalformedError.

Add dispatch to google.auth.pluggable.Credentials for executable-type
credential sources, following the same pattern as the existing identity_pool
and aws helpers.

Fixes authentication for Azure Container Apps → GCP Vertex AI via WIF
with executable credential sources.

* feat(logging): add component and logger fields to JSON logs for 3rd p… (#24447)

* feat(logging): add component and logger fields to JSON logs for 3rd party filtering

* Let user-supplied extra fields win over auto-generated component/logger, tighten test assertions

* Feat - Add organization into the metrics metadata for org_id & org_alias (#24440)

* Add org_id and org_alias label names to Prometheus metric definitions

* Add user_api_key_org_alias to StandardLoggingUserAPIKeyMetadata

* Populate user_api_key_org_alias in pre-call metadata

* Pass org_id and org_alias into per-request Prometheus metric labels

* Add test for org labels on per-request Prometheus metrics

* chore: resolve test mockdata

* Address review: populate org_alias from DB view, add feature flag, use .get() for org metadata

* Add org labels to failure path and verify flag behavior in test

* Fix test: build flag-off enum_values without org fields

* Gate org labels behind feature flag in get_labels() instead of static metric lists

* Scope org label injection to metrics that carry team context, remove orphaned budget label defs, add test teardown

* Use explicit metric allowlist for org label injection instead of team heuristic

* Fix duplicate org label guard, move _org_label_metrics to class constant

* Reset custom_prometheus_metadata_labels after duplicate label assertion

* fix: emit org labels by default, remove flag, fix missing org_alias in all metadata paths

* fix: emit org labels by default, no opt-in flag required

* fix: write org_alias to metadata unconditionally in proxy_server.py

* fix: 429s from batch creation being converted to 500 (#24703)

* add us gov models (#24660)

* add us gov models

* added max tokens

* Litellm dev 04 02 2026 p1 (#25052)

* fix: replace hardcoded url

* fix: Anthropic web search cost not tracked for Chat Completions

The ModelResponse branch in response_object_includes_web_search_call()
only checked url_citation annotations and prompt_tokens_details, missing
Anthropic's server_tool_use.web_search_requests field. This caused
_handle_web_search_cost() to never fire for Anthropic Claude models.

Also routes vertex_ai/claude-* models to the Anthropic cost calculator
instead of the Gemini one, since Claude on Vertex uses the same
server_tool_use billing structure as the direct Anthropic API.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(anthropic): pass logging_obj to client.post for litellm_overhead_time_ms (#24071)

When LITELLM_DETAILED_TIMING=true, litellm_overhead_time_ms was null for
Anthropic because the handler did not pass logging_obj to client.post(),
so track_llm_api_timing could not set llm_api_duration_ms. Pass
logging_obj=logging_obj at all four post() call sites (make_call,
make_sync_call, acompletion, completion). Add test to ensure make_call
passes logging_obj to client.post.

Made-with: Cursor

* sap - add additional parameters for grounding

- additional parameter for grounding added for the sap provider

* sap - fix models

* (sap) add filtering, masking, translation SAP GEN AI Hub modules

* (sap) add tests and docs for new SAP modules

* (sap) add support of multiple modules config

* (sap) code refactoring

* (sap) rename file

* test(): add safeguard tests

* (sap) update tests

* (sap) update docs, solve merge conflict in transformation.py

* (sap) linter fix

* (sap) Align embedding request transformation with current API

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) mock commit

* (sap) run black formatter

* (sap) add literals to models, add negative tests, fix test for tool transformation

* (sap) fix formatting

* (sap) fix models

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) commit for rerun bot review

* (sap) minor improve

* (sap) fix after bot review

* (sap) lint fix

* docs(sap): update documentation

* fix(sap): change creds priority

* fix(sap): change creds priority

* fix(sap): fix sap creds unit test

* fix(sap): linter fix

* fix(sap): linter fix

* linter fix

* (sap) update logic of fetching creds, add additional tests

* (sap) clean up code

* (sap) fix after review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) add a possibility to put the service key by both variants

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) update test

* (sap) update service key resolve function

* (sap) run black formatter

* (sap) fix validate credentials, add negative tests for credential fetching

* (sap) fix validate credentials, add negative tests for credential fetching

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) fix after bot review

* (sap) lint fix

* (sap) lint fix

* feat: support service_tier in gemini

* chore: add a service_tier field mapping from openai to gemini

* fix: use x-gemini-service-tier header in response

* docs: add service_tier to gemini docs

* chore: add default/standard mapping, and some tests

* chore: tidying up some case insensitivity

* chore: remove unnecessary guard

* fix: remove redundant test file

* fix: handle 'auto' case-insensitively

* fix: return service_tier on final streamed chunk

* chore: black

* feat: enable supports_service_tier to gemini models

* Fix get_standard_logging_metadata tests

* Fix test_get_model_info_bedrock_models

* Fix test_get_model_info_bedrock_models

* Fix remaining tests

* Fix mypy issues

* Fix tests

* Fix merge conflicts

* Fix code qa

* Fix code qa

* Fix code qa

* Fix greptile review

---------

Co-authored-by: michelligabriele <gabriele.michelli@icloud.com>
Co-authored-by: Josh <36064836+J-Byron@users.noreply.github.com>
Co-authored-by: mubashir1osmani <mubashir.osmani777@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: milan-berri <milan@berri.ai>
Co-authored-by: Alperen Kömürcü <alperen.koemuercue@sap.com>
Co-authored-by: Vasilisa Parshikova <vasilisa.parshikova@sap.com>
Co-authored-by: Lin Xu <lin.xu03@sap.com>
Co-authored-by: Mark McDonald <macd@google.com>
Co-authored-by: Sameer Kankute <sameer@berri.ai>
2026-04-08 21:37:10 -07:00
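
Note on the JSON logging change squashed into f42ffed2bd (#24447): the message says user-supplied extra fields win over the auto-generated component/logger fields. One way to get that precedence is to merge the auto-generated dict first and the user dict second. The sketch below is an assumption-heavy illustration: the `JsonFormatter` class, the `extra_fields` attribute, and the rule deriving `component` from the logger name are all hypothetical, not LiteLLM's implementation.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Hypothetical formatter: emits component/logger, user extras win on conflict."""

    def format(self, record: logging.LogRecord) -> str:
        auto = {
            "message": record.getMessage(),
            "level": record.levelname,
            "logger": record.name,
            # Assumption: component is the first segment of the logger name.
            "component": record.name.split(".")[0],
        }
        user_extra = getattr(record, "extra_fields", {})
        # Later keys override earlier ones, so user-supplied values win.
        return json.dumps({**auto, **user_extra})


record = logging.LogRecord(
    "proxy.auth", logging.INFO, "example.py", 0, "login ok", None, None
)
record.extra_fields = {"component": "custom-component"}
payload = json.loads(JsonFormatter().format(record))
```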
Kedar Thakkar 233870d7b2 Add Ramp as a built-in generic API callback with docs (#23769) 2026-04-08 20:06:48 -07:00
Sameer Kankute 65829f79d7 docs: document LITELLM_MCP_STDIO_EXTRA_COMMANDS in env reference
Required by tests/documentation_tests/test_env_keys.py for os.getenv usage in constants.

Made-with: Cursor
2026-04-08 21:31:51 +05:30
yuneng-jiang 096893ea97 Merge pull request #25273 from BerriAI/litellm_pin_cosign_pub_to_commit
[Infra] Pin cosign.pub verification to initial commit hash
2026-04-07 15:40:46 -07:00
milan-berri bf8b615b64 fix(auth): support selective jwt override oauth2 routing (#25252)
Allow JWT tokens matching routing_overrides to use OAuth2 introspection without enabling global OAuth2 while keeping OAuth2 routing limited to LLM/info routes. Add regression coverage for management-route boundary and tighten opaque-token assertions; update docs to reflect selective-mode route scope.

Made-with: Cursor
2026-04-07 13:52:47 -07:00
Yuneng Jiang ce75fde727 Merge remote main into litellm_pin_cosign_pub_to_commit 2026-04-07 10:27:00 -07:00
Yuneng Jiang 30565581be [Infra] Pin cosign.pub verification to initial commit hash
Pin all cosign public key references to the immutable commit hash
(0112e53) that first introduced the key, instead of fetching it from
the release tag. This addresses the concern that an attacker with push
access could replace the key on main/tags and re-sign tampered images.

Docs now show two verification methods: commit hash (recommended) and
release tag (convenience), with explanation of why the hash is stronger.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 22:53:23 -07:00
ishaan-berri 7a9a9f0c79 fix: batch-limit stale managed object cleanup to prevent 300K row UPD… (#25258)
* fix: batch-limit stale managed object cleanup to prevent 300K row UPDATE (#25257)

* Add STALE_OBJECT_CLEANUP_BATCH_SIZE constant

Configurable batch limit (default 1000) for stale managed object cleanup,
preventing unbounded UPDATE queries from hitting 300K+ rows at once.

* Batch-limit stale managed object cleanup with single bounded SQL query

Two fixes to _cleanup_stale_managed_objects:

1. Replace unbounded update_many with a single execute_raw using a
   subquery LIMIT, capping each poll cycle to STALE_OBJECT_CLEANUP_BATCH_SIZE
   rows. Zero rows loaded into Python memory — everything stays in Postgres.
   Uses the same PostgreSQL raw-SQL pattern as spend_log_cleanup.py
   (the proxy requires PostgreSQL per schema.prisma).

2. Extract _expire_stale_rows as a separate method for testability.

Keeps the file_purpose='response' filter to avoid incorrectly expiring
long-running batch or fine-tune jobs that legitimately exceed the
staleness cutoff.

* docs: add STALE_OBJECT_CLEANUP_BATCH_SIZE to env vars reference

* test: remove deprecated embed-english-v2.0 cohere embedding tests
2026-04-06 19:11:55 -07:00
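
Note on the batch-limited cleanup in 7a9a9f0c79: the commit describes replacing an unbounded update with a single statement whose subquery carries a LIMIT, so each poll cycle touches at most STALE_OBJECT_CLEANUP_BATCH_SIZE rows. The sketch below shows that query shape only. It runs against in-memory SQLite rather than PostgreSQL so it is self-contained, and the table name, column names, and status values are hypothetical.

```python
import sqlite3

BATCH_SIZE = 2  # stand-in for STALE_OBJECT_CLEANUP_BATCH_SIZE (default 1000 per the commit)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE managed_objects ("
    "id INTEGER PRIMARY KEY, status TEXT, file_purpose TEXT, created_at INTEGER)"
)
rows = [(i, "in_progress", "response", i) for i in range(1, 6)]
rows.append((6, "in_progress", "batch", 1))  # not 'response': must be left alone
conn.executemany("INSERT INTO managed_objects VALUES (?, ?, ?, ?)", rows)


def expire_stale_rows(conn: sqlite3.Connection, cutoff: int, batch_size: int) -> int:
    """Expire at most `batch_size` stale rows in one bounded statement."""
    cur = conn.execute(
        """
        UPDATE managed_objects
        SET status = 'stale_expired'
        WHERE id IN (
            SELECT id FROM managed_objects
            WHERE status = 'in_progress'
              AND file_purpose = 'response'
              AND created_at < ?
            ORDER BY created_at
            LIMIT ?
        )
        """,
        (cutoff, batch_size),
    )
    conn.commit()
    return cur.rowcount  # number of rows updated by this call


# Each call plays the role of one poll cycle.
per_cycle = [expire_stale_rows(conn, cutoff=10, batch_size=BATCH_SIZE) for _ in range(4)]
still_in_progress = conn.execute(
    "SELECT COUNT(*) FROM managed_objects WHERE status = 'in_progress'"
).fetchone()[0]
```

No rows are loaded into Python; the selection and the update both happen inside the database, and the `file_purpose` filter keeps the long-running batch row untouched.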
yuneng-jiang 39c1042258 [Docs] Add cosign Docker image verification steps to security blog posts (#25122)
* docs(blog): add cosign Docker image verification instructions

Add steps for verifying Docker images with cosign to three security blog posts:
CI/CD v2, Security Townhall, and Security Update.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs(proxy): add cosign verification to Docker/Helm/Terraform deploy page

Add image signature verification steps to the main deployment doc so
users pulling Docker images know how to verify them with cosign.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: fixes

* Update index.md

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* [Docs] Scope cosign signing docs to GHCR and specify starting version

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* [Docs] Add starting version callout to ci_cd_v2 blog post

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Krrish Dholakia <krrish+github@berri.ai>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
2026-04-06 09:59:27 -07:00
ishaan-berri c5686b9726 [Nit] Small docs fix, fixing img + folder name (#25171)
* fix toolsets img

* docs fix
2026-04-04 18:14:32 -07:00
ishaan-berri 9088b46b90 Litellm docs 1 83 3 (#25166)
* doc fix

* docs fix

* docs fix

* doc fix

* docs

* docs fix
2026-04-04 17:54:47 -07:00
ishaan-berri 693ad49719 Litellm ishaan march23 - MCP Toolsets + GCP Caching fix (#25146) (#25155)
* Litellm ishaan march23 - MCP Toolsets + GCP Caching fix  (#25146)

* feat(mcp): MCP Toolsets — curated tool subsets from one or more MCP servers (#24335)

* feat(mcp): add LiteLLM_MCPToolsetTable and mcp_toolsets to ObjectPermissionTable

* feat(mcp): add prisma migration for MCPToolset table

* feat(mcp): add MCPToolset Python types

* feat(mcp): add toolset_db.py with CRUD helpers for MCPToolset

* feat(mcp): add toolset CRUD endpoints to mcp_management_endpoints

* fix(mcp): skip allow_all_keys servers when explicit mcp_servers permission is set (toolset scope fix)

* feat(mcp): add _apply_toolset_scope and toolset route handling in server.py

* fix(mcp): resolve toolset names in responses API before fetching tools

* feat(mcp): add mcp_toolsets field to LiteLLM_ObjectPermissionTable type

* feat(mcp): register LiteLLM_MCPToolsetTable in prisma client initialization

* feat(mcp): validate mcp_toolsets in key-vs-team permission check

* feat(mcp): register toolset routes in proxy_server.py

* feat(mcp): add MCPToolset and MCPToolsetTool TypeScript types

* feat(mcp): add fetchMCPToolsets, createMCPToolset, updateMCPToolset, deleteMCPToolset API functions

* feat(mcp): add useMCPToolsets React Query hook

* feat(mcp): add toolsets (purple) as third option type in MCPServerSelector

* feat(mcp): extract toolsets from combined MCP field in key form

* feat(mcp): extract toolsets from combined MCP field in team form

* feat(mcp): show toolsets section in MCPServerPermissions read view

* feat(mcp): pass mcp_toolsets through object_permissions_view

* feat(mcp): add MCPToolsetsTab component for creating and managing toolsets

* feat(mcp): add Toolsets tab to mcp_servers.tsx

* feat(mcp): pass mcpToolsets to playground chat and responses API calls

* feat(mcp): generate correct server_url for toolsets in playground API calls

* docs(mcp): add MCP Toolsets documentation

* docs(mcp): add mcp_toolsets to sidebar

* fix(mcp): replace x-mcp-toolset-id header with ContextVar to prevent client forgery

* fix(mcp): use ContextVar + StreamingResponse for toolset MCP routes (fixes SSE streaming)

* fix(mcp): cache toolset permission lookups to avoid per-request DB calls

* test(mcp): add tests for toolset scope enforcement, ContextVar isolation, and access control

* fix(mcp): cache toolset name lookups in MCPServerManager to avoid per-request DB calls

* fix(mcp): prevent body_iter deadlock + use cached toolset lookup in responses API

- _stream_mcp_asgi_response: add done callback to handler_task that puts
  the EOF sentinel on body_queue when the task exits, preventing body_iter
  from hanging forever if the handler raises after headers are sent.
- litellm_proxy_mcp_handler: replace raw get_mcp_toolset_by_name() DB call
  with global_mcp_server_manager.get_toolset_by_name_cached() so toolset
  resolution uses the 60s TTL cache added for this purpose instead of
  hitting the DB on every responses-API request.

* fix(mcp): toolset access control, asyncio fix, and real unit tests

- server.py: _apply_toolset_scope now enforces that non-admin keys must
  have the requested toolset_id in their mcp_toolsets grant list;
  admin keys always bypass the check.
- mcp_management_endpoints.py: three access-control fixes:
  * fetch_mcp_toolsets: non-admin keys with mcp_toolsets=None now
    return [] instead of all toolsets (only admins get 'all' when
    the field is absent)
  * fetch_mcp_toolset: non-admin keys that haven't been granted the
    requested toolset_id now get 403 instead of the full result
  * add_mcp_toolset: duplicate toolset_name now returns 409 Conflict
    instead of an opaque 500
- proxy_server.py: use asyncio.get_running_loop() instead of
  get_event_loop() inside an already-running coroutine (Python 3.10+).
- test_mcp_toolset_scope.py: replace four hollow tests that only
  asserted local variable properties with real tests that call the
  production fetch_mcp_toolsets() and handle_streamable_http_mcp()
  functions with mocked dependencies.

* fix(mcp): add mcp_toolsets to ObjectPermissionBase, fix multi-toolset overwrite, fix delete 404, allow standalone key toolsets

* fix(mcp): add auth check on toolset resolution in responses API; union mcp_servers in _merge_toolset_permissions

* fix(mcp): handle RecordNotFoundError in update_mcp_toolset; union direct servers with toolset servers

* fix(mcp): use _user_has_admin_view; deny None mcp_toolsets for non-admin; use direct RecordNotFoundError import; fix docstring

* fix(mcp): add @default(now()) to MCPToolsetTable.updated_at; fix test for non-admin toolset access

* fix: use UniqueViolationError import; guard _ensure_eof for error/cancel only

* fix(mcp): preserve mcp_access_groups in toolset scope, use shared Redis cache for toolset perms

- Remove mcp_access_groups=[] from _apply_toolset_scope (server.py) and the
  responses API toolset path (litellm_proxy_mcp_handler.py). A key's access-group
  grants remain valid even when the request is scoped to a single toolset; clearing
  them silently revoked legitimate entitlements.

- Switch resolve_toolset_tool_permissions and get_toolset_by_name_cached to use
  user_api_key_cache (Redis-backed DualCache in production) instead of per-instance
  in-memory dicts. Cache entries are now shared across workers, eliminating the
  per-worker stale-toolset-permission window flagged as a P1 by Greptile.

- Use union merge (set union of tool names per server) when applying toolset
  permissions in the responses API path so direct-server tool restrictions are not
  overwritten by toolset permissions.

* fix(mcp): return 404 when edit_mcp_toolset target does not exist

* fix(mcp): align mcp_toolsets default to None in LiteLLM_ObjectPermissionTable

* fix(mcp): admin toolset visibility, in-place tool name mutation, test helper coercion

* fix(mcp): treat None/[] team mcp_toolsets as no restriction in key validation

* fix(mcp): allow_all_keys backward compat, blocked_tools API write-path, efficient startup query

* fix(mcp): use _mcp_active_toolset_id ContextVar to detect toolset scope, avoiding DB-default false-positive

* fix(mcp): remove dead toolset cache stubs, log invalidation failures, align schema updated_at defaults

* fix(mcp): deserialise MCPToolset from Redis cache hit, replace fastapi import in test

* fix(mcp): evict name-cache on toolset mutation, 409 on rename conflict, warning-level list errors

* fix(redis): regenerate GCP IAM token per connection for async cluster (#24426)

* fix(redis): regenerate GCP IAM token per connection for async cluster clients

Async RedisCluster was generating the IAM token once at startup and
storing it as a static password. After the 1-hour GCP token TTL, any
new connection (including to newly-discovered cluster nodes) would fail
to authenticate.

Fix: introduce GCPIAMCredentialProvider that implements redis-py's
CredentialProvider protocol. It calls _generate_gcp_iam_access_token()
on every new connection, matching what the sync redis_connect_func
already does. async_redis.RedisCluster accepts a credential_provider
kwarg which is invoked per-connection.

* refactor(redis): move GCPIAMCredentialProvider to its own file

Extract GCPIAMCredentialProvider and _generate_gcp_iam_access_token
into litellm/_redis_credential_provider.py. _redis.py imports them
from there, keeping the public API unchanged.

* fix: address Greptile review issues

- GCPIAMCredentialProvider now inherits from redis.credentials.CredentialProvider
  so redis-py's async path calls get_credentials_async() properly
- move _redis_credential_provider import to top of _redis.py (PEP 8)
- remove dead else-branch that silently no-oped (gcp_service_account from
  redis_kwargs.get() was always None since it's popped by _get_redis_client_logic)
- remove mid-function 'from litellm import get_secret_str' inline import
- remove unused 'call' import from test_redis.py

* chore: retrigger CI/review

* chore: sync schema.prisma copies from root

* chore: sync schema.prisma copies from root

* fix(proxy_server): use bounded asyncio.Queue with maxsize to prevent unbounded growth

* fix(a2a/pydantic_ai): make api_base Optional to match base class signature

* fix(a2a/pydantic_ai): make api_base Optional in handler and guard against None

* fix(mcp): remove unused get_all_mcp_servers import

* fix(mcp): remove unused MCPToolset import

* refactor(mcp): extract toolset permission logic to reduce statement count below PLR0915 limit

* fix(tests): update reload_servers_from_database tests to mock prisma directly

---------

Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* fix(toolset_db): lazy-import prisma to avoid ImportError when prisma not installed

* fix(tests): update UI tests for toolset tab and updated empty state text

* fix(tests): add get_mcp_server_by_name to fake_manager stub

---------

Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-04-04 16:23:21 -07:00
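
Note on the body_iter deadlock fix in 693ad49719: the commit adds a done callback to the handler task that puts an EOF sentinel on the body queue when the task exits, and a later fix in the same squash guards that callback to the error/cancel case only. A minimal asyncio sketch of the pattern, under the assumption of a single producer task and a single consumer loop; `stream_body`, `failing_handler`, and `ok_handler` are hypothetical names, not the LiteLLM functions.

```python
import asyncio

_EOF = object()  # sentinel marking the end of the body stream


async def stream_body(handler):
    """Collect chunks produced by `handler`, ending even if it raises mid-stream."""
    body_queue: asyncio.Queue = asyncio.Queue()
    handler_task = asyncio.create_task(handler(body_queue))

    def _ensure_eof(task: asyncio.Task) -> None:
        # Only on cancel or error: a handler that finishes normally
        # is expected to have queued its own EOF already.
        if task.cancelled() or task.exception() is not None:
            body_queue.put_nowait(_EOF)

    handler_task.add_done_callback(_ensure_eof)

    chunks = []
    while True:
        chunk = await body_queue.get()
        if chunk is _EOF:
            break
        chunks.append(chunk)
    return chunks


async def failing_handler(queue: asyncio.Queue) -> None:
    await queue.put(b"headers-sent")
    raise RuntimeError("handler failed after headers")


async def ok_handler(queue: asyncio.Queue) -> None:
    await queue.put(b"a")
    await queue.put(b"b")
    await queue.put(_EOF)


failed_chunks = asyncio.run(stream_body(failing_handler))
ok_chunks = asyncio.run(stream_body(ok_handler))
```

Without the callback, the consumer loop in the failing case would wait on `body_queue.get()` forever, since nothing else would ever enqueue the sentinel.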
ishaan-berri 51876292a0 Litellm ishaan april4 2 (#25150)
* feat(router): integrate allowed_fails_policy into health check failures (#24988)

* feat(router): integrate allowed_fails_policy into health check failures

Health check failures now increment the same per-deployment failure
counters used by allowed_fails_policy, so users can control how many
health check failures of each error type are required before a
deployment enters cooldown.

- ahealth_check() preserves the original exception in its return dict
- run_with_timeout() returns a litellm.Timeout on health check timeout
- _perform_health_check() propagates exceptions to unhealthy endpoints
- _write_health_state_to_router_cache() calls _set_cooldown_deployments
  for each unhealthy endpoint that has an exception
- When allowed_fails_policy is set, the binary health check filter is
  bypassed so cooldown is the sole routing exclusion mechanism
- Safety net: if all deployments are in cooldown with
  enable_health_check_routing=True, the cooldown filter is bypassed

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* feat(router): add health_check_ignore_transient_errors flag

When enabled, health check failures with 429 (rate limit) or 408 (timeout)
status codes are skipped from the cooldown pipeline. These are transient
load issues, not broken deployments. Auth errors (401), 404, and 5xx errors
still increment counters and trigger cooldown as before.

Config (general_settings):
  health_check_ignore_transient_errors: true

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(router): also exclude 429/408 from health state cache when ignore_transient_errors set

The previous fix only skipped cooldown counter increments. The health state
cache was still marking 429/408 endpoints as is_healthy=False, causing the
binary health check filter to exclude them from routing.

Now, when health_check_ignore_transient_errors=True, 429/408 endpoints are
also excluded from the unhealthy list passed to build_deployment_health_states(),
so the binary filter treats them as unaffected (not unhealthy).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs(router): add health check driven routing guide

New standalone page covering the full health check routing feature:
allowed_fails_policy integration, health_check_ignore_transient_errors,
architecture SVG, step-by-step setup, and gotchas (TTL, AllowedFails semantics).

Replaces the inline section in health.md with a link to the new page.
Added to the Routing & Load Balancing sidebar.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(health-check-routing): fix three CI failures

- Add "exception" to ILLEGAL_DISPLAY_PARAMS in health_check.py so the
  exception object is stripped before the health endpoint serializes
  results to JSON (fixes TypeError: 'URL' object is not iterable)
- Add allowed_fails_policy = None to FakeRouter stubs in
  test_router_health_check_routing.py (fixes AttributeError)
- Add health_check_ignore_transient_errors to config_settings.md router
  settings reference table (fixes documentation test)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fix litellm/tests/proxy_unit_tests/test_proxy_server.py

* fix(router): address greptile review comments

- Narrow cooldown safety-net bypass: only fires when allowed_fails_policy
  is set (cooldown is health-check driven). Without a policy, cooldowns
  are from real request failures and must not be bypassed.
- Restore cooldown deployments DEBUG log that was accidentally removed.
- Fix test_health TypeError: move exception extraction to a separate
  exceptions_by_model_id dict returned alongside endpoints, so exception
  objects never appear in the endpoint dicts that get JSON-serialized
  by the /health response.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(health-check-routing): properly isolate exceptions from health response

Return exceptions_by_model_id as a separate third value from
_perform_health_check / perform_health_check so exception objects
(which contain non-JSON-serializable httpx URL types) never appear
in the endpoint dicts that get serialized by the /health response.

Callers updated: _health_endpoints.py, shared_health_check_manager.py,
proxy_server.py background loop. All use the exceptions dict only for
cooldown integration, not for display.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
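
The separation described above can be sketched as follows, assuming each health result is a pair of an endpoint dict and an optional exception. The function name, the input shape, and the "model_id" key are illustrative assumptions, not the actual _perform_health_check signature.

```python
import json
from typing import Dict, List, Optional, Tuple

# Assumed input shape: (endpoint dict, exception or None).
HealthResult = Tuple[dict, Optional[Exception]]


def split_health_results(
    results: List[HealthResult],
) -> Tuple[List[dict], List[dict], Dict[str, Exception]]:
    """Return (healthy, unhealthy, exceptions_by_model_id).

    Exception objects go only into the third value, so the endpoint
    dicts stay JSON-serializable.
    """
    healthy: List[dict] = []
    unhealthy: List[dict] = []
    exceptions_by_model_id: Dict[str, Exception] = {}
    for endpoint, error in results:
        if error is None:
            healthy.append(endpoint)
            continue
        unhealthy.append(endpoint)
        exceptions_by_model_id[endpoint["model_id"]] = error
    return healthy, unhealthy, exceptions_by_model_id


healthy, unhealthy, exceptions = split_health_results(
    [
        ({"model_id": "a", "model": "demo-model"}, None),
        ({"model_id": "b", "model": "demo-model"}, TimeoutError("timed out")),
    ]
)
# The endpoint lists serialize cleanly because no exception object is inside them.
payload = json.dumps({"healthy": healthy, "unhealthy": unhealthy})
```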

* fix(shared-health-check): fix remaining 2-value return sites and update type annotation

* fix(health-check-routing): fix P0 cooldown integration never firing

The cooldown loop was reading endpoint.get("exception") which is always
None because exceptions are now returned via exceptions_by_model_id, not
stored in endpoint dicts. Fixed to use _exceptions.get(model_id).

Also fixes the transient-error filter to use _exceptions instead of
endpoint.get("exception"), and fixes all remaining 2-value return sites
in shared_health_check_manager.py. Tests updated to pass exceptions via
exceptions_by_model_id parameter instead of endpoint dicts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(health-check-routing): fix P1 transient-error filter broken on cache hits

When SharedHealthCheckManager returns cached results, exceptions_by_model_id
is always {} so the transient-error filter defaulted to status 500 for all
endpoints, incorrectly marking 429/408 endpoints as unhealthy.

Fix: store integer exception_status on each unhealthy endpoint dict in
_perform_health_check. _get_endpoint_exception_status() uses the live
exception object when available (direct path) and falls back to the stored
integer (cache-hit path). The integer is JSON-serializable and survives
the shared cache round-trip.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
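
A rough sketch of the two-path lookup described for _get_endpoint_exception_status(), written as a standalone function. The signature, the "model_id" key, and the use of a status_code attribute on the exception are assumptions; the "exception_status" field and the 500 default come from the commit message.

```python
from typing import Dict


def get_endpoint_exception_status(
    endpoint: dict,
    exceptions_by_model_id: Dict[str, Exception],
    default: int = 500,
) -> int:
    """Resolve the HTTP status for an unhealthy endpoint.

    Direct path: use the live exception object when one is available.
    Cache-hit path: fall back to the integer stored on the endpoint dict.
    """
    exc = exceptions_by_model_id.get(endpoint.get("model_id", ""))
    if exc is not None:
        status = getattr(exc, "status_code", None)
        if isinstance(status, int):
            return status
    stored = endpoint.get("exception_status")
    if isinstance(stored, int):
        return stored
    return default
```

Storing a plain integer rather than the exception keeps the endpoint dict serializable, which is why it can survive a shared cache round-trip.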

* fix(health-check-routing): gate cooldown loop behind allowed_fails_policy

Without the policy, cooldown is not the routing exclusion mechanism.
Firing _set_cooldown_deployments for all enable_health_check_routing users
was a backwards-incompatible change: 401s would immediately cool down
deployments that the binary filter would have recovered on the next cycle.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* revert: undo allowed_fails_policy gate on cooldown loop

Cooldown integration via health checks is intentional for all
enable_health_check_routing users, not just those with allowed_fails_policy.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(docs+tests): fix health_check_ignore_transient_errors doc section and test coverage

- Move health_check_ignore_transient_errors from router_settings to
  general_settings in config_settings.md (code reads it from general_settings)
- Remove duplicate enable_health_check_routing / health_check_staleness_threshold
  entries that were incorrectly listed under router_settings
- Replace TestHealthCheckEndpointExceptionPropagation tests with ones that
  exercise the real _perform_health_check code path via mocked ahealth_check,
  verifying exceptions appear in exceptions_by_model_id and NOT in endpoint dicts

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(tests+docs): fix tuple unpacking and docs test failures

- Update test mocks that return (healthy, unhealthy) to return
  (healthy, unhealthy, {}) to match the new 3-value signature
- Update test unpackings of perform_shared_health_check to use
  healthy, unhealthy, _ = ...
- Add health_check_ignore_transient_errors to router_settings section
  in config_settings.md (it is a Router constructor param, so the doc
  test requires it there; it also lives in general_settings for proxy use)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Fix CodeQL errors

* fix(tests): fix 2-value unpackings of _perform_health_check in test_health_check.py

* fix(tests): fix mock _perform_health_check returning 2-tuple instead of 3

* fix team routing

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: add distributed lock for key rotation job (#23364)

* fix: add distributed lock for key rotation job

* fix: address Greptile review feedback on key rotation lock (#23834)

* fix: address Greptile review feedback on key rotation lock

* fix: address changes requested by Greptile

* feat(proxy): Optional on_error for guardrail pipeline (API / technical failures) (#24831)

* guardrails fallback

* docs

* docs: add LITELLM_KEY_ROTATION_LOCK_TTL_SECONDS to environment variables reference

* fix(mypy): accept Union[Dict, Any] in _get_deployment_order and use typed list to fix min() type error

* fix(mypy): use Optional[str] for api_base in PydanticAI provider to match superclass signature

---------

Co-authored-by: Sameer Kankute <sameer@berri.ai>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Harshit Jain <48647625+Harshit28j@users.noreply.github.com>
Co-authored-by: Shivam Rawat <shivam@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
2026-04-04 23:09:42 +00:00
ishaan-berri b53cfe729a Litellm ishaan march30 (#24887) (#25151)
* fix(pricing): add unversioned vertex_ai/claude-haiku-4-5 entry

Missing unversioned entry causes cost tracking to return $0.00 for
all requests using vertex_ai/claude-haiku-4-5. All other Vertex AI
Claude models have both versioned and unversioned entries.

* fix(router): skip misleading tags error when no candidates (e.g. cooldown)

Return early from get_deployments_for_tag when healthy_deployments is empty so
tag-based routing does not raise no_deployments_with_tag_routing after cooldown
filters all deployments. Adds regression test.

Made-with: Cursor

* feat(oci): add embedding support and update model catalog

- Add OCIEmbeddingConfig for OCI GenAI embedding models
- Add 16 new chat models (Cohere, Meta Llama, xAI Grok, Google Gemini)
- Add 8 embedding models (Cohere embed v3.0, v4.0)
- Update documentation with embedding examples
- Update pricing for all new models

* test(oci): add unit tests for OCI embedding support

- 17 unit tests covering OCIEmbeddingConfig
- Tests for URL generation, param mapping, request/response transform
- Tests for model pricing JSON completeness

* style(oci): format with black and ruff

* fix(oci): correct embedding request body format

OCI embedText API expects inputs, truncate, and inputType at the
top level of the request body, not nested under embedTextDetails.
Fixed transformation and updated tests accordingly.

Verified with real OCI API: 3/3 embedding models working.

* docs: clarify tag routing early return and test intent

Made-with: Cursor

* fix(oci): address code review findings from Greptile

- P1: Fix signing URL mismatch with custom api_base by accepting
  api_base parameter in transform_embedding_request
- P2: Remove encoding_format from supported params (OCI does not
  support it, was silently dropped)
- P2: Raise ValueError for token-array inputs instead of silently
  converting to string representation
- Add test for token-list rejection
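
The token-array rejection in the P2 item above can be sketched like this. The function name and error text are hypothetical; the only behavior taken from the commit message is raising ValueError instead of converting token lists to strings.

```python
from typing import Any, List


def normalize_embedding_input(value: Any) -> List[str]:
    """Return the input as a list of strings, rejecting token arrays.

    Hypothetical helper: a token list such as [1, 2, 3] raises ValueError
    rather than being turned into the string "[1, 2, 3]".
    """
    if isinstance(value, str):
        return [value]
    if isinstance(value, list) and all(isinstance(item, str) for item in value):
        return list(value)
    raise ValueError(
        "Token-array inputs are not supported; pass a string or a list of strings."
    )
```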

* fix(mcp): add STS AssumeRole support for MCP SigV4 authentication

MCPSigV4Auth only supported static AWS credentials or the boto3 default
credential chain. Production Kubernetes environments typically authenticate
via IAM role assumption (sts:AssumeRole), which was not possible.

Add aws_role_name and aws_session_name parameters to the MCP SigV4 auth
stack. When aws_role_name is provided, MCPSigV4Auth calls sts:AssumeRole
to obtain temporary credentials before signing requests. Explicit keys,
if also provided, are used as the source identity for the STS call;
otherwise ambient credentials (pod role, instance profile) are used.

* fix: stop logging credential values and add missing redaction patterns

Replaces raw credential values in debug/error log messages with
boolean presence checks or type names. Adds PEM block, GCP token,
JWT, SAS token, and service-account blob patterns to the redaction
filter. Fixes private_key pattern to capture full PEM blocks instead
of stopping at the first whitespace.

Addresses: Vertex AI credential JSON (including RSA private key)
being logged to stderr on health check failures.
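
To illustrate the PEM fix, here is a sketch of a pattern that matches a whole PEM block, including the whitespace and newlines inside it. The regex and placeholder text are assumptions and not the patterns used by the actual redaction filter.

```python
import re

# DOTALL lets ".*?" span newlines, so the match runs from the BEGIN marker
# to the END marker instead of stopping at the first whitespace.
PEM_BLOCK = re.compile(
    r"-----BEGIN [A-Z ]+-----.*?-----END [A-Z ]+-----",
    re.DOTALL,
)


def redact_pem_blocks(text: str) -> str:
    """Replace every PEM block in text with a fixed placeholder."""
    return PEM_BLOCK.sub("[REDACTED PEM BLOCK]", text)
```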

* fix: log only field names for UserAPIKeyAuth, not full object

* style: apply black formatting to experimental_mcp_client/client.py

* style: fix black/isort formatting and mypy error in proxy_server.py

- Fix black formatting in experimental_mcp_client/client.py (done in prev commit)
- Fix black/isort formatting in key_management_endpoints.py, proxy_server.py, transformation.py
- Fix mypy: iterate over optional list safely (access_group_ids or []) in proxy_server.py

* fix(test): patch check_migration.verbose_logger directly to fix xdist ordering issue

When test_proxy_cli.py tests run before test_check_migration.py in the same
xdist worker, litellm.proxy.db.check_migration is already in sys.modules.
Patching litellm._logging.verbose_logger has no effect on the already-bound
reference. Patch the correct target (check_migration.verbose_logger) and
import the module before patching so the order doesn't matter.
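
The patch-target issue can be reproduced with two stand-in modules. This is a sketch: the module names are invented, and the consumer module copies the logger reference the way a `from ... import verbose_logger` statement would.

```python
import logging
import types
from unittest import mock

# Stand-in for the module that defines the logger.
source_module = types.ModuleType("source_module")
source_module.verbose_logger = logging.getLogger("patch_target_demo")
source_module.verbose_logger.addHandler(logging.NullHandler())

# Stand-in for a module that already bound its own reference at import time.
consumer_module = types.ModuleType("consumer_module")
consumer_module.verbose_logger = source_module.verbose_logger


def emit_warning() -> None:
    """Log through the consumer module's own reference."""
    consumer_module.verbose_logger.warning("migration check failed")


def calls_seen_when_patching(module: types.ModuleType) -> int:
    """Patch verbose_logger on the given module and count intercepted calls."""
    with mock.patch.object(module, "verbose_logger") as fake_logger:
        emit_warning()
        return fake_logger.warning.call_count
```

Patching `source_module` intercepts nothing because the consumer keeps its original reference; patching `consumer_module` intercepts the call.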

* fix(mypy): make api_base Optional in PydanticAIProviderConfig to match base class signature

---------

Co-authored-by: Ihsan Soydemir <soydemir.ihsan@gmail.com>
Co-authored-by: Milan <milan@berri.ai>
Co-authored-by: Daniel Gandolfi <danielgandolfi@gmail.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: michelligabriele <gabriele.michelli@icloud.com>
Co-authored-by: user <70670632+stuxf@users.noreply.github.com>
Co-authored-by: Ishaan Jaffer <ishaanjaffer0324@gmail.com>
2026-04-04 14:44:07 -07:00
ryan-crabbe-berri eb780a85bb Merge pull request #25032 from BerriAI/litellm_docs-default-team-params
docs: document default_team_params in config reference
2026-04-03 16:07:46 -07:00
ishaan-berri c6aa3ea452 Litellm ishaan april1 try2 (#25110)
* Litellm ishaan april1 (#25103)

* fix(proxy): enforce upperbound key params on key/update and add custom_key_update hook

The /key/update endpoint did not enforce upperbound_key_generate_params,
allowing users to bypass configured limits (tpm_limit, rpm_limit,
max_budget, duration, budget_duration) by updating an existing key
instead of generating a new one.

Extract the upperbound enforcement logic from _common_key_generation_helper()
into a standalone _enforce_upperbound_key_params() function and call it from
both the generate and update paths. For updates, None values are skipped
(not filled with defaults) since they mean "don't change this field".

Also adds a custom_key_update config option and user_custom_key_update global,
mirroring the existing custom_key_generate pattern, so custom key validation
logic can fire during key updates as well.
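
A sketch of the shared enforcement step, assuming numeric limits only. The function name and signature are hypothetical, and using the upper bound itself as the fill-in default on the generate path is an assumption; the skip-None-on-update rule comes from the commit message.

```python
from typing import Dict, Optional


def enforce_upperbound_sketch(
    requested: Dict[str, Optional[float]],
    upperbound: Dict[str, float],
    is_update: bool,
) -> Dict[str, Optional[float]]:
    """Validate requested key params against configured upper bounds."""
    result = dict(requested)
    for field, limit in upperbound.items():
        value = result.get(field)
        if value is None:
            # On update, None means "don't change this field", so leave it alone.
            if not is_update:
                result[field] = limit  # assumed default for the generate path
            continue
        if value > limit:
            raise ValueError(f"{field}={value} exceeds upper bound {limit}")
    return result
```

Calling the same function from both paths is what closes the bypass: an update with `tpm_limit` above the bound fails the same way a generate call would.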

* fix(proxy): invoke custom_key_update hook in bulk update path

The user_custom_key_update hook was only called in update_key_fn
(single key update) but not in _process_single_key_update (bulk
update path), allowing custom validation to be bypassed via the
/key/update/bulk endpoint. Mirror the hook invocation in both paths.

* fix(proxy): pass UpdateKeyRequest to hook in bulk path, not BulkUpdateKeyRequestItem

Move the custom_key_update hook invocation to after UpdateKeyRequest
is constructed so the hook receives the same type in both single and
bulk update paths. Previously the bulk path passed
BulkUpdateKeyRequestItem (5 fields only), which would cause
AttributeError for hooks accessing fields like tpm_limit or models.

* fix(bedrock): promote cache usage to message_delta for Claude Code (#24850)

Ensure Bedrock/Anthropic-compatible streaming exposes cache usage where Claude Code reads it by promoting message_stop usage onto message_delta and preserving usage fields in fake-streamed message_delta events.

Made-with: Cursor

* fix(search): Support self-hosted Firecrawl response format in search transform (#24866)

The `transform_search_response` method only handled Firecrawl Cloud (v2)
response format where `data` is a dict with `web`/`news` keys. Self-hosted
Firecrawl (v1) returns `data` as a flat list of result objects, causing an
`AttributeError: 'list' object has no attribute 'get'`.

Detect the response format by checking if `data` is a list (self-hosted)
or dict (cloud) and handle both cases.

Cloud format:  {"data": {"web": [...], "news": [...]}}
Self-hosted:   {"success": true, "data": [{"url": "...", "title": "...", ...}]}

Co-authored-by: Synergy <synergyoclaw@gmail.com>
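
The format detection can be sketched against the two payload shapes quoted above. The function name is hypothetical, and merging `web` and `news` into one list is a simplification for illustration.

```python
from typing import Any, Dict, List


def extract_firecrawl_results(response: Dict[str, Any]) -> List[dict]:
    """Return result objects from either response shape.

    Self-hosted: "data" is a flat list of results.
    Cloud: "data" is a dict with "web" and "news" lists.
    """
    data = response.get("data")
    if isinstance(data, list):
        return list(data)
    if isinstance(data, dict):
        return list(data.get("web") or []) + list(data.get("news") or [])
    return []
```

Checking the type first avoids calling `.get` on a list, which is the AttributeError the commit describes.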

* feat: add environment and user tracking to prompt management (#24855)

* feat: add environment and user tracking to prompt management

- Add environment (development/staging/production) and created_by columns to LiteLLM_PromptTable
- Update unique constraint to [prompt_id, version, environment]
- All CRUD endpoints support environment filtering and user tracking
- Redesigned prompt detail page with environment tabs and version history
- UI: environment filter on list page, environment selector in editor
- 8 new tests for environment and user tracking

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: Black formatting and add environments to PromptInfoResponse TypeScript type

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address Greptile review findings

- P1: delete_prompt scopes in-memory cleanup to environment when provided
- P2: dotprompt_content parsed directly regardless of environment flag
- P2: use distinct for environments query
- P2: fix double-fetch on initial mount in prompt_info.tsx
- fix: remove unsupported select kwarg from find_many

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: address remaining Greptile review comments

- Remove unused useCallback import (index.tsx)
- Remove unused ENV_COLORS variable (prompt_info.tsx)
- P1: in-memory fallback in get_prompt_versions now respects environment filter
- P1: reset selectedEnv when promptId changes to avoid stale state
- Cyclic imports are pre-existing pattern, not introduced by this PR

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: scope patch_prompt to environment using primary key

- Add environment query param to patch_prompt endpoint
- Look up target row by composite key (prompt_id + version + environment)
- Update by primary key (id) to target exactly one row
- Fixes Greptile finding: patch with multiple environments is no longer ambiguous

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: use actual start_time for failed request spend logs (#24906)

async_post_call_failure_hook set both start_time and end_time to
datetime.now(), making all failed requests show duration=0. Use the
actual start_time from litellm_logging_obj instead, so spend logs
reflect the real request duration on timeout and other failures.

Fixes #24888
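
A small sketch of the timing fix, assuming the recorded start time may be missing. The function name and the fallback to the current time are assumptions; the real hook reads start_time from litellm_logging_obj.

```python
from datetime import datetime
from typing import Optional, Tuple


def failure_log_times(
    recorded_start: Optional[datetime], now: datetime
) -> Tuple[datetime, datetime]:
    """Return (start_time, end_time) for a failed request's spend log.

    Uses the recorded start when available, so duration is end minus the
    real start rather than always zero.
    """
    start_time = recorded_start if recorded_start is not None else now
    return start_time, now
```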

* feat(bedrock): add nova canvas image edit support (#24869)

* feat(bedrock): add nova canvas image edit support

* fix(bedrock): support PathLike inputs for nova image edit

* chore: sync schema.prisma copies from root

* fix(mypy): correct type-ignore code for delta_usage arg-type

* fix(mypy): cast status_code to str, suppress intentional str yield

* fix(lint): extract _create_content_block_chunks to fix PLR0915

* fix(lint): extract helpers to fix PLR0915 in prompt endpoints

---------

Co-authored-by: michelligabriele <gabriele.michelli@icloud.com>
Co-authored-by: Sameer Kankute <sameer@berri.ai>
Co-authored-by: redhelix <amin.lalji@gmail.com>
Co-authored-by: Synergy <synergyoclaw@gmail.com>
Co-authored-by: Talha Anwar <37379131+talhaanwarch@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: madhu19991 <madhu@thunkai.com>
Co-authored-by: Srikanth @adobe <devarakondasrikanth@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>

* fix(test): update model armor streaming test to handle string or int error code

---------

Co-authored-by: michelligabriele <gabriele.michelli@icloud.com>
Co-authored-by: Sameer Kankute <sameer@berri.ai>
Co-authored-by: redhelix <amin.lalji@gmail.com>
Co-authored-by: Synergy <synergyoclaw@gmail.com>
Co-authored-by: Talha Anwar <37379131+talhaanwarch@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: madhu19991 <madhu@thunkai.com>
Co-authored-by: Srikanth @adobe <devarakondasrikanth@users.noreply.github.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2026-04-03 14:57:44 -07:00
ishaan-berri fc885af994 docs(blog): add security hardening April 2026 post (#25101) (#25102) 2026-04-03 13:06:14 -07:00
yuneng-jiang 3604b600d3 [Infra] Merge internal dev branch with main (#25036)
* fix(proxy): enforce key-level model allowlist for custom auth

custom_auth_run_common_checks only runs common_checks (team/user/project model checks).
Custom auth now also enforces key-level model restrictions via can_key_call_model.

Move the custom-auth key-access regression tests to test_user_api_key_auth.py and keep test_custom_auth_end_user_budget.py focused on end-user budget behavior.

Made-with: Cursor

* fix(proxy): gate custom-auth key model checks behind opt-in

Keep key-level model allowlist enforcement in custom auth behind `custom_auth_run_common_checks` to preserve backwards compatibility, and update tests to verify default non-enforcement and opt-in enforcement behavior.

Made-with: Cursor

* test(proxy): isolate custom auth default check from shared settings state

Patch `proxy_server.general_settings` to an empty dict in the default custom-auth key-access test so it remains deterministic under shared module state.

Made-with: Cursor

* test(proxy): strengthen custom auth post-check assertions

Tighten custom auth regression tests by asserting exact can_key_call_model args and remove an unused common_checks mock from the default behavior path.

Made-with: Cursor

* fix(agentcore): parse A2A JSON-RPC responses in AgentCore provider

* fix(prompt-templates): ensure_alternating_roles handles tool-call chains

* feat(auth): add JWT claim routing overrides for OAuth2 validation

Made-with: Cursor

* docs(auth): document JWT-to-OAuth2 routing overrides

Add generic docs for running JWT and OAuth2 together, including routing_overrides YAML examples and list-based selector behavior for iss/client_id/aud.

Made-with: Cursor

---------

Co-authored-by: Milan <milan@berri.ai>
Co-authored-by: michelligabriele <gabriele.michelli@icloud.com>
2026-04-02 16:38:01 -07:00
Ryan Crabbe c19a63e2bf docs: clarify that models sub-field only applies to SSO auto-created teams 2026-04-02 16:00:20 -07:00
Ryan Crabbe 59b09102b9 docs: add default_team_params to config reference and update examples
- Add default_team_params to litellm_settings reference table in
  config_settings.md with all sub-fields documented
- Update self_serve.md and msft_sso.md examples to include
  team_member_permissions, tpm_limit, and rpm_limit
- Fix misleading comment that implied default_team_params only applies
  to SSO auto-created teams — it applies to all /team/new calls
2026-04-02 15:51:28 -07:00
Krrish Dholakia 06df8edf92 docs: cleanup (#25026) 2026-04-02 15:18:24 -07:00
Krrish Dholakia cae8613660 Announce April Townhall (#25021)
* fix: replace hardcoded url

* docs: announce april townhall
2026-04-02 14:10:49 -07:00
yuneng-jiang 068e6e2a9e Merge pull request #24951 from BerriAI/litellm_remove_neon_cli
[Fix] Remove Neon CLI and Pin All JS Dependencies
2026-04-02 12:47:46 -07:00
David Chen d1df4e838b Litellm fix update bedrock models (#24947)
* update bedrock models in tests

* updated more tests and model_prices_and_context_window

* fix model id and pricing

* replace more sonnet models

* update tests

* git push

* update pricing

* flaky total cost

* monkey patch

* relax the cost change

* fix and revert some changes

* revert the pricing

* chore: move cost/pricing changes to bedrock-cost-fixes branch

* chore: split Bedrock file-api beta stripping to separate branch

Removes strip_unsupported_file_api_betas_for_bedrock_invoke from this branch;
see litellm_bedrock_invoke_strip_file_api_betas for that fix.

Made-with: Cursor
2026-04-01 19:22:54 -07:00
Yuneng Jiang 006d481025 [Fix] Remove neon CLI dependency and pin all JS dependencies
Remove @neondatabase/api-client and neonctl to address CVE-2026-25639
(axios supply chain vulnerability). Pin all JS dependencies to exact
versions across all package.json files to prevent future supply chain
attacks via semver range resolution.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 16:15:32 -07:00
ryan-crabbe-berri 2f1cfb0548 Merge pull request #24751 from BerriAI/litellm_ryan-march-28
litellm ryan march 28
2026-03-31 17:25:30 -07:00