Commit Graph

5716 Commits

Author SHA1 Message Date
Krish Dholakia 67f90254ed feat(guardrails): team-based guardrail registration and approval workflow (#22459)
* feat(guardrails): team-based guardrail registration and approval workflow

Add team-based guardrail submission system where teams can register
Generic Guardrail API guardrails for admin review. Includes:

- POST /guardrails/register endpoint for team-scoped submissions
- Admin review endpoints (list/get/approve/reject submissions)
- Team Guardrails tab in the UI dashboard
- extra_headers support for forwarding client headers to guardrail APIs
- Prisma schema migration for status, submitted_at, reviewed_at fields
- Documentation for team-based guardrails and static/dynamic headers

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(guardrails): address review feedback - SSRF, silent failure, redundant query

- Validate api_base URL scheme (http/https only) and hostname in
  register_guardrail to prevent SSRF via team submissions
- Return warning field in approve response when in-memory initialization
  fails so admins know the guardrail won't work until next sync cycle
- Eliminate redundant DB query in list_guardrail_submissions by fetching
  all team guardrails once and deriving both filtered list and summary
  counts from the single result set

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(guardrails): add pending_review status guard to reject endpoint

Prevent rejecting already-active or already-rejected guardrails, which
would create a DB/memory inconsistency (active in memory but rejected
in DB). Now mirrors the approve endpoint's status check.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 22:06:49 -08:00
ryan-crabbe 5b0238736c Add incident report: cache eviction closes in-use httpx clients (#22309) 2026-03-02 21:49:48 -08:00
Ishaan Jaff bfceb7fc3f feat(perplexity): add embedding support for pplx-embed-v1 models (#22610)
* feat: add Perplexity embedding support (pplx-embed-v1)

Add support for Perplexity AI's embedding models via the LLM HTTP handler:

Models:
- pplx-embed-v1-0.6b (1024 dims, 32K context, $0.004/1M tokens)
- pplx-embed-v1-4b (2560 dims, 32K context, $0.03/1M tokens)

Implementation:
- PerplexityEmbeddingConfig in litellm/llms/perplexity/embedding/
- Registered in ProviderConfigManager, __init__.py lazy imports, main.py dispatch
- Model pricing added to model_prices_and_context_window.json
- Supports dimensions and encoding_format parameters
- Uses base_llm_http_handler.embedding() pattern

Tests:
- 19 unit tests covering transformation, params, URLs, provider config, model info

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* docs: add Perplexity AI embeddings documentation

- Create providers/perplexity_embedding.md with SDK and proxy usage examples
- Convert Perplexity from flat doc to category in sidebars.js
- Category includes existing chat/responses doc + new embeddings doc
- Covers pplx-embed-v1-0.6b and pplx-embed-v1-4b models
- Documents supported parameters (dimensions, encoding_format)
- Includes proxy config and curl examples

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: decode Perplexity base64_int8 embeddings to OpenAI-format float arrays

Perplexity returns embeddings as base64-encoded signed int8 values by default,
not float arrays like OpenAI. This commit adds decoding in
transform_embedding_response so the proxy returns standard OpenAI-compatible
float arrays (normalized to [-1, 1]).

- Added _decode_base64_embedding() static method
- Handles both base64 strings (decoded) and float lists (passthrough)
- Added 3 new tests for base64 decoding + passthrough

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
2026-03-02 17:37:50 -08:00
Kenan Yildirim b8befb3403 Add CrowdStrike AIDR guardrail hook (#17876)
* Add CrowdStrike AIDR guardrail hook

* fixup! use apply_guardrail event hook

* fixup! update imports

* fix(guardrails): include AI response in CrowdStrike AIDR output events

Issue:
_build_guard_input_for_response() was:
- Sending only the original user input (messages).
- Not sending the AI provider response.

This fix will:
  - Extract response.choices from the ModelResponse object and include them in guard_input payload.
  - Thus, ensure AIDR output rules receive the AI-generated content for analysis.
  - Fix and update tests.

* fix(guardrails): prevent duplicate input events in CrowdStrike AIDR guardrail

Issue:
The CrowdStrike AIDR guardrail was running on during_call hooks wihtout event_hook configured.

This fix will:
- Set event_hook to ["pre_call", "post_call"] (AIDR admins will control what policy is applied)

This change will:
- Require default_on parameter
- Prevent duplicate API calls to AIDR for the same input
- Avoid unchecked AI provider API calls on during_call hook

* docs: add CrowdStrike AIDR to the list of Guardrails under Integrations

* docs: update CrowdStrike AIDR documentation page

---------

Co-authored-by: Konstantin Lapine <konstantin.lapine@crowdstrike.com>
2026-03-02 17:26:54 -08:00
Cesar Garcia 2525d66dbe Merge pull request #22584 from BerriAI/litellm_oss_staging_02_27_2026
Litellm oss staging 02 27 2026
2026-03-02 19:05:02 -03:00
Chesars 6292c3dbdf merge: resolve conflicts with upstream/main
- anthropic.md: keep claude-opus-4-6 alias and claude-sonnet-4-6 entry
- transformation.py: take upstream's formatted effort_map with fallback
2026-03-02 18:49:24 -03:00
Shivam Rawat d5355602d5 added configurable env for mcp timeouts (#22287) 2026-03-02 13:13:41 -08:00
mubashir1osmani ea8d22753d docs: add fallback setup for virtual key with Loom video
docs: add fallback setup for virtual key with Loom video
2026-03-02 16:04:27 -05:00
mubashir1osmani e96c4fed39 Update docs/my-website/docs/tutorials/fallbacks.md
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
2026-03-02 16:03:55 -05:00
mubashir1osmani fac29f1963 docs: add fallback setup for virtual key with Loom video
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 15:57:36 -05:00
Chesars ec16bd3509 merge: resolve conflict with upstream/main in presidio.py
Take upstream's refactored PII handling with _unmask_pii_text and
_process_response_for_pii helpers. Add missing StreamingChoices import.
2026-03-02 17:40:22 -03:00
yuneng-jiang 8053be60df Merge pull request #22182 from BerriAI/litellm_make_session_duration_configurable
[Feat] Make UI login session duration configurable via LITELLM_UI_SESSION_DURATION
2026-02-28 20:31:31 -08:00
yuneng-jiang ef9fc872af Update docs/my-website/docs/proxy/ui_project_management.md
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
2026-02-28 18:15:09 -08:00
yuneng-jiang 709fd51672 [Docs] UI - Project Management: Add comprehensive UI documentation with beta notice
Add detailed UI walkthrough for Project Management feature including:
- Beta notice with link to API documentation
- Overview of projects and organizational hierarchy
- Prerequisites and setup instructions
- Separate section for enabling projects in UI settings
- Step-by-step guide for creating and managing projects
- Use cases for key organization within teams
- Next steps and related documentation links
- Proper sidebar navigation integration

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-02-28 18:10:15 -08:00
yuneng-jiang d55d199546 project docs 2026-02-28 18:07:24 -08:00
Krish Dholakia c4ca4566c0 docs: Clean up budget reset and timezones documentation (#22428)
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
2026-02-28 17:37:25 -08:00
Ishaan Jaffer 92db5a990c release notes 2026-02-28 15:51:24 -08:00
Ishaan Jaff 755ae9ed56 Litellm stability fix v2 (#22452)
* fix(test): add spend data polling + graceful skip to Gemini e2e spend tests

Same fix as test_vertex_with_spend.test.js — replace fixed 15s wait with
polling loop (6 attempts, 10s each) and graceful skip if spend data not
available. Also add jest.retryTimes(3) and increase timeout to 90s.

This is the last remaining CI failure on main (pipeline 62771).

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(test): add graceful skip for spend data in Anthropic passthrough test

The test_anthropic_basic_completion_with_headers fails with KeyError: 0
because the /spend/logs endpoint returns an error dict (auth error) instead
of a list. When dict[0] is accessed, it throws KeyError.

Fix: Check if spend_data is actually a list with valid entries before
asserting. Skip spend assertions gracefully if data unavailable.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(ci): resolve 4 CI test failures

1. Add CURSOR_API_BASE to environment variables reference in config_settings.md
2. Fix test_sse_mcp_handler_mock by mocking extract_mcp_auth_context and
   set_auth_context so the handler reaches sse_session_manager.handle_request
3. Change test_async_increment_tokens_with_ttl_preservation flaky decorator
   from reruns=3 to retries=3,delay=2 for better intermittent failure handling
4. Add app.dependency_overrides for user_api_key_auth in test_mock_create_audio_file
   to bypass authentication (same pattern as test_target_storage_invokes_storage_backend)

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
2026-02-28 15:29:45 -08:00
Ishaan Jaff e3756252a8 Development environment setup (#22432)
* feat: add Cursor Cloud Agents as a native pass-through provider

- Add CURSOR to LlmProviders enum
- Add /cursor/{endpoint:path} pass-through route with Basic Auth
- Add /cursor to mapped_pass_through_routes for proper routing
- Create CursorPassthroughLoggingHandler for Logs page visibility
  - Classifies operations (agent:create, agent:list, models:list, etc.)
  - Logs model as cursor/cursor:<operation> for clean Logs display
  - Tracks cost as $0 (subscription-based, no per-request pricing)
- Add Cursor to UI: provider enum, logo, credential fields
- Add provider_create_fields.json entry for LLM Credentials UI
- Add 18 unit tests covering route, auth, logging, and classification

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: use correct Cursor logo from lobehub, add documentation page

- Replace placeholder Cursor logo with official hexagonal logo from lobehub
- Add docs/pass_through/cursor.md with full tutorial matching a2a_cost_tracking style
  - Quick Start: add creds on UI, start proxy, launch agent, view logs
  - Examples: all Cursor Cloud Agents API endpoints
  - Advanced: virtual key usage
  - Screenshots: credential form, logs page, log detail view
- Add Cursor to sidebars.js under Pass-through Endpoints
- Add screenshots to docs/my-website/img/

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* docs: simplify Cursor doc - UI-only flow, no config.yaml needed

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: cursor pass-through reads credentials from UI (litellm.credential_list)

The pass-through route now checks litellm.credential_list as a fallback
when CURSOR_API_KEY env var is not set. This means adding credentials
via the UI (Models + Endpoints → LLM Credentials) works without any
config.yaml or environment variable setup.

Credential lookup order:
1. passthrough_endpoint_router (config.yaml with use_in_pass_through)
2. litellm.credential_list (credentials added via UI)
3. CURSOR_API_KEY environment variable

Also respects api_base from UI credentials if set.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
2026-02-28 14:50:06 -08:00
Ishaan Jaff 29e3fd5d79 [Release Fix] (#22411)
* fix(lint): suppress PLR0915 for 3 complex methods that exceed 50-statement limit

- streaming_iterator.py: _process_event (84 statements)
- transformation.py: translate_messages_to_responses_input (51 statements)
- transformation.py: transform_realtime_response (54 statements)

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(mypy): resolve type errors in public_endpoints, user_api_key_auth, common_utils, transformation

- public_endpoints.py: fix _cached_endpoints type annotation
- user_api_key_auth.py: accept Optional[str] for end_user_id parameter
- common_utils.py: add NewProjectRequest/UpdateProjectRequest to Union type
- transformation.py: add ChatCompletionRedactedThinkingBlock and list[Any] to content type

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(proxy-extras): bump version to 0.4.50 and sync schema

- Bump litellm-proxy-extras from 0.4.49 to 0.4.50
- Sync schema.prisma with main proxy schema
- Includes new LiteLLM_ClaudeCodePluginTable model
- Includes new @@index([startTime, request_id]) on SpendLogs
- Update version references in requirements.txt and pyproject.toml

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(router): use string id in test_add_deployment and add defensive str() in register_model

- Change test to use string '100' instead of int 100 for model_info.id
- Add str() conversion in register_model to prevent AttributeError on non-string keys

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(security): update minimatch to 10.2.4 to fix CVE-2026-27903 and CVE-2026-27904

- Run npm audit fix in docs/my-website
- Updates minimatch from 10.2.1 to 10.2.4 (fixes HIGH severity ReDoS vulnerabilities)

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(test): update realtime guardrail test assertions to match actual guardrail behavior

- test_text_message_blocked_by_guardrail_no_ai_response: allow guardrail's own block
  message text in response.done (previously expected empty content)
- test_voice_transcript_blocked_by_guardrail: allow guardrail to send response.cancel
  + block message + response.create flow (previously expected no response.create)

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: revert proxy-extras version in requirements.txt and pyproject.toml

The litellm-proxy-extras 0.4.50 is not published to PyPI yet, so consumer
references must stay at 0.4.49. Only the source package pyproject.toml
should be bumped to 0.4.50 for the publish_proxy_extras CI job.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: make transcript delta check optional in voice guardrail test

The guardrail sends an error event (guardrail_violation) when blocking
voice transcripts; it does not always produce transcript deltas. Remove
the assertion requiring response.audio_transcript.delta since the error
event is the primary signal that blocked content was handled.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* Add missing env keys to documentation: LITELLM_MAX_STREAMING_DURATION_SECONDS and LITELLM_USE_CHAT_COMPLETIONS_URL_FOR_ANTHROPIC_MESSAGES

These two environment variables were used in code but not documented in the
environment variables reference section of config_settings.md, causing the
test_env_keys.py CI test to fail.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* Fix 13 mypy type errors across 6 files

- in_flight_requests_middleware.py: Fix type: ignore error codes from
  [union-attr] to [attr-defined], add [arg-type] for Gauge **kwargs
- transformation.py: Add [assignment] ignore for output_format reassignment,
  add fallback empty string for tool use id to fix arg-type
- responses/main.py: Remove redundant type annotation on second
  secret_fields assignment to fix no-redef
- streaming_iterator.py: Add [assignment] ignores for intermediate
  cache token assignments
- handler.py: Add [typeddict-item] ignore for AnthropicMessagesRequest
  construction from dict
- public_endpoints.py: Add [arg-type] ignore for _load_endpoints()
  return type mismatch with SupportedEndpoint model

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: add auth overrides to spend tracking tests, fix realtime guardrail assertion, update UI minimatch

- Add app.dependency_overrides for user_api_key_auth in 4 spend tracking tests
  that were returning 401 Unauthorized (error_code, error_message,
  error_code_and_key_alias, key_hash)
- Fix realtime guardrail test to check ANY error event for guardrail_violation
  instead of just the first (OpenAI may send its own errors first)
- Update ui/litellm-dashboard/package-lock.json to fix minimatch vulnerability

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* Fix failing MCP e2e and create_mcp_server UI tests

Test 1 (test_independent_clients_no_shared_session):
- Add allow_all_keys: true to MCP servers in test config. With master_key
  and no DB, get_allowed_mcp_servers returned empty, causing 0 tools and
  403 on tool calls. allow_all_keys bypasses per-key restrictions.
- Add asyncio.sleep(0.5) between client connections to allow MCP SDK
  TaskGroup cleanup and avoid ExceptionGroup on connection close (MCP #915).

Test 2 (create_mcp_server 'auth value is provided'):
- Use userEvent.setup({ delay: null }) for instant keystrokes to avoid
  timeout from default typing delay on CI.
- Increase per-test timeout to 15000ms for CI environments.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: stabilize proxy unit tests for parallel execution

- test_response_polling_handler: add xdist_group to prevent heavy import OOM
- test_db_schema_migration: use temp dir for worker isolation, sync schema.prisma index
- test_custom_tokenizer_bug: use lighter tokenizer to prevent OOM in parallel

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: add auth overrides to more spend tracking and model info tests

- Fix test_ui_view_spend_logs_pagination missing auth override (401)
- Fix test_view_spend_tags missing auth override (401)
- Fix test_view_spend_tags_no_database missing auth override (401)
- Fix test_empty_model_list.py to use app.dependency_overrides instead of patch()
  for FastAPI dependency injection auth

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(test): use patch.object for aiohttp transport test to work in parallel execution

The @patch decorator was not intercepting the static method call in parallel
xdist workers. Using patch.object on the directly-imported class is more reliable.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(security): update minimatch from 10.2.1 to 10.2.4 in Dockerfile

The Docker image was explicitly pinning minimatch@10.2.1 which has HIGH
severity ReDoS vulnerabilities (GHSA-7r86-cg39-jmmj, GHSA-23c5-xmqv-rm74).
Update to 10.2.4 which includes fixes for both CVEs.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(ui): prevent MCP and TeamInfo test timeouts on CI

- Add userEvent.setup({ delay: null }) to all tests using userEvent in both files
- Add timeout: 15000 to tests with significant user interaction (typing, multiple clicks)
- Fixes: create_mcp_server Bearer Token test, TeamInfo cancel button test

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: stabilize parallel test execution and aiohttp transport test

- test_aiohttp_handler: rewrite transport test to not rely on static method mock
  (consistently fails in parallel xdist workers)
- test_proxy_cli: add xdist_group to prevent timeout during heavy imports
- test_swagger_chat_completions: add xdist_group to prevent timeout

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(security): add serialize-javascript override to fix GHSA-5c6j-r48x-rmvq

Add npm override for serialize-javascript>=7.0.3 in docs/my-website
to fix HIGH severity RCE vulnerability via RegExp.flags.
Also bump minimatch override to >=10.2.4.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* Fix flaky tests: remove broken Vertex model, add retries for Anthropic

- Remove vertex_ai/meta/llama-4-scout-17b-16e-instruct-maas from
  test_partner_models_httpx_streaming - consistently returns 400 BadRequest
- Add @pytest.mark.flaky(retries=6, delay=10) to test_function_call_parsing
  for transient Anthropic API overload errors
- Add @pytest.mark.flaky(retries=6, delay=10) to test_openai_stream_options_call
  for transient Anthropic InternalServerError

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(ci): add xdist_group(proxy_heavy) to prevent OOM in parallel proxy tests

- Add pytestmark = pytest.mark.xdist_group('proxy_heavy') to test_proxy_utils.py
- Change test_db_schema_migration.py from schema_migration to proxy_heavy group
- Add @pytest.mark.xdist_group('proxy_heavy') to test_proxy_server.py::test_health

Groups heavy proxy tests to run on same worker, avoiding worker OOM crashes.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* Fix vertex AI qwen global endpoint test to mock vertexai module import

The test_vertex_ai_qwen_global_endpoint_url test was failing because the
VertexAIPartnerModels.completion() method tries to 'import vertexai' before
any of the mocked code runs. In environments without google-cloud-aiplatform
installed, this import fails with a VertexAIError(status_code=400).

Fix by:
- Adding patch.dict('sys.modules', {'vertexai': MagicMock()}) to mock the
  vertexai module import
- Adding vertex_ai_location parameter to the acompletion call for completeness

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(ci): add xdist_group to health endpoint and watsonx tests for parallel stability

- test_health_liveliness_endpoint: add xdist_group('proxy_health') to prevent timeout
- test_watsonx_gpt_oss tests: add xdist_group('watsonx_heavy') to prevent mock interference

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(test): pre-populate WatsonX IAM token cache to prevent parallel test interference

The watsonx prompt transformation test was failing in parallel execution because
litellm.module_level_client.post mock was being interfered with by other tests.
Pre-populating the IAM token cache avoids the HTTP call entirely.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(test): add spend data polling with retries for e2e pass-through tests

- test_vertex_with_spend.test.js: Replace 15s fixed wait with polling loop
  (up to 6 attempts, 10s apart) for spend data to appear in DB
- Increase test timeout from 25s to 90s to accommodate polling
- base_anthropic_messages_tool_search_test.py: Add flaky(retries=3) for
  streaming test that depends on live Anthropic API

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(ci): reduce parallel workers from 8 to 4 for proxy tests to prevent OOM

- litellm_proxy_unit_testing_part2: -n 8 -> -n 4
- litellm_mapped_tests_proxy_part2: -n 8 -> -n 4, timeout 60 -> 120
- Worker crashes consistently caused by too many parallel proxy tests
  each loading the full FastAPI app and heavy dependency tree

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(db): add migration for SpendLogs composite index (startTime, request_id)

The @@index([startTime, request_id]) was added to schema.prisma but had no
corresponding migration. This caused test_aaaasschema_migration_check to fail
because prisma migrate diff detected the missing index.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(db): add migration for MCP available_on_public_internet default change to true

The schema.prisma changed the default for available_on_public_internet from
false to true, but no migration was created. This caused the schema migration
test to detect drift.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(test): increase server wait time and add retry to flaky external API tests

- test_basic_python_version.py: increase server startup wait from 60s to 90s
  for slower CI environments (fixes installing_litellm_on_python_3_13)
- test_a2a_agent.py: add flaky(retries=3, delay=5) for non-streaming test
  that depends on live A2A agent endpoint

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(test): add flaky retries to all intermittent external API tests for 0-fail CI

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(test): add auth overrides to file endpoint tests that return 500

The test_target_storage tests were getting 500 because the FastAPI auth
dependency wasn't overridden. Added app.dependency_overrides for proper
auth bypass in test environment.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
2026-02-28 09:46:35 -08:00
Chesars 8a85a2cf82 Merge branch 'litellm_oss_staging_02_27_2026' of https://github.com/BerriAI/litellm into litellm_oss_staging_02_27_2026 2026-02-28 09:54:56 -03:00
Harshit Jain bfdea4227a Merge pull request #22103 from Harshit28j/litellm_feat_datadog_metrics
feat: ability to trace metrics datadog
2026-02-28 17:25:23 +05:30
Harshit28j dee2a62686 Add security vulnerability scan report to v1.81.14 release notes 2026-02-28 16:13:45 +05:30
Harshit Jain 1576033495 Merge pull request #22299 from BerriAI/litellm_health_check_tokens
Litellm health check tokens
2026-02-28 15:05:45 +05:30
Harshit28j e0168db683 add docs and formatting 2026-02-28 14:08:09 +05:30
Ishaan Jaff 15fcd90b9c feat: add in_flight_requests metric to /health/backlog + prometheus (#22319)
* feat: add in_flight_requests metric to /health/backlog + prometheus

* refactor: clean class with static methods, add tests, fix sentinel pattern

* docs: add in_flight_requests to prometheus metrics and latency troubleshooting
2026-02-27 18:00:50 -08:00
Dylan Duan af6fe184fb docs: update AssemblyAI docs with Universal-3 Pro, Speech Understanding, and LLM Gateway (#21130)
* docs: update AssemblyAI docs with Universal-3 Pro, Speech Understanding, and LLM Gateway provider config

* feat: add AssemblyAI LLM Gateway as OpenAI-compatible provider
2026-02-27 17:24:48 -08:00
Cesar Garcia 7d084dfb9d Merge pull request #20525 from Chesars/docs/opus-4-6-openrouter-and-1m-context
docs: add OpenRouter Opus 4.6 to model map and update Claude Opus 4.6 docs
2026-02-27 19:07:31 -03:00
Cesar Garcia 1552775166 Merge pull request #21595 from Chesars/fix/moonshot-preserve-image-url-content
fix(moonshot): preserve image_url blocks in multimodal messages
2026-02-27 17:47:02 -03:00
Noah Nistler d13508c1c5 Enable local file support for OCR (#22133)
* [Docs] Enable local file support

Implemented internal handling for converting file-type documents to the required format for OCR processing, ensuring seamless integration with various providers.

* Refactor OCR file handling and improve security checks

Removed deprecated MIME type mapping and file conversion functions, replacing them with updated implementations. Enhanced security by rejecting 'file' document types in JSON requests, ensuring file uploads are handled via multipart/form-data. Updated tests to reflect these changes and ensure proper functionality.

* Enhance MIME type validation in OCR processing

Added a regular expression check to validate MIME types in the convert_file_document_to_url_document function, raising a ValueError for invalid types. Updated tests to ensure proper error handling for unsupported MIME types.

* Enhance type safety in OCR file handling

Added type casting for the uploaded file in the _parse_multipart_form function to ensure proper handling of UploadFile instances. This change improves type safety and reduces potential runtime errors during file processing.

* Refactor MIME type handling in document uploads

Updated the MIME type extraction logic to strip parameters from the Content-Type header, ensuring only the base type is used. Added tests to verify that MIME parameters are correctly handled and stripped in various scenarios.

* Update OCR documentation for MIME type recommendations and remove unnecessary tips

Clarified the recommended usage of MIME types for raw bytes in document uploads. Simplified the documentation by removing the tip about multipart file uploads from tools like Postman, ensuring a more concise and focused guide.

* Enhance multipart form handling in OCR endpoints

Updated the _parse_multipart_form function to ignore both 'file' and 'document' fields during form parsing, ensuring that the document built from the uploaded file is not overridden. Added a new test to verify that injected document fields do not affect the constructed document, improving security and robustness of the file upload process.
2026-02-27 10:50:02 -08:00
Sameer Kankute 63c9b3a137 Merge pull request #22087 from BerriAI/litellm_fix_anthropic_responses
Add v1 for anthropic responses transformation
2026-02-27 21:18:04 +05:30
Sameer Kankute 2fa9b81e2f Add docs for opt out variable 2026-02-27 13:28:48 +05:30
Cesar Garcia c6b6e29bc2 Merge pull request #22220 from Chesars/fix/reasoning-effort-output-config-claude-46
fix(anthropic): map reasoning_effort to output_config for Claude 4.6 models
2026-02-26 17:30:54 -03:00
Chesars a8c95392fb fix(anthropic): map reasoning_effort to output_config for Claude 4.6 models
Claude 4.6 models use output_config as a stable API feature. This commit:
- Maps reasoning_effort to output_config for 4.6 models (minimal → low)
- Restricts effort="max" to Opus 4.6 only
- Skips beta header injection for 4.6 models
- Updates docs for Claude 4.6 effort support
2026-02-26 17:18:32 -03:00
shivam ffb438f3f2 fix: clarify EXPERIMENTAL_UI_LOGIN ignores LITELLM_UI_SESSION_DURATION, add regression test 2026-02-26 04:41:07 -08:00
shivam 44557261a3 greptile issue 2026-02-26 04:29:42 -08:00
shivam a7f5163976 added the env flag 2026-02-26 03:49:54 -08:00
Sameer Kankute 24fd841e83 Fix code qa 2026-02-26 13:16:09 +05:30
Ishaan Jaff f1c9cb7e71 feat(vertex_ai): Vertex AI Gemini Live via unified /realtime endpoint (#22153)
* feat(vertex_ai): add Vertex AI Gemini Live support via unified /realtime endpoint

Adds VertexAIRealtimeConfig which translates the OpenAI Realtime WebSocket
protocol to Vertex AI BidiGenerateContent. Supports voice in/voice out
(16 kHz mic → 24 kHz speaker) and text in/text out through the proxy's
/realtime endpoint.

Key changes:
- New litellm/llms/vertex_ai/realtime/transformation.py with VertexAIRealtimeConfig
  - Builds correct wss:// URL (regional + global)
  - OAuth2 Bearer token auth (not API key)
  - Full model path (projects/.../publishers/google/models/...)
  - Ignores session.update (Vertex AI only accepts one setup message)
- realtime_api/main.py: vertex_ai branch resolves OAuth token + constructs config
- llm_http_handler.py: auto-sends session setup before bidirectional_forward
- gemini/realtime/transformation.py: fix crashes on empty turnComplete events
- realtime_streaming.py: try/except guard so bad messages don't kill the loop
- proxy_server.py: add missing websockets.exceptions import

* docs: add vertex_realtime to sidebars

* fix: drop unknown event types in Gemini transform; add vertex_ai health check

* fix: propagate UUID fallback IDs from transform_content_done_event to return_additional_content_done_events

* fix: route guardrail backend sends through provider transform; fix str.strip misuse for model prefix

* fix: handle Vertex AI full resource path in session.created; route guardrail block sends through _send_to_backend

* fix: remove unused VertexBase in transformation.py; apply UUID fallback in return_additional_content_done_events
2026-02-25 22:11:06 -08:00
Ishaan Jaff 82cd14ea1d feat(realtime): guardrails support for /v1/realtime WebSocket endpoint (#22152)
* feat(realtime): add guardrails query param to /v1/realtime WebSocket endpoint

- Add 'guardrails' query param (comma-separated) to realtime_websocket_endpoint
- Import websockets and websockets.exceptions at module level (fixes NameError in except clause)
- Split try/except into Phase 1 (pre-call) and Phase 2 (routing) so guardrail
  errors send back a typed error event before closing, while upstream errors
  close silently with 1011

* feat(ui): pass selectedGuardrails from sidebar to RealtimePlayground WebSocket URL

* docs(realtime): add guardrails section with dynamic passing examples
2026-02-25 21:34:22 -08:00
Steve G 9806e21871 Add Lakera v2 post-call hook and tests (fixed PII masking) (#21783)
* Add post-call hook for Lakera guardrail and mask PII in responses

* Add post-call hook for Lakera and mask PII in responses

* Fix post-call hook: pass event_type to call_v2_guard

* Address Greptile review: return ModelResponse, fix mutation, add header, test location, mask order

- PII masking path: return ModelResponse instead of dict so deployment hook accepts it
- Avoid mutating request data: deep copy original_messages and messages in _mask_pii_in_messages
- Add guardrail header in PII-only return path
- Add test in tests/test_litellm/ (test_lakera_ai_v2.py) per PR checklist
- Sort PII payload spans by (start,end) descending so multiple spans in one message mask correctly

Co-authored-by: Cursor <cursoragent@cursor.com>

* Updated ponteital for index mismatch when choices have null content and inconsistent on_flagged access pattern

* Update litellm/proxy/guardrails/guardrail_hooks/lakera_ai_v2.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Update to explicitly state supported endpoints - chat completions

* Fix minor lint error on masked_entity_count

---------

Co-authored-by: Steve <steve.giguere@lakera.ai>
Co-authored-by: Cursor <cursoragent@cursor.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
2026-02-25 17:20:38 -08:00
Ishaan Jaff cc85fe5921 Proxy request tags docs (#22129)
* docs: document x-litellm-tags header and request body tags parameter

- Add documentation for x-litellm-tags header (comma-separated or array)
- Add documentation for tags in request body
- Clarify that dynamic tags override config tags

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* docs: consolidate tag documentation and improve cross-references

- Make request_tags.md the single source of truth for all tag options
- Add cross-reference from cost_tracking.md to request_tags.md
- Document both direct tags and metadata.tags formats
- Add key/team tag setup and custom header tracking to request_tags.md
- Reduce duplication and make navigation clearer

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* docs: use generic examples instead of specific company names

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* docs: clarify x-litellm-tags header format is comma-separated string

HTTP headers are always strings, not arrays. Remove misleading
array format documentation for the header parameter.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* Update docs/my-website/docs/proxy/request_tags.md

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
2026-02-25 14:29:00 -08:00
yuneng-jiang 7daeaf8106 [Docs] Add Credential Usage Tracking documentation
Add new document explaining automatic credential usage tracking and tagging. When models use reusable credentials, LiteLLM automatically injects a Credential: <name> tag on requests, enabling credential-level spend tracking on the Usage page with no additional configuration.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-02-25 10:52:24 -08:00
Harshit Jain cd60e3d4e0 fix: req changes 2026-02-25 22:27:51 +05:30
Harshit Jain 10e769a5e4 feat: ability to trace metrics 2026-02-25 22:03:51 +05:30
Sameer Kankute ec3ae25a3a Merge pull request #22070 from BerriAI/litellm_forward_auth_headers
[Feat]Add forward auth headers of provider
2026-02-25 18:45:38 +05:30
Harshit Jain d2aeb3e513 Merge pull request #22084 from Harshit28j/litellm_presidio-non-json-response-handling
fix(guardrails): prevent presidio crash on non-json responses
2026-02-25 16:37:39 +05:30
Harshit28j 6a8052295b fix(guardrails): prevent presidio crash on non-json responses 2026-02-25 16:11:24 +05:30
Harshit Jain 23e84eb789 fix: Metadata / Trace ID Missing in S3 Streaming Callbacks 2026-02-25 14:16:42 +05:30
Sameer Kankute 0e806c83c1 Fix docs 2026-02-25 12:13:56 +05:30