Commit Graph

27586 Commits

Author SHA1 Message Date
Derek Duenas bbaf0af907 Grayswan guardrail passthrough on flagged (#16891)
* attempt to implement the passthrough feature

* Formatting and small change

* Fix formatting

* Format test file

---------

Co-authored-by: Xiaohan Fu <xiaohan@grayswan.ai>
2025-11-21 20:01:35 -08:00
yuneng-jiang f56c7e1ef9 Change Bulk Invite User Roles to match backend (#16906) 2025-11-21 20:00:57 -08:00
Mubashir Osmani 696974bacb fix: add mcp server ids (#16904)
* fix: add mcp server ids

* revert to prev version
2025-11-21 20:00:33 -08:00
Dima-Mediator a0d4d0b304 Gemini models: capture image_tokens and support cost_per_output_image_token in costs calculations (#16912) 2025-11-21 19:59:24 -08:00
wangsoft 42c883d64f fix redis event loop closed at first call (#16913) 2025-11-21 19:15:56 -08:00
Alexsander Hamir 6e70c279f8 [Fix] - Router's Cache: Fix routing for requests with same cacheable prefix but different user messages (#16951)
* fix(router): use cacheable prefix for prompt caching cache keys

Fix issue where requests with same cacheable prefix but different user
messages were routing to different deployments, preventing cached token
reuse. The cache key now correctly includes only the cacheable prefix
(up to and including the last cache_control block) instead of the
entire messages array.

## New Functions

### extract_cacheable_prefix()
Static method that extracts the cacheable prefix from messages for
prompt caching. The cacheable prefix is defined as everything UP TO
AND INCLUDING the LAST content block (across all messages) that has
cache_control with type "ephemeral". This includes ALL blocks
before the last cacheable block (even if they don't have cache_control
themselves).

- Finds the last content block with cache_control across all messages
- Returns all messages and content blocks up to and including that
  last cacheable block
- Excludes everything after the last cacheable block (including user
  messages that come after)
- Returns empty list if no cacheable blocks are found
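
The rule above can be sketched roughly as follows. This is a minimal illustration, not the router's actual code: the function name `extract_cacheable_prefix_sketch` is hypothetical, and it assumes messages are plain dicts whose `content` is either a string or a list of block dicts.

```python
from typing import Any, Dict, List, Optional, Tuple

Message = Dict[str, Any]


def extract_cacheable_prefix_sketch(messages: List[Message]) -> List[Message]:
    """Return everything up to and including the last ephemeral cache_control block."""
    last: Optional[Tuple[int, int]] = None  # (message index, block index)

    for i, message in enumerate(messages):
        content = message.get("content")
        if not isinstance(content, list):
            continue  # assumption: string content carries no per-block cache_control
        for j, block in enumerate(content):
            if not isinstance(block, dict):
                continue
            cache_control = block.get("cache_control")
            if isinstance(cache_control, dict) and cache_control.get("type") == "ephemeral":
                last = (i, j)  # keep overwriting so the final value is the LAST match

    if last is None:
        return []  # no cacheable block anywhere

    msg_idx, block_idx = last
    # All earlier messages are kept whole, even without cache_control of their own.
    prefix = [dict(m) for m in messages[:msg_idx]]
    # The message holding the last cacheable block is truncated after that block.
    final = dict(messages[msg_idx])
    final["content"] = list(messages[msg_idx]["content"][: block_idx + 1])
    prefix.append(final)
    return prefix
```

Anything after the last cacheable block, including trailing user messages, is dropped from the result.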

## Changed Functions

### get_prompt_caching_cache_key()
Modified to use the cacheable prefix instead of the full messages array
when generating cache keys. This ensures that requests with the same
cacheable prefix but different user messages generate the same cache
key, enabling proper routing to the same deployment.

- Now calls extract_cacheable_prefix() to get only cacheable content
- Returns None if no cacheable prefix is found (can't generate key)
- Cache key is now based on cacheable prefix only, not full messages
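
A minimal sketch of deriving a key from the prefix alone. The function name and the SHA-256-over-JSON scheme are assumptions made for illustration; the commit does not state how the real key is computed.

```python
import hashlib
import json
from typing import Any, Dict, List, Optional


def cache_key_from_prefix(prefix: List[Dict[str, Any]]) -> Optional[str]:
    """Hash only the cacheable prefix; return None when there is nothing to key on."""
    if not prefix:
        return None  # mirrors the "no cacheable prefix" case described above
    # Hypothetical scheme: canonical JSON so equal prefixes serialize identically.
    serialized = json.dumps(prefix, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(serialized.encode("utf-8")).hexdigest()
```

Because the trailing user message never enters the hash, two requests that share a prefix get the same key and can be routed to the same deployment.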

### async_get_model_id()
Completely refactored to use the cacheable prefix directly instead of
the previous workaround that checked progressively shorter message
slices. The previous implementation was inefficient and unreliable.

- Removed progressive message slicing logic (messages[:-1], messages[:-2], etc.)
- Now uses single direct cache lookup with cacheable prefix-based key
- More efficient (1 lookup instead of up to 4)
- More reliable (uses correct cache key based on cacheable prefix)
- Returns None if no cacheable prefix found

### add_model_id()
Added None check for cache_key to prevent caching when no cacheable
prefix is found. This ensures we don't attempt to cache when there's
no meaningful cache key to use.

- Added guard: returns early if cache_key is None
- Prevents attempting to cache when no cacheable prefix exists

### async_add_model_id()
Added None check for cache_key to prevent caching when no cacheable
prefix is found. Matches the behavior of add_model_id() for consistency.

- Added guard: returns early if cache_key is None
- Prevents attempting to cache when no cacheable prefix exists

### get_model_id()
Added None check for cache_key to handle cases where no cacheable
prefix is found. Ensures consistent behavior across all cache methods.

- Added guard: returns None if cache_key is None
- Prevents calling get_cache() with None key

## Test

### test_router_prompt_caching_same_cacheable_prefix_routes_to_same_deployment()
New end-to-end test that validates the fix. Tests that requests with
the same cacheable prefix (system blocks with cache_control) but
different user messages:
1. Generate the same cache key
2. Successfully perform cache lookup
3. Route to the same deployment

This test reproduces the exact scenario from the user's bug report
where three requests with different user messages should route to the
same deployment but were previously routing to different ones.

Fixes issue where cached tokens couldn't be reused because requests
were routed to different providers due to different cache keys.

* fix(router): use cast() for proper type handling in extract_cacheable_prefix

Replace the type annotation and its type: ignore comment with a proper
cast() from the typing module, matching the pattern used throughout the
codebase for creating modified AllMessageValues dictionaries.
2025-11-21 19:13:40 -08:00
yuneng-jiang b074c79734 Allow partial matches for user id in user table (#16952) 2025-11-21 19:12:16 -08:00
Alexsander Hamir cdb46f919d fix: cache SSL contexts to prevent excessive memory allocation (#16955)
Previously, get_ssl_configuration() created a new SSL context on every
call, even when the configuration was identical. This caused continuous
memory allocation from ssl.create_default_context(), especially during:
- Proxy server startup
- Background health checks
- HTTP client creation

Solution:
- Added _ssl_context_cache to cache SSL contexts by configuration
  parameters (cafile, ssl_security_level, ssl_ecdh_curve)
- Refactored SSL context creation into _create_ssl_context() helper
- Modified get_ssl_configuration() to reuse cached contexts when
  configuration matches

This significantly reduces memory allocation while maintaining backward
compatibility. SSL contexts are now reused instead of being recreated
repeatedly, eliminating the memory leak observed in memray profiling.

Fixes memory allocation issue where create_default_context was allocating
6.282MB+ continuously even without any requests.
2025-11-21 19:11:54 -08:00
Alexsander Hamir f542011076 fix: cache cooldown key (#16954)
There's no need to generate a key multiple times for the same model; cache it, with a maximum size limit.
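
One common way to cache a derived key with an upper bound is `functools.lru_cache`. This is only a sketch: the function name, the key format, and the limit of 1024 are hypothetical, and the commit does not say which mechanism it uses.

```python
from functools import lru_cache


@lru_cache(maxsize=1024)  # hypothetical limit; bounds memory for many model ids
def cooldown_key(model_id: str) -> str:
    """Build the cooldown cache key once per model id and reuse it afterwards."""
    # Hypothetical key format, for illustration only.
    return f"deployment:{model_id}:cooldown"
```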
2025-11-21 19:11:17 -08:00
yuneng-jiang 6881594632 [Fix] Exclude litellm_credential_name from Sensitive Data Masker (Updated) (#16958)
* Exclude litellm_credential_name from sensitive masker

* Adding missing file
2025-11-21 19:09:48 -08:00
Justin Tahara 703f619e08 feat(bedrock): Add Claude 4.5 to US Gov Cloud (#16957)
* feat(bedrock): Add Claude 4.5 to US Gov Cloud

* Adding west and tests
2025-11-21 19:06:26 -08:00
Mubashir Osmani db58f6aeb1 fix: arize phoenix logging (#16301)
* arize phx

* fix arize integration

* traces to specific project name

* fix

* look for http endpoint
2025-11-21 18:46:18 -08:00
yuneng-jiang eb48d5cc42 Revert "Exclude litellm_credential_name from sensitive masker (#16950)" (#16956)
This reverts commit 5cfacb96e6.
2025-11-21 18:09:54 -08:00
Ishaan Jaffer 58a56babd9 test fixes for masker 2025-11-21 17:41:35 -08:00
Ishaan Jaffer 1f36fad94b TestDockerModelRunnerIntegration 2025-11-21 17:39:33 -08:00
Ishaan Jaffer 3296ffd3ca test fixes 2025-11-21 17:38:20 -08:00
Ishaan Jaff 34f0c3c4dc Remove cost tracking disabled tooltip in chat ui (#16953)
* Fix: Simplify cost tooltip in ResponseMetrics

Co-authored-by: ishaan <ishaan@berri.ai>

* Fix: Display cost metric correctly in ResponseMetrics

Co-authored-by: ishaan <ishaan@berri.ai>

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: ishaan <ishaan@berri.ai>
2025-11-21 17:25:37 -08:00
Ishaan Jaffer 8b8b31ecd8 fix img gen 2025-11-21 17:18:48 -08:00
Ishaan Jaffer 6439aed3ac snowflake test fix 2025-11-21 17:12:55 -08:00
Ishaan Jaffer e7a32c1e8f docker test fixes 2025-11-21 16:52:58 -08:00
Ishaan Jaffer 473fec8a60 fix _get_allowed_mcp_servers 2025-11-21 16:48:43 -08:00
yuneng-jiang 1ebe1fea37 Docs for Model Compare UI and Org Usage (#16928)
* Docs for Model Compare UI and Org Usage

* Fix typo in img path and add Model Compare to sidebars.js

* Updated to remove from 1.80 writeup
2025-11-21 16:45:55 -08:00
yuneng-jiang 49e331329b Remove console logs and errors from model tab (#16455) 2025-11-21 16:45:11 -08:00
yuneng-jiang 5cfacb96e6 Exclude litellm_credential_name from sensitive masker (#16950) 2025-11-21 16:40:17 -08:00
Ishaan Jaffer 9b5a655b7c fix _encode_tool_call_id_with_signature 2025-11-21 16:27:35 -08:00
Ishaan Jaffer 69da15e65e test_fal_ai_image_generation_basic 2025-11-21 16:23:41 -08:00
Ishaan Jaffer 4a9f163db1 TestPromptVersionsEndpoint 2025-11-21 16:21:13 -08:00
Ishaan Jaffer 4e8f1d0143 fix prompt manager 2025-11-21 16:17:44 -08:00
Ishaan Jaffer 2226450437 test_ensure_initialize_azure_sdk_client_always_used 2025-11-21 16:15:35 -08:00
Ishaan Jaffer 4205b7caeb fix install litellm 2025-11-21 16:15:09 -08:00
yuneng-jiang 0abfb07ab8 Remove UI Session Token from user/info return (#16851) 2025-11-21 16:11:58 -08:00
Ishaan Jaff 8e318dd06c [Feat] New LLM Provider - Docker Model Runner (#16948)
* add DOCKER_MODEL_RUNNER

* add DockerModelRunnerChatConfig Transform

* add docker_model_runner

* add docker_model_runner

* docs docker model runner

* add DockerModelRunnerChatConfig

* add docker_model_runner to providers

* test_completion_hits_correct_url_and_body

* fix sidebar

* TestDockerModelRunnerIntegration

* test_completion_with_custom_engine_and_host

* docs docker model runner

* docs fix
2025-11-21 16:09:32 -08:00
Eiliya d88580fa28 fix(gemini-video): inherit BaseVideoConfig to enable async content response (#16875)
This fix addresses the same issue that was resolved for OpenAI video in PR #16708.

The GeminiVideoConfig class was importing BaseVideoConfig only within TYPE_CHECKING,
causing it to be 'Any' at runtime. This prevented the async_transform_video_content_response
method from being available during video content downloads.

Changes:
- Moved BaseVideoConfig import from TYPE_CHECKING to top-level imports
- Added test_gemini_video_config_has_async_transform() to verify the fix
- Ensures GeminiVideoConfig properly inherits BaseVideoConfig at runtime

Fixes video generation errors for Gemini Veo models:
'GeminiVideoConfig' object has no attribute 'async_transform_video_content_response'
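
The failure mode can be reproduced in miniature. In this sketch every class name is a hypothetical stand-in, and `object` is used in place of the `Any` fallback mentioned above so that the example runs on older Python versions.

```python
from typing import TYPE_CHECKING


class BaseVideoConfigSketch:
    """Hypothetical stand-in for the real base class."""

    async def async_transform_video_content_response(self, raw: bytes) -> bytes:
        return raw


if TYPE_CHECKING:
    # Only static type checkers see this branch; it never executes.
    TypeCheckOnlyBase = BaseVideoConfigSketch
else:
    # Runtime fallback: the real base class is never bound here.
    TypeCheckOnlyBase = object


class BrokenConfig(TypeCheckOnlyBase):
    """Type-checks as a subclass, but inherits nothing at runtime."""


class FixedConfig(BaseVideoConfigSketch):
    """Top-level import equivalent: the base class is really inherited."""
```

At runtime `BrokenConfig()` has no `async_transform_video_content_response` attribute, while `FixedConfig()` does.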
2025-11-21 16:01:21 -08:00
Suresh Kumar 5b4a848391 fix anthropic pass-through endpoint (#16883) 2025-11-21 16:00:05 -08:00
yuneng-jiang b6b8f46b36 Change Public Model Hub to use proxyBaseUrl (#16892) 2025-11-21 15:59:03 -08:00
Cesar Garcia 1c65800f4a Feat: add support for Grok 4.1 Fast models (#16936)
* feat: Add support for Grok 4.1 Fast models

Add new xAI Grok 4.1 Fast models optimized for high-performance agentic tool calling:

- xai/grok-4-1-fast (alias for grok-4-1-fast-reasoning)
- xai/grok-4-1-fast-reasoning (with reasoning capabilities)
- xai/grok-4-1-fast-reasoning-latest
- xai/grok-4-1-fast-non-reasoning (without reasoning for faster responses)
- xai/grok-4-1-fast-non-reasoning-latest

Features:
- Context window: 2,000,000 tokens
- Pricing: $0.20/1M input, $0.50/1M output tokens
- Cached tokens: $0.05/1M tokens
- Supports: Function calling, Structured outputs, Vision, Audio input, Web search, Reasoning
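
As a worked example of the listed prices, the sketch below estimates the cost of a single request. It assumes cached tokens are a subset of the input tokens and are billed at the cached rate instead of the input rate; the function name is hypothetical.

```python
from decimal import Decimal

PER_MILLION = Decimal(1_000_000)
INPUT_PRICE = Decimal("0.20")   # USD per 1M input tokens
OUTPUT_PRICE = Decimal("0.50")  # USD per 1M output tokens
CACHED_PRICE = Decimal("0.05")  # USD per 1M cached tokens


def estimate_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> Decimal:
    """Estimate the USD cost of one request from its token counts."""
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")
    uncached = input_tokens - cached_tokens
    total = (
        uncached * INPUT_PRICE
        + cached_tokens * CACHED_PRICE
        + output_tokens * OUTPUT_PRICE
    )
    return total / PER_MILLION
```

For 1,000,000 input tokens of which 400,000 are cached, plus 100,000 output tokens, this gives 0.12 + 0.02 + 0.05 = 0.19 USD.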

Fixes #16927

* docs: Add comprehensive Grok models documentation

- Add 'Supported Models' section highlighting new Grok 4.1 Fast models
- Include comparison guide for reasoning vs non-reasoning models
- Add complete model family table (Grok 4.1, 4, 3, Code, 2)
- Add features legend explaining capabilities
- Remove pricing details (link to xAI docs instead for current rates)
- Improve documentation clarity and consistency

Related to #16927

* docs: Minor corrections to xai.md
2025-11-21 15:57:55 -08:00
Cesar Garcia 22ef7ab070 feat: Add support for Gemini 3 Pro Image model (#16938)
Add gemini-3-pro-image-preview model configuration for Google's new
image generation model (aka "Nano Banana Pro 🍌").

Model details:
- Input: $2.00/1M tokens (text), $0.0011/image
- Output: $12.00/1M tokens (text), $0.134/image (1K/2K)
- Context: 65k input / 32k output tokens
- Capabilities: structured outputs, web search, caching, thinking
- No function calling support
- Available on both Gemini API and Vertex AI

Added variants:
- gemini-3-pro-image-preview (base, uses Vertex AI)
- gemini/gemini-3-pro-image-preview (Gemini API)
- vertex_ai/gemini-3-pro-image-preview (Vertex AI)

Source: https://ai.google.dev/gemini-api/docs/pricing
Fixes: #16925
2025-11-21 15:55:25 -08:00
YutaSaito 7c4ef090c1 docs: fix mcp url format (#16940)
* docs: fix mcp url format

* fix: update Cursor MCP example to use url instead of server_url
2025-11-21 15:43:26 -08:00
Ishaan Jaff ed6c3b4c86 [Bug Fix]: Search APIs - error in firecrawl-search "Invalid request body" (#16943)
* add search_tool_name in litellm params

* test_search_tool_name_in_all_litellm_params

* bump config
2025-11-21 14:56:19 -08:00
Ishaan Jaff 01ea6c8948 [New model] Add GLM 4.6 from together.ai (#16942)
* new model - add together_ai/zai-org/GLM-4.6

* together_ai/zai-org/GLM-4.6
2025-11-21 14:39:52 -08:00
Cesar Garcia 1812ebae70 fix: Correct Cerebras GPT-OSS-120B model name (#16939)
Change model identifier from cerebras/openai/gpt-oss-120b to
cerebras/gpt-oss-120b to match Cerebras API requirements.

The Cerebras API only accepts 'gpt-oss-120b' as the model ID, not
'openai/gpt-oss-120b'. The previous name was causing "Model does not
exist" errors when users tried to use it.

Tested with real API calls to confirm:
- cerebras/gpt-oss-120b → sends 'gpt-oss-120b' → works
- cerebras/openai/gpt-oss-120b → sends 'openai/gpt-oss-120b' → fails
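
The two results above are consistent with splitting the model string on its first slash only. The helper below is a hypothetical sketch of that behavior, not the library's actual routing code.

```python
from typing import Tuple


def split_provider(model: str) -> Tuple[str, str]:
    """Split 'provider/model-id' on the first slash only."""
    provider, _, model_id = model.partition("/")
    return provider, model_id
```

Under that assumption, `split_provider("cerebras/openai/gpt-oss-120b")` leaves `openai/gpt-oss-120b` as the model ID, which the message above reports the Cerebras API rejects.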

Fixes #16924
2025-11-21 14:20:31 -08:00
Ishaan Jaff fb38763eb4 [Feat] UI - Show "get code" section for prompt management + minor polish of showing version history (#16941)
* add _get_prompt_data_from_dotprompt_content

* fix pre call hook for prompt template

* fix: get_latest_version_prompt_id

* fix get_latest_version_prompt_id

* test_get_latest_version_prompt_id

* fix info and delete lookup for prompts

* refactor prompt table

* rename to prompt studio

* fix get_prompt_info

* fix endpoints

* add PromptCodeSnippets

* prompt info view

* add prompt info view

* show correct version for prompts

* fix version selector

* fix endpoints and version

* fix get_prompt_info

* fix version display
2025-11-21 14:00:33 -08:00
Ishaan Jaff c9ac1949ee [Fix] Prompt Management - UI, allow seeing model, prompt id for Prompt (#16932)
* add _get_prompt_data_from_dotprompt_content

* fix pre call hook for prompt template

* fix: get_latest_version_prompt_id

* fix get_latest_version_prompt_id

* test_get_latest_version_prompt_id

* fix info and delete lookup for prompts

* refactor prompt table
2025-11-21 13:59:54 -08:00
yuneng-jiang 5dd2ee0bff Change /public fields to honor server root path (#16930) 2025-11-21 13:59:16 -08:00
Ishaan Jaff 6ae22908b7 [Feat] Prompt Versioning - Allow specifying prompt version in code (#16929)
* add _get_prompt_data_from_dotprompt_content

* fix pre call hook for prompt template

* fix: get_latest_version_prompt_id

* fix get_latest_version_prompt_id

* test_get_latest_version_prompt_id
2025-11-21 13:58:49 -08:00
yuneng-jiang 4b25398afe [Infra] CI/CD Fixes (#16937)
* Attempt CI/CD Fix

* Adding test for coverage

* Adding max depth to copilot and vertex

* Fixing mypy lint and docker database

* Fixing UI build issues

* Update playwright test
2025-11-21 13:58:19 -08:00
colinlin-stripe f9d8eeaf8e [stripe] gemini 3 thought signatures in tool call id (#16895)
* thought signature tool call id

* [stripe] refactor and tests

* [stripe] remove md and move to factory

* [stripe] remove redundant test

* [stripe] ran black formatting

* [stripe] add thought signature docs

* [stripe] remove unused import
2025-11-21 13:44:53 -08:00
Clint Banzhaf caddc6dd0f fix images being dropped from tool results for bedrock (#16492)
* fix images being dropped from tool results for bedrock

* type fixes
2025-11-21 10:52:48 -08:00
Ishaan Jaff 97d9da93e0 [Feat] Prompt Management - Allow viewing version history (#16901)
* TestPromptRequest

* add prompts/test endpoint for testing prompt

* TestPromptTestEndpoint

* feat: working v1 of this ui

* working prompt endpoints

* add chat ui for prompts

* add conversation panel

* add init chat ui

* allow clicking edit prompt

* fix use get_base_prompt_id

* add endpoints for viewing prompt versions

* TestPromptVersioning

* add getPromptVersions

* add VersionHistorySidePanel

* allow viewing version history

* add version history
2025-11-21 08:54:52 -08:00
Ishaan Jaff 3c789ac287 feat: Add vector store create and search call types (#16859)
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: ishaan <ishaan@berri.ai>
2025-11-21 08:54:41 -08:00