Commit Graph

2637 Commits

Author SHA1 Message Date
Krrish Dholakia 92ebf5b918 fix(router.py): fix print statement 2025-08-11 17:46:14 -07:00
Ishaan Jaff 9f78287000 [Bug Fix]: Azure OpenAI GPT-5 max_tokens + reasoning param support (#13510)
* add AzureOpenAIGPT5Config

* add AzureOpenAIGPT5Config

* add AzureOpenAIGPT5Config

* add AzureOpenAIGPT5Config

* test_azure_gpt5_supports_reasoning_effort

* test_azure_gpt5_reasoning

* test_azure_gpt5_reasoning

* ruff check fixes

* docs azure gpt5
2025-08-11 15:40:53 -07:00
Ishaan Jaff 1cd827874f [Bug Fix] - Allow using reasoning_effort for gpt-5 model family and reasoning for Responses API (#13475)
* test_openai_gpt5_reasoning

* test_openai_gpt5_reasoning_effort_parameter

* add OpenAIGPT5ResponsesAPIConfig

* test_openai_gpt5_reasoning_effort_parameter

* fixes
2025-08-10 09:55:36 -07:00
Krish Dholakia 9f6f96d76c Litellm dev 08 07 2025 p1 (#13418)
* fix(router.py): support base model for model group usage

allows model group info to show accurate cost information for azure models

* fix(router.py): fix changes

* test: add unit tests

* build(pyproject.toml): bump openai version requirements

support custom tool from responses api

Closes https://github.com/BerriAI/litellm/issues/13391

* docs(responses_api.md): add verbosity + free-form function calling parameters

* docs(responses_api.md): add cfg + minimal reasoning to docs

Closes https://github.com/BerriAI/litellm/issues/13391

* docs(responses_api.md): add proxy examples to docs

* refactor: fix ruff error
2025-08-09 16:30:04 -07:00
Sannan Nasir 0e53b1feab Add digitalocean provider (#12169)
* Add digitalocean provider

* Add digitalocean provider

* Revert "Add digitalocean provider"

This reverts commit 96dda40f45b3d12ea03e861d060ec81460b7759e.

* changes

* fixes

* Update transformation

* refactoring

* rename provider to Gradient AI

* fixes

* Incorporte review comments

* revert changes

* fix typo

* revert change

* incorporated review comments

* Revert "Incorporte review comments"

This reverts commit 37bd51bd54ef4fd52ccc12866e47f8de9476d597.

* changes

* Revert "Revert "Incorporte review comments"

This reverts commit 37bd51bd54ef4fd52ccc12866e47f8de9476d597."

This reverts commit 68c8a198ee0d6441c3a52f6c6a49c9c95a4cb0a8.

* changes

* fixes

* Update provider_specific_fields.tsx
2025-08-09 16:26:33 -07:00
Ishaan Jaff f60a9cf908 [Bug]: Fix JWTs access not working with model groups (#13474)
* fix can_team_access_model

* test_find_team_with_model_access_model_group
2025-08-09 16:14:51 -07:00
Jugal D. Bhatt 67833590d6 [Proxy changes] Litellm add model price reload schedule for multi-pod (#13470)
* added mcp guardrails doc in mcp.md

* add button to reload models

* Added button changes

* added button for scheduling reload

* add multi pod support to reloading the model price json

* fix ruff
2025-08-09 16:12:13 -07:00
Krish Dholakia 1c8761111f Router - reduce p99 latency w/ redis enabled by 50% + OTEL - track pre_call hook latency (#13362)
* feat(proxy/utils.py): track pre-call hooks in OTEL

some pre call hooks can cause latency in high traffic - make sure this is tracked

* fix(router.py): move redis call on deployment_callback_on_success to pipeline operation

reduces p99 latency by half when redis is enabled

* fix(parallel_request_limiter_v3.py): only run check if any item has rate limits set

Prevents unnecessary latency added by rate limit checks

* test: add unit tests

* Latency Improvements: only track tpm/rpm usage when set on deployment+ LLM Caching - use an in-memory cache to reduce redis calls + OTEL - track time spent on LLM caching (#13472)

* fix(router.py): only track usage for deployments with tpm/rpm set

ensures additional latency avoided for non-tpm/rpm models

* fix(caching_handler.py): log time spent on request get cache to OTEL

enables easy debugging of call latency

* fix(caching_handler.py): use dual cache object for in-memory caching + trace redis call within caching handler

* fix(caching_handler.py): working in-memory cache for redis calls

ensures dual cache works when redis cache setup for llm calls

makes calls quicker by only checking redis when in-memory cache missed for llm api call

* test: remove redundant test

* test: add unit tests
2025-08-09 16:09:51 -07:00
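
Note on commit 1c8761111f: the caching_handler.py changes describe a dual cache in which Redis is consulted only when the in-memory cache misses. The general pattern can be sketched as below. This is a minimal illustration, not LiteLLM's implementation: `InMemoryCache`, `FakeRemoteCache`, `DualCacheSketch`, and the 60-second local TTL are all hypothetical names and values chosen for the example, and `FakeRemoteCache` merely stands in for a Redis client so the lookup count is visible.

```python
import time
from typing import Any, Dict, Optional, Tuple


class InMemoryCache:
    """Tiny TTL cache held in process memory (hypothetical helper)."""

    def __init__(self) -> None:
        self._store: Dict[str, Tuple[Any, float]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            # Expired: drop the entry and report a miss.
            del self._store[key]
            return None
        return value

    def set(self, key: str, value: Any, ttl: float) -> None:
        self._store[key] = (value, time.monotonic() + ttl)


class FakeRemoteCache:
    """Stand-in for Redis that counts lookups (assumption, for illustration)."""

    def __init__(self) -> None:
        self.data: Dict[str, Any] = {}
        self.get_calls = 0

    def get(self, key: str) -> Optional[Any]:
        self.get_calls += 1
        return self.data.get(key)

    def set(self, key: str, value: Any) -> None:
        self.data[key] = value


class DualCacheSketch:
    """Check local memory first; fall back to the remote cache on a miss."""

    def __init__(self, local: InMemoryCache, remote: FakeRemoteCache,
                 local_ttl: float = 60.0) -> None:
        self.local = local
        self.remote = remote
        self.local_ttl = local_ttl

    def get(self, key: str) -> Optional[Any]:
        value = self.local.get(key)
        if value is not None:
            return value  # served from memory, no remote round trip
        value = self.remote.get(key)
        if value is not None:
            # Populate the local cache so the next read skips the remote call.
            self.local.set(key, value, self.local_ttl)
        return value

    def set(self, key: str, value: Any) -> None:
        self.local.set(key, value, self.local_ttl)
        self.remote.set(key, value)
```

In this sketch a repeated read of the same key reaches the remote cache only once; the trade-off is that a local entry can be stale for up to `local_ttl` seconds if another process updates the remote value.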
Ishaan Jaff 60306d34a0 [Bug Fix] Allow using Swagger for /chat/completions (#13469)
* fix get_openapi_schema

* fixes for ProxyChatCompletionRequest

* TestSwaggerChatCompletions

* fix working request body

* fix - add "messages"

* fix messages

* TestSwaggerChatCompletions

* test_messages_field_has_example

* ruff check fix
2025-08-09 15:35:45 -07:00
Jugal D. Bhatt 1270df08a4 [Proxy + UI] Litellm add reload model api and button (#13464)
* added mcp guardrails doc in mcp.md

* add button to reload models

* Added button changes

* remove the model_reload
2025-08-09 13:52:56 -07:00
Jugal D. Bhatt 10a1fe21c5 [LLM Translation] Litellm azure o series drop params (#13353)
* added route check

* fix ruff

* Added support for dropping o_series params

* Added ruff fix

* fix tests
2025-08-09 13:52:45 -07:00
Ishaan Jaff eb4bd26f24 [Bug Fix] - Get Routes (#13466)
* fixes get_routes_for_mounted_app

* fix - use _safe_get_endpoint_name

* fix code QA check

* test_get_routes_for_mounted_app_with_static_files

* test fixes
2025-08-09 12:52:23 -07:00
Ishaan Jaff 825ea65b96 [Bug Fix] Responses API - Responses API failed if input containing ResponseReasoningItem (#13465)
* add test_responses_api_multi_turn_with_reasoning_and_structured_output

* fix transform_responses_api_request
2025-08-09 11:20:34 -07:00
Ishaan Jaff a843e876a8 [Feat] Working e2e flow for Responses API session management with media (#13456)
* add MultimodalContent on chat UI

* add multi modal img on chat ui

* utils for responses API imgs

* add code snippet with imgs

* chat UI add imgs

* add image upload

* chat ui allow adding images

* fix chat send button

* fix button styles

* fix clear chat

* fixes session management

* fixes for session management

* QA fix _should_check_cold_storage_for_full_payload

* test_should_check_cold_storage_for_full_payload
2025-08-08 18:28:10 -07:00
Ishaan Jaff 3b65733af8 [Bug fix] - Error creating standard logging object - can't register atexit after shutdownLitellm fixes standard logging payload (#13436)
* fix: _generate_cold_storage_object_key

* _get_configured_cold_storage_custom_logger

* test_e2e_generate_cold_storage_object_key_runtime_error_handled
2025-08-08 12:38:26 -07:00
Jugal D. Bhatt 51c2ff7c15 fix user membership issue (#13433) 2025-08-08 12:00:58 -07:00
Ishaan Jaff 3a35c82884 [Feat] Add reasoning_effort to OpenAIGPT5Config (#13434)
* add reasoning_effort to OpenAIGPT5Config

* test_gpt5_supports_reasoning_effort
2025-08-08 11:57:12 -07:00
Thiago Salvatore c2ad858c83 fix(access group): allow access group on mcp tool retrieval (#13425)
* fix(access group): allow access group on mcp tool retrieval

* fix(test): fix broken tests and add test case for access group

* fix(mypy): fix typing issues
2025-08-08 08:55:46 -07:00
Ishaan Jaff 9761ba7c7a [Bug Fix] Responses api session management for streaming responses (#13396)
* fix proxy config

* fix(responses api): fix streaming ID consistency and tool format handling (#12640)

* fix(responses): ensure streaming chunk IDs use consistent encoding format

Fixes streaming ID inconsistency where streaming responses used raw provider IDs
while non-streaming responses used properly encoded IDs with provider context.

Changes:
- Updated LiteLLMCompletionStreamingIterator to accept provider context
- Added _encode_chunk_id() method using same logic as non-streaming responses
- Modified chunk transformation to encode all streaming item_ids with resp_ prefix
- Updated handlers to pass custom_llm_provider and litellm_metadata to streaming iterator

Impact:
- Streaming chunk IDs now format: resp_<base64_encoded_provider_context>
- Enables session continuity when using streaming response IDs as previous_response_id
- Allows provider detection and load balancing with streaming responses
- Maintains backward compatibility with existing streaming functionality

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* fix(types): add explicit Optional[str] type annotation for model_id

This resolves MyPy type checking error where model_id could be None
but wasn't explicitly typed as Optional[str].

* fix(types): handle None case for litellm_metadata access

Prevents 'Item None has no attribute get' error by checking for None
before accessing litellm_metadata dictionary.

* test: add comprehensive tests for streaming ID consistency

Adds unit and E2E tests to verify streaming chunk IDs are properly encoded
with consistent format across streaming responses.

## Tests Added

### Unit Test (test_reasoning_content_transformation.py)
- `test_streaming_chunk_id_encoding()`: Validates the `_encode_chunk_id()` method
  correctly encodes chunk IDs with `resp_` prefix and provider context

### E2E Tests (test_e2e_openai_responses_api.py)
- `test_streaming_id_consistency_across_chunks()`: Tests that all streaming chunk IDs
  are properly encoded across multiple chunks in a real streaming response
- `test_streaming_response_id_as_previous_response_id()`: Tests the core use case -
  using streaming response IDs for session continuity with `previous_response_id`

## Key Testing Approach
- Uses **Gemini** (non-OpenAI model) to test the transformation logic rather than
  OpenAI passthrough, since the streaming ID consistency issue occurs when LiteLLM
  transforms responses rather than just passing through to native OpenAI responses API
- Tests validate that streaming chunk IDs now use same encoding as non-streaming responses
- Verifies session continuity works with streaming responses

Addresses @ishaan-jaff's request for unit tests covering the streaming ID consistency fix.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* fix(lint): remove unused imports in transformation.py

Removes unused imports to fix CI linting errors:
- GenericResponseOutputItem
- OutputFunctionToolCall

* test: remove E2E tests from openai_endpoints_tests

Remove streaming ID consistency E2E tests as requested by @ishaan-jaff.
Keep only the mock/unit test in test_reasoning_content_transformation.py

* revert: remove streaming chunk ID encoding to original behavior

This reverts the streaming chunk ID encoding changes to understand the original issue better.
Original behavior was:
- Streaming chunks: raw provider IDs
- Streaming final response: raw IDs (PROBLEM!)
- Non-streaming final response: encoded IDs (correct)

The real issue: streaming final response IDs were not encoded, breaking session continuity.

* fix(responses): encode streaming final response IDs to match OpenAI behavior

Fixes streaming ID inconsistency to match OpenAI's Responses API behavior:
- Streaming chunks: raw message IDs (like OpenAI's msg_xxx)
- Final response: encoded IDs (like OpenAI's resp_xxx)

This enables session continuity by ensuring streaming final response IDs
have the same encoded format as non-streaming responses, allowing them
to be used as previous_response_id in follow-up requests.

Changes:
- Add custom_llm_provider and litellm_metadata to LiteLLMCompletionStreamingIterator
- Update handlers to pass provider context to streaming iterator
- Apply _update_responses_api_response_id_with_model_id to final streaming response
- Keep streaming chunks as raw IDs to match OpenAI format

Impact:
- Session continuity works with streaming responses
- Load balancing can detect provider from streaming final response IDs
- Format matches OpenAI's Responses API exactly

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <noreply@anthropic.com>

* test: update unit test to match correct OpenAI-compatible behavior

Updates the unit test to verify streaming chunk IDs are raw (not encoded)
to match OpenAI's responses API format:
- Streaming chunks: raw message IDs (like msg_xxx)
- Final response: encoded IDs (like resp_xxx)

This reflects the correct behavior implemented in the fix.

---------

Co-authored-by: Claude <noreply@anthropic.com>

* cleanup

* TestBaseResponsesAPIStreamingIterator

---------

Co-authored-by: Javier de la Torre <jatorre@carto.com>
Co-authored-by: Claude <noreply@anthropic.com>
2025-08-07 20:13:24 -07:00
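
Note on commit 9761ba7c7a: the message describes final response IDs of the form `resp_<base64_encoded_provider_context>`, while streaming chunks keep raw provider IDs. The round trip can be sketched as below. The field layout (`provider`, `model_id`, `response_id` joined with `;`) and the function names are assumptions made for this example; the commit message does not spell out the exact payload, so this should not be read as LiteLLM's actual encoding.

```python
import base64
from typing import Dict, Optional

PREFIX = "resp_"


def encode_response_id(provider: str, model_id: Optional[str], raw_id: str) -> str:
    """Wrap a raw provider ID with routing context (hypothetical layout)."""
    context = f"provider:{provider};model_id:{model_id or ''};response_id:{raw_id}"
    token = base64.urlsafe_b64encode(context.encode("utf-8")).decode("ascii")
    return PREFIX + token


def decode_response_id(encoded: str) -> Dict[str, str]:
    """Recover the context, or treat the input as a raw ID if it does not decode."""
    raw = {"provider": "", "model_id": "", "response_id": encoded}
    if not encoded.startswith(PREFIX):
        return raw
    try:
        payload = base64.urlsafe_b64decode(encoded[len(PREFIX):]).decode("utf-8")
        # Split into at most three parts so a ';' inside the raw ID survives.
        fields = dict(part.split(":", 1) for part in payload.split(";", 2))
    except ValueError:
        # Covers bad base64, bad UTF-8, and parts without a ':' separator.
        return raw
    if set(fields) != {"provider", "model_id", "response_id"}:
        return raw
    return fields
```

With an encoding like this, a follow-up request that passes the final streaming response ID as `previous_response_id` carries enough context to be routed back to the same provider, which is the session-continuity behavior the commit message describes.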
Ishaan Jaff 7695882d8a test_supports_tool_choice 2025-08-07 16:56:45 -07:00
Ishaan Jaff 2037037258 [Bug Fix] OpenAI gpt-5 series does not support "max_tokens" parameter and temperature values that are not = 1 (#13390)
* add OpenAIGPT5Config

* add map_openai_params for gpt5

* add OpenAIGPT5Config

* add OpenAI gpt 5 transform

* docs gpt 5 openai
2025-08-07 16:35:00 -07:00
Ishaan Jaff e8c081b8ff test_stream_chunk_builder_litellm_usage_chunks 2025-08-07 15:22:52 -07:00
Ishaan Jaff dbb651ea95 remove old mapped test 2025-08-07 13:51:50 -07:00
Ishaan Jaff 621b3dca7b [Bug Fix] Mistral Tool Calling - Grammar error: at 3(11): failed to compile JSON schema (#13389)
* test_claude_tool_use_with_gemini

* add _remove_json_schema_refs

* add _clean_tool_schema_for_mistral

* fixes mistral tool calls

* _remove_json_schema_refs

* fix - vertex, remove hardcoded test
2025-08-07 13:50:22 -07:00
Ishaan Jaff 984f91f4f5 test_completion_gemini_stream 2025-08-07 13:24:00 -07:00
Ishaan Jaff 08ac2aeb6d Revert "Fix SSO Logout | Create Unified Login Page with SSO and Username/Password Options (#12703)" (#13387)
This reverts commit a752d7acc9.
2025-08-07 13:13:05 -07:00
Ishaan Jaff 4d941c914e [Feat] Responses API Session Handling - Multi media support (#13347)
* rename ResponsesSessionHandler

* use ResponsesSessionHandler

* test session handler

* refactor ResponsesSessionHandler

* fix get_proxy_server_request_from_spend_log

* use constant for LITELLM_TRUNCATED_PAYLOAD_FIELD

* add _should_check_cold_storage_for_full_payload

* add get_class_type_for_custom_logger_name

* get_active_custom_logger_for_callback_name

* add get_proxy_server_request_from_cold_storage to CustomLogger

* add ColdStorageHandler

* start using cold storage integration

* add get_proxy_server_request_from_cold_storage

* fixes from manual testing

* s3 v2 fix getting region name

* ChatCompletionImageUrlObject

* use _get_configured_cold_storage_custom_logger

* fixes for _should_check_cold_storage_for_full_payload

* fix _download_object_from_s3

* test_s3_v2_with_cold_storage

* add cold_storage_object_key to StandardLoggingMetadata

* use get_proxy_server_request_from_cold_storage_with_object_key

* add cold_storage_object_key to SpendLogsMetadata

* add cold_storage_object_key

* get_proxy_server_request_from_cold_storage_with_object_key

* use get_proxy_server_request_from_cold_storage_with_object_key

* test responses API

* add get_proxy_server_request_from_cold_storage_with_object_key

* session handler fixes

* test session handler

* fix ruff checks

* _download_object_from_s3

* cleanup

* test

* lint fix

* test_e2e_cold_storage_successful_retrieval

* test_e2e_generate_cold_storage_object_key_successful

* test_async_gcs_pub_sub_v1

* test fix

* test fix

* test fix

* test_standard_logging_metadata_has_cold_storage_object_key_field

* test_sanitize_request_body_for_spend_logs_payload_basic

* test_transform_input_image_item_to_image_item_with_image_data
2025-08-07 10:59:53 -07:00
Anand Khinvasara 96dca4eff8 fix: 12152 - Redacted sensitive information logged in bedrock guardrails (#13356) 2025-08-07 08:42:11 -07:00
Edward D'Amato 30fc5b871c feat(integrations): allow setting of braintrust callback base url (#13368)
* feat(integrations): allow setting of braintrust callback base url

* chore(misc): remove extra additions due to merge
2025-08-07 08:40:11 -07:00
Ishaan Jaff dfada882f1 vtx test fix gemini-2.5-flash-lite 2025-08-07 00:11:10 -07:00
yeahyung a92bf8173e Fix create, search vector store error (#13285)
* (#13284) add avector_store_create to route_type which doesn't require model

* (#13284) exclude hidden params in metadata when create vector store

* (#13284) fix lint error

* (#13284) keep metadata None if metadata is None(not empty dict)

* (#13284) add test code

* (#13284) change test code name

* (#13284) add avector_store_search to route_type which doesn't require model
2025-08-06 11:15:17 -07:00
Jugal D. Bhatt b1a8968895 [MCP Gateway] fix auth on ui for bearer servers (#13312)
* fix auth on ui for bearer servers

* add tests and fixes

* fix tests
2025-08-06 09:46:10 -07:00
Ishaan Jaff eeed03a78f test fix: gcp deprecated gemini-1.5-flash 2025-08-06 08:43:45 -07:00
Krish Dholakia 0da25fadc0 Exclude none fields on /chat/completion - fixes n8n bug + Allow calling /v1/models when end user over budget (#13320)
* fix(proxy_server.py): exclude none fields before returning

Fixes https://github.com/BerriAI/litellm/issues/13055

* test: add unit tests

* feat(auth_checks.py): allow info routes to work when end user over budget

Fixes https://github.com/BerriAI/litellm/issues/13286
2025-08-05 21:39:46 -07:00
zjx20 92c525ddfe feat(JinaAI): support multimodal embedding models (#13181)
* feat(JinaAI): support multimodal embedding models

* add test case

* add test

* fix test
2025-08-05 19:21:56 -07:00
Krish Dholakia 324cfe8bdc fix(streaming_handler.py): include cost in streaming usage object (#13319)
Fixes https://github.com/BerriAI/litellm/issues/12689
2025-08-05 18:38:31 -07:00
Jugal D. Bhatt b6fcda2f8a [LLM Translation] Fix model group on clientside auth with API calls (#13314)
* fix unsupported operand type(s) for +=: 'NoneType' and 'str' on clientside auth creds for responses

* fix the client side auth to use correct metadata

* add more tests

* fix tests
2025-08-05 17:46:47 -07:00
Ishaan Jaff b455ada161 [Bug Fix] [Bug]: New Databricks Foundation Models databricks-gpt-oss-20b and databricks-gpt-oss-120b failed with error: litellm.APIConnectionError: 'signature' (#13318)
* test_transform_choices_without_signature

* fix ChatCompletionThinkingBlock

* extract_reasoning_content
2025-08-05 17:46:40 -07:00
Ishaan Jaff dab8ba03e3 [Feat] - When using custom tags on prometheus allow using wildcard patterns (#13316)
* _tag_matches_wildcard_configured_pattern

* test_get_custom_labels_from_tags_wildcard_patterns

* docs Custom Tags

* docs how custom tags work

* fix
2025-08-05 17:46:13 -07:00
Ishaan Jaff 0ccc493455 [Bug]: Fix Mimetype Resolution Error in Bedrock Document Understanding (#13309)
* fix _validate_format for BedrockImageProcessor

* add test

* fix _validate_format for bedrock

* _get_document_format

* test_bedrock_get_document_format_fallback_mimes

* fix: add fallback method for mime type detection
2025-08-05 17:07:10 -07:00
Jugal D. Bhatt 32501c85f5 fix unsupported operand type(s) for +=: 'NoneType' and 'str' on clientside auth creds for responses (#13293) 2025-08-05 13:16:16 -07:00
Jugal D. Bhatt 609fa9f5ca [LLM Translation + Coding tools] Added litellm claude code count tokens support (#13261)
* Added litellm claude code count tokens support

* fix mypy

* create helper

* Revert construct

* revert construct

* fix return

* Add return none

* change to factory approach

* refactor to BaseModelInfo

* enum fix
2025-08-05 10:57:24 -07:00
Jugal D. Bhatt 29a8c583c2 added redis iam auth (#13275) 2025-08-05 10:56:34 -07:00
Ishaan Jaff 5a02eb473b test_function_calling_with_tool_response 2025-08-05 09:55:47 -07:00
Krish Dholakia 416da066eb fix(main.py): handle tool being a pydantic object (#13274)
* fix(main.py): handle tool being a pydantic object

Fixes https://github.com/BerriAI/litellm/issues/13064

* fix(prompt_templates/common_utils.py): fix unpack defs deepcopy issue

Fixes https://github.com/BerriAI/litellm/issues/13151

* fix(utils.py): handle tools is none
2025-08-04 23:44:02 -07:00
Krish Dholakia eb49f987de Ensure disable_llm_api_endpoints works + Add wildcard model support for 'team-byok' model (#13278)
* fix(route_checks.py): ensure disable llm api endpoints is correctly set

* fix(route_checks.py): raise httpexception

raise expected exceptions

* fix(router.py): handle team only wildcard models

fixes issue where team only wildcard models were not considered during auth checks

* fix(router.py): handle team only wildcard models

fixes issue where team only wildcard models were not considered during auth checks
2025-08-04 23:19:51 -07:00
Jugal D. Bhatt efd34966dc [LLM Translation] Support /v1/models/{model_id} retrieval (#13268)
* added model id endpoint

* fix test

* add route to internal users

* make the functions reusable

* fixed mypy
2025-08-04 18:03:59 -07:00
Jugal D. Bhatt de7108b5f8 input cost per token higher than 1 test (#13270) 2025-08-04 18:02:03 -07:00
Ishaan Jaff ba1882fdd5 [Bug Fix] Prometheus - fix for litellm_input_tokens_metric, litellm_output_tokens_metric - Note this updates the metric name (#13271)
* fixes for litellm_tokens_metric

* test_prometheus_token_metrics_with_prometheus_config
2025-08-04 17:22:21 -07:00
Pascal Bro a17d483c89 Add GCS bucket caching support (#13122) 2025-08-04 16:09:33 -07:00