Commit Graph

646 Commits

Author SHA1 Message Date
Ishaan Jaff 19e26a5c60 test_default_api_base 2025-07-04 18:26:54 -07:00
Ishaan Jaff 59f3771799 test_text_completion_stream - hf 2025-07-03 16:00:51 -07:00
Ishaan Jaff 437f4765b4 test_completion_mistral_api_mistral_large_function_call_with_streaming 2025-07-03 14:58:28 -07:00
Krish Dholakia c0319d0d01 Litellm dev fix gemini web search tracking (#12288)
* feat(stream_chunk_builder_utils.py): correctly return web_search_requests on stream chunk builder

* fix(types/utils.py): handle prompttokendetails

* fix(stream_chunk_builder_utils.py): fix ruff check error

* test: try-except rate limit error

* fix: fix import
2025-07-03 12:27:14 -07:00
Ishaan Jaff 75bb22a868 fix huggingface/deepseek-ai/DeepSeek-R1 2025-07-03 12:13:51 -07:00
Ishaan Jaff 5630147e80 Revert "Revert "fix tests (#12286)""
This reverts commit 12f157513b.
2025-07-03 12:08:27 -07:00
Ishaan Jaff 12f157513b Revert "fix tests (#12286)"
This reverts commit 99ce3a24cc.
2025-07-03 12:04:23 -07:00
célina 99ce3a24cc fix tests (#12286) 2025-07-03 10:57:19 -07:00
Krrish Dholakia a198d4a39f test: change mistral model
service tier exceeded
2025-07-02 21:11:02 -07:00
Ishaan Jaff 6b623f9c98 test whitelisted models 2025-06-28 14:46:16 -07:00
Ishaan Jaff 041db0268c [Bug fix] Router - handle cooldown_time = 0 for deployments (#12108)
* fix get cooldown time

* fixes for _should_run_cooldown_logic

* test_cooldown_time_zero_uses_zero_not_default

* Update litellm/router_utils/cooldown_cache.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update litellm/router_utils/cooldown_handlers.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-06-27 17:50:35 -07:00
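Commit 041db0268c fixes a common Python pitfall: `0` is falsy, so a fallback written as `value or default` silently replaces an explicit `cooldown_time = 0` with the default. The sketch below shows the pattern. `resolve_cooldown_time`, `should_run_cooldown` and the 5-second default are hypothetical and are not LiteLLM's code in `cooldown_handlers.py`. The assumption that a zero cooldown means "skip cooldown" is also illustrative only.

```python
from typing import Optional

# Hypothetical default, for illustration only.
DEFAULT_COOLDOWN_SECONDS = 5.0


def resolve_cooldown_time_buggy(
    configured: Optional[float], default: float = DEFAULT_COOLDOWN_SECONDS
) -> float:
    # Bug: 0 is falsy, so an explicit cooldown of 0 falls back to the default.
    return configured or default


def resolve_cooldown_time(
    configured: Optional[float], default: float = DEFAULT_COOLDOWN_SECONDS
) -> float:
    # Fix: fall back only when the value is actually missing (None).
    return default if configured is None else configured


def should_run_cooldown(configured: Optional[float]) -> bool:
    # Assumption: a cooldown of 0 means the cooldown logic can be skipped entirely.
    return resolve_cooldown_time(configured) > 0
```

The commit's test name, `test_cooldown_time_zero_uses_zero_not_default`, describes the same distinction: a configured value of 0 has to survive as 0.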
Krish Dholakia 7f8b2579a2 Minor Fixes (#11868)
* fix(litellm_pre_call_utils.py): add user agent tags to spend logs in standard logging payload logic

avoid clash when tag based routing is enabled

* test: remove redundant test

* test: rename oidc test to run earlier

quicker debugging

* fix(azure.py): return more detailed error message

* fix(azure/common_utils.py): use default scope, if scope is none

fixes oidc test

* fix: always default to cognitiveservices.azure.com

* test: update test
2025-06-18 14:12:59 -07:00
Krish Dholakia 0319adbf5d feat(speech/): working gemini tts support via openai's /v1/speech endpoint (#11832)
* feat(speech/): working gemini tts support via openai's `/v1/speech` endpoint

Enables calling gemini models via `/v1/speech`

* feat(speech_to_completion_bridge/): voice param support

enables passing voice param to gemini models

* fix: fix ruff checks

* fix: fix checks
2025-06-18 10:36:25 -07:00
Ishaan Jaff 355e6118d8 def test_text_completion_stream(): 2025-06-14 16:46:09 -07:00
Ishaan Jaff 5a051cb264 test_async_embedding_azure_caching - flaky test 2025-06-14 13:55:29 -07:00
Ishaan Jaff e3094c2249 set flaky tests as flaky 2025-06-14 13:51:52 -07:00
Krrish Dholakia 31a73be03f fix(litellm_logging.py): skip should_run_logging check on streaming 2025-06-13 21:19:24 -07:00
Ishaan Jaff 5b451bf483 test_openai_azure_embedding_simple 2025-06-13 19:00:25 -07:00
Ishaan Jaff 7947139913 [Feat] MCP expose streamable https endpoint for LiteLLM Proxy (#11645)
* feat - add https mcp support

* fixes for MCP http integration

* fix code QA

* bump mcp dep

* test_mcp_server_manager_https_server

* test mcp server https

* fix linting error

* bump mcp in poetry

* fix import streamablehttp_client

* fix streamablehttp_client

* fix streamablehttp_client

* add streamablehttp_client

* add simple https server

* working mounted app

* working HTTPS mcp streamable

* fix code QA check

* feat: add MCP Server

* fix - init just as fastapi app

* add LITELLM_MCP_SERVER_DESCRIPTION

* fix importing / init litellm app

* Update litellm/proxy/_experimental/mcp_server/server.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update litellm/proxy/_experimental/mcp_server/server.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update server.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* fixes based on review + code check

* fix linting

* test_streamable_http_mcp_handler_mock

* fix python 3.13 install

* fix deps test

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-06-12 16:32:04 -07:00
Thiago Salvatore fab24fae1a fix: Do not add default model on tag based-routing when valid tag (#11454)
* Do not add default when valid tagged model

* Use default models when no tag matches

* Add unit tests
2025-06-12 13:18:42 -07:00
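Commit fab24fae1a describes the following selection rule. If a request tag matches, use only the tagged deployments and do not add the default ones. If no tag matches, fall back to the `default` deployments. The sketch below illustrates that rule. The flat `{"id": ..., "tags": [...]}` deployment shape and the function name are hypothetical simplifications, not LiteLLM's router data structures.

```python
from typing import Dict, List, Sequence


def filter_deployments_by_tags(
    deployments: Sequence[Dict], request_tags: Sequence[str]
) -> List[Dict]:
    wanted = set(request_tags)
    # Deployments whose tags overlap the request's tags.
    matched = [d for d in deployments if wanted & set(d.get("tags", []))]
    if matched:
        # A valid tag matched: return only those, without mixing in "default" deployments.
        return matched
    # No tag matched: fall back to deployments tagged "default".
    return [d for d in deployments if "default" in d.get("tags", [])]
```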
Krrish Dholakia e46ef9d642 test: update test with new kwargs 2025-06-11 22:19:17 -07:00
Ishaan Jaff 91010cda8f [Bug Fix] Add audio/ogg mapping for Audio MIME types (#11635)
* Add audio/ogg mapping

* test_vertex_ai_gemini_audio_ogg

* test_vertex_ai_gemini_audio_ogg
2025-06-11 14:19:53 -07:00
Krrish Dholakia ec52600f98 test: handle fireworks ai instability 2025-06-11 10:09:28 -07:00
Krish Dholakia c569056ea8 Show remaining users on UI (#11568)
* docs(deploy.md): move docker recommendation to `main-stable`

* feat(enterprise/internal_user_endpoints.py): expose endpoint for checking available premium users

* feat(usage_indictor.tsx): add new element to help track remaining premium users

* feat(usage_indicator.tsx): show premium user remaining usage

allows users with user caps to know how much is left

* fix(vertex_and_google_ai_studio_gemini.py): bubble up stream is not finished, even if stop reason is given

prevents early completion of stream

Closes https://github.com/BerriAI/litellm/issues/11549

* fix(streaming_handler.py): respect is_finished = False in hidden params

internal logic for preventing ending stream early

* fix(litellm_license.py): add function to check if user is over limit

* fix(internal_user_endpoints.py): add function to check if user is over limit

* refactor: move test

* docs(customer_endpoints.py): document new param
2025-06-09 22:04:45 -07:00
Laurien 0c50f8bcc9 Update enduser spend and budget reset date based on budget duration (#8460) 2025-06-08 08:39:14 -07:00
Krish Dholakia 8dd8615a54 Ensure consistent 'created' across all chunks + set tool call id for ollama streaming calls (#11528)
* fix(streaming_handler.py): maintain same 'created' across all chunks

Fixes https://github.com/BerriAI/litellm/issues/11437

* test: add unit test to ensure created is always the same across all chunks

* fix(types/utils.py): set a tool call id, if missing in delta tool call

Ensures stream chunk builder can reconstruct tool calls correctly

Fixes https://github.com/BerriAI/litellm/issues/11262

* fix(responses/transformation.py): support passing mcp server tool call to anthropic

allows switching between openai and anthropic for mcp tool calling

* fix(ollama/chat/transformation.py): set tool call IDs when missing
2025-06-07 20:50:07 -07:00
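Commit 8dd8615a54 makes the `created` timestamp consistent across all chunks of a stream. A simple way to get that property is to pin the value from the first chunk and reuse it for every later chunk. The generator below shows this using plain dicts. It is a hypothetical illustration, not the code in `streaming_handler.py`.

```python
import time
from typing import Dict, Iterable, Iterator, Optional


def normalize_created(chunks: Iterable[Dict]) -> Iterator[Dict]:
    created: Optional[int] = None
    for chunk in chunks:
        if created is None:
            # Pin the first chunk's timestamp (or the current time if it has none).
            first = chunk.get("created")
            created = first if first is not None else int(time.time())
        # Copy instead of mutating the caller's chunk.
        yield {**chunk, "created": created}
```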
Krish Dholakia c42740a4b9 Simplify experimental multi-instance rate limiter - more accurate (#11424)
* refactor: comment out circuit breaker

causes incorrect rate limiting in high traffic

* fix(base_routing_strategy.py): don't reset value if redis val is lower than current in-memory value

Fixes issue where redis might be trailing in-memory value

* fix(parallel_request_limiter_v2.py): if in-memory higher than redis, don't reset value; add previous slot keys to redis increment to correctly 'get' them

* fix(parallel_request_limiter_v3.py): v3 implementation of parallel request limiter

does not use background redis syncing - increments redis in call

simplify rate limiting logic to improve accuracy

* fix: fix ruff errors

* fix(parallel_request_limiter_v3.py): don't decrement limit on post call success - causes double decrements

* fix(parallel_request_limiter_v3.py): working accurate multi-instance logic

ensured only 100 requests were allowed with 100 users, a ramp-up of 10, a 100 RPM key limit, and 2 instances

* fix(parallel_request_limiter_v3.py): working accurate rate limiting with time window resets

allows rate limiting to work across multiple windows

* test: add unit tests for v3 rate limiter

* fix(parallel_request_limiter_v3.py): return window value into in-memory cache

allows in-memory cache checks to be used correctly

* refactor(parallel_request_limiter_v3.py): refactor rate limiting to work for multiple window/counter key pairs

enables using for user/team/model rate limiting

* feat(parallel_request_limiter_v3.py): working rate limiting, across key/user/team/end-user

* fix(parallel_request_limiter_v3.py): add model specific rate limiting

* fix(parallel_request_limiter_v3.py): ignore if no rate limits set

skip unnecessary rate limit checks if no limits are set

* fix(parallel_request_limiter_v3.py): initial commit bringing token rate limits back

* fix(parallel_request_limiter_v3.py): increment by value in list + update assertions to handle tokens + max parallel requests

* test(parallel_request_limiter_v3.py): more testing

* fix(parallel_request_limiter.py): working in-memory cache limiter

* fix(redis_cache.py): ignore linting error - use safe hasattr

* fix(parallel_request_limiter_v3.py): fix linting error

* refactor: remove redundant parallel_request_limiter_v2.py

old / inaccurate implementation

* test: update tests

* style: cleanup

* test: update test

* docs(config_settings.md): document new env var

* test(test_base_routing_strategy.py): update test
2025-06-07 11:10:55 -07:00
Ishaan Jaff bc835c6044 test_lm_studio_completion 2025-06-06 20:41:00 -07:00
Cole McIntosh e191e72746 Fix: Respect user_header_name property for budget selection and user identification (#11419)
* Refactor get_end_user_id_from_request_body to support user ID retrieval from custom headers and multiple request body formats. Enhance tests to cover various scenarios including header precedence and fallback mechanisms.

* Refactor get_end_user_id_from_request_body function to accept request_body as the first parameter, improving clarity and flexibility. Update tests for compatibility and add new cases to ensure correct functionality across various request body formats.

* Update _user_api_key_auth_builder and user_api_key_auth to pass request object to get_end_user_id_from_request_body, enhancing user ID retrieval from request data.

* refactor(auth_utils.py): update get_end_user_id_from_request_body to accept request_headers instead of request, and adjust related function calls in user_api_key_auth and tests

* refactor(tests): update mock request handling in LLM pass-through endpoint tests

- Replaced the Request object with a Mock for better flexibility in testing.
- Enhanced mock setup to include user API key handling and virtual key retrieval.
- Updated test calls to reflect changes in mock request structure and added necessary patches for new dependencies.

* refactor(vertex_and_google_ai_studio_gemini.py): remove redundant variable declaration for url_context_metadata, linting error
2025-06-06 14:21:02 -07:00
Cole McIntosh 1ceb9f9621 Merge pull request #11455 from colesmcintosh/429-fireworks-mapping
Fix Fireworks AI rate limit exception mapping - detect "rate limit" text in error messages
2025-06-06 15:06:15 -06:00
Krrish Dholakia 0c9f992af0 test: update to handle gemini-flash empty responses 2025-06-06 13:37:29 -07:00
Ishaan Jaff f0cb80ec50 [Feat] Return response_id == upstream response ID for VertexAI + Google AI studio (Stream+Non stream) (#11456)
* fix: vertexAI return responseID

* fix: vertexAI return responseID

* test_vertex_ai_response_id

* test: test_vertex_ai_streaming_response_id

* test_vertex_ai_streaming_response_id
2025-06-05 20:18:55 -07:00
Cole McIntosh 08239357cf Add ExceptionCheckers class for improved error string detection
Introduce the ExceptionCheckers class to encapsulate methods for checking error conditions in exception strings, specifically for identifying rate limit errors. Update the Fireworks AI exception mapping tests to cover various scenarios, including standard 429 errors and text-based detection, ensuring accurate mapping to RateLimitError. Enhance test coverage for both positive and negative cases of rate limit detection.
2025-06-05 17:15:53 -06:00
Cole McIntosh fda99ecb41 Enhance exception mapping for Fireworks AI: add better handling for 429 status codes and text-based rate limit detection. Update tests to verify correct mapping to RateLimitError for both 429 and related error messages. 2025-06-05 15:47:25 -06:00
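The Fireworks AI commits above map two cases to `RateLimitError`: an explicit HTTP 429, and an error whose message mentions "rate limit". The sketch below shows that check. `is_rate_limit_error`, `map_exception` and the local `RateLimitError` class are hypothetical stand-ins, not the `ExceptionCheckers` class or LiteLLM's exception types.

```python
from typing import Optional


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider-agnostic rate-limit exception."""


def is_rate_limit_error(status_code: Optional[int], message: str) -> bool:
    # Either an explicit 429, or the provider only signals it in the error text.
    return status_code == 429 or "rate limit" in message.lower()


def map_exception(status_code: Optional[int], message: str) -> Exception:
    if is_rate_limit_error(status_code, message):
        return RateLimitError(message)
    return Exception(message)
```

Matching on lowercased text also catches "Rate Limit" and similar variants, at the cost of occasionally misclassifying an unrelated message that happens to contain the phrase.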
Ishaan Jaff f0e0007eaf fix: gemini-2.0-flash-preview-image-generation test 2025-06-04 21:21:28 -07:00
Krish Dholakia 4611b821ec Support returning virtual key in custom auth + Handle provider-specific optional params for embedding calls (#11346)
* feat(custom_auth_auto.py): support returning a litellm virtual key from custom auth

allows admin to remap old keys to litellm virtual keys

* fix(utils.py): correctly handle optional params for openai sdk calls

Fixes https://github.com/BerriAI/litellm/issues/11126

* test: update test

* fix(utils.py): handle edge cases
2025-06-03 07:24:13 -07:00
Krish Dholakia ccc085faee Merge in - Gemini streaming - thinking content parsing - return in reasoning_content (#11298)
* fix(base_routing_strategy.py): compress increments to redis - reduces write ops

* fix(base_routing_strategy.py): make get and reset in memory keys atomic

* fix(base_routing_strategy.py): don't reset keys - causes discrepancy on subsequent requests to instance

* fix(parallel_request_limiter.py): retrieve values of previous slots from cache

more accurate rate limiting with sliding window

* fix: fix test

* fix: fix linting error

* fix(gemini/): fix streaming handler for function calling

Closes https://github.com/BerriAI/litellm/pull/11294

* fix: fix linting error

* test: update test

* fix(vertex_and_google_ai_studio_gemini.py): return none on skipped chunk

* fix(streaming_handler.py): skip none chunks on async streaming
2025-06-02 23:14:38 -07:00
Cole McIntosh ba89d4f00f refactor: update model handling in Azure and OpenAI audio transcription classes (#11333)
- Changed hardcoded model "whisper-1" to dynamic model extraction in AzureAudioTranscription and OpenAIAudioTranscription classes.
- Added tests to ensure correct model mapping for various transcription models, including GPT-4o and Azure whisper-1.
2025-06-02 16:25:51 -07:00
Ishaan Jaff 7d47417906 test: fixes 2025-05-31 12:42:56 -07:00
Krish Dholakia 5d4ae9aa4d Support dropping non-openai params when specified in additional_drop_params + Add VertexAI Anthropic support on /v1/messages (#11246)
* feat(utils.py): support dropping non-openai params when specified via additional drop params

Closes https://github.com/BerriAI/litellm/issues/11205

* fix(utils.py): fix linting error

* refactor(handler.py): add custom llm provider to anthropic messages provider config exception

* feat: initial commit adding vertex ai anthropic support on `/v1/messages`

* test: add working unit test

* test(vertex_ai_partner_models/anthropic): add /v1/messages support for anthropic api

Adds vertex ai auth

* feat(vertex_ai/anthropic): return correct url when calling via `/v1/messages`

* fix: more alignment to expected anthropic request format

* fix: fix ruff linting check

* Removed syntax error from docs (#11242)

* [Feat]: Add Bedrock InvokeAgents as a /chat/completions route on LiteLLM (#11239)

* feat: init structure for bedrock AGENTs

* feat: add basic routing for bedrock AGENTs

* feat: add basic transforms for bedrock AGENTs

* fix: url for bedrock agent runtime

* fix: working agents request

* feat: working agents non-streaming request

* feat: bedrock agents

* feat: add streaming for bedrock agents

* feat: add cost tracking for bedrock agents

* docs litellm with bedrock agents

* fix: linting errors

* test: invoke agents tests

* fix: import session handling

* Revert "fix: import session handling"

This reverts commit deb257dc10.

* fix: linting pin mypy

* [Feat]: Guardrails - Add streaming for bedrock post guard (#11247)

* feat: add streaming for bedrock post guard

* fix: bedrock guardrails

* fix: add clear comments

* Update litellm/proxy/guardrails/guardrail_hooks/bedrock_guardrails.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Update litellm/proxy/guardrails/guardrail_hooks/bedrock_guardrails.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* fix: clean up bedrock guardrails

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* [Fix] Responses API - Session management (#11254)

* fix: import session handling

* fix: imports for session handler

* tests: tests for session handler

* Update enterprise/litellm_enterprise/enterprise_callbacks/session_handler.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

---------

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* bump: bump litellm enterprise

* fixes: test_create_user_default_budget

* fix: fix linting error

* fix: fix linting error

---------

Co-authored-by: Fadil Rahman <87557055+fadil4u@users.noreply.github.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
2025-05-29 23:42:48 -07:00
Krish Dholakia ba39f9e360 Helicone base url support + fix for embedding cache hits on str input (#11211)
* fix(helicone.py): add helicone api base support

Fixes https://github.com/BerriAI/litellm/issues/10825

* test: add unit test for cache hit response on embedding calls

* fix(caching_handler.py): fix handling cache hit on embedding when input is string

Fixes LIT-197

* docs(helicone_integration.md): document new helicone api base param
2025-05-28 22:02:55 -07:00
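Commit ba39f9e360 fixes embedding cache hits when the input is a plain string. The commit does not describe the failure mode. One common Python pitfall with str-or-list inputs is that iterating over a `str` yields single characters. The sketch below normalizes the input before per-item cache lookups. The function name and the dict-based cache are hypothetical, not LiteLLM's `caching_handler.py`.

```python
from typing import Dict, List, Optional, Sequence, Union


def lookup_embedding_cache(
    inp: Union[str, Sequence[str]], cache: Dict[str, List[float]]
) -> List[Optional[List[float]]]:
    # A bare string is one input, not a sequence of characters.
    items = [inp] if isinstance(inp, str) else list(inp)
    # One cache result per input item (None on a miss).
    return [cache.get(item) for item in items]
```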
Krish Dholakia 7072466775 VertexAI - codeExecution tool support + anyOf handling (#11195)
* fix(vertex_and_google_ai_studio_gemini.py): handle both camel case and underscores in the tool for vertex ai code execution

support vertex ai code execution

* docs(vertex.md): add code execution example to vertex ai

* fix(vertex_ai/common_utils.py): when anyof in field, just select anyof - don't include other k,v pairs - vertex throws error

Fixes https://github.com/BerriAI/litellm/issues/11164

* fix(common_utils.py): add title field inside anyof - to retain some description

Addresses https://github.com/BerriAI/litellm/issues/11164#issuecomment-2914728385
2025-05-27 21:23:14 -07:00
Ishaan Jaff 6c36dc269b test: fix test_vertexai_model_garden_model_completion 2025-05-27 18:51:50 -07:00
Akim Tsvigun acaa80294c Integration with Nebius AI Studio added (#11143)
* integration with Nebius AI Studio added

* Merged with main

* Reviewer's comments resolved

* spelling error fixed

* accidental change reverted
2025-05-27 11:05:22 -07:00
Ishaan Jaff 4d2edc4e7a [Fixes] Aiohttp transport fixes - add handling for aiohttp.ClientPayloadError and ssl_verification settings (#11162)
* fix: AiohttpResponseStream transport

* fix: use AiohttpResponseStream transport by default

* fix: AiohttpResponseStream transport

* fixes: mapping aiohttp exceptions

* fixes: aiohttp rollout

* fixes: add support ssl_verify for aiohttp

* fixes: add support ssl_verify for aiohttp

* fixes: remove duplicates
2025-05-26 21:14:35 -07:00
Krish Dholakia 010a4d44af Fix passing standard optional params (#11124)
* fix(main.py): use processed non-default-params as standard input params for langfuse

Fixes https://github.com/BerriAI/litellm/issues/11072

Fixes https://github.com/BerriAI/litellm/issues/11096

* fix(main.py): rename variable to be more accurate

* test(test_langfuse_e2e_test.py): add router unit test for langfuse e2e testing

Prevent https://github.com/BerriAI/litellm/issues/11072 from happening again

* build: update lock

* fix(utils.py): refactor optional params function

make it easier to get the standardized non default params

* fix(utils.py): improve process non default params function

* fix(main.py): include provider specific params in processed non default params used in logging

ensures user can see any provider specific params on langfuse
2025-05-24 12:12:31 -07:00
Ishaan Jaff 86cdb8382b [Feat] Use aiohttp transport by default - 97% lower median latency (#11097)
* fix: add flag for disabling use_aiohttp_transport

* feat: add _create_async_transport

* feat: fixes for transport

* add httpx-aiohttp

* feat: fixes for transport

* refactor: fixes for transport

* build: fix deps

* fixes: test fixes

* fix: ensure aiohttp does not auto set content type

* test: test fixes

* feat: add LiteLLMAiohttpTransport

* fix: fixes for responses API handling

* test: fixes for responses API handling

* test: fixes for responses API handling

* feat: fixes for transport

* fix: base embedding handler

* test: test_async_http_handler_force_ipv4

* test: fix failing deepeval test

* fix: add YARL for bedrock urls

* fix: issues with transport

* fix: comment out linting issues

* test fix

* test: XAI is unstable

* test: fixes for using respx

* test: XAI fixes

* test: XAI fixes

* test: infinity testing fixes

* docs(config_settings.md): document param

* test: test_openai_image_edit_litellm_sdk

* test: remove deprecated test

* bump respx==0.22.0

* test: test_xai_message_name_filtering

* test: fix anthropic test after bumping httpx

* use n 4 for mapped tests (#11109)

* fix: use 1 session per event loop

* test: test_client_session_helper

* fix: linting error

* fix: resolving GET requests on httpx 0.28.1

* test fixes proxy unit tests

* fix: add ssl verify settings

* fix: proxy unit tests

* fix: refactor

* tests: basic unit tests for aiohttp transports

* tests: fixes xai

---------

Co-authored-by: Krrish Dholakia <krrishdholakia@gmail.com>
2025-05-23 22:55:35 -07:00
Tornike Gurgenidze db4183715a feat: add embeddings to CustomLLM (#10980)
* feat: add embeddings to CustomLLM

* feat: add aembedding to custom llm
2025-05-22 22:55:46 -07:00
Krrish Dholakia 469d395177 test: update groq test - change on their end 2025-05-22 15:02:01 -07:00
slytechnical 98e9db340c [Feature] Add supports_computer_use to the model list (#10881)
* Add support for supports_computer_use in model info

* Corrected list of supports_computer_use models

* Further fix computer use compatible claude models, fix existing test that predated supports_computer_use in the model list

* Move computer use test case into existing test_utils file

* Moved tests in to test_utils.py
2025-05-20 17:07:43 -07:00