* fix(litellm_pre_call_utils.py): add user agent tags to spend logs in standard logging payload logic
avoid clash when tag based routing is enabled
* test: remove redundant test
* test: rename oidc test to run earlier
quicker debugging
* fix(azure.py): return more detailed error message
* fix(azure/common_utils.py): use default scope, if scope is none
fixes oidc test
* fix: always default to cognitiveservices.azure.com
* test: update test
* docs(deploy.md): move docker recommendation to `main-stable`
* feat(enterprise/internal_user_endpoints.py): expose endpoint for checking available premium users
* feat(usage_indicator.tsx): add new element to help track remaining premium users
* feat(usage_indicator.tsx): show premium user remaining usage
allows users with user caps to know how much is left
* fix(vertex_and_google_ai_studio_gemini.py): bubble up that the stream is not finished, even if a stop reason is given
prevents early completion of stream
Closes https://github.com/BerriAI/litellm/issues/11549
* fix(streaming_handler.py): respect is_finished = False in hidden params
internal logic for preventing ending stream early
* fix(litellm_license.py): add function to check if user is over limit
* fix(internal_user_endpoints.py): add function to check if user is over limit
* refactor: move test
* docs(customer_endpoints.py): document new param
* fix(streaming_handler.py): maintain same 'created' across all chunks
Fixes https://github.com/BerriAI/litellm/issues/11437
* test: add unit test to ensure created is always the same across all chunks
* fix(types/utils.py): set a tool call id, if missing in delta tool call
Ensures stream chunk builder can reconstruct tool calls correctly
Fixes https://github.com/BerriAI/litellm/issues/11262
* fix(responses/transformation.py): support passing mcp server tool call to anthropic
allows switching between openai and anthropic for mcp tool calling
* fix(ollama/chat/transformation.py): set tool call ids when missing
* refactor: comment out circuit breaker
causes incorrect rate limiting in high traffic
* fix(base_routing_strategy.py): don't reset value if redis val is lower than current in-memory value
Fixes issue where redis might be trailing in-memory value
* fix(parallel_request_limiter_v2.py): if in-memory higher than redis, don't reset value; add previous slot keys to redis increment to correctly 'get' them
* fix(parallel_request_limiter_v3.py): v3 implementation of parallel request limiter
does not use background redis syncing - increments redis in call
simplify rate limiting logic, to improve accuracy
* fix: fix ruff errors
* fix(parallel_request_limiter_v3.py): don't decrement limit on post call success - causes double decrements
* fix(parallel_request_limiter_v3.py): working accurate multi-instance logic
ensured just 100 requests allowed on 100 users, 10 ramp up, 100 rpm limit key, 2 instances
* fix(parallel_request_limiter_v3.py): working accurate rate limiting with time window resets
allows rate limiting to work across multiple windows
* test: add unit tests for v3 rate limiter
* fix(parallel_request_limiter_v3.py): return window value into in-memory cache
allows in-memory cache checks to be used correctly
* refactor(parallel_request_limiter_v3.py): refactor rate limiting to work for multiple window/counter key pairs
enables using for user/team/model rate limiting
* feat(parallel_request_limiter_v3.py): working rate limiting, across key/user/team/end-user
* fix(parallel_request_limiter_v3.py): add model specific rate limiting
* fix(parallel_request_limiter_v3.py): ignore if no rate limits set
skip unnecessary rate limit checks if no limits are set
* fix(parallel_request_limiter_v3.py): initial commit bringing token rate limits back
* fix(parallel_request_limiter_v3.py): increment by value in list + update assertions to handle tokens + max parallel requests
* test(parallel_request_limiter_v3.py): more testing
* fix(parallel_request_limiter.py): working in-memory cache limiter
* fix(redis_cache.py): ignore linting error - use safe hasattr
* fix(parallel_request_limiter_v3.py): fix linting error
* refactor: remove redundant parallel_request_limiter_v2.py
old / inaccurate implementation
* test: update tests
* style: cleanup
* test: update test
* docs(config_settings.md): document new env var
* test(test_base_routing_strategy.py): update test
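The base_routing_strategy.py and parallel_request_limiter_v2.py entries above describe keeping the in-memory counter when the Redis value trails it. A minimal sketch of that rule, assuming a plain dict as the in-memory cache; the name `merge_counter` is hypothetical and this is not the actual LiteLLM code:

```python
from typing import Dict, Optional


def merge_counter(
    in_memory: Dict[str, int], key: str, redis_value: Optional[int]
) -> int:
    """Sync one counter from Redis without ever lowering the local value."""
    current = in_memory.get(key, 0)
    if redis_value is not None and redis_value > current:
        # Redis has seen increments from other instances; adopt its value.
        current = redis_value
    # Otherwise Redis is trailing (or has no value), so the local value stands.
    in_memory[key] = current
    return current


cache = {"key:rpm": 7}
merge_counter(cache, "key:rpm", 5)   # Redis trails: counter stays at 7
merge_counter(cache, "key:rpm", 12)  # Redis is ahead: counter becomes 12
```

Taking the larger of the two values means a late Redis sync cannot erase requests that were already counted locally.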
* Refactor get_end_user_id_from_request_body to support user ID retrieval from custom headers and multiple request body formats. Enhance tests to cover various scenarios including header precedence and fallback mechanisms.
* Refactor get_end_user_id_from_request_body function to accept request_body as the first parameter, improving clarity and flexibility. Update tests for compatibility and add new cases to ensure correct functionality across various request body formats.
* Update _user_api_key_auth_builder and user_api_key_auth to pass request object to get_end_user_id_from_request_body, enhancing user ID retrieval from request data.
* refactor(auth_utils.py): update get_end_user_id_from_request_body to accept request_headers instead of request, and adjust related function calls in user_api_key_auth and tests
* refactor(tests): update mock request handling in LLM pass-through endpoint tests
- Replaced the Request object with a Mock for better flexibility in testing.
- Enhanced mock setup to include user API key handling and virtual key retrieval.
- Updated test calls to reflect changes in mock request structure and added necessary patches for new dependencies.
* refactor(vertex_and_google_ai_studio_gemini.py): remove redundant variable declaration for url_context_metadata, linting error
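The get_end_user_id_from_request_body entries above describe header precedence with a fallback to the request body. A rough sketch of that lookup order follows. The function name `find_end_user_id`, the `header_name` parameter, and the body key locations (`user`, `metadata.user_id`) are illustrative assumptions, not the real signature or the full set of supported formats:

```python
from typing import Any, Dict, Optional


def find_end_user_id(
    request_body: Dict[str, Any],
    request_headers: Optional[Dict[str, str]] = None,
    header_name: Optional[str] = None,  # hypothetical: configured custom header
) -> Optional[str]:
    # 1. A configured custom header takes precedence over the body.
    if header_name and request_headers:
        for name, value in request_headers.items():
            # HTTP header names are case-insensitive.
            if name.lower() == header_name.lower() and value:
                return str(value)
    # 2. Fall back to the body; these key locations are assumptions.
    user = request_body.get("user")
    if user is not None:
        return str(user)
    metadata = request_body.get("metadata")
    if isinstance(metadata, dict) and metadata.get("user_id") is not None:
        return str(metadata["user_id"])
    # 3. Nothing found in either place.
    return None
```

Passing headers as a plain mapping, rather than the whole request object, keeps the helper easy to call from tests without building a mock request.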
Introduce the ExceptionCheckers class to encapsulate methods for checking error conditions in exception strings, specifically for identifying rate limit errors. Update the Fireworks AI exception mapping tests to cover various scenarios, including standard 429 errors and text-based detection, ensuring accurate mapping to RateLimitError. Enhance test coverage for both positive and negative cases of rate limit detection.
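As an illustration of text-based rate limit detection, a checker of this kind might look like the sketch below. The method name and the marker phrases are assumptions chosen for the example; the real class may use a different list and a different interface:

```python
class ExceptionCheckers:
    """Illustrative sketch only; marker phrases are assumptions."""

    _RATE_LIMIT_MARKERS = ("rate limit", "too many requests", "429")

    @staticmethod
    def is_error_str_rate_limit(error_str: str) -> bool:
        # Non-string input (e.g. None) is never treated as a rate limit error.
        if not isinstance(error_str, str):
            return False
        lowered = error_str.lower()
        # Case-insensitive substring match against the known markers.
        return any(
            marker in lowered for marker in ExceptionCheckers._RATE_LIMIT_MARKERS
        )
```

A bare substring match on "429" can also fire on unrelated numbers in a message, which is why negative test cases matter as much as positive ones here.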
- Changed hardcoded model "whisper-1" to dynamic model extraction in AzureAudioTranscription and OpenAIAudioTranscription classes.
- Added tests to ensure correct model mapping for various transcription models, including GPT-4o and Azure whisper-1.
* fix(helicone.py): add helicone api base support
Fixes https://github.com/BerriAI/litellm/issues/10825
* test: add unit test for cache hit response on embedding calls
* fix(caching_handler.py): fix handling cache hit on embedding when input is string
Fixes LIT-197
* docs(helicone_integration.md): document new helicone api base param
* fix(vertex_and_google_ai_studio_gemini.py): handle both camel case and underscores in the tool for vertex ai code execution
support vertex ai code execution
* docs(vertex.md): add code execution example to vertex ai
* fix(vertex_ai/common_utils.py): when anyof in field, just select anyof - don't include other k,v pairs - vertex throws error
Fixes https://github.com/BerriAI/litellm/issues/11164
* fix(common_utils.py): add title field inside anyof - to retain some description
Addresses https://github.com/BerriAI/litellm/issues/11164#issuecomment-2914728385
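The two anyof entries above can be pictured with the sketch below: when a field carries an anyOf list, only that list is kept, and the field's title is copied into each option so some description survives. The function name `keep_only_any_of`, the `anyOf` key spelling, and the exact placement of the title are assumptions for illustration, not the actual common_utils.py logic:

```python
from typing import Any, Dict, List


def keep_only_any_of(field_schema: Dict[str, Any]) -> Dict[str, Any]:
    any_of = field_schema.get("anyOf")
    if any_of is None:
        # No anyOf present: leave the field untouched.
        return field_schema
    title = field_schema.get("title")
    options: List[Dict[str, Any]] = []
    for option in any_of:
        option = dict(option)  # shallow copy so the input is not mutated
        if title is not None and "title" not in option:
            option["title"] = title
        options.append(option)
    # Sibling keys such as "default" or "description" are dropped on purpose.
    return {"anyOf": options}
```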
* fix: AiohttpResponseStream transport
* fix: use AiohttpResponseStream transport by default
* fix: AiohttpResponseStream transport
* fixes: mapping aiohttp exceptions
* fixes: aiohttp rollout
* fixes: add support ssl_verify for aiohttp
* fixes: add support ssl_verify for aiohttp
* fixes: remove duplicates
* fix(main.py): use processed non-default-params as standard input params for langfuse
Fixes https://github.com/BerriAI/litellm/issues/11072
Fixes https://github.com/BerriAI/litellm/issues/11096
* fix(main.py): rename variable to be more accurate
* test(test_langfuse_e2e_test.py): add router unit test for langfuse e2e testing
Prevent https://github.com/BerriAI/litellm/issues/11072 from happening again
* build: update lock
* fix(utils.py): refactor optional params function
make it easier to get the standardized non default params
* fix(utils.py): improve process non default params function
* fix(main.py): include provider specific params in processed non default params used in logging
ensures user can see any provider specific params on langfuse
* fix: add flag for disabling use_aiohttp_transport
* feat: add _create_async_transport
* feat: fixes for transport
* add httpx-aiohttp
* feat: fixes for transport
* refactor: fixes for transport
* build: fix deps
* fixes: test fixes
* fix: ensure aiohttp does not auto set content type
* test: test fixes
* feat: add LiteLLMAiohttpTransport
* fix: fixes for responses API handling
* test: fixes for responses API handling
* test: fixes for responses API handling
* feat: fixes for transport
* fix: base embedding handler
* test: test_async_http_handler_force_ipv4
* test: fix failing deepeval test
* fix: add YARL for bedrock urls
* fix: issues with transport
* fix: comment out linting issues
* test fix
* test: XAI is unstable
* test: fixes for using respx
* test: XAI fixes
* test: XAI fixes
* test: infinity testing fixes
* docs(config_settings.md): document param
* test: test_openai_image_edit_litellm_sdk
* test: remove deprecated test
* bump respx==0.22.0
* test: test_xai_message_name_filtering
* test: fix anthropic test after bumping httpx
* use n 4 for mapped tests (#11109)
* fix: use 1 session per event loop
* test: test_client_session_helper
* fix: linting error
* fix: resolving GET requests on httpx 0.28.1
* test fixes proxy unit tests
* fix: add ssl verify settings
* fix: proxy unit tests
* fix: refactor
* tests: basic unit tests for aiohttp transports
* tests: fixes xai
---------
Co-authored-by: Krrish Dholakia <krrishdholakia@gmail.com>
* Add support for supports_computer_use in model info
* Corrected list of supports_computer_use models
* Further fix computer use compatible claude models, fix existing test that predated supports_computer_use in the model list
* Move computer use test case into existing test_utils file
* Moved tests into test_utils.py