litellm

mirror of https://github.com/tiennm99/litellm.git synced 2026-06-24 21:38:52 +00:00

Author	SHA1	Message	Date
Ishaan Jaff	19e26a5c60	test_default_api_base	2025-07-04 18:26:54 -07:00
Ishaan Jaff	59f3771799	test_text_completion_stream - hf	2025-07-03 16:00:51 -07:00
Ishaan Jaff	437f4765b4	test_completion_mistral_api_mistral_large_function_call_with_streaming	2025-07-03 14:58:28 -07:00
Krish Dholakia	c0319d0d01	Litellm dev fix gemini web search tracking (#12288 ) * feat(stream_chunk_builder_utils.py): correctly return web_search_requests on stream chunk builder * fix(types/utils.py): handle prompttokendetails * fix(stream_chunk_builder_utils.py): fix ruff check error * test: try-except rate limit error * fix: fix import	2025-07-03 12:27:14 -07:00
Ishaan Jaff	75bb22a868	fix huggingface/deepseek-ai/DeepSeek-R1	2025-07-03 12:13:51 -07:00
Ishaan Jaff	5630147e80	Revert "Revert "fix tests (#12286 )"" This reverts commit `12f157513b`.	2025-07-03 12:08:27 -07:00
Ishaan Jaff	12f157513b	Revert "fix tests (#12286 )" This reverts commit `99ce3a24cc`.	2025-07-03 12:04:23 -07:00
célina	99ce3a24cc	fix tests (#12286 )	2025-07-03 10:57:19 -07:00
Krrish Dholakia	a198d4a39f	test: change mistral model service tier exceeded	2025-07-02 21:11:02 -07:00
Ishaan Jaff	6b623f9c98	test whitelisted models	2025-06-28 14:46:16 -07:00
Ishaan Jaff	041db0268c	[Bug fix] Router - handle cooldown_time = 0 for deployments (#12108 ) * fix get cooldown time * fixes for _should_run_cooldown_logic * test_cooldown_time_zero_uses_zero_not_default * Update litellm/router_utils/cooldown_cache.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update litellm/router_utils/cooldown_handlers.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-06-27 17:50:35 -07:00
Krish Dholakia	7f8b2579a2	Minor Fixes (#11868 ) * fix(litellm_pre_call_utils.py): add user agent tags to spend logs in standard logging payload logic avoid clash when tag based routing is enabled * test: remove redundant test * test: rename oidc test to run earlier quicker debuging * fix(azure.py): return more detailed error message * fix(azure/common_utils.py): use default scope, if scope is none fixes oidc test * fix: always default to cognitiveservices.azure.com * test: update test	2025-06-18 14:12:59 -07:00
Krish Dholakia	0319adbf5d	feat(speech/): working gemini tts support via openai's `/v1/speech` endpoint (#11832 ) * feat(speech/): working gemini tts support via openai's `/v1/speech` endpoint Enables calling gemini models via `/v1/speech` * feat(speech_to_completion_bridge/): voice param support enables passing voice param to gemini models * fix: fix ruff checks * fix: fix checks	2025-06-18 10:36:25 -07:00
Ishaan Jaff	355e6118d8	def test_text_completion_stream():	2025-06-14 16:46:09 -07:00
Ishaan Jaff	5a051cb264	test_async_embedding_azure_caching - flaky test	2025-06-14 13:55:29 -07:00
Ishaan Jaff	e3094c2249	set flaky tests as flaky	2025-06-14 13:51:52 -07:00
Krrish Dholakia	31a73be03f	fix(litellm_logging.py): skip should_run_logging check on streaming	2025-06-13 21:19:24 -07:00
Ishaan Jaff	5b451bf483	test_openai_azure_embedding_simple	2025-06-13 19:00:25 -07:00
Ishaan Jaff	7947139913	[Feat] MCP expose streamable https endpoint for LiteLLM Proxy (#11645 ) * feat - add https mcp support * fixes for MCP http integration * fix code QA * bump mcp dep * test_mcp_server_manager_https_server * test mcp server https * fix linting error * bump mcp in poetry * fix import streamablehttp_client * fix streamablehttp_client * fix streamablehttp_client * add streamablehttp_client * add simple https server * working mounted app * working HTTPS mcp streamable * fix code QA check * feat: add MCP Server * fix - init just as fastapi app * add LITELLM_MCP_SERVER_DESCRIPTION * fix importing / init litellm app * Update litellm/proxy/_experimental/mcp_server/server.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update litellm/proxy/_experimental/mcp_server/server.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update server.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * fixes based on review + code check * fix linting * test_streamable_http_mcp_handler_mock * fix python 3.13 install * fix deps test --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-06-12 16:32:04 -07:00
Thiago Salvatore	fab24fae1a	fix: Do not add default model on tag based-routing when valid tag (#11454 ) * Do not add default when valid tagged model * Use default models when no tag matches * Add unit tests	2025-06-12 13:18:42 -07:00
Krrish Dholakia	e46ef9d642	test: update test with new kwargs	2025-06-11 22:19:17 -07:00
Ishaan Jaff	91010cda8f	[Bug Fix] Add audio/ogg mapping for Audio MIME types (#11635 ) * Add audio/ogg mapping * test_vertex_ai_gemini_audio_ogg * test_vertex_ai_gemini_audio_ogg	2025-06-11 14:19:53 -07:00
Krrish Dholakia	ec52600f98	test: handle fireworks ai instability	2025-06-11 10:09:28 -07:00
Krish Dholakia	c569056ea8	Show remaining users on UI (#11568 ) * docs(deploy.md): move docker recommendation to `main-stable` * feat(enterprise/internal_user_endpoints.py): expose endpoint for checking available premium users * feat(usage_indictor.tsx): add new element to help track remaining premium users * feat(usage_indicator.tsx): show premium user remaining usage allows users with user caps to know how much is left * fix(vertex_and_google_ai_studio_gemini.py): bubble up stream is not finished, even if stop reason is given prevents early completion of stream Closes https://github.com/BerriAI/litellm/issues/11549 * fix(streaming_handler.py): respect is_finished = False in hidden params internal logic for preventing ending stream early * fix(litellm_license.py): add function to check if user is over limit * fix(internal_user_endpoints.py): add function to check if user is over limit * refactor: move test * docs(customer_endpoints.py): document new param	2025-06-09 22:04:45 -07:00
Laurien	0c50f8bcc9	Update enduser spend and budget reset date based on budget duration (#8460 )	2025-06-08 08:39:14 -07:00
Krish Dholakia	8dd8615a54	Ensure consistent 'created' across all chunks + set tool call id for ollama streaming calls (#11528 ) * fix(streaming_handler.py): maintain same 'created' across all chunks Fixes https://github.com/BerriAI/litellm/issues/11437 * test: add unit test to ensure created is always the same across all chunks * fix(types/utils.py): set a tool call id, if missing in delta tool call Ensures stream chunk builder can reconstruct tool calls correctly Fixes https://github.com/BerriAI/litellm/issues/11262 * fix(responses/transformation.py): support passing mcp server tool call to anthropic allows switching between openai and anthropic for mcp tool calling * fix(ollama/chat/transformation.py): set tool call id's when missing	2025-06-07 20:50:07 -07:00
Krish Dholakia	c42740a4b9	Simplify experimental multi-instance rate limiter - more accurate (#11424 ) * refactor: comment out circuit breaker causes incorrect rate limiting in high traffic * fix(base_routing_strategy.py): don't reset value if redis val is lower than current in-memory value Fixes issue where redis might be trailing in-memory value * fix(parallel_request_limiter_v2.py): if in-memory higher than redis, don't reset value; add previous slot keys to redis increment to correctly 'get' them * fix(parallel_request_limiter_v3.py): v3 implementation of parallel request limiter does not use background redis syncing - increments redis in call simplify rate limiting logic, to improve accuracy * fix: fix ruff errors * fix(parallel_request_limiter_v3.py): don't decrement limit on post call success - causes double decrements * fix(parallel_request_limiter_v3.py): working accurate multi-instance logic ensured just 100 requests allowed on 100 users, 10 ramp up, 100 rpm limit key, 2 instances * fix(parallel_request_limiter_v3.py): working accurate rate limiting with time window resets allows rate limiting to work across multiple windows * test: add unit tests for v3 rate limiter * fix(parallel_request_limiter_v3.py): return window value into in-memory cache allows in-memory cache checks to be used correctly * refactor(parallel_request_limiter_v3.py): refactor rate limiting to work for multiple window/counter key pairs enables using for user/team/model rate limiting * feat(parallel_request_limiter_v3.py): working rate limiting, across key/user/team/end-user * fix(parallel_request_limiter_v3.py): add model specific rate limiting * fix(parallel_request_limiter_v3.py): ignore if no rate limits set skip unecessary rate limit checks - if no limits set * fix(parallel_request_limiter_v3.py): initial commit bringing token rate limits back * fix(parallel_request_limiter_v3.py): increment by value in list + update assertions to handle tokens + max parallel requests * test(parallel_request_limiter_v3.py): more testing * fix(parallel_request_limiter.py): working in-memory cache limiter * fix(redis_cache.py): ignore linting error - use safe hasattr * fix(parallel_request_limiter_v3.py): fix linting error * refactor: remove redundant parallel_Request_limiter_v2.py old / inaccurate implementation * test: update tests * style: cleanup * test: update test * docs(config_settings.md): document new env var * test(test_base_routing_strategy.py): update test	2025-06-07 11:10:55 -07:00
Ishaan Jaff	bc835c6044	test_lm_studio_completion	2025-06-06 20:41:00 -07:00
Cole McIntosh	e191e72746	Fix: Respect user_header_name property for budget selection and user identification (#11419 ) * Refactor get_end_user_id_from_request_body to support user ID retrieval from custom headers and multiple request body formats. Enhance tests to cover various scenarios including header precedence and fallback mechanisms. * Refactor get_end_user_id_from_request_body function to accept request_body as the first parameter, improving clarity and flexibility. Update tests for compatibility and add new cases to ensure correct functionality across various request body formats. * Update _user_api_key_auth_builder and user_api_key_auth to pass request object to get_end_user_id_from_request_body, enhancing user ID retrieval from request data. * refactor(auth_utils.py): update get_end_user_id_from_request_body to accept request_headers instead of request, and adjust related function calls in user_api_key_auth and tests * refactor(tests): update mock request handling in LLM pass-through endpoint tests - Replaced the Request object with a Mock for better flexibility in testing. - Enhanced mock setup to include user API key handling and virtual key retrieval. - Updated test calls to reflect changes in mock request structure and added necessary patches for new dependencies. * refactor(vertex_and_google_ai_studio_gemini.py): remove redundant variable declaration for url_context_metadata, linting error	2025-06-06 14:21:02 -07:00
Cole McIntosh	1ceb9f9621	Merge pull request #11455 from colesmcintosh/429-fireworks-mapping Fix Fireworks AI rate limit exception mapping - detect "rate limit" text in error messages	2025-06-06 15:06:15 -06:00
Krrish Dholakia	0c9f992af0	test: update to handle gemini-flash empty responses	2025-06-06 13:37:29 -07:00
Ishaan Jaff	f0cb80ec50	[Feat] Return response_id == upstream response ID for VertexAI + Google AI studio (Stream+Non stream) (#11456 ) * fix: vertexAI return responseID * fix: vertexAI return responseID * test_vertex_ai_response_id * test: test_vertex_ai_streaming_response_id * test_vertex_ai_streaming_response_id	2025-06-05 20:18:55 -07:00
Cole McIntosh	08239357cf	Add ExceptionCheckers class for improved error string detection Introduce the ExceptionCheckers class to encapsulate methods for checking error conditions in exception strings, specifically for identifying rate limit errors. Update the Fireworks AI exception mapping tests to cover various scenarios, including standard 429 errors and text-based detection, ensuring accurate mapping to RateLimitError. Enhance test coverage for both positive and negative cases of rate limit detection.	2025-06-05 17:15:53 -06:00
Cole McIntosh	fda99ecb41	Enhance exception mapping for Fireworks AI: add better handling for 429 status codes and text-based rate limit detection. Update tests to verify correct mapping to RateLimitError for both 429 and related error messages.	2025-06-05 15:47:25 -06:00
Ishaan Jaff	f0e0007eaf	fix: gemini-2.0-flash-preview-image-generation test	2025-06-04 21:21:28 -07:00
Krish Dholakia	4611b821ec	Support returning virtual key in custom auth + Handle provider-specific optional params for embedding calls (#11346 ) * feat(custom_auth_auto.py): support returning a litellm virtual key from custom auth allows admin to remap old keys to litellm virtual keys * fix(utils.py): correctly handle optional params for openai sdk calls Fixes https://github.com/BerriAI/litellm/issues/11126 * test: update test * fix(utils.py): handle edge cases	2025-06-03 07:24:13 -07:00
Krish Dholakia	ccc085faee	Merge in - Gemini streaming - thinking content parsing - return in `reasoning_content` (#11298 ) * fix(base_routing_strategy.py): compress increments to redis - reduces write ops * fix(base_routing_strategy.py): make get and reset in memory keys atomic * fix(base_routing_strategy.py): don't reset keys - causes discrepency on subsequent requests to instance * fix(parallel_request_limiter.py): retrieve values of previous slots from cache more accurate rate limiting with sliding window * fix: fix test * fix: fix linting error * fix(gemini/): fix streaming handler for function calling Closes https://github.com/BerriAI/litellm/pull/11294 * fix: fix linting error * test: update test * fix(vertex_and_google_ai_studio_gemini.py): return none on skipped chunk * fix(streaming_handler.py): skip none chunks on async streaming	2025-06-02 23:14:38 -07:00
Cole McIntosh	ba89d4f00f	refactor: update model handling in Azure and OpenAI audio transcription classes (#11333 ) - Changed hardcoded model "whisper-1" to dynamic model extraction in AzureAudioTranscription and OpenAIAudioTranscription classes. - Added tests to ensure correct model mapping for various transcription models, including GPT-4o and Azure whisper-1.	2025-06-02 16:25:51 -07:00
Ishaan Jaff	7d47417906	test: fixes	2025-05-31 12:42:56 -07:00
Krish Dholakia	5d4ae9aa4d	Support dropping non-openai params when specified in `additional_drop_params` + Add VertexAI Anthropic support on `/v1/messages` (#11246 ) * feat(utils.py): support dropping non-openai params when specified via additional drop params Closes https://github.com/BerriAI/litellm/issues/11205 * fix(utils.py): fix linting error * refactor(handler.py): add custom llm provider to anthropic messages provider config exception * feat: initial commit adding vertex ai anthropic support on `/v1/messages` * test: add working unit test * test(vertex_ai_partner_models/anthropic): add /v1/messages support for anthropic api Adds vertex ai auth * feat(vertex_ai/anthropic): return correct url when calling via `/v1/messages` * fix: more alignment to expected anthropic request format * fix: fix ruff linting check * Removed syntax error from docs (#11242) * [Feat]: Add Bedrock InvokeAgents as a /chat/completions route on LiteLLM (#11239) * feat: init structure for bedrock AGENTs * feat: add basic routing for bedrock AGENTs * feat: add basic transforms for bedrock AGENTs * fix: url for bedrock agent runtime * fix: working agents request * feat: working agents non-streaming request * feat: bedrock agents * feat: add streaming for bedrock agents * feat: add cost tracking for bedrock agents * docs litellm with bedrock agents * fix: linting errors * test: invoke agents tests * fix: import session handling * Revert "fix: import session handling" This reverts commit `deb257dc10`. * fix: linting pin mypy * [Feat]: Guardrails - Add streaming for bedrock post guard (#11247) * feat: add streaming for bedrock post guard * fix: bedrock guardrails * fix: add clear comments * Update litellm/proxy/guardrails/guardrail_hooks/bedrock_guardrails.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Update litellm/proxy/guardrails/guardrail_hooks/bedrock_guardrails.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * fix: clean up bedrock guardrails --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * [Fix] Responses API - Session management (#11254) * fix: import session handling * fix: imports for session handler * tests: tests for session handler * Update enterprise/litellm_enterprise/enterprise_callbacks/session_handler.py Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> --------- Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * bump: bump litellm enterprise * fixes: test_create_user_default_budget * fix: fix linting error * fix: fix linting error --------- Co-authored-by: Fadil Rahman <87557055+fadil4u@users.noreply.github.com> Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>	2025-05-29 23:42:48 -07:00
Krish Dholakia	ba39f9e360	Helicone base url support + fix for embedding cache hits on str input (#11211 ) * fix(helicone.py): add helicone api base support Fixes https://github.com/BerriAI/litellm/issues/10825 * test: add unit test for cache hit response on embedding calls * fix(caching_handler.py): fix handling cache hit on embedding when input is string Fixes LIT-197 * docs(helicone_integration.md): document new helicone api base param	2025-05-28 22:02:55 -07:00
Krish Dholakia	7072466775	VertexAI - `codeExecution` tool support + anyOf handling (#11195 ) * fix(vertex_and_google_ai_studio_gemini.py): handle both camel case and underscores in the tool for vertex ai code execution support vertex ai code execution * docs(vertex.md): add code execution example to vertex ai * fix(vertex_ai/common_utils.py): when anyof in field, just select anyof - don't include other k,v pairs - vertex throws error Fixes https://github.com/BerriAI/litellm/issues/11164 * fix(common_utils.py): add title field inside anyof - to retain some description Addresses https://github.com/BerriAI/litellm/issues/11164#issuecomment-2914728385	2025-05-27 21:23:14 -07:00
Ishaan Jaff	6c36dc269b	test: fix test_vertexai_model_garden_model_completion	2025-05-27 18:51:50 -07:00
Akim Tsvigun	acaa80294c	Integration with Nebius AI Studio added (#11143 ) * integration with Nebius AI Studio added * Merged with main * Reviewer's comments resolved * spelling error fixed * accidental change reverted	2025-05-27 11:05:22 -07:00
Ishaan Jaff	4d2edc4e7a	[Fixes] Aiohttp transport fixes - add handling for `aiohttp.ClientPayloadError` and ssl_verification settings (#11162 ) * fix: AiohttpResponseStream transport * fix: use AiohttpResponseStream transport by default * fix: AiohttpResponseStream transport * fixes: mapping aiohttp exceptions * fixes: aiohttp rollout * fixes: add support ssl_verify for aiohttp * fixes: add support ssl_verify for aiohttp * fixes: remove duplicates	2025-05-26 21:14:35 -07:00
Krish Dholakia	010a4d44af	Fix passing standard optional params (#11124 ) * fix(main.py): use processed non-default-params as standard input params for langfuse Fixes https://github.com/BerriAI/litellm/issues/11072 Fixes https://github.com/BerriAI/litellm/issues/11096 * fix(main.py): rename variable to be more accurate * test(test_langfuse_e2e_test.py): add router unit test for langfuse e2e testing Prevent https://github.com/BerriAI/litellm/issues/11072 from happening again * build: update lock * fix(utils.py): refactor optional params function make it easier to get the standardized non default params * fix(utils.py): improve process non default params function * fix(main.py): include provider specific params in processed non default params used in logging ensures user can see any provider specific params on langfuse	2025-05-24 12:12:31 -07:00
Ishaan Jaff	86cdb8382b	[Feat] Use aiohttp transport by default - 97% lower median latency (#11097 ) * fix: add flag for disabling use_aiohttp_transport * feat: add _create_async_transport * feat: fixes for transport * add httpx-aiohttp * feat: fixes for transport * refactor: fixes for transport * build: fix deps * fixes: test fixes * fix: ensure aiohttp does not auto set content type * test: test fixes * feat: add LiteLLMAiohttpTransport * fix: fixes for responses API handling * test: fixes for responses API handling * test: fixes for responses API handling * feat: fixes for transport * fix: base embedding handler * test: test_async_http_handler_force_ipv4 * test: fix failing deepeval test * fix: add YARL for bedrock urls * fix: issues with transport * fix: comment out linting issues * test fix * test: XAI is unstable * test: fixes for using respx * test: XAI fixes * test: XAI fixes * test: infinity testing fixes * docs(config_settings.md): document param * test: test_openai_image_edit_litellm_sdk * test: remove deprecated test * bump respx==0.22.0 * test: test_xai_message_name_filtering * test: fix anthropic test after bumping httpx * use n 4 for mapped tests (#11109) * fix: use 1 session per event loop * test: test_client_session_helper * fix: linting error * fix: resolving GET requests on httpx 0.28.1 * test fixes proxy unit tests * fix: add ssl verify settings * fix: proxy unit tests * fix: refactor * tests: basic unit tests for aiohttp transports * tests: fixes xai --------- Co-authored-by: Krrish Dholakia <krrishdholakia@gmail.com>	2025-05-23 22:55:35 -07:00
Tornike Gurgenidze	db4183715a	feat: add embeddings to CustomLLM (#10980 ) * feat: add embeddings to CustomLLM * feat: add aembedding to custom llm	2025-05-22 22:55:46 -07:00
Krrish Dholakia	469d395177	test: update groq test - change on their end	2025-05-22 15:02:01 -07:00
slytechnical	98e9db340c	[Feature] Add supports_computer_use to the model list (#10881 ) * Add support for supports_computer_use in model info * Corrected list of supports_computer_use models * Further fix computer use compatible claude models, fix existing test that predated supports_computer_use in the model list * Move computer use test case into existing test_utils file * Moved tests in to test_utils.py	2025-05-20 17:07:43 -07:00

646 Commits