litellm

mirror of https://github.com/tiennm99/litellm.git synced 2026-06-29 07:13:23 +00:00

Author	SHA1	Message	Date
Ishaan Jaff	8e9352fce7	test fix	2025-09-03 11:06:09 -07:00
Ishaan Jaff	c821f1ddf1	[Feature]: Support GPT-OSS models on vertex ai (#14184 ) * add VertexAIGPTOSSTransformation * fix: optional_params * fix: is_vertex_partner_model * test_partner_models_httpx * docs GPT oss docs * test_vertex_ai_gpt_oss_reasoning_effort * add vertex ai models	2025-09-02 14:15:26 -07:00
Ishaan Jaff	d37be48a80	test: llama-3.3-70b-versatile	2025-09-01 20:14:12 -07:00
Ishaan Jaff	8e72f991cc	test_model_alias_map	2025-09-01 17:59:40 -07:00
Ishaan Jaff	7656cb3d6e	test fix	2025-09-01 17:04:47 -07:00
Ishaan Jaff	48d3aad68f	test_caching_with_models_v2	2025-08-30 13:21:14 -07:00
Ishaan Jaff	fd39f22e3e	test_completion_openrouter_reasoning_content	2025-08-30 09:27:37 -07:00
Ishaan Jaff	6ce1d82970	[Bug] Fix: Vertex Mistral not working for streaming (#13952 ) * fix OpenAI like chat handler * fix MockResponse * test_partner_models_httpx_streaming * test_partner_models_httpx_streaming	2025-08-25 17:39:40 -07:00
Ishaan Jaff	0fccd619ea	test_vertex_ai_deepseek	2025-08-23 14:13:03 -07:00
Ishaan Jaff	b9132968b2	[Perf] Improvements for Async Success Handler (Logging Callbacks) - Approx +130 RPS (#13905 ) * [Performance] Reduce Significant CPU overhead from litellm_logging.py (#13895) * fix: litellm.configured_cold_storage_logger * fix Session Management - Non-OpenAI Models docs * ruff fix * test fix * create LoggingWorker * add GLOBAL_LOGGING_WORKER for async task handling * fix logging tests * add conftest * fix conftest * test fix location of encode bedrock runtime modelid arn * fix conftest.py * tuning LoggingWorker * conftest.py * fix conftest batches/ * test_async_chat_azure * event_loop * test_bedrock_streaming_passthrough_test2 * fix GLOBAL_LOGGING_WORKER * logging worker * add flush for global logging worker * Revert "fix GLOBAL_LOGGING_WORKER" This reverts commit d254f508f48935652f054777652938ad71976cce. * fix conftest clear_queue * fix conftest clear_queue * setup_and_teardown for llm translation * docs AWS_REGION * test_async_chat_azure * change test DIR * run ci/cd again * use 1 job for litellm_router_unit_testing * fix space * fix litellm_router_unit_testing * test_aaarouter_dynamic_cooldown_message_retry_time * litellm_router_unit_testing * conftest.py clearing qu * fixes litellm_router_unit_testing * fixes clear_queue * fix router_unit_tests * remove conftest * add back conftest for router * fix event loop test * test fix * fixes for LoggingWorker * ruff fix	2025-08-23 13:13:23 -07:00
Ishaan Jaff	e93e266f84	[Performance] Use O(1) Set lookups for model routing (#13879 ) * o(1) lookups * Revert "o(1) lookups" This reverts commit 620d14246980813366b4b1f1c0ce396b528dd9df. * o(1) lookups * Revert "o(1) lookups" This reverts commit 676a9f5bcc3c2b9fa31e0a9fdf00389739b3052f. * o(1) lookups * register_model fix * test_aget_valid_models * lambda ai models fix * test_utils.py * test fix vertex ai	2025-08-21 22:56:46 -07:00
Ishaan Jaff	07f6235730	[Performance] Improve LiteLLM Python SDK RPS by +200 RPS (#13839 ) * fix _proxy_from_env +100 RPS * fix: global_braintrust_http_handler * test_braintrust_logging	2025-08-20 21:46:33 -07:00
Ishaan Jaff	55dcaded72	[Feat] Add VertexAI qwen API Service (#13828 ) * add support for vertex AI QWEN API * streaming QWEN API support * test_partner_models_httpx * test_partner_models_httpx_streaming * add cost tracking for vertex_ai/qwen/qwen3-235b-a22b-instruct-2507-maa * docs qwen models vertexAI	2025-08-20 15:00:33 -07:00
Ishaan Jaff	12bae8fdda	test_partner_models_httpx_streaming	2025-08-20 09:44:47 -07:00
drorbaron	6b78ade918	migrate to use new aim FW API	2025-08-19 12:20:19 +03:00
Krrish Dholakia	6322aef0e3	fix(streaming_handler.py): fix streaming chunk calculation	2025-08-16 14:25:29 -07:00
Krrish Dholakia	eb66daeef7	test: update test we now return correct token usage on clientside	2025-08-16 14:14:35 -07:00
Ishaan Jaff	f522f40228	test_passing_tool_result_as_list	2025-08-16 08:08:25 -07:00
Jugal D. Bhatt	aea0605eed	[LLM Translation] Fix Realtime API endpoint for no intent (#13476 ) * fix intent params * Add responses * fix unrelated test * test fix - fireworks API endpoint is down * test fix fireworks ai is having an active outage * test_completion_cost_databricks * dbrx fix test API currently not responding * Update OpenAI Realtime handler to use the correct endpoint and include all query parameters. Adjusted error messages for missing API base and key. Updated health check URL construction to pass model as a query parameter. * Enhance OpenAI Realtime handler tests to ensure model parameter inclusion in WebSocket URL. Added new tests to verify correct URL construction with model and additional parameters, preventing 'missing_model' errors. Updated existing tests for consistency. * Remove debug print statements for API base and key in OpenAIRealtime handler to clean up the code. --------- Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>	2025-08-14 16:24:14 -07:00
Krrish Dholakia	aaf9c38a10	test: skip test - ran out of credits	2025-08-14 15:01:26 -07:00
Krrish Dholakia	0288ed35da	test: update tests	2025-08-13 23:33:32 -07:00
Krrish Dholakia	b53962dee2	test: update test	2025-08-13 23:09:18 -07:00
Krrish Dholakia	5ae44e3275	fix(router.py): fix cooldown increment logic	2025-08-12 23:33:56 -07:00
Krish Dholakia	6afaf5721a	[Fix] Streaming - consistent 'finish_reason' chunk index (#13560 ) * feat(model_response_utils.py): new function to check if modelresponsestream is empty used for checking https://github.com/BerriAI/litellm/issues/13348 * fix(streaming_handler.py): skip chunk if empty Fixes https://github.com/BerriAI/litellm/issues/13348 * fix(streaming_handler.py): add is_empty logic to async flow	2025-08-12 23:21:57 -07:00
Krish Dholakia	1c8761111f	Router - reduce p99 latency w/ redis enabled by 50% + OTEL - track pre_call hook latency (#13362 ) * feat(proxy/utils.py): track pre-call hooks in OTEL some pre call hooks can cause latency in high traffic - make sure this is tracked * fix(router.py): move redis call on deployment_callback_on_success to pipeline operation reduces p99 latency by half when redis is enabled * fix(parallel_request_limiter_v3.py): only run check if any item has rate limits set Prevents unnecessary latency added by rate limit checks * test: add unit tests * Latency Improvements: only track tpm/rpm usage when set on deployment+ LLM Caching - use an in-memory cache to reduce redis calls + OTEL - track time spent on LLM caching (#13472) * fix(router.py): only track usage for deployments with tpm/rpm set ensures additional latency avoided for non-tpm/rpm models * fix(caching_handler.py): log time spent on request get cache to OTEL enables easy debugging of call latency * fix(caching_handler.py): use dual cache object for in-memory caching + trace redis call within caching handler * fix(caching_handler.py): working in-memory cache for redis calls ensures dual cache works when redis cache setup for llm calls makes calls quicker by only checking redis when in-memory cache missed for llm api call * test: remove redundant test * test: add unit tests	2025-08-09 16:09:51 -07:00
Ishaan Jaff	e8c081b8ff	test_stream_chunk_builder_litellm_usage_chunks	2025-08-07 15:22:52 -07:00
Ishaan Jaff	621b3dca7b	[Bug Fix] Mistral Tool Calling - Grammar error: at 3(11): failed to compile JSON schema (#13389 ) * test_claude_tool_use_with_gemini * add _remove_json_schema_refs * add _clean_tool_schema_for_mistral * fixes mistral tool calls * _remove_json_schema_refs * fix - vertex, remove hardcoded test	2025-08-07 13:50:22 -07:00
Ishaan Jaff	984f91f4f5	test_completion_gemini_stream	2025-08-07 13:24:00 -07:00
Ishaan Jaff	dfada882f1	vtx test fix gemini-2.5-flash-lite	2025-08-07 00:11:10 -07:00
Ishaan Jaff	eeed03a78f	test fix: gcp deprecated gemini-1.5-flash	2025-08-06 08:43:45 -07:00
zjx20	92c525ddfe	feat(JinaAI): support multimodal embedding models (#13181 ) * feat(JinaAI): support multimodal embedding models * add test case * add test * fix test	2025-08-05 19:21:56 -07:00
Krish Dholakia	324cfe8bdc	fix(streaming_handler.py): include cost in streaming usage object (#13319 ) Fixes https://github.com/BerriAI/litellm/issues/12689	2025-08-05 18:38:31 -07:00
Jugal D. Bhatt	b6fcda2f8a	[LLM Translation] Fix model group on clientside auth with API calls (#13314 ) * fix unsupported operand type(s) for +=: 'NoneType' and 'str' on clientside auth creds for responses * fix the client side auth to use correct metadata * add more tests * fix tests	2025-08-05 17:46:47 -07:00
Pascal Bro	a17d483c89	Add GCS bucket caching support (#13122 )	2025-08-04 16:09:33 -07:00
Jugal D. Bhatt	36229dc69f	[LLM Translation] Fix Model Usage not having text tokens (#13234 ) * fix + test * remove test comments * fix mypy * fix mypy * fix tests	2025-08-04 21:06:49 +05:30
Ishaan Jaff	44900e781a	testing fixes - vertex ai deprecated claude 3 sonnet models	2025-08-01 21:23:52 -07:00
Ishaan Jaff	9d6098e8cc	fix vertex deprecated old model	2025-08-01 16:46:16 -07:00
Krish Dholakia	78997c2e35	Anthropic - working mid-stream fallbacks (#13149 ) * fix(router.py): add acompletion_streaming_iterator inside router allows router to catch errors mid-stream for fallbacks Work for https://github.com/BerriAI/litellm/issues/6532 * fix(router.py): working mid-stream fallbacks * fix(router.py): more iterations * fix(router.py): working mid-stream fallbacks with fallbacks set on router * fix(router.py): pass prior content back in new request as assistant prefix message * fix(router.py): add a system prompt to help guide non-prefix supporting models to use the continued text correctly * fix(common_utils.py): support converting `prefix: true` for non-prefix supporting models * fix: reduce LOC in function * test(test_router.py): add unit tests for new function * test: add basic unit test * fix(router.py): ensure return type of fallback stream is compatible with CustomStreamWrapper prevent client code from breaking * fix: cleanup * test: update test * fix: fix linting error	2025-07-31 21:22:49 -07:00
Jugal D. Bhatt	5db4862cbf	[MCP Gateway] Litellm mcp client list fail (#13114 ) * fix headers * fix test * fix ruff * added try except for catching errors which lead to client failures * fix mypy * fix ruff * fix tests * fix python error * fix test * fix test * fixed the MCP Call Tool result	2025-07-30 15:23:19 -07:00
Krrish Dholakia	ae947e63ce	test: update test	2025-07-29 22:07:07 -07:00
Krrish Dholakia	378db1b62d	test: remove o1-preview	2025-07-28 17:47:57 -07:00
Krish Dholakia	1737cf4257	VertexAI - camelcase optional params for image generation + Anthropic - streaming, always ensure assistant role set on only first chunk (#12889 ) * fix(vertex_ai/image_generation): transform `_` param to camelcase Fixes https://github.com/BerriAI/litellm/issues/12690 * test(test_vertex_image_generation.py): add unit tests * fix(streaming_handler.py): assert only 1 assistant chunk in stream Fixes https://github.com/BerriAI/litellm/issues/12616 * fix(streaming_handler.py): fix check	2025-07-27 10:09:43 -07:00
Ishaan Jaff	2c38dc0de7	test_router_auto_router	2025-07-26 13:33:53 -07:00
sings-to-bees-on-wednesdays	eb96fb78bc	fix(auth_utils): make header comparison case-insensitive (#12950 ) If the user specified in the configuration e.g. "user_header_name: X-OpenWebUI-User-Email", here we were looking for a dict key "X-OpenWebUI-User-Email" when the dict actually contained "x-openwebui-user-email". Switch to iteration and case insensitive string comparison instead to fix this. This fixes customer budget enforcement when the customer ID is passed in as a header rather than as a "user" value in the body.	2025-07-24 22:06:12 -07:00
Ishaan Jaff	b8e404dd95	[Feat] Backend Router - Add Auto-Router powered by `semantic-router` (#12955 ) * add router.json * test_router_auto_router * async_pre_routing_hook * fixes for auto router * add async_pre_routing_hook * add LiteLLMRouterEncoder * update test auto_router_embedding_model * add auto_router_embedding_model * add AutoRouter * fix async_pre_routing_hook * update async_pre_routing_hook * fix auto router * fix router.json * working router init * working embedding encoder * working auto router * test_router_auto_router * test auto router * add semantic-router as optional for litellm * add extras * semantic_router==0.1.10 * ruff fix * use aiohttp==3.10.11 * python-dotenv==1.0.1 * test auto router * test_router_auto_router * semantic_router * test_is_auto_router_deployment * fix check * fix docker build step * add semantic_router * Revert "add semantic_router" This reverts commit 537b67288798731a119d811f643b682086377ee9.	2025-07-24 18:32:56 -07:00
Ishaan Jaff	99031bf8b6	ci/cd new release	2025-07-23 13:50:36 -07:00
Ishaan Jaff	461cd0c30a	test_completion_cost_deepseek	2025-07-23 13:16:12 -07:00
Ishaan Jaff	79a0841719	test_router_content_policy_fallbacks	2025-07-23 13:04:28 -07:00
Ishaan Jaff	642cfa26b0	remove deprecated	2025-07-22 20:59:34 -07:00
Ishaan Jaff	bf300f8ca7	Revert "Litellm dev 07 21 2025 p1 (#12848 )" This reverts commit `e4e10aa4ed`.	2025-07-22 18:28:36 -07:00

717 Commits