Commit Graph

717 Commits

Author SHA1 Message Date
Ishaan Jaff 8e9352fce7 test fix 2025-09-03 11:06:09 -07:00
Ishaan Jaff c821f1ddf1 [Feature]: Support GPT-OSS models on vertex ai (#14184)
* add VertexAIGPTOSSTransformation

* fix: optional_params

* fix: is_vertex_partner_model

* test_partner_models_httpx

* docs GPT oss docs

* test_vertex_ai_gpt_oss_reasoning_effort

* add vertex ai models
2025-09-02 14:15:26 -07:00
Ishaan Jaff d37be48a80 test: llama-3.3-70b-versatile 2025-09-01 20:14:12 -07:00
Ishaan Jaff 8e72f991cc test_model_alias_map 2025-09-01 17:59:40 -07:00
Ishaan Jaff 7656cb3d6e test fix 2025-09-01 17:04:47 -07:00
Ishaan Jaff 48d3aad68f test_caching_with_models_v2 2025-08-30 13:21:14 -07:00
Ishaan Jaff fd39f22e3e test_completion_openrouter_reasoning_content 2025-08-30 09:27:37 -07:00
Ishaan Jaff 6ce1d82970 [Bug] Fix: Vertex Mistral not working for streaming (#13952)
* fix OpenAI like chat handler

* fix MockResponse

* test_partner_models_httpx_streaming

* test_partner_models_httpx_streaming
2025-08-25 17:39:40 -07:00
Ishaan Jaff 0fccd619ea test_vertex_ai_deepseek 2025-08-23 14:13:03 -07:00
Ishaan Jaff b9132968b2 [Perf] Improvements for Async Success Handler (Logging Callbacks) - Approx +130 RPS (#13905)
* [Performance] Reduce Significant CPU overhead from litellm_logging.py (#13895)

* fix: litellm.configured_cold_storage_logger

* fix Session Management - Non-OpenAI Models docs

* ruff fix

* test fix

* create LoggingWorker

* add GLOBAL_LOGGING_WORKER for async task handling

* fix logging tests

* add conftest

* fix conftest

* test fix location of encode bedrock runtime modelid arn

* fix conftest.py

* tuning LoggingWorker

* conftest.py

* fix conftest batches/

* test_async_chat_azure

* event_loop

* test_bedrock_streaming_passthrough_test2

* fix GLOBAL_LOGGING_WORKER

* logging worker

* add flush for global logging worker

* Revert "fix GLOBAL_LOGGING_WORKER"

This reverts commit d254f508f48935652f054777652938ad71976cce.

* fix conftest clear_queue

* fix conftest clear_queue

* setup_and_teardown for llm translation

* docs AWS_REGION

* test_async_chat_azure

* change test DIR

* run ci/cd again

* use 1 job for litellm_router_unit_testing

* fix space

* fix litellm_router_unit_testing

* test_aaarouter_dynamic_cooldown_message_retry_time

* litellm_router_unit_testing

* conftest.py clearing qu

* fixes litellm_router_unit_testing

* fixes clear_queue

* fix router_unit_tests

* remove conftest

* add back conftest for router

* fix event loop test

* test fix

* fixes for LoggingWorker

* ruff fix
2025-08-23 13:13:23 -07:00
Ishaan Jaff e93e266f84 [Performance] Use O(1) Set lookups for model routing (#13879)
* o(1) lookups

* Revert "o(1) lookups"

This reverts commit 620d14246980813366b4b1f1c0ce396b528dd9df.

* o(1) lookups

* Revert "o(1) lookups"

This reverts commit 676a9f5bcc3c2b9fa31e0a9fdf00389739b3052f.

* o(1) lookups

* register_model fix

* test_aget_valid_models

* lambda ai models fix

* test_utils.py

* test fix vertex ai
2025-08-21 22:56:46 -07:00
Ishaan Jaff 07f6235730 [Performance] Improve LiteLLM Python SDK RPS by +200 RPS (#13839)
* fix _proxy_from_env +100 RPS

* fix: global_braintrust_http_handler

* test_braintrust_logging
2025-08-20 21:46:33 -07:00
Ishaan Jaff 55dcaded72 [Feat] Add VertexAI qwen API Service (#13828)
* add support for vertex AI QWEN API

* streaming QWEN API support

* test_partner_models_httpx

* test_partner_models_httpx_streaming

* add cost tracking for vertex_ai/qwen/qwen3-235b-a22b-instruct-2507-maa

* docs qwen models vertexAI
2025-08-20 15:00:33 -07:00
Ishaan Jaff 12bae8fdda test_partner_models_httpx_streaming 2025-08-20 09:44:47 -07:00
drorbaron 6b78ade918 migrate to use new aim FW API 2025-08-19 12:20:19 +03:00
Krrish Dholakia 6322aef0e3 fix(streaming_handler.py): fix streaming chunk calculation 2025-08-16 14:25:29 -07:00
Krrish Dholakia eb66daeef7 test: update test
we now return correct token usage on clientside
2025-08-16 14:14:35 -07:00
Ishaan Jaff f522f40228 test_passing_tool_result_as_list 2025-08-16 08:08:25 -07:00
Jugal D. Bhatt aea0605eed [LLM Translation] Fix Realtime API endpoint for no intent (#13476)
* fix intent params

* Add responses

* fix unrelated test

* test fix - fireworks API endpoint is down

* test fix fireworks ai is having an active outage

* test_completion_cost_databricks

* dbrx fix test API currently not responding

* Update OpenAI Realtime handler to use the correct endpoint and include all query parameters. Adjusted error messages for missing API base and key. Updated health check URL construction to pass model as a query parameter.

* Enhance OpenAI Realtime handler tests to ensure model parameter inclusion in WebSocket URL. Added new tests to verify correct URL construction with model and additional parameters, preventing 'missing_model' errors. Updated existing tests for consistency.

* Remove debug print statements for API base and key in OpenAIRealtime handler to clean up the code.

---------

Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
2025-08-14 16:24:14 -07:00
Krrish Dholakia aaf9c38a10 test: skip test - ran out of credits 2025-08-14 15:01:26 -07:00
Krrish Dholakia 0288ed35da test: update tests 2025-08-13 23:33:32 -07:00
Krrish Dholakia b53962dee2 test: update test 2025-08-13 23:09:18 -07:00
Krrish Dholakia 5ae44e3275 fix(router.py): fix cooldown increment logic 2025-08-12 23:33:56 -07:00
Krish Dholakia 6afaf5721a [Fix] Streaming - consistent 'finish_reason' chunk index (#13560)
* feat(model_response_utils.py): new function to check if modelresponsestream is empty

used for checking https://github.com/BerriAI/litellm/issues/13348

* fix(streaming_handler.py): skip chunk if empty

Fixes https://github.com/BerriAI/litellm/issues/13348

* fix(streaming_handler.py): add is_empty logic to async flow
2025-08-12 23:21:57 -07:00
Krish Dholakia 1c8761111f Router - reduce p99 latency w/ redis enabled by 50% + OTEL - track pre_call hook latency (#13362)
* feat(proxy/utils.py): track pre-call hooks in OTEL

some pre call hooks can cause latency in high traffic - make sure this is tracked

* fix(router.py): move redis call on deployment_callback_on_success to pipeline operation

reduces p99 latency by half when redis is enabled

* fix(parallel_request_limiter_v3.py): only run check if any item has rate limits set

Prevents unnecessary latency added by rate limit checks

* test: add unit tests

* Latency Improvements: only track tpm/rpm usage when set on deployment+ LLM Caching - use an in-memory cache to reduce redis calls + OTEL - track time spent on LLM caching (#13472)

* fix(router.py): only track usage for deployments with tpm/rpm set

ensures additional latency avoided for non-tpm/rpm models

* fix(caching_handler.py): log time spent on request get cache to OTEL

enables easy debugging of call latency

* fix(caching_handler.py): use dual cache object for in-memory caching + trace redis call within caching handler

* fix(caching_handler.py): working in-memory cache for redis calls

ensures dual cache works when redis cache setup for llm calls

makes calls quicker by only checking redis when in-memory cache missed for llm api call

* test: remove redundant test

* test: add unit tests
2025-08-09 16:09:51 -07:00
Ishaan Jaff e8c081b8ff test_stream_chunk_builder_litellm_usage_chunks 2025-08-07 15:22:52 -07:00
Ishaan Jaff 621b3dca7b [Bug Fix] Mistral Tool Calling - Grammar error: at 3(11): failed to compile JSON schema (#13389)
* test_claude_tool_use_with_gemini

* add _remove_json_schema_refs

* add _clean_tool_schema_for_mistral

* fixes mistral tool calls

* _remove_json_schema_refs

* fix - vertex, remove hardcoded test
2025-08-07 13:50:22 -07:00
Ishaan Jaff 984f91f4f5 test_completion_gemini_stream 2025-08-07 13:24:00 -07:00
Ishaan Jaff dfada882f1 vtx test fix gemini-2.5-flash-lite 2025-08-07 00:11:10 -07:00
Ishaan Jaff eeed03a78f test fix: gcp deprecated gemini-1.5-flash 2025-08-06 08:43:45 -07:00
zjx20 92c525ddfe feat(JinaAI): support multimodal embedding models (#13181)
* feat(JinaAI): support multimodal embedding models

* add test case

* add test

* fix test
2025-08-05 19:21:56 -07:00
Krish Dholakia 324cfe8bdc fix(streaming_handler.py): include cost in streaming usage object (#13319)
Fixes https://github.com/BerriAI/litellm/issues/12689
2025-08-05 18:38:31 -07:00
Jugal D. Bhatt b6fcda2f8a [LLM Translation] Fix model group on clientside auth with API calls (#13314)
* fix unsupported operand type(s) for +=: 'NoneType' and 'str' on clientside auth creds for responses

* fix the client side auth to use correct metadata

* add more tests

* fix tests
2025-08-05 17:46:47 -07:00
Pascal Bro a17d483c89 Add GCS bucket caching support (#13122) 2025-08-04 16:09:33 -07:00
Jugal D. Bhatt 36229dc69f [LLM Translation] Fix Model Usage not having text tokens (#13234)
* fix + test

* remove test comments

* fix mypy

* fix mypy

* fix tests
2025-08-04 21:06:49 +05:30
Ishaan Jaff 44900e781a testing fixes - vertex ai deprecated claude 3 sonnet models 2025-08-01 21:23:52 -07:00
Ishaan Jaff 9d6098e8cc fix vertex deprecated old model 2025-08-01 16:46:16 -07:00
Krish Dholakia 78997c2e35 Anthropic - working mid-stream fallbacks (#13149)
* fix(router.py): add acompletion_streaming_iterator inside router

allows router to catch errors mid-stream for fallbacks

Work for https://github.com/BerriAI/litellm/issues/6532

* fix(router.py): working mid-stream fallbacks

* fix(router.py): more iterations

* fix(router.py): working mid-stream fallbacks with fallbacks set on router

* fix(router.py): pass prior content back in new request as assistant prefix message

* fix(router.py): add a system prompt to help guide non-prefix supporting models to use the continued text correctly

* fix(common_utils.py): support converting `prefix: true` for non-prefix supporting models

* fix: reduce LOC in function

* test(test_router.py): add unit tests for new function

* test: add basic unit test

* fix(router.py): ensure return type of fallback stream is compatible with CustomStreamWrapper

prevent client code from breaking

* fix: cleanup

* test: update test

* fix: fix linting error
2025-07-31 21:22:49 -07:00
Jugal D. Bhatt 5db4862cbf [MCP Gateway] Litellm mcp client list fail (#13114)
* fix headers

* fix test

* fix ruff

* added try except for catching errors which lead to client failures

* fix mypy

* fix ruff

* fix tests

* fix python error

* fix test

* fix test

* fixed the MCP Call Tool result
2025-07-30 15:23:19 -07:00
Krrish Dholakia ae947e63ce test: update test 2025-07-29 22:07:07 -07:00
Krrish Dholakia 378db1b62d test: remove o1-preview 2025-07-28 17:47:57 -07:00
Krish Dholakia 1737cf4257 VertexAI - camelcase optional params for image generation + Anthropic - streaming, always ensure assistant role set on only first chunk (#12889)
* fix(vertex_ai/image_generation): transform `_` param to camelcase

Fixes https://github.com/BerriAI/litellm/issues/12690

* test(test_vertex_image_generation.py): add unit tests

* fix(streaming_handler.py): assert only 1 assistant chunk in stream

Fixes https://github.com/BerriAI/litellm/issues/12616

* fix(streaming_handler.py): fix check
2025-07-27 10:09:43 -07:00
Ishaan Jaff 2c38dc0de7 test_router_auto_router 2025-07-26 13:33:53 -07:00
sings-to-bees-on-wednesdays eb96fb78bc fix(auth_utils): make header comparison case-insensitive (#12950)
If the user specified in the configuration e.g. "user_header_name:
X-OpenWebUI-User-Email", here we were looking for a dict key
"X-OpenWebUI-User-Email" when the dict actually contained
"x-openwebui-user-email".

Switch to iteration and case insensitive string comparison instead to
fix this.

This fixes customer budget enforcement when the customer ID is passed
in as a header rather than as a "user" value in the body.
2025-07-24 22:06:12 -07:00
Ishaan Jaff b8e404dd95 [Feat] Backend Router - Add Auto-Router powered by semantic-router (#12955)
* add router.json

* test_router_auto_router

* async_pre_routing_hook

* fixes for auto router

* add async_pre_routing_hook

* add LiteLLMRouterEncoder

* update test auto_router_embedding_model

* add auto_router_embedding_model

* add AutoRouter

* fix async_pre_routing_hook

* update async_pre_routing_hook

* fix auto router

* fix router.json

* working router init

* working embedding encoder

* working auto router

* test_router_auto_router

* test auto router

* add semantic-router as optional for litellm

* add extras

* semantic_router==0.1.10

* ruff fix

* use aiohttp==3.10.11

* python-dotenv==1.0.1

* test auto router

* test_router_auto_router

* semantic_router

* test_is_auto_router_deployment

* fix check

* fix docker build step

* add semantic_router

* Revert "add semantic_router"

This reverts commit 537b67288798731a119d811f643b682086377ee9.
2025-07-24 18:32:56 -07:00
Ishaan Jaff 99031bf8b6 ci/cd new release 2025-07-23 13:50:36 -07:00
Ishaan Jaff 461cd0c30a test_completion_cost_deepseek 2025-07-23 13:16:12 -07:00
Ishaan Jaff 79a0841719 test_router_content_policy_fallbacks 2025-07-23 13:04:28 -07:00
Ishaan Jaff 642cfa26b0 remove deprecated 2025-07-22 20:59:34 -07:00
Ishaan Jaff bf300f8ca7 Revert "Litellm dev 07 21 2025 p1 (#12848)"
This reverts commit e4e10aa4ed.
2025-07-22 18:28:36 -07:00