748 Commits

Author SHA1 Message Date
Krish Dholakia 1f9afcb349 Merge pull request #14438 from hakasecurity/change-aim-headers
rename aim headers + tests
2025-09-19 23:32:50 -07:00
Krrish Dholakia 92e841e311 fix: fix test 2025-09-18 23:37:38 -07:00
Krish Dholakia 664c83cfb5 Merge branch 'litellm_contributor_prs_09_18_2025_p2' into litellm_dev_09_17_2025_p2_v2 2025-09-18 19:50:55 -07:00
Sameer Kankute d213a2e066 correct the gaurdcontent name (#14684)
* correct the gaurdcontent name

* correct the gaurdcontent name

* fix model required error in test

* Add correct model
2025-09-18 11:00:19 -07:00
Ishaan Jaffer 1e1d174733 fix: test_completion_with_no_model 2025-09-18 10:13:32 -07:00
Krrish Dholakia e32ce6b053 feat(anthropic/chat/transformation.py): separate 5m vs. 1h cache creation token details for anthropic cost tracking 2025-09-17 15:51:07 -07:00
Krish Dholakia 895c41efa3 Merge pull request #14619 from BerriAI/litellm_dev_09_16_2025_p1
UI - allow team member to view service account keys they create + Anthropic - include cache creation tokens in prompt token total (separate out during cost tracking)
2025-09-17 15:43:04 -07:00
Krrish Dholakia 0e747aaaf1 test: fix test 2025-09-16 19:20:12 -07:00
Krrish Dholakia 1c855385c9 build(model_cost): add cache_creation_input_token_cost_above_1hr pricing 2025-09-16 18:43:57 -07:00
Krrish Dholakia 0341e7fc09 fix: fix test 2025-09-16 18:34:24 -07:00
Krrish Dholakia e488312873 fix(utils.py): log cache_creation_tokens in prompt token details
Closes LIT-907
2025-09-16 18:24:10 -07:00
Arseny Boykov f4318bccd3 [Performance] Use _PROXY_MaxParallelRequestsHandler_v3 by default again (#14450)
* Use _PROXY_MaxParallelRequestsHandler_v3 by default (#14352)

(cherry picked from commit f3fa45cf8fbd5f5cce2f45a7312776d5005fb08e)
(cherry picked from commit 5b680bb4a3)

* Use random api_key for parallel requests test

* Fix off-by-one error in parallel request rate limit

The rate limiter was incorrectly rejecting requests when the limit was met, but not exceeded. The check in `is_cache_list_over_limit` was `int(counter_value) + 1 > current_limit`, which caused the first request to be rejected if the limit was 1.

This commit removes the `+ 1`, changing the logic to `int(counter_value) > current_limit`. The check now correctly allows requests up to the specified parallel limit.

* Test actual parallel requests

* Ensure rate limiting works correctly for multiple users

* Add sequential rate-limit test

* Revert random key usage
2025-09-12 17:33:55 -07:00
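The off-by-one fix described in the commit above can be illustrated with a minimal sketch. It is not the code from the repository: `over_limit_buggy`, `over_limit_fixed`, and `admitted` are hypothetical helpers. It also assumes the counter is incremented for the incoming request before the check runs, which is consistent with the commit's description of the first request being rejected at a limit of 1.

```python
def over_limit_buggy(counter_value, current_limit):
    # Check as quoted in the commit message before the fix.
    return int(counter_value) + 1 > current_limit


def over_limit_fixed(counter_value, current_limit):
    # Check as quoted in the commit message after the fix.
    return int(counter_value) > current_limit


def admitted(check, current_limit, attempts):
    """Count how many of `attempts` concurrent requests the check admits.

    Assumption (not confirmed by the source): the counter already includes
    the incoming request when the check is evaluated, and a rejected
    request gives its slot back.
    """
    counter = 0
    allowed = 0
    for _ in range(attempts):
        counter += 1  # count the incoming request first
        if check(counter, current_limit):
            counter -= 1  # rejected: release the slot
        else:
            allowed += 1  # admitted: the slot stays occupied
    return allowed


if __name__ == "__main__":
    print(admitted(over_limit_buggy, 1, 3))
    print(admitted(over_limit_fixed, 1, 3))
```

Under these assumptions the old check admits one request fewer than the configured limit, so a limit of 1 admits nothing, while the corrected check admits exactly `current_limit` requests.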
Boopesh Shanmugam 8b338a4d8c User Headers X LiteLLM Users Mapping feature (#14485)
* Draft commit.

* user header mapping feature with backward compatibility with user_header_name field.

* user header mapping feature with backward compatibility with user_header_name field optimizations.

* Added unit tests.
2025-09-12 11:49:37 -07:00
drorbaron 2ee8c0c6d7 rename aim headers + tests 2025-09-11 11:19:58 +03:00
Krrish Dholakia a504c7dae3 test: update tests 2025-09-09 21:43:37 -07:00
Krrish Dholakia d05f58721e test: remove end of life model from tests 2025-09-09 21:01:45 -07:00
Krrish Dholakia e443d01925 test: remove redundant test 2025-09-09 20:37:09 -07:00
Krrish Dholakia 0854c35d3e test: remove eol bedrock model from tests 2025-09-09 19:48:35 -07:00
Krish Dholakia 351896cd1d Merge pull request #12414 from dotmobo/feature/fix-timestamp-granularities
The parameter timestamp_granularities is broken for openai-like transcription
2025-09-08 23:13:13 -07:00
Krish Dholakia b9ce3a1587 Merge pull request #12416 from dotmobo/feature/fix-alloy
feat: add a health_check_voice parameter in model_info
2025-09-08 23:12:48 -07:00
Ishaan Jaff c7f9be6803 test_async_log_cache_hit_on_callbacks 2025-09-08 17:15:53 -07:00
Ishaan Jaff 679d0414e2 test fix 2025-09-06 17:08:31 -07:00
Ishaan Jaff d89a2a0797 test 2025-09-06 16:38:43 -07:00
Ishaan Jaff 7054067238 test_cooldown_handlers.py 2025-09-06 16:13:30 -07:00
Ishaan Jaff c709d7505d test fix: test_parallel_streaming_requests 2025-09-06 16:07:30 -07:00
Ishaan Jaff 982800069c [Bug Fix] x-litellm-tags not routing with Responses API (#14289)
* fix: get_deployments_for_tag

* fix get_deployments_for_tag

* test_router_tag_routing.py

* test_get_metadata_variable_name_from_kwargs

* fix mapped tests

* docs fix
2025-09-05 09:40:37 -07:00
Ishaan Jaff 8e9352fce7 test fix 2025-09-03 11:06:09 -07:00
Ishaan Jaff c821f1ddf1 [Feature]: Support GPT-OSS models on vertex ai (#14184)
* add VertexAIGPTOSSTransformation

* fix: optional_params

* fix: is_vertex_partner_model

* test_partner_models_httpx

* docs GPT oss docs

* test_vertex_ai_gpt_oss_reasoning_effort

* add vertex ai models
2025-09-02 14:15:26 -07:00
Ishaan Jaff d37be48a80 test: llama-3.3-70b-versatile 2025-09-01 20:14:12 -07:00
Ishaan Jaff 8e72f991cc test_model_alias_map 2025-09-01 17:59:40 -07:00
Ishaan Jaff 7656cb3d6e test fix 2025-09-01 17:04:47 -07:00
Ishaan Jaff 48d3aad68f test_caching_with_models_v2 2025-08-30 13:21:14 -07:00
Ishaan Jaff fd39f22e3e test_completion_openrouter_reasoning_content 2025-08-30 09:27:37 -07:00
Ishaan Jaff 6ce1d82970 [Bug] Fix: Vertex Mistral not working for streaming (#13952)
* fix OpenAI like chat handler

* fix MockResponse

* test_partner_models_httpx_streaming

* test_partner_models_httpx_streaming
2025-08-25 17:39:40 -07:00
Ishaan Jaff 0fccd619ea test_vertex_ai_deepseek 2025-08-23 14:13:03 -07:00
Ishaan Jaff b9132968b2 [Perf] Improvements for Async Success Handler (Logging Callbacks) - Approx +130 RPS (#13905)
* [Performance] Reduce Significant CPU overhead from litellm_logging.py (#13895)

* fix: litellm.configured_cold_storage_logger

* fix Session Management - Non-OpenAI Models docs

* ruff fix

* test fix

* create LoggingWorker

* add GLOBAL_LOGGING_WORKER for async task handling

* fix logging tests

* add conftest

* fix conftest

* test fix location of encode bedrock runtime modelid arn

* fix conftest.py

* tuning LoggingWorker

* conftest.py

* fix conftest batches/

* test_async_chat_azure

* event_loop

* test_bedrock_streaming_passthrough_test2

* fix GLOBAL_LOGGING_WORKER

* logging worker

* add flush for global logging worker

* Revert "fix GLOBAL_LOGGING_WORKER"

This reverts commit d254f508f48935652f054777652938ad71976cce.

* fix conftest clear_queue

* fix conftest clear_queue

* setup_and_teardown for llm translation

* docs AWS_REGION

* test_async_chat_azure

* change test DIR

* run ci/cd again

* use 1 job for litellm_router_unit_testing

* fix space

* fix litellm_router_unit_testing

* test_aaarouter_dynamic_cooldown_message_retry_time

* litellm_router_unit_testing

* conftest.py clearing qu

* fixes litellm_router_unit_testing

* fixes clear_queue

* fix router_unit_tests

* remove conftest

* add back conftest for router

* fix event loop test

* test fix

* fixes for LoggingWorker

* ruff fix
2025-08-23 13:13:23 -07:00
Ishaan Jaff e93e266f84 [Performance] Use O(1) Set lookups for model routing (#13879)
* o(1) lookups

* Revert "o(1) lookups"

This reverts commit 620d14246980813366b4b1f1c0ce396b528dd9df.

* o(1) lookups

* Revert "o(1) lookups"

This reverts commit 676a9f5bcc3c2b9fa31e0a9fdf00389739b3052f.

* o(1) lookups

* register_model fix

* test_aget_valid_models

* lambda ai models fix

* test_utils.py

* test fix vertex ai
2025-08-21 22:56:46 -07:00
Ishaan Jaff 07f6235730 [Performance] Improve LiteLLM Python SDK RPS by +200 RPS (#13839)
* fix _proxy_from_env +100 RPS

* fix: global_braintrust_http_handler

* test_braintrust_logging
2025-08-20 21:46:33 -07:00
Ishaan Jaff 55dcaded72 [Feat] Add VertexAI qwen API Service (#13828)
* add support for vertex AI QWEN API

* streaming QWEN API support

* test_partner_models_httpx

* test_partner_models_httpx_streaming

* add cost tracking for vertex_ai/qwen/qwen3-235b-a22b-instruct-2507-maa

* docs qwen models vertexAI
2025-08-20 15:00:33 -07:00
Ishaan Jaff 12bae8fdda test_partner_models_httpx_streaming 2025-08-20 09:44:47 -07:00
drorbaron 6b78ade918 migrate to use new aim FW API 2025-08-19 12:20:19 +03:00
Krrish Dholakia 6322aef0e3 fix(streaming_handler.py): fix streaming chunk calculation 2025-08-16 14:25:29 -07:00
Krrish Dholakia eb66daeef7 test: update test
we now return correct token usage on clientside
2025-08-16 14:14:35 -07:00
Ishaan Jaff f522f40228 test_passing_tool_result_as_list 2025-08-16 08:08:25 -07:00
Jugal D. Bhatt aea0605eed [LLM Translation] Fix Realtime API endpoint for no intent (#13476)
* fix intent params

* Add responses

* fix unrelated test

* test fix - fireworks API endpoint is down

* test fix fireworks ai is having an active outage

* test_completion_cost_databricks

* dbrx fix test API currently not responding

* Update OpenAI Realtime handler to use the correct endpoint and include all query parameters. Adjusted error messages for missing API base and key. Updated health check URL construction to pass model as a query parameter.

* Enhance OpenAI Realtime handler tests to ensure model parameter inclusion in WebSocket URL. Added new tests to verify correct URL construction with model and additional parameters, preventing 'missing_model' errors. Updated existing tests for consistency.

* Remove debug print statements for API base and key in OpenAIRealtime handler to clean up the code.

---------

Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
2025-08-14 16:24:14 -07:00
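The URL construction described in the commit above (model passed as a query parameter, with any additional query parameters preserved) might look roughly like the sketch below. It uses only the standard library. `build_realtime_url` is a hypothetical helper, and the `/realtime` path suffix, the `intent` parameter value, and the example model name are assumptions for illustration, not the handler's actual code.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def build_realtime_url(api_base, model, query_params=None):
    """Build a WebSocket URL that always carries the model as a query parameter.

    Hypothetical helper: the path suffix and parameter handling are assumptions.
    """
    parts = urlsplit(api_base)
    # Map HTTP schemes to their WebSocket equivalents; keep ws/wss unchanged.
    scheme = {"https": "wss", "http": "ws"}.get(parts.scheme, parts.scheme)
    path = parts.path.rstrip("/") + "/realtime"
    # Put the model first, then any extra parameters (ignoring a duplicate "model").
    params = {"model": model}
    for key, value in (query_params or {}).items():
        if key != "model":
            params[key] = value
    return urlunsplit((scheme, parts.netloc, path, urlencode(params), ""))


if __name__ == "__main__":
    print(build_realtime_url(
        "https://api.openai.com/v1",
        "gpt-4o-realtime-preview",
        {"intent": "transcription"},
    ))
```

Always setting `model` in the query string is what would prevent the 'missing_model' error mentioned in the commit message, regardless of which other parameters the caller supplies.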
Krrish Dholakia aaf9c38a10 test: skip test - ran out of credits 2025-08-14 15:01:26 -07:00
Krrish Dholakia 0288ed35da test: update tests 2025-08-13 23:33:32 -07:00
Krrish Dholakia b53962dee2 test: update test 2025-08-13 23:09:18 -07:00
Krrish Dholakia 5ae44e3275 fix(router.py): fix cooldown increment logic 2025-08-12 23:33:56 -07:00
Krish Dholakia 6afaf5721a [Fix] Streaming - consistent 'finish_reason' chunk index (#13560)
* feat(model_response_utils.py): new function to check if modelresponsestream is empty

used for checking https://github.com/BerriAI/litellm/issues/13348

* fix(streaming_handler.py): skip chunk if empty

Fixes https://github.com/BerriAI/litellm/issues/13348

* fix(streaming_handler.py): add is_empty logic to async flow
2025-08-12 23:21:57 -07:00