Krish Dholakia
1f9afcb349
Merge pull request #14438 from hakasecurity/change-aim-headers
...
rename aim headers + tests
2025-09-19 23:32:50 -07:00
Krrish Dholakia
92e841e311
fix: fix test
2025-09-18 23:37:38 -07:00
Krish Dholakia
664c83cfb5
Merge branch 'litellm_contributor_prs_09_18_2025_p2' into litellm_dev_09_17_2025_p2_v2
2025-09-18 19:50:55 -07:00
Sameer Kankute
d213a2e066
correct the gaurdcontent name (#14684)
...
* correct the gaurdcontent name
* correct the gaurdcontent name
* fix model required error in test
* Add correct model
2025-09-18 11:00:19 -07:00
Ishaan Jaffer
1e1d174733
fix: test_completion_with_no_model
2025-09-18 10:13:32 -07:00
Krrish Dholakia
e32ce6b053
feat(anthropic/chat/transformation.py): separate 5m vs. 1h cache creation token details for anthropic cost tracking
2025-09-17 15:51:07 -07:00
Krish Dholakia
895c41efa3
Merge pull request #14619 from BerriAI/litellm_dev_09_16_2025_p1
...
UI - allow team member to view service account keys they create + Anthropic - include cache creation tokens in prompt token total (separate out during cost tracking)
2025-09-17 15:43:04 -07:00
Krrish Dholakia
0e747aaaf1
test: fix test
2025-09-16 19:20:12 -07:00
Krrish Dholakia
1c855385c9
build(model_cost): add cache_creation_input_token_cost_above_1hr pricing
2025-09-16 18:43:57 -07:00
Krrish Dholakia
0341e7fc09
fix: fix test
2025-09-16 18:34:24 -07:00
Krrish Dholakia
e488312873
fix(utils.py): log cache_creation_tokens in prompt token details
...
Closes LIT-907
2025-09-16 18:24:10 -07:00
Arseny Boykov
f4318bccd3
[Performance] Use _PROXY_MaxParallelRequestsHandler_v3 by default again (#14450)
...
* Use _PROXY_MaxParallelRequestsHandler_v3 by default (#14352)
(cherry picked from commit f3fa45cf8fbd5f5cce2f45a7312776d5005fb08e)
(cherry picked from commit 5b680bb4a3)
* Use random api_key for parallel requests test
* Fix off-by-one error in parallel request rate limit
The rate limiter was incorrectly rejecting requests when the limit was met, but not exceeded. The check in `is_cache_list_over_limit` was `int(counter_value) + 1 > current_limit`, which caused the first request to be rejected if the limit was 1.
This commit removes the `+ 1`, changing the logic to `int(counter_value) > current_limit`. The check now correctly allows requests up to the specified parallel limit.
* Test actual parallel requests
* Ensure rate limiting works correctly for multiple users
* Add sequential rate-limit test
* Revert random key usage
2025-09-12 17:33:55 -07:00
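The off-by-one fix described in commit f4318bccd3 can be illustrated with a minimal sketch. The two comparison expressions are the ones quoted in the commit message; the standalone function names and signatures below are hypothetical and are not the real `is_cache_list_over_limit` implementation. The sketch also assumes the counter already includes the current request, which matches the commit's description of the first request being rejected when the limit was 1.

```python
def is_over_limit_before_fix(counter_value, current_limit):
    # Pre-fix check quoted in the commit message.
    # Assumption: counter_value already counts the current request.
    return int(counter_value) + 1 > current_limit


def is_over_limit_after_fix(counter_value, current_limit):
    # Post-fix check quoted in the commit message: reject only when
    # the counter actually exceeds the limit.
    return int(counter_value) > current_limit


if __name__ == "__main__":
    limit = 1
    for counter in (1, 2):
        print(
            counter,
            is_over_limit_before_fix(counter, limit),
            is_over_limit_after_fix(counter, limit),
        )
```

Under these assumptions, a limit of 1 rejects the first request before the fix and admits it afterwards, while a second concurrent request is still rejected.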
Boopesh Shanmugam
8b338a4d8c
User Headers X LiteLLM Users Mapping feature (#14485)
...
* Draft commit.
* user header mapping feature with backward compatibility with user_header_name field.
* user header mapping feature with backward compatibility with user_header_name field optimizations.
* Added unit tests.
2025-09-12 11:49:37 -07:00
drorbaron
2ee8c0c6d7
rename aim headers + tests
2025-09-11 11:19:58 +03:00
Krrish Dholakia
a504c7dae3
test: update tests
2025-09-09 21:43:37 -07:00
Krrish Dholakia
d05f58721e
test: remove end of life model from tests
2025-09-09 21:01:45 -07:00
Krrish Dholakia
e443d01925
test: remove redundant test
2025-09-09 20:37:09 -07:00
Krrish Dholakia
0854c35d3e
test: remove eol bedrock model from tests
2025-09-09 19:48:35 -07:00
Krish Dholakia
351896cd1d
Merge pull request #12414 from dotmobo/feature/fix-timestamp-granularities
...
The parameter timestamp_granularities is broken for openai-like transcription
2025-09-08 23:13:13 -07:00
Krish Dholakia
b9ce3a1587
Merge pull request #12416 from dotmobo/feature/fix-alloy
...
feat: add a health_check_voice parameter in model_info
2025-09-08 23:12:48 -07:00
Ishaan Jaff
c7f9be6803
test_async_log_cache_hit_on_callbacks
2025-09-08 17:15:53 -07:00
Ishaan Jaff
679d0414e2
test fix
2025-09-06 17:08:31 -07:00
Ishaan Jaff
d89a2a0797
test
2025-09-06 16:38:43 -07:00
Ishaan Jaff
7054067238
test_cooldown_handlers.py
2025-09-06 16:13:30 -07:00
Ishaan Jaff
c709d7505d
test fix: test_parallel_streaming_requests
2025-09-06 16:07:30 -07:00
Ishaan Jaff
982800069c
[Bug Fix] x-litellm-tags not routing with Responses API (#14289)
...
* fix: get_deployments_for_tag
* fix get_deployments_for_tag
* test_router_tag_routing.py
* test_get_metadata_variable_name_from_kwargs
* fix mapped tests
* docs fix
2025-09-05 09:40:37 -07:00
Ishaan Jaff
8e9352fce7
test fix
2025-09-03 11:06:09 -07:00
Ishaan Jaff
c821f1ddf1
[Feature]: Support GPT-OSS models on vertex ai (#14184)
...
* add VertexAIGPTOSSTransformation
* fix: optional_params
* fix: is_vertex_partner_model
* test_partner_models_httpx
* docs GPT oss docs
* test_vertex_ai_gpt_oss_reasoning_effort
* add vertex ai models
2025-09-02 14:15:26 -07:00
Ishaan Jaff
d37be48a80
test: llama-3.3-70b-versatile
2025-09-01 20:14:12 -07:00
Ishaan Jaff
8e72f991cc
test_model_alias_map
2025-09-01 17:59:40 -07:00
Ishaan Jaff
7656cb3d6e
test fix
2025-09-01 17:04:47 -07:00
Ishaan Jaff
48d3aad68f
test_caching_with_models_v2
2025-08-30 13:21:14 -07:00
Ishaan Jaff
fd39f22e3e
test_completion_openrouter_reasoning_content
2025-08-30 09:27:37 -07:00
Ishaan Jaff
6ce1d82970
[Bug] Fix: Vertex Mistral not working for streaming (#13952)
...
* fix OpenAI like chat handler
* fix MockResponse
* test_partner_models_httpx_streaming
* test_partner_models_httpx_streaming
2025-08-25 17:39:40 -07:00
Ishaan Jaff
0fccd619ea
test_vertex_ai_deepseek
2025-08-23 14:13:03 -07:00
Ishaan Jaff
b9132968b2
[Perf] Improvements for Async Success Handler (Logging Callbacks) - Approx +130 RPS (#13905)
...
* [Performance] Reduce Significant CPU overhead from litellm_logging.py (#13895)
* fix: litellm.configured_cold_storage_logger
* fix Session Management - Non-OpenAI Models docs
* ruff fix
* test fix
* create LoggingWorker
* add GLOBAL_LOGGING_WORKER for async task handling
* fix logging tests
* add conftest
* fix conftest
* test fix location of encode bedrock runtime modelid arn
* fix conftest.py
* tuning LoggingWorker
* conftest.py
* fix conftest batches/
* test_async_chat_azure
* event_loop
* test_bedrock_streaming_passthrough_test2
* fix GLOBAL_LOGGING_WORKER
* logging worker
* add flush for global logging worker
* Revert "fix GLOBAL_LOGGING_WORKER"
This reverts commit d254f508f48935652f054777652938ad71976cce.
* fix conftest clear_queue
* fix conftest clear_queue
* setup_and_teardown for llm translation
* docs AWS_REGION
* test_async_chat_azure
* change test DIR
* run ci/cd again
* use 1 job for litellm_router_unit_testing
* fix space
* fix litellm_router_unit_testing
* test_aaarouter_dynamic_cooldown_message_retry_time
* litellm_router_unit_testing
* conftest.py clearing qu
* fixes litellm_router_unit_testing
* fixes clear_queue
* fix router_unit_tests
* remove conftest
* add back conftest for router
* fix event loop test
* test fix
* fixes for LoggingWorker
* ruff fix
2025-08-23 13:13:23 -07:00
Ishaan Jaff
e93e266f84
[Performance] Use O(1) Set lookups for model routing (#13879)
...
* o(1) lookups
* Revert "o(1) lookups"
This reverts commit 620d14246980813366b4b1f1c0ce396b528dd9df.
* o(1) lookups
* Revert "o(1) lookups"
This reverts commit 676a9f5bcc3c2b9fa31e0a9fdf00389739b3052f.
* o(1) lookups
* register_model fix
* test_aget_valid_models
* lambda ai models fix
* test_utils.py
* test fix vertex ai
2025-08-21 22:56:46 -07:00
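The general idea behind the "O(1) Set lookups" named in commit e93e266f84 can be sketched as follows. This is only an illustration of list versus set membership in Python; the model names and the `is_known_model` helper are hypothetical and do not come from the LiteLLM code base.

```python
# Hypothetical model names, used only for illustration.
MODEL_LIST = ["model-a", "model-b", "model-c"]

# Build the set once; membership tests on a set are O(1) on average,
# whereas `name in MODEL_LIST` scans the list element by element.
MODEL_SET = frozenset(MODEL_LIST)


def is_known_model(name: str) -> bool:
    # Average-case constant-time membership check.
    return name in MODEL_SET


if __name__ == "__main__":
    print(is_known_model("model-b"))
    print(is_known_model("model-z"))
```

The trade-off is that the set has to be kept in sync whenever the underlying list changes, for example when a new model is registered.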
Ishaan Jaff
07f6235730
[Performance] Improve LiteLLM Python SDK RPS by +200 RPS (#13839)
...
* fix _proxy_from_env +100 RPS
* fix: global_braintrust_http_handler
* test_braintrust_logging
2025-08-20 21:46:33 -07:00
Ishaan Jaff
55dcaded72
[Feat] Add VertexAI qwen API Service (#13828)
...
* add support for vertex AI QWEN API
* streaming QWEN API support
* test_partner_models_httpx
* test_partner_models_httpx_streaming
* add cost tracking for vertex_ai/qwen/qwen3-235b-a22b-instruct-2507-maa
* docs qwen models vertexAI
2025-08-20 15:00:33 -07:00
Ishaan Jaff
12bae8fdda
test_partner_models_httpx_streaming
2025-08-20 09:44:47 -07:00
drorbaron
6b78ade918
migrate to use new aim FW API
2025-08-19 12:20:19 +03:00
Krrish Dholakia
6322aef0e3
fix(streaming_handler.py): fix streaming chunk calculation
2025-08-16 14:25:29 -07:00
Krrish Dholakia
eb66daeef7
test: update test
...
we now return correct token usage on clientside
2025-08-16 14:14:35 -07:00
Ishaan Jaff
f522f40228
test_passing_tool_result_as_list
2025-08-16 08:08:25 -07:00
Jugal D. Bhatt
aea0605eed
[LLM Translation] Fix Realtime API endpoint for no intent (#13476)
...
* fix intent params
* Add responses
* fix unrelated test
* test fix - fireworks API endpoint is down
* test fix fireworks ai is having an active outage
* test_completion_cost_databricks
* dbrx fix test API currently not responding
* Update OpenAI Realtime handler to use the correct endpoint and include all query parameters. Adjusted error messages for missing API base and key. Updated health check URL construction to pass model as a query parameter.
* Enhance OpenAI Realtime handler tests to ensure model parameter inclusion in WebSocket URL. Added new tests to verify correct URL construction with model and additional parameters, preventing 'missing_model' errors. Updated existing tests for consistency.
* Remove debug print statements for API base and key in OpenAIRealtime handler to clean up the code.
---------
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
2025-08-14 16:24:14 -07:00
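The URL construction described in commit aea0605eed (model passed as a query parameter, additional query parameters preserved) might look roughly like the sketch below. The `build_realtime_url` helper, its parameters, and the default `/v1/realtime` path are assumptions made for illustration; they are not taken from the actual OpenAI Realtime handler in LiteLLM.

```python
from typing import Dict, Optional
from urllib.parse import urlencode, urlsplit, urlunsplit


def build_realtime_url(
    api_base: str,
    model: str,
    extra_params: Optional[Dict[str, str]] = None,
    path: str = "/v1/realtime",  # assumed endpoint path
) -> str:
    # Hypothetical helper: not the real handler's signature.
    parts = urlsplit(api_base)
    # Map HTTP schemes to their WebSocket equivalents.
    scheme = {"https": "wss", "http": "ws"}.get(parts.scheme, parts.scheme)
    # Always include the model so the server does not see a missing model.
    query = {"model": model}
    query.update(extra_params or {})
    full_path = parts.path.rstrip("/") + path
    return urlunsplit((scheme, parts.netloc, full_path, urlencode(query), ""))


if __name__ == "__main__":
    print(build_realtime_url("https://api.openai.com", "example-model"))
    print(
        build_realtime_url(
            "https://api.openai.com",
            "example-model",
            {"intent": "transcription"},
        )
    )
```

Building the query with `urlencode` rather than string concatenation keeps any extra parameters correctly escaped.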
Krrish Dholakia
aaf9c38a10
test: skip test - ran out of credits
2025-08-14 15:01:26 -07:00
Krrish Dholakia
0288ed35da
test: update tests
2025-08-13 23:33:32 -07:00
Krrish Dholakia
b53962dee2
test: update test
2025-08-13 23:09:18 -07:00
Krrish Dholakia
5ae44e3275
fix(router.py): fix cooldown increment logic
2025-08-12 23:33:56 -07:00
Krish Dholakia
6afaf5721a
[Fix] Streaming - consistent 'finish_reason' chunk index (#13560)
...
* feat(model_response_utils.py): new function to check if modelresponsestream is empty
used for checking https://github.com/BerriAI/litellm/issues/13348
* fix(streaming_handler.py): skip chunk if empty
Fixes https://github.com/BerriAI/litellm/issues/13348
* fix(streaming_handler.py): add is_empty logic to async flow
2025-08-12 23:21:57 -07:00