litellm

mirror of https://github.com/tiennm99/litellm.git synced 2026-07-04 13:04:47 +00:00

Author	SHA1	Message	Date
Krrish Dholakia	92e841e311	fix: fix test	2025-09-18 23:37:38 -07:00
Krish Dholakia	664c83cfb5	Merge branch 'litellm_contributor_prs_09_18_2025_p2' into litellm_dev_09_17_2025_p2_v2	2025-09-18 19:50:55 -07:00
Sameer Kankute	d213a2e066	correct the gaurdcontent name (#14684 ) * correct the gaurdcontent name * correct the gaurdcontent name * fix model required error in test * Add correct model	2025-09-18 11:00:19 -07:00
Ishaan Jaffer	1e1d174733	fix: test_completion_with_no_model	2025-09-18 10:13:32 -07:00
Krrish Dholakia	e32ce6b053	feat(anthropic/chat/transformation.py): separate 5m vs. 1h cache creation token details for anthropic cost tracking	2025-09-17 15:51:07 -07:00
Krish Dholakia	895c41efa3	Merge pull request #14619 from BerriAI/litellm_dev_09_16_2025_p1 UI - allow team member to view service account keys they create + Anthropic - include cache creation tokens in prompt token total (separate out during cost tracking)	2025-09-17 15:43:04 -07:00
Krrish Dholakia	0e747aaaf1	test: fix test	2025-09-16 19:20:12 -07:00
Krrish Dholakia	1c855385c9	build(model_cost): add cache_creation_input_token_cost_above_1hr pricing	2025-09-16 18:43:57 -07:00
Krrish Dholakia	0341e7fc09	fix: fix test	2025-09-16 18:34:24 -07:00
Krrish Dholakia	e488312873	fix(utils.py): log cache_creation_tokens in prompt token details Closes LIT-907	2025-09-16 18:24:10 -07:00
Arseny Boykov	f4318bccd3	[Performance] Use _PROXY_MaxParallelRequestsHandler_v3 by default again (#14450 ) * Use _PROXY_MaxParallelRequestsHandler_v3 by default (#14352) (cherry picked from commit f3fa45cf8fbd5f5cce2f45a7312776d5005fb08e) (cherry picked from commit `5b680bb4a3`) * Use random api_key for parallel requests test * Fix off-by-one error in parallel request rate limit The rate limiter was incorrectly rejecting requests when the limit was met, but not exceeded. The check in `is_cache_list_over_limit` was `int(counter_value) + 1 > current_limit`, which caused the first request to be rejected if the limit was 1. This commit removes the `+ 1`, changing the logic to `int(counter_value) > current_limit`. The check now correctly allows requests up to the specified parallel limit. * Test actual parallel requests * Ensure rate limiting works correctly for multiple users * Add sequential rate-limit test * Revert random key usage	2025-09-12 17:33:55 -07:00
Boopesh Shanmugam	8b338a4d8c	User Headers X LiteLLM Users Mapping feature (#14485 ) * Draft commit. * user header mapping feature with backward compatibility with user_header_name field. * user header mapping feature with backward compatibility with user_header_name field optimizations. * Added unit tests.	2025-09-12 11:49:37 -07:00
Krrish Dholakia	a504c7dae3	test: update tests	2025-09-09 21:43:37 -07:00
Krrish Dholakia	d05f58721e	test: remove end of life model from tests	2025-09-09 21:01:45 -07:00
Krrish Dholakia	e443d01925	test: remove redundant test	2025-09-09 20:37:09 -07:00
Krrish Dholakia	0854c35d3e	test: remove eol bedrock model from tests	2025-09-09 19:48:35 -07:00
Krish Dholakia	351896cd1d	Merge pull request #12414 from dotmobo/feature/fix-timestamp-granularities The parameter timestamp_granularities is broken for openai-like transcription	2025-09-08 23:13:13 -07:00
Krish Dholakia	b9ce3a1587	Merge pull request #12416 from dotmobo/feature/fix-alloy feat: add a health_check_voice parameter in model_info	2025-09-08 23:12:48 -07:00
Ishaan Jaff	c7f9be6803	test_async_log_cache_hit_on_callbacks	2025-09-08 17:15:53 -07:00
Ishaan Jaff	679d0414e2	test fix	2025-09-06 17:08:31 -07:00
Ishaan Jaff	d89a2a0797	test	2025-09-06 16:38:43 -07:00
Ishaan Jaff	7054067238	test_cooldown_handlers.py	2025-09-06 16:13:30 -07:00
Ishaan Jaff	c709d7505d	test fix: test_parallel_streaming_requests	2025-09-06 16:07:30 -07:00
Ishaan Jaff	982800069c	[Bug Fix] x-litellm-tags not routing with Responses API (#14289 ) * fix: get_deployments_for_tag * fix get_deployments_for_tag * test_router_tag_routing.py * test_get_metadata_variable_name_from_kwargs * fix mapped tests * docs fix	2025-09-05 09:40:37 -07:00
Ishaan Jaff	8e9352fce7	test fix	2025-09-03 11:06:09 -07:00
Ishaan Jaff	c821f1ddf1	[Feature]: Support GPT-OSS models on vertex ai (#14184 ) * add VertexAIGPTOSSTransformation * fix: optional_params * fix: is_vertex_partner_model * test_partner_models_httpx * docs GPT oss docs * test_vertex_ai_gpt_oss_reasoning_effort * add vertex ai models	2025-09-02 14:15:26 -07:00
Ishaan Jaff	d37be48a80	test: llama-3.3-70b-versatile	2025-09-01 20:14:12 -07:00
Ishaan Jaff	8e72f991cc	test_model_alias_map	2025-09-01 17:59:40 -07:00
Ishaan Jaff	7656cb3d6e	test fix	2025-09-01 17:04:47 -07:00
Ishaan Jaff	48d3aad68f	test_caching_with_models_v2	2025-08-30 13:21:14 -07:00
Ishaan Jaff	fd39f22e3e	test_completion_openrouter_reasoning_content	2025-08-30 09:27:37 -07:00
Ishaan Jaff	6ce1d82970	[Bug] Fix: Vertex Mistral not working for streaming (#13952 ) * fix OpenAI like chat handler * fix MockResponse * test_partner_models_httpx_streaming * test_partner_models_httpx_streaming	2025-08-25 17:39:40 -07:00
Ishaan Jaff	0fccd619ea	test_vertex_ai_deepseek	2025-08-23 14:13:03 -07:00
Ishaan Jaff	b9132968b2	[Perf] Improvements for Async Success Handler (Logging Callbacks) - Approx +130 RPS (#13905 ) * [Performance] Reduce Significant CPU overhead from litellm_logging.py (#13895) * fix: litellm.configured_cold_storage_logger * fix Session Management - Non-OpenAI Models docs * ruff fix * test fix * create LoggingWorker * add GLOBAL_LOGGING_WORKER for async task handling * fix logging tests * add conftest * fix conftest * test fix location of encode bedrock runtime modelid arn * fix conftest.py * tuning LoggingWorker * conftest.py * fix conftest batches/ * test_async_chat_azure * event_loop * test_bedrock_streaming_passthrough_test2 * fix GLOBAL_LOGGING_WORKER * logging worker * add flush for global logging worker * Revert "fix GLOBAL_LOGGING_WORKER" This reverts commit d254f508f48935652f054777652938ad71976cce. * fix conftest clear_queue * fix conftest clear_queue * setup_and_teardown for llm translation * docs AWS_REGION * test_async_chat_azure * change test DIR * run ci/cd again * use 1 job for litellm_router_unit_testing * fix space * fix litellm_router_unit_testing * test_aaarouter_dynamic_cooldown_message_retry_time * litellm_router_unit_testing * conftest.py clearing qu * fixes litellm_router_unit_testing * fixes clear_queue * fix router_unit_tests * remove conftest * add back conftest for router * fix event loop test * test fix * fixes for LoggingWorker * ruff fix	2025-08-23 13:13:23 -07:00
Ishaan Jaff	e93e266f84	[Performance] Use O(1) Set lookups for model routing (#13879 ) * o(1) lookups * Revert "o(1) lookups" This reverts commit 620d14246980813366b4b1f1c0ce396b528dd9df. * o(1) lookups * Revert "o(1) lookups" This reverts commit 676a9f5bcc3c2b9fa31e0a9fdf00389739b3052f. * o(1) lookups * register_model fix * test_aget_valid_models * lambda ai models fix * test_utils.py * test fix vertex ai	2025-08-21 22:56:46 -07:00
Ishaan Jaff	07f6235730	[Performance] Improve LiteLLM Python SDK RPS by +200 RPS (#13839 ) * fix _proxy_from_env +100 RPS * fix: global_braintrust_http_handler * test_braintrust_logging	2025-08-20 21:46:33 -07:00
Ishaan Jaff	55dcaded72	[Feat] Add VertexAI qwen API Service (#13828 ) * add support for vertex AI QWEN API * streaming QWEN API support * test_partner_models_httpx * test_partner_models_httpx_streaming * add cost tracking for vertex_ai/qwen/qwen3-235b-a22b-instruct-2507-maa * docs qwen models vertexAI	2025-08-20 15:00:33 -07:00
Ishaan Jaff	12bae8fdda	test_partner_models_httpx_streaming	2025-08-20 09:44:47 -07:00
drorbaron	6b78ade918	migrate to use new aim FW API	2025-08-19 12:20:19 +03:00
Krrish Dholakia	6322aef0e3	fix(streaming_handler.py): fix streaming chunk calculation	2025-08-16 14:25:29 -07:00
Krrish Dholakia	eb66daeef7	test: update test we now return correct token usage on clientside	2025-08-16 14:14:35 -07:00
Ishaan Jaff	f522f40228	test_passing_tool_result_as_list	2025-08-16 08:08:25 -07:00
Jugal D. Bhatt	aea0605eed	[LLM Translation] Fix Realtime API endpoint for no intent (#13476 ) * fix intent params * Add responses * fix unrelated test * test fix - fireworks API endpoint is down * test fix fireworks ai is having an active outage * test_completion_cost_databricks * dbrx fix test API currently not responding * Update OpenAI Realtime handler to use the correct endpoint and include all query parameters. Adjusted error messages for missing API base and key. Updated health check URL construction to pass model as a query parameter. * Enhance OpenAI Realtime handler tests to ensure model parameter inclusion in WebSocket URL. Added new tests to verify correct URL construction with model and additional parameters, preventing 'missing_model' errors. Updated existing tests for consistency. * Remove debug print statements for API base and key in OpenAIRealtime handler to clean up the code. --------- Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>	2025-08-14 16:24:14 -07:00
Krrish Dholakia	aaf9c38a10	test: skip test - ran out of credits	2025-08-14 15:01:26 -07:00
Krrish Dholakia	0288ed35da	test: update tests	2025-08-13 23:33:32 -07:00
Krrish Dholakia	b53962dee2	test: update test	2025-08-13 23:09:18 -07:00
Krrish Dholakia	5ae44e3275	fix(router.py): fix cooldown increment logic	2025-08-12 23:33:56 -07:00
Krish Dholakia	6afaf5721a	[Fix] Streaming - consistent 'finish_reason' chunk index (#13560 ) * feat(model_response_utils.py): new function to check if modelresponsestream is empty used for checking https://github.com/BerriAI/litellm/issues/13348 * fix(streaming_handler.py): skip chunk if empty Fixes https://github.com/BerriAI/litellm/issues/13348 * fix(streaming_handler.py): add is_empty logic to async flow	2025-08-12 23:21:57 -07:00
Krish Dholakia	1c8761111f	Router - reduce p99 latency w/ redis enabled by 50% + OTEL - track pre_call hook latency (#13362 ) * feat(proxy/utils.py): track pre-call hooks in OTEL some pre call hooks can cause latency in high traffic - make sure this is tracked * fix(router.py): move redis call on deployment_callback_on_success to pipeline operation reduces p99 latency by half when redis is enabled * fix(parallel_request_limiter_v3.py): only run check if any item has rate limits set Prevents unnecessary latency added by rate limit checks * test: add unit tests * Latency Improvements: only track tpm/rpm usage when set on deployment+ LLM Caching - use an in-memory cache to reduce redis calls + OTEL - track time spent on LLM caching (#13472) * fix(router.py): only track usage for deployments with tpm/rpm set ensures additional latency avoided for non-tpm/rpm models * fix(caching_handler.py): log time spent on request get cache to OTEL enables easy debugging of call latency * fix(caching_handler.py): use dual cache object for in-memory caching + trace redis call within caching handler * fix(caching_handler.py): working in-memory cache for redis calls ensures dual cache works when redis cache setup for llm calls makes calls quicker by only checking redis when in-memory cache missed for llm api call * test: remove redundant test * test: add unit tests	2025-08-09 16:09:51 -07:00
Ishaan Jaff	e8c081b8ff	test_stream_chunk_builder_litellm_usage_chunks	2025-08-07 15:22:52 -07:00
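
The off-by-one fix described in commit f4318bccd3 can be illustrated with a minimal sketch. Only the two comparison expressions come from the commit message. The function names, and the assumption that the counter already includes the current request when the check runs, are hypothetical and are not taken from the litellm source.

```python
# Minimal sketch of the off-by-one described in commit f4318bccd3.
# Assumption (not confirmed by the source): counter_value has already been
# incremented for the current request when the check runs.

def over_limit_before_fix(counter_value, current_limit):
    # Old check from the commit message: rejects when the limit is met.
    return int(counter_value) + 1 > current_limit


def over_limit_after_fix(counter_value, current_limit):
    # New check from the commit message: rejects only when the limit is exceeded.
    return int(counter_value) > current_limit


if __name__ == "__main__":
    limit = 1
    for counter in ("1", "2"):
        print(
            f"counter={counter} limit={limit} "
            f"before_fix_rejects={over_limit_before_fix(counter, limit)} "
            f"after_fix_rejects={over_limit_after_fix(counter, limit)}"
        )
```

With a limit of 1, the old check rejects the first request (counter 1), while the new check admits it and still rejects a second concurrent request (counter 2).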

746 Commits