litellm

mirror of https://github.com/tiennm99/litellm.git synced 2026-08-02 08:21:53 +00:00

Files

Krish Dholakia and GitHub 1c8761111f Router - reduce p99 latency w/ redis enabled by 50% + OTEL - track pre_call hook latency (#13362 )

* feat(proxy/utils.py): track pre-call hooks in OTEL

some pre call hooks can cause latency in high traffic - make sure this is tracked

* fix(router.py): move redis call on deployment_callback_on_success to pipeline operation

reduces p99 latency by half when redis is enabled

* fix(parallel_request_limiter_v3.py): only run check if any item has rate limits set

Prevents unnecessary latency added by rate limit checks

* test: add unit tests

* Latency Improvements: only track tpm/rpm usage when set on deployment + LLM Caching - use an in-memory cache to reduce redis calls + OTEL - track time spent on LLM caching (#13472)

* fix(router.py): only track usage for deployments with tpm/rpm set

ensures additional latency avoided for non-tpm/rpm models

* fix(caching_handler.py): log time spent on request get cache to OTEL

enables easy debugging of call latency

* fix(caching_handler.py): use dual cache object for in-memory caching + trace redis call within caching handler

* fix(caching_handler.py): working in-memory cache for redis calls

ensures dual cache works when a redis cache is set up for llm calls

makes calls quicker by only checking redis when the in-memory cache misses for an llm api call

* test: remove redundant test

* test: add unit tests
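
The dual-cache behaviour described in the commit message above (check the in-memory cache first, and query redis only on a miss) can be sketched roughly as follows. This is a minimal illustration, not the code in caching_handler.py: the class names `InMemoryCache`, `FakeRemoteCache`, and `DualCacheSketch` are hypothetical, and the redis side is replaced by an in-process stand-in that counts lookups so the effect is visible without a server.

```python
from typing import Any, Dict, Optional


class InMemoryCache:
    """Hypothetical process-local cache backed by a dict."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}

    def get(self, key: str) -> Optional[Any]:
        return self._store.get(key)

    def set(self, key: str, value: Any) -> None:
        self._store[key] = value


class FakeRemoteCache:
    """Hypothetical stand-in for a redis-backed cache; counts lookups."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}
        self.get_calls = 0

    def get(self, key: str) -> Optional[Any]:
        self.get_calls += 1  # each call models one network round trip
        return self._store.get(key)

    def set(self, key: str, value: Any) -> None:
        self._store[key] = value


class DualCacheSketch:
    """Read from the local cache first; fall back to the remote cache on a miss."""

    def __init__(self, local: InMemoryCache, remote: FakeRemoteCache) -> None:
        self.local = local
        self.remote = remote

    def get(self, key: str) -> Optional[Any]:
        value = self.local.get(key)
        if value is not None:
            return value  # local hit: no remote lookup needed
        value = self.remote.get(key)
        if value is not None:
            self.local.set(key, value)  # populate local cache for later reads
        return value

    def set(self, key: str, value: Any) -> None:
        # Write to both layers so other processes can see the value remotely.
        self.local.set(key, value)
        self.remote.set(key, value)


remote = FakeRemoteCache()
remote.set("prompt-hash", "cached response")  # value written by another worker
cache = DualCacheSketch(InMemoryCache(), remote)

first = cache.get("prompt-hash")   # local miss, one remote lookup
second = cache.get("prompt-hash")  # served from the local cache
print(first, second, remote.get_calls)
```

In this sketch, repeated reads of the same key cost one remote lookup in total; a real setup would also need expiry on the local layer so stale entries do not outlive the remote ones.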

2025-08-09 16:09:51 -07:00

.litellm_cache

…

auto_router

[Feat] Backend Router - Add Auto-Router powered by semantic-router (#12955 )

2025-07-24 18:32:56 -07:00

example_config_yaml

test: update tests to new deployment model (#10142 )

2025-04-18 14:22:12 -07:00

test_configs

test: update tests to new deployment model (#10142 )

2025-04-18 14:22:12 -07:00

test_model_response_typing

LiteLLM Minor Fixes & Improvements (11/05/2024) (#6590 )

2024-11-07 04:17:05 +05:30

adroit-crow-413218-bc47f303efc9.json

vertex testing use pathrise-convert-1606954137718

2025-01-05 14:00:17 -08:00

azure_fine_tune.jsonl

…

batch_job_results_furniture.jsonl

…

cache_unit_tests.py

(code refactor) - Add BaseRerankConfig. Use BaseRerankConfig for cohere/rerank and azure_ai/rerank (#7319 )

2024-12-19 17:03:34 -08:00

conftest.py

ci(conftest.py): reset conftest.py for local_testing/ (#6657 )

2024-11-08 19:14:16 +05:30

create_mock_standard_logging_payload.py

[Bug Fix]: Errors in LiteLLM When Using Embeddings Model with Usage-Based Routing (#7390 )

2024-12-23 17:42:24 -08:00

data_map.txt

…

eagle.wav

…

example.jsonl

VertexAI non-jsonl file storage support (#9781 )

2025-04-09 14:01:48 -07:00

gettysburg.wav

…

large_text.py

…

model_cost.json

…

openai_batch_completions_router.jsonl

…

openai_batch_completions.jsonl

…

speech_vertex.mp3

…

stream_chunk_testdata.py

…

test_acompletion_fallbacks.py

(core sdk fix) - fix fallbacks stuck in infinite loop (#7751 )

2025-01-13 19:34:34 -08:00

test_acompletion.py

Complete o3 model support (#8183 )

2025-02-02 22:36:37 -08:00

test_acooldowns_router.py

test: update tests to new deployment model (#10142 )

2025-04-18 14:22:12 -07:00

test_add_function_to_prompt.py

LiteLLM Minor Fixes & Improvements (11/05/2024) (#6590 )

2024-11-07 04:17:05 +05:30

test_add_update_models.py

Allow team admins to add/update/delete models on UI + show api base and model id on request logs (#9572 )

2025-03-27 12:06:31 -07:00

test_aim_guardrails.py

Litellm dev 07 05 2025 p3 (#12349 )

2025-07-05 18:44:00 -07:00

test_alangfuse.py

test: update tests to new deployment model (#10142 )

2025-04-18 14:22:12 -07:00

test_amazing_vertex_completion.py

Router - reduce p99 latency w/ redis enabled by 50% + OTEL - track pre_call hook latency (#13362 )

2025-08-09 16:09:51 -07:00

test_anthropic_prompt_caching.py

LiteLLM Minor Fixes & Improvements (01/16/2025) - p2 (#7828 )

2025-02-02 23:17:50 -08:00

test_arize_ai.py

Merge branch 'main' into litellm_arize_dynamic_logging

2025-03-18 22:13:35 -07:00

test_arize_phoenix.py

fix arize config tests

2025-05-13 20:21:14 -07:00

test_assistants.py

test_create_delete_assistants

2025-07-15 21:35:25 -07:00

test_async_fn.py

test_text_completion_stream - hf

2025-07-03 16:00:51 -07:00

test_audio_speech.py

feat(speech/): working gemini tts support via openai's /v1/speech endpoint (#11832 )

2025-06-18 10:36:25 -07:00

test_auth_utils.py

fix(auth_utils): make header comparison case-insensitive (#12950 )

2025-07-24 22:06:12 -07:00

test_azure_content_safety.py

(refactor) caching use LLMCachingHandler for async_get_cache and set_cache (#6208 )

2024-10-14 16:34:01 +05:30

test_azure_openai.py

test: fixes

2025-05-31 12:42:56 -07:00

test_azure_perf.py

test: update tests to new deployment model (#10142 )

2025-04-18 14:22:12 -07:00

test_bad_params.py

test_completion_invalid_param_cohere

2025-04-02 06:49:11 -07:00

test_basic_python_version.py

[MCP Gateway] Litellm mcp client list fail (#13114 )

2025-07-30 15:23:19 -07:00

test_batch_completion_return_exceptions.py

…

test_batch_completions.py

test fix: gcp deprecated gemini-1.5-flash

2025-08-06 08:43:45 -07:00

test_blocked_user_list.py

(docs) add docstrings for all /key, /user, /team, /customer endpoints (#6804 )

2024-11-18 19:44:06 -08:00

test_braintrust.py

Litellm dev 01 07 2025 p3 (#7635 )

2025-01-08 11:46:24 -08:00

test_budget_manager.py

…

test_caching_handler.py

[LLM Translation - Redis] fix: redis caching for embedding response models (#12750 )

2025-07-18 16:31:10 -07:00

test_caching_ssl.py

test: update tests

2025-05-20 13:08:47 -07:00

test_caching.py

[LLM Translation] Fix Model Usage not having text tokens (#13234 )

2025-08-04 21:06:49 +05:30

test_class.py

test: update tests to new deployment model (#10142 )

2025-04-18 14:22:12 -07:00

test_completion_cost.py

test_completion_cost_deepseek

2025-07-23 13:16:12 -07:00

test_completion_with_retries.py

fix(main.py): fix retries being multiplied when using openai sdk (#7221 )

2024-12-14 11:56:55 -08:00

test_completion.py

test fix: gcp deprecated gemini-1.5-flash

2025-08-06 08:43:45 -07:00

test_config.py

test: update tests to new deployment model (#10142 )

2025-04-18 14:22:12 -07:00

test_cost_calc.py

test(test_cost_calc.py): fix test to handle llm api errors

2024-12-24 16:49:02 -08:00

test_custom_api_logger.py

…

test_custom_callback_input.py

test_async_embedding_azure_caching - flaky test

2025-06-14 13:55:29 -07:00

test_custom_callback_router.py

Litellm dev 04 30 2025 p1 (#10462 )

2025-04-30 22:11:12 -07:00

test_custom_llm.py

test: update test with new kwargs

2025-06-11 22:19:17 -07:00

test_custom_logger.py

test: update tests to new deployment model (#10142 )

2025-04-18 14:22:12 -07:00

test_disk_cache_unit_tests.py

LiteLLM Minor Fixes & Improvements (11/12/2024) (#6705 )

2024-11-12 22:50:51 +05:30

test_dual_cache.py

(code refactor) - Add BaseRerankConfig. Use BaseRerankConfig for cohere/rerank and azure_ai/rerank (#7319 )

2024-12-19 17:03:34 -08:00

test_dynamic_rate_limit_handler.py

LiteLLM Minor Fixes & Improvements (10/15/2024) (#6242 )

2024-10-16 07:32:06 -07:00

test_dynamodb_logs.py

…

test_embedding.py

feat(JinaAI): support multimodal embedding models (#13181 )

2025-08-05 19:21:56 -07:00

test_exceptions.py

Revert "Litellm dev 07 21 2025 p1 (#12848 )"

2025-07-22 18:28:36 -07:00

test_file_types.py

…

test_function_call_parsing.py

…

test_function_calling.py

Revert "Litellm dev 07 21 2025 p1 (#12848 )"

2025-07-22 18:28:36 -07:00

test_function_setup.py

…

test_gcs_bucket.py

test: update tests to new deployment model (#10142 )

2025-04-18 14:22:12 -07:00

test_gcs_cache_unit_tests.py

Add GCS bucket caching support (#13122 )

2025-08-04 16:09:33 -07:00

test_get_llm_provider.py

test_default_api_base

2025-07-04 18:26:54 -07:00

test_get_model_file.py

…

test_get_model_info.py

test whitelisted models

2025-06-28 14:46:16 -07:00

test_get_optional_params_embeddings.py

…

test_get_optional_params_functions_not_supported.py

…

test_google_ai_studio_gemini.py

…

test_guardrails_ai.py

LiteLLM Minor Fixes & Improvements (10/15/2024) (#6242 )

2024-10-16 07:32:06 -07:00

test_health_check.py

test: update tests to new deployment model (#10142 )

2025-04-18 14:22:12 -07:00

test_helicone_integration.py

test: update tests to new deployment model (#10142 )

2025-04-18 14:22:12 -07:00

test_http_parsing_utils.py

2025-07-10 18:20:41 -07:00

test_img_resize.py

fix: Support WebP image format and avoid token calculation error (#7182 )

2024-12-12 14:32:39 -08:00

test_lakera_ai_prompt_injection.py

Merge pull request #9222 from BerriAI/litellm_snowflake_pr_mar_13

2025-03-13 21:35:39 -07:00

test_langchain_ChatLiteLLM.py

…

test_langsmith.py

Litellm dev 11 30 2024 (#6974 )

2024-12-02 21:03:33 -08:00

test_least_busy_routing.py

test: update tests to new deployment model (#10142 )

2025-04-18 14:22:12 -07:00

test_litellm_max_budget.py

…

test_literalai.py

…

test_llm_guard.py

[Refactor] Move LLM Guard, Secret Detection to Enterprise Pip package (#10782 )

2025-05-13 09:42:22 -07:00

test_load_test_router_s3.py

test: update tests to new deployment model (#10142 )

2025-04-18 14:22:12 -07:00

test_loadtest_router.py

test: update tests to new deployment model (#10142 )

2025-04-18 14:22:12 -07:00

test_logfire.py

…

test_logging.py

LiteLLM Minor Fixes & Improvements (11/05/2024) (#6590 )

2024-11-07 04:17:05 +05:30

test_longer_context_fallback.py

…

test_lowest_cost_routing.py

test: update tests to new deployment model (#10142 )

2025-04-18 14:22:12 -07:00

test_lowest_latency_routing.py

test: update tests to new deployment model (#10142 )

2025-04-18 14:22:12 -07:00

test_lunary.py

…

test_max_tpm_rpm_limiter.py

(refactor) caching use LLMCachingHandler for async_get_cache and set_cache (#6208 )

2024-10-14 16:34:01 +05:30

test_mem_leak.py

LiteLLM Minor Fixes & Improvements (10/30/2024) (#6519 )

2024-11-02 00:44:32 +05:30

test_mem_usage.py

test: update tests to new deployment model (#10142 )

2025-04-18 14:22:12 -07:00

test_mock_request.py

test: update tests to new deployment model (#10142 )

2025-04-18 14:22:12 -07:00

test_model_alias_map.py

test: fix test

2025-04-16 07:57:10 -07:00

test_model_max_token_adjust.py

…

test_multiple_deployments.py

…

test_ollama_local_chat.py

…

test_ollama_local.py

…

test_ollama.py

Ensure consistent 'created' across all chunks + set tool call id for ollama streaming calls (#11528 )

2025-06-07 20:50:07 -07:00

test_openai_moderations_hook.py

(refactor) caching use LLMCachingHandler for async_get_cache and set_cache (#6208 )

2024-10-14 16:34:01 +05:30

test_opik.py

…

test_pass_through_endpoints.py

oops

2025-03-11 08:27:36 -04:00

test_profiling_router.py

…

test_prometheus_service.py

Embedding caching fixes - handle str -> list cache, set usage tokens for cache hits, combine usage tokens on partial cache hits (#10424 )

2025-04-29 21:21:28 -07:00

test_prompt_caching.py

LiteLLM Minor Fixes & Improvements (12/05/2024) (#7037 )

2024-12-05 00:02:31 -08:00

test_prompt_injection_detection.py

test: update tests to new deployment model (#10142 )

2025-04-18 14:22:12 -07:00

test_promptlayer_integration.py

LiteLLM Minor Fixes & Improvements (11/05/2024) (#6590 )

2024-11-07 04:17:05 +05:30

test_provider_specific_config.py

test: update tests to new deployment model (#10142 )

2025-04-18 14:22:12 -07:00

test_pydantic_namespaces.py

…

test_pydantic.py

…

test_register_model.py

…

test_router_auto_router.py

test_router_auto_router

2025-07-26 13:33:53 -07:00

test_router_batch_completion.py

Anthropic - working mid-stream fallbacks (#13149 )

2025-07-31 21:22:49 -07:00

test_router_budget_limiter.py

test_provider_budgets_e2e_test_expect_to_fail

2025-07-19 16:00:25 -07:00

test_router_caching.py

test: update tests to new deployment model (#10142 )

2025-04-18 14:22:12 -07:00

test_router_client_init.py

test: update tests to new deployment model (#10142 )

2025-04-18 14:22:12 -07:00

test_router_cooldowns.py

[Bug fix] Router - handle cooldown_time = 0 for deployments (#12108 )

2025-06-27 17:50:35 -07:00

test_router_custom_routing.py

…

test_router_debug_logs.py

test: update tests to new deployment model (#10142 )

2025-04-18 14:22:12 -07:00

test_router_fallback_handlers.py

(Feat) - return x-litellm-attempted-fallbacks in responses from litellm proxy (#8558 )

2025-02-15 14:54:23 -08:00

test_router_fallbacks.py

ci/cd new release

2025-07-23 13:50:36 -07:00

test_router_get_deployments.py

set flaky tests as flaky

2025-06-14 13:51:52 -07:00

test_router_init.py

test: update tests to new deployment model (#10142 )

2025-04-18 14:22:12 -07:00

test_router_max_parallel_requests.py

fix(lowest_tpm_rpm_routing.py): fix parallel rate limit check (#6577 )

2024-11-05 22:03:44 +05:30

test_router_pattern_matching.py

(code quality) run ruff rule to ban unused imports (#7313 )

2024-12-19 12:33:42 -08:00

test_router_policy_violation.py

test: update tests to new deployment model (#10142 )

2025-04-18 14:22:12 -07:00

test_router_retries.py

test: update tests to new deployment model (#10142 )

2025-04-18 14:22:12 -07:00

test_router_tag_routing.py

fix: Do not add default model on tag based-routing when valid tag (#11454 )

2025-06-12 13:18:42 -07:00

test_router_timeout.py

test: update tests to new deployment model (#10142 )

2025-04-18 14:22:12 -07:00

test_router_utils.py

Router - reduce p99 latency w/ redis enabled by 50% + OTEL - track pre_call hook latency (#13362 )

2025-08-09 16:09:51 -07:00

test_router_with_fallbacks.py

…

test_router.py

test fix: gcp deprecated gemini-1.5-flash

2025-08-06 08:43:45 -07:00

test_rules.py

…

test_sagemaker.py

test: mock sagemaker tests

2025-03-21 16:21:18 -07:00

test_scheduler.py

…

test_secret_detect_hook.py

[Refactor] Move LLM Guard, Secret Detection to Enterprise Pip package (#10782 )

2025-05-13 09:42:22 -07:00

test_simple_shuffle.py

…

test_spend_calculate_endpoint.py

…

test_stream_chunk_builder.py

test_stream_chunk_builder_litellm_usage_chunks

2025-08-07 15:22:52 -07:00

test_streaming.py

test_completion_gemini_stream

2025-08-07 13:24:00 -07:00

test_supabase_integration.py

…

test_team_config.py

…

test_text_completion.py

test fix: gcp deprecated gemini-1.5-flash

2025-08-06 08:43:45 -07:00

test_timeout.py

Update fireworks ai pricing (#10425 )

2025-04-29 20:58:05 -07:00

test_together_ai.py

…

test_tpm_rpm_routing_v2.py

test: update tests to new deployment model (#10142 )

2025-04-18 14:22:12 -07:00

test_traceloop.py

test: skip redundant test

2025-02-10 22:13:58 -08:00

test_ui_sso_helper_utils.py

LiteLLM Minor Fixes & Improvements (10/17/2024) (#6293 )

2024-10-17 22:09:11 -07:00

test_unit_test_caching.py

(Bug fix) - don't log messages in model_parameters in StandardLoggingPayload (#8932 )

2025-03-01 13:39:45 -08:00

test_update_spend.py

test_batch_update_spend

2025-04-01 07:12:29 -07:00

test_validate_environment.py

…

test_wandb.py

LiteLLM Minor Fixes & Improvements (11/05/2024) (#6590 )

2024-11-07 04:17:05 +05:30

test_whisper.py

refactor: update model handling in Azure and OpenAI audio transcription classes (#11333 )

2025-06-02 16:25:51 -07:00

user_cost.json

…

vertex_ai.jsonl

…

vertex_batch_completions.jsonl

(feat) add Vertex Batches API support in OpenAI format (#7032 )

2024-12-04 19:40:28 -08:00

vertex_key.json

ci/cd update vertex acct

2025-01-05 13:43:32 -08:00

whitelisted_bedrock_models.txt

Add supports_pdf_input: true to Claude 3.7 bedrock models (#9917 )

2025-05-01 14:56:54 -07:00