litellm

mirror of https://github.com/tiennm99/litellm.git synced 2026-06-30 13:05:52 +00:00

Files

Krish Dholakia 1c8761111f Router - reduce p99 latency w/ redis enabled by 50% + OTEL - track pre_call hook latency (#13362)

* feat(proxy/utils.py): track pre-call hooks in OTEL

Some pre-call hooks can add latency under high traffic; make sure this is tracked.

* fix(router.py): move redis call on deployment_callback_on_success to pipeline operation

Reduces p99 latency by half when Redis is enabled.

* fix(parallel_request_limiter_v3.py): only run check if any item has rate limits set

Prevents unnecessary latency added by rate limit checks

* test: add unit tests

* Latency Improvements: only track tpm/rpm usage when set on deployment + LLM Caching - use an in-memory cache to reduce redis calls + OTEL - track time spent on LLM caching (#13472)

* fix(router.py): only track usage for deployments with tpm/rpm set

Ensures the additional latency is avoided for models without tpm/rpm set.

* fix(caching_handler.py): log time spent on request get cache to OTEL

enables easy debugging of call latency

* fix(caching_handler.py): use dual cache object for in-memory caching + trace redis call within caching handler

* fix(caching_handler.py): working in-memory cache for redis calls

Ensures the dual cache works when a Redis cache is set up for LLM calls.

Makes calls quicker by only checking Redis when the in-memory cache misses for an LLM API call.

* test: remove redundant test

* test: add unit tests

2025-08-09 16:09:51 -07:00
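The caching_handler.py bullets above describe a read-through dual cache: check an in-memory cache first and go to Redis only when it misses. The following is a minimal sketch of that idea. It assumes a simple TTL dict for the in-memory layer and a stand-in `FakeRedis` that only counts network round trips. The class names are illustrative and do not reproduce litellm's own caching classes or API.

```python
import time
from typing import Any, Dict, Optional, Tuple


class InMemoryCache:
    """Process-local cache with a per-entry TTL (hypothetical helper)."""

    def __init__(self, ttl_seconds: float = 60.0) -> None:
        self.ttl_seconds = ttl_seconds
        self._store: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.monotonic() >= expires_at:
            del self._store[key]  # expired: drop it and report a miss
            return None
        return value

    def set(self, key: str, value: Any) -> None:
        self._store[key] = (time.monotonic() + self.ttl_seconds, value)


class FakeRedis:
    """Hypothetical stand-in for a shared Redis cache; counts round trips."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}
        self.round_trips = 0

    def get(self, key: str) -> Optional[Any]:
        self.round_trips += 1
        return self._store.get(key)

    def set(self, key: str, value: Any) -> None:
        self.round_trips += 1
        self._store[key] = value


class DualCache:
    """Read-through: memory first, Redis only on an in-memory miss."""

    def __init__(self, memory: InMemoryCache, redis: FakeRedis) -> None:
        self.memory = memory
        self.redis = redis

    def get(self, key: str) -> Optional[Any]:
        value = self.memory.get(key)
        if value is not None:
            return value  # in-memory hit: no Redis round trip
        value = self.redis.get(key)
        if value is not None:
            self.memory.set(key, value)  # warm the local layer for next time
        return value

    def set(self, key: str, value: Any) -> None:
        # Write through to both layers so other instances sharing Redis see it.
        self.memory.set(key, value)
        self.redis.set(key, value)


redis = FakeRedis()
cache = DualCache(InMemoryCache(ttl_seconds=60), redis)
cache.set("prompt-hash", "cached completion")  # one Redis write
for _ in range(3):
    cache.get("prompt-hash")  # all three served from memory
print(redis.round_trips)  # prints 1
```

Another proxy instance that shares Redis but has a cold local cache pays one Redis read for a key that Redis already holds, and is then served from memory until its local copy expires. Because the sketch uses `None` as the miss sentinel, a cached `None` value is treated as a miss.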

test_parallel_request_limiter_v3.py

Router - reduce p99 latency w/ redis enabled by 50% + OTEL - track pre_call hook latency (#13362)

2025-08-09 16:09:51 -07:00
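The #13362 commit message says parallel_request_limiter_v3.py now runs its check only if some item has rate limits set. The following is a minimal sketch of that short-circuit. It assumes each request is checked against a list of items (key, team, user, and so on) that may or may not carry rpm/tpm limits. `RateLimitItem`, `LimiterSketch` and `pre_call_check` are hypothetical names, not the limiter's real interface. The per-item counter read is reduced to incrementing a counter.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RateLimitItem:
    """Hypothetical: one entity a request is checked against."""
    name: str
    rpm_limit: Optional[int] = None
    tpm_limit: Optional[int] = None

    def has_limit(self) -> bool:
        return self.rpm_limit is not None or self.tpm_limit is not None


class LimiterSketch:
    """Hypothetical limiter showing only the skip-if-no-limits guard."""

    def __init__(self) -> None:
        # Stands in for reads of shared (e.g. Redis-backed) usage counters.
        self.counter_lookups = 0

    def _check_counters(self, items: List[RateLimitItem]) -> None:
        for item in items:
            if item.has_limit():
                # A real limiter would read the current window's usage here
                # and reject the request if it exceeds the limit.
                self.counter_lookups += 1

    def pre_call_check(self, items: List[RateLimitItem]) -> bool:
        """Return True if the (comparatively expensive) counter check ran."""
        if not any(item.has_limit() for item in items):
            return False  # nothing is rate limited: skip the check entirely
        self._check_counters(items)
        return True
```

The `any()` guard is a cheap in-process scan, so requests with no limits configured never touch the shared counters.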

test_parallel_request_limiter.py

test_pre_call_hook_team_rpm_limits

2025-07-11 15:52:21 -07:00

test_proxy_track_cost_callback.py

…