litellm/tests/local_testing
Krish Dholakia 1c8761111f Router - reduce p99 latency w/ redis enabled by 50% + OTEL - track pre_call hook latency (#13362)
* feat(proxy/utils.py): track pre-call hooks in OTEL

some pre-call hooks can add latency under high traffic - make sure this is tracked

* fix(router.py): move redis call on deployment_callback_on_success to pipeline operation

reduces p99 latency by half when redis is enabled

* fix(parallel_request_limiter_v3.py): only run check if any item has rate limits set

Prevents unnecessary latency added by rate limit checks

* test: add unit tests
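
A minimal sketch of the pipelining idea in the router.py item above: queue the per-deployment counter updates and send them in one round trip instead of one call per counter. `FakeRedis`, `FakePipeline`, `record_usage`, and the `:tpm` / `:rpm` key names are hypothetical stand-ins for illustration, not the router's actual code; only the `pipeline()` / `incrby()` / `execute()` call shape mirrors redis-py.

```python
from typing import Dict, List, Tuple


class FakePipeline:
    """Hypothetical stand-in for a Redis pipeline: queues commands, applies them in one batch."""

    def __init__(self, store: Dict[str, int], stats: Dict[str, int]) -> None:
        self._store = store
        self._stats = stats
        self._queued: List[Tuple[str, int]] = []

    def incrby(self, key: str, amount: int) -> "FakePipeline":
        # Nothing is sent yet; the command is only queued.
        self._queued.append((key, amount))
        return self

    def execute(self) -> List[int]:
        # One simulated network round trip for every queued command.
        self._stats["round_trips"] += 1
        results = []
        for key, amount in self._queued:
            self._store[key] = self._store.get(key, 0) + amount
            results.append(self._store[key])
        self._queued.clear()
        return results


class FakeRedis:
    """Hypothetical in-memory client exposing only pipeline()."""

    def __init__(self) -> None:
        self.store: Dict[str, int] = {}
        self.stats = {"round_trips": 0}

    def pipeline(self) -> FakePipeline:
        return FakePipeline(self.store, self.stats)


def record_usage(client: FakeRedis, deployment_id: str, tokens: int) -> List[int]:
    """Update token and request counters for a deployment in a single batch."""
    pipe = client.pipeline()
    pipe.incrby(f"{deployment_id}:tpm", tokens)  # assumed key naming
    pipe.incrby(f"{deployment_id}:rpm", 1)       # assumed key naming
    return pipe.execute()
```

With a real client the saving comes from paying network latency once per request rather than once per counter, which is where tail latency tends to accumulate under load.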

* Latency Improvements: only track tpm/rpm usage when set on deployment + LLM Caching - use an in-memory cache to reduce redis calls + OTEL - track time spent on LLM caching (#13472)

* fix(router.py): only track usage for deployments with tpm/rpm set

ensures no additional latency is added for models without tpm/rpm set

* fix(caching_handler.py): log time spent on request get cache to OTEL

enables easy debugging of call latency

* fix(caching_handler.py): use dual cache object for in-memory caching + trace redis call within caching handler

* fix(caching_handler.py): working in-memory cache for redis calls

ensures dual cache works when a redis cache is set up for llm calls

makes calls quicker by only checking redis when the in-memory cache misses for an llm api call

* test: remove redundant test

* test: add unit tests
2025-08-09 16:09:51 -07:00
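
The dual cache behaviour described in the caching_handler.py items can be sketched roughly as follows: read the in-memory layer first and fall back to redis only on a miss. `DualCacheSketch` is a hypothetical illustration, with a plain dict standing in for redis; it is not the library's own dual cache class, and it omits TTLs and eviction, which a real cache would need.

```python
from typing import Any, Dict, Optional


class DualCacheSketch:
    """Hypothetical two-layer cache: local dict first, remote store on a miss."""

    def __init__(self, remote: Dict[str, Any]) -> None:
        self.local: Dict[str, Any] = {}
        self.remote = remote      # a dict standing in for redis (assumption)
        self.remote_reads = 0     # counts simulated redis lookups

    def get(self, key: str) -> Optional[Any]:
        # Fast path: serve from memory without touching the remote store.
        if key in self.local:
            return self.local[key]
        # Slow path: one remote lookup, then remember the result locally.
        self.remote_reads += 1
        value = self.remote.get(key)
        if value is not None:
            self.local[key] = value
        return value

    def set(self, key: str, value: Any) -> None:
        # Write through to both layers so other processes can see the entry.
        self.local[key] = value
        self.remote[key] = value
```

Repeated reads of the same key then cost one remote lookup in total, at the price of possibly serving a stale local entry until it is refreshed.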