mirror of https://github.com/tiennm99/litellm.git
synced 2026-06-18 07:33:58 +00:00
1c8761111f
* feat(proxy/utils.py): track pre-call hooks in OTEL

Some pre-call hooks can add latency under high traffic; make sure this is tracked.

* fix(router.py): move redis call on deployment_callback_on_success to a pipeline operation

Reduces p99 latency by half when redis is enabled.

* fix(parallel_request_limiter_v3.py): only run check if any item has rate limits set

Prevents unnecessary latency added by rate limit checks.

* test: add unit tests

* Latency Improvements: only track tpm/rpm usage when set on deployment + LLM Caching - use an in-memory cache to reduce redis calls + OTEL - track time spent on LLM caching (#13472)

* fix(router.py): only track usage for deployments with tpm/rpm set

Ensures the additional latency is avoided for non-tpm/rpm models.

* fix(caching_handler.py): log time spent on the request's get-cache step to OTEL

Enables easy debugging of call latency.

* fix(caching_handler.py): use dual cache object for in-memory caching + trace redis call within caching handler

* fix(caching_handler.py): working in-memory cache for redis calls

Ensures the dual cache works when a redis cache is set up for LLM calls. Makes calls quicker by only checking redis when the in-memory cache misses for an LLM API call.

* test: remove redundant test

* test: add unit tests
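The dual-cache change in `caching_handler.py` follows a read-through pattern: look in a local in-memory cache first and only go to redis on a miss, then keep a local copy of whatever redis returns. The sketch below illustrates that pattern under stated assumptions. `FakeRedis` and `DualCacheSketch` are hypothetical names invented for illustration (they are not litellm's caching classes), the in-memory layer uses a simple TTL, and `FakeRedis` stands in for a real redis client so the saved round trips can be counted.

```python
import time
from typing import Any, Dict, Optional, Tuple


class FakeRedis:
    """Hypothetical stand-in for a remote redis client; counts get() round trips."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}
        self.get_calls = 0

    def get(self, key: str) -> Optional[Any]:
        self.get_calls += 1  # each call would be a network round trip in practice
        return self._store.get(key)

    def set(self, key: str, value: Any) -> None:
        self._store[key] = value


class DualCacheSketch:
    """Hypothetical read-through cache: in-memory first, redis only on a miss."""

    def __init__(self, redis: FakeRedis, ttl_seconds: float = 60.0) -> None:
        self.redis = redis
        self.ttl = ttl_seconds
        # key -> (expiry time on the monotonic clock, cached value)
        self._memory: Dict[str, Tuple[float, Any]] = {}

    def get(self, key: str) -> Optional[Any]:
        entry = self._memory.get(key)
        if entry is not None:
            expires_at, value = entry
            if time.monotonic() < expires_at:
                return value  # in-memory hit: no redis round trip
            del self._memory[key]  # expired local entry, fall through to redis
        value = self.redis.get(key)  # in-memory miss: ask redis
        if value is not None:
            # populate the local layer so the next lookup skips redis
            self._memory[key] = (time.monotonic() + self.ttl, value)
        return value

    def set(self, key: str, value: Any) -> None:
        # write to both layers so later reads on this process stay local
        self._memory[key] = (time.monotonic() + self.ttl, value)
        self.redis.set(key, value)


if __name__ == "__main__":
    redis = FakeRedis()
    redis.set("prompt-hash", "cached response")
    cache = DualCacheSketch(redis)
    cache.get("prompt-hash")  # local miss, one redis get
    cache.get("prompt-hash")  # served from memory
    print(redis.get_calls)  # 1
```

The trade-off is staleness: a value held locally can lag behind redis for up to the TTL, which is usually acceptable for LLM response caching but worth choosing deliberately.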
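The `router.py` and `parallel_request_limiter_v3.py` entries share one idea: when no deployment has a tpm or rpm limit configured, skip the rate-limit and usage-tracking work entirely, and when some do, track usage only for those. A minimal sketch of that gating, assuming a hypothetical `Deployment` record whose `tpm`/`rpm` fields are `None` when unset and a hypothetical `UsageTracker` whose counter stands in for the remote (e.g. redis) calls that would otherwise be made:

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Deployment:
    """Hypothetical deployment record; tpm/rpm are None when no limit is configured."""
    name: str
    tpm: Optional[int] = None
    rpm: Optional[int] = None


def has_rate_limits(deployments: List[Deployment]) -> bool:
    """True if at least one deployment sets a tpm or rpm limit."""
    return any(d.tpm is not None or d.rpm is not None for d in deployments)


class UsageTracker:
    """Hypothetical tracker; remote_calls counts the work a real backend would do."""

    def __init__(self) -> None:
        self.remote_calls = 0

    def check_and_track(self, deployments: List[Deployment]) -> None:
        if not has_rate_limits(deployments):
            return  # nothing to enforce: skip all rate-limit work
        for d in deployments:
            if d.tpm is None and d.rpm is None:
                continue  # only track usage for deployments with limits set
            self.remote_calls += 1  # stands in for a redis increment/check


if __name__ == "__main__":
    tracker = UsageTracker()
    tracker.check_and_track([Deployment("a"), Deployment("b")])
    tracker.check_and_track([Deployment("a"), Deployment("b", rpm=100)])
    print(tracker.remote_calls)  # 1
```

The early `any(...)` check is cheap and local, so requests to models without limits pay essentially nothing for the rate limiter.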