litellm/tests/test_litellm/proxy/hooks
Krish Dholakia 4be0ec8e35 GA Multi-instance rate limiting v2 Requirements + New - specify token rate limit type - output / input / total (#11646)
* feat(parallel_request_limiter_v3.py): allows admin to enforce token rate limit based on just output tokens

Useful when rate limiting primarily self-hosted model use cases

* test(test_parallel_request_limiter_v3.py): add unit test for token rate limit type

* feat(parallel_request_limiter_v3.py): return remaining token limits in header

* feat: return rate limit headers in response

* feat(parallel_request_limiter_v3.py): working rate limit response headers

* feat(parallel_request_limiter_v3.py): fix rate limit tracking for tpm when rpm also set

* feat(parallel_request_limiter_v3.py): show headers for key/user/team

* feat(parallel_request_limiter_v3.py): decrement max parallel request limiter on failure event

* feat(parallel_request_limiter_v3.py): add in-memory cache implementation of parallel request rate limiter

Allows the rate limiter to work even without a Redis cache set up

Work for GA of parallel request limiter v3

* refactor(proxy/hooks/__init__.py): replace with new parallel request handler

* test: update testing

* fix: fix ruff check

* fix: revert ga of multi instance rate limiting - needs more work to pass testing
2025-06-11 22:05:13 -07:00