mirror of
https://github.com/tiennm99/litellm.git
synced 2026-06-24 19:39:12 +00:00
4be0ec8e35
* feat(parallel_request_limiter_v3.py): allows admin to enforce token rate limit based on just output tokens

  Useful when trying to rate limit for primarily self hosted model use-cases

* test(test_parallel_request_limiter_v3.py): add unit test for token rate limit type

* feat(parallel_request_limiter_v3.py): return remaining token limits in header

* feat: return rate limit headers in response

* feat(parallel_request_limiter_v3.py): working rate limit response headers

* feat(parallel_request_limiter_v3.py): fix rate limit tracking for tpm when rpm also set

* feat(parallel_request_limiter_v3.py): show headers for key/user/team

* feat(parallel_request_limiter_v3.py): decrement max parallel request limiter on failure event

* feat(parallel_request_limiter_v3.py): add in-memory cache implementation of parallel request rate limiter

  allows rate limiter to work even without redis cache setup

  Work for GA of parallel request limiter v3

* refactor(proxy/hooks/__init__.py): replace with new parallel request handler

* test: update testing

* fix: fix ruff check

* fix: revert ga of multi instance rate limiting - needs more work to pass testing
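
For readers unfamiliar with the idea, here is a minimal sketch of what an in-memory token rate limiter that counts only output tokens and reports the remaining budget as response headers might look like. It is not the code in `parallel_request_limiter_v3.py`: the class `InMemoryTokenLimiter`, its methods, the `count_mode` parameter, and the header names are all hypothetical and chosen only for illustration.

```python
import time
from collections import defaultdict, deque
from typing import Deque, Dict, Optional, Tuple


class InMemoryTokenLimiter:
    """Hypothetical sliding-window tokens-per-minute limiter (illustration only)."""

    def __init__(self, tpm_limit: int, window_seconds: float = 60.0,
                 count_mode: str = "output") -> None:
        if count_mode not in ("output", "input", "total"):
            raise ValueError(f"unknown count_mode: {count_mode}")
        self.tpm_limit = tpm_limit
        self.window_seconds = window_seconds
        self.count_mode = count_mode
        # Per-key (timestamp, counted_tokens) events, kept in process memory,
        # so no Redis instance is needed.
        self._events: Dict[str, Deque[Tuple[float, int]]] = defaultdict(deque)

    def _used(self, key: str, now: float) -> int:
        """Drop events older than the window and sum what is left."""
        events = self._events[key]
        while events and now - events[0][0] >= self.window_seconds:
            events.popleft()
        return sum(tokens for _, tokens in events)

    def check(self, key: str, now: Optional[float] = None) -> bool:
        """Return True if the key still has token budget in the current window."""
        now = time.monotonic() if now is None else now
        return self._used(key, now) < self.tpm_limit

    def record(self, key: str, prompt_tokens: int, completion_tokens: int,
               now: Optional[float] = None) -> None:
        """Record usage after a response; which tokens count depends on count_mode."""
        now = time.monotonic() if now is None else now
        if self.count_mode == "output":
            counted = completion_tokens
        elif self.count_mode == "input":
            counted = prompt_tokens
        else:
            counted = prompt_tokens + completion_tokens
        self._events[key].append((now, counted))

    def headers(self, key: str, now: Optional[float] = None) -> Dict[str, str]:
        """Build illustrative rate limit headers (names are assumptions)."""
        now = time.monotonic() if now is None else now
        remaining = max(0, self.tpm_limit - self._used(key, now))
        return {
            "x-ratelimit-limit-tokens": str(self.tpm_limit),
            "x-ratelimit-remaining-tokens": str(remaining),
        }
```

In this sketch a caller would run `check()` before forwarding a request, `record()` once token usage is known, and attach `headers()` to the response. With `count_mode="output"`, long prompts do not consume budget, which matches the self-hosted use case named in the first commit.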