Files
litellm/tests/local_testing
Krish Dholakia c42740a4b9 Simplify experimental multi-instance rate limiter - more accurate (#11424)
* refactor: comment out circuit breaker

causes incorrect rate limiting in high traffic

* fix(base_routing_strategy.py): don't reset value if redis val is lower than current in-memory value

Fixes issue where redis might be trailing in-memory value

* fix(parallel_request_limiter_v2.py): if in-memory higher than redis, don't reset value; add previous slot keys to redis increment to correctly 'get' them

* fix(parallel_request_limiter_v3.py): v3 implementation of parallel request limiter

does not use background redis syncing - increments redis in call

 simplify rate limiting logic, to improve accuracy

* fix: fix ruff errors

* fix(parallel_request_limiter_v3.py): don't decrement limit on post call success - causes double decrements

* fix(parallel_request_limiter_v3.py): working accurate multi-instance logic

ensured just 100 requests allowed on 100 users, 10 ramp up, 100 rpm limit key, 2 instances

* fix(parallel_request_limiter_v3.py): working accurate rate limiting with time window resets

allows rate limiting to work across multiple windows

* test: add unit tests for v3 rate limiter

* fix(parallel_request_limiter_v3.py): return window value into in-memory cache

allows in-memory cache checks to be used correctly

* refactor(parallel_request_limiter_v3.py): refactor rate limiting to work for multiple window/counter key pairs

enables using for user/team/model rate limiting

* feat(parallel_request_limiter_v3.py): working rate limiting, across key/user/team/end-user

* fix(parallel_request_limiter_v3.py): add model specific rate limiting

* fix(parallel_request_limiter_v3.py): ignore if no rate limits set

skip unecessary rate limit checks - if no limits set

* fix(parallel_request_limiter_v3.py): initial commit bringing token rate limits back

* fix(parallel_request_limiter_v3.py): increment by value in list + update assertions to handle tokens + max parallel requests

* test(parallel_request_limiter_v3.py): more testing

* fix(parallel_request_limiter.py): working in-memory cache limiter

* fix(redis_cache.py): ignore linting error - use safe hasattr

* fix(parallel_request_limiter_v3.py): fix linting error

* refactor: remove redundant parallel_Request_limiter_v2.py

old / inaccurate implementation

* test: update tests

* style: cleanup

* test: update test

* docs(config_settings.md): document new env var

* test(test_base_routing_strategy.py): update test
2025-06-07 11:10:55 -07:00
..
2025-05-13 20:21:14 -07:00
2025-04-19 08:09:45 -07:00
2025-05-31 12:42:56 -07:00
2025-05-20 13:08:47 -07:00
2025-06-06 20:41:00 -07:00
2025-04-16 07:57:10 -07:00
2025-03-11 08:27:36 -04:00
2025-01-22 20:19:31 +09:00
2025-03-21 16:21:18 -07:00
2025-02-10 22:13:58 -08:00
2025-04-01 07:12:29 -07:00
2025-01-05 13:43:32 -08:00