mirror of
https://github.com/tiennm99/litellm.git
synced 2026-06-25 07:07:41 +00:00
c42740a4b9
* refactor: comment out circuit breaker
  causes incorrect rate limiting in high traffic
* fix(base_routing_strategy.py): don't reset value if redis val is lower than current in-memory value
  Fixes issue where redis might be trailing in-memory value
* fix(parallel_request_limiter_v2.py): if in-memory higher than redis, don't reset value; add previous slot keys to redis increment to correctly 'get' them
* fix(parallel_request_limiter_v3.py): v3 implementation of parallel request limiter
  does not use background redis syncing - increments redis in call
  simplify rate limiting logic, to improve accuracy
* fix: fix ruff errors
* fix(parallel_request_limiter_v3.py): don't decrement limit on post call success
  causes double decrements
* fix(parallel_request_limiter_v3.py): working accurate multi-instance logic
  ensured just 100 requests allowed on 100 users, 10 ramp up, 100 rpm limit key, 2 instances
* fix(parallel_request_limiter_v3.py): working accurate rate limiting with time window resets
  allows rate limiting to work across multiple windows
* test: add unit tests for v3 rate limiter
* fix(parallel_request_limiter_v3.py): return window value into in-memory cache
  allows in-memory cache checks to be used correctly
* refactor(parallel_request_limiter_v3.py): refactor rate limiting to work for multiple window/counter key pairs
  enables using for user/team/model rate limiting
* feat(parallel_request_limiter_v3.py): working rate limiting, across key/user/team/end-user
* fix(parallel_request_limiter_v3.py): add model specific rate limiting
* fix(parallel_request_limiter_v3.py): ignore if no rate limits set
  skip unnecessary rate limit checks - if no limits set
* fix(parallel_request_limiter_v3.py): initial commit bringing token rate limits back
* fix(parallel_request_limiter_v3.py): increment by value in list + update assertions to handle tokens + max parallel requests
* test(parallel_request_limiter_v3.py): more testing
* fix(parallel_request_limiter.py): working in-memory cache limiter
* fix(redis_cache.py): ignore linting error - use safe hasattr
* fix(parallel_request_limiter_v3.py): fix linting error
* refactor: remove redundant parallel_request_limiter_v2.py
  old / inaccurate implementation
* test: update tests
* style: cleanup
* test: update test
* docs(config_settings.md): document new env var
* test(test_base_routing_strategy.py): update test
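
Two ideas from the commit message above are easy to lose in prose: keeping the in-memory count when the shared store is trailing it, and resetting a counter when its time window expires. The following is a minimal single-process sketch of both, not the code in `parallel_request_limiter_v3.py`. The names `merge_counter` and `FixedWindowLimiter` are hypothetical, and the Redis side (incrementing in the request path) is not modeled here.

```python
import time
from typing import Callable, Dict, Optional, Tuple


def merge_counter(in_memory: int, shared: Optional[int]) -> int:
    """Hypothetical helper: never let a lagging shared value lower the local count."""
    if shared is None:
        # Nothing in the shared store yet, so the local count stands.
        return in_memory
    return max(in_memory, shared)


class FixedWindowLimiter:
    """Hypothetical in-memory limiter: one (window_start, count) pair per key."""

    def __init__(
        self,
        limit: int,
        window_seconds: float,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so the window reset can be tested
        self._state: Dict[str, Tuple[float, int]] = {}

    def allow(self, key: str) -> bool:
        now = self.clock()
        window_start, count = self._state.get(key, (now, 0))
        if now - window_start >= self.window_seconds:
            # The window has expired: start a new one with a fresh counter.
            window_start, count = now, 0
        if count >= self.limit:
            self._state[key] = (window_start, count)
            return False
        self._state[key] = (window_start, count + 1)
        return True
```

Because each key carries its own window/counter pair, the same structure could be reused for separate key, user, team, or model limits by calling `allow` with a different key string per scope. A multi-instance deployment would additionally need an atomic increment in a shared store, which this sketch leaves out.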