litellm/tests/test_litellm/proxy/middleware
Ishaan Jaff 6486db3646 fix: improve streaming proxy throughput by fixing middleware and logging bottlenecks (#21501)
* fix(middleware): replace BaseHTTPMiddleware with pure ASGI middleware

BaseHTTPMiddleware wraps streaming responses with receive_or_disconnect
per chunk, blocking the event loop and causing severe throughput
degradation under concurrent streaming load (53% of CPU in profiling).

Converts PrometheusAuthMiddleware to a pure ASGI middleware using the
__call__(scope, receive, send) protocol.
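
For illustration only, a pure ASGI middleware of the kind described above might look like the sketch below. It is not the actual PrometheusAuthMiddleware: the class name, the `/metrics` path, the bearer-token check, and the `run_request` helper are all assumptions made for the example. The point is that the middleware passes `send` straight through to the wrapped app, so streamed body chunks are not wrapped per chunk.

```python
import asyncio


class PureASGIAuthMiddleware:
    """Hypothetical stand-in for a pure ASGI auth middleware (not litellm's code)."""

    def __init__(self, app, protected_path="/metrics", token="secret"):
        self.app = app
        self.protected_path = protected_path  # assumed path, for illustration
        self.token = token                    # assumed static token, for illustration

    async def __call__(self, scope, receive, send):
        # Anything that is not an HTTP request to the protected path is
        # forwarded untouched; `send` is handed to the app as-is.
        if scope["type"] != "http" or scope.get("path") != self.protected_path:
            await self.app(scope, receive, send)
            return

        headers = dict(scope.get("headers") or [])
        if headers.get(b"authorization") == b"Bearer " + self.token.encode():
            await self.app(scope, receive, send)
            return

        # Reject by emitting the two ASGI response messages directly.
        await send({
            "type": "http.response.start",
            "status": 401,
            "headers": [(b"content-type", b"text/plain")],
        })
        await send({"type": "http.response.body", "body": b"Unauthorized"})


async def streaming_app(scope, receive, send):
    """Toy downstream app that streams two chunks and then closes the body."""
    await send({"type": "http.response.start", "status": 200, "headers": []})
    for chunk in (b"a", b"b"):
        await send({"type": "http.response.body", "body": chunk, "more_body": True})
    await send({"type": "http.response.body", "body": b"", "more_body": False})


def run_request(app, path, headers=()):
    """Drive one request through `app` and return the ASGI messages it sent."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    scope = {"type": "http", "path": path, "headers": list(headers)}
    asyncio.run(app(scope, receive, send))
    return sent
```

With a Starlette or FastAPI app, a class of this shape can typically be registered through `app.add_middleware(...)`, since those frameworks accept any ASGI-compatible middleware class.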

* fix(streaming): remove expensive debug logging and optimize usage stripping

- Remove print_verbose calls that format chunk/response Pydantic objects,
  triggering millions of __repr__ calls (8% of CPU in profiling)
- Guard remaining verbose_logger.debug with isEnabledFor(DEBUG) and use
  lazy %s formatting instead of f-strings
- Replace usage stripping round-trip (model_dump + delete + reconstruct)
  with a _usage_stripped flag, deferring exclusion to serialization time
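
The logging guard in the second bullet can be sketched with the standard library alone. The logger name, the `Chunk` class, and `log_chunk` are hypothetical; the counter on `__repr__` only exists to show that no formatting work happens while DEBUG is disabled.

```python
import logging

logger = logging.getLogger("streaming_demo")  # hypothetical logger name
logger.setLevel(logging.INFO)                 # DEBUG is disabled


class Chunk:
    """Stand-in for a streaming chunk whose __repr__ is expensive."""

    repr_calls = 0

    def __repr__(self):
        Chunk.repr_calls += 1
        return "Chunk(...)"


def log_chunk(chunk):
    # The guard skips the call entirely on the hot path, and the %s
    # placeholder defers formatting until a handler emits the record.
    # An f-string would call __repr__ before the level is ever checked.
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug("received chunk: %s", chunk)


for _ in range(1000):
    log_chunk(Chunk())
```

After the loop, `Chunk.repr_calls` is still 0, whereas `logger.debug(f"received chunk: {chunk}")` would have called `__repr__` 1000 times at the same log level.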

* fix(proxy): remove per-chunk debug log and use _usage_stripped flag

- Remove verbose_proxy_logger.debug that formatted every streaming chunk
- Honor _usage_stripped flag from streaming handler to exclude usage
  during model_dump_json serialization instead of reconstructing objects
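
As a rough sketch of the flag-based approach, the example below uses a plain dataclass and `json` in place of the real Pydantic models, so `StreamChunk` and `serialize_chunk` are hypothetical names. In the actual code the exclusion happens during `model_dump_json`; here the same idea is shown by dropping the field while building the output dict.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Optional


@dataclass
class StreamChunk:
    """Stdlib stand-in for a streaming chunk model (not the real class)."""

    content: str
    usage: Optional[dict] = None
    # Set by the streaming handler instead of rebuilding the object
    # without its usage field.
    _usage_stripped: bool = field(default=False, repr=False)


def serialize_chunk(chunk: StreamChunk) -> str:
    data = asdict(chunk)
    stripped = data.pop("_usage_stripped")  # never sent to the client
    if stripped:
        # Usage is excluded only at serialization time; the chunk object
        # itself keeps its usage for later accounting.
        data.pop("usage", None)
    return json.dumps(data)
```

Marking a chunk with `chunk._usage_stripped = True` leaves `chunk.usage` intact in memory, which is what lets a later total-usage calculation still see it.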

* fix(proxy): remove per-chunk debug log in async_data_generator

Remove verbose_proxy_logger.debug that formatted every streaming chunk,
which triggered expensive Pydantic serialization on the hot path.

* fix indentation and add clarifying comment for usage stripping

* fix: guard calculate_total_usage against None usage in chunks
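
A minimal sketch of such a guard, assuming chunks are plain dicts with an optional `usage` entry; `total_usage` is a hypothetical helper, not the real calculate_total_usage:

```python
def total_usage(chunks):
    """Sum token counts across chunks, skipping chunks without usage."""
    totals = {"prompt_tokens": 0, "completion_tokens": 0}
    for chunk in chunks:
        usage = chunk.get("usage")
        if usage is None:
            # Most streaming chunks carry no usage; skip them instead of
            # failing on attribute or key access.
            continue
        for key in totals:
            totals[key] += usage.get(key, 0) or 0
    totals["total_tokens"] = totals["prompt_tokens"] + totals["completion_tokens"]
    return totals
```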

* fix: store chunk copy to preserve usage for calculate_total_usage
2026-02-18 16:16:49 -08:00