mirror of https://github.com/tiennm99/litellm.git, synced 2026-07-04 19:08:13 +00:00
6486db3646
* fix(middleware): replace BaseHTTPMiddleware with pure ASGI middleware

  BaseHTTPMiddleware wraps streaming responses with receive_or_disconnect
  per chunk, blocking the event loop and causing severe throughput
  degradation under concurrent streaming load (53% of CPU in profiling).

  Converts PrometheusAuthMiddleware to a pure ASGI middleware using the
  __call__(scope, receive, send) protocol.

* fix(streaming): remove expensive debug logging and optimize usage stripping

  - Remove print_verbose calls that format chunk/response Pydantic objects,
    triggering millions of __repr__ calls (8% of CPU in profiling)
  - Guard remaining verbose_logger.debug with isEnabledFor(DEBUG) and use
    lazy %s formatting instead of f-strings
  - Replace usage stripping round-trip (model_dump + delete + reconstruct)
    with a _usage_stripped flag, deferring exclusion to serialization time

* fix(proxy): remove per-chunk debug log and use _usage_stripped flag

  - Remove verbose_proxy_logger.debug that formatted every streaming chunk
  - Honor _usage_stripped flag from streaming handler to exclude usage
    during model_dump_json serialization instead of reconstructing objects

* fix(proxy): remove per-chunk debug log in async_data_generator

  Remove verbose_proxy_logger.debug that formatted every streaming chunk,
  which triggered expensive Pydantic serialization on the hot path.

* fix indentation and add clarifying comment for usage stripping

* fix: guard calculate_total_usage against None usage in chunks

* fix: store chunk copy to preserve usage for calculate_total_usage
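
For readers unfamiliar with the `__call__(scope, receive, send)` protocol named in the first fix, the sketch below shows the general shape of a pure ASGI middleware. It is an illustration under assumptions, not the actual PrometheusAuthMiddleware: the class name `MetricsAuthMiddleware`, the `/metrics` path check, the bearer-token scheme, and the `streaming_app` and `call` helpers are all hypothetical.

```python
import asyncio
import hmac


class MetricsAuthMiddleware:
    """Hypothetical pure ASGI middleware (not the real PrometheusAuthMiddleware)."""

    def __init__(self, app, token: str):
        self.app = app
        self.expected = f"Bearer {token}".encode()

    async def __call__(self, scope, receive, send):
        # Anything that is not an HTTP request to /metrics passes straight
        # through. The downstream app talks to `send` directly, so streamed
        # chunks are not wrapped or intercepted by this middleware.
        if scope["type"] != "http" or not scope["path"].startswith("/metrics"):
            await self.app(scope, receive, send)
            return

        # ASGI headers are a list of (name, value) byte pairs, names lowercased.
        headers = dict(scope.get("headers", []))
        supplied = headers.get(b"authorization", b"")
        if not hmac.compare_digest(supplied, self.expected):
            await send({
                "type": "http.response.start",
                "status": 401,
                "headers": [(b"content-type", b"text/plain")],
            })
            await send({"type": "http.response.body", "body": b"Unauthorized"})
            return

        await self.app(scope, receive, send)


async def streaming_app(scope, receive, send):
    """Toy downstream app that streams three chunks (assumption for the demo)."""
    await send({"type": "http.response.start", "status": 200, "headers": []})
    for i in range(3):
        await send({
            "type": "http.response.body",
            "body": f"chunk{i}".encode(),
            "more_body": True,
        })
    await send({"type": "http.response.body", "body": b"", "more_body": False})


async def call(app, path, headers=()):
    """Drive an ASGI app once and collect every message it sends."""
    sent = []

    async def receive():
        return {"type": "http.request", "body": b"", "more_body": False}

    async def send(message):
        sent.append(message)

    scope = {"type": "http", "path": path, "headers": list(headers)}
    await app(scope, receive, send)
    return sent
```

Because the middleware only decides whether to forward the call, a streaming response on any other path incurs no per-chunk work in this layer, which is the property the commit is after.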
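
The logging change in the streaming fix (guard with `isEnabledFor(DEBUG)`, lazy `%s` formatting instead of f-strings) can be sketched with the standard library alone. The logger name `streaming_sketch`, the `Chunk` class, and `log_chunk` are hypothetical stand-ins; `Chunk` counts `__repr__` calls to make the cost visible and is not a Pydantic model.

```python
import logging

logger = logging.getLogger("streaming_sketch")  # hypothetical logger name


class Chunk:
    """Stand-in for a streaming chunk; counts how often it is formatted."""

    repr_calls = 0

    def __init__(self, text):
        self.text = text

    def __repr__(self):
        Chunk.repr_calls += 1
        return f"Chunk(text={self.text!r})"


def log_chunk(chunk):
    # The guard skips the call entirely when DEBUG is off. The %s placeholder
    # defers formatting to the logging module, so the chunk is only turned
    # into a string if a handler actually emits the record. An f-string would
    # instead format the chunk before logger.debug is even called.
    if logger.isEnabledFor(logging.DEBUG):
        logger.debug("received chunk: %s", chunk)
```

With the logger at INFO, calling `log_chunk` in a hot loop never invokes `Chunk.__repr__`; the formatting cost is paid only when DEBUG is enabled and a handler emits the record.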