mirror of
https://github.com/tiennm99/litellm.git
synced 2026-06-18 00:48:01 +00:00
d11832bfad
* fix(responses): fix O(n²) CPU overhead in reasoning streaming path

  stream_chunk_builder was called on every reasoning chunk, rebuilding the
  entire response from all collected chunks each time. Replace with
  incremental accumulation of reasoning_content parts, only joining at
  reasoning end.

* fix(responses): eliminate per-chunk thread spawning in async streaming path

  _process_chunk() called run_async_function() on every SSE chunk, which
  when invoked from an async context spawns a thread + event loop per call.
  Move the hook call out of _process_chunk into the callers: async __anext__
  directly awaits it, sync __next__ uses run_async_function.

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* perf: reduce responses streaming CPU for text-only streams

* fix(test): replace deprecated claude-3-7-sonnet-latest in responses API test

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(test): replace deprecated claude-3-7-sonnet-latest in tool result fix test

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(test): replace deprecated claude-3-7-sonnet-latest in tool result empty call_id test

  Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
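
Illustration of the first fix: a minimal sketch of incremental accumulation, assuming reasoning text arrives as plain string parts. This is not the code from this change. The class ReasoningAccumulator and its methods add_chunk and finish are hypothetical names used only for illustration. The point is that each chunk costs an append, and the parts are joined once at reasoning end instead of rebuilding the full response on every chunk.

```python
from typing import List


class ReasoningAccumulator:
    """Hypothetical helper: collects reasoning_content parts as they stream in."""

    def __init__(self) -> None:
        self._parts: List[str] = []

    def add_chunk(self, reasoning_content: str) -> None:
        # Append only; earlier chunks are never reprocessed, so the
        # per-chunk cost does not grow with the number of chunks seen.
        if reasoning_content:
            self._parts.append(reasoning_content)

    def finish(self) -> str:
        # Join exactly once, at reasoning end, then reset for reuse.
        text = "".join(self._parts)
        self._parts.clear()
        return text


acc = ReasoningAccumulator()
for piece in ["Let me ", "think ", "about this."]:
    acc.add_chunk(piece)
result = acc.finish()
```

Rebuilding from all collected chunks on each of n chunks does work proportional to 1 + 2 + ... + n, which is where the quadratic cost comes from. Appending and joining once keeps the total work linear in the amount of reasoning text.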