litellm/tests/llm_responses_api_testing
Ishaan Jaff d11832bfad fix(responses): eliminate per-chunk thread spawning in async streaming path (#21709)
* fix(responses): fix O(n²) CPU overhead in reasoning streaming path

stream_chunk_builder was called on every reasoning chunk, rebuilding the
entire response from all collected chunks each time. Replace this with
incremental accumulation of reasoning_content parts, joined only once
when reasoning ends.
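
A minimal sketch of the difference between the two approaches, not the
actual litellm code. The names rebuild_each_time and ReasoningAccumulator
are hypothetical, and a plain string join stands in for whatever
stream_chunk_builder does internally:

```python
from typing import List, Tuple


def rebuild_each_time(deltas: List[str]) -> Tuple[str, int]:
    """Old pattern (illustrative): rebuild the full text on every chunk.

    Returns the final text and the total number of pieces joined, which
    grows as 1 + 2 + ... + n, i.e. O(n^2) in the number of chunks.
    """
    collected: List[str] = []
    text = ""
    pieces_joined = 0
    for delta in deltas:
        collected.append(delta)
        text = "".join(collected)  # re-joins everything seen so far
        pieces_joined += len(collected)
    return text, pieces_joined


class ReasoningAccumulator:
    """New pattern (illustrative): append per chunk, join once at the end."""

    def __init__(self) -> None:
        self._parts: List[str] = []

    def add(self, delta: str) -> None:
        # O(1) amortized per chunk; empty deltas are skipped.
        if delta:
            self._parts.append(delta)

    def finish(self) -> str:
        # Single O(n) join when the reasoning block ends.
        text = "".join(self._parts)
        self._parts = []
        return text
```

Both paths produce the same text; only the amount of work per chunk
differs.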

* fix(responses): eliminate per-chunk thread spawning in async streaming path

_process_chunk() called run_async_function() on every SSE chunk. When
invoked from an async context, that helper spawns a thread and an event
loop per call.

Move the hook call out of _process_chunk into the callers: async __anext__
directly awaits it, sync __next__ uses run_async_function.
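
A minimal sketch of the resulting shape, assuming a simplified iterator
pair. AsyncStream, SyncStream, process_chunk and on_chunk_hook are
hypothetical names, and asyncio.run stands in for run_async_function,
whose real implementation is not shown here:

```python
import asyncio
from typing import AsyncIterator, Iterable, Iterator, List


async def on_chunk_hook(chunk: str) -> str:
    """Hypothetical async hook applied to each parsed chunk."""
    return chunk.upper()


def process_chunk(raw: str) -> str:
    """Purely synchronous parsing; it no longer touches the hook."""
    return raw.strip()


class AsyncStream:
    """Async path: the hook is awaited on the already-running event loop."""

    def __init__(self, source: AsyncIterator[str]) -> None:
        self._source = source

    def __aiter__(self) -> "AsyncStream":
        return self

    async def __anext__(self) -> str:
        raw = await self._source.__anext__()  # StopAsyncIteration ends the stream
        return await on_chunk_hook(process_chunk(raw))  # no thread, no new loop


class SyncStream:
    """Sync path: there is no running loop, so the hook is bridged per chunk."""

    def __init__(self, source: Iterable[str]) -> None:
        self._source: Iterator[str] = iter(source)

    def __iter__(self) -> "SyncStream":
        return self

    def __next__(self) -> str:
        raw = next(self._source)  # StopIteration ends the stream
        return asyncio.run(on_chunk_hook(process_chunk(raw)))


async def collect_async(chunks: List[str]) -> List[str]:
    """Usage helper: drain an AsyncStream fed by an async generator."""

    async def gen() -> AsyncIterator[str]:
        for chunk in chunks:
            yield chunk

    return [item async for item in AsyncStream(gen())]
```

The point of the split is that only the sync caller pays for bridging
into async code; the async caller stays on its own event loop.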

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* perf: reduce responses streaming CPU for text-only streams

* fix(test): replace deprecated claude-3-7-sonnet-latest in responses API test

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(test): replace deprecated claude-3-7-sonnet-latest in tool result fix test

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(test): replace deprecated claude-3-7-sonnet-latest in tool result empty call_id test

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-02-20 16:26:23 -08:00