tiennm99/litellm

mirror of https://github.com/tiennm99/litellm.git synced 2026-08-02 10:21:52 +00:00

d11832bfad fix(responses): eliminate per-chunk thread spawning in async streaming path (#21709)

* fix(responses): fix O(n²) CPU overhead in reasoning streaming path

stream_chunk_builder was called on every reasoning chunk, rebuilding the
entire response from all collected chunks each time. Replace with
incremental accumulation of reasoning_content parts, only joining at
reasoning end.

* fix(responses): eliminate per-chunk thread spawning in async streaming path

_process_chunk() called run_async_function() on every SSE chunk, which
when invoked from an async context spawns a thread + event loop per call.

Move the hook call out of _process_chunk into the callers: async __anext__
directly awaits it, sync __next__ uses run_async_function.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* perf: reduce responses streaming CPU for text-only streams

* fix(test): replace deprecated claude-3-7-sonnet-latest in responses API test

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(test): replace deprecated claude-3-7-sonnet-latest in tool result fix test

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(test): replace deprecated claude-3-7-sonnet-latest in tool result empty call_id test

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
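
The first item in the commit message above replaces a full rebuild on every reasoning chunk with incremental accumulation. The following is a minimal sketch of that idea only, not the actual litellm code: `ReasoningAccumulator`, `rebuild_every_chunk`, and `accumulate_incrementally` are hypothetical names, and a plain `"".join` stands in for the work `stream_chunk_builder` would do.

```python
from typing import Iterable, List


class ReasoningAccumulator:
    """Hypothetical helper: collect reasoning text parts and join them once."""

    def __init__(self) -> None:
        self._parts: List[str] = []

    def add(self, delta: str) -> None:
        # Appending is amortized O(1) per chunk; nothing is rebuilt here.
        self._parts.append(delta)

    def finish(self) -> str:
        # Join exactly once, when the reasoning section ends.
        text = "".join(self._parts)
        self._parts.clear()
        return text


def rebuild_every_chunk(deltas: Iterable[str]) -> str:
    """The quadratic pattern, for contrast: re-join everything on each chunk."""
    collected: List[str] = []
    text = ""
    for delta in deltas:
        collected.append(delta)
        text = "".join(collected)  # touches all chunks seen so far
    return text


def accumulate_incrementally(deltas: Iterable[str]) -> str:
    """The linear pattern: append per chunk, join at the end."""
    acc = ReasoningAccumulator()
    for delta in deltas:
        acc.add(delta)
    return acc.finish()
```

Both functions return the same string; the difference is that the first does work proportional to the number of chunks seen so far on every chunk, while the second defers the join to a single call at the end.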

2026-02-20 16:26:23 -08:00
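
The second item in the commit message above moves the async hook call out of `_process_chunk()` and into the two iterator entry points. A minimal sketch of that shape, under stated assumptions: `ChunkStream`, `on_chunk_hook`, and `run_sync` are hypothetical names, and `run_sync` uses `asyncio.run` as a simple stand-in for litellm's `run_async_function`, whose real implementation is not shown in this listing.

```python
import asyncio
from typing import Iterable, Iterator, List


async def on_chunk_hook(chunk: str) -> str:
    """Hypothetical async per-chunk hook (for example, a logging callback)."""
    return chunk.upper()


def run_sync(coro):
    """Stand-in sync bridge. asyncio.run only works when no loop is running."""
    return asyncio.run(coro)


class ChunkStream:
    """Hypothetical iterator supporting both sync and async consumption."""

    def __init__(self, chunks: Iterable[str]) -> None:
        self._it: Iterator[str] = iter(chunks)

    def _process_chunk(self, raw: str) -> str:
        # Pure parsing only: the hook is no longer called from here.
        return raw.strip()

    def __iter__(self) -> "ChunkStream":
        return self

    def __next__(self) -> str:
        raw = next(self._it)
        # Sync path: bridging into async code is unavoidable here.
        return run_sync(on_chunk_hook(self._process_chunk(raw)))

    def __aiter__(self) -> "ChunkStream":
        return self

    async def __anext__(self) -> str:
        try:
            raw = next(self._it)
        except StopIteration:
            raise StopAsyncIteration
        # Async path: await the hook directly on the running loop,
        # so no extra thread or event loop is created per chunk.
        return await on_chunk_hook(self._process_chunk(raw))


def collect_sync(chunks: Iterable[str]) -> List[str]:
    return list(ChunkStream(chunks))


async def collect_async(chunks: Iterable[str]) -> List[str]:
    return [chunk async for chunk in ChunkStream(chunks)]
```

Keeping `_process_chunk` free of async calls means the per-chunk cost on the async path is a single `await`, while only the sync path pays for the bridge.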

base_responses_api.py

fixing test_basic_openai_responses_api

2026-02-16 20:14:06 -08:00

conftest.py

[Perf] Improvements for Async Success Handler (Logging Callbacks) - Approx +130 RPS (#13905)

2025-08-23 13:13:23 -07:00

test_anthropic_responses_api.py

fix(responses): eliminate per-chunk thread spawning in async streaming path (#21709)

2026-02-20 16:26:23 -08:00

test_anthropic_tool_result_empty_call_id.py

fix(responses): eliminate per-chunk thread spawning in async streaming path (#21709)