* fix(responses): fix O(n²) CPU overhead in reasoning streaming path
stream_chunk_builder was called on every reasoning chunk, rebuilding the
entire response from all collected chunks each time. Replace with
incremental accumulation of reasoning_content parts, only joining at
reasoning end.
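A minimal sketch of the incremental approach, assuming reasoning deltas arrive as optional strings. `ReasoningAccumulator` is a hypothetical name for illustration, not the class used in the actual change:

```python
from typing import List, Optional


class ReasoningAccumulator:
    """Hypothetical helper: collects reasoning deltas and joins them once."""

    def __init__(self) -> None:
        self._parts: List[str] = []

    def add(self, delta: Optional[str]) -> None:
        # Appending is amortized O(1); nothing is rebuilt per chunk.
        if delta:
            self._parts.append(delta)

    def finish(self) -> str:
        # One O(n) join when the reasoning block ends.
        text = "".join(self._parts)
        self._parts.clear()
        return text


acc = ReasoningAccumulator()
for piece in ["Let me ", "think ", None, "about it."]:
    acc.add(piece)
reasoning_text = acc.finish()
```

Rebuilding the full response on each of n chunks touches 1 + 2 + ... + n chunks in total, which is where the quadratic cost came from. Appending and joining once is linear.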
* fix(responses): eliminate per-chunk thread spawning in async streaming path
_process_chunk() called run_async_function() on every SSE chunk, which
when invoked from an async context spawns a thread + event loop per call.
Move the hook call out of _process_chunk into the callers: async __anext__
directly awaits it, sync __next__ uses run_async_function.
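The restructuring can be sketched roughly as follows. `StreamIterator` and `post_chunk_hook` are hypothetical stand-ins, and `asyncio.run` is used here only as a stand-in for `run_async_function` on the sync path:

```python
import asyncio
from typing import Any, List


async def post_chunk_hook(chunk: Any) -> Any:
    """Hypothetical async hook applied to each streamed chunk."""
    return {"chunk": chunk, "hooked": True}


class StreamIterator:
    """Hypothetical iterator supporting both sync and async iteration."""

    def __init__(self, chunks: List[Any]) -> None:
        self._chunks = list(chunks)
        self._pos = 0

    def _process_chunk(self) -> Any:
        # Synchronous parsing only: no hook call, no thread, no event loop.
        if self._pos >= len(self._chunks):
            return None
        chunk = self._chunks[self._pos]
        self._pos += 1
        return chunk

    def __aiter__(self) -> "StreamIterator":
        return self

    async def __anext__(self) -> Any:
        chunk = self._process_chunk()
        if chunk is None:
            raise StopAsyncIteration
        # Already inside an event loop, so the hook is awaited directly.
        return await post_chunk_hook(chunk)

    def __iter__(self) -> "StreamIterator":
        return self

    def __next__(self) -> Any:
        chunk = self._process_chunk()
        if chunk is None:
            raise StopIteration
        # No running loop in the sync path, so bridge into async explicitly.
        return asyncio.run(post_chunk_hook(chunk))


sync_out = list(StreamIterator(["a", "b"]))


async def collect() -> List[Any]:
    return [item async for item in StreamIterator(["a", "b"])]


async_out = asyncio.run(collect())
```

Keeping `_process_chunk` free of async bridging means only the sync caller pays for crossing into async code.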
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* perf: reduce responses streaming CPU for text-only streams
* fix(test): replace deprecated claude-3-7-sonnet-latest in responses API test
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(test): replace deprecated claude-3-7-sonnet-latest in tool result fix test
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(test): replace deprecated claude-3-7-sonnet-latest in tool result empty call_id test
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Replace copy.deepcopy with model_dump + model_validate in streaming
iterator logging to handle Pydantic ValidatorIterator objects that
cannot be pickled when tool_choice uses allowed_tools mode.
Co-authored-by: Krish Dholakia <krrishdholakia@gmail.com>
* Add openai metadata field in the request
* Add docs related to openai metadata
* Add utils
* test_completion_openai_metadata[True]
* Added support for thought signature for gemini 3 in responses api (#16872)
* Added support for thought signature for gemini 3
* Update docs with all supported endpoints and cost tracking
* Added config based routing support for batches and files
* fix lint errors
* Litellm anthropic image url support (#16868)
* Add image as url support to anthropic
* fix mypy errors
* fix tests
* Fix: Populate spend_logs_metadata in batch and files endpoints (#16921)
* Add spend-logs-metadata to the metadata
* Add tests for spend logs metadata in batches
* use better names
* Remove support for penalty param for gemini 3 (#16907)
* Remove support for penalty param
* remove hallucinated model names
* fix mypy/test errors
* fix tests
* fix too many lines error
* fix too many lines error
* Add config for cicd test case
* Fix final tests
* fix batch tests
* fix batch tests
## Problem
The `extra_body` parameter in `litellm.responses()` and `litellm.aresponses()`
was being accepted but never passed to the HTTP request sent to the LLM provider.
This prevented users from sending custom/experimental parameters to provider APIs.
## Changes
- Added `data.update(extra_body)` in `async_response_api_handler` (line 2138)
- Added `data.update(extra_body)` in `response_api_handler` (line 2012)
- Added tests to `test_openai_responses_api.py` for extra_body functionality
## Testing
- Tests verify extra_body params are passed in both sync and async modes
- Existing Responses API tests continue to pass
- Manually verified with OpenAI API that custom params are sent correctly
## Impact
Users can now pass custom/experimental parameters via extra_body:
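```python
import litellm

# Inside an async function; litellm.responses() is the sync equivalent.
response = await litellm.aresponses(
    model="gpt-4o",
    input="hello",
    extra_body={"custom_param": "value"},  # Now works!
)
```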
This aligns with the OpenAI SDK pattern and matches behavior in other
LiteLLM endpoints (completion, embedding, etc.) that already support extra_body.
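For illustration, the merge behaves roughly like the sketch below. `build_request_body` is a hypothetical simplification, not the actual handler code:

```python
from typing import Any, Dict, Optional


def build_request_body(
    model: str,
    input: str,
    extra_body: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical stand-in for how a handler assembles the request payload."""
    data: Dict[str, Any] = {"model": model, "input": input}
    if extra_body:
        # Same merge as data.update(extra_body): new keys are added,
        # and keys that collide with existing ones are overridden.
        data.update(extra_body)
    return data


body = build_request_body("gpt-4o", "hello", extra_body={"custom_param": "value"})
```

Because `dict.update` overwrites on key collision, a key in `extra_body` that matches a standard parameter would replace it in this sketch.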
This commit fixes two bugs in Responses API streaming tests:
1. **Usage field naming bug**: Tests were using `input_tokens` and
`output_tokens` but the Usage object uses `prompt_tokens` and
`completion_tokens`.
2. **Missing cost in streaming usage**: When `include_cost_in_streaming_usage`
was enabled, the cost was calculated and added to ResponseAPIUsage, but was
lost during the transformation to the Usage object.
Changes:
- Updated test assertions to use correct field names (prompt_tokens, completion_tokens)
- Added cost preservation logic in FakeStreamerResponsesAPIIterator
- Modified _transform_response_api_usage_to_chat_usage() to preserve cost attribute
All streaming tests now pass successfully.
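A simplified sketch of the cost-preserving transformation. The dataclasses `ResponsesUsage` and `ChatUsage` and the function `to_chat_usage` are hypothetical stand-ins for the real usage types and for `_transform_response_api_usage_to_chat_usage()`:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ResponsesUsage:
    """Hypothetical stand-in for the Responses API usage object."""
    input_tokens: int
    output_tokens: int
    total_tokens: int
    cost: Optional[float] = None


@dataclass
class ChatUsage:
    """Hypothetical stand-in for the chat-style Usage object."""
    prompt_tokens: int
    completion_tokens: int
    total_tokens: int
    cost: Optional[float] = None


def to_chat_usage(usage: ResponsesUsage) -> ChatUsage:
    # Map Responses API field names onto chat-style field names.
    chat = ChatUsage(
        prompt_tokens=usage.input_tokens,
        completion_tokens=usage.output_tokens,
        total_tokens=usage.total_tokens,
    )
    # Carry the cost across instead of dropping it during the transformation.
    if usage.cost is not None:
        chat.cost = usage.cost
    return chat


converted = to_chat_usage(ResponsesUsage(10, 5, 15, cost=0.00042))
```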