* Add openai metadata filed in the request
* Add docs related to openai metadata
* Add utils
* test_completion_openai_metadata[True]
* Added support for though signature for gemini 3 in responses api (#16872)
* Added support for though signature for gemini 3
* Update docs with all supported endpoints and cost tracking
* Added config based routing support for batches and files
* fix lint errors
* Litellm anthropic image url support (#16868)
* Add image as url support to anthropic
* fix mypy errors
* fix tests
* Fix: Populate spend_logs_metadata in batch and files endpoints (#16921)
* Add spend-logs-metadata to the metadata
* Add tests for spend logs metadata in batches
* use better names
* Remove support for penalty param for gemini 3 (#16907)
* Remove support for penalty param
* remove halucinated model names
* fix mypy/test errors
* fix tests
* fix too many lines error
* fix too many lines error
* Add config for cicd test case
* Fix final tests
* fix batch tests
* fix batch tests
## Problem
The `extra_body` parameter in `litellm.responses()` and `litellm.aresponses()`
was being accepted but never passed to the HTTP request sent to the LLM provider.
This prevented users from sending custom/experimental parameters to provider APIs.
## Changes
- Added `data.update(extra_body)` in `async_response_api_handler` (line 2138)
- Added `data.update(extra_body)` in `response_api_handler` (line 2012)
- Added tests to `test_openai_responses_api.py` for extra_body functionality
## Testing
- Tests verify extra_body params are passed in both sync and async modes
- Existing Responses API tests continue to pass
- Manually verified with OpenAI API that custom params are sent correctly
## Impact
Users can now pass custom/experimental parameters via extra_body:
```python
litellm.aresponses(
model="gpt-4o",
input="hello",
extra_body={"custom_param": "value"} # Now works!
)
```
This aligns with the OpenAI SDK pattern and matches behavior in other
LiteLLM endpoints (completion, embedding, etc.) that already support extra_body.
This commit fixes two bugs in Responses API streaming tests:
1. **Usage field naming bug**: Tests were using `input_tokens` and
`output_tokens` but the Usage object uses `prompt_tokens` and
`completion_tokens`.
2. **Missing cost in streaming usage**: When `include_cost_in_streaming_usage`
was enabled, the cost was calculated and added to ResponseAPIUsage, but was
lost during the transformation to the Usage object.
Changes:
- Updated test assertions to use correct field names (prompt_tokens, completion_tokens)
- Added cost preservation logic in FakeStreamerResponsesAPIIterator
- Modified _transform_response_api_usage_to_chat_usage() to preserve cost attribute
All streaming tests now pass successfully.
* fix: use fastuuid helper across the codebase
First batch of changes, simple drop in replacement.
* second batch of changes
* fixed: script mistake on helper file
* fix: ensure /responses/cancel works for non admins
* test: cancel endpoint
* fix responses API cancel endpoint
* test fix
* TestGoogleAIStudioResponsesAPITest
* fix response api for litellm proxy
* Add test for checking if status is getting removed
* add test in correct file
* remove hardcoded fields
* Make the handling simpler
* fix lint error:
* fix proxy config
* fix(responses api): fix streaming ID consistency and tool format handling (#12640)
* fix(responses): ensure streaming chunk IDs use consistent encoding format
Fixes streaming ID inconsistency where streaming responses used raw provider IDs
while non-streaming responses used properly encoded IDs with provider context.
Changes:
- Updated LiteLLMCompletionStreamingIterator to accept provider context
- Added _encode_chunk_id() method using same logic as non-streaming responses
- Modified chunk transformation to encode all streaming item_ids with resp_ prefix
- Updated handlers to pass custom_llm_provider and litellm_metadata to streaming iterator
Impact:
- Streaming chunk IDs now format: resp_<base64_encoded_provider_context>
- Enables session continuity when using streaming response IDs as previous_response_id
- Allows provider detection and load balancing with streaming responses
- Maintains backward compatibility with existing streaming functionality
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
* fix(types): add explicit Optional[str] type annotation for model_id
This resolves MyPy type checking error where model_id could be None
but wasn't explicitly typed as Optional[str].
* fix(types): handle None case for litellm_metadata access
Prevents 'Item None has no attribute get' error by checking for None
before accessing litellm_metadata dictionary.
* test: add comprehensive tests for streaming ID consistency
Adds unit and E2E tests to verify streaming chunk IDs are properly encoded
with consistent format across streaming responses.
## Tests Added
### Unit Test (test_reasoning_content_transformation.py)
- `test_streaming_chunk_id_encoding()`: Validates the `_encode_chunk_id()` method
correctly encodes chunk IDs with `resp_` prefix and provider context
### E2E Tests (test_e2e_openai_responses_api.py)
- `test_streaming_id_consistency_across_chunks()`: Tests that all streaming chunk IDs
are properly encoded across multiple chunks in a real streaming response
- `test_streaming_response_id_as_previous_response_id()`: Tests the core use case -
using streaming response IDs for session continuity with `previous_response_id`
## Key Testing Approach
- Uses **Gemini** (non-OpenAI model) to test the transformation logic rather than
OpenAI passthrough, since the streaming ID consistency issue occurs when LiteLLM
transforms responses rather than just passing through to native OpenAI responses API
- Tests validate that streaming chunk IDs now use same encoding as non-streaming responses
- Verifies session continuity works with streaming responses
Addresses @ishaan-jaff's request for unit tests covering the streaming ID consistency fix.
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
* fix(lint): remove unused imports in transformation.py
Removes unused imports to fix CI linting errors:
- GenericResponseOutputItem
- OutputFunctionToolCall
* test: remove E2E tests from openai_endpoints_tests
Remove streaming ID consistency E2E tests as requested by @ishaan-jaff.
Keep only the mock/unit test in test_reasoning_content_transformation.py
* revert: remove streaming chunk ID encoding to original behavior
This reverts the streaming chunk ID encoding changes to understand the original issue better.
Original behavior was:
- Streaming chunks: raw provider IDs
- Streaming final response: raw IDs (PROBLEM!)
- Non-streaming final response: encoded IDs (correct)
The real issue: streaming final response IDs were not encoded, breaking session continuity.
* fix(responses): encode streaming final response IDs to match OpenAI behavior
Fixes streaming ID inconsistency to match OpenAI's Responses API behavior:
- Streaming chunks: raw message IDs (like OpenAI's msg_xxx)
- Final response: encoded IDs (like OpenAI's resp_xxx)
This enables session continuity by ensuring streaming final response IDs
have the same encoded format as non-streaming responses, allowing them
to be used as previous_response_id in follow-up requests.
Changes:
- Add custom_llm_provider and litellm_metadata to LiteLLMCompletionStreamingIterator
- Update handlers to pass provider context to streaming iterator
- Apply _update_responses_api_response_id_with_model_id to final streaming response
- Keep streaming chunks as raw IDs to match OpenAI format
Impact:
- Session continuity works with streaming responses
- Load balancing can detect provider from streaming final response IDs
- Format matches OpenAI's Responses API exactly
🤖 Generated with [Claude Code](https://claude.ai/code)
Co-Authored-By: Claude <noreply@anthropic.com>
* test: update unit test to match correct OpenAI-compatible behavior
Updates the unit test to verify streaming chunk IDs are raw (not encoded)
to match OpenAI's responses API format:
- Streaming chunks: raw message IDs (like msg_xxx)
- Final response: encoded IDs (like resp_xxx)
This reflects the correct behavior implemented in the fix.
---------
Co-authored-by: Claude <noreply@anthropic.com>
* cleanup
* TestBaseResponsesAPIStreamingIterator
---------
Co-authored-by: Javier de la Torre <jatorre@carto.com>
Co-authored-by: Claude <noreply@anthropic.com>
* add _transform_responses_api_function_call_to_chat_completion_message
* test_responses_api_with_tool_calls
* TestFunctionCallTransformation
* fixes for responses API testing google ai studio
* TestGoogleAIStudioResponsesAPITest
* test_responses_api_with_tool_calls
* test_responses_api_with_tool_calls
* test_basic_openai_responses_streaming_delete_endpoint