* attempt to implement the passthrough feature
* Formatting and small change
* Fix formatting
* feat: grayswan guardrail overwrite ModelResponse in passthrough mode
* fix missing exception catching on certain endpoints
* fix wrong call site
* fix: patch anthropic endpoint internal error on streaming obj
* fix grayswan testcase
* feat: make the violation response read more naturally
* Formatting
* move passthrough exception definition to custom_guardrail.
* Enhancement: show whether the request was blocked at input or output
* update exception name
* fix a typo in a unit test.
---------
Co-authored-by: Xiaohan Fu <xiaohan@grayswan.ai>
* docs: update Getting Started page with accurate endpoints and fix exception handling
- Update endpoints list to include /responses, /audio, /batches
- Change "Consistent output" to be endpoint-agnostic
- Clarify Response Format title as "OpenAI Chat Completions Format"
- Fix exception handling example: use litellm exceptions instead of deprecated openai.error
- Add model prefix (anthropic/) to example
* docs: reorganize sidebar and improve SDK documentation structure
Sidebar changes:
- Reorder: Python SDK first, then AI Gateway (Proxy)
- Rename "LiteLLM - Getting Started" to "Getting Started"
- Restructure SDK section with Core Functions, Configuration subsections
- Move budget_manager to Guides
- Move sdk_custom_pricing and migration to Extras
- Remove duplicate embedding/async_embedding and embedding/moderation
Content changes:
- Add Response Format section to response_api.md
- Add async aembedding() section to supported_embedding.md
* docs: add deprecation notice for OpenAI Assistants API
OpenAI has deprecated the Assistants API, shutting down on August 26, 2026.
Added warning banner directing users to the Responses API.
* docs: expand Core Functions in SDK sidebar
Add more SDK functions to Core Functions category:
- text_completion()
- image_generation()
- transcription()
- speech()
- Link to "All Supported Endpoints" for complete list
* Rename Sidebar Item
* docs: revert Getting Started label to original
* Rename sidebar label from 'LiteLLM - Getting Started' to 'Getting Started'
* fix: correct type annotations for anthropic streaming handlers
- Fix return type of _handle_accumulated_json_chunk from Optional[GenericStreamingChunk] to Optional[ModelResponseStream]
- Fix return type of _parse_sse_data from Optional[GenericStreamingChunk] to Optional[ModelResponseStream]
- Add type annotation for output_items in background_streaming.py
These changes align the type annotations with the actual return values of chunk_parser(), which returns ModelResponseStream.
* docs: add missing ONYX_API_KEY and ONYX_API_BASE to environment variables reference
- Add ONYX_API_BASE documentation entry
- Add ONYX_API_KEY documentation entry
- Fixes test_env_keys.py test failure
* Add Amazon Nova as a first party provider
* Added new provider folder under llms/ to outline the openai supported params
* Updated supported endpoints in the documentation
* docs: add Microsoft GraphRAG to projects using LiteLLM
* docs: add arXiv paper link for GraphRAG
* docs: add GraphRAG to sidebar
* Update projects in sidebars.js
Reordered items in the projects list to include 'GraphRAG'.
* fix(responses): Add image generation support for Responses API
Fixes #16227
## Problem
When using Gemini 2.5 Flash Image with /responses endpoint, image generation
outputs were not being returned correctly. The response contained only text
with empty content instead of the generated images.
## Solution
1. Created new `OutputImageGenerationCall` type for image generation outputs
2. Modified `_extract_message_output_items()` to detect images in completion responses
3. Added `_extract_image_generation_output_items()` to transform images from
completion format (data URL) to responses format (pure base64)
4. Added `_extract_base64_from_data_url()` helper to extract base64 from data URLs
5. Updated `ResponsesAPIResponse.output` type to include `OutputImageGenerationCall`
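The data URL handling in steps 3 and 4 can be sketched roughly as follows. This is a minimal illustration, not the code in transformation.py: the function name mirrors the helper above without the leading underscore, and both the pass-through of inputs that lack a `data:` prefix and the `ValueError` for non-base64 data URLs are assumptions.

```python
def extract_base64_from_data_url(data_url: str) -> str:
    """Return the bare base64 payload of a data URL.

    Hypothetical sketch: the real helper may handle edge cases differently.
    """
    # Assumption: a string without the "data:" scheme is already bare base64.
    if not data_url.startswith("data:"):
        return data_url

    # A base64 data URL looks like "data:<mime>;base64,<payload>".
    header, separator, payload = data_url.partition(",")
    if not separator or not header.endswith(";base64"):
        raise ValueError("not a base64-encoded data URL")
    return payload
```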
## Changes
- litellm/types/responses/main.py: Added OutputImageGenerationCall type
- litellm/types/llms/openai.py: Updated ResponsesAPIResponse.output type
- litellm/responses/litellm_completion_transformation/transformation.py:
Added image detection and extraction logic
- tests/test_litellm/responses/litellm_completion_transformation/test_image_generation_output.py:
Added comprehensive unit tests (16 tests, all passing)
## Result
/responses endpoint now correctly returns:
```json
{
"output": [{
"type": "image_generation_call",
"id": "..._img_0",
"status": "completed",
"result": "iVBORw0KGgo..." // Pure base64, no data: prefix
}]
}
```
This matches OpenAI Responses API specification where image generation
outputs have type "image_generation_call" with base64 data in "result" field.
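A caller could consume this shape along the following lines. This is a sketch only: it treats the response as a plain dict, and `collect_generated_images` and the `sample` payload are hypothetical names used for illustration.

```python
import base64
from typing import Any, Dict, List


def collect_generated_images(response: Dict[str, Any]) -> List[bytes]:
    """Decode every completed image_generation_call item into raw bytes."""
    images: List[bytes] = []
    for item in response.get("output", []):
        if item.get("type") != "image_generation_call":
            continue  # skip text messages and other output item types
        if item.get("status") != "completed":
            continue  # only completed calls are expected to carry a result
        # "result" holds pure base64, so no data: prefix needs stripping.
        images.append(base64.b64decode(item["result"]))
    return images


# Hypothetical payload shaped like the JSON above.
sample = {
    "output": [
        {
            "type": "image_generation_call",
            "id": "resp_img_0",
            "status": "completed",
            "result": base64.b64encode(b"\x89PNG\r\n\x1a\n").decode("ascii"),
        }
    ]
}
```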
* docs(responses): Add image generation documentation and tests
- Add comprehensive image generation documentation to response_api.md
- Include examples for Gemini (no tools param) and OpenAI (with tools param)
- Document response format and base64 handling
- Add supported models table with provider-specific requirements
- Add unit tests for image generation output transformation
- Test base64 extraction from data URLs
- Test image generation output item creation
- Test status mapping and integration scenarios
- Verify proper transformation from completions to responses format
Related to #16227
* fix(responses): Correct status type for image generation output
- Add _map_finish_reason_to_image_generation_status() helper function
- Fix MyPy type error: OutputImageGenerationCall.status only accepts
['in_progress', 'completed', 'incomplete', 'failed'], not the full
ResponsesAPIStatus union which includes 'cancelled' and 'queued'
Fixes MyPy error in transformation.py:838
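The narrowing described above could look roughly like this. It is a sketch under assumptions: the specific finish reasons and the fallback to "completed" are illustrative guesses, not the mapping used in transformation.py. Only the four allowed status values come from the description above.

```python
from typing import Dict, Literal, Optional

# The four statuses accepted by OutputImageGenerationCall.status.
ImageGenerationStatus = Literal["in_progress", "completed", "incomplete", "failed"]

# Assumed mapping from chat-completion finish reasons (illustrative only).
_FINISH_REASON_TO_STATUS: Dict[str, ImageGenerationStatus] = {
    "stop": "completed",
    "length": "incomplete",
    "content_filter": "incomplete",
}


def map_finish_reason_to_image_generation_status(
    finish_reason: Optional[str],
) -> ImageGenerationStatus:
    """Map a finish reason onto the narrower image-generation status set."""
    if finish_reason is None:
        return "completed"  # assumption: a missing reason is treated as done
    # Never returns 'cancelled' or 'queued', which the narrower type rejects.
    return _FINISH_REASON_TO_STATUS.get(finish_reason, "completed")
```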
When Gemini image generation models return `text_tokens=0` with `image_tokens > 0`,
the cost calculator was assuming no token breakdown existed and treating all
completion tokens as text tokens, resulting in ~10x underestimation of costs.
Changes:
- Fix cost calculation logic to respect token breakdown when image/audio/reasoning
tokens are present, even if text_tokens=0
- Add `output_cost_per_image_token` pricing for gemini-3-pro-image-preview models
- Add test case reproducing the issue
- Add documentation explaining image token pricing
Fixes #17410
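The corrected breakdown logic can be illustrated with a simplified sketch. The dataclass, function name, and per-token prices below are hypothetical and cover only text and image tokens. The real cost calculator also handles audio and reasoning tokens.

```python
from dataclasses import dataclass


@dataclass
class CompletionUsage:
    """Hypothetical, simplified stand-in for the completion token details."""
    completion_tokens: int
    text_tokens: int = 0
    image_tokens: int = 0


def completion_cost(
    usage: CompletionUsage,
    cost_per_text_token: float,
    cost_per_image_token: float,
) -> float:
    # A breakdown exists if ANY category is non-zero. Checking only
    # text_tokens reproduces the bug: text_tokens=0 with image_tokens>0
    # was misread as "no breakdown available".
    has_breakdown = usage.text_tokens > 0 or usage.image_tokens > 0

    if has_breakdown:
        text_tokens = usage.text_tokens
    else:
        # No breakdown at all: bill every completion token as text.
        text_tokens = usage.completion_tokens

    return (
        text_tokens * cost_per_text_token
        + usage.image_tokens * cost_per_image_token
    )
```

With made-up prices of 1e-5 per text token and 1e-4 per image token, a response of 1000 image tokens and 0 text tokens is billed at the image rate rather than the text rate, which is the order-of-magnitude gap described above.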