OpenAI's GPT-5 model family supports a verbosity parameter to control
the length and detail of responses. This parameter accepts three values:
'low', 'medium', or 'high'.
Changes:
- Added verbosity parameter to completion() and acompletion() signatures
- Added verbosity to DEFAULT_CHAT_COMPLETION_PARAM_VALUES in constants.py
- Added verbosity to get_optional_params() in utils.py
- Added verbosity to GPT-5 supported params list
- Updated OpenAI docs with verbosity usage examples
- Added comprehensive test for verbosity parameter
Supported models: gpt-5, gpt-5.1, gpt-5-mini, gpt-5-nano, gpt-5-codex, gpt-5-pro
Fixes#16613
The issue was caused by two test files having the same module name
(test_transformation.py) in different directories, which caused pytest
to fail with an import file mismatch error.
Changes:
- Renamed tests/test_litellm/llms/xai/responses/test_transformation.py
to test_xai_responses_transformation.py
- Renamed tests/test_litellm/llms/openai_like/chat/test_transformation.py
to test_openai_like_chat_transformation.py
Both files now have unique, descriptive names that reflect their
specific test purposes and prevent module name collisions.
* add NewModelGroupRequest
* add endpoint for create_model_group
* fix model_access_group_management_router
* add UpdateModelGroupRequest, info and delete
* fix model management tag
* fix validate_models_exist
* fix get_all_access_groups_from_db
* test_create_duplicate_access_group_fails
* test fixes
* fix working create access groups
* fix access group management endpoints
* add is db model checks for model access groups
* Refresh VoyageAI models and prices and context
* Refresh VoyageAI models and prices and context
* Refresh VoyageAI models and prices and context
* Updating the available VoyageAI models in the docs
* Updating the available VoyageAI models in the docs
* Updating the model prices and the docs
* feat(openai): Add support for reasoning_effort='none' in GPT-5.1
OpenAI's GPT-5.1 introduced a new reasoning effort parameter 'none'
which replaces the previous 'minimal' setting for faster, lower-latency
responses. This is now the default setting for GPT-5.1.
Changes:
- Updated REASONING_EFFORT type to include 'none' value
- Added GPT-5.1, GPT-5-mini, and GPT-5-nano to documentation
- Updated docs to reflect 'none' as GPT-5.1's default reasoning effort
- Added test to verify reasoning_effort='none' passes through correctly
Fixes#16633
* feat(responses): Add support for reasoning_effort='none' in Responses API transformation
* Refactor proxy embeddings to use shared processor
- allow ProxyBaseLLMRequestProcessing to accept the aembedding route so embeddings requests reuse the base pipeline hooks
- route embeddings requests through base_process_llm_request, sharing logging, hook execution, retries, and header handling with chat/responses
- tighten token array decoding logic by using router deployment lookups and the unified error handler
* Fix: Correctly process embedding requests with token arrays
The `test_embedding_input_array_of_tokens` test was failing due to a regression that caused embedding requests with token arrays to be processed incorrectly. This prevented the `aembedding` function from being called as expected.
This was caused by a combination of three distinct issues:
1. In `litellm/proxy/common_request_processing.py`, the `function_setup` utility was called with `aembedding` as the `original_function` for embedding routes. This has been corrected to `embedding` to ensure proper request setup.
2. In `litellm/proxy/proxy_server.py`, a `TypeError` occurred because the `get_deployment` method was called with the `model_name` keyword argument instead of the expected `model_id`. This has been corrected. Additionally, the check for token arrays was improved to validate that all elements in the input subarray are integers.
3. In `litellm/proxy/litellm_pre_call_utils.py`, the check for the `enforced_params` enterprise feature was too strict. It blocked valid requests even when the `enforced_params` list was empty. The condition has been adjusted to trigger the check only for non-empty lists.
Finally, the `test_embedding_input_array_of_tokens` assertion was updated to be more robust. The previous `assert_called_once_with` was overly strict, causing failures when unrelated internal parameters were added to the function call. The test now first asserts that `aembedding` is called and then separately verifies the `model` and `input` arguments. This makes the test more resilient to future changes without sacrificing its ability to catch regressions.
* test: align proxy embedding assertions
Update the embedding proxy test to match the new request pipeline: keep the data the proxy builds, expect the extra control kwargs, let the post-call hook return the actual response, and assert the normalized 'embeddings' hook type. This proves the refactor still forwards metadata and returns the mocked payload.
* Update proxy exception test
The proxy now forwards additional kwargs (request_timeout, litellm_call_id, litellm_logging_obj) to llm_router.aembedding. The test needs to accept these to match the real call signature and keep validating the error path instead of the kwargs list.
* testing: unsure of this change
I don't remember why I changed this, will revert and see if any tests fail since the manual test isn't failing without it.
* fix: remove unrelated change
This change was not related to the embeddings refactor and actually belonged to a different branch.
When using MCP tools with require_approval='never' and Gemini models,
the follow-up call after tool execution was failing with:
'Please ensure that function call turn comes immediately after a user
turn or after a function response turn.'
This was caused by adding an empty assistant message between the user
message and function calls, which violates Gemini's conversation format
requirements.
Changes:
- Only add assistant message to follow-up input if it contains actual content
- Allow function calls to come directly after user messages (as Gemini requires)
- Add explanatory comments about Gemini's format requirements
This fix allows MCP auto-execution to work correctly with Gemini models
while maintaining compatibility with other models.
Fixes: #[issue-number-if-any]