Fixes#16613
The issue was caused by two test files having the same module name
(test_transformation.py) in different directories, which caused pytest
to fail with an import file mismatch error.
Changes:
- Renamed tests/test_litellm/llms/xai/responses/test_transformation.py
to test_xai_responses_transformation.py
- Renamed tests/test_litellm/llms/openai_like/chat/test_transformation.py
to test_openai_like_chat_transformation.py
Both files now have unique, descriptive names that reflect their
specific test purposes and prevent module name collisions.
* add NewModelGroupRequest
* add endpoint for create_model_group
* fix model_access_group_management_router
* add UpdateModelGroupRequest, info and delete
* fix model management tag
* fix validate_models_exist
* fix get_all_access_groups_from_db
* test_create_duplicate_access_group_fails
* test fixes
* fix working create access groups
* fix access group management endpoints
* add is db model checks for model access groups
* Refresh VoyageAI models and prices and context
* Refresh VoyageAI models and prices and context
* Refresh VoyageAI models and prices and context
* Updating the available VoyageAI models in the docs
* Updating the available VoyageAI models in the docs
* Updating the model prices and the docs
* feat(openai): Add support for reasoning_effort='none' in GPT-5.1
OpenAI's GPT-5.1 introduced a new reasoning effort parameter 'none'
which replaces the previous 'minimal' setting for faster, lower-latency
responses. This is now the default setting for GPT-5.1.
Changes:
- Updated REASONING_EFFORT type to include 'none' value
- Added GPT-5.1, GPT-5-mini, and GPT-5-nano to documentation
- Updated docs to reflect 'none' as GPT-5.1's default reasoning effort
- Added test to verify reasoning_effort='none' passes through correctly
Fixes#16633
* feat(responses): Add support for reasoning_effort='none' in Responses API transformation
* Refactor proxy embeddings to use shared processor
- allow ProxyBaseLLMRequestProcessing to accept the aembedding route so embeddings requests reuse the base pipeline hooks
- route embeddings requests through base_process_llm_request, sharing logging, hook execution, retries, and header handling with chat/responses
- tighten token array decoding logic by using router deployment lookups and the unified error handler
* Fix: Correctly process embedding requests with token arrays
The `test_embedding_input_array_of_tokens` test was failing due to a regression that caused embedding requests with token arrays to be processed incorrectly. This prevented the `aembedding` function from being called as expected.
This was caused by a combination of three distinct issues:
1. In `litellm/proxy/common_request_processing.py`, the `function_setup` utility was called with `aembedding` as the `original_function` for embedding routes. This has been corrected to `embedding` to ensure proper request setup.
2. In `litellm/proxy/proxy_server.py`, a `TypeError` occurred because the `get_deployment` method was called with the `model_name` keyword argument instead of the expected `model_id`. This has been corrected. Additionally, the check for token arrays was improved to validate that all elements in the input subarray are integers.
3. In `litellm/proxy/litellm_pre_call_utils.py`, the check for the `enforced_params` enterprise feature was too strict. It blocked valid requests even when the `enforced_params` list was empty. The condition has been adjusted to trigger the check only for non-empty lists.
Finally, the `test_embedding_input_array_of_tokens` assertion was updated to be more robust. The previous `assert_called_once_with` was overly strict, causing failures when unrelated internal parameters were added to the function call. The test now first asserts that `aembedding` is called and then separately verifies the `model` and `input` arguments. This makes the test more resilient to future changes without sacrificing its ability to catch regressions.
* test: align proxy embedding assertions
Update the embedding proxy test to match the new request pipeline: keep the data the proxy builds, expect the extra control kwargs, let the post-call hook return the actual response, and assert the normalized 'embeddings' hook type. This proves the refactor still forwards metadata and returns the mocked payload.
* Update proxy exception test
The proxy now forwards additional kwargs (request_timeout, litellm_call_id, litellm_logging_obj) to llm_router.aembedding. The test needs to accept these to match the real call signature and keep validating the error path instead of the kwargs list.
* testing: unsure of this change
I don't remember why I changed this, will revert and see if any tests fail since the manual test isn't failing without it.
* fix: remove unrelated change
This change was not related to the embeddings refactor and actually belonged to a different branch.
When using MCP tools with require_approval='never' and Gemini models,
the follow-up call after tool execution was failing with:
'Please ensure that function call turn comes immediately after a user
turn or after a function response turn.'
This was caused by adding an empty assistant message between the user
message and function calls, which violates Gemini's conversation format
requirements.
Changes:
- Only add assistant message to follow-up input if it contains actual content
- Allow function calls to come directly after user messages (as Gemini requires)
- Add explanatory comments about Gemini's format requirements
This fix allows MCP auto-execution to work correctly with Gemini models
while maintaining compatibility with other models.
Fixes: #[issue-number-if-any]
The tooltip for OpenAI api_base select fields incorrectly mentioned 'choose Custom to enter your own' but there was no Custom option available in the dropdown. This fix updates the tooltip text to accurately reflect the available options.
Affected providers:
- OpenAI
- OpenAI_Text
* fix: support Anthropic tool_use and tool_result in token counter
* refactor(token_counter): add dynamic field inference for Anthropic content blocks
* test: Add additional tests
* make format
* Fix lint error
* Fix mypy narrow type lint errors