Extract the single-server branch of list_tool_rest_api into a dedicated
helper function _list_tools_for_single_server. This reduces the statement
count in list_tool_rest_api from 57 to under 50, resolving the PLR0915
ruff lint error without needing a per-file-ignore.
Behavior is unchanged: all validation, IP-filtering, error-handling,
and tool-fetching logic is preserved as-is in the extracted helper.
Co-authored-by: yuneng-jiang <yuneng-jiang@users.noreply.github.com>
Lint fixes (check_code_and_doc_quality job):
- Remove unused variable reasoning_effort in gpt_5_transformation.py (F841)
- Remove unused timezone imports in mcp_server rest_endpoints.py and server.py (F401)
- Remove unused ProxyBaseLLMRequestProcessing import in realtime endpoints.py (F401)
- Add BaseRealtimeHTTPConfig to TYPE_CHECKING block in utils.py (F821)
- Add PLR0915 per-file-ignore for mcp_server/rest_endpoints.py in ruff.toml
Test fixes (litellm_mapped_tests_llms job):
- Gemini video cost tests: pass explicit model_info to video_generation_cost()
instead of relying on gemini/veo-3.0-generate-preview being in model_prices JSON
- Anthropic max_tokens tests: mock get_max_tokens() to return expected values
instead of depending on claude-3-5-sonnet-20241022 being in model_prices JSON
- Vertex AI pydantic obj test: update from removed gemini-1.5-pro to gemini-2.5-flash,
update expected request body to use response_json_schema format
- Vertex AI/Bedrock file_content integration tests: update mocks to target
base_llm_http_handler.retrieve_file_content (the new code path via
ProviderConfigManager) instead of the old vertex_ai_files_instance/
bedrock_files_instance paths
Co-authored-by: yuneng-jiang <yuneng-jiang@users.noreply.github.com>
OpenAI rejects any reasoning_effort (even 'none') with tools in
/v1/chat/completions for gpt-5.4. Update the guard to drop reasoning_effort
regardless of value. Add docs explaining the auto-drop behavior.
Remove dead routing condition (is_model_gpt_5_4_plus_model) that
prevented reasoning_effort from being dropped when tools are present.
The Responses API routing was never merged, so the condition routed nothing
and only disabled the drop logic introduced by 14b52b1318 on main.
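
A minimal sketch of the drop guard described above, assuming the request parameters arrive as a plain dict. The helper name drop_reasoning_effort_if_tools is hypothetical and is not the function in gpt_5_transformation.py; the model check is left out for brevity.

```python
from typing import Any, Dict


def drop_reasoning_effort_if_tools(optional_params: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the params without reasoning_effort when tools are set.

    Hypothetical helper: the value of reasoning_effort is not inspected,
    so even 'none' is dropped once tools are present.
    """
    params = dict(optional_params)  # do not mutate the caller's dict
    if params.get("tools") and "reasoning_effort" in params:
        params.pop("reasoning_effort")
    return params
```

Requests without tools pass through untouched, so reasoning_effort still reaches the provider in that case.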
Keep both sets of tests: upstream's OAuth2 token injection test and
our case-insensitive tool matching tests. Use upstream's version of
the bedrock output_config test (more comprehensive).
* fix(proxy): make async_post_call_response_headers_hook consistent across all endpoints
The response headers hook had 5 gaps that prevented callbacks from
reliably extracting routing metadata across endpoint types:
1. Hook never fired for /audio/transcriptions (endpoint bypasses
base_process_llm_request)
2. custom_llm_provider not accessible in hook data for any endpoint
3. custom_llm_provider not stamped in ResponsesAPIResponse._hidden_params
(unlike chat completions)
4. model_info under inconsistent keys (metadata vs litellm_metadata)
5. request_headers always None at all call sites
This adds a litellm_call_info parameter to the hook that normalizes
routing metadata (custom_llm_provider, model_info, api_base, model_id)
regardless of endpoint type. Also stamps custom_llm_provider on
Responses API responses, adds the hook call to the transcription
handler, and passes request_headers at all call sites.
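
One way the normalization above could look, as a rough sketch only. The function name build_call_info, the lookup order between metadata and litellm_metadata, and reading model_id from model_info["id"] are all assumptions for illustration, not the merged implementation.

```python
from typing import Any, Dict, Optional


def build_call_info(
    data: Dict[str, Any],
    hidden_params: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Collect routing metadata under one set of keys (hypothetical helper)."""
    hidden = hidden_params or {}
    # model_info may live under either key depending on the endpoint type.
    metadata = data.get("metadata") or {}
    litellm_metadata = data.get("litellm_metadata") or {}
    model_info = metadata.get("model_info") or litellm_metadata.get("model_info") or {}
    return {
        "custom_llm_provider": hidden.get("custom_llm_provider"),
        "model_info": model_info,
        "api_base": hidden.get("api_base"),
        "model_id": model_info.get("id"),  # assumed location of the model id
    }
```

A callback receiving this dict can read the same four keys whether the request came through chat completions, the Responses API, or transcriptions.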
Supersedes PR #21385.
* fix(proxy): address review feedback — safer backwards compat and None guards
- Replace try/except TypeError with inspect.signature() check for
litellm_call_info backwards compatibility. This avoids masking real
TypeErrors inside callback implementations and prevents double
invocation with inconsistent parameters.
- Use (data.get("key") or {}) instead of data.get("key", {}) to guard
against keys that exist with an explicit None value, which would
cause AttributeError on the subsequent .get() call.
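
The None guard in the second bullet can be illustrated with a plain dict. The "metadata" and "model_info" keys here are only example keys.

```python
# A key that exists with an explicit None value.
data = {"metadata": None}

# The default is only used when the key is missing, so this is still None.
unsafe = data.get("metadata", {})

# "or {}" also covers the explicit-None case.
safe = data.get("metadata") or {}

error = None
try:
    unsafe.get("model_info")  # None has no .get()
except AttributeError as exc:
    error = exc

model_info = safe.get("model_info")  # works, evaluates to None
```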
* fix(proxy): cache inspect.signature result for callback compat check
Move the inspect.signature() call into a module-level helper with a
dict cache keyed by callback identity. Avoids repeated introspection
per request per callback in the hot path.
* fix(proxy): use class identity for signature cache key
Key the _CALLBACK_ACCEPTS_CALL_INFO cache by id(type(cb)) instead of
id(cb) to avoid stale entries from Python address reuse after GC.
All instances of the same callback class share the same method
signature, so class identity is both safer and more cache-efficient.
- Thread api_version through HTTP handlers to Azure realtime endpoints
- Make expires_at optional in RealtimeClientSecretResponse
- Fix test token expiry times to be in the future
- Populate user_id and team_id in minimal_auth for spend tracking
Made-with: Cursor