- Skip short-circuit for providers that have a BaseAnthropicMessagesConfig
(bedrock, vertex_ai, azure_ai, anthropic) — they use the agentic loop
which includes a follow-up LLM synthesis step. Short-circuiting would
return raw search text instead of an LLM-synthesized answer.
- Add fallback to litellm.get_llm_provider() for custom_llm_provider
derivation when litellm_params is overwritten by kwargs.
- Add test for bedrock guard.
Addresses Greptile review comments #3 and #4.
- Replace hand-rolled _extract_search_query with existing
get_last_user_message from common_utils
- Use full UUID (str(uuid.uuid4())) to match codebase convention
- Move uuid import to module level per CLAUDE.md
Addresses Greptile review feedback:
- Save original stream flag before pre-request hooks convert it, so
streaming callers get SSE events instead of a plain dict
- Propagate custom_llm_provider derived inside _execute_pre_request_hooks
when it was not explicitly passed by the caller
- Add tests covering both scenarios
For providers like github_copilot that don't natively support web search,
Claude Code's search sub-conversations were falling through to the adapter
path which strips the web_search tool and has no stream reconversion.
Instead of routing search requests through the full LLM pipeline, detect
web-search-only requests early (all tools are web_search, simple prompt)
and execute the search directly via Tavily/Perplexity, returning a
synthetic Anthropic response. No adapter, no backend LLM call needed.
Fixes#21733
When extended thinking is enabled, the websearch interception agentic loop
builds a follow-up assistant message with only tool_use blocks. Anthropic's
API requires assistant messages to start with thinking/redacted_thinking
blocks when thinking is enabled, causing a 400 Bad Request.
Extract thinking blocks from the model's initial response, thread them
through the agentic loop, and prepend them to the follow-up assistant
message — matching the pattern used by anthropic_messages_pt in factory.py.
Fixes the error: "Expected 'thinking' or 'redacted_thinking', but found
'tool_use'"
The websearch interception handler was passing internal flags like
`_websearch_interception_converted_stream` to the follow-up LLM request.
This caused "Extra inputs are not permitted" errors from providers like
Bedrock that use strict Pydantic validation.
Fix: Filter out all kwargs starting with `_websearch_interception` prefix
before making the follow-up anthropic_messages.acreate() call.