litellm

mirror of https://github.com/tiennm99/litellm.git synced 2026-07-05 11:07:29 +00:00

ishaan-berri 39e1831e84 Emit native web_search_tool_result blocks for Anthropic clients (Claude Desktop / Cowork citations) (#27886)

* feat(custom_logger): add async_post_agentic_loop_response_hook

Lets a CustomLogger shape the response returned by the agentic-loop
follow-up call without bypassing the loop's safety / observability
machinery (depth tracking, fingerprinting, etc.). Default returns the
response unchanged.

Used by websearch_interception to inject Anthropic-native
web_search_tool_result blocks when the originating client requested a
native web_search_* tool.
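
A minimal sketch of the "default returns the response unchanged" contract, assuming a simplified hook signature. SketchLogger, TaggingLogger, and the (response, metadata) parameters are hypothetical stand-ins, not the actual CustomLogger interface.

```python
import asyncio
from typing import Any, Dict


class SketchLogger:
    """Hypothetical stand-in for a CustomLogger subclass (assumed shape)."""

    async def async_post_agentic_loop_response_hook(
        self, response: Any, metadata: Dict[str, Any]
    ) -> Any:
        # Default: hand the follow-up response back untouched.
        return response


class TaggingLogger(SketchLogger):
    """Hypothetical override that reshapes the response."""

    async def async_post_agentic_loop_response_hook(
        self, response: Any, metadata: Dict[str, Any]
    ) -> Any:
        # Copy rather than mutate, then attach something from the metadata.
        shaped = dict(response)
        shaped["tag"] = metadata.get("tag")
        return shaped


original = {"content": []}
unchanged = asyncio.run(
    SketchLogger().async_post_agentic_loop_response_hook(original, {})
)
tagged = asyncio.run(
    TaggingLogger().async_post_agentic_loop_response_hook(original, {"tag": "x"})
)
```

Callbacks that do not override the hook leave the loop's output as it was, so only callbacks that opt in change the response.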

* feat(llm_http_handler): call post-agentic-loop hook on the originating callback

In _execute_anthropic_agentic_plan, after anthropic_messages.acreate
returns, call the originating callback's
async_post_agentic_loop_response_hook so it can mutate the final
response (e.g. inject native tool_result blocks). Pass the callback
through from _call_agentic_completion_hooks.

Exceptions in the post-hook are caught and logged so a buggy callback
can't kill the request.
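
The catch-and-log guard described above could look roughly like this. It is a sketch under assumptions: run_post_hook_safely and the single-argument hook are hypothetical names, and the real call site in _execute_anthropic_agentic_plan may pass different arguments.

```python
import asyncio
import logging
from typing import Any, Awaitable, Callable

logger = logging.getLogger("post_hook_sketch")


async def run_post_hook_safely(
    hook: Callable[[Any], Awaitable[Any]], response: Any
) -> Any:
    """Run a post-hook; fall back to the original response if it raises."""
    try:
        return await hook(response)
    except Exception:
        # Log with traceback and keep serving the request.
        logger.exception("post-agentic-loop hook failed; returning original response")
        return response


async def _broken_hook(response: Any) -> Any:
    # Hypothetical buggy callback used only to exercise the guard.
    raise RuntimeError("buggy callback")


original = {"content": [{"type": "text", "text": "answer"}]}
result = asyncio.run(run_post_hook_safely(_broken_hook, original))
```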

* feat(websearch_interception): add is_anthropic_native_web_search_tool

Identifies tools the Anthropic-native clients (Claude Desktop, the
Anthropic SDK, the Anthropic Console) use to request native search:
type starts with "web_search_" (e.g. web_search_20250305). Rejects the
LiteLLM standard tool, the OpenAI-function variant, the bare
"WebSearch" legacy name, and the bare "web_search" Claude Code shape.

This lets us decide per-request whether the client expects
web_search_tool_result content blocks in the response, without
renaming any existing constants or touching native-provider skip
logic.
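
A minimal sketch of the prefix check described above. The function name is suffixed with _sketch to mark it as an illustration rather than the real helper, and the example tool dicts are assumed shapes for each rejected variant.

```python
from typing import Any, Dict


def is_anthropic_native_web_search_tool_sketch(tool: Dict[str, Any]) -> bool:
    """True only when the tool's type starts with "web_search_"."""
    tool_type = tool.get("type")
    return isinstance(tool_type, str) and tool_type.startswith("web_search_")


# Assumed example shapes for each case named in the commit message.
native = {"type": "web_search_20250305", "name": "web_search"}
litellm_standard = {"name": "litellm_web_search"}
openai_function = {"type": "function", "function": {"name": "litellm_web_search"}}
legacy = {"name": "WebSearch"}
bare = {"name": "web_search"}
```

Checking the type prefix instead of the tool name keeps the decision independent of whatever name the client chose for the tool.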

* feat(websearch_interception): add build_web_search_tool_result_block

Produces the Anthropic-native web_search_tool_result content block
from a structured SearchResponse. Anthropic-native clients use this
block to populate citations / source links — the existing text-blob
flatten path only feeds readable evidence to the model and discards
the structure, so this builder gives us the missing piece.

Shape matches https://docs.anthropic.com/en/api/web-search-tool —
web_search_result items carry url, title, page_age, encrypted_content
(empty string when the search provider doesn't supply one).
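
The block shape above can be sketched as follows. SketchResult is a hypothetical stand-in for one entry of the structured SearchResponse, whose real field names may differ, and build_result_block_sketch is an illustrative name rather than the actual builder.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class SketchResult:
    """Hypothetical search hit; the real SearchResponse fields may differ."""

    url: str
    title: str
    page_age: Optional[str] = None
    encrypted_content: Optional[str] = None


def build_result_block_sketch(
    tool_use_id: str, results: Optional[List[SketchResult]]
) -> Dict[str, Any]:
    """Build a web_search_tool_result block from structured results."""
    items = [
        {
            "type": "web_search_result",
            "url": r.url,
            "title": r.title,
            "page_age": r.page_age,
            # Empty string when the search provider supplies nothing.
            "encrypted_content": r.encrypted_content or "",
        }
        for r in (results or [])
    ]
    return {
        "type": "web_search_tool_result",
        "tool_use_id": tool_use_id,
        "content": items,
    }


block = build_result_block_sketch(
    "srvtoolu_example", [SketchResult("https://example.com", "Example")]
)
```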

* feat(websearch_interception): emit native web_search_tool_result blocks

When the originating client request carried a native Anthropic
web_search_* tool, the final response now also carries
web_search_tool_result content blocks alongside the model's text
answer — so Claude Desktop / Anthropic SDK clients can populate the
citations panel and replay conversation history with structured search
evidence.

Wiring:
- Pre-request hooks (both deployment + Anthropic path) set a flag on
kwargs when they see a native web_search_* tool, so the signal
survives the conversion-to-litellm_web_search step regardless of
which hook fires first.
- _execute_search now returns (text, SearchResponse) so the structured
results aren't lost when the text is flattened for the follow-up
model call.
- _build_anthropic_request_patch returns the parallel list of
SearchResponse objects.
- async_build_agentic_loop_plan pre-builds the web_search_tool_result
blocks (one per tool_use_id) and stashes them on plan.metadata when
the flag is set.
- async_post_agentic_loop_response_hook reads the metadata and
prepends the blocks to response.content.
- _execute_agentic_loop mirrors the injection for the legacy path so
both paths behave identically.

Clients that send the LiteLLM standard tool keep the existing
text-only behavior — no regression.
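
The last wiring step (prepend the pre-built blocks to response.content) can be sketched as below. The metadata key "native_blocks" and the helper name are assumptions made for illustration; the commit does not state the key the real code uses.

```python
from types import SimpleNamespace
from typing import Any, Dict


def prepend_native_blocks_sketch(response: Any, metadata: Dict[str, Any]) -> Any:
    """Prepend stashed blocks to the response content, if any were stashed."""
    blocks = metadata.get("native_blocks")  # hypothetical metadata key
    if not blocks:
        # Flag absent: keep the existing text-only behavior.
        return response
    if isinstance(response, dict):
        response["content"] = list(blocks) + list(response.get("content") or [])
    else:
        existing = getattr(response, "content", None) or []
        response.content = list(blocks) + list(existing)
    return response


blocks = [{"type": "web_search_tool_result", "tool_use_id": "t1", "content": []}]
text = {"type": "text", "text": "answer"}

as_dict = prepend_native_blocks_sketch({"content": [text]}, {"native_blocks": blocks})
as_obj = prepend_native_blocks_sketch(
    SimpleNamespace(content=[text]), {"native_blocks": blocks}
)
untouched = prepend_native_blocks_sketch({"content": [text]}, {})
```

Handling both dict and object responses mirrors the test coverage listed in this commit for the post-hook injection.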

* test(websearch_interception): cover native web_search_tool_result emission

18 tests across:
- detector branches (native vs litellm-standard, OpenAI-function shape,
Claude Desktop builtin WebSearch, bare web_search, missing type)
- block-builder shape (results, none, empty)
- pre-request hook flag-setting (native sets, standard does not)
- async_build_agentic_loop_plan attaches blocks to plan.metadata when
the flag is present, leaves metadata untouched when absent
- post-hook injection into dict and object responses
- legacy _execute_agentic_loop mirrors the injection so both paths
return the same shape

* test(websearch_short_circuit): keep _execute_search mocks in sync with new tuple return

* test(websearch_thinking_constraint): keep _execute_search mocks in sync with new tuple return

* feat(websearch_interception): emit native blocks from try_short_circuit_search

The agentic-loop post-hook only fires when the model returns a tool_use
block. Cowork / Claude Desktop on Bedrock actually make TWO requests
per user turn: the main /v1/messages with their builtin tool, and a
separate standalone /v1/messages whose only tool is
web_search_20250305. That second request hits try_short_circuit_search
— no agentic loop, no post-hook — and was returning text-only, leaving
the citations panel empty.

When the short-circuit input carries a native web_search_* tool, build
a synthetic server_tool_use + web_search_tool_result pair (using the
structured SearchResponse already returned by _execute_search) so the
client gets the native shape it expects. The legacy text block is
preserved so non-native short-circuit callers (Claude Code,
github_copilot, etc.) see the same payload as before.

Failure path still emits the native block pair (with empty results)
plus the text-error block, so the client gets a well-formed response
rather than a malformed half-shape.
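
A rough sketch of the short-circuit content assembly described above. The function name, its parameters, the "srvtoolu_" id prefix, and the "web_search" tool name on the synthetic server_tool_use block are assumptions for illustration.

```python
import uuid
from typing import Any, Dict, List


def build_short_circuit_content_sketch(
    query: str,
    result_items: List[Dict[str, Any]],
    text: str,
    native: bool,
) -> List[Dict[str, Any]]:
    """Return the content list for a short-circuited search response."""
    text_block = {"type": "text", "text": text}
    if not native:
        # Non-native callers keep the existing text-only payload.
        return [text_block]
    # One id shared by both blocks so the client can pair them.
    tool_use_id = f"srvtoolu_{uuid.uuid4().hex}"
    return [
        {
            "type": "server_tool_use",
            "id": tool_use_id,
            "name": "web_search",
            "input": {"query": query},
        },
        {
            "type": "web_search_tool_result",
            "tool_use_id": tool_use_id,
            "content": list(result_items),
        },
        text_block,
    ]


hit = {"type": "web_search_result", "url": "https://example.com", "title": "Example"}
native_content = build_short_circuit_content_sketch("q", [hit], "summary", native=True)
plain_content = build_short_circuit_content_sketch("q", [hit], "summary", native=False)
# Failure path: same pair with empty results, plus the text-error block.
failed_content = build_short_circuit_content_sketch("q", [], "search failed", native=True)
```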

* test(websearch_native_blocks): cover short-circuit native-block emission

Three new cases on top of the existing 18:
- native web_search_20250305 short-circuit → [server_tool_use,
web_search_tool_result, text], ids paired, urls/titles carried.
- litellm_web_search short-circuit → text-only (no regression).
- native short-circuit on search failure → still emits the native
block pair (empty results) plus the text-error block, so the client
never sees a malformed half-shape.

* test(websearch_short_circuit): index assertions by block type, not by position

Native short-circuit responses now have [server_tool_use,
web_search_tool_result, text] when the input carries
web_search_20250305 — find the text block by type rather than relying
on content[0].
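
Looking a block up by type instead of by position can be sketched as below. first_block_of_type is a hypothetical helper name, and the content list is an assumed example of the native short-circuit shape.

```python
from typing import Any, Dict, List, Optional


def first_block_of_type(
    content: List[Dict[str, Any]], block_type: str
) -> Optional[Dict[str, Any]]:
    """Return the first content block with the given type, or None."""
    return next((b for b in content if b.get("type") == block_type), None)


content = [
    {"type": "server_tool_use", "id": "t1", "name": "web_search", "input": {}},
    {"type": "web_search_tool_result", "tool_use_id": "t1", "content": []},
    {"type": "text", "text": "answer"},
]
text_block = first_block_of_type(content, "text")
```

The same lookup works on a text-only response, where the text block happens to sit at index 0.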

* fix(websearch_interception): gate legacy WebSearch name on schema absence

Clients like Cowork / Claude Desktop ship a client-side tool named
"WebSearch" with a full input_schema — they handle it themselves and
expect to make a separate native web_search_20250305 sub-request for
the actual search.

Today is_web_search_tool matches the bare name regardless of other
fields, which hijacks the client's tool server-side. The agentic loop
fires on the main request, the model never gets to emit the
client-side tool_use, and the separate native sub-request (where
citation data flows) is never made. Net: citations panel empty.

Real Anthropic client tools always carry input_schema (the API rejects
them otherwise), so a bare {name: "WebSearch"} with no schema is the
only thing that could be a legacy interception marker. Gate the match
on schema absence: legacy callers (if any) keep working, real
client-side WebSearch tools pass through untouched.
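
The schema-absence gate can be sketched as follows. The helper name is hypothetical, and treating a missing or None input_schema as "absent" is an assumption about how the real check is written.

```python
from typing import Any, Dict


def is_legacy_websearch_marker_sketch(tool: Dict[str, Any]) -> bool:
    """Match the bare "WebSearch" name only when no input_schema is present."""
    return tool.get("name") == "WebSearch" and tool.get("input_schema") is None


bare_marker = {"name": "WebSearch"}
client_tool = {
    "name": "WebSearch",
    "input_schema": {"type": "object", "properties": {"query": {"type": "string"}}},
}
described_only = {"name": "WebSearch", "description": "Search the web"}
```

Under this gate the client tool with a schema passes through, while the two schema-less shapes are still treated as interception markers.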

* fix(websearch_interception): drop "WebSearch" from response-detection lists

Post-conversion the model always sees ``litellm_web_search``, so the
"WebSearch" entry in the response-side tool_use detection lists was
dead at best. If a model ever did return ``tool_use(name="WebSearch")``
it would now (incorrectly) hijack the client's own ``WebSearch`` tool
again — same Cowork problem we just fixed on the input side. Drop it.

* test(websearch_native_blocks): cover the WebSearch legacy-name schema gate

Three new cases:
- {name: "WebSearch"} (bare interception marker) → still matched
- {name: "WebSearch", input_schema: {...}} (Cowork client tool) →
passes through untouched
- {name: "WebSearch", description: "..."} (no schema) → still matched
on the assumption it's a legacy marker rather than a malformed real
client tool.

---------

Co-authored-by: Ishaan Jaffer <ishaanjaffer0324@gmail.com>

2026-05-14 12:30:47 -07:00

test_websearch_chat_completion.py

style: run black formatter on files from main merge

2026-04-17 13:02:59 -07:00

test_websearch_interception_handler.py

Prompt Compression - add it to the proxy (#25729)

2026-04-20 15:08:00 -07:00

test_websearch_interception_thinking.py

style: run black formatter on files from main merge

2026-04-17 13:02:59 -07:00