litellm/tests/test_litellm/proxy/hooks
michelligabriele 7c5e2e8389 fix(proxy): make async_post_call_response_headers_hook consistent across all endpoints (#22985)
* fix(proxy): make async_post_call_response_headers_hook consistent across all endpoints

The response headers hook had 5 gaps that prevented callbacks from
reliably extracting routing metadata across endpoint types:

1. Hook never fired for /audio/transcriptions (endpoint bypasses
   base_process_llm_request)
2. custom_llm_provider not accessible in hook data for any endpoint
3. custom_llm_provider not stamped in ResponsesAPIResponse._hidden_params
   (unlike chat completions)
4. model_info stored under inconsistent keys (metadata vs litellm_metadata)
5. request_headers was always passed as None at every call site

This adds a litellm_call_info parameter to the hook that normalizes
routing metadata (custom_llm_provider, model_info, api_base, model_id)
regardless of endpoint type. Also stamps custom_llm_provider on
Responses API responses, adds the hook call to the transcription
handler, and passes request_headers at all call sites.
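As a rough illustration of the normalization described above, the sketch below gathers the four routing fields into one flat dict no matter which metadata key the endpoint used. The helper name `build_call_info`, the `hidden_params` argument, and the exact key layout are assumptions made for this example, not the code in the commit.

```python
from typing import Any, Dict, Optional


def build_call_info(
    data: Dict[str, Any],
    hidden_params: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: collect routing metadata into one flat dict."""
    hidden = hidden_params or {}
    # model_info may sit under either key depending on the endpoint,
    # and either key may be present with an explicit None value.
    metadata = data.get("metadata") or {}
    litellm_metadata = data.get("litellm_metadata") or {}
    model_info = (
        metadata.get("model_info") or litellm_metadata.get("model_info") or {}
    )
    return {
        "custom_llm_provider": hidden.get("custom_llm_provider"),
        "model_info": model_info,
        "api_base": hidden.get("api_base"),
        # Assumption: fall back to model_info["id"] when no model_id is stamped.
        "model_id": hidden.get("model_id") or model_info.get("id"),
    }


info = build_call_info(
    {"metadata": None, "litellm_metadata": {"model_info": {"id": "abc"}}},
    {"custom_llm_provider": "openai"},
)
```

A callback that receives such a dict can read `info["custom_llm_provider"]` directly instead of probing endpoint-specific locations.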

Supersedes PR #21385.

* fix(proxy): address review feedback — safer backwards compat and None guards

- Replace try/except TypeError with inspect.signature() check for
  litellm_call_info backwards compatibility. This avoids masking real
  TypeErrors inside callback implementations and prevents double
  invocation with inconsistent parameters.
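The signature check can be sketched as follows. This is a simplified, synchronous illustration (the real hook is async); the names `accepts_call_info` and `call_hook`, and the choice to treat `**kwargs` as accepting the parameter, are assumptions for the example.

```python
import inspect
from typing import Any, Callable


def accepts_call_info(hook: Callable[..., Any]) -> bool:
    """Return True if the hook can receive a litellm_call_info keyword."""
    params = inspect.signature(hook).parameters
    if "litellm_call_info" in params:
        return True
    # Assumption: a hook declaring **kwargs also accepts the new keyword.
    return any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values())


def call_hook(hook: Callable[..., Any], **kwargs: Any) -> Any:
    """Call the hook exactly once, dropping the new keyword for legacy hooks."""
    if not accepts_call_info(hook):
        kwargs.pop("litellm_call_info", None)
    # A TypeError raised inside the hook propagates; nothing is retried.
    return hook(**kwargs)


def legacy_hook(data, response):
    return {"x-legacy": "1"}


def new_hook(data, response, litellm_call_info=None):
    return {"x-provider": litellm_call_info["custom_llm_provider"]}


call_info = {"custom_llm_provider": "openai"}
legacy_result = call_hook(legacy_hook, data={}, response=None, litellm_call_info=call_info)
new_result = call_hook(new_hook, data={}, response=None, litellm_call_info=call_info)
```

Because the decision is made before the call, a genuine `TypeError` from the callback body is never mistaken for a signature mismatch.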

- Use (data.get("key") or {}) instead of data.get("key", {}) to guard
  against keys that exist with an explicit None value, which would
  cause AttributeError on the subsequent .get() call.
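The difference between the two lookups shows up only when the key is present with a value of None. A minimal demonstration, using a made-up `data` dict:

```python
data = {"metadata": None}

# The default is used only when the key is missing, so this yields None.
unsafe = data.get("metadata", {})

# Falling back with `or` also covers an explicit None value.
safe = data.get("metadata") or {}

# safe.get(...) works; unsafe.get(...) would raise AttributeError.
model_info = safe.get("model_info")
```

Note that `or {}` also replaces other falsy values such as an empty dict, which is harmless here because the fallback is an empty dict as well.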

* fix(proxy): cache inspect.signature result for callback compat check

Move the inspect.signature() call into a module-level helper with a
dict cache keyed by callback identity. Avoids repeated introspection
per request per callback in the hot path.

* fix(proxy): use class identity for signature cache key

Key the _CALLBACK_ACCEPTS_CALL_INFO cache by id(type(cb)) instead of
id(cb) to avoid stale entries from Python address reuse after GC.
All instances of the same callback class share the same method
signature, so class identity is both safer and more cache-efficient.
2026-03-12 08:51:00 -07:00