litellm

tiennm99/litellm

mirror of https://github.com/tiennm99/litellm.git synced 2026-08-02 10:21:52 +00:00

2453936a82 Litellm websocket improvements (#29563)

* Add support for websocket via codex

* Add model alias and creds support

* fix: skip cost tracking for WS session wrapper call types

The @client decorator on _aresponses_websocket fires async_success_handler
with result=None after the session ends. This triggered cost tracking errors
because standard_logging_object is never built for None results.

Per-turn costs are correctly tracked by individual litellm.aresponses calls
inside the session. The outer session-level logging obj should not attempt
cost tracking.

Fix: skip _aresponses_websocket and _arealtime call types in deployment_callback_on_success,
RouterBudgetLimiting.async_log_success_event, and _PROXY_track_cost_callback.
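
A minimal sketch of the skip described above, assuming the callbacks can see the call type as a plain string. The two call-type names come from this commit message; the `should_track_cost` helper is hypothetical and is not litellm's actual code.

```python
# Session-wrapper call types named in the commit message. Their per-turn
# costs are tracked by the inner calls, so the outer wrapper is skipped.
SESSION_WRAPPER_CALL_TYPES = frozenset({"_aresponses_websocket", "_arealtime"})


def should_track_cost(call_type: str) -> bool:
    """Hypothetical guard: return False for WS session wrapper call types."""
    return call_type not in SESSION_WRAPPER_CALL_TYPES
```

A cost-tracking callback could return early whenever this guard is False, leaving the per-turn calls as the only source of cost records.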

* fix: address Greptile review comments

Fix JSON injection: use json.dumps instead of f-string interpolation for model name in WS body.

Add 30s timeout for first WS frame to prevent unbounded connection resource tie-up.

Restore per-event model override in streaming_iterator; fall back to connection-level model when event omits it.

Strengthen regression test: inject alias into kwargs via _update_kwargs_with_deployment mock so the test would fail on un-fixed code.
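
To illustrate the JSON injection fix, the sketch below contrasts f-string interpolation with json.dumps. The body shape (`type` and `model` keys) and both function names are assumptions made for illustration, not the actual WS body built by litellm.

```python
import json


def build_ws_body_unsafe(model: str) -> str:
    # f-string interpolation: a double quote inside `model` ends the string
    # early and lets the caller smuggle in extra keys.
    return f'{{"type": "response.create", "model": "{model}"}}'


def build_ws_body(model: str) -> str:
    # json.dumps escapes quotes and backslashes, so `model` stays one string.
    return json.dumps({"type": "response.create", "model": model})


malicious = 'gpt-4o", "injected": "yes'
unsafe = json.loads(build_ws_body_unsafe(malicious))
safe = json.loads(build_ws_body(malicious))
```

With the unsafe builder the parsed body gains an `injected` key; with json.dumps the whole malicious string round-trips as the model name.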

* fix: handle nested response.create format in first-frame model extraction

When ?model= is omitted, the first WS frame can carry the model in either flat
format (first_event["model"]) or nested format (first_event["response"]["model"]).
The flat-only check would silently reject clients using the nested wire format.

Mirrors the same two-format logic in _build_base_call_kwargs.
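
The two wire formats can be sketched as follows. This is a hypothetical stand-in named `extract_model`, written only from the description above; the real logic lives in the function this commit later extracts, and its exact behaviour may differ.

```python
from typing import Any, Optional


def extract_model(first_event: dict) -> Optional[str]:
    """Return the model from a flat or nested first frame, else None."""
    # Flat format: {"type": "response.create", "model": "..."}
    flat: Any = first_event.get("model")
    if isinstance(flat, str) and flat:
        return flat
    # Nested format: {"type": "response.create", "response": {"model": "..."}}
    response: Any = first_event.get("response")
    if isinstance(response, dict):
        nested: Any = response.get("model")
        if isinstance(nested, str) and nested:
            return nested
    return None
```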

* fix: don't force connection-level custom_llm_provider on per-event model overrides

If a client sends a different model per response.create turn, litellm needs to
re-resolve the provider from that model string. Forcing the connection-level
custom_llm_provider would silently route the request to the wrong backend.

Only inject custom_llm_provider when the per-event model matches the
connection-level model.

* refactor: extract WS model extraction into testable function

Pull the flat/nested model extraction into _extract_model_from_first_ws_event
so tests import and exercise the real function rather than a copy.

* fix: compare providers not full model strings in _inject_credentials

The model == self.model guard was too strict: same-provider model variants
(e.g., vertex_ai/gemini-2.0 -> vertex_ai/gemini-1.5 on one connection) would
lose custom_llm_provider, breaking routing when a custom api_base is in use.

Compare the provider extracted by get_llm_provider instead, so same-provider
variants still inherit the connection-level provider while cross-provider
overrides let litellm re-resolve.
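
A rough sketch of the provider comparison, assuming the provider is simply the prefix before the first `/`. The `provider_of` helper is a simplified, hypothetical stand-in for get_llm_provider, which handles many more cases.

```python
from typing import Optional


def provider_of(model: str) -> Optional[str]:
    """Simplified stand-in: treat 'provider/model' prefixes as the provider."""
    prefix, sep, _ = model.partition("/")
    return prefix if sep and prefix else None


def same_provider(connection_model: str, event_model: str) -> bool:
    """True when both models resolve to the same, known provider."""
    conn = provider_of(connection_model)
    return conn is not None and conn == provider_of(event_model)
```

Under this comparison, `vertex_ai/gemini-2.0` and `vertex_ai/gemini-1.5` match and keep the connection-level provider, while a switch to `openai/gpt-4o` does not match and is left for re-resolution.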

* style: black formatting

* refactor: extract first-frame model resolution to fix PLR0915 (too many statements)

* Fix responses WebSocket first-frame validation

* fix: classify WS first-frame read errors and clarify cost-skip log

Distinguish client disconnects from server errors when reading the
responses WebSocket first frame, make the cost-tracking skip log message
accurate for session wrappers (which do carry a model), and resolve the
connection-level provider once per session instead of on every
response.create event.

* test: cover WS first-frame read errors and same-provider credential injection

Adds regression tests for the still-uncovered responses WebSocket paths:
the timeout, invalid-JSON and missing-model branches of
_read_ws_model_from_first_frame, plus the provider comparison in
ManagedResponsesWebSocketHandler._same_provider and _inject_credentials
(same-provider model variants keep the connection provider; cross-provider
models re-resolve).
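
The three first-frame branches named above (timeout, invalid JSON, missing model) could look roughly like this. The `FirstFrameError` class, the `receive_text` callable and the function name are all assumptions for illustration; only the 30s default comes from this commit message.

```python
import asyncio
import json
from typing import Awaitable, Callable


class FirstFrameError(Exception):
    """Hypothetical error carrying a short machine-readable reason."""

    def __init__(self, reason: str) -> None:
        super().__init__(reason)
        self.reason = reason


async def read_model_from_first_frame(
    receive_text: Callable[[], Awaitable[str]], timeout: float = 30.0
) -> str:
    try:
        # Bound the wait so an idle client cannot hold the connection open.
        raw = await asyncio.wait_for(receive_text(), timeout=timeout)
    except asyncio.TimeoutError:
        raise FirstFrameError("timeout") from None
    try:
        event = json.loads(raw)
    except json.JSONDecodeError:
        raise FirstFrameError("invalid_json") from None
    model = event.get("model") if isinstance(event, dict) else None
    if not isinstance(model, str) or not model:
        raise FirstFrameError("missing_model")
    return model
```

For brevity this sketch checks only the flat format; a fuller version would also look under the nested `response` key.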

* fix(responses-ws): fall back to explicit custom_llm_provider when connection model is unresolvable

When a WebSocket session is opened with a custom deployment alias that litellm
cannot resolve to a provider, _connection_provider was None, so _same_provider
returned False for every resolvable per-event model and the connection-level
custom_llm_provider was dropped. Use the explicitly-set custom_llm_provider as
the connection provider in that case so same-provider per-event models still
inherit it while genuinely cross-provider models continue to re-resolve.
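
The fallback can be summarised in a few lines. Both function names and their parameters are hypothetical; the sketch only mirrors the rule stated above and does not reproduce the handler's real attributes.

```python
from typing import Optional


def pick_connection_provider(
    resolved: Optional[str], explicit_custom_llm_provider: Optional[str]
) -> Optional[str]:
    # Prefer the provider resolved from the connection model. When the model
    # is an unresolvable alias, fall back to the explicitly set provider.
    return resolved if resolved is not None else explicit_custom_llm_provider


def inherits_connection_provider(
    connection_provider: Optional[str], event_provider: Optional[str]
) -> bool:
    # Per-event models inherit only when the providers match and are known.
    return connection_provider is not None and connection_provider == event_provider
```

With an unresolvable alias and an explicit `vertex_ai` provider, a per-event `vertex_ai` model still inherits, while an `openai` model does not.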

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com>

2026-06-03 11:48:35 -07:00

__init__.py

[Fix] - Responses API - add /openai routes for responses API. (Azure OpenAI SDK Compatibility) (#15988)

2025-10-27 19:12:13 -07:00

test_endpoints.py

Litellm websocket improvements (#29563)

2026-06-03 11:48:35 -07:00