litellm

tiennm99/litellm


mirror of https://github.com/tiennm99/litellm.git synced 2026-06-18 00:48:01 +00:00

Commit Graph


Author

SHA1

Message

Date

Ishaan Jaffer

e8461b5b97

style: run black formatter on files from main merge

2026-04-17 13:02:59 -07:00

ishaan-berri

a588f76789

Litellm ishaan april15 2 (#25828)

* [Test] Add Azure async chat completion timeout test. WIP

* Capture TTFT for /v1/messages streaming responses

The pass-through streaming path for /v1/messages (Anthropic, Bedrock,
Vertex AI, Azure AI, Minimax) logged completion_start_time only after
the entire stream finished. async_success_handler then fell back to
end_time, making TTFT equal to total duration or null in the UI and
Prometheus.

Record the timestamp of the first chunk in async_sse_wrapper and
propagate it to model_call_details before the logging handler runs,
so gen_ai.response.time_to_first_token reflects the real first-chunk
latency.

Fixes #25598

* [Refactor] Implement timeout resolution logic in completion function

add fetching of ``request_timeout`` from litellm_settings

* remove stale test case

* remove extra print statement

* default request timeout value in constants to 600s to match timeout defaults handled in the proxy

* fix request timeout if using default value from constants.py

* update code structure, test cases

* only override if the global timeout sets timeout to 6000s

* update code structure, move hard-coded values to constants, and make the resolve function readable by moving fallback logic to a separate function

* modify default timeout values, replacing hard-coded ones with the defined defaults

---------

Co-authored-by: harish876 <harishgokul01@gmail.com>
Co-authored-by: Joaquin Hui Gomez <joaquinhuigomez@users.noreply.github.com>

2026-04-15 18:42:23 -07:00
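
Illustration (not part of the commit): the TTFT fix in a588f76789 records the first-chunk timestamp inside the streaming wrapper instead of waiting for the stream to end. The sketch below shows that idea in isolation. The names `sse_wrapper_with_ttft`, `fake_stream`, and the plain `call_details` dict are hypothetical stand-ins, not the actual `async_sse_wrapper` or `model_call_details` code in litellm.

```python
import asyncio
import time
from typing import AsyncIterator, Dict, Optional


async def sse_wrapper_with_ttft(
    stream: AsyncIterator[bytes],
    call_details: Dict[str, Optional[float]],
) -> AsyncIterator[bytes]:
    """Pass chunks through unchanged, stamping the arrival of the first one."""
    async for chunk in stream:
        # Only the first chunk sets the timestamp; later chunks leave it alone.
        if call_details.get("completion_start_time") is None:
            call_details["completion_start_time"] = time.monotonic()
        yield chunk


async def fake_stream() -> AsyncIterator[bytes]:
    """Hypothetical upstream that emits two SSE events with a small delay."""
    for part in (b"data: a\n\n", b"data: b\n\n"):
        await asyncio.sleep(0.01)
        yield part


async def demo() -> Dict[str, Optional[float]]:
    details: Dict[str, Optional[float]] = {
        "start_time": time.monotonic(),
        "completion_start_time": None,
    }
    chunks = [c async for c in sse_wrapper_with_ttft(fake_stream(), details)]
    details["end_time"] = time.monotonic()
    # TTFT is measured to the first chunk, not to the end of the stream.
    details["ttft"] = details["completion_start_time"] - details["start_time"]
    details["chunks"] = len(chunks)
    return details
```

Because the timestamp is written while the stream is still being consumed, a logging handler that runs afterwards can read it rather than falling back to `end_time`.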

2 Commits
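
Illustration (not part of the commit): the timeout refactor in a588f76789 mentions a resolve function, a separate fallback function, a `request_timeout` read from `litellm_settings`, and a 600s default held in constants. A minimal sketch of that shape follows. The precedence order (per-call value, then global setting, then default) is an assumption, and `resolve_timeout`, `_fallback_timeout`, and `DEFAULT_REQUEST_TIMEOUT_SECONDS` are hypothetical names rather than litellm's actual identifiers.

```python
from typing import Optional

# The commit message states a 600s default; the constant name is hypothetical.
DEFAULT_REQUEST_TIMEOUT_SECONDS = 600.0


def _fallback_timeout(global_timeout: Optional[float]) -> float:
    """Fallback logic kept in its own function, as the commit describes."""
    if global_timeout is not None:
        return float(global_timeout)
    return DEFAULT_REQUEST_TIMEOUT_SECONDS


def resolve_timeout(
    call_timeout: Optional[float],
    global_timeout: Optional[float] = None,
) -> float:
    """Assumed precedence: per-call value, then global setting, then default."""
    if call_timeout is not None:
        return float(call_timeout)
    return _fallback_timeout(global_timeout)
```

For example, `resolve_timeout(None, 120)` yields the global value, while `resolve_timeout(None)` falls through to the 600s default.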