Mirror of https://github.com/tiennm99/litellm.git (synced 2026-06-18 05:28:02 +00:00)
0300333753
* feat(otel): expose http.response.status_code on failure spans
Set the OTel-standard http.response.status_code (integer) on failure
spans alongside the existing OpenInference error.code (kept for
back-compat). error.type is already emitted via ERROR_TYPE.
Crucially, also record structured error attributes on the proxy SERVER
span ('Received Proxy Server Request') from async_post_call_failure_hook,
the only place the SERVER span is in hand. _handle_failure records on
the litellm_request child span (the parent span is not propagated into
its kwargs), so prior to this change the SERVER span that dashboards
query carried only span status, never error.code/error.type. Reuses
_record_exception_on_span + StandardLoggingPayloadSetup.get_error_information
so values match the child span.
Tests: recorder unit coverage + a hook-driven test asserting the SERVER
span is stamped (the gap recorder-only tests missed). Full
test_opentelemetry.py suite: 197 passed.
* feat(otel): set http.route + url.path on the proxy SERVER span
Add the OTel-standard http.route (low-cardinality route template, e.g.
/v1/threads/{thread_id}/runs) and url.path (literal path) to the SERVER
span ('Received Proxy Server Request') so dashboards can group traffic
by endpoint instead of seeing every path param as a unique value.
Same architectural gap as the status-code commit: the success/failure
logging handlers write the litellm_request CHILD span, and
_handle_success explicitly refuses to copy to the SERVER span. Verified
with a console-exporter run that the SERVER span was bare on success.
Unlike error info, route/path are known at request time, so set them
directly on the freshly-created SERVER span in user_api_key_auth (one
edit point, works for success and failure, no hook-ordering risk):
- http.route from the matched FastAPI route (scope['route'].path),
empirically confirmed populated at auth-dependency time.
- url.path from the existing literal-path variable.
New get_request_route_template helper + set_proxy_request_route_attributes
(no-op on None span, so the Langfuse override stays safe).
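The difference between the two attributes can be sketched as follows. This is an illustration only, not the helper added in this commit: FakeRoute, get_route_template and build_route_attributes are hypothetical names, and the scope is a plain dict standing in for the ASGI scope. Only the attribute names and the scope['route'].path lookup come from the message above.

```python
from typing import Any, Dict, Mapping, Optional


class FakeRoute:
    """Hypothetical stand-in for a matched route object exposing a `path` template."""

    def __init__(self, path: str) -> None:
        self.path = path


def get_route_template(scope: Mapping[str, Any]) -> Optional[str]:
    """Return the matched route template, or None when no route was matched."""
    route = scope.get("route")
    path = getattr(route, "path", None)
    if isinstance(path, str) and path:
        return path
    return None


def build_route_attributes(scope: Mapping[str, Any], literal_path: str) -> Dict[str, str]:
    """Collect url.path always, and http.route only when a template is known."""
    attributes = {"url.path": literal_path}
    template = get_route_template(scope)
    if template is not None:
        attributes["http.route"] = template
    return attributes


matched = build_route_attributes(
    {"route": FakeRoute("/v1/threads/{thread_id}/runs")},
    "/v1/threads/thread_abc123/runs",
)
unmatched = build_route_attributes({}, "/not-a-route")
```

Grouping by http.route keeps one series per endpoint, while url.path still records the literal request path for debugging.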
Tests: route-attribute setter + route-template helper edges. Full
test_opentelemetry.py and test_auth_utils.py green.
* feat(otel): set litellm.preprocessing.duration_ms on the proxy SERVER span
Expose the total time LiteLLM spends before the upstream provider
request begins (auth + parsing + pre-call hooks) as a single number on
the SERVER span ('Received Proxy Server Request'). Window:
proxy-receive -> FIRST provider handoff.
Retry semantics: first attempt only (pure preprocessing, excludes
retry loops + backoff). api_call_start_time is overwritten on every
attempt, so a set-once first_api_call_start_time pins the first handoff.
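The set-once behaviour can be sketched as below. This is a minimal illustration under assumptions, not the code in litellm_logging: record_api_call_start is a hypothetical helper and model_call_details is a plain dict here. Only the two key names come from the message above.

```python
import datetime
from typing import Any, Dict


def record_api_call_start(
    model_call_details: Dict[str, Any], now: datetime.datetime
) -> None:
    """Hypothetical helper: refresh the per-attempt timestamp, pin the first one."""
    # Overwritten on every attempt, including retries.
    model_call_details["api_call_start_time"] = now
    # Written only when absent, so it keeps pointing at the first provider handoff.
    model_call_details.setdefault("first_api_call_start_time", now)


details: Dict[str, Any] = {}
first_attempt = datetime.datetime(2026, 1, 1, 0, 0, 0)
retry_attempt = first_attempt + datetime.timedelta(seconds=5)

record_api_call_start(details, first_attempt)
record_api_call_start(details, retry_attempt)
```

After the retry, api_call_start_time reflects the second attempt while first_api_call_start_time still marks the first handoff, which is what keeps retry loops and backoff out of the preprocessing window.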
Same architectural gap as the prior two commits: the success/failure
logging handlers write the litellm_request CHILD span, not the SERVER
span. Set it instead from the post-call hooks on
user_api_key_dict.parent_otel_span.
Failure-path subtlety: request_data.pop('litellm_logging_obj') runs
before the failure-hook loop, so the failure hook can't read the
logging object. litellm_received_at is propagated via the existing
request->metadata channel, and first_api_call_start_time is mirrored
onto litellm_params.metadata, so both anchors survive into request_data
and the OTel helper reads them uniformly for success and failure.
Edits: user_api_key_auth (stash receive instant), litellm_pre_call_utils
(propagate it), litellm_logging (set-once first handoff + metadata
mirror), opentelemetry (constant + set_preprocessing_duration_attribute,
called from both post-call hooks).
Tests: duration helper (both container shapes, missing/negative/None
edges) + set-once invariant (retry doesn't overwrite, metadata mirror).
test_opentelemetry.py + test_auth_utils.py + test_litellm_logging.py:
447 passed. Verified live: SERVER span carries the attribute on success
and failure, coexisting with the status-code and route attributes.
* fix(otel): MyPy type-narrowing for status-code + preprocessing-duration
No behavior change. MyPy (CI lint) flagged:
- error_information["error_code"] is str|None: narrow via a None-checked
local before int().
- _to_timestamp returns Optional[float]: resolve both anchors and return
early if either is None instead of subtracting possibly-None floats.
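Both narrowing patterns can be sketched as follows. The function names and signatures are hypothetical and chosen for the example; they are not the functions touched by this commit.

```python
from typing import Dict, Optional


def status_code_from_error(
    error_information: Dict[str, Optional[str]]
) -> Optional[int]:
    """Narrow str | None through a None-checked local before calling int()."""
    error_code = error_information.get("error_code")
    if error_code is None:
        return None
    try:
        return int(error_code)
    except ValueError:
        # Non-numeric codes cannot become an integer status code.
        return None


def duration_ms(start: Optional[float], end: Optional[float]) -> Optional[float]:
    """Return early when either anchor is missing instead of subtracting None."""
    if start is None or end is None:
        return None
    return (end - start) * 1000.0
```

Binding the value to a local first lets the type checker see the None check and the int() call operate on the same expression.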
* fix(otel): stop polluting user request metadata with first_api_call_start_time
The set-once preprocessing anchor (first_api_call_start_time) was mirrored into
litellm_params["metadata"] from core litellm_logging.py. That dict is
the caller's request metadata, mutated in place and shared across every
call path including pure SDK (litellm.acreate_batch). It got echoed into
LiteLLMBatch(metadata=...), which the OpenAI batch schema types as
Dict[str, str] -> pydantic ValidationError on a datetime value.
- litellm_logging.py: set first_api_call_start_time only on
model_call_details (success path reads it there directly).
- proxy/utils.py: post_call_failure_hook lifts it off the logging object
into request_data (internal top-level key, same convention as the
other proxy-internal request_data keys) right before the existing
litellm_logging_obj pop. Never touches user metadata.
- opentelemetry.py: read the anchor from the container top level
(model_call_details on success, request_data on failure).
- Tests updated; add TestPostCallFailureHookLiftsFirstApiCallStartTime.
Fixes the batches_testing regression introduced on this branch.
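The lift-then-pop ordering can be sketched as below. This is an assumption-laden illustration, not the code in proxy/utils.py: lift_anchor_then_pop is a hypothetical helper and SimpleNamespace stands in for the logging object. The key names match those used in this message.

```python
import datetime
from types import SimpleNamespace
from typing import Any, Dict

ANCHOR_KEY = "first_api_call_start_time"


def lift_anchor_then_pop(request_data: Dict[str, Any]) -> None:
    """Copy the anchor to a top-level key, then drop the logging object."""
    logging_obj = request_data.get("litellm_logging_obj")
    details = getattr(logging_obj, "model_call_details", None)
    if isinstance(details, dict) and details.get(ANCHOR_KEY) is not None:
        # Top-level internal key: the caller's "metadata" dict is never written.
        request_data[ANCHOR_KEY] = details[ANCHOR_KEY]
    request_data.pop("litellm_logging_obj", None)


handoff = datetime.datetime(2026, 1, 1)
user_metadata: Dict[str, str] = {}
request_data: Dict[str, Any] = {
    "litellm_logging_obj": SimpleNamespace(model_call_details={ANCHOR_KEY: handoff}),
    "metadata": user_metadata,
}
lift_anchor_then_pop(request_data)
```

Because the datetime never enters the metadata dict, anything that later echoes metadata into a Dict[str, str] field only sees the caller's own string values.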
* chore(otel): trim verbose comments to concise rationale
Collapse multi-line why-blocks to one or two lines and drop process/plan references (PR-numbering, "the plan") from test comments. No behavior change.
324 lines
11 KiB
Python
import datetime as real_datetime
import json
import os
import sys

import pytest
from fastapi import HTTPException

from litellm.caching.caching import DualCache
from litellm.proxy._types import ProxyErrorTypes
from litellm.proxy.utils import ProxyLogging

sys.path.insert(
    0, os.path.abspath("../../..")
)  # Adds the parent directory to the system path


from unittest.mock import MagicMock

from litellm.proxy.utils import get_custom_url, join_paths


def test_get_custom_url(monkeypatch):
    monkeypatch.setenv("SERVER_ROOT_PATH", "/litellm")
    custom_url = get_custom_url(request_base_url="http://0.0.0.0:4000", route="ui/")
    assert custom_url == "http://0.0.0.0:4000/litellm/ui/"


def test_proxy_only_error_true_for_llm_route():
    proxy_logging_obj = ProxyLogging(user_api_key_cache=DualCache())
    assert proxy_logging_obj._is_proxy_only_llm_api_error(
        original_exception=Exception(),
        error_type=ProxyErrorTypes.auth_error,
        route="/v1/chat/completions",
    )


def test_proxy_only_error_true_for_info_route():
    proxy_logging_obj = ProxyLogging(user_api_key_cache=DualCache())
    assert (
        proxy_logging_obj._is_proxy_only_llm_api_error(
            original_exception=Exception(),
            error_type=ProxyErrorTypes.auth_error,
            route="/key/info",
        )
        is True
    )


def test_proxy_only_error_false_for_non_llm_non_info_route():
    proxy_logging_obj = ProxyLogging(user_api_key_cache=DualCache())
    assert (
        proxy_logging_obj._is_proxy_only_llm_api_error(
            original_exception=Exception(),
            error_type=ProxyErrorTypes.auth_error,
            route="/key/generate",
        )
        is False
    )


def test_proxy_only_error_false_for_other_error_type():
    proxy_logging_obj = ProxyLogging(user_api_key_cache=DualCache())
    assert (
        proxy_logging_obj._is_proxy_only_llm_api_error(
            original_exception=Exception(),
            error_type=None,
            route="/v1/chat/completions",
        )
        is False
    )


def test_get_model_group_info_order():
    from litellm import Router
    from litellm.proxy.proxy_server import _get_model_group_info

    router = Router(
        model_list=[
            {
                "model_name": "openai/tts-1",
                "litellm_params": {
                    "model": "openai/tts-1",
                    "api_key": "sk-1234",
                },
            },
            {
                "model_name": "openai/gpt-3.5-turbo",
                "litellm_params": {
                    "model": "openai/gpt-3.5-turbo",
                    "api_key": "sk-1234",
                },
            },
        ]
    )
    model_list = _get_model_group_info(
        llm_router=router,
        all_models_str=["openai/tts-1", "openai/gpt-3.5-turbo"],
        model_group=None,
    )

    model_groups = [m.model_group for m in model_list]
    assert model_groups == ["openai/tts-1", "openai/gpt-3.5-turbo"]


def test_join_paths_no_duplication():
    """Test that join_paths doesn't duplicate route when base_path already ends with it"""
    result = join_paths(
        base_path="http://0.0.0.0:4000/my-custom-path/", route="/my-custom-path"
    )
    assert result == "http://0.0.0.0:4000/my-custom-path"


def test_join_paths_normal_join():
    """Test normal path joining"""
    result = join_paths(base_path="http://0.0.0.0:4000", route="/api/v1")
    assert result == "http://0.0.0.0:4000/api/v1"


def test_join_paths_with_trailing_slash():
    """Test path joining with trailing slash on base_path"""
    result = join_paths(base_path="http://0.0.0.0:4000/", route="api/v1")
    assert result == "http://0.0.0.0:4000/api/v1"


def test_join_paths_empty_base():
    """Test path joining with empty base_path"""
    result = join_paths(base_path="", route="api/v1")
    assert result == "/api/v1"


def test_join_paths_empty_route():
    """Test path joining with empty route"""
    result = join_paths(base_path="http://0.0.0.0:4000", route="")
    assert result == "http://0.0.0.0:4000"


def test_join_paths_both_empty():
    """Test path joining with both empty"""
    result = join_paths(base_path="", route="")
    assert result == "/"


def test_join_paths_nested_path():
    """Test path joining with nested paths"""
    result = join_paths(base_path="http://0.0.0.0:4000/v1", route="chat/completions")
    assert result == "http://0.0.0.0:4000/v1/chat/completions"


def _patch_today(monkeypatch, year, month, day):
    class PatchedDate(real_datetime.date):
        @classmethod
        def today(cls):
            return real_datetime.date(year, month, day)

    monkeypatch.setattr("litellm.proxy.utils.date", PatchedDate)


def test_get_projected_spend_over_limit_day_one(monkeypatch):
    from litellm.proxy.utils import _get_projected_spend_over_limit

    _patch_today(monkeypatch, 2026, 1, 1)
    result = _get_projected_spend_over_limit(100.0, 1.0)

    assert result is not None
    projected_spend, projected_exceeded_date = result
    assert projected_spend == 3100.0
    assert projected_exceeded_date == real_datetime.date(2026, 1, 1)


def test_get_projected_spend_over_limit_december(monkeypatch):
    from litellm.proxy.utils import _get_projected_spend_over_limit

    _patch_today(monkeypatch, 2026, 12, 15)
    result = _get_projected_spend_over_limit(100.0, 1.0)

    assert result is not None
    projected_spend, projected_exceeded_date = result
    assert projected_spend == pytest.approx(214.28571428571428)
    assert projected_exceeded_date == real_datetime.date(2026, 12, 15)


def test_get_projected_spend_over_limit_includes_current_spend(monkeypatch):
    from litellm.proxy.utils import _get_projected_spend_over_limit

    _patch_today(monkeypatch, 2026, 4, 11)
    result = _get_projected_spend_over_limit(100.0, 200.0)

    assert result is not None
    projected_spend, projected_exceeded_date = result
    assert projected_spend == 290.0
    assert projected_exceeded_date == real_datetime.date(2026, 4, 21)


# ---------------------------------------------------------------------------
# L2: _enrich_http_exception_with_guardrail_context
# Regression coverage for case 2026-04-10-internal-bedrock-guardrail-streaming-error.
# ---------------------------------------------------------------------------


def test_enrich_http_exception_with_guardrail_context_dict_detail():
    """L2: dict-detail HTTPException is enriched with guardrail_name and mode."""
    from litellm.proxy.utils import _enrich_http_exception_with_guardrail_context

    class StubCallback:
        guardrail_name = "bedrock-pii-guard"
        event_hook = "post_call"

    exc = HTTPException(status_code=400, detail={"error": "Violated guardrail policy"})
    _enrich_http_exception_with_guardrail_context(exc, StubCallback())
    assert exc.detail["guardrail_name"] == "bedrock-pii-guard"
    assert exc.detail["guardrail_mode"] == "post_call"


def test_enrich_http_exception_string_detail_noop():
    """L2: string-detail HTTPException is not mutated (can't add fields to a str)."""
    from litellm.proxy.utils import _enrich_http_exception_with_guardrail_context

    class StubCallback:
        guardrail_name = "x"
        event_hook = "pre_call"

    exc = HTTPException(status_code=400, detail="Content blocked")
    _enrich_http_exception_with_guardrail_context(exc, StubCallback())
    assert exc.detail == "Content blocked"


def test_enrich_http_exception_setdefault_does_not_overwrite():
    """L2: a guardrail that already populates guardrail_name explicitly wins."""
    from litellm.proxy.utils import _enrich_http_exception_with_guardrail_context

    class StubCallback:
        guardrail_name = "inferred-name"
        event_hook = "pre_call"

    exc = HTTPException(
        status_code=400,
        detail={"error": "x", "guardrail_name": "explicit-name"},
    )
    _enrich_http_exception_with_guardrail_context(exc, StubCallback())
    assert exc.detail["guardrail_name"] == "explicit-name"


def test_enrich_http_exception_non_http_exception_noop():
    """L2: non-HTTPException is left alone and the helper does not raise."""
    from litellm.proxy.utils import _enrich_http_exception_with_guardrail_context

    class StubCallback:
        guardrail_name = "x"
        event_hook = "pre_call"

    exc = ValueError("not an HTTPException")
    _enrich_http_exception_with_guardrail_context(exc, StubCallback())
    assert str(exc) == "not an HTTPException"


def test_enrich_http_exception_callback_without_guardrail_name_noop():
    """L2: callback without guardrail_name attribute leaves detail alone."""
    from litellm.proxy.utils import _enrich_http_exception_with_guardrail_context

    class StubCallback:
        pass

    exc = HTTPException(status_code=400, detail={"error": "x"})
    _enrich_http_exception_with_guardrail_context(exc, StubCallback())
    assert exc.detail == {"error": "x"}


class TestPostCallFailureHookLiftsFirstApiCallStartTime:
    """post_call_failure_hook lifts first_api_call_start_time off the
    logging object into request_data (an internal top-level key) before
    the non-serialisable logging object is popped, so failure-path
    callbacks (OTel preprocessing latency) can still read it. It must
    never land in request_data["metadata"] (user request metadata,
    echoed downstream and typed Dict[str, str] in batch objects).
    """

    async def _run(self, request_data):
        from unittest.mock import AsyncMock, patch

        from litellm.proxy._types import UserAPIKeyAuth

        proxy_logging_obj = ProxyLogging(user_api_key_cache=DualCache())
        proxy_logging_obj.alert_types = []  # skip alerting branch
        with patch.object(proxy_logging_obj, "update_request_status", new=AsyncMock()):
            await proxy_logging_obj.post_call_failure_hook(
                request_data=request_data,
                original_exception=Exception("boom"),
                user_api_key_dict=UserAPIKeyAuth(),
            )

    @pytest.mark.asyncio
    async def test_lifts_to_top_level_and_pops_logging_obj(self):
        handoff = real_datetime.datetime(2026, 1, 1, 0, 0, 0)
        logging_obj = MagicMock()
        logging_obj.model_call_details = {"first_api_call_start_time": handoff}
        user_meta = {}
        request_data = {
            "litellm_logging_obj": logging_obj,
            "metadata": user_meta,
        }
        await self._run(request_data)

        assert request_data["first_api_call_start_time"] == handoff
        assert "litellm_logging_obj" not in request_data
        # user metadata is never touched
        assert user_meta == {}
        assert "first_api_call_start_time" not in request_data["metadata"]

    @pytest.mark.asyncio
    async def test_no_logging_obj_is_noop(self):
        request_data = {"metadata": {}}
        await self._run(request_data)
        assert "first_api_call_start_time" not in request_data

    @pytest.mark.asyncio
    async def test_logging_obj_without_anchor_is_noop(self):
        logging_obj = MagicMock()
        logging_obj.model_call_details = {}
        request_data = {"litellm_logging_obj": logging_obj}
        await self._run(request_data)
        assert "first_api_call_start_time" not in request_data
        assert "litellm_logging_obj" not in request_data
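
The expected values in the three projected-spend tests are consistent with a simple linear projection over the current month. The sketch below is reconstructed from those assertions alone: the function name, parameter names, and formula are assumptions, and it may differ from the actual _get_projected_spend_over_limit in litellm.proxy.utils (for instance in when it returns None).

```python
import calendar
import datetime
from typing import Optional, Tuple


def projected_spend_sketch(
    current_spend: float, soft_budget_limit: float, today: datetime.date
) -> Optional[Tuple[float, datetime.date]]:
    """Hypothetical linear projection that reproduces the asserted values."""
    days_in_month = calendar.monthrange(today.year, today.month)[1]
    # Assumed: completed days so far, floored at 1 so day one does not divide by zero.
    days_elapsed = max(today.day - 1, 1)
    daily_spend = current_spend / days_elapsed
    days_remaining = days_in_month - today.day
    projected_spend = current_spend + daily_spend * days_remaining

    # Assumed: nothing to report when the projection stays within the limit.
    if projected_spend <= soft_budget_limit:
        return None
    # Already over the limit: the exceed date is today.
    if current_spend >= soft_budget_limit:
        return projected_spend, today
    # Otherwise count only the budget that is still left, at the current daily rate.
    days_until_limit = int((soft_budget_limit - current_spend) / daily_spend)
    return projected_spend, today + datetime.timedelta(days=days_until_limit)
```

For the April case this gives a daily rate of 100 / 10 = 10, a projection of 100 + 10 * 19 = 290, and an exceed date ten days out, because only the remaining 100 of the 200 budget is divided by the daily rate.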