litellm/tests/logging_callback_tests/test_langfuse_e2e_test.py
Ishaan Jaff 8e61b32b8e [Staging] - Ishaan March 17th (#23903)
* feat(xai): add grok-4.20 beta 2 models with pricing (#23900)

Add three grok-4.20 beta 2 model variants from xAI:
- grok-4.20-multi-agent-beta-0309 (reasoning + multi-agent)
- grok-4.20-beta-0309-reasoning (reasoning)
- grok-4.20-beta-0309-non-reasoning

Pricing (from https://docs.x.ai/docs/models):
- Input: $2.00/1M tokens ($0.20/1M cached)
- Output: $6.00/1M tokens
- Context: 2M tokens

All variants support vision, function calling, tool choice, and web search.
Closes LIT-2171

* docs: add Quick Install section for litellm --setup wizard (#23905)

* docs: add Quick Install section for litellm --setup wizard

* docs: clarify setup wizard is for local/beginner use

* feat(setup): interactive setup wizard + install.sh (#23644)

* feat(setup): add interactive setup wizard + install.sh

Adds `litellm --setup` — a Claude Code-style TUI onboarding wizard that
guides users through provider selection, API key entry, and proxy config
generation, then optionally starts the proxy immediately.

- litellm/setup_wizard.py: wizard with ASCII art, numbered provider menu
  (OpenAI, Anthropic, Azure, Gemini, Bedrock, Ollama), API key prompts,
  port/master-key config, and litellm_config.yaml generation
- litellm/proxy/proxy_cli.py: adds --setup flag that invokes the wizard
- scripts/install.sh: curl-installable script (detect OS/Python, pip
  install litellm[proxy], launch wizard)

Usage:
  curl -fsSL https://raw.githubusercontent.com/BerriAI/litellm/main/scripts/install.sh | sh
  litellm --setup

* fix(install.sh): remove orange color, add LITELLM_BRANCH env var for branch installs

* fix(install.sh): install from git branch so --setup is available for QA

* fix(install.sh): remove stale LITELLM_BRANCH reference that caused unbound variable error

* fix(install.sh): force-reinstall from git to bypass cached PyPI version

* fix(install.sh): show pip progress bar during install

* fix(install.sh): always launch wizard via $PYTHON_BIN -m litellm, not PATH binary

* fix(install.sh): use litellm.proxy.proxy_cli module (no __main__.py exists)

* fix(install.sh): suppress RuntimeWarning from module invocation

* fix(install.sh): use Python bin-dir litellm binary to avoid CWD sys.path shadowing

* fix(install.sh): use sysconfig.get_path('scripts') to find pip-installed litellm binary

* fix(install.sh): redirect stdin from /dev/tty on exec so wizard gets terminal, not exhausted pipe

* fix(install.sh): warn about git clone duration, drop --no-cache-dir so re-runs are faster

* feat(setup_wizard): arrow-key selector, updated model names

* fix(setup_wizard): use sysconfig binary to start proxy, not python -m litellm

* feat(setup_wizard): credential validation after key entry + clear next-steps after proxy start

* style(install.sh): show git clone warning in blue

* refactor(setup_wizard): class with static methods, use check_valid_key from litellm.utils

* address greptile review: fix yaml escaping, port validation, display name collisions, tests

- setup_wizard.py: add _yaml_escape() for safe YAML embedding of API keys
- setup_wizard.py: add _styled_input() with readline ANSI ignore markers
- setup_wizard.py: change DIVIDER to _divider() fn to avoid import-time color capture
- setup_wizard.py: validate port range 1-65535, initialize before loop
- setup_wizard.py: qualify azure display names (azure-gpt-4o) to avoid collision with openai
- setup_wizard.py: work on env_copy in _build_config to avoid mutating caller's dict
- setup_wizard.py: skip model_list entries for providers with no credentials
- setup_wizard.py: prompt for azure deployment name
- setup_wizard.py: wrap os.execlp in try/except with friendly fallback
- setup_wizard.py: wrap config write in try/except OSError
- setup_wizard.py: fix _validate_and_report to use two print lines (no \r overwrite)
- setup_wizard.py: add .gitignore tip next to key storage notice
- setup_wizard.py: fix run_setup_wizard() return type annotation to None
- scripts/install.sh: drop pipefail (not supported by dash on Ubuntu when invoked as sh)
- scripts/install.sh: use litellm[proxy] from PyPI (not hardcoded dev branch)
- scripts/install.sh: guard /dev/tty read with -r check for Docker/CI compat
- scripts/install.sh: remove --force-reinstall to avoid downgrading dependencies
- tests/test_litellm/test_setup_wizard.py: 13 unit tests for _build_config and _yaml_escape

* style: black format setup_wizard.py

* fix: address remaining greptile issues - Windows compat, YAML quoting, credential flow

- guard termios/tty imports with try/except ImportError for Windows compat
- quote master_key as YAML double-quoted scalar (same as env vars)
- remove unused port param from _build_config signature
- _validate_and_report now returns the final key so re-entered creds are stored
- add test for master_key YAML quoting

* fix: add --port to suggested command, guard /dev/tty exec in install.sh

* fix: quote api_base in YAML, skip azure if no deployment, only redraw on state change

* fix: address greptile review comments

- _yaml_escape: add control character escaping (\n, \r, \t)
- test: fix tautological assertion in test_build_config_azure_no_deployment_skipped
- test: add tests for control character escaping in _yaml_escape
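
The escaping described in the bullets above can be sketched roughly as follows. This is a minimal illustration, not the wizard's actual `_yaml_escape`; the function name `yaml_double_quote` is hypothetical, and the sketch assumes the value is emitted as a YAML double-quoted scalar.

```python
def yaml_double_quote(value: str) -> str:
    """Return `value` as a YAML double-quoted scalar (hypothetical helper)."""
    escaped = (
        value.replace("\\", "\\\\")  # backslash first, so later escapes are not doubled
        .replace('"', '\\"')  # embedded double quotes
        .replace("\n", "\\n")  # control characters named in the commit message
        .replace("\r", "\\r")
        .replace("\t", "\\t")
    )
    return f'"{escaped}"'
```

Escaping the backslash before anything else matters: doing it last would re-escape the backslashes introduced by the other replacements.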

* feat(ui): remove Chat UI page link and banner from sidebar and playground (#23908)

* feat(guardrails): MCPJWTSigner - built-in guardrail for zero trust MCP auth (#23897)

* Allow pre_mcp_call guardrail hooks to mutate outbound MCP headers

* Enhance MCPServerManager to support hook-modified arguments and extra headers. Update tests to validate argument mutation and header injection behavior, including warnings for OpenAPI-backed servers when headers are present.

* Refactor MCPServerManager to raise HTTPException for extra headers in OpenAPI-backed servers. Update tests to reflect this change, ensuring proper exception handling instead of logging warnings.

* feat(guardrails): add MCPJWTSigner built-in guardrail for zero trust MCP auth

Signs outbound MCP tool calls with a LiteLLM-issued RS256 JWT so MCP servers
can trust a single signing authority instead of every upstream IdP.

Enable in config.yaml:
  guardrails:
    - guardrail_name: mcp-jwt-signer
      litellm_params:
        guardrail: mcp_jwt_signer
        mode: pre_mcp_call
        default_on: true

JWT carries sub (user_id), act.sub (team_id, RFC 8693), tool-level scope, iss,
aud, iat/exp/nbf. RSA-2048 keypair auto-generated at startup unless
MCP_JWT_SIGNING_KEY env var is set.

Adds /.well-known/jwks.json endpoint and jwks_uri to /.well-known/openid-configuration
so MCP servers can verify LiteLLM-issued tokens via OIDC discovery.

* Update MCPServerManager to raise HTTPException with status code 400 for extra headers in OpenAPI-backed servers. Adjust tests to verify the correct status code and exception message.

* fix: address P1 issues in MCPJWTSigner

- OpenAPI servers: warn + skip header injection instead of 500
- JWKS Cache-Control: 5min for auto-generated keys, 1h for persistent
- sub claim: fallback to apikey:{token_hash} for anonymous callers
- ttl_seconds: validate > 0 at init time

* docs: add MCP zero trust auth guide with architecture diagram

* docs: add FastMCP JWT verification guide to zero trust doc

* fix: address remaining Greptile review issues (round 2)

- mcp_server_manager: warn when hook Authorization overwrites existing header
- __init__: remove _mcp_jwt_signer_instance from __all__ (private internal)
- discoverable_endpoints: copy dict instead of mutating in-place on OIDC augmentation
- test docstring: reflect warn-and-continue behavior for OpenAPI servers
- test: update scope assertions for least-privilege (no mcp:tools/list on tool-call JWTs)

* fix: address Greptile round 3 feedback

- initialize_guardrail: validate mode='pre_mcp_call' at init time — misconfigured
  mode silently bypasses JWT injection, which is a zero-trust bypass
- _build_claims: remove duplicate inline 'import re' (module-level import already present)
- _types.py: add TODO comment explaining jwt_claims is forward-compat plumbing
  for a follow-up PR that will forward upstream IdP claims into outbound MCP JWTs

* feat(mcp_jwt_signer): add verify+re-sign, claim ops, two-token model, configurable scopes

Addresses all missing pieces from the scoping doc review:

FR-5 (Verify + re-sign): MCPJWTSigner now accepts access_token_discovery_uri
and token_introspection_endpoint.  When set, the incoming Bearer token is
extracted from raw_headers (threaded through pre_call_tool_check), verified
against the IdP's JWKS (JWT) or introspected (opaque), and only re-signed if
valid.  Falls back to user_api_key_dict.jwt_claims for LiteLLM JWT-auth mode.

FR-12 (Configurable end-user identity mapping): end_user_claim_sources
ordered list drives sub resolution — sources: token:<claim>, litellm:user_id,
litellm:email, litellm:end_user_id, litellm:team_id.

FR-13 (Claim operations): add_claims (insert-if-absent), set_claims (always
override), remove_claims (delete) applied in that order.
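
The ordering described for FR-13 could look something like the sketch below. The function name `apply_claim_ops` and its signature are assumptions made for illustration; only the three operation names and their order come from the commit message.

```python
from typing import Dict, List, Optional


def apply_claim_ops(
    claims: Dict[str, object],
    add_claims: Optional[Dict[str, object]] = None,
    set_claims: Optional[Dict[str, object]] = None,
    remove_claims: Optional[List[str]] = None,
) -> Dict[str, object]:
    """Apply add, then set, then remove to a copy of `claims` (hypothetical helper)."""
    result = dict(claims)  # never mutate the caller's dict
    for key, value in (add_claims or {}).items():
        result.setdefault(key, value)  # insert only if absent
    result.update(set_claims or {})  # always override
    for key in remove_claims or []:
        result.pop(key, None)  # delete if present
    return result
```

Because removal runs last, a claim listed in both `set_claims` and `remove_claims` ends up absent from the result.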

FR-14 (Two-token model): channel_token_audience + channel_token_ttl issue a
second JWT injected as x-mcp-channel-token: Bearer <token>.

FR-15 (Incoming claim validation): required_claims raises HTTP 403 when any
listed claim is absent; optional_claims passes listed claims from verified
token into the outbound JWT.

FR-9 (Debug headers): debug_headers: true emits x-litellm-mcp-debug with kid,
sub, iss, exp, scope.

FR-10 (Configurable scopes): allowed_scopes replaces auto-generation.  Also
fixed: tool-call JWTs no longer grant mcp:tools/list (overpermission).

P1 fixes:
- proxy/utils.py: _convert_mcp_hook_response_to_kwargs merges rather than
  replaces extra_headers, preserving headers from prior guardrails.
- mcp_server_manager.py: warns when hook injects Authorization alongside a
  server-configured authentication_token (previously silent).
- mcp_server_manager.py: pre_call_tool_check now accepts raw_headers and
  extracts incoming_bearer_token so FR-5 verification has the raw token.
- proxy/utils.py: remove stray inline import inspect inside loop (pre-existing
  lint error, now cleaned up).

Tests: 43 passing (28 new tests covering all FR flags + P1 fixes).

* feat(mcp_jwt_signer): add verify+re-sign, claim ops, two-token model, configurable scopes (core)

Remaining files from the FR implementation:

mcp_jwt_signer.py — full rewrite with all new params:
  FR-5:  access_token_discovery_uri, token_introspection_endpoint,
         verify_issuer, verify_audience + _verify_incoming_jwt(),
         _introspect_opaque_token()
  FR-12: end_user_claim_sources ordered resolution chain
  FR-13: add_claims, set_claims, remove_claims
  FR-14: channel_token_audience, channel_token_ttl → x-mcp-channel-token
  FR-15: required_claims (raises 403), optional_claims (passthrough)
  FR-9:  debug_headers → x-litellm-mcp-debug
  FR-10: allowed_scopes; tool-call JWTs no longer over-grant tools/list

mcp_server_manager.py:
  - pre_call_tool_check gains raw_headers param to extract incoming_bearer_token
  - Silent Authorization override warning fixed: now fires when server has
    authentication_token AND hook injects Authorization

tests/test_mcp_jwt_signer.py:
  28 new tests covering all FR flags + P1 fixes (43 total, all passing)

* fix(mcp_jwt_signer): address pre-landing review issues

- Remove stale TODO comment on UserAPIKeyAuth.jwt_claims — the field is
  already populated and consumed by MCPJWTSigner in the same PR
- Fix _get_oidc_discovery to only cache the OIDC discovery doc when
  jwks_uri is present; a malformed/empty doc now retries on the next
  request instead of being permanently cached until proxy restart
- Add FR-5 test coverage for _fetch_jwks (cache hit/miss),
  _get_oidc_discovery (cache/no-cache on bad doc), _verify_incoming_jwt
  (valid token, expired token), _introspect_opaque_token (active,
  inactive, no endpoint), and the end-to-end 401 hook path — 53 tests
  total, all passing

* docs(mcp_zero_trust): rewrite as use-case guide covering all new JWT signer features

Add scenario-driven sections for each new config area:
- Verify+re-sign with Okta/Azure AD (access_token_discovery_uri,
  end_user_claim_sources, token_introspection_endpoint)
- Enforcing caller attributes with required_claims / optional_claims
- Adding metadata via add_claims / set_claims / remove_claims
- Two-token model for AWS Bedrock AgentCore Gateway
  (channel_token_audience / channel_token_ttl)
- Controlling scopes with allowed_scopes
- Debugging JWT rejections with debug_headers

Update JWT claims table to reflect configurable sub (end_user_claim_sources)

* fix(mcp_jwt_signer): wire all config.yaml params through initialize_guardrail

The factory was only passing issuer/audience/ttl_seconds to MCPJWTSigner.
All FR-5/9/10/12/13/14/15 params (access_token_discovery_uri,
end_user_claim_sources, add/set/remove_claims, channel_token_audience,
required/optional_claims, debug_headers, allowed_scopes, etc.) were
silently dropped, making every advertised advanced feature non-functional
when loaded from config.yaml.

Add regression test that asserts every param is wired through correctly.

* docs(mcp_zero_trust): add hero image

* docs(mcp_zero_trust): apply Linear-style edits

- Lead with the problem (unsigned direct calls bypass access controls)
- Shorter statement section headers instead of question-form headers
- Move diagram/OIDC discovery block after the reader is bought in
- Add 'read further only if you need to' callout after basic setup
- Two-token section now opens from the user problem not product jargon
- Add concrete 403 error response example in required_claims section
- Debug section opens from the symptom (MCP server returning 401)
- Lowercase claims reference header for consistency

* fix(mcp_jwt_signer): fix algorithm confusion attack + add OIDC discovery 24h TTL

- Remove alg from unverified JWT header; use signing_jwk.algorithm_name from JWKS key instead.
  Reading alg from attacker-controlled headers enables alg:none / HS256 confusion attacks.
- Add _oidc_discovery_fetched_at timestamp and _OIDC_DISCOVERY_TTL = 86400 (24h).
  Without a TTL the cached discovery doc never refreshes, so IdP key rotation is invisible.
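
A rough sketch of a TTL-bounded discovery cache, combining the 24h TTL above with the earlier rule that a document without `jwks_uri` is not cached. The class name `DiscoveryCache`, the injected `fetch` callable, and the injectable clock are all assumptions for illustration, not the signer's real internals.

```python
import time
from typing import Callable, Optional

DISCOVERY_TTL_SECONDS = 86400  # 24h, the value named in the commit message


class DiscoveryCache:
    """Cache an OIDC discovery document for a bounded time (hypothetical sketch)."""

    def __init__(
        self,
        fetch: Callable[[], dict],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._clock = clock
        self._doc: Optional[dict] = None
        self._fetched_at = 0.0

    def get(self) -> dict:
        now = self._clock()
        fresh = now - self._fetched_at < DISCOVERY_TTL_SECONDS
        if self._doc is not None and fresh:
            return self._doc
        doc = self._fetch()
        # Only cache a usable document; a malformed one is retried next call.
        if doc.get("jwks_uri"):
            self._doc = doc
            self._fetched_at = now
        return doc
```

Injecting the clock keeps the expiry logic testable without sleeping.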

---------

Co-authored-by: Noah Nistler <60981020+noahnistler@users.noreply.github.com>

* fix(ci): stabilize CI - formatting, type errors, test polling, security CVEs, router bug, batch resolution

Fix 1: Run Black formatter on 35 files
Fix 2: Fix MyPy type errors:
  - setup_wizard.py: add type annotation for 'selected' set variable
  - user_api_key_auth.py: remove redundant type annotation on jwt_claims reassignment
Fix 3: Fix spend accuracy test burst 2 polling to wait for expected total
  spend instead of just 'any increase' from burst 2
Fix 4: Bump Next.js 16.1.6 -> 16.1.7 to fix CVE-2026-27978, CVE-2026-27979,
  CVE-2026-27980, CVE-2026-29057
Fix 5: Fix router _pre_call_checks model variable being overwritten inside
  loop, causing wrong model lookups on subsequent deployments. Use local
  _deployment_model variable instead.
Fix 6: Add missing resolve_output_file_ids_to_unified call in batch retrieve
  non-terminal-to-terminal path (matching the terminal path behavior)
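
The loop-variable problem in Fix 5 can be illustrated with a small sketch. This is not the router's actual `_pre_call_checks` code; the function name `matching_deployments` and the deployment dict shape are assumptions chosen to show the pattern of keeping the requested model and the per-deployment model in separate names.

```python
from typing import Dict, List


def matching_deployments(model: str, deployments: List[Dict]) -> List[Dict]:
    """Return deployments whose underlying model matches `model` (hypothetical)."""
    matched = []
    for deployment in deployments:
        # Use a loop-local name. Rebinding `model` here would change what
        # every later iteration compares against.
        _deployment_model = deployment["litellm_params"]["model"]
        if _deployment_model.split("/", 1)[-1] == model:
            matched.append(deployment)
    return matched
```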

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* chore: regenerate poetry.lock to sync with pyproject.toml

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: format merged files from main and regenerate poetry.lock

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(mypy): annotate jwt_claims as Optional[dict] to fix type incompatibility

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(ci): update router region test to use gpt-4.1-mini (fix flaky model lookup)

Replace deprecated gpt-3.5-turbo-1106 with gpt-4.1-mini + mock_response in
test_router_region_pre_call_check, following the same pattern used in commit
717d37cc5b for test_router_context_window_check_pre_call_check_out_group.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* ci: retry flaky logging_testing (async event loop race condition)

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(ci): aggregate all mock calls in langfuse e2e test to fix race condition

The _verify_langfuse_call helper only inspected the last mock call
(mock_post.call_args), but the Langfuse SDK may split trace-create and
generation-create events across separate HTTP flush cycles. This caused
an IndexError when the last call's batch contained only one event type.

Fix: iterate over mock_post.call_args_list to collect batch items from
ALL calls. Also add a safety assertion after filtering by trace_id and
mark all langfuse e2e tests with @pytest.mark.flaky(retries=3) as an
extra safety net for any residual timing issues.
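
The difference between inspecting only the last call and aggregating every call can be sketched with a plain `MagicMock`. The URL and the helper name `collect_batch_items` are placeholders for this example, not names from the test module.

```python
import json
from unittest.mock import MagicMock

# Placeholder ingestion URL used only in this sketch.
INGESTION_URL = "https://example.invalid/api/public/ingestion"


def collect_batch_items(mock_post: MagicMock, url: str) -> list:
    """Gather batch items from every recorded call made to `url`."""
    items: list = []
    for call in mock_post.call_args_list:
        if not call.args or call.args[0] != url:
            continue
        content = call.kwargs.get("content")
        if content:
            items.extend(json.loads(content).get("batch", []))
    return items


# Simulate two flush cycles, each carrying one event type.
mock_post = MagicMock()
mock_post(INGESTION_URL, content=json.dumps({"batch": [{"type": "trace-create"}]}))
mock_post(INGESTION_URL, content=json.dumps({"batch": [{"type": "generation-create"}]}))

last_only = json.loads(mock_post.call_args.kwargs["content"])["batch"]
all_items = collect_batch_items(mock_post, INGESTION_URL)
```

Here `last_only` holds a single event, so indexing `[1]` on it would raise `IndexError`, while `all_items` holds both.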

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(ci): black formatting + update OpenAPI compliance tests for spec changes

- Apply Black 26.x formatting to litellm_logging.py (parenthesized style)
- Update test_input_types_match_spec to follow $ref to InteractionsInput schema
  (Google updated their OpenAPI spec to use $ref instead of inline oneOf)
- Update test_content_schema_uses_discriminator to handle discriminator without
  explicit mapping (Google removed the mapping key from Content discriminator)

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* revert: undo incorrect Black 26.x formatting on litellm_logging.py

The file was correctly formatted for Black 23.12.1 (the version pinned
in pyproject.toml). The previous commit applied Black 26.x formatting
which was incompatible with the CI's Black version.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(ci): deduplicate and sort langfuse batch events after aggregation

The Langfuse SDK may send the same event (e.g., trace-create) in
multiple flush cycles, causing duplicates when we aggregate from all
mock calls. After filtering by trace_id, deduplicate by keeping only
the first event of each type, then sort to ensure trace-create is at
index 0 and generation-create at index 1.
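
A minimal sketch of the deduplicate-then-sort step described above, assuming each event is a dict with a `type` key. The helper name `dedupe_and_order` is hypothetical.

```python
def dedupe_and_order(events: list) -> list:
    """Keep the first event of each type, with trace-create ordered first."""
    seen = set()
    unique = []
    for event in events:
        if event["type"] not in seen:
            seen.add(event["type"])
            unique.append(event)
    # list.sort is stable, so events with equal keys keep their relative order.
    unique.sort(key=lambda e: 0 if e["type"] == "trace-create" else 1)
    return unique


events = [
    {"type": "generation-create", "n": 1},
    {"type": "trace-create", "n": 2},
    {"type": "trace-create", "n": 3},  # duplicate from a second flush
]
ordered = dedupe_and_order(events)
```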

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

---------

Co-authored-by: Noah Nistler <60981020+noahnistler@users.noreply.github.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
2026-03-18 15:09:01 -07:00

589 lines
22 KiB
Python

import asyncio
import copy
import json
import logging
import os
import sys
import threading
from typing import Any, Optional
from unittest.mock import AsyncMock, MagicMock, patch
import httpx
logging.basicConfig(level=logging.DEBUG)
sys.path.insert(0, os.path.abspath("../.."))
import litellm
from litellm import completion
from litellm.caching import InMemoryCache
from litellm.llms.custom_httpx.http_handler import AsyncHTTPHandler
litellm.num_retries = 3
litellm.success_callback = ["langfuse"]
os.environ["LANGFUSE_DEBUG"] = "True"
import time
import pytest
import pytest_asyncio


def assert_langfuse_request_matches_expected(
    actual_request_body: dict,
    expected_file_name: str,
    trace_id: Optional[str] = None,
):
    """
    Helper function to compare actual Langfuse request body with expected JSON file.

    Args:
        actual_request_body (dict): The actual request body received from the API call
        expected_file_name (str): Name of the JSON file containing expected request body (e.g., "transcription.json")
        trace_id (Optional[str]): If set, only events belonging to this trace are compared
    """
    # Get the current directory and read the expected request body
    pwd = os.path.dirname(os.path.realpath(__file__))
    expected_body_path = os.path.join(
        pwd, "langfuse_expected_request_body", expected_file_name
    )

    with open(expected_body_path, "r") as f:
        expected_request_body = json.load(f)

    # Filter out events that don't match the trace_id
    if trace_id:
        actual_request_body["batch"] = [
            item
            for item in actual_request_body["batch"]
            if (item["type"] == "trace-create" and item["body"].get("id") == trace_id)
            or (
                item["type"] == "generation-create"
                and item["body"].get("traceId") == trace_id
            )
        ]

        # When aggregating from multiple flush cycles, deduplicate by keeping
        # only one trace-create and one generation-create per trace_id.
        seen_types: dict = {}
        deduped_batch: list = []
        for item in actual_request_body["batch"]:
            item_type = item["type"]
            if item_type not in seen_types:
                seen_types[item_type] = True
                deduped_batch.append(item)
        actual_request_body["batch"] = deduped_batch

        # Ensure canonical order: trace-create first, generation-create second
        actual_request_body["batch"].sort(
            key=lambda x: 0 if x["type"] == "trace-create" else 1
        )

    print(
        "actual_request_body after filtering", json.dumps(actual_request_body, indent=4)
    )

    assert len(actual_request_body["batch"]) >= 2, (
        f"Expected at least 2 batch items (trace-create + generation-create) "
        f"after filtering by trace_id={trace_id}, "
        f"but got {len(actual_request_body['batch'])}. "
        f"Items: {json.dumps(actual_request_body['batch'], indent=2)}"
    )

    # Replace dynamic values in actual request body
    for item in actual_request_body["batch"]:
        # Replace IDs with expected IDs
        if item["type"] == "trace-create":
            item["id"] = expected_request_body["batch"][0]["id"]
            item["body"]["id"] = expected_request_body["batch"][0]["body"]["id"]
            item["timestamp"] = expected_request_body["batch"][0]["timestamp"]
            item["body"]["timestamp"] = expected_request_body["batch"][0]["body"][
                "timestamp"
            ]
        elif item["type"] == "generation-create":
            item["id"] = expected_request_body["batch"][1]["id"]
            item["body"]["id"] = expected_request_body["batch"][1]["body"]["id"]
            item["timestamp"] = expected_request_body["batch"][1]["timestamp"]
            item["body"]["startTime"] = expected_request_body["batch"][1]["body"][
                "startTime"
            ]
            item["body"]["endTime"] = expected_request_body["batch"][1]["body"][
                "endTime"
            ]
            item["body"]["completionStartTime"] = expected_request_body["batch"][1][
                "body"
            ]["completionStartTime"]
            if trace_id is None:
                print("popping traceId")
                item["body"].pop("traceId")
            else:
                item["body"]["traceId"] = trace_id
                expected_request_body["batch"][1]["body"]["traceId"] = trace_id

    # Replace SDK version with expected version
    actual_request_body["batch"][0]["body"].pop("release", None)
    actual_request_body["metadata"]["sdk_version"] = expected_request_body["metadata"][
        "sdk_version"
    ]
    # replace "public_key" with expected public key
    actual_request_body["metadata"]["public_key"] = expected_request_body["metadata"][
        "public_key"
    ]
    actual_request_body["batch"][1]["body"]["metadata"] = expected_request_body[
        "batch"
    ][1]["body"]["metadata"]
    actual_request_body["metadata"]["sdk_integration"] = expected_request_body[
        "metadata"
    ]["sdk_integration"]
    actual_request_body["metadata"]["batch_size"] = expected_request_body["metadata"][
        "batch_size"
    ]

    # Assert the entire request body matches
    assert (
        actual_request_body == expected_request_body
    ), f"Difference in request bodies: {json.dumps(actual_request_body, indent=2)} != {json.dumps(expected_request_body, indent=2)}"


class TestLangfuseLogging:
    @pytest_asyncio.fixture
    async def mock_setup(self):
        """Common setup for Langfuse logging tests"""
        from litellm._uuid import uuid
        from unittest.mock import AsyncMock, patch
        import httpx

        # Create a mock Response object
        mock_response = AsyncMock(spec=httpx.Response)
        mock_response.status_code = 200
        mock_response.json.return_value = {"status": "success"}

        # Create mock for httpx.Client.post
        mock_post = AsyncMock()
        mock_post.return_value = mock_response
        litellm.set_verbose = True
        litellm.success_callback = ["langfuse"]

        return {"trace_id": f"litellm-test-{str(uuid.uuid4())}", "mock_post": mock_post}

    async def _verify_langfuse_call(
        self,
        mock_post,
        expected_file_name: str,
        trace_id: str,
    ):
        """Helper method to verify Langfuse API calls"""
        await asyncio.sleep(3)

        # Verify at least one call was made
        assert mock_post.call_count >= 1

        # Aggregate batch items from ALL calls. The Langfuse SDK may split
        # trace-create and generation-create across separate HTTP flushes.
        langfuse_url = "https://us.cloud.langfuse.com/api/public/ingestion"
        all_batch_items: list = []
        metadata: Optional[dict] = None

        for call in mock_post.call_args_list:
            url = call[0][0]
            if url != langfuse_url:
                continue
            request_body = call[1].get("content")
            if request_body:
                body = json.loads(request_body)
                all_batch_items.extend(body.get("batch", []))
                if metadata is None:
                    metadata = body.get("metadata")

        assert len(all_batch_items) > 0, "No Langfuse ingestion calls found"
        assert metadata is not None, "No metadata found in Langfuse calls"

        actual_request_body = {
            "batch": all_batch_items,
            "metadata": metadata,
        }

        print("\nMocked Request Details (aggregated from all calls):")
        print(f"Request Body: {json.dumps(actual_request_body, indent=4)}")

        assert_langfuse_request_matches_expected(
            actual_request_body,
            expected_file_name,
            trace_id,
        )

    @pytest.mark.asyncio
    @pytest.mark.flaky(retries=3, delay=1)
    async def test_langfuse_logging_completion(self, mock_setup):
        """Test Langfuse logging for chat completion"""
        setup = mock_setup
        with patch("httpx.Client.post", setup["mock_post"]):
            await litellm.acompletion(
                model="gpt-3.5-turbo",
                messages=[{"role": "user", "content": "Hello!"}],
                mock_response="Hello! How can I assist you today?",
                metadata={"trace_id": setup["trace_id"]},
            )
            await self._verify_langfuse_call(
                setup["mock_post"], "completion.json", setup["trace_id"]
            )

    @pytest.mark.asyncio
    @pytest.mark.flaky(retries=3, delay=1)
    async def test_langfuse_logging_completion_with_tags(self, mock_setup):
        """Test Langfuse logging for chat completion with tags"""
        setup = mock_setup
        with patch("httpx.Client.post", setup["mock_post"]):
            await litellm.acompletion(
                model="gpt-3.5-turbo",
                messages=[{"role": "user", "content": "Hello!"}],
                mock_response="Hello! How can I assist you today?",
                metadata={
                    "trace_id": setup["trace_id"],
                    "tags": ["test_tag", "test_tag_2"],
                },
            )
            await self._verify_langfuse_call(
                setup["mock_post"], "completion_with_tags.json", setup["trace_id"]
            )

    @pytest.mark.asyncio
    @pytest.mark.flaky(retries=3, delay=1)
    async def test_langfuse_logging_completion_with_tags_stream(self, mock_setup):
        """Test Langfuse logging for chat completion with tags"""
        setup = mock_setup
        with patch("httpx.Client.post", setup["mock_post"]):
            await litellm.acompletion(
                model="gpt-3.5-turbo",
                messages=[{"role": "user", "content": "Hello!"}],
                mock_response="Hello! How can I assist you today?",
                metadata={
                    "trace_id": setup["trace_id"],
                    "tags": ["test_tag_stream", "test_tag_2_stream"],
                },
            )
            await self._verify_langfuse_call(
                setup["mock_post"],
                "completion_with_tags_stream.json",
                setup["trace_id"],
            )

    @pytest.mark.asyncio
    @pytest.mark.flaky(retries=3, delay=1)
    async def test_langfuse_logging_completion_with_langfuse_metadata(self, mock_setup):
        """Test Langfuse logging for chat completion with metadata for langfuse"""
        setup = mock_setup
        with patch("httpx.Client.post", setup["mock_post"]):
            await litellm.acompletion(
                model="gpt-3.5-turbo",
                messages=[{"role": "user", "content": "Hello!"}],
                mock_response="Hello! How can I assist you today?",
                metadata={
                    "trace_id": setup["trace_id"],
                    "tags": ["test_tag", "test_tag_2"],
                    "generation_name": "test_generation_name",
                    "parent_observation_id": "test_parent_observation_id",
                    "version": "test_version",
                    "trace_user_id": "test_user_id",
                    "session_id": "test_session_id",
                    "trace_name": "test_trace_name",
                    "trace_metadata": {"test_key": "test_value"},
                    "trace_version": "test_trace_version",
                    "trace_release": "test_trace_release",
                },
            )
            await self._verify_langfuse_call(
                setup["mock_post"],
                "completion_with_langfuse_metadata.json",
                setup["trace_id"],
            )

    @pytest.mark.asyncio
    @pytest.mark.flaky(retries=3, delay=1)
    async def test_langfuse_logging_with_non_serializable_metadata(self, mock_setup):
        """Test Langfuse logging with metadata that requires preparation (Pydantic models, sets, etc)"""
        from pydantic import BaseModel
        from typing import Set
        import datetime

        class UserPreferences(BaseModel):
            favorite_colors: Set[str]
            last_login: datetime.datetime
            settings: dict

        setup = mock_setup

        test_metadata = {
            "user_prefs": UserPreferences(
                favorite_colors={"red", "blue"},
                last_login=datetime.datetime.now(),
                settings={"theme": "dark", "notifications": True},
            ),
            "nested_set": {
                "inner_set": {1, 2, 3},
                "inner_pydantic": UserPreferences(
                    favorite_colors={"green", "yellow"},
                    last_login=datetime.datetime.now(),
                    settings={"theme": "light"},
                ),
            },
            "trace_id": setup["trace_id"],
        }

        with patch("httpx.Client.post", setup["mock_post"]):
            response = await litellm.acompletion(
                model="gpt-3.5-turbo",
                messages=[{"role": "user", "content": "Hello!"}],
                mock_response="Hello! How can I assist you today?",
                metadata=test_metadata,
            )
            await self._verify_langfuse_call(
                setup["mock_post"],
                "completion_with_complex_metadata.json",
                setup["trace_id"],
            )

    @pytest.mark.asyncio
    @pytest.mark.parametrize(
        "test_metadata, response_json_file",
        [
            ({"a": 1, "b": 2, "c": 3}, "simple_metadata.json"),
            (
                {"a": {"nested_a": 1}, "b": {"nested_b": 2}},
                "nested_metadata.json",
            ),
            ({"a": [1, 2, 3], "b": {4, 5, 6}}, "simple_metadata2.json"),
            (
                {"a": (1, 2), "b": frozenset([3, 4]), "c": {"d": [5, 6]}},
                "simple_metadata3.json",
            ),
            ({"lock": threading.Lock()}, "metadata_with_lock.json"),
            ({"func": lambda x: x + 1}, "metadata_with_function.json"),
            (
                {
                    "int": 42,
                    "str": "hello",
                    "list": [1, 2, 3],
                    "set": {4, 5},
                    "dict": {"nested": "value"},
                    "non_copyable": threading.Lock(),
                    "function": print,
                },
                "complex_metadata.json",
            ),
            (
                {"list": ["list", "not", "a", "dict"]},
                "complex_metadata_2.json",
            ),
            ({}, "empty_metadata.json"),
        ],
    )
    @pytest.mark.flaky(retries=6, delay=1)
    async def test_langfuse_logging_with_various_metadata_types(
        self, mock_setup, test_metadata, response_json_file
    ):
        """Test Langfuse logging with various metadata types including non-serializable objects"""
        import threading

        setup = mock_setup

        if test_metadata is not None:
            test_metadata["trace_id"] = setup["trace_id"]

        with patch("httpx.Client.post", setup["mock_post"]):
            await litellm.acompletion(
                model="gpt-3.5-turbo",
                messages=[{"role": "user", "content": "Hello!"}],
                mock_response="Hello! How can I assist you today?",
                metadata=test_metadata,
            )
            await self._verify_langfuse_call(
                setup["mock_post"],
                response_json_file,
                setup["trace_id"],
            )

    @pytest.mark.asyncio
    @pytest.mark.flaky(retries=3, delay=1)
    async def test_langfuse_logging_completion_with_malformed_llm_response(
        self, mock_setup
    ):
        """Test Langfuse logging for chat completion with malformed LLM response"""
        setup = mock_setup
        litellm._turn_on_debug()
        with patch("httpx.Client.post", setup["mock_post"]):
            mock_response = litellm.ModelResponse(
                choices=[],
                usage=litellm.Usage(
                    prompt_tokens=10,
                    completion_tokens=10,
                    total_tokens=20,
                ),
                model="gpt-3.5-turbo",
                object="chat.completion",
                created=1723081200,
            ).model_dump()
            await litellm.acompletion(
                model="gpt-3.5-turbo",
                messages=[{"role": "user", "content": "Hello!"}],
                mock_response=mock_response,
                metadata={"trace_id": setup["trace_id"]},
            )
            await self._verify_langfuse_call(
                setup["mock_post"], "completion_with_no_choices.json", setup["trace_id"]
            )

    @pytest.mark.asyncio
    @pytest.mark.flaky(retries=3, delay=1)
    async def test_langfuse_logging_completion_with_bedrock_llm_response(
        self, mock_setup
    ):
        """Test Langfuse logging for a Bedrock chat completion whose mocked response has no choices"""
        setup = mock_setup
        litellm._turn_on_debug()
        with patch("httpx.Client.post", setup["mock_post"]):
            mock_response = litellm.ModelResponse(
                choices=[],
                usage=litellm.Usage(
                    prompt_tokens=10,
                    completion_tokens=10,
                    total_tokens=20,
                ),
                model="anthropic.claude-3-5-sonnet-20240620-v1:0",
                object="chat.completion",
                created=1723081200,
            ).model_dump()
            await litellm.acompletion(
                model="bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0",
                messages=[{"role": "user", "content": "Hello!"}],
                mock_response=mock_response,
                metadata={"trace_id": setup["trace_id"]},
                aws_access_key_id="fake-key",
                aws_secret_access_key="fake-key",
                aws_region="us-east-1",
            )
            await self._verify_langfuse_call(
                setup["mock_post"], "completion_with_bedrock_call.json", setup["trace_id"]
            )
@pytest.mark.asyncio
@pytest.mark.flaky(retries=3, delay=1)
async def test_langfuse_logging_completion_with_vertex_llm_response(
self, mock_setup
):
"""Test Langfuse logging for a Vertex AI chat completion whose LLM response has empty choices"""
setup = mock_setup
litellm._turn_on_debug()
with patch("httpx.Client.post", setup["mock_post"]):
mock_response = litellm.ModelResponse(
choices=[],
usage=litellm.Usage(
prompt_tokens=10,
completion_tokens=10,
total_tokens=20,
),
model="vertex/gemini-2.0-flash-001",
object="chat.completion",
created=1723081200,
).model_dump()
await litellm.acompletion(
model="vertex_ai/gemini-2.0-flash-001",
messages=[{"role": "user", "content": "Hello!"}],
mock_response=mock_response,
metadata={"trace_id": setup["trace_id"]},
vertex_credentials="my-mock-credentials",
api_key="my-mock-credentials-2",
)
await self._verify_langfuse_call(
setup["mock_post"], "completion_with_vertex_call.json", setup["trace_id"]
)
@pytest.mark.asyncio
@pytest.mark.flaky(retries=3, delay=1)
async def test_langfuse_logging_vllm_embedding(self, mock_setup):
"""
Test that the request sent to the vllm embedding endpoint is correct.
Verifies the request body matches the expected JSON fixture,
including that the hosted_vllm/ prefix is stripped from the model name
and that no unexpected fields (e.g. encoding_format) are included.
"""
setup = mock_setup
vllm_response_data = {
"object": "list",
"data": [{"object": "embedding", "index": 0, "embedding": [0.1, 0.2, 0.3]}],
"model": "BAAI/bge-small-en-v1.5",
"usage": {"prompt_tokens": 10, "total_tokens": 10},
}
mock_vllm_response = httpx.Response(
status_code=200,
json=vllm_response_data,
)
mock_async_client = AsyncHTTPHandler()
mock_async_client.post = AsyncMock(return_value=mock_vllm_response)
with patch("httpx.Client.post", setup["mock_post"]):
await litellm.aembedding(
model="hosted_vllm/BAAI/bge-small-en-v1.5",
input=["Hello from litellm!"],
api_base="http://my-fake-vllm.com/v1",
metadata={"trace_id": setup["trace_id"]},
client=mock_async_client,
)
# Verify the request sent to vllm matches the expected JSON fixture
assert mock_async_client.post.call_count == 1
actual_vllm_request = mock_async_client.post.call_args.kwargs["json"]
pwd = os.path.dirname(os.path.realpath(__file__))
expected_body_path = os.path.join(
pwd, "langfuse_expected_request_body", "embedding_with_vllm.json"
)
with open(expected_body_path, "r") as f:
expected_vllm_request = json.load(f)
assert actual_vllm_request == expected_vllm_request, (
f"vllm request body mismatch:\n"
f"actual: {json.dumps(actual_vllm_request, indent=2)}\n"
f"expected: {json.dumps(expected_vllm_request, indent=2)}"
)
@pytest.mark.asyncio
@pytest.mark.flaky(retries=3, delay=1)
async def test_langfuse_logging_with_router(self, mock_setup):
"""Test Langfuse logging for a chat completion sent through litellm.Router"""
litellm._turn_on_debug()
router = litellm.Router(
model_list=[
{
"model_name": "gpt-3.5-turbo",
"litellm_params": {
"model": "gpt-3.5-turbo",
"mock_response": "Hello! How can I assist you today?",
"api_key": "test_api_key",
}
}
]
)
with patch("httpx.Client.post", mock_setup["mock_post"]):
mock_response = litellm.ModelResponse(
choices=[],
usage=litellm.Usage(
prompt_tokens=10,
completion_tokens=10,
total_tokens=20,
),
model="gpt-3.5-turbo",
object="chat.completion",
created=1723081200,
).model_dump()
await router.acompletion(
model="gpt-3.5-turbo",
messages=[{"role": "user", "content": "Hello!"}],
mock_response=mock_response,
metadata={"trace_id": mock_setup["trace_id"]},
)
await self._verify_langfuse_call(
mock_setup["mock_post"], "completion_with_router.json", mock_setup["trace_id"]
)