mirror of
https://github.com/tiennm99/litellm.git
synced 2026-06-17 22:48:35 +00:00
8e61b32b8e
* feat(xai): add grok-4.20 beta 2 models with pricing (#23900)
Add three grok-4.20 beta 2 model variants from xAI:
- grok-4.20-multi-agent-beta-0309 (reasoning + multi-agent)
- grok-4.20-beta-0309-reasoning (reasoning)
- grok-4.20-beta-0309-non-reasoning
Pricing (from https://docs.x.ai/docs/models):
- Input: $2.00/1M tokens ($0.20/1M cached)
- Output: $6.00/1M tokens
- Context: 2M tokens
All variants support vision, function calling, tool choice, and web search.
Closes LIT-2171
* docs: add Quick Install section for litellm --setup wizard (#23905)
* docs: add Quick Install section for litellm --setup wizard
* docs: clarify setup wizard is for local/beginner use
* feat(setup): interactive setup wizard + install.sh (#23644)
* feat(setup): add interactive setup wizard + install.sh
Adds `litellm --setup` — a Claude Code-style TUI onboarding wizard that
guides users through provider selection, API key entry, and proxy config
generation, then optionally starts the proxy immediately.
- litellm/setup_wizard.py: wizard with ASCII art, numbered provider menu
(OpenAI, Anthropic, Azure, Gemini, Bedrock, Ollama), API key prompts,
port/master-key config, and litellm_config.yaml generation
- litellm/proxy/proxy_cli.py: adds --setup flag that invokes the wizard
- scripts/install.sh: curl-installable script (detect OS/Python, pip
install litellm[proxy], launch wizard)
Usage:
curl -fsSL https://raw.githubusercontent.com/BerriAI/litellm/main/scripts/install.sh | sh
litellm --setup
* fix(install.sh): remove orange color, add LITELLM_BRANCH env var for branch installs
* fix(install.sh): install from git branch so --setup is available for QA
* fix(install.sh): remove stale LITELLM_BRANCH reference that caused unbound variable error
* fix(install.sh): force-reinstall from git to bypass cached PyPI version
* fix(install.sh): show pip progress bar during install
* fix(install.sh): always launch wizard via $PYTHON_BIN -m litellm, not PATH binary
* fix(install.sh): use litellm.proxy.proxy_cli module (no __main__.py exists)
* fix(install.sh): suppress RuntimeWarning from module invocation
* fix(install.sh): use Python bin-dir litellm binary to avoid CWD sys.path shadowing
* fix(install.sh): use sysconfig.get_path('scripts') to find pip-installed litellm binary
* fix(install.sh): redirect stdin from /dev/tty on exec so wizard gets terminal, not exhausted pipe
* fix(install.sh): warn about git clone duration, drop --no-cache-dir so re-runs are faster
* feat(setup_wizard): arrow-key selector, updated model names
* fix(setup_wizard): use sysconfig binary to start proxy, not python -m litellm
* feat(setup_wizard): credential validation after key entry + clear next-steps after proxy start
* style(install.sh): show git clone warning in blue
* refactor(setup_wizard): class with static methods, use check_valid_key from litellm.utils
* address greptile review: fix yaml escaping, port validation, display name collisions, tests
- setup_wizard.py: add _yaml_escape() for safe YAML embedding of API keys
- setup_wizard.py: add _styled_input() with readline ANSI ignore markers
- setup_wizard.py: change DIVIDER to _divider() fn to avoid import-time color capture
- setup_wizard.py: validate port range 1-65535, initialize before loop
- setup_wizard.py: qualify azure display names (azure-gpt-4o) to avoid collision with openai
- setup_wizard.py: work on env_copy in _build_config to avoid mutating caller's dict
- setup_wizard.py: skip model_list entries for providers with no credentials
- setup_wizard.py: prompt for azure deployment name
- setup_wizard.py: wrap os.execlp in try/except with friendly fallback
- setup_wizard.py: wrap config write in try/except OSError
- setup_wizard.py: fix _validate_and_report to use two print lines (no \r overwrite)
- setup_wizard.py: add .gitignore tip next to key storage notice
- setup_wizard.py: fix run_setup_wizard() return type annotation to None
- scripts/install.sh: drop pipefail (not supported by dash on Ubuntu when invoked as sh)
- scripts/install.sh: use litellm[proxy] from PyPI (not hardcoded dev branch)
- scripts/install.sh: guard /dev/tty read with -r check for Docker/CI compat
- scripts/install.sh: remove --force-reinstall to avoid downgrading dependencies
- tests/test_litellm/test_setup_wizard.py: 13 unit tests for _build_config and _yaml_escape
* style: black format setup_wizard.py
* fix: address remaining greptile issues - Windows compat, YAML quoting, credential flow
- guard termios/tty imports with try/except ImportError for Windows compat
- quote master_key as YAML double-quoted scalar (same as env vars)
- remove unused port param from _build_config signature
- _validate_and_report now returns the final key so re-entered creds are stored
- add test for master_key YAML quoting
* fix: add --port to suggested command, guard /dev/tty exec in install.sh
* fix: quote api_base in YAML, skip azure if no deployment, only redraw on state change
* fix: address greptile review comments
- _yaml_escape: add control character escaping (\n, \r, \t)
- test: fix tautological assertion in test_build_config_azure_no_deployment_skipped
- test: add tests for control character escaping in _yaml_escape
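The `_yaml_escape` bullets above describe escaping values so they can sit safely inside a YAML double-quoted scalar. A minimal sketch of that idea, assuming the helper handles backslash, double quote, and the three control characters named in the commit; the names `yaml_escape_sketch` and `quoted` are hypothetical and this is not the wizard's actual code:

```python
def yaml_escape_sketch(value: str) -> str:
    """Escape a string for embedding inside a YAML double-quoted scalar."""
    # Backslash goes first so the backslashes introduced by the later
    # replacements are not escaped a second time.
    replacements = [
        ("\\", "\\\\"),
        ('"', '\\"'),
        ("\n", "\\n"),
        ("\r", "\\r"),
        ("\t", "\\t"),
    ]
    for raw, escaped in replacements:
        value = value.replace(raw, escaped)
    return value


def quoted(value: str) -> str:
    """Wrap an escaped value in double quotes, e.g. for an API key entry."""
    return f'"{yaml_escape_sketch(value)}"'
```

Ordering the backslash replacement first is the design point: reversing the order would turn an escaped newline into `\\n`, which YAML reads back as a literal backslash followed by `n`.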
* feat(ui): remove Chat UI page link and banner from sidebar and playground (#23908)
* feat(guardrails): MCPJWTSigner - built-in guardrail for zero trust MCP auth (#23897)
* Allow pre_mcp_call guardrail hooks to mutate outbound MCP headers
* Enhance MCPServerManager to support hook-modified arguments and extra headers. Update tests to validate argument mutation and header injection behavior, including warnings for OpenAPI-backed servers when headers are present.
* Refactor MCPServerManager to raise HTTPException for extra headers in OpenAPI-backed servers. Update tests to reflect this change, ensuring proper exception handling instead of logging warnings.
* feat(guardrails): add MCPJWTSigner built-in guardrail for zero trust MCP auth
Signs outbound MCP tool calls with a LiteLLM-issued RS256 JWT so MCP servers
can trust a single signing authority instead of every upstream IdP.
Enable in config.yaml:
guardrails:
- guardrail_name: mcp-jwt-signer
litellm_params:
guardrail: mcp_jwt_signer
mode: pre_mcp_call
default_on: true
JWT carries sub (user_id), act.sub (team_id, RFC 8693), tool-level scope, iss,
aud, iat/exp/nbf. RSA-2048 keypair auto-generated at startup unless
MCP_JWT_SIGNING_KEY env var is set.
Adds /.well-known/jwks.json endpoint and jwks_uri to /.well-known/openid-configuration
so MCP servers can verify LiteLLM-issued tokens via OIDC discovery.
* Update MCPServerManager to raise HTTPException with status code 400 for extra headers in OpenAPI-backed servers. Adjust tests to verify the correct status code and exception message.
* fix: address P1 issues in MCPJWTSigner
- OpenAPI servers: warn + skip header injection instead of 500
- JWKS Cache-Control: 5min for auto-generated keys, 1h for persistent
- sub claim: fallback to apikey:{token_hash} for anonymous callers
- ttl_seconds: validate > 0 at init time
* docs: add MCP zero trust auth guide with architecture diagram
* docs: add FastMCP JWT verification guide to zero trust doc
* fix: address remaining Greptile review issues (round 2)
- mcp_server_manager: warn when hook Authorization overwrites existing header
- __init__: remove _mcp_jwt_signer_instance from __all__ (private internal)
- discoverable_endpoints: copy dict instead of mutating in-place on OIDC augmentation
- test docstring: reflect warn-and-continue behavior for OpenAPI servers
- test: update scope assertions for least-privilege (no mcp:tools/list on tool-call JWTs)
* fix: address Greptile round 3 feedback
- initialize_guardrail: validate mode='pre_mcp_call' at init time — misconfigured
mode silently bypasses JWT injection, which is a zero-trust bypass
- _build_claims: remove duplicate inline 'import re' (module-level import already present)
- _types.py: add TODO comment explaining jwt_claims is forward-compat plumbing
for a follow-up PR that will forward upstream IdP claims into outbound MCP JWTs
* feat(mcp_jwt_signer): add verify+re-sign, claim ops, two-token model, configurable scopes
Addresses all missing pieces from the scoping doc review:
FR-5 (Verify + re-sign): MCPJWTSigner now accepts access_token_discovery_uri
and token_introspection_endpoint. When set, the incoming Bearer token is
extracted from raw_headers (threaded through pre_call_tool_check), verified
against the IdP's JWKS (JWT) or introspected (opaque), and only re-signed if
valid. Falls back to user_api_key_dict.jwt_claims for LiteLLM JWT-auth mode.
FR-12 (Configurable end-user identity mapping): end_user_claim_sources
ordered list drives sub resolution — sources: token:<claim>, litellm:user_id,
litellm:email, litellm:end_user_id, litellm:team_id.
FR-13 (Claim operations): add_claims (insert-if-absent), set_claims (always
override), remove_claims (delete) applied in that order.
FR-14 (Two-token model): channel_token_audience + channel_token_ttl issue a
second JWT injected as x-mcp-channel-token: Bearer <token>.
FR-15 (Incoming claim validation): required_claims raises HTTP 403 when any
listed claim is absent; optional_claims passes listed claims from verified
token into the outbound JWT.
FR-9 (Debug headers): debug_headers: true emits x-litellm-mcp-debug with kid,
sub, iss, exp, scope.
FR-10 (Configurable scopes): allowed_scopes replaces auto-generation. Also
fixed: tool-call JWTs no longer grant mcp:tools/list (overpermission).
P1 fixes:
- proxy/utils.py: _convert_mcp_hook_response_to_kwargs merges rather than
replaces extra_headers, preserving headers from prior guardrails.
- mcp_server_manager.py: warns when hook injects Authorization alongside a
server-configured authentication_token (previously silent).
- mcp_server_manager.py: pre_call_tool_check now accepts raw_headers and
extracts incoming_bearer_token so FR-5 verification has the raw token.
- proxy/utils.py: remove stray inline import inspect inside loop (pre-existing
lint error, now cleaned up).
Tests: 43 passing (28 new tests covering all FR flags + P1 fixes).
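FR-13 above fixes an order for the three claim operations: `add_claims` (insert-if-absent), then `set_claims` (always override), then `remove_claims` (delete). A minimal sketch of that ordering on a plain dict, assuming nothing about the real `MCPJWTSigner` internals; `apply_claim_ops_sketch` is a hypothetical name:

```python
from typing import Any, Dict, Iterable, Optional


def apply_claim_ops_sketch(
    claims: Dict[str, Any],
    add_claims: Optional[Dict[str, Any]] = None,
    set_claims: Optional[Dict[str, Any]] = None,
    remove_claims: Optional[Iterable[str]] = None,
) -> Dict[str, Any]:
    """Apply add -> set -> remove, in that order, to a copy of the claims."""
    result = dict(claims)  # copy so the caller's dict is not mutated

    # add_claims: only fills in keys that are not already present
    for key, value in (add_claims or {}).items():
        result.setdefault(key, value)

    # set_claims: always overrides
    result.update(set_claims or {})

    # remove_claims: deletes last, so it wins over both add and set
    for key in remove_claims or ():
        result.pop(key, None)

    return result
```

Because removal runs last, a claim listed in both `set_claims` and `remove_claims` ends up absent in this sketch.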
* feat(mcp_jwt_signer): add verify+re-sign, claim ops, two-token model, configurable scopes (core)
Remaining files from the FR implementation:
mcp_jwt_signer.py — full rewrite with all new params:
FR-5: access_token_discovery_uri, token_introspection_endpoint,
verify_issuer, verify_audience + _verify_incoming_jwt(),
_introspect_opaque_token()
FR-12: end_user_claim_sources ordered resolution chain
FR-13: add_claims, set_claims, remove_claims
FR-14: channel_token_audience, channel_token_ttl → x-mcp-channel-token
FR-15: required_claims (raises 403), optional_claims (passthrough)
FR-9: debug_headers → x-litellm-mcp-debug
FR-10: allowed_scopes; tool-call JWTs no longer over-grant tools/list
mcp_server_manager.py:
- pre_call_tool_check gains raw_headers param to extract incoming_bearer_token
- Silent Authorization override warning fixed: now fires when server has
authentication_token AND hook injects Authorization
tests/test_mcp_jwt_signer.py:
28 new tests covering all FR flags + P1 fixes (43 total, all passing)
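The FR-12 "ordered resolution chain" for `end_user_claim_sources` can be pictured as a first-match lookup over `token:<claim>` and `litellm:<field>` sources. The sketch below is an illustration under assumptions: the function name, the two input dicts, and the "skip empty values" rule are hypothetical, and only the source-string format comes from the commit message.

```python
from typing import Any, Dict, List, Optional


def resolve_sub_sketch(
    sources: List[str],
    token_claims: Dict[str, Any],
    litellm_fields: Dict[str, Any],
) -> Optional[str]:
    """Return the first non-empty value found by walking sources in order."""
    for source in sources:
        prefix, _, name = source.partition(":")
        if prefix == "token":
            value = token_claims.get(name)  # claim from the verified token
        elif prefix == "litellm":
            value = litellm_fields.get(name)  # e.g. user_id, email, team_id
        else:
            value = None  # unknown source kinds are skipped in this sketch
        if value:
            return str(value)
    return None
```

A list such as `["token:email", "litellm:user_id"]` would then prefer the IdP's `email` claim and fall back to the LiteLLM user id when the token does not carry one.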
* fix(mcp_jwt_signer): address pre-landing review issues
- Remove stale TODO comment on UserAPIKeyAuth.jwt_claims — the field is
already populated and consumed by MCPJWTSigner in the same PR
- Fix _get_oidc_discovery to only cache the OIDC discovery doc when
jwks_uri is present; a malformed/empty doc now retries on the next
request instead of being permanently cached until proxy restart
- Add FR-5 test coverage for _fetch_jwks (cache hit/miss),
_get_oidc_discovery (cache/no-cache on bad doc), _verify_incoming_jwt
(valid token, expired token), _introspect_opaque_token (active,
inactive, no endpoint), and the end-to-end 401 hook path — 53 tests
total, all passing
* docs(mcp_zero_trust): rewrite as use-case guide covering all new JWT signer features
Add scenario-driven sections for each new config area:
- Verify+re-sign with Okta/Azure AD (access_token_discovery_uri,
end_user_claim_sources, token_introspection_endpoint)
- Enforcing caller attributes with required_claims / optional_claims
- Adding metadata via add_claims / set_claims / remove_claims
- Two-token model for AWS Bedrock AgentCore Gateway
(channel_token_audience / channel_token_ttl)
- Controlling scopes with allowed_scopes
- Debugging JWT rejections with debug_headers
Update JWT claims table to reflect configurable sub (end_user_claim_sources)
* fix(mcp_jwt_signer): wire all config.yaml params through initialize_guardrail
The factory was only passing issuer/audience/ttl_seconds to MCPJWTSigner.
All FR-5/9/10/12/13/14/15 params (access_token_discovery_uri,
end_user_claim_sources, add/set/remove_claims, channel_token_audience,
required/optional_claims, debug_headers, allowed_scopes, etc.) were
silently dropped, making every advertised advanced feature non-functional
when loaded from config.yaml.
Add regression test that asserts every param is wired through correctly.
* docs(mcp_zero_trust): add hero image
* docs(mcp_zero_trust): apply Linear-style edits
- Lead with the problem (unsigned direct calls bypass access controls)
- Shorter statement section headers instead of question-form headers
- Move diagram/OIDC discovery block after the reader is bought in
- Add 'read further only if you need to' callout after basic setup
- Two-token section now opens from the user problem not product jargon
- Add concrete 403 error response example in required_claims section
- Debug section opens from the symptom (MCP server returning 401)
- Lowercase claims reference header for consistency
* fix(mcp_jwt_signer): fix algorithm confusion attack + add OIDC discovery 24h TTL
- Remove alg from unverified JWT header; use signing_jwk.algorithm_name from JWKS key instead.
Reading alg from attacker-controlled headers enables alg:none / HS256 confusion attacks.
- Add _oidc_discovery_fetched_at timestamp and _OIDC_DISCOVERY_TTL = 86400 (24h).
Without a TTL the cached discovery doc never refreshes, so IdP key rotation is invisible.
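The TTL fix above pairs a fetch timestamp with a 24h expiry so a rotated IdP key eventually becomes visible. A minimal sketch of that caching pattern, assuming an injected `fetch` callable and clock so it runs without network access; `DiscoveryCacheSketch` is a hypothetical class, and the "cache only when `jwks_uri` is present" rule is taken from the earlier pre-landing review commit:

```python
import time
from typing import Any, Callable, Dict, Optional

OIDC_DISCOVERY_TTL_SECONDS = 86400  # 24h, the value named in the commit message


class DiscoveryCacheSketch:
    """Cache a discovery document and refetch it once the TTL has elapsed."""

    def __init__(
        self,
        fetch: Callable[[], Dict[str, Any]],
        clock: Callable[[], float] = time.monotonic,
        ttl: float = OIDC_DISCOVERY_TTL_SECONDS,
    ) -> None:
        self._fetch = fetch
        self._clock = clock
        self._ttl = ttl
        self._doc: Optional[Dict[str, Any]] = None
        self._fetched_at = 0.0

    def get(self) -> Dict[str, Any]:
        now = self._clock()
        if self._doc is not None and now - self._fetched_at < self._ttl:
            return self._doc  # still fresh
        doc = self._fetch()
        # Only cache a usable document; a malformed one is retried next call.
        if doc.get("jwks_uri"):
            self._doc = doc
            self._fetched_at = now
        return doc
```

Injecting the clock keeps the expiry logic testable without sleeping.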
---------
Co-authored-by: Noah Nistler <60981020+noahnistler@users.noreply.github.com>
* fix(ci): stabilize CI - formatting, type errors, test polling, security CVEs, router bug, batch resolution
Fix 1: Run Black formatter on 35 files
Fix 2: Fix MyPy type errors:
- setup_wizard.py: add type annotation for 'selected' set variable
- user_api_key_auth.py: remove redundant type annotation on jwt_claims reassignment
Fix 3: Fix spend accuracy test burst 2 polling to wait for expected total
spend instead of just 'any increase' from burst 2
Fix 4: Bump Next.js 16.1.6 -> 16.1.7 to fix CVE-2026-27978, CVE-2026-27979,
CVE-2026-27980, CVE-2026-29057
Fix 5: Fix router _pre_call_checks model variable being overwritten inside
loop, causing wrong model lookups on subsequent deployments. Use local
_deployment_model variable instead.
Fix 6: Add missing resolve_output_file_ids_to_unified call in batch retrieve
non-terminal-to-terminal path (matching the terminal path behavior)
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
* chore: regenerate poetry.lock to sync with pyproject.toml
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
* fix: format merged files from main and regenerate poetry.lock
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
* fix(mypy): annotate jwt_claims as Optional[dict] to fix type incompatibility
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
* fix(ci): update router region test to use gpt-4.1-mini (fix flaky model lookup)
Replace deprecated gpt-3.5-turbo-1106 with gpt-4.1-mini + mock_response in
test_router_region_pre_call_check, following the same pattern used in commit
717d37cc5b for test_router_context_window_check_pre_call_check_out_group.
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
* ci: retry flaky logging_testing (async event loop race condition)
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
* fix(ci): aggregate all mock calls in langfuse e2e test to fix race condition
The _verify_langfuse_call helper only inspected the last mock call
(mock_post.call_args), but the Langfuse SDK may split trace-create and
generation-create events across separate HTTP flush cycles. This caused
an IndexError when the last call's batch contained only one event type.
Fix: iterate over mock_post.call_args_list to collect batch items from
ALL calls. Also add a safety assertion after filtering by trace_id and
mark all langfuse e2e tests with @pytest.mark.flaky(retries=3) as an
extra safety net for any residual timing issues.
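The difference between `call_args` (last call only) and `call_args_list` (every call) is the core of the fix described above. A small self-contained sketch using `unittest.mock`, assuming each call carries a JSON string in a `content` keyword as the test file does; the URL and `collect_batch_items` helper are placeholders for illustration:

```python
import json
from unittest.mock import MagicMock


def collect_batch_items(mock_post: MagicMock) -> list:
    """Gather batch items from every recorded call, not just the last one."""
    items = []
    for call in mock_post.call_args_list:
        content = call.kwargs.get("content")
        if content:
            items.extend(json.loads(content).get("batch", []))
    return items


# Simulate an SDK that flushes the two events in separate HTTP requests.
mock_post = MagicMock()
mock_post(
    "https://example.invalid/ingestion",
    content=json.dumps({"batch": [{"type": "trace-create"}]}),
)
mock_post(
    "https://example.invalid/ingestion",
    content=json.dumps({"batch": [{"type": "generation-create"}]}),
)

# Inspecting only the last call sees one event; aggregating sees both.
last_only = json.loads(mock_post.call_args.kwargs["content"])["batch"]
all_items = collect_batch_items(mock_post)
```

Indexing `last_only[1]` here would raise the `IndexError` the commit message describes, while `all_items` contains both event types.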
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
* fix(ci): black formatting + update OpenAPI compliance tests for spec changes
- Apply Black 26.x formatting to litellm_logging.py (parenthesized style)
- Update test_input_types_match_spec to follow $ref to InteractionsInput schema
(Google updated their OpenAPI spec to use $ref instead of inline oneOf)
- Update test_content_schema_uses_discriminator to handle discriminator without
explicit mapping (Google removed the mapping key from Content discriminator)
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
* revert: undo incorrect Black 26.x formatting on litellm_logging.py
The file was correctly formatted for Black 23.12.1 (the version pinned
in pyproject.toml). The previous commit applied Black 26.x formatting
which was incompatible with the CI's Black version.
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
* fix(ci): deduplicate and sort langfuse batch events after aggregation
The Langfuse SDK may send the same event (e.g., trace-create) in
multiple flush cycles, causing duplicates when we aggregate from all
mock calls. After filtering by trace_id, deduplicate by keeping only
the first event of each type, then sort to ensure trace-create is at
index 0 and generation-create at index 1.
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
---------
Co-authored-by: Noah Nistler <60981020+noahnistler@users.noreply.github.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
589 lines
22 KiB
Python
import asyncio
import copy
import json
import logging
import os
import sys
import threading
from typing import Any, Optional
from unittest.mock import AsyncMock, MagicMock, patch

import httpx

logging.basicConfig(level=logging.DEBUG)
sys.path.insert(0, os.path.abspath("../.."))

import litellm
from litellm import completion
from litellm.caching import InMemoryCache
from litellm.llms.custom_httpx.http_handler import AsyncHTTPHandler

litellm.num_retries = 3
litellm.success_callback = ["langfuse"]
os.environ["LANGFUSE_DEBUG"] = "True"
import time

import pytest
import pytest_asyncio


def assert_langfuse_request_matches_expected(
    actual_request_body: dict,
    expected_file_name: str,
    trace_id: Optional[str] = None,
):
    """
    Helper function to compare actual Langfuse request body with expected JSON file.

    Args:
        actual_request_body (dict): The actual request body received from the API call
        expected_file_name (str): Name of the JSON file containing expected request body (e.g., "transcription.json")
        trace_id (Optional[str]): If set, only batch items belonging to this trace are compared
    """
    # Get the current directory and read the expected request body
    pwd = os.path.dirname(os.path.realpath(__file__))
    expected_body_path = os.path.join(
        pwd, "langfuse_expected_request_body", expected_file_name
    )

    with open(expected_body_path, "r") as f:
        expected_request_body = json.load(f)

    # Filter out events that don't match the trace_id
    if trace_id:
        actual_request_body["batch"] = [
            item
            for item in actual_request_body["batch"]
            if (item["type"] == "trace-create" and item["body"].get("id") == trace_id)
            or (
                item["type"] == "generation-create"
                and item["body"].get("traceId") == trace_id
            )
        ]

        # When aggregating from multiple flush cycles, deduplicate by keeping
        # only one trace-create and one generation-create per trace_id.
        seen_types: dict = {}
        deduped_batch: list = []
        for item in actual_request_body["batch"]:
            item_type = item["type"]
            if item_type not in seen_types:
                seen_types[item_type] = True
                deduped_batch.append(item)
        actual_request_body["batch"] = deduped_batch

        # Ensure canonical order: trace-create first, generation-create second
        actual_request_body["batch"].sort(
            key=lambda x: 0 if x["type"] == "trace-create" else 1
        )

    print(
        "actual_request_body after filtering", json.dumps(actual_request_body, indent=4)
    )

    assert len(actual_request_body["batch"]) >= 2, (
        f"Expected at least 2 batch items (trace-create + generation-create) "
        f"after filtering by trace_id={trace_id}, "
        f"but got {len(actual_request_body['batch'])}. "
        f"Items: {json.dumps(actual_request_body['batch'], indent=2)}"
    )

    # Replace dynamic values in actual request body
    for item in actual_request_body["batch"]:

        # Replace IDs with expected IDs
        if item["type"] == "trace-create":
            item["id"] = expected_request_body["batch"][0]["id"]
            item["body"]["id"] = expected_request_body["batch"][0]["body"]["id"]
            item["timestamp"] = expected_request_body["batch"][0]["timestamp"]
            item["body"]["timestamp"] = expected_request_body["batch"][0]["body"][
                "timestamp"
            ]
        elif item["type"] == "generation-create":
            item["id"] = expected_request_body["batch"][1]["id"]
            item["body"]["id"] = expected_request_body["batch"][1]["body"]["id"]
            item["timestamp"] = expected_request_body["batch"][1]["timestamp"]
            item["body"]["startTime"] = expected_request_body["batch"][1]["body"][
                "startTime"
            ]
            item["body"]["endTime"] = expected_request_body["batch"][1]["body"][
                "endTime"
            ]
            item["body"]["completionStartTime"] = expected_request_body["batch"][1][
                "body"
            ]["completionStartTime"]
            if trace_id is None:
                print("popping traceId")
                item["body"].pop("traceId")
            else:
                item["body"]["traceId"] = trace_id
                expected_request_body["batch"][1]["body"]["traceId"] = trace_id

    # Replace SDK version with expected version
    actual_request_body["batch"][0]["body"].pop("release", None)
    actual_request_body["metadata"]["sdk_version"] = expected_request_body["metadata"][
        "sdk_version"
    ]
    # replace "public_key" with expected public key
    actual_request_body["metadata"]["public_key"] = expected_request_body["metadata"][
        "public_key"
    ]
    actual_request_body["batch"][1]["body"]["metadata"] = expected_request_body[
        "batch"
    ][1]["body"]["metadata"]
    actual_request_body["metadata"]["sdk_integration"] = expected_request_body[
        "metadata"
    ]["sdk_integration"]
    actual_request_body["metadata"]["batch_size"] = expected_request_body["metadata"][
        "batch_size"
    ]
    # Assert the entire request body matches
    assert (
        actual_request_body == expected_request_body
    ), f"Difference in request bodies: {json.dumps(actual_request_body, indent=2)} != {json.dumps(expected_request_body, indent=2)}"


class TestLangfuseLogging:
    @pytest_asyncio.fixture
    async def mock_setup(self):
        """Common setup for Langfuse logging tests"""
        from litellm._uuid import uuid
        from unittest.mock import AsyncMock, patch
        import httpx

        # Create a mock Response object
        mock_response = AsyncMock(spec=httpx.Response)
        mock_response.status_code = 200
        mock_response.json.return_value = {"status": "success"}

        # Create mock for httpx.Client.post
        mock_post = AsyncMock()
        mock_post.return_value = mock_response

        litellm.set_verbose = True
        litellm.success_callback = ["langfuse"]

        return {"trace_id": f"litellm-test-{str(uuid.uuid4())}", "mock_post": mock_post}

    async def _verify_langfuse_call(
        self,
        mock_post,
        expected_file_name: str,
        trace_id: str,
    ):
        """Helper method to verify Langfuse API calls"""
        await asyncio.sleep(3)

        # Verify at least one call was made
        assert mock_post.call_count >= 1

        # Aggregate batch items from ALL calls — the Langfuse SDK may split
        # trace-create and generation-create across separate HTTP flushes.
        langfuse_url = "https://us.cloud.langfuse.com/api/public/ingestion"
        all_batch_items: list = []
        metadata: Optional[dict] = None
        for call in mock_post.call_args_list:
            url = call[0][0]
            if url != langfuse_url:
                continue
            request_body = call[1].get("content")
            if request_body:
                body = json.loads(request_body)
                all_batch_items.extend(body.get("batch", []))
                if metadata is None:
                    metadata = body.get("metadata")

        assert len(all_batch_items) > 0, "No Langfuse ingestion calls found"
        assert metadata is not None, "No metadata found in Langfuse calls"

        actual_request_body = {
            "batch": all_batch_items,
            "metadata": metadata,
        }

        print("\nMocked Request Details (aggregated from all calls):")
        print(f"Request Body: {json.dumps(actual_request_body, indent=4)}")

        assert_langfuse_request_matches_expected(
            actual_request_body,
            expected_file_name,
            trace_id,
        )

    @pytest.mark.asyncio
    @pytest.mark.flaky(retries=3, delay=1)
    async def test_langfuse_logging_completion(self, mock_setup):
        """Test Langfuse logging for chat completion"""
        setup = mock_setup
        with patch("httpx.Client.post", setup["mock_post"]):
            await litellm.acompletion(
                model="gpt-3.5-turbo",
                messages=[{"role": "user", "content": "Hello!"}],
                mock_response="Hello! How can I assist you today?",
                metadata={"trace_id": setup["trace_id"]},
            )
            await self._verify_langfuse_call(
                setup["mock_post"], "completion.json", setup["trace_id"]
            )

    @pytest.mark.asyncio
    @pytest.mark.flaky(retries=3, delay=1)
    async def test_langfuse_logging_completion_with_tags(self, mock_setup):
        """Test Langfuse logging for chat completion with tags"""
        setup = mock_setup
        with patch("httpx.Client.post", setup["mock_post"]):
            await litellm.acompletion(
                model="gpt-3.5-turbo",
                messages=[{"role": "user", "content": "Hello!"}],
                mock_response="Hello! How can I assist you today?",
                metadata={
                    "trace_id": setup["trace_id"],
                    "tags": ["test_tag", "test_tag_2"],
                },
            )
            await self._verify_langfuse_call(
                setup["mock_post"], "completion_with_tags.json", setup["trace_id"]
            )

    @pytest.mark.asyncio
    @pytest.mark.flaky(retries=3, delay=1)
    async def test_langfuse_logging_completion_with_tags_stream(self, mock_setup):
        """Test Langfuse logging for chat completion with tags"""
        setup = mock_setup
        with patch("httpx.Client.post", setup["mock_post"]):
            await litellm.acompletion(
                model="gpt-3.5-turbo",
                messages=[{"role": "user", "content": "Hello!"}],
                mock_response="Hello! How can I assist you today?",
                metadata={
                    "trace_id": setup["trace_id"],
                    "tags": ["test_tag_stream", "test_tag_2_stream"],
                },
            )
            await self._verify_langfuse_call(
                setup["mock_post"],
                "completion_with_tags_stream.json",
                setup["trace_id"],
            )

    @pytest.mark.asyncio
    @pytest.mark.flaky(retries=3, delay=1)
    async def test_langfuse_logging_completion_with_langfuse_metadata(self, mock_setup):
        """Test Langfuse logging for chat completion with metadata for langfuse"""
        setup = mock_setup
        with patch("httpx.Client.post", setup["mock_post"]):
            await litellm.acompletion(
                model="gpt-3.5-turbo",
                messages=[{"role": "user", "content": "Hello!"}],
                mock_response="Hello! How can I assist you today?",
                metadata={
                    "trace_id": setup["trace_id"],
                    "tags": ["test_tag", "test_tag_2"],
                    "generation_name": "test_generation_name",
                    "parent_observation_id": "test_parent_observation_id",
                    "version": "test_version",
                    "trace_user_id": "test_user_id",
                    "session_id": "test_session_id",
                    "trace_name": "test_trace_name",
                    "trace_metadata": {"test_key": "test_value"},
                    "trace_version": "test_trace_version",
                    "trace_release": "test_trace_release",
                },
            )
            await self._verify_langfuse_call(
                setup["mock_post"],
                "completion_with_langfuse_metadata.json",
                setup["trace_id"],
            )

    @pytest.mark.asyncio
    @pytest.mark.flaky(retries=3, delay=1)
    async def test_langfuse_logging_with_non_serializable_metadata(self, mock_setup):
        """Test Langfuse logging with metadata that requires preparation (Pydantic models, sets, etc)"""
        from pydantic import BaseModel
        from typing import Set
        import datetime

        class UserPreferences(BaseModel):
            favorite_colors: Set[str]
            last_login: datetime.datetime
            settings: dict

        setup = mock_setup

        test_metadata = {
            "user_prefs": UserPreferences(
                favorite_colors={"red", "blue"},
                last_login=datetime.datetime.now(),
                settings={"theme": "dark", "notifications": True},
            ),
            "nested_set": {
                "inner_set": {1, 2, 3},
                "inner_pydantic": UserPreferences(
                    favorite_colors={"green", "yellow"},
                    last_login=datetime.datetime.now(),
                    settings={"theme": "light"},
                ),
            },
            "trace_id": setup["trace_id"],
        }

        with patch("httpx.Client.post", setup["mock_post"]):
            response = await litellm.acompletion(
                model="gpt-3.5-turbo",
                messages=[{"role": "user", "content": "Hello!"}],
                mock_response="Hello! How can I assist you today?",
                metadata=test_metadata,
            )

            await self._verify_langfuse_call(
                setup["mock_post"],
                "completion_with_complex_metadata.json",
                setup["trace_id"],
            )

    @pytest.mark.asyncio
    @pytest.mark.parametrize(
        "test_metadata, response_json_file",
        [
            ({"a": 1, "b": 2, "c": 3}, "simple_metadata.json"),
            (
                {"a": {"nested_a": 1}, "b": {"nested_b": 2}},
                "nested_metadata.json",
            ),
            ({"a": [1, 2, 3], "b": {4, 5, 6}}, "simple_metadata2.json"),
            (
                {"a": (1, 2), "b": frozenset([3, 4]), "c": {"d": [5, 6]}},
                "simple_metadata3.json",
            ),
            ({"lock": threading.Lock()}, "metadata_with_lock.json"),
            ({"func": lambda x: x + 1}, "metadata_with_function.json"),
            (
                {
                    "int": 42,
                    "str": "hello",
                    "list": [1, 2, 3],
                    "set": {4, 5},
                    "dict": {"nested": "value"},
                    "non_copyable": threading.Lock(),
                    "function": print,
                },
                "complex_metadata.json",
            ),
            (
                {"list": ["list", "not", "a", "dict"]},
                "complex_metadata_2.json",
            ),
            ({}, "empty_metadata.json"),
        ],
    )
    @pytest.mark.flaky(retries=6, delay=1)
    async def test_langfuse_logging_with_various_metadata_types(
        self, mock_setup, test_metadata, response_json_file
    ):
        """Test Langfuse logging with various metadata types including non-serializable objects"""
        import threading

        setup = mock_setup

        if test_metadata is not None:
            test_metadata["trace_id"] = setup["trace_id"]

        with patch("httpx.Client.post", setup["mock_post"]):
            await litellm.acompletion(
                model="gpt-3.5-turbo",
                messages=[{"role": "user", "content": "Hello!"}],
                mock_response="Hello! How can I assist you today?",
                metadata=test_metadata,
            )

            await self._verify_langfuse_call(
                setup["mock_post"],
                response_json_file,
                setup["trace_id"],
            )

    @pytest.mark.asyncio
    @pytest.mark.flaky(retries=3, delay=1)
    async def test_langfuse_logging_completion_with_malformed_llm_response(
        self, mock_setup
    ):
        """Test Langfuse logging for chat completion with malformed LLM response"""
        setup = mock_setup
        litellm._turn_on_debug()
        with patch("httpx.Client.post", setup["mock_post"]):
            mock_response = litellm.ModelResponse(
                choices=[],
                usage=litellm.Usage(
                    prompt_tokens=10,
                    completion_tokens=10,
                    total_tokens=20,
                ),
                model="gpt-3.5-turbo",
                object="chat.completion",
                created=1723081200,
            ).model_dump()
            await litellm.acompletion(
                model="gpt-3.5-turbo",
                messages=[{"role": "user", "content": "Hello!"}],
                mock_response=mock_response,
                metadata={"trace_id": setup["trace_id"]},
            )
            await self._verify_langfuse_call(
                setup["mock_post"], "completion_with_no_choices.json", setup["trace_id"]
            )

    @pytest.mark.asyncio
    @pytest.mark.flaky(retries=3, delay=1)
    async def test_langfuse_logging_completion_with_bedrock_llm_response(
        self, mock_setup
    ):
        """Test Langfuse logging for a Bedrock chat completion"""
        setup = mock_setup
        litellm._turn_on_debug()
        with patch("httpx.Client.post", setup["mock_post"]):
            mock_response = litellm.ModelResponse(
                choices=[],
                usage=litellm.Usage(
                    prompt_tokens=10,
                    completion_tokens=10,
                    total_tokens=20,
                ),
                model="anthropic.claude-3-5-sonnet-20240620-v1:0",
                object="chat.completion",
                created=1723081200,
            ).model_dump()
            await litellm.acompletion(
                model="bedrock/anthropic.claude-3-5-sonnet-20240620-v1:0",
                messages=[{"role": "user", "content": "Hello!"}],
                mock_response=mock_response,
                metadata={"trace_id": setup["trace_id"]},
                aws_access_key_id="fake-key",
                aws_secret_access_key="fake-key",
                aws_region="us-east-1",
            )
            await self._verify_langfuse_call(
                setup["mock_post"], "completion_with_bedrock_call.json", setup["trace_id"]
            )

    @pytest.mark.asyncio
    @pytest.mark.flaky(retries=3, delay=1)
    async def test_langfuse_logging_completion_with_vertex_llm_response(
        self, mock_setup
    ):
        """Test Langfuse logging for a Vertex AI chat completion"""
        setup = mock_setup
        litellm._turn_on_debug()
        with patch("httpx.Client.post", setup["mock_post"]):
            mock_response = litellm.ModelResponse(
                choices=[],
                usage=litellm.Usage(
                    prompt_tokens=10,
                    completion_tokens=10,
                    total_tokens=20,
                ),
                model="vertex/gemini-2.0-flash-001",
                object="chat.completion",
                created=1723081200,
            ).model_dump()
            await litellm.acompletion(
                model="vertex_ai/gemini-2.0-flash-001",
                messages=[{"role": "user", "content": "Hello!"}],
                mock_response=mock_response,
                metadata={"trace_id": setup["trace_id"]},
                vertex_credentials="my-mock-credentials",
                api_key="my-mock-credentials-2",
            )
            await self._verify_langfuse_call(
                setup["mock_post"], "completion_with_vertex_call.json", setup["trace_id"]
            )

    @pytest.mark.asyncio
    @pytest.mark.flaky(retries=3, delay=1)
    async def test_langfuse_logging_vllm_embedding(self, mock_setup):
        """
        Test that the request sent to the vllm embedding endpoint is correct.

        Verifies the request body matches the expected JSON fixture,
        including that the hosted_vllm/ prefix is stripped from the model name
        and that no unexpected fields (e.g. encoding_format) are included.
        """
        setup = mock_setup

        vllm_response_data = {
            "object": "list",
            "data": [{"object": "embedding", "index": 0, "embedding": [0.1, 0.2, 0.3]}],
            "model": "BAAI/bge-small-en-v1.5",
            "usage": {"prompt_tokens": 10, "total_tokens": 10},
        }
        mock_vllm_response = httpx.Response(
            status_code=200,
            json=vllm_response_data,
        )

        mock_async_client = AsyncHTTPHandler()
        mock_async_client.post = AsyncMock(return_value=mock_vllm_response)

        with patch("httpx.Client.post", setup["mock_post"]):
            await litellm.aembedding(
                model="hosted_vllm/BAAI/bge-small-en-v1.5",
                input=["Hello from litellm!"],
                api_base="http://my-fake-vllm.com/v1",
                metadata={"trace_id": setup["trace_id"]},
                client=mock_async_client,
            )

        # Verify the request sent to vllm matches the expected JSON fixture
        assert mock_async_client.post.call_count == 1
        actual_vllm_request = mock_async_client.post.call_args.kwargs["json"]

        pwd = os.path.dirname(os.path.realpath(__file__))
        expected_body_path = os.path.join(
            pwd, "langfuse_expected_request_body", "embedding_with_vllm.json"
        )
        with open(expected_body_path, "r") as f:
            expected_vllm_request = json.load(f)

        assert actual_vllm_request == expected_vllm_request, (
            f"vllm request body mismatch:\n"
            f"actual: {json.dumps(actual_vllm_request, indent=2)}\n"
            f"expected: {json.dumps(expected_vllm_request, indent=2)}"
        )

    @pytest.mark.asyncio
    @pytest.mark.flaky(retries=3, delay=1)
    async def test_langfuse_logging_with_router(self, mock_setup):
        """Test Langfuse logging with router"""
        litellm._turn_on_debug()
        router = litellm.Router(
            model_list=[
                {
                    "model_name": "gpt-3.5-turbo",
                    "litellm_params": {
                        "model": "gpt-3.5-turbo",
                        "mock_response": "Hello! How can I assist you today?",
                        "api_key": "test_api_key",
                    }
                }
            ]
        )
        with patch("httpx.Client.post", mock_setup["mock_post"]):
            mock_response = litellm.ModelResponse(
                choices=[],
                usage=litellm.Usage(
                    prompt_tokens=10,
                    completion_tokens=10,
                    total_tokens=20,
                ),
                model="gpt-3.5-turbo",
                object="chat.completion",
                created=1723081200,
            ).model_dump()
            await router.acompletion(
                model="gpt-3.5-turbo",
                messages=[{"role": "user", "content": "Hello!"}],
                mock_response=mock_response,
                metadata={"trace_id": mock_setup["trace_id"]},
            )
            await self._verify_langfuse_call(
                mock_setup["mock_post"], "completion_with_router.json", mock_setup["trace_id"]
            )