mirror of
https://github.com/tiennm99/litellm.git
synced 2026-06-17 12:48:57 +00:00
2c733c00f5
* test: modernize models used in CircleCI e2e test suites
Replaces obsolete models (gpt-4o, gpt-4o-mini, gpt-3.5-turbo,
claude-3-5-sonnet-20240620, claude-sonnet-4-20250514) with current
equivalents across the e2e_openai_endpoints and
proxy_e2e_anthropic_messages_tests CircleCI jobs.
- gpt-4o -> gpt-5.5 (responses API e2e tests)
- gpt-4o-mini -> gpt-5-mini (websocket responses, oai_misc_config)
- gpt-4o-mini-2024-07-18 -> gpt-4.1-mini-2025-04-14 (fine-tuning,
still actively fine-tunable)
- gpt-4 / gpt-3.5-turbo target_model_names example -> gpt-5.5 /
gpt-5-mini
- bedrock claude-3-5-sonnet-20240620 batch entry -> haiku-4-5-20251001
(also aligning oai_misc_config model_name with what
test_bedrock_batches_api.py actually requests)
- bedrock claude-sonnet-4-20250514 (deprecated, retires 2026-06-15)
-> claude-sonnet-4-5-20250929
* test: point bedrock-claude-sonnet-4 alias at Sonnet 4.6, not 4.5
Greptile/Cursor flagged that after the previous commit, the
bedrock-claude-sonnet-4 alias collided with bedrock-claude-sonnet-4.5
(both pointed to claude-sonnet-4-5-20250929). Rename to
bedrock-claude-sonnet-4.6 and point it at the Sonnet 4.6 Bedrock ID
(us.anthropic.claude-sonnet-4-6, already in the litellm model
registry) so the alias name matches the underlying model version.
* test: modernize models across remaining CI-mounted configs & tests
Expands the modernization sweep to all CircleCI-mounted proxy configs
and to test directories where the model literal is a fixture/route key
(not the test's subject).
Config changes:
- proxy_server_config.yaml: bump gpt-3.5-turbo / gpt-3.5-turbo-1106 /
gpt-4o / gemini-1.5-flash / dall-e-3 underlying models; rename
gpt-3.5-turbo-end-user-test alias to gpt-5-mini-end-user-test; bump
text-embedding-ada-002 underlying to text-embedding-3-small. User-
facing aliases (gpt-3.5-turbo, gpt-4, text-embedding-ada-002, etc.)
preserved for backward compatibility with tests.
- simple_config.yaml, otel_test_config.yaml, spend_tracking_config.yaml:
bump gpt-3.5-turbo underlying to gpt-5-mini.
- pass_through_config.yaml: claude-3-5-sonnet / claude-3-7-sonnet /
claude-3-haiku entries replaced with claude-sonnet-4-5 / claude-
haiku-4-5 / claude-opus-4-7.
- oai_misc_config.yaml: align alias name with the gpt-5-mini rename.
Test changes (proactive: claude-sonnet-4-20250514 / claude-opus-4-
20250514 retire 2026-06-15):
- tests/llm_translation/test_anthropic_completion.py: bump 3 references
+ paired Vertex AI ID to claude-sonnet-4-5.
- tests/llm_translation/test_optional_params.py: bump 2 references.
- tests/pass_through_unit_tests/test_anthropic_messages_passthrough.py
and test_bedrock_anthropic_messages_test.py: bump router fixtures
using the deprecated model IDs.
- tests/pass_through_unit_tests/base_anthropic_messages_tool_search_test.py:
modernize docstring examples.
- tests/test_end_users.py: update references to renamed alias.
* test: modernize placeholder model literals in router_unit_tests
Mass replace_all on fixture/placeholder model literals across the
router_unit_tests/ suite (model name is a routing key / label, not the
test subject). Sub-agent sweep so far — additional commits will follow
for logging_callback_tests/, enterprise/, top-level tests/test_*.py,
and other CI-mounted dirs.
Mappings applied:
- gpt-3.5-turbo -> gpt-5-mini
- gpt-4 (bare) -> gpt-5.5
- gpt-4o (bare) -> gpt-5
- text-embedding-ada-002 -> text-embedding-3-small
- claude-3-sonnet-20240229 / claude-3-opus-20240229 /
claude-3-haiku-20240307 / claude-3-5-sonnet-20240620 ->
claude-sonnet-4-5-20250929 / claude-opus-4-7 /
claude-haiku-4-5-20251001 as appropriate
Explicitly preserved:
- gpt-4o-mini-* variants (transcribe, tts, etc.) where they're current
- gpt-4-turbo / gpt-4-vision-preview / gpt-4-0613 (subject literals)
- JSONL batch body literals
- Mock LLM response model fields (must match upstream)
- Fake/mock identifiers
* test: modernize placeholder model literals across remaining CI suites
Sub-agent sweep across logging_callback_tests/, guardrails_tests/,
enterprise/, pass_through_unit_tests/, otel_tests/,
llm_responses_api_testing/, batches_tests/, spend_tracking_tests/,
litellm_utils_tests/, unified_google_tests/, and a few top-level
tests/test_*.py files where the model literal is a fixture or
placeholder (router model_list, mock standard logging payload, mock
callback data) rather than the test's subject.
Mappings applied (see scope notes below):
- gpt-3.5-turbo -> gpt-5-mini
- gpt-4 (bare) -> gpt-5.5
- gpt-4o (bare) -> gpt-5.5 (corrected from initial gpt-5 — bare gpt-5
is not a valid OpenAI alias; only gpt-5.5 / gpt-5.4 / gpt-5.2-codex
/ gpt-5-mini exist)
- gpt-4o-mini (bare) -> gpt-5-mini
- text-embedding-ada-002 -> text-embedding-3-small
- claude-3-sonnet-20240229 -> claude-sonnet-4-5-20250929
- claude-3-opus-20240229 -> claude-opus-4-7
- claude-3-haiku-20240307 -> claude-haiku-4-5-20251001
- claude-3-5-sonnet-20240620/20241022 -> claude-sonnet-4-5-20250929
- claude-3-7-sonnet-20250219 -> claude-sonnet-4-6
- gemini-1.5-flash -> gemini-2.5-flash
- gemini-1.5-pro -> gemini-2.5-pro
Explicitly preserved (not modernized):
- llm_translation/ tests where model is the SUBJECT (provider-specific
translation/transformation logic). Only the deprecated 20250514
references were already bumped in a prior commit.
- Cost-calc / tokenizer subject tests in test_utils.py (skip-ranges
documented by the sub-agent).
- Bedrock model IDs in test_health_check.py path-stripping tests.
- JSONL batch request bodies and mock LLM response bodies (must match
upstream literal).
- Langfuse expected-request-body JSON fixtures (cost values are exact-
match-asserted; changing the model would shift response_cost).
- gpt-3.5-turbo-instruct (text-completion endpoint; no modern OpenAI
equivalent).
- Top-level tests calling the proxy through user-facing aliases
(gpt-3.5-turbo, gpt-4, text-embedding-ada-002, dall-e-3) — aliases
in proxy_server_config.yaml stay; only the underlying model was
bumped.
- tests/test_gpt5_azure_temperature_support.py (the test's whole point
is model-name handling).
- Fake / mock / openai/fake identifiers.
Notable side fixes:
- test_spend_accuracy_tests.py: UPSTREAM_MODEL now matches what
spend_tracking_config.yaml's proxy actually routes to (gpt-5-mini),
resolving a latent inconsistency.
- proxy_server_config.yaml: bare `gpt-5` alias renamed to `gpt-5.5`
(bare gpt-5 is not a valid OpenAI alias).
- test_batches_logging_unit_tests.py: explicit_models list entries
kept distinct (gpt-5-mini + gpt-5.5) after bulk rename.
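The mappings above can be sketched as an exact-match lookup table. This is only an illustration of the rule described in this commit, not the tooling the sweep used: `MODEL_REPLACEMENTS` and `modernize_model` are hypothetical names. Because the lookup matches the whole literal, variants such as gpt-3.5-turbo-instruct or gpt-4-turbo fall through unchanged, which is the "explicitly preserved" behaviour listed above.

```python
# Hypothetical helper; names are illustrative and not part of litellm.
MODEL_REPLACEMENTS = {
    "gpt-3.5-turbo": "gpt-5-mini",
    "gpt-4": "gpt-5.5",
    "gpt-4o": "gpt-5.5",
    "gpt-4o-mini": "gpt-5-mini",
    "text-embedding-ada-002": "text-embedding-3-small",
    "claude-3-sonnet-20240229": "claude-sonnet-4-5-20250929",
    "claude-3-opus-20240229": "claude-opus-4-7",
    "claude-3-haiku-20240307": "claude-haiku-4-5-20251001",
    "claude-3-5-sonnet-20240620": "claude-sonnet-4-5-20250929",
    "claude-3-5-sonnet-20241022": "claude-sonnet-4-5-20250929",
    "claude-3-7-sonnet-20250219": "claude-sonnet-4-6",
    "gemini-1.5-flash": "gemini-2.5-flash",
    "gemini-1.5-pro": "gemini-2.5-pro",
}


def modernize_model(name: str) -> str:
    """Return the replacement for a bare model literal, else the name unchanged."""
    # Exact-key lookup: "gpt-4o-mini-transcribe" or "gpt-4-0613" never match.
    return MODEL_REPLACEMENTS.get(name, name)
```

A plain dictionary lookup is deliberately stricter than a substring or prefix replace, which would rewrite subject literals the sweep meant to keep.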
* test: fix CI failures from model modernization sweep
CI surfaced 5 categories of regression from the bulk modernization:
1. Azure deployment names are customer-specific. Reverted:
- tests/litellm_utils_tests/test_health_check.py: azure/text-
embedding-3-small -> azure/text-embedding-ada-002 (the CI Azure
account does not have a text-embedding-3-small deployment).
- tests/logging_callback_tests/test_custom_callback_router.py:
same revert for two router fixtures driving aembedding.
2. gpt-5 family does not accept temperature != 1. Tests that pass a
custom temperature swapped from gpt-5-mini to gpt-4.1-mini (modern
non-reasoning OpenAI mini that still accepts temperature/logprobs):
- tests/logging_callback_tests/test_datadog.py
- tests/logging_callback_tests/test_langsmith_unit_test.py
- tests/logging_callback_tests/test_otel_logging.py
3. proxy_server_config.yaml's gpt-3.5-turbo-large alias was routing to
gpt-5.5 (a reasoning model that rejects logprobs). The proxy test
tests/test_openai_endpoints.py::test_chat_completion_streaming
exercises logprobs/top_logprobs through that alias. Bumped the
underlying model to gpt-4.1 (non-reasoning, still modern).
4. tests/logging_callback_tests/test_gcs_pub_sub.py asserts against a
pinned JSON fixture (gcs_pub_sub_body/spend_logs_payload.json) with
hardcoded model="gpt-4o" and a model-specific spend value. Reverted
the litellm.acompletion calls in the test to model="gpt-4o" so the
fixture's exact-match assertions still hold.
5. tests/pass_through_unit_tests/test_anthropic_messages_passthrough.py:
anthropic.messages.create routing to openai/gpt-5-mini returned an
empty content[0] with max_tokens=100 (reasoning-token consumption).
Swapped to openai/gpt-4.1-mini.
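Categories 2 and 3 above come down to one rule: a test that sends a custom temperature or asks for logprobs needs a non-reasoning model. A minimal sketch of that rule follows, assuming the two fixture models named in this commit. `pick_fixture_model` is a hypothetical helper and does not exist in the test suite.

```python
# Hypothetical helper illustrating the rule from this commit message.
REASONING_FIXTURE = "gpt-5-mini"        # rejects temperature != 1 and logprobs
NON_REASONING_FIXTURE = "gpt-4.1-mini"  # still accepts temperature/logprobs


def pick_fixture_model(**request_params) -> str:
    """Choose a placeholder model that accepts the given request parameters."""
    uses_logprobs = "logprobs" in request_params or "top_logprobs" in request_params
    custom_temperature = request_params.get("temperature", 1) != 1
    if uses_logprobs or custom_temperature:
        return NON_REASONING_FIXTURE
    return REASONING_FIXTURE
```

For example, `pick_fixture_model(temperature=0)` selects the non-reasoning model, while a call with no sampling parameters keeps the reasoning fixture.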
* test: fix Assistants API model + 2 cursor[bot] review nits
1. pass_through_unit_tests/test_custom_logger_passthrough.py: gpt-5.5
isn't accepted by the /v1/assistants endpoint
("unsupported_model"). Switch to gpt-4.1-mini (modern, Assistants-
API-supported, non-reasoning).
2. example_config_yaml/pass_through_config.yaml: the previous sweep
bumped the claude-3-7-sonnet alias to claude-opus-4-7, which is a
tier change (Sonnet -> Opus). Map to claude-sonnet-4-6 to keep the
Sonnet tier intact. (Cursor bugbot review.)
3. example_config_yaml/simple_config.yaml: model_name was left as
gpt-3.5-turbo while the underlying was bumped to gpt-5-mini, which
muddles the "simple" example. Make both sides gpt-5-mini so the
most basic example is a straight 1:1 mapping again. (Cursor bugbot
review.)
* fix: revert gpt-4/gpt-3.5-turbo alias underlying to non-reasoning models
tests/test_openai_endpoints.py::test_completion calls the proxy alias
"gpt-4" with temperature=0, and other tests call gpt-3.5-turbo with
custom temperature / logprobs / the legacy /v1/completions endpoint.
The earlier modernization mapped both aliases to gpt-5.5 / gpt-5-mini,
which are reasoning models that reject temperature != 1 and don't
expose /v1/completions. Map the aliases to gpt-4.1 / gpt-4.1-mini
(modern non-reasoning OpenAI models) instead — keeps user-facing
aliases preserved while picking a current underlying that still
supports the parameters/endpoints the tests exercise.
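The alias/underlying split described above can be sketched in Python using the router-style `model_list` shape, where `model_name` is the user-facing alias and `litellm_params.model` is what the request is routed to. The entries below are assumptions drawn from this commit message, not a copy of proxy_server_config.yaml, and `resolve_alias` is a hypothetical helper.

```python
# Assumed entries based on the commit message; the real config may differ.
MODEL_LIST = [
    {"model_name": "gpt-4", "litellm_params": {"model": "gpt-4.1"}},
    {"model_name": "gpt-3.5-turbo", "litellm_params": {"model": "gpt-4.1-mini"}},
]


def resolve_alias(alias: str, model_list=MODEL_LIST) -> str:
    """Return the underlying model for a user-facing alias (hypothetical helper)."""
    for entry in model_list:
        if entry["model_name"] == alias:
            return entry["litellm_params"]["model"]
    raise KeyError(f"no deployment registered for alias {alias!r}")
```

Tests keep calling the alias "gpt-4"; only the value on the right-hand side changes when the underlying model is bumped.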
589 lines
20 KiB
Python
import os
import sys

sys.path.insert(
    0, os.path.abspath("../..")
)  # Adds the parent directory to the system-path

import pytest
from litellm.integrations.langfuse.langfuse import (
    LangFuseLogger,
)
from litellm.integrations.langfuse.langfuse_handler import LangFuseHandler
from litellm.litellm_core_utils.litellm_logging import DynamicLoggingCache
from unittest.mock import Mock, patch
from litellm.types.utils import (
    StandardLoggingPayload,
    StandardLoggingModelInformation,
    StandardLoggingMetadata,
    StandardLoggingHiddenParams,
    StandardCallbackDynamicParams,
    ModelResponse,
    Choices,
    Message,
    TextCompletionResponse,
    TextChoices,
)


def create_standard_logging_payload() -> StandardLoggingPayload:
    return StandardLoggingPayload(
        id="test_id",
        call_type="completion",
        response_cost=0.1,
        response_cost_failure_debug_info=None,
        status="success",
        total_tokens=30,
        prompt_tokens=20,
        completion_tokens=10,
        startTime=1234567890.0,
        endTime=1234567891.0,
        completionStartTime=1234567890.5,
        model_map_information=StandardLoggingModelInformation(
            model_map_key="gpt-5-mini", model_map_value=None
        ),
        model="gpt-5-mini",
        model_id="model-123",
        model_group="openai-gpt",
        api_base="https://api.openai.com",
        metadata=StandardLoggingMetadata(
            user_api_key_hash="test_hash",
            user_api_key_org_id=None,
            user_api_key_alias="test_alias",
            user_api_key_team_id="test_team",
            user_api_key_user_id="test_user",
            user_api_key_team_alias="test_team_alias",
            spend_logs_metadata=None,
            requester_ip_address="127.0.0.1",
            requester_metadata=None,
        ),
        cache_hit=False,
        cache_key=None,
        saved_cache_cost=0.0,
        request_tags=[],
        end_user=None,
        requester_ip_address="127.0.0.1",
        messages=[{"role": "user", "content": "Hello, world!"}],
        response={"choices": [{"message": {"content": "Hi there!"}}]},
        error_str=None,
        model_parameters={"stream": True},
        hidden_params=StandardLoggingHiddenParams(
            model_id="model-123",
            cache_key=None,
            api_base="https://api.openai.com",
            response_cost="0.1",
            additional_headers=None,
        ),
    )


@pytest.fixture
def dynamic_logging_cache():
    return DynamicLoggingCache()


global_langfuse_logger = LangFuseLogger(
    langfuse_public_key="global_public_key",
    langfuse_secret="global_secret",
    langfuse_host="https://global.langfuse.com",
)


# IMPORTANT: Test that passing both langfuse_secret_key and langfuse_secret works
standard_params_1 = StandardCallbackDynamicParams(
    langfuse_public_key="test_public_key",
    langfuse_secret="test_secret",
    langfuse_host="https://test.langfuse.com",
)

standard_params_2 = StandardCallbackDynamicParams(
    langfuse_public_key="test_public_key",
    langfuse_secret_key="test_secret",
    langfuse_host="https://test.langfuse.com",
)


@pytest.mark.parametrize("globalLangfuseLogger", [None, global_langfuse_logger])
@pytest.mark.parametrize("standard_params", [standard_params_1, standard_params_2])
def test_get_langfuse_logger_for_request_with_dynamic_params(
    dynamic_logging_cache, globalLangfuseLogger, standard_params
):
    """
    If StandardCallbackDynamicParams contain langfuse credentials the returned Langfuse logger should use the dynamic params

    the new Langfuse logger should be cached

    Even if globalLangfuseLogger is provided, it should use dynamic params if they are passed
    """

    result = LangFuseHandler.get_langfuse_logger_for_request(
        standard_callback_dynamic_params=standard_params,
        in_memory_dynamic_logger_cache=dynamic_logging_cache,
        globalLangfuseLogger=globalLangfuseLogger,
    )

    assert isinstance(result, LangFuseLogger)
    assert result.public_key == "test_public_key"
    assert result.secret_key == "test_secret"
    assert result.langfuse_host == "https://test.langfuse.com"

    # Check if the logger is cached
    cached_logger = dynamic_logging_cache.get_cache(
        credentials={
            "langfuse_public_key": "test_public_key",
            "langfuse_secret": "test_secret",
            "langfuse_host": "https://test.langfuse.com",
        },
        service_name="langfuse",
    )
    assert cached_logger is result


@pytest.mark.parametrize("globalLangfuseLogger", [None, global_langfuse_logger])
def test_get_langfuse_logger_for_request_with_no_dynamic_params(
    dynamic_logging_cache, globalLangfuseLogger
):
    """
    If StandardCallbackDynamicParams are not provided, the globalLangfuseLogger should be returned
    """
    result = LangFuseHandler.get_langfuse_logger_for_request(
        standard_callback_dynamic_params=StandardCallbackDynamicParams(),
        in_memory_dynamic_logger_cache=dynamic_logging_cache,
        globalLangfuseLogger=globalLangfuseLogger,
    )

    assert result is not None
    assert isinstance(result, LangFuseLogger)

    if globalLangfuseLogger is not None:
        assert result.public_key == "global_public_key"
        assert result.secret_key == "global_secret"
        assert result.langfuse_host == "https://global.langfuse.com"


def test_dynamic_langfuse_credentials_are_passed():
    # Test when credentials are passed
    params_with_credentials = StandardCallbackDynamicParams(
        langfuse_public_key="test_key",
        langfuse_secret="test_secret",
        langfuse_host="https://test.langfuse.com",
    )
    assert (
        LangFuseHandler._dynamic_langfuse_credentials_are_passed(
            params_with_credentials
        )
        is True
    )

    # Test when no credentials are passed
    params_without_credentials = StandardCallbackDynamicParams()
    assert (
        LangFuseHandler._dynamic_langfuse_credentials_are_passed(
            params_without_credentials
        )
        is False
    )

    # Test when only some credentials are passed
    params_partial_credentials = StandardCallbackDynamicParams(
        langfuse_public_key="test_key"
    )
    assert (
        LangFuseHandler._dynamic_langfuse_credentials_are_passed(
            params_partial_credentials
        )
        is True
    )


def test_get_dynamic_langfuse_logging_config():
    # Test with dynamic params
    dynamic_params = StandardCallbackDynamicParams(
        langfuse_public_key="dynamic_key",
        langfuse_secret="dynamic_secret",
        langfuse_host="https://dynamic.langfuse.com",
    )
    config = LangFuseHandler.get_dynamic_langfuse_logging_config(dynamic_params)
    assert config["langfuse_public_key"] == "dynamic_key"
    assert config["langfuse_secret"] == "dynamic_secret"
    assert config["langfuse_host"] == "https://dynamic.langfuse.com"

    # Test with no dynamic params
    empty_params = StandardCallbackDynamicParams()
    config = LangFuseHandler.get_dynamic_langfuse_logging_config(empty_params)
    assert config["langfuse_public_key"] is None
    assert config["langfuse_secret"] is None
    assert config["langfuse_host"] is None


def test_return_global_langfuse_logger():
    mock_cache = Mock()
    global_logger = LangFuseLogger(
        langfuse_public_key="global_key", langfuse_secret="global_secret"
    )

    # Test with existing global logger
    result = LangFuseHandler._return_global_langfuse_logger(global_logger, mock_cache)
    assert result == global_logger

    # Test without global logger, but with cached logger, should return cached logger
    mock_cache.get_cache.return_value = global_logger
    result = LangFuseHandler._return_global_langfuse_logger(None, mock_cache)
    assert result == global_logger

    # Test without global logger and without cached logger, should create new logger
    mock_cache.get_cache.return_value = None
    with patch.object(
        LangFuseHandler,
        "_create_langfuse_logger_from_credentials",
        return_value=global_logger,
    ):
        result = LangFuseHandler._return_global_langfuse_logger(None, mock_cache)
        assert result == global_logger


def test_get_langfuse_logger_for_request_with_cached_logger():
    """
    Test that get_langfuse_logger_for_request returns the cached logger if it exists when dynamic params are passed
    """
    mock_cache = Mock()
    cached_logger = LangFuseLogger(
        langfuse_public_key="cached_key", langfuse_secret="cached_secret"
    )
    mock_cache.get_cache.return_value = cached_logger

    dynamic_params = StandardCallbackDynamicParams(
        langfuse_public_key="test_key",
        langfuse_secret="test_secret",
        langfuse_host="https://test.langfuse.com",
    )

    result = LangFuseHandler.get_langfuse_logger_for_request(
        standard_callback_dynamic_params=dynamic_params,
        in_memory_dynamic_logger_cache=mock_cache,
        globalLangfuseLogger=None,
    )

    assert result == cached_logger
    mock_cache.get_cache.assert_called_once()


def test_get_langfuse_tags():
    """
    Test that _get_langfuse_tags correctly extracts tags from the standard logging payload
    """
    # Create a mock logging payload with tags
    mock_payload = create_standard_logging_payload()
    mock_payload["request_tags"] = ["tag1", "tag2", "test_tag"]

    # Test with payload containing tags
    result = global_langfuse_logger._get_langfuse_tags(mock_payload)
    assert result == ["tag1", "tag2", "test_tag"]

    # Test with payload without tags
    mock_payload["request_tags"] = None
    result = global_langfuse_logger._get_langfuse_tags(mock_payload)
    assert result == []

    # Test with empty tags list
    mock_payload["request_tags"] = []
    result = global_langfuse_logger._get_langfuse_tags(mock_payload)
    assert result == []


@patch.dict(os.environ, {}, clear=True)  # Start with empty environment
def test_get_langfuse_flush_interval():
    """
    Test that _get_langfuse_flush_interval correctly reads from environment variable
    or falls back to the provided flush_interval
    """
    default_interval = 60

    # Test when env var is not set
    result = LangFuseLogger._get_langfuse_flush_interval(
        flush_interval=default_interval
    )
    assert result == default_interval

    # Test when env var is set
    with patch.dict(os.environ, {"LANGFUSE_FLUSH_INTERVAL": "120"}):
        result = LangFuseLogger._get_langfuse_flush_interval(
            flush_interval=default_interval
        )
        assert result == 120


def test_langfuse_e2e_sync(monkeypatch):
    from litellm import completion
    import litellm
    import respx
    import httpx
    import time

    litellm.disable_aiohttp_transport = (
        True  # since this uses respx, we need to set use_aiohttp_transport to False
    )

    litellm._turn_on_debug()
    monkeypatch.setattr(litellm, "success_callback", ["langfuse"])

    with respx.mock:
        # Mock Langfuse
        # Mock any Langfuse endpoint
        langfuse_mock = respx.post(
            "https://*.cloud.langfuse.com/api/public/ingestion"
        ).mock(return_value=httpx.Response(200))
        completion(
            model="openai/my-fake-endpoint",
            messages=[{"role": "user", "content": "hello from litellm"}],
            stream=False,
            mock_response="Hello from litellm 2",
        )

        time.sleep(3)

        assert langfuse_mock.called


def test_get_chat_content_for_langfuse():
    """
    Test that _get_chat_content_for_langfuse correctly extracts content from chat completion responses
    """
    # Test with valid response
    mock_response = ModelResponse(
        choices=[Choices(message=Message(role="assistant", content="Hello world"))]
    )

    result = LangFuseLogger._get_chat_content_for_langfuse(mock_response)
    assert result["content"] == "Hello world"
    assert result["role"] == "assistant"

    # Test with empty choices
    mock_response = ModelResponse(choices=[])
    result = LangFuseLogger._get_chat_content_for_langfuse(mock_response)
    assert result is None


def test_get_text_completion_content_for_langfuse():
    """
    Test that _get_text_completion_content_for_langfuse correctly extracts content from text completion responses
    """
    # Test with valid response
    mock_response = TextCompletionResponse(choices=[TextChoices(text="Hello world")])
    result = LangFuseLogger._get_text_completion_content_for_langfuse(mock_response)
    assert result == "Hello world"

    # Test with empty choices
    mock_response = TextCompletionResponse(choices=[])
    result = LangFuseLogger._get_text_completion_content_for_langfuse(mock_response)
    assert result is None

    # Test with no choices field
    mock_response = TextCompletionResponse()
    result = LangFuseLogger._get_text_completion_content_for_langfuse(mock_response)
    assert result is None


def test_apply_masking_function_with_string():
    """
    Test that _apply_masking_function correctly applies masking to strings
    """
    import re

    def mask_credit_cards(data):
        if isinstance(data, str):
            return re.sub(r"\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b", "[CARD]", data)
        return data

    # Test with string containing credit card
    input_str = "My card is 4532-1234-5678-9012"
    result = LangFuseLogger._apply_masking_function(input_str, mask_credit_cards)
    assert result == "My card is [CARD]"
    assert "4532" not in result

    # Test with string without sensitive data
    input_str = "Hello world"
    result = LangFuseLogger._apply_masking_function(input_str, mask_credit_cards)
    assert result == "Hello world"


def test_apply_masking_function_with_dict():
    """
    Test that _apply_masking_function correctly applies masking to nested dicts
    """
    import re

    def mask_emails(data):
        if isinstance(data, str):
            return re.sub(r"[\w\.-]+@[\w\.-]+", "[EMAIL]", data)
        return data

    # Test with dict containing messages
    input_dict = {
        "messages": [{"role": "user", "content": "My email is test@example.com"}]
    }
    result = LangFuseLogger._apply_masking_function(input_dict, mask_emails)
    assert result["messages"][0]["content"] == "My email is [EMAIL]"
    assert "test@example.com" not in str(result)


def test_apply_masking_function_with_none():
    """
    Test that _apply_masking_function handles None correctly
    """

    def dummy_mask(data):
        return data

    result = LangFuseLogger._apply_masking_function(None, dummy_mask)
    assert result is None


def test_apply_masking_function_with_list():
    """
    Test that _apply_masking_function correctly applies masking to lists
    """
    import re

    def mask_ssn(data):
        if isinstance(data, str):
            return re.sub(r"\b\d{3}-\d{2}-\d{4}\b", "[SSN]", data)
        return data

    input_list = ["SSN: 123-45-6789", "No sensitive data here"]
    result = LangFuseLogger._apply_masking_function(input_list, mask_ssn)
    assert result[0] == "SSN: [SSN]"
    assert result[1] == "No sensitive data here"


def test_masking_function_isolated_from_other_loggers():
    """
    Test that langfuse_masking_function is extracted from metadata and stored separately.
    This ensures the callable doesn't leak to other logging integrations.
    """
    from litellm.litellm_core_utils.litellm_logging import (
        scrub_sensitive_keys_in_metadata,
    )

    def my_masking_fn(data):
        return data

    # Simulate litellm_params with masking function in metadata
    litellm_params = {
        "metadata": {
            "langfuse_masking_function": my_masking_fn,
            "other_key": "other_value",
        }
    }

    # Scrub should extract the function
    result = scrub_sensitive_keys_in_metadata(litellm_params)

    # Function should be removed from metadata (won't leak to other loggers)
    assert "langfuse_masking_function" not in result["metadata"]

    # Function should be stored in dedicated key for Langfuse to access
    assert result.get("_langfuse_masking_function") == my_masking_fn

    # Other metadata should remain intact
    assert result["metadata"]["other_key"] == "other_value"


def test_masking_function_not_in_metadata_when_not_provided():
    """
    Test that scrub_sensitive_keys_in_metadata works normally when no masking function is provided.
    """
    from litellm.litellm_core_utils.litellm_logging import (
        scrub_sensitive_keys_in_metadata,
    )

    litellm_params = {
        "metadata": {
            "some_key": "some_value",
        }
    }

    result = scrub_sensitive_keys_in_metadata(litellm_params)

    # No _langfuse_masking_function should be added
    assert "_langfuse_masking_function" not in result

    # Original metadata should be unchanged
    assert result["metadata"]["some_key"] == "some_value"


def test_langfuse_model_parameters_no_secret_leakage():
    """
    Test that sensitive keys in optional_params (api_key, secret_fields,
    authorization headers, etc.) are NOT passed to Langfuse as modelParameters.
    Only whitelisted model parameters (temperature, top_p, etc.) should survive.
    """
    from litellm.litellm_core_utils.model_param_helper import ModelParamHelper

    optional_params_with_secrets = {
        # Safe params that should be kept
        "temperature": 0.7,
        "top_p": 0.9,
        "max_tokens": 100,
        "stream": True,
        # Sensitive params that must NOT leak
        "api_key": "sk-secret-key-12345",
        "api_base": "https://my-private-endpoint.com",
        "secret_fields": {"raw_headers": {"Authorization": "Bearer sk-super-secret"}},
        "authorization": "Bearer sk-another-secret",
        "headers": {"X-Api-Key": "secret-header-value"},
    }

    sanitized = ModelParamHelper.get_standard_logging_model_parameters(
        optional_params_with_secrets
    )

    # Safe params should be present
    assert sanitized["temperature"] == 0.7
    assert sanitized["top_p"] == 0.9
    assert sanitized["max_tokens"] == 100
    assert sanitized["stream"] is True

    # Sensitive params must be excluded
    assert "api_key" not in sanitized
    assert "api_base" not in sanitized
    assert "secret_fields" not in sanitized
    assert "authorization" not in sanitized
    assert "headers" not in sanitized


def test_langfuse_v2_uses_standard_logging_model_parameters():
    """
    Test that _log_langfuse_v2 uses sanitized model_parameters from
    standard_logging_object instead of raw optional_params, preventing
    secret leakage to Langfuse traces.
    """
    standard_logging_object = create_standard_logging_payload()
    # Simulate standard_logging_object having safe model_parameters
    standard_logging_object["model_parameters"] = {"temperature": 0.5, "stream": True}

    # optional_params has secrets; these should NOT be used
    optional_params_with_secrets = {
        "temperature": 0.5,
        "api_key": "sk-secret-key-12345",
        "secret_fields": {"raw_headers": {"Authorization": "Bearer sk-secret"}},
    }

    # When standard_logging_object is available, its model_parameters should be used
    sanitized = standard_logging_object.get(
        "model_parameters", optional_params_with_secrets
    )
    assert "api_key" not in sanitized
    assert "secret_fields" not in sanitized
    assert sanitized["temperature"] == 0.5

    # When standard_logging_object is None, ModelParamHelper should filter
    from litellm.litellm_core_utils.model_param_helper import ModelParamHelper

    fallback_sanitized = ModelParamHelper.get_standard_logging_model_parameters(
        optional_params_with_secrets
    )
    assert "api_key" not in fallback_sanitized
    assert "secret_fields" not in fallback_sanitized
    assert fallback_sanitized["temperature"] == 0.5