litellm/tests/logging_callback_tests/test_langfuse_unit_tests.py
Mateo Wang 2c733c00f5 chore(ci): modernize model references in tests and configs (#27856)
* test: modernize models used in CircleCI e2e test suites

Replaces obsolete models (gpt-4o, gpt-4o-mini, gpt-3.5-turbo,
claude-3-5-sonnet-20240620, claude-sonnet-4-20250514) with current
equivalents across the e2e_openai_endpoints and
proxy_e2e_anthropic_messages_tests CircleCI jobs.

- gpt-4o -> gpt-5.5 (responses API e2e tests)
- gpt-4o-mini -> gpt-5-mini (websocket responses, oai_misc_config)
- gpt-4o-mini-2024-07-18 -> gpt-4.1-mini-2025-04-14 (fine-tuning,
  still actively fine-tunable)
- gpt-4 / gpt-3.5-turbo target_model_names example -> gpt-5.5 /
  gpt-5-mini
- bedrock claude-3-5-sonnet-20240620 batch entry -> haiku-4-5-20251001
  (also aligning oai_misc_config model_name with what
  test_bedrock_batches_api.py actually requests)
- bedrock claude-sonnet-4-20250514 (deprecated, retires 2026-06-15)
  -> claude-sonnet-4-5-20250929

* test: point bedrock-claude-sonnet-4 alias at Sonnet 4.6, not 4.5

Greptile/Cursor flagged that after the previous commit, the
bedrock-claude-sonnet-4 alias collided with bedrock-claude-sonnet-4.5
(both pointed to claude-sonnet-4-5-20250929). Rename to
bedrock-claude-sonnet-4.6 and point it at the Sonnet 4.6 Bedrock ID
(us.anthropic.claude-sonnet-4-6, already in the litellm model
registry) so the alias name matches the underlying model version.

* test: modernize models across remaining CI-mounted configs & tests

Expands the modernization sweep to all CircleCI-mounted proxy configs
and to test directories where the model literal is a fixture/route key
(not the test's subject).

Config changes:
- proxy_server_config.yaml: bump gpt-3.5-turbo / gpt-3.5-turbo-1106 /
  gpt-4o / gemini-1.5-flash / dall-e-3 underlying models; rename
  gpt-3.5-turbo-end-user-test alias to gpt-5-mini-end-user-test; bump
  text-embedding-ada-002 underlying to text-embedding-3-small. User-
  facing aliases (gpt-3.5-turbo, gpt-4, text-embedding-ada-002, etc.)
  preserved for backward compatibility with tests.
- simple_config.yaml, otel_test_config.yaml, spend_tracking_config.yaml:
  bump gpt-3.5-turbo underlying to gpt-5-mini.
- pass_through_config.yaml: claude-3-5-sonnet / claude-3-7-sonnet /
  claude-3-haiku entries replaced with claude-sonnet-4-5 / claude-
  haiku-4-5 / claude-opus-4-7.
- oai_misc_config.yaml: align alias name with the gpt-5-mini rename.

Test changes (proactive: claude-sonnet-4-20250514 / claude-opus-4-
20250514 retire 2026-06-15):
- tests/llm_translation/test_anthropic_completion.py: bump 3 references
  + paired Vertex AI ID to claude-sonnet-4-5.
- tests/llm_translation/test_optional_params.py: bump 2 references.
- tests/pass_through_unit_tests/test_anthropic_messages_passthrough.py
  and test_bedrock_anthropic_messages_test.py: bump router fixtures
  using the deprecated model IDs.
- tests/pass_through_unit_tests/base_anthropic_messages_tool_search_test.py:
  modernize docstring examples.
- tests/test_end_users.py: update references to renamed alias.

* test: modernize placeholder model literals in router_unit_tests

Mass replace_all on fixture/placeholder model literals across the
router_unit_tests/ suite (model name is a routing key / label, not the
test subject). Sub-agent sweep so far — additional commits will follow
for logging_callback_tests/, enterprise/, top-level tests/test_*.py,
and other CI-mounted dirs.

Mappings applied:
- gpt-3.5-turbo -> gpt-5-mini
- gpt-4 (bare) -> gpt-5.5
- gpt-4o (bare) -> gpt-5
- text-embedding-ada-002 -> text-embedding-3-small
- claude-3-sonnet-20240229 / claude-3-opus-20240229 /
  claude-3-haiku-20240307 / claude-3-5-sonnet-20240620 ->
  claude-sonnet-4-5-20250929 / claude-opus-4-7 /
  claude-haiku-4-5-20251001 as appropriate

Explicitly preserved:
- gpt-4o-mini-* variants (transcribe, tts, etc.) where they're current
- gpt-4-turbo / gpt-4-vision-preview / gpt-4-0613 (subject literals)
- JSONL batch body literals
- Mock LLM response model fields (must match upstream)
- Fake/mock identifiers

* test: modernize placeholder model literals across remaining CI suites

Sub-agent sweep across logging_callback_tests/, guardrails_tests/,
enterprise/, pass_through_unit_tests/, otel_tests/,
llm_responses_api_testing/, batches_tests/, spend_tracking_tests/,
litellm_utils_tests/, unified_google_tests/, and a few top-level
tests/test_*.py files where the model literal is a fixture or
placeholder (router model_list, mock standard logging payload, mock
callback data) rather than the test's subject.

Mappings applied (see scope notes below):
- gpt-3.5-turbo -> gpt-5-mini
- gpt-4 (bare) -> gpt-5.5
- gpt-4o (bare) -> gpt-5.5 (corrected from initial gpt-5 — bare gpt-5
  is not a valid OpenAI alias; only gpt-5.5 / gpt-5.4 / gpt-5.2-codex
  / gpt-5-mini exist)
- gpt-4o-mini (bare) -> gpt-5-mini
- text-embedding-ada-002 -> text-embedding-3-small
- claude-3-sonnet-20240229 -> claude-sonnet-4-5-20250929
- claude-3-opus-20240229 -> claude-opus-4-7
- claude-3-haiku-20240307 -> claude-haiku-4-5-20251001
- claude-3-5-sonnet-20240620/20241022 -> claude-sonnet-4-5-20250929
- claude-3-7-sonnet-20250219 -> claude-sonnet-4-6
- gemini-1.5-flash -> gemini-2.5-flash
- gemini-1.5-pro -> gemini-2.5-pro

Explicitly preserved (not modernized):
- llm_translation/ tests where model is the SUBJECT (provider-specific
  translation/transformation logic). Only the deprecated 20250514
  references were already bumped in a prior commit.
- Cost-calc / tokenizer subject tests in test_utils.py (skip-ranges
  documented by the sub-agent).
- Bedrock model IDs in test_health_check.py path-stripping tests.
- JSONL batch request bodies and mock LLM response bodies (must match
  upstream literal).
- Langfuse expected-request-body JSON fixtures (cost values are exact-
  match-asserted; changing the model would shift response_cost).
- gpt-3.5-turbo-instruct (text-completion endpoint; no modern OpenAI
  equivalent).
- Top-level tests calling the proxy through user-facing aliases
  (gpt-3.5-turbo, gpt-4, text-embedding-ada-002, dall-e-3) — aliases
  in proxy_server_config.yaml stay; only the underlying model was
  bumped.
- tests/test_gpt5_azure_temperature_support.py (the test's whole point
  is model-name handling).
- Fake / mock / openai/fake identifiers.

Notable side fixes:
- test_spend_accuracy_tests.py: UPSTREAM_MODEL now matches what
  spend_tracking_config.yaml's proxy actually routes to (gpt-5-mini),
  resolving a latent inconsistency.
- proxy_server_config.yaml: bare `gpt-5` alias renamed to `gpt-5.5`
  (bare gpt-5 is not a valid OpenAI alias).
- test_batches_logging_unit_tests.py: explicit_models list entries
  kept distinct (gpt-5-mini + gpt-5.5) after bulk rename.
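
A minimal sketch of how a mapping like the one above can be applied to whole
quoted literals only, so that variants such as gpt-3.5-turbo-instruct stay
untouched. The helper name `modernize_literals` and the regex are hypothetical
and are not the tooling used for this sweep; only the mapping entries come
from this message.

```python
import re

# Bare-name mappings copied from the commit message above.
MODEL_RENAMES = {
    "gpt-3.5-turbo": "gpt-5-mini",
    "gpt-4": "gpt-5.5",
    "gpt-4o": "gpt-5.5",
    "gpt-4o-mini": "gpt-5-mini",
    "text-embedding-ada-002": "text-embedding-3-small",
    "gemini-1.5-flash": "gemini-2.5-flash",
    "gemini-1.5-pro": "gemini-2.5-pro",
}

# A quoted literal made only of model-name characters. Provider-prefixed
# names such as "azure/gpt-4" contain "/" and therefore never match.
_LITERAL = re.compile(r"""(["'])([A-Za-z0-9.\-]+)\1""")


def modernize_literals(source: str) -> str:
    """Rename a literal only when the whole quoted string is a mapped name."""

    def swap(match: re.Match) -> str:
        quote, name = match.group(1), match.group(2)
        return f"{quote}{MODEL_RENAMES.get(name, name)}{quote}"

    return _LITERAL.sub(swap, source)
```

Matching the full literal rather than a substring is what keeps suffixed
variants and subject literals out of the rename.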

* test: fix CI failures from model modernization sweep

CI surfaced 5 categories of regression from the bulk modernization:

1. Azure deployment names are customer-specific. Reverted:
   - tests/litellm_utils_tests/test_health_check.py: azure/text-
     embedding-3-small -> azure/text-embedding-ada-002 (the CI Azure
     account does not have a text-embedding-3-small deployment).
   - tests/logging_callback_tests/test_custom_callback_router.py:
     same revert for two router fixtures driving aembedding.

2. gpt-5 family does not accept temperature != 1. Tests that pass a
   custom temperature swapped from gpt-5-mini to gpt-4.1-mini (modern
   non-reasoning OpenAI mini that still accepts temperature/logprobs):
   - tests/logging_callback_tests/test_datadog.py
   - tests/logging_callback_tests/test_langsmith_unit_test.py
   - tests/logging_callback_tests/test_otel_logging.py

3. proxy_server_config.yaml's gpt-3.5-turbo-large alias was routing to
   gpt-5.5 (a reasoning model that rejects logprobs). The proxy test
   tests/test_openai_endpoints.py::test_chat_completion_streaming
   exercises logprobs/top_logprobs through that alias. Bumped the
   underlying model to gpt-4.1 (non-reasoning, still modern).

4. tests/logging_callback_tests/test_gcs_pub_sub.py asserts against a
   pinned JSON fixture (gcs_pub_sub_body/spend_logs_payload.json) with
   hardcoded model="gpt-4o" and a model-specific spend value. Reverted
   the litellm.acompletion calls in the test to model="gpt-4o" so the
   fixture's exact-match assertions still hold.

5. tests/pass_through_unit_tests/test_anthropic_messages_passthrough.py:
   anthropic.messages.create routing to openai/gpt-5-mini returned an
   empty content[0] with max_tokens=100 (reasoning-token consumption).
   Swapped to openai/gpt-4.1-mini.
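
The rule behind items 2, 3 and 5 can be sketched as a small helper, assuming
only what this message states: the gpt-5 family rejects temperature != 1 and
logprobs, while gpt-4.1-mini still accepts them. `pick_fixture_model` is a
hypothetical name for illustration and does not exist in the repository.

```python
def pick_fixture_model(params: dict) -> str:
    """Pick a placeholder model that accepts the given request params."""
    # Parameters that, per the notes above, reasoning models reject.
    needs_non_reasoning = (
        params.get("temperature", 1) != 1
        or "logprobs" in params
        or "top_logprobs" in params
    )
    return "gpt-4.1-mini" if needs_non_reasoning else "gpt-5-mini"
```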

* test: fix Assistants API model + 2 cursor[bot] review nits

1. pass_through_unit_tests/test_custom_logger_passthrough.py: gpt-5.5
   isn't accepted by the /v1/assistants endpoint
   ("unsupported_model"). Switch to gpt-4.1-mini (modern, Assistants-
   API-supported, non-reasoning).

2. example_config_yaml/pass_through_config.yaml: the previous sweep
   bumped the claude-3-7-sonnet alias to claude-opus-4-7, which is a
   tier change (Sonnet -> Opus). Map to claude-sonnet-4-6 to keep the
   Sonnet tier intact. (Cursor bugbot review.)

3. example_config_yaml/simple_config.yaml: model_name was left as
   gpt-3.5-turbo while the underlying was bumped to gpt-5-mini, which
   muddles the "simple" example. Make both sides gpt-5-mini so the
   most basic example is a straight 1:1 mapping again. (Cursor bugbot
   review.)

* fix: revert gpt-4/gpt-3.5-turbo alias underlying to non-reasoning models

tests/test_openai_endpoints.py::test_completion calls the proxy alias
"gpt-4" with temperature=0, and other tests call gpt-3.5-turbo with
custom temperature / logprobs / the legacy /v1/completions endpoint.
The earlier modernization mapped both aliases to gpt-5.5 / gpt-5-mini,
which are reasoning models that reject temperature != 1 and don't
expose /v1/completions. Map the aliases to gpt-4.1 / gpt-4.1-mini
(modern non-reasoning OpenAI models) instead — keeps user-facing
aliases preserved while picking a current underlying that still
supports the parameters/endpoints the tests exercise.
2026-05-15 15:44:28 -07:00


import os
import sys

sys.path.insert(
    0, os.path.abspath("../..")
)  # Adds the parent directory to the system-path

import pytest
from litellm.integrations.langfuse.langfuse import (
    LangFuseLogger,
)
from litellm.integrations.langfuse.langfuse_handler import LangFuseHandler
from litellm.litellm_core_utils.litellm_logging import DynamicLoggingCache
from unittest.mock import Mock, patch
from litellm.types.utils import (
    StandardLoggingPayload,
    StandardLoggingModelInformation,
    StandardLoggingMetadata,
    StandardLoggingHiddenParams,
    StandardCallbackDynamicParams,
    ModelResponse,
    Choices,
    Message,
    TextCompletionResponse,
    TextChoices,
)


def create_standard_logging_payload() -> StandardLoggingPayload:
    return StandardLoggingPayload(
        id="test_id",
        call_type="completion",
        response_cost=0.1,
        response_cost_failure_debug_info=None,
        status="success",
        total_tokens=30,
        prompt_tokens=20,
        completion_tokens=10,
        startTime=1234567890.0,
        endTime=1234567891.0,
        completionStartTime=1234567890.5,
        model_map_information=StandardLoggingModelInformation(
            model_map_key="gpt-5-mini", model_map_value=None
        ),
        model="gpt-5-mini",
        model_id="model-123",
        model_group="openai-gpt",
        api_base="https://api.openai.com",
        metadata=StandardLoggingMetadata(
            user_api_key_hash="test_hash",
            user_api_key_org_id=None,
            user_api_key_alias="test_alias",
            user_api_key_team_id="test_team",
            user_api_key_user_id="test_user",
            user_api_key_team_alias="test_team_alias",
            spend_logs_metadata=None,
            requester_ip_address="127.0.0.1",
            requester_metadata=None,
        ),
        cache_hit=False,
        cache_key=None,
        saved_cache_cost=0.0,
        request_tags=[],
        end_user=None,
        requester_ip_address="127.0.0.1",
        messages=[{"role": "user", "content": "Hello, world!"}],
        response={"choices": [{"message": {"content": "Hi there!"}}]},
        error_str=None,
        model_parameters={"stream": True},
        hidden_params=StandardLoggingHiddenParams(
            model_id="model-123",
            cache_key=None,
            api_base="https://api.openai.com",
            response_cost="0.1",
            additional_headers=None,
        ),
    )


@pytest.fixture
def dynamic_logging_cache():
    return DynamicLoggingCache()


global_langfuse_logger = LangFuseLogger(
    langfuse_public_key="global_public_key",
    langfuse_secret="global_secret",
    langfuse_host="https://global.langfuse.com",
)

# IMPORTANT: Test that passing both langfuse_secret_key and langfuse_secret works
standard_params_1 = StandardCallbackDynamicParams(
    langfuse_public_key="test_public_key",
    langfuse_secret="test_secret",
    langfuse_host="https://test.langfuse.com",
)
standard_params_2 = StandardCallbackDynamicParams(
    langfuse_public_key="test_public_key",
    langfuse_secret_key="test_secret",
    langfuse_host="https://test.langfuse.com",
)


@pytest.mark.parametrize("globalLangfuseLogger", [None, global_langfuse_logger])
@pytest.mark.parametrize("standard_params", [standard_params_1, standard_params_2])
def test_get_langfuse_logger_for_request_with_dynamic_params(
    dynamic_logging_cache, globalLangfuseLogger, standard_params
):
    """
    If StandardCallbackDynamicParams contain langfuse credentials the returned Langfuse logger should use the dynamic params
    the new Langfuse logger should be cached
    Even if globalLangfuseLogger is provided, it should use dynamic params if they are passed
    """
    result = LangFuseHandler.get_langfuse_logger_for_request(
        standard_callback_dynamic_params=standard_params,
        in_memory_dynamic_logger_cache=dynamic_logging_cache,
        globalLangfuseLogger=globalLangfuseLogger,
    )
    assert isinstance(result, LangFuseLogger)
    assert result.public_key == "test_public_key"
    assert result.secret_key == "test_secret"
    assert result.langfuse_host == "https://test.langfuse.com"

    # Check if the logger is cached
    cached_logger = dynamic_logging_cache.get_cache(
        credentials={
            "langfuse_public_key": "test_public_key",
            "langfuse_secret": "test_secret",
            "langfuse_host": "https://test.langfuse.com",
        },
        service_name="langfuse",
    )
    assert cached_logger is result


@pytest.mark.parametrize("globalLangfuseLogger", [None, global_langfuse_logger])
def test_get_langfuse_logger_for_request_with_no_dynamic_params(
    dynamic_logging_cache, globalLangfuseLogger
):
    """
    If StandardCallbackDynamicParams are not provided, the globalLangfuseLogger should be returned
    """
    result = LangFuseHandler.get_langfuse_logger_for_request(
        standard_callback_dynamic_params=StandardCallbackDynamicParams(),
        in_memory_dynamic_logger_cache=dynamic_logging_cache,
        globalLangfuseLogger=globalLangfuseLogger,
    )
    assert result is not None
    assert isinstance(result, LangFuseLogger)
    if globalLangfuseLogger is not None:
        assert result.public_key == "global_public_key"
        assert result.secret_key == "global_secret"
        assert result.langfuse_host == "https://global.langfuse.com"


def test_dynamic_langfuse_credentials_are_passed():
    # Test when credentials are passed
    params_with_credentials = StandardCallbackDynamicParams(
        langfuse_public_key="test_key",
        langfuse_secret="test_secret",
        langfuse_host="https://test.langfuse.com",
    )
    assert (
        LangFuseHandler._dynamic_langfuse_credentials_are_passed(
            params_with_credentials
        )
        is True
    )

    # Test when no credentials are passed
    params_without_credentials = StandardCallbackDynamicParams()
    assert (
        LangFuseHandler._dynamic_langfuse_credentials_are_passed(
            params_without_credentials
        )
        is False
    )

    # Test when only some credentials are passed
    params_partial_credentials = StandardCallbackDynamicParams(
        langfuse_public_key="test_key"
    )
    assert (
        LangFuseHandler._dynamic_langfuse_credentials_are_passed(
            params_partial_credentials
        )
        is True
    )


def test_get_dynamic_langfuse_logging_config():
    # Test with dynamic params
    dynamic_params = StandardCallbackDynamicParams(
        langfuse_public_key="dynamic_key",
        langfuse_secret="dynamic_secret",
        langfuse_host="https://dynamic.langfuse.com",
    )
    config = LangFuseHandler.get_dynamic_langfuse_logging_config(dynamic_params)
    assert config["langfuse_public_key"] == "dynamic_key"
    assert config["langfuse_secret"] == "dynamic_secret"
    assert config["langfuse_host"] == "https://dynamic.langfuse.com"

    # Test with no dynamic params
    empty_params = StandardCallbackDynamicParams()
    config = LangFuseHandler.get_dynamic_langfuse_logging_config(empty_params)
    assert config["langfuse_public_key"] is None
    assert config["langfuse_secret"] is None
    assert config["langfuse_host"] is None


def test_return_global_langfuse_logger():
    mock_cache = Mock()
    global_logger = LangFuseLogger(
        langfuse_public_key="global_key", langfuse_secret="global_secret"
    )

    # Test with existing global logger
    result = LangFuseHandler._return_global_langfuse_logger(global_logger, mock_cache)
    assert result == global_logger

    # Test without global logger, but with cached logger, should return cached logger
    mock_cache.get_cache.return_value = global_logger
    result = LangFuseHandler._return_global_langfuse_logger(None, mock_cache)
    assert result == global_logger

    # Test without global logger and without cached logger, should create new logger
    mock_cache.get_cache.return_value = None
    with patch.object(
        LangFuseHandler,
        "_create_langfuse_logger_from_credentials",
        return_value=global_logger,
    ):
        result = LangFuseHandler._return_global_langfuse_logger(None, mock_cache)
        assert result == global_logger


def test_get_langfuse_logger_for_request_with_cached_logger():
    """
    Test that get_langfuse_logger_for_request returns the cached logger if it exists when dynamic params are passed
    """
    mock_cache = Mock()
    cached_logger = LangFuseLogger(
        langfuse_public_key="cached_key", langfuse_secret="cached_secret"
    )
    mock_cache.get_cache.return_value = cached_logger

    dynamic_params = StandardCallbackDynamicParams(
        langfuse_public_key="test_key",
        langfuse_secret="test_secret",
        langfuse_host="https://test.langfuse.com",
    )

    result = LangFuseHandler.get_langfuse_logger_for_request(
        standard_callback_dynamic_params=dynamic_params,
        in_memory_dynamic_logger_cache=mock_cache,
        globalLangfuseLogger=None,
    )

    assert result == cached_logger
    mock_cache.get_cache.assert_called_once()


def test_get_langfuse_tags():
    """
    Test that _get_langfuse_tags correctly extracts tags from the standard logging payload
    """
    # Create a mock logging payload with tags
    mock_payload = create_standard_logging_payload()
    mock_payload["request_tags"] = ["tag1", "tag2", "test_tag"]

    # Test with payload containing tags
    result = global_langfuse_logger._get_langfuse_tags(mock_payload)
    assert result == ["tag1", "tag2", "test_tag"]

    # Test with payload without tags
    mock_payload["request_tags"] = None
    result = global_langfuse_logger._get_langfuse_tags(mock_payload)
    assert result == []

    # Test with empty tags list
    mock_payload["request_tags"] = []
    result = global_langfuse_logger._get_langfuse_tags(mock_payload)
    assert result == []


@patch.dict(os.environ, {}, clear=True)  # Start with empty environment
def test_get_langfuse_flush_interval():
    """
    Test that _get_langfuse_flush_interval correctly reads from environment variable
    or falls back to the provided flush_interval
    """
    default_interval = 60

    # Test when env var is not set
    result = LangFuseLogger._get_langfuse_flush_interval(
        flush_interval=default_interval
    )
    assert result == default_interval

    # Test when env var is set
    with patch.dict(os.environ, {"LANGFUSE_FLUSH_INTERVAL": "120"}):
        result = LangFuseLogger._get_langfuse_flush_interval(
            flush_interval=default_interval
        )
        assert result == 120


def test_langfuse_e2e_sync(monkeypatch):
    from litellm import completion
    import litellm
    import respx
    import httpx
    import time

    litellm.disable_aiohttp_transport = (
        True  # since this uses respx, we need to set use_aiohttp_transport to False
    )
    litellm._turn_on_debug()
    monkeypatch.setattr(litellm, "success_callback", ["langfuse"])

    with respx.mock:
        # Mock Langfuse
        # Mock any Langfuse endpoint
        langfuse_mock = respx.post(
            "https://*.cloud.langfuse.com/api/public/ingestion"
        ).mock(return_value=httpx.Response(200))
        completion(
            model="openai/my-fake-endpoint",
            messages=[{"role": "user", "content": "hello from litellm"}],
            stream=False,
            mock_response="Hello from litellm 2",
        )

        time.sleep(3)

        assert langfuse_mock.called


def test_get_chat_content_for_langfuse():
    """
    Test that _get_chat_content_for_langfuse correctly extracts content from chat completion responses
    """
    # Test with valid response
    mock_response = ModelResponse(
        choices=[Choices(message=Message(role="assistant", content="Hello world"))]
    )

    result = LangFuseLogger._get_chat_content_for_langfuse(mock_response)
    assert result["content"] == "Hello world"
    assert result["role"] == "assistant"

    # Test with empty choices
    mock_response = ModelResponse(choices=[])
    result = LangFuseLogger._get_chat_content_for_langfuse(mock_response)
    assert result is None


def test_get_text_completion_content_for_langfuse():
    """
    Test that _get_text_completion_content_for_langfuse correctly extracts content from text completion responses
    """
    # Test with valid response
    mock_response = TextCompletionResponse(choices=[TextChoices(text="Hello world")])
    result = LangFuseLogger._get_text_completion_content_for_langfuse(mock_response)
    assert result == "Hello world"

    # Test with empty choices
    mock_response = TextCompletionResponse(choices=[])
    result = LangFuseLogger._get_text_completion_content_for_langfuse(mock_response)
    assert result is None

    # Test with no choices field
    mock_response = TextCompletionResponse()
    result = LangFuseLogger._get_text_completion_content_for_langfuse(mock_response)
    assert result is None


def test_apply_masking_function_with_string():
    """
    Test that _apply_masking_function correctly applies masking to strings
    """
    import re

    def mask_credit_cards(data):
        if isinstance(data, str):
            return re.sub(r"\b\d{4}[\s-]?\d{4}[\s-]?\d{4}[\s-]?\d{4}\b", "[CARD]", data)
        return data

    # Test with string containing credit card
    input_str = "My card is 4532-1234-5678-9012"
    result = LangFuseLogger._apply_masking_function(input_str, mask_credit_cards)
    assert result == "My card is [CARD]"
    assert "4532" not in result

    # Test with string without sensitive data
    input_str = "Hello world"
    result = LangFuseLogger._apply_masking_function(input_str, mask_credit_cards)
    assert result == "Hello world"


def test_apply_masking_function_with_dict():
    """
    Test that _apply_masking_function correctly applies masking to nested dicts
    """
    import re

    def mask_emails(data):
        if isinstance(data, str):
            return re.sub(r"[\w\.-]+@[\w\.-]+", "[EMAIL]", data)
        return data

    # Test with dict containing messages
    input_dict = {
        "messages": [{"role": "user", "content": "My email is test@example.com"}]
    }
    result = LangFuseLogger._apply_masking_function(input_dict, mask_emails)
    assert result["messages"][0]["content"] == "My email is [EMAIL]"
    assert "test@example.com" not in str(result)


def test_apply_masking_function_with_none():
    """
    Test that _apply_masking_function handles None correctly
    """

    def dummy_mask(data):
        return data

    result = LangFuseLogger._apply_masking_function(None, dummy_mask)
    assert result is None


def test_apply_masking_function_with_list():
    """
    Test that _apply_masking_function correctly applies masking to lists
    """
    import re

    def mask_ssn(data):
        if isinstance(data, str):
            return re.sub(r"\b\d{3}-\d{2}-\d{4}\b", "[SSN]", data)
        return data

    input_list = ["SSN: 123-45-6789", "No sensitive data here"]
    result = LangFuseLogger._apply_masking_function(input_list, mask_ssn)
    assert result[0] == "SSN: [SSN]"
    assert result[1] == "No sensitive data here"


def test_masking_function_isolated_from_other_loggers():
    """
    Test that langfuse_masking_function is extracted from metadata and stored separately.
    This ensures the callable doesn't leak to other logging integrations.
    """
    from litellm.litellm_core_utils.litellm_logging import (
        scrub_sensitive_keys_in_metadata,
    )

    def my_masking_fn(data):
        return data

    # Simulate litellm_params with masking function in metadata
    litellm_params = {
        "metadata": {
            "langfuse_masking_function": my_masking_fn,
            "other_key": "other_value",
        }
    }

    # Scrub should extract the function
    result = scrub_sensitive_keys_in_metadata(litellm_params)

    # Function should be removed from metadata (won't leak to other loggers)
    assert "langfuse_masking_function" not in result["metadata"]

    # Function should be stored in dedicated key for Langfuse to access
    assert result.get("_langfuse_masking_function") == my_masking_fn

    # Other metadata should remain intact
    assert result["metadata"]["other_key"] == "other_value"


def test_masking_function_not_in_metadata_when_not_provided():
    """
    Test that scrub_sensitive_keys_in_metadata works normally when no masking function is provided.
    """
    from litellm.litellm_core_utils.litellm_logging import (
        scrub_sensitive_keys_in_metadata,
    )

    litellm_params = {
        "metadata": {
            "some_key": "some_value",
        }
    }

    result = scrub_sensitive_keys_in_metadata(litellm_params)

    # No _langfuse_masking_function should be added
    assert "_langfuse_masking_function" not in result

    # Original metadata should be unchanged
    assert result["metadata"]["some_key"] == "some_value"


def test_langfuse_model_parameters_no_secret_leakage():
    """
    Test that sensitive keys in optional_params (api_key, secret_fields,
    authorization headers, etc.) are NOT passed to Langfuse as modelParameters.
    Only whitelisted model parameters (temperature, top_p, etc.) should survive.
    """
    from litellm.litellm_core_utils.model_param_helper import ModelParamHelper

    optional_params_with_secrets = {
        # Safe params that should be kept
        "temperature": 0.7,
        "top_p": 0.9,
        "max_tokens": 100,
        "stream": True,
        # Sensitive params that must NOT leak
        "api_key": "sk-secret-key-12345",
        "api_base": "https://my-private-endpoint.com",
        "secret_fields": {"raw_headers": {"Authorization": "Bearer sk-super-secret"}},
        "authorization": "Bearer sk-another-secret",
        "headers": {"X-Api-Key": "secret-header-value"},
    }

    sanitized = ModelParamHelper.get_standard_logging_model_parameters(
        optional_params_with_secrets
    )

    # Safe params should be present
    assert sanitized["temperature"] == 0.7
    assert sanitized["top_p"] == 0.9
    assert sanitized["max_tokens"] == 100
    assert sanitized["stream"] is True

    # Sensitive params must be excluded
    assert "api_key" not in sanitized
    assert "api_base" not in sanitized
    assert "secret_fields" not in sanitized
    assert "authorization" not in sanitized
    assert "headers" not in sanitized


def test_langfuse_v2_uses_standard_logging_model_parameters():
    """
    Test that _log_langfuse_v2 uses sanitized model_parameters from
    standard_logging_object instead of raw optional_params, preventing
    secret leakage to Langfuse traces.
    """
    standard_logging_object = create_standard_logging_payload()
    # Simulate standard_logging_object having safe model_parameters
    standard_logging_object["model_parameters"] = {"temperature": 0.5, "stream": True}

    # optional_params has secrets — these should NOT be used
    optional_params_with_secrets = {
        "temperature": 0.5,
        "api_key": "sk-secret-key-12345",
        "secret_fields": {"raw_headers": {"Authorization": "Bearer sk-secret"}},
    }

    # When standard_logging_object is available, its model_parameters should be used
    sanitized = standard_logging_object.get(
        "model_parameters", optional_params_with_secrets
    )
    assert "api_key" not in sanitized
    assert "secret_fields" not in sanitized
    assert sanitized["temperature"] == 0.5

    # When standard_logging_object is None, ModelParamHelper should filter
    from litellm.litellm_core_utils.model_param_helper import ModelParamHelper

    fallback_sanitized = ModelParamHelper.get_standard_logging_model_parameters(
        optional_params_with_secrets
    )
    assert "api_key" not in fallback_sanitized
    assert "secret_fields" not in fallback_sanitized
    assert fallback_sanitized["temperature"] == 0.5
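

The masking tests above rely on a user-supplied callable being applied to
strings nested inside dicts and lists, with None passed through. A minimal,
self-contained sketch of that kind of recursive walk follows. `apply_mask` is
a hypothetical helper written for illustration; it is not the implementation
of `LangFuseLogger._apply_masking_function`, whose internals are not shown in
this file.

```python
import re
from typing import Any, Callable


def apply_mask(data: Any, mask: Callable[[Any], Any]) -> Any:
    """Apply `mask` to every leaf value, recursing through dicts and lists."""
    if data is None:
        return None
    if isinstance(data, dict):
        return {key: apply_mask(value, mask) for key, value in data.items()}
    if isinstance(data, list):
        return [apply_mask(item, mask) for item in data]
    return mask(data)


def mask_emails(data: Any) -> Any:
    # Same pattern as the mask_emails helper in the dict test above.
    if isinstance(data, str):
        return re.sub(r"[\w\.-]+@[\w\.-]+", "[EMAIL]", data)
    return data


masked = apply_mask(
    {"messages": [{"role": "user", "content": "My email is test@example.com"}]},
    mask_emails,
)
```

The sketch builds new containers instead of mutating the input, so the
original payload is left unchanged for any other consumer.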