litellm/tests/logging_callback_tests/test_alerting.py
Mateo Wang 2c733c00f5 chore(ci): modernize model references in tests and configs (#27856)
* test: modernize models used in CircleCI e2e test suites

Replaces obsolete models (gpt-4o, gpt-4o-mini, gpt-3.5-turbo,
claude-3-5-sonnet-20240620, claude-sonnet-4-20250514) with current
equivalents across the e2e_openai_endpoints and
proxy_e2e_anthropic_messages_tests CircleCI jobs.

- gpt-4o -> gpt-5.5 (responses API e2e tests)
- gpt-4o-mini -> gpt-5-mini (websocket responses, oai_misc_config)
- gpt-4o-mini-2024-07-18 -> gpt-4.1-mini-2025-04-14 (fine-tuning,
  still actively fine-tunable)
- gpt-4 / gpt-3.5-turbo target_model_names example -> gpt-5.5 /
  gpt-5-mini
- bedrock claude-3-5-sonnet-20240620 batch entry -> haiku-4-5-20251001
  (also aligning oai_misc_config model_name with what
  test_bedrock_batches_api.py actually requests)
- bedrock claude-sonnet-4-20250514 (deprecated, retires 2026-06-15)
  -> claude-sonnet-4-5-20250929

* test: point bedrock-claude-sonnet-4 alias at Sonnet 4.6, not 4.5

Greptile/Cursor flagged that after the previous commit, the
bedrock-claude-sonnet-4 alias collided with bedrock-claude-sonnet-4.5
(both pointed to claude-sonnet-4-5-20250929). Rename to
bedrock-claude-sonnet-4.6 and point it at the Sonnet 4.6 Bedrock ID
(us.anthropic.claude-sonnet-4-6, already in the litellm model
registry) so the alias name matches the underlying model version.

* test: modernize models across remaining CI-mounted configs & tests

Expands the modernization sweep to all CircleCI-mounted proxy configs
and to test directories where the model literal is a fixture/route key
(not the test's subject).

Config changes:
- proxy_server_config.yaml: bump gpt-3.5-turbo / gpt-3.5-turbo-1106 /
  gpt-4o / gemini-1.5-flash / dall-e-3 underlying models; rename
  gpt-3.5-turbo-end-user-test alias to gpt-5-mini-end-user-test; bump
  text-embedding-ada-002 underlying to text-embedding-3-small. User-
  facing aliases (gpt-3.5-turbo, gpt-4, text-embedding-ada-002, etc.)
  preserved for backward compatibility with tests.
- simple_config.yaml, otel_test_config.yaml, spend_tracking_config.yaml:
  bump gpt-3.5-turbo underlying to gpt-5-mini.
- pass_through_config.yaml: claude-3-5-sonnet / claude-3-7-sonnet /
  claude-3-haiku entries replaced with claude-sonnet-4-5 / claude-
  haiku-4-5 / claude-opus-4-7.
- oai_misc_config.yaml: align alias name with the gpt-5-mini rename.

Test changes (proactive: claude-sonnet-4-20250514 / claude-opus-4-
20250514 retire 2026-06-15):
- tests/llm_translation/test_anthropic_completion.py: bump 3 references
  + paired Vertex AI ID to claude-sonnet-4-5.
- tests/llm_translation/test_optional_params.py: bump 2 references.
- tests/pass_through_unit_tests/test_anthropic_messages_passthrough.py
  and test_bedrock_anthropic_messages_test.py: bump router fixtures
  using the deprecated model IDs.
- tests/pass_through_unit_tests/base_anthropic_messages_tool_search_test.py:
  modernize docstring examples.
- tests/test_end_users.py: update references to renamed alias.

* test: modernize placeholder model literals in router_unit_tests

Mass replace_all on fixture/placeholder model literals across the
router_unit_tests/ suite (model name is a routing key / label, not the
test subject). Sub-agent sweep so far — additional commits will follow
for logging_callback_tests/, enterprise/, top-level tests/test_*.py,
and other CI-mounted dirs.

Mappings applied:
- gpt-3.5-turbo -> gpt-5-mini
- gpt-4 (bare) -> gpt-5.5
- gpt-4o (bare) -> gpt-5
- text-embedding-ada-002 -> text-embedding-3-small
- claude-3-sonnet-20240229 / claude-3-opus-20240229 /
  claude-3-haiku-20240307 / claude-3-5-sonnet-20240620 ->
  claude-sonnet-4-5-20250929 / claude-opus-4-7 /
  claude-haiku-4-5-20251001 as appropriate

Explicitly preserved:
- gpt-4o-mini-* variants (transcribe, tts, etc.) where they're current
- gpt-4-turbo / gpt-4-vision-preview / gpt-4-0613 (subject literals)
- JSONL batch body literals
- Mock LLM response model fields (must match upstream)
- Fake/mock identifiers

* test: modernize placeholder model literals across remaining CI suites

Sub-agent sweep across logging_callback_tests/, guardrails_tests/,
enterprise/, pass_through_unit_tests/, otel_tests/,
llm_responses_api_testing/, batches_tests/, spend_tracking_tests/,
litellm_utils_tests/, unified_google_tests/, and a few top-level
tests/test_*.py files where the model literal is a fixture or
placeholder (router model_list, mock standard logging payload, mock
callback data) rather than the test's subject.

Mappings applied (see scope notes below):
- gpt-3.5-turbo -> gpt-5-mini
- gpt-4 (bare) -> gpt-5.5
- gpt-4o (bare) -> gpt-5.5 (corrected from initial gpt-5 — bare gpt-5
  is not a valid OpenAI alias; only gpt-5.5 / gpt-5.4 / gpt-5.2-codex
  / gpt-5-mini exist)
- gpt-4o-mini (bare) -> gpt-5-mini
- text-embedding-ada-002 -> text-embedding-3-small
- claude-3-sonnet-20240229 -> claude-sonnet-4-5-20250929
- claude-3-opus-20240229 -> claude-opus-4-7
- claude-3-haiku-20240307 -> claude-haiku-4-5-20251001
- claude-3-5-sonnet-20240620/20241022 -> claude-sonnet-4-5-20250929
- claude-3-7-sonnet-20250219 -> claude-sonnet-4-6
- gemini-1.5-flash -> gemini-2.5-flash
- gemini-1.5-pro -> gemini-2.5-pro

Explicitly preserved (not modernized):
- llm_translation/ tests where model is the SUBJECT (provider-specific
  translation/transformation logic). Only the deprecated 20250514
  references were already bumped in a prior commit.
- Cost-calc / tokenizer subject tests in test_utils.py (skip-ranges
  documented by the sub-agent).
- Bedrock model IDs in test_health_check.py path-stripping tests.
- JSONL batch request bodies and mock LLM response bodies (must match
  upstream literal).
- Langfuse expected-request-body JSON fixtures (cost values are exact-
  match-asserted; changing the model would shift response_cost).
- gpt-3.5-turbo-instruct (text-completion endpoint; no modern OpenAI
  equivalent).
- Top-level tests calling the proxy through user-facing aliases
  (gpt-3.5-turbo, gpt-4, text-embedding-ada-002, dall-e-3) — aliases
  in proxy_server_config.yaml stay; only the underlying model was
  bumped.
- tests/test_gpt5_azure_temperature_support.py (the test's whole point
  is model-name handling).
- Fake / mock / openai/fake identifiers.
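
The mappings and preservation rules above can be sketched as a plain exact-match lookup. This is only an illustration, not the script used for the sweep: `MODEL_MAP` and `modernize` are hypothetical names, and the table simply restates the "Mappings applied" list. Because only bare names are keys, variants such as gpt-4-turbo or gpt-3.5-turbo-instruct fall through unchanged.

```python
# Hypothetical sketch of the "bare name" rename rule described above.
# MODEL_MAP and modernize() are illustrative names, not part of litellm.
MODEL_MAP = {
    "gpt-3.5-turbo": "gpt-5-mini",
    "gpt-4": "gpt-5.5",
    "gpt-4o": "gpt-5.5",
    "gpt-4o-mini": "gpt-5-mini",
    "text-embedding-ada-002": "text-embedding-3-small",
    "claude-3-sonnet-20240229": "claude-sonnet-4-5-20250929",
    "claude-3-opus-20240229": "claude-opus-4-7",
    "claude-3-haiku-20240307": "claude-haiku-4-5-20251001",
    "claude-3-5-sonnet-20240620": "claude-sonnet-4-5-20250929",
    "claude-3-5-sonnet-20241022": "claude-sonnet-4-5-20250929",
    "claude-3-7-sonnet-20250219": "claude-sonnet-4-6",
    "gemini-1.5-flash": "gemini-2.5-flash",
    "gemini-1.5-pro": "gemini-2.5-pro",
}


def modernize(model: str) -> str:
    """Return the replacement for a bare model name, or the name unchanged.

    Only exact matches are rewritten, so suffixed variants and fake/mock
    identifiers are preserved automatically.
    """
    return MODEL_MAP.get(model, model)
```

A lookup like this does not know whether a literal is a fixture or the test's subject; that distinction still has to be made per file, as the preserved list describes.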

Notable side fixes:
- test_spend_accuracy_tests.py: UPSTREAM_MODEL now matches what
  spend_tracking_config.yaml's proxy actually routes to (gpt-5-mini),
  resolving a latent inconsistency.
- proxy_server_config.yaml: bare `gpt-5` alias renamed to `gpt-5.5`
  (bare gpt-5 is not a valid OpenAI alias).
- test_batches_logging_unit_tests.py: explicit_models list entries
  kept distinct (gpt-5-mini + gpt-5.5) after bulk rename.

* test: fix CI failures from model modernization sweep

CI surfaced 5 categories of regression from the bulk modernization:

1. Azure deployment names are customer-specific. Reverted:
   - tests/litellm_utils_tests/test_health_check.py: azure/text-
     embedding-3-small -> azure/text-embedding-ada-002 (the CI Azure
     account does not have a text-embedding-3-small deployment).
   - tests/logging_callback_tests/test_custom_callback_router.py:
     same revert for two router fixtures driving aembedding.

2. gpt-5 family does not accept temperature != 1. Tests that pass a
   custom temperature swapped from gpt-5-mini to gpt-4.1-mini (modern
   non-reasoning OpenAI mini that still accepts temperature/logprobs):
   - tests/logging_callback_tests/test_datadog.py
   - tests/logging_callback_tests/test_langsmith_unit_test.py
   - tests/logging_callback_tests/test_otel_logging.py

3. proxy_server_config.yaml's gpt-3.5-turbo-large alias was routing to
   gpt-5.5 (a reasoning model that rejects logprobs). The proxy test
   tests/test_openai_endpoints.py::test_chat_completion_streaming
   exercises logprobs/top_logprobs through that alias. Bumped the
   underlying model to gpt-4.1 (non-reasoning, still modern).

4. tests/logging_callback_tests/test_gcs_pub_sub.py asserts against a
   pinned JSON fixture (gcs_pub_sub_body/spend_logs_payload.json) with
   hardcoded model="gpt-4o" and a model-specific spend value. Reverted
   the litellm.acompletion calls in the test to model="gpt-4o" so the
   fixture's exact-match assertions still hold.

5. tests/pass_through_unit_tests/test_anthropic_messages_passthrough.py:
   anthropic.messages.create routing to openai/gpt-5-mini returned an
   empty content[0] with max_tokens=100 (reasoning-token consumption).
   Swapped to openai/gpt-4.1-mini.

* test: fix Assistants API model + 2 cursor[bot] review nits

1. pass_through_unit_tests/test_custom_logger_passthrough.py: gpt-5.5
   isn't accepted by the /v1/assistants endpoint
   ("unsupported_model"). Switch to gpt-4.1-mini (modern, Assistants-
   API-supported, non-reasoning).

2. example_config_yaml/pass_through_config.yaml: the previous sweep
   bumped the claude-3-7-sonnet alias to claude-opus-4-7, which is a
   tier change (Sonnet -> Opus). Map to claude-sonnet-4-6 to keep the
   Sonnet tier intact. (Cursor bugbot review.)

3. example_config_yaml/simple_config.yaml: model_name was left as
   gpt-3.5-turbo while the underlying was bumped to gpt-5-mini, which
   muddles the "simple" example. Make both sides gpt-5-mini so the
   most basic example is a straight 1:1 mapping again. (Cursor bugbot
   review.)

* fix: revert gpt-4/gpt-3.5-turbo alias underlying to non-reasoning models

tests/test_openai_endpoints.py::test_completion calls the proxy alias
"gpt-4" with temperature=0, and other tests call gpt-3.5-turbo with
custom temperature / logprobs / the legacy /v1/completions endpoint.
The earlier modernization mapped both aliases to gpt-5.5 / gpt-5-mini,
which are reasoning models that reject temperature != 1 and don't
expose /v1/completions. Map the aliases to gpt-4.1 / gpt-4.1-mini
(modern non-reasoning OpenAI models) instead — keeps user-facing
aliases preserved while picking a current underlying that still
supports the parameters/endpoints the tests exercise.
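
The constraint behind these reverts can be sketched as a small selection rule. This is a hedged illustration only: `REASONING_MODELS`, `LEGACY_PARAMS`, and `pick_underlying` are hypothetical names, and the set of reasoning models is limited to the two named in this commit message.

```python
# Hypothetical sketch: choose a non-reasoning fallback when a request uses
# parameters that the commit message says reasoning models reject.
REASONING_MODELS = {"gpt-5.5", "gpt-5-mini"}  # assumption: only the models named above
LEGACY_PARAMS = {"logprobs", "top_logprobs"}


def pick_underlying(preferred: str, fallback: str, request_params: dict) -> str:
    """Return `fallback` if `preferred` is a reasoning model and the request
    sets temperature != 1 or asks for logprobs; otherwise return `preferred`."""
    custom_temperature = request_params.get("temperature", 1) != 1
    wants_logprobs = bool(LEGACY_PARAMS.intersection(request_params))
    if preferred in REASONING_MODELS and (custom_temperature or wants_logprobs):
        return fallback
    return preferred
```

For example, `pick_underlying("gpt-5-mini", "gpt-4.1-mini", {"temperature": 0})` selects the fallback, which mirrors the alias change described here.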
2026-05-15 15:44:28 -07:00


# What is this?
## Tests slack alerting on proxy logging object
import asyncio
import io
import json
import os
import random
import sys
import time
from litellm._uuid import uuid
from datetime import datetime, timedelta
from typing import Optional
import httpx
from litellm.types.integrations.slack_alerting import AlertType
# import logging
# logging.basicConfig(level=logging.DEBUG)
sys.path.insert(0, os.path.abspath("../.."))
import asyncio
import os
import unittest.mock
from unittest.mock import AsyncMock, MagicMock, patch
import pytest
from openai import APIError
import litellm
from litellm.caching.caching import DualCache, RedisCache
from litellm.integrations.SlackAlerting.slack_alerting import (
    DeploymentMetrics,
    SlackAlerting,
)
from litellm.proxy._types import CallInfo, Litellm_EntityType, WebhookEvent
from litellm.proxy.utils import ProxyLogging
from litellm.router import AlertingConfig, Router
from litellm.utils import get_api_base


@pytest.mark.parametrize(
    "model, optional_params, expected_api_base",
    [
        ("openai/my-fake-model", {"api_base": "my-fake-api-base"}, "my-fake-api-base"),
        ("gpt-5-mini", {}, "https://api.openai.com"),
    ],
)
def test_get_api_base_unit_test(model, optional_params, expected_api_base):
    api_base = get_api_base(model=model, optional_params=optional_params)

    assert api_base == expected_api_base


@pytest.mark.asyncio
async def test_get_api_base():
    _pl = ProxyLogging(user_api_key_cache=DualCache())
    _pl.update_values(alerting=["slack"], alerting_threshold=100, redis_cache=None)
    model = "chatgpt-v-3"
    messages = [{"role": "user", "content": "Hey how's it going?"}]
    litellm_params = {
        "acompletion": True,
        "api_key": None,
        "api_base": "https://openai-gpt-4-test-v-1.openai.azure.com/",
        "force_timeout": 600,
        "logger_fn": None,
        "verbose": False,
        "custom_llm_provider": "azure",
        "litellm_call_id": "68f46d2d-714d-4ad8-8137-69600ec8755c",
        "model_alias_map": {},
        "completion_call_id": None,
        "metadata": None,
        "model_info": None,
        "proxy_server_request": None,
        "preset_cache_key": None,
        "no-log": False,
        "stream_response": {},
    }
    start_time = datetime.now()
    end_time = datetime.now()

    time_difference_float, model, api_base, messages = (
        _pl.slack_alerting_instance._response_taking_too_long_callback_helper(
            kwargs={
                "model": model,
                "messages": messages,
                "litellm_params": litellm_params,
            },
            start_time=start_time,
            end_time=end_time,
        )
    )

    assert api_base is not None
    assert isinstance(api_base, str)
    assert len(api_base) > 0
    request_info = (
        f"\nRequest Model: `{model}`\nAPI Base: `{api_base}`\nMessages: `{messages}`"
    )
    slow_message = f"`Responses are slow - {round(time_difference_float,2)}s response time > Alerting threshold: {100}s`"
    await _pl.alerting_handler(
        message=slow_message + request_info,
        level="Low",
        alert_type=AlertType.llm_too_slow,
    )
    print("passed test_get_api_base")


# Create a mock environment for testing
@pytest.fixture
def mock_env(monkeypatch):
    monkeypatch.setenv("SLACK_WEBHOOK_URL", "https://example.com/webhook")
    monkeypatch.setenv("LANGFUSE_HOST", "https://cloud.langfuse.com")
    monkeypatch.setenv("LANGFUSE_PROJECT_ID", "test-project-id")


# Test the __init__ method
def test_init():
    slack_alerting = SlackAlerting(
        alerting_threshold=32,
        alerting=["slack"],
        alert_types=[AlertType.llm_exceptions],
        internal_usage_cache=DualCache(),
    )
    assert slack_alerting.alerting_threshold == 32
    assert slack_alerting.alerting == ["slack"]
    assert slack_alerting.alert_types == ["llm_exceptions"]

    slack_no_alerting = SlackAlerting()
    assert slack_no_alerting.alerting == []

    print("passed testing slack alerting init")
from datetime import datetime, timedelta
from unittest.mock import AsyncMock, patch


@pytest.fixture
def slack_alerting():
    return SlackAlerting(
        alerting_threshold=1, internal_usage_cache=DualCache(), alerting=["slack"]
    )


# Test for slow LLM responses
@pytest.mark.asyncio
async def test_response_taking_too_long_callback(slack_alerting):
    start_time = datetime.now()
    end_time = start_time + timedelta(seconds=301)
    kwargs = {"model": "test_model", "messages": "test_messages", "litellm_params": {}}
    with patch.object(slack_alerting, "send_alert", new=AsyncMock()) as mock_send_alert:
        await slack_alerting.response_taking_too_long_callback(
            kwargs, None, start_time, end_time
        )

        mock_send_alert.assert_awaited_once()


@pytest.mark.asyncio
async def test_alerting_metadata(slack_alerting):
    """
    Test that alerting_metadata is propagated correctly when a response takes too long
    """
    start_time = datetime.now()
    end_time = start_time + timedelta(seconds=301)
    kwargs = {
        "model": "test_model",
        "messages": "test_messages",
        "litellm_params": {"metadata": {"alerting_metadata": {"hello": "world"}}},
    }
    with patch.object(slack_alerting, "send_alert", new=AsyncMock()) as mock_send_alert:
        ## RESPONSE TAKING TOO LONG
        await slack_alerting.response_taking_too_long_callback(
            kwargs, None, start_time, end_time
        )

        mock_send_alert.assert_awaited_once()

        assert "hello" in mock_send_alert.call_args[1]["alerting_metadata"]


# Test for budget crossed
@pytest.mark.asyncio
async def test_budget_alerts_crossed(slack_alerting):
    user_max_budget = 100
    user_current_spend = 101
    with patch.object(slack_alerting, "send_alert", new=AsyncMock()) as mock_send_alert:
        await slack_alerting.budget_alerts(
            "user_budget",
            user_info=CallInfo(
                token="",
                spend=user_current_spend,
                max_budget=user_max_budget,
                event_group=Litellm_EntityType.USER,
            ),
        )
        mock_send_alert.assert_awaited_once()


# Test for budget crossed again (should not fire alert 2nd time)
@pytest.mark.asyncio
async def test_budget_alerts_crossed_again(slack_alerting):
    user_max_budget = 100
    user_current_spend = 101
    with patch.object(slack_alerting, "send_alert", new=AsyncMock()) as mock_send_alert:
        await slack_alerting.budget_alerts(
            "user_budget",
            user_info=CallInfo(
                token="",
                spend=user_current_spend,
                max_budget=user_max_budget,
                event_group=Litellm_EntityType.USER,
            ),
        )
        mock_send_alert.assert_awaited_once()
        mock_send_alert.reset_mock()
        await slack_alerting.budget_alerts(
            "user_budget",
            user_info=CallInfo(
                token="",
                spend=user_current_spend,
                max_budget=user_max_budget,
                event_group=Litellm_EntityType.USER,
            ),
        )
        mock_send_alert.assert_not_awaited()


# Test for send_alert - should be called once
@pytest.mark.asyncio
async def test_send_alert(slack_alerting):
    import logging

    from litellm._logging import verbose_logger

    asyncio.create_task(slack_alerting.periodic_flush())
    verbose_logger.setLevel(level=logging.DEBUG)
    with patch.object(
        slack_alerting.async_http_handler, "post", new=AsyncMock()
    ) as mock_post:
        mock_post.return_value.status_code = 200
        await slack_alerting.send_alert(
            "Test message", "Low", "budget_alerts", alerting_metadata={}
        )

        await asyncio.sleep(6)

        mock_post.assert_awaited_once()


@pytest.mark.asyncio
async def test_daily_reports_unit_test(slack_alerting):
    with patch.object(slack_alerting, "send_alert", new=AsyncMock()) as mock_send_alert:
        router = litellm.Router(
            model_list=[
                {
                    "model_name": "test-gpt",
                    "litellm_params": {"model": "gpt-5-mini"},
                    "model_info": {"id": "1234"},
                }
            ]
        )
        deployment_metrics = DeploymentMetrics(
            id="1234",
            failed_request=False,
            latency_per_output_token=20.3,
            updated_at=litellm.utils.get_utc_datetime(),
        )

        updated_val = await slack_alerting.async_update_daily_reports(
            deployment_metrics=deployment_metrics
        )

        assert updated_val == 1

        await slack_alerting.send_daily_reports(router=router)

        mock_send_alert.assert_awaited_once()


@pytest.mark.asyncio
async def test_daily_reports_completion(slack_alerting):
    with patch.object(slack_alerting, "send_alert", new=AsyncMock()) as mock_send_alert:
        litellm.callbacks = [slack_alerting]

        # on async success
        router = litellm.Router(
            model_list=[
                {
                    "model_name": "gpt-5.5",
                    "litellm_params": {
                        "model": "gpt-5-mini",
                    },
                }
            ]
        )

        await router.acompletion(
            model="gpt-5-mini",
            messages=[{"role": "user", "content": "Hey, how's it going?"}],
        )

        await asyncio.sleep(3)
        response_val = await slack_alerting.send_daily_reports(router=router)

        assert response_val is True

        mock_send_alert.assert_awaited_once()

        # on async failure
        router = litellm.Router(
            model_list=[
                {
                    "model_name": "gpt-5.5",
                    "litellm_params": {"model": "gpt-5-mini", "api_key": "bad_key"},
                }
            ]
        )

        try:
            await router.acompletion(
                model="gpt-5-mini",
                messages=[{"role": "user", "content": "Hey, how's it going?"}],
            )
        except Exception:
            pass

        await asyncio.sleep(3)
        response_val = await slack_alerting.send_daily_reports(router=router)

        assert response_val is True

        mock_send_alert.assert_awaited()


@pytest.mark.asyncio
async def test_daily_reports_redis_cache_scheduler():
    redis_cache = RedisCache()
    slack_alerting = SlackAlerting(
        internal_usage_cache=DualCache(redis_cache=redis_cache)
    )

    # we need this to be 0 so it actually sends the report
    slack_alerting.alerting_args.daily_report_frequency = 0

    from litellm.router import AlertingConfig

    router = litellm.Router(
        model_list=[
            {
                "model_name": "gpt-5.5",
                "litellm_params": {
                    "model": "gpt-5-mini",
                },
            }
        ]
    )

    with (
        patch.object(slack_alerting, "send_alert", new=AsyncMock()) as mock_send_alert,
        patch.object(
            redis_cache, "async_set_cache", new=AsyncMock()
        ) as mock_redis_set_cache,
    ):
        # initial call - expect empty
        await slack_alerting._run_scheduler_helper(llm_router=router)

        try:
            json.dumps(mock_redis_set_cache.call_args[0][1])
        except Exception:
            pytest.fail(
                "Cache value can't be json dumped - {}".format(
                    mock_redis_set_cache.call_args[0][1]
                )
            )

        mock_redis_set_cache.assert_awaited_once()

        # second call - expect empty
        await slack_alerting._run_scheduler_helper(llm_router=router)


@pytest.mark.asyncio
@pytest.mark.skip(reason="Local test. Test if slack alerts are sent.")
async def test_send_llm_exception_to_slack():
    from litellm.router import AlertingConfig

    # on async success
    router = litellm.Router(
        model_list=[
            {
                "model_name": "gpt-5-mini",
                "litellm_params": {
                    "model": "gpt-5-mini",
                    "api_key": "bad_key",
                },
            },
            {
                "model_name": "gpt-5-good",
                "litellm_params": {
                    "model": "gpt-5-mini",
                },
            },
        ],
        alerting_config=AlertingConfig(
            alerting_threshold=0.5, webhook_url=os.getenv("SLACK_WEBHOOK_URL")
        ),
    )
    try:
        await router.acompletion(
            model="gpt-5-mini",
            messages=[{"role": "user", "content": "Hey, how's it going?"}],
        )
    except Exception:
        pass

    await router.acompletion(
        model="gpt-5-good",
        messages=[{"role": "user", "content": "Hey, how's it going?"}],
    )

    await asyncio.sleep(3)


# test models with 0 metrics are ignored
@pytest.mark.asyncio
async def test_send_daily_reports_ignores_zero_values():
    router = MagicMock()
    router.get_model_ids.return_value = ["model1", "model2", "model3"]

    slack_alerting = SlackAlerting(internal_usage_cache=MagicMock())
    # model1:failed=None, model2:failed=0, model3:failed=10, model1:latency=0; model2:latency=0; model3:latency=None
    slack_alerting.internal_usage_cache.async_batch_get_cache = AsyncMock(
        return_value=[None, 0, 10, 0, 0, None]
    )
    slack_alerting.internal_usage_cache.async_set_cache_pipeline = AsyncMock()

    router.get_model_info.side_effect = lambda x: {"litellm_params": {"model": x}}

    with patch.object(slack_alerting, "send_alert", new=AsyncMock()) as mock_send_alert:
        result = await slack_alerting.send_daily_reports(router)

        # Check that the send_alert method was called
        mock_send_alert.assert_called_once()
        message = mock_send_alert.call_args[1]["message"]

        # Ensure the message includes only the non-zero, non-None metrics
        assert "model3" in message
        assert "model2" not in message
        assert "model1" not in message

    assert result is True


# test no alert is sent if all None or 0 metrics
@pytest.mark.asyncio
async def test_send_daily_reports_all_zero_or_none():
    router = MagicMock()
    router.get_model_ids.return_value = ["model1", "model2", "model3"]

    slack_alerting = SlackAlerting(internal_usage_cache=MagicMock())
    slack_alerting.internal_usage_cache.async_batch_get_cache = AsyncMock(
        return_value=[None, 0, None, 0, None, 0]
    )

    with patch.object(slack_alerting, "send_alert", new=AsyncMock()) as mock_send_alert:
        result = await slack_alerting.send_daily_reports(router)

        # Check that the send_alert method was not called
        mock_send_alert.assert_not_called()

    assert result is False


# test user budget crossed alert sent only once, even if user makes multiple calls
@pytest.mark.parametrize(
    "alerting_type",
    [
        "token_budget",
        "user_budget",
        "team_budget",
        "organization_budget",
        "proxy_budget",
        "projected_limit_exceeded",
    ],
)
@pytest.mark.asyncio
async def test_send_token_budget_crossed_alerts(alerting_type):
    slack_alerting = SlackAlerting()

    with patch.object(slack_alerting, "send_alert", new=AsyncMock()) as mock_send_alert:
        user_info = {
            "token": "sk-test-mock-token-606",
            "spend": 86,
            "max_budget": 100,
            "user_id": "ishaan@berri.ai",
            "user_email": "ishaan@berri.ai",
            "key_alias": "my-test-key",
            "projected_exceeded_date": "10/20/2024",
            "projected_spend": 200,
            "event_group": Litellm_EntityType.KEY,
        }

        user_info = CallInfo(**user_info)

        for _ in range(50):
            await slack_alerting.budget_alerts(
                type=alerting_type,
                user_info=user_info,
            )
        mock_send_alert.assert_awaited_once()


@pytest.mark.parametrize(
    "alerting_type",
    [
        "token_budget",
        "user_budget",
        "team_budget",
        "organization_budget",
        "proxy_budget",
        "projected_limit_exceeded",
    ],
)
@pytest.mark.asyncio
async def test_webhook_alerting(alerting_type):
    slack_alerting = SlackAlerting(alerting=["webhook"])

    with patch.object(
        slack_alerting, "send_webhook_alert", new=AsyncMock()
    ) as mock_send_alert:
        user_info = {
            "token": "sk-test-mock-token-606",
            "spend": 1,
            "max_budget": 0,
            "user_id": "ishaan@berri.ai",
            "user_email": "ishaan@berri.ai",
            "key_alias": "my-test-key",
            "projected_exceeded_date": "10/20/2024",
            "projected_spend": 200,
            "event_group": Litellm_EntityType.KEY,
        }

        user_info = CallInfo(**user_info)
        for _ in range(50):
            await slack_alerting.budget_alerts(
                type=alerting_type,
                user_info=user_info,
            )
        mock_send_alert.assert_awaited_once()
# @pytest.mark.asyncio
# async def test_webhook_customer_spend_event():
# """
# Test if customer spend is working as expected
# """
# slack_alerting = SlackAlerting(alerting=["webhook"])
# with patch.object(
# slack_alerting, "send_webhook_alert", new=AsyncMock()
# ) as mock_send_alert:
# user_info = {
# "token": "sk-test-mock-token-606",
# "spend": 1,
# "max_budget": 0,
# "user_id": "ishaan@berri.ai",
# "user_email": "ishaan@berri.ai",
# "key_alias": "my-test-key",
# "projected_exceeded_date": "10/20/2024",
# "projected_spend": 200,
# }
# user_info = CallInfo(**user_info)
# for _ in range(50):
# await slack_alerting.budget_alerts(
# type=alerting_type,
# user_info=user_info,
# )
# mock_send_alert.assert_awaited_once()


@pytest.mark.parametrize(
    "model, api_base, llm_provider, vertex_project, vertex_location",
    [
        ("gpt-5-mini", None, "openai", None, None),
        (
            "azure/gpt-5-mini",
            "https://openai-gpt-4-test-v-1.openai.azure.com",
            "azure",
            None,
            None,
        ),
        ("gemini-2.0-flash", None, "vertex_ai", "hardy-device-38811", "us-central1"),
    ],
)
@pytest.mark.parametrize("error_code", [500, 408, 400])
@pytest.mark.asyncio
async def test_outage_alerting_called(
    model, api_base, llm_provider, vertex_project, vertex_location, error_code
):
    """
    If a call fails, `outage_alerts` is called.
    If multiple calls fail with a 500 or 408 error, exactly one alert is sent;
    400 errors do not send an alert.
    """
    slack_alerting = SlackAlerting(alerting=["webhook"])

    litellm.callbacks = [slack_alerting]

    error_to_raise: Optional[APIError] = None

    if error_code == 400:
        print("RAISING 400 ERROR CODE")
        error_to_raise = litellm.BadRequestError(
            message="this is a bad request",
            model=model,
            llm_provider=llm_provider,
        )
    elif error_code == 408:
        print("RAISING 408 ERROR CODE")
        error_to_raise = litellm.Timeout(
            message="A timeout occurred", model=model, llm_provider=llm_provider
        )
    elif error_code == 500:
        print("RAISING 500 ERROR CODE")
        error_to_raise = litellm.ServiceUnavailableError(
            message="API is unavailable",
            model=model,
            llm_provider=llm_provider,
            response=httpx.Response(
                status_code=503,
                request=httpx.Request(
                    method="completion",
                    url="https://github.com/BerriAI/litellm",
                ),
            ),
        )

    router = Router(
        model_list=[
            {
                "model_name": model,
                "litellm_params": {
                    "model": model,
                    "api_key": os.getenv("AZURE_AI_API_KEY"),
                    "api_base": api_base,
                    "vertex_location": vertex_location,
                    "vertex_project": vertex_project,
                },
            }
        ],
        num_retries=0,
        allowed_fails=100,
    )

    slack_alerting.update_values(llm_router=router)
    with patch.object(
        slack_alerting, "outage_alerts", new=AsyncMock()
    ) as mock_outage_alert:
        try:
            await router.acompletion(
                model=model,
                messages=[{"role": "user", "content": "Hey!"}],
                mock_response=error_to_raise,
            )
        except Exception:
            pass

        mock_outage_alert.assert_called_once()

    with patch.object(slack_alerting, "send_alert", new=AsyncMock()) as mock_send_alert:
        for _ in range(6):
            try:
                await router.acompletion(
                    model=model,
                    messages=[{"role": "user", "content": "Hey!"}],
                    mock_response=error_to_raise,
                )
            except Exception:
                pass
        await asyncio.sleep(3)
        if error_code == 500 or error_code == 408:
            mock_send_alert.assert_called_once()
        else:
            mock_send_alert.assert_not_called()


@pytest.mark.parametrize(
    "model, api_base, llm_provider, vertex_project, vertex_location",
    [
        ("gpt-5-mini", None, "openai", None, None),
        (
            "azure/gpt-5-mini",
            "https://openai-gpt-4-test-v-1.openai.azure.com",
            "azure",
            None,
            None,
        ),
        ("gemini-2.0-flash", None, "vertex_ai", "hardy-device-38811", "us-central1"),
    ],
)
@pytest.mark.parametrize("error_code", [500, 408, 400])
@pytest.mark.asyncio
async def test_region_outage_alerting_called(
    model, api_base, llm_provider, vertex_project, vertex_location, error_code
):
    """
    If failures alternate across two deployments, a region outage alert is sent
    once, but only for the Vertex AI model (the one case with a
    `vertex_location`) and only for 500 or 408 errors.
    All other combinations send no alert.
    """
    slack_alerting = SlackAlerting(
        alerting=["webhook"], alert_types=[AlertType.region_outage_alerts]
    )

    litellm.callbacks = [slack_alerting]

    error_to_raise: Optional[APIError] = None

    if error_code == 400:
        print("RAISING 400 ERROR CODE")
        error_to_raise = litellm.BadRequestError(
            message="this is a bad request",
            model=model,
            llm_provider=llm_provider,
        )
    elif error_code == 408:
        print("RAISING 408 ERROR CODE")
        error_to_raise = litellm.Timeout(
            message="A timeout occurred", model=model, llm_provider=llm_provider
        )
    elif error_code == 500:
        print("RAISING 500 ERROR CODE")
        error_to_raise = litellm.ServiceUnavailableError(
            message="API is unavailable",
            model=model,
            llm_provider=llm_provider,
            response=httpx.Response(
                status_code=503,
                request=httpx.Request(
                    method="completion",
                    url="https://github.com/BerriAI/litellm",
                ),
            ),
        )

    router = Router(
        model_list=[
            {
                "model_name": model,
                "litellm_params": {
                    "model": model,
                    "api_key": os.getenv("AZURE_AI_API_KEY"),
                    "api_base": api_base,
                    "vertex_location": vertex_location,
                    "vertex_project": vertex_project,
                },
                "model_info": {"id": "1"},
            },
            {
                "model_name": model,
                "litellm_params": {
                    "model": model,
                    "api_key": os.getenv("AZURE_AI_API_KEY"),
                    "api_base": api_base,
                    "vertex_location": vertex_location,
                    "vertex_project": "vertex_project-2",
                },
                "model_info": {"id": "2"},
            },
        ],
        num_retries=0,
        allowed_fails=100,
    )

    slack_alerting.update_values(llm_router=router)
    with patch.object(slack_alerting, "send_alert", new=AsyncMock()) as mock_send_alert:
        for idx in range(6):
            if idx % 2 == 0:
                deployment_id = "1"
            else:
                deployment_id = "2"
            await slack_alerting.region_outage_alerts(
                exception=error_to_raise, deployment_id=deployment_id  # type: ignore
            )
        if model == "gemini-2.0-flash" and (error_code == 500 or error_code == 408):
            mock_send_alert.assert_called_once()
        else:
            mock_send_alert.assert_not_called()


@pytest.mark.asyncio
async def test_langfuse_trace_id():
    """
    - Unit test for `_add_langfuse_trace_id_to_alert` function in slack_alerting.py
    """
    from litellm.litellm_core_utils.litellm_logging import Logging
    from litellm.integrations.SlackAlerting.utils import _add_langfuse_trace_id_to_alert

    litellm.success_callback = ["langfuse"]

    litellm_logging_obj = Logging(
        model="gpt-5-mini",
        messages=[{"role": "user", "content": "hi"}],
        stream=False,
        call_type="acompletion",
        litellm_call_id="1234",
        start_time=datetime.now(),
        function_id="1234",
    )

    litellm.completion(
        model="gpt-5-mini",
        messages=[{"role": "user", "content": "Hey how's it going?"}],
        mock_response="Hey!",
        litellm_logging_obj=litellm_logging_obj,
    )

    await asyncio.sleep(3)

    assert litellm_logging_obj._get_trace_id(service_name="langfuse") is not None

    slack_alerting = SlackAlerting(
        alerting_threshold=32,
        alerting=["slack"],
        alert_types=[AlertType.llm_exceptions],
        internal_usage_cache=DualCache(),
    )

    trace_url = await _add_langfuse_trace_id_to_alert(
        request_data={"litellm_logging_obj": litellm_logging_obj}
    )

    assert trace_url is not None

    returned_trace_id = trace_url.split("/")[-1]

    assert returned_trace_id == litellm_logging_obj._get_trace_id(
        service_name="langfuse"
    )


@pytest.mark.asyncio
async def test_print_alerting_payload_warning():
    """
    Test if alerts are printed to verbose logger when log_to_console=True
    """
    litellm.set_verbose = True
    from litellm._logging import verbose_proxy_logger
    from litellm.integrations.SlackAlerting.batching_handler import send_to_webhook
    import logging

    # Create a string buffer to capture log output
    log_stream = io.StringIO()
    handler = logging.StreamHandler(log_stream)
    verbose_proxy_logger.addHandler(handler)
    verbose_proxy_logger.setLevel(logging.WARNING)

    # Create SlackAlerting instance with log_to_console=True
    slack_alerting = SlackAlerting(
        alerting_threshold=0.0000001,
        alerting=["slack"],
        alert_types=[AlertType.llm_exceptions],
        internal_usage_cache=DualCache(),
    )
    slack_alerting.alerting_args.log_to_console = True

    test_payload = {"text": "Test alert message"}

    # Send an alert
    with patch.object(
        slack_alerting.async_http_handler, "post", new=AsyncMock()
    ) as mock_post:
        await send_to_webhook(
            slackAlertingInstance=slack_alerting,
            item={
                "url": "https://example.com",
                "headers": {"Content-Type": "application/json"},
                "payload": {"text": "Test alert message"},
            },
            count=1,
        )

        # Check if the payload was logged
        log_output = log_stream.getvalue()
        print(log_output)
        assert "Test alert message" in log_output

    # Clean up
    verbose_proxy_logger.removeHandler(handler)
    log_stream.close()
@pytest.mark.parametrize("report_type", ["weekly", "monthly"])
@pytest.mark.asyncio
async def test_spend_report_cache(report_type):
    """
    Test that spend reports are only sent once within their period
    """
    # Mock prisma client response
    mock_spend_data = [
        {"team_alias": "team1", "total_spend": 100.0},
        {"team_alias": "team2", "total_spend": 200.0},
    ]
    mock_tag_data = [
        {"individual_request_tag": "tag1", "total_spend": 150.0},
        {"individual_request_tag": "tag2", "total_spend": 150.0},
    ]

    with patch("litellm.proxy.proxy_server.prisma_client") as mock_prisma:
        # Setup mock for database query
        mock_prisma.db.query_raw = AsyncMock(
            side_effect=[mock_spend_data, mock_tag_data]
        )

        slack_alerting = SlackAlerting(
            alerting=["webhook"], internal_usage_cache=DualCache()
        )

        user_info = CallInfo(
            token="test_token",
            spend=100,
            max_budget=1000,
            user_id="test@test.com",
            user_email="test@test.com",
            key_alias="test-key",
            event_group=Litellm_EntityType.KEY,
        )

        with patch.object(
            slack_alerting, "send_alert", new=AsyncMock()
        ) as mock_send_alert:
            # First call should send alert
            if report_type == "weekly":
                await slack_alerting.send_weekly_spend_report()
            else:
                await slack_alerting.send_monthly_spend_report()
            mock_send_alert.assert_called_once()
            mock_send_alert.reset_mock()

            # Second call should not send alert (cached)
            if report_type == "weekly":
                await slack_alerting.send_weekly_spend_report()
            else:
                await slack_alerting.send_monthly_spend_report()
            mock_send_alert.assert_not_called()


@pytest.mark.asyncio
async def test_soft_budget_alerts():
    """
    Test that soft budget alerts (warnings sent before the hard budget limit) work correctly
    - Test alert is sent when spend reaches the soft budget (spend == soft_budget == 80)
    """
    slack_alerting = SlackAlerting(alerting=["webhook"])

    with patch.object(slack_alerting, "send_alert", new=AsyncMock()) as mock_send_alert:
        # Spend is exactly at the soft budget
        user_info = CallInfo(
            token="test_token",
            spend=80,  # $80 spent
            soft_budget=80,
            user_id="test@test.com",
            user_email="test@test.com",
            key_alias="test-key",
            event_group=Litellm_EntityType.KEY,
        )

        await slack_alerting.budget_alerts(
            type="soft_budget",
            user_info=user_info,
        )
        mock_send_alert.assert_called_once()

        # Verify the alert message reports the soft budget and the entity's fields
        alert_message = mock_send_alert.call_args[1]["message"]
        print("GOT MESSAGE\n\n", alert_message)
        expected_message = (
            "Soft Budget Crossed: Total Soft Budget:`80.0`\n"
            "\n"
            "*spend:* `80.0`\n"
            "*soft_budget:* `80.0`\n"
            "*user_id:* `test@test.com`\n"
            "*user_email:* `test@test.com`\n"
            "*key_alias:* `test-key`\n"
            "*event_group:* `key`\n"
        )
        assert alert_message == expected_message


key_info = CallInfo(
    token="test_token",
    spend=81,
    soft_budget=80,
    max_budget=100,
    user_id="test@test.com",
    user_email="test@test.com",
    key_alias="test-key",
    event_group=Litellm_EntityType.KEY,
)

team_info = CallInfo(
    token="test_token",
    spend=160,
    soft_budget=150,
    max_budget=200,
    team_id="team-123",
    team_alias="engineering-team",
    event_group=Litellm_EntityType.TEAM,
)

user_info = CallInfo(
    token="test_token",
    spend=45,
    soft_budget=40,
    max_budget=50,
    user_id="user123",
    event_group=Litellm_EntityType.USER,
)

key_no_max_budget_info = CallInfo(
    token="test_token",
    spend=90,
    soft_budget=85,
    user_id="dev@test.com",
    user_email="dev@test.com",
    key_alias="dev-key",
    event_group=Litellm_EntityType.KEY,
)


@pytest.mark.parametrize(
    "entity_info",
    [
        key_info,
        team_info,
        user_info,
        key_no_max_budget_info,
    ],
)
@pytest.mark.asyncio
async def test_soft_budget_alerts_webhook(entity_info):
    """
    Tests that soft budget alerts are triggered for different entity types.

    Tests:
    - Key with max budget
    - Team
    - User
    - Key without max budget
    """
    slack_alerting = SlackAlerting(alerting=["webhook"])

    with patch.object(slack_alerting, "send_alert", new=AsyncMock()) as mock_send_alert:
        # Test entity hit soft budget limit
        await slack_alerting.budget_alerts(
            type="soft_budget",
            user_info=entity_info,
        )
        mock_send_alert.assert_called_once()

        # Verify the webhook event
        call_args = mock_send_alert.call_args[1]
        logged_webhook_event: WebhookEvent = call_args["user_info"]

        # Validate the webhook event has all expected fields
        assert logged_webhook_event.spend == entity_info.spend
        assert logged_webhook_event.soft_budget == entity_info.soft_budget
        assert logged_webhook_event.max_budget == entity_info.max_budget
        assert logged_webhook_event.user_id == entity_info.user_id
        assert logged_webhook_event.user_email == entity_info.user_email
        assert logged_webhook_event.key_alias == entity_info.key_alias
        assert logged_webhook_event.event_group == entity_info.event_group