litellm/tests/logging_callback_tests/test_custom_callback_router.py
Mateo Wang 2c733c00f5 chore(ci): modernize model references in tests and configs (#27856)
* test: modernize models used in CircleCI e2e test suites

Replaces obsolete models (gpt-4o, gpt-4o-mini, gpt-3.5-turbo,
claude-3-5-sonnet-20240620, claude-sonnet-4-20250514) with current
equivalents across the e2e_openai_endpoints and
proxy_e2e_anthropic_messages_tests CircleCI jobs.

- gpt-4o -> gpt-5.5 (responses API e2e tests)
- gpt-4o-mini -> gpt-5-mini (websocket responses, oai_misc_config)
- gpt-4o-mini-2024-07-18 -> gpt-4.1-mini-2025-04-14 (fine-tuning,
  still actively fine-tunable)
- gpt-4 / gpt-3.5-turbo target_model_names example -> gpt-5.5 /
  gpt-5-mini
- bedrock claude-3-5-sonnet-20240620 batch entry -> haiku-4-5-20251001
  (also aligning oai_misc_config model_name with what
  test_bedrock_batches_api.py actually requests)
- bedrock claude-sonnet-4-20250514 (deprecated, retires 2026-06-15)
  -> claude-sonnet-4-5-20250929

* test: point bedrock-claude-sonnet-4 alias at Sonnet 4.6, not 4.5

Greptile/Cursor flagged that after the previous commit, the
bedrock-claude-sonnet-4 alias collided with bedrock-claude-sonnet-4.5
(both pointed to claude-sonnet-4-5-20250929). Rename to
bedrock-claude-sonnet-4.6 and point it at the Sonnet 4.6 Bedrock ID
(us.anthropic.claude-sonnet-4-6, already in the litellm model
registry) so the alias name matches the underlying model version.

* test: modernize models across remaining CI-mounted configs & tests

Expands the modernization sweep to all CircleCI-mounted proxy configs
and to test directories where the model literal is a fixture/route key
(not the test's subject).

Config changes:
- proxy_server_config.yaml: bump gpt-3.5-turbo / gpt-3.5-turbo-1106 /
  gpt-4o / gemini-1.5-flash / dall-e-3 underlying models; rename
  gpt-3.5-turbo-end-user-test alias to gpt-5-mini-end-user-test; bump
  text-embedding-ada-002 underlying to text-embedding-3-small. User-
  facing aliases (gpt-3.5-turbo, gpt-4, text-embedding-ada-002, etc.)
  preserved for backward compatibility with tests.
- simple_config.yaml, otel_test_config.yaml, spend_tracking_config.yaml:
  bump gpt-3.5-turbo underlying to gpt-5-mini.
- pass_through_config.yaml: claude-3-5-sonnet / claude-3-7-sonnet /
  claude-3-haiku entries replaced with claude-sonnet-4-5 / claude-
  haiku-4-5 / claude-opus-4-7.
- oai_misc_config.yaml: align alias name with the gpt-5-mini rename.

Test changes (proactive: claude-sonnet-4-20250514 / claude-opus-4-
20250514 retire 2026-06-15):
- tests/llm_translation/test_anthropic_completion.py: bump 3 references
  + paired Vertex AI ID to claude-sonnet-4-5.
- tests/llm_translation/test_optional_params.py: bump 2 references.
- tests/pass_through_unit_tests/test_anthropic_messages_passthrough.py
  and test_bedrock_anthropic_messages_test.py: bump router fixtures
  using the deprecated model IDs.
- tests/pass_through_unit_tests/base_anthropic_messages_tool_search_test.py:
  modernize docstring examples.
- tests/test_end_users.py: update references to renamed alias.

* test: modernize placeholder model literals in router_unit_tests

Mass replace_all on fixture/placeholder model literals across the
router_unit_tests/ suite (model name is a routing key / label, not the
test subject). Sub-agent sweep so far — additional commits will follow
for logging_callback_tests/, enterprise/, top-level tests/test_*.py,
and other CI-mounted dirs.

Mappings applied:
- gpt-3.5-turbo -> gpt-5-mini
- gpt-4 (bare) -> gpt-5.5
- gpt-4o (bare) -> gpt-5
- text-embedding-ada-002 -> text-embedding-3-small
- claude-3-sonnet-20240229 / claude-3-opus-20240229 /
  claude-3-haiku-20240307 / claude-3-5-sonnet-20240620 ->
  claude-sonnet-4-5-20250929 / claude-opus-4-7 /
  claude-haiku-4-5-20251001 as appropriate

Explicitly preserved:
- gpt-4o-mini-* variants (transcribe, tts, etc.) where they're current
- gpt-4-turbo / gpt-4-vision-preview / gpt-4-0613 (subject literals)
- JSONL batch body literals
- Mock LLM response model fields (must match upstream)
- Fake/mock identifiers

* test: modernize placeholder model literals across remaining CI suites

Sub-agent sweep across logging_callback_tests/, guardrails_tests/,
enterprise/, pass_through_unit_tests/, otel_tests/,
llm_responses_api_testing/, batches_tests/, spend_tracking_tests/,
litellm_utils_tests/, unified_google_tests/, and a few top-level
tests/test_*.py files where the model literal is a fixture or
placeholder (router model_list, mock standard logging payload, mock
callback data) rather than the test's subject.

Mappings applied (see scope notes below):
- gpt-3.5-turbo -> gpt-5-mini
- gpt-4 (bare) -> gpt-5.5
- gpt-4o (bare) -> gpt-5.5 (corrected from initial gpt-5 — bare gpt-5
  is not a valid OpenAI alias; only gpt-5.5 / gpt-5.4 / gpt-5.2-codex
  / gpt-5-mini exist)
- gpt-4o-mini (bare) -> gpt-5-mini
- text-embedding-ada-002 -> text-embedding-3-small
- claude-3-sonnet-20240229 -> claude-sonnet-4-5-20250929
- claude-3-opus-20240229 -> claude-opus-4-7
- claude-3-haiku-20240307 -> claude-haiku-4-5-20251001
- claude-3-5-sonnet-20240620/20241022 -> claude-sonnet-4-5-20250929
- claude-3-7-sonnet-20250219 -> claude-sonnet-4-6
- gemini-1.5-flash -> gemini-2.5-flash
- gemini-1.5-pro -> gemini-2.5-pro
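The "(bare)" qualifier above means a literal is rewritten only when it is not part of a longer model name. A minimal sketch of that rule as a single regex pass, covering only the OpenAI entries from the list. `MODEL_MAP` and `modernize` are hypothetical names; this is not the tool the sweep actually used.

```python
import re
from typing import Dict

# Subset of the mappings listed above (OpenAI entries only)
MODEL_MAP: Dict[str, str] = {
    "gpt-3.5-turbo": "gpt-5-mini",
    "gpt-4": "gpt-5.5",
    "gpt-4o": "gpt-5.5",
    "gpt-4o-mini": "gpt-5-mini",
    "text-embedding-ada-002": "text-embedding-3-small",
}

# "Bare" match: the literal must not be glued to a longer name, so
# gpt-4o-mini-transcribe, gpt-4-turbo, gpt-4.1 and gpt-3.5-turbo-instruct
# are left alone. Longest keys come first in the alternation.
_PATTERN = re.compile(
    r"(?<![\w.-])("
    + "|".join(re.escape(k) for k in sorted(MODEL_MAP, key=len, reverse=True))
    + r")(?![\w-]|\.\d)"
)


def modernize(text: str) -> str:
    # Single pass with a dict lookup, so a replacement is never re-matched
    return _PATTERN.sub(lambda m: MODEL_MAP[m.group(1)], text)


print(modernize('model="gpt-4o"'))
print(modernize('"gpt-4", "gpt-4-turbo", "gpt-4.1"'))
```

A purely textual rewrite like this still matches provider-prefixed literals such as `azure/text-embedding-ada-002`. Azure deployment names therefore need a manual skip list.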

Explicitly preserved (not modernized):
- llm_translation/ tests where model is the SUBJECT (provider-specific
  translation/transformation logic). Only the deprecated 20250514
  references were already bumped in a prior commit.
- Cost-calc / tokenizer subject tests in test_utils.py (skip-ranges
  documented by the sub-agent).
- Bedrock model IDs in test_health_check.py path-stripping tests.
- JSONL batch request bodies and mock LLM response bodies (must match
  upstream literal).
- Langfuse expected-request-body JSON fixtures (cost values are exact-
  match-asserted; changing the model would shift response_cost).
- gpt-3.5-turbo-instruct (text-completion endpoint; no modern OpenAI
  equivalent).
- Top-level tests calling the proxy through user-facing aliases
  (gpt-3.5-turbo, gpt-4, text-embedding-ada-002, dall-e-3) — aliases
  in proxy_server_config.yaml stay; only the underlying model was
  bumped.
- tests/test_gpt5_azure_temperature_support.py (the test's whole point
  is model-name handling).
- Fake / mock / openai/fake identifiers.

Notable side fixes:
- test_spend_accuracy_tests.py: UPSTREAM_MODEL now matches what
  spend_tracking_config.yaml's proxy actually routes to (gpt-5-mini),
  resolving a latent inconsistency.
- proxy_server_config.yaml: bare `gpt-5` alias renamed to `gpt-5.5`
  (bare gpt-5 is not a valid OpenAI alias).
- test_batches_logging_unit_tests.py: explicit_models list entries
  kept distinct (gpt-5-mini + gpt-5.5) after bulk rename.

* test: fix CI failures from model modernization sweep

CI surfaced 5 categories of regression from the bulk modernization:

1. Azure deployment names are customer-specific. Reverted:
   - tests/litellm_utils_tests/test_health_check.py: azure/text-
     embedding-3-small -> azure/text-embedding-ada-002 (the CI Azure
     account does not have a text-embedding-3-small deployment).
   - tests/logging_callback_tests/test_custom_callback_router.py:
     same revert for two router fixtures driving aembedding.

2. gpt-5 family does not accept temperature != 1. Tests that pass a
   custom temperature swapped from gpt-5-mini to gpt-4.1-mini (modern
   non-reasoning OpenAI mini that still accepts temperature/logprobs):
   - tests/logging_callback_tests/test_datadog.py
   - tests/logging_callback_tests/test_langsmith_unit_test.py
   - tests/logging_callback_tests/test_otel_logging.py

3. proxy_server_config.yaml's gpt-3.5-turbo-large alias was routing to
   gpt-5.5 (a reasoning model that rejects logprobs). The proxy test
   tests/test_openai_endpoints.py::test_chat_completion_streaming
   exercises logprobs/top_logprobs through that alias. Bumped the
   underlying model to gpt-4.1 (non-reasoning, still modern).

4. tests/logging_callback_tests/test_gcs_pub_sub.py asserts against a
   pinned JSON fixture (gcs_pub_sub_body/spend_logs_payload.json) with
   hardcoded model="gpt-4o" and a model-specific spend value. Reverted
   the litellm.acompletion calls in the test to model="gpt-4o" so the
   fixture's exact-match assertions still hold.

5. tests/pass_through_unit_tests/test_anthropic_messages_passthrough.py:
   anthropic.messages.create routing to openai/gpt-5-mini returned an
   empty content[0] with max_tokens=100 (reasoning-token consumption).
   Swapped to openai/gpt-4.1-mini.
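Items 2 and 3 reduce to one rule: the gpt-5 family rejects temperature values other than 1, and it rejects logprobs. Below is a minimal sketch of a pre-flight check a test could run before choosing a model. `rejected_params` and the prefix list are hypothetical. They are inferred only from the failures described above, not taken from litellm or the OpenAI API.

```python
from typing import Any, Dict, List

# Assumption for this sketch: names starting with "gpt-5" are reasoning
# models, per the failures above. Not an authoritative list.
REASONING_PREFIXES = ("gpt-5",)


def rejected_params(model: str, params: Dict[str, Any]) -> List[str]:
    """Return the sampling params a reasoning model would reject."""
    name = model.split("/")[-1]  # drop a provider prefix such as "openai/"
    if not name.startswith(REASONING_PREFIXES):
        return []
    rejected = []
    if params.get("temperature", 1) != 1:
        rejected.append("temperature")
    if params.get("logprobs") or "top_logprobs" in params:
        rejected.append("logprobs")
    return rejected


print(rejected_params("gpt-5-mini", {"temperature": 0.2}))    # ['temperature']
print(rejected_params("gpt-4.1-mini", {"temperature": 0.2}))  # []
```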

* test: fix Assistants API model + 2 cursor[bot] review nits

1. pass_through_unit_tests/test_custom_logger_passthrough.py: gpt-5.5
   isn't accepted by the /v1/assistants endpoint
   ("unsupported_model"). Switch to gpt-4.1-mini (modern, Assistants-
   API-supported, non-reasoning).

2. example_config_yaml/pass_through_config.yaml: the previous sweep
   bumped the claude-3-7-sonnet alias to claude-opus-4-7, which is a
   tier change (Sonnet -> Opus). Map to claude-sonnet-4-6 to keep the
   Sonnet tier intact. (Cursor bugbot review.)

3. example_config_yaml/simple_config.yaml: model_name was left as
   gpt-3.5-turbo while the underlying was bumped to gpt-5-mini, which
   muddles the "simple" example. Make both sides gpt-5-mini so the
   most basic example is a straight 1:1 mapping again. (Cursor bugbot
   review.)

* fix: revert gpt-4/gpt-3.5-turbo alias underlying to non-reasoning models

tests/test_openai_endpoints.py::test_completion calls the proxy alias
"gpt-4" with temperature=0, and other tests call gpt-3.5-turbo with
custom temperature / logprobs / the legacy /v1/completions endpoint.
The earlier modernization mapped both aliases to gpt-5.5 / gpt-5-mini,
which are reasoning models that reject temperature != 1 and don't
expose /v1/completions. Map the aliases to gpt-4.1 / gpt-4.1-mini
(modern non-reasoning OpenAI models) instead — keeps user-facing
aliases preserved while picking a current underlying that still
supports the parameters/endpoints the tests exercise.
2026-05-15 15:44:28 -07:00


### What this tests ####
## This test asserts the type of data passed into each method of the custom callback handler
import asyncio
import inspect
import os
import sys
import time
import traceback
from datetime import datetime
import pytest
sys.path.insert(0, os.path.abspath("../.."))
from typing import List, Literal, Optional
from unittest.mock import AsyncMock, MagicMock, patch
import litellm
from litellm import Cache, Router
from litellm.integrations.custom_logger import CustomLogger
# Test Scenarios (test across completion, streaming, embedding)
## 1: Pre-API-Call
## 2: Post-API-Call
## 3: On LiteLLM Call success
## 4: On LiteLLM Call failure
## fallbacks
## retries
# Test cases
## 1. Simple Azure OpenAI acompletion + streaming call
## 2. Simple Azure OpenAI aembedding call
## 3. Azure OpenAI acompletion + streaming call with retries
## 4. Azure OpenAI aembedding call with retries
## 5. Azure OpenAI acompletion + streaming call with fallbacks
## 6. Azure OpenAI aembedding call with fallbacks
## Test interfaces
## 1. router.acompletion() + router.aembedding()
## 2. proxy.completions + proxy.embeddings
litellm.num_retries = 0


class CompletionCustomHandler(
    CustomLogger
):  # https://docs.litellm.ai/docs/observability/custom_callback#callback-class
    """
    Custom handler that records which callback hooks fire during Router calls
    and checks the type of every input passed to each hook.
    """

    # Class variables or attributes
    def __init__(self):
        self.errors = []
        self.states: Optional[
            List[
                Literal[
                    "sync_pre_api_call",
                    "async_pre_api_call",
                    "post_api_call",
                    "sync_stream",
                    "async_stream",
                    "sync_success",
                    "async_success",
                    "sync_failure",
                    "async_failure",
                ]
            ]
        ] = []

    def log_pre_api_call(self, model, messages, kwargs):
        try:
            print(f"received kwargs in pre-input: {kwargs}")
            self.states.append("sync_pre_api_call")
            ## MODEL
            assert isinstance(model, str)
            ## MESSAGES
            assert isinstance(messages, list)
            ## KWARGS
            assert isinstance(kwargs["model"], str)
            assert isinstance(kwargs["messages"], list)
            assert isinstance(kwargs["optional_params"], dict)
            assert isinstance(kwargs["litellm_params"], dict)
            assert isinstance(kwargs["start_time"], (datetime, type(None)))
            assert isinstance(kwargs["stream"], bool)
            assert isinstance(kwargs["user"], (str, type(None)))
            ### ROUTER-SPECIFIC KWARGS
            assert isinstance(kwargs["litellm_params"]["metadata"], dict)
            assert isinstance(kwargs["litellm_params"]["metadata"]["model_group"], str)
            assert isinstance(kwargs["litellm_params"]["metadata"]["deployment"], str)
            assert isinstance(kwargs["litellm_params"]["model_info"], dict)
            assert isinstance(kwargs["litellm_params"]["model_info"]["id"], str)
            assert isinstance(
                kwargs["litellm_params"]["proxy_server_request"], (str, type(None))
            )
            assert isinstance(
                kwargs["litellm_params"]["preset_cache_key"], (str, type(None))
            )
            assert isinstance(kwargs["litellm_params"]["stream_response"], dict)
        except Exception as e:
            print(f"Assertion Error: {traceback.format_exc()}")
            self.errors.append(traceback.format_exc())

    def log_post_api_call(self, kwargs, response_obj, start_time, end_time):
        try:
            self.states.append("post_api_call")
            ## START TIME
            assert isinstance(start_time, datetime)
            ## END TIME
            assert end_time == None
            ## RESPONSE OBJECT
            assert response_obj == None
            ## KWARGS
            assert isinstance(kwargs["model"], str)
            assert isinstance(kwargs["messages"], list)
            assert isinstance(kwargs["optional_params"], dict)
            assert isinstance(kwargs["litellm_params"], dict)
            assert isinstance(kwargs["start_time"], (datetime, type(None)))
            assert isinstance(kwargs["stream"], bool)
            assert isinstance(kwargs["user"], (str, type(None)))
            assert isinstance(kwargs["input"], (list, dict, str))
            assert isinstance(kwargs["api_key"], (str, type(None)))
            assert (
                isinstance(
                    kwargs["original_response"], (str, litellm.CustomStreamWrapper)
                )
                or inspect.iscoroutine(kwargs["original_response"])
                or inspect.isasyncgen(kwargs["original_response"])
            )
            assert isinstance(kwargs["additional_args"], (dict, type(None)))
            assert isinstance(kwargs["log_event_type"], str)
            ### ROUTER-SPECIFIC KWARGS
            assert isinstance(kwargs["litellm_params"]["metadata"], dict)
            assert isinstance(kwargs["litellm_params"]["metadata"]["model_group"], str)
            assert isinstance(kwargs["litellm_params"]["metadata"]["deployment"], str)
            assert isinstance(kwargs["litellm_params"]["model_info"], dict)
            assert isinstance(kwargs["litellm_params"]["model_info"]["id"], str)
            assert isinstance(
                kwargs["litellm_params"]["proxy_server_request"], (str, type(None))
            )
            assert isinstance(
                kwargs["litellm_params"]["preset_cache_key"], (str, type(None))
            )
            assert isinstance(kwargs["litellm_params"]["stream_response"], dict)
        except Exception:
            print(f"Assertion Error: {traceback.format_exc()}")
            self.errors.append(traceback.format_exc())

    async def async_log_stream_event(self, kwargs, response_obj, start_time, end_time):
        try:
            self.states.append("async_stream")
            ## START TIME
            assert isinstance(start_time, datetime)
            ## END TIME
            assert isinstance(end_time, datetime)
            ## RESPONSE OBJECT
            assert isinstance(response_obj, litellm.ModelResponseStream)
            ## KWARGS
            assert isinstance(kwargs["model"], str)
            assert isinstance(kwargs["messages"], list) and isinstance(
                kwargs["messages"][0], dict
            )
            assert isinstance(kwargs["optional_params"], dict)
            assert isinstance(kwargs["litellm_params"], dict)
            assert isinstance(kwargs["start_time"], (datetime, type(None)))
            assert isinstance(kwargs["stream"], bool)
            assert isinstance(kwargs["user"], (str, type(None)))
            assert (
                isinstance(kwargs["input"], list)
                and isinstance(kwargs["input"][0], dict)
            ) or isinstance(kwargs["input"], (dict, str))
            assert isinstance(kwargs["api_key"], (str, type(None)))
            assert (
                isinstance(
                    kwargs["original_response"], (str, litellm.CustomStreamWrapper)
                )
                or inspect.isasyncgen(kwargs["original_response"])
                or inspect.iscoroutine(kwargs["original_response"])
            )
            assert isinstance(kwargs["additional_args"], (dict, type(None)))
            assert isinstance(kwargs["log_event_type"], str)
        except Exception:
            print(f"Assertion Error: {traceback.format_exc()}")
            self.errors.append(traceback.format_exc())

    def log_success_event(self, kwargs, response_obj, start_time, end_time):
        try:
            self.states.append("sync_success")
            ## START TIME
            assert isinstance(start_time, datetime)
            ## END TIME
            assert isinstance(end_time, datetime)
            ## RESPONSE OBJECT
            assert isinstance(response_obj, litellm.ModelResponse)
            ## KWARGS
            assert isinstance(kwargs["model"], str)
            assert isinstance(kwargs["messages"], list) and isinstance(
                kwargs["messages"][0], dict
            )
            assert isinstance(kwargs["optional_params"], dict)
            assert isinstance(kwargs["litellm_params"], dict)
            assert isinstance(kwargs["start_time"], (datetime, type(None)))
            assert isinstance(kwargs["stream"], bool)
            assert isinstance(kwargs["user"], (str, type(None)))
            assert (
                isinstance(kwargs["input"], list)
                and isinstance(kwargs["input"][0], dict)
            ) or isinstance(kwargs["input"], (dict, str))
            assert isinstance(kwargs["api_key"], (str, type(None)))
            assert isinstance(
                kwargs["original_response"], (str, litellm.CustomStreamWrapper)
            )
            assert isinstance(kwargs["additional_args"], (dict, type(None)))
            assert isinstance(kwargs["log_event_type"], str)
            assert kwargs["cache_hit"] is None or isinstance(kwargs["cache_hit"], bool)
        except Exception:
            print(f"Assertion Error: {traceback.format_exc()}")
            self.errors.append(traceback.format_exc())

    def log_failure_event(self, kwargs, response_obj, start_time, end_time):
        try:
            self.states.append("sync_failure")
            ## START TIME
            assert isinstance(start_time, datetime)
            ## END TIME
            assert isinstance(end_time, datetime)
            ## RESPONSE OBJECT
            assert response_obj == None
            ## KWARGS
            assert isinstance(kwargs["model"], str)
            assert isinstance(kwargs["messages"], list) and isinstance(
                kwargs["messages"][0], dict
            )
            assert isinstance(kwargs["optional_params"], dict)
            assert isinstance(kwargs["litellm_params"], dict)
            assert isinstance(kwargs["start_time"], (datetime, type(None)))
            assert isinstance(kwargs["stream"], bool)
            assert isinstance(kwargs["user"], (str, type(None)))
            assert (
                isinstance(kwargs["input"], list)
                and isinstance(kwargs["input"][0], dict)
            ) or isinstance(kwargs["input"], (dict, str))
            assert isinstance(kwargs["api_key"], (str, type(None)))
            assert (
                isinstance(
                    kwargs["original_response"], (str, litellm.CustomStreamWrapper)
                )
                or kwargs["original_response"] == None
            )
            assert isinstance(kwargs["additional_args"], (dict, type(None)))
            assert isinstance(kwargs["log_event_type"], str)
        except Exception:
            print(f"Assertion Error: {traceback.format_exc()}")
            self.errors.append(traceback.format_exc())

    async def async_log_pre_api_call(self, model, messages, kwargs):
        try:
            """
            No-op.
            Not implemented yet.
            """
            pass
        except Exception as e:
            print(f"Assertion Error: {traceback.format_exc()}")
            self.errors.append(traceback.format_exc())

    async def async_log_success_event(self, kwargs, response_obj, start_time, end_time):
        try:
            print("CompletionCustomHandler.async_log_success_event, kwargs: ", kwargs)
            self.states.append("async_success")
            print(
                "############### CompletionCustomHandler async success, kwargs: ",
                kwargs,
            )
            ## START TIME
            assert isinstance(start_time, datetime)
            ## END TIME
            assert isinstance(end_time, datetime)
            ## RESPONSE OBJECT
            assert isinstance(
                response_obj, (litellm.ModelResponse, litellm.EmbeddingResponse)
            )
            ## KWARGS
            assert isinstance(kwargs["model"], str)

            # checking we use base_model for azure cost calculation
            base_model = litellm.utils._get_base_model_from_metadata(
                model_call_details=kwargs
            )

            if (
                kwargs["model"] == "chatgpt-v-3"
                and base_model is not None
                and kwargs["stream"] != True
            ):
                # when base_model is set for azure, we should use pricing for the base_model
                # this checks response_cost == litellm.cost_per_token(model=base_model)
                assert isinstance(kwargs["response_cost"], float)
                response_cost = kwargs["response_cost"]
                print(
                    f"response_cost: {response_cost}, for model: {kwargs['model']} and base_model: {base_model}"
                )
                prompt_tokens = response_obj.usage.prompt_tokens
                completion_tokens = response_obj.usage.completion_tokens
                # ensure the pricing is based on the base_model here
                prompt_price, completion_price = litellm.cost_per_token(
                    model=base_model,
                    prompt_tokens=prompt_tokens,
                    completion_tokens=completion_tokens,
                )
                expected_price = prompt_price + completion_price
                print(f"expected price: {expected_price}")
                assert (
                    response_cost == expected_price
                ), f"response_cost: {response_cost} != expected_price: {expected_price}. For model: {kwargs['model']} and base_model: {base_model}. should have used base_model for price"

            assert isinstance(kwargs["messages"], list)
            assert isinstance(kwargs["optional_params"], dict)
            assert isinstance(kwargs["litellm_params"], dict)
            assert isinstance(kwargs["start_time"], (datetime, type(None)))
            assert isinstance(kwargs["stream"], bool)
            assert isinstance(kwargs["user"], (str, type(None)))
            assert isinstance(kwargs["input"], (list, dict, str))
            assert isinstance(kwargs["api_key"], (str, type(None)))
            assert (
                isinstance(
                    kwargs["original_response"], (str, litellm.CustomStreamWrapper)
                )
                or inspect.isasyncgen(kwargs["original_response"])
                or inspect.iscoroutine(kwargs["original_response"])
            )
            assert isinstance(kwargs["additional_args"], (dict, type(None)))
            assert isinstance(kwargs["log_event_type"], str)
            assert kwargs["cache_hit"] is None or isinstance(kwargs["cache_hit"], bool)
            ### ROUTER-SPECIFIC KWARGS
            assert isinstance(kwargs["litellm_params"]["metadata"], dict)
            assert isinstance(kwargs["litellm_params"]["metadata"]["model_group"], str)
            assert isinstance(kwargs["litellm_params"]["metadata"]["deployment"], str)
            assert isinstance(kwargs["litellm_params"]["model_info"], dict)
            assert isinstance(kwargs["litellm_params"]["model_info"]["id"], str)
            assert isinstance(
                kwargs["litellm_params"]["proxy_server_request"], (str, type(None))
            )
            assert isinstance(
                kwargs["litellm_params"]["preset_cache_key"], (str, type(None))
            )
            assert isinstance(kwargs["litellm_params"]["stream_response"], dict)
        except Exception:
            print(f"Assertion Error: {traceback.format_exc()}")
            self.errors.append(traceback.format_exc())

    async def async_log_failure_event(self, kwargs, response_obj, start_time, end_time):
        try:
            print(f"received original response: {kwargs['original_response']}")
            self.states.append("async_failure")
            ## START TIME
            assert isinstance(start_time, datetime)
            ## END TIME
            assert isinstance(end_time, datetime)
            ## RESPONSE OBJECT
            assert response_obj == None
            ## KWARGS
            assert isinstance(kwargs["model"], str)
            assert isinstance(kwargs["messages"], list)
            assert isinstance(kwargs["optional_params"], dict)
            assert isinstance(kwargs["litellm_params"], dict)
            assert isinstance(kwargs["start_time"], (datetime, type(None)))
            assert isinstance(kwargs["stream"], bool)
            assert isinstance(kwargs["user"], (str, type(None)))
            assert isinstance(kwargs["input"], (list, str, dict))
            assert isinstance(kwargs["api_key"], (str, type(None)))
            assert (
                isinstance(
                    kwargs["original_response"], (str, litellm.CustomStreamWrapper)
                )
                or inspect.isasyncgen(kwargs["original_response"])
                or inspect.iscoroutine(kwargs["original_response"])
                or kwargs["original_response"] == None
            )
            assert isinstance(kwargs["additional_args"], (dict, type(None)))
            assert isinstance(kwargs["log_event_type"], str)
        except Exception:
            print(f"Assertion Error: {traceback.format_exc()}")
            self.errors.append(traceback.format_exc())


# Simple Azure OpenAI call
## COMPLETION
# @pytest.mark.flaky(retries=5, delay=1)
@pytest.mark.asyncio
async def test_async_chat_azure():
    try:
        customHandler_completion_azure_router = CompletionCustomHandler()
        customHandler_streaming_azure_router = CompletionCustomHandler()
        customHandler_failure = CompletionCustomHandler()
        litellm.callbacks = [customHandler_completion_azure_router]
        litellm.set_verbose = True
        model_list = [
            {
                "model_name": "gpt-4.1-nano",  # openai model name
                "litellm_params": {  # params for litellm completion/embedding call
                    "model": "azure/gpt-4.1-mini",
                    "api_key": os.getenv("AZURE_AI_API_KEY"),
                    "api_version": os.getenv("AZURE_API_VERSION"),
                    "api_base": os.getenv("AZURE_AI_API_BASE"),
                },
                "model_info": {"base_model": "azure/gpt-4.1-mini"},
                "tpm": 240000,
                "rpm": 1800,
            },
        ]
        router = Router(model_list=model_list, num_retries=0)  # type: ignore
        response = await router.acompletion(
            model="gpt-4.1-nano",
            messages=[{"role": "user", "content": "Hi 👋 - i'm openai"}],
        )
        print("got response, sleeping 5 seconds....")
        await asyncio.sleep(5)
        assert len(customHandler_completion_azure_router.errors) == 0
        assert (
            len(customHandler_completion_azure_router.states) == 3
        )  # pre, post, success
        # streaming
        litellm.logging_callback_manager._reset_all_callbacks()
        litellm.callbacks = [customHandler_streaming_azure_router]
        router2 = Router(model_list=model_list, num_retries=0)  # type: ignore
        response = await router2.acompletion(
            model="gpt-4.1-nano",
            messages=[{"role": "user", "content": "Hi 👋 - i'm openai"}],
            stream=True,
        )
        async for chunk in response:
            print(f"async azure router chunk: {chunk}")
            continue
        await asyncio.sleep(5)
        print(f"customHandler.states: {customHandler_streaming_azure_router.states}")
        assert len(customHandler_streaming_azure_router.errors) == 0
        assert (
            len(customHandler_streaming_azure_router.states) >= 3
        )  # pre, post, stream (multiple times), success
        # failure
        model_list = [
            {
                "model_name": "gpt-5-mini",  # openai model name
                "litellm_params": {  # params for litellm completion/embedding call
                    "model": "azure/gpt-4o-new-test",
                    "api_key": "my-bad-key",
                    "api_version": os.getenv("AZURE_API_VERSION"),
                    "api_base": os.getenv("AZURE_AI_API_BASE"),
                },
                "tpm": 240000,
                "rpm": 1800,
            },
        ]
        litellm.logging_callback_manager._reset_all_callbacks()
        litellm.callbacks = [customHandler_failure]
        router3 = Router(model_list=model_list, num_retries=0)  # type: ignore
        try:
            response = await router3.acompletion(
                model="gpt-5-mini",
                messages=[{"role": "user", "content": "Hi 👋 - i'm openai"}],
            )
            print(f"response in router3 acompletion: {response}")
        except Exception:
            pass
        await asyncio.sleep(5)
        print(f"customHandler.states: {customHandler_failure.states}")
        assert len(customHandler_failure.errors) == 0
        assert len(customHandler_failure.states) == 3  # pre, post, failure
        assert "async_failure" in customHandler_failure.states
    except Exception as e:
        print(f"Assertion Error: {traceback.format_exc()}")
        pytest.fail(f"An exception occurred - {str(e)}")


## EMBEDDING
@pytest.mark.asyncio
async def test_async_embedding_azure():
    try:
        customHandler = CompletionCustomHandler()
        customHandler_failure = CompletionCustomHandler()
        litellm.callbacks = [customHandler]
        model_list = [
            {
                "model_name": "azure-embedding-model",  # openai model name
                "litellm_params": {  # params for litellm completion/embedding call
                    "model": "azure/text-embedding-ada-002",
                    "api_key": os.getenv("AZURE_AI_API_KEY"),
                    "api_version": os.getenv("AZURE_API_VERSION"),
                    "api_base": os.getenv("AZURE_AI_API_BASE"),
                },
                "tpm": 240000,
                "rpm": 1800,
            },
        ]
        router = Router(model_list=model_list)  # type: ignore
        response = await router.aembedding(
            model="azure-embedding-model", input=["hello from litellm!"]
        )
        await asyncio.sleep(2)
        assert len(customHandler.errors) == 0
        assert len(customHandler.states) == 3  # pre, post, success
        # failure
        model_list = [
            {
                "model_name": "azure-embedding-model",  # openai model name
                "litellm_params": {  # params for litellm completion/embedding call
                    "model": "azure/text-embedding-ada-002",
                    "api_key": "my-bad-key",
                    "api_version": os.getenv("AZURE_API_VERSION"),
                    "api_base": os.getenv("AZURE_AI_API_BASE"),
                },
                "tpm": 240000,
                "rpm": 1800,
            },
        ]
        litellm.logging_callback_manager._reset_all_callbacks()
        litellm.callbacks = [customHandler_failure]
        router3 = Router(model_list=model_list, num_retries=0)  # type: ignore
        try:
            response = await router3.aembedding(
                model="azure-embedding-model", input=["hello from litellm!"]
            )
            print(f"response in router3 aembedding: {response}")
        except Exception:
            pass
        await asyncio.sleep(1)
        print(f"customHandler.states: {customHandler_failure.states}")
        assert len(customHandler_failure.errors) == 0
        assert len(customHandler_failure.states) == 3  # pre, post, failure
        assert "async_failure" in customHandler_failure.states
    except Exception as e:
        print(f"Assertion Error: {traceback.format_exc()}")
        pytest.fail(f"An exception occurred - {str(e)}")


# asyncio.run(test_async_embedding_azure())


# Azure OpenAI call w/ Fallbacks
## COMPLETION
@pytest.mark.asyncio
async def test_async_chat_azure_with_fallbacks():
    try:
        customHandler_fallbacks = CompletionCustomHandler()
        litellm.callbacks = [customHandler_fallbacks]
        litellm.set_verbose = True
        # with fallbacks
        model_list = [
            {
                "model_name": "gpt-5-mini",  # openai model name
                "litellm_params": {  # params for litellm completion/embedding call
                    "model": "azure/gpt-4.1-mini",
                    "api_key": "my-bad-key",
                    "api_version": os.getenv("AZURE_API_VERSION"),
                    "api_base": os.getenv("AZURE_AI_API_BASE"),
                },
                "tpm": 240000,
                "rpm": 1800,
            },
            {
                "model_name": "gpt-3.5-turbo-16k",
                "litellm_params": {
                    "model": "gpt-3.5-turbo-16k",
                },
                "tpm": 240000,
                "rpm": 1800,
            },
        ]
        router = Router(
            model_list=model_list,
            fallbacks=[{"gpt-5-mini": ["gpt-3.5-turbo-16k"]}],
            retry_policy=litellm.router.RetryPolicy(
                AuthenticationErrorRetries=0,
            ),
        )  # type: ignore
        response = await router.acompletion(
            model="gpt-5-mini",
            messages=[{"role": "user", "content": "Hi 👋 - i'm openai"}],
        )
        await asyncio.sleep(2)
        print(f"customHandler_fallbacks.states: {customHandler_fallbacks.states}")
        assert len(customHandler_fallbacks.errors) == 0
        assert (
            len(customHandler_fallbacks.states) == 6
        )  # pre, post, failure, pre, post, success
        litellm.callbacks = []
    except Exception as e:
        print(f"Assertion Error: {traceback.format_exc()}")
        pytest.fail(f"An exception occurred - {str(e)}")


# asyncio.run(test_async_chat_azure_with_fallbacks())


# CACHING
## Test Azure - completion, embedding
@pytest.mark.asyncio
@pytest.mark.flaky(retries=3, delay=1)
async def test_async_completion_azure_caching():
    customHandler_caching = CompletionCustomHandler()
    litellm.cache = Cache(
        type="redis",
        host=os.environ["REDIS_HOST"],
        port=os.environ["REDIS_PORT"],
        password=os.environ["REDIS_PASSWORD"],
    )
    litellm.callbacks = [customHandler_caching]
    unique_time = time.time()
    model_list = [
        {
            "model_name": "gpt-4.1-nano",  # openai model name
            "litellm_params": {  # params for litellm completion/embedding call
                "model": "azure/gpt-4.1-mini",
                "api_key": os.getenv("AZURE_AI_API_KEY"),
                "api_version": os.getenv("AZURE_API_VERSION"),
                "api_base": os.getenv("AZURE_AI_API_BASE"),
            },
            "tpm": 240000,
            "rpm": 1800,
        },
        {
            "model_name": "gpt-3.5-turbo-16k",
            "litellm_params": {
                "model": "gpt-3.5-turbo-16k",
            },
            "tpm": 240000,
            "rpm": 1800,
        },
    ]
    router = Router(model_list=model_list)  # type: ignore
    response1 = await router.acompletion(
        model="gpt-4.1-nano",
        messages=[
            {"role": "user", "content": f"Hi 👋 - i'm async azure {unique_time}"}
        ],
        caching=True,
    )
    await asyncio.sleep(1)
    print(f"customHandler_caching.states pre-cache hit: {customHandler_caching.states}")
    response2 = await router.acompletion(
        model="gpt-4.1-nano",
        messages=[
            {"role": "user", "content": f"Hi 👋 - i'm async azure {unique_time}"}
        ],
        caching=True,
    )
    await asyncio.sleep(1)  # success callbacks are done in parallel
    print(
        f"customHandler_caching.states post-cache hit: {customHandler_caching.states}"
    )
    assert len(customHandler_caching.errors) == 0
    assert len(customHandler_caching.states) == 4  # pre, post, success, success


@pytest.mark.asyncio
async def test_async_completion_azure_caching_streaming():
    import copy
    import uuid

    litellm.set_verbose = True
    customHandler_caching = CompletionCustomHandler()
    litellm.cache = Cache(
        type="redis",
        host=os.environ["REDIS_HOST"],
        port=os.environ["REDIS_PORT"],
        password=os.environ["REDIS_PASSWORD"],
    )
    litellm.callbacks = [customHandler_caching]
    unique_time = uuid.uuid4()

    # Use Router instead of direct litellm.acompletion to get router-specific metadata
    model_list = [
        {
            "model_name": "gpt-4.1-nano",
            "litellm_params": {
                "model": "azure/gpt-4.1-mini",
                "api_key": os.getenv("AZURE_AI_API_KEY"),
                "api_version": os.getenv("AZURE_API_VERSION"),
                "api_base": os.getenv("AZURE_AI_API_BASE"),
            },
            "tpm": 240000,
            "rpm": 1800,
        },
    ]
    router = Router(model_list=model_list)

    response1 = await router.acompletion(
        model="gpt-4.1-nano",
        messages=[
            {"role": "user", "content": f"Hi 👋 - i'm async azure {unique_time}"}
        ],
        caching=True,
        stream=True,
    )
    async for chunk in response1:
        print(f"chunk in response1: {chunk}")
    await asyncio.sleep(1)
    initial_customhandler_caching_states = len(customHandler_caching.states)
    print(f"customHandler_caching.states pre-cache hit: {customHandler_caching.states}")
    response2 = await router.acompletion(
        model="gpt-4.1-nano",
        messages=[
            {"role": "user", "content": f"Hi 👋 - i'm async azure {unique_time}"}
        ],
        caching=True,
        stream=True,
    )
    async for chunk in response2:
        print(f"chunk in response2: {chunk}")
    await asyncio.sleep(1)  # success callbacks are done in parallel
    print(
        f"customHandler_caching.states post-cache hit: {customHandler_caching.states}"
    )
    assert len(customHandler_caching.errors) == 0
    assert (
        len(customHandler_caching.states) > initial_customhandler_caching_states
    )  # pre, post, streaming .., success, success


@pytest.mark.asyncio
@pytest.mark.flaky(retries=3, delay=2)
async def test_async_embedding_azure_caching():
    print("Testing custom callback input - Azure Caching")
    customHandler_caching = CompletionCustomHandler()
    litellm.cache = Cache(
        type="redis",
        host=os.environ["REDIS_HOST"],
        port=os.environ["REDIS_PORT"],
        password=os.environ["REDIS_PASSWORD"],
    )
    router = Router(
        model_list=[
            {
                "model_name": "text-embedding-3-small",
                "litellm_params": {
                    "model": "openai/text-embedding-3-small",
                },
            }
        ]
    )
    litellm.callbacks = [customHandler_caching]
    unique_time = time.time()
    response1 = await router.aembedding(
        model="text-embedding-3-small",
        input=[f"good morning from litellm1 {unique_time}"],
        caching=True,
    )
    await asyncio.sleep(1)  # set cache is async for aembedding()
    response2 = await router.aembedding(
        model="text-embedding-3-small",
        input=[f"good morning from litellm1 {unique_time}"],
        caching=True,
    )
    await asyncio.sleep(1)  # success callbacks are done in parallel
    print(customHandler_caching.states)
    print(customHandler_caching.errors)
    assert len(customHandler_caching.errors) == 0
    assert len(customHandler_caching.states) == 4  # pre, post, success, success


@pytest.mark.asyncio
async def test_rate_limit_error_callback():
    """
    Assert a callback is hit, if a model group starts hitting rate limit errors

    Relevant issue: https://github.com/BerriAI/litellm/issues/4096
    """
    from litellm.litellm_core_utils.litellm_logging import Logging as LiteLLMLogging

    customHandler = CompletionCustomHandler()
    litellm.callbacks = [customHandler]
    litellm.success_callback = []

    router = Router(
        model_list=[
            {
                "model_name": "my-test-gpt",
                "litellm_params": {
                    "model": "gpt-5-mini",
                    "mock_response": "litellm.RateLimitError",
                },
            }
        ],
        allowed_fails=2,
        num_retries=0,
    )

    litellm_logging_obj = LiteLLMLogging(
        model="my-test-gpt",
        messages=[{"role": "user", "content": "hi"}],
        stream=False,
        call_type="acompletion",
        litellm_call_id="1234",
        start_time=datetime.now(),
        function_id="1234",
    )

    try:
        _ = await router.acompletion(
            model="my-test-gpt",
            messages=[{"role": "user", "content": "Hey, how's it going?"}],
        )
    except Exception:
        pass

    with patch.object(
        customHandler, "log_model_group_rate_limit_error", new=AsyncMock()
    ) as mock_client:
        print(
            f"customHandler.log_model_group_rate_limit_error: {customHandler.log_model_group_rate_limit_error}"
        )

        try:
            _ = await router.acompletion(
                model="my-test-gpt",
                messages=[{"role": "user", "content": "Hey, how's it going?"}],
                litellm_logging_obj=litellm_logging_obj,
            )
        except (litellm.RateLimitError, ValueError):
            pass

        await asyncio.sleep(3)
        mock_client.assert_called_once()

        assert "original_model_group" in mock_client.call_args.kwargs
        assert mock_client.call_args.kwargs["original_model_group"] == "my-test-gpt"
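Every hook in `CompletionCustomHandler` wraps its assertions in `try`/`except` and appends failures to `self.errors`. Each test then asserts that `errors` is empty and counts `states`. The sketch below shows why that pattern is useful, without depending on litellm. It assumes, though this file does not confirm it, that the logging layer catches exceptions raised inside callbacks. `dispatch`, `RecordingHandler`, and `simulate_failed_call` are hypothetical names.

```python
import traceback
from typing import Any, Callable, Dict, List


def dispatch(hook: Callable[[Dict[str, Any]], None], kwargs: Dict[str, Any]) -> None:
    """Hypothetical logging layer that runs a hook and swallows its errors."""
    try:
        hook(kwargs)
    except Exception:
        pass  # an exception raised inside a hook never reaches the test


class RecordingHandler:
    """Simplified stand-in for CompletionCustomHandler."""

    def __init__(self) -> None:
        self.states: List[str] = []
        self.errors: List[str] = []

    def record(self, state: str, kwargs: Dict[str, Any]) -> None:
        self.states.append(state)
        try:
            assert isinstance(kwargs["model"], str)
            assert isinstance(kwargs["messages"], list)
        except Exception:
            # Stored instead of raised, so the test body can assert on it
            self.errors.append(traceback.format_exc())


def simulate_failed_call(handler: RecordingHandler, kwargs: Dict[str, Any]) -> None:
    # Hook order the failure tests above count: pre, post, failure
    for state in ("sync_pre_api_call", "post_api_call", "async_failure"):
        dispatch(lambda kw, s=state: handler.record(s, kw), kwargs)


ok = RecordingHandler()
simulate_failed_call(ok, {"model": "gpt-5-mini", "messages": []})
bad = RecordingHandler()
simulate_failed_call(bad, {"model": None, "messages": []})
print(len(ok.errors), len(bad.errors))  # 0 3
```

If the assertions were raised instead of recorded, the `bad` case would look identical to the `ok` case from the test's point of view. Collecting them is what lets `assert len(handler.errors) == 0` catch a wrong input type.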