mirror of
https://github.com/tiennm99/litellm.git
synced 2026-06-18 00:48:01 +00:00
17e145a083
The model_max_budget limiter tracks spend in one code path (async_log_success_event) and enforces budget limits in another (is_key_within_model_budget via user_api_key_auth). These two paths used different model name formats to build cache keys:

- Tracking used standard_logging_payload["model"], which is the deployment-level model name (e.g. "vertex_ai/claude-opus-4-6@default")
- Enforcement used request_data["model"], which is the model group alias (e.g. "claude-opus-4-6")

Because the cache keys never matched, the enforcement path always read None for current spend, silently allowing all requests through even after the budget was exceeded. This affected any provider that decorates model names with provider prefixes or version suffixes (Vertex AI, Bedrock, etc.).

Fix: use model_group (the user-facing alias) from StandardLoggingPayload for spend tracking, falling back to model when model_group is None. This aligns the tracking cache key with the enforcement cache key.

Fixes the same root cause reported in #15223 and #10052.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
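
Illustrative sketch of the mismatch and the fix described above. This is not the actual limiter code: the in-memory dict, the key format, and the helper names (cache_key, track_spend, current_spend) are hypothetical stand-ins. Only the payload fields "model" and "model_group" and the example model names come from this message.

```python
from typing import Optional

# Hypothetical in-memory stand-in for the limiter's spend cache.
spend_cache: dict = {}


def cache_key(api_key: str, model: str) -> str:
    # Hypothetical key format, chosen only for illustration.
    return f"spend:{api_key}:{model}"


def track_spend(api_key: str, payload: dict, cost: float, use_model_group: bool) -> None:
    """Record spend, keyed either by the deployment name or by the group alias."""
    if use_model_group:
        # Fixed behavior: prefer the user-facing alias, fall back to "model".
        model = payload.get("model_group") or payload["model"]
    else:
        # Old behavior: deployment-level name, e.g. "vertex_ai/...@default".
        model = payload["model"]
    key = cache_key(api_key, model)
    spend_cache[key] = spend_cache.get(key, 0.0) + cost


def current_spend(api_key: str, request_data: dict) -> Optional[float]:
    """Enforcement-side lookup, keyed by the model name in the request."""
    return spend_cache.get(cache_key(api_key, request_data["model"]))


payload = {
    "model": "vertex_ai/claude-opus-4-6@default",
    "model_group": "claude-opus-4-6",
}
request_data = {"model": "claude-opus-4-6"}

# Old behavior: spend is written under a key the enforcement path never reads.
track_spend("sk-example", payload, 5.0, use_model_group=False)
spend_before_fix = current_spend("sk-example", request_data)

# Fixed behavior: both paths build the key from the group alias.
track_spend("sk-example", payload, 5.0, use_model_group=True)
spend_after_fix = current_spend("sk-example", request_data)
```

With the old keying, the enforcement lookup misses and returns None, so no budget comparison can fail. With the alias-based keying, the lookup sees the recorded spend.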