- Change `if team_limit:` to `if team_limit is not None:` in both
get_key_model_rpm_limit and get_key_model_tpm_limit so that an
explicitly empty team rate-limit map ({}) is returned as-is instead
of silently falling through to deployment defaults (P1 fix).
- Replace the bare `int()` list comprehension in _get_deployment_default_limit
with a loop that catches ValueError/TypeError so malformed config strings
do not raise an unhandled exception during request handling (P2 fix).
- Add corresponding unit tests for both edge cases.
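
A minimal sketch of the two edge cases above. The helper names
(`pick_team_limits`, `parse_limits`) are hypothetical and stand in for the
real proxy functions; skipping malformed values is an assumption about how
the loop reacts once the exception is caught.

```python
from typing import Any, Dict, List, Optional


def pick_team_limits(
    team_limit: Optional[Dict[str, int]],
    deployment_default: Dict[str, int],
) -> Dict[str, int]:
    # `if team_limit:` would treat {} as "not set" and fall through.
    # `is not None` keeps an explicitly empty map as the answer.
    if team_limit is not None:
        return team_limit
    return deployment_default


def parse_limits(raw_values: List[Any]) -> List[int]:
    # Convert one value at a time so a single malformed entry cannot
    # raise out of the whole computation (assumed behavior: skip it).
    parsed: List[int] = []
    for raw in raw_values:
        try:
            parsed.append(int(raw))
        except (ValueError, TypeError):
            continue
    return parsed
```

With these definitions, an empty team map is returned unchanged, while
`None` still falls through to the deployment default.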
Co-Authored-By: Claude (claude-sonnet-4-6) <noreply@anthropic.com>
Replace bare _get_deployment_default_tpm/rpm_limit calls in the
async_log_success_event condition with get_key_model_tpm/rpm_limit
(model_name=model_group). The higher-level getters short-circuit on
key/team metadata hits before ever reaching the router, so requests
that don't use deployment defaults incur no extra router lookup. Remove
the now-unused bare helper imports.
Also change the invalid `int = None` type hints in test helper
signatures to `Optional[int] = None`.
Co-Authored-By: Claude (claude-sonnet-4-6) <noreply@anthropic.com>
- Use min() across all matching deployments, instead of taking the first
match, when resolving default_api_key_tpm/rpm_limit for a model group, so
load-balanced setups with different per-deployment limits always apply
the most conservative value
- Replace the global SensitiveDataMasker non_sensitive_overrides change
with a targeted excluded_keys set at the remove_sensitive_info_from_deployment
call site, avoiding unintended suppression of other fields
- Update the v1 parallel request limiter to pass model_name to
get_key_model_tpm/rpm_limit so deployment defaults apply there too
- Add 4 tests covering multi-deployment min semantics
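
The min semantics could look roughly like the sketch below. The function
name `min_default_limit` and the plain-dict deployment shape
(`model_name` plus `litellm_params`) are assumptions made for
illustration, not the router's actual data model.

```python
from typing import Any, Dict, List, Optional


def min_default_limit(
    deployments: List[Dict[str, Any]],
    model_group: str,
    field: str,
) -> Optional[int]:
    # Collect the limit from every deployment in the model group,
    # not just the first match.
    limits: List[int] = []
    for deployment in deployments:
        if deployment.get("model_name") != model_group:
            continue
        value = deployment.get("litellm_params", {}).get(field)
        if value is None:
            continue
        try:
            limits.append(int(value))
        except (ValueError, TypeError):
            # Ignore malformed values rather than failing the request.
            continue
    # The most conservative (smallest) limit wins; None if nothing is set.
    return min(limits) if limits else None
```

Taking the minimum means a request is never allowed more than the
strictest deployment in the group permits, regardless of which deployment
the load balancer ends up choosing.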
Co-Authored-By: Claude (claude-sonnet-4-6) <noreply@anthropic.com>
Adds `default_api_key_tpm_limit` and `default_api_key_rpm_limit` to
`GenericLiteLLMParams` so operators can set per-deployment rate limit
defaults in config.yaml. When a key has no model-specific tpm/rpm limit
configured, the proxy falls back to these deployment defaults (Case 2 in
spec). Key-level limits always take priority (Case 1).
- Extends `get_key_model_tpm_limit` / `get_key_model_rpm_limit` with a
`model_name` param and a priority-4 deployment-default fallback
- Passes `model_name=requested_model` in the parallel request limiter so
the fallback is triggered at enforcement time
- Adds `"limit"` to `SensitiveDataMasker` non-sensitive overrides so
`*_limit` fields are not masked in `/model/info` responses
- Adds 17 unit tests covering both spec cases and the `/model/info` path
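
A simplified sketch of the fallback order described above. It is not the
actual `get_key_model_tpm_limit` implementation: the function name
`resolve_limits`, its arguments, and the collapse of the earlier
priorities into just "key" and "team" are assumptions. The
`lookup_default` callable stands in for the router lookup.

```python
from typing import Callable, Dict, Optional


def resolve_limits(
    key_limits: Optional[Dict[str, int]],
    team_limits: Optional[Dict[str, int]],
    model_name: Optional[str],
    lookup_default: Callable[[str], Optional[int]],
) -> Optional[Dict[str, int]]:
    # Key-level limits always take priority (Case 1).
    if key_limits is not None:
        return key_limits
    # Team-level limits are checked next.
    if team_limits is not None:
        return team_limits
    # Deployment default is the last resort (Case 2). The lookup only
    # runs when nothing above matched and a model name was passed in.
    if model_name is not None:
        default = lookup_default(model_name)
        if default is not None:
            return {model_name: default}
    return None
```

Because the deployment lookup sits last, requests that already carry key
or team limits never trigger it.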
Co-Authored-By: Claude (claude-sonnet-4-6) <noreply@anthropic.com>
* fix(proxy): support slashes in google route params
* fix(proxy): extract google model ids with slashes
* test(proxy): cover google model ids with slashes