Commit Graph

8 Commits

Author SHA1 Message Date
Ephrim Stanley ae0769b1df fix: guard empty-dict team limits and malformed int in deployment default limits
- Change `if team_limit:` to `if team_limit is not None:` in both
  get_key_model_rpm_limit and get_key_model_tpm_limit so that an
  explicitly-empty team rate-limit map ({}) is returned as-is instead
  of silently falling through to deployment defaults (P1 fix).
- Replace the bare `int()` list comprehension in _get_deployment_default_limit
  with a loop that catches ValueError/TypeError so malformed config strings
  do not raise an unhandled exception during request handling (P2 fix).
- Add corresponding unit tests for both edge cases.

Co-Authored-By: Claude (claude-sonnet-4-6) <noreply@anthropic.com>
2026-03-19 07:40:47 -04:00
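The two guards described in ae0769b1df can be sketched as follows. This is a minimal illustration, not the project's code: `resolve_team_limit` and `parse_default_limits` are hypothetical names, and skipping malformed entries (rather than, say, logging them) is an assumption about how the caught errors are handled.

```python
from typing import Any, Dict, Iterable, List, Optional


def resolve_team_limit(
    team_limit: Optional[Dict[str, int]],
    deployment_default: Optional[Dict[str, int]],
) -> Optional[Dict[str, int]]:
    """Return the team map whenever it is set, even if it is empty."""
    # `if team_limit:` would treat {} as missing and fall through
    # to the deployment default; `is not None` keeps the empty map.
    if team_limit is not None:
        return team_limit
    return deployment_default


def parse_default_limits(raw_values: Iterable[Any]) -> List[int]:
    """Convert config values to ints, skipping malformed entries."""
    parsed: List[int] = []
    for value in raw_values:
        try:
            parsed.append(int(value))
        except (ValueError, TypeError):
            # Malformed config (e.g. "abc" or None) is skipped, not raised.
            continue
    return parsed
```

With these definitions, `resolve_team_limit({}, {"gpt-4": 100})` returns the empty map, and `parse_default_limits(["100", "abc", None, 50])` keeps only the two values that convert cleanly.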
Ephrim Stanley 477c54184b perf: avoid unconditional router lookups in success handler
Replace bare _get_deployment_default_tpm/rpm_limit calls in the
async_log_success_event condition with get_key_model_tpm/rpm_limit
(model_name=model_group). The higher-level getters short-circuit on
key/team metadata hits before ever reaching the router, so requests
that don't use deployment defaults incur no extra router lookup. Remove
the now-unused bare helper imports.

Also fix invalid `int = None` type hints in test helper signatures
to `Optional[int] = None`.

Co-Authored-By: Claude (claude-sonnet-4-6) <noreply@anthropic.com>
2026-03-19 02:07:50 -04:00
Ephrim Stanley 36dc893770 fix: address review feedback on default tpm/rpm limits
- Use min() across all matching deployments instead of first-wins when
  resolving default_api_key_tpm/rpm_limit for a model group, so
  load-balanced setups with different per-deployment limits always apply
  the most conservative value
- Replace the global SensitiveDataMasker non_sensitive_overrides change
  with a targeted excluded_keys set at the remove_sensitive_info_from_deployment
  call site, avoiding unintended suppression of other fields
- Update the v1 parallel request limiter to pass model_name to
  get_key_model_tpm/rpm_limit so deployment defaults apply there too
- Add 4 tests covering multi-deployment min semantics

Co-Authored-By: Claude (claude-sonnet-4-6) <noreply@anthropic.com>
2026-03-19 01:43:27 -04:00
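The min-across-deployments rule from 36dc893770 can be sketched as below. The helper name `most_conservative_limit` and the flat deployment dictionaries are assumptions made for illustration; the real router stores these fields in its own structures.

```python
from typing import Any, Dict, List, Optional


def most_conservative_limit(
    deployments: List[Dict[str, Any]],
    model_group: str,
    field: str,
) -> Optional[int]:
    """Smallest `field` value among deployments serving `model_group`."""
    limits = [
        d[field]
        for d in deployments
        if d.get("model_name") == model_group and isinstance(d.get(field), int)
    ]
    # Take min() over every match instead of returning the first one found.
    return min(limits) if limits else None


# Hypothetical load-balanced setup: two deployments share one model group.
deployments = [
    {"model_name": "gpt-4", "default_api_key_rpm_limit": 100},
    {"model_name": "gpt-4", "default_api_key_rpm_limit": 20},
    {"model_name": "claude", "default_api_key_rpm_limit": 5},
]
```

A first-wins lookup on this list would apply 100 to the `gpt-4` group, while the min rule applies 20 regardless of deployment order.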
Ephrim Stanley cac685014f feat: add proxy-wide default tpm/rpm limits per deployment
Adds `default_api_key_tpm_limit` and `default_api_key_rpm_limit` to
`GenericLiteLLMParams` so operators can set per-deployment rate limit
defaults in config.yaml. When a key has no model-specific tpm/rpm limit
configured, the proxy falls back to these deployment defaults (Case 2 in
spec). Key-level limits always take priority (Case 1).

- Extends `get_key_model_tpm_limit` / `get_key_model_rpm_limit` with a
  `model_name` param and a priority-4 deployment-default fallback
- Passes `model_name=requested_model` in the parallel request limiter so
  the fallback is triggered at enforcement time
- Adds `"limit"` to `SensitiveDataMasker` non-sensitive overrides so
  `*_limit` fields are not masked in `/model/info` responses
- Adds 17 unit tests covering both spec cases and the `/model/info` path

Co-Authored-By: Claude (claude-sonnet-4-6) <noreply@anthropic.com>
2026-03-19 01:30:18 -04:00
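The precedence between Case 1 and Case 2 in cac685014f can be sketched as follows. This is a simplified illustration under stated assumptions: `effective_rpm_limit` is a hypothetical name, it returns a single number rather than a limit map, and it collapses the intermediate priorities that sit between the key-level limit and the priority-4 deployment default.

```python
from typing import Dict, Optional


def effective_rpm_limit(
    key_model_limits: Optional[Dict[str, int]],
    deployment_defaults: Dict[str, int],
    model_name: Optional[str],
) -> Optional[int]:
    """Pick the rpm limit for a request, key-level limit first."""
    # Case 1: a model-specific limit on the key always takes priority.
    if key_model_limits is not None and model_name in key_model_limits:
        return key_model_limits[model_name]
    # Case 2: no key-level limit, so fall back to the deployment default.
    if model_name is not None:
        return deployment_defaults.get(model_name)
    # Without a model name there is nothing to look up.
    return None
```

Passing `model_name` is what makes the fallback reachable: when it is `None`, the sketch can only ever return a key-level limit or nothing.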
brtydse100 dd1ea3d39e Support multiple headers mapped to the customer user role (#23664)
* added the header mapping feature

* added tests

* final cleanup

* final cleanup

* added missing test and logic

* fixed header sending bug

* Update litellm/proxy/auth/auth_utils.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* added back init file in responses + fixed test_auth_utils.py in local_testing

---------

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
2026-03-15 14:20:45 +05:30
Jay Prajapati 582d324a76 fix(proxy): support slashes in google generateContent model names (#19737)
* fix(proxy): support slashes in google route params

* fix(proxy): extract google model ids with slashes

* test(proxy): cover google model ids with slashes
2026-01-25 22:59:50 -08:00
Ishaan Jaff 117c7dd158 [Feat] Claude Code - Add End-user tracking with Claude Code (#19171)
* add claude code customer usage tracking

* fix get end user tracking claude code

* TestGetCustomerIdFromStandardHeaders
2026-01-15 17:57:10 -08:00
Harshit Jain 8a683d9a6a Add fix for bedrock_cache, metadata and max_model_budget (#18872)
2026-01-10 01:09:00 +05:30