Commit Graph

149 Commits

Author SHA1 Message Date
Krrish Dholakia 4c00a14ce0 fix: fix ci/cd + handle oidc jwt tokens 2026-03-30 16:12:58 -07:00
Yuneng Jiang 6522d282b5 [Fix] Correct kwarg name in test_user_api_key_auth tests
PR #24755 renamed `azure_api_key_header` to `AZURE_AI_API_KEY_header` in
the test file but did not update the actual function signatures of
`get_api_key()` and `_user_api_key_auth_builder()`, causing TypeError
on all affected test cases.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 14:51:09 -07:00
Krrish Dholakia bc829d51f2 test: test 2026-03-28 19:17:38 -07:00
ryan-crabbe-berri 2eb3c20e76 Merge pull request #24718 from BerriAI/litellm_ryan-march-26
litellm ryan march 26
2026-03-28 09:01:11 -07:00
ryan-crabbe-berri 726a34627c Merge pull request #24717 from BerriAI/litellm_fix-user-cache-invalidation
fix(jwt): invalidate user cache after role/team sync updates
2026-03-27 19:50:41 -07:00
Ryan Crabbe dd11e77852 fix: add explicit TTL to cache writes and test coverage for user cache invalidation
Add DEFAULT_MANAGEMENT_OBJECT_IN_MEMORY_CACHE_TTL to both async_set_cache
calls in sync_user_role_and_teams for consistency with all other user cache
writes. Add 3 tests covering cache invalidation on role change, team change,
and no-op when nothing changes.
2026-03-27 19:45:13 -07:00
Ryan Crabbe 8e3755931d test(auth): add regression tests for JWTHandler.is_jwt(None)
Add None-token test cases to both proxy_unit_tests and test_litellm
to cover the guard added in the previous commit. Also add -> bool
return type annotation to is_jwt().
2026-03-27 16:51:08 -07:00
michelligabriele d533b432fd fix(proxy): enforce budget limits across multi-pod deployments via Redis-backed spend counters
Budget checks on API keys, teams, and team members were not enforced in
multi-pod deployments because user_api_key_cache is intentionally
in-memory-only. Each pod tracked spend independently, so with N pods
the effective budget was N × max_budget.

Introduces a separate spend_counter_cache (DualCache wired to
redis_usage_cache) with atomic increment/read helpers:
- increment_spend_counters(): awaited in cost callback (not create_task)
  to update both in-memory and Redis before the next auth check
- get_current_spend(): reads Redis first (cross-pod authoritative),
  falls back to in-memory, then to cached object .spend from DB

Budget check functions (_virtual_key_max_budget_check,
_team_max_budget_check, _check_team_member_budget) now read spend via
get_current_spend() instead of cached object .spend fields.

When Redis is not configured, falls back to in-memory-only counters
(same as current single-instance behavior).

Fixes #23714
2026-03-27 20:39:52 +01:00
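
The read order described in the commit above (Redis first, then in-memory, then the cached DB value) can be sketched roughly as follows. This is only an illustration: plain dicts stand in for Redis and the in-memory cache, and the signature of `get_current_spend` here is an assumption, not the actual LiteLLM one.

```python
from typing import Dict, Optional


def get_current_spend(
    counter_key: str,
    shared_counters: Optional[Dict[str, float]],  # stand-in for Redis; None = not configured
    local_counters: Dict[str, float],             # stand-in for the per-pod in-memory cache
    cached_db_spend: float,                       # stand-in for the cached object's .spend
) -> float:
    """Toy read path: cross-pod store first, then local counter, then DB value."""
    if shared_counters is not None and counter_key in shared_counters:
        return shared_counters[counter_key]
    if counter_key in local_counters:
        return local_counters[counter_key]
    return cached_db_spend


# Three pods each saw 4.0 of spend locally; the shared counter holds the sum.
shared = {"key:abc": 12.0}
pod_local = {"key:abc": 4.0}
print(get_current_spend("key:abc", shared, pod_local, 0.0))  # shared value wins
print(get_current_spend("key:abc", None, pod_local, 0.0))    # single-instance fallback
```

With a budget of 10.0, a pod that only looks at its local 4.0 would keep admitting requests, while the shared 12.0 shows the budget is already exceeded.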
yuneng-jiang 262534a3a5 Merge branch 'main' into litellm_dev_sameer_16_march_week 2026-03-21 14:30:57 -07:00
Ishaan Jaff 2ea9e207bd Litellm ishaan march 20 (#24303)
* feat(redis): add circuit breaker to RedisCache to fast-fail when Redis is down (#24181)

* feat(redis): add circuit breaker env var constants

* feat(redis): add RedisCircuitBreaker and apply guard decorator to all async ops

* fix(dual_cache): fall back to L1 instead of re-raising on Redis increment failures

* test(caching): add circuit breaker unit tests

* fix(redis): fast-fail concurrent HALF_OPEN probes — only one probe at a time

* fix(dual_cache): return None fallback when in_memory_cache is absent and Redis fails

* test(caching): add regression tests for HALF_OPEN concurrency and None fallback
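
A minimal sketch of the circuit breaker pattern named in the entries above, including the "only one HALF_OPEN probe at a time" behaviour. The class name, method names, and the injectable clock are hypothetical and are not taken from `RedisCircuitBreaker`; the sketch also assumes a single-threaded caller such as one asyncio event loop.

```python
import time
from typing import Callable


class SimpleCircuitBreaker:
    """Hypothetical illustration; not the LiteLLM implementation."""

    CLOSED, OPEN, HALF_OPEN = "closed", "open", "half_open"

    def __init__(
        self,
        failure_threshold: int = 3,
        recovery_timeout: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self._clock = clock
        self.state = self.CLOSED
        self._failures = 0
        self._opened_at = 0.0

    def allow_request(self) -> bool:
        if self.state == self.CLOSED:
            return True
        if self.state == self.OPEN:
            if self._clock() - self._opened_at >= self.recovery_timeout:
                # This caller becomes the single probe.
                self.state = self.HALF_OPEN
                return True
            return False
        # HALF_OPEN: a probe is already in flight, so fast-fail other callers.
        return False

    def record_success(self) -> None:
        self.state = self.CLOSED
        self._failures = 0

    def record_failure(self) -> None:
        self._failures += 1
        if self.state == self.HALF_OPEN or self._failures >= self.failure_threshold:
            self.state = self.OPEN
            self._opened_at = self._clock()
```

A caller would check `allow_request()` before touching the backend and report the outcome with `record_success()` or `record_failure()`.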

* Fix blocking sync next in __anext__ (#24177)

* Fix blocking sync next

* Update tests/test_litellm/litellm_core_utils/test_streaming_handler.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* fix PEP 479 regression in __anext__ sync iterator exhaustion

asyncio.to_thread re-raises thread exceptions inside a coroutine, where
PEP 479 converts StopIteration to RuntimeError before any except clause
can catch it. Add _next_sync_or_exhausted() module-level helper that
catches StopIteration in the thread and returns a sentinel instead, then
raise StopAsyncIteration in the coroutine.

Also rewrites the non-blocking test to use asyncio.gather() instead of
asyncio.create_task() (which returned None on Python 3.9 / pytest-asyncio
in CI), and adds an exhaustion regression test that drains the wrapper
fully and asserts no RuntimeError leaks out.
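
The sentinel approach described above can be sketched like this. The helper and wrapper names are hypothetical (the commit's own helper is `_next_sync_or_exhausted()`, whose body is not shown here), and `asyncio.to_thread` requires Python 3.9 or newer.

```python
import asyncio
from typing import Any, Iterator, List

_EXHAUSTED = object()  # unique marker meaning "the sync iterator is done"


def next_or_sentinel(iterator: Iterator[Any]) -> Any:
    """Runs in a worker thread; never lets StopIteration escape the thread."""
    try:
        return next(iterator)
    except StopIteration:
        return _EXHAUSTED


class AsyncIteratorWrapper:
    """Wraps a blocking sync iterator so that it can be used with `async for`."""

    def __init__(self, iterator: Iterator[Any]) -> None:
        self._iterator = iterator

    def __aiter__(self) -> "AsyncIteratorWrapper":
        return self

    async def __anext__(self) -> Any:
        item = await asyncio.to_thread(next_or_sentinel, self._iterator)
        if item is _EXHAUSTED:
            # Raised in the coroutine itself, so no StopIteration is involved.
            raise StopAsyncIteration
        return item


async def drain(items: List[Any]) -> List[Any]:
    return [item async for item in AsyncIteratorWrapper(iter(items))]
```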

---------

Co-authored-by: Emerson Gomes <emerson.gomes@thalesgroup.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* feat: add git-subdir source type to claude-code/plugins API (#24223)

Support a third plugin source type `git-subdir` alongside the existing
`github` and `url` types, as documented in the official Claude Code
plugin marketplaces spec.

New format: {"source": "git-subdir", "url": "...", "path": "subdir/path"}

- Validates url and path fields are present and non-empty
- Rejects absolute paths, '..' segments, backslashes, and percent-encoded
  traversal sequences (including double-encoded variants via regex check)
- Extracts path validation into _validate_git_subdir_path() helper
- Updates Pydantic field description to document all three source types
- Adds isValidUrl() check for url/git-subdir source types in the UI form
- Adds "Git Subdir" option to the UI form with a required Path field
- Adds unit tests covering success, update, missing/empty fields,
  path traversal variants, and unknown source type
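
A rough sketch of the path checks listed above. The function name, the error messages, and the exact regex are assumptions for illustration; the real `_validate_git_subdir_path()` may differ.

```python
import re

# Matches %2e, %2f, %5c and their multiply-encoded forms such as %252e.
_ENCODED_TRAVERSAL = re.compile(r"%(25)*(2e|2f|5c)", re.IGNORECASE)


def validate_subdir_path(path: str) -> str:
    """Return the path unchanged if it looks safe, else raise ValueError."""
    if not path or not path.strip():
        raise ValueError("path must be non-empty")
    if path.startswith("/"):
        raise ValueError("absolute paths are not allowed")
    if "\\" in path:
        raise ValueError("backslashes are not allowed")
    if _ENCODED_TRAVERSAL.search(path):
        raise ValueError("percent-encoded traversal sequences are not allowed")
    if ".." in path.split("/"):
        raise ValueError("'..' segments are not allowed")
    return path
```

For example, `validate_subdir_path("plugins/my-plugin")` returns the path as given, while `"a/../b"` or `"%252e%252e/x"` raise `ValueError`.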

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] add extract_header and extract_footer to Mistral OCR supported params (#24213)

* docs: add git-subdir source type to claude-code plugin marketplace docs (#24289)

* fix(ui): swap J/K keyboard navigation in log details drawer (#24279) (#24286)

J should navigate down (next) and K should navigate up (previous),
matching vim/standard conventions.

* fix: use async_set_cache in user_api_key_auth hot path (#24302)

* fix: use async_set_cache in auth hot path to avoid blocking event loop

* test: assert no blocking set_cache call in _user_api_key_auth_builder

* test: broaden blocking call check to all sync DualCache methods

* test: fix regression test to actually catch blocking cache calls

* fix: ruff lint unused variable + UI build MessageManager error

- litellm/caching/redis_cache.py: remove unused variable 'e' in circuit
  breaker exception handler (F841)
- add_plugin_form.tsx: use MessageManager.error() instead of undefined
  message.error() for git URL validation

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* docs: add REDIS_CIRCUIT_BREAKER env vars to config_settings reference

Add REDIS_CIRCUIT_BREAKER_FAILURE_THRESHOLD and
REDIS_CIRCUIT_BREAKER_RECOVERY_TIMEOUT to the environment variables
reference table so test_env_keys.py passes.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

---------

Co-authored-by: Emerson Gomes <emerson.gomes@thalesgroup.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Vincenzo Barrea <manamana88@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Robert Kirscht <rkirscht242@gmail.com>
Co-authored-by: Imgyu Kim <kimimgo@gmail.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
2026-03-21 12:40:11 -07:00
Sameer Kankute 4f1e484a9b Merge branch 'main' into litellm_dev_sameer_16_march_week
Resolve conflicts in common_request_processing.py (keep main streaming,
post_call_success_hook try/finally, deferred logging; retain skip_pre_call_logic)
and utils.py (defer + internal-call skip + sync success callbacks for all calls).

Tighten _has_post_call_guardrails for event_hook=None; align deferred
guardrail test. Sync model_prices_and_context_window_backup.json.

Pyright: narrow ignores for passthrough StreamingResponse and post_call hook.
Made-with: Cursor
2026-03-22 00:29:38 +05:30
Ephrim Stanley ae0769b1df fix: guard empty-dict team limits and malformed int in deployment default limits
- Change `if team_limit:` to `if team_limit is not None:` in both
  get_key_model_rpm_limit and get_key_model_tpm_limit so that an
  explicitly-empty team rate-limit map ({}) is returned as-is instead
  of silently falling through to deployment defaults (P1 fix).
- Replace the bare `int()` list comprehension in _get_deployment_default_limit
  with a loop that catches ValueError/TypeError so malformed config strings
  do not raise an unhandled exception during request handling (P2 fix).
- Add corresponding unit tests for both edge cases.

Co-Authored-By: Claude (claude-sonnet-4-6) <noreply@anthropic.com>
2026-03-19 07:40:47 -04:00
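
The two edge cases in the commit above can be illustrated with a small sketch. The function names below are hypothetical stand-ins, not the LiteLLM helpers.

```python
from typing import Dict, List, Optional


def pick_limit_truthy(
    team_limit: Optional[Dict[str, int]], default: Dict[str, int]
) -> Dict[str, int]:
    # Buggy pattern: an explicitly empty map {} is falsy and falls through.
    if team_limit:
        return team_limit
    return default


def pick_limit_explicit(
    team_limit: Optional[Dict[str, int]], default: Dict[str, int]
) -> Dict[str, int]:
    # Fixed pattern: only a missing value (None) falls through to the default.
    if team_limit is not None:
        return team_limit
    return default


def parse_int_limits(raw_values: List[object]) -> List[int]:
    """Parse values one at a time so a malformed entry is skipped, not fatal."""
    parsed: List[int] = []
    for value in raw_values:
        try:
            parsed.append(int(value))  # type: ignore[call-overload]
        except (ValueError, TypeError):
            continue
    return parsed
```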
Ephrim Stanley 477c54184b perf: avoid unconditional router lookups in success handler
Replace bare _get_deployment_default_tpm/rpm_limit calls in the
async_log_success_event condition with get_key_model_tpm/rpm_limit
(model_name=model_group). The higher-level getters short-circuit on
key/team metadata hits before ever reaching the router, so requests
that don't use deployment defaults incur no extra router lookup. Remove
the now-unused bare helper imports.

Also fix invalid `int = None` type hints in test helper signatures
to `Optional[int] = None`.

Co-Authored-By: Claude (claude-sonnet-4-6) <noreply@anthropic.com>
2026-03-19 02:07:50 -04:00
Ephrim Stanley 36dc893770 fix: address review feedback on default tpm/rpm limits
- Use min() across all matching deployments instead of first-wins when
  resolving default_api_key_tpm/rpm_limit for a model group, so
  load-balanced setups with different per-deployment limits always apply
  the most conservative value
- Replace the global SensitiveDataMasker non_sensitive_overrides change
  with a targeted excluded_keys set at the remove_sensitive_info_from_deployment
  call site, avoiding unintended suppression of other fields
- Update the v1 parallel request limiter to pass model_name to
  get_key_model_tpm/rpm_limit so deployment defaults apply there too
- Add 4 tests covering multi-deployment min semantics

Co-Authored-By: Claude (claude-sonnet-4-6) <noreply@anthropic.com>
2026-03-19 01:43:27 -04:00
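
The "most conservative value" rule from the first bullet above might look like the following. The deployment dict shape and the function name are assumptions made for this sketch.

```python
from typing import Dict, List, Optional


def resolve_default_limit(
    deployments: List[Dict[str, object]], model_group: str, field: str
) -> Optional[int]:
    """Take the minimum configured limit across all deployments in a group."""
    values = [
        int(d[field])  # type: ignore[call-overload]
        for d in deployments
        if d.get("model_name") == model_group and d.get(field) is not None
    ]
    return min(values) if values else None


deployments = [
    {"model_name": "chat", "default_api_key_rpm_limit": 100},
    {"model_name": "chat", "default_api_key_rpm_limit": 40},
    {"model_name": "embed"},
]
print(resolve_default_limit(deployments, "chat", "default_api_key_rpm_limit"))
```

With first-wins semantics the result would depend on list order; taking `min()` makes it order-independent.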
Ephrim Stanley cac685014f feat: add proxy-wide default tpm/rpm limits per deployment
Adds `default_api_key_tpm_limit` and `default_api_key_rpm_limit` to
`GenericLiteLLMParams` so operators can set per-deployment rate limit
defaults in config.yaml. When a key has no model-specific tpm/rpm limit
configured, the proxy falls back to these deployment defaults (Case 2 in
spec). Key-level limits always take priority (Case 1).

- Extends `get_key_model_tpm_limit` / `get_key_model_rpm_limit` with a
  `model_name` param and a priority-4 deployment-default fallback
- Passes `model_name=requested_model` in the parallel request limiter so
  the fallback is triggered at enforcement time
- Adds `"limit"` to `SensitiveDataMasker` non-sensitive overrides so
  `*_limit` fields are not masked in `/model/info` responses
- Adds 17 unit tests covering both spec cases and the `/model/info` path

Co-Authored-By: Claude (claude-sonnet-4-6) <noreply@anthropic.com>
2026-03-19 01:30:18 -04:00
Sameer Kankute ab1744f9fe fix(proxy): scope wildcard cleanup to subpath entries and restore registry in test
- Only remove wildcard path from openai_routes when the route entry has
  type="subpath", avoiding accidental removal when two endpoints share
  the same base path but differ in include_subpath
- Clean up _registered_pass_through_routes in the test finally block to
  prevent stale entries from polluting subsequent tests on failure
2026-03-19 09:53:22 +05:30
Sameer Kankute 97b7358791 fix(proxy): dedup openai_routes on reload and clean up on endpoint removal
- Add dedup guard for base path registration (prevents unbounded list
  growth on config reload)
- Clean up base path and wildcard path from openai_routes when an
  endpoint is removed via remove_endpoint_routes
- Rewrite test to exercise initialize_pass_through_endpoints directly,
  covering registration, dedup on reload, and cleanup on removal
2026-03-19 09:44:30 +05:30
Sameer Kankute 4829de6102 fix(proxy): allow non-admin users to access pass-through subpath routes with auth
When a pass-through endpoint has both auth=true and include_subpath=true,
non-admin users got 401 errors on subpath requests because only the base
path was registered in openai_routes. Now the wildcard path is also
registered so the auth check recognizes subpath requests as LLM API routes.

Also fixes pre-existing pyright error where logging_obj was possibly
unbound in the except block.
2026-03-19 09:28:55 +05:30
brtydse100 dd1ea3d39e Support multiple headers mapped to the customer user role (#23664)
* added the header mapping feature

* added tests

* final cleanup

* final cleanup

* added missing test and logic

* fixed header sending bug

* Update litellm/proxy/auth/auth_utils.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* added back init file in responses + fixed test_auth_utils.py in local_testing

---------

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
2026-03-15 14:20:45 +05:30
yuneng-jiang 0b3dc00440 Merge remote-tracking branch 'origin' into litellm_internal_dev_03_12_2026 2026-03-13 15:11:49 -07:00
Chesars 4e6e1d8de8 merge: resolve conflicts with upstream staging (bedrock + mcp tests)
Keep both sets of tests: upstream's OAuth2 token injection test and
our case-insensitive tool matching tests. Use upstream's version of
the bedrock output_config test (more comprehensive).
2026-03-12 13:40:16 -03:00
Chesars feed274aa3 Reapply "feat: add model_cost aliases expansion support"
This reverts commit 3d2df7e8b5.
2026-03-12 13:36:57 -03:00
Chesars 1be6b31e2f merge: resolve conflicts between main and litellm_oss_staging_03_11_2026 2026-03-12 09:38:31 -03:00
Cursor Agent 679b8fd52a test: add unit tests for /v2/user/info endpoint and route checks
- 9 tests for the endpoint: admin access, self-lookup, unauthorized access,
  default to self, nonexistent user, response shape, team admin access,
  team admin denied, URL encoding
- 2 tests for route checks: route in info_routes, route access control

Co-authored-by: yuneng-jiang <yuneng-jiang@users.noreply.github.com>
2026-03-12 07:46:31 +00:00
yuneng-jiang 76cff9ae0e Allow proxy_admin_viewer to access audit log endpoints
Add /audit and /audit/{id} to admin_viewer_routes so read-only admins
can view audit logs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-11 20:07:51 -07:00
Cesar Garcia 3d2df7e8b5 Revert "feat: add model_cost aliases expansion support" 2026-03-10 22:39:19 -03:00
yuneng-jiang 1755a281bd Fix mutation bug: copy lists in get_key_models to prevent corrupting cached UserAPIKeyAuth
`all_models = user_api_key_dict.models` was creating an alias, so
`_get_models_from_access_groups` (which uses `.pop()`/`.extend()`) would
mutate the cached object in-place. Now both `.models` and `.team_models`
assignments create copies via `list()`.

Added test to verify the input is not mutated.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 22:55:53 -07:00
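
The aliasing bug described above can be reproduced in a few lines. `expand_access_groups` is a hypothetical stand-in for a helper that mutates its argument with `.pop()` and `.extend()`; the group and model names are made up.

```python
from typing import Dict, List

ACCESS_GROUPS: Dict[str, List[str]] = {"beta-models": ["model-a", "model-b"]}


def expand_access_groups(models: List[str]) -> List[str]:
    """Replace access-group names with their member models, in place."""
    for idx in reversed(range(len(models))):
        if models[idx] in ACCESS_GROUPS:
            group = models.pop(idx)
            models.extend(ACCESS_GROUPS[group])
    return models


# Aliasing: the "cached" list is the same object that gets mutated.
cached_alias = ["beta-models", "gpt-x"]
all_models = cached_alias
expand_access_groups(all_models)

# Copying with list(): the cached list is left untouched.
cached_copy = ["beta-models", "gpt-x"]
expanded = expand_access_groups(list(cached_copy))
```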
yuneng-jiang 1cf191d9ad [Fix] Deduplicate model lists and remove dead assignment
Adds dedup to get_key_models and get_team_models to prevent duplicate
entries when access group member models overlap with proxy_model_list.
Removes dead assignment of all_models in get_team_models.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 22:44:45 -07:00
yuneng-jiang c829733200 [Fix] Include model access groups when expanding All Proxy Models
When a team has "all-proxy-models", the model list expansion now includes
model access group names so they appear in the UI key creation form.
Also fixes get_key_models not forwarding include_model_access_groups to
_get_models_from_access_groups, and removes unused _unfurl_all_proxy_models.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-09 22:30:07 -07:00
milan-berri df2e1bca46 feat: allow JWT and OAuth2 auth to coexist on the same instance (#23153)
When both enable_jwt_auth and enable_oauth2_auth are True, the proxy now
routes tokens based on their format:
- JWT tokens (3 dot-separated parts) -> JWT auth handler
- Opaque tokens -> OAuth2 auth handler

This enables using JWT for human users and OAuth2 for M2M (machine) clients
on the same LiteLLM instance. Previously, enabling OAuth2 would intercept
all tokens on LLM API routes before JWT auth could run.

When only one auth method is enabled, behavior is unchanged (backward compatible).
2026-03-09 08:41:27 -07:00
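
A simplified sketch of the format-based routing described above. Only the "three dot-separated parts" rule comes from the commit message; the function names, the returned handler labels, and the single-method branches are assumptions for illustration.

```python
from typing import Optional


def looks_like_jwt(token: Optional[str]) -> bool:
    """A JWT has exactly three dot-separated parts; None is never a JWT."""
    return token is not None and len(token.split(".")) == 3


def choose_auth_handler(
    token: Optional[str], enable_jwt_auth: bool, enable_oauth2_auth: bool
) -> str:
    if enable_jwt_auth and enable_oauth2_auth:
        # Both enabled: route on token shape.
        return "jwt" if looks_like_jwt(token) else "oauth2"
    if enable_oauth2_auth:
        return "oauth2"
    if enable_jwt_auth:
        return "jwt"
    return "api_key"
```

This only inspects the token's shape; signature and claim validation would still happen inside the chosen handler.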
yuneng-jiang 8bf3c0c67f fix: org admin invite user — multi-org selector, organizations list in POST body, auth check
- Thread org objects {organization_id, organization_alias} instead of bare IDs from
  users/page.tsx → view_users.tsx → CreateUserButton so the selector can show aliases
- Replace single-select org dropdown with multi-select; always shown when organizationIds
  is non-null; disabled/pre-selected for single-org admins; displays "Alias (id)"
- handleCreate: maps organization_ids → organizations before POST, removes redundant
  organizationMemberAddCall (backend _add_user_to_organizations handles it)
- _user_is_org_admin: also checks organizations list field in addition to singular
  organization_id so /user/new succeeds for org admins
- Add 5 backend unit tests for _user_is_org_admin and 2 frontend tests for new form behavior

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-07 20:34:12 -08:00
yuneng-jiang 67884c279a fix: allow any authenticated user to call /user/available_roles
Org admins and team admins opening the invite-user modal could not see
the 4 global proxy roles because GET /user/available_roles has no
request body, so the org-admin route check (which requires
organization_id in the payload) always returned False and blocked them.

Add /user/available_roles to self_managed_routes so the route-access
check passes for any authenticated user. The endpoint's existing
Depends(user_api_key_auth) still requires a valid API key.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-07 20:11:35 -08:00
v0rtex20k c64140e4c5 [Feat] extends OAuth2 M2M authentication support to info routes (/key/info, /team/info, /user/info, /model/info) (#22713)
* added info_route

* greptile pt1

* greptile pt2

* greptile pt3
2026-03-06 17:29:25 -08:00
Spencer Burridge c919031ff0 feat(proxy): include user_email in jwt upsert user creation (#22915)
* Include user_email in new user creation within get_user_object

Enhance the get_user_object function to include user_email in the parameters when creating a new user. This change is accompanied by a new test to verify that user_email is correctly included during the upsert process.

* Improve error handling in test_get_user_object by logging exceptions

Updated the test_get_user_object_upsert_includes_user_email function to log exceptions when they occur, enhancing the visibility of potential issues during testing. This change helps in diagnosing failures related to the mock LiteLLM_UserTable.
2026-03-05 10:55:11 -08:00
Harshit Jain 41b149ee93 Merge pull request #22678 from Harshit28j/litellm_custom_auth_opt_in
fix(proxy): make common_checks opt-in for custom auth
2026-03-04 14:53:44 +05:30
yuneng-jiang 0a1b2635d7 fix: allow team admins to access /key/{key}/reset_spend route
The route-level auth check was blocking internal_user role (team admins)
from reaching /key/{key}/reset_spend because KEY_RESET_SPEND was missing
from key_management_routes. Added it so team admins pass the route check
and the endpoint's existing _check_proxy_or_team_admin_for_key enforces
actual authorization.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-03-03 15:54:34 -08:00
Julio Quinteros Pro a07d041881 fix: apply same AsyncMock pattern to remaining OIDC discovery test
Address Greptile review: test_resolve_jwks_url_resolves_oidc_discovery_document
also used the inconsistent patch.object pattern.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 19:56:51 -03:00
Julio Quinteros Pro eb658693a3 fix: use direct AsyncMock assignment instead of patch.object in JWT tests
The patch.object with new_callable=AsyncMock can behave inconsistently
across Python versions, causing mock_response.status_code to return a
MagicMock instead of the assigned value. Direct assignment is simpler
and more reliable.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 19:51:09 -03:00
Harshit28j b44755db96 fix(proxy): make common_checks opt-in for custom auth via custom_auth_run_common_checks
Replaces the skip_route_check approach from PR #22662 with a configurable
opt-in flag. By default, common_checks() is not run for custom auth flows,
preserving backwards compatibility with pre-#22164 behavior.

Users who want budget/team/route enforcement on custom auth can enable it:
  general_settings:
    custom_auth_run_common_checks: true

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-03 22:50:34 +05:30
Harshit28j 6d535e5639 fix(proxy): allow custom auth routes to bypass route authorization checks
Custom user-added routes (e.g. /ldap/ngs/ready) used with Depends(user_api_key_auth) were being rejected as admin-only after _run_post_custom_auth_checks was introduced in commit 14badde13c.

The route authorization check in common_checks is designed for LiteLLM's own management routes. Custom auth flows that add their own routes should be trusted since the custom auth function already validated the request. Budget and expiry checks still run.

Add skip_route_check parameter to common_checks() and pass skip_route_check=True from _run_post_custom_auth_checks() to skip route authorization while preserving budget/team/model checks.

Regression test added: test_common_checks_skip_route_check_for_custom_auth

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>
2026-03-03 20:20:47 +05:30
Jaeyeon Kim(김재연) 6bcba46dda fix: set mock status_code in JWT OIDC discovery tests (#22361)
The _resolve_jwks_url method checks response.status_code != 200, but
MagicMock returns a MagicMock object for status_code which is always
truthy (!= 200). Explicitly set mock_response.status_code = 200 so the
tests exercise the intended code path.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-02 21:57:54 -08:00
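
The mock pitfall described above is easy to reproduce with the standard library alone. `is_error_response` is a hypothetical stand-in for the `status_code != 200` check.

```python
from unittest.mock import MagicMock


def is_error_response(response: object) -> bool:
    return response.status_code != 200  # type: ignore[attr-defined]


# status_code is auto-created as another MagicMock, which never equals 200.
unset_response = MagicMock()
unset_is_error = is_error_response(unset_response)

# Setting the attribute explicitly lets the test reach the success path.
ok_response = MagicMock()
ok_response.status_code = 200
ok_is_error = is_error_response(ok_response)
```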
Shivaang 5f28422f49 fix(types): filter null fields from reasoning output items (#22370)
* fix(image_generation): propagate extra_headers to OpenAI image generation

Add headers parameter to image_generation() and aimage_generation() methods
in OpenAI provider, and pass headers from images/main.py to ensure custom
headers like cf-aig-authorization are properly forwarded to the OpenAI API.
Aligns behavior with completion() method and Azure provider implementation.

* test(image_generation): add tests for extra_headers propagation

Verify that extra_headers are correctly forwarded to OpenAI's
images.generate() in both sync and async paths, and that they
are absent when not provided.

* Add Prometheus child_exit cleanup for gunicorn workers

When a gunicorn worker exits (e.g. from max_requests recycling), its
per-process prometheus .db files remain on disk. For gauges using
livesum/liveall mode, this means the dead worker's last-known values
persist as if the process were still alive. Wire gunicorn's child_exit
hook to call mark_process_dead() so live-tracking gauges accurately
reflect only running workers.

* docs: update AssemblyAI docs with Universal-3 Pro, Speech Understanding, and LLM Gateway (#21130)

* docs: update AssemblyAI docs with Universal-3 Pro, Speech Understanding, and LLM Gateway provider config

* feat: add AssemblyAI LLM Gateway as OpenAI-compatible provider

* fix(mcp): update test mocks to use renamed filter_server_ids_by_ip_with_info

Tests were mocking the old method name `filter_server_ids_by_ip` but production
code at server.py:774 calls `filter_server_ids_by_ip_with_info` which returns
a (server_ids, blocked_count) tuple. The unmocked method on AsyncMock returned
a coroutine, causing "cannot unpack non-iterable coroutine object" errors.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(test): update realtime guardrail test assertions for voice violation behavior

Tests were asserting no response.create/conversation.item.create sent to
backend when guardrail blocks, but the implementation intentionally sends
these to have the LLM voice the guardrail violation message to the user.

Updated assertions to verify the correct guardrail flow:
- response.cancel is sent to stop any in-progress response
- conversation.item.create with violation message is injected
- response.create is sent to voice the violation
- original blocked content is NOT forwarded

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(bedrock): restore parallel_tool_calls mapping in map_openai_params

The revert in 8565c70e53 removed the parallel_tool_calls handling from
map_openai_params, and the subsequent fix d0445e1e33 only re-added the
transform_request consumption but forgot to re-add the map_openai_params
producer that sets _parallel_tool_use_config. This meant parallel_tool_calls
was silently ignored for all Bedrock models.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(test): update Azure pass-through test to mock litellm.completion

Commit 99c62ca40e removed "azure" from _RESPONSES_API_PROVIDERS,
routing Azure models through litellm.completion instead of
litellm.responses. The test was not updated to match, causing it
to assert against the wrong mock.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* feat: add in_flight_requests metric to /health/backlog + prometheus (#22319)

* feat: add in_flight_requests metric to /health/backlog + prometheus

* refactor: clean class with static methods, add tests, fix sentinel pattern

* docs: add in_flight_requests to prometheus metrics and latency troubleshooting

* fix(db): add missing migration for LiteLLM_ClaudeCodePluginTable

PR #22271 added the LiteLLM_ClaudeCodePluginTable model to
schema.prisma but did not include a corresponding migration file,
causing test_aaaasschema_migration_check to fail.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: update stale docstring to match guardrail voicing behavior

Addresses Greptile review feedback.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(caching): store background task references in LLMClientCache._remove_key to prevent unawaited coroutine warnings

Fixes #22128

* [Feat] Agent RBAC Permission Fix - Ensure Internal Users cannot create agents (#22329)

* fix: enforce RBAC on agent endpoints — block non-admin create/update/delete

- Add /v1/agents/{agent_id} to agent_routes so internal users can
  access GET-by-ID (previously returned 403 due to missing route pattern)
- Add _check_agent_management_permission() guard to POST, PUT, PATCH,
  DELETE agent endpoints — only PROXY_ADMIN may mutate agents
- Add user_api_key_dict param to delete_agent so the role check works
- Add comprehensive unit tests for RBAC enforcement across all roles

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: mock prisma_client in internal user get-agent-by-id test

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* feat(ui): hide agent create/delete controls for non-admin users

Match MCP servers pattern: wrap '+ Add New Agent' button in
isAdmin conditional so internal users see a read-only agents view.
Delete buttons in card and table were already gated.
Update empty-state copy for non-admin users.
Add 7 Vitest tests covering role-based visibility.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: Add PROXY_ADMIN role to system user for key rotation (#21896)

* fix: Add PROXY_ADMIN role to system user for key rotation

The key rotation worker was failing with 'You are not authorized to regenerate this key'
when rotating team keys. This was because the system user created by
get_litellm_internal_jobs_user_api_key_auth() was missing the user_role field.

Without user_role=PROXY_ADMIN, the system user couldn't bypass team permission checks
in can_team_member_execute_key_management_endpoint(), causing authorization failures
for team key rotation.

This fix adds user_role=LitellmUserRoles.PROXY_ADMIN to the system user, allowing
it to bypass team permission checks and successfully rotate keys for all teams.

* test: Add unit test for system user PROXY_ADMIN role

- Verify internal jobs system user has PROXY_ADMIN role
- Critical for key rotation to bypass team permission checks
- Regression test for PR #21896

* fix: populate user_id and user_info for admin users in /user/info (#22239)

* fix: populate user_id and user_info for admin users in /user/info endpoint

Fixes #22179

When admin users call /user/info without a user_id parameter, the endpoint
was returning null for both user_id and user_info fields. This broke
budgeting tooling that relies on /user/info to look up current budget and spend.

Changes:
- Modified _get_user_info_for_proxy_admin() to accept user_api_key_dict parameter
- Added logic to fetch admin's own user info from database
- Updated function to return admin's user_id and user_info instead of null
- Updated unit test to verify admin user_id is populated

The fix ensures admin users get their own user information just like regular users.

* test: make mock get_data signature match real method

- Updated MockPrismaClientDB.get_data() to accept all parameters that the real method accepts
- Makes mock more robust against future refactors
- Added datetime and Union imports
- Mock now returns None when user_id is not provided

* [Fix] Pass MCP auth headers from request into tool fetch for /v1/responses and chat completions (#22291)

* fixed dynamic auth for /responses with mcp

* fixed greptile concern

* fix(bedrock): filter internal json_tool_call when mixed with real tools

Fixes #18381: When using both tools and response_format with Bedrock
Converse API, LiteLLM internally adds json_tool_call to handle structured
output. Bedrock may return both this internal tool AND real user-defined
tools, breaking consumers like OpenAI Agents SDK.

Changes:
- Non-streaming: Added _filter_json_mode_tools() to handle 3 scenarios:
  only json_tool_call (convert to content), mixed (filter it out), or
  no json_tool_call (pass through)
- Streaming: Added json_mode tracking to AWSEventStreamDecoder to suppress
  json_tool_call chunks and convert to text content
- Fixed optional_params.pop() mutation issue
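
The three non-streaming scenarios listed above could be sketched as follows. The tool-call dict shape, the return type, and the function name are assumptions for this illustration; only the tool name `json_tool_call` and the three cases come from the commit message.

```python
from typing import Dict, List, Optional, Tuple

ToolCall = Dict[str, str]


def filter_json_mode_tools(
    tool_calls: List[ToolCall],
) -> Tuple[List[ToolCall], Optional[str]]:
    """Return (tool calls to expose, text content derived from json_tool_call)."""
    internal = [t for t in tool_calls if t["name"] == "json_tool_call"]
    real = [t for t in tool_calls if t["name"] != "json_tool_call"]
    if not internal:
        return tool_calls, None           # no json_tool_call: pass through
    if real:
        return real, None                 # mixed: drop the internal tool
    return [], internal[0]["arguments"]   # only json_tool_call: becomes content
```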

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* refactor: extract duplicated JSON unwrapping into helper method

Addresses review comment from greptile-apps:
https://github.com/BerriAI/litellm/pull/21107#pullrequestreview-3796085353

Changes:
- Added `_unwrap_bedrock_properties()` helper method to eliminate code duplication
- Replaced two identical JSON unwrapping blocks (lines 1592-1601 and 1612-1620)
  with calls to the new helper method
- Improves maintainability - single source of truth for Bedrock properties unwrapping logic

The helper method:
- Parses JSON string
- Checks for single "properties" key structure
- Unwraps and returns the properties value
- Returns original string if unwrapping not needed or parsing fails

No functional changes - pure refactoring.
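
The four helper steps listed above can be sketched with the standard `json` module. The function name is hypothetical, and returning the unwrapped value re-serialized as a JSON string is an assumption; the actual `_unwrap_bedrock_properties()` may return a different type.

```python
import json


def unwrap_properties(json_str: str) -> str:
    """Unwrap {"properties": {...}} to its inner value; otherwise return as-is."""
    try:
        parsed = json.loads(json_str)
    except (json.JSONDecodeError, TypeError):
        return json_str  # parsing failed: keep the original string
    if isinstance(parsed, dict) and set(parsed.keys()) == {"properties"}:
        return json.dumps(parsed["properties"])
    return json_str  # nothing to unwrap
```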

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix: use correct class name AmazonConverseConfig in helper method calls

Fixed MyPy errors where BedrockConverseConfig was used instead of
AmazonConverseConfig in the _unwrap_bedrock_properties() calls.

Errors:
- Line 1619: BedrockConverseConfig -> AmazonConverseConfig
- Line 1631: BedrockConverseConfig -> AmazonConverseConfig

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

* fix: shorten guardrail benchmark result filenames for Windows long path support

Fixes #21941

The generated result filenames from _save_confusion_results contained
parentheses, dots, and full yaml filenames, producing paths that exceed
the Windows 260-char MAX_PATH limit. Rework the safe_label logic to
produce short {topic}_{method_abbrev} filenames (e.g. insults_cf.json)
while preserving the full label inside the JSON content.

Rename existing tracked result files to match the new naming convention.

* Update litellm/proxy/guardrails/guardrail_hooks/litellm_content_filter/guardrail_benchmarks/test_eval.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* Remove Apache 2 license from SKILL.md (#22322)

* fix(mcp): default available_on_public_internet to true (#22331)

* fix(mcp): default available_on_public_internet to true

MCPs were defaulting to private (available_on_public_internet=false) which
was a breaking change. This reverts the default to public (true) across:
- Pydantic models (AddMCPServerRequest, UpdateMCPServerRequest, LiteLLM_MCPServerTable)
- Prisma schema @default
- mcp_server_manager.py YAML config + DB loading fallbacks
- UI form initialValue and setFieldValue defaults

* fix(ui): add forceRender to Collapse.Panel so toggle defaults render correctly

Ant Design's Collapse.Panel lazy-renders children by default. Without
forceRender, the Form.Item for 'Available on Public Internet' isn't
mounted when the useEffect fires form.setFieldValue, causing the Switch
to visually show OFF even though the intended default is true.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(mcp): update remaining schema copies and MCPServer type default to true

Missed in previous commit per Greptile review:
- schema.prisma (root)
- litellm-proxy-extras/litellm_proxy_extras/schema.prisma
- litellm/types/mcp_server/mcp_server_manager.py MCPServer class

* ui(mcp): reframe network access as 'Internal network only' restriction

Replace scary 'Available on Public Internet' toggle with 'Internal network only'
opt-in restriction. Toggle OFF (default) = all networks allowed. Toggle ON =
restricted to internal network only. Auth is always required either way.

- MCPPermissionManagement: new label/tooltip/description, invert display via
  getValueProps/getValueFromEvent so underlying available_on_public_internet
  value is unchanged
- mcp_server_view: 'Public' → 'All networks', 'Internal' → 'Internal only' (orange)
- mcp_server_columns: same badge updates

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(jwt): OIDC discovery URLs, roles array handling, dot-notation error hints (#22336)

* fix(jwt): support OIDC discovery URLs, handle roles array, improve error hints

Three fixes for Azure AD JWT auth:

1. OIDC discovery URL support - JWT_PUBLIC_KEY_URL can now be set to
   .well-known/openid-configuration endpoints. The proxy fetches the
   discovery doc, extracts jwks_uri, and caches it.

2. Handle roles claim as array - when team_id_jwt_field points to a list
   (e.g. AAD's "roles": ["team1"]), auto-unwrap the first element instead
   of crashing with 'unhashable type: list'.

3. Better error hint for dot-notation indexing - when team_id_jwt_field is
   set to "roles.0" or "roles[0]", the 401 error now explains to use
   "roles" instead and that LiteLLM auto-unwraps lists.

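Fixes 2 and 3 above could look roughly like the sketch below. The function names, the flat claims lookup, and the hint wording are hypothetical and chosen for illustration; only the behaviour (unwrap the first list element, point `roles.0` / `roles[0]` users at `roles`) comes from the commit message.

```python
import re
from typing import Any, Mapping, Optional


def resolve_team_id(claims: Mapping[str, Any], team_id_jwt_field: str) -> Optional[Any]:
    """Read the team id claim, unwrapping the first element of a list value."""
    value = claims.get(team_id_jwt_field)
    if isinstance(value, list):
        # e.g. "roles": ["team1"] -> "team1"; an empty list yields None.
        return value[0] if value else None
    return value


def indexing_hint(team_id_jwt_field: str) -> Optional[str]:
    """Return a hint when the field uses index syntax such as roles.0 or roles[0]."""
    match = re.fullmatch(r"(\w+)(?:\.\d+|\[\d+\])", team_id_jwt_field)
    if match is None:
        return None
    base = match.group(1)
    return f'Use "{base}" instead; list values are unwrapped automatically.'
```
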
* Add integration demo script for JWT auth fixes (OIDC discovery, array roles, dot-notation hints)

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* Add demo_servers.py for manual JWT auth testing with mock JWKS/OIDC endpoints

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* Add demo screenshots for PR comment

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* Add integration test results with screenshots for PR review

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* address greptile review feedback (greploop iteration 1)

- fix: add HTTP status code check in _resolve_jwks_url before parsing JSON
- fix: remove misleading bracket-notation hint from debug log (get_nested_value does not support it)

* Update tests/test_litellm/proxy/auth/test_handle_jwt.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* remove demo scripts and assets

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* perf: streaming latency improvements — 4 targeted hot-path fixes (#22346)

* perf: raise aiohttp connection pool limits (300→1000, 50/host→500)

* perf: skip model_copy() on every chunk — only copy usage-bearing chunks

* perf: replace list+join O(n²) with str+= O(n) in async_data_generator

* perf: cache model-level guardrail lookup per request, not per chunk

* test: add comprehensive Vitest coverage for CostTrackingSettings

Add 88 tests across 9 test files for the CostTrackingSettings component directory:
- provider_display_helpers.test.ts: 9 tests for helper functions
- how_it_works.test.tsx: 9 tests for discount calculator component
- add_provider_form.test.tsx: 7 tests for provider form validation
- add_margin_form.test.tsx: 9 tests for margin form with type toggle
- provider_discount_table.test.tsx: 12 tests for table editing and interactions
- provider_margin_table.test.tsx: 13 tests for margin table with sorting
- use_discount_config.test.ts: 11 tests for discount hook logic
- use_margin_config.test.ts: 12 tests for margin hook logic
- cost_tracking_settings.test.tsx: 15 tests for main component and role-based rendering

All tests passing. Coverage includes form validation, user interactions, API calls, state management, and conditional rendering.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* [Feature] Key list endpoint: Add project_id and access_group_id filters

Add filtering capabilities to /key/list endpoint for project_id and access_group_id parameters. Both filters work globally across all visibility rules and stack with existing sort/pagination params. Added comprehensive unit tests for the new filters.

Co-Authored-By: Claude Haiku 4.5 <noreply@anthropic.com>

* [Feature] UI - Projects: Add Project Details page with Edit modal

- Add ProjectDetailsPage with header, details card, spend/budget progress,
  model spend bar chart, keys placeholder, and team info card
- Refactor CreateProjectModal into base form pattern (ProjectBaseForm)
  shared between Create and Edit flows
- Add EditProjectModal with pre-filled form data from backend
- Add useProjectDetails and useUpdateProject hooks
- Add duplicate key validation for model limits and metadata
- Wire project ID click in table to navigate to detail view
- Move pagination inline with search bar

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Update ui/litellm-dashboard/src/components/Projects/ProjectModals/CreateProjectModal.tsx

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* fix(types): filter null fields from reasoning output items in ResponsesAPIResponse

When providers return reasoning items without status/content/encrypted_content,
Pydantic's Optional defaults serialize them as null. This breaks downstream SDKs
(e.g., the OpenAI C# SDK crashes on status=null).

Add a field_serializer on ResponsesAPIResponse.output that removes null
status, content, and encrypted_content from reasoning items during
serialization. This mirrors the request-side filtering already done in
OpenAIResponsesAPIConfig._handle_reasoning_item().
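
The filtering itself can be illustrated without Pydantic. The sketch below works on plain dicts and is only an approximation of what a serializer on `output` might do; the function name and the `"type": "reasoning"` discriminator are assumptions.

```python
from typing import Any, Dict, List

# Fields the commit message says are dropped when null.
NULLABLE_REASONING_FIELDS = ("status", "content", "encrypted_content")


def drop_null_reasoning_fields(output: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Remove null status/content/encrypted_content from reasoning items only."""
    cleaned: List[Dict[str, Any]] = []
    for item in output:
        if item.get("type") == "reasoning":
            # Build a new dict so the caller's item is not mutated.
            item = {
                key: value
                for key, value in item.items()
                if not (key in NULLABLE_REASONING_FIELDS and value is None)
            }
        cleaned.append(item)
    return cleaned
```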

Fixes https://github.com/BerriAI/litellm/issues/16824

---------

Co-authored-by: Zero Clover <zero@root.me>
Co-authored-by: Ryan Crabbe <rcrabbe@berkeley.edu>
Co-authored-by: ryan-crabbe <128659760+ryan-crabbe@users.noreply.github.com>
Co-authored-by: Dylan Duan <dylan.duan@assemblyai.com>
Co-authored-by: Julio Quinteros Pro <jquinter@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
Co-authored-by: milan-berri <milan@berri.ai>
Co-authored-by: Shivam Rawat <161387515+shivamrawat1@users.noreply.github.com>
Co-authored-by: Brian Caswell <bcaswell@microsoft.com>
Co-authored-by: Brian Caswell <bcaswell@gmail.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Co-authored-by: rasmi <rrelasmar@gmail.com>
Co-authored-by: yuneng-jiang <yuneng.jiang@gmail.com>
2026-03-02 19:21:25 +05:30
yuneng-jiang 8053be60df Merge pull request #22182 from BerriAI/litellm_make_session_duration_configurable
[Feat] Make UI login session duration configurable via LITELLM_UI_SESSION_DURATION
2026-02-28 20:31:31 -08:00
yuneng-jiang c2e7cf160f fix(onboarding): prevent invite link reuse for password reset
Moves is_accepted=True from GET /onboarding/get_token to POST /onboarding/claim_token,
so the flag accurately reflects that a password has been set. Both endpoints now reject
already-used links, with get_token rejecting before any user data is returned.
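
A toy model of the flow described above, assuming an in-memory invite record. The `Invite` class, the exception, and the token format are hypothetical; only the placement of the `is_accepted` check and update follows the commit message.

```python
from dataclasses import dataclass


@dataclass
class Invite:
    user_id: str
    is_accepted: bool = False


class InviteAlreadyUsed(Exception):
    """Raised when an invite link has already been claimed."""


def get_token(invite: Invite) -> str:
    # Reject before returning any user data; do NOT mark the invite as accepted here.
    if invite.is_accepted:
        raise InviteAlreadyUsed(invite.user_id)
    return f"token-for-{invite.user_id}"


def claim_token(invite: Invite, password: str) -> None:
    if invite.is_accepted:
        raise InviteAlreadyUsed(invite.user_id)
    # Password storage is omitted; the flag flips only once a password is set.
    invite.is_accepted = True
```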

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-02-28 17:39:03 -08:00
Ishaan Jaff 3ff70598ad fix: bump litellm-proxy-extras to 0.4.50 and fix 3 failing tests (#22417)
* fix(ci): handle inline table in pyproject.toml for litellm-proxy-extras version check

* fix: bump litellm-proxy-extras to 0.4.50 in pyproject.toml, requirements.txt, and poetry.lock

* fix(tests): set status_code=200 on JWT mocks and pass pii_tokens through data in presidio test
2026-02-28 10:20:03 -08:00
Ishaan Jaff ee703cea99 fix(jwt): OIDC discovery URLs, roles array handling, dot-notation error hints (#22336)
* fix(jwt): support OIDC discovery URLs, handle roles array, improve error hints

Three fixes for Azure AD JWT auth:

1. OIDC discovery URL support - JWT_PUBLIC_KEY_URL can now be set to
   .well-known/openid-configuration endpoints. The proxy fetches the
   discovery doc, extracts jwks_uri, and caches it.

2. Handle roles claim as array - when team_id_jwt_field points to a list
   (e.g. AAD's "roles": ["team1"]), auto-unwrap the first element instead
   of crashing with 'unhashable type: list'.

3. Better error hint for dot-notation indexing - when team_id_jwt_field is
   set to "roles.0" or "roles[0]", the 401 error now explains to use
   "roles" instead and that LiteLLM auto-unwraps lists.

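Fix 1 above (discovery URL resolution) might be sketched as below. This is not the proxy's `_resolve_jwks_url`: the injected `fetch` callable, the module-level dict cache, and the suffix check used to detect a discovery URL are assumptions made to keep the example self-contained and free of network calls.

```python
import json
from typing import Callable, Dict, Tuple

# Hypothetical transport: takes a URL, returns (status_code, body_text).
FetchFn = Callable[[str], Tuple[int, str]]

DISCOVERY_SUFFIX = ".well-known/openid-configuration"
_jwks_url_cache: Dict[str, str] = {}


def resolve_jwks_url(url: str, fetch: FetchFn) -> str:
    """Return the JWKS URL, following an OIDC discovery document if needed."""
    if not url.rstrip("/").endswith(DISCOVERY_SUFFIX):
        # Already a direct JWKS URL.
        return url
    if url in _jwks_url_cache:
        return _jwks_url_cache[url]

    status, body = fetch(url)
    if status != 200:
        # Check the status before attempting to parse the body as JSON.
        raise ValueError(f"discovery request failed with status {status}")

    jwks_uri = json.loads(body).get("jwks_uri")
    if not jwks_uri:
        raise ValueError("discovery document has no jwks_uri")

    _jwks_url_cache[url] = jwks_uri
    return jwks_uri
```
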
* Add integration demo script for JWT auth fixes (OIDC discovery, array roles, dot-notation hints)

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* Add demo_servers.py for manual JWT auth testing with mock JWKS/OIDC endpoints

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* Add demo screenshots for PR comment

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* Add integration test results with screenshots for PR review

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* address greptile review feedback (greploop iteration 1)

- fix: add HTTP status code check in _resolve_jwks_url before parsing JSON
- fix: remove misleading bracket-notation hint from debug log (get_nested_value does not support it)

* Update tests/test_litellm/proxy/auth/test_handle_jwt.py

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* remove demo scripts and assets

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
2026-02-27 20:30:47 -08:00
Harshit Jain 4d2fab49a7 Merge pull request #22164 from Harshit28j/litellm_custom_auth_budget_fix
fix: custom auth budget issue
2026-02-26 23:38:38 +05:30
shivam ffb438f3f2 fix: clarify EXPERIMENTAL_UI_LOGIN ignores LITELLM_UI_SESSION_DURATION, add regression test 2026-02-26 04:41:07 -08:00
shivam 44557261a3 greptile issue 2026-02-26 04:29:42 -08:00
shivam a7f5163976 added the env flag 2026-02-26 03:49:54 -08:00