1761 Commits

Author SHA1 Message Date
Krrish Dholakia 5596728cae Merge pull request #24753 from BerriAI/litellm_dev_03_27_2026_p1
Fix returned model when batch completions is used - return picked model, not comma-separated list
2026-03-30 17:53:48 -07:00
Krrish Dholakia 4c00a14ce0 fix: fix ci/cd + handle oidc jwt tokens 2026-03-30 16:12:58 -07:00
Yuneng Jiang 6522d282b5 [Fix] Correct kwarg name in test_user_api_key_auth tests
PR #24755 renamed `azure_api_key_header` to `AZURE_AI_API_KEY_header` in
the test file but did not update the actual function signatures of
`get_api_key()` and `_user_api_key_auth_builder()`, causing TypeError
on all affected test cases.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-30 14:51:09 -07:00
Krrish Dholakia bc829d51f2 test: test 2026-03-28 19:17:38 -07:00
Yuneng Jiang 7100ed5d0a [Fix] Test isolation for agent health checks and documentation test path resolution
Fix agent health check tests failing with 500 errors in parallel CI by
mocking prisma_client to None. Fix documentation validation tests using
CWD-relative paths that break depending on the working directory.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 11:00:22 -07:00
ryan-crabbe-berri 2eb3c20e76 Merge pull request #24718 from BerriAI/litellm_ryan-march-26
litellm ryan march 26
2026-03-28 09:01:11 -07:00
Krrish Dholakia 32adda8a49 fix: return winning model name instead of comma-separated list for fastest_response
When fastest_response=true with comma-separated models, the response
model field was stamped with the entire comma-separated string. Now
uses the x-litellm-model-group header from the winning response to
return the correct model name.

Made-with: Cursor
2026-03-27 22:34:26 -07:00
ryan-crabbe-berri 726a34627c Merge pull request #24717 from BerriAI/litellm_fix-user-cache-invalidation
fix(jwt): invalidate user cache after role/team sync updates
2026-03-27 19:50:41 -07:00
Ryan Crabbe dd11e77852 fix: add explicit TTL to cache writes and test coverage for user cache invalidation
Add DEFAULT_MANAGEMENT_OBJECT_IN_MEMORY_CACHE_TTL to both async_set_cache
calls in sync_user_role_and_teams for consistency with all other user cache
writes. Add 3 tests covering cache invalidation on role change, team change,
and no-op when nothing changes.
2026-03-27 19:45:13 -07:00
ryan-crabbe-berri 5b651048f2 Merge pull request #24706 from BerriAI/litellm_fix-jwt-none-guard
fix(auth): guard JWTHandler.is_jwt() against None token
2026-03-27 18:06:24 -07:00
yuneng-jiang 846e4b44b6 Merge pull request #24682 from michelligabriele/fix/budget-spend-counters
fix(proxy): enforce budget limits across multi-pod deployments via Redis-backed spend counters
2026-03-27 16:59:23 -07:00
Ryan Crabbe 8e3755931d test(auth): add regression tests for JWTHandler.is_jwt(None)
Add None-token test cases to both proxy_unit_tests and test_litellm
to cover the guard added in the previous commit. Also add -> bool
return type annotation to is_jwt().
2026-03-27 16:51:08 -07:00
Ryan Crabbe e24819afef fix(sso): pass decoded JWT access token to role mapping during SSO login
During SSO login, bearer tokens are stripped from the OAuth response
before role mapping runs. Custom role claims encoded inside the JWT
access token are lost, so map_jwt_role_to_litellm_role() returns None
and the user falls back to internal_user_viewer.

process_sso_jwt_access_token() now returns the decoded JWT payload, and
a new _sync_user_role_from_jwt_role_map() receives it so
jwt_litellm_role_map works correctly during SSO login.
2026-03-27 13:50:30 -07:00
michelligabriele d533b432fd fix(proxy): enforce budget limits across multi-pod deployments via Redis-backed spend counters
Budget checks on API keys, teams, and team members were not enforced in
multi-pod deployments because user_api_key_cache is intentionally
in-memory-only. Each pod tracked spend independently, so with N pods
the effective budget was N × max_budget.

Introduces a separate spend_counter_cache (DualCache wired to
redis_usage_cache) with atomic increment/read helpers:
- increment_spend_counters(): awaited in cost callback (not create_task)
  to update both in-memory and Redis before the next auth check
- get_current_spend(): reads Redis first (cross-pod authoritative),
  falls back to in-memory, then to cached object .spend from DB

Budget check functions (_virtual_key_max_budget_check,
_team_max_budget_check, _check_team_member_budget) now read spend via
get_current_spend() instead of cached object .spend fields.

When Redis is not configured, falls back to in-memory-only counters
(same as current single-instance behavior).

Fixes #23714
2026-03-27 20:39:52 +01:00
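
The read order described in the commit above (shared counter first, then the per-pod counter, then the value loaded from the DB) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the project's code: the `SpendCounters` class, its method names, and the plain dict standing in for Redis are all hypothetical, and a real deployment would rely on an atomic Redis increment rather than a Python dict.

```python
from typing import Dict, Optional


class SpendCounters:
    """Toy model: one shared (Redis-like) store plus one per-pod local store."""

    def __init__(self, shared: Optional[Dict[str, float]] = None) -> None:
        # `shared` stands in for Redis; None models "Redis not configured".
        self.shared = shared
        self.local: Dict[str, float] = {}

    def increment(self, key: str, amount: float) -> None:
        # Update the per-pod counter and, when available, the shared counter.
        self.local[key] = self.local.get(key, 0.0) + amount
        if self.shared is not None:
            self.shared[key] = self.shared.get(key, 0.0) + amount

    def current_spend(self, key: str, db_spend: float = 0.0) -> float:
        # 1. Shared store is authoritative across pods.
        if self.shared is not None and key in self.shared:
            return self.shared[key]
        # 2. Fall back to this pod's own counter.
        if key in self.local:
            return self.local[key]
        # 3. Finally fall back to the spend loaded from the database.
        return db_spend


# Two "pods" sharing one store see the combined spend.
shared_store: Dict[str, float] = {}
pod_a = SpendCounters(shared_store)
pod_b = SpendCounters(shared_store)
pod_a.increment("key:abc", 3.0)
pod_b.increment("key:abc", 4.0)
combined = pod_a.current_spend("key:abc")
```

With only the per-pod counters, each pod in this toy setup would see just its own 3.0 or 4.0, which is the N × max_budget effect the commit message describes.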
yuneng-jiang 1b111d23f3 Merge pull request #24688 from Sameerlite/litellm_litellm_team-model-group-name-routing-fix
fix(team-routing): preserve sibling deployment candidates for team public models
2026-03-27 12:00:34 -07:00
Sameer Kankute 92a07e2d6e fix(proxy): address Greptile review feedback
- Remove HTTP_PROXY/HTTPS_PROXY from blocklist (legitimately used in corporate envs)
- Add NO_PROXY/no_proxy to blocklist (prevents bypassing proxy monitoring)
- Remove dead code in _is_valid_user_id (space exception was unreachable)
- Update tests accordingly

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 20:38:36 +05:30
Sameer Kankute 8112fbf274 fix(proxy): sanitize user_id input and block dangerous env var keys
Add input validation to get_user_id_from_request (length limit, control char rejection) and a blocklist of dangerous environment variable keys in _load_environment_variables to prevent PATH/LD_PRELOAD/PYTHONPATH override via config.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-27 20:38:36 +05:30
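
A rough sketch of the two checks named in the commit above, assuming a simple exact-match blocklist and a length/control-character test. The function names, the 256-character limit, and the blocklist contents (taken from the three keys the message mentions) are illustrative assumptions, not the project's actual values.

```python
from typing import Dict, Mapping

# Hypothetical blocklist: only the keys named in the commit message.
BLOCKED_ENV_KEYS = frozenset({"PATH", "LD_PRELOAD", "PYTHONPATH"})


def filter_env(config_env: Mapping[str, str]) -> Dict[str, str]:
    """Drop blocklisted keys before they would be applied to the process env."""
    return {k: v for k, v in config_env.items() if k not in BLOCKED_ENV_KEYS}


def is_valid_user_id(user_id: str, max_length: int = 256) -> bool:
    """Reject empty or over-long IDs and IDs containing ASCII control characters."""
    if not user_id or len(user_id) > max_length:
        return False
    return not any(ord(ch) < 32 or ord(ch) == 127 for ch in user_id)


safe_env = filter_env({"LD_PRELOAD": "/tmp/evil.so", "MY_API_BASE": "https://example.com"})
```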
Sameer Kankute 2321d77599 fix(router): address remaining Greptile review comments
- Cache LITELLM_ENABLE_TEAM_STALE_ALIAS_BYPASS at module level to avoid hot-path secret lookups
- Add clarifying comments for should_include_deployment team isolation logic
- Add negative assertion for update_team.assert_not_called() in test
- Add docstring clarification for _get_team_deployments helper pattern
- Add explicit assertion message in test_get_model_list_alias_optimization

Made-with: Cursor
2026-03-27 20:11:28 +05:30
Sameer Kankute 1a0b30aaac Fix greptile reviews and mock test 2026-03-27 20:11:28 +05:30
Sameer Kankute c6cc0341f6 Fix greptile reviews and mock test 2026-03-27 20:11:28 +05:30
Sameer Kankute 316a742945 Fix greptile comments 2026-03-27 20:11:28 +05:30
Sameer Kankute 173695f5e0 Fix greptile comments 2026-03-27 20:11:27 +05:30
Sameer Kankute e8fb7762b3 perf(routing): optimize team model checks and improve test coverage
- Use O(1) team index lookup instead of map_team_model in alias guard
- Fix MockPrismaClient to validate where clause filters
- Add comment explaining DB query trade-off for team deployments

Made-with: Cursor
2026-03-27 20:11:27 +05:30
Sameer Kankute 8aa58bdcaa fix(routing): prevent stale model_aliases from interfering with team routing
- Skip model_aliases rewrite if model resolves to team deployments
- Add test coverage for sibling-preservation branch
- Update MockPrismaClient to support sibling deployment scenarios

Made-with: Cursor
2026-03-27 20:11:27 +05:30
Sameer Kankute f5b7298854 fix(management): query DB directly for sibling deployments on rename
- Add clarifying comments to test assertions
- Query prisma DB instead of in-memory router to avoid stale state
- Prevents incorrect deletion of old public name when siblings exist

Made-with: Cursor
2026-03-27 20:11:27 +05:30
Sameer Kankute aeb932d707 fix(team-routing): keep team model routing on public names
Remove team model_alias rewrites and resolve team deployments by team_public_model_name with team_id so sibling deployments stay in the routing candidate pool, with explicit logs showing candidate selection before load balancing.

Made-with: Cursor
2026-03-27 20:11:27 +05:30
Sameer Kankute 5534b40ab3 fix(team-routing): use deterministic team model group names
Use a deterministic internal model_name for team-scoped deployments so sibling deployments with the same public model share a routing group. This makes team alias writes idempotent and preserves multi-deployment failover/load balancing behavior.

Made-with: Cursor
2026-03-27 20:11:27 +05:30
Ryan Crabbe 0aadf51342 fix(proxy): ignore return_to in SSO when control_plane_url is not configured
Instead of returning a 400 error when return_to is passed without
control_plane_url configured, silently ignore it and proceed with
the normal same-origin SSO flow.
2026-03-23 21:54:29 -07:00
Krrish Dholakia 26d162ccf4 fix(test): add user_api_key_project_alias to spend logs expected keys
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-23 18:12:50 -07:00
michelligabriele fa7ccf0893 fix(test): add request_data param to test mock + black formatting 2026-03-23 15:43:05 +01:00
michelligabriele 4625ccbaa2 fix(proxy): anchor metadata dict in _process_response/_process_error so pop() mutates the real dict 2026-03-23 15:39:23 +01:00
michelligabriele d8fd9a20ed fix(proxy): address Greptile review — streaming request_data, OCR backward compat, test coverage
- Pass request_data to end-of-stream process_output_streaming_response call
- Restore inputs.update() in OCR handler for third-party guardrail providers
- Add streaming end-to-end test for guardrail logging passthrough
2026-03-23 15:39:23 +01:00
michelligabriele ae454fd700 fix(proxy): OpenAI Moderation post-call guardrail response not captured for logging
Two independent bugs prevented post-call OpenAI Moderation guardrail
results from reaching downstream logging callbacks (Langfuse, Datadog).

Bug 1: process_output_response() created a throwaway request_data dict,
so guardrail info written by @log_guardrail_information was discarded.
Fixed by threading the real request_data from the unified guardrail
dispatcher through all 13 BaseTranslation handlers, with litellm_metadata
injection preserved for third-party guardrails (Zscaler, Prompt Security).
Also extended to process_output_streaming_response for consistency.

Bug 2: The @log_guardrail_information decorator collapsed the full
moderation API response (categories, scores, flagged status) to "allow".
Fixed by overriding _process_response/_process_error on
OpenAIModerationGuardrail to stash and log the full response, following
the established Model Armor pattern.
2026-03-23 15:39:22 +01:00
yuneng-jiang 9963b31e07 Revert "fix(proxy): restore per-entity breakdown in aggregated daily activity endpoint"
This reverts commit 9c3fab24ad.
2026-03-21 21:37:29 -07:00
yuneng-jiang e3d4c29d37 Merge pull request #24323 from BerriAI/litellm_ryan_march_20
litellm ryan march 20
2026-03-21 15:57:28 -07:00
yuneng-jiang 72fba093c8 Merge remote-tracking branch 'origin/main' into litellm_dev_sameer_16_march_week 2026-03-21 15:11:29 -07:00
yuneng-jiang 2b889f1627 Merge pull request #23471 from michelligabriele/fix/aggregated-activity-entity-breakdown
fix(proxy): restore per-entity breakdown in aggregated daily activity endpoint
2026-03-21 14:59:41 -07:00
Krish Dholakia f911d8d865 Merge pull request #23818 from BerriAI/litellm_oss_staging_03_17_2026
fix(fireworks): skip #transform=inline for base64 data URLs (#23729)
2026-03-21 14:54:39 -07:00
yuneng-jiang 262534a3a5 Merge branch 'main' into litellm_dev_sameer_16_march_week 2026-03-21 14:30:57 -07:00
Ishaan Jaff 2ea9e207bd Litellm ishaan march 20 (#24303)
* feat(redis): add circuit breaker to RedisCache to fast-fail when Redis is down (#24181)

* feat(redis): add circuit breaker env var constants

* feat(redis): add RedisCircuitBreaker and apply guard decorator to all async ops

* fix(dual_cache): fall back to L1 instead of re-raising on Redis increment failures

* test(caching): add circuit breaker unit tests

* fix(redis): fast-fail concurrent HALF_OPEN probes — only one probe at a time

* fix(dual_cache): return None fallback when in_memory_cache is absent and Redis fails

* test(caching): add regression tests for HALF_OPEN concurrency and None fallback
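
The circuit-breaker behavior listed above (fast-fail while open, a single probe in HALF_OPEN) can be sketched as a small state machine. This is a minimal illustration, not the `RedisCircuitBreaker` from the PR: the class name, method names, and the injectable `clock` parameter are assumptions made for the example.

```python
import time
from typing import Callable


class CircuitBreaker:
    CLOSED, OPEN, HALF_OPEN = "closed", "open", "half_open"

    def __init__(
        self,
        failure_threshold: int = 3,
        recovery_timeout: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at = 0.0
        self.state = self.CLOSED

    def allow_request(self) -> bool:
        if self.state == self.CLOSED:
            return True
        if self.state == self.OPEN:
            if self._clock() - self._opened_at >= self.recovery_timeout:
                # The first caller after the timeout becomes the single probe.
                self.state = self.HALF_OPEN
                return True
            return False
        # HALF_OPEN: a probe is already in flight, so fast-fail everyone else.
        return False

    def record_success(self) -> None:
        self._failures = 0
        self.state = self.CLOSED

    def record_failure(self) -> None:
        if self.state == self.HALF_OPEN:
            self._trip()  # failed probe: reopen and restart the timeout
            return
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._trip()

    def _trip(self) -> None:
        self.state = self.OPEN
        self._opened_at = self._clock()
```

Because `allow_request()` contains no `await`, the OPEN to HALF_OPEN transition cannot be interleaved with another coroutine on the same event loop; a multi-threaded caller would need a lock around it.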

* Fix blocking sync next in __anext__ (#24177)

* Fix blocking sync next

* Update tests/test_litellm/litellm_core_utils/test_streaming_handler.py

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* fix PEP 479 regression in __anext__ sync iterator exhaustion

asyncio.to_thread re-raises thread exceptions inside a coroutine, where
PEP 479 converts StopIteration to RuntimeError before any except clause
can catch it. Add _next_sync_or_exhausted() module-level helper that
catches StopIteration in the thread and returns a sentinel instead, then
raise StopAsyncIteration in the coroutine.

Also rewrites the non-blocking test to use asyncio.gather() instead of
asyncio.create_task() (which returned None on Python 3.9 / pytest-asyncio
in CI), and adds an exhaustion regression test that drains the wrapper
fully and asserts no RuntimeError leaks out.
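
The sentinel approach described above can be sketched as follows. This is a simplified illustration rather than the code from the PR: `AsyncWrapper`, `next_or_exhausted`, and `_EXHAUSTED` are hypothetical names, and the wrapper omits everything the real streaming handler does besides iteration.

```python
import asyncio
from typing import Any, Iterable, Iterator, List

_EXHAUSTED = object()  # sentinel returned instead of raising StopIteration


def next_or_exhausted(it: Iterator[Any]) -> Any:
    """Runs in the worker thread; StopIteration never leaves this function."""
    try:
        return next(it)
    except StopIteration:
        return _EXHAUSTED


class AsyncWrapper:
    """Exposes a blocking sync iterator as an async iterator."""

    def __init__(self, source: Iterable[Any]) -> None:
        self._it = iter(source)

    def __aiter__(self) -> "AsyncWrapper":
        return self

    async def __anext__(self) -> Any:
        # The blocking next() call runs off the event loop thread.
        item = await asyncio.to_thread(next_or_exhausted, self._it)
        if item is _EXHAUSTED:
            raise StopAsyncIteration
        return item


async def drain(source: Iterable[Any]) -> List[Any]:
    return [item async for item in AsyncWrapper(source)]


result = asyncio.run(drain([1, 2, 3]))
```

Draining the wrapper to the end is the case the regression test in the commit targets: exhaustion should surface as `StopAsyncIteration`, not as a `RuntimeError`.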

---------

Co-authored-by: Emerson Gomes <emerson.gomes@thalesgroup.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* feat: add git-subdir source type to claude-code/plugins API (#24223)

Support a third plugin source type `git-subdir` alongside the existing
`github` and `url` types, as documented in the official Claude Code
plugin marketplaces spec.

New format: {"source": "git-subdir", "url": "...", "path": "subdir/path"}

- Validates url and path fields are present and non-empty
- Rejects absolute paths, '..' segments, backslashes, and percent-encoded
  traversal sequences (including double-encoded variants via regex check)
- Extracts path validation into _validate_git_subdir_path() helper
- Updates Pydantic field description to document all three source types
- Adds isValidUrl() check for url/git-subdir source types in the UI form
- Adds "Git Subdir" option to the UI form with a required Path field
- Adds unit tests covering success, update, missing/empty fields,
  path traversal variants, and unknown source type
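
A sketch of the path checks listed above, assuming a regex that catches percent-encoded `.`, `/`, and `\` with any number of `%25` re-encodings. The function name, error messages, and exact pattern are illustrative assumptions and may differ from `_validate_git_subdir_path()` in the PR.

```python
import re

# Matches %2e, %2f, %5c and their re-encoded forms (%252e, %25252e, ...).
_ENCODED_TRAVERSAL = re.compile(r"%(?:25)*(?:2e|2f|5c)", re.IGNORECASE)


def validate_subdir_path(path: str) -> str:
    """Return the path unchanged if it looks safe, else raise ValueError."""
    if not path or not path.strip():
        raise ValueError("path must be non-empty")
    if "\\" in path:
        raise ValueError("backslashes are not allowed")
    if path.startswith("/"):
        raise ValueError("absolute paths are not allowed")
    if _ENCODED_TRAVERSAL.search(path):
        raise ValueError("percent-encoded traversal sequences are not allowed")
    if ".." in path.split("/"):
        raise ValueError("'..' segments are not allowed")
    return path


checked = validate_subdir_path("plugins/my-plugin")
```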

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

* [FEAT] add extract_header and extract_footer to Mistral OCR supported params (#24213)

* docs: add git-subdir source type to claude-code plugin marketplace docs (#24289)

* fix(ui): swap J/K keyboard navigation in log details drawer (#24279) (#24286)

J should navigate down (next) and K should navigate up (previous),
matching vim/standard conventions.

* fix: use async_set_cache in user_api_key_auth hot path (#24302)

* fix: use async_set_cache in auth hot path to avoid blocking event loop

* test: assert no blocking set_cache call in _user_api_key_auth_builder

* test: broaden blocking call check to all sync DualCache methods

* test: fix regression test to actually catch blocking cache calls

* fix: ruff lint unused variable + UI build MessageManager error

- litellm/caching/redis_cache.py: remove unused variable 'e' in circuit
  breaker exception handler (F841)
- add_plugin_form.tsx: use MessageManager.error() instead of undefined
  message.error() for git URL validation

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* docs: add REDIS_CIRCUIT_BREAKER env vars to config_settings reference

Add REDIS_CIRCUIT_BREAKER_FAILURE_THRESHOLD and
REDIS_CIRCUIT_BREAKER_RECOVERY_TIMEOUT to the environment variables
reference table so test_env_keys.py passes.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

---------

Co-authored-by: Emerson Gomes <emerson.gomes@thalesgroup.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Vincenzo Barrea <manamana88@users.noreply.github.com>
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-authored-by: Robert Kirscht <rkirscht242@gmail.com>
Co-authored-by: Imgyu Kim <kimimgo@gmail.com>
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
2026-03-21 12:40:11 -07:00
Sameer Kankute 4f1e484a9b Merge branch 'main' into litellm_dev_sameer_16_march_week
Resolve conflicts in common_request_processing.py (keep main streaming,
post_call_success_hook try/finally, deferred logging; retain skip_pre_call_logic)
and utils.py (defer + internal-call skip + sync success callbacks for all calls).

Tighten _has_post_call_guardrails for event_hook=None; align deferred
guardrail test. Sync model_prices_and_context_window_backup.json.

Pyright: narrow ignores for passthrough StreamingResponse and post_call hook.

Made-with: Cursor
2026-03-22 00:29:38 +05:30
Krrish Dholakia 509d2e9ac3 Fix PR review issues: gpt-4-0314 prompt caching, case-insensitive data URL check, test I/O mocking
- Remove incorrect supports_prompt_caching from gpt-4-0314 (predates the feature)
- Make data-URL detection case-insensitive in Gemini tool call result conversion
- Mock show_banner/generate_feedback_box in max_budget tests to prevent real I/O

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
2026-03-21 11:30:29 -07:00
Krish Dholakia a5b7e49713 Merge branch 'main' into litellm_oss_staging_03_17_2026 2026-03-21 10:40:48 -07:00
Sameer Kankute 49abf98a27 Merge branch 'main' into litellm_oss_staging_03_17_2026 2026-03-21 21:16:49 +05:30
Sameer Kankute a427807796 Merge branch 'main' into litellm_dev_sameer_16_march_week 2026-03-21 21:16:07 +05:30
Sameer Kankute 5b5c998dbd Merge branch 'main' into litellm_oss_staging_03_19_2026 2026-03-21 20:31:08 +05:30
ryan-crabbe 1da02b66f6 Merge branch 'main' into litellm_audit_log_s3_export 2026-03-20 16:39:54 -07:00
ryan-crabbe 59b4a05782 Merge branch 'main' into litellm_ryan_march_18 2026-03-20 13:36:37 -07:00
yuneng-jiang 5927a77a14 Merge branch 'main' into fix/aggregated-activity-entity-breakdown 2026-03-20 11:50:59 -07:00
yuneng-jiang f884e4ac66 Merge branch 'main' into fix/team-member-budget-duration-on-create 2026-03-20 11:48:08 -07:00