litellm

mirror of https://github.com/tiennm99/litellm.git synced 2026-08-02 08:21:53 +00:00

Author	SHA1	Message	Date
milan-berri GitHub Cursor	d84499e0f2	fix(team): reserve team budget raises for proxy admins on /team/update (#30030)
The caller's PERSONAL max_budget was the wrong yardstick for /team/update: a team's spend ceiling has nothing to do with the admin's own key budget. That comparison was an unintended side effect of reusing _check_user_team_limits() (which exists for the /team/new path) and broke the UI, which re-sends the unchanged budget on every save.
New behavior on /team/update for standalone teams:
- A team admin (already authorized via _verify_team_access) may freely KEEP or LOWER the team budget, and change models/tpm/rpm, without being gated by their personal limits.
- GROWING a team's spend ceiling is a budget-authority action reserved for proxy admins -> 403 for team admins. "Growing" covers both raising max_budget above the team's current finite value and removing the cap entirely (max_budget=null, detected via model_fields_set so an explicit null is distinguished from an omitted field). For a team that currently has no cap, setting a finite value is a restriction and is allowed.
- Org-scoped teams remain governed by _check_org_team_limits() (capped by the org budget).
Also reverts the #29525 existing_team_max_budget workaround in _check_user_team_limits() back to the create-only form; /team/new still enforces the creator's personal caps.
docs(access_control): resolve the contradiction in the team-admin section: team admins can keep/lower the budget and manage rate limits/models, but cannot raise the team budget (proxy-admin only).
tests: unit + behavior coverage for raise-blocked, cap-removal-blocked (team admin), raise/removal allowed (proxy admin), uncapped-team restriction allowed, keep/lower/resend allowed, and unchanged create-path guards.
Co-authored-by: Cursor <cursoragent@cursor.com>	2026-06-09 09:19:15 -07:00
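The "growing" rule in the commit above can be sketched as a small pure function. This is only an illustration under stated assumptions: `is_budget_growth` and `status_for_team_update` are hypothetical names, and a plain set of field names stands in for the `model_fields_set` of the real request model. It is not the handler's actual code.

```python
from typing import Optional, Set


def is_budget_growth(
    current_max_budget: Optional[float],
    fields_set: Set[str],
    requested_max_budget: Optional[float],
) -> bool:
    """Return True when an update would grow the team's spend ceiling.

    `fields_set` is a stand-in for the set of fields the caller explicitly
    sent, so an explicit null can be told apart from an omitted field.
    """
    if "max_budget" not in fields_set:
        return False  # field omitted: the budget is not being changed
    if current_max_budget is None:
        return False  # team is uncapped: a finite value only restricts it
    if requested_max_budget is None:
        return True  # explicit null removes the cap entirely
    return requested_max_budget > current_max_budget


def status_for_team_update(
    is_proxy_admin: bool,
    current_max_budget: Optional[float],
    fields_set: Set[str],
    requested_max_budget: Optional[float],
) -> int:
    """Hypothetical status mapping: growth is reserved for proxy admins."""
    grows = is_budget_growth(current_max_budget, fields_set, requested_max_budget)
    if grows and not is_proxy_admin:
        return 403
    return 200
```

Re-sending the unchanged budget (the UI case mentioned in the commit) is not a growth under this sketch, so a team admin's save would pass.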
yuneng-jiang and GitHub	5e16f20962	test(proxy): phase-4 payload behavior pinning for tier-2/3 key + team management endpoints (#28681)
* test(proxy): phase-4 payload behavior pinning for tier-2/3 key + team management endpoints
Extends the Phase 1–3 behavior-pin suite at tests/proxy_behavior/management/ with a second axis: payload-shape pinning. Phase 1–3 held payload minimal and pinned (actor, target) → status across 37 routes; Phase 4 holds the caller fixed at an authorized actor, varies the payload shape, and asserts the observable DB effect (on accept) or the named guard / row-unchanged (on reject). Faithfulness contract from Phase 1–3 is unchanged.
Six families + one gap-closer (59 new scenarios, 620 → 679 total):
  * F1: key budget / rate-limit (test_key_budget_limits.py, 18)
  * F2: key↔team reassignment (test_key_team_change.py, 6)
  * F3: team budget / rate-limit (test_team_budget_limits.py, 15)
  * F4: member-info validation (test_team_member_info_validation.py, 5)
  * F5: permission batching (test_team_permissions_bulk_update.py, 6)
  * F6: org-scoped team access (+2 detail-string pins in existing files)
  * F7: coverage gap-closer (test_f7_coverage_closeout.py, 7)
Harness extensions in conftest.py (additive only):
  * create_scratch_org() seeder with its own scratch-prefixed budget row
  * budget / limit fields on create_scratch_team()
  * scratch teardown also sweeps litellm_organizationtable
Coverage telemetry (behavior-suite-only):
  * key_management_endpoints.py 60 % → 65 % (+82 lines)
  * team_endpoints.py 62 % → 72 % (+137 lines, crosses 70 % stretch)
Key lands under 70 % per plan §7 escape hatch: the gap is dominated by routes outside F1–F6 scope (key list/info v2 internals) and structurally dead org-budget guards (call sites at lines 889 + 2310 + 985 + 1751 load the org without include_budget_table=True, so org.litellm_budget_table is None at guard time and the aggregate guard no-ops).
Pinned as observed no-op behavior so a future fix that flips the flag turns these into reds. Zero source-code changes; pyproject.toml diff is empty; test_route_coverage.py stays green untouched; G3 grep guards still green; local wall-time 14 s for the full suite (no coverage), 22 s with coverage. G4 regression-replay protocol executed against three representative fix-PR parents (`410ce761dc`, `0bd49ecb8b`, `8bbc61e03c`): all Phase 4 tests PASS at pre-fix SHAs — confirming the F1–F7 layer is a helper-body pin, not a regression-replay layer for those specific historical bypass shapes. Targeted RED-bait scenarios for each fix are left for a follow-up PR. * test(proxy): push key_management_endpoints.py past the 70% stretch (F7-extension) Adds 24 more payload-pin scenarios in test_f7_key_coverage_push.py following the same accepted-effect / rejected-guard pattern. Each scenario cites the file:line range it pins; same anti-snapshot rules apply. Target ranges (all reachable via HTTP-boundary payload variation): * 5942-6063 /key/health with metadata.logging → test_key_logging body * 4565-4692 /key/reset_spend happy + 404 + non-admin gate + value validation * 4421-4533 /key/regenerate ghost-404 + happy + new_key + grace_period * 4168-4202 _insert_deprecated_key body via grace_period * 6118-6133 _enforce_unique_key_alias duplicate-alias rejection * 6148-6169 validate_model_max_budget malformed-payload rejection * 4708-4789 validate_key_list_check user/team/org/key_hash branches * 2622-2733 /key/bulk_update mixed success/failure + admin gate + size limits * 2797-2950 /team/key/bulk_update all-keys path + explicit-keys dedupe + 404 * 5108-5207 /key/aliases admin + scoped + search-filter branches * 3253-3303 /key/info ghost + explicit-key + no-key-uses-auth-header * 3427-3436 generate_key_helper_fn budget_limits initialization * 1794-1815 prepare_key_update_data duration + budget_duration paths * 5280-5388 _build_filter_conditions across include_created_by_keys/team/sort/alias 
Coverage telemetry — full PR4 dataset: key_management_endpoints.py: 60 % → 71 % (+11 pts, +194 lines) team_endpoints.py: 62 % → 72 % (+10 pts, +137 lines) Both files now over the plan §7 PR4.M4 70 % stretch as a side effect of pinning real payload behavior. 721 tests pass in 19 s local (full suite, no coverage); 27 s with coverage. Zero source-code changes; pyproject.toml diff still empty; test_route_coverage.py + G3 grep guards still green. Honest finding (kept from the prior commit's body): four structurally-dead org-budget guards remain pinned as observed no-op behavior — they fire only when get_org_object is called with include_budget_table=True, which none of the four management-endpoint call sites currently do. Pinned so a future change that flips the flag turns these into reds. Two helper guards are honest-ceiling: _validate_reset_spend_value's isinstance check at line 4568 is unreachable from HTTP because Pydantic 422s non-float before the helper runs; same shape for /team/key/bulk_update's missing team_id / no-selector pre-handler guards. * test(proxy): address PR review — try/finally cleanup + loosen 500 envelope pins + Optional annotations Greptile review feedback on PR #28681: 1. Wrap manual budget-row cleanup in try/finally so an assertion failure doesn't leave non-scratch-prefixed budget rows orphaned across CI re-runs (test_team_new_with_team_member_budget_creates_budget_row and test_team_update_team_member_budget_upserts). 2. Loosen the two 500-status pins to in (400, 422, 500) — the named-guard substring is the real pin; the outer ValueError-wrap envelope is an implementation detail that a future improvement should be free to fix to a proper 400/422 without flipping these tests red. 3. Add missing Optional annotations on _seed_token's max_budget / metadata / team_id keyword args (they default to None). 
Greptile's typo flag on 'read-world' in the conftest comment is declined — 'read-world' is the project's established term for the immutable seeded world fixture (see other usages in conftest.py and actors.py). 721 tests still pass in 17 s.	2026-05-23 12:16:29 -07:00
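The two review fixes described in this commit (a loosened status pin that keeps the named-guard substring as the real assertion, and try/finally cleanup of manually created rows) can be sketched as follows. Both helper names are hypothetical, and a plain dict stands in for the budget table; the suite's real helpers and fixtures are not shown in this log.

```python
from typing import Callable, Dict, Iterable


def assert_rejected_by_guard(
    status_code: int,
    body_text: str,
    guard_substring: str,
    allowed_statuses: Iterable[int] = (400, 422, 500),
) -> None:
    """Pin the guard message, not the exact error envelope.

    The status only has to be one of the allowed rejection codes, so a later
    change from 500 to a proper 400/422 would not turn the test red.
    """
    assert status_code in tuple(allowed_statuses), f"unexpected status {status_code}"
    assert guard_substring in body_text, f"guard {guard_substring!r} not in response"


def run_with_budget_cleanup(
    budget_table: Dict[str, float],
    budget_id: str,
    check: Callable[[Dict[str, float]], None],
) -> None:
    """Create a row, run the assertions, and always remove the row afterwards."""
    budget_table[budget_id] = 10.0  # stand-in for a manually created budget row
    try:
        check(budget_table)
    finally:
        # Runs even when `check` raises, so no orphaned row survives a failure.
        budget_table.pop(budget_id, None)
```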
yuneng-jiang and GitHub	f62ae93e13	test(proxy): behavior-pinning matrix for tier-2/3 key + team management endpoints (#28620)
* test(proxy): add create_scratch_actor harness helper
Adds create_scratch_actor() to the management behavior-suite conftest and extends create_scratch_team() with team_member_permissions / models kwargs, needed by the PR3 team-key-permission and team-model matrices. The new helper mints a scratch-prefixed user + verification token (+ org memberships), all reclaimed by the existing scratch-prefix teardown.
* test(proxy): pin /key block, unblock, health, aliases behavior
Adds behavior-pinning matrices for POST /key/block, POST /key/unblock, POST /key/health, and GET /key/aliases. Pins that the management-route gate 401s ORG_ADMIN-role callers before _check_key_admin_access runs, the block/unblock round-trip on the blocked column, missing-key 404, and the _apply_non_admin_alias_scope visibility rules for /key/aliases.
* test(proxy): pin /key/bulk_update + /team/key/bulk_update behavior
Adds behavior-pinning matrices for POST /key/bulk_update (PROXY_ADMIN-only; ORG_ADMIN stopped 401 at the route gate, INTERNAL_USER-role 403 at the handler) and POST /team/key/bulk_update (team-member-permission gate keyed on KEY_UPDATE). Pins batch semantics: empty/over-cap 400, per-key failure isolation into failed_updates, all_keys_in_team broadcast, and no-keys 404. Adds an optional key_alias arg to create_scratch_key for multi-key scenarios.
* test(proxy): pin /key SA-generate, v2-info, reset-spend behavior
Adds behavior-pinning matrices for POST /key/service-account/generate (team-membership + team-member-permission gating; SA keys carry no user_id), POST /v2/key/info (per-key _can_user_query_key_info silently drops invisible keys), and POST /key/{key}/reset_spend (PROXY_ADMIN or team admin only; missing key 404, reset-value 400). Pins that ORG_ADMIN-role callers are stopped 401 at the management-route gate on the two non-info routes.
* test(proxy): close PR1/PR2 key-side deferred coverage gaps Closes the four key-side gaps deferred from PR1/PR2: - 404 on missing key for /key/update and /key/delete (not 401/403) - denied /key/update leaves max_budget/tpm_limit/rpm_limit untouched - /key/regenerate enforces litellm.upperbound_key_generate_params (#26340) - /key/list key_alias substring vs exact (admin-only) + team_id filter, and a non-admin filtering a foreign team is 403 * test(proxy): pin /team block, unblock, available, filter/ui, members/me Adds behavior-pinning matrices for POST /team/block + /team/unblock (management-route gate fronts _verify_team_access; reachable only by PROXY_ADMIN and an org admin of the team's own org), GET /team/available (default empty path), GET /team/filter/ui (route-gated PROXY-ADMIN-only despite the handler having no gate), and GET /team/{team_id}/members/me (caller resolves its own membership; non-member 404, no-user_id key 400). * test(proxy): pin /team model add/delete + permissions endpoints Adds behavior-pinning matrices for POST /team/model/add + /team/model/delete (route-gated PROXY-ADMIN-only; missing team 404), GET /team/permissions_list + POST /team/permissions_update (self-managed; proxy/team/org admin pass), and POST /team/permissions_bulk_update (PROXY_ADMIN-only). Pins the deliberate divergence that the available-team self-join grants read access via permissions_list but never write access via permissions_update. * test(proxy): pin /team delete, bulk_member_add, v2/list, daily/activity Adds behavior-pinning matrices for POST /team/delete (per-team _verify_team_access; batch aborts whole on a missing id), POST /team/bulk_member_add (route-gated PROXY-ADMIN-only; empty/over-cap 400), GET /v2/team/list (_enforce_list_team_v2_access — bare query 401s regular users, org-scoped for org admins) and GET /team/daily/activity (non-member team_ids filter 404, the VERIA-43 fix). 
* test(proxy): add route-coverage gate + close team org-relocation gap Adds test_route_coverage.py (PR3.M1): parses every @router route literal from the two management-endpoint source files and asserts each is exercised by >=1 behavior-suite scenario — a permanent regression guard for future routes. Closes the last PR1/PR2 deferred gap: the /team/update org-relocation allowed branch, exercised by a dual-org-admin minted via create_scratch_actor. test_team_model uses literal route URLs so the coverage parser resolves them. * test(proxy): bound plain route params to one path segment in coverage gate Plain path params ({team_id}) now compile to [^/?]+ instead of [^?]+, so a parameter cannot span '/'. Starlette ':path' params still match across '/'. Keeps the route-coverage guard from falsely reporting a future multi-segment route as covered. All 37 routes remain covered.	2026-05-22 11:24:41 -07:00
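The path-parameter rule in the last commit above (plain parameters bounded to one segment, ':path' parameters allowed to span '/') can be sketched as a route-template-to-regex compiler. `route_to_regex` is a hypothetical name and this is a minimal reconstruction of the stated rule, not the code in test_route_coverage.py.

```python
import re

# Matches {name} and {name:path} placeholders in a route template.
_PARAM = re.compile(r"\{([A-Za-z_][A-Za-z0-9_]*)(:path)?\}")


def route_to_regex(route: str) -> re.Pattern:
    """Compile a route template into an anchored regex.

    Plain params become [^/?]+ (exactly one path segment);
    ':path' params become [^?]+ (may contain '/').
    """
    parts = []
    pos = 0
    for match in _PARAM.finditer(route):
        parts.append(re.escape(route[pos:match.start()]))  # literal text
        parts.append("[^?]+" if match.group(2) else "[^/?]+")
        pos = match.end()
    parts.append(re.escape(route[pos:]))
    return re.compile("^" + "".join(parts) + "$")
```

With the earlier `[^?]+` form for plain params, a URL such as `/team/a/b/members/me` would have counted as exercising `/team/{team_id}/members/me`; bounding the parameter to one segment removes that false positive.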
yuneng-jiang and GitHub	67e6e5e1df	test(proxy): behavior-pinning matrix for team management endpoints (#28441)
* test(proxy): behavior-pinning matrix for team management endpoints
PR2 (Team Tier-1) of the management-endpoint behavior-pinning effort. Extends the tests/proxy_behavior/management/ harness PR1 built and adds the actor x target-resource authz matrix for the 7 team endpoints: /team/new, /team/info, /team/list, /team/update, /team/member_add, /team/member_delete, /team/member_update. Tests-only, no production code changes.
Harness extensions:
- actors.py: ORG_B_ADMIN actor (org admin of ORG_B) and TEAM_GAMMA (an ORG_A team with no actor members), so team-targeting endpoints get a clean own / same-org-other / cross-org target axis.
- conftest.py: create_scratch_team() raw-seeds target teams without /team/new side effects; the scratch teardown now also strips dangling scratch-team refs from LiteLLM_UserTable.teams.
156 new scenarios; status codes pinned to observed handler behavior.
* test(proxy): record mutmut run blockers in PR2 triage doc
Attempted a scoped local mutmut run for G5; it did not complete. Record the three concrete blockers in mutmut_triage/pr2-team-tier1.md so the next attempt has a head start:
1. mutmut's mutants/ sandbox is import-shadowed by the worktree source.
2. the legacy mock suite and the real-DB behavior suite cannot share a pytest session (mock suite globally patches prisma_client).
3. the CI mutation-test.yml workflow starts no Postgres, so its stats phase now aborts on the behavior-suite tests PR1 added to tests_dir.
mutmut stays a deferred follow-up (as in PR1); the binding pre-merge signal remains the behavior matrix (G1) and the G4 regression-replay.
* test(proxy): drop suite README + triage doc, trim test comments Remove the two prose docs from the behavior suite (README.md and mutmut_triage/pr2-team-tier1.md) and tighten the comment blocks on the team test files + harness down to the load-bearing parts (the gate each matrix pins, plus genuinely surprising results). No behavior change — all 286 scenarios still pass. * test(proxy): remove mutmut tests_dir comment	2026-05-21 16:57:25 -07:00
yuneng-jiang and GitHub	79a5a7abad	feat(tests): behavior-pinning harness + Key Tier-1 matrix (#28321)
* test(proxy_behavior): scaffold session-scoped async ASGI client + liveness smoke
Slice 2 of the management-endpoints behavior-pinning effort. New top-level dir tests/proxy_behavior/management/ outside every existing pytest glob. conftest.py initialises the proxy app once per session against the DATABASE_URL the harness boots Postgres at, wraps it in httpx.AsyncClient via in-process ASGITransport. The one smoke test asserts /health/liveliness returns 200, which exercises the full FastAPI middleware stack against a real app, with no mocks.
Plan: https://www.notion.so/36643b8acdab8128a581ced0f6a4744d
* test(proxy_behavior): connect prisma via real lifespan; key/generate de-risk
Slice 3 of the management-endpoints behavior-pinning effort. The fixture now enters the real FastAPI lifespan (proxy_startup_event) instead of just calling initialize(): that is where prisma_client is connected, password migration is kicked off, and the rest of the startup wiring runs. Tests pin the loop to the session scope so the AsyncClient created in the session fixture and the prisma connection opened in the lifespan share the same loop as the test bodies.
New de-risk smoke: POST /key/generate with the master key returns 200, the returned sk- token resolves to a hashed row in LiteLLM_VerificationToken, and the cleartext token is never stored. Proves auth + handler + helper + prisma all wire together end-to-end against a real Postgres.
Plan: https://www.notion.so/36643b8acdab8128a581ced0f6a4744d
* test(proxy_behavior): seed 8-actor read-world for the authz matrix
Slice 4 of the management-endpoints behavior-pinning effort. New ``actors.py`` defines the actor enum + seeds an immutable world (2 orgs, 2 teams, 8 users, 8 verification tokens) under the ``behavior-pin-`` prefix so the rows are identifiable in psql and ``_wipe_world`` is targeted.
Each actor key is created with its cleartext form generated locally and its hashed form (via ``litellm.proxy.utils.hash_token``) stored in ``LiteLLM_VerificationToken`` — so the real ``user_api_key_auth`` accepts the cleartext bearer token. Roles, ``team_id``, ``organization_id``, and the service-account metadata flag are all set on the seeded rows so the auth layer resolves the same scopes a real proxy would. The session-scoped ``world`` fixture re-seeds at session start (idempotent via wipe-then-create), and the smoke test confirms each of the 8 actor keys can call ``/key/info`` on itself and receive its own row back. Plan: https://www.notion.so/36643b8acdab8128a581ced0f6a4744d * test(proxy_behavior): per-test scratch namespace + targeted delete_many teardown Slice 5 of the management-endpoints behavior-pinning effort. Adds the ``scratch`` function-scoped fixture: each test gets a uuid4-derived namespace prefix, tags writes with it (``key_alias``, ``team_alias``, ``user_id``, ``budget_id``), and the fixture teardown ``delete_many``-s any row whose namespace column starts with that prefix. Cleanup uses Prisma model methods only (no raw SQL, per CLAUDE.md) and orders deletes children-before-parents to avoid FK conflicts. The Slice 3 de-risk smoke is migrated onto the same fixture so it stops accumulating untagged tokens across repeated local runs. Smoke proves both halves of the contract: one test writes a scratch-tagged key and asserts it lands; a second test runs after the first's teardown and asserts no rows in the scratch namespace survived. Plan: https://www.notion.so/36643b8acdab8128a581ced0f6a4744d * test(proxy_behavior): codify G3 (strict-import grep) as a pytest item Slice 6 of the management-endpoints behavior-pinning effort. 
Two new tests walk every .py file under tests/proxy_behavior/ and assert: * no ``from litellm.proxy.management_endpoints`` import — the suite is deliberately constrained to the HTTP boundary so it survives handler refactors; * no ``mock``/``patch`` on ``user_api_key_auth`` — mocking auth is the structural failure mode of the existing 11k-line mock suite, and the point of this harness is that the real auth layer runs. Codifying G3 as a CI test removes the "did someone forget to check the PR-description checklist" failure mode. Plan: https://www.notion.so/36643b8acdab8128a581ced0f6a4744d * style(proxy_behavior): apply black to G3 grep test Follow-up to 6f588c753b — line-length fixes only, no behavior change. * test(proxy_behavior): pin /key/generate authz matrix (18 scenarios) Slice 7 of the management-endpoints behavior-pinning effort. Parametrized matrix across two axes: actor (8 seeded) × target scope (self, team_alpha in org_a, team_beta in org_b). 18 scenarios after dropping non-applicable combos. Whole-suite wall-time stays at ~4.7s (well under the 10-min G2 budget for the eventual CI job). While pinning, the test surfaced one seed gap: ``_get_user_in_team`` reads ``members_with_roles`` (a JSON list of ``{user_id, role}``), not the plain ``members`` String[]. Both columns are now populated in the seed to match what the real ``/team/new`` handler would produce. Expected status codes are intentionally heterogeneous (200, 400, 401) because the current handler emits different statuses depending on which check fails first (role gate, team-member-perm gate, "not assigned" check). Pinning the observed codes — not what they "should" be — is exactly the regression signal we want. Plan: https://www.notion.so/36643b8acdab8128a581ced0f6a4744d * test(proxy_behavior): pin /key/info authz matrix (24 scenarios) Slice 8 of the management-endpoints behavior-pinning effort. 
8 actors × 3 target keys (own, OWNER's key in org_a, CROSS_ORG_USER's key in org_b) covering self-read, same-team-peer read, and cross-org read. Notable pinned behaviors (intentionally surfaced for review, not "fixed"): * ORG_ADMIN gets 403 on individual key info even within their own org — visibility is scoped to "your own keys" + "your team's keys", not "your org's keys". * Same-team peers (INTERNAL_USER, UNRELATED_SAME_ORG, SERVICE_ACCOUNT) DO see each other's keys. Whether that is desired is for the team to decide; this PR only pins the existing behavior so unintentional changes flip the matrix red. Wall-time is unchanged (~4.3s for the slice on its own). Plan: https://www.notion.so/36643b8acdab8128a581ced0f6a4744d * test(proxy_behavior): pin /key/list default-visibility matrix (8 scenarios) Slice 9 of the management-endpoints behavior-pinning effort. For /key/list the response IS the matrix: each of the 8 seeded actors calls the endpoint with default filters and the test asserts set-equality between the returned visible-token set (filtered to seeded tokens only, so unrelated rows can't flap the assertion) and a pinned expected actor-set. Pinned default visibility: * PROXY_ADMIN sees all 8 actors' keys. * Every other actor sees only their own key — including ORG_ADMIN (which had broader expectations going in but currently behaves same-as-internal-user for /key/list defaults) and TEAM_ADMIN (no team-aggregation without include_team_keys=true). Future changes that broaden or narrow any single actor's default visibility will turn this matrix red — exactly the regression signal we want. Parameter-driven views (include_team_keys, filters) are deferred to Slice 13 / PR2 follow-up. Plan: https://www.notion.so/36643b8acdab8128a581ced0f6a4744d * test(proxy_behavior): pin /key/update authz matrix + mutation re-read (21 scenarios) Slice 10 of the management-endpoints behavior-pinning effort. 
8 actors × 3 target shapes (self-owned, OWNER-scoped in org_a/team_alpha, CROSS_ORG_USER-scoped in org_b/team_beta), of which 21 scenarios are applicable. Each test:
1. Master-key-seeds a fresh scratch key with the target's (user_id, team_id) scope (so the read-world stays untouched).
2. Has the actor under test POST /key/update flipping ``models`` to a known marker list.
3. Asserts the status code AND the DB row's ``models`` field (present when 200, unchanged otherwise), so a handler that silently mutates on a denied response surfaces red.
Observed gating (pinned, not endorsed):
  * PROXY_ADMIN bypasses every check.
  * ORG_ADMIN is blocked by an early role gate, always 401.
  * Every other (INTERNAL_USER-roled) actor hits one of three failure modes (403 "user can only create keys for themselves", 403 "only proxy admins, team admins, or org admins", or 401 "team_member_permission_error"), depending on whether they own the target and whether they're a team admin / member of its team.
Plan: https://www.notion.so/36643b8acdab8128a581ced0f6a4744d
* test(proxy_behavior): pin /key/regenerate authz matrix + rotation contract (22 scenarios)
Slice 11 of the management-endpoints behavior-pinning effort. 21 matrix scenarios (8 actors × 3 target shapes, minus the cross_org/owner combo that exists in the seed but isn't applicable) plus one smoke for the ``/key/{key:path}/regenerate`` route registration.
On 200 outcomes the test verifies the full rotation contract:
  * the regenerate response key differs from the old cleartext,
  * the OLD cleartext returns 401 on a follow-up ``/key/info``,
  * the NEW cleartext returns 200 on a follow-up ``/key/info``.
On denied outcomes the test verifies the OLD cleartext still works, catching any handler that mutates the token row on a failed call. Pinned authz divergence vs /key/update: regenerate routes most denials through the team-member-perm 401 path rather than the role-gate 403 path.
The matrices for both endpoints are now in tree side-by-side, so any future refactor that "harmonises" the codes will turn one of the two red. Plan: https://www.notion.so/36643b8acdab8128a581ced0f6a4744d * test(proxy_behavior): pin /key/delete authz matrix + post-delete contract (21 scenarios) Slice 12 of the management-endpoints behavior-pinning effort. Mirrors slices 10/11. On success: cleartext can no longer authenticate (handles both hard-delete and soft-delete to LiteLLM_DeletedVerificationToken). On denial: row survives and cleartext still authenticates. Notable behavior gap with /key/update: same-team peers (internal_user, unrelated_same_org, etc.) get 403 on /key/delete for OWNER's key — i.e. cannot delete each other's keys — whereas they CAN read each other's keys (Slice 8). Delete is stricter than read. Pinned as-is. Cumulative whole-suite wall-time is 5.9s for all 128 tests on the local runner — well under the 10-min G2 budget for the CI job in Slice 13. Plan: https://www.notion.so/36643b8acdab8128a581ced0f6a4744d * ci(proxy-mgmt-behavior): add PR-triggered workflow for the behavior suite Slice 13 of the management-endpoints behavior-pinning effort. New workflow ``test-unit-proxy-mgmt-behavior.yml`` fires ``on: pull_request`` for the same branch set every other proxy unit-test workflow watches (main, litellm_internal_staging, litellm_oss_branch, litellm_*). It delegates to the existing reusable ``_test-unit-services-base.yml`` with ``enable-postgres: true``, which already provisions a postgres:14 service container and runs ``prisma db push`` against it before pytest collects. ``reruns: 0`` because a behavior-pinning matrix that needs reruns is itself a regression — flakes are signal. ``timeout-minutes: 15`` gives generous headroom over the local 5.9s whole-suite wall-time; the binding G2 budget is 10 min. 
Plan: https://www.notion.so/36643b8acdab8128a581ced0f6a4744d docs(proxy_behavior): G4 regression-replay table for Key Tier-1 Slice 14 of the management-endpoints behavior-pinning effort. Documents the regression-replay verification methodology + a 12-row table mapping recent fix-PRs touching key_management_endpoints.py to the catching scenarios in the PR1 matrix. One canonical RED→GREEN cycle is captured verbatim — `c7c3df2b02` "extend /key/update admin check to non-budget fields". Under the parent-of-fix code, 6 scenarios in test_key_update.py flip from 200 to 403; under HEAD code, all 21 pass. The handler swap is the only change between the two runs, confirming the matrix catches the behavior shift the fix introduced. The table also calls out 4 genuine coverage gaps deferred to PR2/PR3: 404-on-missing-key, budget-limit counter assertions, /key/regenerate upperbound enforcement, and /key/list filter-param views. Plan: https://www.notion.so/36643b8acdab8128a581ced0f6a4744d * chore(mutmut): include the behavior suite in tests_dir + G5 triage stub Slice 15 of the management-endpoints behavior-pinning effort. Appends ``tests/proxy_behavior/management/`` to ``[tool.mutmut].tests_dir`` so the existing mutation-test workflow runs against both the legacy mock suite AND the new behavior suite — the latter is where the regression signal will actually surface. Adds a stub at ``tests/proxy_behavior/management/mutmut_triage/pr1.md`` documenting the G5 triage protocol (zero unreviewed survivors in the 6 Tier-1 handler functions) and a placeholder baseline-metrics table to fill in after the first manually-triggered mutmut run completes — runs take hours and run on a manual cadence, so PR1 ships with the wiring + protocol, not the numbers. The actual baseline is recorded in a follow-up once ``gh workflow run mutation-test.yml`` finishes. The kill rate stays telemetry-only, never a gate. G5 (per-survivor classification) is the binding mutation gate. 
Plan: https://www.notion.so/36643b8acdab8128a581ced0f6a4744d * docs(proxy_behavior): suite README with local-repro + conventions + gates Slice 16 of the management-endpoints behavior-pinning effort. The README documents: * The same three commands the CI workflow runs locally (BYO-DATABASE_URL, no new tooling). * Suite layout — what each test file covers, which slice it lands. * The asyncio loop_scope convention required for session fixtures (httpx AsyncClient + prisma connection) to share a loop with each test body. * G3 strict-import convention + the test that enforces it. * Read-world vs scratch-world fixture conventions. * Behavior-pinning philosophy: pin observed codes; flag, don't judge. * Where each G1–G5 + PR1.M1–M3 gate's evidence lives. Plan: https://www.notion.so/36643b8acdab8128a581ced0f6a4744d * ci(proxy-mgmt-behavior): drop xdist (workers=0) to fix seed race First run on PR #28321 failed with UniqueViolation on ``behavior-pin-budget`` plus cascading missing-membership FK errors. Both xdist workers entered ``seed_world()`` concurrently against the shared Postgres service container; whichever lost the race left the world in a half-seeded state and downstream tests ran against missing team_membership rows. Whole-suite wall-time is ~7s sequentially, so disabling xdist here costs nothing — and the seed itself is the wrong place to add per-worker isolation (the world is intentionally shared so set-equality assertions in /key/list have a deterministic expected set). * ci(proxy-mgmt-behavior): seed scratch keys via proxy_admin actor, not master Second CI run failed: ``/key/generate`` with explicit ``user_id`` returned 403 "User can only create keys for themselves. Got user_id=X, Your ID=None" in every test that called ``_create_scratch_key`` with a per-actor user_id. The bare master key's auth path was producing ``user_id=None`` in the fresh CI Postgres, which doesn't trigger the PROXY_ADMIN bypass in ``_user_can_only_create_keys_for_themselves`` reliably. 
Locally the same master key path worked, masking the issue. Fix: every ``_create_scratch_key`` helper now takes a seeder cleartext and the test bodies pass ``world.keys[Actor.PROXY_ADMIN].cleartext``. That actor was seeded with ``user_role=PROXY_ADMIN`` AND a concrete ``user_id``, so the bypass fires deterministically in both environments. No behavior shift in the matrices themselves — all 128 scenarios still pass locally; only the setup helper's auth identity changed. The bare-master smoke (test_smoke + test_scratch_teardown) is intentionally left on the master key path: those tests don't pass ``user_id`` in the body so they don't hit the user_id-mismatch gate. * ci(proxy-mgmt-behavior): diag — run world-seed test first + bump max-failures Third CI run failed identically: seeded PROXY_ADMIN actor's auth resolves to ``user_id=None`` even though the DB row has the right ``user_id``. The suite was aborting at maxfail=10 inside test_key_delete, so test_world_seed (which would tell us whether the seed itself is reachable) never ran in CI. Two diagnostic moves on this push, no behavior change: * Rename ``test_world_seed.py`` → ``test_aaa_world_seed.py`` so it's the first collected file. If it passes in CI we know the seed is fine and the bug lives downstream; if it fails the same way the bug is in the auth resolution path. * Bump ``max-failures`` to 200 for this workflow so we see the full failure surface instead of stopping at the first cascading setup error. Will tighten back down once the suite is green. Adds one new test ``test_proxy_admin_actor_can_create_keys_for_others`` that explicitly exercises the PROXY_ADMIN bypass via /key/generate with an explicit user_id — the same shape the matrix setup helper uses but without the matrix machinery muddying the diagnostic. 
* ci(proxy-mgmt-behavior): await LiteLLM_VerificationTokenView creation in fixture Fourth CI run still failed because the proxy's lifespan kicks off ``prisma_client.check_view_exists()`` as a fire-and-forget background task — that task is what creates ``LiteLLM_VerificationTokenView``, the SQL view ``user_api_key_auth`` queries to resolve a token to its user_id / user_role / team. On a fresh Postgres (CI), the first test races the background task. The view doesn't exist when the first auth call runs, the resolver falls through to a degraded path that returns ``user_id=None``, and every matrix test that depends on the seeded actor's identity then fails confusingly with "Got user_id=X, Your ID=None" 403s. Locally the view persists across pytest runs so the race is invisible. Fix: await ``prisma_client.check_view_exists()`` explicitly inside the session ``proxy_app`` fixture, after the lifespan enters but before the fixture yields. Deterministic regardless of whether the underlying DB is fresh (CI) or warm (local). * ci(proxy-mgmt-behavior): widen diagnostic to dump token / user / view shape The fifth CI run isolated the failure to ``/key/generate`` with explicit user_id while ``/key/info`` works for the same seeded PROXY_ADMIN actor. The auth context's user_id is None even though the DB row has it set. This commit widens the diagnostic test: on failure, dump the raw token row's user_id, the user row's user_role, and what ``LiteLLM_VerificationTokenView`` actually returns for the seeded token. If the view returns user_id=None we know the view shape is the problem; if the view returns the right user_id we know it's a downstream code path stripping it. * ci(proxy-mgmt-behavior): unambiguous diagnostic view query Previous diagnostic's raw SQL had an ambiguous user_id column from joining the view with the user table, so the diagnostic itself crashed before printing useful state. Simplified to query just the view's columns. 
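The fire-and-forget race described above for ``check_view_exists()`` can be reproduced in isolation. This is a minimal sketch, not the proxy's code: ``FakeDB``, ``resolve_user_id`` and both request functions are hypothetical stand-ins, and only the method name ``check_view_exists`` is borrowed from the commit message. It shows why the first lookup sees ``user_id=None`` when setup runs as a background task, and why awaiting the setup first removes the race.

```python
import asyncio
from typing import Optional


class FakeDB:
    """Hypothetical stand-in for a DB client; not litellm's prisma client."""

    def __init__(self) -> None:
        self.view_exists = False

    async def check_view_exists(self) -> None:
        # Simulate the latency of creating the SQL view.
        await asyncio.sleep(0.01)
        self.view_exists = True

    def resolve_user_id(self, seeded_user_id: str) -> Optional[str]:
        # Degraded path: without the view, the lookup yields no user_id.
        return seeded_user_id if self.view_exists else None


async def first_request_fire_and_forget() -> Optional[str]:
    db = FakeDB()
    # Scheduled as a background task; it has not run yet when we look up.
    task = asyncio.create_task(db.check_view_exists())
    result = db.resolve_user_id("user-1")
    await task  # tidy up so the task does not outlive the event loop
    return result


async def first_request_awaited() -> Optional[str]:
    db = FakeDB()
    # Setup completes before the first lookup, as in the fixture fix.
    await db.check_view_exists()
    return db.resolve_user_id("user-1")


if __name__ == "__main__":
    print(asyncio.run(first_request_fire_and_forget()))
    print(asyncio.run(first_request_awaited()))
```

A warm database corresponds to ``view_exists`` already being true before the first request, which is why the race stays invisible locally.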
* ci(proxy-mgmt-behavior): add auth-resolver chain diagnostic Six runs and the underlying data (token row, user row, view row) all verified correct in CI, but auth still returns user_id=None. This diagnostic calls the resolver primitives directly: 1. ``prisma.get_data(table_name="combined_view")`` → raw view object 2. ``get_key_object(...)`` → cached/DB UserAPIKeyAuth 3. ``get_user_object(...)`` → LiteLLM_UserTable row 4. ``_is_user_proxy_admin`` / ``_get_user_role`` and prints each intermediate via captured stdout (-s). Whichever step returns None/False in CI is where the chain breaks. Imports come from ``litellm.proxy.auth`` (not management_endpoints), so G3 still passes. * ci(proxy-mgmt-behavior): set LITELLM_MASTER_KEY env so lifespan doesn't wipe it Real root cause of every CI run that returned ``Your ID=None`` for the seeded actors: * In ``initialize()``, ``master_key`` is set from the config YAML's ``general_settings.master_key`` (load_config code path at proxy_server.py:4174). * Then the FastAPI lifespan (``proxy_startup_event``) runs and at line 776 does ``master_key = get_secret_str("LITELLM_MASTER_KEY")``, which UNCONDITIONALLY overwrites the global. * In CI the env var is unset, so the post-lifespan ``master_key`` is None. Downstream every auth path degrades: master-key requests don't bypass because ``secrets.compare_digest(api_key, None)`` raises and is caught to ``is_master_key_valid=False``; seeded-actor requests cache a ``UserAPIKeyAuth`` whose ``user_role`` never resolves through the PROXY_ADMIN bypass; ``_is_allowed_to_make_key_request`` then hits the ``user_id`` mismatch path with ``Your ID=None``. Locally my shell happened to have ``LITELLM_MASTER_KEY`` set from a prior session, which is why every local run was green and CI red — exactly the "don't generalize from your environment to CI" memory. 
Fix: ``os.environ.setdefault("LITELLM_MASTER_KEY", MASTER_KEY)`` and ``os.environ.setdefault("CONFIG_FILE_PATH", config_path)`` before entering the lifespan, so its re-read produces the same value as ``initialize()``. Whole-suite still green locally (130 tests, ~6.4s). * ci(proxy-mgmt-behavior): force premium_user=True so /key/regenerate isn't gated Ninth CI run cleared every ``Your ID=None`` failure (the master_key env fix worked end-to-end) and exposed the next thin layer of failures: ``/key/regenerate`` returns 500 "Regenerating Virtual Keys is an Enterprise feature" in CI because the proxy can't see a ``LITELLM_LICENSE``. Locally my license is set, so the matrix passes. The behavior matrix is supposed to pin authz, not licensing — so flip ``proxy_server.premium_user = True`` directly, both before and after the lifespan (the lifespan re-runs ``_license_check.is_premium()`` and would otherwise reset it). With premium gating disabled, the regenerate matrix exercises the same authz path /key/update does. Whole-suite still green locally (130 tests, ~6.3s). * test(proxy_behavior): trim debug diagnostics, restore default max-failures Followup to the CI-bring-up sequence: now that the suite is green in CI (130 → 129 tests after this trim; 156s wall-time on ubuntu-latest), drop the diagnostic noise left over from debugging the master_key wipe: * Rename ``test_aaa_world_seed.py`` back to ``test_world_seed.py`` — no longer needs to run first. * Remove ``test_auth_resolver_returns_correct_user_id_and_role`` — that test reached into private auth helpers to localize the bug between the DB and ``UserAPIKeyAuth``; it has served its purpose and isn't HTTP-boundary. * Keep ``test_proxy_admin_actor_can_create_keys_for_others`` (without the failure-time dump) — it's a real authz contract that pins the PROXY_ADMIN bypass on /key/generate, and would catch a regression of the same conftest interaction this sequence revealed. 
* Drop the workflow's ``max-failures: 200`` override — that was a debug aid for seeing the full failure surface in CI. Default of 10 is right for a stable suite. * chore(proxy_behavior): drop empty mutmut triage stub, fold protocol into README The mutmut_triage/pr1.md file was a placeholder for numbers and classifications that don't exist yet — the first mutmut run is a manual follow-up. Empty stubs aren't evidence; deleting it. The G5 protocol (run the workflow, triage survivors in the six Tier-1 handler functions, kill-or-accept-with-reason, zero unreviewed) moves into the suite README's "Gate evidence" block. The real triage file will land alongside the first mutmut follow-up. pyproject.toml's [tool.mutmut].tests_dir entry stays — that's the one-line wiring that makes the existing (manual-trigger) mutation-test workflow include our suite next time someone runs it. Comment updated to drop the dead file reference. * chore(proxy_behavior): drop README + trim comments Removes the suite README — its contents (local repro, layout, conventions) were either restated by the file structure or already covered by the workflow YAML and pyproject.toml. Trims docstrings and inline comments across every test file to keep only non-obvious WHY (the masking ``_get_user_in_team`` reads, the LiteLLM_VerificationTokenView models-can't-be-NULL gotcha, the org_admin/peer-visibility surprise, the rotation contract). Suite still 129 green locally. * test(proxy_behavior): address Greptile review — env force, pagination, dedup - conftest: force LITELLM_MASTER_KEY / CONFIG_FILE_PATH unconditionally instead of setdefault. An ambient LITELLM_MASTER_KEY with a different value would make the proxy authenticate on that key while the tests still send MASTER_KEY → silent 401s. - test_key_list: paginate /key/list instead of a single size=100 request. size is capped at 100 by the endpoint, so on a non-fresh DB a single page could truncate PROXY_ADMIN's view and a seeded key could fall off the page. 
Walk total_pages. - conftest: hoist the duplicated _create_scratch_key helper (copy-pasted and already diverged across test_key_{update,regenerate,delete}.py) into a single shared create_scratch_key. - Delete regression_replay/README.md — G4 regression-replay evidence belongs in the PR description, not a committed doc file (repo docs policy + the effort's own plan both say so). Content moved to the PR.	2026-05-20 19:27:44 -07:00
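The difference between ``setdefault`` and forcing the environment value, which the Greptile-review commit above addresses, can be sketched with plain dictionaries. This is an illustration under stated assumptions, not the suite's conftest: the two helper functions and the key values are hypothetical, and a ``dict`` stands in for ``os.environ`` (which supports the same mapping operations) so the example does not touch the real environment.

```python
from typing import Dict

MASTER_KEY = "sk-test-master"  # hypothetical key the tests would send


def configure_with_setdefault(env: Dict[str, str]) -> str:
    # Only fills the variable when it is absent; an ambient value wins.
    env.setdefault("LITELLM_MASTER_KEY", MASTER_KEY)
    return env["LITELLM_MASTER_KEY"]


def configure_forced(env: Dict[str, str]) -> str:
    # Overwrites unconditionally, so the server and the tests always agree.
    env["LITELLM_MASTER_KEY"] = MASTER_KEY
    return env["LITELLM_MASTER_KEY"]


if __name__ == "__main__":
    fresh_ci_env: Dict[str, str] = {}
    dev_shell_env = {"LITELLM_MASTER_KEY": "sk-ambient-from-shell"}

    # Both approaches agree when the variable is unset (the CI case).
    print(configure_with_setdefault(dict(fresh_ci_env)))
    print(configure_forced(dict(fresh_ci_env)))

    # With an ambient value, setdefault keeps the stale key.
    print(configure_with_setdefault(dict(dev_shell_env)))
    print(configure_forced(dict(dev_shell_env)))
```

With ``setdefault``, a developer shell that already exports a different key leaves the server on that key while the tests send ``MASTER_KEY``, which is the silent-401 scenario the commit describes.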