litellm/tests/proxy_behavior
yuneng-jiang 5e16f20962 test(proxy): phase-4 payload behavior pinning for tier-2/3 key + team management endpoints (#28681)
* test(proxy): phase-4 payload behavior pinning for tier-2/3 key + team management endpoints

Extends the Phase 1–3 behavior-pin suite at tests/proxy_behavior/management/
with a second axis: payload-shape pinning. Phase 1–3 held payload minimal
and pinned (actor, target) → status across 37 routes; Phase 4 holds the
caller fixed at an authorized actor, varies the payload shape, and asserts
the observable DB effect (on accept) or the named guard / row-unchanged
(on reject). Faithfulness contract from Phase 1–3 is unchanged.
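
The accept/reject contract described above can be sketched as follows. This is a minimal, self-contained illustration, not the suite's actual code: `FakeKeyStore`, `update_key`, `run_scenario`, and the guard string are hypothetical stand-ins for the proxy's HTTP endpoint and database.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class FakeKeyStore:
    """Hypothetical in-memory stand-in for the key table."""
    rows: Dict[str, Dict[str, Optional[float]]] = field(default_factory=dict)


def update_key(store: FakeKeyStore, key_id: str, payload: dict) -> Tuple[int, str]:
    """Hypothetical handler returning (status, detail) like an HTTP boundary."""
    max_budget = payload.get("max_budget")
    if max_budget is not None and max_budget < 0:
        # Named guard: a test pins this substring, not the whole message.
        return 400, "max_budget cannot be negative"
    store.rows[key_id]["max_budget"] = max_budget
    return 200, "ok"


def run_scenario(payload: dict) -> Tuple[int, str, dict, dict]:
    """The caller is held fixed (implicitly authorized); only the payload varies."""
    store = FakeKeyStore(rows={"scratch-key": {"max_budget": 10.0}})
    before = dict(store.rows["scratch-key"])
    status, detail = update_key(store, "scratch-key", payload)
    return status, detail, before, dict(store.rows["scratch-key"])


# Accept: assert the observable stored effect.
status, detail, before, after = run_scenario({"max_budget": 25.0})
assert status == 200 and after["max_budget"] == 25.0

# Reject: assert the named guard and that the row is unchanged.
status, detail, before, after = run_scenario({"max_budget": -1.0})
assert status == 400 and "cannot be negative" in detail and after == before
```

Pinning the guard substring and the stored row, rather than a full response snapshot, keeps the test sensitive to behavior changes while tolerating cosmetic ones.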

Six families + one gap-closer (59 new scenarios, 620 → 679 total):

  * F1 — key budget / rate-limit (test_key_budget_limits.py, 18)
  * F2 — key↔team reassignment   (test_key_team_change.py, 6)
  * F3 — team budget / rate-limit (test_team_budget_limits.py, 15)
  * F4 — member-info validation   (test_team_member_info_validation.py, 5)
  * F5 — permission batching      (test_team_permissions_bulk_update.py, 6)
  * F6 — org-scoped team access   (+2 detail-string pins in existing files)
  * F7 — coverage gap-closer      (test_f7_coverage_closeout.py, 7)

Harness extensions in conftest.py (additive only):
  * create_scratch_org() seeder with its own scratch-prefixed budget row
  * budget / limit fields on create_scratch_team()
  * scratch teardown also sweeps litellm_organizationtable
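
As a rough sketch of what a prefix-based scratch teardown could look like: the prefix value, the `sweep_scratch_rows` helper, and every table name other than `litellm_organizationtable` are assumptions for illustration, and the real harness operates on a database rather than on in-memory lists.

```python
from typing import Dict, List

SCRATCH_PREFIX = "scratch-"  # hypothetical; the real prefix is not stated here


def sweep_scratch_rows(tables: Dict[str, List[str]]) -> int:
    """Remove scratch-prefixed row ids from every table; return how many were removed."""
    removed = 0
    for name, ids in tables.items():
        kept = [row_id for row_id in ids if not row_id.startswith(SCRATCH_PREFIX)]
        removed += len(ids) - len(kept)
        tables[name] = kept  # reassigning an existing key is safe while iterating
    return removed


# Seeded ("read-world") rows survive; scratch rows are swept from all tables.
tables = {
    "litellm_budgettable": ["seed-budget", "scratch-budget-1"],  # assumed name
    "litellm_teamtable": ["scratch-team-1"],                     # assumed name
    "litellm_organizationtable": ["seed-org", "scratch-org-1"],
}
removed_count = sweep_scratch_rows(tables)
```

Giving the scratch org its own scratch-prefixed budget row means a sweep of this kind can clean both rows without touching seeded data.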

Coverage telemetry (behavior-suite-only):
  * key_management_endpoints.py  60 % → 65 % (+82 lines)
  * team_endpoints.py            62 % → 72 % (+137 lines, crosses 70 % stretch)

key_management_endpoints.py lands under 70 %, which the plan §7 escape
hatch allows: the gap is dominated by routes outside F1–F6 scope (key
list/info v2 internals) and by structurally dead org-budget guards. The
call sites at lines 889, 985, 1751, and 2310 load the org without
include_budget_table=True, so org.litellm_budget_table is None at guard
time and the aggregate guard no-ops. These are pinned as observed no-op
behavior so that a future fix that flips the flag turns them red.
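
The dead-guard mechanism can be illustrated with a small sketch. `Org`, `Budget`, `load_org`, and `check_org_budget` are hypothetical names; only the `include_budget_table` flag and the `litellm_budget_table` attribute come from the description above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Budget:
    max_budget: float


@dataclass
class Org:
    org_id: str
    litellm_budget_table: Optional[Budget] = None


def load_org(org_id: str, include_budget_table: bool = False) -> Org:
    """Hypothetical loader: the budget relation is attached only on request."""
    budget = Budget(max_budget=100.0) if include_budget_table else None
    return Org(org_id=org_id, litellm_budget_table=budget)


def check_org_budget(org: Org, requested: float) -> None:
    """Hypothetical aggregate guard: silently no-ops when the relation is missing."""
    if org.litellm_budget_table is None:
        return
    if requested > org.litellm_budget_table.max_budget:
        raise ValueError("exceeds organization budget")


# As described: the org is loaded without the flag, so the guard never fires.
check_org_budget(load_org("scratch-org"), requested=500.0)

# If a future change flips the flag, the same payload is rejected.
try:
    check_org_budget(load_org("scratch-org", include_budget_table=True), requested=500.0)
    rejected = False
except ValueError:
    rejected = True
assert rejected
```

A test that pins the first call as "accepted" therefore turns red as soon as the loader starts attaching the budget row.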

Zero source-code changes; pyproject.toml diff is empty;
test_route_coverage.py stays green untouched; G3 grep guards still green;
local wall-time 14 s for the full suite (no coverage), 22 s with coverage.

G4 regression-replay protocol executed against three representative
fix-PR parents (410ce761dc, 0bd49ecb8b, 8bbc61e03c): all Phase 4 tests
PASS at pre-fix SHAs — confirming the F1–F7 layer is a helper-body pin,
not a regression-replay layer for those specific historical bypass
shapes. Targeted RED-bait scenarios for each fix are left for a
follow-up PR.

* test(proxy): push key_management_endpoints.py past the 70% stretch (F7-extension)

Adds 24 more payload-pin scenarios in test_f7_key_coverage_push.py
following the same accepted-effect / rejected-guard pattern. Each
scenario cites the file:line range it pins; same anti-snapshot rules
apply.

Target ranges (all reachable via HTTP-boundary payload variation):
  * 5942-6063  /key/health with metadata.logging → test_key_logging body
  * 4565-4692  /key/reset_spend happy + 404 + non-admin gate + value validation
  * 4421-4533  /key/regenerate ghost-404 + happy + new_key + grace_period
  * 4168-4202  _insert_deprecated_key body via grace_period
  * 6118-6133  _enforce_unique_key_alias duplicate-alias rejection
  * 6148-6169  validate_model_max_budget malformed-payload rejection
  * 4708-4789  validate_key_list_check user/team/org/key_hash branches
  * 2622-2733  /key/bulk_update mixed success/failure + admin gate + size limits
  * 2797-2950  /team/key/bulk_update all-keys path + explicit-keys dedupe + 404
  * 5108-5207  /key/aliases admin + scoped + search-filter branches
  * 3253-3303  /key/info ghost + explicit-key + no-key-uses-auth-header
  * 3427-3436  generate_key_helper_fn budget_limits initialization
  * 1794-1815  prepare_key_update_data duration + budget_duration paths
  * 5280-5388  _build_filter_conditions across include_created_by_keys/team/sort/alias

Coverage telemetry — full PR4 dataset:

  key_management_endpoints.py: 60 % → 71 %  (+11 pts, +194 lines)
  team_endpoints.py:           62 % → 72 %  (+10 pts, +137 lines)

Both files are now over the plan §7 PR4.M4 70 % stretch as a side effect
of pinning real payload behavior. 721 tests pass in 19 s locally (full
suite, no coverage) and in 27 s with coverage. Zero source-code changes;
pyproject.toml diff still empty; test_route_coverage.py + G3 grep guards
still green.

Honest finding (kept from the prior commit's body): four structurally-dead
org-budget guards remain pinned as observed no-op behavior — they fire
only when get_org_object is called with include_budget_table=True, which
none of the four management-endpoint call sites currently do. Pinned so
a future change that flips the flag turns these into reds.

Two helper guards mark an honest coverage ceiling: _validate_reset_spend_value's
isinstance check at line 4568 is unreachable from HTTP because Pydantic
returns a 422 for non-float input before the helper runs. The same shape
applies to /team/key/bulk_update's missing team_id / no-selector
pre-handler guards.
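
Why such a guard is unreachable over HTTP can be sketched without any framework. Here `boundary_validate` is a hypothetical stand-in for request-model validation, and `validate_reset_value`, `handle`, and the `amount` field are illustrative names, not the proxy's real ones.

```python
from typing import Any, Tuple


def boundary_validate(body: dict) -> Tuple[int, Any]:
    """Hypothetical stand-in for request validation at the HTTP boundary."""
    value = body.get("amount")
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return 422, "amount must be a number"
    return 200, float(value)


def validate_reset_value(value: Any) -> float:
    """Hypothetical helper whose type check duplicates the boundary check."""
    if not isinstance(value, float):
        raise TypeError("amount must be a float")  # unreachable via handle()
    return value


def handle(body: dict) -> Tuple[int, Any]:
    status, parsed = boundary_validate(body)
    if status != 200:
        return status, parsed  # the helper never runs for bad types
    return 200, validate_reset_value(parsed)


# A malformed payload is rejected at the boundary with 422.
assert handle({"amount": "abc"})[0] == 422
# A well-formed payload always reaches the helper as a float.
assert handle({"amount": 3}) == (200, 3.0)
```

Because every value that reaches the helper has already been coerced to `float`, no HTTP payload can exercise the helper's `TypeError` branch; it can only be covered by calling the helper directly.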

* test(proxy): address PR review — try/finally cleanup + loosen 500 envelope pins + Optional annotations

Greptile review feedback on PR #28681:

1. Wrap manual budget-row cleanup in try/finally so an assertion failure
   doesn't leave non-scratch-prefixed budget rows orphaned across CI re-runs
   (test_team_new_with_team_member_budget_creates_budget_row and
   test_team_update_team_member_budget_upserts).
2. Loosen the two 500-status pins to in (400, 422, 500) — the named-guard
   substring is the real pin; the outer ValueError-wrap envelope is an
   implementation detail that a future improvement should be free to fix
   to a proper 400/422 without flipping these tests red.
3. Add missing Optional annotations on _seed_token's max_budget / metadata
   / team_id keyword args (they default to None).
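
The three review fixes above can be sketched together. This is an illustrative stand-in only: `budget_rows`, `create_team_with_member_budget`, `seed_token`, and the guard text are hypothetical, and the real tests talk to the proxy and its database.

```python
from typing import Dict, Optional, Tuple

budget_rows: Dict[str, float] = {}  # hypothetical stand-in for the budget table


def create_team_with_member_budget(budget_id: str, amount: float) -> Tuple[int, str]:
    """Hypothetical endpoint stand-in that writes a non-scratch-prefixed budget row."""
    budget_rows[budget_id] = amount
    if amount < 0:
        # The envelope status is incidental; the guard substring is the pin.
        return 500, "ValueError: team_member_budget must be non-negative"
    return 200, "ok"


def seed_token(max_budget: Optional[float] = None, team_id: Optional[str] = None) -> dict:
    """Keyword arguments that default to None carry explicit Optional annotations."""
    return {"max_budget": max_budget, "team_id": team_id}


def scenario_rejects_negative_budget() -> None:
    budget_id = "budget-not-scratch-prefixed"
    try:
        status, detail = create_team_with_member_budget(budget_id, -5.0)
        assert status in (400, 422, 500)         # loosened envelope pin
        assert "must be non-negative" in detail  # named-guard substring
    finally:
        # Runs even if an assertion above fails, so no row is orphaned.
        budget_rows.pop(budget_id, None)


scenario_rejects_negative_budget()
assert budget_rows == {}
```

The `finally` block matters precisely because the row is not scratch-prefixed: a prefix-based teardown would not find it after a failed assertion.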

Greptile's typo flag on 'read-world' in the conftest comment is declined —
'read-world' is the project's established term for the immutable seeded
world fixture (see other usages in conftest.py and actors.py).

721 tests still pass in 17 s.
2026-05-23 12:16:29 -07:00