mirror of
https://github.com/tiennm99/litellm.git
synced 2026-06-17 14:48:44 +00:00
5e16f20962
* test(proxy): phase-4 payload behavior pinning for tier-2/3 key + team management endpoints

Extends the Phase 1–3 behavior-pin suite at tests/proxy_behavior/management/ with a second axis: payload-shape pinning. Phase 1–3 held the payload minimal and pinned (actor, target) → status across 37 routes; Phase 4 holds the caller fixed at an authorized actor, varies the payload shape, and asserts the observable DB effect (on accept) or the named guard / row-unchanged (on reject). The faithfulness contract from Phase 1–3 is unchanged.

Six families + one gap-closer (59 new scenarios, 620 → 679 total):
  * F1: key budget / rate-limit (test_key_budget_limits.py, 18)
  * F2: key↔team reassignment (test_key_team_change.py, 6)
  * F3: team budget / rate-limit (test_team_budget_limits.py, 15)
  * F4: member-info validation (test_team_member_info_validation.py, 5)
  * F5: permission batching (test_team_permissions_bulk_update.py, 6)
  * F6: org-scoped team access (+2 detail-string pins in existing files)
  * F7: coverage gap-closer (test_f7_coverage_closeout.py, 7)

Harness extensions in conftest.py (additive only):
  * create_scratch_org() seeder with its own scratch-prefixed budget row
  * budget / limit fields on create_scratch_team()
  * scratch teardown also sweeps litellm_organizationtable

Coverage telemetry (behavior-suite-only):
  * key_management_endpoints.py 60 % → 65 % (+82 lines)
  * team_endpoints.py 62 % → 72 % (+137 lines, crosses 70 % stretch)

key_management_endpoints.py lands under 70 % per the plan §7 escape hatch. The gap is dominated by routes outside F1–F6 scope (key list/info v2 internals) and by structurally dead org-budget guards: the call sites at lines 889 + 2310 + 985 + 1751 load the org without include_budget_table=True, so org.litellm_budget_table is None at guard time and the aggregate guard no-ops. Pinned as observed no-op behavior so a future fix that flips the flag turns these into reds.
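The dead-guard shape described above can be illustrated with a minimal, self-contained sketch. It is not the LiteLLM source: `Org`, `Budget`, `load_org`, `check_team_budget_within_org`, and `guard_fires` are hypothetical stand-ins, and only the effect of the `include_budget_table` flag is modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Budget:
    max_budget: float


@dataclass
class Org:
    org_id: str
    # Mirrors the idea of org.litellm_budget_table being None when not loaded.
    litellm_budget_table: Optional[Budget] = None


def load_org(org_id: str, include_budget_table: bool = False) -> Org:
    """Hypothetical loader: the budget relation is attached only on request."""
    budget = Budget(max_budget=100.0) if include_budget_table else None
    return Org(org_id=org_id, litellm_budget_table=budget)


class BudgetExceeded(ValueError):
    pass


def check_team_budget_within_org(org: Org, requested: float) -> None:
    # The guard returns early when the budget relation is missing,
    # so it can never reject if the caller forgot the flag.
    if org.litellm_budget_table is None:
        return
    if requested > org.litellm_budget_table.max_budget:
        raise BudgetExceeded(f"team budget {requested} exceeds org budget")


def guard_fires(include_budget_table: bool, requested: float) -> bool:
    """Report whether the guard rejects for a given loader flag."""
    org = load_org("org-1", include_budget_table=include_budget_table)
    try:
        check_team_budget_within_org(org, requested)
    except BudgetExceeded:
        return True
    return False
```

Under these assumptions, a test that pins `guard_fires(False, 500.0)` as `False` records the current no-op, and it turns red as soon as a call site starts passing `include_budget_table=True`.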
Zero source-code changes; pyproject.toml diff is empty; test_route_coverage.py stays green untouched; G3 grep guards still green; local wall-time 14 s for the full suite (no coverage), 22 s with coverage.

G4 regression-replay protocol executed against three representative fix-PR parents (410ce761dc, 0bd49ecb8b, 8bbc61e03c): all Phase 4 tests PASS at pre-fix SHAs, confirming the F1–F7 layer is a helper-body pin, not a regression-replay layer for those specific historical bypass shapes. Targeted RED-bait scenarios for each fix are left for a follow-up PR.

* test(proxy): push key_management_endpoints.py past the 70% stretch (F7-extension)

Adds 24 more payload-pin scenarios in test_f7_key_coverage_push.py following the same accepted-effect / rejected-guard pattern. Each scenario cites the file:line range it pins; same anti-snapshot rules apply.

Target ranges (all reachable via HTTP-boundary payload variation):
  * 5942-6063 /key/health with metadata.logging → test_key_logging body
  * 4565-4692 /key/reset_spend happy + 404 + non-admin gate + value validation
  * 4421-4533 /key/regenerate ghost-404 + happy + new_key + grace_period
  * 4168-4202 _insert_deprecated_key body via grace_period
  * 6118-6133 _enforce_unique_key_alias duplicate-alias rejection
  * 6148-6169 validate_model_max_budget malformed-payload rejection
  * 4708-4789 validate_key_list_check user/team/org/key_hash branches
  * 2622-2733 /key/bulk_update mixed success/failure + admin gate + size limits
  * 2797-2950 /team/key/bulk_update all-keys path + explicit-keys dedupe + 404
  * 5108-5207 /key/aliases admin + scoped + search-filter branches
  * 3253-3303 /key/info ghost + explicit-key + no-key-uses-auth-header
  * 3427-3436 generate_key_helper_fn budget_limits initialization
  * 1794-1815 prepare_key_update_data duration + budget_duration paths
  * 5280-5388 _build_filter_conditions across include_created_by_keys/team/sort/alias

Coverage telemetry (full PR4 dataset):
key_management_endpoints.py: 60 % → 71 % (+11 pts, +194 lines)
team_endpoints.py: 62 % → 72 % (+10 pts, +137 lines)

Both files are now over the plan §7 PR4.M4 70 % stretch as a side effect of pinning real payload behavior. 721 tests pass in 19 s local (full suite, no coverage); 27 s with coverage. Zero source-code changes; pyproject.toml diff still empty; test_route_coverage.py + G3 grep guards still green.

Honest finding (kept from the prior commit's body): four structurally-dead org-budget guards remain pinned as observed no-op behavior. They fire only when get_org_object is called with include_budget_table=True, which none of the four management-endpoint call sites currently does. Pinned so a future change that flips the flag turns these into reds.

Two helper guards are honest-ceiling: _validate_reset_spend_value's isinstance check at line 4568 is unreachable from HTTP because Pydantic 422s non-float before the helper runs; same shape for /team/key/bulk_update's missing team_id / no-selector pre-handler guards.

* test(proxy): address PR review - try/finally cleanup + loosen 500 envelope pins + Optional annotations

Greptile review feedback on PR #28681:

1. Wrap manual budget-row cleanup in try/finally so an assertion failure doesn't leave non-scratch-prefixed budget rows orphaned across CI re-runs (test_team_new_with_team_member_budget_creates_budget_row and test_team_update_team_member_budget_upserts).
2. Loosen the two 500-status pins to in (400, 422, 500). The named-guard substring is the real pin; the outer ValueError-wrap envelope is an implementation detail that a future improvement should be free to fix to a proper 400/422 without flipping these tests red.
3. Add missing Optional annotations on _seed_token's max_budget / metadata / team_id keyword args (they default to None).

Greptile's typo flag on 'read-world' in the conftest comment is declined: 'read-world' is the project's established term for the immutable seeded world fixture (see other usages in conftest.py and actors.py).

721 tests still pass in 17 s.
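The three review fixes can be sketched in isolation as follows. This is an illustrative sketch only, not the test suite's code: `seed_token`, `assert_rejected_by_guard`, `run_with_budget_cleanup`, the in-memory `rows` dict, and the guard text used in the usage note are all hypothetical.

```python
from typing import Any, Callable, Dict, Optional

# Item 2: accept any of these envelopes; the guard substring is the real pin.
ACCEPTED_REJECT_STATUSES = (400, 422, 500)


def seed_token(
    key_alias: str,
    max_budget: Optional[float] = None,        # item 3: None default, so Optional
    metadata: Optional[Dict[str, Any]] = None,
    team_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical seeder that returns the row it would insert."""
    return {
        "key_alias": key_alias,
        "max_budget": max_budget,
        "metadata": metadata or {},
        "team_id": team_id,
    }


def assert_rejected_by_guard(status: int, body: str, guard_substring: str) -> None:
    """Pin the named guard text while tolerating a change of status envelope."""
    assert status in ACCEPTED_REJECT_STATUSES, status
    assert guard_substring in body, body


def run_with_budget_cleanup(
    rows: Dict[str, float], budget_id: str, check: Callable[[], None]
) -> None:
    """Item 1: create a budget row, run assertions, always remove the row."""
    rows[budget_id] = 10.0
    try:
        check()
    finally:
        # Runs even when check() raises, so no row is orphaned across re-runs.
        rows.pop(budget_id, None)
```

For example, `assert_rejected_by_guard(500, "ValueError: hypothetical guard text", "hypothetical guard text")` passes today and would keep passing if the endpoint later returned 400 or 422 with the same guard text.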