* test(proxy): add harness for proxy_server.py behavior-pinning
Creates tests/test_litellm/proxy/proxy_server/ with:
- conftest.py: 11 shared fixtures (app, client, mock_prisma, auth_as,
mock_router with parametrized response builders, normalize, etc.)
- _coverage_check.py: per-PR coverage gate (line + branch) against a
baseline; it self-selects its target by inspecting which placeholder
files have been filled
- _pin_check.py: AST-based gate that verifies every pin-list item has
>=1 happy-path + >=1 error-path test with a real assertion
(status-only checks do not count)
- test_harness_smoke.py: 19 smoke tests covering every fixture +
both scripts end-to-end
- 26 placeholder test files (one docstring each) reserved for
follow-up PRs per the directory ownership in the Notion plan
- .coverage_baseline pinned at 0% so future PRs measure deltas from
the new tests only and aren't entangled with the broader, scattered
test suite
Adds a dedicated proxy-server job to test-unit-proxy-endpoints.yml
so this directory's runtime + coverage are tracked independently.
Plan: https://www.notion.so/36c43b8acdab81ee845fd5365128a2fc
* ci(proxy-endpoints): allow workflow_dispatch
Lets the workflow be triggered manually on a branch via
`gh workflow run`, which is needed for the verify-first
flow on workflow changes before opening a PR.
* test(proxy): address review feedback on proxy_server harness
- conftest.py: anchor sys.path insert to __file__ (Path(__file__).resolve().parents[4])
instead of the CWD-relative os.path.abspath("../../../../"), which resolved
to the wrong directory when pytest was launched from the repo root.
- _coverage_check.py: actually read .coverage_baseline and use it as
the floor (line_min = max(target, baseline)). Closes the gap between
the PR description's "delta semantics" and what the script was doing.
With baseline=0.0 today this is a no-op; once a future PR raises the
baseline, regressions (test deletions etc.) trip the gate even if the
static PR target is still met.
- _pin_check.py: drop unreachable startswith("_") guard
(test_*.py glob never yields underscore-prefixed names) and read
each test file once instead of twice.
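
For reference, the floor rule from the _coverage_check.py item above
(line_min = max(target, baseline)) can be sketched as follows. This is
an illustration only: the function names, the single-float file format,
and treating a missing or empty baseline file as 0.0 are assumptions,
not details taken from the script.

```python
from pathlib import Path


def read_baseline(path: Path) -> float:
    # Assumption: the baseline file holds one percentage, e.g. "0.0".
    # Assumption: a missing or empty file is treated as a 0.0 baseline.
    if not path.exists():
        return 0.0
    text = path.read_text().strip()
    return float(text) if text else 0.0


def line_floor(target: float, baseline: float) -> float:
    # The gate's minimum is whichever is higher: the static per-PR
    # target or the recorded baseline.
    return max(target, baseline)


def gate_passes(measured: float, target: float, baseline: float) -> bool:
    # Coverage must meet or exceed the floor for the gate to pass.
    return measured >= line_floor(target, baseline)
```

With baseline=0.0 the floor is just the target; once the baseline
exceeds the target, a run that meets the target but falls below the
baseline fails.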