litellm/.github/workflows/mutation-test.yml
ryan-crabbe-berri be84d5cd7d ci: add manually-triggered mutation testing workflow (#27576)
* ci: add manually-triggered mutation testing smoke workflow

Adds a workflow_dispatch-only GitHub Actions workflow that runs mutmut
against a single source/test pair (router_settings_endpoints) to validate
the tooling end-to-end before scaling.

The workflow reinstalls litellm non-editable so the mutants/ sandbox is
not shadowed by the editable .pth on sys.path, and sets PYTHONPATH so
the trampolined sandbox copy wins over site-packages.
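
The path-precedence reasoning above can be illustrated with a minimal sketch. It is not part of the workflow; `resolve_origin` and the module name `shadow_demo` are hypothetical, and the two temporary directories only stand in for `mutants/` and site-packages. It asks the standard `PathFinder` which file a module name resolves to for a given search order:

```python
import os
import tempfile
from importlib.machinery import PathFinder


def resolve_origin(module_name, search_path):
    """Return the file a module would be loaded from, searching in order."""
    spec = PathFinder.find_spec(module_name, search_path)
    return spec.origin if spec else None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        # Stand-ins for the mutmut sandbox and the installed copy (assumption).
        sandbox = os.path.join(root, "mutants")
        installed = os.path.join(root, "site-packages")
        for directory in (sandbox, installed):
            os.makedirs(directory)
            with open(os.path.join(directory, "shadow_demo.py"), "w") as fh:
                fh.write("VALUE = 1\n")
        # Whichever directory comes first in the search order wins.
        print(resolve_origin("shadow_demo", [sandbox, installed]))
        print(resolve_origin("shadow_demo", [installed, sandbox]))
```

Entries from PYTHONPATH are placed ahead of site-packages on sys.path, which is why pointing it at the sandbox lets the trampolined copy win once the editable .pth entry is gone.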

mutmut itself is pulled in via uv run --with so it does not appear in
uv.lock or affect the shared dev environment.

Includes a temporary push: trigger scoped to this branch so we can
iterate before the workflow file lands on the default branch — to be
removed before merging (workflow_dispatch only requires the file on the
default branch to surface the manual trigger button).

* ci(mutation): disable rerun and xdist plugins for mutmut runs

mutmut's in-process pytest.main() call hits
`INTERNALERROR: no option named 'filtered_exceptions'` from
pytest-retry's pytest_configure hook. Reruns are also wrong for
mutation testing — a "failed" mutant test that gets retried would
mask which mutants are killed vs. survive. Disable retry,
rerunfailures, and xdist via pytest_add_cli_args in [tool.mutmut].

* ci(mutation): uninstall pytest-retry before mutmut runs

`-p no:retry` (and similar names) didn't match pytest-retry's
entry-point name, so the plugin still loaded and crashed during
mutmut's "Running clean tests" phase. Uninstalling the package is
surgical and doesn't depend on guessing the entry-point name.

* ci(mutation): emit per-survivor diffs to run-page summary + artifact

The previous artifact only contained `mutmut results` text (which in
mutmut 3.x lists survivor names but not the actual mutations). Adds:

- `mutmut export-cicd-stats` to produce mutmut-cicd-stats.json with the
  killed/survived/total scoreboard.
- `mutmut show <name>` per surviving mutant to capture each mutation as
  a unified diff.
- A `mutmut-report.md` that combines summary + run-progress tail +
  per-survivor diffs, written to both the artifact and
  $GITHUB_STEP_SUMMARY (visible on the run page, no download needed).
- Corrected artifact paths: stats files live under mutants/, not the
  project root.
- The trampolined source file from the sandbox so survivors can be
  inspected even outside `mutmut show`.
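
A rough sketch of how the scoreboard could be reduced to a single number. It assumes the JSON exposes integer `killed` and `survived` keys (the commit only names a killed/survived/total scoreboard, so the exact key names are an assumption), and `mutation_score` is a hypothetical helper, not something mutmut provides:

```python
import json


def mutation_score(stats: dict) -> float:
    """Fraction of decided mutants that were killed (one possible convention)."""
    killed = stats.get("killed", 0)
    survived = stats.get("survived", 0)
    decided = killed + survived
    return killed / decided if decided else 0.0


if __name__ == "__main__":
    # Illustrative numbers only, not results from a real run.
    sample = json.loads('{"killed": 30, "survived": 10, "total": 50}')
    print(f"{mutation_score(sample):.2%}")
```

Mutants that timed out or had no covering tests are left out of the denominator here; a report could just as reasonably divide by `total`.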

* ci(mutation): document intended manual weekly cadence in trigger comment

* ci(mutation): generate ACH-style report with embedded function bodies

Replaces the inline bash markdown generation with a Python script that:
- Groups survivors by function (one section per function, function body
  shown once per section, surviving mutants nested as subsections)
- Embeds each enclosing function's source via Python AST (so the agent
  has full context, not just a 3-line `mutmut show` diff)
- Inlines the existing test file(s) listed in [tool.mutmut].tests_dir
- Writes an ACH-style task description at the bottom following the
  prompt template from arXiv 2501.12862

Output goes to mutation-report.md (artifact) and the head of the file
is appended to $GITHUB_STEP_SUMMARY for at-a-glance visibility.
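
For illustration, the "head of the file" step could be sketched in Python as below. The workflow itself uses `head -c`, which cuts at a raw byte offset; this hypothetical `head_utf8` helper differs in that it drops a trailing partial UTF-8 character instead of leaving it dangling:

```python
def head_utf8(text: str, max_bytes: int) -> str:
    """Return at most max_bytes of text (as UTF-8) without splitting a character."""
    raw = text.encode("utf-8")
    if len(raw) <= max_bytes:
        return text
    # errors="ignore" discards an incomplete multi-byte sequence at the cut.
    return raw[:max_bytes].decode("utf-8", errors="ignore")


if __name__ == "__main__":
    # Illustrative report text and a small byte budget.
    report = "# Mutation report\n" + "survivor\n" * 5
    print(head_utf8(report, 40))
```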

* fix(mutation report): correctly parse function names with leading underscores

mutmut's mutant-name prefix is x_ (single underscore), so a function
named _foo produces mutants x__foo__mutmut_N. The previous regex
\.x__(.+)__mutmut_ ate the function's leading underscore as part of
the prefix. Changed to \.x_(.+)__mutmut_ so leading underscores are
preserved in the captured function name; verified for normal, leading-
underscore, and dunder-method names.
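
The three cases mentioned above can be checked with a small sketch. The regex is the one quoted in this commit message; the `function_name` helper and the `pkg.mod` mutant names are hypothetical stand-ins:

```python
import re
from typing import Optional

# The mutmut prefix is "x_" (one underscore), so the capture group keeps
# any leading underscores that belong to the function name itself.
MUTANT_NAME_RE = re.compile(r"\.x_(.+)__mutmut_")


def function_name(mutant_name: str) -> Optional[str]:
    """Extract the original function name from a mutant identifier."""
    match = MUTANT_NAME_RE.search(mutant_name)
    return match.group(1) if match else None


if __name__ == "__main__":
    for name in (
        "pkg.mod.x_foo__mutmut_1",       # normal function
        "pkg.mod.x__foo__mutmut_2",      # leading underscore
        "pkg.mod.x___init____mutmut_3",  # dunder method
    ):
        print(name, "->", function_name(name))
```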

* feat(mutation report): full Meta ACH-style rendering with MUTANT delimiters

For each surviving mutant, parse the mutmut sandbox trampoline file and
render the mutated function as it appears in the source — with the
differing lines wrapped in `# MUTANT START` / `# MUTANT END` comments,
matching the format from Meta's ACH paper (arXiv 2501.12862, Table 1).
Renames the function header back to its original name so the agent sees
the function as it would appear in the file. Falls back to the unified
diff if the trampoline lookup fails.

Handles replace, insert, and delete diff ops; uses difflib's
SequenceMatcher to find the differing line ranges.
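
A minimal sketch of the delimiter wrapping, assuming the original and mutated function bodies are already available as strings. `mark_mutant` is a hypothetical name and this is not the code in scripts/mutation_report.py; it only shows how `difflib.SequenceMatcher` opcodes can drive the markers:

```python
import difflib


def mark_mutant(original: str, mutated: str) -> str:
    """Return the mutated source with changed line ranges wrapped in markers."""
    orig_lines = original.splitlines()
    mut_lines = mutated.splitlines()
    matcher = difflib.SequenceMatcher(a=orig_lines, b=mut_lines, autojunk=False)
    out = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.extend(mut_lines[j1:j2])
            continue
        # "replace" and "insert" contribute mutated lines; "delete" yields an
        # empty marker pair showing where lines were removed.
        out.append("# MUTANT START")
        out.extend(mut_lines[j1:j2])
        out.append("# MUTANT END")
    return "\n".join(out)


if __name__ == "__main__":
    before = "def add(a, b):\n    return a + b"
    after = "def add(a, b):\n    return a - b"
    print(mark_mutant(before, after))
```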

The unified diff is preserved in a collapsible <details> block as
secondary context.

* ci(mutation): scope to whole management_endpoints folder, drop temp push trigger

Final scope before merge:
- paths_to_mutate / tests_dir broadened from one file to the entire
  management_endpoints source/test folders
- Trigger is now `workflow_dispatch` only — the temporary push: block
  used during workflow iteration is removed
- timeout-minutes bumped from 60 to 350 (just under the GH-hosted job
  cap of 360); whole-folder mutation against ~15 files / ~7.5k LOC can
  take a few hours
- Artifact path for the trampoline files glob-expanded to cover all
  files under mutants/litellm/proxy/management_endpoints/

* fix(mutation report): warn when multiple functions in a file share a name

Addresses the Greptile review concern: ast.walk's first-match-wins
behavior could embed the wrong function body when a file defines the
same name in multiple places (e.g., a module-level helper and a class
method). mutmut's mutant identifier does not carry class context, so
we can't always determine which definition was mutated.

find_function_in_file now returns the start line of every matching
definition; render() surfaces a "Note: N functions named X" warning
in the report when there is more than one match. The first match is
still embedded as the body — the warning tells the reader to verify
manually instead of silently using the wrong context.
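
The duplicate-name check might look roughly like the sketch below. It is an approximation under stated assumptions: `find_function_lines` and `duplicate_warning` are illustrative names, and the real find_function_in_file may differ in signature and return type.

```python
import ast
from typing import List, Optional

# Illustrative source: the same name defined at module level and in a class.
SAMPLE = (
    "def helper():\n"
    "    return 1\n"
    "\n"
    "class Service:\n"
    "    def helper(self):\n"
    "        return 2\n"
)


def find_function_lines(source: str, name: str) -> List[int]:
    """Return the start line of every (async) function definition named name."""
    tree = ast.parse(source)
    return sorted(
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
        and node.name == name
    )


def duplicate_warning(source: str, name: str) -> Optional[str]:
    """Build a report note when the name is ambiguous within the file."""
    lines = find_function_lines(source, name)
    if len(lines) > 1:
        joined = ", ".join(str(line) for line in lines)
        return f"Note: {len(lines)} functions named {name} (lines {joined})"
    return None


if __name__ == "__main__":
    print(duplicate_warning(SAMPLE, "helper"))
```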

Smoke-tested against the existing artifact: single-match files render
unchanged.

* Fix mutation report anchors

* Fix mutation report TOC anchors

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
2026-05-11 15:19:57 -07:00


name: "Mutation Test (manual)"
# Manually-triggered mutation testing. Runs mutmut against the scope
# configured in [tool.mutmut] in pyproject.toml (currently the
# litellm/proxy/management_endpoints/ folder). Intended cadence is roughly
# weekly — clicked from the Actions tab when someone wants a fresh report.
#
# Uploads a structured `mutation-report.md` (Meta ACH-style: original +
# mutated function with `# MUTANT START`/`# MUTANT END` delimiters + the
# existing tests + a task instruction) as a workflow artifact. Failures
# do not block anything because nothing depends on this workflow.
on:
  workflow_dispatch:

permissions:
  contents: read

concurrency:
  group: mutation-test-${{ github.ref }}
  cancel-in-progress: true

jobs:
  mutation:
    name: Run mutmut
    runs-on: ubuntu-latest
    # Whole-folder mutation against ~15 files / ~7.5k LOC can take hours.
    # 350 minutes is just under the GitHub-hosted job cap of 360 minutes.
    timeout-minutes: 350
    steps:
      - uses: actions/checkout@08eba0b27e820071cde6df949e0beb9ba4906955 # v4.3.0
        with:
          persist-credentials: false

      - name: Set up Python
        uses: actions/setup-python@a26af69be951a213d495a4c3e4e4022e16d87065 # v5.6.0
        with:
          python-version: "3.12"

      - name: Set up uv
        uses: astral-sh/setup-uv@37802adc94f370d6bfd71619e3f0bf239e1f3b78 # v7
        with:
          version: "0.10.9"

      - name: Cache uv dependencies
        uses: actions/cache@0057852bfaa89a56745cba8c7296529d2fc39830 # v4.3.0
        with:
          path: |
            ~/.cache/uv
            .venv
          key: ${{ runner.os }}-uv-${{ hashFiles('uv.lock') }}
          restore-keys: |
            ${{ runner.os }}-uv-

      - name: Install dependencies
        run: |
          uv sync --frozen --group ci --group proxy-dev --extra google --extra proxy --extra semantic-router

      - name: Generate Prisma client
        env:
          PRISMA_BINARY_CACHE_DIR: ${{ runner.temp }}/prisma-cache
        run: |
          uv run --no-sync prisma generate --schema litellm/proxy/schema.prisma

      # mutmut 3.x runs tests inside a `mutants/` sandbox where it injects
      # mutation trampolines. uv installs the project as editable by default,
      # which puts the original source dir on sys.path via a .pth file and
      # shadows the sandbox copy — so tests would never exercise the mutated
      # code. Reinstalling non-editable removes the .pth shadow.
      - name: Reinstall litellm non-editable (so mutants/ is not shadowed)
        run: |
          uv pip uninstall litellm
          uv pip install . --no-deps

      # pytest-retry's pytest_configure hook crashes with
      # `INTERNALERROR: no option named 'filtered_exceptions'` when invoked
      # via mutmut's in-process pytest.main() call. The entry-point name
      # doesn't normalize cleanly with `-p no:<name>`, so just remove the
      # package outright. Reruns are wrong for mutation testing anyway —
      # rerunning a "failed" mutant test would mask which mutants are killed.
      - name: Remove pytest plugins that conflict with mutmut
        run: |
          uv pip uninstall pytest-retry || true

      - name: Run mutmut
        env:
          # Make the mutants/ sandbox win over site-packages on sys.path so the
          # trampolined files are imported instead of the installed copy.
          PYTHONPATH: ${{ github.workspace }}/mutants
        run: |
          set -o pipefail
          mkdir -p mutants
          uv run --no-sync --with mutmut==3.5.0 mutmut run 2>&1 | tee mutmut-run.log

      # Generate the structured report. The script embeds the enclosing
      # function source for each survivor (via Python AST) and includes the
      # existing test files, so an LLM agent has enough context to write
      # killing tests without further file lookups. Modeled on Meta's ACH
      # prompt template (arXiv 2501.12862).
      - name: Generate detailed mutation report
        if: always()
        run: |
          set +e
          uv run --no-sync --with mutmut==3.5.0 mutmut export-cicd-stats > /dev/null 2>&1
          uv run --no-sync --with mutmut==3.5.0 mutmut results > mutmut-results.txt 2>&1
          uv run --no-sync python scripts/mutation_report.py
          # The full report can be very long for big test files; the run-page
          # summary cuts off at 1 MB. Append the head of the report (summary
          # + survivor list) and link out to the artifact for the full body.
          {
            head -c 900000 mutation-report.md
            echo ""
            echo ""
            echo "_Full report (with embedded function bodies and test files) is in the workflow artifact._"
          } >> "$GITHUB_STEP_SUMMARY"

      - name: Upload mutmut artifacts
        if: always()
        uses: actions/upload-artifact@4cec3d8aa04e39d1a68397de0c4cd6fb9dce8ec1 # v4.6.1
        with:
          name: mutmut-${{ github.run_id }}-${{ github.run_attempt }}
          path: |
            mutation-report.md
            mutmut-results.txt
            mutmut-run.log
            mutants/mutmut-stats.json
            mutants/mutmut-cicd-stats.json
            mutants/litellm/proxy/management_endpoints/**/*.py
          if-no-files-found: warn
          retention-days: 14