Commit Graph

552 Commits

Author SHA1 Message Date
Cursor Agent 65a60dbe35 [Infra] CI: add UI drift guard + regenerate _experimental/out
Adds a CI job that rebuilds the admin UI from source and fails if the
committed static export at litellm/proxy/_experimental/out/ has drifted
from what npm run build produces. This prevents silently shipping stale
UI bytes and is a prerequisite for the non_root Dockerfile streamlining
work, which will stage the UI from _experimental/out/ directly instead
of rebuilding it inside the image.

Also regenerates litellm/proxy/_experimental/out/ to match a fresh
npm run build (Node 20.20.2) — the committed tree had drifted from
source prior to this commit.

Co-authored-by: yuneng-jiang <yuneng-berri@users.noreply.github.com>
2026-04-19 04:03:10 +00:00
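
Note on 65a60dbe35: the drift guard described above amounts to comparing a fresh build against the committed export. A minimal Python sketch of that comparison follows. It assumes the fresh `npm run build` output has been placed in a separate directory; the names `tree_digest` and `find_drift` are hypothetical and are not taken from the actual CI job.

```python
import hashlib
from pathlib import Path


def tree_digest(root: Path) -> dict[str, str]:
    """Map each file path (relative to root) to its SHA-256 hex digest."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_drift(committed: Path, fresh: Path) -> list[str]:
    """Return the sorted relative paths that differ between the two trees.

    A path counts as drifted if it is missing on either side or if its
    contents hash differently.
    """
    a, b = tree_digest(committed), tree_digest(fresh)
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))
```

A CI step built on this idea would exit non-zero whenever `find_drift` returns a non-empty list.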
Ishaan Jaffer e8461b5b97 style: run black formatter on files from main merge 2026-04-17 13:02:59 -07:00
Yuneng Jiang aff4717494 [Infra] Expand CI branch filters for non-main PR targets
Required test-unit-* and related workflows only triggered on PRs targeting
main, so feature PRs routed through litellm_internal_staging or
litellm_oss_branch never dispatched the full suite. Branch protection
reported BLOCKED even when CircleCI was green.

Expand pull_request and push branch filters to also match
litellm_internal_staging, litellm_oss_branch, and "litellm_**" (using **
so branch names containing "/" also match).
2026-04-15 15:39:57 -07:00
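
Note on aff4717494: the choice of `**` over `*` matters because, in GitHub Actions filter patterns, `*` does not match `/` while `**` does. The Python sketch below illustrates only that distinction. It is a simplified translation, not the real GitHub Actions matcher, and the helper names are hypothetical.

```python
import re


def branch_filter_to_regex(pattern: str) -> re.Pattern:
    """Translate a simplified branch filter into a regex.

    Only two wildcards are handled (an assumption for illustration):
    '**' matches any characters, '*' matches any characters except '/'.
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def matches(pattern: str, branch: str) -> bool:
    """True if the whole branch name matches the filter pattern."""
    return branch_filter_to_regex(pattern).fullmatch(branch) is not None
```

Under this model a branch such as `litellm_user/feature` (a made-up name) matches `litellm_**` but not `litellm_*`.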
yuneng-jiang 72a461ba4a Merge pull request #25733 from BerriAI/litellm_guardMainBranch
[Infra] Guard main to only accept PRs from staging and hotfix branches
2026-04-14 20:54:29 -07:00
joereyna ccbdaa9187 fix(ci): increase test-server-root-path timeout to 30m 2026-04-14 19:42:10 -07:00
Yuneng Jiang 38f8d7a008 Point contributors toward litellm_oss_branch in guard error messages 2026-04-14 18:41:59 -07:00
Yuneng Jiang ab71d3d700 Also reject PRs from forks, not just non-allowlisted branches 2026-04-14 18:39:54 -07:00
Yuneng Jiang 45d1e1b341 [Infra] Guard main branch with PR source-branch check
Adds a GHA that fails PRs to main unless the head branch is
'litellm_internal_staging' or 'litellm_hotfix_*'. Also fails merge_group
events since merge queue is not in use.
2026-04-14 18:19:14 -07:00
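
Note on 45d1e1b341: the rule in this commit can be sketched as a small predicate. The Python version below is an illustration under stated assumptions: the function name and argument shape are hypothetical, and the real check is a GitHub Actions workflow whose implementation is not shown in this log.

```python
from fnmatch import fnmatchcase

# Head branches allowed to open PRs against main, per the commit message.
ALLOWED_HEAD_PATTERNS = ("litellm_internal_staging", "litellm_hotfix_*")


def guard_passes(event_name: str, head_branch: str) -> bool:
    """Return True if a PR to main should be allowed by the guard."""
    # merge_group events always fail because merge queue is not in use.
    if event_name == "merge_group":
        return False
    return any(fnmatchcase(head_branch, p) for p in ALLOWED_HEAD_PATTERNS)
```

Case-sensitive matching via `fnmatchcase` is a design choice of this sketch, made so that results do not vary by platform.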
Yuneng Jiang 92bbf0a0d3 Merge remote-tracking branch 'origin' into litellm_oss_staging_04_09_2026 2026-04-11 11:52:42 -07:00
Krrish Dholakia 01b9b50b43 Add Screenshots / Proof of Fix section to PR template (#25564)
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Krrish Dholakia <krrish-berri-2@users.noreply.github.com>
2026-04-11 10:20:34 -07:00
Yuneng Jiang 83c459225c [Fix] CI: fix GHA timeouts and uv lock --check failures
1. exclude-newer: change from absolute "2026-04-10" to relative "3 days".
   All pinned deps were published before the 3-day cutoff. Re-locked so
   uv lock --check passes in test-mcp.yml and test-linting.yml.

2. test_eager_tiktoken_load: run all 10 env var values in a single
   subprocess instead of spawning 10 separate processes. Each cold
   import litellm takes ~78s on CI, so the old loop took ~13 min on a
   single xdist worker. Now takes ~78s total.

3. proxy-db remaining timeout: increase from 20 to 30 minutes. The
   remaining group has 51 test files and was consistently timing out at
   71% across all branches (pre-existing issue, not migration-related).
2026-04-11 09:04:49 -07:00
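
Note on 83c459225c, item 2: the timing claim follows from paying the cold import once instead of once per value. The Python sketch below is a rough cost model only; the function name and the `per_check_s` parameter are hypothetical, and the ~78s figure is the one quoted in the commit message.

```python
def loop_cost_seconds(
    n_values: int,
    cold_import_s: float,
    per_check_s: float = 0.0,
    single_process: bool = False,
) -> float:
    """Estimate wall time for checking n_values env var settings.

    With one subprocess per value, every value pays the cold import.
    With a single subprocess, the import is paid once.
    """
    imports = 1 if single_process else n_values
    return imports * cold_import_s + n_values * per_check_s
```

With 10 values and a 78s import, the per-process loop costs 780s (13 minutes), while the single-process variant costs 78s plus whatever the checks themselves take.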
stuxf a6c30b30bf build: migrate packaging, CI, and Docker from Poetry to uv (#25007)
* build: migrate packaging metadata to uv

* ci: move automation and local tooling to uv

* docker: migrate image builds and runtime setup to uv

* docs: update install and deployment guidance for uv

* chore: align auxiliary scripts and tests with uv

* test: harden test_litellm isolation

* fix: keep release and health check images self-contained

* build: pin uv tooling and health check deps

* test: isolate bedrock image request formatting from suite state

* test: cover sandbox executor requirements flow

* ci: fix circleci no-op command steps

* ci: fix circleci publish workflow parsing

* fix: stabilize remaining uv migration CI checks

* ci: increase matrix test timeout headroom

* fix: restore published docker and license coverage

* fix: restore proxy runtime build parity

* fix: restore proxy extras parity and venv migrations

* ci: persist uv path across circleci steps

* fix: keep psycopg binary in default test env

* docker: preserve prisma cache across stages

* test: run local proxy checks through uv python

* build: restore runtime deps moved into ci

* build: refresh uv lock after upstream merge

* fix: restore module import in test_check_migration after merge

The conflict resolution imported only the function but the test body
references check_migration as a module throughout.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: revert dependency promotions, remove nodejs-wheel-binaries, fix Docker layer caching

- Move google-generativeai, Pillow, tenacity back to ci group (they are
  lazily imported and bloat the base SDK install needlessly)
- Remove nodejs-wheel-binaries from extra_proxy and proxy-dev (redundant
  in Docker where system Node.js is already installed via apk)
- Remove all nodejs-wheel node replacement and venv npm patching blocks
  from Dockerfiles since the wheel is no longer installed
- Add --no-default-groups to CodSpeed benchmark workflow so the benchmark
  environment matches the old minimal pip install footprint
- Apply standard uv two-phase Docker pattern: copy metadata first, install
  deps (cached layer), then copy source and install project
- Replace CircleCI enterprise no-op with proper uv sync command

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: regenerate uv.lock after removing nodejs-wheel-binaries

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(ci): use cache/restore instead of cache to prevent cache poisoning

The old workflow used actions/cache/restore (read-only). The uv migration
changed it to actions/cache (read-write), which zizmor flags as a cache
poisoning risk. Restore the safer read-only variant.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(ci): disable setup-uv built-in cache to silence cache-poisoning alert

The setup-uv action enables caching by default, which zizmor flags as a
cache poisoning risk. Disable it since we already use a read-only
cache/restore step.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(ci): disable setup-uv cache in publish workflow

Silences zizmor cache-poisoning alert. Publishing workflow runs
infrequently on protected branches so caching adds no real benefit.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(test): remove duplicate verbose_logger mock in test_check_migration

The logger was patched twice — first via mocker.patch() then via
mocker.patch.object(autospec=True). The second call fails because
autospec cannot inspect an already-mocked attribute. Remove the
redundant first patch.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix(ci): free disk space before Docker build in test-server-root-path

The Dockerfile.non_root build ran out of disk on the CI runner. Remove
Android SDK, .NET, Boost, and GHC toolchains (~12GB) to free space.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-09 11:46:23 -07:00
Yuneng Jiang 3a02c0ac6b [Infra] Migrate Redis caching tests from GHA to CircleCI
Redis caching unit tests (test_dual_cache, test_redis_batch_optimizations,
test_router_utils) required Redis secrets that should live in CircleCI.

- Add redis_caching_unit_tests job to CircleCI config
- Delete test-unit-caching-redis.yml GHA workflow
- Remove all Redis plumbing (inputs, secrets, env vars) from
  _test-unit-services-base.yml and its callers
2026-04-08 09:07:12 -07:00
Yuneng Jiang 30565581be [Infra] Pin cosign.pub verification to initial commit hash
Pin all cosign public key references to the immutable commit hash
(0112e53) that first introduced the key, instead of fetching it from
the release tag. This addresses the concern that an attacker with push
access could replace the key on main/tags and re-sign tampered images.

Docs now show two verification methods: commit hash (recommended) and
release tag (convenience), with explanation of why the hash is stronger.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 22:53:23 -07:00
yuneng-jiang d132b1bf51 [Infra] Remove Redundant Matrix Unit Test Workflow (#25251)
* Remove redundant matrix unit test workflow

All test paths in test-litellm-matrix.yml are fully covered by the
newer semantic unit test workflows (test-unit-*.yml), making the
matrix workflow redundant CI spend.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add Codecov coverage reporting to semantic unit test workflows

Add coverage collection (--cov) and Codecov OIDC upload to both
reusable base workflows and all 12 caller workflows, replacing the
coverage reporting that was previously only in the matrix workflow.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Move id-token/pull-requests permissions to job level for multi-job workflows

For workflows with multiple jobs (llm-providers, proxy-db), move
id-token: write and pull-requests: write from workflow level to job
level so permissions are scoped to only the jobs that need them.
Removes zizmor inline suppressions that were masking the issue.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 16:52:38 -07:00
yuneng-jiang a60e19aeb8 Remove flaky proxy_e2e_azure_batches_tests CI workflow (#25247)
The proxy_e2e_azure_batches_tests workflow is consistently flaky and
does not provide reliable signal on whether changes break anything.
Remove the workflow from both CircleCI and GitHub Actions, along with
the test directory it exclusively used.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 16:49:14 -07:00
yuneng-jiang 39c1042258 [Docs] Add cosign Docker image verification steps to security blog posts (#25122)
* docs(blog): add cosign Docker image verification instructions

Add steps for verifying Docker images with cosign to three security blog posts:
CI/CD v2, Security Townhall, and Security Update.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs(proxy): add cosign verification to Docker/Helm/Terraform deploy page

Add image signature verification steps to the main deployment doc so
users pulling Docker images know how to verify them with cosign.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: fixes

* Update index.md

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* [Docs] Scope cosign signing docs to GHCR and specify starting version

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* [Docs] Add starting version callout to ci_cd_v2 blog post

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Krrish Dholakia <krrish+github@berri.ai>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
2026-04-06 09:59:27 -07:00
joereyna 6cc56f58fd Fix broken codeql-action SHA in scorecard workflow 2026-04-03 11:36:02 -07:00
yuneng-jiang 0f88968da9 Merge pull request #24804 from joereyna/feat/add-codecov-to-ci
Re-add Codecov coverage reporting to GHA matrix workflow
2026-04-01 09:46:55 -07:00
joereyna c903845266 Use unique filenames per matrix job to preserve all coverage reports 2026-03-31 16:44:13 -07:00
joereyna 98a51e088d Remove debug step from upload-coverage job 2026-03-31 16:44:13 -07:00
joereyna 695d726352 Revert to --cov=litellm, add checkout and root_dir to upload job 2026-03-31 16:44:13 -07:00
joereyna b8eac3059a Measure coverage from repo root so filenames include litellm/ prefix 2026-03-31 16:44:13 -07:00
joereyna fdfd0e58ed Force coverage path remapping via explicit coverage xml step 2026-03-31 16:44:13 -07:00
joereyna 57c22d3a41 Add debug step to inspect coverage XML paths 2026-03-31 16:44:13 -07:00
joereyna e7e0637f53 Fix coverage source paths for Codecov 2026-03-31 16:44:13 -07:00
joereyna 13660572ca Add pull-requests write permission for Codecov PR comments 2026-03-31 16:44:13 -07:00
joereyna aaa5973b88 Use OIDC for Codecov upload instead of static token 2026-03-31 16:44:13 -07:00
joereyna 8358650660 Isolate Codecov upload into separate job to protect CODECOV_TOKEN 2026-03-31 16:44:13 -07:00
joereyna b3eee71084 Pin codecov-action to immutable SHA (v5.5.4) 2026-03-31 16:44:12 -07:00
joereyna d38498c3ef Re-add Codecov coverage upload to GHA matrix workflow 2026-03-31 16:44:12 -07:00
Yuneng Jiang 8071691ffc [Fix] Address review feedback on release workflow
- Use nullish coalescing for potentially null response body
- Create release as draft first, then publish atomically to avoid partial-release state
- Pin cosign.pub URL to release tag instead of main branch

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 16:26:20 -07:00
Yuneng Jiang 05368d9b1a [Infra] Add cosign verification section to release notes
Prepend Docker image signature verification instructions to auto-generated
release notes, using the cosign public key committed to the repo.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 15:46:34 -07:00
Yuneng Jiang 0112e53046 [Infra] Add release workflow and cosign public key
Add create-release.yml workflow triggered via workflow_dispatch to create
GitHub releases with auto-generated notes. Add cosign public key for
container image signature verification.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 14:30:27 -07:00
stuxf 7066c895f6 chore: harden npm supply chain — pin overrides, enforce npm ci, add ignore-scripts (#24838)
* chore: harden npm supply chain — pin overrides, enforce npm ci, add ignore-scripts

Replace open-ended >= version overrides with exact pins matching lockfile
versions across all 6 package.json files. Remove dead overrides for packages
not present in lockfiles. Switch CI and devcontainer from npm install to
npm ci for deterministic lockfile-based installs.

Add .npmrc to all 7 JS project directories with ignore-scripts=true (blocks
postinstall RAT vectors like the axios@1.14.1 supply chain attack) and
min-release-age=3d (refuses packages published <3 days ago, requires npm
>=11.10). Remove Yarn-only resolutions field from docs/my-website.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump sharp to 0.33.5 in docs, add docs .npmrc

sharp 0.32.x uses postinstall to download native binaries, which breaks
with ignore-scripts=true. sharp 0.33+ distributes via optionalDependencies
instead, making it compatible with the new .npmrc hardening.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: remove docs .npmrc to fix Vercel deploy

Vercel's build for docs/my-website uses npm install which needs
sharp 0.32.6's postinstall script. Since we don't control Vercel's
build process, remove the .npmrc from docs rather than fight it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: Dockerfile npm ci + nvm checksum verification

- Replace npm install with npm ci in Dockerfile.non_root,
  Dockerfile.custom_ui, and spend-logs/Dockerfile for deterministic
  lockfile-based installs
- Replace curl-pipe-bash nvm install with download-then-verify pattern
  in build_admin_ui.sh, build_ui.sh, and build_ui_custom_path.sh
- Update nvm from v0.38.0 (2021) to v0.40.4 (Jan 2026) with SHA256
  checksum verification before execution

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: macOS sha256sum compat + clarify min-release-age scope

- Use shasum -a 256 fallback on macOS where sha256sum is unavailable
- Clarify in .npmrc comments that min-release-age only protects local
  npm install, not npm ci (used in CI)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 13:41:37 -07:00
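
Note on 7066c895f6: the download-then-verify pattern mentioned for the nvm installer means fetching the script, comparing its SHA256 against a pinned value, and only then executing it. A minimal Python sketch of the verification step follows. The actual change is in shell scripts; the function names here are hypothetical, and fetching and executing are deliberately left out.

```python
import hashlib
import hmac


def verify_sha256(payload: bytes, expected_hex: str) -> bool:
    """Return True if payload hashes to the pinned SHA-256 digest."""
    actual = hashlib.sha256(payload).hexdigest()
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(actual, expected_hex.strip().lower())


def require_verified(payload: bytes, expected_hex: str) -> bytes:
    """Return payload unchanged, or raise if the checksum does not match."""
    if not verify_sha256(payload, expected_hex):
        raise ValueError("checksum mismatch: refusing to use downloaded script")
    return payload
```

The point of the pattern is ordering: nothing from the download runs until `require_verified` has returned.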
Krrish Dholakia 05134fc70b Create scorecard.yml 2026-03-30 07:47:06 -07:00
Yuneng Jiang 3b5b98327e [Fix] Use integration-redis-postgres env for Redis workflows since Postgres always starts
GHA doesn't support conditional service containers, so the Postgres container
always starts even for Redis-only jobs. Use integration-redis-postgres
environment for any workflow with enable-redis so the Postgres container gets
valid credentials.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 14:25:29 -07:00
Yuneng Jiang 3ae80407dd [Fix] Move Postgres username and password to environment secrets
Move POSTGRES_USER and POSTGRES_PASSWORD from hardcoded values to
environment secrets so no credentials appear in workflow files at all.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 13:31:58 -07:00
Yuneng Jiang d42e2f6429 [Fix] Move Postgres DATABASE_URL to environment secret to avoid credential leak warnings
The hardcoded postgresql://postgres:postgres@localhost connection string was
being flagged by secret scanners. Move DATABASE_URL to a GHA environment
secret (integration-postgres) so the password is never in the workflow file.

Changes:
- _test-unit-services-base.yml: DATABASE_URL now comes from secrets, environment
  is derived from enable-* flags (integration-postgres, integration-redis, or
  integration-redis-postgres)
- test-unit-proxy-db.yml: switched to push-only trigger (uses secrets now)
- test-unit-security.yml: switched to push-only trigger (uses secrets now)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 13:28:41 -07:00
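
Note on d42e2f6429: deriving the environment from the enable-* flags can be pictured as a small lookup. The Python sketch below is illustrative only. The real logic lives in _test-unit-services-base.yml; the parameter name `enable_postgres` and the `None` result when neither flag is set are assumptions.

```python
from typing import Optional


def select_environment(enable_postgres: bool, enable_redis: bool) -> Optional[str]:
    """Pick the GHA environment name from the service flags."""
    if enable_postgres and enable_redis:
        return "integration-redis-postgres"
    if enable_redis:
        return "integration-redis"
    if enable_postgres:
        return "integration-postgres"
    # Assumption: no environment binding when no service is enabled.
    return None
```

Commit 3b5b98327e, listed earlier in this log, later maps every enable-redis workflow to integration-redis-postgres, because the Postgres container always starts.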
Yuneng Jiang 6549f3eb1a [Infra] Add unit test workflows for Postgres, Redis, and security test suites
Add three new GHA workflows for tests requiring service containers, plus a
reusable base workflow that provides Postgres and cloud Redis support.

New workflows:
- test-unit-proxy-db.yml: proxy DB tests (key generation, auth checks,
  remaining) using a local Postgres container with a 3-way descriptive matrix
- test-unit-caching-redis.yml: caching tests that need Redis but no provider
  API keys, using cloud Redis via the integration-redis environment
- test-unit-security.yml: proxy security tests using a local Postgres container

Reusable base (_test-unit-services-base.yml):
- Local Postgres pinned by digest (postgres@sha256:705a5d5b...)
- Cloud Redis credentials scoped to the integration-redis GHA environment
- Environment binding is derived from enable-redis flag inside the base
  (not caller-controllable) to prevent secret scope bypass
- Supports workers=0 for tests that cannot run in parallel

Security hardening:
- All actions pinned to commit SHAs
- persist-credentials: false on all checkouts
- permissions: contents: read only
- Postgres-only workflows (proxy-db, security) use zero secrets and trigger on
  both pull_request and push to main/litellm_*
- Redis workflow triggers on push only (not pull_request) to prevent external
  PRs from accessing Redis Cloud credentials
- Added ${TEST_PATH:?} guard to both _test-unit-base.yml and
  _test-unit-services-base.yml to fail fast on empty test paths
- All files pass zizmor --pedantic with zero findings

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 12:06:45 -07:00
Yuneng Jiang 7851567091 [Fix] Scope documentation workflow to match CircleCI and add missing router settings
Revert path fixes for documentation tests that CircleCI never ran
(test_exception_types, test_general_setting_keys, test_readme_providers,
test_standard_logging_payload). Update the GHA workflow to run only the
4 tests CircleCI actually executed: test_env_keys, test_router_settings,
test_api_docs, test_circular_imports.

Add 2 missing router_settings keys (enable_health_check_routing,
health_check_staleness_threshold) and 27 missing general_settings keys
to config_settings.md so test_router_settings passes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 11:23:53 -07:00
Yuneng Jiang 7100ed5d0a [Fix] Test isolation for agent health checks and documentation test path resolution
Fix agent health check tests failing with 500 errors in parallel CI by
mocking prisma_client to None. Fix documentation validation tests using
CWD-relative paths that break depending on the working directory.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 11:00:22 -07:00
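
Note on 7100ed5d0a: the usual remedy for CWD-relative paths in tests is to anchor them to the test file's own location. A minimal Python sketch of that approach follows. The helper name is hypothetical, and this is not necessarily how the commit implemented the fix.

```python
from pathlib import Path


def resolve_from_anchor(anchor_file: str, *relative_parts: str) -> Path:
    """Build a path relative to the directory containing anchor_file.

    Passing __file__ as the anchor makes the result independent of the
    process working directory.
    """
    return Path(anchor_file).resolve().parent.joinpath(*relative_parts)
```

A test would typically call it as `resolve_from_anchor(__file__, "data", "example.md")`, where the path segments are placeholders.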
yuneng-jiang 428d837704 Merge pull request #24740 from BerriAI/litellm_unit_test_workflow_isolation
[Infra] Isolate unit test workflows with hardened security posture
2026-03-28 10:30:13 -07:00
Yuneng Jiang c717189ed2 [Infra] Remove workflows that require API keys or external services
These test suites are not pure unit tests and don't belong in Phase 1:
- litellm_utils_tests: health check tests need OPENAI_API_KEY
- pass_through_unit_tests: tests hit real Anthropic API
- router_unit_tests: tests call real OpenAI moderation endpoints
- proxy_security_tests: requires DATABASE_URL (Postgres)
- documentation_tests: requires docs directory at specific relative path

These will be re-added in later phases with proper secret scoping.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 10:16:19 -07:00
Yuneng Jiang a34ed20901 [Infra] Fix job naming in reusable workflow callers
Rename job keys from generic 'test' to descriptive names (e.g.,
'core-utils', 'proxy-auth', 'router') so GitHub checks display as
'core-utils / run' instead of 'test / test'.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 10:07:32 -07:00
Yuneng Jiang 3d527b722d [Infra] Add isolated unit test workflows with hardened security posture
Replace monolithic matrix workflow with individual, descriptively-named
workflow files. Each workflow uses a shared reusable base and follows
least-privilege security: zero secrets, read-only permissions, SHA-pinned
actions, persist-credentials: false, and env-var indirection to prevent
template injection.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 09:56:58 -07:00
Yuneng Jiang e0e0c5e293 [Infra] Fix zizmor artipacked warnings on schema sync workflows
Add persist-credentials: false to check-schema-sync (read-only, no push needed).
Explicitly set persist-credentials: true on sync-schema (required for git push).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 16:14:06 -07:00
Yuneng Jiang 08e29e0a9a [Infra] Automated schema.prisma sync and drift detection
Sync all 3 schema.prisma copies and add GHA workflows to keep them in sync automatically.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 16:01:20 -07:00
yuneng-jiang d949085310 Merge pull request #24697 from BerriAI/litellm_codeql_gha
[Infra] Improve CodeQL scanning coverage and schedule
2026-03-27 12:17:39 -07:00
Yuneng Jiang ec4273ed8b [Infra] Improve CodeQL scanning coverage and schedule
Switch query suite from security-extended to security-and-quality to
match the default GitHub Advanced Security setup. Run scheduled scans
daily instead of weekly. Remove paths-ignore for _experimental/out so
build artifacts are also scanned.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 12:04:09 -07:00