Commit Graph

491 Commits

Author SHA1 Message Date
Yuneng Jiang 3a02c0ac6b [Infra] Migrate Redis caching tests from GHA to CircleCI
Redis caching unit tests (test_dual_cache, test_redis_batch_optimizations,
test_router_utils) required Redis secrets that should live in CircleCI.

- Add redis_caching_unit_tests job to CircleCI config
- Delete test-unit-caching-redis.yml GHA workflow
- Remove all Redis plumbing (inputs, secrets, env vars) from
  _test-unit-services-base.yml and its callers
2026-04-08 09:07:12 -07:00
Yuneng Jiang 30565581be [Infra] Pin cosign.pub verification to initial commit hash
Pin all cosign public key references to the immutable commit hash
(0112e53) that first introduced the key, instead of fetching it from
the release tag. This addresses the concern that an attacker with push
access could replace the key on main/tags and re-sign tampered images.

Docs now show two verification methods: commit hash (recommended) and
release tag (convenience), with explanation of why the hash is stronger.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 22:53:23 -07:00
yuneng-jiang d132b1bf51 [Infra] Remove Redundant Matrix Unit Test Workflow (#25251)
* Remove redundant matrix unit test workflow

All test paths in test-litellm-matrix.yml are fully covered by the
newer semantic unit test workflows (test-unit-*.yml), making the
matrix workflow redundant CI spend.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Add Codecov coverage reporting to semantic unit test workflows

Add coverage collection (--cov) and Codecov OIDC upload to both
reusable base workflows and all 12 caller workflows, replacing the
coverage reporting that was previously only in the matrix workflow.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* Move id-token/pull-requests permissions to job level for multi-job workflows

For workflows with multiple jobs (llm-providers, proxy-db), move
id-token: write and pull-requests: write from workflow level to job
level so permissions are scoped to only the jobs that need them.
Removes zizmor inline suppressions that were masking the issue.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 16:52:38 -07:00
yuneng-jiang a60e19aeb8 Remove flaky proxy_e2e_azure_batches_tests CI workflow (#25247)
The proxy_e2e_azure_batches_tests workflow is consistently flaky and
does not provide reliable signal on whether changes break anything.
Remove the workflow from both CircleCI and GitHub Actions, along with
the test directory it exclusively used.

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-06 16:49:14 -07:00
yuneng-jiang 39c1042258 [Docs] Add cosign Docker image verification steps to security blog posts (#25122)
* docs(blog): add cosign Docker image verification instructions

Add steps for verifying Docker images with cosign to three security blog posts:
CI/CD v2, Security Townhall, and Security Update.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* docs(proxy): add cosign verification to Docker/Helm/Terraform deploy page

Add image signature verification steps to the main deployment doc so
users pulling Docker images know how to verify them with cosign.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: fixes

* Update index.md

Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>

* [Docs] Scope cosign signing docs to GHCR and specify starting version

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* [Docs] Add starting version callout to ci_cd_v2 blog post

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: Krrish Dholakia <krrish+github@berri.ai>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
2026-04-06 09:59:27 -07:00
joereyna 6cc56f58fd Fix broken codeql-action SHA in scorecard workflow 2026-04-03 11:36:02 -07:00
yuneng-jiang 0f88968da9 Merge pull request #24804 from joereyna/feat/add-codecov-to-ci
Re-add Codecov coverage reporting to GHA matrix workflow
2026-04-01 09:46:55 -07:00
joereyna c903845266 Use unique filenames per matrix job to preserve all coverage reports 2026-03-31 16:44:13 -07:00
joereyna 98a51e088d Remove debug step from upload-coverage job 2026-03-31 16:44:13 -07:00
joereyna 695d726352 Revert to --cov=litellm, add checkout and root_dir to upload job 2026-03-31 16:44:13 -07:00
joereyna b8eac3059a Measure coverage from repo root so filenames include litellm/ prefix 2026-03-31 16:44:13 -07:00
joereyna fdfd0e58ed Force coverage path remapping via explicit coverage xml step 2026-03-31 16:44:13 -07:00
joereyna 57c22d3a41 Add debug step to inspect coverage XML paths 2026-03-31 16:44:13 -07:00
joereyna e7e0637f53 Fix coverage source paths for Codecov 2026-03-31 16:44:13 -07:00
joereyna 13660572ca Add pull-requests write permission for Codecov PR comments 2026-03-31 16:44:13 -07:00
joereyna aaa5973b88 Use OIDC for Codecov upload instead of static token 2026-03-31 16:44:13 -07:00
joereyna 8358650660 Isolate Codecov upload into separate job to protect CODECOV_TOKEN 2026-03-31 16:44:13 -07:00
joereyna b3eee71084 Pin codecov-action to immutable SHA (v5.5.4) 2026-03-31 16:44:12 -07:00
joereyna d38498c3ef Re-add Codecov coverage upload to GHA matrix workflow 2026-03-31 16:44:12 -07:00
Yuneng Jiang 8071691ffc [Fix] Address review feedback on release workflow
- Use nullish coalescing for potentially null response body
- Create release as draft first, then publish atomically to avoid partial-release state
- Pin cosign.pub URL to release tag instead of main branch

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 16:26:20 -07:00
Yuneng Jiang 05368d9b1a [Infra] Add cosign verification section to release notes
Prepend Docker image signature verification instructions to auto-generated
release notes, using the cosign public key committed to the repo.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 15:46:34 -07:00
Yuneng Jiang 0112e53046 [Infra] Add release workflow and cosign public key
Add create-release.yml workflow triggered via workflow_dispatch to create
GitHub releases with auto-generated notes. Add cosign public key for
container image signature verification.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 14:30:27 -07:00
stuxf 7066c895f6 chore: harden npm supply chain — pin overrides, enforce npm ci, add ignore-scripts (#24838)
* chore: harden npm supply chain — pin overrides, enforce npm ci, add ignore-scripts

Replace open-ended >= version overrides with exact pins matching lockfile
versions across all 6 package.json files. Remove dead overrides for packages
not present in lockfiles. Switch CI and devcontainer from npm install to
npm ci for deterministic lockfile-based installs.

Add .npmrc to all 7 JS project directories with ignore-scripts=true (blocks
postinstall RAT vectors like the axios@1.14.1 supply chain attack) and
min-release-age=3d (refuses packages published <3 days ago, requires npm
>=11.10). Remove Yarn-only resolutions field from docs/my-website.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: bump sharp to 0.33.5 in docs, add docs .npmrc

sharp 0.32.x uses postinstall to download native binaries, which breaks
with ignore-scripts=true. sharp 0.33+ distributes via optionalDependencies
instead, making it compatible with the new .npmrc hardening.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: remove docs .npmrc to fix Vercel deploy

Vercel's build for docs/my-website uses npm install which needs
sharp 0.32.6's postinstall script. Since we don't control Vercel's
build process, remove the .npmrc from docs rather than fight it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* chore: Dockerfile npm ci + nvm checksum verification

- Replace npm install with npm ci in Dockerfile.non_root,
  Dockerfile.custom_ui, and spend-logs/Dockerfile for deterministic
  lockfile-based installs
- Replace curl-pipe-bash nvm install with download-then-verify pattern
  in build_admin_ui.sh, build_ui.sh, and build_ui_custom_path.sh
- Update nvm from v0.38.0 (2021) to v0.40.4 (Jan 2026) with SHA256
  checksum verification before execution

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

* fix: macOS sha256sum compat + clarify min-release-age scope

- Use shasum -a 256 fallback on macOS where sha256sum is unavailable
- Clarify in .npmrc comments that min-release-age only protects local
  npm install, not npm ci (used in CI)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 13:41:37 -07:00
Krrish Dholakia 05134fc70b Create scorecard.yml 2026-03-30 07:47:06 -07:00
Yuneng Jiang 3b5b98327e [Fix] Use integration-redis-postgres env for Redis workflows since Postgres always starts
GHA doesn't support conditional service containers, so the Postgres container
always starts even for Redis-only jobs. Use integration-redis-postgres
environment for any workflow with enable-redis so the Postgres container gets
valid credentials.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 14:25:29 -07:00
Yuneng Jiang 3ae80407dd [Fix] Move Postgres username and password to environment secrets
Move POSTGRES_USER and POSTGRES_PASSWORD from hardcoded values to
environment secrets so no credentials appear in workflow files at all.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 13:31:58 -07:00
Yuneng Jiang d42e2f6429 [Fix] Move Postgres DATABASE_URL to environment secret to avoid credential leak warnings
The hardcoded postgresql://postgres:postgres@localhost connection string was
being flagged by secret scanners. Move DATABASE_URL to a GHA environment
secret (integration-postgres) so the password is never in the workflow file.

Changes:
- _test-unit-services-base.yml: DATABASE_URL now comes from secrets, environment
  is derived from enable-* flags (integration-postgres, integration-redis, or
  integration-redis-postgres)
- test-unit-proxy-db.yml: switched to push-only trigger (uses secrets now)
- test-unit-security.yml: switched to push-only trigger (uses secrets now)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 13:28:41 -07:00
Yuneng Jiang 6549f3eb1a [Infra] Add unit test workflows for Postgres, Redis, and security test suites
Add three new GHA workflows for tests requiring service containers, plus a
reusable base workflow that provides Postgres and cloud Redis support.

New workflows:
- test-unit-proxy-db.yml: proxy DB tests (key generation, auth checks,
  remaining) using a local Postgres container with a 3-way descriptive matrix
- test-unit-caching-redis.yml: caching tests that need Redis but no provider
  API keys, using cloud Redis via the integration-redis environment
- test-unit-security.yml: proxy security tests using a local Postgres container

Reusable base (_test-unit-services-base.yml):
- Local Postgres pinned by digest (postgres@sha256:705a5d5b...)
- Cloud Redis credentials scoped to the integration-redis GHA environment
- Environment binding is derived from enable-redis flag inside the base
  (not caller-controllable) to prevent secret scope bypass
- Supports workers=0 for tests that cannot run in parallel

Security hardening:
- All actions pinned to commit SHAs
- persist-credentials: false on all checkouts
- permissions: contents: read only
- Postgres-only workflows (proxy-db, security) use zero secrets and trigger on
  both pull_request and push to main/litellm_*
- Redis workflow triggers on push only (not pull_request) to prevent external
  PRs from accessing Redis Cloud credentials
- Added ${TEST_PATH:?} guard to both _test-unit-base.yml and
  _test-unit-services-base.yml to fail fast on empty test paths
- All files pass zizmor --pedantic with zero findings

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 12:06:45 -07:00
Yuneng Jiang 7851567091 [Fix] Scope documentation workflow to match CircleCI and add missing router settings
Revert path fixes for documentation tests that CircleCI never ran
(test_exception_types, test_general_setting_keys, test_readme_providers,
test_standard_logging_payload). Update the GHA workflow to run only the
4 tests CircleCI actually executed: test_env_keys, test_router_settings,
test_api_docs, test_circular_imports.

Add 2 missing router_settings keys (enable_health_check_routing,
health_check_staleness_threshold) and 27 missing general_settings keys
to config_settings.md so test_router_settings passes.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 11:23:53 -07:00
Yuneng Jiang 7100ed5d0a [Fix] Test isolation for agent health checks and documentation test path resolution
Fix agent health check tests failing with 500 errors in parallel CI by
mocking prisma_client to None. Fix documentation validation tests using
CWD-relative paths that break depending on the working directory.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 11:00:22 -07:00
yuneng-jiang 428d837704 Merge pull request #24740 from BerriAI/litellm_unit_test_workflow_isolation
[Infra] Isolate unit test workflows with hardened security posture
2026-03-28 10:30:13 -07:00
Yuneng Jiang c717189ed2 [Infra] Remove workflows that require API keys or external services
These test suites are not pure unit tests and don't belong in Phase 1:
- litellm_utils_tests: health check tests need OPENAI_API_KEY
- pass_through_unit_tests: tests hit real Anthropic API
- router_unit_tests: tests call real OpenAI moderation endpoints
- proxy_security_tests: requires DATABASE_URL (Postgres)
- documentation_tests: requires docs directory at specific relative path

These will be re-added in later phases with proper secret scoping.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 10:16:19 -07:00
Yuneng Jiang a34ed20901 [Infra] Fix job naming in reusable workflow callers
Rename job keys from generic 'test' to descriptive names (e.g.,
'core-utils', 'proxy-auth', 'router') so GitHub checks display as
'core-utils / run' instead of 'test / test'.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 10:07:32 -07:00
Yuneng Jiang 3d527b722d [Infra] Add isolated unit test workflows with hardened security posture
Replace monolithic matrix workflow with individual, descriptively-named
workflow files. Each workflow uses a shared reusable base and follows
least-privilege security: zero secrets, read-only permissions, SHA-pinned
actions, persist-credentials: false, and env-var indirection to prevent
template injection.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-28 09:56:58 -07:00
Yuneng Jiang e0e0c5e293 [Infra] Fix zizmor artipacked warnings on schema sync workflows
Add persist-credentials: false to check-schema-sync (read-only, no push needed).
Explicitly set persist-credentials: true on sync-schema (required for git push).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 16:14:06 -07:00
Yuneng Jiang 08e29e0a9a [Infra] Automated schema.prisma sync and drift detection
Sync all 3 schema.prisma copies and add GHA workflows to keep them in sync automatically.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 16:01:20 -07:00
yuneng-jiang d949085310 Merge pull request #24697 from BerriAI/litellm_codeql_gha
[Infra] Improve CodeQL scanning coverage and schedule
2026-03-27 12:17:39 -07:00
Yuneng Jiang ec4273ed8b [Infra] Improve CodeQL scanning coverage and schedule
Switch query suite from security-extended to security-and-quality to
match the default GitHub Advanced Security setup. Run scheduled scans
daily instead of weekly. Remove paths-ignore for _experimental/out so
build artifacts are also scanned.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 12:04:09 -07:00
Yuneng Jiang ca3457b091 Pin nodejs-wheel-binaries in CI workflows running prisma generate
prisma generate internally runs `npm install prisma@5.4.2` against the
npm registry at runtime. Without a bundled Node.js, this causes
ECONNRESET failures on flaky GitHub Actions network and leaves the
npm transitive dependency tree unpinned.

Pre-install nodejs-wheel-binaries==24.13.1 (matching the Dockerfiles)
so prisma uses the bundled Node/npm instead of fetching from the
registry.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 11:25:03 -07:00
Krrish Dholakia ff63df25a2 Merge pull request #24663 from BerriAI/litellm_test_branch_03_26_2026_p1
Add zizmor to ci/cd
2026-03-27 08:59:07 -07:00
Krrish Dholakia a671275f5c ci: add zizmor github action 2026-03-27 05:33:21 -07:00
Krrish Dholakia dfb543369b fix: address zizmor comments 2026-03-26 21:09:01 -07:00
Yuneng Jiang ba8455a3be [Infra] Migrate PyPI publishing from CircleCI to GitHub Actions OIDC
- Add .github/workflows/publish_to_pypi.yml with OIDC trusted publisher
- Remove publish_to_pypi job from .circleci/config.yml
- Zero long-lived tokens, all actions SHA-pinned, build deps version-pinned
2026-03-26 19:02:14 -07:00
Yuneng Jiang 84be6f69ef fix google-cloud-aiplatform pin to be compatible with google-genai==1.22.0
Pin to 1.115.0 (latest version that doesn't require google-genai>=1.59.0).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 22:37:43 -07:00
Yuneng Jiang 1beb687f54 pin GHA dependencies + remove unused load test files
Pin all pip install commands to exact versions and SHA-pin all GitHub
Actions to prevent supply chain attacks. Remove snok/install-poetry
in favor of direct pip install. Delete orphaned load test scripts.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-25 22:32:08 -07:00
Krrish Dholakia df2a36dd27 docs: document new github + gitlab ci scripts 2026-03-25 20:17:10 -07:00
Yuneng Jiang b90a0af0d7 remove extra @ 2026-03-25 17:46:37 -07:00
Yuneng Jiang a989587525 re-add helm unit test with checksum pin 2026-03-25 17:38:36 -07:00
Yuneng Jiang f86b240d7e pin github scripts + remove unused 2026-03-25 17:38:36 -07:00
Ishaan Jaffer 3e8a6f24b7 ci: remove all publish/deploy workflows as part of supply chain incident response 2026-03-24 18:03:04 -07:00