yuneng-jiang 278c9babc6 [Infra] Merging RC Branch with Main (#23786)
2026-03-16 15:32:20 -07:00

CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Development Commands

Installation

  • make install-dev - Install core development dependencies
  • make install-proxy-dev - Install proxy development dependencies with full feature set
  • make install-test-deps - Install all test dependencies

Testing

  • make test - Run all tests
  • make test-unit - Run unit tests (tests/test_litellm) with 4 parallel workers
  • make test-integration - Run integration tests (excludes unit tests)
  • pytest tests/ - Direct pytest execution

Code Quality

  • make lint - Run all linting (Ruff, MyPy, Black, circular imports, import safety)
  • make format - Apply Black code formatting
  • make lint-ruff - Run Ruff linting only
  • make lint-mypy - Run MyPy type checking only

Single Test Files

  • poetry run pytest tests/path/to/test_file.py -v - Run specific test file
  • poetry run pytest tests/path/to/test_file.py::test_function -v - Run specific test

Running Scripts

  • poetry run python script.py - Run Python scripts (use for non-test files)

GitHub Issue & PR Templates

When contributing to the project, use the appropriate templates:

Bug Reports (.github/ISSUE_TEMPLATE/bug_report.yml):

  • Describe what happened vs. what you expected
  • Include relevant log output
  • Specify your LiteLLM version

Feature Requests (.github/ISSUE_TEMPLATE/feature_request.yml):

  • Describe the feature clearly
  • Explain the motivation and use case

Pull Requests (.github/pull_request_template.md):

  • Add at least 1 test in tests/test_litellm/
  • Ensure make test-unit passes

Architecture Overview

LiteLLM is a unified interface for 100+ LLM providers with two main components:

Core Library (litellm/)

  • Main entry point: litellm/main.py - Contains core completion() function
  • Provider implementations: litellm/llms/ - Each provider has its own subdirectory
  • Router system: litellm/router.py + litellm/router_utils/ - Load balancing and fallback logic
  • Type definitions: litellm/types/ - Pydantic models and type hints
  • Integrations: litellm/integrations/ - Third-party observability, caching, logging
  • Caching: litellm/caching/ - Multiple cache backends (Redis, in-memory, S3, etc.)

Proxy Server (litellm/proxy/)

  • Main server: proxy_server.py - FastAPI application
  • Authentication: auth/ - API key management, JWT, OAuth2
  • Database: db/ - Prisma ORM with PostgreSQL/SQLite support
  • Management endpoints: management_endpoints/ - Admin APIs for keys, teams, models
  • Pass-through endpoints: pass_through_endpoints/ - Provider-specific API forwarding
  • Guardrails: guardrails/ - Safety and content filtering hooks
  • UI Dashboard: Served from _experimental/out/ (Next.js build)

Key Patterns

Provider Implementation

  • Providers inherit from base classes in litellm/llms/base.py
  • Each provider has transformation functions for input/output formatting
  • Support both sync and async operations
  • Handle streaming responses and function calling

Error Handling

  • Provider-specific exceptions mapped to OpenAI-compatible errors
  • Fallback logic handled by Router system
  • Comprehensive logging through litellm/_logging.py

Configuration

  • YAML config files for proxy server (see proxy/example_config_yaml/)
  • Environment variables for API keys and settings
  • Database schema managed via Prisma (proxy/schema.prisma)

Development Notes

Code Style

  • Uses Black formatter, Ruff linter, MyPy type checker
  • Pydantic v2 for data validation
  • Async/await patterns throughout
  • Type hints required for all public APIs
  • Avoid imports within methods — place all imports at the top of the file (module-level). Inline imports inside functions/methods make dependencies harder to trace and hurt readability. The only exception is avoiding circular imports where absolutely necessary.
  • Use dict spread for immutable copies — prefer {**original, "key": new_value} over dict(obj) + mutation. The spread produces the final dict in one step and makes intent clear.
  • Guard at resolution time — when resolving an optional value through a fallback chain (a or b or ""), raise immediately if the resolved result being empty is an error. Don't pass empty strings or sentinel values downstream for the callee to deal with.
  • Extract complex comprehensions to named helpers — a set/dict comprehension that calls into the DB or manager (e.g. "which of these server IDs are OAuth2?") belongs in a named helper function, not inline in the caller.
  • FastAPI parameter declarations — mark required query/form params with = Query(...) / = Form(...) explicitly when other params in the same handler are optional. Mixing str (required) with Optional[str] = None in the same signature causes silent 422s when the required param is missing.
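
The dict-spread and guard-at-resolution rules above can be sketched together as follows. This is a minimal illustration, not code from the repository: the helper names (with_timeout, resolve_client_id) and the config keys are hypothetical.

```python
from typing import Optional


def with_timeout(config: dict, timeout_s: float) -> dict:
    """Return a copy of config with 'timeout' overridden; the input is not mutated."""
    # Dict spread builds the final dict in one step.
    return {**config, "timeout": timeout_s}


def resolve_client_id(request_value: Optional[str], stored_value: Optional[str]) -> str:
    """Resolve through the fallback chain and fail here if nothing usable was found."""
    resolved = request_value or stored_value or ""
    if not resolved:
        # Raise at resolution time instead of passing "" downstream.
        raise ValueError("client_id is missing from both the request and stored credentials")
    return resolved
```

With this shape, callers of resolve_client_id never have to check for an empty string, because the function either returns a usable value or raises.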

Testing Strategy

  • Unit tests in tests/test_litellm/
  • Integration tests for each provider in tests/llm_translation/
  • Proxy tests in tests/proxy_unit_tests/
  • Load tests in tests/load_tests/
  • Always add tests when adding new entity types or features — if the existing test file covers other entity types, add corresponding tests for the new one
  • Keep monkeypatch stubs in sync with real signatures — when a function gains a new optional parameter, update every fake_* / stub_* in tests that patch it to also accept that kwarg (even as **kwargs). Stale stubs fail with unexpected keyword argument and mask real bugs.
  • Test all branches of name→ID resolution — when adding server/resource lookup that resolves names to UUIDs, test: (1) name resolves and UUID is allowed, (2) name resolves but UUID is not allowed, (3) name does not resolve at all. The silent-fallback path is where access-control bugs hide.
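
As a rough sketch of the last two rules, the block below shows a name→ID resolver with its three outcomes and a stub that tolerates new keyword arguments. Everything here is hypothetical (the registry, lookup_server_id, the include_disabled parameter); it only illustrates the shape such code and stubs might take.

```python
from typing import Optional, Set

# Hypothetical name -> UUID registry, used only for this illustration.
SERVER_IDS = {"github": "uuid-1", "slack": "uuid-2"}


def lookup_server_id(name: str, *, include_disabled: bool = False) -> Optional[str]:
    """Stand-in for a real lookup; include_disabled plays the newly added kwarg."""
    return SERVER_IDS.get(name)


def resolve_allowed_server(name: str, allowed_ids: Set[str]) -> str:
    """Resolve a server name to its UUID and enforce the allow-list."""
    server_id = lookup_server_id(name, include_disabled=False)
    if server_id is None:
        # Branch 3: the name does not resolve at all. No silent fallback.
        raise LookupError(f"unknown server name: {name!r}")
    if server_id not in allowed_ids:
        # Branch 2: the name resolves but the UUID is not allowed.
        raise PermissionError(f"server {name!r} is not allowed")
    # Branch 1: the name resolves and the UUID is allowed.
    return server_id


def fake_lookup_server_id(name: str, **kwargs) -> Optional[str]:
    """Test stub: **kwargs keeps it compatible when the real function gains params."""
    return {"github": "uuid-1"}.get(name)
```

A test suite following the rule would contain one test per commented branch, and would patch lookup_server_id with fake_lookup_server_id rather than a stub with a fixed positional signature.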

UI / Backend Consistency

  • When wiring a new UI entity type to an existing backend endpoint, verify the backend API contract (single value vs. array, required vs. optional params) and ensure the UI controls match — e.g., use a single-select dropdown when the backend accepts a single value, not a multi-select

MCP OAuth / OpenAPI Transport Mapping

  • TRANSPORT.OPENAPI is a UI-only concept. The backend only accepts "http", "sse", or "stdio". Always map it to "http" before any API call (including pre-OAuth temp-session calls).
  • FastAPI validation errors return detail as an array of {loc, msg, type} objects. Error extractors must handle: array (map .msg), string, nested {error: string}, and fallback.
  • When an MCP server already has authorization_url stored, skip OAuth discovery (_discovery_metadata) — the server URL for OpenAPI MCPs is the spec file, not the API base, and fetching it causes timeouts.
  • client_id should be optional in the /authorize endpoint — if the server has a stored client_id in credentials, use that. Never require callers to re-supply it.
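
The transport mapping and the error-extraction branches described above might look like the sketch below. The UI's own helper lives in the frontend, so this Python version only mirrors the branching. The function names, the "openapi" string value, and the fallback text are assumptions.

```python
from typing import Any

# The three transports the backend accepts, per the notes above.
BACKEND_TRANSPORTS = {"http", "sse", "stdio"}


def to_backend_transport(ui_transport: str) -> str:
    """Map the UI-only "openapi" transport to "http"; reject anything unknown."""
    transport = "http" if ui_transport == "openapi" else ui_transport
    if transport not in BACKEND_TRANSPORTS:
        raise ValueError(f"unsupported transport: {ui_transport!r}")
    return transport


def extract_error_message(detail: Any, fallback: str = "Unknown error") -> str:
    """Handle the four detail shapes: array, string, nested {error: string}, fallback."""
    if isinstance(detail, list):
        # FastAPI validation errors: a list of {loc, msg, type} objects.
        messages = [str(item["msg"]) for item in detail
                    if isinstance(item, dict) and "msg" in item]
        if messages:
            return "; ".join(messages)
    elif isinstance(detail, str) and detail:
        return detail
    elif isinstance(detail, dict) and isinstance(detail.get("error"), str):
        return detail["error"]
    return fallback
```

Joining multiple validation messages with "; " is one possible choice; a UI might instead show only the first message.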

MCP Credential Storage

  • OAuth credentials and BYOK credentials share the litellm_mcpusercredentials table, distinguished by a "type" field in the JSON payload ("oauth2" vs plain string).
  • When deleting OAuth credentials, check type before deleting to avoid accidentally deleting a BYOK credential for the same (user_id, server_id) pair.
  • Always pass the raw expires_at timestamp to the client — never set it to None for expired credentials. Let the frontend compute the "Expired" display state from the timestamp.
  • Use RecordNotFoundError (not bare except Exception) when catching "already deleted" in credential delete endpoints.
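
A minimal sketch of the type check and the expires_at rule, assuming the stored credential is either a plain string (BYOK) or a JSON object with "type": "oauth2". The helper names and payload fields other than "type" and "expires_at" are hypothetical.

```python
import json
from typing import Any, Dict, Optional


def is_oauth2_credential(raw_credential: str) -> bool:
    """True only when the stored value is a JSON object tagged "type": "oauth2"."""
    try:
        payload = json.loads(raw_credential)
    except (TypeError, ValueError):
        # A plain-string BYOK key is usually not valid JSON.
        return False
    return isinstance(payload, dict) and payload.get("type") == "oauth2"


def should_delete_oauth_credential(raw_credential: Optional[str]) -> bool:
    """Guard for an OAuth delete endpoint: never remove a BYOK credential."""
    return raw_credential is not None and is_oauth2_credential(raw_credential)


def to_client_view(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Expose expires_at exactly as stored, even when it is in the past."""
    return {"type": payload.get("type"), "expires_at": payload.get("expires_at")}
```

The isinstance check also covers BYOK keys that happen to parse as JSON scalars, such as a purely numeric string.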

Browser Storage Safety (UI)

  • Never write LiteLLM access tokens or API keys to localStorage — use sessionStorage only. localStorage survives browser close and is readable by any injected script (XSS).
  • Shared utility functions (e.g. extractErrorMessage) belong in src/utils/ — never define them inline in hooks or duplicate them across files.

Database Migrations

  • Prisma handles schema migrations
  • Migration files auto-generated with prisma migrate dev
  • Always test migrations against both PostgreSQL and SQLite

Proxy database access

  • Do not write raw SQL for proxy DB operations. Use Prisma model methods instead of execute_raw / query_raw.
  • Use the generated client: prisma_client.db.<model> (e.g. litellm_tooltable, litellm_usertable) with .upsert(), .find_many(), .find_unique(), .update(), .update_many() as appropriate. This avoids schema/client drift, keeps code testable with simple mocks, and matches patterns used in spend logs and other proxy code.
  • No N+1 queries. Never query the DB inside a loop. Batch-fetch with {"in": ids} and distribute in-memory.
  • Batch writes. Use create_many/update_many/delete_many instead of individual calls (these return counts only; update_many/delete_many no-op silently on missing rows). When multiple separate writes target the same table (e.g. in batch_()), order by primary key to avoid deadlocks.
  • Push work to the DB. Filter, sort, group, and aggregate in SQL, not Python. Verify Prisma generates the expected SQL — e.g. prefer group_by over find_many(distinct=...) which does client-side processing.
  • Bound large result sets. Prisma materializes full results in memory. For results over ~10 MB, paginate with take/skip or cursor/take, always with an explicit order. Prefer cursor-based pagination (skip is O(n)). Don't paginate naturally small result sets.
  • Limit fetched columns on wide tables. Use select to fetch only needed fields — returns a partial object, so downstream code must not access unselected fields.
  • Check index coverage. For new or modified queries, check schema.prisma for a supporting index. Prefer extending an existing index (e.g. @@index([a]) → @@index([a, b])) over adding a new one, unless it's a @@unique. Only add indexes for large/frequent queries.
  • Keep schema files in sync. Apply schema changes to all schema.prisma copies (schema.prisma, litellm/proxy/, litellm-proxy-extras/, litellm-js/spend-logs/ for SpendLogs) with a migration under litellm-proxy-extras/litellm_proxy_extras/migrations/.
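
The "No N+1 queries" rule can be illustrated with the sketch below. FakeUserTable is a hypothetical in-memory stand-in so the example runs without a database; it copies only the find_many(where={... {"in": ids}}) call shape and returns plain dicts, whereas the generated Prisma client returns model objects. The fetch_users_by_id helper is also an assumption.

```python
import asyncio
from typing import Any, Dict, List


class FakeUserTable:
    """Hypothetical stand-in for prisma_client.db.litellm_usertable."""

    def __init__(self, rows: List[Dict[str, Any]]) -> None:
        self._rows = rows
        self.calls = 0  # counts round trips so the batching is observable

    async def find_many(self, where: Dict[str, Any]) -> List[Dict[str, Any]]:
        self.calls += 1
        wanted = set(where["user_id"]["in"])
        return [row for row in self._rows if row["user_id"] in wanted]


async def fetch_users_by_id(table: Any, user_ids: List[str]) -> Dict[str, Dict[str, Any]]:
    """One batched query for all IDs, then distribute the rows in memory."""
    unique_ids = list(dict.fromkeys(user_ids))  # dedupe, keep order
    rows = await table.find_many(where={"user_id": {"in": unique_ids}})
    return {row["user_id"]: row for row in rows}


if __name__ == "__main__":
    table = FakeUserTable([{"user_id": "u1"}, {"user_id": "u2"}])
    users = asyncio.run(fetch_users_by_id(table, ["u1", "u2", "u1"]))
    print(sorted(users), table.calls)
```

Callers then look up each ID in the returned dict instead of issuing a query per loop iteration; IDs with no matching row are simply absent from the result.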

Enterprise Features

  • Enterprise-specific code in enterprise/ directory
  • Optional features enabled via environment variables
  • Separate licensing and authentication for enterprise features

HTTP Client Cache Safety

  • Never close HTTP/SDK clients on cache eviction. LLMClientCache._remove_key() must not call close()/aclose() on evicted clients, because they may still be used by in-flight requests. Closing them causes "RuntimeError: Cannot send a request, as the client has been closed." once the 1-hour TTL expires. Cleanup happens at shutdown via close_litellm_async_clients().
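
The rule can be pictured with a toy cache like the one below. This is not LiteLLM's LLMClientCache: FakeClient, ClientCache, and close_all are hypothetical names used only to show why eviction should drop the reference without closing the client.

```python
from typing import Any, Dict, Optional


class FakeClient:
    """Hypothetical stand-in for an HTTP/SDK client."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True

    def send(self) -> str:
        if self.closed:
            raise RuntimeError("Cannot send a request, as the client has been closed.")
        return "ok"


class ClientCache:
    """Toy cache: eviction drops the reference and never closes the client."""

    def __init__(self) -> None:
        self._items: Dict[str, Any] = {}

    def set(self, key: str, client: Any) -> None:
        self._items[key] = client

    def get(self, key: str) -> Optional[Any]:
        return self._items.get(key)

    def _remove_key(self, key: str) -> None:
        # Intentionally no close()/aclose(): a request may still hold the client.
        self._items.pop(key, None)

    def close_all(self) -> None:
        """Explicit shutdown path, the only place where clients are closed."""
        for client in self._items.values():
            client.close()
        self._items.clear()
```

A request that fetched the client before eviction can keep calling send() afterwards; once nothing references the client, garbage collection reclaims it.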

Troubleshooting: DB schema out of sync after proxy restart

litellm-proxy-extras runs prisma migrate deploy on startup using its own bundled migration files, which may lag behind schema changes in the current worktree. Symptoms: Unknown column, Invalid prisma invocation, or missing data on new fields.

Diagnose: Run \d "TableName" in psql and compare against schema.prisma — missing columns confirm the issue.

Fix options:

  1. Create a Prisma migration (permanent) — run prisma migrate dev --name <description> in the worktree. The generated file will be picked up by prisma migrate deploy on next startup.
  2. Apply manually for local dev: run psql -d litellm -c "ALTER TABLE ... ADD COLUMN IF NOT EXISTS ..." after each proxy start. Fine for dev, not for production.
  3. Update litellm-proxy-extras — if the package is installed from PyPI, its migration directory must include the new file. Either update the package or run the migration manually until the next release ships it.