* fix(test): add missing mocks for test_streamable_http_mcp_handler_mock
The test was missing mocks for extract_mcp_auth_context and set_auth_context,
causing the handler to fail silently in the except block instead of reaching
session_manager.handle_request. This mirrors the fix already applied to the
sibling test_sse_mcp_handler_mock.
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
* fix(ci): route OpenAI models through chat completions in pass-through tests
The test_anthropic_messages_openai_model_streaming_cost_injection test fails
because the OpenAI Responses API returns 400 for requests routed through the
Anthropic Messages endpoint. Setting LITELLM_USE_CHAT_COMPLETIONS_URL_FOR_ANTHROPIC_MESSAGES=true
routes OpenAI models through the stable chat completions path instead.
Cost injection still works since it happens at the proxy level.
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
* fix(ci): fix assemblyai custom auth and router wildcard test flakiness
1. custom_auth_basic.py: Add user_role='proxy_admin' so the custom auth
user can access management endpoints like /key/generate. The test
test_assemblyai_transcribe_with_non_admin_key was hidden behind an
earlier -x failure and was never reached before.
2. test_router_utils.py: Add flaky(retries=3) and increase sleep from 1s
to 2s for test_router_get_model_group_usage_wildcard_routes. The async
callback needs time to write usage to cache, and 1s is insufficient on
slower CI hardware.
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
* ci: retrigger CI pipeline
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
* fix(mypy): use LitellmUserRoles enum instead of raw string in custom_auth_basic
Fixes mypy error: Argument 'user_role' has incompatible type 'str'; expected 'LitellmUserRoles | None'
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
* fix: don't close HTTP/SDK clients on LLMClientCache eviction (#22926)
* fix: don't close HTTP/SDK clients on LLMClientCache eviction
Removing the _remove_key override that eagerly called aclose()/close()
on evicted clients. Evicted clients may still be held by in-flight
streaming requests; closing them causes:
RuntimeError: Cannot send a request, as the client has been closed.
This is a regression from commit fb72979432. Clients that are no longer
referenced will be garbage-collected naturally. Explicit shutdown cleanup
happens via close_litellm_async_clients().
Fixes production crashes after the 1-hour cache TTL expires.
* test: update LLMClientCache unit tests for no-close-on-eviction behavior
Flip the assertions: evicted clients must NOT be closed. Replace
test_remove_key_closes_async_client → test_remove_key_does_not_close_async_client
and equivalents for sync/eviction paths.
Add test_remove_key_removes_plain_values for non-client cache entries.
Remove test_background_tasks_cleaned_up_after_completion (no more _background_tasks).
Remove test_remove_key_no_event_loop variant that depended on old behavior.
* test: add e2e tests for OpenAI SDK client surviving cache eviction
Add two new e2e tests using real AsyncOpenAI clients:
- test_evicted_openai_sdk_client_stays_usable: verifies size-based eviction
doesn't close the client
- test_ttl_expired_openai_sdk_client_stays_usable: verifies TTL expiry
eviction doesn't close the client
Both tests sleep after eviction so any create_task()-based close would
have time to run, making the regression detectable.
Also expand the module docstring to explain why the sleep is required.
* docs(AGENTS.md): add rule — never close HTTP/SDK clients on cache eviction
* docs(CLAUDE.md): add HTTP client cache safety guideline
* [Fix] Install bsdmainutils for column command in security scans
The security_scans.sh script uses `column` to format vulnerability
output, but the package wasn't installed in the CI environment.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix: handle string callback values in prometheus multiproc setup
When callbacks are configured as a plain string (e.g., `callbacks: "my_callback"`)
instead of a list, the proxy crashes on startup with:
TypeError: can only concatenate str (not "list") to str
Normalize each callback setting to a list before concatenating.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* bump: version 1.82.2 → 1.82.3
* fix(test): update test_startup_fails_when_db_setup_fails for opt-in enforcement
The --enforce_prisma_migration_check flag is now required to trigger
sys.exit(1) on DB migration failure, after #23675 flipped the default
behavior to warn-and-continue.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
* fix(cost_calculator): use model name for per-request custom pricing when router_model_id has no pricing
When custom pricing is passed as per-request kwargs (input_cost_per_token/output_cost_per_token),
completion() registers pricing under the model name, but _select_model_name_for_cost_calc was
selecting the router deployment hash (which has no pricing data), causing response_cost to be 0.0.
Now checks whether the router_model_id entry actually has pricing before preferring it.
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
---------
Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
CLAUDE.md
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
Development Commands
Installation
- make install-dev - Install core development dependencies
- make install-proxy-dev - Install proxy development dependencies with full feature set
- make install-test-deps - Install all test dependencies
Testing
- make test - Run all tests
- make test-unit - Run unit tests (tests/test_litellm) with 4 parallel workers
- make test-integration - Run integration tests (excludes unit tests)
- pytest tests/ - Direct pytest execution
Code Quality
- make lint - Run all linting (Ruff, MyPy, Black, circular imports, import safety)
- make format - Apply Black code formatting
- make lint-ruff - Run Ruff linting only
- make lint-mypy - Run MyPy type checking only
Single Test Files
- poetry run pytest tests/path/to/test_file.py -v - Run specific test file
- poetry run pytest tests/path/to/test_file.py::test_function -v - Run specific test
Running Scripts
- poetry run python script.py - Run Python scripts (use for non-test files)
GitHub Issue & PR Templates
When contributing to the project, use the appropriate templates:
Bug Reports (.github/ISSUE_TEMPLATE/bug_report.yml):
- Describe what happened vs. what you expected
- Include relevant log output
- Specify your LiteLLM version
Feature Requests (.github/ISSUE_TEMPLATE/feature_request.yml):
- Describe the feature clearly
- Explain the motivation and use case
Pull Requests (.github/pull_request_template.md):
- Add at least 1 test in tests/litellm/
- Ensure make test-unit passes
Architecture Overview
LiteLLM is a unified interface for 100+ LLM providers with two main components:
Core Library (litellm/)
- Main entry point: litellm/main.py - Contains core completion() function
- Provider implementations: litellm/llms/ - Each provider has its own subdirectory
- Router system: litellm/router.py + litellm/router_utils/ - Load balancing and fallback logic
- Type definitions: litellm/types/ - Pydantic models and type hints
- Integrations: litellm/integrations/ - Third-party observability, caching, logging
- Caching: litellm/caching/ - Multiple cache backends (Redis, in-memory, S3, etc.)
Proxy Server (litellm/proxy/)
- Main server: proxy_server.py - FastAPI application
- Authentication: auth/ - API key management, JWT, OAuth2
- Database: db/ - Prisma ORM with PostgreSQL/SQLite support
- Management endpoints: management_endpoints/ - Admin APIs for keys, teams, models
- Pass-through endpoints: pass_through_endpoints/ - Provider-specific API forwarding
- Guardrails: guardrails/ - Safety and content filtering hooks
- UI Dashboard: Served from _experimental/out/ (Next.js build)
Key Patterns
Provider Implementation
- Providers inherit from base classes in litellm/llms/base.py
- Each provider has transformation functions for input/output formatting
- Support both sync and async operations
- Handle streaming responses and function calling
Error Handling
- Provider-specific exceptions mapped to OpenAI-compatible errors
- Fallback logic handled by Router system
- Comprehensive logging through litellm/_logging.py
Configuration
- YAML config files for proxy server (see proxy/example_config_yaml/)
- Environment variables for API keys and settings
- Database schema managed via Prisma (proxy/schema.prisma)
Development Notes
Code Style
- Uses Black formatter, Ruff linter, MyPy type checker
- Pydantic v2 for data validation
- Async/await patterns throughout
- Type hints required for all public APIs
- Avoid imports within methods: place all imports at the top of the file (module-level). Inline imports inside functions/methods make dependencies harder to trace and hurt readability. The only exception is avoiding circular imports where absolutely necessary.
- Use dict spread for immutable copies: prefer {**original, "key": new_value} over dict(obj) + mutation. The spread produces the final dict in one step and makes intent clear.
- Guard at resolution time: when resolving an optional value through a fallback chain (a or b or ""), raise immediately if the resolved result being empty is an error. Don't pass empty strings or sentinel values downstream for the callee to deal with.
- Extract complex comprehensions to named helpers: a set/dict comprehension that calls into the DB or manager (e.g. "which of these server IDs are OAuth2?") belongs in a named helper function, not inline in the caller.
- FastAPI parameter declarations: mark required query/form params with = Query(...) / = Form(...) explicitly when other params in the same handler are optional. Mixing str (required) with Optional[str] = None in the same signature causes silent 422s when the required param is missing.
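The dict-spread and guard-at-resolution rules above can be sketched as follows. This is a minimal illustration, not code from the repository: with_timeout, resolve_server_url, and the example values are hypothetical names chosen for the sketch.

```python
from typing import Optional


def with_timeout(params: dict, timeout: float) -> dict:
    # Dict spread: build the updated copy in one expression.
    # `params` itself is left untouched.
    return {**params, "timeout": timeout}


def resolve_server_url(explicit: Optional[str], configured: Optional[str]) -> str:
    # Guard at resolution time: walk the fallback chain, then fail here
    # instead of handing an empty string to the caller.
    url = explicit or configured or ""
    if not url:
        raise ValueError("no server URL resolved from request or config")
    return url


# Hypothetical example values.
original = {"model": "example-model", "timeout": 30.0}
updated = with_timeout(original, 60.0)
```

Because the guard lives in the resolver, callers of resolve_server_url never need their own empty-string checks.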
Testing Strategy
- Unit tests in tests/test_litellm/
- Integration tests for each provider in tests/llm_translation/
- Proxy tests in tests/proxy_unit_tests/
- Load tests in tests/load_tests/
- Always add tests when adding new entity types or features: if the existing test file covers other entity types, add corresponding tests for the new one
- Keep monkeypatch stubs in sync with real signatures: when a function gains a new optional parameter, update every fake_*/stub_* in tests that patch it to also accept that kwarg (even as **kwargs). Stale stubs fail with unexpected keyword argument and mask real bugs.
- Test all branches of name→ID resolution: when adding server/resource lookup that resolves names to UUIDs, test: (1) name resolves and UUID is allowed, (2) name resolves but UUID is not allowed, (3) name does not resolve at all. The silent-fallback path is where access-control bugs hide.
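The stale-stub failure described above can be reproduced in a few lines. This is a sketch under assumptions: fetch_server, its include_credentials parameter, and caller are hypothetical stand-ins, not functions from the codebase.

```python
def fetch_server(server_id: str, include_credentials: bool = False) -> dict:
    # Hypothetical real function; `include_credentials` is the newly
    # added optional parameter.
    raise RuntimeError("would hit the database")


def stale_fake_fetch_server(server_id: str) -> dict:
    # Stub written before the new parameter existed.
    return {"server_id": server_id}


def fake_fetch_server(server_id: str, **kwargs) -> dict:
    # Stub kept in sync: accepts the new keyword (and any later ones).
    return {"server_id": server_id}


def caller(fetch=fetch_server) -> dict:
    # Code under test now passes the new keyword.
    return fetch("srv-1", include_credentials=True)
```

Calling caller(fake_fetch_server) returns the stubbed dict, while caller(stale_fake_fetch_server) raises a TypeError mentioning an unexpected keyword argument, so the test fails in the stub before it exercises the code it was meant to check.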
UI / Backend Consistency
- When wiring a new UI entity type to an existing backend endpoint, verify the backend API contract (single value vs. array, required vs. optional params) and ensure the UI controls match — e.g., use a single-select dropdown when the backend accepts a single value, not a multi-select
MCP OAuth / OpenAPI Transport Mapping
- TRANSPORT.OPENAPI is a UI-only concept. The backend only accepts "http", "sse", or "stdio". Always map it to "http" before any API call (including pre-OAuth temp-session calls).
- FastAPI validation errors return detail as an array of {loc, msg, type} objects. Error extractors must handle: array (map .msg), string, nested {error: string}, and fallback.
- When an MCP server already has authorization_url stored, skip OAuth discovery (_discovery_metadata): the server URL for OpenAPI MCPs is the spec file, not the API base, and fetching it causes timeouts.
- client_id should be optional in the /authorize endpoint: if the server has a stored client_id in credentials, use that. Never require callers to re-supply it.
MCP Credential Storage
- OAuth credentials and BYOK credentials share the litellm_mcpusercredentials table, distinguished by a "type" field in the JSON payload ("oauth2" vs plain string).
- When deleting OAuth credentials, check type before deleting to avoid accidentally deleting a BYOK credential for the same (user_id, server_id) pair.
- Always pass the raw expires_at timestamp to the client; never set it to None for expired credentials. Let the frontend compute the "Expired" display state from the timestamp.
- Use RecordNotFoundError (not bare except Exception) when catching "already deleted" in credential delete endpoints.
Browser Storage Safety (UI)
- Never write LiteLLM access tokens or API keys to localStorage; use sessionStorage only. localStorage survives browser close and is readable by any injected script (XSS).
- Shared utility functions (e.g. extractErrorMessage) belong in src/utils/; never define them inline in hooks or duplicate them across files.
Database Migrations
- Prisma handles schema migrations
- Migration files auto-generated with prisma migrate dev
- Always test migrations against both PostgreSQL and SQLite
Proxy database access
- Do not write raw SQL for proxy DB operations. Use Prisma model methods instead of execute_raw / query_raw.
- Use the generated client: prisma_client.db.<model> (e.g. litellm_tooltable, litellm_usertable) with .upsert(), .find_many(), .find_unique(), .update(), .update_many() as appropriate. This avoids schema/client drift, keeps code testable with simple mocks, and matches patterns used in spend logs and other proxy code.
- No N+1 queries. Never query the DB inside a loop. Batch-fetch with {"in": ids} and distribute in-memory.
- Batch writes. Use create_many / update_many / delete_many instead of individual calls (these return counts only; update_many / delete_many no-op silently on missing rows). When multiple separate writes target the same table (e.g. in batch_()), order by primary key to avoid deadlocks.
- Push work to the DB. Filter, sort, group, and aggregate in SQL, not Python. Verify Prisma generates the expected SQL, e.g. prefer group_by over find_many(distinct=...) which does client-side processing.
- Bound large result sets. Prisma materializes full results in memory. For results over ~10 MB, paginate with take/skip or cursor/take, always with an explicit order. Prefer cursor-based pagination (skip is O(n)). Don't paginate naturally small result sets.
- Limit fetched columns on wide tables. Use select to fetch only needed fields; this returns a partial object, so downstream code must not access unselected fields.
- Check index coverage. For new or modified queries, check schema.prisma for a supporting index. Prefer extending an existing index (e.g. @@index([a]) → @@index([a, b])) over adding a new one, unless it's a @@unique. Only add indexes for large/frequent queries.
- Keep schema files in sync. Apply schema changes to all schema.prisma copies (schema.prisma, litellm/proxy/, litellm-proxy-extras/, litellm-js/spend-logs/ for SpendLogs) with a migration under litellm-proxy-extras/litellm_proxy_extras/migrations/.
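The "No N+1 queries" rule above (one batched fetch with {"in": ids}, then distribute in memory) might look like the sketch below. FakeKeyTable is an in-memory stand-in written for this example so it runs without a database; it only mimics the shape of a find_many(where=...) call and returns plain dicts, whereas a real Prisma client returns model objects. The team_id and key field names are assumptions.

```python
import asyncio
from collections import defaultdict


class FakeKeyTable:
    """In-memory stand-in for a Prisma model (hypothetical, for illustration)."""

    def __init__(self, rows):
        self.rows = rows
        self.calls = 0  # counts round-trips so the batching is visible

    async def find_many(self, where):
        self.calls += 1
        wanted = set(where["team_id"]["in"])
        return [row for row in self.rows if row["team_id"] in wanted]


async def keys_by_team(table, team_ids):
    # One query for all teams instead of one query per team.
    rows = await table.find_many(where={"team_id": {"in": list(team_ids)}})
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["team_id"]].append(row["key"])
    # Teams with no rows still get an empty list.
    return {team_id: grouped[team_id] for team_id in team_ids}


table = FakeKeyTable(
    [
        {"team_id": "t1", "key": "k1"},
        {"team_id": "t2", "key": "k2"},
        {"team_id": "t1", "key": "k3"},
    ]
)
result = asyncio.run(keys_by_team(table, ["t1", "t2", "t3"]))
```

The call counter stays at 1 regardless of how many team IDs are passed, which is the property the rule is after.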
Enterprise Features
- Enterprise-specific code in enterprise/ directory
- Optional features enabled via environment variables
- Separate licensing and authentication for enterprise features
HTTP Client Cache Safety
- Never close HTTP/SDK clients on cache eviction. LLMClientCache._remove_key() must not call close()/aclose() on evicted clients, since they may still be used by in-flight requests. Doing so causes RuntimeError: Cannot send a request, as the client has been closed. after the 1-hour TTL expires. Cleanup happens at shutdown via close_litellm_async_clients().
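The eviction rule above can be illustrated with a toy cache. This is a sketch, not the LLMClientCache implementation: ClientCache, FakeClient, and close_all are hypothetical, with close_all standing in for an explicit shutdown path such as close_litellm_async_clients().

```python
class FakeClient:
    """Stand-in for an HTTP/SDK client; tracks whether close() was called."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True

    def send(self):
        if self.closed:
            raise RuntimeError("Cannot send a request, as the client has been closed.")
        return "ok"


class ClientCache:
    """Tiny size-bounded cache (hypothetical, for illustration only)."""

    def __init__(self, max_size):
        self.max_size = max_size
        self._items = {}

    def set(self, key, client):
        if key not in self._items and len(self._items) >= self.max_size:
            oldest = next(iter(self._items))  # dicts preserve insertion order
            self._remove_key(oldest)
        self._items[key] = client

    def _remove_key(self, key):
        # Drop the cache's reference only; do NOT call close()/aclose() here.
        self._items.pop(key, None)

    def close_all(self):
        # Explicit shutdown path: the only place cached clients are closed.
        for client in self._items.values():
            client.close()
        self._items.clear()


cache = ClientCache(max_size=1)
in_flight = FakeClient()       # imagine a streaming request still holds this
cache.set("a", in_flight)
cache.set("b", FakeClient())   # evicts "a" from the cache
```

After the second set, the evicted client is gone from the cache but still usable by whoever holds a reference to it; it is reclaimed by garbage collection once that reference is dropped.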
Troubleshooting: DB schema out of sync after proxy restart
litellm-proxy-extras runs prisma migrate deploy on startup using its own bundled migration files, which may lag behind schema changes in the current worktree. Symptoms: Unknown column, Invalid prisma invocation, or missing data on new fields.
Diagnose: Run \d "TableName" in psql and compare against schema.prisma — missing columns confirm the issue.
Fix options:
- Create a Prisma migration (permanent): run prisma migrate dev --name <description> in the worktree. The generated file will be picked up by prisma migrate deploy on next startup.
- Apply manually for local dev: psql -d litellm -c "ALTER TABLE ... ADD COLUMN IF NOT EXISTS ..." after each proxy start. Fine for dev, not for production.
- Update litellm-proxy-extras: if the package is installed from PyPI, its migration directory must include the new file. Either update the package or run the migration manually until the next release ships it.