mirror of https://github.com/tiennm99/litellm.git synced 2026-07-05 19:07:38 +00:00

yuneng-jiang 278c9babc6 [Infra] Merging RC Branch with Main (#23786)

* fix(test): add missing mocks for test_streamable_http_mcp_handler_mock

The test was missing mocks for extract_mcp_auth_context and set_auth_context,
causing the handler to fail silently in the except block instead of reaching
session_manager.handle_request. This mirrors the fix already applied to the
sibling test_sse_mcp_handler_mock.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(ci): route OpenAI models through chat completions in pass-through tests

The test_anthropic_messages_openai_model_streaming_cost_injection test fails
because the OpenAI Responses API returns 400 for requests routed through the
Anthropic Messages endpoint. Setting LITELLM_USE_CHAT_COMPLETIONS_URL_FOR_ANTHROPIC_MESSAGES=true
routes OpenAI models through the stable chat completions path instead.
Cost injection still works since it happens at the proxy level.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(ci): fix assemblyai custom auth and router wildcard test flakiness

1. custom_auth_basic.py: Add user_role='proxy_admin' so the custom auth
   user can access management endpoints like /key/generate. The test
   test_assemblyai_transcribe_with_non_admin_key was hidden behind an
   earlier -x failure and was never reached before.

2. test_router_utils.py: Add flaky(retries=3) and increase sleep from 1s
   to 2s for test_router_get_model_group_usage_wildcard_routes. The async
   callback needs time to write usage to cache, and 1s is insufficient on
   slower CI hardware.

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* ci: retrigger CI pipeline

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix(mypy): use LitellmUserRoles enum instead of raw string in custom_auth_basic

Fixes mypy error: Argument 'user_role' has incompatible type 'str'; expected 'LitellmUserRoles | None'

Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>

* fix: don't close HTTP/SDK clients on LLMClientCache eviction (#22926)

* fix: don't close HTTP/SDK clients on LLMClientCache eviction

Removing the _remove_key override that eagerly called aclose()/close()
on evicted clients. Evicted clients may still be held by in-flight
streaming requests; closing them causes:

  RuntimeError: Cannot send a request, as the client has been closed.

This is a regression from commit fb72979432. Clients that are no longer
referenced will be garbage-collected naturally. Explicit shutdown cleanup
happens via close_litellm_async_clients().

Fixes production crashes after the 1-hour cache TTL expires.

* test: update LLMClientCache unit tests for no-close-on-eviction behavior

Flip the assertions: evicted clients must NOT be closed. Replace
test_remove_key_closes_async_client → test_remove_key_does_not_close_async_client
and equivalents for sync/eviction paths.

Add test_remove_key_removes_plain_values for non-client cache entries.
Remove test_background_tasks_cleaned_up_after_completion (no more _background_tasks).
Remove test_remove_key_no_event_loop variant that depended on old behavior.

* test: add e2e tests for OpenAI SDK client surviving cache eviction

Add two new e2e tests using real AsyncOpenAI clients:
- test_evicted_openai_sdk_client_stays_usable: verifies size-based eviction
  doesn't close the client
- test_ttl_expired_openai_sdk_client_stays_usable: verifies TTL expiry
  eviction doesn't close the client

Both tests sleep after eviction so any create_task()-based close would
have time to run, making the regression detectable.

Also expand the module docstring to explain why the sleep is required.

* docs(AGENTS.md): add rule — never close HTTP/SDK clients on cache eviction

* docs(CLAUDE.md): add HTTP client cache safety guideline

* [Fix] Install bsdmainutils for column command in security scans

The security_scans.sh script uses `column` to format vulnerability
output, but the package wasn't installed in the CI environment.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix: handle string callback values in prometheus multiproc setup

When callbacks are configured as a plain string (e.g., `callbacks: "my_callback"`)
instead of a list, the proxy crashes on startup with:
  TypeError: can only concatenate str (not "list") to str

Normalize each callback setting to a list before concatenating.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* bump: version 1.82.2 → 1.82.3

* fix(test): update test_startup_fails_when_db_setup_fails for opt-in enforcement

The --enforce_prisma_migration_check flag is now required to trigger
sys.exit(1) on DB migration failure, after #23675 flipped the default
behavior to warn-and-continue.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(cost_calculator): use model name for per-request custom pricing when router_model_id has no pricing

When custom pricing is passed as per-request kwargs (input_cost_per_token/output_cost_per_token),
completion() registers pricing under the model name, but _select_model_name_for_cost_calc was
selecting the router deployment hash (which has no pricing data), causing response_cost to be 0.0.

Now checks whether the router_model_id entry actually has pricing before preferring it.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Cursor Agent <cursoragent@cursor.com>
Co-authored-by: Ishaan Jaff <ishaan-jaff@users.noreply.github.com>
Co-authored-by: Ishaan Jaff <ishaanjaffer0324@gmail.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

2026-03-16 15:32:20 -07:00


CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

Development Commands

Installation

make install-dev - Install core development dependencies
make install-proxy-dev - Install proxy development dependencies with full feature set
make install-test-deps - Install all test dependencies

Testing

make test - Run all tests
make test-unit - Run unit tests (tests/test_litellm) with 4 parallel workers
make test-integration - Run integration tests (excludes unit tests)
pytest tests/ - Direct pytest execution

Code Quality

make lint - Run all linting (Ruff, MyPy, Black, circular imports, import safety)
make format - Apply Black code formatting
make lint-ruff - Run Ruff linting only
make lint-mypy - Run MyPy type checking only

Single Test Files

poetry run pytest tests/path/to/test_file.py -v - Run specific test file
poetry run pytest tests/path/to/test_file.py::test_function -v - Run specific test

Running Scripts

poetry run python script.py - Run Python scripts (use for non-test files)

GitHub Issue & PR Templates

When contributing to the project, use the appropriate templates:

Bug Reports (.github/ISSUE_TEMPLATE/bug_report.yml):

Describe what happened vs. what you expected
Include relevant log output
Specify your LiteLLM version

Feature Requests (.github/ISSUE_TEMPLATE/feature_request.yml):

Describe the feature clearly
Explain the motivation and use case

Pull Requests (.github/pull_request_template.md):

Add at least 1 test in tests/test_litellm/
Ensure make test-unit passes

Architecture Overview

LiteLLM is a unified interface for 100+ LLM providers with two main components:

Core Library (`litellm/`)

Main entry point: litellm/main.py - Contains core completion() function
Provider implementations: litellm/llms/ - Each provider has its own subdirectory
Router system: litellm/router.py + litellm/router_utils/ - Load balancing and fallback logic
Type definitions: litellm/types/ - Pydantic models and type hints
Integrations: litellm/integrations/ - Third-party observability, caching, logging
Caching: litellm/caching/ - Multiple cache backends (Redis, in-memory, S3, etc.)

Proxy Server (`litellm/proxy/`)

Main server: proxy_server.py - FastAPI application
Authentication: auth/ - API key management, JWT, OAuth2
Database: db/ - Prisma ORM with PostgreSQL/SQLite support
Management endpoints: management_endpoints/ - Admin APIs for keys, teams, models
Pass-through endpoints: pass_through_endpoints/ - Provider-specific API forwarding
Guardrails: guardrails/ - Safety and content filtering hooks
UI Dashboard: Served from _experimental/out/ (Next.js build)

Key Patterns

Provider Implementation

Providers inherit from base classes in litellm/llms/base.py
Each provider has transformation functions for input/output formatting
Support both sync and async operations
Handle streaming responses and function calling

Error Handling

Provider-specific exceptions mapped to OpenAI-compatible errors
Fallback logic handled by Router system
Comprehensive logging through litellm/_logging.py

Configuration

YAML config files for proxy server (see proxy/example_config_yaml/)
Environment variables for API keys and settings
Database schema managed via Prisma (proxy/schema.prisma)

Development Notes

Code Style

Uses Black formatter, Ruff linter, MyPy type checker
Pydantic v2 for data validation
Async/await patterns throughout
Type hints required for all public APIs
Avoid imports within methods — place all imports at the top of the file (module-level). Inline imports inside functions/methods make dependencies harder to trace and hurt readability. The only exception is avoiding circular imports where absolutely necessary.
Use dict spread for immutable copies — prefer {**original, "key": new_value} over dict(obj) + mutation. The spread produces the final dict in one step and makes intent clear.
Guard at resolution time — when resolving an optional value through a fallback chain (a or b or ""), raise immediately if the resolved result being empty is an error. Don't pass empty strings or sentinel values downstream for the callee to deal with.
Extract complex comprehensions to named helpers — a set/dict comprehension that calls into the DB or manager (e.g. "which of these server IDs are OAuth2?") belongs in a named helper function, not inline in the caller.
FastAPI parameter declarations — mark required query/form params with = Query(...) / = Form(...) explicitly when other params in the same handler are optional. Mixing str (required) with Optional[str] = None in the same signature causes silent 422s when the required param is missing.
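
The dict-spread and guard-at-resolution-time rules above can be sketched as follows. This is a minimal illustration only: with_timeout and resolve_api_base are hypothetical names chosen for the example, not functions in the LiteLLM codebase.

```python
from typing import Optional


def with_timeout(original: dict, timeout: float) -> dict:
    # Dict spread: builds the final dict in one step and leaves `original` untouched.
    return {**original, "timeout": timeout}


def resolve_api_base(explicit: Optional[str], env_value: Optional[str]) -> str:
    # Hypothetical resolver: walk the fallback chain, then guard immediately
    # instead of handing an empty string to the callee.
    resolved = explicit or env_value or ""
    if not resolved:
        raise ValueError("api_base could not be resolved from arguments or environment")
    return resolved
```

Raising at the point of resolution keeps the error close to its cause; a caller that received an empty string would otherwise fail later with a less specific message.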

Testing Strategy

Unit tests in tests/test_litellm/
Integration tests for each provider in tests/llm_translation/
Proxy tests in tests/proxy_unit_tests/
Load tests in tests/load_tests/
Always add tests when adding new entity types or features — if the existing test file covers other entity types, add corresponding tests for the new one
Keep monkeypatch stubs in sync with real signatures — when a function gains a new optional parameter, update every fake_* / stub_* in tests that patch it to also accept that kwarg (even as **kwargs). Stale stubs fail with unexpected keyword argument and mask real bugs.
Test all branches of name→ID resolution — when adding server/resource lookup that resolves names to UUIDs, test: (1) name resolves and UUID is allowed, (2) name resolves but UUID is not allowed, (3) name does not resolve at all. The silent-fallback path is where access-control bugs hide.
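
A rough sketch of the last two rules, assuming a hypothetical resolve_allowed_server_id helper and a hypothetical fake_fetch_server stub (neither name comes from the repository). The test covers the three name→ID branches listed above, and the stub accepts **kwargs so it does not break when the real function gains an optional parameter.

```python
from typing import Dict, Optional, Set


def resolve_allowed_server_id(
    name: str, name_to_id: Dict[str, str], allowed_ids: Set[str]
) -> Optional[str]:
    """Hypothetical helper: return the ID only if the name resolves AND the ID is allowed."""
    server_id = name_to_id.get(name)
    if server_id is None:
        return None  # name does not resolve: deny instead of falling back silently
    return server_id if server_id in allowed_ids else None


def fake_fetch_server(server_id: str, **kwargs) -> dict:
    # Test stub: **kwargs absorbs optional parameters added to the real function later.
    return {"server_id": server_id}


def test_name_resolution_branches() -> None:
    name_to_id = {"github": "uuid-1", "slack": "uuid-2"}
    allowed = {"uuid-1"}
    # (1) name resolves and the UUID is allowed
    assert resolve_allowed_server_id("github", name_to_id, allowed) == "uuid-1"
    # (2) name resolves but the UUID is not allowed
    assert resolve_allowed_server_id("slack", name_to_id, allowed) is None
    # (3) name does not resolve at all
    assert resolve_allowed_server_id("unknown", name_to_id, allowed) is None
```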

UI / Backend Consistency

When wiring a new UI entity type to an existing backend endpoint, verify the backend API contract (single value vs. array, required vs. optional params) and ensure the UI controls match — e.g., use a single-select dropdown when the backend accepts a single value, not a multi-select

MCP OAuth / OpenAPI Transport Mapping

TRANSPORT.OPENAPI is a UI-only concept. The backend only accepts "http", "sse", or "stdio". Always map it to "http" before any API call (including pre-OAuth temp-session calls).
FastAPI validation errors return detail as an array of {loc, msg, type} objects. Error extractors must handle: array (map .msg), string, nested {error: string}, and fallback.
When an MCP server already has authorization_url stored, skip OAuth discovery (_discovery_metadata) — the server URL for OpenAPI MCPs is the spec file, not the API base, and fetching it causes timeouts.
client_id should be optional in the /authorize endpoint — if the server has a stored client_id in credentials, use that. Never require callers to re-supply it.
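
The transport mapping and error-extraction rules above are UI concerns, so the real helpers live in the dashboard code. The sketch below only illustrates the logic in Python. It assumes the UI constant TRANSPORT.OPENAPI has the string value "openapi", and the function names are hypothetical.

```python
from typing import Any

# The only transports the backend accepts, per the rule above.
BACKEND_TRANSPORTS = {"http", "sse", "stdio"}


def to_backend_transport(ui_transport: str) -> str:
    # "openapi" (assumed value of TRANSPORT.OPENAPI) is UI-only; map it to "http".
    transport = "http" if ui_transport == "openapi" else ui_transport
    if transport not in BACKEND_TRANSPORTS:
        raise ValueError(f"unsupported transport: {ui_transport!r}")
    return transport


def extract_error_message(detail: Any, fallback: str = "Unknown error") -> str:
    # Array of {loc, msg, type} objects (FastAPI validation errors): join the msg fields.
    if isinstance(detail, list):
        msgs = [str(item["msg"]) for item in detail if isinstance(item, dict) and "msg" in item]
        if msgs:
            return "; ".join(msgs)
    # Plain string detail.
    if isinstance(detail, str) and detail:
        return detail
    # Nested {error: string}.
    if isinstance(detail, dict) and isinstance(detail.get("error"), str):
        return detail["error"]
    # Anything else: fall back to a generic message.
    return fallback
```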

MCP Credential Storage

OAuth credentials and BYOK credentials share the litellm_mcpusercredentials table, distinguished by a "type" field in the JSON payload ("oauth2" vs plain string).
When deleting OAuth credentials, check type before deleting to avoid accidentally deleting a BYOK credential for the same (user_id, server_id) pair.
Always pass the raw expires_at timestamp to the client — never set it to None for expired credentials. Let the frontend compute the "Expired" display state from the timestamp.
Use RecordNotFoundError (not bare except Exception) when catching "already deleted" in credential delete endpoints.
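
The type check before deletion could look roughly like this. The sketch assumes the stored credential is a string that holds either a JSON object with "type": "oauth2" or a plain BYOK key; is_oauth_credential is a hypothetical helper name, and the actual column layout may differ.

```python
import json
from typing import Optional


def is_oauth_credential(stored_payload: Optional[str]) -> bool:
    """Return True only when the payload is a JSON object whose "type" is "oauth2"."""
    if not stored_payload:
        return False
    try:
        parsed = json.loads(stored_payload)
    except (TypeError, ValueError):
        # Not JSON at all: treat it as a plain-string BYOK credential.
        return False
    return isinstance(parsed, dict) and parsed.get("type") == "oauth2"
```

An OAuth delete endpoint would call this on the row for the (user_id, server_id) pair and skip the delete when it returns False, so a BYOK credential is never removed by accident.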

Browser Storage Safety (UI)

Never write LiteLLM access tokens or API keys to localStorage — use sessionStorage only. localStorage survives browser close and is readable by any injected script (XSS).
Shared utility functions (e.g. extractErrorMessage) belong in src/utils/ — never define them inline in hooks or duplicate them across files.

Database Migrations

Prisma handles schema migrations
Migration files auto-generated with prisma migrate dev
Always test migrations against both PostgreSQL and SQLite

Proxy database access

Do not write raw SQL for proxy DB operations. Use Prisma model methods instead of execute_raw / query_raw.
Use the generated client: prisma_client.db.<model> (e.g. litellm_tooltable, litellm_usertable) with .upsert(), .find_many(), .find_unique(), .update(), .update_many() as appropriate. This avoids schema/client drift, keeps code testable with simple mocks, and matches patterns used in spend logs and other proxy code.
No N+1 queries. Never query the DB inside a loop. Batch-fetch with {"in": ids} and distribute in-memory.
Batch writes. Use create_many/update_many/delete_many instead of individual calls (these return counts only; update_many/delete_many no-op silently on missing rows). When multiple separate writes target the same table (e.g. in batch_()), order by primary key to avoid deadlocks.
Push work to the DB. Filter, sort, group, and aggregate in SQL, not Python. Verify Prisma generates the expected SQL — e.g. prefer group_by over find_many(distinct=...) which does client-side processing.
Bound large result sets. Prisma materializes full results in memory. For results over ~10 MB, paginate with take/skip or cursor/take, always with an explicit order. Prefer cursor-based pagination (skip is O(n)). Don't paginate naturally small result sets.
Limit fetched columns on wide tables. Use select to fetch only needed fields — returns a partial object, so downstream code must not access unselected fields.
Check index coverage. For new or modified queries, check schema.prisma for a supporting index. Prefer extending an existing index (e.g. @@index([a]) → @@index([a, b])) over adding a new one, unless it's a @@unique. Only add indexes for large/frequent queries.
Keep schema files in sync. Apply schema changes to all schema.prisma copies (schema.prisma, litellm/proxy/, litellm-proxy-extras/, litellm-js/spend-logs/ for SpendLogs) with a migration under litellm-proxy-extras/litellm_proxy_extras/migrations/.
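
As an illustration of the "No N+1 queries" rule, the sketch below batch-fetches with {"in": ids} and distributes rows in memory. FakeKeyTable is an in-memory stand-in written for this example so the snippet runs without a database. A real Prisma model returns model objects rather than dicts, and the team_id field is an assumption made for the demo.

```python
import asyncio
from collections import defaultdict
from typing import Any, Dict, List


class FakeKeyTable:
    """In-memory stand-in for a Prisma model; counts queries for the demo."""

    def __init__(self, rows: List[Dict[str, Any]]) -> None:
        self.rows = rows
        self.query_count = 0

    async def find_many(self, where: Dict[str, Any]) -> List[Dict[str, Any]]:
        self.query_count += 1
        wanted = set(where["team_id"]["in"])
        return [row for row in self.rows if row["team_id"] in wanted]


async def keys_by_team(table: Any, team_ids: List[str]) -> Dict[str, List[Dict[str, Any]]]:
    # One batched query for all teams instead of one query per team inside a loop.
    rows = await table.find_many(where={"team_id": {"in": team_ids}})
    grouped: Dict[str, List[Dict[str, Any]]] = defaultdict(list)
    for row in rows:
        grouped[row["team_id"]].append(row)
    # Distribute in memory; teams with no rows get an empty list.
    return {team_id: grouped.get(team_id, []) for team_id in team_ids}
```

Calling asyncio.run(keys_by_team(table, ["t1", "t2", "t3"])) issues a single find_many regardless of how many team IDs are passed.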

Enterprise Features

Enterprise-specific code in enterprise/ directory
Optional features enabled via environment variables
Separate licensing and authentication for enterprise features

HTTP Client Cache Safety

Never close HTTP/SDK clients on cache eviction. LLMClientCache._remove_key() must not call close()/aclose() on evicted clients, because in-flight requests (including streaming requests) may still be using them. Closing them raises "RuntimeError: Cannot send a request, as the client has been closed." once entries start expiring after the 1-hour cache TTL. Cleanup happens at shutdown via close_litellm_async_clients().
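
A toy model of the rule, not LiteLLM's actual LLMClientCache: eviction only drops the cache's reference, and closing happens in an explicit shutdown step. The class and method names here are hypothetical.

```python
from collections import OrderedDict
from typing import Any, Optional


class NoCloseOnEvictCache:
    """Toy size-bounded cache (hypothetical); eviction never closes the evicted value."""

    def __init__(self, max_size: int) -> None:
        self.max_size = max_size
        self._items: "OrderedDict[str, Any]" = OrderedDict()

    def set(self, key: str, value: Any) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        while len(self._items) > self.max_size:
            # Drop the oldest reference only. An in-flight request may still hold
            # the client, so close()/aclose() must not be called here.
            self._items.popitem(last=False)

    def get(self, key: str) -> Optional[Any]:
        return self._items.get(key)

    def close_all(self) -> None:
        # Explicit shutdown path: close whatever is still cached.
        for client in self._items.values():
            close = getattr(client, "close", None)
            if callable(close):
                close()
        self._items.clear()
```

Evicted clients that nothing references any more are simply garbage-collected, which matches the behavior described in the rule above.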

Troubleshooting: DB schema out of sync after proxy restart

litellm-proxy-extras runs prisma migrate deploy on startup using its own bundled migration files, which may lag behind schema changes in the current worktree. Symptoms: Unknown column, Invalid prisma invocation, or missing data on new fields.

Diagnose: Run \d "TableName" in psql and compare against schema.prisma — missing columns confirm the issue.

Fix options:

Create a Prisma migration (permanent) — run prisma migrate dev --name <description> in the worktree. The generated file will be picked up by prisma migrate deploy on next startup.
Apply manually for local dev — psql -d litellm -c "ALTER TABLE ... ADD COLUMN IF NOT EXISTS ..." after each proxy start. Fine for dev, not for production.
Update litellm-proxy-extras — if the package is installed from PyPI, its migration directory must include the new file. Either update the package or run the migration manually until the next release ships it.
