mirror of
https://github.com/tiennm99/litellm.git
synced 2026-06-29 07:13:23 +00:00
cfd0e2cf99
# INSTRUCTIONS FOR LITELLM

This document provides comprehensive instructions for AI agents working in the LiteLLM repository.

## OVERVIEW

LiteLLM is a unified interface for 100+ LLMs that:
- Translates inputs to provider-specific completion, embedding, and image generation endpoints
- Provides consistent OpenAI-format output across all providers
- Includes retry/fallback logic across multiple deployments (Router)
- Offers a proxy server (LLM Gateway) with budgets, rate limits, and authentication
- Supports advanced features like function calling, streaming, caching, and observability

## REPOSITORY STRUCTURE

### Core Components
- `litellm/` - Main library code
  - `llms/` - Provider-specific implementations (OpenAI, Anthropic, Azure, etc.)
  - `proxy/` - Proxy server implementation (LLM Gateway)
  - `router_utils/` - Load balancing and fallback logic
  - `types/` - Type definitions and schemas
  - `integrations/` - Third-party integrations (observability, caching, etc.)

### Key Directories
- `tests/` - Comprehensive test suites
- `docs/my-website/` - Documentation website
- `ui/litellm-dashboard/` - Admin dashboard UI
- `enterprise/` - Enterprise-specific features

## DEVELOPMENT GUIDELINES

### MAKING CODE CHANGES

1. **Provider Implementations**: When adding/modifying LLM providers:
   - Follow existing patterns in `litellm/llms/{provider}/`
   - Implement proper transformation classes that inherit from `BaseConfig`
   - Support both sync and async operations
   - Handle streaming responses appropriately
   - Include proper error handling with provider-specific exceptions

2. **Type Safety**:
   - Use proper type hints throughout
   - Update type definitions in `litellm/types/`
   - Ensure compatibility with both Pydantic v1 and v2

3. **Testing**:
   - Add tests in appropriate `tests/` subdirectories
   - Include both unit tests and integration tests
   - Test provider-specific functionality thoroughly
   - Consider adding load tests for performance-critical changes

### MAKING CODE CHANGES FOR THE UI (IGNORE FOR BACKEND)

1. **Tremor is DEPRECATED, do not use Tremor components in new features/changes**
   - The only exception is the Tremor Table component and its required Tremor Table sub components.

2. **Use Common Components as much as possible**:
   - These are usually defined in the `common_components` directory
   - Use these components as much as possible and avoid building new components unless needed

3. **Testing**:
   - The codebase uses **Vitest** and **React Testing Library**
   - **Query Priority Order**: Use query methods in this order: `getByRole`, `getByLabelText`, `getByPlaceholderText`, `getByText`, `getByTestId`
   - **Always use `screen`** instead of destructuring from `render()` (e.g., use `screen.getByText()` not `getByText`)
   - **Wrap user interactions in `act()`**: Always wrap `fireEvent` calls with `act()` to ensure React state updates are properly handled
   - **Use `query` methods for absence checks**: Use `queryBy*` methods (not `getBy*`) when expecting an element to NOT be present
   - **Test names must start with "should"**: All test names should follow the pattern `it("should ...")`
   - **Mock external dependencies**: Check `setupTests.ts` for global mocks and mock child components/networking calls as needed
   - **Structure tests properly**:
     - First test should verify the component renders successfully
     - Subsequent tests should focus on functionality and user interactions
     - Use `waitFor` for async operations that aren't already awaited
   - **Avoid using `querySelector`**: Prefer React Testing Library queries over direct DOM manipulation

### IMPORTANT PATTERNS

1. **Function/Tool Calling**:
   - LiteLLM standardizes tool calling across providers
   - OpenAI format is the standard, with transformations for other providers
   - See `litellm/llms/anthropic/chat/transformation.py` for complex tool handling

2. **Streaming**:
   - All providers should support streaming where possible
   - Use consistent chunk formatting across providers
   - Handle both sync and async streaming

3. **Error Handling**:
   - Use provider-specific exception classes
   - Maintain consistent error formats across providers
   - Include proper retry logic and fallback mechanisms

4. **Configuration**:
   - Support both environment variables and programmatic configuration
   - Use `BaseConfig` classes for provider configurations
   - Allow dynamic parameter passing
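
The Configuration pattern above (environment variables plus programmatic configuration) can be sketched as a simple resolution order. This is a minimal illustration, not LiteLLM's actual resolution code: the `resolve_api_key` helper and the `EXAMPLE_API_KEY` variable name are hypothetical.

```python
import os
from typing import Optional


def resolve_api_key(
    api_key: Optional[str] = None,
    env_var: str = "EXAMPLE_API_KEY",  # hypothetical variable name
) -> Optional[str]:
    """Prefer an explicitly passed key, then fall back to the environment."""
    if api_key is not None:
        return api_key
    return os.environ.get(env_var)


# Set the variable here only so the example is self-contained.
os.environ["EXAMPLE_API_KEY"] = "key-from-env"

print(resolve_api_key())                   # falls back to the environment
print(resolve_api_key(api_key="key-arg"))  # the explicit argument wins
```

Letting the per-call argument win keeps dynamic parameter passing possible even when a default is set in the environment.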

## PROXY SERVER (LLM GATEWAY)

The proxy server is a critical component that provides:
- Authentication and authorization
- Rate limiting and budget management
- Load balancing across multiple models/deployments
- Observability and logging
- Admin dashboard UI
- Enterprise features

Key files:
- `litellm/proxy/proxy_server.py` - Main server implementation
- `litellm/proxy/auth/` - Authentication logic
- `litellm/proxy/management_endpoints/` - Admin API endpoints

**Database (proxy)**: Use Prisma model methods (`prisma_client.db.<model>.upsert`, `.find_many`, `.find_unique`, etc.), not raw SQL (`execute_raw`/`query_raw`). See COMMON PITFALLS for details.
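
One reason to prefer model methods is that they are easy to mock. The sketch below assumes a hypothetical `save_tool` helper and hypothetical column names (`tool_name`, `description`); only the `upsert(where=..., data={"create": ..., "update": ...})` call shape comes from Prisma Client Python. The test replaces the model method with an `AsyncMock`, so no database is needed.

```python
import asyncio
from unittest.mock import AsyncMock, MagicMock


async def save_tool(prisma_client, tool_name: str, description: str):
    """Hypothetical helper: upsert a row through a Prisma model method."""
    return await prisma_client.db.litellm_tooltable.upsert(
        where={"tool_name": tool_name},  # column names are assumptions
        data={
            "create": {"tool_name": tool_name, "description": description},
            "update": {"description": description},
        },
    )


# In a unit test the model method is swapped for an AsyncMock.
mock_client = MagicMock()
mock_client.db.litellm_tooltable.upsert = AsyncMock(
    return_value={"tool_name": "search"}
)

result = asyncio.run(save_tool(mock_client, "search", "Web search tool"))
mock_client.db.litellm_tooltable.upsert.assert_awaited_once()
print(result)
```

The same test written against `execute_raw` would have to assert on a SQL string and positional parameters, which is more brittle.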

## MCP (MODEL CONTEXT PROTOCOL) SUPPORT

LiteLLM supports MCP for agent workflows:
- MCP server integration for tool calling
- Transformation between OpenAI and MCP tool formats
- Support for external MCP servers (Zapier, Jira, Linear, etc.)
- See `litellm/experimental_mcp_client/` and `litellm/proxy/_experimental/mcp_server/`
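
The transformation between MCP and OpenAI tool formats is mostly a field mapping. The following is a minimal sketch using plain dictionaries; the helper name `mcp_tool_to_openai_tool` is hypothetical, and the real conversion in `litellm/experimental_mcp_client/` may handle more cases.

```python
from typing import Any, Dict


def mcp_tool_to_openai_tool(mcp_tool: Dict[str, Any]) -> Dict[str, Any]:
    """Map an MCP tool definition onto the OpenAI function-tool shape."""
    return {
        "type": "function",
        "function": {
            "name": mcp_tool["name"],
            "description": mcp_tool.get("description", ""),
            # MCP names the JSON Schema `inputSchema`; OpenAI names it `parameters`.
            "parameters": mcp_tool.get(
                "inputSchema", {"type": "object", "properties": {}}
            ),
        },
    }


mcp_tool = {
    "name": "create_issue",
    "description": "Create an issue in a tracker",
    "inputSchema": {
        "type": "object",
        "properties": {"title": {"type": "string"}},
        "required": ["title"],
    },
}

openai_tool = mcp_tool_to_openai_tool(mcp_tool)
print(openai_tool["function"]["name"])
```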

## RUNNING SCRIPTS

Use `poetry run python script.py` to run Python scripts in the project environment (for non-test files).

## GITHUB TEMPLATES

When opening issues or pull requests, follow these templates:

### Bug Reports (`.github/ISSUE_TEMPLATE/bug_report.yml`)
- Describe what happened vs. expected behavior
- Include relevant log output
- Specify LiteLLM version
- Indicate if you're part of an ML Ops team (helps with prioritization)

### Feature Requests (`.github/ISSUE_TEMPLATE/feature_request.yml`)
- Clearly describe the feature
- Explain motivation and use case with concrete examples

### Pull Requests (`.github/pull_request_template.md`)
- Add at least 1 test in `tests/litellm/`
- Ensure `make test-unit` passes

## TESTING CONSIDERATIONS

1. **Provider Tests**: Test against real provider APIs when possible
2. **Proxy Tests**: Include authentication, rate limiting, and routing tests
3. **Performance Tests**: Load testing for high-throughput scenarios
4. **Integration Tests**: End-to-end workflows including tool calling

## DOCUMENTATION

- Keep documentation in sync with code changes
- Update provider documentation when adding new providers
- Include code examples for new features
- Update changelog and release notes

## SECURITY CONSIDERATIONS

- Handle API keys securely
- Validate all inputs, especially for proxy endpoints
- Consider rate limiting and abuse prevention
- Follow security best practices for authentication

## ENTERPRISE FEATURES

- Some features are enterprise-only
- Check `enterprise/` directory for enterprise-specific code
- Maintain compatibility between open-source and enterprise versions

## COMMON PITFALLS TO AVOID

1. **Breaking Changes**: LiteLLM has many users - avoid breaking existing APIs
2. **Provider Specifics**: Each provider has unique quirks - handle them properly
3. **Rate Limits**: Respect provider rate limits in tests
4. **Memory Usage**: Be mindful of memory usage in streaming scenarios
5. **Dependencies**: Keep dependencies minimal and well-justified
6. **UI/Backend Contract Mismatch**: When adding a new entity type to the UI, always check whether the backend endpoint accepts a single value or an array. Match the UI control accordingly (single-select vs. multi-select) to avoid silently dropping user selections
7. **Missing Tests for New Entity Types**: When adding a new entity type (e.g., in `EntityUsage`, `UsageViewSelect`), always add corresponding tests in the existing test files and update any icon/component mocks
8. **Raw SQL in proxy DB code**: Do not use `execute_raw` or `query_raw` for proxy database access. Use Prisma model methods (e.g. `prisma_client.db.litellm_tooltable.upsert()`, `.find_many()`, `.find_unique()`) so behavior stays consistent with the schema, the client stays mockable in tests, and you avoid the pitfalls of hand-written SQL (parameter ordering, type casting, schema drift)

9. **Do not hardcode model-specific flags**: Put model-specific capability flags in `model_prices_and_context_window.json` and read them via `get_model_info` (or existing helpers like `supports_reasoning`). This prevents users from needing to upgrade LiteLLM each time a new model supports a feature.

**Example of BAD** (hardcoded model checks):

```python
@staticmethod
def _is_effort_supported_model(model: str) -> bool:
    """Check if the model supports the output_config.effort parameter..."""
    model_lower = model.lower()
    if AnthropicConfig._is_claude_4_6_model(model):
        return True
    return any(
        v in model_lower for v in ("opus-4-5", "opus_4_5", "opus-4.5", "opus_4.5")
    )
```

**Example of GOOD** (config-driven or helper that reads from config):

```python
if (
    "claude-3-7-sonnet" in model
    or AnthropicConfig._is_claude_4_6_model(model)
    or supports_reasoning(
        model=model,
        custom_llm_provider=self.custom_llm_provider,
    )
):
    ...
```

Using helpers like `supports_reasoning` (which read from `model_prices_and_context_window.json` / `get_model_info`) allows future model updates to "just work" without code changes.

10. **Never close HTTP/SDK clients on cache eviction**: Do not add `close()`, `aclose()`, or `create_task(close_fn())` inside `LLMClientCache._remove_key()` or any cache eviction path. Evicted clients may still be held by in-flight requests; closing them causes `RuntimeError: Cannot send a request, as the client has been closed.` in production after the cache TTL (1 hour) expires. Connection cleanup is handled at shutdown by `close_litellm_async_clients()`. See PR #22247 for the full incident history.
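
The eviction rule above can be illustrated with a toy cache. This is a sketch under stated assumptions, not LiteLLM's `LLMClientCache`: `FakeClient` and `ToyClientCache` are hypothetical stand-ins. Eviction only drops the cache's reference, so a request that still holds the client keeps working; closing happens once, at shutdown.

```python
import asyncio
from typing import Dict, List


class FakeClient:
    """Hypothetical stand-in for an HTTP/SDK client."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.closed = False

    async def send(self) -> str:
        if self.closed:
            raise RuntimeError("Cannot send a request, as the client has been closed.")
        return f"ok from {self.name}"

    async def aclose(self) -> None:
        self.closed = True


class ToyClientCache:
    """Toy cache: eviction never closes a client."""

    def __init__(self) -> None:
        self._clients: Dict[str, FakeClient] = {}
        self._created: List[FakeClient] = []  # kept only for shutdown cleanup

    def set(self, key: str, client: FakeClient) -> None:
        self._clients[key] = client
        self._created.append(client)

    def remove_key(self, key: str) -> None:
        # Drop the reference only; no close()/aclose() on this path.
        self._clients.pop(key, None)

    async def close_all(self) -> None:
        # Shutdown path: the only place clients are closed.
        for client in self._created:
            await client.aclose()


async def main():
    cache = ToyClientCache()
    client = FakeClient("a")
    cache.set("a", client)
    cache.remove_key("a")        # eviction while a request still holds `client`
    reply = await client.send()  # the in-flight request still succeeds
    await cache.close_all()      # cleanup at shutdown
    return reply, client.closed


reply, closed = asyncio.run(main())
print(reply, closed)
```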

## HELPFUL RESOURCES

- Main documentation: https://docs.litellm.ai/
- Provider-specific docs in `docs/my-website/docs/providers/`
- Admin UI for testing proxy features

## WHEN IN DOUBT

- Follow existing patterns in the codebase
- Check similar provider implementations
- Ensure comprehensive test coverage
- Update documentation appropriately
- Consider backward compatibility impact

## Cursor Cloud specific instructions

### Environment

- Poetry is installed in `~/.local/bin`; the update script ensures it is on `PATH`.
- Python 3.12 and Node 22 are pre-installed.
- The virtual environment lives under `~/.cache/pypoetry/virtualenvs/`.

### Running the proxy server

Start the proxy with a config file:

```bash
poetry run litellm --config dev_config.yaml --port 4000
```

The proxy takes ~15-20 seconds to fully start (it runs Prisma migrations on boot). Wait for `/health` to return before sending requests. Without a PostgreSQL `DATABASE_URL`, the proxy connects to a default Neon dev database embedded in the `litellm-proxy-extras` package.
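
Waiting for `/health` can be scripted with a small polling loop. The sketch below uses only the standard library; the function names, the 60-second default timeout, and the `http://localhost:4000/health` default URL are assumptions for illustration. The probe sends no credentials, and whether `/health` needs them depends on your proxy configuration.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_probe(url: str) -> bool:
    """Return True if `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx response.
        return False


def wait_until_ready(
    probe: Callable[[str], bool],
    url: str = "http://localhost:4000/health",
    timeout_s: float = 60.0,
    interval_s: float = 1.0,
) -> bool:
    """Poll `probe(url)` until it succeeds or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if probe(url):
            return True
        time.sleep(interval_s)
    return False
```

Call `wait_until_ready(http_probe)` after starting the proxy and send requests only once it returns `True`. The probe is passed in as a parameter so the loop can be tested without a running server.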

### Running tests

See `CLAUDE.md` and the `Makefile` for standard commands. Key notes:

- `psycopg-binary` must be installed (`poetry run pip install psycopg-binary`) because the pytest-postgresql plugin requires it and the lock file only includes `psycopg` (no binary).
- `openapi-core` must be installed (`poetry run pip install openapi-core`) for the OpenAPI compliance tests in `tests/test_litellm/interactions/`.
- The `--timeout` pytest flag is NOT available; don't pass it.
- Unit tests: `poetry run pytest tests/test_litellm/ -x -vv -n 4`
- Black `--check` may report pre-existing formatting issues; this does not block test runs.
- If `poetry install` fails with "pyproject.toml changed significantly since poetry.lock was last generated", run `poetry lock` first to regenerate the lock file.

### Lint

```bash
cd litellm && poetry run ruff check .
```

Ruff is the primary fast linter. For the full lint suite (including mypy, black, circular imports), run `make lint` per `CLAUDE.md`.

### UI Dashboard development

- The UI is at `ui/litellm-dashboard/`. Run `npm run dev` from that directory for the Next.js dev server on port 3000.
- The proxy at port 4000 serves a **pre-built** static UI from `litellm/proxy/_experimental/out/`. After making UI code changes, run `npm run build` in the dashboard directory and copy the output so the proxy serves the updated UI: `cp -r ui/litellm-dashboard/out/* litellm/proxy/_experimental/out/`.
- SVGs used as provider logos (loaded via `<img>` tags) must NOT use `fill="currentColor"`, since CSS color inheritance does not work inside `<img>` elements. Replace it with an explicit color like `#000000` or use the `-color` variant from lobehub icons.
- Provider logos live in `ui/litellm-dashboard/public/assets/logos/` (source) and `litellm/proxy/_experimental/out/assets/logos/` (pre-built). Both locations must have the file for it to work in dev and proxy-served modes.
- UI Vitest tests: `cd ui/litellm-dashboard && npx vitest run`
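
The `fill="currentColor"` rule for provider logos can be checked mechanically. The following is a rough sketch, not an existing repository script: the helper names are hypothetical, and the regular expression only catches the attribute form (not `fill` set through inline CSS).

```python
import re
from pathlib import Path
from typing import List

# Matches fill="currentColor" or fill='currentColor', ignoring case.
CURRENT_COLOR = re.compile(r"""fill\s*=\s*["']currentColor["']""", re.IGNORECASE)


def uses_current_color(svg_text: str) -> bool:
    """Return True if the SVG source sets a fill attribute to currentColor."""
    return CURRENT_COLOR.search(svg_text) is not None


def find_offending_logos(logo_dir: str) -> List[str]:
    """List SVG file names in `logo_dir` that rely on currentColor."""
    return sorted(
        path.name
        for path in Path(logo_dir).glob("*.svg")
        if uses_current_color(path.read_text(encoding="utf-8"))
    )


print(uses_current_color('<svg fill="currentColor"></svg>'))
print(uses_current_color('<svg fill="#000000"></svg>'))
```

Running `find_offending_logos("ui/litellm-dashboard/public/assets/logos")` from the repository root would list the files to fix.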