
🚅 LiteLLM

LiteLLM AI Gateway

Open Source AI Gateway for 100+ LLMs. Self-hosted. Enterprise-ready. Call any LLM in OpenAI format.


LiteLLM Proxy Server (AI Gateway) | Hosted Proxy | Enterprise Tier | Website


What is LiteLLM

LiteLLM is an open source AI Gateway that gives you a single, unified interface to call 100+ LLM providers — OpenAI, Anthropic, Gemini, Bedrock, Azure, and more — using the OpenAI format.

Use it as a Python SDK for direct library integration, or deploy the AI Gateway (Proxy Server) as a centralized service for your team or organization.

Jump to LiteLLM Proxy (LLM Gateway) Docs
Jump to Supported LLM Providers


Why LiteLLM

Managing LLM calls across providers gets complicated fast — different SDKs, auth patterns, request formats, and error types for every model. LiteLLM removes that friction:

  • Unified API — one interface for 100+ LLMs, no provider-specific SDK juggling
  • Drop-in OpenAI compatibility — swap providers without rewriting your code
  • Production-ready gateway — virtual keys, spend tracking, guardrails, load balancing, and an admin dashboard out of the box
  • 8ms P95 latency at 1k RPS (benchmarks)

OSS Adopters

Stripe, Google ADK, Greptile, OpenHands, Netflix, OpenAI Agents SDK

Features

LLMs - Call 100+ LLMs (Python SDK + AI Gateway)

All Supported Endpoints - /chat/completions, /responses, /embeddings, /images, /audio, /batches, /rerank, /a2a, /messages and more.

Python SDK

uv add litellm

from litellm import completion
import os

os.environ["OPENAI_API_KEY"] = "your-openai-key"
os.environ["ANTHROPIC_API_KEY"] = "your-anthropic-key"

# OpenAI
response = completion(model="openai/gpt-4o", messages=[{"role": "user", "content": "Hello!"}])

# Anthropic  
response = completion(model="anthropic/claude-sonnet-4-20250514", messages=[{"role": "user", "content": "Hello!"}])
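
The model strings in the calls above follow a `provider/model` naming convention. As a rough illustration of how such a string breaks down, here is a minimal sketch; `split_model_string` is a hypothetical helper written for this example, not part of the LiteLLM API, and LiteLLM's own routing logic may well differ.

```python
from typing import Optional, Tuple


def split_model_string(model: str) -> Tuple[Optional[str], str]:
    """Split "provider/model" into (provider, model).

    Hypothetical helper for illustration only; not LiteLLM's own parser.
    Only the first "/" is treated as the separator, so model names that
    contain slashes themselves are kept intact.
    """
    provider, sep, name = model.partition("/")
    if not sep:
        # No prefix present: the provider cannot be read from the string alone.
        return None, model
    return provider, name


print(split_model_string("openai/gpt-4o"))
print(split_model_string("anthropic/claude-sonnet-4-20250514"))
print(split_model_string("gpt-4o"))
```

Keeping the provider in the model string is what lets the rest of the call stay identical across providers: only the `model` argument changes.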

AI Gateway (Proxy Server)

Getting Started - E2E Tutorial - Setup virtual keys, make your first request

uv tool install 'litellm[proxy]'
litellm --model gpt-4o

import openai

client = openai.OpenAI(api_key="anything", base_url="http://0.0.0.0:4000")
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)
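
Because the gateway speaks the OpenAI format over plain HTTP, the same call can in principle be made without the `openai` package. The sketch below only builds the request with the standard library and does not send it. It assumes the proxy from the previous step is listening on port 4000, that the `/chat/completions` path is served at the root, and that any API key is accepted, as in the client example above. `build_chat_request` is a hypothetical helper name.

```python
import json
import urllib.request

# Assumption: the proxy started above is reachable at this address.
PROXY_URL = "http://0.0.0.0:4000/chat/completions"


def build_chat_request(model: str, prompt: str, api_key: str = "anything") -> urllib.request.Request:
    """Build (but do not send) an OpenAI-format chat completion request."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        PROXY_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_chat_request("gpt-4o", "Hello!")
print(request.get_method(), request.full_url)

# To actually send it (requires a running proxy):
# with urllib.request.urlopen(request) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```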

Docs: LLM Providers

Agents - Invoke A2A Agents (Python SDK + AI Gateway)

Supported Providers - LangGraph, Vertex AI Agent Engine, Azure AI Foundry, Bedrock AgentCore, Pydantic AI

Python SDK - A2A Protocol

import asyncio
from uuid import uuid4

from a2a.types import MessageSendParams, SendMessageRequest
from litellm.a2a_protocol import A2AClient


async def main():
    client = A2AClient(base_url="http://localhost:10001")

    request = SendMessageRequest(
        id=str(uuid4()),
        params=MessageSendParams(
            message={
                "role": "user",
                "parts": [{"kind": "text", "text": "Hello!"}],
                "messageId": uuid4().hex,
            }
        ),
    )
    # send_message is awaited, so it has to run inside an async function
    response = await client.send_message(request)
    print(response)


asyncio.run(main())

AI Gateway (Proxy Server)

Step 1. Add your Agent to the AI Gateway

Step 2. Call Agent via A2A SDK

import asyncio
from uuid import uuid4

import httpx
from a2a.client import A2ACardResolver, A2AClient
from a2a.types import MessageSendParams, SendMessageRequest

base_url = "http://localhost:4000/a2a/my-agent"  # LiteLLM proxy + agent name
headers = {"Authorization": "Bearer sk-1234"}    # LiteLLM Virtual Key


async def main():
    async with httpx.AsyncClient(headers=headers) as httpx_client:
        # Fetch the agent card through the proxy, then build a client from it
        resolver = A2ACardResolver(httpx_client=httpx_client, base_url=base_url)
        agent_card = await resolver.get_agent_card()
        client = A2AClient(httpx_client=httpx_client, agent_card=agent_card)

        request = SendMessageRequest(
            id=str(uuid4()),
            params=MessageSendParams(
                message={
                    "role": "user",
                    "parts": [{"kind": "text", "text": "Hello!"}],
                    "messageId": uuid4().hex,
                }
            ),
        )
        response = await client.send_message(request)
        print(response)


asyncio.run(main())

Docs: A2A Agent Gateway

MCP Tools - Connect MCP servers to any LLM (Python SDK + AI Gateway)

Python SDK - MCP Bridge

import asyncio

import litellm
from litellm import experimental_mcp_client
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

server_params = StdioServerParameters(command="python", args=["mcp_server.py"])


async def main():
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Load MCP tools in OpenAI format
            tools = await experimental_mcp_client.load_mcp_tools(session=session, format="openai")

            # Use with any LiteLLM model
            response = await litellm.acompletion(
                model="gpt-4o",
                messages=[{"role": "user", "content": "What's 3 + 5?"}],
                tools=tools,
            )
            print(response)


asyncio.run(main())

AI Gateway - MCP Gateway

Step 1. Add your MCP Server to the AI Gateway

Step 2. Call MCP tools via /chat/completions

curl -X POST 'http://0.0.0.0:4000/v1/chat/completions' \
  -H 'Authorization: Bearer sk-1234' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Summarize the latest open PR"}],
    "tools": [{
      "type": "mcp",
      "server_url": "litellm_proxy/mcp/github",
      "server_label": "github_mcp",
      "require_approval": "never"
    }]
  }'

Use with Cursor IDE

{
  "mcpServers": {
    "LiteLLM": {
      "url": "http://localhost:4000/mcp/",
      "headers": {
        "x-litellm-api-key": "Bearer sk-1234"
      }
    }
  }
}

Docs: MCP Gateway

Supported Providers (Website Supported Models | Docs)

Provider /chat/completions /messages /responses /embeddings /image/generations /audio/transcriptions /audio/speech /moderations /batches /rerank
Abliteration (abliteration)
AI/ML API (aiml)
AI21 (ai21)
AI21 Chat (ai21_chat)
Aleph Alpha
Amazon Nova
Anthropic (anthropic)
Anthropic Text (anthropic_text)
Anyscale
AssemblyAI (assemblyai)
Auto Router (auto_router)
AWS - Bedrock (bedrock)
AWS - Sagemaker (sagemaker)
Azure (azure)
Azure AI (azure_ai)
Azure Text (azure_text)
Baseten (baseten)
Bytez (bytez)
Cerebras (cerebras)
Clarifai (clarifai)
Cloudflare AI Workers (cloudflare)
Codestral (codestral)
Cohere (cohere)
Cohere Chat (cohere_chat)
CometAPI (cometapi)
CompactifAI (compactifai)
Custom (custom)
Custom OpenAI (custom_openai)
Dashscope (dashscope)
Databricks (databricks)
DataRobot (datarobot)
Deepgram (deepgram)
DeepInfra (deepinfra)
Deepseek (deepseek)
ElevenLabs (elevenlabs)
Empower (empower)
Fal AI (fal_ai)
Featherless AI (featherless_ai)
Fireworks AI (fireworks_ai)
FriendliAI (friendliai)
Galadriel (galadriel)
GitHub Copilot (github_copilot)
GitHub Models (github)
Google - PaLM
Google - Vertex AI (vertex_ai)
Google AI Studio - Gemini (gemini)
GradientAI (gradient_ai)
Groq AI (groq)
Heroku (heroku)
Hosted VLLM (hosted_vllm)
Huggingface (huggingface)
Hyperbolic (hyperbolic)
IBM - Watsonx.ai (watsonx)
Infinity (infinity)
Jina AI (jina_ai)
Lambda AI (lambda_ai)
Lemonade (lemonade)
LiteLLM Proxy (litellm_proxy)
Llamafile (llamafile)
LM Studio (lm_studio)
Maritalk (maritalk)
Meta - Llama API (meta_llama)
Mistral AI API (mistral)
Moonshot (moonshot)
Morph (morph)
Nebius AI Studio (nebius)
NLP Cloud (nlp_cloud)
Novita AI (novita)
Nscale (nscale)
Nvidia NIM (nvidia_nim)
OCI (oci)
Ollama (ollama)
Ollama Chat (ollama_chat)
Oobabooga (oobabooga)
OpenAI (openai)
OpenAI-like (openai_like)
OpenRouter (openrouter)
OVHCloud AI Endpoints (ovhcloud)
Perplexity AI (perplexity)
Petals (petals)
Predibase (predibase)
Recraft (recraft)
Replicate (replicate)
Sagemaker Chat (sagemaker_chat)
Sambanova (sambanova)
Snowflake (snowflake)
Text Completion Codestral (text-completion-codestral)
Text Completion OpenAI (text-completion-openai)
Together AI (together_ai)
Topaz (topaz)
Triton (triton)
V0 (v0)
Vercel AI Gateway (vercel_ai_gateway)
VLLM (vllm)
Volcengine (volcengine)
Voyage AI (voyage)
WandB Inference (wandb)
Watsonx Text (watsonx_text)
xAI (xai)
Xinference (xinference)

Read the Docs


Get Started

You can use LiteLLM through either the Proxy Server or the Python SDK. Both give you a unified interface to 100+ LLMs. Choose the option that best fits your needs:

| | LiteLLM AI Gateway | LiteLLM Python SDK |
|---|---|---|
| Use Case | Central service (LLM Gateway) to access multiple LLMs | Use LiteLLM directly in your Python code |
| Who Uses It? | Gen AI Enablement / ML Platform Teams | Developers building LLM projects |
| Key Features | Centralized API gateway with authentication and authorization; multi-tenant cost tracking and spend management per project/user; per-project customization (logging, guardrails, caching); virtual keys for secure access control; admin dashboard UI for monitoring and management | Direct Python library integration in your codebase; Router with retry/fallback logic across multiple deployments (e.g. Azure/OpenAI); application-level load balancing and cost tracking; exception handling with OpenAI-compatible errors; observability callbacks (Lunary, MLflow, Langfuse, etc.) |
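
To make "retry/fallback logic across multiple deployments" concrete, here is a minimal, library-free sketch of the general pattern. It is not LiteLLM's Router implementation. `call_with_fallback`, `flaky_azure`, and `healthy_openai` are hypothetical names, and the deployments are plain callables standing in for real model endpoints.

```python
from typing import Callable, Optional, Sequence


def call_with_fallback(
    deployments: Sequence[Callable[[str], str]],
    prompt: str,
    retries_per_deployment: int = 2,
) -> str:
    """Try each deployment in order, retrying it before moving to the next."""
    last_error: Optional[Exception] = None
    for deployment in deployments:
        for _ in range(retries_per_deployment):
            try:
                return deployment(prompt)
            except Exception as exc:  # a real router would only retry retryable errors
                last_error = exc
    raise RuntimeError("all deployments failed") from last_error


# Hypothetical stand-ins for two deployments of the same model.
def flaky_azure(prompt: str) -> str:
    raise TimeoutError("azure deployment timed out")


def healthy_openai(prompt: str) -> str:
    return f"openai says: {prompt}"


print(call_with_fallback([flaky_azure, healthy_openai], "Hello!"))
```

The first deployment is retried twice and fails both times, so the call falls through to the second one. A production router would typically add backoff, cooldowns, and load balancing on top of this ordering.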

Stable Release: Use Docker images with the -stable tag. These undergo 12-hour load tests before being published. See the release cycle documentation for more information.

Missing a provider or LLM platform? Raise a feature request.

Run in Developer Mode

Services

  1. Set up a .env file in the repository root
  2. Run dependent services: docker-compose up db prometheus

Backend

  1. (In root) Create a virtual environment: python -m venv .venv
  2. Activate the virtual environment: source .venv/bin/activate
  3. Install dependencies: uv sync --all-extras --group proxy-dev
  4. Generate the Prisma client: uv run prisma generate
  5. Start the proxy backend: python litellm/proxy/proxy_cli.py

Frontend

  1. Navigate to ui/litellm-dashboard
  2. Install dependencies npm install
  3. Run npm run dev to start the dashboard

Verify Docker Image Signatures

All LiteLLM Docker images published to GHCR are signed with cosign. Every release is signed with the same key introduced in commit 0112e53.

Verify using the pinned commit hash (recommended):

A commit hash is cryptographically immutable, so this is the strongest way to ensure you are using the original signing key:

cosign verify \
  --key https://raw.githubusercontent.com/BerriAI/litellm/0112e53046018d726492c814b3644b7d376029d0/cosign.pub \
  ghcr.io/berriai/litellm:<release-tag>

Verify using a release tag (convenience):

Tags are protected in this repository and resolve to the same key. This option is easier to read but relies on tag protection rules:

cosign verify \
  --key https://raw.githubusercontent.com/BerriAI/litellm/<release-tag>/cosign.pub \
  ghcr.io/berriai/litellm:<release-tag>

Replace <release-tag> with the version you are deploying (e.g. v1.83.0-stable).
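
When verification is part of a deployment script, the pinned-commit command can be assembled programmatically. The sketch below only builds the argument list, using the key URL and image name from the commands above. `cosign_verify_command` is a hypothetical helper, and actually running the result assumes `cosign` is installed and on `PATH`.

```python
import shlex
from typing import List

# Values copied from the pinned-commit command above.
PINNED_COMMIT = "0112e53046018d726492c814b3644b7d376029d0"
KEY_URL = f"https://raw.githubusercontent.com/BerriAI/litellm/{PINNED_COMMIT}/cosign.pub"
IMAGE = "ghcr.io/berriai/litellm"


def cosign_verify_command(release_tag: str) -> List[str]:
    """Return the argv that verifies one image tag against the pinned key."""
    if not release_tag or any(ch.isspace() for ch in release_tag):
        raise ValueError("release_tag must be a non-empty tag without whitespace")
    return ["cosign", "verify", "--key", KEY_URL, f"{IMAGE}:{release_tag}"]


command = cosign_verify_command("v1.83.0-stable")
print(shlex.join(command))

# To run it (requires cosign on PATH):
# import subprocess
# subprocess.run(command, check=True)
```

Passing the command as a list rather than a shell string avoids quoting problems if a tag ever contains unusual characters.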


Enterprise

For companies that need better security, user management and professional support

Get an Enterprise License | Talk to founders

This covers:

  • Features under the LiteLLM Commercial License:
  • Feature Prioritization
  • Custom Integrations
  • Professional Support - Dedicated Discord + Slack
  • Custom SLAs
  • Secure access with Single Sign-On

Contributing

We welcome contributions to LiteLLM! Whether you're fixing bugs, adding features, or improving documentation, we appreciate your help.

Quick Start for Contributors

This requires uv to be installed.

git clone https://github.com/BerriAI/litellm.git
cd litellm
make install-dev    # Install development dependencies
make format         # Format your code
make lint           # Run all linting checks
make test-unit      # Run unit tests
make format-check   # Check formatting only

For detailed contributing guidelines, see CONTRIBUTING.md.

📖 Contributing to documentation? The LiteLLM docs have moved to a separate repository: BerriAI/litellm-docs. Please open doc PRs there. Docs are served at docs.litellm.ai.

Code Quality / Linting

LiteLLM follows the Google Python Style Guide.

Our automated checks include:

  • Black for code formatting
  • Ruff for linting and code quality
  • MyPy for type checking
  • Circular import detection
  • Import safety checks

All these checks must pass before your PR can be merged.

Support / talk with founders

Contributors