litellm/README.md
Sameer Kankute d671a09c20 Litellm oss staging 050626 (#29774)
* Mark xAI models retiring on 2026-05-15 (#28788)

Per https://docs.x.ai/developers/migration/may-15-retirement, xAI is
retiring the following slugs on 2026-05-15 (auto-redirect to grok-4.3
with various reasoning efforts; callers continuing to use the old slugs
will be billed at grok-4.3 pricing):

  grok-4-1-fast-reasoning{,-latest}      -> grok-4.3 (low effort)
  grok-4-1-fast-non-reasoning{,-latest}  -> grok-4.3 (none)
  grok-4-fast-reasoning                  -> grok-4.3 (low effort)
  grok-4-fast-non-reasoning              -> grok-4.3 (none)
  grok-4-0709                            -> grok-4.3 (low effort)
  grok-code-fast-1{,-0825}               -> grok-build-0.1
  grok-3                                 -> grok-4.3 (none)

Only the direct xai/ slugs are tagged; third-party hosts (azure_ai,
oci, vercel_ai_gateway, perplexity/xai) run their own schedules. The
grok-3 retirement list explicitly names only the base grok-3 slug — the
-mini / -fast / -beta / -latest variants are not listed, so they remain
untouched.
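
The redirect table above can be read as a simple lookup. The following is a minimal sketch, not LiteLLM's implementation: the names `XAI_RETIRED_SLUGS` and `redirect_target` are hypothetical, and the `{,-latest}` / `{,-0825}` shorthand is expanded by hand.

```python
from typing import Optional, Tuple

# Hypothetical table: retired slug -> (replacement model, reasoning effort).
# Effort is None where the list above names no effort (grok-build-0.1).
XAI_RETIRED_SLUGS = {
    "grok-4-1-fast-reasoning": ("grok-4.3", "low"),
    "grok-4-1-fast-reasoning-latest": ("grok-4.3", "low"),
    "grok-4-1-fast-non-reasoning": ("grok-4.3", "none"),
    "grok-4-1-fast-non-reasoning-latest": ("grok-4.3", "none"),
    "grok-4-fast-reasoning": ("grok-4.3", "low"),
    "grok-4-fast-non-reasoning": ("grok-4.3", "none"),
    "grok-4-0709": ("grok-4.3", "low"),
    "grok-code-fast-1": ("grok-build-0.1", None),
    "grok-code-fast-1-0825": ("grok-build-0.1", None),
    "grok-3": ("grok-4.3", "none"),
}


def redirect_target(model: str) -> Optional[Tuple[str, Optional[str]]]:
    """Return the redirect for a direct xai/ slug, or None if it is not retired."""
    if not model.startswith("xai/"):
        # Third-party hosts run their own schedules, so they are not matched.
        return None
    return XAI_RETIRED_SLUGS.get(model[len("xai/"):])
```

Under these assumptions `redirect_target("xai/grok-3")` resolves to grok-4.3 with no reasoning effort, while `xai/grok-3-mini` and `azure_ai/grok-3` are left alone.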

* feat(moonshot): advertise json_schema response support on live models (#29683)

litellm.responses() already routes Moonshot through the responses->chat-completions
bridge, and Moonshot honors response_format json_schema on chat completions. The
cost-map entries left supports_response_schema unset, so discovery layers that gate
on that flag dropped Moonshot from structured-output / responses listings even though
the capability works end to end.

Set supports_response_schema on the nine models currently live on api.moonshot.ai:
kimi-k2.5, kimi-k2.6, the moonshot-v1 8k/32k/128k text and vision-preview variants,
and moonshot-v1-auto. Verified against the live API that each honors json_schema and
that litellm.responses() returns schema-valid structured output through the bridge.

* chore(moonshot): mark models retired from api.moonshot.ai as deprecated (#29685)

Thirteen Moonshot/Kimi models in the cost map no longer resolve on
api.moonshot.ai (all return 404). Stamp each with its deprecation_date from
platform.kimi.ai/docs/models rather than deleting the entries, so historical
cost calculation keeps resolving the names while tooling can surface the
retirement.

Dates: kimi-thinking-preview 2025-11-11; kimi-latest and its 8k/32k/128k context
variants 2026-01-28; the kimi-k2 preview/turbo/thinking series 2026-05-25; the
moonshot-v1 -0430 snapshots use their own 2024-04-30 snapshot date (Moonshot
publishes no discontinuation date for them).
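
As an illustration of how tooling might surface a retirement while keeping the entry resolvable, here is a hedged sketch. The dict `DEPRECATION_DATES` and the helper `is_deprecated` are hypothetical; only the two dates shown come from the message above, and the real cost-map key names may differ.

```python
from datetime import date

# Hypothetical subset of cost-map entries; dates are taken from the text above.
DEPRECATION_DATES = {
    "kimi-thinking-preview": date(2025, 11, 11),
    "kimi-latest": date(2026, 1, 28),
}


def is_deprecated(model: str, today: date) -> bool:
    """True once the model's deprecation date has been reached."""
    deprecation = DEPRECATION_DATES.get(model)
    return deprecation is not None and today >= deprecation
```

Because the entry is stamped rather than deleted, a lookup for historical cost calculation still finds the model name.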

* fix(moonshot): drop temperature for reasoning models (kimi-k2.5/k2.6) (#29687)

Kimi reasoning models reject every temperature except 1; a request with
temperature=0.2 returns "invalid temperature: only 1 is allowed for this model".
litellm only clamped temperature into [0.3, 1], so any value below 1 still 400'd.

Drop the temperature param entirely for reasoning models (gated on
supports_reasoning, the same signal transform_request already uses) so the model
default is used; the non-reasoning moonshot-v1 models keep the existing clamp.
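
The rule described above can be sketched as follows. This is an assumption-laden illustration rather than the actual transform: `adjust_temperature` is a hypothetical helper, and `supports_reasoning` is passed in as a plain boolean.

```python
def adjust_temperature(params: dict, supports_reasoning: bool) -> dict:
    """Drop temperature for reasoning models; clamp it into [0.3, 1] otherwise."""
    out = dict(params)  # do not mutate the caller's dict
    if "temperature" not in out:
        return out
    if supports_reasoning:
        # Reasoning models accept only 1, so let the model default apply.
        out.pop("temperature")
    else:
        out["temperature"] = min(max(out["temperature"], 0.3), 1.0)
    return out
```

With this sketch a reasoning-model request carrying `temperature=0.2` is sent without a temperature, while a moonshot-v1 request is clamped to 0.3.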

Co-authored-by: Sameer Kankute <sameer@berri.ai>

* feat(mcp): add per-server timeout configuration (#29672)

* feat(mcp): add per-server timeout configuration

* fix(mcp): address timeout field review comments

- use is not None guard instead of or for 0.0 edge case
- copy timeout in both LiteLLM_MCPServerTable constructions (health check path + _build_mcp_server_table)
- add timeout Float? column to all three schema.prisma files
- extend round-trip test to cover _build_mcp_server_table direction
- add test for zero timeout not treated as falsy
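
The 0.0 edge case from the first review comment is easy to reproduce. A small sketch, where `DEFAULT_TIMEOUT` and both helper names are hypothetical:

```python
from typing import Optional

DEFAULT_TIMEOUT = 30.0  # hypothetical fallback, not a LiteLLM constant


def timeout_with_or(server_timeout: Optional[float]) -> float:
    # Buggy pattern: 0.0 is falsy, so an explicit zero is silently replaced.
    return server_timeout or DEFAULT_TIMEOUT


def timeout_with_none_guard(server_timeout: Optional[float]) -> float:
    # Fixed pattern: only a missing value falls back to the default.
    return server_timeout if server_timeout is not None else DEFAULT_TIMEOUT
```

Only the `is not None` guard preserves a configured timeout of 0.0.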

* fix(mcp): forward timeout in _build_temporary_mcp_server_record

* fix(mcp): return 504 instead of 500 when per-server timeout fires

* test(mcp): add 504 timeout regression test; fix black formatting

* Add jp. Bedrock cross-region inference profile for claude-opus-4-7 (#28567)

* fix(thinking): handle None thinking param in is_thinking_enabled (#28598)

Squash-merged by litellm-agent from Terrajlz's PR.

* feat(helm): support tpl rendering in podAnnotations (#28609)

Squash-merged by litellm-agent from devauxbr's PR.

* Forward custom_llm_provider through the Responses API bridge (Fixes #28505) (#28575)

* Forward custom_llm_provider through the Responses API bridge (Fixes #28505)

When a Chat Completions request to a GPT-5.4+ model contains both
`tools` and `reasoning_effort`, `completion()` auto-routes through
`responses_api_bridge`. The bridge handler called
`litellm.responses()` / `litellm.aresponses()` without forwarding the
already-resolved `custom_llm_provider`, so the downstream call
re-invoked `get_llm_provider()` with `custom_llm_provider=None` and
stripped a second provider prefix from a `provider/provider/model`
deployment string.

For a deployment configured as `openai/openai/openai/gpt-5.5`,
the bridge flow sent `openai/gpt-5.5` to the upstream API instead of
the correct `openai/openai/gpt-5.5`. Upstream APIs that enforce
model-name allow-lists rejected this as `key_model_access_denied`.

Fix: pass the locally-resolved `custom_llm_provider` into both the
sync `responses()` and async `aresponses()` calls so the downstream
`_resolve_model_provider_for_responses` sees an explicit provider
and skips the second prefix-strip.
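
The double prefix-strip can be modelled with a toy resolver. This is a deliberately simplified stand-in for `get_llm_provider()`: the function `resolve` and the set `KNOWN_PROVIDERS` are hypothetical and ignore everything the real resolver does beyond prefix handling.

```python
from typing import Optional, Tuple

KNOWN_PROVIDERS = {"openai"}  # hypothetical stand-in for the provider list


def resolve(model: str, custom_llm_provider: Optional[str] = None) -> Tuple[str, Optional[str]]:
    """Strip one leading provider prefix unless a provider is passed explicitly."""
    if custom_llm_provider is not None:
        return model, custom_llm_provider
    prefix, sep, rest = model.partition("/")
    if sep and prefix in KNOWN_PROVIDERS:
        return rest, prefix
    return model, None


# completion() resolves the deployment string once.
model, provider = resolve("openai/openai/openai/gpt-5.5")

# Before the fix: the bridge resolves again without a provider and strips a second prefix.
buggy_model, _ = resolve(model)

# After the fix: the resolved provider is forwarded, so nothing more is stripped.
fixed_model, _ = resolve(model, custom_llm_provider=provider)
```

In this model `buggy_model` ends up as `openai/gpt-5.5` and `fixed_model` as `openai/openai/gpt-5.5`, matching the behavior described above.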

New regression test
`tests/test_litellm/completion_extras/test_responses_bridge_provider_propagation.py`
pins both call sites: each must forward `custom_llm_provider`.

* fix(28505): set custom_llm_provider on request_data instead of as duplicate kwarg

Greptile flagged that the previous patch passed custom_llm_provider as an
explicit kwarg to responses()/aresponses() while request_data already
carried it via the spread of sanitized_litellm_params, which would raise
TypeError: got multiple values for keyword argument on every real bridge
call.

Switches to assigning request_data['custom_llm_provider'] before the call
so the resolved provider wins over whatever sanitized_litellm_params spread
in, without duplicating the kwarg.
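
The duplicate-kwarg failure is plain Python behavior and can be shown without LiteLLM. In this sketch `responses` is a hypothetical stand-in for `litellm.responses()`, and the `"stale"` value plays the role of whatever `sanitized_litellm_params` spread in.

```python
def responses(**kwargs):  # hypothetical stand-in for litellm.responses
    return kwargs


request_data = {"model": "gpt-5.5", "custom_llm_provider": "stale"}
resolved_provider = "openai"

# First patch: explicit kwarg plus the same key in the spread dict.
try:
    responses(**request_data, custom_llm_provider=resolved_provider)
    duplicate_error = None
except TypeError as exc:
    duplicate_error = str(exc)

# Follow-up patch: overwrite the key, then spread the dict once.
request_data["custom_llm_provider"] = resolved_provider
result = responses(**request_data)
```

The first call raises `TypeError`; the second passes the resolved provider exactly once.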

Updates the regression test to seed request_data with a sentinel
custom_llm_provider so it actually exercises the overwrite path (the
previous test mocked transform_request with a minimal dict and never hit
the conflict).

* chore: trigger shin-agent re-eval on retargeted staging base

* chore: trigger shin-agent re-eval against updated Greptile state

* Add jp. Bedrock cross-region inference profile for claude-opus-4-7

AWS Bedrock documents jp.anthropic.claude-opus-4-7 alongside the
existing us./eu./au./global. profiles for Claude Opus 4.7
(ap-northeast-1 Tokyo / ap-northeast-3 Osaka), but the entry is
missing from model_prices_and_context_window.json. Tokyo-region
users currently get an "unknown model" error when routing through
the JP geo profile.

Adds the entry to both the canonical file and the bundled backup,
mirroring the recent pattern for sonnet-4-6 (#27831). Pricing matches
the other regional profiles (10% premium over base/global).

Regression test pins all six documented profiles (base, global, us, eu,
au, jp) and asserts pricing parity between jp. and au. variants.

Source: https://docs.aws.amazon.com/bedrock/latest/userguide/model-card-anthropic-claude-opus-4-7.html

---------

Co-authored-by: Terrajlz <info@jouleselectrictech.com>
Co-authored-by: Bruno Devaux <devaux.br@gmail.com>
Co-authored-by: Sameer Kankute <sameer@berri.ai>

* feat(soniox): add soniox audio transcription integration (#29508)

* feat(openmeter): add OPENMETER_TRUST_REQUEST_USER to prevent forged attribution (#29650)

The OpenMeter callback resolves the CloudEvent subject from kwargs["user"]
first, then falls back to the key-bound user_api_key_user_id. For
multi-tenant proxy deployments, a client can set `"user": "..."` in the
request body and cause their usage to be attributed to that arbitrary
string — a billing-attribution forgery risk.

Adds OPENMETER_TRUST_REQUEST_USER env var (default "true" for backward
compatibility). When set to "false", the request-supplied `user` field is
ignored and the subject is resolved solely from user_api_key_user_id.
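
A minimal sketch of the subject resolution described above. The helper name `resolve_subject` is hypothetical, and the exact parsing of the env var (here: case-insensitive comparison with "true") is an assumption rather than the callback's confirmed behavior.

```python
import os
from typing import Optional


def resolve_subject(request_user: Optional[str], key_user_id: Optional[str]) -> Optional[str]:
    """Pick the CloudEvent subject, optionally ignoring the request-supplied user."""
    # Assumption: any value other than "true" disables trust in the request field.
    trust = os.getenv("OPENMETER_TRUST_REQUEST_USER", "true").lower() == "true"
    if trust and request_user:
        return request_user
    return key_user_id
```

With `OPENMETER_TRUST_REQUEST_USER=false`, a client-supplied `"user"` no longer affects attribution; with the variable unset, the previous behavior is kept.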

Matches the existing env-var-driven config pattern in this file
(OPENMETER_API_KEY, OPENMETER_API_ENDPOINT, OPENMETER_EVENT_TYPE).

* feat(search): add you_com as a search provider (#28370)

* feat(search): add you_com as a search provider

Registers You.com Search API as a first-class `search_provider` in the
`search_tools` registry, alongside Tavily, Exa, Perplexity, etc.

- New adapter: litellm/llms/you_com/search/transformation.py
  - POSTs to https://ydc-index.io/v1/search
  - Auth: X-API-Key from YOUCOM_API_KEY (or explicit api_key)
  - Maps Perplexity unified spec: max_results -> count,
    search_domain_filter -> include_domains, country -> country
  - Flattens results.web + results.news into a single SearchResult list;
    snippet prefers snippets[0], falls back to description; page_age -> date
- Registry: SearchProviders.YOU_COM in litellm/types/utils.py and wired
  into ProviderConfigManager.get_provider_search_config()
- Pricing entry: model_prices_and_context_window.json (placeholder $0.0;
  happy to adjust to maintainers' preferred public number)
- Docs: example router config snippet and example proxy yaml updated
- Tests: tests/search_tests/test_you_com_search.py - 5 mocked tests
  (payload shape, domain filter mapping, snippet fallback, news flattening,
  missing-api-key error)

Refs upstream expansion signal: #15942

* review fixups: normalize api_base, lowercase country, scope env-var to test

Addresses Greptile inline review comments on #28370:

- get_complete_url: strip trailing slashes from api_base *before* the
  endswith("/v1/search") check, so a custom base like ".../v1/search/"
  doesn't become ".../v1/search/v1/search".
- transform_search_request: .lower() country before sending, matching
  Tavily's convention so callers using the unified spec form ("US") get
  consistent behavior across providers.
- Tests: replace direct os.environ writes with an autouse monkeypatch
  fixture so YOUCOM_API_KEY is set per-test and removed afterwards.
  The missing-key test now uses monkeypatch.delenv. New test asserts the
  trailing-slash normalization above.
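
The trailing-slash fix can be sketched like this. `normalize_search_url` is a hypothetical name used for illustration; it mirrors the described order of operations (strip first, then check the suffix) and is not the adapter's actual `get_complete_url`.

```python
def normalize_search_url(api_base: str) -> str:
    """Append /v1/search once, even if api_base already ends with it plus slashes."""
    base = api_base.rstrip("/")  # strip before the endswith check
    if base.endswith("/v1/search"):
        return base
    return base + "/v1/search"
```

Checking the suffix before stripping would let `.../v1/search/` slip through and produce the doubled path.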

Reverts the ARCHITECTURE.md / example yaml edits per the reviewer note
that documentation changes belong in the litellm-docs repo.

* support keyless free tier (api.you.com/v1/agents/search) as default

You.com offers an IP-throttled keyless endpoint that returns the same
response shape as the keyed one (~100 queries/day, no signup). This is a
significant onboarding lever - mirrors the keyless DuckDuckGo/SearXNG
providers already in the search_tools registry.

Behavior:
- YOUCOM_API_KEY set        -> keyed:  POST https://ydc-index.io/v1/search
                                       (X-API-Key header)
- no key                    -> free:   POST https://api.you.com/v1/agents/search
                                       (no auth)
- YOUCOM_API_BASE override  -> honored as-is
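
The three cases above amount to a small decision function. This sketch uses a hypothetical helper `select_endpoint`; the two URLs and the header name come from the text, everything else is illustrative.

```python
from typing import Dict, Optional, Tuple

KEYED_URL = "https://ydc-index.io/v1/search"
KEYLESS_URL = "https://api.you.com/v1/agents/search"


def select_endpoint(
    api_key: Optional[str] = None, api_base: Optional[str] = None
) -> Tuple[str, Dict[str, str]]:
    """Return (url, headers) for the keyed, keyless, or overridden endpoint."""
    headers: Dict[str, str] = {}
    if api_key:
        headers["X-API-Key"] = api_key
    if api_base:
        return api_base, headers  # explicit override is honored as-is
    return (KEYED_URL if api_key else KEYLESS_URL), headers
```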

Tests:
- New: test_you_com_search_keyless_free_tier - asserts URL + absence of
  X-API-Key when no key is configured.
- New: test_you_com_search_validate_environment_keyless - asserts the
  config no longer raises when the key is absent.
- Removed: test_you_com_search_raises_without_api_key (the precondition
  no longer holds).
- Existing payload/domain-filter/etc tests still cover keyed mode via
  the autouse YOUCOM_API_KEY fixture.

Verified both endpoints accept POST + return identical JSON shape:
  results.web[] / results.news[] with title, url, snippets, description,
  page_age.

* register you_com in provider_endpoints_support.json

Adding `litellm/llms/you_com/` requires a corresponding entry in
provider_endpoints_support.json or the
code-quality/check_provider_folders_documented CI check fails.

Follows the compact tavily/serper pattern - endpoints: { search: true }.
Local run of the check now reports "All 114 provider folders are documented".

* move tests under tests/test_litellm/llms/ so CI exercises them

The litellm CI workflows scope unit tests to `tests/test_litellm/...`
(see test-unit-llm-providers.yml: `tests/test_litellm/llms` path), so
tests living under `tests/search_tests/` are never run in CI - which is
why codecov reports 0% patch coverage for the new adapter even though
the unit tests exist and pass locally.

Move test_you_com_search.py into `tests/test_litellm/llms/you_com/` so
the test-unit-llm-providers job picks it up. 7/7 tests still pass at
the new location.

(Sibling search-only providers - tavily, exa_ai, brave, etc. - still
live only in `tests/search_tests/` and would benefit from the same
move, but that is out of scope for this PR.)

* fix(you_com): pin Accept-Encoding: identity to dodge keyless gzip bug

The keyless free-tier endpoint (api.you.com/v1/agents/search) advertises
Content-Encoding: gzip but returns a body that httpx's decoder rejects
with `zlib.error: Error -3 while decompressing data: incorrect header
check`, surfacing as litellm.APIConnectionError in user code. curl works
because it doesn't request compression by default.

Pin Accept-Encoding: identity in validate_environment so the upstream
server skips compression entirely. Harmless on the keyed endpoint
(ydc-index.io/v1/search) which negotiates content-encoding correctly.

The header uses setdefault so a caller-supplied Accept-Encoding still
takes precedence. (Server-side bug has been flagged to the You.com team
separately - once fixed there, this workaround can be removed.)
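
The `setdefault` behavior mentioned above looks like this in isolation. `with_identity_encoding` is a hypothetical helper; note that a plain dict is case-sensitive, so this sketch assumes callers spell the header exactly `Accept-Encoding`.

```python
from typing import Dict, Optional


def with_identity_encoding(headers: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Pin Accept-Encoding: identity unless the caller already set the header."""
    merged = dict(headers or {})
    merged.setdefault("Accept-Encoding", "identity")
    return merged
```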

New unit test: test_you_com_search_pins_identity_accept_encoding.

---------

Co-authored-by: Sameer Kankute <sameer@berri.ai>

* docs: fix README typo (#29419)

Correct clear spelling mistakes in documentation without changing behavior.

Confidence: high
Scope-risk: narrow
Tested: git diff --check; uvx codespell on changed files
Not-tested: Full docs build not run; text-only changes

* Fix(langfuse): pass httpx_client to Langfuse in langfuse_prompt_management to respect SSL_VERIFY (#29480)

* fix(langfuse): pass ssl_verify to Langfuse httpx client

* fix_langfuse_

* add unit tests

* addressed comments

---------

Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>

* feat(models): add minimax/MiniMax-M3 to model cost map (#29412)

Add MiniMax's new flagship MiniMax-M3 to the native minimax provider:
512K context, 128K max output, native multimodal (supports_vision),
reasoning, prompt caching. Pricing (USD/M tokens): input 0.6 / output
2.4 / cache read 0.12. M3 has no active prompt-cache-write tier, so
cache_creation_input_token_cost is omitted.
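
As a worked example of the listed prices, the sketch below computes a request cost. The function `m3_cost` is hypothetical, and it assumes cached tokens are a subset of the prompt tokens billed at the cache-read rate, which is an assumption about accounting rather than something stated above.

```python
INPUT_COST_PER_TOKEN = 0.6 / 1_000_000
OUTPUT_COST_PER_TOKEN = 2.4 / 1_000_000
CACHE_READ_COST_PER_TOKEN = 0.12 / 1_000_000


def m3_cost(prompt_tokens: int, completion_tokens: int, cached_tokens: int = 0) -> float:
    """USD cost of one request; cached_tokens are counted inside prompt_tokens."""
    uncached = prompt_tokens - cached_tokens
    return (
        uncached * INPUT_COST_PER_TOKEN
        + cached_tokens * CACHE_READ_COST_PER_TOKEN
        + completion_tokens * OUTPUT_COST_PER_TOKEN
    )
```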

Updated both the root model_prices_and_context_window.json (remote
source) and the bundled litellm/model_prices_and_context_window_backup.json
(local fallback), keeping them in sync.

* fix(logging): handle ResponseCompletedEvent in anthropic_messages streaming spend log (#29394)

* fix(logging): handle ResponseCompletedEvent in anthropic_messages streaming spend log

* fix(logging): extend terminal event handling to ResponseIncompleteEvent and ResponseFailedEvent; fix return type annotation

* feat(provider): Add Neosantara provider as OpenAI Compatible (#29646)

* Add Neosantara provider

* Register Neosantara provider enum

* Address Neosantara provider review feedback

* Add Neosantara packaged endpoint support

---------

Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>

* fix: address greptile and veria review feedback

- langfuse: guard httpx_client injection behind version check (>= 2.7.3)
- soniox: propagate audio_transcription_duration in _hidden_params for spend tracking
- soniox: give SONIOX_API_BASE env var priority over caller-supplied api_base
- mcp: replace CancelledError catch with asyncio.wait_for + TimeoutError
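
The MCP timeout handling (per-server timeout, `asyncio.wait_for`, and a 504 instead of a 500) can be sketched as below. `call_with_timeout` and the tuple return shape are hypothetical; the real handler raises an HTTP error rather than returning a status code.

```python
import asyncio
from typing import Any, Awaitable, Optional, Tuple


async def call_with_timeout(call: Awaitable[Any], timeout: Optional[float]) -> Tuple[int, Any]:
    """Await an MCP call; map a fired timeout to 504 (Gateway Timeout)."""
    try:
        # timeout=None means "no per-server limit" for asyncio.wait_for.
        result = await asyncio.wait_for(call, timeout=timeout)
        return 200, result
    except asyncio.TimeoutError:
        return 504, "MCP server timed out"
```

Catching the timeout raised by `asyncio.wait_for` keeps genuine task cancellation (`CancelledError`) separate from a slow upstream server.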

* chore(mcp): add migration for per-server timeout column

* fix(test): add tool_use_system_prompt_tokens to model prices schema validator

* fix: mcp timeout test uses real asyncio.wait_for timeout; you_com get_complete_url respects resolved api_key

* fix: forward resolved api_key into you_com endpoint selection and apply timeout to soniox polling GETs

The search flow resolves api_key in validate_environment but never passed it
into get_complete_url, so a programmatic api_key (with no YOUCOM_API_KEY in the
env) set the X-API-Key header yet still selected the keyless free-tier endpoint.
Forward api_key through both the search entrypoint and the http handler so the
keyed endpoint is chosen.

HTTPHandler.get/AsyncHTTPHandler.get had no timeout parameter, so the Soniox
poll and transcript-fetch GETs silently used the client global default instead
of the caller timeout. Add a per-request timeout to get() and forward the
configured timeout from the Soniox handler.

* fix(soniox): price stt-async-v4 per second so transcriptions are billed

The handler stores audio_transcription_duration in _hidden_params, but the
model carried only token cost fields and the response has no token usage, so
the transcription cost path fell through to cost_per_second and returned $0.
An authenticated caller could transcribe Soniox audio without decrementing
their budget. Switch the entry to output_cost_per_second at Soniox's published
$0.10/hour async rate so the stored duration produces a real charge.
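
The per-second rate follows directly from the hourly price. A small sketch, with the helper name `transcription_cost` being hypothetical:

```python
SONIOX_ASYNC_USD_PER_HOUR = 0.10
OUTPUT_COST_PER_SECOND = SONIOX_ASYNC_USD_PER_HOUR / 3600  # about 2.78e-05 USD


def transcription_cost(duration_seconds: float) -> float:
    """Cost in USD for a transcription of the given audio duration."""
    return duration_seconds * OUTPUT_COST_PER_SECOND
```

A 90-second clip therefore costs a quarter of a cent instead of $0.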

* fix(langfuse): use a dedicated httpx client for the SDK injection

The httpx_client handed to the Langfuse SDK came from _get_httpx_client(),
which returns LiteLLM's globally cached HTTPHandler. If Langfuse closed that
client on teardown it would invalidate the shared client used by every other
LiteLLM HTTP call. Build a dedicated httpx.Client instead, still resolving SSL
verification and client certificate from LiteLLM's configuration.

* fix(soniox): prefer caller-supplied api_base over SONIOX_API_BASE env var

* fix(cohere): support max_completion_tokens on cohere v2 chat (default route) (#29779)

* fix(cohere): support max_completion_tokens on cohere v2 chat

The default cohere_chat route resolves to CohereV2ChatConfig, which did not
list or map max_completion_tokens, so get_optional_params raised
UnsupportedParamsError for the standard OpenAI parameter (the modern
replacement for the deprecated max_tokens). The v1 config already maps it to
cohere's max_tokens; mirror that in v2 and add v2 regression tests.

* fix(cohere): make max_completion_tokens take precedence over max_tokens on v2

When both max_tokens and max_completion_tokens are supplied, prefer
max_completion_tokens explicitly rather than relying on dict iteration order,
and cover both orderings with a regression test.
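
An explicit precedence check, independent of dict ordering, might look like the following. `map_max_tokens` is a hypothetical helper, not the `CohereV2ChatConfig` method itself.

```python
def map_max_tokens(optional_params: dict) -> dict:
    """Map OpenAI token limits to Cohere's max_tokens; max_completion_tokens wins."""
    mapped = {}
    if optional_params.get("max_completion_tokens") is not None:
        mapped["max_tokens"] = optional_params["max_completion_tokens"]
    elif optional_params.get("max_tokens") is not None:
        mapped["max_tokens"] = optional_params["max_tokens"]
    return mapped
```

Both key orderings yield the same result, which is what the regression test described above pins down.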

---------

Co-authored-by: Daniel Yudelevich <4537920+yudelevi@users.noreply.github.com>
Co-authored-by: hectorc98 <hector.chamorroalvarez@adyen.com>
Co-authored-by: Filippo Menghi <113345637+Cyberfilo@users.noreply.github.com>
Co-authored-by: Terrajlz <info@jouleselectrictech.com>
Co-authored-by: Bruno Devaux <devaux.br@gmail.com>
Co-authored-by: Dan Lemon <dan@danlemon.com>
Co-authored-by: Saswat <saswatds@users.noreply.github.com>
Co-authored-by: Brian Sparker <brainsparker@users.noreply.github.com>
Co-authored-by: Zhao73 <156770117+Zhao73@users.noreply.github.com>
Co-authored-by: Urain Ahmad Shah <60431964+urainshah@users.noreply.github.com>
Co-authored-by: shin-berri <shin-laptop@berri.ai>
Co-authored-by: yuneng-jiang <yuneng@berri.ai>
Co-authored-by: kape <168134658+kapelame@users.noreply.github.com>
Co-authored-by: danisalvaa <159898202+danisalvaa@users.noreply.github.com>
Co-authored-by: Just R <remixingmagelang@gmail.com>
Co-authored-by: mateo-berri <277851410+mateo-berri@users.noreply.github.com>
Co-authored-by: abhay23-AI <abhaytrivedi22@gmail.com>
2026-06-05 13:51:51 -07:00


🚅 LiteLLM

LiteLLM AI Gateway

Open Source AI Gateway for 100+ LLMs. Self-hosted. Enterprise-ready. Call any LLM in OpenAI format.


LiteLLM Proxy Server (AI Gateway) | Hosted Proxy | Enterprise Tier | Website



What is LiteLLM

LiteLLM is an open source AI Gateway that gives you a single, unified interface to call 100+ LLM providers — OpenAI, Anthropic, Gemini, Bedrock, Azure, and more — using the OpenAI format.

Use it as a Python SDK for direct library integration, or deploy the AI Gateway (Proxy Server) as a centralized service for your team or organization.

Jump to LiteLLM Proxy (LLM Gateway) Docs
Jump to Supported LLM Providers


Why LiteLLM

Managing LLM calls across providers gets complicated fast — different SDKs, auth patterns, request formats, and error types for every model. LiteLLM removes that friction:

  • Unified API — one interface for 100+ LLMs, no provider-specific SDK juggling
  • Drop-in OpenAI compatibility — swap providers without rewriting your code
  • Production-ready gateway — virtual keys, spend tracking, guardrails, load balancing, and an admin dashboard out of the box
  • 8ms P95 latency at 1k RPS (benchmarks)

OSS Adopters

Stripe, Google ADK, Greptile, OpenHands, Netflix, OpenAI Agents SDK

Features

LLMs - Call 100+ LLMs (Python SDK + AI Gateway)

All Supported Endpoints - /chat/completions, /responses, /embeddings, /images, /audio, /batches, /rerank, /a2a, /messages and more.

Python SDK

uv add litellm

from litellm import completion
import os

os.environ["OPENAI_API_KEY"] = "your-openai-key"
os.environ["ANTHROPIC_API_KEY"] = "your-anthropic-key"

# OpenAI
response = completion(model="openai/gpt-4o", messages=[{"role": "user", "content": "Hello!"}])

# Anthropic  
response = completion(model="anthropic/claude-sonnet-4-20250514", messages=[{"role": "user", "content": "Hello!"}])

AI Gateway (Proxy Server)

Getting Started - E2E Tutorial - Set up virtual keys, make your first request

uv tool install 'litellm[proxy]'
litellm --model gpt-4o

import openai

client = openai.OpenAI(api_key="anything", base_url="http://0.0.0.0:4000")
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Hello!"}]
)

Docs: LLM Providers

Agents - Invoke A2A Agents (Python SDK + AI Gateway)

Supported Providers - LangGraph, Vertex AI Agent Engine, Azure AI Foundry, Bedrock AgentCore, Pydantic AI

Python SDK - A2A Protocol

import asyncio
from uuid import uuid4

from a2a.types import MessageSendParams, SendMessageRequest
from litellm.a2a_protocol import A2AClient


async def main():
    client = A2AClient(base_url="http://localhost:10001")

    request = SendMessageRequest(
        id=str(uuid4()),
        params=MessageSendParams(
            message={
                "role": "user",
                "parts": [{"kind": "text", "text": "Hello!"}],
                "messageId": uuid4().hex,
            }
        ),
    )
    # send_message is awaited, so it has to run inside an async function
    response = await client.send_message(request)
    print(response)


asyncio.run(main())

AI Gateway (Proxy Server)

Step 1. Add your Agent to the AI Gateway

Step 2. Call Agent via A2A SDK

import asyncio
from uuid import uuid4

import httpx
from a2a.client import A2ACardResolver, A2AClient
from a2a.types import MessageSendParams, SendMessageRequest

base_url = "http://localhost:4000/a2a/my-agent"  # LiteLLM proxy + agent name
headers = {"Authorization": "Bearer sk-1234"}    # LiteLLM Virtual Key


async def main():
    async with httpx.AsyncClient(headers=headers) as httpx_client:
        # Fetch the agent card from the proxy, then build a client from it
        resolver = A2ACardResolver(httpx_client=httpx_client, base_url=base_url)
        agent_card = await resolver.get_agent_card()
        client = A2AClient(httpx_client=httpx_client, agent_card=agent_card)

        request = SendMessageRequest(
            id=str(uuid4()),
            params=MessageSendParams(
                message={
                    "role": "user",
                    "parts": [{"kind": "text", "text": "Hello!"}],
                    "messageId": uuid4().hex,
                }
            ),
        )
        response = await client.send_message(request)
        print(response)


asyncio.run(main())

Docs: A2A Agent Gateway

MCP Tools - Connect MCP servers to any LLM (Python SDK + AI Gateway)

Python SDK - MCP Bridge

import asyncio

import litellm
from litellm import experimental_mcp_client
from mcp import ClientSession, StdioServerParameters
from mcp.client.stdio import stdio_client

server_params = StdioServerParameters(command="python", args=["mcp_server.py"])


async def main():
    async with stdio_client(server_params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()

            # Load MCP tools in OpenAI format
            tools = await experimental_mcp_client.load_mcp_tools(session=session, format="openai")

            # Use with any LiteLLM model
            response = await litellm.acompletion(
                model="gpt-4o",
                messages=[{"role": "user", "content": "What's 3 + 5?"}],
                tools=tools,
            )
            print(response)


asyncio.run(main())

AI Gateway - MCP Gateway

Step 1. Add your MCP Server to the AI Gateway

Step 2. Call MCP tools via /chat/completions

curl -X POST 'http://0.0.0.0:4000/v1/chat/completions' \
  -H 'Authorization: Bearer sk-1234' \
  -H 'Content-Type: application/json' \
  -d '{
    "model": "gpt-4o",
    "messages": [{"role": "user", "content": "Summarize the latest open PR"}],
    "tools": [{
      "type": "mcp",
      "server_url": "litellm_proxy/mcp/github",
      "server_label": "github_mcp",
      "require_approval": "never"
    }]
  }'

Use with Cursor IDE

{
  "mcpServers": {
    "LiteLLM": {
      "url": "http://localhost:4000/mcp/",
      "headers": {
        "x-litellm-api-key": "Bearer sk-1234"
      }
    }
  }
}

Docs: MCP Gateway

Supported Providers (Website Supported Models | Docs)

Provider /chat/completions /messages /responses /embeddings /image/generations /audio/transcriptions /audio/speech /moderations /batches /rerank
Abliteration (abliteration)
AI/ML API (aiml)
AI21 (ai21)
AI21 Chat (ai21_chat)
Aleph Alpha
Amazon Nova
Anthropic (anthropic)
Anthropic Text (anthropic_text)
Anyscale
AssemblyAI (assemblyai)
Auto Router (auto_router)
AWS - Bedrock (bedrock)
AWS - Sagemaker (sagemaker)
Azure (azure)
Azure AI (azure_ai)
Azure Text (azure_text)
Baseten (baseten)
Bytez (bytez)
Cerebras (cerebras)
Clarifai (clarifai)
Cloudflare AI Workers (cloudflare)
Codestral (codestral)
Cohere (cohere)
Cohere Chat (cohere_chat)
CometAPI (cometapi)
CompactifAI (compactifai)
Custom (custom)
Custom OpenAI (custom_openai)
Dashscope (dashscope)
Databricks (databricks)
DataRobot (datarobot)
Deepgram (deepgram)
DeepInfra (deepinfra)
Deepseek (deepseek)
ElevenLabs (elevenlabs)
Empower (empower)
Fal AI (fal_ai)
Featherless AI (featherless_ai)
Fireworks AI (fireworks_ai)
FriendliAI (friendliai)
Galadriel (galadriel)
GitHub Copilot (github_copilot)
GitHub Models (github)
Google - PaLM
Google - Vertex AI (vertex_ai)
Google AI Studio - Gemini (gemini)
GradientAI (gradient_ai)
Groq AI (groq)
Heroku (heroku)
Hosted VLLM (hosted_vllm)
Huggingface (huggingface)
Hyperbolic (hyperbolic)
IBM - Watsonx.ai (watsonx)
Infinity (infinity)
Jina AI (jina_ai)
Lambda AI (lambda_ai)
Lemonade (lemonade)
LiteLLM Proxy (litellm_proxy)
Llamafile (llamafile)
LM Studio (lm_studio)
Maritalk (maritalk)
Meta - Llama API (meta_llama)
Mistral AI API (mistral)
Moonshot (moonshot)
Morph (morph)
Nebius AI Studio (nebius)
NLP Cloud (nlp_cloud)
Novita AI (novita)
Nscale (nscale)
Nvidia NIM (nvidia_nim)
OCI (oci)
Ollama (ollama)
Ollama Chat (ollama_chat)
Oobabooga (oobabooga)
OpenAI (openai)
OpenAI-like (openai_like)
OpenRouter (openrouter)
OVHCloud AI Endpoints (ovhcloud)
Perplexity AI (perplexity)
Petals (petals)
Predibase (predibase)
Recraft (recraft)
Replicate (replicate)
Sagemaker Chat (sagemaker_chat)
Sambanova (sambanova)
Snowflake (snowflake)
Text Completion Codestral (text-completion-codestral)
Text Completion OpenAI (text-completion-openai)
Together AI (together_ai)
Topaz (topaz)
Triton (triton)
V0 (v0)
Vercel AI Gateway (vercel_ai_gateway)
VLLM (vllm)
Volcengine (volcengine)
Voyage AI (voyage)
WandB Inference (wandb)
Watsonx Text (watsonx_text)
xAI (xai)
Xinference (xinference)

Read the Docs


Get Started

You can use LiteLLM through either the Proxy Server or the Python SDK. Both give you a unified interface to 100+ LLMs. Choose the option that best fits your needs:

| | LiteLLM AI Gateway | LiteLLM Python SDK |
| --- | --- | --- |
| Use Case | Central service (LLM Gateway) to access multiple LLMs | Use LiteLLM directly in your Python code |
| Who Uses It? | Gen AI Enablement / ML Platform Teams | Developers building LLM projects |
| Key Features | Centralized API gateway with authentication and authorization, multi-tenant cost tracking and spend management per project/user, per-project customization (logging, guardrails, caching), virtual keys for secure access control, admin dashboard UI for monitoring and management | Direct Python library integration in your codebase, Router with retry/fallback logic across multiple deployments (e.g. Azure/OpenAI) - Router, application-level load balancing and cost tracking, exception handling with OpenAI-compatible errors, observability callbacks (Lunary, MLflow, Langfuse, etc.) |

Stable Release: Use Docker images with the -stable tag. These images undergo 12-hour load tests before being published. More information about the release cycle here

Support for more providers: if a provider or LLM platform is missing, raise a feature request.

Run in Developer Mode

Services

  1. Set up a .env file in the repository root
  2. Run dependent services: docker-compose up db prometheus

Backend

  1. (In root) Create a virtual environment: python -m venv .venv
  2. Activate the virtual environment: source .venv/bin/activate
  3. Install dependencies: uv sync --all-extras --group proxy-dev
  4. Generate the Prisma client: uv run prisma generate
  5. Start the proxy backend: python litellm/proxy/proxy_cli.py

Frontend

  1. Navigate to ui/litellm-dashboard
  2. Install dependencies: npm install
  3. Start the dashboard: npm run dev

Verify Docker Image Signatures

All LiteLLM Docker images published to GHCR are signed with cosign. Every release is signed with the same key introduced in commit 0112e53.

Verify using the pinned commit hash (recommended):

A commit hash is cryptographically immutable, so this is the strongest way to ensure you are using the original signing key:

cosign verify \
  --key https://raw.githubusercontent.com/BerriAI/litellm/0112e53046018d726492c814b3644b7d376029d0/cosign.pub \
  ghcr.io/berriai/litellm:<release-tag>

Verify using a release tag (convenience):

Tags are protected in this repository and resolve to the same key. This option is easier to read but relies on tag protection rules:

cosign verify \
  --key https://raw.githubusercontent.com/BerriAI/litellm/<release-tag>/cosign.pub \
  ghcr.io/berriai/litellm:<release-tag>

Replace <release-tag> with the version you are deploying (e.g. v1.83.0-stable).


Enterprise

For companies that need better security, user management and professional support

Get an Enterprise License Talk to founders

This covers:

  • Features under the LiteLLM Commercial License:
  • Feature Prioritization
  • Custom Integrations
  • Professional Support - Dedicated Discord + Slack
  • Custom SLAs
  • Secure access with Single Sign-On

Contributing

We welcome contributions to LiteLLM! Whether you're fixing bugs, adding features, or improving documentation, we appreciate your help.

Quick Start for Contributors

This requires uv to be installed.

git clone https://github.com/BerriAI/litellm.git
cd litellm
make install-dev    # Install development dependencies
make format         # Format your code
make lint           # Run all linting checks
make test-unit      # Run unit tests
make format-check   # Check formatting only

For detailed contributing guidelines, see CONTRIBUTING.md.

📖 Contributing to documentation? The LiteLLM docs have moved to a separate repository: BerriAI/litellm-docs. Please open doc PRs there. Docs are served at docs.litellm.ai.

Code Quality / Linting

LiteLLM follows the Google Python Style Guide.

Our automated checks include:

  • Black for code formatting
  • Ruff for linting and code quality
  • MyPy for type checking
  • Circular import detection
  • Import safety checks

All these checks must pass before your PR can be merged.

Support / talk with founders

Contributors