mirror of https://github.com/tiennm99/litellm.git synced 2026-08-02 10:21:52 +00:00


b1b96ff3cf [Perf] Alexsander fixes round 2 - Oct 18th (#15695 )

* perf(router): Optimize prompt management model check with early exit

Add early return for models without '/' to avoid expensive get_model_list()
calls for 99% of standard model requests (gpt-4, claude-3, etc).

- Refactor _is_prompt_management_model() with "/" check before model lookup
- Add unit tests to verify optimization doesn't break detection

* perf(caching): optimize Redis batch cache operations and reduce unnecessary queries

This commit introduces several performance optimizations to the Redis caching layer:

**DualCache Improvements (dual_cache.py):**

1. Increase batch cache size limit from 100 to 1000
   - Allows for larger batch operations, reducing Redis round-trips

2. Throttle repeated Redis queries for cache misses
   - Update last_redis_batch_access_time for ALL queried keys, including those
     with None values
   - Prevents excessive Redis queries for frequently-accessed non-existent keys

3. Add early exit optimization
   - Short-circuit when redis_result is None or contains only None values
   - Avoids unnecessary processing when no cache hits are found

4. Optimize key lookup performance
   - Replace O(n) keys.index() calls with O(1) dict lookup via key_to_index mapping
   - Reduces algorithmic complexity in batch operations

5. Streamline cache updates
   - Combine result updates and in-memory cache updates in single loop
   - Only cache non-None values to avoid polluting in-memory cache
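Items 3 to 5 above can be sketched roughly as follows. This is an illustration under assumptions, not the actual dual_cache.py code: the function name `merge_redis_hits` and its parameters are hypothetical, and the sketch assumes every key returned by Redis is also present in `keys`.

```python
from typing import Any, Dict, List, Optional


def merge_redis_hits(
    keys: List[str],
    result: List[Optional[Any]],
    redis_result: Optional[Dict[str, Any]],
    in_memory: Dict[str, Any],
) -> List[Optional[Any]]:
    """Hypothetical helper: merge Redis batch results into `result`."""
    # Early exit: Redis returned nothing, or only misses.
    if not redis_result or all(v is None for v in redis_result.values()):
        return result

    # Build the key -> position map once (O(n) total) instead of calling
    # keys.index(key) inside the loop (O(n) per lookup).
    key_to_index = {key: i for i, key in enumerate(keys)}

    # Single loop: update the result list and the in-memory cache together.
    for key, value in redis_result.items():
        if value is None:
            continue  # misses are not written to the in-memory cache
        result[key_to_index[key]] = value
        in_memory[key] = value
    return result
```

With three keys and one miss, only the two hits end up in `in_memory`, and the miss stays `None` in the returned list.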

**CooldownCache Improvements (cooldown_cache.py):**

1. Enhanced early return logic
   - Check if all values in results are None, not just if results is None
   - Prevents unnecessary iteration when no valid cooldown data exists

These changes significantly improve Redis caching performance, especially for:
- High-throughput batch operations
- Scenarios with frequent cache misses
- Large-scale deployments with many concurrent requests

* fix: remove unnecessary test

* refactor: move default_max_redis_batch_cache_size to constants

- Add DEFAULT_MAX_REDIS_BATCH_CACHE_SIZE constant (default: 1000)
- Update DualCache to use constant from constants.py
- Document new environment variable in config_settings.md

* fix: only use in memory cache when set

* fix(router): improve prompt management model detection with smart early return

The previous early return optimization in _is_prompt_management_model() was
checking if the model name parameter contained '/' and returning False if it
didn't. This broke detection for model aliases (e.g., 'chatbot_actions') that
don't have '/' in their name but map to prompt management models
(e.g., 'langfuse/openai-gpt-3.5-turbo').

Changed the early return logic to only exit early when:
- Model name contains '/' AND
- The prefix is NOT a known prompt management provider

This maintains the performance optimization for 99% of direct model calls
(avoiding expensive get_model_list lookups) while correctly handling:
- Direct prompt management calls (e.g., 'langfuse/model')
- Model aliases without '/' (e.g., 'chatbot_actions')
- Regular models with/without '/' (e.g., 'gpt-3.5-turbo', 'openai/gpt-4')

Fixes test: test_router_prompt_management_factory

* perf(router): optimize _pre_call_checks with shallow copy (1400x faster)

Replace deepcopy with list() in _pre_call_checks - runs on every request.
Only pops from list, never modifies deployment dicts, so shallow copy is safe.

Performance: 1400x faster on hot path
Impact: 2-5x overall throughput improvement for routing workloads
Tests: Added regression test to ensure no mutation + filtering works
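A minimal sketch of why a shallow copy is enough here, assuming (as the commit states) that the code only pops from the list and never writes to the deployment dicts. The function name `filter_candidates` is hypothetical and is not the router's API.

```python
from typing import Any, Dict, Iterable, List


def filter_candidates(
    deployments: List[Dict[str, Any]], invalid_indices: Iterable[int]
) -> List[Dict[str, Any]]:
    """Hypothetical helper: drop invalid deployments without deepcopy."""
    # list() copies only the outer list; the dicts inside are shared.
    candidates = list(deployments)
    # Pop from the highest index down so earlier indices stay valid.
    for idx in sorted(set(invalid_indices), reverse=True):
        candidates.pop(idx)
    return candidates
```

The caller's list keeps its length because only the copy is popped from. The surviving entries are the same dict objects as in the original, which is safe only as long as nothing downstream mutates them.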

* perf(router): replace deepcopy with shallow copy for default deployment

Replace expensive copy.deepcopy() with shallow copy for default_deployment
in _common_checks_available_deployment() hot path.

Changes:
- Use dict.copy() for top-level deployment dict
- Use dict.copy() for nested litellm_params dict
- Only the 'model' field is modified, so deep recursion is unnecessary

Impact:
- 100x+ faster for default deployment path (every request when used)
- deepcopy recursively traverses entire object tree
- Shallow copy only copies two dict levels (exactly what's needed)

Test coverage:
- Added regression test to verify deployment isolation
- Ensures returned deployments don't mutate original default_deployment
- Validates multiple concurrent requests get independent copies
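The two-level copy described above can be sketched like this. The helper name `copy_default_deployment` is hypothetical; the sketch assumes, as the commit says, that only the nested 'model' field is written.

```python
from typing import Any, Dict


def copy_default_deployment(
    default_deployment: Dict[str, Any], model: str
) -> Dict[str, Any]:
    """Hypothetical helper: copy exactly the two dict levels that change."""
    # Level 1: new top-level dict, values still shared with the original.
    deployment = default_deployment.copy()
    # Level 2: new litellm_params dict, so the write below stays local.
    deployment["litellm_params"] = default_deployment["litellm_params"].copy()
    deployment["litellm_params"]["model"] = model
    return deployment
```

Any value nested deeper than `litellm_params` is still shared with the original, so this pattern stops being safe if a later change starts mutating such values.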

* perf(router): remove unnecessary dict copy in completion hot paths

Remove unnecessary deployment['litellm_params'].copy() in _completion
and _acompletion functions. The dict is only read and spread into a new
dict, never modified, making the defensive copy wasteful.

Changes:
- Remove .copy() in _completion (sync hot path)
- Remove .copy() in _acompletion (async hot path)

Impact:
- Every completion request (highest traffic endpoints)
- Eliminates unnecessary dict allocation and copy on every call
- Dict spreading already creates new dict, so no mutation possible

Test coverage:
- Added tests verifying deployment params unchanged after calls
- Tests both sync and async completion paths
- Validates optimization doesn't introduce mutations
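A small sketch of the reasoning: spreading a dict into a new dict literal already allocates a fresh top-level dict, so a preceding `.copy()` adds nothing. The function name `build_call_kwargs` is hypothetical and does not mirror the router's real signature.

```python
from typing import Any, Dict


def build_call_kwargs(deployment: Dict[str, Any], **request_kwargs: Any) -> Dict[str, Any]:
    """Hypothetical helper: merge deployment params with per-request kwargs."""
    # {**a, **b} builds a new dict; deployment["litellm_params"] is only read.
    # Later keys win, so request kwargs override deployment defaults.
    return {**deployment["litellm_params"], **request_kwargs}
```

Overriding `timeout` in the merged dict leaves the deployment's own `timeout` untouched.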

* perf(router): optimize deployment filtering in pre-call checks

Replace O(n²) list pop pattern with O(n) set-based filtering in
_pre_call_checks() to improve routing performance under high load.

Changes:
- Use set() instead of list for invalid_model_indices tracking
- Replace reversed list.pop() loop with single-pass list comprehension
- Eliminate redundant list→set conversion overhead

Impact:
- Hot path optimization: runs on every request through the router
- ~2-5x faster filtering when many deployments fail validation
- Most beneficial with 50+ deployments per model group or high
  invalidation rates (rate limits, context window exceeded)

Technical details:
Old: O(k²) where k = invalid deployments (pop shifts remaining elements)
New: O(n) single pass with O(1) set membership checks
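The single-pass pattern can be sketched as follows; `drop_invalid` is a hypothetical name, and `invalid_model_indices` is borrowed from the commit text.

```python
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def drop_invalid(deployments: List[T], invalid_model_indices: Iterable[int]) -> List[T]:
    """Hypothetical helper: keep deployments whose index is not marked invalid."""
    invalid = set(invalid_model_indices)  # O(1) membership checks
    # One pass over the list; no element shifting as with repeated pop().
    return [d for i, d in enumerate(deployments) if i not in invalid]
```

Duplicate indices are harmless here because the set collapses them, and the input list is left unmodified.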

* add: memory profiler

feat(proxy): Add configurable GC thresholds and enhance memory debugging endpoints

- Add PYTHON_GC_THRESHOLD env var to configure garbage collection thresholds
- Add POST /debug/memory/gc/configure endpoint for runtime GC tuning
- Enhance memory debugging endpoints with better structure and explanations
- Add comprehensive router and cache memory tracking
- Include worker PID in all debug responses for multi-worker debugging
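One way such an environment variable could be applied is sketched below. The comma-separated format ("gen0,gen1,gen2") and the function name are assumptions made for illustration; the commit does not show the proxy's actual parsing. `gc.set_threshold` and `gc.get_threshold` are standard-library calls.

```python
import gc
import os
from typing import Optional, Tuple


def apply_gc_threshold_from_env(
    var: str = "PYTHON_GC_THRESHOLD",
) -> Optional[Tuple[int, ...]]:
    """Hypothetical helper: read thresholds from the environment and apply them."""
    raw = os.environ.get(var)
    if not raw:
        return None  # variable unset or empty: keep the interpreter defaults
    # Assumed format: one to three comma-separated integers.
    parts = tuple(int(p.strip()) for p in raw.split(","))
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"{var} must contain 1 to 3 integers, got {raw!r}")
    gc.set_threshold(*parts)
    return parts
```

Higher thresholds make collections less frequent, which trades memory for fewer GC pauses; the right values depend on the workload.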

* refactor: reduce complexity in get_memory_details endpoint

Extract 6 helper functions from get_memory_details to fix linter
error PLR0915 (too many statements). Improves maintainability
while preserving functionality.

* fix(router): remove incorrect early exit in _is_prompt_management_model

Removes early exit optimization that checked model_name prefix instead
of the actual litellm_params model. This incorrectly returned False for
custom model aliases that map to prompt management providers.

Example: "my-langfuse-prompt/test_id" -> "langfuse_prompt/actual_id"

The method now correctly checks the underlying model's prefix.

Fixes test_is_prompt_management_model_optimization
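The corrected behavior, checking the prefix of the underlying model rather than the alias, might look roughly like this. It is a sketch, not the router's code: the function signature, the `model_list` shape, and the prefix set (taken only from the examples in this commit message) are assumptions.

```python
from typing import Any, Dict, List

# Illustrative only: prefixes that appear in the examples above.
PROMPT_MANAGEMENT_PREFIXES = {"langfuse", "langfuse_prompt"}


def is_prompt_management_model(model_name: str, model_list: List[Dict[str, Any]]) -> bool:
    """Hypothetical helper: resolve the alias, then inspect the real model."""
    for deployment in model_list:
        if deployment["model_name"] != model_name:
            continue
        underlying = deployment["litellm_params"]["model"]
        prefix, sep, _ = underlying.partition("/")
        # Decide on the underlying model's prefix, never on the alias itself.
        if sep and prefix in PROMPT_MANAGEMENT_PREFIXES:
            return True
    return False
```

An alias such as 'chatbot_actions' has no '/' in its name, yet it is detected because the lookup follows it to 'langfuse/openai-gpt-3.5-turbo'.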

* fix(proxy): add explicit type annotations to debug_utils dictionaries

Resolved 6 mypy type errors in proxy/common_utils/debug_utils.py by adding
explicit Dict[str, Any] annotations to dictionary variables where mypy was
incorrectly inferring narrow types. This allows the dictionaries to accept
different value types (strings, nested dicts) for error handling and various
return structures.

Fixed:
- Line 246: caches dictionary in get_memory_summary()
- Line 371: cache_stats dictionary in _get_cache_memory_stats()
- Line 439: litellm_router_memory dictionary in _get_router_memory_stats()

* fix(proxy): fix Python 3.8 compatibility in debug_utils type annotations

- Replace tuple[...], list[...] with Tuple[...], List[...] from typing
- Replace Dict | None with Optional[Dict] for Python 3.8 compatibility
- Add missing imports: List, Optional, Tuple to typing imports

Fixes TypeError: 'type' object is not subscriptable in Python 3.8
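For context, a short sketch of annotations that evaluate on Python 3.8 and later. The function `top_items` is a made-up example and is unrelated to debug_utils.py. On 3.8, writing `list[tuple[str, int]]` in the signature instead would raise the TypeError quoted above when the function is defined.

```python
from typing import Dict, List, Optional, Tuple


def top_items(stats: Optional[Dict[str, int]], limit: int = 3) -> List[Tuple[str, int]]:
    """Return the `limit` largest entries, largest first (illustrative only)."""
    if stats is None:  # Optional[Dict] replaces the 3.10+ spelling `Dict | None`
        return []
    return sorted(stats.items(), key=lambda kv: kv[1], reverse=True)[:limit]
```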

---------

Co-authored-by: AlexsanderHamir <alexsanderhamirgomesbaptista@gmail.com>

2025-10-18 11:12:00 -07:00

.circleci

[Feat] Native /ocr endpoint support (#15573 )

2025-10-15 17:20:01 -07:00

.devcontainer

chore: setting devcontainer for develop

2025-09-27 12:51:44 +09:00

.github

test: make lint test stricter

2025-09-27 13:33:25 -07:00

ci_cd

fix scan

2025-10-11 11:38:43 -07:00

cookbook

fix(cookbook): Remove the CometAPI key used for testing.

2025-10-16 14:14:58 +08:00

db_scripts

fix(migrate_keys.py): add script for migrating keys to new db

2025-07-16 10:18:36 -07:00

deploy

Merge pull request #13855 from edify42/allow-no-db-url

2025-09-06 22:02:01 -07:00

docker

docs: use docker compose instead of docker-compose

2025-09-29 11:59:53 +00:00

docs/my-website

[Perf] Alexsander fixes round 2 - Oct 18th (#15695 )

2025-10-18 11:12:00 -07:00

enterprise

[Feat] Add Guardrails for /v1/messages and /v1/responses API (#15686 )

2025-10-17 18:09:00 -07:00

litellm

[Perf] Alexsander fixes round 2 - Oct 18th (#15695 )

2025-10-18 11:12:00 -07:00

litellm-js

build(deps): bump hono from 4.6.5 to 4.9.7 in /litellm-js/spend-logs (#14513 )

2025-09-13 11:10:37 -07:00

litellm-proxy-extras

build: bump version

2025-10-13 14:26:38 -07:00

scripts

Fix Groq streaming ASCII encoding issue

2025-08-16 08:32:22 -05:00

tests

[Perf] Alexsander fixes round 2 - Oct 18th (#15695 )

2025-10-18 11:12:00 -07:00

ui/litellm-dashboard

litellm_Key Settings Max Budget Removal Error Fix (#15669 )

2025-10-17 19:41:29 -07:00

.dockerignore

[Feat] Add support for returning images with gemini/gemini-2.5-flash-image-preview with /chat/completions (#13983 )

2025-08-27 16:16:19 -07:00

.env.example

Add new model provider Novita AI (#7582 ) (#9527 )

2025-05-12 21:49:30 -07:00

.flake8

chore: list all ignored flake8 rules explicit

2023-12-23 09:07:59 +01:00

.git-blame-ignore-revs

Add my commit to .git-blame-ignore-revs

2024-05-12 10:21:10 -07:00

.gitattributes

ignore ipynbs

2023-08-31 16:58:54 -07:00

.gitignore

test(test_mcp_server_manager.py): add unit testing

2025-10-09 14:52:46 -07:00

.pre-commit-config.yaml

docs(index.md): update release note with rc patch

2025-06-17 22:55:50 -07:00

AGENTS.md

Add AGENTS.md (#11461 )

2025-06-05 16:29:28 -07:00

CLAUDE.md

docs(CLAUDE.md): add development guidance and architecture overview for Claude Code (#12011 )

2025-06-24 20:48:08 -07:00

codecov.yaml

fix comment

2024-10-23 15:44:27 +05:30

CONTRIBUTING.md

docs add slack support

2025-06-30 10:45:37 -07:00

COST_DISCOUNT_IMPLEMENTATION.md

[Feat] Cost Tracking - specify a global vendor discount for costs. (#15546 )

2025-10-14 20:07:04 -07:00

docker-compose.yml

Deletion of unnecessary and error causing volume section comment (#15425 )

2025-10-10 17:51:27 -07:00

Dockerfile

fix security

2025-09-26 19:31:56 -07:00

GEMINI.md

docs(GEMINI.md): add development guidelines and architecture overview for Gemini project

2025-06-25 08:22:15 -06:00

git_model_armor.py

Revert "Added model armor testing files basic"

2025-09-23 17:45:44 -07:00

index.yaml

add 0.2.3 helm

2024-08-19 23:59:58 +08:00

LICENSE

refactor: creating enterprise folder

2024-02-15 12:54:13 -08:00

log.txt

fix: remove unused uuid import

2025-10-11 15:38:28 -07:00

Makefile

[Fix] Ensure guardrail memory sync after database updates (#15633 )

2025-10-16 21:46:49 -07:00

mcp_servers.json

add well known MCP servers (#11209 )

2025-05-28 10:46:26 -07:00

MCP_SSL_CHANGES_SUMMARY.md

fix(mcp/): add ssl certificate settings for mcp clients

2025-10-06 18:36:05 -07:00

model_prices_and_context_window.json

fix: bedrock-pricing-geo-inregion-cross-region / add Global Cross-Region Inference (#15685 )

2025-10-17 19:37:12 -07:00

package-lock.json

Fixed Log Tab Key Alias filtering inaccurately for failed logs

2025-09-11 13:05:48 -07:00

package.json

Fixed Log Tab Key Alias filtering inaccurately for failed logs

2025-09-11 13:05:48 -07:00

poetry.lock

build: bump version

2025-10-13 14:26:38 -07:00

prometheus.yml

build(docker-compose.yml): add prometheus scraper to docker compose

2024-07-24 10:09:23 -07:00

proxy_server_config.yaml

Add shared healthcheck

2025-10-09 22:18:05 +05:30

pyproject.toml

bump: version 1.78.3 → 1.78.4

2025-10-17 18:08:16 -07:00

pyrightconfig.json

Add pyright to ci/cd + Fix remaining type-checking errors (#6082 )

2024-10-05 17:04:00 -04:00

README.md

docs(readme): Add CometAPI to supported providers table

2025-10-16 14:18:55 +08:00

render.yaml

build(render.yaml): fix health check route

2024-05-24 09:45:28 -07:00

requirements.txt

build: bump version

2025-10-13 14:26:38 -07:00

ruff.toml

(code quality) run ruff rule to ban unused imports (#7313 )

2024-12-19 12:33:42 -08:00

schema.prisma

[Feat] Tag Management - Add support for setting tag based budgets (#15433 )

2025-10-10 19:24:50 -07:00

security.md

Corrected docs updates sept 2025 (#14916 )

2025-09-25 15:49:19 -07:00

test_bulk_update_all_users.py

Bulk User Edit - additional improvements - edit all users + set 'no-default-models' on all users (#12925 )

2025-07-27 10:12:30 -07:00

test_image_edit.png

[Fix] Dall-e-2 for Image Edits API (#15604 )

2025-10-16 13:24:24 -07:00

test_model_armor.py

Revert "Added model armor testing files basic"

2025-09-23 17:45:44 -07:00

test_profile_mock_response.py.lprof

[Perf] Use fastuuid for fast UUID generations - 2.1x Faster (#13992 )

2025-08-26 19:47:25 -07:00

README.md

🚅 LiteLLM

Call all LLM APIs using the OpenAI format [Bedrock, Huggingface, VertexAI, TogetherAI, Azure, OpenAI, Groq etc.]

LiteLLM Proxy Server (LLM Gateway) | Hosted Proxy (Preview) | Enterprise Tier

LiteLLM manages:

Translate inputs to provider's completion, embedding, and image_generation endpoints
Consistent output, text responses will always be available at ['choices'][0]['message']['content']
Retry/fallback logic across multiple deployments (e.g. Azure/OpenAI) - Router
Set Budgets & Rate limits per project, api key, model LiteLLM Proxy Server (LLM Gateway)

Jump to LiteLLM Proxy (LLM Gateway) Docs
Jump to Supported LLM Providers

🚨 Stable Release: Use docker images with the -stable tag. These have undergone 12 hour load tests, before being published. More information about the release cycle here

Support for more providers. Missing a provider or LLM Platform, raise a feature request.

Usage (Docs)

Important

LiteLLM v1.0.0 now requires openai>=1.0.0. Migration guide here
LiteLLM v1.40.14+ now requires pydantic>=2.0.0. No changes required.

pip install litellm

from litellm import completion
import os

## set ENV variables
os.environ["OPENAI_API_KEY"] = "your-openai-key"
os.environ["ANTHROPIC_API_KEY"] = "your-anthropic-key"

messages = [{ "content": "Hello, how are you?","role": "user"}]

# openai call
response = completion(model="openai/gpt-4o", messages=messages)

# anthropic call
response = completion(model="anthropic/claude-sonnet-4-20250514", messages=messages)
print(response)

Response (OpenAI Format)

{
    "id": "chatcmpl-1214900a-6cdd-4148-b663-b5e2f642b4de",
    "created": 1751494488,
    "model": "claude-sonnet-4-20250514",
    "object": "chat.completion",
    "system_fingerprint": null,
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "message": {
                "content": "Hello! I'm doing well, thank you for asking. I'm here and ready to help with whatever you'd like to discuss or work on. How are you doing today?",
                "role": "assistant",
                "tool_calls": null,
                "function_call": null
            }
        }
    ],
    "usage": {
        "completion_tokens": 39,
        "prompt_tokens": 13,
        "total_tokens": 52,
        "completion_tokens_details": null,
        "prompt_tokens_details": {
            "audio_tokens": null,
            "cached_tokens": 0
        },
        "cache_creation_input_tokens": 0,
        "cache_read_input_tokens": 0
    }
}
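As a small illustration of the consistent output shape, the text can be read from the same path regardless of provider. The sketch below uses a plain dict trimmed from the response above rather than a live call, and `get_text` is a hypothetical helper name.

```python
from typing import Any, Dict

# Trimmed copy of the response shown above, as a plain dict.
response: Dict[str, Any] = {
    "model": "claude-sonnet-4-20250514",
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "message": {"content": "Hello!", "role": "assistant"},
        }
    ],
    "usage": {"completion_tokens": 39, "prompt_tokens": 13, "total_tokens": 52},
}


def get_text(resp: Dict[str, Any]) -> str:
    """Read the assistant text from the OpenAI-format path."""
    return resp["choices"][0]["message"]["content"]


print(get_text(response))
```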

Call any model supported by a provider, with model=<provider_name>/<model_name>. There might be provider-specific details here, so refer to provider docs for more information
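To illustrate the naming convention only (this is not LiteLLM's own parsing logic), a model string can be split on the first "/". The helper `split_model` is hypothetical.

```python
from typing import Optional, Tuple


def split_model(model: str) -> Tuple[Optional[str], str]:
    """Hypothetical helper: split '<provider_name>/<model_name>' on the first slash."""
    provider, sep, name = model.partition("/")
    if not sep:
        return None, model  # no provider prefix given
    return provider, name
```

Splitting on the first slash only keeps model names that themselves contain slashes, such as "bigcode/starcoder", intact.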

Async (Docs)

from litellm import acompletion
import asyncio

async def test_get_response():
    user_message = "Hello, how are you?"
    messages = [{"content": user_message, "role": "user"}]
    response = await acompletion(model="openai/gpt-4o", messages=messages)
    return response

response = asyncio.run(test_get_response())
print(response)

Streaming (Docs)

LiteLLM supports streaming the model response back. Pass stream=True to get a streaming iterator in the response. Streaming is supported for all models (Bedrock, Huggingface, TogetherAI, Azure, OpenAI, etc.)

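from litellm import completion

messages = [{"content": "Hello, how are you?", "role": "user"}]

# openai call: print only the text delta of each chunk
response = completion(model="openai/gpt-4o", messages=messages, stream=True)
for part in response:
    print(part.choices[0].delta.content or "")

# claude sonnet 4: print each raw chunk
response = completion(model="anthropic/claude-sonnet-4-20250514", messages=messages, stream=True)
for part in response:
    print(part)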

Response chunk (OpenAI Format)

{
    "id": "chatcmpl-fe575c37-5004-4926-ae5e-bfbc31f356ca",
    "created": 1751494808,
    "model": "claude-sonnet-4-20250514",
    "object": "chat.completion.chunk",
    "system_fingerprint": null,
    "choices": [
        {
            "finish_reason": null,
            "index": 0,
            "delta": {
                "provider_specific_fields": null,
                "content": "Hello",
                "role": "assistant",
                "function_call": null,
                "tool_calls": null,
                "audio": null
            },
            "logprobs": null
        }
    ],
    "provider_specific_fields": null,
    "stream_options": null,
    "citations": null
}
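Since each chunk carries only a fragment in `delta.content`, the full reply is the concatenation of those fragments. A minimal sketch over plain dicts shaped like the chunk above; `join_stream` is a hypothetical helper, and real chunks are objects rather than dicts.

```python
from typing import Any, Dict, Iterable


def join_stream(chunks: Iterable[Dict[str, Any]]) -> str:
    """Hypothetical helper: concatenate delta content from streamed chunks."""
    parts = []
    for chunk in chunks:
        delta = chunk["choices"][0]["delta"]
        # content can be None (for example on the final chunk), so fall back to "".
        parts.append(delta.get("content") or "")
    return "".join(parts)


chunks = [
    {"choices": [{"index": 0, "finish_reason": None, "delta": {"content": "Hello", "role": "assistant"}}]},
    {"choices": [{"index": 0, "finish_reason": None, "delta": {"content": " world"}}]},
    {"choices": [{"index": 0, "finish_reason": "stop", "delta": {"content": None}}]},
]
print(join_stream(chunks))
```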

Logging Observability (Docs)

LiteLLM exposes predefined callbacks to send data to Lunary, MLflow, Langfuse, DynamoDB, S3 buckets, Helicone, Promptlayer, Traceloop, Athina, and Slack

import os

import litellm
from litellm import completion

## set env variables for logging tools (when using MLflow, no API key set up is required)
os.environ["LUNARY_PUBLIC_KEY"] = "your-lunary-public-key"
os.environ["HELICONE_API_KEY"] = "your-helicone-auth-key"
os.environ["LANGFUSE_PUBLIC_KEY"] = ""
os.environ["LANGFUSE_SECRET_KEY"] = ""
os.environ["ATHINA_API_KEY"] = "your-athina-api-key"

os.environ["OPENAI_API_KEY"] = "your-openai-key"

# set callbacks
litellm.success_callback = ["lunary", "mlflow", "langfuse", "athina", "helicone"] # log input/output to lunary, mlflow, langfuse, athina, helicone

#openai call
response = completion(model="openai/gpt-4o", messages=[{"role": "user", "content": "Hi 👋 - i'm openai"}])

LiteLLM Proxy Server (LLM Gateway) - (Docs)

Track spend + Load Balance across multiple projects

Hosted Proxy (Preview)

The proxy provides:

📖 Proxy Endpoints - Swagger Docs

Quick Start Proxy - CLI

pip install 'litellm[proxy]'

Step 1: Start litellm proxy

$ litellm --model huggingface/bigcode/starcoder

#INFO: Proxy running on http://0.0.0.0:4000

Step 2: Make ChatCompletions Request to Proxy

Important

💡 Use LiteLLM Proxy with Langchain (Python, JS), OpenAI SDK (Python, JS), Anthropic SDK, Mistral SDK, LlamaIndex, Instructor, Curl

import openai # openai v1.0.0+
client = openai.OpenAI(api_key="anything",base_url="http://0.0.0.0:4000") # set proxy to base_url
# request sent to model set on litellm proxy, `litellm --model`
response = client.chat.completions.create(model="gpt-3.5-turbo", messages = [
    {
        "role": "user",
        "content": "this is a test request, write a short poem"
    }
])

print(response)

Proxy Key Management (Docs)

Connect the proxy with a Postgres DB to create proxy keys

# Get the code
git clone https://github.com/BerriAI/litellm

# Go to folder
cd litellm

# Add the master key - you can change this after setup
echo 'LITELLM_MASTER_KEY="sk-1234"' > .env

# Add the litellm salt key - you cannot change this after adding a model
# It is used to encrypt / decrypt your LLM API Key credentials
# We recommend - https://1password.com/password-generator/
# password generator to get a random hash for litellm salt key
echo 'LITELLM_SALT_KEY="sk-1234"' >> .env

source .env

# Start
docker compose up

UI on /ui on your proxy server

Set budgets and rate limits across multiple projects: POST /key/generate

Request

curl 'http://0.0.0.0:4000/key/generate' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data-raw '{"models": ["gpt-3.5-turbo", "gpt-4", "claude-2"], "duration": "20m","metadata": {"user": "ishaan@berri.ai", "team": "core-infra"}}'

Expected Response

{
    "key": "sk-kdEXbIqZRwEeEiHwdg7sFA", # Bearer token
    "expires": "2023-11-19T01:38:25.838000+00:00" # datetime object
}

Supported Providers (Docs)

Provider	Completion	Streaming	Async Completion	Async Streaming	Async Embedding	Async Image Generation
openai	✅	✅	✅	✅	✅	✅
Meta - Llama API	✅	✅	✅	✅
azure	✅	✅	✅	✅	✅	✅
AI/ML API	✅	✅	✅	✅	✅	✅
aws - sagemaker	✅	✅	✅	✅	✅
aws - bedrock	✅	✅	✅	✅	✅
google - vertex_ai	✅	✅	✅	✅	✅	✅
google - palm	✅	✅	✅	✅
google AI Studio - gemini	✅	✅	✅	✅
mistral ai api	✅	✅	✅	✅	✅
cloudflare AI Workers	✅	✅	✅	✅
CompactifAI	✅	✅	✅	✅
cohere	✅	✅	✅	✅	✅
anthropic	✅	✅	✅	✅
empower	✅	✅	✅	✅
huggingface	✅	✅	✅	✅	✅
replicate	✅	✅	✅	✅
together_ai	✅	✅	✅	✅
openrouter	✅	✅	✅	✅
ai21	✅	✅	✅	✅
baseten	✅	✅	✅	✅
vllm	✅	✅	✅	✅
nlp_cloud	✅	✅	✅	✅
aleph alpha	✅	✅	✅	✅
petals	✅	✅	✅	✅
ollama	✅	✅	✅	✅	✅
deepinfra	✅	✅	✅	✅
perplexity-ai	✅	✅	✅	✅
Groq AI	✅	✅	✅	✅
Deepseek	✅	✅	✅	✅
anyscale	✅	✅	✅	✅
IBM - watsonx.ai	✅	✅	✅	✅	✅
voyage ai					✅
xinference [Xorbits Inference]					✅
FriendliAI	✅	✅	✅	✅
Galadriel	✅	✅	✅	✅
GradientAI	✅	✅
Novita AI	✅	✅	✅	✅
Featherless AI	✅	✅	✅	✅
Nebius AI Studio	✅	✅	✅	✅	✅
Heroku	✅	✅
OVHCloud AI Endpoints	✅	✅
CometAPI	✅	✅	✅	✅	✅	✅

Read the Docs

Run in Developer mode

Services

Set up .env file in root
Run dependent services: docker-compose up db prometheus

Backend

(In root) create virtual environment: python -m venv .venv
Activate virtual environment: source .venv/bin/activate
Install dependencies: pip install -e ".[all]"
Start proxy backend: python litellm/proxy_cli.py

Frontend

Navigate to ui/litellm-dashboard
Install dependencies: npm install
Run npm run dev to start the dashboard

Enterprise

For companies that need better security, user management and professional support

Talk to founders

This covers:

✅ Features under the LiteLLM Commercial License:
✅ Feature Prioritization
✅ Custom Integrations
✅ Professional Support - Dedicated discord + slack
✅ Custom SLAs
✅ Secure access with Single Sign-On

Contributing

We welcome contributions to LiteLLM! Whether you're fixing bugs, adding features, or improving documentation, we appreciate your help.

Quick Start for Contributors

This requires poetry to be installed.

git clone https://github.com/BerriAI/litellm.git
cd litellm
make install-dev    # Install development dependencies
make format         # Format your code
make lint           # Run all linting checks
make test-unit      # Run unit tests
make format-check   # Check formatting only

For detailed contributing guidelines, see CONTRIBUTING.md.

Code Quality / Linting

LiteLLM follows the Google Python Style Guide.

Our automated checks include:

Black for code formatting
Ruff for linting and code quality
MyPy for type checking
Circular import detection
Import safety checks

All these checks must pass before your PR can be merged.

Support / talk with founders

Schedule Demo 👋
Community Discord 💭
Community Slack 💭
Our numbers 📞 +1 (770) 8783-106 / +1 (412) 618-6238
Our emails ✉️ ishaan@berri.ai / krrish@berri.ai

Why did we build this

Need for simplicity: Our code started to get extremely complicated managing & translating calls between Azure, OpenAI and Cohere.

Contributors

Python SDK, Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI (or native) format, with cost tracking, guardrails, loadbalancing and logging. [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, VLLM, NVIDIA NIM]

ai-gateway anthropic azure-openai bedrock cost-tracking litellm llm llmops openai openai-compatible proxy-server python sdk self-hosted vertex-ai

Readme MIT

1.1 GiB
