mirror of https://github.com/tiennm99/litellm.git synced 2026-06-28 21:10:37 +00:00

Richard Tweed 0ac093b59e fix: role chaining and session name with webauthentication for aws bedrock (#13205 )

* fix(bedrock): prevent duplicate role assumption in EKS/IRSA environments

Fixes issue where AWS role assumption would fail in EKS/IRSA environments
when trying to assume the same role that's already being used.

The problem occurred when:
1. EKS/IRSA automatically assumes a role (e.g., LitellmRole)
2. LiteLLM tries to assume the same role again, causing AccessDenied errors
3. Different models with different roles would fail due to incorrect role context

Changes:
- Added check in _auth_with_aws_role() to detect if already using target role
- Skip role assumption if current identity matches target role
- Return current credentials instead of attempting duplicate assumption
- Added comprehensive test coverage for the fix

This ensures proper role chaining works in EKS/IRSA environments where:
- Service Account can assume Role A
- Role A can assume Role B for different models/accounts

Resolves the AccessDenied errors reported in bedrock usage scenarios.

* fix(bedrock): simplify role assumption for EKS/IRSA environments

Fixes AWS Bedrock role assumption in EKS/IRSA environments by properly
handling ambient credentials when no explicit credentials are provided.

The issue occurred because commit 197e7efa8f
introduced changes that broke role assumption in EKS/IRSA environments.

Changes:
- Simplified _auth_with_aws_role() to use ambient credentials when no
  explicit AWS credentials are provided (aws_access_key_id and
  aws_secret_access_key are both None)
- This allows web identity tokens in EKS/IRSA to work automatically
  through boto3's credential chain
- Maintains backward compatibility for explicit credential scenarios

Added comprehensive test coverage:
- test_eks_irsa_ambient_credentials_used: Verifies ambient credentials work
- test_explicit_credentials_used_when_provided: Ensures explicit creds still work
- test_partial_credentials_still_use_ambient: Edge case handling
- test_cross_account_role_assumption: Multi-account scenarios
- test_role_assumption_with_custom_session_name: Custom session names
- test_role_assumption_ttl_calculation: TTL calculation verification
- test_role_assumption_error_handling: Error propagation
- test_multiple_role_assumptions_in_sequence: Sequential role assumptions

This fix ensures that in EKS/IRSA environments:
1. Service accounts can assume their initial role via web identity
2. That role can then assume other roles across accounts as configured
3. Different models can use different roles without conflicts

* fix(bedrock): add automatic IRSA detection for EKS environments

- Detect AWS_WEB_IDENTITY_TOKEN_FILE and AWS_ROLE_ARN environment variables
- Automatically use web identity token flow when IRSA is detected
- Read web identity token from file and pass to existing auth method
- Add test coverage for IRSA environment detection
- Fixes authentication errors in EKS with IRSA when no explicit credentials provided

* fix(bedrock): skip role assumption when IRSA role matches requested role

- Detect when AWS_ROLE_ARN environment variable matches the requested role
- Skip unnecessary role assumption when already running as the target role
- Use existing env vars authentication method for IRSA credentials
- Add test coverage for same-role IRSA scenario
- Fixes 'not authorized to perform: sts:AssumeRole' errors when trying to assume the same role

* fix(bedrock): use boto3's native IRSA support for cross-account role assumption

- Replace custom web identity token handling with boto3's built-in IRSA support
- boto3 automatically reads AWS_WEB_IDENTITY_TOKEN_FILE and assumes initial role
- Then use standard assume_role for cross-account access
- Update test to mock boto3 STS client instead of internal methods
- Fixes 'OIDC token could not be retrieved from secret manager' error

* fix(bedrock): improve IRSA error handling and add debug logging

- Add debug logging to show current identity and role assumption attempts
- Provide clearer error messages for trust policy issues
- Fix region handling in IRSA flow
- Re-raise exceptions instead of silently falling through
- This helps diagnose cross-account role assumption permission issues

* fix(bedrock): manually assume IRSA role with correct session name for cross-account scenarios

- When doing cross-account role assumption, manually assume the IRSA role first with the desired session name
- This ensures the session name in the assumed role ARN matches what's expected in trust policies
- For same-account scenarios, continue using boto3's automatic IRSA support
- Updated tests to handle the new flow
- This fixes the issue where cross-account trust policies require specific session names

* fix: Fix linting issues in base_aws_llm.py

- Fix f-string without placeholders (F541)
- Refactor _auth_with_aws_role to reduce statements count (PLR0915)
  - Extract _handle_irsa_cross_account helper method
  - Extract _handle_irsa_same_account helper method
  - Extract _extract_credentials_and_ttl helper method

---------

Co-authored-by: openhands <openhands@all-hands.dev>

2025-08-02 08:55:35 -07:00
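
The role-assumption decisions described in the commit message above can be sketched in plain Python. This is an illustration, not the code in base_aws_llm.py: the function names and the returned labels are hypothetical, and only the environment variable names (AWS_WEB_IDENTITY_TOKEN_FILE, AWS_ROLE_ARN) come from the commit text. It assumes the requested role and the IRSA role are both given as full ARNs.

```python
from typing import Mapping


def irsa_detected(env: Mapping[str, str]) -> bool:
    """True when both IRSA environment variables named in the commit are set."""
    return bool(env.get("AWS_WEB_IDENTITY_TOKEN_FILE")) and bool(env.get("AWS_ROLE_ARN"))


def account_id(arn: str) -> str:
    """Return the account field of an ARN (arn:partition:service:region:account:resource)."""
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    return parts[4]


def plan_role_assumption(requested_role_arn: str, env: Mapping[str, str]) -> str:
    """Pick a flow for the requested role. The labels are hypothetical."""
    if not irsa_detected(env):
        return "no-irsa"
    current_role_arn = env["AWS_ROLE_ARN"]
    if current_role_arn == requested_role_arn:
        # Already running as the target role: skip the extra AssumeRole call.
        return "skip-already-target-role"
    if account_id(current_role_arn) != account_id(requested_role_arn):
        # Different account: the commit assumes the IRSA role manually first.
        return "irsa-cross-account"
    return "irsa-same-account"


# Example environment; the token path and account IDs are made up.
example_env = {
    "AWS_WEB_IDENTITY_TOKEN_FILE": "/var/run/secrets/token",
    "AWS_ROLE_ARN": "arn:aws:iam::111111111111:role/LitellmRole",
}
print(plan_role_assumption("arn:aws:iam::222222222222:role/BedrockRole", example_env))
```

Passing the environment as a mapping rather than reading os.environ directly keeps the decision logic easy to test without touching real AWS credentials.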

.circleci

test: loosen check

2025-08-01 09:21:13 -07:00

.devcontainer

…

.github

build(github/manual_pypi_publish.yml): manual workflow to publish pip package - used for pushing dev releases (#12985 )

2025-07-25 09:26:47 -07:00

ci_cd

…

cookbook

Add new model provider Novita AI (#7582 ) (#9527 )

2025-05-12 21:49:30 -07:00

db_scripts

fix(migrate_keys.py): add script for migrating keys to new db

2025-07-16 10:18:36 -07:00

deploy

[Separate Health App] Update Helm Deployment.yaml (#13162 )

2025-08-01 16:50:23 -07:00

dist

…

docker

add openssl in apk install in runtime stage in dockerfile.non_root (#13168 )

2025-07-31 21:52:11 -07:00

docs/my-website

[LLM] fix model reload on model update (#13216 )

2025-08-01 18:08:02 -07:00

enterprise

[MCP Gateway] Litellm mcp pre and during guardrails (#13188 )

2025-08-01 20:02:25 -07:00

litellm

fix: role chaining and session name with webauthentication for aws bedrock (#13205 )

2025-08-02 08:55:35 -07:00

litellm-js

…

litellm-proxy-extras

poetry lock

2025-07-30 16:05:56 -07:00

tests

fix: role chaining and session name with webauthentication for aws bedrock (#13205 )

2025-08-02 08:55:35 -07:00

ui/litellm-dashboard

Add advanced date picker to all the tabs on the usage page (#13221 )

2025-08-02 07:54:49 -07:00

.dockerignore

…

.env.example

Add new model provider Novita AI (#7582 ) (#9527 )

2025-05-12 21:49:30 -07:00

.flake8

…

.git-blame-ignore-revs

…

.gitattributes

…

.gitignore

Integration: Bytez as a model provider (#12121 )

2025-07-12 10:50:39 -07:00

.pre-commit-config.yaml

docs(index.md): update release note with rc patch

2025-06-17 22:55:50 -07:00

AGENTS.md

Add AGENTS.md (#11461 )

2025-06-05 16:29:28 -07:00

CLAUDE.md

docs(CLAUDE.md): add development guidance and architecture overview for Claude Code (#12011 )

2025-06-24 20:48:08 -07:00

codecov.yaml

…

CONTRIBUTING.md

docs add slack support

2025-06-30 10:45:37 -07:00

docker-compose.yml

add openssl in apk install in runtime stage in dockerfile.non_root (#13168 )

2025-07-31 21:52:11 -07:00

Dockerfile

[Feat] UI - Allow Adding LiteLLM Auto Router on UI (#12960 )

2025-07-24 19:58:49 -07:00

GEMINI.md

docs(GEMINI.md): add development guidelines and architecture overview for Gemini project

2025-06-25 08:22:15 -06:00

index.yaml

…

LICENSE

…

Makefile

feat: add local LLM translation testing with artifact generation (#12120 )

2025-06-27 21:24:19 -07:00

mcp_servers.json

add well known MCP servers (#11209 )

2025-05-28 10:46:26 -07:00

model_prices_and_context_window.json

add openrouter grok4 (#13018 )

2025-07-29 14:24:33 -07:00

package-lock.json

…

package.json

…

poetry.lock

poetry lock

2025-07-30 16:05:56 -07:00

prometheus.yml

…

proxy_server_config.yaml

build: update model in test (#10706 )

2025-05-09 13:33:11 -07:00

pyproject.toml

bump: version 1.74.13 → 1.74.14

2025-07-31 21:52:27 -07:00

pyrightconfig.json

…

README.md

improve readme: replace claude-3-sonnet because it will be retired soon (#12239 )

2025-07-03 21:50:39 -07:00

render.yaml

…

requirements.txt

poetry lock

2025-07-30 16:05:56 -07:00

ruff.toml

…

schema.prisma

[MCP Gateway] add health check endpoints for MCP (#13106 )

2025-07-30 20:40:44 +05:30

security.md

Discard duplicate sentence (#10231 )

2025-04-23 07:05:29 -07:00

test_bulk_update_all_users.py

Bulk User Edit - additional improvements - edit all users + set 'no-default-models' on all users (#12925 )

2025-07-27 10:12:30 -07:00

README.md

🚅 LiteLLM

Call all LLM APIs using the OpenAI format [Bedrock, Huggingface, VertexAI, TogetherAI, Azure, OpenAI, Groq etc.]

LiteLLM Proxy Server (LLM Gateway) | Hosted Proxy (Preview) | Enterprise Tier

LiteLLM manages:

Translating inputs to the provider's completion, embedding, and image_generation endpoints
Consistent output: text responses are always available at ['choices'][0]['message']['content']
Retry/fallback logic across multiple deployments (e.g. Azure/OpenAI) - Router
Budgets and rate limits per project, API key, and model - LiteLLM Proxy Server (LLM Gateway)

Jump to LiteLLM Proxy (LLM Gateway) Docs
Jump to Supported LLM Providers

🚨 Stable Release: Use docker images with the -stable tag. These undergo 12-hour load tests before being published. More information about the release cycle here

Support for more providers: if a provider or LLM platform is missing, raise a feature request.

Usage (Docs)

Important

LiteLLM v1.0.0 now requires openai>=1.0.0. Migration guide here
LiteLLM v1.40.14+ now requires pydantic>=2.0.0. No changes required.

pip install litellm

from litellm import completion
import os

## set ENV variables
os.environ["OPENAI_API_KEY"] = "your-openai-key"
os.environ["ANTHROPIC_API_KEY"] = "your-anthropic-key"

messages = [{ "content": "Hello, how are you?","role": "user"}]

# openai call
response = completion(model="openai/gpt-4o", messages=messages)

# anthropic call
response = completion(model="anthropic/claude-sonnet-4-20250514", messages=messages)
print(response)

Response (OpenAI Format)

{
    "id": "chatcmpl-1214900a-6cdd-4148-b663-b5e2f642b4de",
    "created": 1751494488,
    "model": "claude-sonnet-4-20250514",
    "object": "chat.completion",
    "system_fingerprint": null,
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "message": {
                "content": "Hello! I'm doing well, thank you for asking. I'm here and ready to help with whatever you'd like to discuss or work on. How are you doing today?",
                "role": "assistant",
                "tool_calls": null,
                "function_call": null
            }
        }
    ],
    "usage": {
        "completion_tokens": 39,
        "prompt_tokens": 13,
        "total_tokens": 52,
        "completion_tokens_details": null,
        "prompt_tokens_details": {
            "audio_tokens": null,
            "cached_tokens": 0
        },
        "cache_creation_input_tokens": 0,
        "cache_read_input_tokens": 0
    }
}
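
As a minimal sketch of reading such a response, the snippet below pulls the text out of the ['choices'][0]['message']['content'] path on a plain dict shaped like the sample. The helper names are hypothetical and the sample content is shortened; the object returned by completion() may offer other access styles not shown here.

```python
from typing import Any, Dict

# Trimmed, shortened copy of the sample response (plain dict, no API call).
sample_response: Dict[str, Any] = {
    "model": "claude-sonnet-4-20250514",
    "choices": [
        {
            "finish_reason": "stop",
            "index": 0,
            "message": {"content": "Hello! I'm doing well.", "role": "assistant"},
        }
    ],
    "usage": {"completion_tokens": 39, "prompt_tokens": 13, "total_tokens": 52},
}


def message_text(response: Dict[str, Any], index: int = 0) -> str:
    """Return the assistant text for one choice, or "" if content is null."""
    return response["choices"][index]["message"]["content"] or ""


def tokens_add_up(response: Dict[str, Any]) -> bool:
    """Check that prompt + completion tokens equal the reported total."""
    usage = response["usage"]
    return usage["prompt_tokens"] + usage["completion_tokens"] == usage["total_tokens"]


print(message_text(sample_response))
print(tokens_add_up(sample_response))
```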

Call any model supported by a provider with model=<provider_name>/<model_name>. Some providers have provider-specific details, so refer to the provider docs for more information.
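
To illustrate the <provider_name>/<model_name> convention, here is a small sketch that splits a model string at the first slash. The split_model helper is hypothetical and is not LiteLLM's routing logic; it only shows how a name such as huggingface/bigcode/starcoder keeps its remaining slashes in the model part.

```python
from typing import Tuple


def split_model(model: str) -> Tuple[str, str]:
    """Split "provider/model" at the first slash only."""
    provider, sep, name = model.partition("/")
    if not sep or not provider or not name:
        raise ValueError(f"expected <provider_name>/<model_name>, got {model!r}")
    return provider, name


for model in ("openai/gpt-4o", "huggingface/bigcode/starcoder"):
    print(split_model(model))
```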

Async (Docs)

from litellm import acompletion
import asyncio

async def test_get_response():
    user_message = "Hello, how are you?"
    messages = [{"content": user_message, "role": "user"}]
    response = await acompletion(model="openai/gpt-4o", messages=messages)
    return response

response = asyncio.run(test_get_response())
print(response)

Streaming (Docs)

LiteLLM supports streaming the model response back. Pass stream=True to get a streaming iterator in response.
Streaming is supported for all models (Bedrock, Huggingface, TogetherAI, Azure, OpenAI, etc.).

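from litellm import completion

messages = [{"content": "Hello, how are you?", "role": "user"}]

# openai call: print each streamed text delta as it arrives
response = completion(model="openai/gpt-4o", messages=messages, stream=True)
for part in response:
    print(part.choices[0].delta.content or "")

# claude sonnet 4: print each raw chunk
response = completion(model="anthropic/claude-sonnet-4-20250514", messages=messages, stream=True)
for part in response:
    print(part)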

Response chunk (OpenAI Format)

{
    "id": "chatcmpl-fe575c37-5004-4926-ae5e-bfbc31f356ca",
    "created": 1751494808,
    "model": "claude-sonnet-4-20250514",
    "object": "chat.completion.chunk",
    "system_fingerprint": null,
    "choices": [
        {
            "finish_reason": null,
            "index": 0,
            "delta": {
                "provider_specific_fields": null,
                "content": "Hello",
                "role": "assistant",
                "function_call": null,
                "tool_calls": null,
                "audio": null
            },
            "logprobs": null
        }
    ],
    "provider_specific_fields": null,
    "stream_options": null,
    "citations": null
}
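
A streamed reply arrives as many chunks like the one above, each carrying a small delta. The sketch below joins the delta content of the first choice into the full text. It runs on plain dicts that mimic the chunk shape; the join_stream helper and the sample chunks are made up for illustration and no API call is involved.

```python
from typing import Any, Dict, Iterable, List


def join_stream(chunks: Iterable[Dict[str, Any]]) -> str:
    """Concatenate delta.content from the first choice of each chunk."""
    pieces: List[str] = []
    for chunk in chunks:
        choices = chunk.get("choices") or []
        if not choices:
            continue
        content = choices[0].get("delta", {}).get("content")
        if content:  # skip null or empty deltas, e.g. the final chunk
            pieces.append(content)
    return "".join(pieces)


# Hypothetical chunks in the shape shown above.
sample_chunks = [
    {"choices": [{"index": 0, "finish_reason": None, "delta": {"content": "Hello", "role": "assistant"}}]},
    {"choices": [{"index": 0, "finish_reason": None, "delta": {"content": ", world"}}]},
    {"choices": [{"index": 0, "finish_reason": "stop", "delta": {"content": None}}]},
]
print(join_stream(sample_chunks))
```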

Logging Observability (Docs)

LiteLLM exposes predefined callbacks to send data to Lunary, MLflow, Langfuse, DynamoDB, S3 buckets, Helicone, Promptlayer, Traceloop, Athina, and Slack.

import os

import litellm
from litellm import completion

## set env variables for logging tools (when using MLflow, no API key set up is required)
os.environ["LUNARY_PUBLIC_KEY"] = "your-lunary-public-key"
os.environ["HELICONE_API_KEY"] = "your-helicone-auth-key"
os.environ["LANGFUSE_PUBLIC_KEY"] = ""
os.environ["LANGFUSE_SECRET_KEY"] = ""
os.environ["ATHINA_API_KEY"] = "your-athina-api-key"

os.environ["OPENAI_API_KEY"] = "your-openai-key"

# set callbacks
litellm.success_callback = ["lunary", "mlflow", "langfuse", "athina", "helicone"] # log input/output to lunary, mlflow, langfuse, athina, helicone

# openai call
response = completion(model="openai/gpt-4o", messages=[{"role": "user", "content": "Hi 👋 - i'm openai"}])

LiteLLM Proxy Server (LLM Gateway) - (Docs)

Track spend + Load Balance across multiple projects

Hosted Proxy (Preview)

The proxy provides:

📖 Proxy Endpoints - Swagger Docs

Quick Start Proxy - CLI

pip install 'litellm[proxy]'

Step 1: Start litellm proxy

$ litellm --model huggingface/bigcode/starcoder

#INFO: Proxy running on http://0.0.0.0:4000

Step 2: Make ChatCompletions Request to Proxy

Important

💡 Use LiteLLM Proxy with Langchain (Python, JS), OpenAI SDK (Python, JS), Anthropic SDK, Mistral SDK, LlamaIndex, Instructor, Curl

import openai # openai v1.0.0+
client = openai.OpenAI(api_key="anything",base_url="http://0.0.0.0:4000") # set proxy to base_url
# request sent to model set on litellm proxy, `litellm --model`
response = client.chat.completions.create(model="gpt-3.5-turbo", messages = [
    {
        "role": "user",
        "content": "this is a test request, write a short poem"
    }
])

print(response)

Proxy Key Management (Docs)

Connect the proxy with a Postgres DB to create proxy keys

# Get the code
git clone https://github.com/BerriAI/litellm

# Go to folder
cd litellm

# Add the master key - you can change this after setup
echo 'LITELLM_MASTER_KEY="sk-1234"' > .env

# Add the litellm salt key - you cannot change this after adding a model
# It is used to encrypt / decrypt your LLM API Key credentials
# We recommend - https://1password.com/password-generator/ 
# password generator to get a random hash for litellm salt key
echo 'LITELLM_SALT_KEY="sk-1234"' >> .env

source .env

# Start
docker-compose up

UI on /ui on your proxy server

Set budgets and rate limits across multiple projects POST /key/generate

Request

curl 'http://0.0.0.0:4000/key/generate' \
--header 'Authorization: Bearer sk-1234' \
--header 'Content-Type: application/json' \
--data-raw '{"models": ["gpt-3.5-turbo", "gpt-4", "claude-2"], "duration": "20m","metadata": {"user": "ishaan@berri.ai", "team": "core-infra"}}'

Expected Response

{
    "key": "sk-kdEXbIqZRwEeEiHwdg7sFA", # Bearer token
    "expires": "2023-11-19T01:38:25.838000+00:00" # datetime object
}
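
The same request can be prepared from Python. The sketch below uses only the standard library and mirrors the curl call above (URL, headers, and JSON body); the build_key_request helper is hypothetical. It only builds the request object, so nothing is sent until it is passed to urllib.request.urlopen against a running proxy.

```python
import json
import urllib.request
from typing import Any, Dict, List


def build_key_request(
    base_url: str,
    master_key: str,
    models: List[str],
    duration: str,
    metadata: Dict[str, Any],
) -> urllib.request.Request:
    """Build a POST /key/generate request without sending it."""
    payload = {"models": models, "duration": duration, "metadata": metadata}
    return urllib.request.Request(
        url=f"{base_url}/key/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {master_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_key_request(
    base_url="http://0.0.0.0:4000",
    master_key="sk-1234",
    models=["gpt-3.5-turbo", "gpt-4", "claude-2"],
    duration="20m",
    metadata={"user": "ishaan@berri.ai", "team": "core-infra"},
)
print(req.get_method(), req.full_url)
# To send it: urllib.request.urlopen(req), which requires the proxy to be running.
```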

Supported Providers (Docs)

Provider	Completion	Streaming	Async Completion	Async Streaming	Async Embedding	Async Image Generation
openai	✅	✅	✅	✅	✅	✅
Meta - Llama API	✅	✅	✅	✅
azure	✅	✅	✅	✅	✅	✅
AI/ML API	✅	✅	✅	✅	✅	✅
aws - sagemaker	✅	✅	✅	✅	✅
aws - bedrock	✅	✅	✅	✅	✅
google - vertex_ai	✅	✅	✅	✅	✅	✅
google - palm	✅	✅	✅	✅
google AI Studio - gemini	✅	✅	✅	✅
mistral ai api	✅	✅	✅	✅	✅
cloudflare AI Workers	✅	✅	✅	✅
cohere	✅	✅	✅	✅	✅
anthropic	✅	✅	✅	✅
empower	✅	✅	✅	✅
huggingface	✅	✅	✅	✅	✅
replicate	✅	✅	✅	✅
together_ai	✅	✅	✅	✅
openrouter	✅	✅	✅	✅
ai21	✅	✅	✅	✅
baseten	✅	✅	✅	✅
vllm	✅	✅	✅	✅
nlp_cloud	✅	✅	✅	✅
aleph alpha	✅	✅	✅	✅
petals	✅	✅	✅	✅
ollama	✅	✅	✅	✅	✅
deepinfra	✅	✅	✅	✅
perplexity-ai	✅	✅	✅	✅
Groq AI	✅	✅	✅	✅
Deepseek	✅	✅	✅	✅
anyscale	✅	✅	✅	✅
IBM - watsonx.ai	✅	✅	✅	✅	✅
voyage ai					✅
xinference [Xorbits Inference]					✅
FriendliAI	✅	✅	✅	✅
Galadriel	✅	✅	✅	✅
Novita AI	✅	✅	✅	✅
Featherless AI	✅	✅	✅	✅
Nebius AI Studio	✅	✅	✅	✅	✅

Read the Docs

Contributing

Interested in contributing? Contributions to LiteLLM Python SDK, Proxy Server, and LLM integrations are both accepted and highly encouraged!

Quick start: git clone → make install-dev → make format → make lint → make test-unit

See our comprehensive Contributing Guide (CONTRIBUTING.md) for detailed instructions.

Enterprise

For companies that need better security, user management and professional support

Talk to founders

This covers:

✅ Features under the LiteLLM Commercial License:
✅ Feature Prioritization
✅ Custom Integrations
✅ Professional Support - Dedicated discord + slack
✅ Custom SLAs
✅ Secure access with Single Sign-On

Contributing

We welcome contributions to LiteLLM! Whether you're fixing bugs, adding features, or improving documentation, we appreciate your help.

Quick Start for Contributors

git clone https://github.com/BerriAI/litellm.git
cd litellm
make install-dev    # Install development dependencies
make format         # Format your code
make lint           # Run all linting checks
make test-unit      # Run unit tests

For detailed contributing guidelines, see CONTRIBUTING.md.

Code Quality / Linting

LiteLLM follows the Google Python Style Guide.

Our automated checks include:

Black for code formatting
Ruff for linting and code quality
MyPy for type checking
Circular import detection
Import safety checks

Run all checks locally:

make lint           # Run all linting (matches CI)
make format-check   # Check formatting only

All these checks must pass before your PR can be merged.

Support / talk with founders

Schedule Demo 👋
Community Discord 💭
Community Slack 💭
Our numbers 📞 +1 (770) 8783-106 / +1 (412) 618-6238
Our emails ✉️ ishaan@berri.ai / krrish@berri.ai

Why did we build this

Need for simplicity: Our code started to get extremely complicated managing & translating calls between Azure, OpenAI and Cohere.

Contributors

Run in Developer mode

Services

Set up .env file in root
Run dependent services: docker-compose up db prometheus

Backend

(In root) Create a virtual environment: python -m venv .venv
Activate the virtual environment: source .venv/bin/activate
Install dependencies: pip install -e ".[all]"
Start the proxy backend: uvicorn litellm.proxy.proxy_server:app --host localhost --port 4000 --reload

Frontend

Navigate to ui/litellm-dashboard
Install dependencies: npm install
Run npm run dev to start the dashboard

Description

Python SDK, Proxy Server (AI Gateway) to call 100+ LLM APIs in OpenAI (or native) format, with cost tracking, guardrails, load balancing and logging. [Bedrock, Azure, OpenAI, VertexAI, Cohere, Anthropic, Sagemaker, HuggingFace, VLLM, NVIDIA NIM]

Readme MIT 1.1 GiB

Languages

Python 81%

TypeScript 12.2%

JavaScript 5.9%

HTML 0.5%

HCL 0.2%
