mirror of
https://github.com/tiennm99/litellm.git
synced 2026-07-05 13:07:08 +00:00
63a97db663
* feat(voyage): add rerank API support
Add support for Voyage AI rerank models (rerank-2.5, rerank-2.5-lite,
rerank-2, rerank-2-lite) to the LiteLLM rerank API.
Changes:
- Add VoyageRerankConfig transformation class
- Register voyage provider in rerank_api/main.py
- Add voyage case in utils.py get_provider_rerank_config
- Add rerank-2.5 and rerank-2.5-lite models to pricing JSON
- Add unit tests for transformation logic
- Update documentation for voyage.md and rerank.md
Usage:
```python
from litellm import rerank
response = rerank(
    model="voyage/rerank-2.5",
    query="What is the capital of France?",
    documents=["Paris is...", "London is..."],
    top_n=3,
)
```
* refactor(voyage): simplify rerank transformation code
Remove verbose docstrings to align with other providers (jina_ai pattern).
No functional changes - 168 lines vs 169 for jina_ai.
* fix(voyage): remove incorrect input_cost_per_query from rerank models
Voyage AI charges per token, not per query. The input_cost_per_query
field was incorrectly set to the same value as input_cost_per_token
in the existing rerank-2 and rerank-2-lite models.
Removes input_cost_per_query from all Voyage rerank models:
- voyage/rerank-2
- voyage/rerank-2-lite
- voyage/rerank-2.5
- voyage/rerank-2.5-lite
Pricing source: https://docs.voyageai.com/docs/pricing
/rerank
:::tip
LiteLLM follows the Cohere API request/response format for the rerank API.
:::
Overview
| Feature | Supported | Notes |
|---|---|---|
| Cost Tracking | ✅ | Works with all supported models |
| Logging | ✅ | Works across all integrations |
| End-user Tracking | ✅ | |
| Fallbacks | ✅ | Works between supported models |
| Loadbalancing | ✅ | Works between supported models |
| Guardrails | ✅ | Applies to input query only (not documents) |
| Supported Providers | Cohere, Together AI, Azure AI, DeepInfra, Nvidia NIM, Infinity, Fireworks AI, Voyage AI | |
LiteLLM Python SDK Usage
Quick Start
from litellm import rerank
import os
os.environ["COHERE_API_KEY"] = "sk-.."
query = "What is the capital of the United States?"
documents = [
    "Carson City is the capital city of the American state of Nevada.",
    "The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean. Its capital is Saipan.",
    "Washington, D.C. is the capital of the United States.",
    "Capital punishment has existed in the United States since before it was a country.",
]
response = rerank(
    model="cohere/rerank-english-v3.0",
    query=query,
    documents=documents,
    top_n=3,
)
print(response)
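Because the response follows the Cohere rerank shape, each entry in `results` is expected to carry an `index` into the original `documents` list and a `relevance_score`. The sketch below maps such results back to the document text. It assumes that shape; `top_documents` is a hypothetical helper (not part of LiteLLM), and the scores in `sample_results` are invented for illustration.

```python
from typing import Any, Dict, List, Tuple


def top_documents(
    results: List[Dict[str, Any]], documents: List[str]
) -> List[Tuple[str, float]]:
    """Pair each result with its source text, highest score first."""
    ranked = sorted(results, key=lambda r: r["relevance_score"], reverse=True)
    return [(documents[r["index"]], r["relevance_score"]) for r in ranked]


documents = [
    "Carson City is the capital city of the American state of Nevada.",
    "The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean. Its capital is Saipan.",
    "Washington, D.C. is the capital of the United States.",
    "Capital punishment has existed in the United States since before it was a country.",
]

# Illustrative values only: not real model output.
sample_results = [
    {"index": 0, "relevance_score": 0.31},
    {"index": 2, "relevance_score": 0.98},
    {"index": 3, "relevance_score": 0.12},
]

for text, score in top_documents(sample_results, documents):
    print(f"{score:.2f}  {text}")
```

With a live call, the list would come from the `results` field of the rerank response; check how your LiteLLM version exposes it before relying on this.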
Async Usage
from litellm import arerank
import os, asyncio
os.environ["COHERE_API_KEY"] = "sk-.."
async def test_async_rerank():
    query = "What is the capital of the United States?"
    documents = [
        "Carson City is the capital city of the American state of Nevada.",
        "The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean. Its capital is Saipan.",
        "Washington, D.C. is the capital of the United States.",
        "Capital punishment has existed in the United States since before it was a country.",
    ]
    response = await arerank(
        model="cohere/rerank-english-v3.0",
        query=query,
        documents=documents,
        top_n=3,
    )
    print(response)
asyncio.run(test_async_rerank())
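One reason to prefer the async call is that several queries can be reranked concurrently. The sketch below shows that pattern with `asyncio.gather`. So that it runs offline, it uses `fake_arerank`, a made-up stand-in that scores by shared words; `rerank_many` is also a hypothetical helper. In real use you would pass `litellm.arerank` as `rerank_fn`, on the assumption that it accepts the same keyword arguments shown above.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List


async def rerank_many(
    rerank_fn: Callable[..., Awaitable[Any]],
    model: str,
    queries: List[str],
    documents: List[str],
    top_n: int,
) -> List[Any]:
    """Run one rerank call per query concurrently; results keep query order."""
    tasks = [
        rerank_fn(model=model, query=q, documents=documents, top_n=top_n)
        for q in queries
    ]
    return await asyncio.gather(*tasks)


async def fake_arerank(
    *, model: str, query: str, documents: List[str], top_n: int
) -> Dict[str, Any]:
    """Offline stand-in (NOT a real provider): scores by shared lowercase words."""
    await asyncio.sleep(0)  # yield control, as a network call would
    query_words = set(query.lower().split())
    scored = [
        {
            "index": i,
            "relevance_score": float(
                len(query_words & set(doc.lower().replace(".", "").split()))
            ),
        }
        for i, doc in enumerate(documents)
    ]
    scored.sort(key=lambda r: r["relevance_score"], reverse=True)
    return {"results": scored[:top_n]}


responses = asyncio.run(
    rerank_many(
        fake_arerank,
        model="cohere/rerank-english-v3.0",
        queries=["capital of Nevada", "capital of France"],
        documents=[
            "Carson City is the capital of Nevada.",
            "Paris is the capital of France.",
        ],
        top_n=1,
    )
)
print(responses)
```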
LiteLLM Proxy Usage
LiteLLM provides a Cohere API-compatible /rerank endpoint for rerank calls.
Setup
Add this to your litellm proxy config.yaml
model_list:
  - model_name: Salesforce/Llama-Rank-V1
    litellm_params:
      model: together_ai/Salesforce/Llama-Rank-V1
      api_key: os.environ/TOGETHERAI_API_KEY
  - model_name: rerank-english-v3.0
    litellm_params:
      model: cohere/rerank-english-v3.0
      api_key: os.environ/COHERE_API_KEY
Start litellm
litellm --config /path/to/config.yaml
# RUNNING on http://0.0.0.0:4000
Test request
curl http://0.0.0.0:4000/rerank \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "rerank-english-v3.0",
    "query": "What is the capital of the United States?",
    "documents": [
      "Carson City is the capital city of the American state of Nevada.",
      "The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean. Its capital is Saipan.",
      "Washington, D.C. is the capital of the United States.",
      "Capital punishment has existed in the United States since before it was a country."
    ],
    "top_n": 3
}'
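The same request can be built from Python using only the standard library. The sketch below mirrors the curl call above (same URL, headers, and JSON body) and stops short of sending it, so it can be inspected without a running proxy. `build_rerank_request` is a hypothetical helper name, and `sk-1234` is the placeholder key from the curl example.

```python
import json
import urllib.request
from typing import List


def build_rerank_request(
    base_url: str,
    api_key: str,
    model: str,
    query: str,
    documents: List[str],
    top_n: int,
) -> urllib.request.Request:
    """Build a POST request for the proxy's /rerank endpoint without sending it."""
    payload = {
        "model": model,
        "query": query,
        "documents": documents,
        "top_n": top_n,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/rerank",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_rerank_request(
    base_url="http://0.0.0.0:4000",
    api_key="sk-1234",
    model="rerank-english-v3.0",
    query="What is the capital of the United States?",
    documents=[
        "Carson City is the capital city of the American state of Nevada.",
        "Washington, D.C. is the capital of the United States.",
    ],
    top_n=3,
)
print(request.get_method(), request.full_url)
```

To send it against a running proxy, pass `request` to `urllib.request.urlopen` and decode the JSON body of the response.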
Supported Providers
⚡️See all supported models and providers at models.litellm.ai
| Provider | Link to Usage |
|---|---|
| Cohere (v1 + v2 clients) | Usage |
| Together AI | Usage |
| Azure AI | Usage |
| Jina AI | Usage |
| AWS Bedrock | Usage |
| HuggingFace | Usage |
| Infinity | Usage |
| vLLM | Usage |
| DeepInfra | Usage |
| Vertex AI | Usage |
| Fireworks AI | Usage |
| Voyage AI | Usage |