litellm/docs/my-website/docs/rerank.md
Cesar Garcia 63a97db663 feat(voyage): add rerank API support (#17744)
* feat(voyage): add rerank API support

Add support for Voyage AI rerank models (rerank-2.5, rerank-2.5-lite,
rerank-2, rerank-2-lite) to the LiteLLM rerank API.

Changes:
- Add VoyageRerankConfig transformation class
- Register voyage provider in rerank_api/main.py
- Add voyage case in utils.py get_provider_rerank_config
- Add rerank-2.5 and rerank-2.5-lite models to pricing JSON
- Add unit tests for transformation logic
- Update documentation for voyage.md and rerank.md

Usage:
```python
from litellm import rerank

response = rerank(
    model="voyage/rerank-2.5",
    query="What is the capital of France?",
    documents=["Paris is...", "London is..."],
    top_n=3,
)
```

* refactor(voyage): simplify rerank transformation code

Remove verbose docstrings to align with other providers (jina_ai pattern).
No functional changes - 168 lines vs 169 for jina_ai.

* fix(voyage): remove incorrect input_cost_per_query from rerank models

Voyage AI charges per token, not per query. The input_cost_per_query
field was incorrectly set to the same value as input_cost_per_token
in the existing rerank-2 and rerank-2-lite models.

Removes input_cost_per_query from all Voyage rerank models:
- voyage/rerank-2
- voyage/rerank-2-lite
- voyage/rerank-2.5
- voyage/rerank-2.5-lite

Pricing source: https://docs.voyageai.com/docs/pricing
2025-12-09 17:34:09 -08:00


/rerank

:::tip

LiteLLM follows the Cohere API request/response format for the rerank API.

:::

Overview

| Feature | Supported | Notes |
|---------|-----------|-------|
| Cost Tracking | ✅ | Works with all supported models |
| Logging | ✅ | Works across all integrations |
| End-user Tracking | ✅ | |
| Fallbacks | ✅ | Works between supported models |
| Loadbalancing | ✅ | Works between supported models |
| Guardrails | ✅ | Applies to input query only (not documents) |
| Supported Providers | Cohere, Together AI, Azure AI, DeepInfra, Nvidia NIM, Infinity, Fireworks AI, Voyage AI | |

LiteLLM Python SDK Usage

Quick Start

```python
from litellm import rerank
import os

os.environ["COHERE_API_KEY"] = "sk-.."

query = "What is the capital of the United States?"
documents = [
    "Carson City is the capital city of the American state of Nevada.",
    "The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean. Its capital is Saipan.",
    "Washington, D.C. is the capital of the United States.",
    "Capital punishment has existed in the United States since before it was a country.",
]

response = rerank(
    model="cohere/rerank-english-v3.0",
    query=query,
    documents=documents,
    top_n=3,
)
print(response)
```
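
Because the rerank API follows the Cohere response format, each entry in the response's `results` list is expected to carry an `index` into the original `documents` list and a `relevance_score`. The sketch below maps such results back to their text under that assumption. `top_documents` is a hypothetical helper, not part of LiteLLM, and the sample scores are invented for illustration rather than taken from a real model.

```python
from typing import Any, Dict, List, Tuple


def top_documents(
    results: List[Dict[str, Any]], documents: List[str]
) -> List[Tuple[str, float]]:
    """Pair each result with its source document, highest score first."""
    ranked = sorted(results, key=lambda r: r["relevance_score"], reverse=True)
    return [(documents[r["index"]], r["relevance_score"]) for r in ranked]


# Illustrative data only: these scores are invented, not model output.
sample_documents = [
    "Carson City is the capital city of the American state of Nevada.",
    "Washington, D.C. is the capital of the United States.",
]
sample_results = [
    {"index": 0, "relevance_score": 0.21},
    {"index": 1, "relevance_score": 0.98},
]

for text, score in top_documents(sample_results, sample_documents):
    print(f"{score:.2f}  {text}")
```

With a live response, passing the response's results and the original `documents` list to the helper should behave the same way, provided the entries are dict-like as assumed here.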

Async Usage

```python
from litellm import arerank
import asyncio
import os

os.environ["COHERE_API_KEY"] = "sk-.."

async def test_async_rerank():
    query = "What is the capital of the United States?"
    documents = [
        "Carson City is the capital city of the American state of Nevada.",
        "The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean. Its capital is Saipan.",
        "Washington, D.C. is the capital of the United States.",
        "Capital punishment has existed in the United States since before it was a country.",
    ]

    response = await arerank(
        model="cohere/rerank-english-v3.0",
        query=query,
        documents=documents,
        top_n=3,
    )
    print(response)

asyncio.run(test_async_rerank())
```
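
One reason to reach for the async variant is to rerank the same documents against several queries at once. The sketch below shows one way to do that with `asyncio.gather`. `rerank_many` and `fake_rerank` are hypothetical names introduced here; the stub stands in for `arerank` so the example runs without an API key, and it only echoes its arguments.

```python
import asyncio
from typing import Any, Awaitable, Callable, List


async def rerank_many(
    rerank_fn: Callable[..., Awaitable[Any]],
    model: str,
    queries: List[str],
    documents: List[str],
    top_n: int = 3,
) -> List[Any]:
    """Issue one rerank call per query concurrently; output keeps query order."""
    tasks = [
        rerank_fn(model=model, query=q, documents=documents, top_n=top_n)
        for q in queries
    ]
    return await asyncio.gather(*tasks)


async def fake_rerank(**kwargs: Any) -> dict:
    """Stand-in for an async rerank call; returns an echo of its inputs."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return {"query": kwargs["query"], "top_n": kwargs["top_n"]}


responses = asyncio.run(
    rerank_many(fake_rerank, "cohere/rerank-english-v3.0", ["q1", "q2"], ["a", "b"])
)
print(responses)
```

Swapping `fake_rerank` for `arerank` should be enough to run this against a real provider, since the keyword arguments match those used in the example above.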

LiteLLM Proxy Usage

LiteLLM provides a Cohere API-compatible `/rerank` endpoint for rerank calls.

Setup

Add this to your LiteLLM proxy `config.yaml`:

```yaml
model_list:
  - model_name: Salesforce/Llama-Rank-V1
    litellm_params:
      model: together_ai/Salesforce/Llama-Rank-V1
      api_key: os.environ/TOGETHERAI_API_KEY
  - model_name: rerank-english-v3.0
    litellm_params:
      model: cohere/rerank-english-v3.0
      api_key: os.environ/COHERE_API_KEY
```

Start litellm

```bash
litellm --config /path/to/config.yaml

# RUNNING on http://0.0.0.0:4000
```

Test request

```bash
curl http://0.0.0.0:4000/rerank \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "rerank-english-v3.0",
    "query": "What is the capital of the United States?",
    "documents": [
        "Carson City is the capital city of the American state of Nevada.",
        "The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean. Its capital is Saipan.",
        "Washington, D.C. is the capital of the United States.",
        "Capital punishment has existed in the United States since before it was a country."
    ],
    "top_n": 3
  }'
```
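
The same request can be built from Python with only the standard library. The sketch below mirrors the curl call, assuming the proxy address and the `sk-1234` key shown above. `build_rerank_request` is a hypothetical helper, and the request is constructed but not sent, so the block runs without a proxy.

```python
import json
import urllib.request
from typing import List


def build_rerank_request(
    base_url: str,
    api_key: str,
    model: str,
    query: str,
    documents: List[str],
    top_n: int,
) -> urllib.request.Request:
    """Build a POST request for the proxy's /rerank endpoint (not sent here)."""
    payload = {
        "model": model,
        "query": query,
        "documents": documents,
        "top_n": top_n,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/rerank",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_rerank_request(
    base_url="http://0.0.0.0:4000",
    api_key="sk-1234",
    model="rerank-english-v3.0",
    query="What is the capital of the United States?",
    documents=[
        "Carson City is the capital city of the American state of Nevada.",
        "Washington, D.C. is the capital of the United States.",
    ],
    top_n=3,
)

# To send it against a running proxy:
# with urllib.request.urlopen(request) as resp:
#     print(json.load(resp))
```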

Supported Providers

See all supported models and providers at models.litellm.ai

| Provider | Link to Usage |
|----------|---------------|
| Cohere (v1 + v2 clients) | Usage |
| Together AI | Usage |
| Azure AI | Usage |
| Jina AI | Usage |
| AWS Bedrock | Usage |
| HuggingFace | Usage |
| Infinity | Usage |
| vLLM | Usage |
| DeepInfra | Usage |
| Vertex AI | Usage |
| Fireworks AI | Usage |
| Voyage AI | Usage |