Add embedding model documentation

Sameer Kankute
2026-03-11 11:02:49 +05:30
parent 8c5478df70
commit 1c144fc896
3 changed files with 286 additions and 0 deletions
@@ -0,0 +1,169 @@
---
slug: gemini_embedding_2_multimodal
title: "Gemini Embedding 2 Preview: Multimodal Embeddings on LiteLLM"
date: 2025-03-11T10:00:00
authors:
- name: Sameer Kankute
title: SWE @ LiteLLM (LLM Translation)
url: https://www.linkedin.com/in/sameer-kankute/
image_url: https://pbs.twimg.com/profile_images/2001352686994907136/ONgNuSk5_400x400.jpg
description: "Generate embeddings from text, images, audio, video, and PDFs with gemini-embedding-2-preview on LiteLLM via Gemini API and Vertex AI."
tags: [gemini, embeddings, multimodal, vertex ai]
hide_table_of_contents: false
---
import Tabs from '@theme/Tabs';
import TabItem from '@theme/TabItem';
# Gemini Embedding 2 Preview: Multimodal Embeddings
LiteLLM now supports **multimodal embeddings** with `gemini-embedding-2-preview`, which generates a single embedding from a mix of text, images, audio, video, and PDF content. The model is available through both the **Gemini API** (API key) and **Vertex AI** (GCP credentials).
## Supported Input Types
| Modality | Supported Formats |
|----------|-------------------|
| **Text** | Plain text |
| **Image** | PNG, JPEG |
| **Audio** | MP3, WAV |
| **Video** | MP4, MOV |
| **Documents** | PDF |
## Input Formats
LiteLLM accepts three input formats for multimodal content:
1. **Data URIs:** Base64-encoded inline data, such as `data:image/png;base64,<encoded_data>`
2. **GCS URLs:** Cloud Storage paths (Vertex AI), such as `gs://bucket/path/to/file.png`
3. **Gemini File References:** Pre-uploaded files (Gemini API), such as `files/abc123`
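To make the distinction between the three formats concrete, here is a minimal sketch that labels each input string by its prefix. The helper `classify_input` is hypothetical and written only for illustration; it is not part of LiteLLM and does not claim to mirror how LiteLLM detects formats internally.

```python
def classify_input(item: str) -> str:
    """Label an input string by the format it appears to use.

    Hypothetical helper for illustration only; not a LiteLLM API.
    """
    if item.startswith("data:") and ";base64," in item:
        return "data_uri"
    if item.startswith("gs://"):
        return "gcs_url"
    if item.startswith("files/"):
        return "file_reference"
    # Anything else is treated as plain text in this sketch.
    return "text"


inputs = [
    "The food was delicious and the waiter...",
    "data:image/png;base64,<encoded_data>",
    "gs://bucket/path/to/file.png",
    "files/abc123",
]
for item in inputs:
    print(classify_input(item), "<-", item[:30])
```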
## Quick Start
<Tabs>
<TabItem value="gemini" label="Gemini API">
```python
from litellm import embedding
import os

os.environ["GEMINI_API_KEY"] = "your-api-key"

# Text + Image (base64)
response = embedding(
    model="gemini/gemini-embedding-2-preview",
    input=[
        "The food was delicious and the waiter...",
        "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAgAAAAIAQMAAAD+wSzIAAAABlBMVEX///+/v7+jQ3Y5AAAADklEQVQI12P4AIX8EAgALgAD/aNpbtEAAAAASUVORK5CYII"
    ],
)
print(response)
```
</TabItem>
<TabItem value="vertex" label="Vertex AI">
```python
import litellm
from litellm import embedding

litellm.vertex_project = "your-project-id"
litellm.vertex_location = "us-central1"

# Text + Image (GCS URL)
response = embedding(
    model="vertex_ai/gemini-embedding-2-preview",
    input=[
        "Describe this image",
        "gs://my-bucket/images/photo.png"
    ],
)
print(response)
```
</TabItem>
<TabItem value="proxy" label="LiteLLM Proxy">
**1. Config (config.yaml)**
```yaml
model_list:
  - model_name: gemini-embedding-2-preview
    litellm_params:
      model: gemini/gemini-embedding-2-preview
      api_key: os.environ/GEMINI_API_KEY
  - model_name: vertex-gemini-embedding-2-preview
    litellm_params:
      model: vertex_ai/gemini-embedding-2-preview
      vertex_project: os.environ/VERTEXAI_PROJECT
      vertex_location: os.environ/VERTEXAI_LOCATION

general_settings:
  master_key: sk-1234
```
**2. Start proxy**
```bash
litellm --config config.yaml
```
**3. Call embeddings**
```bash
curl -X POST http://localhost:4000/embeddings \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-embedding-2-preview",
    "input": [
      "The food was delicious and the waiter...",
      "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAgAAAAIAQMAAAD+wSzIAAAABlBMVEX///+/v7+jQ3Y5AAAADklEQVQI12P4AIX8EAgALgAD/aNpbtEAAAAASUVORK5CYII"
    ]
  }'
```
</TabItem>
</Tabs>
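The curl call from the LiteLLM Proxy tab can also be issued from Python. The sketch below builds the same POST request with the standard library only. It assumes the proxy URL and master key shown in the config above. The names `build_embedding_request`, `PROXY_URL`, and `PROXY_KEY` are hypothetical and chosen for this example. The request is constructed but not sent, since sending it requires a running proxy.

```python
import json
import urllib.request

# Values taken from the proxy example above; adjust for your deployment.
PROXY_URL = "http://localhost:4000/embeddings"
PROXY_KEY = "sk-1234"


def build_embedding_request(model: str, inputs: list) -> urllib.request.Request:
    """Build (but do not send) a POST request for the proxy's /embeddings route.

    Hypothetical helper for illustration only.
    """
    body = json.dumps({"model": model, "input": inputs}).encode("utf-8")
    return urllib.request.Request(
        PROXY_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {PROXY_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


request = build_embedding_request(
    "gemini-embedding-2-preview",
    ["The food was delicious and the waiter..."],
)
print(request.get_method(), request.full_url)

# With the proxy running, the request could be sent like this:
# with urllib.request.urlopen(request) as resp:
#     print(json.load(resp))
```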
## Input Format Examples
| Format | Example | Provider |
|--------|---------|----------|
| **Data URI** | `data:image/png;base64,...` | Gemini, Vertex AI |
| **GCS URL** | `gs://bucket/path/image.png` | Vertex AI |
| **File reference** | `files/abc123` | Gemini API only |
### Supported MIME Types for Data URIs
- **Images:** `image/png`, `image/jpeg`
- **Audio:** `audio/mpeg`, `audio/wav`
- **Video:** `video/mp4`, `video/quicktime`
- **Documents:** `application/pdf`
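Local files need to be base64-encoded before they can be passed as data URIs. A minimal sketch of that step follows, using only the standard library. The helper `to_data_uri` and the constant `SUPPORTED_MIME_TYPES` are hypothetical names for this example; the set simply restates the MIME types listed above.

```python
import base64

# MIME types listed in this post for data URIs.
SUPPORTED_MIME_TYPES = {
    "image/png",
    "image/jpeg",
    "audio/mpeg",
    "audio/wav",
    "video/mp4",
    "video/quicktime",
    "application/pdf",
}


def to_data_uri(raw: bytes, mime_type: str) -> str:
    """Encode raw bytes as a base64 data URI.

    Hypothetical helper for illustration only; not a LiteLLM API.
    """
    if mime_type not in SUPPORTED_MIME_TYPES:
        raise ValueError(f"Unsupported MIME type: {mime_type}")
    encoded = base64.b64encode(raw).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# The 8-byte PNG signature stands in for real file contents here.
png_signature = b"\x89PNG\r\n\x1a\n"
uri = to_data_uri(png_signature, "image/png")
print(uri)
```

In practice, `raw` would come from reading a file in binary mode, and the resulting string would go into the `input` list alongside any text items.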
### GCS URL MIME Inference
For Vertex AI, MIME types are inferred from file extensions:
- `.png` → `image/png`
- `.jpg` / `.jpeg` → `image/jpeg`
- `.mp3` → `audio/mpeg`
- `.wav` → `audio/wav`
- `.mp4` → `video/mp4`
- `.mov` → `video/quicktime`
- `.pdf` → `application/pdf`
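The extension mapping above can be expressed as a small lookup. This is a sketch of the idea rather than LiteLLM's actual implementation: `infer_mime_type` and `EXTENSION_TO_MIME` are hypothetical names, and lower-casing the extension before the lookup is an assumption of this example.

```python
import posixpath

# Extension-to-MIME mapping as listed above.
EXTENSION_TO_MIME = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".mp3": "audio/mpeg",
    ".wav": "audio/wav",
    ".mp4": "video/mp4",
    ".mov": "video/quicktime",
    ".pdf": "application/pdf",
}


def infer_mime_type(gcs_url: str) -> str:
    """Infer a MIME type from the file extension of a gs:// URL.

    Hypothetical helper for illustration only; not a LiteLLM API.
    """
    if not gcs_url.startswith("gs://"):
        raise ValueError(f"Not a GCS URL: {gcs_url}")
    # Assumption: extensions are matched case-insensitively.
    extension = posixpath.splitext(gcs_url)[1].lower()
    if extension not in EXTENSION_TO_MIME:
        raise ValueError(f"Cannot infer MIME type from extension: {extension!r}")
    return EXTENSION_TO_MIME[extension]


print(infer_mime_type("gs://my-bucket/images/photo.png"))
```

A file without one of the listed extensions raises an error in this sketch, which suggests naming GCS objects with an extension from the list.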
## Optional Parameters
| Parameter | Description | Maps to |
|-----------|-------------|---------|
| `dimensions` | Output embedding size | `outputDimensionality` |
```python
from litellm import embedding

response = embedding(
    model="gemini/gemini-embedding-2-preview",
    input=["text to embed"],
    dimensions=768,  # Optional: control output vector size
)
```
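The "Maps to" column describes a rename from the OpenAI-style parameter to the provider's field name. A minimal sketch of that kind of translation is shown below. `PARAM_MAP` and `map_optional_params` are hypothetical names for this example and do not describe LiteLLM's internal code.

```python
# OpenAI-style parameter name -> Gemini field name, per the table above.
PARAM_MAP = {"dimensions": "outputDimensionality"}


def map_optional_params(openai_params: dict) -> dict:
    """Rename supported OpenAI-style params to their Gemini equivalents.

    Hypothetical helper for illustration only. Unknown keys and None
    values are dropped in this sketch.
    """
    return {
        PARAM_MAP[name]: value
        for name, value in openai_params.items()
        if name in PARAM_MAP and value is not None
    }


print(map_optional_params({"dimensions": 768}))
```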
@@ -514,6 +514,57 @@ All models listed [here](https://ai.google.dev/gemini-api/docs/models/gemini) ar
| Model Name | Function Call |
| :--- | :--- |
| text-embedding-004 | `embedding(model="gemini/text-embedding-004", input)` |
| gemini-embedding-2-preview ([Multimodal docs](#gemini-embedding-2-preview-multimodal)) | `embedding(model="gemini/gemini-embedding-2-preview", input)` |
### Gemini Embedding 2 Preview (Multimodal)
`gemini-embedding-2-preview` supports **multimodal embeddings**: text, images, audio, video, and PDF in a single request. See the [blog post](/blog/gemini_embedding_2_multimodal) for details.
**Input formats:**
- **Data URIs:** `data:image/png;base64,<encoded_data>`
- **Gemini file references:** `files/abc123` (pre-uploaded via Gemini Files API)
**Supported MIME types:** `image/png`, `image/jpeg`, `audio/mpeg`, `audio/wav`, `video/mp4`, `video/quicktime`, `application/pdf`
<Tabs>
<TabItem value="sdk" label="SDK">
```python
from litellm import embedding
import os

os.environ["GEMINI_API_KEY"] = ""

# Text + Image (base64)
response = embedding(
    model="gemini/gemini-embedding-2-preview",
    input=[
        "The food was delicious and the waiter...",
        "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAgAAAAIAQMAAAD+wSzIAAAABlBMVEX///+/v7+jQ3Y5AAAADklEQVQI12P4AIX8EAgALgAD/aNpbtEAAAAASUVORK5CYII"
    ],
)
print(response)
```
</TabItem>
<TabItem value="proxy" label="PROXY">
```bash
curl -X POST http://localhost:4000/embeddings \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemini-embedding-2-preview",
    "input": [
      "The food was delicious and the waiter...",
      "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAgAAAAIAQMAAAD+wSzIAAAABlBMVEX///+/v7+jQ3Y5AAAADklEQVQI12P4AIX8EAgALgAD/aNpbtEAAAAASUVORK5CYII"
    ]
  }'
```
</TabItem>
</Tabs>
**Optional:** `dimensions` maps to Gemini's `outputDimensionality`.
## Vertex AI Embedding Models
@@ -79,6 +79,7 @@ All models listed [here](https://github.com/BerriAI/litellm/blob/57f37f743886a02
| textembedding-gecko@003 | `embedding(model="vertex_ai/textembedding-gecko@003", input)` |
| text-embedding-preview-0409 | `embedding(model="vertex_ai/text-embedding-preview-0409", input)` |
| text-multilingual-embedding-preview-0409 | `embedding(model="vertex_ai/text-multilingual-embedding-preview-0409", input)` |
| gemini-embedding-2-preview ([Multimodal docs](#gemini-embedding-2-preview-multimodal)) | `embedding(model="vertex_ai/gemini-embedding-2-preview", input)` |
| Fine-tuned OR Custom Embedding models | `embedding(model="vertex_ai/<your-model-id>", input)` |
### Supported OpenAI (Unified) Params
@@ -257,6 +258,71 @@ model_list:
## **Multi-Modal Embeddings**
### Gemini Embedding 2 Preview (Multimodal)
`gemini-embedding-2-preview` supports **unified multimodal embeddings**: text, images, audio, video, and PDF in a single request. See the [blog post](/blog/gemini_embedding_2_multimodal) for details.
**Input formats:**
- **Data URIs:** `data:image/png;base64,<encoded_data>`
- **GCS URLs:** `gs://bucket/path/to/file.png` (MIME type inferred from extension)
**Supported MIME types:** `image/png`, `image/jpeg`, `audio/mpeg`, `audio/wav`, `video/mp4`, `video/quicktime`, `application/pdf`
<Tabs>
<TabItem value="sdk" label="SDK">
```python
import litellm
from litellm import embedding

litellm.vertex_project = "your-project-id"
litellm.vertex_location = "us-central1"

# Text + Image (GCS URL)
response = embedding(
    model="vertex_ai/gemini-embedding-2-preview",
    input=[
        "Describe this image",
        "gs://my-bucket/images/photo.png"
    ],
)

# Text + Image (base64)
response = embedding(
    model="vertex_ai/gemini-embedding-2-preview",
    input=[
        "The food was delicious",
        "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAgAAAAIAQMAAAD+wSzIAAAABlBMVEX///+/v7+jQ3Y5AAAADklEQVQI12P4AIX8EAgALgAD/aNpbtEAAAAASUVORK5CYII"
    ],
)
```
</TabItem>
<TabItem value="proxy" label="LiteLLM PROXY">
```yaml
model_list:
  - model_name: vertex-gemini-embedding-2-preview
    litellm_params:
      model: vertex_ai/gemini-embedding-2-preview
      vertex_project: "your-project-id"
      vertex_location: "us-central1"
```
```bash
curl -X POST http://localhost:4000/embeddings \
  -H "Authorization: Bearer sk-1234" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "vertex-gemini-embedding-2-preview",
    "input": ["Describe this", "gs://bucket/image.png"]
  }'
```
</TabItem>
</Tabs>
### multimodalembedding@001 (Legacy)
Known Limitations:
- Only supports 1 image or video per request