Add embedding model documentation

2026-06-25 17:07:16 +00:00 · 2026-03-11 11:02:49 +05:30
parent 8c5478df70
commit 1c144fc896
3 changed files with 286 additions and 0 deletions
@@ -0,0 +1,169 @@
+---
+slug: gemini_embedding_2_multimodal
+title: "Gemini Embedding 2 Preview: Multimodal Embeddings on LiteLLM"
+date: 2025-03-11T10:00:00
+authors:
+  - name: Sameer Kankute
+    title: SWE @ LiteLLM (LLM Translation)
+    url: https://www.linkedin.com/in/sameer-kankute/
+    image_url: https://pbs.twimg.com/profile_images/2001352686994907136/ONgNuSk5_400x400.jpg
+description: "Generate embeddings from text, images, audio, video, and PDFs with gemini-embedding-2-preview on LiteLLM via Gemini API and Vertex AI."
+tags: [gemini, embeddings, multimodal, vertex ai]
+hide_table_of_contents: false
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+# Gemini Embedding 2 Preview: Multimodal Embeddings
+
+LiteLLM now supports **multimodal embeddings** with `gemini-embedding-2-preview`—generating a single embedding from a mix of text, images, audio, video, and PDF content. Available via both the **Gemini API** (API key) and **Vertex AI** (GCP credentials).
+
+## Supported Input Types
+
+| Modality | Supported Formats | 
+|----------|-------------------|
+| **Text** | Plain text |
+| **Image** | PNG, JPEG | 
+| **Audio** | MP3, WAV | 
+| **Video** | MP4, MOV | 
+| **Documents** | PDF | 
+
+## Input Formats
+
+LiteLLM accepts three input formats for multimodal content:
+
+1. **Data URIs** – Base64-encoded inline: `data:image/png;base64,<encoded_data>`
+2. **GCS URLs** – Cloud Storage paths (Vertex AI): `gs://bucket/path/to/file.png`
+3. **Gemini File References** – Pre-uploaded files (Gemini API): `files/abc123`
+
+## Quick Start
+
+<Tabs>
+<TabItem value="gemini" label="Gemini API">
+
+```python
+from litellm import embedding
+import os
+
+os.environ["GEMINI_API_KEY"] = "your-api-key"
+
+# Text + Image (base64)
+response = embedding(
+    model="gemini/gemini-embedding-2-preview",
+    input=[
+        "The food was delicious and the waiter...",
+        "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAgAAAAIAQMAAAD+wSzIAAAABlBMVEX///+/v7+jQ3Y5AAAADklEQVQI12P4AIX8EAgALgAD/aNpbtEAAAAASUVORK5CYII"
+    ],
+)
+print(response)
+```
+
+</TabItem>
+
+<TabItem value="vertex" label="Vertex AI">
+
+```python
+import litellm
+from litellm import embedding
+
+litellm.vertex_project = "your-project-id"
+litellm.vertex_location = "us-central1"
+
+# Text + Image (GCS URL)
+response = embedding(
+    model="vertex_ai/gemini-embedding-2-preview",
+    input=[
+        "Describe this image",
+        "gs://my-bucket/images/photo.png"
+    ],
+)
+print(response)
+```
+
+</TabItem>
+
+<TabItem value="proxy" label="LiteLLM Proxy">
+
+**1. Config (config.yaml)**
+
+```yaml
+model_list:
+  - model_name: gemini-embedding-2-preview
+    litellm_params:
+      model: gemini/gemini-embedding-2-preview
+      api_key: os.environ/GEMINI_API_KEY
+  - model_name: vertex-gemini-embedding-2-preview
+    litellm_params:
+      model: vertex_ai/gemini-embedding-2-preview
+      vertex_project: os.environ/VERTEXAI_PROJECT
+      vertex_location: os.environ/VERTEXAI_LOCATION
+
+general_settings:
+  master_key: sk-1234
+```
+
+**2. Start proxy**
+
+```bash
+litellm --config config.yaml
+```
+
+**3. Call embeddings**
+
+```bash
+curl -X POST http://localhost:4000/embeddings \
+  -H "Authorization: Bearer sk-1234" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "gemini-embedding-2-preview",
+    "input": [
+      "The food was delicious and the waiter...",
+      "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAgAAAAIAQMAAAD+wSzIAAAABlBMVEX///+/v7+jQ3Y5AAAADklEQVQI12P4AIX8EAgALgAD/aNpbtEAAAAASUVORK5CYII"
+    ]
+  }'
+```
+
+</TabItem>
+</Tabs>
+
+## Input Format Examples
+
+| Format | Example | Provider |
+|--------|---------|----------|
+| **Data URI** | `data:image/png;base64,...` | Gemini, Vertex AI |
+| **GCS URL** | `gs://bucket/path/image.png` | Vertex AI |
+| **File reference** | `files/abc123` | Gemini API only |
+
+### Supported MIME Types for Data URIs
+
+- **Images:** `image/png`, `image/jpeg`
+- **Audio:** `audio/mpeg`, `audio/wav`
+- **Video:** `video/mp4`, `video/quicktime`
+- **Documents:** `application/pdf`
+
+### GCS URL MIME Inference
+
+For Vertex AI, MIME types are inferred from file extensions:
+
+- `.png` → `image/png`
+- `.jpg` / `.jpeg` → `image/jpeg`
+- `.mp3` → `audio/mpeg`
+- `.wav` → `audio/wav`
+- `.mp4` → `video/mp4`
+- `.mov` → `video/quicktime`
+- `.pdf` → `application/pdf`
+
+## Optional Parameters
+
+| Parameter | Description | Maps to |
+|-----------|-------------|---------|
+| `dimensions` | Output embedding size | `outputDimensionality` |
+
+```python
+response = embedding(
+    model="gemini/gemini-embedding-2-preview",
+    input=["text to embed"],
+    dimensions=768,  # Optional: control output vector size
+)
+```
@@ -514,6 +514,57 @@ All models listed [here](https://ai.google.dev/gemini-api/docs/models/gemini) ar
 | Model Name         | Function Call                                         |
 | :---               | :---                                                  |
 | text-embedding-004 | `embedding(model="gemini/text-embedding-004", input)` |
+| gemini-embedding-2-preview | `embedding(model="gemini/gemini-embedding-2-preview", input)` | [Multimodal docs](#gemini-embedding-2-preview-multimodal) |
+
+### Gemini Embedding 2 Preview (Multimodal)
+
+`gemini-embedding-2-preview` supports **multimodal embeddings**—text, images, audio, video, and PDF in a single request. See [blog post](/blog/gemini_embedding_2_multimodal) for details.
+
+**Input formats:**
+- **Data URIs:** `data:image/png;base64,<encoded_data>`
+- **Gemini file references:** `files/abc123` (pre-uploaded via Gemini Files API)
+
+**Supported MIME types:** `image/png`, `image/jpeg`, `audio/mpeg`, `audio/wav`, `video/mp4`, `video/quicktime`, `application/pdf`
+
+<Tabs>
+<TabItem value="sdk" label="SDK">
+
+```python
+from litellm import embedding
+import os
+os.environ["GEMINI_API_KEY"] = ""
+
+# Text + Image (base64)
+response = embedding(
+    model="gemini/gemini-embedding-2-preview",
+    input=[
+        "The food was delicious and the waiter...",
+        "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAgAAAAIAQMAAAD+wSzIAAAABlBMVEX///+/v7+jQ3Y5AAAADklEQVQI12P4AIX8EAgALgAD/aNpbtEAAAAASUVORK5CYII"
+    ],
+)
+print(response)
+```
+
+</TabItem>
+<TabItem value="proxy" label="PROXY">
+
+```bash
+curl -X POST http://localhost:4000/embeddings \
+  -H "Authorization: Bearer sk-1234" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "gemini-embedding-2-preview",
+    "input": [
+      "The food was delicious and the waiter...",
+      "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAgAAAAIAQMAAAD+wSzIAAAABlBMVEX///+/v7+jQ3Y5AAAADklEQVQI12P4AIX8EAgALgAD/aNpbtEAAAAASUVORK5CYII"
+    ]
+  }'
+```
+
+</TabItem>
+</Tabs>
+
+**Optional:** `dimensions` maps to Gemini's `outputDimensionality`.


 ## Vertex AI Embedding Models
@@ -79,6 +79,7 @@ All models listed [here](https://github.com/BerriAI/litellm/blob/57f37f743886a02
 | textembedding-gecko@003 | `embedding(model="vertex_ai/textembedding-gecko@003", input)` | 
 | text-embedding-preview-0409 | `embedding(model="vertex_ai/text-embedding-preview-0409", input)` |
 | text-multilingual-embedding-preview-0409 | `embedding(model="vertex_ai/text-multilingual-embedding-preview-0409", input)` | 
+| gemini-embedding-2-preview | `embedding(model="vertex_ai/gemini-embedding-2-preview", input)` | [Multimodal docs](#gemini-embedding-2-preview-multimodal) |
 | Fine-tuned OR Custom Embedding models | `embedding(model="vertex_ai/<your-model-id>", input)` | 

 ### Supported OpenAI (Unified) Params
@@ -257,6 +258,71 @@ model_list:

 ## **Multi-Modal Embeddings**

+### Gemini Embedding 2 Preview (Multimodal)
+
+`gemini-embedding-2-preview` supports **unified multimodal embeddings**—text, images, audio, video, and PDF in a single request. See [blog post](/blog/gemini_embedding_2_multimodal) for details.
+
+**Input formats:**
+- **Data URIs:** `data:image/png;base64,<encoded_data>`
+- **GCS URLs:** `gs://bucket/path/to/file.png` (MIME type inferred from extension)
+
+**Supported MIME types:** `image/png`, `image/jpeg`, `audio/mpeg`, `audio/wav`, `video/mp4`, `video/quicktime`, `application/pdf`
+
+<Tabs>
+<TabItem value="sdk" label="SDK">
+
+```python
+import litellm
+from litellm import embedding
+
+litellm.vertex_project = "your-project-id"
+litellm.vertex_location = "us-central1"
+
+# Text + Image (GCS URL)
+response = embedding(
+    model="vertex_ai/gemini-embedding-2-preview",
+    input=[
+        "Describe this image",
+        "gs://my-bucket/images/photo.png"
+    ],
+)
+
+# Text + Image (base64)
+response = embedding(
+    model="vertex_ai/gemini-embedding-2-preview",
+    input=[
+        "The food was delicious",
+        "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAgAAAAIAQMAAAD+wSzIAAAABlBMVEX///+/v7+jQ3Y5AAAADklEQVQI12P4AIX8EAgALgAD/aNpbtEAAAAASUVORK5CYII"
+    ],
+)
+```
+
+</TabItem>
+<TabItem value="proxy" label="LiteLLM PROXY">
+
+```yaml
+model_list:
+  - model_name: vertex-gemini-embedding-2-preview
+    litellm_params:
+      model: vertex_ai/gemini-embedding-2-preview
+      vertex_project: "your-project-id"
+      vertex_location: "us-central1"
+```
+
+```bash
+curl -X POST http://localhost:4000/embeddings \
+  -H "Authorization: Bearer sk-1234" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "vertex-gemini-embedding-2-preview",
+    "input": ["Describe this", "gs://bucket/image.png"]
+  }'
+```
+
+</TabItem>
+</Tabs>
+
+### multimodalembedding@001 (Legacy)

 Known Limitations:
 - Only supports 1 image / video / image per request