diff --git a/docs/my-website/blog/gemini_embedding_2_multimodal/index.md b/docs/my-website/blog/gemini_embedding_2_multimodal/index.md
new file mode 100644
index 0000000000..8c09432e3b
--- /dev/null
+++ b/docs/my-website/blog/gemini_embedding_2_multimodal/index.md
@@ -0,0 +1,169 @@
+---
+slug: gemini_embedding_2_multimodal
+title: "Gemini Embedding 2 Preview: Multimodal Embeddings on LiteLLM"
+date: 2025-03-11T10:00:00
+authors:
+  - name: Sameer Kankute
+    title: SWE @ LiteLLM (LLM Translation)
+    url: https://www.linkedin.com/in/sameer-kankute/
+    image_url: https://pbs.twimg.com/profile_images/2001352686994907136/ONgNuSk5_400x400.jpg
+description: "Generate embeddings from text, images, audio, video, and PDFs with gemini-embedding-2-preview on LiteLLM via the Gemini API and Vertex AI."
+tags: [gemini, embeddings, multimodal, vertex ai]
+hide_table_of_contents: false
+---
+
+import Tabs from '@theme/Tabs';
+import TabItem from '@theme/TabItem';
+
+# Gemini Embedding 2 Preview: Multimodal Embeddings
+
+LiteLLM now supports **multimodal embeddings** with `gemini-embedding-2-preview`, which can generate a single embedding from a mix of text, images, audio, video, and PDF content. The model is available through both the **Gemini API** (API key) and **Vertex AI** (GCP credentials).
+
+## Supported Input Types
+
+| Modality | Supported Formats |
+|----------|-------------------|
+| **Text** | Plain text |
+| **Image** | PNG, JPEG |
+| **Audio** | MP3, WAV |
+| **Video** | MP4, MOV |
+| **Documents** | PDF |
+
+## Input Formats
+
+LiteLLM accepts three input formats for multimodal content:
+
+1. **Data URIs** – Base64-encoded inline: `data:image/png;base64,`
+2. **GCS URLs** – Cloud Storage paths (Vertex AI): `gs://bucket/path/to/file.png`
+3. **Gemini File References** – Pre-uploaded files (Gemini API): `files/abc123`
+
+## Quick Start
+
+```python
+from litellm import embedding
+import os
+
+os.environ["GEMINI_API_KEY"] = "your-api-key"
+
+# Text + Image (base64)
+response = embedding(
+    model="gemini/gemini-embedding-2-preview",
+    input=[
+        "The food was delicious and the waiter...",
+        "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAgAAAAIAQMAAAD+wSzIAAAABlBMVEX///+/v7+jQ3Y5AAAADklEQVQI12P4AIX8EAgALgAD/aNpbtEAAAAASUVORK5CYII"
+    ],
+)
+print(response)
+```
+
+```python
+import litellm
+from litellm import embedding
+
+litellm.vertex_project = "your-project-id"
+litellm.vertex_location = "us-central1"
+
+# Text + Image (GCS URL)
+response = embedding(
+    model="vertex_ai/gemini-embedding-2-preview",
+    input=[
+        "Describe this image",
+        "gs://my-bucket/images/photo.png"
+    ],
+)
+print(response)
+```
+
+**1. Config (config.yaml)**
+
+```yaml
+model_list:
+  - model_name: gemini-embedding-2-preview
+    litellm_params:
+      model: gemini/gemini-embedding-2-preview
+      api_key: os.environ/GEMINI_API_KEY
+  - model_name: vertex-gemini-embedding-2-preview
+    litellm_params:
+      model: vertex_ai/gemini-embedding-2-preview
+      vertex_project: os.environ/VERTEXAI_PROJECT
+      vertex_location: os.environ/VERTEXAI_LOCATION
+
+general_settings:
+  master_key: sk-1234
+```
+
+**2. Start proxy**
+
+```bash
+litellm --config config.yaml
+```
+
+**3. Call embeddings**
+
+```bash
+curl -X POST http://localhost:4000/embeddings \
+  -H "Authorization: Bearer sk-1234" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "gemini-embedding-2-preview",
+    "input": [
+      "The food was delicious and the waiter...",
+      "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAgAAAAIAQMAAAD+wSzIAAAABlBMVEX///+/v7+jQ3Y5AAAADklEQVQI12P4AIX8EAgALgAD/aNpbtEAAAAASUVORK5CYII"
+    ]
+  }'
+```
+
+## Input Format Examples
+
+| Format | Example | Provider |
+|--------|---------|----------|
+| **Data URI** | `data:image/png;base64,...` | Gemini, Vertex AI |
+| **GCS URL** | `gs://bucket/path/image.png` | Vertex AI |
+| **File reference** | `files/abc123` | Gemini API only |
+
+### Supported MIME Types for Data URIs
+
+- **Images:** `image/png`, `image/jpeg`
+- **Audio:** `audio/mpeg`, `audio/wav`
+- **Video:** `video/mp4`, `video/quicktime`
+- **Documents:** `application/pdf`
+
+### GCS URL MIME Inference
+
+For GCS URLs on Vertex AI, the MIME type is inferred from the file extension:
+
+- `.png` → `image/png`
+- `.jpg` / `.jpeg` → `image/jpeg`
+- `.mp3` → `audio/mpeg`
+- `.wav` → `audio/wav`
+- `.mp4` → `video/mp4`
+- `.mov` → `video/quicktime`
+- `.pdf` → `application/pdf`
+
+## Optional Parameters
+
+| Parameter | Description | Maps to |
+|-----------|-------------|---------|
+| `dimensions` | Output embedding size | `outputDimensionality` |
+
+```python
+from litellm import embedding
+
+response = embedding(
+    model="gemini/gemini-embedding-2-preview",
+    input=["text to embed"],
+    dimensions=768,  # Optional: control output vector size
+)
+```
diff --git a/docs/my-website/docs/embedding/supported_embedding.md b/docs/my-website/docs/embedding/supported_embedding.md
index 11ca4da48a..87acd0b33a 100644
--- a/docs/my-website/docs/embedding/supported_embedding.md
+++ b/docs/my-website/docs/embedding/supported_embedding.md
@@ -514,6 +514,57 @@ All models listed [here](https://ai.google.dev/gemini-api/docs/models/gemini) ar
 | Model Name | Function Call |
 | :--- | :--- |
 | text-embedding-004 | `embedding(model="gemini/text-embedding-004", input)` |
+| gemini-embedding-2-preview | `embedding(model="gemini/gemini-embedding-2-preview", input)` ([multimodal docs](#gemini-embedding-2-preview-multimodal)) |
+
+### Gemini Embedding 2 Preview (Multimodal)
+
+`gemini-embedding-2-preview` supports **multimodal embeddings**: text, images, audio, video, and PDF in a single request. See the [blog post](/blog/gemini_embedding_2_multimodal) for details.
+
+**Input formats:**
+- **Data URIs:** `data:image/png;base64,`
+- **Gemini file references:** `files/abc123` (uploaded beforehand via the Gemini Files API)
+
+**Supported MIME types:** `image/png`, `image/jpeg`, `audio/mpeg`, `audio/wav`, `video/mp4`, `video/quicktime`, `application/pdf`
+
+```python
+from litellm import embedding
+import os
+
+os.environ["GEMINI_API_KEY"] = "your-api-key"
+
+# Text + Image (base64)
+response = embedding(
+    model="gemini/gemini-embedding-2-preview",
+    input=[
+        "The food was delicious and the waiter...",
+        "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAgAAAAIAQMAAAD+wSzIAAAABlBMVEX///+/v7+jQ3Y5AAAADklEQVQI12P4AIX8EAgALgAD/aNpbtEAAAAASUVORK5CYII"
+    ],
+)
+print(response)
+```
+
+```bash
+curl -X POST http://localhost:4000/embeddings \
+  -H "Authorization: Bearer sk-1234" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "gemini-embedding-2-preview",
+    "input": [
+      "The food was delicious and the waiter...",
+      "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAgAAAAIAQMAAAD+wSzIAAAABlBMVEX///+/v7+jQ3Y5AAAADklEQVQI12P4AIX8EAgALgAD/aNpbtEAAAAASUVORK5CYII"
+    ]
+  }'
+```
+
+**Optional:** `dimensions` maps to Gemini's `outputDimensionality`.
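If the media you want to embed sits on local disk rather than already being base64-encoded, you can build the data URI yourself before passing it in `input`. The following is a minimal sketch, not part of LiteLLM: `to_data_uri` and `SUPPORTED_MIME_TYPES` are hypothetical names, and the allow-list simply mirrors the MIME types listed for this model.

```python
import base64

# Hypothetical allow-list mirroring the MIME types documented for
# gemini-embedding-2-preview data URIs (not imported from LiteLLM).
SUPPORTED_MIME_TYPES = frozenset(
    {
        "image/png",
        "image/jpeg",
        "audio/mpeg",
        "audio/wav",
        "video/mp4",
        "video/quicktime",
        "application/pdf",
    }
)


def to_data_uri(data: bytes, mime_type: str) -> str:
    """Base64-encode raw file bytes into a data URI string (hypothetical helper)."""
    if mime_type not in SUPPORTED_MIME_TYPES:
        raise ValueError(f"unsupported MIME type for data URI input: {mime_type}")
    # b64encode returns bytes; decode to str so it can be placed in the `input` list
    payload = base64.b64encode(data).decode("ascii")
    return f"data:{mime_type};base64,{payload}"
```

For example, `to_data_uri(open("photo.png", "rb").read(), "image/png")` returns a string that can sit next to plain text in the `input=[...]` list of `embedding(model="gemini/gemini-embedding-2-preview", ...)`. Checking the MIME type on the client side fails fast on unsupported formats instead of relying on the provider to reject them.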
 ## Vertex AI Embedding Models
diff --git a/docs/my-website/docs/providers/vertex_embedding.md b/docs/my-website/docs/providers/vertex_embedding.md
index 5656ade337..9b530f2ae0 100644
--- a/docs/my-website/docs/providers/vertex_embedding.md
+++ b/docs/my-website/docs/providers/vertex_embedding.md
@@ -79,6 +79,7 @@ All models listed [here](https://github.com/BerriAI/litellm/blob/57f37f743886a02
 | textembedding-gecko@003 | `embedding(model="vertex_ai/textembedding-gecko@003", input)` |
 | text-embedding-preview-0409 | `embedding(model="vertex_ai/text-embedding-preview-0409", input)` |
 | text-multilingual-embedding-preview-0409 | `embedding(model="vertex_ai/text-multilingual-embedding-preview-0409", input)` |
+| gemini-embedding-2-preview | `embedding(model="vertex_ai/gemini-embedding-2-preview", input)` ([multimodal docs](#gemini-embedding-2-preview-multimodal)) |
 | Fine-tuned OR Custom Embedding models | `embedding(model="vertex_ai/", input)` |

 ### Supported OpenAI (Unified) Params
@@ -257,6 +258,71 @@ model_list:

 ## **Multi-Modal Embeddings**

+### Gemini Embedding 2 Preview (Multimodal)
+
+`gemini-embedding-2-preview` supports **unified multimodal embeddings**: text, images, audio, video, and PDF in a single request. See the [blog post](/blog/gemini_embedding_2_multimodal) for details.
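On Vertex AI, `gs://` inputs carry no explicit MIME type, so it is inferred from the file extension. The following sketch restates that rule using the extension table from the blog post. It is an illustration only, not LiteLLM's implementation: `infer_gcs_mime_type` is a hypothetical helper, and matching extensions case-insensitively is an assumption.

```python
from pathlib import PurePosixPath

# Extension -> MIME type table restating the "GCS URL MIME Inference" list
# from the blog post. Illustrative only, not LiteLLM's internal code.
GCS_EXTENSION_MIME_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".mp3": "audio/mpeg",
    ".wav": "audio/wav",
    ".mp4": "video/mp4",
    ".mov": "video/quicktime",
    ".pdf": "application/pdf",
}


def infer_gcs_mime_type(uri: str) -> str:
    """Return the MIME type implied by a gs:// URL's file extension (hypothetical helper)."""
    if not uri.startswith("gs://"):
        raise ValueError(f"not a GCS URL: {uri}")
    # Suffix of the last path component; lower-casing it is an assumption.
    suffix = PurePosixPath(uri[len("gs://"):]).suffix.lower()
    if suffix not in GCS_EXTENSION_MIME_TYPES:
        raise ValueError(f"cannot infer MIME type from extension {suffix!r} in {uri}")
    return GCS_EXTENSION_MIME_TYPES[suffix]
```

Under this rule, `gs://my-bucket/images/photo.png` maps to `image/png`. A URL with an unlisted extension such as `.txt` raises an error rather than being guessed at.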
+
+**Input formats:**
+- **Data URIs:** `data:image/png;base64,`
+- **GCS URLs:** `gs://bucket/path/to/file.png` (MIME type inferred from extension)
+
+**Supported MIME types:** `image/png`, `image/jpeg`, `audio/mpeg`, `audio/wav`, `video/mp4`, `video/quicktime`, `application/pdf`
+
+```python
+import litellm
+from litellm import embedding
+
+litellm.vertex_project = "your-project-id"
+litellm.vertex_location = "us-central1"
+
+# Text + Image (GCS URL)
+response = embedding(
+    model="vertex_ai/gemini-embedding-2-preview",
+    input=[
+        "Describe this image",
+        "gs://my-bucket/images/photo.png"
+    ],
+)
+
+# Text + Image (base64)
+response = embedding(
+    model="vertex_ai/gemini-embedding-2-preview",
+    input=[
+        "The food was delicious",
+        "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAgAAAAIAQMAAAD+wSzIAAAABlBMVEX///+/v7+jQ3Y5AAAADklEQVQI12P4AIX8EAgALgAD/aNpbtEAAAAASUVORK5CYII"
+    ],
+)
+```
+
+```yaml
+model_list:
+  - model_name: vertex-gemini-embedding-2-preview
+    litellm_params:
+      model: vertex_ai/gemini-embedding-2-preview
+      vertex_project: "your-project-id"
+      vertex_location: "us-central1"
+```
+
+```bash
+curl -X POST http://localhost:4000/embeddings \
+  -H "Authorization: Bearer sk-1234" \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "vertex-gemini-embedding-2-preview",
+    "input": ["Describe this", "gs://bucket/image.png"]
+  }'
+```
+
+### multimodalembedding@001 (Legacy)
 Known Limitations:
 - Only supports 1 image / video / image per request