Natively multimodal
Understands different modalities and interleaved inputs, eliminating the need for separate embedding models and reducing pipeline complexity.
Maps text, images, videos, audio, and documents into a single, unified embedding space to capture the semantic relationships across data types.
Anchors agents and applications by providing the deep context necessary for accurate, grounded retrieval.
Powers multimodal search and clustering without requiring users to have cross-modality aligned data.
Supports long input lengths and uses Matryoshka Representation Learning for flexible output dimensions that maintain accuracy at smaller sizes (see the truncation sketch after this list).
Captures conceptual meaning in over 100 languages, resulting in more consistent representations for cross-lingual tasks.
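To make the Matryoshka property concrete, here is a minimal sketch: an MRL-trained embedding packs its most informative coordinates first, so a shorter vector can be obtained by keeping a leading prefix and re-normalizing. The `full` placeholder vector and the 3072/256 dimension sizes are illustrative assumptions, not documented properties of Gemini Embedding 2.

```python
import numpy as np

def truncate_mrl(embedding: np.ndarray, dim: int) -> np.ndarray:
    """Shrink an MRL-trained embedding by keeping its leading dimensions.

    Matryoshka Representation Learning concentrates the most important
    information in the first coordinates, so a prefix of the full vector
    remains a usable embedding after L2 re-normalization.
    """
    prefix = embedding[:dim]
    return prefix / np.linalg.norm(prefix)

# `full` stands in for a full-size vector returned by an embedding API call
# (the actual output dimension of Gemini Embedding 2 is an assumption here).
full = np.random.default_rng(0).normal(size=3072)
small = truncate_mrl(full, 256)   # 256-dim vector, ready for similarity search
print(small.shape)                # (256,)
```

Because the most informative coordinates come first, the truncated vector trades a small amount of accuracy for roughly 12x less storage and faster nearest-neighbor search.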
| Metric type | Metric name | Gemini Embedding 2 | gemini-embedding-001 (legacy text-only Google model) | multimodalembedding@001 (legacy multimodal Google model) | Amazon Nova 2 Multimodal Embeddings | Voyage Multimodal 3.5 |
|---|---|---|---|---|---|---|
| Text-Text | MTEB (Multilingual) Mean (Task) | 69.9 | 68.4 | — | 63.8** | 58.5*** |
| Text-Text | MTEB (Code) Mean (Task) | 84.0 | 76.0 | — | * | * |
| Text-Image | TextCaps recall@1 | 89.6 | — | 74.0 | 76.0 | 79.4 |
| Text-Image | Docci recall@1 | 93.4 | — | — | 84.0 | 83.8 |
| Image-Text | TextCaps recall@1 | 97.4 | — | 88.1 | 88.9 | 88.6 |
| Image-Text | Docci recall@1 | 91.3 | — | — | 76.5 | 77.4 |
| Text-Document | ViDoRe v2 ndcg@10 | 64.9 | — | 28.9 | 60.6 | 65.5** |
| Text-Video | Vatex ndcg@10 | 68.8 | — | 54.9 | 60.3 | 55.2 |
| Text-Video | MSR-VTT ndcg@10 | 68.0 | — | 57.9 | 67.0 | 63.0** |
| Text-Video | Youcook2 ndcg@10 | 52.5 | — | 34.9 | 34.7 | 31.4** |
| Speech-Text | MSEB mrr@10 | 73.9 | — | — | * | — |
| Speech-Text | MSEB (ASR)**** mrr@10 | 70.4 | — | — | * | — |
* score not available
** self-reported
*** score reported for voyage-3.5
**** ASR model converts audio queries to text
Surface the most relevant matches across modalities by calculating semantic similarity.
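As a sketch of that retrieval step: because all modalities share one embedding space, a text query vector can be scored directly against image or video vectors with cosine similarity. The `embed_text` / `embed_image` names in the comments are hypothetical stand-ins for API calls, and the random vectors simply make the example runnable.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Semantic similarity of two vectors from the shared embedding space."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def top_matches(query_vec, candidate_vecs, k=3):
    """Rank candidate embeddings (any modality) against one query embedding."""
    scores = [cosine_similarity(query_vec, v) for v in candidate_vecs]
    return sorted(enumerate(scores), key=lambda s: s[1], reverse=True)[:k]

# Hypothetical usage: `query` would come from something like
# embed_text("sunset over a harbor"), and `library` from embed_image /
# embed_video calls; in a unified space the outputs are directly comparable.
rng = np.random.default_rng(1)
query = rng.normal(size=256)
library = [rng.normal(size=256) for _ in range(100)]
for idx, score in top_matches(query, library):
    print(f"asset {idx}: similarity {score:.3f}")
```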
“Empowering our teams to seamlessly search past and present content has increasingly driven us to vector search. While initially seeing great results with traditional large text embeddings (3,072 dim), crowding in vector space quickly took over; the right results couldn't reliably surface their way up from the noise. Gemini's new Embedding 2 model completely changed the game. Text queries can now pinpoint untranscribed micro-expressions, and we can even leverage existing media, such as a photo or B-roll clip, as the search input to instantly retrieve matching video assets. This propelled our text-to-video Recall@1 rate to 85.3%.”
Seth Georgion
VP Technology Innovation, Paramount Skydance
“We chose Gemini Embedding 2 to help legal professionals find critical information during the discovery process in litigation – a highly technical challenge in a high-stakes setting, and one Gemini excels at. In our most recent tests, Gemini's multi-modal embedding model improves precision and recall across millions of records, while unlocking powerful new search functionality for images and videos. For legal professionals, these new capabilities open up entirely novel ways to quickly understand case materials in even the largest matters.”
Max Christoff
CTO, Everlaw
“Gemini Embedding 2 is the foundation for Sparkonomy’s Creative Economic Equality Engine. Its native multi-modality slashes our latency by up to 70% by removing LLM inference and nearly doubles semantic similarity scores for text-image and text-video pairs, leaping from 0.4 to 0.8. This powers our proprietary Creator Genome to index millions of minutes of video, alongside images and text, with unprecedented precision, unlocking unbiased brand collaborations and democratizing economic success for every creator.”
Guneet Singh
Co-founder, Sparkonomy
“The API continuity is excellent. Gemini Embedding 2 drops right into our existing workflow with minimal changes. We're testing new ways to embed text-based conversational memories together with audio and visual embeddings, especially assistant question-and-answer pairs, and seeing a 20% lift in top-1 recall for our personal wellness app.”
Ertuğrul Çavuşoğlu
Co-founder, Mindlid