Google releases Gemini Embedding 2 AI model with multimodal support

Google has released the new Gemini Embedding 2 model in public preview. Here's what it offers over its predecessor.

Alongside new AI features for its Workspace apps, Google has also released the Gemini Embedding 2 AI model. It is the search giant's first natively multimodal embedding model, mapping text, images, video, audio, and documents into a single embedding space.

For the uninitiated, embedding models differ from generative models (such as Gemini 3) in that they are built for "understanding" rather than for generating content: they convert inputs of different modalities (text, images, or video) into a numerical format, called vectors, that a machine can easily compare and analyze. Used for semantic search, classification, and clustering, these embeddings can provide more context-aware results than keyword-based approaches.
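To make the idea concrete, here is a minimal sketch of semantic search over embeddings. The three-dimensional vectors and their labels are invented for illustration; real embeddings come from the model and have far more dimensions. The helper names (cosine_similarity, search) are likewise hypothetical and are not part of any Google API.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy "embeddings" (made-up numbers). In a shared embedding space,
# items with similar meaning sit close together regardless of modality.
corpus = {
    "photo of a cat": [0.9, 0.1, 0.0],
    "video of a kitten playing": [0.8, 0.3, 0.1],
    "quarterly revenue report": [0.0, 0.2, 0.9],
}


def search(query_vec, items, top_k=2):
    """Rank items by similarity to the query vector and return the top labels."""
    ranked = sorted(
        items.items(),
        key=lambda kv: cosine_similarity(query_vec, kv[1]),
        reverse=True,
    )
    return [label for label, _ in ranked[:top_k]]


# A made-up query vector standing in for the embedding of "pictures of cats".
query = [0.85, 0.2, 0.05]
print(search(query, corpus))
```

The two cat-related items rank above the revenue report even though none of them shares a keyword with the query, which is the difference from keyword matching that the paragraph above describes.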

Its predecessor, gemini-embedding-001, was text-only. Gemini Embedding 2, by contrast, can map text, images, video, audio, and documents into a unified embedding space and capture semantic intent across 100 languages. It comes with the following limits for different modalities:

Text: a context window of up to 8192 tokens
Images: up to six images per request with support for PNG/JPEG formats
Videos: up to 120 seconds of video input in MP4/MOV formats
Audio: ingests and embeds audio data without needing intermediate transcriptions
Documents: embed PDFs up to six pages
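
For developers, the limits above could be checked on the client side before a request is sent. The following is a rough sketch only: the EmbedRequest class and check_limits function are hypothetical and not part of any Google SDK, the numbers are taken from the list above, and audio is left unchecked because no audio limit is stated.

```python
from dataclasses import dataclass, field
from typing import List, Optional

# Limits as reported in the list above.
MAX_TEXT_TOKENS = 8192
MAX_IMAGES = 6
MAX_VIDEO_SECONDS = 120
MAX_PDF_PAGES = 6
IMAGE_FORMATS = {"png", "jpeg"}
VIDEO_FORMATS = {"mp4", "mov"}


@dataclass
class EmbedRequest:
    """Hypothetical description of one embedding request (illustration only)."""
    text_tokens: int = 0
    image_formats: List[str] = field(default_factory=list)  # one entry per image
    video_seconds: float = 0.0
    video_format: Optional[str] = None
    pdf_pages: int = 0


def check_limits(req: EmbedRequest) -> List[str]:
    """Return a list of human-readable problems; an empty list means within limits."""
    problems = []
    if req.text_tokens > MAX_TEXT_TOKENS:
        problems.append(f"text has {req.text_tokens} tokens; limit is {MAX_TEXT_TOKENS}")
    if len(req.image_formats) > MAX_IMAGES:
        problems.append(f"{len(req.image_formats)} images; limit is {MAX_IMAGES}")
    for fmt in req.image_formats:
        if fmt.lower() not in IMAGE_FORMATS:
            problems.append(f"unsupported image format: {fmt}")
    if req.video_seconds > MAX_VIDEO_SECONDS:
        problems.append(f"video is {req.video_seconds}s; limit is {MAX_VIDEO_SECONDS}s")
    if req.video_format is not None and req.video_format.lower() not in VIDEO_FORMATS:
        problems.append(f"unsupported video format: {req.video_format}")
    if req.pdf_pages > MAX_PDF_PAGES:
        problems.append(f"PDF has {req.pdf_pages} pages; limit is {MAX_PDF_PAGES}")
    return problems


# Example: seven images exceed the six-image limit.
print(check_limits(EmbedRequest(image_formats=["png"] * 7)))
```

Since the model is in public preview, these figures may change, so the official documentation should be treated as the source of truth.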

Google explained in a blog post that the new model "simplifies complex pipelines and enhances a wide variety of multimodal downstream tasks—from Retrieval-Augmented Generation (RAG) and semantic search to sentiment analysis and data clustering." It can also capture complex relationships among different media types by accepting multiple input modalities (such as images and text) in a single request.

As an example, the search giant noted that Gemini embeddings can help legal professionals find critical information during the discovery process in litigation. In that use case, Gemini's multimodal embeddings reportedly improved precision and recall across millions of records and enhanced image and video search.

Gemini Embedding 2 (gemini-embedding-2-preview) is now available in public preview via the Gemini API and Vertex AI. Meanwhile, gemini-embedding-001 remains available for text-only use cases.