
Welcome to the Gemini era

Chat with Gemini

The Gemini ecosystem represents Google's most capable AI.

Our Gemini models are built from the ground up for multimodality — reasoning seamlessly across text, images, audio, video, and code.

Latest updates

The Gemini era

Gemini represents a significant leap forward in how AI can help improve our daily lives.

Introducing Gemini 1.5

Our next-generation model

Gemini 1.5 delivers dramatically enhanced performance with a more efficient architecture. The first model we’ve released for early testing, Gemini 1.5 Pro, introduces a breakthrough experimental feature in long-context understanding.

Reasoning about vast amounts of information

Gemini 1.5 Pro can analyze and summarize the 402-page transcript of Apollo 11's mission to the moon.

Better understanding across modalities

Gemini 1.5 Pro can perform highly sophisticated reasoning tasks across different modalities, such as analyzing a silent Buster Keaton movie.

Problem-solving with longer blocks of code

Gemini 1.5 Pro can reason across 100,000 lines of code, offering helpful solutions, modifications, and explanations.

Gemini comes in three model sizes

- Ultra (1.0): our most capable and largest model, for highly complex tasks.
- Pro (1.0, 1.5): our best model for scaling across a wide range of tasks.
- Nano (1.0): our most efficient model for on-device tasks.

Meet the first version of Gemini, our most capable AI model.

Gemini 1.0 Ultra: 90.0% on MMLU (CoT@32*)
Previous SOTA (GPT-4): 86.4% (5-shot*, reported)

*Note that evaluations of previous SOTA models use different prompting techniques.

Gemini is the first model to outperform human experts on MMLU (Massive Multitask Language Understanding), one of the most popular methods to test the knowledge and problem-solving abilities of AI models.

Gemini 1.0 Ultra surpasses state-of-the-art performance on a range of benchmarks including text and coding.

TEXT

Higher is better. GPT-4 API numbers were calculated where reported numbers were missing.

| Capability | Benchmark | Description | Gemini 1.0 Ultra | GPT-4 |
|---|---|---|---|---|
| General | MMLU | Questions in 57 subjects (incl. STEM, humanities, and others) | 90.0% (CoT@32*) | 86.4% (5-shot**, reported) |
| Reasoning | Big-Bench Hard | Diverse set of challenging tasks requiring multi-step reasoning | 83.6% (3-shot) | 83.1% (3-shot, API) |
| Reasoning | DROP | Reading comprehension (F1 score) | 82.4 (variable shots) | 80.9 (3-shot, reported) |
| Reasoning | HellaSwag | Commonsense reasoning for everyday tasks | 87.8% (10-shot*) | 95.3% (10-shot*, reported) |
| Math | GSM8K | Basic arithmetic manipulations (incl. grade-school math problems) | 94.4% (maj1@32) | 92.0% (5-shot CoT, reported) |
| Math | MATH | Challenging math problems (incl. algebra, geometry, pre-calculus, and others) | 53.2% (4-shot) | 52.9% (4-shot, API) |
| Code | HumanEval | Python code generation | 74.4% (0-shot, IT*) | 67.0% (0-shot*, reported) |
| Code | Natural2Code | Python code generation; a new held-out, HumanEval-like dataset not leaked on the web | 74.9% (0-shot) | 73.9% (0-shot, API) |

*See the technical report for details on performance with other methodologies.
**GPT-4 scores 87.29% with CoT@32; see the technical report for the full comparison.
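The benchmarks above cite sampling methodologies such as maj1@32, in which the model is sampled many times and the most common final answer is kept. A minimal sketch of that majority-vote idea, using a hypothetical `toy_sampler` stand-in for a stochastic model (not the actual evaluation harness):

```python
import random
from collections import Counter

def majority_vote(sample_answer, n=32, seed=0):
    """maj1@n: draw n independent answers and keep the most common one."""
    rng = random.Random(seed)
    answers = [sample_answer(rng) for _ in range(n)]
    return Counter(answers).most_common(1)[0][0]

# Hypothetical noisy "model": answers "42" 70% of the time,
# otherwise returns a random wrong digit.
def toy_sampler(rng):
    return "42" if rng.random() < 0.7 else str(rng.randint(0, 9))

print(majority_vote(toy_sampler))
```

Because wrong answers scatter while the correct answer repeats, voting over 32 samples usually recovers the majority answer even from a noisy sampler.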

Our Gemini 1.0 models surpass state-of-the-art performance on a range of multimodal benchmarks.

MULTIMODAL

Higher is better unless otherwise noted. A previous SOTA model is listed when the capability is not supported in GPT-4V.

| Capability | Benchmark | Description | Gemini | Comparison model |
|---|---|---|---|---|
| Image | MMMU | Multi-discipline college-level reasoning problems | 59.4% (0-shot pass@1), Gemini 1.0 Ultra (pixel only*) | 56.8% (0-shot pass@1), GPT-4V |
| Image | VQAv2 | Natural image understanding | 77.8% (0-shot), Gemini 1.0 Ultra (pixel only*) | 77.2% (0-shot), GPT-4V |
| Image | TextVQA | OCR on natural images | 82.3% (0-shot), Gemini 1.0 Ultra (pixel only*) | 78.0% (0-shot), GPT-4V |
| Image | DocVQA | Document understanding | 90.9% (0-shot), Gemini 1.0 Ultra (pixel only*) | 88.4% (0-shot), GPT-4V (pixel only) |
| Image | Infographic VQA | Infographic understanding | 80.3% (0-shot), Gemini 1.0 Ultra (pixel only*) | 75.1% (0-shot), GPT-4V (pixel only) |
| Image | MathVista | Mathematical reasoning in visual contexts | 53.0% (0-shot), Gemini 1.0 Ultra (pixel only*) | 49.9% (0-shot), GPT-4V |
| Video | VATEX | English video captioning (CIDEr) | 62.7 (4-shot), Gemini 1.0 Ultra | 56.0 (4-shot), DeepMind Flamingo |
| Video | Perception Test MCQA | Video question answering | 54.7% (0-shot), Gemini 1.0 Ultra | 46.3% (0-shot), SeViLA |
| Audio | CoVoST 2 (21 languages) | Automatic speech translation (BLEU score) | 40.1, Gemini 1.0 Pro | 29.1, Whisper v2 |
| Audio | FLEURS (62 languages) | Automatic speech recognition (word error rate; lower is better) | 7.6%, Gemini 1.0 Pro | 17.6%, Whisper v3 |

*Gemini image benchmarks are pixel only, with no assistance from OCR systems.

Read the technical report

Anything to anything

Gemini models are natively multimodal, which gives you the potential to transform any type of input into any type of output.

The following is a descriptive representation of Gemini's functionality:

Gemini models can generate code based on different inputs you give it.

Could Gemini help make a demo based on this video?

Gemini
I see a murmuration of starlings, so I coded a flocking simulation.
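The flocking behavior described above is classically modeled with Craig Reynolds' "boids" rules: separation, alignment, and cohesion. A minimal sketch of such a simulation (our own illustration, not Gemini's actual generated demo; all parameter values are arbitrary):

```python
import math
import random

class Boid:
    """A point agent with a position and a velocity."""
    def __init__(self, rng):
        self.x, self.y = rng.uniform(0, 100), rng.uniform(0, 100)
        self.vx, self.vy = rng.uniform(-1, 1), rng.uniform(-1, 1)

def step(boids, radius=20.0, sep_w=0.05, align_w=0.05, coh_w=0.01):
    """One simulation tick applying the three flocking rules to every boid."""
    new_velocities = []
    for b in boids:
        neighbors = [o for o in boids
                     if o is not b and math.hypot(o.x - b.x, o.y - b.y) < radius]
        if not neighbors:
            new_velocities.append((b.vx, b.vy))
            continue
        n = len(neighbors)
        cx = sum(o.x for o in neighbors) / n        # cohesion: steer toward local center
        cy = sum(o.y for o in neighbors) / n
        ax = sum(o.vx for o in neighbors) / n       # alignment: match neighbors' heading
        ay = sum(o.vy for o in neighbors) / n
        sx = sum(b.x - o.x for o in neighbors) / n  # separation: push away from crowding
        sy = sum(b.y - o.y for o in neighbors) / n
        vx = b.vx + coh_w * (cx - b.x) + align_w * (ax - b.vx) + sep_w * sx
        vy = b.vy + coh_w * (cy - b.y) + align_w * (ay - b.vy) + sep_w * sy
        new_velocities.append((vx, vy))
    # Apply all updates at once so every boid reacts to the same snapshot.
    for b, (vx, vy) in zip(boids, new_velocities):
        b.vx, b.vy = vx, vy
        b.x += vx
        b.y += vy

rng = random.Random(1)
flock = [Boid(rng) for _ in range(30)]
for _ in range(100):
    step(flock)
```

These three local rules, applied per agent with no global controller, are what produce the murmuration-like emergent motion.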

Gemini models can generate text and images, combined.

Could Gemini show me ideas for what to make?

Gemini
Image: a pink and blue knit octopus

How about an octopus with blue and pink tentacles?

Gemini models can reason visually across languages.

The user uploads a voice prompt asking 'Explain how I can play this', along with a sheet of music.

Could Gemini explain what this means?

Gemini

I see the time signature is 6/8. This means there are 6 eighth notes in each measure.

The dynamic marking is piano, which means to play softly. Andante grazioso means to play at a graceful walking pace.

The potential of Gemini

Learn about what our Gemini models can do from some of the people who built them.

Image: two people standing at a table about to say something.

Taylor Applebaum and Sebastian Nowozin

Unlocking insights in scientific literature

Image: two people standing at a table with a computer in front of a curtain.

Rémi Leblond and Gabriela Surita

Excelling at competitive programming

Image: a person with glasses standing in a room, smiling and ready to speak.

Adrià Recasens

Processing and understanding raw audio signal end-to-end

Image: a person with glasses sitting in front of a computer speaking.

Sam Cheung

Explaining reasoning in math and physics

Image: a person with a beard standing in front of an open computer, smiling and ready to speak.

Palash Nandy

Reasoning about user intent to generate bespoke experiences

Building and deploying
Gemini responsibly

We've built our Gemini models responsibly from the start, incorporating safeguards and working with partners to make them safer and more inclusive.

Try Gemini Advanced with our most capable AI model


Build with Gemini

Integrate Gemini models into your applications with Google AI Studio and Google Cloud Vertex AI.

ai.google.dev
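As a sketch of what integration looks like, here is a minimal call through Google's `google-generativeai` Python SDK. The `build_prompt` helper is our own illustration, not part of the SDK, and the call assumes the package is installed and an API key from Google AI Studio is exported as `GOOGLE_API_KEY`:

```python
import os

def build_prompt(task: str, context: str) -> str:
    """Hypothetical helper: pair an instruction with supporting context."""
    return f"{task}\n\nContext:\n{context}"

prompt = build_prompt("Summarize the key decisions in this transcript.",
                      "<transcript text here>")

# The SDK call assumes `pip install google-generativeai` and a configured
# GOOGLE_API_KEY; it is skipped when no key is present.
if os.environ.get("GOOGLE_API_KEY"):
    import google.generativeai as genai
    genai.configure(api_key=os.environ["GOOGLE_API_KEY"])
    model = genai.GenerativeModel("gemini-1.5-pro")
    print(model.generate_content(prompt).text)
```

The same models are also available server-side through Google Cloud Vertex AI, which uses its own SDK and authentication.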