Welcome to the Gemini era

Chat with Gemini

The Gemini ecosystem represents Google's most capable AI.

Our Gemini models are built from the ground up for multimodality — reasoning seamlessly across text, images, audio, video, and code.

Latest updates

The Gemini era

Gemini represents a significant leap forward in how AI can help improve our daily lives.

Read the blog post

Meet the first version of Gemini, our most capable AI model.

MMLU: Gemini Ultra 90.0% (CoT@32*) vs. previous SOTA (GPT-4) 86.4% (5-shot*, reported)

*Note that evaluations of previous SOTA models use different prompting techniques.

Gemini is the first model to outperform human experts on MMLU (Massive Multitask Language Understanding), one of the most popular methods for testing the knowledge and problem-solving abilities of AI models.

Our Gemini model surpasses state-of-the-art performance on a range of benchmarks including text and coding.

TEXT

Capability | Benchmark (higher is better) | Description | Gemini Ultra | GPT-4 (API numbers calculated where reported numbers were missing)
General | MMLU | Representation of questions in 57 subjects (incl. STEM, humanities, and others) | 90%, CoT@32* | 86.4%, 5-shot** (reported)
Reasoning | Big-Bench Hard | Diverse set of challenging tasks requiring multi-step reasoning | 83.6%, 3-shot | 83.1%, 3-shot (API)
Reasoning | DROP | Reading comprehension (F1 Score) | 82.4, variable shots | 80.9, 3-shot (reported)
Reasoning | HellaSwag | Commonsense reasoning for everyday tasks | 87.8%, 10-shot* | 95.3%, 10-shot* (reported)
Math | GSM8K | Basic arithmetic manipulations (incl. Grade School math problems) | 94.4%, maj1@32 | 92%, 5-shot CoT (reported)
Math | MATH | Challenging math problems (incl. algebra, geometry, pre-calculus, and others) | 53.2%, 4-shot | 52.9%, 4-shot (API)
Code | HumanEval | Python code generation | 74.4%, 0-shot (IT)* | 67%, 0-shot* (reported)
Code | Natural2Code | Python code generation. New held out dataset HumanEval-like, not leaked on the web | 74.9%, 0-shot | 73.9%, 0-shot (API)

*See the technical report for details on performance with other methodologies
**GPT-4 scores 87.29% with CoT@32—see the technical report for full comparison
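
Several scores above depend on how answers are sampled and aggregated. As a rough illustration of the majority-vote idea usually associated with notation like maj1@32 (sample 32 answers, keep the most frequent one), here is a minimal Python sketch. The function name `majority_vote` and the sample answers are hypothetical, and this is not the evaluation code behind the reported numbers.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent answer among the sampled answers.

    Ties go to the answer that was sampled first.
    """
    if not answers:
        raise ValueError("need at least one sampled answer")
    counts = Counter(answers)
    # Counter keeps insertion order, and max() returns the first maximal key
    return max(counts, key=counts.get)


# Hypothetical final answers extracted from five sampled solutions
samples = ["18", "18", "20", "18", "17"]
print(majority_vote(samples))
```

A real maj1@32 run would draw 32 samples per question and score the voted answer against the reference. The sketch shows only the voting step.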

Our Gemini model surpasses state-of-the-art performance on a range of multimodal benchmarks.

MULTIMODAL

Capability | Benchmark | Description (higher is better unless otherwise noted) | Gemini | GPT-4V (previous SOTA model listed when capability is not supported in GPT-4V)
Image | MMMU | Multi-discipline college-level reasoning problems | 59.4%, 0-shot pass@1, Gemini Ultra (pixel only*) | 56.8%, 0-shot pass@1, GPT-4V
Image | VQAv2 | Natural image understanding | 77.8%, 0-shot, Gemini Ultra (pixel only*) | 77.2%, 0-shot, GPT-4V
Image | TextVQA | OCR on natural images | 82.3%, 0-shot, Gemini Ultra (pixel only*) | 78%, 0-shot, GPT-4V
Image | DocVQA | Document understanding | 90.9%, 0-shot, Gemini Ultra (pixel only*) | 88.4%, 0-shot, GPT-4V (pixel only)
Image | Infographic VQA | Infographic understanding | 80.3%, 0-shot, Gemini Ultra (pixel only*) | 75.1%, 0-shot, GPT-4V (pixel only)
Image | MathVista | Mathematical reasoning in visual contexts | 53%, 0-shot, Gemini Ultra (pixel only*) | 49.9%, 0-shot, GPT-4V
Video | VATEX | English video captioning (CIDEr) | 62.7, 4-shot, Gemini Ultra | 56, 4-shot, DeepMind Flamingo
Video | Perception Test MCQA | Video question answering | 54.7%, 0-shot, Gemini Ultra | 46.3%, 0-shot, SeViLA
Audio | CoVoST 2 (21 languages) | Automatic speech translation (BLEU score) | 40.1, Gemini Pro | 29.1, Whisper v2
Audio | FLEURS (62 languages) | Automatic speech recognition (based on word error rate, lower is better) | 7.6%, Gemini Pro | 17.6%, Whisper v3

*Gemini image benchmarks are pixel only—no assistance from OCR systems
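
The FLEURS row is the one metric here where lower is better, because it is based on word error rate. As a minimal sketch of how word error rate is commonly computed (word-level edit distance divided by the number of reference words), assuming simple whitespace tokenization, the Python below may help. The function name `word_error_rate` and the example sentences are hypothetical, and benchmark implementations typically add text normalization that is omitted here.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from hypothesis
                curr[j - 1] + 1,     # extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# One missing word out of six reference words
print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

A transcript identical to the reference scores 0.0, and the value can exceed 1.0 when the hypothesis contains many extra words.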

Read the technical report

Gemini comes in three model sizes

Ultra

Our most capable and largest model for highly-complex tasks.

Pro

Our best model for scaling across a wide range of tasks.

Nano

Our most efficient model for on-device tasks.

Anything to anything

Gemini models are natively multimodal, which gives you the potential to transform any type of input into any type of output.

The following content is a visual, descriptive representation of Gemini's functionality:

Gemini models can generate code based on the different inputs you give them.

Could Gemini help make a demo based on this video?

Gemini
I see a murmuration of starlings, so I coded a flocking simulation.
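
For readers unfamiliar with the term, a flocking simulation is often built from three local steering rules: cohesion, alignment, and separation. The Python sketch below illustrates that general idea under assumed parameter values. It is not the code Gemini produced in the demo, and the function name `step` and all constants are hypothetical.

```python
import random


def step(boids, radius=10.0, cohesion=0.01, alignment=0.05,
         separation=0.05, min_dist=2.0):
    """Advance every boid one time step. Each boid is [x, y, vx, vy]."""
    updated = []
    for i, (x, y, vx, vy) in enumerate(boids):
        # Neighbors are the other boids within the perception radius
        neighbors = [b for j, b in enumerate(boids)
                     if j != i and (b[0] - x) ** 2 + (b[1] - y) ** 2 < radius ** 2]
        if neighbors:
            n = len(neighbors)
            center_x = sum(b[0] for b in neighbors) / n
            center_y = sum(b[1] for b in neighbors) / n
            mean_vx = sum(b[2] for b in neighbors) / n
            mean_vy = sum(b[3] for b in neighbors) / n
            # Cohesion: steer toward the neighbors' center
            vx += (center_x - x) * cohesion
            vy += (center_y - y) * cohesion
            # Alignment: nudge velocity toward the neighbors' mean velocity
            vx += (mean_vx - vx) * alignment
            vy += (mean_vy - vy) * alignment
            # Separation: move away from neighbors that are too close
            for b in neighbors:
                if (b[0] - x) ** 2 + (b[1] - y) ** 2 < min_dist ** 2:
                    vx += (x - b[0]) * separation
                    vy += (y - b[1]) * separation
        updated.append([x + vx, y + vy, vx, vy])
    return updated


random.seed(0)
flock = [[random.uniform(0, 50), random.uniform(0, 50),
          random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(20)]
for _ in range(100):
    flock = step(flock)
print(len(flock))
```

Drawing the positions each step, for example on a canvas, would turn this into a visual demo. The sketch covers only the update rule.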

Gemini models can generate text and images, combined.

Could Gemini show me ideas for what to make?

Gemini
Pink and blue mouse octopus knit

How about an octopus with blue and pink tentacles?

Gemini models can reason visually across languages.

User has uploaded a voice prompt asking 'Explain how I can play this', and uploaded a sheet of music.

Could Gemini explain what this means?

Gemini

I see the time signature is 6/8. This means there are 6 eighth notes in each measure.

The dynamic marking is piano, which means to play softly. Andante grazioso means to play at a graceful walking pace.

Hands-on with our Gemini model

Watch highlights from testing of our Gemini model's multimodal reasoning capabilities. Curious to learn more? Explore our prompting techniques here.

The potential of Gemini

Learn about what our Gemini models can do from some of the people who built them.

Read the blog post
Image: two people standing at a table about to say something.

Taylor Applebaum and Sebastian Nowozin

Unlocking insights in scientific literature

Image: two people standing at a table with a computer in front of a curtain.

Rémi Leblond and Gabriela Surita

Excelling at competitive programming

Image: a person with glasses standing in a room, smiling and ready to speak.

Adrià Recasens

Processing and understanding raw audio signal end-to-end

Image: a person with glasses sitting in front of a computer speaking.

Sam Cheung

Explaining reasoning in math and physics

Image: a person with a beard standing in front of an open computer, smiling and ready to speak.

Palash Nandy

Reasoning about user intent to generate bespoke experiences

Building and deploying Gemini responsibly

We've built our Gemini models responsibly from the start, incorporating safeguards and working together with partners to make them safer and more inclusive.

Try Gemini Advanced with our most capable AI model

Build with Gemini

Integrate Gemini models into your applications with Google AI Studio and Google Cloud Vertex AI.

ai.google.dev
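
As a rough idea of what an integration might look like, the Python sketch below builds a text-generation request using only the standard library. The endpoint path, the model name `gemini-pro`, the `x-goog-api-key` header, and the `GEMINI_API_KEY` environment variable are assumptions that do not appear on this page, so check ai.google.dev for the current API before relying on them.

```python
import json
import os
import urllib.request

# Assumed REST root; confirm the current value at ai.google.dev
API_ROOT = "https://generativelanguage.googleapis.com/v1beta"


def build_request(prompt, api_key, model="gemini-pro"):
    """Build (but do not send) a generateContent request for a text prompt."""
    url = f"{API_ROOT}/models/{model}:generateContent"
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "x-goog-api-key": api_key},
        method="POST",
    )


if __name__ == "__main__":
    key = os.environ.get("GEMINI_API_KEY")  # hypothetical variable name
    if key:
        # Only sends a request when a key is configured
        with urllib.request.urlopen(build_request("Say hello", key)) as resp:
            print(json.load(resp))
```

Keeping request construction separate from sending makes the payload easy to inspect before any network call is made.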

See more of what #GeminiAI can do

  • Can Gemini turn images into code?
  • Can Gemini find similarities between two images?
  • Can Gemini understand unusual emojis?
  • Can Gemini understand outfits?
  • Can Gemini guess the movie?
  • Can Gemini make sense of my environment?