Gemini 2.0
Built for the agentic era
Unlock a new era of agentic experiences with our most capable AI model yet.
Introducing the 2.0 model family
Native in, native out
2.0 Flash Experimental introduces improved capabilities such as native tool use. For the first time, Gemini can also natively create images and generate speech.
One step closer to a universal AI assistant
Gemini 2.0 unlocks new possibilities for AI agents: intelligent systems that can use memory, reasoning, and planning to complete tasks for you, all under your supervision.
- Taking action: Agents can follow instructions and take helpful actions under your supervision.
- Tool use: Agents can search for information, look up reviews, translate, and more.
- Real-time streaming: Agents respond seamlessly to live audio and video input.
Agents using multimodal understanding
A research prototype exploring future capabilities of a universal AI assistant.
Agents that can help you accomplish complex tasks
A research prototype exploring the future of human-agent interaction, starting with your browser.
Agents in other domains
Performance
Gemini 2.0 is our most capable model yet, building on the strengths of our previous generations.
Benchmarks
Gemini 2.0 Flash Experimental compared with the Gemini 1.5 models across a wide range of benchmarks.
Capability | Benchmark | Description | Gemini 1.5 Flash 002 | Gemini 1.5 Pro 002 | Gemini 2.0 Flash Experimental |
---|---|---|---|---|---|
General | MMLU-Pro | Enhanced version of the popular MMLU dataset, with higher-difficulty questions across multiple subjects | 67.3% | 75.8% | 76.4% |
Code | Natural2Code | Code generation across Python, Java, C++, JS, Go. Held-out, HumanEval-like dataset, not leaked on the web | 79.8% | 85.4% | 92.9% |
Code | Bird-SQL (Dev) | Converting natural language questions into executable SQL | 45.6% | 54.4% | 56.9% |
Code | LiveCodeBench | Code generation in Python. Code Generation subset covering more recent examples: 06/01/2024 - 10/05/2024 | 30.0% | 34.3% | 35.1% |
Factuality | FACTS Grounding | Ability to provide factually correct responses given documents and diverse user requests. Held-out internal dataset | 82.9% | 80.0% | 83.6% |
Math | MATH | Challenging math problems (incl. algebra, geometry, pre-calculus, and others) | 77.9% | 86.5% | 89.7% |
Math | HiddenMath | Competition-level math problems. Held-out, AIME/AMC-like dataset, crafted by experts and not leaked on the web | 47.2% | 52.0% | 63.0% |
Reasoning | GPQA (diamond) | Challenging dataset of questions written by domain experts in biology, physics, and chemistry | 51.0% | 59.1% | 62.1% |
Long-context | MRCR (1M) | Novel, diagnostic long-context understanding evaluation | 71.9% | 82.6% | 69.2% |
Image | MMMU | Multi-discipline college-level multimodal understanding and reasoning problems | 62.3% | 65.9% | 70.7% |
Image | Vibe-Eval (Reka) | Visual understanding in chat models with challenging everyday examples. Evaluated with a Gemini Flash model as a rater | 48.9% | 53.9% | 56.3% |
Audio | CoVoST2 (21 lang) | Automatic speech translation (BLEU score) | 37.4 | 40.1 | 39.2 |
Video | EgoSchema (test) | Video analysis across multiple domains | 66.8% | 71.2% | 71.5% |
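For readers who want to compare columns directly, the following minimal sketch differences the Gemini 2.0 Flash Experimental column against the Gemini 1.5 Pro 002 column. The scores are copied from the benchmark table; the names `SCORES`, `delta_vs_pro`, and `regressions` are illustrative and not part of any Gemini tooling.

```python
# Scores copied from the benchmark table, in column order:
# (Gemini 1.5 Flash 002, Gemini 1.5 Pro 002, Gemini 2.0 Flash Experimental).
# All values are percentages except CoVoST2, which is a BLEU score.
SCORES = {
    "MMLU-Pro": (67.3, 75.8, 76.4),
    "Natural2Code": (79.8, 85.4, 92.9),
    "Bird-SQL (Dev)": (45.6, 54.4, 56.9),
    "LiveCodeBench": (30.0, 34.3, 35.1),
    "FACTS Grounding": (82.9, 80.0, 83.6),
    "MATH": (77.9, 86.5, 89.7),
    "HiddenMath": (47.2, 52.0, 63.0),
    "GPQA (diamond)": (51.0, 59.1, 62.1),
    "MRCR (1M)": (71.9, 82.6, 69.2),
    "MMMU": (62.3, 65.9, 70.7),
    "Vibe-Eval (Reka)": (48.9, 53.9, 56.3),
    "CoVoST2 (21 lang)": (37.4, 40.1, 39.2),
    "EgoSchema (test)": (66.8, 71.2, 71.5),
}


def delta_vs_pro(scores):
    """Return {benchmark: 2.0 Flash Experimental minus 1.5 Pro 002}, in points."""
    return {
        name: round(flash_2_0 - pro_1_5, 1)
        for name, (_flash_1_5, pro_1_5, flash_2_0) in scores.items()
    }


def regressions(scores):
    """List the benchmarks where 2.0 Flash Experimental scores below 1.5 Pro 002."""
    return [name for name, delta in delta_vs_pro(scores).items() if delta < 0]


if __name__ == "__main__":
    for name, delta in delta_vs_pro(SCORES).items():
        print(f"{name:20s} {delta:+.1f}")
    print("Below 1.5 Pro 002 on:", ", ".join(regressions(SCORES)))
```

On these figures, 2.0 Flash Experimental is ahead of 1.5 Pro 002 on every row except MRCR (1M) and CoVoST2 (21 lang).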
Developer showcase
Product explorations from developers experimenting with Gemini 2.0. Some sequences shortened.
Developer ecosystem
Build with cutting-edge generative AI models and tools to make AI helpful for everyone.
Gemini model family
Our versatile models run efficiently on everything from data centers to on-device.
- 1.0 Ultra: Our largest model for highly complex tasks.
- 1.5 Pro: Our best model for reasoning across large amounts of information.
- 2.0 Flash Experimental: Our workhorse model with low latency and enhanced performance, built to power agentic experiences.
- 1.0 Nano: Our most efficient model for on-device tasks.
Build with the latest models from Google DeepMind
Get your API key and integrate powerful AI capabilities into your applications in less than 5 minutes.
Experimental
- Gemini 2.0 Flash Experimental
- Gemini 2.0 Flash Thinking Experimental 01-21
- Gemini 2.0 Flash Thinking Experimental 1219
- Gemini Experimental 1206
Gemini Pro
- Gemini 1.5 Pro
Gemini Flash
- Gemini 1.5 Flash
- Gemini 1.5 Flash-8B
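As a rough illustration of what integrating with an API key can look like, here is a minimal sketch that sends one text prompt over HTTP using only the Python standard library. Several details are assumptions not stated on this page: the REST base URL, the `generateContent` request and response shape, the `x-goog-api-key` header, the model identifier `gemini-2.0-flash-exp`, and the `GEMINI_API_KEY` environment variable name. Check the current API documentation before relying on any of them.

```python
import json
import os
import urllib.request

# Assumptions (not stated on this page): REST base URL and model identifier.
API_ROOT = "https://generativelanguage.googleapis.com/v1beta"
MODEL_ID = "gemini-2.0-flash-exp"


def build_request(prompt, api_key, model=MODEL_ID):
    """Build (but do not send) a POST request carrying a single text prompt."""
    url = f"{API_ROOT}/models/{model}:generateContent"
    body = {"contents": [{"parts": [{"text": prompt}]}]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
        method="POST",
    )


def generate(prompt, api_key):
    """Send the prompt and return the text of the first candidate (assumed response shape)."""
    with urllib.request.urlopen(build_request(prompt, api_key)) as resp:
        payload = json.load(resp)
    return payload["candidates"][0]["content"]["parts"][0]["text"]


if __name__ == "__main__":
    key = os.environ.get("GEMINI_API_KEY")  # hypothetical variable name
    if key:
        print(generate("Summarize what an AI agent is in one sentence.", key))
    else:
        print("Set GEMINI_API_KEY to send a real request.")
```

Keeping `build_request` separate from `generate` lets the request be inspected or unit-tested without a network call or a real key.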