Gemini 2.5 Pro is our most advanced model yet, excelling at coding and complex prompts.
Pro performance

- Enhanced reasoning: State-of-the-art on key math and science benchmarks.
- Advanced coding: Easily generate code for web development tasks.
- Natively multimodal: Understands input across text, audio, images, and video.
- Long context: Explore vast datasets with a 1-million-token context window (see the API sketch after this list).
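Both the long context window and multimodal input are exposed through the Gemini API. A minimal sketch using the google-genai Python SDK, assuming the preview model ID from the benchmarks below and a hypothetical file; exact SDK details may differ by version:

```python
from google import genai

client = genai.Client()  # assumes GEMINI_API_KEY is set in the environment

# Upload a large document through the Files API, then reason over it
# inside the 1M-token context window. The file name is hypothetical.
report = client.files.upload(file="annual_report.pdf")

response = client.models.generate_content(
    model="gemini-2.5-pro-preview-05-06",  # preview ID used in the benchmarks below
    contents=[report, "Summarize the key risks discussed in this report."],
)
print(response.text)
```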
Deep Think
We’re making Gemini 2.5 Pro even better by introducing an enhanced reasoning mode called Deep Think.
It draws on our latest cutting-edge research in reasoning, including parallel thinking techniques, to deliver incredible performance.
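The details of Deep Think aren't public, but parallel thinking in general means exploring several reasoning paths at once and reconciling them. A toy sketch in the spirit of self-consistency decoding, purely illustrative and not Google's implementation:

```python
import asyncio
import random
from collections import Counter

# Toy illustration of parallel thinking in the spirit of self-consistency
# decoding: sample several reasoning paths, keep the majority answer.
# This is NOT Google's Deep Think implementation.

async def sample_reasoning_path(prompt: str) -> str:
    """Stand-in for one sampled chain of reasoning; a real version would
    call a model API with temperature > 0."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return random.choice(["42", "42", "42", "41"])  # toy answer distribution

async def parallel_think(prompt: str, n_paths: int = 8) -> str:
    # Explore several independent reasoning paths concurrently...
    answers = await asyncio.gather(
        *(sample_reasoning_path(prompt) for _ in range(n_paths))
    )
    # ...then return the answer the paths converge on most often.
    return Counter(answers).most_common(1)[0][0]

print(asyncio.run(parallel_think("What is 6 * 7?")))
```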
Methodology
All Gemini results come from our own runs. USAMO 2025: https://matharena.ai. LiveCodeBench V6: o3 High from internal runs, since numbers are not available on the official leaderboard; o4-mini High from https://livecodebench.github.io/leaderboard.html (2/1/2025-5/1/2025). MMMU: self-reported by OpenAI.
Preview
Native audio
Converse in more expressive ways with native audio outputs that capture the subtle nuances of how we speak. Seamlessly switch between 24 languages, all with the same voice.
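Native audio output is also available through the Gemini API. A minimal sketch using the google-genai Python SDK with a TTS-capable preview model; the model ID, voice name, and audio format are assumptions, and the conversational experience described above runs through the Live API:

```python
from google import genai
from google.genai import types

client = genai.Client()  # assumes GEMINI_API_KEY is set

# Model ID and voice name are assumptions; check the audio docs
# for current values.
response = client.models.generate_content(
    model="gemini-2.5-flash-preview-tts",
    contents="Say warmly, in French: Bonjour ! Comment allez-vous ?",
    config=types.GenerateContentConfig(
        response_modalities=["AUDIO"],
        speech_config=types.SpeechConfig(
            voice_config=types.VoiceConfig(
                prebuilt_voice_config=types.PrebuiltVoiceConfig(
                    voice_name="Kore"
                )
            )
        ),
    ),
)

# The audio comes back as raw PCM bytes in the first response part.
pcm = response.candidates[0].content.parts[0].inline_data.data
with open("greeting.pcm", "wb") as f:
    f.write(pcm)
```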
Vibe-coding nature with 2.5 Pro
Images of the natural world, transformed into code-based representations of its behavior.
Watch
Hands-on with 2.5 Pro
See how Gemini 2.5 Pro uses its reasoning capabilities to create interactive simulations and do advanced coding.
Benchmarks
Gemini 2.5 Pro leads common benchmarks by meaningful margins.
| Capability | Benchmark | Gemini 2.5 Pro Preview (05-06) | OpenAI o3 | OpenAI GPT-4.1 | Claude 3.7 Sonnet (64k extended thinking) | Grok 3 Beta (extended thinking) | DeepSeek R1 |
|---|---|---|---|---|---|---|---|
| Input price | $/1M tokens | $1.25 (≤200k tokens), $2.50 (>200k) | $10.00 | $2.00 | $3.00 | $3.00 | $0.55 |
| Output price | $/1M tokens | $10.00 (≤200k tokens), $15.00 (>200k) | $40.00 | $8.00 | $15.00 | $15.00 | $2.19 |
| Reasoning & knowledge | Humanity's Last Exam (no tools) | 17.8% | 20.3% | 5.4% | 8.9% | — | 8.6%* |
| Science | GPQA diamond, single attempt (pass@1) | 83.0% | 83.3% | 66.3% | 78.2% | 80.2% | 71.5% |
| | GPQA diamond, multiple attempts | — | — | — | 84.8% | 84.6% | — |
| Mathematics | AIME 2025, single attempt (pass@1) | 83.0% | 88.9% | — | 49.5% | 77.3% | 70.0% |
| | AIME 2025, multiple attempts | — | — | — | — | 93.3% | — |
| Code generation | LiveCodeBench v5, single attempt (pass@1) | 75.6% | — | — | — | 70.6% | 64.3% |
| | LiveCodeBench v5, multiple attempts | — | — | — | — | 79.4% | — |
| Code editing | Aider Polyglot (whole / diff) | 76.5% / 72.7% | 81.3% / 79.6% | 51.6% / 52.9% | 64.9% (diff) | — | 56.9% (diff) |
| Agentic coding | SWE-bench Verified | 63.2% | 69.1% | 54.6% | 70.3% | — | 49.2% |
| Factuality | SimpleQA | 50.8% | 49.4% | 41.6% | — | 43.6% | 30.1% |
| Visual reasoning | MMMU, single attempt (pass@1) | 79.6% | 82.9% | 75.0% | 75.0% | 76.0% | no MM support |
| | MMMU, multiple attempts | — | — | — | — | 78.0% | no MM support |
| Image understanding | Vibe-Eval (Reka) | 65.6% | — | — | — | — | no MM support |
| Video | Video-MME | 84.8% | — | — | — | — | no MM support |
| Long context | MRCR, 128k (average) | 93.0% | — | — | — | — | — |
| | MRCR, 1M (pointwise) | 82.9% | — | — | — | — | — |
| Multilingual performance | Global MMLU (Lite) | 88.6% | — | — | — | — | — |
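To make the tiered Gemini pricing above concrete, a small worked example; it assumes the higher rate applies to the whole request once the prompt exceeds 200k tokens, which the table doesn't spell out:

```python
def gemini_25_pro_cost(input_tokens: int, output_tokens: int) -> float:
    """USD cost under the tiered preview pricing in the table above:
    $1.25 / $10.00 per 1M input / output tokens for prompts <= 200k tokens,
    $2.50 / $15.00 per 1M tokens once the prompt exceeds that threshold."""
    long_prompt = input_tokens > 200_000
    in_rate = 2.50 if long_prompt else 1.25
    out_rate = 15.00 if long_prompt else 10.00
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# A 150k-token prompt with a 4k-token answer:
# 0.15 * $1.25 + 0.004 * $10.00 = $0.1875 + $0.04 = $0.2275
print(f"${gemini_25_pro_cost(150_000, 4_000):.4f}")
```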
| Specification | Gemini 2.5 Pro |
|---|---|
| Model deployment status | Experimental, Preview |
| Supported data types for input | Text, Image, Video, Audio |
| Supported data types for output | Text |
| Supported # tokens for input | 1M |
| Supported # tokens for output | 64k |
| Knowledge cutoff | January 2025 |
| Tool use | Function calling, Structured output, Search as a tool, Code execution |
| Best for | Reasoning, Coding, Complex prompts |
| Availability | Google AI Studio, Gemini API, Gemini App |
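The tool-use row above lists function calling and structured output, among others. A minimal sketch of both with the google-genai Python SDK; the model ID and the weather stub are assumptions:

```python
from google import genai
from google.genai import types
from pydantic import BaseModel

client = genai.Client()  # assumes GEMINI_API_KEY is set
MODEL = "gemini-2.5-pro-preview-05-06"  # preview ID from the benchmarks above

# Structured output: constrain the response to a JSON schema.
class Recipe(BaseModel):
    name: str
    ingredients: list[str]

structured = client.models.generate_content(
    model=MODEL,
    contents="Give me a simple pancake recipe.",
    config=types.GenerateContentConfig(
        response_mime_type="application/json",
        response_schema=Recipe,
    ),
)
print(structured.text)  # JSON matching the Recipe schema

# Function calling: pass a Python callable as a tool; the SDK handles
# the call-and-respond round trip automatically.
def get_weather(city: str) -> dict:
    """Return current weather for a city (hypothetical stub)."""
    return {"city": city, "condition": "sunny", "temp_c": 21}

answered = client.models.generate_content(
    model=MODEL,
    contents="What's the weather like in Paris right now?",
    config=types.GenerateContentConfig(tools=[get_weather]),
)
print(answered.text)
```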