Gemini
Our most intelligent AI models
Gemini 2.5 models are capable of reasoning through their thoughts before responding, resulting in enhanced performance and improved accuracy.
Model family
Gemini 2.5 builds on the best of Gemini — with native multimodality and a long context window.
Hands-on with Gemini 2.5
See how Gemini 2.5 uses its reasoning capabilities to create interactive simulations and tackle advanced coding tasks.
Adaptive and budgeted thinking
Adaptive controls and adjustable thinking budgets allow you to balance performance and cost.
- Calibrated: The model explores diverse thinking strategies, leading to more accurate and relevant outputs.
- Controllable: Developers have fine-grained control over the model's thinking process, allowing them to manage resource usage (see the sketch after this list).
- Adaptive: When no thinking budget is set, the model assesses the complexity of a task and calibrates the amount of thinking accordingly.
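Thinking budgets are exposed through the Gemini API. Below is a minimal sketch using the google-genai Python SDK; the model name, prompt, and budget value are illustrative, and exact parameter support varies by model:

```python
from google import genai
from google.genai import types

client = genai.Client()  # reads the API key from the environment

# Cap thinking at 1,024 tokens to trade some quality for lower cost and latency.
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Schedule these five tasks across two workers to minimize total time: ...",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_budget=1024)
    ),
)
print(response.text)
```

Setting the budget to 0 disables thinking on models that allow it, while omitting it entirely leaves the model in the adaptive mode described above, where it sizes its own thinking to the complexity of the task.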
Gemini 2.5 Deep Think
An enhanced reasoning mode that uses cutting-edge research techniques in parallel thinking and reinforcement learning to significantly improve Gemini's ability to solve complex problems.
Deep Think helps tackle problems that require creativity, strategic planning, and making improvements step-by-step.
- Iterative development and design: We've seen impressive results on tasks that require building something by making small changes over time.
- Aiding scientific and mathematical discovery: By reasoning through complex problems, Deep Think can act as a powerful tool for researchers.
- Algorithmic development and code: Deep Think excels at tough coding problems where problem formulation and careful consideration of tradeoffs and time complexity are paramount.
Benchmarks
In addition to its strong performance on academic benchmarks, Gemini 2.5 tops the popular coding leaderboard WebDev Arena.
| Benchmark | Gemini 2.5 Flash-Lite (Non-thinking) | Gemini 2.5 Flash-Lite (Thinking) | Gemini 2.5 Flash (Non-thinking) | Gemini 2.5 Flash (Thinking) | Gemini 2.5 Pro (Thinking) |
|---|---|---|---|---|---|
| Input price ($/1M tokens, no caching) | $0.10 | $0.10 | $0.30 | $0.30 | $1.25 ($2.50 for prompts >200k tokens) |
| Output price ($/1M tokens) | $0.40 | $0.40 | $2.50 | $2.50 | $10.00 ($15.00 for prompts >200k tokens) |
| Reasoning & knowledge: Humanity's Last Exam (no tools) | 5.1% | 6.9% | 8.4% | 11.0% | 21.6% |
| Science: GPQA diamond | 64.6% | 66.7% | 78.3% | 82.8% | 86.4% |
| Mathematics: AIME 2025 | 49.8% | 63.1% | 61.6% | 72.0% | 88.0% |
| Code generation: LiveCodeBench (UI: 1/1/2025-5/1/2025) | 33.7% | 34.3% | 41.1% | 55.4% | 69.0% |
| Code editing: Aider Polyglot | 26.7% | 27.1% | 44.0% | 56.7% | 82.2% |
| Agentic coding: SWE-bench Verified (single attempt) | 31.6% | 27.6% | 50.0% | 48.9% | 59.6% |
| Agentic coding: SWE-bench Verified (multiple attempts) | 42.6% | 44.9% | 60.0% | 60.3% | 67.2% |
| Factuality: SimpleQA | 10.7% | 13.0% | 25.8% | 26.9% | 54.0% |
| Factuality: FACTS grounding | 84.1% | 86.8% | 83.4% | 85.3% | 87.8% |
| Visual reasoning: MMMU | 72.9% | 72.9% | 76.9% | 79.7% | 82.0% |
| Image understanding: Vibe-Eval (Reka) | 51.3% | 57.5% | 66.2% | 65.4% | 67.2% |
| Long context: MRCR v2 (8-needle), 128k (average) | 16.6% | 30.6% | 34.1% | 54.3% | 58.0% |
| Long context: MRCR v2 (8-needle), 1M (pointwise) | 4.1% | 5.4% | 16.8% | 21.0% | 16.4% |
| Multilingual performance: Global MMLU (Lite) | 81.1% | 84.5% | 85.8% | 88.4% | 89.2% |
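To make the pricing rows concrete, here is a small sketch estimating the cost of one uncached request at the 2.5 Pro rates from the table; the token counts are made up, and it assumes, as the table indicates, that the higher rates apply when the prompt exceeds 200k tokens:

```python
# Estimate the cost of a single uncached Gemini 2.5 Pro request,
# using the $/1M-token rates from the table above.
def pro_request_cost(input_tokens: int, output_tokens: int) -> float:
    long_prompt = input_tokens > 200_000           # prompts over 200k tokens hit the higher tier
    input_rate = 2.50 if long_prompt else 1.25     # $ per 1M input tokens
    output_rate = 15.00 if long_prompt else 10.00  # $ per 1M output tokens (tier follows prompt size)
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

print(f"${pro_request_cost(20_000, 2_000):.4f}")   # short prompt: $0.0450
print(f"${pro_request_cost(300_000, 5_000):.4f}")  # long prompt:  $0.8250
```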