Gemini
Our most intelligent AI models
Gemini 2.5 models can reason through a problem before responding, which improves performance and accuracy.
Model family
Gemini 2.5 builds on the best of Gemini — with native multimodality and a long context window.
Hands-on with Gemini 2.5
See how Gemini 2.5 uses its reasoning capabilities to create interactive simulations and do advanced coding.
Adaptive and budgeted thinking
Adaptive controls and adjustable thinking budgets allow you to balance performance and cost.
- Calibrated: The model explores diverse thinking strategies, leading to more accurate and relevant outputs.
- Controllable: Developers have fine-grained control over the model's thinking process, allowing them to manage resource usage.
- Adaptive: When no thinking budget is set, the model assesses the complexity of a task and calibrates the amount of thinking accordingly.
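The difference between a budgeted and an adaptive request can be sketched as a small request builder. This is an illustrative sketch rather than the official client: the helper name `build_generate_request` is hypothetical, and the field names (`contents`, `generationConfig`, `thinkingConfig`, `thinkingBudget`) are assumed to match the public Gemini REST API and should be checked against the current API reference.

```python
import json
from typing import Optional


def build_generate_request(prompt: str, thinking_budget: Optional[int] = None) -> dict:
    """Build a JSON-serializable request body for a text prompt.

    Assumption: the field names used here follow the public Gemini REST API.
    When thinking_budget is None, no thinking configuration is sent, so the
    model decides how much to think (the adaptive behavior described above).
    """
    body: dict = {"contents": [{"parts": [{"text": prompt}]}]}
    if thinking_budget is not None:
        # Budgeted mode: cap the number of tokens spent on thinking.
        body["generationConfig"] = {
            "thinkingConfig": {"thinkingBudget": thinking_budget}
        }
    return body


# Adaptive: no budget set, so the model calibrates its own thinking.
adaptive = build_generate_request("Explain quicksort.")

# Budgeted: an illustrative cap of 1024 thinking tokens.
budgeted = build_generate_request("Explain quicksort.", thinking_budget=1024)

print(json.dumps(budgeted, indent=2))
```

Leaving the thinking configuration out entirely, rather than sending a default value, keeps the adaptive case identical to a request that never mentions thinking at all.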
Benchmarks
In addition to its strong performance on academic benchmarks, Gemini 2.5 tops the popular coding leaderboard WebDev Arena.
| Benchmark | Detail | Gemini 2.5 Flash-Lite Preview 06-17 (Non-thinking) | Gemini 2.5 Flash-Lite Preview 06-17 (Thinking) | Gemini 2.5 Flash (Non-thinking) | Gemini 2.5 Flash (Thinking) | Gemini 2.5 Pro (Thinking) |
|---|---|---|---|---|---|---|
| Input price | $/1M tokens (no caching) | $0.10 | $0.10 | $0.30 | $0.30 | $1.25 ($2.50 > 200k tokens) |
| Output price | $/1M tokens | $0.40 | $0.40 | $2.50 | $2.50 | $10.00 ($15.00 > 200k tokens) |
| Reasoning & knowledge: Humanity's Last Exam (no tools) | | 5.1% | 6.9% | 8.4% | 11.0% | 21.6% |
| Science: GPQA diamond | | 64.6% | 66.7% | 78.3% | 82.8% | 86.4% |
| Mathematics: AIME 2025 | | 49.8% | 63.1% | 61.6% | 72.0% | 88.0% |
| Code generation: LiveCodeBench (UI: 1/1/2025-5/1/2025) | | 33.7% | 34.3% | 41.1% | 55.4% | 69.0% |
| Code editing: Aider Polyglot | | 26.7% | 27.1% | 44.0% | 56.7% | 82.2% |
| Agentic coding: SWE-bench Verified | single attempt | 31.6% | 27.6% | 50.0% | 48.9% | 59.6% |
| Agentic coding: SWE-bench Verified | multiple attempts | 42.6% | 44.9% | 60.0% | 60.3% | 67.2% |
| Factuality: SimpleQA | | 10.7% | 13.0% | 25.8% | 26.9% | 54.0% |
| Factuality: FACTS grounding | | 84.1% | 86.8% | 83.4% | 85.3% | 87.8% |
| Visual reasoning: MMMU | | 72.9% | 72.9% | 76.9% | 79.7% | 82.0% |
| Image understanding: Vibe-Eval (Reka) | | 51.3% | 57.5% | 66.2% | 65.4% | 67.2% |
| Long context: MRCR v2 (8-needle) | 128k (average) | 16.6% | 30.6% | 34.1% | 54.3% | 58.0% |
| Long context: MRCR v2 (8-needle) | 1M (pointwise) | 4.1% | 5.4% | 16.8% | 21.0% | 16.4% |
| Multilingual performance: Global MMLU (Lite) | | 81.1% | 84.5% | 85.8% | 88.4% | 89.2% |
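To make the performance-versus-cost trade-off concrete, the per-token prices in the table can be turned into a rough per-request estimate. The sketch below is illustrative only: the model keys, the `Price` class, and `request_cost` are hypothetical names, and it assumes that for Gemini 2.5 Pro the higher rates apply to the whole request once the input exceeds 200k tokens, which the table does not spell out. Caching discounts are ignored.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: the "> 200k tokens" tier is triggered by the input size.
LONG_PROMPT_THRESHOLD = 200_000


@dataclass(frozen=True)
class Price:
    """Prices in USD per 1M tokens, copied from the table above."""
    input_per_m: float
    output_per_m: float
    long_input_per_m: Optional[float] = None   # rate above the threshold, if any
    long_output_per_m: Optional[float] = None  # rate above the threshold, if any


# Hypothetical keys used only for this example.
PRICES = {
    "gemini-2.5-flash-lite": Price(0.10, 0.40),
    "gemini-2.5-flash": Price(0.30, 2.50),
    "gemini-2.5-pro": Price(1.25, 10.00, 2.50, 15.00),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost in USD of one request, without caching."""
    price = PRICES[model]
    use_long_tier = (
        price.long_input_per_m is not None
        and price.long_output_per_m is not None
        and input_tokens > LONG_PROMPT_THRESHOLD
    )
    in_rate = price.long_input_per_m if use_long_tier else price.input_per_m
    out_rate = price.long_output_per_m if use_long_tier else price.output_per_m
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


for name in PRICES:
    # Same workload on each model: 10k input tokens, 2k output tokens.
    print(f"{name}: ${request_cost(name, 10_000, 2_000):.4f}")
```

Because the table lists the same price for the thinking and non-thinking variants of a model, any cost difference between them in this estimate comes only from the number of output tokens passed in.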