2.5 Flash is our fast, cost-efficient thinking model.

- Speed and value at scale: ideal for tasks like summarization, chat applications, data extraction, and captioning.
- Balance performance and your budget: control how much 2.5 Flash reasons to reduce latency and cost.
- Natively multimodal: understands input across text, audio, images, and video.
- Long context: explore vast datasets with a 1-million-token context window.
Adaptive and budgeted thinking

- Calibrated: the model applies appropriate thinking strategies across diverse scenarios, leading to more accurate and relevant outputs.
- Controllable: developers gain fine-grained control over the model's thinking process, allowing them to manage resource usage.
- Adaptive: when no thinking budget is set, the model assesses the complexity of a task and calibrates the amount of thinking accordingly.
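The controllable thinking budget is set per request. As a minimal sketch, assuming the Gemini API's REST `generateContent` request shape and its `generationConfig.thinkingConfig.thinkingBudget` field (verify exact field names against the current API reference), a request body could be built like this:

```python
# Sketch: build a generateContent request body with an explicit
# thinking budget. A budget of 0 disables reasoning for
# latency-sensitive calls; larger budgets allow more reasoning tokens.
import json


def build_request(prompt: str, thinking_budget: int) -> str:
    """Return a JSON request body for models.generateContent."""
    body = {
        "contents": [{"parts": [{"text": prompt}]}],
        "generationConfig": {
            "thinkingConfig": {"thinkingBudget": thinking_budget}
        },
    }
    return json.dumps(body)


# Thinking off for a simple, latency-sensitive task.
fast = build_request("Summarize this support ticket.", 0)
# A larger budget for a harder reasoning task.
deep = build_request("Plan a migration from REST to gRPC.", 8192)
```

Note the pricing implication from the table below: output tokens are billed at a higher rate when reasoning is enabled, so the budget is also a cost control.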
Benchmarks

| Benchmark | Gemini 2.5 Flash Preview (04-17), thinking | Gemini 2.0 Flash, non-thinking | OpenAI o4-mini | Claude 3.7 Sonnet, 64k extended thinking | Grok 3 Beta, extended thinking | DeepSeek R1 |
|---|---|---|---|---|---|---|
| Input price ($/1M tokens) | $0.15 | $0.10 | $1.10 | $3.00 | $3.00 | $0.55 |
| Output price ($/1M tokens) | $0.60 (no reasoning), $3.50 (reasoning) | $0.40 | $4.40 | $15.00 | $15.00 | $2.19 |
| Reasoning & knowledge: Humanity's Last Exam (no tools) | 12.1% | 5.1% | 14.3% | 8.9% | — | 8.6%* |
| Science: GPQA diamond, single attempt (pass@1) | 78.3% | 60.1% | 81.4% | 78.2% | 80.2% | 71.5% |
| Science: GPQA diamond, multiple attempts | — | — | — | 84.8% | 84.6% | — |
| Mathematics: AIME 2025, single attempt (pass@1) | 78.0% | 27.5% | 92.7% | 49.5% | 77.3% | 70.0% |
| Mathematics: AIME 2025, multiple attempts | — | — | — | — | 93.3% | — |
| Mathematics: AIME 2024, single attempt (pass@1) | 88.0% | 32.0% | 93.4% | 61.3% | 83.9% | 79.8% |
| Mathematics: AIME 2024, multiple attempts | — | — | — | 80.0% | 93.3% | — |
| Code generation: LiveCodeBench v5, single attempt (pass@1) | 63.5% | 34.5% | — | — | 70.6% | 64.3% |
| Code generation: LiveCodeBench v5, multiple attempts | — | — | — | — | 79.4% | — |
| Code editing: Aider Polyglot | 51.1% / 44.2% (whole / diff-fenced) | 22.2% (whole) | 68.9% / 58.2% (whole / diff) | 64.9% (diff) | 53.3% (diff) | 56.9% (diff) |
| Factuality: SimpleQA | 29.7% | 29.9% | — | — | 43.6% | 30.1% |
| Visual reasoning: MMMU, single attempt (pass@1) | 76.7% | 71.7% | 81.6% | 75.0% | 76.0% | no MM support |
| Visual reasoning: MMMU, multiple attempts | — | — | — | — | 78.0% | no MM support |
| Image understanding: Vibe-Eval (Reka) | 62.0% | 56.4% | — | — | — | no MM support |
| Long context: MRCR, 128k (average) | 84.6% | 74.2% | — | — | — | — |
| Long context: MRCR, 1M (pointwise) | 66.3% | 48.2% | — | — | — | — |
| Multilingual performance: Global MMLU (Lite) | 88.4% | 83.4% | — | — | — | — |
2.0 Flash is now broadly available with multimodal reasoning and native tool use.
Native in, native out
Model information

| | 2.0 Flash | 2.5 Flash |
|---|---|---|
| Model deployment status | General availability | Preview |
| Supported data types for input | Text, image, video, audio | Text, image, video, audio |
| Supported data types for output | Text | Text |
| Supported # tokens for input | 1M | 1M |
| Supported # tokens for output | 8k | 64k |
| Knowledge cutoff | June 2024 | January 2025 |
| Tool use | Search as a tool, code execution | Function calling, structured output, search as a tool, code execution |
| Best for | Low-latency scenarios, automating tasks | Cost-efficient thinking, well-rounded capabilities |
| Availability | Google AI Studio, Gemini API, Vertex AI, Gemini App | Google AI Studio, Gemini API, Vertex AI, Gemini App |
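The function-calling entry in the tool-use row means the model can return a structured call to a function you declare, rather than free text. As a minimal sketch, assuming the Gemini REST API's `tools` / `functionDeclarations` request fields (and a hypothetical `get_weather` function; verify the schema against the current API docs), a request could look like this:

```python
# Sketch: a generateContent request body that declares one callable
# function. With this body, the model may respond with a structured
# functionCall naming get_weather and its arguments, which your code
# then executes. get_weather itself is hypothetical.
import json


def build_tool_request(prompt: str) -> str:
    """Return a JSON request body declaring a get_weather tool."""
    body = {
        "contents": [{"parts": [{"text": prompt}]}],
        "tools": [{
            "functionDeclarations": [{
                "name": "get_weather",
                "description": "Look up the current weather for a city.",
                "parameters": {
                    "type": "OBJECT",
                    "properties": {
                        "city": {"type": "STRING"},
                    },
                    "required": ["city"],
                },
            }]
        }],
    }
    return json.dumps(body)


request = build_tool_request("What's the weather in Zurich?")
```

The same request shape carries the structured-output and search-as-a-tool options listed above, via their own config fields.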