Gemini
Our most intelligent AI models
Gemini 2.5 models can reason through their thoughts before responding, resulting in enhanced performance and improved accuracy.
Model family
Gemini 2.5 builds on the best of Gemini — with native multimodality and a long context window.
Create and edit images with Gemini 2.5 Flash Image
Generate, transform and edit images with simple text prompts, or combine multiple images to create something new. All in Gemini.
Hands-on with Gemini 2.5
See how Gemini 2.5 uses its reasoning capabilities to create interactive simulations and do advanced coding.
Adaptive and budgeted thinking
Adaptive controls and adjustable thinking budgets allow you to balance performance and cost.
- Calibrated: The model explores diverse thinking strategies, leading to more accurate and relevant outputs.
- Controllable: Developers have fine-grained control over the model's thinking process, allowing them to manage resource usage.
- Adaptive: When no thinking budget is set, the model assesses the complexity of a task and calibrates the amount of thinking accordingly.
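As a concrete sketch of how a thinking budget is set, the snippet below builds a request body in the shape used by the Gemini REST API (`generationConfig.thinkingConfig.thinkingBudget`); treat the field names as an illustration to check against the current API reference, not canonical documentation. Leaving the field out corresponds to the adaptive behavior described above.

```python
import json

# Sketch of a Gemini API request body with an explicit thinking budget.
# Field names follow the public REST surface (generationConfig.thinkingConfig);
# verify against the current API docs or your SDK of choice.
request_body = {
    "contents": [
        {"role": "user", "parts": [{"text": "Plan a 3-step refactor of this module."}]}
    ],
    "generationConfig": {
        "thinkingConfig": {
            # Cap reasoning at 1024 thinking tokens. Omit thinkingConfig
            # entirely to let the model choose its budget adaptively.
            "thinkingBudget": 1024
        }
    },
}

print(json.dumps(request_body, indent=2))
```

Lowering the budget trades some reasoning depth for latency and cost; omitting it hands that trade-off to the model.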
Gemini 2.5 Deep Think
An enhanced reasoning mode that uses cutting-edge research techniques in parallel thinking and reinforcement learning to significantly improve Gemini’s ability to solve complex problems.
Deep Think is better able to tackle problems that require creativity, strategic planning, and step-by-step improvement.
- Iterative development and design: We’ve seen impressive results on tasks that require building something by making small changes over time.
- Aiding scientific and mathematical discovery: By reasoning through complex problems, Deep Think can act as a powerful tool for researchers.
- Algorithmic development and code: Deep Think excels at tough coding problems where problem formulation and careful consideration of tradeoffs and time complexity are paramount.
Benchmarks
In addition to its strong performance on academic benchmarks, Gemini 2.5 tops the popular coding leaderboard WebDev Arena.
| Benchmark | | 2.5 Flash-Lite (Non-thinking) | 2.5 Flash-Lite (Thinking) | 2.5 Flash (Non-thinking) | 2.5 Flash (Thinking) | 2.5 Pro (Thinking) |
|---|---|---|---|---|---|---|
| Input price | $/1M tokens (no caching) | $0.10 | $0.10 | $0.30 | $0.30 | $1.25 ($2.50 > 200k tokens) |
| Output price | $/1M tokens | $0.40 | $0.40 | $2.50 | $2.50 | $10.00 ($15.00 > 200k tokens) |
| Reasoning & knowledge: Humanity's Last Exam (no tools) | | 5.1% | 6.9% | 8.4% | 11.0% | 21.6% |
| Science: GPQA diamond | | 64.6% | 66.7% | 78.3% | 82.8% | 86.4% |
| Mathematics: AIME 2025 | | 49.8% | 63.1% | 61.6% | 72.0% | 88.0% |
| Code generation: LiveCodeBench (UI: 1/1/2025–5/1/2025) | | 33.7% | 34.3% | 41.1% | 55.4% | 69.0% |
| Code editing: Aider Polyglot | | 26.7% | 27.1% | 44.0% | 56.7% | 82.2% |
| Agentic coding: SWE-bench Verified | single attempt | 31.6% | 27.6% | 50.0% | 48.9% | 59.6% |
| | multiple attempts | 42.6% | 44.9% | 60.0% | 60.3% | 67.2% |
| Factuality: SimpleQA | | 10.7% | 13.0% | 25.8% | 26.9% | 54.0% |
| Factuality: FACTS grounding | | 84.1% | 86.8% | 83.4% | 85.3% | 87.8% |
| Visual reasoning: MMMU | | 72.9% | 72.9% | 76.9% | 79.7% | 82.0% |
| Image understanding: Vibe-Eval (Reka) | | 51.3% | 57.5% | 66.2% | 65.4% | 67.2% |
| Long context: MRCR v2 (8-needle) | 128k (average) | 16.6% | 30.6% | 34.1% | 54.3% | 58.0% |
| | 1M (pointwise) | 4.1% | 5.4% | 16.8% | 21.0% | 16.4% |
| Multilingual performance: Global MMLU (Lite) | | 81.1% | 84.5% | 85.8% | 88.4% | 89.2% |
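To make the pricing rows concrete, the sketch below estimates per-request cost from the per-1M-token prices in the table, including the >200k-token tier for 2.5 Pro. The model keys and tier rule are illustrative assumptions layered on the table's numbers, not an official billing formula.

```python
# Per-1M-token prices (USD), copied from the pricing rows above.
# "2.5-pro" switches to the higher tier when the prompt exceeds 200k tokens.
PRICES = {
    # model: (input $/1M, output $/1M)
    "2.5-flash-lite": (0.10, 0.40),
    "2.5-flash": (0.30, 2.50),
    "2.5-pro": (1.25, 10.00),       # prompts <= 200k tokens
    "2.5-pro-long": (2.50, 15.00),  # prompts > 200k tokens
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of a single request from token counts."""
    if model == "2.5-pro" and input_tokens > 200_000:
        model = "2.5-pro-long"
    in_price, out_price = PRICES[model]
    return input_tokens / 1e6 * in_price + output_tokens / 1e6 * out_price

# 10k input + 2k output tokens on 2.5 Flash:
print(round(request_cost("2.5-flash", 10_000, 2_000), 6))  # 0.008
```

Note how the long-context tier dominates for 2.5 Pro: a 300k-token prompt is billed entirely at the higher input rate.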