Learn anything
Learn, plan, and build like never before with Gemini 3 Pro’s incredible reasoning powers
Our most intelligent model yet
Understand complex topics in a way that makes sense for you – with clear, concise, and helpful responses
Bring your ideas to life – from sketches and prompts to interactive tools and experiences
Delegate tasks and multi-step projects to get things done faster than ever before
Build with our new agentic development platform
Leap from prompt to production
Get started building with cutting-edge AI models
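As a quick-start sketch (not an official snippet), the call below uses the google-genai Python SDK; the model ID `gemini-3-pro-preview` is an assumption, so verify it against the current model list before running.

```python
# Minimal text-generation sketch with the google-genai SDK (pip install google-genai).
from google import genai

client = genai.Client()  # reads the GEMINI_API_KEY environment variable

response = client.models.generate_content(
    model="gemini-3-pro-preview",  # assumed model ID; check the current model list
    contents="Draft a plan for a weather dashboard web app.",
)
print(response.text)
```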
Gemini 3 Pro excels at practical, front-end development – with a more intuitive interface and richer design
Gemini 3 Pro’s state-of-the-art reasoning provides unprecedented nuance and depth
Gemini 3 Pro seamlessly synthesizes information across text, images, video, audio, and even code to help you learn
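To illustrate that multimodal synthesis in code, here is a hedged sketch that sends an image and a text prompt in a single request, again assuming the google-genai SDK and the `gemini-3-pro-preview` model ID.

```python
# Multimodal sketch: one request combining an image and a text question.
from google import genai
from google.genai import types

client = genai.Client()

with open("lecture_slide.png", "rb") as f:  # any local image
    image_bytes = f.read()

response = client.models.generate_content(
    model="gemini-3-pro-preview",  # assumed model ID
    contents=[
        types.Part.from_bytes(data=image_bytes, mime_type="image/png"),
        "Explain the concept on this slide as if I were new to the topic.",
    ],
)
print(response.text)
```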
Our most intelligent model yet sets a new bar for AI model performance.
| Benchmark | Notes | Gemini 3 Pro | Gemini 2.5 Pro | Claude Sonnet 4.5 | GPT-5.1 |
|---|---|---|---|---|---|
| Humanity's Last Exam (academic reasoning) | No tools | 37.5% | 21.6% | 13.7% | 26.5% |
| Humanity's Last Exam (academic reasoning) | With search and code execution | 45.8% | — | — | — |
| ARC-AGI-2 (visual reasoning puzzles) | ARC Prize Verified | 31.1% | 4.9% | 13.6% | 17.6% |
| GPQA Diamond (scientific knowledge) | No tools | 91.9% | 86.4% | 83.4% | 88.1% |
| AIME 2025 (mathematics) | No tools | 95.0% | 88.0% | 87.0% | 94.0% |
| AIME 2025 (mathematics) | With code execution | 100.0% | — | 100.0% | — |
| MathArena Apex (challenging math contest problems) | — | 23.4% | 0.5% | 1.6% | 1.0% |
| MMMU-Pro (multimodal understanding and reasoning) | — | 81.0% | 68.0% | 68.0% | 76.0% |
| ScreenSpot-Pro (screen understanding) | — | 72.7% | 11.4% | 36.2% | 3.5% |
| CharXiv Reasoning (information synthesis from complex charts) | — | 81.4% | 69.6% | 68.5% | 69.5% |
| OmniDocBench 1.5 (OCR) | Overall edit distance, lower is better | 0.115 | 0.145 | 0.145 | 0.147 |
| Video-MMMU (knowledge acquisition from videos) | — | 87.6% | 83.6% | 77.8% | 80.4% |
| LiveCodeBench Pro (competitive coding problems) | Elo rating, higher is better | 2,439 | 1,775 | 1,418 | 2,243 |
| Terminal-Bench 2.0 (agentic terminal coding) | Terminus-2 agent | 54.2% | 32.6% | 42.8% | 47.6% |
| SWE-Bench Verified (agentic coding) | Single attempt | 76.2% | 59.6% | 77.2% | 76.3% |
| τ2-bench (agentic tool use) | — | 85.4% | 54.9% | 84.7% | 80.2% |
| Vending-Bench 2 (long-horizon agentic tasks) | Net worth (mean), higher is better | $5,478.16 | $573.64 | $3,838.74 | $1,473.43 |
| FACTS Benchmark Suite (held-out internal grounding, parametric, multimodal, and search retrieval benchmarks) | — | 70.5% | 63.4% | 50.4% | 50.8% |
| SimpleQA Verified (parametric knowledge) | — | 72.1% | 54.5% | 29.3% | 34.9% |
| MMMLU (multilingual Q&A) | — | 91.8% | 89.5% | 89.1% | 91.0% |
| Global PIQA (commonsense reasoning across 100 languages and cultures) | — | 93.4% | 91.5% | 90.1% | 90.9% |
| MRCR v2, 8-needle (long-context performance) | 128k (average) | 77.0% | 58.0% | 47.1% | 61.6% |
| MRCR v2, 8-needle (long-context performance) | 1M (pointwise) | 26.3% | 16.4% | Not supported | Not supported |
For details on our evaluation methodology, please see deepmind.google/models/evals-methodology/gemini-3-pro.
Smart, concise, direct responses – with genuine insight over cliché and flattery.
Text, images, video, audio – even code. Gemini 3 is state-of-the-art on reasoning, with unprecedented depth and nuance.
Gemini 3 brings exceptional instruction following – with meaningfully improved tool use and agentic coding.
Better tool use. Simultaneous, multi-step tasks. Gemini 3's agentic capabilities can power more helpful and intelligent personal AI assistants.
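As a sketch of that tool use, the google-genai SDK supports automatic function calling: pass a plain Python function as a tool and the model can decide to invoke it. The weather stub and model ID below are illustrative assumptions, not the assistant implementation itself.

```python
# Tool-use sketch: automatic function calling with a stub tool.
from google import genai
from google.genai import types

def get_weather(city: str) -> str:
    """Stub tool for illustration: return a canned weather summary."""
    return f"Sunny, 22°C in {city}."

client = genai.Client()

response = client.models.generate_content(
    model="gemini-3-pro-preview",  # assumed model ID
    contents="Should I pack an umbrella for Zurich tomorrow?",
    config=types.GenerateContentConfig(tools=[get_weather]),  # SDK calls the tool as needed
)
print(response.text)
```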