Learn anything
Understand complex topics in a way that makes sense for you – with clear, concise, and helpful responses
A smarter model to help you learn, plan, and build like never before.
Bring your ideas to life – from sketches and prompts to interactive tools and experiences
Delegate tasks and multi-step projects to get things done faster than ever before
Smart, concise, direct responses – with genuine insight over cliché and flattery.
Text, images, video, audio – even code. Gemini 3.1 Pro delivers state-of-the-art reasoning with unprecedented depth and nuance.
Gemini 3 brings exceptional instruction following – with meaningfully improved tool use and agentic coding.
Better tool use and simultaneous, multi-step tasks: Gemini 3’s agentic capabilities power more helpful and intelligent personal AI assistants.
Gemini 3.1 Pro uses advanced reasoning to wire live telemetry streams into dynamic applications like this aerospace dashboard.
Gemini 3.1 Pro codes an immersive starling murmuration, complete with hand-tracking manipulation and dynamic generative audio.
From terrain generation to traffic flow, Gemini 3.1 Pro uses advanced reasoning to code and assemble the many layers of a simulated city.
Gemini 3.1 Pro understands design intent, converting static SVGs into animated, code-based graphics for faster, cleaner web development.
Gemini 3.1 Pro reasons through the atmospheric tone of a novel to build a modern, personalized portfolio.
Build with our new agentic development platform
Leap from prompt to production
Get started building with cutting-edge AI models
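Getting started typically means sending a prompt to a generateContent-style REST endpoint. A minimal sketch of the request shape, assuming the public Generative Language API conventions and a hypothetical `gemini-3.1-pro` model id (authentication and HTTP transport omitted):

```python
import json

def build_generate_content_request(model: str, prompt: str) -> tuple[str, str]:
    """Build the URL path and JSON body for a generateContent-style call.

    The model id passed in (e.g. "gemini-3.1-pro") is an illustrative
    assumption, not a confirmed API identifier.
    """
    path = f"/v1beta/models/{model}:generateContent"
    body = json.dumps({"contents": [{"parts": [{"text": prompt}]}]})
    return path, body

# Example: construct (but do not send) a request for a short prompt.
path, body = build_generate_content_request("gemini-3.1-pro", "Summarize this design doc.")
print(path)
```

The returned path and JSON body would then be posted with an API key to the service endpoint; see the official quickstart for the exact transport and auth details.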
| Benchmark | Notes | Gemini 3.1 Pro Thinking (High) | Gemini 3 Pro Thinking (High) | Sonnet 4.6 Thinking (Max) | Opus 4.6 Thinking (Max) | GPT-5.2 Thinking (xhigh) | GPT-5.3-Codex Thinking (xhigh) |
|---|---|---|---|---|---|---|---|
| Humanity's Last Exam (academic reasoning, full set, text + MM) | No tools | 44.4% | 37.5% | 33.2% | 40.0% | 34.5% | — |
| | Search (blocklist) + Code | 51.4% | 45.8% | 49.0% | 53.1% | 45.5% | — |
| ARC-AGI-2 (abstract reasoning puzzles) | ARC Prize Verified | 77.1% | 31.1% | 58.3% | 68.8% | 52.9% | — |
| GPQA Diamond (scientific knowledge) | No tools | 94.3% | 91.9% | 89.9% | 91.3% | 92.4% | — |
| Terminal-Bench 2.0 (agentic terminal coding) | Terminus-2 harness | 68.5% | 56.9% | 59.1% | 65.4% | 54.0% | 64.7% |
| | Other best self-reported harness | — | — | — | — | 62.2% (Codex) | 77.3% (Codex) |
| SWE-Bench Verified (agentic coding) | Single attempt | 80.6% | 76.2% | 79.6% | 80.8% | 80.0% | — |
| SWE-Bench Pro, Public (diverse agentic coding tasks) | Single attempt | 54.2% | 43.3% | — | — | 55.6% | 56.8% |
| LiveCodeBench Pro (competitive coding problems from Codeforces, ICPC, and IOI) | Elo | 2887 | 2439 | — | — | 2393 | — |
| SciCode (scientific research coding) | — | 59% | 56% | 47% | 52% | 52% | — |
| APEX-Agents (long-horizon professional tasks) | — | 33.5% | 18.4% | — | 29.8% | 23.0% | — |
| GDPval-AA (expert tasks) | Elo | 1317 | 1195 | 1633 | 1606 | 1462 | — |
| τ2-bench (agentic and tool use) | Retail | 90.8% | 85.3% | 91.7% | 91.9% | 82.0% | — |
| | Telecom | 99.3% | 98.0% | 97.9% | 99.3% | 98.7% | — |
| MCP Atlas (multi-step workflows using MCP) | — | 69.2% | 54.1% | 61.3% | 59.5% | 60.6% | — |
| BrowseComp (agentic search) | Search + Python + Browse | 85.9% | 59.2% | 74.7% | 84.0% | 65.8% | — |
| MMMU-Pro (multimodal understanding and reasoning) | No tools | 80.5% | 81.0% | 74.5% | 73.9% | 79.5% | — |
| MMMLU (multilingual Q&A) | — | 92.6% | 91.8% | 89.3% | 91.1% | 89.6% | — |
| MRCR v2, 8-needle (long-context performance) | 128k (average) | 84.9% | 77.0% | 84.9% | 84.0% | 83.8% | — |
| | 1M (pointwise) | 26.3% | 26.3% | Not supported | Not supported | Not supported | — |