Introducing our most intelligent model yet. With state-of-the-art reasoning to help you learn, build, and plan anything.

Models

Whether you're completing everyday tasks or solving complex problems, discover the right model for what you need.

Gemini 3.1 Deep Think

Pushes the boundaries of intelligence, delivering a significant upgrade to Gemini 3.1's specialized reasoning mode to help you solve the most complex technical problems.

Gemini 3.1 Deep Think mode can better help tackle real-world problems that require rigor, breakthrough creativity and intelligence. Available to Google AI Ultra subscribers.

Gemini 1 introduced native multimodality and long context to help AI understand the world. Gemini 2 added thinking, reasoning and tool use to create a foundation for agents.

Now, Gemini 3 brings these capabilities together – so you can bring any idea to life.


Performance

Gemini 3 is state-of-the-art across a wide range of benchmarks

Our most intelligent model yet sets a new bar for AI model performance

| Benchmark | Notes | Gemini 3.1 Pro Thinking (High) | Gemini 3 Pro Thinking (High) | Sonnet 4.6 Thinking (Max) | Opus 4.6 Thinking (Max) | GPT-5.2 Thinking (xhigh) | GPT-5.3-Codex Thinking (xhigh) |
|---|---|---|---|---|---|---|---|
| Humanity's Last Exam | Academic reasoning (full set, text + MM), no tools | 44.4% | 37.5% | 33.2% | 40.0% | 34.5% | |
| Humanity's Last Exam | Search (blocklist) + code | 51.4% | 45.8% | 49.0% | 53.1% | 45.5% | |
| ARC-AGI-2 | Abstract reasoning puzzles, ARC Prize Verified | 77.1% | 31.1% | 58.3% | 68.8% | 52.9% | |
| GPQA Diamond | Scientific knowledge, no tools | 94.3% | 91.9% | 89.9% | 91.3% | 92.4% | |
| Terminal-Bench 2.0 | Agentic terminal coding, Terminus-2 harness | 68.5% | 56.9% | 59.1% | 65.4% | 54.0% | 64.7% |
| Terminal-Bench 2.0 | Other, best self-reported harness | | | | | 62.2% (Codex) | 77.3% (Codex) |
| SWE-Bench Verified | Agentic coding, single attempt | 80.6% | 76.2% | 79.6% | 80.8% | 80.0% | |
| SWE-Bench Pro (Public) | Diverse agentic coding tasks, single attempt | 54.2% | 43.3% | 55.6% | 56.8% | | |
| LiveCodeBench Pro | Competitive coding problems from Codeforces, ICPC, and IOI (Elo) | 2887 | 2439 | 2393 | | | |
| SciCode | Scientific research coding | 59% | 56% | 47% | 52% | 52% | |
| APEX-Agents | Long-horizon professional tasks | 33.5% | 18.4% | 29.8% | 23.0% | | |
| GDPval-AA Elo | Expert tasks | 1317 | 1195 | 1633 | 1606 | 1462 | |
| τ2-bench | Agentic tool use, Retail | 90.8% | 85.3% | 91.7% | 91.9% | 82.0% | |
| τ2-bench | Agentic tool use, Telecom | 99.3% | 98.0% | 97.9% | 99.3% | 98.7% | |
| MCP Atlas | Multi-step workflows using MCP | 69.2% | 54.1% | 61.3% | 59.5% | 60.6% | |
| BrowseComp | Agentic search; search + Python + browse | 85.9% | 59.2% | 74.7% | 84.0% | 65.8% | |
| MMMU-Pro | Multimodal understanding and reasoning, no tools | 80.5% | 81.0% | 74.5% | 73.9% | 79.5% | |
| MMMLU | Multilingual Q&A | 92.6% | 91.8% | 89.3% | 91.1% | 89.6% | |
| MRCR v2 (8-needle) | Long context, 128k (average) | 84.9% | 77.0% | 84.9% | 84.0% | 83.8% | |
| MRCR v2 (8-needle) | Long context, 1M (pointwise) | 26.3% | 26.3% | Not supported | Not supported | Not supported | |

Safety

Building with responsibility at the core

As we develop these new technologies, we recognize the responsibility they entail, and we aim to prioritize safety and security in all our efforts.


For developers

Build with cutting-edge generative AI models and tools to make AI helpful for everyone

Gemini’s advanced thinking, native multimodality and massive context window empower developers to build next-generation experiences.
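As a minimal sketch, here is one way to call the Gemini API's REST `generateContent` endpoint from Python using only the standard library. The model ID below is an assumption for illustration; check the current model list in Google AI Studio before using it.

```python
# Minimal sketch: text generation via the Gemini REST API (generateContent).
# Assumes a GEMINI_API_KEY environment variable; the model ID is a placeholder.
import json
import os
import urllib.request

API_URL = "https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent"
MODEL = "gemini-3-pro-preview"  # assumed model ID; verify against the live model list


def build_request(prompt: str) -> dict:
    # generateContent expects a list of contents, each holding text parts.
    return {"contents": [{"parts": [{"text": prompt}]}]}


def generate(prompt: str, api_key: str, model: str = MODEL) -> str:
    req = urllib.request.Request(
        API_URL.format(model=model),
        data=json.dumps(build_request(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-goog-api-key": api_key},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # Return the first text part of the first candidate.
    return body["candidates"][0]["content"]["parts"][0]["text"]


if __name__ == "__main__":
    key = os.environ.get("GEMINI_API_KEY")
    if key:
        print(generate("Explain long context windows in one sentence.", key))
```

The official `google-genai` SDK wraps this same endpoint with retries, streaming and multimodal helpers, so it is usually the better choice for production code; the raw request above just makes the payload shape explicit.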


Try Gemini