Introducing our most intelligent model yet, with state-of-the-art reasoning to help you learn, build, and plan anything.

Models

From completing everyday tasks to solving complex problems, discover the right model for what you need.

Gemini brings reasoning and intelligence to your daily life.


Gemini 1 introduced native multimodality and long context to help AI understand the world. Gemini 2 added thinking, reasoning, and tool use to create a foundation for agents.

Now, Gemini 3 brings these capabilities together – so you can bring any idea to life.

Build with Google Antigravity – our AI-first developer experience



Performance

Gemini 3 is state-of-the-art across a wide range of benchmarks

Our most intelligent model yet sets a new bar for AI model performance

| Benchmark | Notes | Gemini 3 Flash (Thinking) | Gemini 3 Pro (Thinking) | Gemini 2.5 Flash (Thinking) | Gemini 2.5 Pro (Thinking) | Claude Sonnet 4.5 (Thinking) | GPT-5.2 (Extra High) | Grok 4.1 Fast (Reasoning) |
|---|---|---|---|---|---|---|---|---|
| Input price ($/1M tokens) | | $0.50 | $2.00 ($4.00 > 200k tokens) | $0.30 | $1.25 ($2.50 > 200k tokens) | $3.00 ($6.00 > 200k tokens) | $1.75 | $0.20 |
| Output price ($/1M tokens) | | $3.00 | $12.00 ($18.00 > 200k tokens) | $2.50 | $10.00 ($15.00 > 200k tokens) | $15.00 ($22.50 > 200k tokens) | $14.00 | $0.50 |
| Humanity's Last Exam (academic reasoning; full set, text + MM) | No tools | 33.7% | 37.5% | 11.0% | 21.6% | 13.7% | 34.5% | 17.6% |
| | With search and code execution | 43.5% | 45.8% | | | | 45.5% | |
| ARC-AGI-2 (visual reasoning puzzles) | ARC Prize Verified | 33.6% | 31.1% | 2.5% | 4.9% | 13.6% | 52.9% | |
| GPQA Diamond (scientific knowledge) | No tools | 90.4% | 91.9% | 82.8% | 86.4% | 83.4% | 92.4% | 84.3% |
| AIME 2025 (mathematics) | No tools | 95.2% | 95.0% | 72.0% | 88.0% | 87.0% | 100% | 91.9% |
| | With code execution | 99.7% | 100% | 75.7% | | | 100% | |
| MMMU-Pro (multimodal understanding and reasoning) | | 81.2% | 81.0% | 66.7% | 68.0% | 68.0% | 79.5% | 63.0% |
| ScreenSpot-Pro (screen understanding) | No tools unless specified | 69.1% | 72.7% | 3.9% | 11.4% | 36.2% | 86.3% (with Python) | |
| CharXiv Reasoning (information synthesis from complex charts) | No tools | 80.3% | 81.4% | 63.7% | 69.6% | 68.5% | 82.1% | |
| OmniDocBench 1.5 (OCR) | Overall edit distance; lower is better | 0.121 | 0.115 | 0.154 | 0.145 | 0.145 | 0.143 | |
| Video-MMMU (knowledge acquisition from videos) | | 86.9% | 87.6% | 79.2% | 83.6% | 77.8% | 85.9% | |
| LiveCodeBench Pro (competitive coding problems from Codeforces, ICPC, and IOI) | Elo rating; higher is better | 2316 | 2439 | 1143 | 1775 | 1418 | 2393 | |
| Terminal-Bench 2.0 (agentic terminal coding) | Terminus-2 harness | 47.6% | 54.2% | 16.9% | 32.6% | 42.8% | | |
| SWE-bench Verified (agentic coding) | Single attempt | 78.0% | 76.2% | 60.4% | 59.6% | 77.2% | 80.0% | 50.6% |
| τ2-bench (agentic tool use) | | 90.2% | 90.7% | 79.5% | 77.8% | 87.2% | | |
| Toolathlon (long-horizon real-world software tasks) | | 49.4% | 36.4% | 3.7% | 10.5% | 38.9% | 46.3% | |
| MCP Atlas (multi-step workflows using MCP) | | 57.4% | 54.1% | 3.4% | 8.8% | 43.8% | 60.6% | |
| Vending-Bench 2 (agentic long-term coherence) | Net worth (mean); higher is better | $3,635 | $5,478 | $549 | $574 | $3,839 | $3,952 | $1,107 |
| FACTS Benchmark Suite (factuality across grounding, parametric, search, and MM) | | 61.9% | 70.5% | 50.4% | 63.4% | 48.9% | 61.4% | 42.1% |
| SimpleQA Verified (parametric knowledge) | | 68.7% | 72.1% | 28.1% | 54.5% | 29.3% | 38.0% | 19.5% |
| MMMLU (multilingual Q&A) | | 91.8% | 91.8% | 86.6% | 89.5% | 89.1% | 89.6% | 86.8% |
| Global PIQA (commonsense reasoning across 100 languages and cultures) | | 92.8% | 93.4% | 90.2% | 91.5% | 90.1% | 91.2% | 85.6% |
| MRCR v2 (8-needle) (long-context performance) | 128k (average) | 67.2% | 77.0% | 54.3% | 58.0% | 47.1% | 81.9% | 54.6% |
| | 1M (pointwise) | 22.1% | 26.3% | 21.0% | 16.4% | not supported | not supported | 6.1% |
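
As a concrete reading of the pricing rows above, the sketch below estimates the cost of a single Gemini 3 Pro call under the tiered rates. It assumes the higher tier applies to the whole request once the prompt exceeds 200k tokens (as with earlier Gemini long-context pricing); the helper function is illustrative, not part of any SDK.

```python
# Illustrative cost calculator for the tiered Gemini 3 Pro pricing above.
# Rates are $/1M tokens. Assumption: requests whose prompt exceeds 200k
# tokens are billed entirely at the higher tier for both input and output.

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one Gemini 3 Pro call."""
    if input_tokens > 200_000:
        input_rate, output_rate = 4.00, 18.00   # > 200k-token tier
    else:
        input_rate, output_rate = 2.00, 12.00   # standard tier
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# A 300k-token prompt with a 2k-token answer:
print(f"${request_cost(300_000, 2_000):.4f}")  # $1.2360
```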

Gemini 3 Deep Think

Pushes the boundaries of intelligence, delivering a step-change in Gemini 3’s reasoning and multimodal understanding capabilities to help you solve your most complex problems

Gemini 3 Deep Think helps tackle problems that require creativity, strategic planning, and step-by-step improvement. It is available to Google AI Ultra subscribers.

Three bar charts comparing AI model performance. 1) Humanity’s Last Exam (Reasoning & knowledge): Gemini 3 Deep Think scores highest at 41%, followed by Gemini 3 Pro (37.5%), GPT-5 Pro (30.7%), GPT-5.1 (26.5%), Gemini 2.5 Pro (21.6%), and Claude Sonnet 4.5 (13.7%). 2) GPQA Diamond (Scientific knowledge): Gemini 3 Deep Think leads at 93.8%, followed by Gemini 3 Pro (91.9%), GPT-5 Pro (88.4%), GPT-5.1 (88.1%), Gemini 2.5 Pro (86.4%), and Claude Sonnet 4.5 (83.4%). 3) ARC-AGI-2 (Visual reasoning): Gemini 3 Deep Think (using tools) leads at 45.1%, followed by Gemini 3 Pro (31.1%), GPT-5.1 (17.6%), GPT-5 Pro (15.8%), Claude Sonnet 4.5 (13.6%), and Gemini 2.5 Pro (4.9%).

Iterative development and design

We’ve seen impressive results on tasks that require building something by making small changes over time.

Aiding scientific and mathematical discovery

By reasoning through complex problems, Deep Think can act as a powerful tool for researchers.

Algorithmic development and code

Deep Think excels at tough coding problems where problem formulation and careful consideration of tradeoffs and time complexity are paramount.


Safety

Building with responsibility at the core

As we develop these new technologies, we recognize the responsibility they entail, and we aim to prioritize safety and security in all our efforts.


For developers

Build with cutting-edge generative AI models and tools to make AI helpful for everyone

Gemini’s advanced thinking, native multimodality, and massive context window empower developers to build next-generation experiences.
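
As a minimal sketch of what getting started can look like, assuming the google-genai Python SDK and a GEMINI_API_KEY set in the environment (the model ID below is a placeholder; check the official documentation for the identifiers currently available):

```python
# Minimal sketch using the google-genai Python SDK (pip install google-genai).
# The model ID below is an assumption for illustration, not a confirmed name.
from google import genai

client = genai.Client()  # reads GEMINI_API_KEY from the environment

response = client.models.generate_content(
    model="gemini-3-pro-preview",  # hypothetical Gemini 3 model ID
    contents="Summarize the tradeoffs between BFS and DFS in two sentences.",
)
print(response.text)
```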


Try Gemini