Introducing our most intelligent model yet, with state-of-the-art reasoning to help you learn, build, and plan anything.

Models

From completing everyday tasks to solving complex problems, discover the right model for what you need.

Gemini brings reasoning and intelligence to your daily life.

3 Pro

Best for complex tasks and bringing creative concepts to life

2.5 Flash

Best for fast performance on everyday tasks

2.5 Flash-Lite

Best for high-volume, cost-efficient tasks
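
For developers choosing between these tiers in the Gemini API, here is a minimal sketch using the google-genai Python SDK. The model ID strings and the tier-routing helper are illustrative assumptions, not official guidance; check the current model list in the API docs before relying on them.

```python
# pip install google-genai
# Minimal sketch: route a request to a model tier by task complexity.
# Model IDs below are assumptions for illustration; verify them against
# the current model list in the Gemini API documentation.
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")

MODELS = {
    "complex": "gemini-3-pro-preview",   # complex tasks, creative work
    "everyday": "gemini-2.5-flash",      # fast everyday performance
    "bulk": "gemini-2.5-flash-lite",     # high-volume, cost-efficient
}

def ask(prompt: str, tier: str = "everyday") -> str:
    """Send a text prompt to the model tier that fits the task."""
    response = client.models.generate_content(
        model=MODELS[tier],
        contents=prompt,
    )
    return response.text

print(ask("Summarize this week's team standup notes.", tier="everyday"))
```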


Gemini 1 introduced native multimodality and long context to help AI understand the world. Gemini 2 added thinking, reasoning, and tool use to create a foundation for agents.

Now, Gemini 3 brings these capabilities together – so you can bring any idea to life.

Build with Google Antigravity – our AI-first developer experience




Performance

Gemini 3 is state-of-the-art across a wide range of benchmarks

Our most intelligent model yet sets a new bar for AI model performance

| Capability | Benchmark | Notes | Gemini 3 Pro | Gemini 2.5 Pro | Claude Sonnet 4.5 | GPT-5.1 |
|---|---|---|---|---|---|---|
| Academic reasoning | Humanity's Last Exam | No tools | 37.5% | 21.6% | 13.7% | 26.5% |
| | | With search and code execution | 45.8% | | | |
| Visual reasoning puzzles | ARC-AGI-2 | ARC Prize Verified | 31.1% | 4.9% | 13.6% | 17.6% |
| Scientific knowledge | GPQA Diamond | No tools | 91.9% | 86.4% | 83.4% | 88.1% |
| Mathematics | AIME 2025 | No tools | 95.0% | 88.0% | 87.0% | 94.0% |
| | | With code execution | 100.0% | | | 100.0% |
| Challenging math contest problems | MathArena Apex | | 23.4% | 0.5% | 1.6% | 1.0% |
| Multimodal understanding and reasoning | MMMU-Pro | | 81.0% | 68.0% | 68.0% | 76.0% |
| Screen understanding | ScreenSpot-Pro | | 72.7% | 11.4% | 36.2% | 3.5% |
| Information synthesis from complex charts | CharXiv Reasoning | | 81.4% | 69.6% | 68.5% | 69.5% |
| OCR | OmniDocBench 1.5 | Overall edit distance, lower is better | 0.115 | 0.145 | 0.145 | 0.147 |
| Knowledge acquisition from videos | Video-MMMU | | 87.6% | 83.6% | 77.8% | 80.4% |
| Competitive coding problems | LiveCodeBench Pro | Elo rating, higher is better | 2,439 | 1,775 | 1,418 | 2,243 |
| Agentic terminal coding | Terminal-Bench 2.0 | Terminus-2 agent | 54.2% | 32.6% | 42.8% | 47.6% |
| Agentic coding | SWE-Bench Verified | Single attempt | 76.2% | 59.6% | 77.2% | 76.3% |
| Agentic tool use | τ2-bench | | 85.4% | 54.9% | 84.7% | 80.2% |
| Long-horizon agentic tasks | Vending-Bench 2 | Net worth (mean), higher is better | $5,478.16 | $573.64 | $3,838.74 | $1,473.43 |
| Held-out internal grounding, parametric, multimodal, and search retrieval benchmarks | FACTS Benchmark Suite | | 70.5% | 63.4% | 50.4% | 50.8% |
| Parametric knowledge | SimpleQA Verified | | 72.1% | 54.5% | 29.3% | 34.9% |
| Multilingual Q&A | MMMLU | | 91.8% | 89.5% | 89.1% | 91.0% |
| Commonsense reasoning across 100 languages and cultures | Global PIQA | | 93.4% | 91.5% | 90.1% | 90.9% |
| Long-context performance | MRCR v2 (8-needle) | 128k (average) | 77.0% | 58.0% | 47.1% | 61.6% |
| | | 1M (pointwise) | 26.3% | 16.4% | Not supported | Not supported |

For details on our evaluation methodology, please see deepmind.google/models/evals-methodology/gemini-3-pro


Gemini 3 Deep Think

Pushes the boundaries of intelligence, delivering a step-change in Gemini 3’s reasoning and multimodal understanding capabilities to help you solve your most complex problems

Gemini 3 Deep Think helps tackle problems that require creativity, strategic planning, and step-by-step improvement.

Three bar charts comparing AI model performance:
1) Humanity's Last Exam (reasoning and knowledge): Gemini 3 Deep Think scores highest at 41.0%, followed by Gemini 3 Pro (37.5%), GPT-5 Pro (30.7%), GPT-5.1 (26.5%), Gemini 2.5 Pro (21.6%), and Claude Sonnet 4.5 (13.7%).
2) GPQA Diamond (scientific knowledge): Gemini 3 Deep Think leads at 93.8%, followed by Gemini 3 Pro (91.9%), GPT-5 Pro (88.4%), GPT-5.1 (88.1%), Gemini 2.5 Pro (86.4%), and Claude Sonnet 4.5 (83.4%).
3) ARC-AGI-2 (visual reasoning): Gemini 3 Deep Think (with tools) leads at 45.1%, followed by Gemini 3 Pro (31.1%), GPT-5.1 (17.6%), GPT-5 Pro (15.8%), Claude Sonnet 4.5 (13.6%), and Gemini 2.5 Pro (4.9%).

Iterative development and design

We’ve seen impressive results on tasks that require building something iteratively, refining it through small changes over time.

Aiding scientific and mathematical discovery

By reasoning through complex problems, Deep Think can act as a powerful tool for researchers.

Algorithmic development and code

Deep Think excels at tough coding problems where problem formulation and careful consideration of tradeoffs and time complexity are paramount.

Safety

Building with responsibility at the core

As we develop these new technologies, we recognize the responsibility they entail, and aim to prioritize safety and security in all our efforts.


Get started

Explore example prompts for Google AI Studio


For developers

Build with cutting-edge generative AI models and tools to make AI helpful for everyone

Gemini’s advanced thinking, native multimodality and massive context window empower developers to build next-generation experiences.
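
As a sketch of what building on those capabilities can look like, the example below pairs an image with a text instruction in a single google-genai call. The model ID and the local file name are placeholder assumptions, not an official recipe.

```python
# pip install google-genai
# Sketch of a multimodal request: send an image and a text instruction
# together in one call. Model ID and file name are placeholder assumptions.
from google import genai
from google.genai import types

client = genai.Client(api_key="YOUR_API_KEY")

with open("chart.png", "rb") as f:  # any local image file
    image_bytes = f.read()

response = client.models.generate_content(
    model="gemini-3-pro-preview",  # assumed ID; verify against the docs
    contents=[
        types.Part.from_bytes(data=image_bytes, mime_type="image/png"),
        "Explain the main trend in this chart and suggest a follow-up analysis.",
    ],
)
print(response.text)
```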

Try Gemini