3.1 Pro
Best for complex tasks and bringing creative concepts to life
Frontier intelligence with action
Frontier performance for agents and coding
Best for modern challenges across science, research and engineering
Best for high-volume tasks that need efficiency and intelligence
Introducing our latest series of models combining frontier intelligence with action. Build more capable, intelligent agents.
Best for frontier performance across agents and coding
Tackle complex development tasks with advanced reasoning at speed.
Transform text, images, video and audio into rich interactive user interfaces.
Execute sophisticated workflows over extended timeframes.
Leverage advanced tools to solve demanding, real-world problems.
| Category | Benchmark | Setting / metric | Gemini 3.5 Flash | Gemini 3 Flash | Gemini 3.1 Pro | Claude Sonnet 4.6 | Claude Opus 4.7 | GPT-5.5 |
|---|---|---|---|---|---|---|---|---|
| Coding | Terminal-bench 2.1: agentic terminal coding | Terminus-2 harness | 76.2% | 58.0% | 70.3% | — | 66.1% | 78.2% |
| | SWE-Bench Pro (Public): diverse agentic coding tasks | Single attempt | 55.1% | 49.6% | 54.2% | — | 64.3% | 58.6% |
| Agentic | MCP Atlas: multi-step workflows using MCP | | 83.6% | 62.0% | 78.2% | 69.5% | 79.1% | 75.3% |
| | Toolathlon: real-world general tool use | | 56.5% | 49.4% | — | — | — | 55.6% |
| UI Control | OSWorld-Verified: agentic computer use | | 78.4% | 65.1% | 76.2% | 72.5% | 78.0% | 78.7% |
| Expert tasks | Finance Agent v2: financial analysis and decision-making | | 57.9% | 42.6% | 43.0% | 51.0% | 51.5% | 51.8% |
| | GDPval-AA: economically valuable knowledge work | Elo | 1656 | 1204 | 1314 | 1676 | 1753 | 1769 |
| Multimodal | CharXiv Reasoning: information synthesis from complex charts | No tools | 84.2% | 80.3% | 83.3% | 72.4% | 82.1% | 84.1% |
| | MMMU-Pro: multimodal understanding and reasoning | No tools | 83.6% | 81.2% | 80.5% | 74.5% | 75.2% | 81.2% |
| | Blueprint-Bench 2: agentic spatial reasoning | Normalized score | 33.6% | 0.0% | 26.5% | 6.7% | 24.5% | 36.2% |
| Long context | MRCR v2 (8-needle): long-context performance | 128k (average) | 77.3% | 67.2% | 84.9% | 84.9% | 59.3% | 94.8% |
| | | 1M (pointwise) | 26.6% | 22.1% | 26.3% | — | — | — |
| Reasoning | Humanity’s Last Exam: academic reasoning | Full set, text + MM | 40.2% | 33.7% | 44.4% | 33.2% | 46.9% | 41.4% |
| | ARC-AGI-2: abstract reasoning puzzles | | 72.1% | 33.6% | 77.1% | 58.3% | 75.8% | 84.6% |
For details on our evaluation methodology, please see deepmind.google/models/evals-methodology/gemini-3-5-flash
See how Gemini 3.5 Flash generates six payment UI options in under 60 seconds.
See how Gemini 3.5 Flash can create 64 fractal variations at high speed.
See how Gemini 3.5 Flash ingests the AlphaGo paper and builds an intelligent game autonomously.
Watch how Gemini 3.5 Flash coordinates multiple workflows to generate and refine a brand for a fundraiser with minimal input.
See how Gemini 3.5 Flash turns a text description into fully interactive HTML components.
See how Gemini 3.5 Flash coordinates multiple agents to create a song using the Strudel music library.
Watch Gemini 3.5 Flash coordinate a team of specialized agents to design and build a virtual city.
See how Gemini 3.5 Flash deploys parallel agents to automatically rename and structure messy datasets.
Watch Gemini 3.5 Flash deploy agents to continuously refine a game in real time.
Shopify is running subagents in parallel to analyze complex data over long horizons, producing more accurate merchant growth forecasts at global scale.
Macquarie Bank is piloting 3.5 Flash to accelerate customer onboarding by reasoning over complex documents of 100+ pages, retrieving relevant information and making reliable recommendations with low latency.
Salesforce is integrating 3.5 Flash into Agentforce to reliably automate complicated enterprise tasks by deploying multiple subagents that retain context and execute complex, multi-turn tool calls.
3.5 Flash is helping Ramp enable smarter, more reliable OCR through multimodal understanding of complex invoices combined with reasoning over historical patterns.
Xero is deploying agents to autonomously manage complex, multi-week workflows, such as identifying suppliers and gathering information for 1099 tax forms, enabling small businesses to automate tedious admin tasks.
Databricks is using agentic workflows to monitor and retrieve real-time information, reason across massive datasets to diagnose issues, identify fixes and propose solutions for data scientists.
Our AI-first development platform that allows anyone to be a builder
Leap from prompt to production
Get started building with cutting-edge AI models
Build, scale, and govern agents
As we develop these new technologies, we recognize the responsibility they entail, and aim to prioritize safety and security in all our efforts.
Create anything from anything, starting with video
State-of-the-art image generation and editing models, built on Gemini
Advanced real-time audio models, built on Gemini
Our most advanced vision-language-action model
State-of-the-art multimodal embedding model
Supercharge your creativity and productivity
Ask whatever's on your mind to get an AI-powered response
The fastest path from prompt to production