Gemini 3.5 Flash
Best for frontier performance across agents and coding
Solve complex real-world problems at speed.
Speed and scale don’t have to come at the cost of intelligence.
Deep reasoning across long horizons and iterative coding tasks.
Frontier-level understanding across text, audio, images, code, and video.
See how Gemini 3.5 Flash generates six payment UI options in under 60 seconds.
See how Gemini 3.5 Flash creates 64 fractal variations at high speed.
See how Gemini 3.5 Flash ingests the AlphaGo paper and builds an intelligent game autonomously.
Watch how Gemini 3.5 Flash coordinates multiple workflows to generate and refine a brand for a fundraiser with minimal input.
See how Gemini 3.5 Flash turns a text description into fully interactive HTML components.
See how Gemini 3.5 Flash coordinates multiple agents to create a song using the Strudel music library.
Watch Gemini 3.5 Flash coordinate a team of specialized agents to design and build a virtual city.
See how Gemini 3.5 Flash deploys parallel agents to automatically rename and structure messy datasets.
Watch Gemini 3.5 Flash deploy agents to continuously refine a game in real time.
| Category | Benchmark | Notes | Gemini 3.5 Flash | Gemini 3 Flash | Gemini 3.1 Pro | Claude Sonnet 4.6 | Claude Opus 4.7 | GPT-5.5 |
|---|---|---|---|---|---|---|---|---|
| Coding | Terminal-bench 2.1: Agentic terminal coding | Terminus-2 harness | 76.2% | 58.0% | 70.3% | — | 66.1% | 78.2% |
| | SWE-Bench Pro (Public): Diverse agentic coding tasks | Single attempt | 55.1% | 49.6% | 54.2% | — | 64.3% | 58.6% |
| Agentic | MCP Atlas: Multi-step workflows using MCP | | 83.6% | 62.0% | 78.2% | 69.5% | 79.1% | 75.3% |
| | Toolathlon: Real-world general tool use | | 56.5% | 49.4% | — | — | — | 55.6% |
| UI Control | OSWorld-Verified: Agentic computer use | | 78.4% | 65.1% | 76.2% | 72.5% | 78.0% | 78.7% |
| Expert tasks | Finance Agent v2: Financial analysis and decision-making | | 57.9% | 42.6% | 43.0% | 51.0% | 51.5% | 51.8% |
| | GDPval-AA: Economically valuable knowledge work | Elo | 1656 | 1204 | 1314 | 1676 | 1753 | 1769 |
| Multimodal | CharXiv Reasoning: Information synthesis from complex charts | No tools | 84.2% | 80.3% | 83.3% | 72.4% | 82.1% | 84.1% |
| | MMMU-Pro: Multimodal understanding and reasoning | No tools | 83.6% | 81.2% | 80.5% | 74.5% | 75.2% | 81.2% |
| | Blueprint-Bench 2: Agentic spatial reasoning | Normalized score | 33.6% | 0.0% | 26.5% | 6.7% | 24.5% | 36.2% |
| Long context | MRCR v2 (8-needle): Long context performance | 128k (average) | 77.3% | 67.2% | 84.9% | 84.9% | 59.3% | 94.8% |
| | | 1M (pointwise) | 26.6% | 22.1% | 26.3% | — | — | — |
| Reasoning | Humanity’s Last Exam: Academic reasoning (full set, text + MM) | | 40.2% | 33.7% | 44.4% | 33.2% | 46.9% | 41.4% |
| | ARC-AGI-2: Abstract reasoning puzzles | | 72.1% | 33.6% | 77.1% | 58.3% | 75.8% | 84.6% |
For details on our evaluation methodology, see deepmind.google/models/evals-methodology/gemini-3-5-flash