Introducing our latest series of models combining frontier intelligence with action. Build more capable, intelligent agents.

Models

Whether you are completing everyday tasks or solving your most challenging problems, discover the right model for what you need.

Performance

Our most impressive model yet for agentic workflows. Gemini 3.5 leads across a wide range of benchmarks.

Benchmark | Gemini 3.5 Flash | Gemini 3 Flash | Gemini 3.1 Pro | Claude Sonnet 4.6 | Claude Opus 4.7 | GPT-5.5

Rows with fewer than six scores do not report a result for every model.

Coding
Terminal-bench 2.1 (Agentic terminal coding; Terminus-2 harness) | 76.2% | 58.0% | 70.3% | 66.1% | 78.2%
SWE-Bench Pro (Public) (Diverse agentic coding tasks; Single attempt) | 55.1% | 49.6% | 54.2% | 64.3% | 58.6%

Agentic
MCP Atlas (Multi-step workflows using MCP) | 83.6% | 62.0% | 78.2% | 69.5% | 79.1% | 75.3%
Toolathlon (Real-world general tool use) | 56.5% | 49.4% | 55.6%

UI Control
OSWorld-Verified (Agentic computer use) | 78.4% | 65.1% | 76.2% | 72.5% | 78.0% | 78.7%

Expert tasks
Finance Agent v2 (Financial analysis and decision-making) | 57.9% | 42.6% | 43.0% | 51.0% | 51.5% | 51.8%
GDPval-AA (Economically valuable knowledge work; Elo) | 1656 | 1204 | 1314 | 1676 | 1753 | 1769

Multimodal
CharXiv Reasoning (Information synthesis from complex charts; No tools) | 84.2% | 80.3% | 83.3% | 72.4% | 82.1% | 84.1%
MMMU-Pro (Multimodal understanding and reasoning; No tools) | 83.6% | 81.2% | 80.5% | 74.5% | 75.2% | 81.2%
Blueprint-Bench 2 (Agentic spatial reasoning; Normalized score) | 33.6% | 0.0% | 26.5% | 6.7% | 24.5% | 36.2%

Long context
MRCR v2 (8-needle) (Long context performance; 128k, average) | 77.3% | 67.2% | 84.9% | 84.9% | 59.3% | 94.8%
MRCR v2 (8-needle) (Long context performance; 1M, pointwise) | 26.6% | 22.1% | 26.3%

Reasoning
Humanity’s Last Exam (Academic reasoning; full set, text + MM) | 40.2% | 33.7% | 44.4% | 33.2% | 46.9% | 41.4%
ARC-AGI-2 (Abstract reasoning puzzles) | 72.1% | 33.6% | 77.1% | 58.3% | 75.8% | 84.6%

For details on our evaluation methodology, see deepmind.google/models/evals-methodology/gemini-3-5-flash




Safety

Building with responsibility at the core

As we develop these new technologies, we recognize the responsibility they entail and aim to prioritize safety and security in all our efforts.


Gemini Ecosystem

