Gemini 3.1 Pro

Best for complex tasks and bringing creative concepts to life

A smarter model to help you learn, plan, and build like never before.

Partner with a pro

With state-of-the-art reasoning capabilities

Learn anything

Understand complex topics in a way that makes sense for you – with clear, concise, and helpful responses

Build anything

Bring your ideas to life – from sketches and prompts to interactive tools and experiences

Plan anything

Delegate tasks and multi-step projects to get things done faster than ever before

Performance

Gemini 3.1 Pro raises the bar across a wide range of benchmarks.

| Benchmark | Notes | Gemini 3.1 Pro Thinking (High) | Gemini 3 Pro Thinking (High) | Sonnet 4.6 Thinking (Max) | Opus 4.6 Thinking (Max) | GPT-5.2 Thinking (xhigh) | GPT-5.3-Codex Thinking (xhigh) |
|---|---|---|---|---|---|---|---|
| Humanity's Last Exam – Academic reasoning (full set, text + MM) | No tools | 44.4% | 37.5% | 33.2% | 40.0% | 34.5% | |
| Humanity's Last Exam | Search (blocklist) + Code | 51.4% | 45.8% | 49.0% | 53.1% | 45.5% | |
| ARC-AGI-2 – Abstract reasoning puzzles | ARC Prize Verified | 77.1% | 31.1% | 58.3% | 68.8% | 52.9% | |
| GPQA Diamond – Scientific knowledge | No tools | 94.3% | 91.9% | 89.9% | 91.3% | 92.4% | |
| Terminal-Bench 2.0 – Agentic terminal coding | Terminus-2 harness | 68.5% | 56.9% | 59.1% | 65.4% | 54.0% | 64.7% |
| Terminal-Bench 2.0 | Other best self-reported harness | | | | | 62.2% (Codex) | 77.3% (Codex) |
| SWE-Bench Verified – Agentic coding | Single attempt | 80.6% | 76.2% | 79.6% | 80.8% | 80.0% | |
| SWE-Bench Pro (Public) – Diverse agentic coding tasks | Single attempt | 54.2% | 43.3% | 55.6% | 56.8% | | |
| LiveCodeBench Pro – Competitive coding problems from Codeforces, ICPC, and IOI | Elo | 2887 | 2439 | 2393 | | | |
| SciCode – Scientific research coding | | 59% | 56% | 47% | 52% | 52% | |
| APEX-Agents – Long horizon professional tasks | | 33.5% | 18.4% | 29.8% | 23.0% | | |
| GDPval-AA – Expert tasks | Elo | 1317 | 1195 | 1633 | 1606 | 1462 | |
| τ2-bench – Agentic and tool use | Retail | 90.8% | 85.3% | 91.7% | 91.9% | 82.0% | |
| τ2-bench | Telecom | 99.3% | 98.0% | 97.9% | 99.3% | 98.7% | |
| MCP Atlas – Multi-step workflows using MCP | | 69.2% | 54.1% | 61.3% | 59.5% | 60.6% | |
| BrowseComp – Agentic search | Search + Python + Browse | 85.9% | 59.2% | 74.7% | 84.0% | 65.8% | |
| MMMU-Pro – Multimodal understanding and reasoning | No tools | 80.5% | 81.0% | 74.5% | 73.9% | 79.5% | |
| MMMLU – Multilingual Q&A | | 92.6% | 91.8% | 89.3% | 91.1% | 89.6% | |
| MRCR v2 (8-needle) – Long context performance | 128k (average) | 84.9% | 77.0% | 84.9% | 84.0% | 83.8% | |
| MRCR v2 (8-needle) | 1M (pointwise) | 26.3% | 26.3% | Not supported | Not supported | Not supported | |

Model information

Name
Gemini 3.1 Pro
Status
Preview
Input
  • Text
  • Image
  • Video
  • Audio
  • PDF
Output
  • Text
Input tokens
1M
Output tokens
64k
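
For developers, these modalities combine in a single request, up to the 1M-token input window. Below is a minimal sketch using the google-genai Python SDK; the model ID "gemini-3.1-pro-preview" is an assumption for this preview, so check the developer docs for the exact string.

```python
from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment

# Assumed preview model ID -- confirm the exact name in the developer docs.
MODEL_ID = "gemini-3.1-pro-preview"

# Mix text and image parts in one request; video, audio, and PDF parts
# work the same way via Part.from_bytes or Part.from_uri.
with open("floorplan.png", "rb") as f:
    image_bytes = f.read()

response = client.models.generate_content(
    model=MODEL_ID,
    contents=[
        types.Part.from_bytes(data=image_bytes, mime_type="image/png"),
        "Summarize this floor plan and list the rooms you can identify.",
    ],
    config=types.GenerateContentConfig(
        max_output_tokens=8192,  # responses are capped at 64k output tokens
    ),
)
print(response.text)
```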
Knowledge cutoff
January 2025
Tool use
  • Function calling
  • Structured output
  • Search as a tool
  • Code execution
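
The sketch below shows two of these in the google-genai Python SDK: automatic function calling from a plain Python function, and structured output constrained to a JSON schema. The model ID and the get_inventory helper are illustrative assumptions.

```python
from google import genai
from google.genai import types

client = genai.Client()
MODEL_ID = "gemini-3.1-pro-preview"  # assumed preview ID

def get_inventory(sku: str) -> dict:
    """Toy stock lookup used only for this example."""
    return {"sku": sku, "in_stock": 42}

# Function calling: the SDK wraps a plain Python function as a tool
# and runs the call/response loop automatically.
response = client.models.generate_content(
    model=MODEL_ID,
    contents="How many units of SKU A-1001 are in stock?",
    config=types.GenerateContentConfig(tools=[get_inventory]),
)
print(response.text)

# Structured output: constrain the reply to a JSON schema.
structured = client.models.generate_content(
    model=MODEL_ID,
    contents="Extract the SKU and quantity from: 'ship 12 units of A-1001'.",
    config=types.GenerateContentConfig(
        response_mime_type="application/json",
        response_schema={
            "type": "object",
            "properties": {
                "sku": {"type": "string"},
                "quantity": {"type": "integer"},
            },
            "required": ["sku", "quantity"],
        },
    ),
)
print(structured.text)  # JSON matching the schema
```

Search as a tool and code execution are enabled the same way, by passing types.Tool entries (for example types.Tool(google_search=types.GoogleSearch())) in the tools list.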
Best for
  • Agentic
  • Advanced coding
  • Long context understanding
  • Multimodal understanding
  • Algorithmic development
Availability
  • Gemini App
  • Google Cloud / Vertex AI
  • Google AI Studio
  • Gemini API
  • Google AI Mode
  • Google Antigravity
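
For Google Cloud users, the same google-genai SDK can route requests through Vertex AI rather than a Gemini API key; a sketch, with placeholder project details and the same assumed model ID:

```python
from google import genai

# Route requests through Vertex AI; project and location are placeholders.
client = genai.Client(
    vertexai=True,
    project="my-gcp-project",
    location="us-central1",
)

response = client.models.generate_content(
    model="gemini-3.1-pro-preview",  # assumed preview ID
    contents="Draft a one-paragraph project status update.",
)
print(response.text)
```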
Documentation
View developer docs
Model card
View model card