Gemini Robotics-ER 1.6

Our advanced embodied reasoning model, designed to help robots reason precisely about the physical world, plan complex tasks, and make logical decisions.

Our Gemini-based multimodal model gives advanced world understanding to robots.

Capabilities

Gemini Robotics-ER 1.6 specializes in core robotics capabilities like spatial logic, task planning, and success detection.

It acts as a high-level brain to break down complex tasks, use intermediate steps to reason, and intelligently decide when to retry or progress.
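
The retry-or-progress behavior described above can be pictured as a simple control loop. The following is a minimal sketch, not the model's actual interface: `run_plan`, `execute`, `check_success`, and `max_attempts` are hypothetical names introduced only for illustration, with `check_success` standing in for whatever success-detection call a real system would make.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class StepResult:
    step: str
    attempts: int
    succeeded: bool


def run_plan(
    steps: List[str],
    execute: Callable[[str], None],        # hypothetical: sends one step to the robot
    check_success: Callable[[str], bool],  # hypothetical: success-detection query
    max_attempts: int = 3,                 # assumed retry budget per step
) -> List[StepResult]:
    """Run steps in order, retrying a step until it succeeds or the budget runs out."""
    results: List[StepResult] = []
    for step in steps:
        attempts = 0
        succeeded = False
        while attempts < max_attempts and not succeeded:
            attempts += 1
            execute(step)
            succeeded = check_success(step)
        results.append(StepResult(step, attempts, succeeded))
        if not succeeded:
            # Do not progress past a step that never succeeded.
            break
    return results
```

In this sketch the planner only advances once a step is confirmed, and stops early if a step exhausts its retries, which leaves the decision about what to do next to the caller.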

Orchestration

Orchestrates robot activities, like a high-level brain. Excels at planning and making logical decisions within a physical environment. Interacts in natural language, estimates progress, and can natively call tools, such as Google Search to look up information.

Advanced spatial logic

Uses precision pointing for spatial identification, motion reasoning, and safely handling objects under strict physical constraints.
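
To make pointing output usable downstream, a caller typically has to map the returned points onto image pixels. The sketch below assumes, purely for illustration, that points arrive as a JSON list of objects with a `point` field holding `[y, x]` normalized to a 0-1000 range and an optional `label`; this page does not specify the output format, and `points_to_pixels` and its `scale` parameter are hypothetical.

```python
import json
from typing import List, Tuple


def points_to_pixels(
    response_text: str,
    width: int,
    height: int,
    scale: int = 1000,  # assumed normalization range, not confirmed by this page
) -> List[Tuple[str, int, int]]:
    """Convert assumed [y, x] normalized points into (label, x_px, y_px) tuples."""
    pixels = []
    for item in json.loads(response_text):
        y, x = item["point"]  # assumed order: y first, then x
        x_px = round(x / scale * width)
        y_px = round(y / scale * height)
        pixels.append((item.get("label", ""), x_px, y_px))
    return pixels
```

For example, under these assumptions a point `[500, 250]` on a 640x480 image maps to pixel x = 160, y = 240.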

Visual & multi-view reasoning

Understands relationships across multiple camera streams to detect task success, and combines agentic vision with code execution to read complex industrial instruments.

Performance

Shows significant improvement over Gemini Robotics-ER 1.5 and Gemini 3 Flash on spatial reasoning, instrument reading, success detection, and physical safety compliance.

Benchmark results comparing Gemini Robotics-ER 1.6 with the Gemini Robotics-ER 1.5 and Gemini 3 Flash models. The instrument reading evaluations were run with agentic vision enabled (except for Gemini Robotics-ER 1.5, which doesn’t support it). All other evaluations were run with agentic vision disabled. The single-view and multi-view success detection evaluations contain different examples, so their results are not comparable.

How the different elements of Gemini Robotics-ER 1.6 contribute to reaching a high level of performance on the instrument reading task.

Gemini Robotics-ER 1.6 improves substantially over Gemini Robotics-ER 1.5 on Safety Instruction Following, which tests the ability to adhere to physical safety constraints. It also outperforms Gemini 3 Flash on pointing, and both models have very high accuracy for text. Gemini 3 Flash does better on bounding boxes.
