Gemini Robotics-ER
Our advanced embodied reasoning model
Our Gemini 2.0-based model gives advanced world understanding to robots.
Key capabilities
- Object detection: Identifies and tracks the location and size of objects within 2D and 3D space.
- Pointing: Identifies objects, and elements within those objects, in order to interact with them.
- Grasp prediction: Calculates how to grip objects, adjusting as necessary.
- Trajectory reasoning: Generates a plan of the actions needed to complete a task.
- Multi-view correspondence: Reasons in 3D space and identifies objects from different points of view.
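To make the pointing capability concrete: Gemini-family models typically return points as JSON with coordinates normalized to a 0–1000 range, in `[y, x]` order. The helper below is a minimal sketch (not an official API) that converts such a response into pixel coordinates for a given image size; the response format and field names are assumptions based on that convention.

```python
import json

def parse_points(response_text, image_width, image_height):
    """Convert a pointing response (assumed JSON list of
    {"point": [y, x], "label": ...} with coordinates normalized
    to 0-1000) into pixel (x, y) coordinates."""
    points = json.loads(response_text)
    results = []
    for item in points:
        y_norm, x_norm = item["point"]
        results.append({
            "label": item.get("label", ""),
            "x": x_norm / 1000 * image_width,
            "y": y_norm / 1000 * image_height,
        })
    return results

# Hypothetical model reply: one point at the center of a 640x480 image.
reply = '[{"point": [500, 500], "label": "mug handle"}]'
print(parse_points(reply, 640, 480))
# → [{'label': 'mug handle', 'x': 320.0, 'y': 240.0}]
```

A downstream robot controller would then map these pixel coordinates into its own camera frame before acting on them.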
Benchmarks
| Capability | Benchmark | Description | Gemini 2.0 Flash | Gemini Robotics-ER |
|---|---|---|---|---|
| Pointing | Paco-LVIS | Localization of object parts across image (LVIS) and video (Ego4D) datasets. | 46.1% | 71.3% |
| Pointing | Pixmo-Point | Challenging pointing dataset; requires accurate localization of objects in cluttered scenes. | 25.8% | 49.5% |
| Pointing | Where2Place | Predict spatial localization that requires one-step reasoning. | 33.8% | 45.0% |
| 3D detection | SUN-RGBD | Predict object 3D bounding boxes from monocular RGB images in the SUN-RGBD indoor-scenes dataset. | 30.7% | 48.3% |
Model information
| Property | Value |
|---|---|
| Model deployment status | Private preview |
| Supported input data types | Image, video, text |
| Supported output data types | Text |
| Input token limit | 32k |
| Knowledge cutoff | June 2024 |
| Availability | Trusted tester |