Generality
Understands the physical world, and adapts and generalizes its behaviour to fit new situations. Breaks down goals into manageable steps to make longer-term plans and overcome unexpected problems.
Powering an era of physical agents to transform how robots actively understand their environments
Our most capable vision-language-action (VLA) model, which turns visual information and instructions into motor commands to perform a task.
Our state-of-the-art embodied reasoning model – it specializes in understanding physical spaces, planning, and making logical decisions about its surroundings.
To ensure Gemini Robotics benefits humanity, we’ve taken a comprehensive approach to safety, from practical safeguards to collaborations with experts, policymakers, and our Responsibility and Safety Council.
Gemini Robotics models allow robots of any shape and size to perceive, reason, use tools and interact with humans. They can solve a wide range of complex real-world tasks – even those they haven’t been trained to complete.
Gemini Robotics 1.5 is designed to reason through complex, multi-step tasks and make decisions to form a plan of action. It then carries out each step autonomously.
Assesses complex challenges, natively calls tools – like Google Search – to look up information, and creates detailed step-by-step plans to overcome them.
Enables robots to think before acting, improving the quality of their actions and making their decisions more transparent by explaining them in natural language.
Understands and responds to everyday commands. Can explain its approach while taking action. Users can redirect it at any point without using technical language. It also adjusts to changes in its environment.
Enables robots to tackle complex tasks requiring fine motor skills and precise manipulation – like folding origami, packing a lunch box, or preparing a salad.
Adapts to a diverse array of robot forms, from static bi-arm platforms like ALOHA and the Bi-arm Franka to humanoid robots like Apptronik’s Apollo. A single model can be used across all of these robots, which in turn accelerates its learning across multiple embodiments.
Uses digital tools autonomously to solve complex tasks.
Solves longer, multi-step tasks – without needing new instructions after each step.
Transfers learned motions across robots of different sizes and shapes, helping robots to become more useful.
Understands its environment and how to complete a task.
Generalizes across novel situations and solves a vast range of tasks.
Responds to natural conversation and adapts rapidly to changing environments.
Helping to build the next generation of humanoid robots.
Performs tasks that require fine motor skills and coordination.
Our most capable vision-language-action (VLA) model. It can ‘see’ (vision), ‘understand’ (language) and ‘act’ (action) within the physical world. It processes visual inputs and user prompts, and learns across different embodiments, increasing its ability to generalize its problem-solving.
Our state-of-the-art embodied reasoning model. It specializes in understanding physical spaces, planning, and making logical decisions relating to its surroundings. It doesn’t directly control robotic limbs, but it provides high-level insights to help the VLA model decide what to do next.
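For readers who want a concrete picture of how the two models cooperate, the sketch below illustrates the general pattern: the embodied reasoning model produces a high-level plan, and the VLA model executes each step. It is a minimal conceptual illustration only; the function names and plan contents are hypothetical stand-ins, not part of any published Gemini Robotics interface.

```python
# Conceptual sketch of the division of labour described above:
# the embodied reasoning model plans, the VLA model acts.
# Both functions are hypothetical placeholders, not a real API.

from dataclasses import dataclass


@dataclass
class Step:
    description: str


def plan_with_reasoning_model(goal: str, scene: str) -> list[Step]:
    """Stand-in for the embodied reasoning model: break a goal and the
    robot's current view of the scene into ordered sub-steps."""
    # A real system would query the reasoning model here; this fixed
    # plan is purely illustrative.
    return [
        Step(f"Identify the items needed to {goal} in the {scene}"),
        Step(f"Decide the order of actions required to {goal}"),
        Step(f"Carry out each action to {goal}"),
    ]


def act_with_vla_model(step: Step) -> bool:
    """Stand-in for the VLA model: turn one sub-step into motor
    commands and report whether it succeeded."""
    print(f"Executing: {step.description}")
    return True


def run_task(goal: str, scene: str) -> None:
    # The reasoning model provides the high-level plan; the VLA model
    # executes each step in turn, mirroring the handoff described above.
    for step in plan_with_reasoning_model(goal, scene):
        if not act_with_vla_model(step):
            break


if __name__ == "__main__":
    run_task("pack a lunch box", "kitchen table scene")
```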
This iteration of our VLA model is highly versatile and optimized to run locally on robotic devices. This allows robotics developers to adapt the model to improve performance on their own applications.