Gemini Robotics 1.5
Our most capable vision-language-action (VLA) model, which turns visual information and instructions into motor commands so a robot can perform a task.
This agentic, Gemini-based multimodal model allows robots to take action in the physical world.
Capabilities
Gemini models can respond to text, images, audio, and video. Gemini Robotics adds the ability to reason about physical space, allowing robots to take action in the real world.
- Generality: Understands the physical world, and adapts and generalizes its behaviour to fit new situations. Breaks down goals into manageable steps to make longer-term plans and overcome unexpected problems.
- Interactivity: Understands and responds to everyday commands, and can explain its approach while taking action. Users can redirect it at any point without using technical language, and it adjusts to changes in its environment.
- Dexterity: Enables robots to tackle complex tasks requiring fine motor skills and precise manipulation, like folding origami, packing a lunch box, or preparing a salad.
- Thinking: Enables robots to think before acting, improving the quality of their actions and making their decisions more transparent in natural language (see the interface sketch after this list).
- Multiple embodiments: Adapts to a diverse array of robot forms, from bi-arm static platforms like ALOHA and the Bi-arm Franka to humanoid robots like Apptronik's Apollo. A single model can be used across all of these robots, in turn accelerating learning across embodiments.
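To make the pattern concrete, here is a minimal sketch of a VLA control loop in Python. Every name in it (VLAModel, StepResult, Action, send_to_robot) is a hypothetical illustration of the image-plus-instruction-in, thought-plus-action-out flow described above; it is not the Gemini Robotics 1.5 API, which is in private preview.

```python
# Hypothetical sketch of a vision-language-action (VLA) control loop.
from dataclasses import dataclass


@dataclass
class Action:
    """A low-level motor command for one control step (hypothetical)."""
    joint_deltas: list[float]


@dataclass
class StepResult:
    """The model 'thinks before acting': it returns a natural-language
    rationale alongside the motor command, making decisions transparent."""
    thought: str
    action: Action


class VLAModel:
    """Stand-in for a VLA policy: images + instruction in, actions out."""

    def step(self, camera_image: bytes, instruction: str) -> StepResult:
        # A real model would run inference here; this stub only shows the
        # interface shape: image/text input, text ("thought") + action output.
        return StepResult(
            thought=f"Planning next motion for: {instruction!r}",
            action=Action(joint_deltas=[0.0] * 7),
        )


def send_to_robot(action: Action) -> None:
    # Embodiment-specific actuation would live behind this boundary, which
    # is how one policy can drive ALOHA, Franka, or a humanoid alike.
    print(f"executing joint deltas: {action.joint_deltas}")


def control_loop(model: VLAModel, get_image, instruction: str, steps: int = 3):
    for _ in range(steps):
        result = model.step(get_image(), instruction)
        print(result.thought)        # interactivity: explains its approach
        send_to_robot(result.action)


if __name__ == "__main__":
    control_loop(VLAModel(), lambda: b"<jpeg bytes>", "pack the lunch box")
```

Returning the natural-language thought alongside each action is what makes the "Thinking" behaviour inspectable: an operator can read the rationale and redirect the robot mid-task.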
Benchmarks
Gemini Robotics 1.5 consistently outperforms our previous models across all four categories of generalization.
Model deployment status | Private preview
Supported input data types | Image, Text
Supported output data types | Text, Action
Supported input tokens | 32k
Knowledge cutoff | October 2024
Availability | Partners
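For readers who think in types, the table above can be restated as a few illustrative Python definitions. The names (RoboticsRequest, RoboticsResponse, check_budget) and the token-budget check are assumptions made for this sketch; the model's actual interface is not public.

```python
# Illustrative restatement of the spec table as typed request/response shapes.
from dataclasses import dataclass, field
from enum import Enum

MAX_INPUT_TOKENS = 32_000  # from the table: 32k supported input tokens


class OutputKind(Enum):
    TEXT = "text"      # natural-language plans or explanations
    ACTION = "action"  # motor commands for the robot


@dataclass
class RoboticsRequest:
    """Inputs per the table: Image and Text."""
    images: list[bytes] = field(default_factory=list)
    text: str = ""


@dataclass
class RoboticsResponse:
    """Outputs per the table: Text and Action."""
    kind: OutputKind
    payload: object


def check_budget(estimated_tokens: int) -> None:
    # A request whose combined image and text tokens exceed the limit
    # would need to be truncated or split before being sent.
    if estimated_tokens > MAX_INPUT_TOKENS:
        raise ValueError(f"input exceeds {MAX_INPUT_TOKENS} tokens")
```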