Gemini Robotics 1.5
Powering an era of physical agents to transform how robots actively understand their environments
Gemini Robotics models allow robots of any shape and size to perceive, reason, use tools and interact with humans. They can solve a wide range of complex real-world tasks – even those they haven’t been trained to complete.
Gemini Robotics 1.5 is designed to reason through complex, multi-step tasks and make decisions to form a plan of action. It then works to carry out each step autonomously.
Capabilities
Gemini models are capable of responding to text, images, audio, and video. Gemini Robotics adds the ability to reason about physical spaces – allowing robots to take action in the real world.
- Generality: Understands the physical world, and adapts and generalizes its behaviour to fit new situations. Breaks down goals into manageable steps to make longer-term plans and overcome unexpected problems.
- Agentic: Assesses complex challenges, natively calls tools – like Google Search – to look up information, and creates detailed step-by-step plans to overcome them.
- Thinking: Enables robots to think before acting, improving the quality of their actions and making their decisions more transparent in natural language.
- Interactivity: Understands and responds to everyday commands, and can explain its approach while taking action. Users can redirect it at any point without using technical language, and it adjusts to changes in its environment.
- Dexterity: Enables robots to tackle complex tasks requiring fine motor skills and precise manipulation – like folding origami, packing a lunch box, or preparing a salad.
- Multiple embodiments: Adapts to a diverse array of robot forms, from static bi-arm platforms like ALOHA and the Bi-arm Franka to humanoid robots like Apptronik’s Apollo. A single model can be used across all of these robots, accelerating learning across embodiments.
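The spatial understanding behind these capabilities can be exercised directly through the Gemini API. Below is a minimal sketch, assuming the google-genai Python SDK, the preview model id gemini-robotics-er-1.5-preview, and a 0–1000 normalized point convention – all assumptions to verify against the current docs – that asks the model to point at objects in a workspace image:

```python
# Minimal sketch: querying an embodied reasoning model for object locations.
# Assumptions: the google-genai SDK, a GEMINI_API_KEY in the environment,
# and the preview model id below -- verify all against the current docs.
from google import genai
from google.genai import types

client = genai.Client()  # reads GEMINI_API_KEY from the environment

with open("workspace.jpg", "rb") as f:
    image_bytes = f.read()

response = client.models.generate_content(
    model="gemini-robotics-er-1.5-preview",  # assumed preview model id
    contents=[
        types.Part.from_bytes(data=image_bytes, mime_type="image/jpeg"),
        "Point to every object on the table. Answer as JSON: "
        '[{"point": [y, x], "label": "<name>"}], with coordinates '
        "normalized to 0-1000.",
    ],
)
print(response.text)  # e.g. [{"point": [388, 512], "label": "mug"}, ...]
```

Points returned this way would still need to be mapped into the robot’s own coordinate frame before any motion is planned.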
Hands-on
See how Gemini Robotics handles a range of different tasks.
Model and tools
We take a dual-model approach, pairing a vision-language-action (VLA) model with an embodied reasoning (ER) model. Each plays a specialized role; together they form a powerful and versatile system.
- Gemini Robotics 1.5: Our most capable vision-language-action (VLA) model. It can ‘see’ (vision), ‘understand’ (language) and ‘act’ (action) within the physical world. It turns visual inputs and user prompts into robot actions, and learning across different embodiments improves its ability to generalize to new problems.
- Gemini Robotics-ER 1.5: Our state-of-the-art embodied reasoning model. It specializes in understanding physical spaces, planning, and making logical decisions about its surroundings. It doesn’t directly control robotic limbs, but provides the high-level guidance that helps the VLA model decide what to do next; see the sketch after this list for one way the two can be wired together.
- Gemini Robotics On-Device: A highly versatile version of our VLA model, optimized to run locally on robotic devices. It allows robotics developers to adapt the model to improve performance on their own applications.
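As noted in the list above, one way to picture the division of labour is that the ER model drafts a plan in natural language and the VLA model carries out each step on the robot. A minimal sketch, assuming the google-genai Python SDK, the preview model id used earlier, and a hypothetical execute_step() stub standing in for the robot’s VLA controller:

```python
# Minimal orchestration sketch: the ER model plans, the VLA model acts.
# The model id and execute_step() are assumptions for illustration only;
# real VLA execution goes through robot-specific tooling, not this stub.
from google import genai

client = genai.Client()  # reads GEMINI_API_KEY from the environment

def plan_task(goal: str) -> list[str]:
    """Ask the embodied reasoning model to break a goal into steps."""
    response = client.models.generate_content(
        model="gemini-robotics-er-1.5-preview",  # assumed preview model id
        contents=f"Break this robot task into short imperative steps, "
                 f"one per line, no numbering: {goal}",
    )
    return [line.strip() for line in response.text.splitlines() if line.strip()]

def execute_step(step: str) -> None:
    """Hypothetical stub: hand one step to the VLA model driving the robot."""
    print(f"[VLA] executing: {step}")

for step in plan_task("Sort the laundry into lights and darks"):
    execute_step(step)
```

In a real system the execution side would typically report progress back to the reasoning model so the plan can be revised mid-task.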
Gemini Robotics SDK
Helping developers easily adapt our Gemini Robotics On-Device model to new tasks and environments.
Collaborations
We’re partnering with Apptronik to build the next generation of humanoid robots. We’re also working with over 60 trusted testers to guide the future of Gemini Robotics-ER.
Experience Gemini Robotics
If you're interested in testing our models, please share a few details to join the waitlist.