Models
Gemini Robotics On-Device brings AI to local robotic devices
We’re introducing an efficient, on-device robotics model with general-purpose dexterity and fast task adaptation.
In March, we introduced Gemini Robotics, our most advanced vision-language-action (VLA) model, bringing Gemini 2.0’s multimodal reasoning and real-world understanding into the physical world.
Today, we’re introducing Gemini Robotics On-Device, our most powerful VLA model optimized to run locally on robotic devices. It shows strong general-purpose dexterity and task generalization while running efficiently on the robot itself.
Because the model operates independently of a data network, it’s helpful for latency-sensitive applications and ensures robustness in environments with intermittent or zero connectivity.
We’re also sharing a Gemini Robotics SDK to help developers easily evaluate Gemini Robotics On-Device on their tasks and environments, test our model in our MuJoCo physics simulator, and quickly adapt it to new domains with as few as 50 to 100 demonstrations. Developers can access the SDK by signing up for our trusted tester program.
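The SDK and model checkpoints themselves are only available through the trusted tester program, so the sketch below covers just the open-source MuJoCo side of that workflow: build a scene, write an action into the controls, and step the physics. The toy XML scene and the constant control value are placeholders standing in for the SDK’s bi-arm environments and for actions a policy would predict.

```python
import mujoco

# A toy scene stands in for the bi-arm task environments; the Gemini Robotics
# model and task assets are only available through the trusted tester program.
XML = """
<mujoco>
  <worldbody>
    <body>
      <joint name="hinge" type="hinge" axis="0 1 0"/>
      <geom type="capsule" size="0.02" fromto="0 0 0 0 0 0.3"/>
    </body>
  </worldbody>
  <actuator>
    <motor joint="hinge" ctrlrange="-1 1"/>
  </actuator>
</mujoco>
"""

model = mujoco.MjModel.from_xml_string(XML)
data = mujoco.MjData(model)

# Evaluation-loop skeleton: replace the constant control with the action
# a policy predicts from the current observation at each control step.
for _ in range(1000):
    data.ctrl[:] = 0.5  # placeholder action
    mujoco.mj_step(model, data)

print("simulated", data.time, "seconds")
```

In a real evaluation loop, each step would read observations from `data`, query the policy for its next action, and write that action into `data.ctrl` before stepping.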
Model capabilities and performance
Gemini Robotics On-Device is a robotics foundation model for bi-arm robots, engineered to require minimal computational resources. It builds on the task generalization and dexterity capabilities of Gemini Robotics and is:
- Designed for rapid experimentation with dexterous manipulation.
- Adaptable to new tasks through fine-tuning to improve performance.
- Optimized to run locally with low-latency inference.
Gemini Robotics On-Device achieves strong visual, semantic and behavioral generalization across a wide range of testing scenarios, follows natural language instructions, and completes highly dexterous tasks like unzipping bags or folding clothes, all while operating directly on the robot.
In our evaluations, the On-Device model exhibits strong generalization performance while running entirely locally.
Chart evaluating Gemini Robotics On-Device’s generalization performance, compared to our flagship Gemini Robotics model and the previous best on-device model.
Gemini Robotics On-Device also outperforms other on-device alternatives on more challenging out-of-distribution tasks and complex multi-step instructions. For developers who need state-of-the-art results in these settings and aren’t constrained to on-device execution, we also offer the flagship Gemini Robotics model.
Chart evaluating Gemini Robotics On-Device’s instruction following performance, compared to our flagship Gemini Robotics model and the previous best on-device model.
To learn more about our evaluations, read our Gemini Robotics tech report.
Adaptable to new tasks, generalizable across embodiments
Gemini Robotics On-Device is the first VLA model we’re making available for fine-tuning. While many tasks will work out of the box, developers can also choose to adapt the model to achieve better performance for their applications. The model adapts quickly, with as few as 50 to 100 demonstrations of a new task, a sign of how well this on-device model generalizes its foundational knowledge.
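The actual fine-tuning interface is available only to trusted testers, so the sketch below illustrates only the general recipe such adaptation follows: supervised behavior cloning, regressing predicted actions onto a small set of teleoperated demonstrations. The tensors, network and hyperparameters are hypothetical placeholders, not the SDK’s API.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical stand-ins: `policy` is any pretrained network mapping observation
# features to continuous actions; `demos` holds roughly 50-100 teleoperated
# (observation, action) pairs collected for the new task.
obs = torch.randn(100, 512)     # placeholder observation embeddings
actions = torch.randn(100, 14)  # placeholder bi-arm action targets (2 arms x 7 DoF)
demos = TensorDataset(obs, actions)

policy = torch.nn.Sequential(   # placeholder for the pretrained policy head
    torch.nn.Linear(512, 256),
    torch.nn.ReLU(),
    torch.nn.Linear(256, 14),
)
optimizer = torch.optim.AdamW(policy.parameters(), lr=1e-4)

# Behavior cloning: regress the policy's predicted actions onto the
# demonstrated actions for a handful of epochs.
for epoch in range(20):
    for batch_obs, batch_act in DataLoader(demos, batch_size=16, shuffle=True):
        loss = torch.nn.functional.mse_loss(policy(batch_obs), batch_act)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```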
Here, we show how Gemini Robotics On-Device outperforms the current best on-device VLA when fine-tuned to new tasks. We tested the model on seven dexterous manipulation tasks of varying difficulty, including zipping a lunch-box, drawing a card and pouring salad dressing.
Chart showing Gemini Robotics On-Device’s task adaptation performance, with fewer than 100 examples.
We also adapted the Gemini Robotics On-Device model to different robot embodiments. Although we trained the model only on ALOHA robots, we were able to adapt it to a bi-arm Franka FR3 robot and the Apollo humanoid robot by Apptronik.
On the bi-arm Franka, the model performs general-purpose instruction following, handling previously unseen objects and scenes, completing dexterous tasks like folding a dress, and executing industrial belt-assembly tasks that require precision and dexterity.
On the Apollo humanoid, we adapted the model to a significantly different embodiment. The same generalist model can follow natural language instructions and manipulate a variety of objects, including previously unseen ones.
Responsible development and safety
We’re developing all Gemini Robotics models in alignment with our AI Principles and applying a holistic safety approach spanning semantic and physical safety.
In practice, we capture semantic and content safety using the Live API and interface our models with low-level safety-critical controllers that execute the actions. We recommend evaluating the end-to-end system on our recently developed semantic safety benchmark and performing red-teaming exercises at all levels to expose the model’s safety vulnerabilities.
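The interface between the model and those low-level controllers isn’t described in detail here, so the following is only an illustrative sketch, assuming a joint-space bi-arm setup, of what gating a model-predicted action through a safety layer can look like. The limit values and the function name are hypothetical.

```python
import numpy as np

# Hypothetical limits for a joint-space bi-arm setup; real values would come
# from the robot's specification and the deployment's safety case.
JOINT_LIMITS_LOW = np.full(14, -2.8)   # rad
JOINT_LIMITS_HIGH = np.full(14, 2.8)   # rad
MAX_JOINT_DELTA = 0.05                 # rad per control step

def gate_action(current_q: np.ndarray, target_q: np.ndarray) -> np.ndarray:
    """Clamp a model-predicted joint target before it reaches the robot.

    Two simple checks: limit per-step motion so one bad prediction cannot
    command a large jump, and keep the result inside hard joint limits.
    """
    delta = np.clip(target_q - current_q, -MAX_JOINT_DELTA, MAX_JOINT_DELTA)
    return np.clip(current_q + delta, JOINT_LIMITS_LOW, JOINT_LIMITS_HIGH)
```

A real deployment would layer further checks (workspace bounds, force limits, emergency stops) beneath the learned policy; the point is that the VLA’s outputs pass through deterministic, verifiable controllers rather than driving the motors directly.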
Our Responsible Development & Innovation (ReDI) team continues to analyze and advise on the real-world impact of all Gemini Robotics models, finding ways to maximize their societal benefits and minimize risks. Our Responsibility & Safety Council (RSC) then reviews these assessments and provides feedback that we integrate into model development.
To gain a deeper understanding of Gemini Robotics On-Device’s usage and safety profile and to gather feedback, we’re initially releasing it to a select group of trusted testers.
Accelerating innovation in robotics
Gemini Robotics On-Device marks a step forward in making powerful robotics models more accessible and adaptable — and our on-device solution will help the robotics community tackle important latency and connectivity challenges.
The Gemini Robotics SDK will further accelerate innovation by allowing developers to adapt the model to their specific needs. Sign up for model and SDK access via our trusted tester program.
We’re excited to see what the robotics community will build with these new tools as we continue to explore the future of bringing AI into the physical world.
Acknowledgements
We gratefully acknowledge contributions, advice, and support from Abbas Abdolmaleki, Saminda Abeyruwan, Joshua Ainslie, Jean-Baptiste Alayrac, Montserrat Gonzalez Arenas, Travis Armstrong, Maria Attarian, Ashwin Balakrishna, Yanan Bao, Clara Barbu, Catarina Barros, Robert Baruch, Nathan Batchelor, Maria Bauza, Lucas Beyer, Michael Bloesch, Michiel Blokzijl, Steven Bohez, Konstantinos Bousmalis, Demetra Brady, Philemon Brakel, Anthony Brohan, Thomas Buschmann, Arunkumar Byravan, Kendra Byrne, Serkan Cabi, Ken Caluwaerts, Federico Casarini, Christine Chan, Oscar Chang, Jose Enrique Chen, Xi Chen, Huizhong Chen, Hao-Tien Lewis Chiang, Krzysztof Choromanski, Adrian Collister, Kieran Connell, David D'Ambrosio, Sudeep Dasari, Todor Davchev, Coline Devin, Norman Di Palo, Tianli Ding, Adil Dostmohamed, Anca Dragan, Yilun Du, Debidatta Dwibedi, Michael Elabd, Tom Erez, Claudio Fantacci, Cody Fong, Erik Frey, Chuyuan Fu, Frankie Garcia, Ashley Gibb, Marissa Giustina, Keerthana Gopalakrishnan, Laura Graesser, Simon Green, Oliver Groth, Roland Hafner, Leonard Hasenclever, Sam Haves, Nicolas Heess, Brandon Hernaez, Tim Hertweck, Alexander Herzog, R. Alex Hofer, Sandy H Huang, Jan Humplik, Atil Iscen, Mithun George Jacob, Deepali Jain, Sally Jesmonth, Ryan Julian, Dmitry Kalashnikov, M. Emre Karagozler, Stefani Karp, Chase Kew, Jerad Kirkland, Sean Kirmani, Yuheng Kuang, Thomas Lampe, Antoine Laurens, Isabel Leal, Alex X. Lee, Tsang-Wei Edward Lee, Jennie Lees, Jacky Liang, Yixin Lin, Li-Heng Lin, Caden Lu, Sharath Maddineni, Anirudha Majumdar, Kevis-Kokitsi Maninis, Siobhan Mcloughlin, Assaf Hurwitz Michaely, Joss Moore, Robert Moreno, Thomas Mulc, Michael Neunert, Francesco Nori, Dave Orr, Carolina Parada, Emilio Parisotto, Peter Pastor, André Susano Pinto, Acorn Pooley, Grace Popple, Thomas Power, Alessio Quaglino, Haroon Qureshi, Kanishka Rao, Dushyant Rao, Krista Reymann, Martin Riedmiller, Francesco Romano, Keran Rong, Dorsa Sadigh, Stefano Saliceti, Daniel Salz, Pannag Sanketi, Mili Sanwalka, Kevin Sayed, Pierre Sermanet, Dhruv Shah, Mohit Sharma, Kathryn Shea, Mohit Shridhar, Charles Shu, Vikas Sindhwani, Sumeet Singh, Radu Soricut, Andreas Steiner, Rachel Sterneck, Ian Storz, Razvan Surdulescu, Ben Swanson, Mitri Syriani, Jie Tan, Yuval Tassa, Alan Thompson, Dhruva Tirumala, Jonathan Tompson, Karen Truong, Jake Varley, Siddharth Verma, Grace Vesom, Giulia Vezzani, Oriol Vinyals, Ayzaan Wahid, Zhicheng Wang, Stefan Welker, Paul Wohlhart, Chengda Wu, Markus Wulfmeier, Fei Xia, Ted Xiao, Annie Xie, Jinyu Xie, Peng Xu, Sichun Xu, Ying Xu, Zhuo Xu, Yuxiang Yang, Rui Yao, Sergey Yaroshenko, Matt Young, Wenhao Yu, Wentao Yuan, Martina Zambelli, Xiaohua Zhai, Jingwei Zhang, Tingnan Zhang, Allan Zhou, Yuxiang Zhou, Guangyao (Stannis) Zhou, Howard Zhou.
We also thank the operations and support staff who performed data collection and robot evaluations for this project.