Scaling up learning across many different robot types
Together with partners from 33 academic labs, we have pooled data from 22 different robot types to create the Open X-Embodiment dataset and RT-X model
Robots are great specialists, but poor generalists. Typically, you have to train a model for each task, robot, and environment. Changing a single variable often requires starting from scratch. But what if we could combine the knowledge across robotics and create a way to train a general-purpose robot?
Today, we are launching a new set of resources for general-purpose robotics learning across different robot types, or embodiments. Together with partners from 33 academic labs, we have pooled data from 22 different robot types to create the Open X-Embodiment dataset. We are also releasing RT-1-X, a robotics transformer (RT) model derived from RT-1 and trained on our dataset, which shows skill transfer across many robot embodiments.
In this work, we show that training a single model on data from multiple embodiments leads to significantly better performance across many robots than training on data from individual embodiments. We tested our RT-1-X model in five different research labs, demonstrating a 50% average improvement in success rate across five commonly used robots compared to methods developed independently and specifically for each robot. We also showed that training our vision-language-action model, RT-2, on data from multiple embodiments tripled its performance on real-world robotic skills.
We developed these tools to collectively advance cross-embodiment research in the robotics community. The Open X-Embodiment dataset and RT-1-X model checkpoint are now available for the benefit of the broader research community, thanks to the work of robotics labs around the world that shared data and helped evaluate our model in a commitment to openly and responsibly developing this technology. We believe these tools will transform the way robots are trained and accelerate this field of research.
Open X-Embodiment Dataset: Collecting data to train AI robots
Datasets, and the models trained on them, have played a critical role in advancing AI. Just as ImageNet propelled computer vision research, we believe Open X-Embodiment can do the same for robotics. Building a dataset of diverse robot demonstrations is a key step toward training a generalist model that can control many different types of robots, follow diverse instructions, perform basic reasoning about complex tasks, and generalize effectively. However, collecting such a dataset is too resource-intensive for any single lab.
To develop the Open X-Embodiment dataset, we partnered with academic research labs across more than 20 institutions to gather data from 22 robot embodiments, demonstrating more than 500 skills and 150,000 tasks across more than 1 million episodes. This dataset is the most comprehensive robotics dataset of its kind.
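The co-training idea behind using such a pooled dataset can be pictured as sampling training batches from a weighted mixture over per-embodiment datasets. The sketch below is purely illustrative: the dataset names, episode identifiers, and mixture weights are hypothetical placeholders, not the actual Open X-Embodiment composition or training recipe.

```python
import random

# Illustrative per-embodiment episode pools (placeholders, not real data).
datasets = {
    "robot_a": [f"a_episode_{i}" for i in range(3)],
    "robot_b": [f"b_episode_{i}" for i in range(5)],
}
# Hypothetical mixture weights controlling how often each embodiment
# contributes to a training batch.
weights = {"robot_a": 0.5, "robot_b": 0.5}

def sample_batch(batch_size, rng):
    """Build a batch by first sampling an embodiment, then an episode from it."""
    names = list(datasets)
    probs = [weights[n] for n in names]
    batch = []
    for _ in range(batch_size):
        name = rng.choices(names, weights=probs, k=1)[0]
        batch.append(rng.choice(datasets[name]))
    return batch

rng = random.Random(0)
batch = sample_batch(4, rng)
```

In a real pipeline the mixture weights are a design choice: upweighting rare embodiments can improve transfer to them, at the cost of diluting data from well-covered robots.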
RT-X: A general-purpose robotics model
RT-X builds on two of our robotics transformer models: RT-1-X is derived from RT-1, our model for real-world robotic control at scale, and RT-2-X from RT-2, our vision-language-action (VLA) model that learns from both web and robotics data. We show that, given the same model architecture, RT-1-X and RT-2-X achieve greater performance thanks to the much more diverse, cross-embodiment data they are trained on. They also improve on models trained in specific domains, and exhibit better generalization and new capabilities.
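One ingredient that lets a single transformer architecture control different robots is representing continuous actions as discrete tokens: each action dimension is clipped to a range and binned, so actions from any embodiment share one vocabulary. The sketch below illustrates that binning idea under assumed parameters; the bin count and action ranges are illustrative, not the exact values used in the RT-X models.

```python
# Minimal sketch of discretized action tokens. NUM_BINS and the
# (low, high) ranges are assumptions for illustration.
NUM_BINS = 256

def to_token(value, low, high):
    """Map a continuous action dimension to an integer bin in [0, NUM_BINS)."""
    value = min(max(value, low), high)       # clip to the valid range
    frac = (value - low) / (high - low)      # normalize to [0, 1]
    return min(int(frac * NUM_BINS), NUM_BINS - 1)

def from_token(token, low, high):
    """Recover the bin-center continuous value from a token."""
    return low + (token + 0.5) * (high - low) / NUM_BINS
```

Because the model only ever sees tokens, data from robots with different action spaces can be mixed in training, with each embodiment supplying its own clipping ranges.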
To evaluate RT-1-X at partner universities, we compared its performance against models developed for their specific tasks, like opening a door, on the corresponding datasets. RT-1-X trained with the Open X-Embodiment dataset outperformed the original models by 50% on average.
Emergent skills in RT-X
To investigate the transfer of knowledge across robots, we conducted experiments with our helper robot on tasks involving objects and skills that are not present in the RT-2 dataset but exist in another dataset for a different robot. On these emergent skills, RT-2-X was three times as successful as our previous best model, RT-2.
Our results suggest that co-training with data from other platforms imbues RT-2-X with additional skills that were not present in the original dataset, enabling it to perform novel tasks.
RT-2-X demonstrates skills that the RT-2 model was not capable of previously, including better spatial understanding. For example, if we ask the robot to "move apple near cloth" instead of "move apple on cloth", the trajectories are quite different. By changing the preposition from "near" to "on", we can modulate the actions the robot takes.
RT-2-X shows that incorporating data from other robots into training improves the range of tasks that can be performed even by a robot that already has large amounts of data available, but only when using a sufficiently high-capacity architecture.
Responsibly advancing robotics research
Robotics research is at an exciting, but early, juncture. New research shows the potential to develop more useful helper robots by scaling learning with more diverse data and better models. Working collaboratively with labs around the world and sharing resources is crucial to advancing robotics research in an open and responsible way. We hope that open-sourcing the data and providing safe but limited models will reduce barriers and accelerate research. The future of robotics relies on enabling robots to learn from each other, and most importantly, allowing researchers to learn from one another.
This work demonstrates that models that generalize across embodiments are possible, with dramatic improvements in performance both with robots here at Google DeepMind and on robots at different universities around the world. Future research could explore how to combine these advances with the self-improvement property of RoboCat to enable the models to improve with their own experience. Another future direction could be to further probe how different dataset mixtures might affect cross-embodiment generalization and how the improved generalization materializes.
Partner with us: firstname.lastname@example.org