Gemini Robotics
Our most advanced vision-language-action (VLA) model
Our Gemini 2.0-based model allows robots to take action in the physical world.
Key capabilities
- Generality: Uses Gemini's world understanding to generalize to novel situations, including dealing with new objects, diverse instructions, and new environments.
- Interactivity: Understands and responds to everyday commands, and reacts to sudden changes in its instructions or surroundings, then carries on without further input.
- Dexterity: Enables robots to tackle complex tasks requiring fine motor skills and precise manipulation, like folding origami, packing a lunch box, or preparing a salad.
- Multiple embodiments: Adapts to diverse robot types, from bi-arm platforms like ALOHA 2 to complex humanoid robots like Apptronik’s Apollo.
Benchmarks
Gemini Robotics success rate compared to a state-of-the-art multi-task diffusion policy across a range of in-distribution and out-of-distribution tasks. See the Tech Report for full results and baseline details.
| Capability | Description | Multi-Task Diffusion | Gemini Robotics |
|---|---|---|---|
| In-distribution task performance | Model performance averaged across tasks and initial conditions present in the training data, for short-horizon dexterous tasks. | 42.6% | 74.5% |
| Visual generalization | Measures model robustness to visual changes in the scene, including variations in background, lighting conditions, distractor objects, or textures. | 23.1% | 50.0% |
| Instruction generalization | Understanding invariance and equivalence in natural-language instructions: measures the ability to understand paraphrasing, be robust to typos, understand different languages, and handle varying levels of specificity. | 18.5% | 39.3% |
| Action generalization | Capability of the model to generalize to new motions required to deal with new initial conditions (e.g., object placement) or object instances (e.g., shape or physical properties) not seen during training. | 16.7% | 52.8% |
| Instruction following | Ability to closely follow detailed natural-language instructions to complete diverse pick-and-place tasks. | 17% | 87% |
| Long-horizon dexterity after fine-tuning | Model’s ability to become proficient at much more challenging long-horizon dexterous tasks with further fine-tuning and specialization. | 33% | 78.8% |
Model information
| Attribute | Value |
|---|---|
| Model deployment status | Private preview |
| Supported input data types | Image, text |
| Supported output data type | Action |
| Supported input tokens | 32k |
| Knowledge cutoff | June 2024 |
| Availability | Partners |
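The input/output contract in the table above (image and text in, action out) can be illustrated with a minimal control-loop sketch. Note this is purely illustrative: the `Observation`, `Action`, and `VLAPolicy` names below are hypothetical stand-ins, not the actual partner API, and the stub policy returns a placeholder action just to show the shape of the loop.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Observation:
    """Hypothetical input type mirroring the I/O table: image + text."""
    image: bytes        # encoded camera frame
    instruction: str    # natural-language command

@dataclass
class Action:
    """Hypothetical output type: a low-level robot action."""
    joint_deltas: List[float]  # per-joint position deltas (illustrative)

class VLAPolicy:
    """Illustrative stand-in for a vision-language-action policy."""

    def act(self, obs: Observation) -> Action:
        # A real VLA model would run inference on the image and
        # instruction here; this stub returns a zero action for a
        # 7-DoF arm purely to show the closed-loop structure.
        return Action(joint_deltas=[0.0] * 7)

# Typical closed-loop use: observe, act, repeat each control step.
policy = VLAPolicy()
obs = Observation(image=b"", instruction="pick up the banana")
action = policy.act(obs)
```

In a real deployment the loop would re-observe after each action, which is what lets an interactive policy react to sudden changes in its surroundings.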