Gemini Robotics-ER
Our advanced embodied reasoning model
Our Gemini 2.0-based model gives advanced world understanding to robots.
Key capabilities
- Object detection: Identifies and tracks the location and size of objects within 2D and 3D space.
- Pointing: Identifies objects, and elements within those objects, in order to interact with them.
- Grasp prediction: Calculates how to grip objects, adjusting as necessary.
- Trajectory reasoning: Generates a plan of the actions needed to complete a task.
- Multi-view correspondence: Reasons in 3D space and identifies objects from different points of view.
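To make the pointing capability concrete: Gemini-family models typically return points as JSON with coordinates normalized to a 0–1000 range, in `[y, x]` order. The helper below is a minimal sketch (not an official API) that converts such a response into pixel coordinates for a given image size; the response format and field names are assumptions based on that convention.

```python
import json

def parse_points(response_text, image_width, image_height):
    """Convert a pointing response (assumed JSON list of
    {"point": [y, x], "label": ...} with coordinates normalized
    to 0-1000) into pixel (x, y) coordinates."""
    points = json.loads(response_text)
    results = []
    for item in points:
        y_norm, x_norm = item["point"]
        results.append({
            "label": item.get("label", ""),
            "x": x_norm / 1000 * image_width,
            "y": y_norm / 1000 * image_height,
        })
    return results

# Hypothetical model reply: one point at the center of a 640x480 image.
reply = '[{"point": [500, 500], "label": "mug handle"}]'
print(parse_points(reply, 640, 480))
# → [{'label': 'mug handle', 'x': 320.0, 'y': 240.0}]
```

A downstream robot controller would then map these pixel coordinates into its own camera frame before acting on them.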
Benchmarks
| Capability | Benchmark | Description | Gemini 2.0 Flash | Gemini Robotics-ER |
|---|---|---|---|---|
| Pointing | Paco-LVIS | Localization of object parts across image (LVIS) and video (Ego4D) datasets. | 46.1% | 71.3% |
| Pointing | Pixmo-Point | Challenging pointing dataset; requires accurate localization of objects in cluttered scenes. | 25.8% | 49.5% |
| Pointing | Where2Place | Predict spatial localization that requires one-step reasoning. | 33.8% | 45.0% |
| 3D detection | SUN-RGBD | Predict object 3D bounding boxes from monocular RGB images in the SUN-RGBD indoor-scenes dataset. | 30.7% | 48.3% |
Model information
| Property | Value |
|---|---|
| Model deployment status | Private preview |
| Supported input data types | Image, video, text |
| Supported output data types | Text |
| Input token limit | 32k |
| Knowledge cutoff | June 2024 |
| Availability | Trusted tester |