Gemini Robotics 1.5

Gemini Robotics 1.5 is our most capable vision-language-action (VLA) model. It turns visual information and instructions into motor commands so that a robot can perform a task.

Our agentic Gemini-based multimodal model allows robots to take action in the physical world.


Performance

Gemini Robotics 1.5 consistently outperforms our previous models across all four categories of generalization.

Bar chart comparing the 'Progress score' of three models: Gemini Robotics 1.5, Gemini Robotics, and Gemini Robotics On-Device. Gemini Robotics 1.5 consistently outperforms the others across all five categories: In-Distribution (0.83), Instruction Generalization (0.76), Action Generalization (0.54), Visual Generalization (0.81), and Task Generalization (0.70).

Model information

Name: Gemini Robotics 1.5
Status: Private preview
Input:
  • Text
  • Image
Output:
  • Text
  • Action
Input tokens: 32k
Knowledge cutoff: October 2024
Availability: Partners
Model card: View model card
Technical report: View technical report