TAPIR: Tracking Any Point with per-frame Initialization and temporal Refinement

Published: 2 October 2023

Abstract

We present a novel model for Tracking Any Point (TAP) that effectively tracks any queried point on any physical surface throughout a video sequence. Our approach employs two stages: (1) a matching stage, which independently locates a suitable candidate point match for the query point on each of the other frames, and (2) a refinement stage, which updates both the trajectory and query features based on local correlations. The resulting model surpasses all baseline methods by a significant margin on the TAP-Vid benchmark, as demonstrated by an approximate 20% absolute average Jaccard (AJ) improvement on DAVIS. Our model facilitates fast inference on long and high-resolution video sequences. On a modern GPU, our implementation can track points faster than real-time, and can be flexibly extended to higher-resolution videos. Given the high-quality trajectories extracted from a large dataset, we demonstrate a proof-of-concept diffusion model which generates trajectories from static images, enabling plausible animations. Visualizations, source code, and pretrained models can be found at https://deepmind-tapir.github.io/
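
The per-frame matching idea in stage (1) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes dense feature maps are already available for every frame, uses a plain dot-product cost volume, and reads off the location with a spatial soft-argmax. The function name `match_query_per_frame`, the `temperature` parameter, and the array layout are hypothetical choices made for illustration only.

```python
import numpy as np


def match_query_per_frame(query_feat, frame_feats, temperature=0.05):
    """Locate a candidate match for one query feature in every frame.

    Illustrative assumptions (not taken from the paper):
      query_feat:  (C,) feature vector sampled at the query point.
      frame_feats: (T, H, W, C) dense feature maps, one per frame.
    Returns a (T, 2) array of (x, y) positions in feature-map coordinates.
    """
    T, H, W, _ = frame_feats.shape

    # Cost volume: similarity between the query and every spatial location,
    # computed independently for each frame.
    cost = np.einsum("thwc,c->thw", frame_feats, query_feat)

    # Softmax over all spatial positions of each frame (numerically stable).
    logits = cost.reshape(T, H * W) / temperature
    logits = logits - logits.max(axis=1, keepdims=True)
    prob = np.exp(logits)
    prob = prob / prob.sum(axis=1, keepdims=True)
    prob = prob.reshape(T, H, W)

    # Soft-argmax: expected (x, y) coordinate under the softmax distribution.
    ys, xs = np.meshgrid(np.arange(H), np.arange(W), indexing="ij")
    x = (prob * xs).sum(axis=(1, 2))
    y = (prob * ys).sum(axis=(1, 2))
    return np.stack([x, y], axis=-1)
```

Because each frame is handled independently, a sketch like this yields only an initial per-frame estimate; the refinement stage in (2) is what would then update the trajectory and query features using local correlations.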

Authors

Carl Doersch, Yi Yang, Mel Vecerik, Dilara Gokay, Yusuf Aytar, Joao Carreira, Andrew Zisserman, Ankush Gupta

Venue

ICCV 2023
