October 9, 2023

DyST: Towards Dynamic Neural Scene Representations on Real-World Videos

Abstract

Visual understanding of our world goes beyond the semantics and flat structure of individual images. In this paper, we work towards capturing both the 3D structure and the dynamics of real-world scenes from monocular videos. Our model, the Dynamic Scene Transformer (DyST), builds upon recent work in neural scene representation and learns a latent decomposition into scene content as well as per-view scene dynamics and camera pose. This separation is achieved through a special co-training scheme on monocular videos and our new synthetic dataset DySO. DyST learns tangible latent representations for dynamic scenes that enable view generation with separate control over the camera and the content of the scene.

Authors

Max Seitzer*, Sjoerd van Steenkiste, Thomas Kipf, Klaus Greff, Mehdi S. M. Sajjadi

* External author

Venue

ICLR 2024

