DyST: Towards Dynamic Neural Scene Representations on Real-World Videos

Seitzer, Maximilian; van Steenkiste, Sjoerd; Kipf, Thomas; Greff, Klaus; Sajjadi, Mehdi S. M.


arXiv:2310.06020v2 (cs)

[Submitted on 9 Oct 2023 (v1), last revised 15 Mar 2024 (this version, v2)]



Abstract: Visual understanding of the world goes beyond the semantics and flat structure of individual images. In this work, we aim to capture both the 3D structure and dynamics of real-world scenes from monocular real-world videos. Our Dynamic Scene Transformer (DyST) model leverages recent work in neural scene representation to learn a latent decomposition of monocular real-world videos into scene content, per-view scene dynamics, and camera pose. This separation is achieved through a novel co-training scheme on monocular videos and our new synthetic dataset DySO. DyST learns tangible latent representations for dynamic scenes that enable view generation with separate control over the camera and the content of the scene.

Comments:	ICLR 2024 spotlight.
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Artificial Intelligence (cs.AI); Graphics (cs.GR); Machine Learning (cs.LG); Robotics (cs.RO)
Cite as:	arXiv:2310.06020 [cs.CV]
	(or arXiv:2310.06020v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.2310.06020

Submission history

From: Maximilian Seitzer
[v1] Mon, 9 Oct 2023 18:00:01 UTC (1,037 KB)
[v2] Fri, 15 Mar 2024 13:53:19 UTC (4,150 KB)
