Disentangling Space and Time in Video with Hierarchical Variational Auto-encoders

Grathwohl, Will; Wilson, Aaron

arXiv:1612.04440 (cs)

[Submitted on 14 Dec 2016 (v1), last revised 19 Dec 2016 (this version, v2)]

Abstract:There are many forms of feature information present in video data. Principal among them are object identity information, which is largely static across multiple video frames, and object pose and style information, which continuously transforms from frame to frame. Most existing models confound these two types of representation by mapping them to a shared feature space. In this paper we propose a probabilistic approach for learning separable representations of object identity and pose information using unsupervised video data. Our approach leverages a deep generative model with a factored prior distribution that encodes properties of temporal invariances in the hidden feature set. Learning is achieved via variational inference. We present results of learning identity and pose information on a dataset of moving characters as well as a dataset of rotating 3D objects. Our experimental results demonstrate our model's success in factoring its representation, and demonstrate that the model achieves improved performance in transfer learning tasks.
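
To make the idea of a factored prior concrete, the following is a minimal sketch and not the authors' model. It assumes, purely for illustration, that the identity latent is drawn once per clip from a standard Gaussian and shared by every frame, while the pose latent follows a Gaussian random walk. The function name `sample_factored_prior` and the parameters `id_dim`, `pose_dim`, and `pose_step` are hypothetical and do not come from the paper.

```python
import random


def sample_factored_prior(num_frames, id_dim=4, pose_dim=2, pose_step=0.1, seed=0):
    """Draw per-frame latents from a hypothetical factored prior.

    Assumptions (not taken from the paper): the identity latent is sampled
    once from N(0, I) and reused for every frame; the pose latent starts at
    N(0, I) and then follows a Gaussian random walk with standard deviation
    `pose_step` per frame.
    """
    rng = random.Random(seed)
    # Static factor: one draw for the whole clip.
    identity = [rng.gauss(0.0, 1.0) for _ in range(id_dim)]
    # Time-varying factor: initial pose, then small steps between frames.
    pose = [rng.gauss(0.0, 1.0) for _ in range(pose_dim)]
    frames = []
    for _ in range(num_frames):
        frames.append({"identity": list(identity), "pose": list(pose)})
        pose = [p + rng.gauss(0.0, pose_step) for p in pose]
    return frames


frames = sample_factored_prior(num_frames=5)
```

In this sketch every frame carries the same `identity` vector while `pose` drifts slowly, which mirrors the static-versus-transforming split the abstract describes. A decoder network and the variational inference procedure are deliberately left out.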

Comments:	fixed typo in equation 16
Subjects:	Computer Vision and Pattern Recognition (cs.CV); Machine Learning (cs.LG); Machine Learning (stat.ML)
Cite as:	arXiv:1612.04440 [cs.CV]
	(or arXiv:1612.04440v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1612.04440

Submission history

From: Will Grathwohl [view email]
[v1] Wed, 14 Dec 2016 00:20:46 UTC (546 KB)
[v2] Mon, 19 Dec 2016 17:17:26 UTC (546 KB)
