Recurrent Network Models for Human Dynamics

Fragkiadaki, Katerina; Levine, Sergey; Felsen, Panna; Malik, Jitendra

arXiv:1508.00271 (cs)

[Submitted on 2 Aug 2015 (v1), last revised 29 Sep 2015 (this version, v2)]

Abstract: We propose the Encoder-Recurrent-Decoder (ERD) model for recognition and prediction of human body pose in videos and motion capture. The ERD model is a recurrent neural network that incorporates nonlinear encoder and decoder networks before and after recurrent layers. We test instantiations of ERD architectures in the tasks of motion capture (mocap) generation, body pose labeling and body pose forecasting in videos. Our model handles mocap training data across multiple subjects and activity domains, and synthesizes novel motions while avoiding drift over long periods of time. For human pose labeling, ERD outperforms a per-frame body part detector by resolving left-right body part confusions. For video pose forecasting, ERD predicts body joint displacements across a temporal horizon of 400 ms and outperforms a first-order motion model based on optical flow. ERDs extend previous Long Short-Term Memory (LSTM) models in the literature to jointly learn representations and their dynamics. Our experiments show that such representation learning is crucial for both labeling and prediction in space-time. We find that this distinguishes the spatio-temporal visual domain from 1D text, speech or handwriting, where straightforward hard-coded representations have shown excellent results when directly combined with recurrent units.
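
To make the encoder, recurrent layer, decoder ordering described in the abstract concrete, here is a minimal sketch of the data flow in plain Python. It is not the authors' implementation: the class name `ERDSketch`, the layer sizes, the random untrained weights, and the helper functions are all hypothetical, and a plain tanh recurrent unit stands in for the LSTM layers the abstract mentions. It only illustrates how a pose could pass through a nonlinear encoder, a recurrent update, and a nonlinear decoder, and how predictions might be fed back as inputs for generation.

```python
import math
import random


def make_matrix(rows, cols, rng, scale=0.1):
    """Random weight matrix as a list of rows (hypothetical initialisation)."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def tanh_layer(matrix, vec):
    """Linear map followed by an element-wise tanh nonlinearity."""
    return [math.tanh(v) for v in matvec(matrix, vec)]


class ERDSketch:
    """Encoder -> recurrent layer -> decoder, applied one frame at a time."""

    def __init__(self, pose_dim, feat_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        self.w_enc = make_matrix(feat_dim, pose_dim, rng)      # encoder
        self.w_in = make_matrix(hidden_dim, feat_dim, rng)     # feature -> hidden
        self.w_rec = make_matrix(hidden_dim, hidden_dim, rng)  # hidden -> hidden
        self.w_dec1 = make_matrix(feat_dim, hidden_dim, rng)   # decoder layer 1
        self.w_dec2 = make_matrix(pose_dim, feat_dim, rng)     # decoder layer 2

    def step(self, pose, hidden):
        # Nonlinear encoder: raw pose -> learned feature vector.
        feat = tanh_layer(self.w_enc, pose)
        # Recurrent update (plain tanh unit; an assumption standing in for LSTM).
        pre = [a + b for a, b in zip(matvec(self.w_in, feat),
                                     matvec(self.w_rec, hidden))]
        hidden = [math.tanh(v) for v in pre]
        # Nonlinear decoder: hidden state -> predicted next pose.
        pred = matvec(self.w_dec2, tanh_layer(self.w_dec1, hidden))
        return pred, hidden

    def generate(self, seed_poses, n_future):
        """Condition on observed poses, then feed predictions back as inputs."""
        if not seed_poses:
            raise ValueError("need at least one seed pose")
        hidden = [0.0] * self.hidden_dim
        pred = None
        for pose in seed_poses:
            pred, hidden = self.step(pose, hidden)
        outputs = []
        for _ in range(n_future):
            outputs.append(pred)
            pred, hidden = self.step(pred, hidden)
        return outputs


if __name__ == "__main__":
    model = ERDSketch(pose_dim=6, feat_dim=8, hidden_dim=10)
    seed_poses = [[0.1 * t] * 6 for t in range(5)]
    future = model.generate(seed_poses, n_future=4)
    print(len(future), len(future[0]))
```

Because the weights are random and untrained, the generated poses carry no meaning; the sketch only shows the order of operations. In a trained model, the encoder and decoder weights would presumably be learned jointly with the recurrent weights, which is the representation learning the abstract argues is important.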

Comments:	International Conference on Computer Vision 2015
Subjects:	Computer Vision and Pattern Recognition (cs.CV)
Cite as:	arXiv:1508.00271 [cs.CV]
	(or arXiv:1508.00271v2 [cs.CV] for this version)
	https://doi.org/10.48550/arXiv.1508.00271

Submission history

From: Katerina Fragkiadaki
[v1] Sun, 2 Aug 2015 18:59:52 UTC (3,679 KB)
[v2] Tue, 29 Sep 2015 01:28:23 UTC (4,308 KB)
