Occlusion-aware networks for 3D human pose estimation in video

Y Cheng, B Yang, B Wang, W Yan… - Proceedings of the …, 2019 - openaccess.thecvf.com

Proceedings of the IEEE/CVF international conference on …, 2019•openaccess.thecvf.com

Abstract

Occlusion is a key problem in 3D human pose estimation from a monocular video. To address this problem, we introduce an occlusion-aware deep-learning framework. By employing estimated 2D confidence heatmaps of keypoints and an optical-flow consistency constraint, we filter out the unreliable estimations of occluded keypoints. When occlusion occurs, we have incomplete 2D keypoints and feed them to our 2D and 3D temporal convolutional networks (2D and 3D TCNs), which enforce temporal smoothness to produce a complete 3D pose. By using incomplete 2D keypoints, instead of complete but incorrect ones, our networks are less affected by the error-prone estimations of occluded keypoints. Training the occlusion-aware 3D TCN requires pairs of a 3D pose and a 2D pose with occlusion labels. As no such dataset is available, we introduce a "Cylinder Man Model" to approximate the occupation of body parts in 3D space. By projecting the model onto a 2D plane from different viewing angles, we obtain and label the occluded keypoints, providing us with plenty of training data. In addition, we use this model to create a pose regularization constraint, preferring the 2D estimations of unreliable keypoints to be occluded. Our method outperforms state-of-the-art methods on the Human3.6M and HumanEva-I datasets.
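
The filtering step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the exact rule, so the function name `filter_keypoints`, the thresholds `min_confidence` and `max_flow_error`, and the choice of a simple distance test for optical-flow consistency are all assumptions made for illustration. The sketch keeps a detected 2D keypoint only if its heatmap confidence is high enough and it lies close to the position propagated from the previous frame by optical flow; otherwise it is marked as missing, giving the "incomplete 2D keypoints" that would be passed on to the temporal networks.

```python
from math import hypot
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def filter_keypoints(
    detected: List[Point],
    confidence: List[float],
    flow_predicted: List[Point],
    min_confidence: float = 0.5,   # hypothetical threshold, not from the paper
    max_flow_error: float = 10.0,  # hypothetical tolerance in pixels
) -> List[Optional[Point]]:
    """Return the detected keypoints, with unreliable ones replaced by None."""
    if not (len(detected) == len(confidence) == len(flow_predicted)):
        raise ValueError("all inputs must describe the same keypoints")

    kept: List[Optional[Point]] = []
    for (x, y), conf, (fx, fy) in zip(detected, confidence, flow_predicted):
        # Distance between the detection and the flow-propagated position.
        flow_error = hypot(x - fx, y - fy)
        reliable = conf >= min_confidence and flow_error <= max_flow_error
        kept.append((x, y) if reliable else None)
    return kept


# Three keypoints: one reliable, one with low confidence,
# one that disagrees strongly with the optical-flow prediction.
frame = filter_keypoints(
    detected=[(100.0, 50.0), (120.0, 80.0), (300.0, 40.0)],
    confidence=[0.9, 0.2, 0.8],
    flow_predicted=[(101.0, 51.0), (119.0, 79.0), (140.0, 90.0)],
)
print(frame)  # [(100.0, 50.0), None, None]
```

Representing a rejected keypoint as `None` rather than keeping its coordinates mirrors the abstract's point that incomplete keypoints are preferable to complete but incorrect ones.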

Cited by 254
