Occlusion-aware networks for 3D human pose estimation in video
Proceedings of the IEEE/CVF international conference on …, 2019 • openaccess.thecvf.com
Abstract
Occlusion is a key problem in 3D human pose estimation from a monocular video. To address this problem, we introduce an occlusion-aware deep-learning framework. By employing estimated 2D confidence heatmaps of keypoints and an optical-flow consistency constraint, we filter out the unreliable estimations of occluded keypoints. When occlusion occurs, we feed the resulting incomplete 2D keypoints to our 2D and 3D temporal convolutional networks (2D and 3D TCNs), which enforce temporal smoothness to produce a complete 3D pose. By using incomplete 2D keypoints instead of complete but incorrect ones, our networks are less affected by the error-prone estimations of occluded keypoints. Training the occlusion-aware 3D TCN requires pairs of 3D poses and 2D poses with occlusion labels. As no such dataset is available, we introduce a "Cylinder Man Model" to approximate the occupation of body parts in 3D space. By projecting the model onto a 2D plane from different viewing angles, we obtain and label the occluded keypoints, providing plenty of training data. In addition, we use this model to create a pose regularization constraint that prefers the 2D estimations of unreliable keypoints to be occluded. Our method outperforms state-of-the-art methods on the Human3.6M and HumanEva-I datasets.
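The keypoint-filtering step can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: `filter_unreliable_keypoints` and `to_masked_input` are hypothetical names, the thresholds are placeholder values, the optical flow is assumed to be already sampled at each keypoint, and "flow consistency" is taken to mean that a keypoint advected by the flow from frame t should land near its detection in frame t+1. A keypoint is dropped if its heatmap confidence is low or if it fails that check.

```python
import math
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def filter_unreliable_keypoints(
    kps_t: Sequence[Point],
    kps_next: Sequence[Point],
    conf_t: Sequence[float],
    flow_t: Sequence[Point],
    conf_thresh: float = 0.3,   # hypothetical threshold
    flow_thresh: float = 8.0,   # hypothetical threshold, in pixels
) -> List[Optional[Point]]:
    """Return frame-t keypoints with unreliable ones replaced by None.

    kps_t, kps_next: 2D detections (x, y) of the same joints in frames t and t+1.
    conf_t: peak heatmap confidence of each joint in frame t.
    flow_t: optical-flow vector (dx, dy) sampled at each frame-t keypoint.
    """
    kept: List[Optional[Point]] = []
    for (x, y), (xn, yn), c, (dx, dy) in zip(kps_t, kps_next, conf_t, flow_t):
        # Distance between where the flow predicts the joint in frame t+1
        # and where the detector actually found it.
        flow_error = math.hypot(x + dx - xn, y + dy - yn)
        if c < conf_thresh or flow_error > flow_thresh:
            kept.append(None)  # drop it: incomplete beats confidently wrong
        else:
            kept.append((x, y))
    return kept


def to_masked_input(kept: Sequence[Optional[Point]]) -> List[Tuple[float, float, float]]:
    """Encode joints as (x, y, visible); dropped joints become (0, 0, 0).

    Zero-filling plus a visibility flag is an assumed encoding.
    """
    return [(0.0, 0.0, 0.0) if p is None else (p[0], p[1], 1.0) for p in kept]


if __name__ == "__main__":
    kept = filter_unreliable_keypoints(
        kps_t=[(100.0, 50.0), (120.0, 80.0), (90.0, 140.0)],
        kps_next=[(102.0, 51.0), (150.0, 60.0), (91.0, 141.0)],
        conf_t=[0.9, 0.8, 0.1],
        flow_t=[(2.0, 1.0), (1.0, 1.0), (1.0, 1.0)],
    )
    print(kept)  # [(100.0, 50.0), None, None]
    print(to_masked_input(kept))
```

In the example, the second joint is dropped for failing the flow check and the third for low confidence. Dropping such joints, rather than keeping a plausible but wrong position, is the point the abstract makes; how the dropped joints are then encoded for the temporal networks is an assumption here (zero coordinates plus a visibility flag).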
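The occlusion-labeling idea behind the Cylinder Man Model can be illustrated with a simplified sketch. None of the following details come from the paper; they are assumptions for illustration: every body part is a cylinder of one shared radius around the segment between two 3D joints; the camera is orthographic and looks along +z (smaller z is closer); viewpoints are generated by rotating the pose about the vertical axis; and a joint counts as occluded when its projection falls within `radius` of another part's projected axis (a capsule-style test) and that axis is closer to the camera at the nearest point. `occlusion_labels` and `rotate_y` are hypothetical names.

```python
import math
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def rotate_y(p: Vec3, theta: float) -> Vec3:
    """Rotate a 3D point about the vertical (y) axis through the origin.

    Assumes the pose is roughly centered at the origin.
    """
    x, y, z = p
    c, s = math.cos(theta), math.sin(theta)
    return (c * x + s * z, y, -s * x + c * z)


def occlusion_labels(
    joints: Sequence[Vec3],
    parts: Sequence[Tuple[int, int]],
    radius: float,
    theta: float = 0.0,
) -> List[bool]:
    """Label each joint occluded (True) or visible (False) for one viewpoint.

    Orthographic camera looking along +z: (x, y) is the image position,
    z is depth, and smaller z means closer to the camera.
    """
    cam = [rotate_y(p, theta) for p in joints]
    labels: List[bool] = []
    for j, (jx, jy, jz) in enumerate(cam):
        occluded = False
        for a, b in parts:
            if j in (a, b):
                continue  # a joint is not occluded by a part it belongs to
            ax, ay, az = cam[a]
            bx, by, bz = cam[b]
            dx, dy = bx - ax, by - ay
            seg_len2 = dx * dx + dy * dy
            # Closest point on the projected axis, clamped to the segment.
            t = 0.0 if seg_len2 == 0 else ((jx - ax) * dx + (jy - ay) * dy) / seg_len2
            t = max(0.0, min(1.0, t))
            cx, cy = ax + t * dx, ay + t * dy
            # Orthographic projection is affine, so the same t gives the depth.
            axis_depth = az + t * (bz - az)
            if math.hypot(jx - cx, jy - cy) < radius and axis_depth < jz:
                occluded = True
                break
        labels.append(occluded)
    return labels


if __name__ == "__main__":
    # Part (0, 1) is a vertical "torso"; joint 2 is behind it, joint 3 in front.
    joints = [(0.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 1.0, 1.0), (0.0, 1.0, -1.0)]
    parts = [(0, 1)]
    print(occlusion_labels(joints, parts, radius=0.5))  # [False, False, True, False]
    print(occlusion_labels(joints, parts, radius=0.5, theta=math.pi))
```

Calling `occlusion_labels` over a sweep of `theta` values for each 3D training pose yields several (2D pose, occlusion label) pairs per pose, which is the role the abstract gives to projecting the model from different viewing angles. A fuller version would likely use per-part radii, a perspective camera, and the cylinder surface rather than its axis for the depth test.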