research-article

Designing Compact Convolutional Filters for Lightweight Human Pose Estimation

Authors:

Wenchuan Zhang, Wu Zeng

Academic Editor: Chi-Hua Chen

Wireless Communications and Mobile Computing, Volume 2021

https://doi.org/10.1155/2021/1333250

Published: 01 January 2021

Abstract

Existing methods for human pose estimation usually rely on large intermediate tensors, which leads to a high computational load that is detrimental to resource-limited devices. To address this problem, we propose MobilePoseNet, a pose estimation network with low computational cost that consists of an encoder, a decoder, and a parallel nonmaximum suppression (NMS) operation. Specifically, we use a lightweight network as the downsampling (encoder) part and design a lightweight upsampling block, in place of transposed convolution, as the decoder. We then take high-resolution features as the input to upsampling to reduce the number of model parameters. Finally, we propose a parallel OKS-NMS (NMS based on object keypoint similarity), which significantly outperforms conventional NMS in both accuracy and speed. Experimental results on benchmark datasets show that MobilePoseNet achieves results nearly comparable to those of state-of-the-art methods at a low computational load. Compared with SimpleBaseline, MobilePoseNet has only 4% of the parameters while reaching 98% of its estimation accuracy.
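To make the parameter argument concrete, the following sketch counts decoder weights for two designs. The first is a SimpleBaseline-style head, assumed here to be three 4×4 transposed convolutions with 256 filters each on top of a 2048-channel ResNet-50 output. The second is a hypothetical lightweight upsampling block: parameter-free 2× interpolation followed by a 3×3 depthwise and a 1×1 pointwise convolution. The abstract does not specify MobilePoseNet's actual block, so this block, the function names, and the weights-only counting (no biases, no BatchNorm) are illustrative assumptions; the numbers describe this sketch only, not the paper's reported 4% figure.

```python
# Parameter counts (weights only; biases and BatchNorm excluded) for two
# decoder designs. The lightweight block is a hypothetical illustration,
# not MobilePoseNet's published architecture.
from typing import Callable, Sequence


def transposed_conv_params(c_in: int, c_out: int, k: int = 4) -> int:
    """Weights of one k x k transposed convolution mapping c_in -> c_out channels."""
    return c_in * c_out * k * k


def light_upsample_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Hypothetical lightweight upsampling block: parameter-free 2x
    interpolation, then a k x k depthwise conv and a 1x1 pointwise conv."""
    depthwise = c_in * k * k   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1x1 convolution that mixes channels
    return depthwise + pointwise


def decoder_params(channels: Sequence[int], block: Callable[[int, int], int]) -> int:
    """Sum parameters over consecutive upsampling stages, e.g. [2048, 256, 256, 256]."""
    return sum(block(c_in, c_out) for c_in, c_out in zip(channels[:-1], channels[1:]))


# Assumed SimpleBaseline-style head: 2048-channel backbone output, three 2x stages of 256 filters.
STAGES = [2048, 256, 256, 256]

if __name__ == "__main__":
    deconv = decoder_params(STAGES, transposed_conv_params)
    light = decoder_params(STAGES, light_upsample_params)
    print(f"transposed conv: {deconv:,}")      # 10,485,760
    print(f"lightweight:     {light:,}")       # 678,400
    print(f"ratio:           {light / deconv:.1%}")  # 6.5%
```

Both counts grow linearly with the input channel count `c_in`, and the first stage dominates because it reads the 2048-channel map. Feeding the decoder a narrower feature map (in typical backbones, an earlier, higher-resolution stage) therefore shrinks the decoder further, which is consistent with the motivation stated in the abstract.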

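The abstract does not describe how the parallel OKS-NMS is implemented. One plausible way to parallelize OKS-based suppression, sketched here under that caveat, is to compute the full pairwise OKS matrix in a single vectorized step and then apply a Fast-NMS-style rule (a technique borrowed from YOLACT and swapped in here): a pose is dropped if any higher-scoring pose has OKS above a threshold with it. OKS follows the COCO form, the keypoint average of exp(−d²/(2s²k²)) with k = 2σ. Using the mean of the two poses' areas as s², ignoring visibility flags, the 0.9 threshold, and the names `pairwise_oks` and `parallel_oks_nms` are all assumptions for illustration.

```python
import numpy as np


def pairwise_oks(kpts: np.ndarray, areas: np.ndarray, sigmas: np.ndarray) -> np.ndarray:
    """Pairwise object keypoint similarity for N poses in one vectorized pass.

    kpts:   (N, K, 2) keypoint coordinates
    areas:  (N,) object areas, used as the squared scale s^2
    sigmas: (K,) per-keypoint falloff constants (COCO-style)
    returns an (N, N) symmetric OKS matrix
    """
    diff = kpts[:, None, :, :] - kpts[None, :, :, :]        # (N, N, K, 2)
    d2 = np.sum(diff ** 2, axis=-1)                          # squared distances, (N, N, K)
    k2 = (2.0 * sigmas) ** 2                                 # k_i = 2 * sigma_i
    s2 = (areas[:, None] + areas[None, :]) / 2.0             # assumption: mean area of the pair
    e = d2 / (2.0 * k2 * (s2[..., None] + np.spacing(1)))    # d^2 / (2 s^2 k^2)
    return np.mean(np.exp(-e), axis=-1)                      # all keypoints treated as visible


def parallel_oks_nms(kpts, scores, areas, sigmas, thresh: float = 0.9) -> np.ndarray:
    """Fast-NMS-style suppression on the OKS matrix; returns kept indices, best score first."""
    order = np.argsort(-scores)                              # highest score first
    oks = pairwise_oks(kpts[order], areas[order], sigmas)
    oks = np.triu(oks, k=1)                                  # keep only (higher, lower) score pairs
    suppressed = oks.max(axis=0) > thresh                    # overlapped by any better pose?
    return order[~suppressed]


if __name__ == "__main__":
    kpts = np.array([[[10.0, 10.0], [20.0, 20.0]],     # pose 0
                     [[10.5, 10.0], [20.0, 20.5]],     # pose 1: near-duplicate of pose 0
                     [[80.0, 80.0], [90.0, 90.0]]])    # pose 2: a different person
    scores = np.array([0.9, 0.8, 0.7])
    areas = np.array([100.0, 100.0, 100.0])
    sigmas = np.array([0.1, 0.1])                      # illustrative values, not COCO's
    print(parallel_oks_nms(kpts, scores, areas, sigmas).tolist())  # [0, 2]
```

Unlike greedy NMS, this rule can also remove a pose whose only strong overlap is with a pose that was itself suppressed, so it may discard slightly more detections. In exchange, every comparison is independent and maps onto a few dense matrix operations, which is what makes it easy to run in parallel on a GPU.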
Index Terms

Computing methodologies

Published In

Wireless Communications & Mobile Computing Volume 2021, Issue

2021

14355 pages

ISSN: 1530-8669

Copyright © 2021 Shili Niu et al.

This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Publisher

John Wiley and Sons Ltd.

United Kingdom
