research-article

Designing Compact Convolutional Filters for Lightweight Human Pose Estimation

Published: 01 January 2021

Abstract

Existing methods for human pose estimation usually rely on large intermediate tensors, which leads to a high computational load and makes them ill-suited to resource-limited devices. To solve this problem, we propose MobilePoseNet, a pose estimation network with low computational cost that consists of an encoder, a decoder, and a parallel non-maximum suppression (NMS) operation. Specifically, we use a lightweight network as the downsampling part (the encoder) and design a lightweight upsampling block to replace transposed convolution in the decoder. Then, we choose the high-resolution features as the input for upsampling to reduce the number of model parameters. Finally, we propose a parallel NMS based on object keypoint similarity (OKS-NMS), which significantly outperforms the conventional NMS in terms of accuracy and speed. Experimental results on benchmark datasets show that MobilePoseNet obtains results nearly comparable to state-of-the-art methods with a low computational load. Compared to SimpleBaseline, MobilePoseNet has only 4% of the parameters while reaching 98% of the estimation accuracy.
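The abstract does not spell out how the parallel OKS-NMS works, so the following is only a minimal sketch of the general idea, not the authors' implementation. It uses the standard COCO-style OKS formula and contrasts the usual greedy OKS-NMS with a matrix-style variant in which each pose is tested independently against all higher-scored poses, which is what makes it easy to parallelize. The function names, the toy poses, the sigma values, and the 0.5 threshold are all assumptions made for illustration.

```python
import math


def oks(pose_a, pose_b, area, sigmas):
    """COCO-style object keypoint similarity between two poses.

    pose_a, pose_b: lists of (x, y) keypoints in the same order.
    area: squared object scale (for example, the person box area).
    sigmas: per-keypoint falloff constants (assumed values in this sketch).
    """
    total = 0.0
    for (xa, ya), (xb, yb), sigma in zip(pose_a, pose_b, sigmas):
        d2 = (xa - xb) ** 2 + (ya - yb) ** 2
        k2 = (2.0 * sigma) ** 2
        total += math.exp(-d2 / (2.0 * area * k2))
    return total / len(sigmas)


def greedy_oks_nms(poses, scores, areas, sigmas, thr):
    """Conventional sequential NMS: a pose is only compared with kept poses."""
    order = sorted(range(len(poses)), key=lambda i: -scores[i])
    keep = []
    for i in order:
        if all(oks(poses[k], poses[i], areas[k], sigmas) <= thr for k in keep):
            keep.append(i)
    return keep


def matrix_oks_nms(poses, scores, areas, sigmas, thr):
    """Matrix-style NMS (an assumed stand-in for a 'parallel' OKS-NMS).

    Each pose is compared with ALL higher-scored poses, kept or not, so the
    decision for one pose does not depend on the decision for any other and
    the loop body could run in parallel. It can suppress slightly more poses
    than the greedy version.
    """
    order = sorted(range(len(poses)), key=lambda i: -scores[i])
    keep = []
    for rank, j in enumerate(order):
        max_oks = max(
            (oks(poses[i], poses[j], areas[i], sigmas) for i in order[:rank]),
            default=0.0,
        )
        if max_oks <= thr:
            keep.append(j)
    return keep


# Toy data: pose 1 nearly duplicates pose 0, pose 2 is a different person.
poses = [
    [(10, 10), (20, 20), (30, 30)],
    [(11, 10), (20, 21), (30, 30)],
    [(100, 100), (110, 110), (120, 120)],
]
scores = [0.9, 0.8, 0.7]
areas = [400.0, 400.0, 400.0]
sigmas = [0.05, 0.05, 0.05]  # illustrative, not the COCO constants

kept_greedy = greedy_oks_nms(poses, scores, areas, sigmas, thr=0.5)
kept_matrix = matrix_oks_nms(poses, scores, areas, sigmas, thr=0.5)
print(kept_greedy, kept_matrix)
```

On this toy input both variants drop the near-duplicate pose and keep the other two. On a GPU the pairwise OKS values would typically be computed as one matrix and reduced column-wise, rather than with Python loops as here.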


Published In

Wireless Communications & Mobile Computing, Volume 2021, Issue 2021
14355 pages
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Publisher

John Wiley and Sons Ltd., United Kingdom
