Abstract
Human pose estimation is one of the fundamental tasks in computer vision, with applications in motion recognition, games, and animation production. Most current deep network models improve performance by adding more network layers. The resulting computational cost exceeds the capacity of embedded and mobile devices, which limits the practical application of these approaches. In this paper, we propose a lightweight network model built on the idea of Ghost modules. We design Ghost modules to replace the basic modules of the original high-resolution network (HRNet), which reduces the number of model parameters. In addition, we design a non-local high-resolution network by fusing a non-local module into the 1/32-resolution stage of the network. This enables the network to capture global features and improves the accuracy of human pose estimation, so the model reduces its parameter count while maintaining accuracy. We evaluate the method on the MPII and COCO datasets: compared with the conventional high-resolution network, the proposed model improves accuracy by 1.8% while using 40% fewer parameters.
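The parameter saving from replacing convolutions with Ghost modules can be estimated with simple arithmetic. The sketch below follows the general Ghost module formulation of GhostNet (Han et al., 2020): a primary convolution produces only c_out / s "intrinsic" feature maps, and cheap depthwise operations generate the rest. The 32-channel width, ratio s = 2, and cheap-kernel size d = 3 are illustrative assumptions rather than the configuration used in this paper, and bias and batch-normalization parameters are ignored.

```python
def conv_params(c_in: int, c_out: int, k: int) -> int:
    """Weight count of a standard k x k convolution (bias ignored)."""
    return c_in * c_out * k * k


def ghost_params(c_in: int, c_out: int, k: int, s: int = 2, d: int = 3) -> int:
    """Weight count of a Ghost module with c_out outputs (bias/BN ignored).

    A primary k x k convolution produces m = c_out / s intrinsic maps;
    cheap d x d depthwise convolutions then generate the remaining
    (s - 1) * m ghost maps, which are concatenated with the intrinsic ones.
    """
    if c_out % s != 0:
        raise ValueError("c_out must be divisible by the ratio s")
    m = c_out // s
    primary = c_in * m * k * k   # ordinary convolution, fewer output maps
    cheap = (s - 1) * m * d * d  # one d x d depthwise filter per ghost map
    return primary + cheap


if __name__ == "__main__":
    # Assumed example: a 3x3 conv with 32 -> 32 channels, ratio s = 2.
    standard = conv_params(32, 32, 3)
    ghost = ghost_params(32, 32, 3, s=2, d=3)
    print(standard, ghost, round(standard / ghost, 2))  # 9216 4752 1.94
```

With d = k the per-layer ratio is s * c_in / (c_in + s - 1), which approaches s for wide layers. The overall 40% reduction reported above depends on which modules are replaced and is not reproduced by this per-layer calculation.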
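To make the "global features" claim concrete, the following pure-Python sketch implements the embedded-Gaussian non-local operation of Wang et al. (2018) on a feature map flattened to N = H*W positions: y_i = sum_j softmax_j(theta(x_i) . phi(x_j)) g(x_j), then z_i = W_z y_i + x_i. The pairwise affinities cost O(N^2), which makes a low-resolution stage an inexpensive place for such a block: assuming a 256x192 input (a common COCO setting, not stated in this abstract), the 1/32 map has only 8x6 = 48 positions, versus 64x48 = 3072 at 1/4 resolution. Plain weight matrices stand in for the 1x1 convolutions; the names, sizes, and the `random_matrix` helper are hypothetical, and this is not the paper's implementation.

```python
import math
import random
from typing import List

Matrix = List[List[float]]  # row-major list of rows


def linear(weights: Matrix, x: List[float]) -> List[float]:
    """Apply a 1x1 convolution (a per-position linear map over channels)."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def softmax(scores: List[float]) -> List[float]:
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def non_local(x: Matrix, w_theta: Matrix, w_phi: Matrix,
              w_g: Matrix, w_z: Matrix) -> Matrix:
    """Embedded-Gaussian non-local block on a feature map flattened to N x C.

    w_theta, w_phi, w_g have shape C' x C; w_z has shape C x C'.
    Each output position is a weighted sum over *all* N input positions.
    """
    theta = [linear(w_theta, xi) for xi in x]
    phi = [linear(w_phi, xi) for xi in x]
    g = [linear(w_g, xi) for xi in x]
    out = []
    for i, x_i in enumerate(x):
        # Affinity of position i to every position j, normalized over j.
        attn = softmax([sum(a * b for a, b in zip(theta[i], p)) for p in phi])
        # y_i = sum_j attn[j] * g(x_j)
        y_i = [sum(attn[j] * g[j][c] for j in range(len(x)))
               for c in range(len(g[0]))]
        # Residual connection: z_i = W_z y_i + x_i
        out.append([zc + xc for zc, xc in zip(linear(w_z, y_i), x_i)])
    return out


def random_matrix(rng: random.Random, rows: int, cols: int) -> Matrix:
    """Hypothetical helper: Gaussian-initialized weights for the demo."""
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


if __name__ == "__main__":
    rng = random.Random(0)
    n, c, c_inner = 8 * 6, 4, 2  # 8 x 6 grid: 1/32 of an assumed 256 x 192 input
    x = random_matrix(rng, n, c)
    w_theta, w_phi, w_g = (random_matrix(rng, c_inner, c) for _ in range(3))
    w_z = random_matrix(rng, c, c_inner)
    z = non_local(x, w_theta, w_phi, w_g, w_z)
    print(len(z), len(z[0]))  # 48 4
```

Setting `w_z` to zeros turns the block into an identity map, which is why the output projection (or a normalization scale after it) is commonly zero-initialized, so that inserting the block does not disturb a pretrained network at the start of training.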
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Zhang, C. et al. (2021). Lightweight Non-local High-Resolution Networks for Human Pose Estimation. In: Peng, Y., Hu, SM., Gabbouj, M., Zhou, K., Elad, M., Xu, K. (eds) Image and Graphics. ICIG 2021. Lecture Notes in Computer Science, vol 12889. Springer, Cham. https://doi.org/10.1007/978-3-030-87358-5_33
DOI: https://doi.org/10.1007/978-3-030-87358-5_33
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-87357-8
Online ISBN: 978-3-030-87358-5
eBook Packages: Computer Science, Computer Science (R0)