RegionDrop: Fast Human Pose Estimation Using Annotation-Aware Spatial Sparsity

Sada, Youki; Shibata, Seiya; Kobayashi, Yuki; Takenaka, Takashi

doi:10.1007/978-3-031-15937-4_63


Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13532)

Included in the following conference series:

International Conference on Artificial Neural Networks


Abstract

Convolutional neural networks (CNNs) have attracted attention for accurate scene parsing, including human pose estimation. However, CNNs require a massive number of floating-point operations, which makes them difficult to deploy on low-cost devices. We therefore propose RegionDrop, an annotation-aware spatially sparse network that skips computation on unnecessary spatial regions of the activations. We present a novel loss that uses the annotations directly so that important activation regions are retained. We also developed an efficient sparse GPU kernel that accelerates both depthwise and general $K\times K$ convolutional layers. We evaluate RegionDrop on two pose estimation networks: a modified stacked hourglass network and HRNet. With the hourglass network, RegionDrop achieved 3.2 times faster processing than a non-sparse network and 1.8 times faster processing than a prior spatially sparse network, with no accuracy degradation. With HRNet, RegionDrop increased processing speed by a factor of 2.0 with negligible accuracy loss.
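To make "skipping computation on unnecessary spatial regions" concrete, here is a minimal plain-Python sketch of mask-based spatially sparse convolution. It is not the authors' GPU kernel or their loss. The mask builder `keypoint_mask`, which marks a square window of a hypothetical `radius` around each annotated keypoint, is an illustrative stand-in for whatever region mask RegionDrop actually learns. All function names here are assumptions made for the example. The sketch shows only that outputs are computed at active positions and that those outputs match a dense convolution there.

```python
from typing import List, Tuple

Grid = List[List[float]]
Mask = List[List[int]]


def keypoint_mask(h: int, w: int, keypoints: List[Tuple[int, int]], radius: int) -> Mask:
    """HYPOTHETICAL stand-in for a learned mask: mark every position within a
    (2*radius+1) x (2*radius+1) square around each annotated keypoint as active."""
    mask = [[0] * w for _ in range(h)]
    for ky, kx in keypoints:
        for row in range(max(0, ky - radius), min(h, ky + radius + 1)):
            for col in range(max(0, kx - radius), min(w, kx + radius + 1)):
                mask[row][col] = 1
    return mask


def conv_at(x: Grid, kernel: Grid, row: int, col: int) -> float:
    """One output of a stride-1, zero-padded K x K convolution (cross-correlation,
    as in most CNN frameworks). Assumes K is odd."""
    k = len(kernel)
    pad = k // 2
    h, w = len(x), len(x[0])
    acc = 0.0
    for i in range(k):
        for j in range(k):
            src_r, src_c = row + i - pad, col + j - pad
            if 0 <= src_r < h and 0 <= src_c < w:  # zero padding outside the map
                acc += kernel[i][j] * x[src_r][src_c]
    return acc


def dense_conv(x: Grid, kernel: Grid) -> Grid:
    """Reference: evaluate every output position."""
    return [[conv_at(x, kernel, r, c) for c in range(len(x[0]))] for r in range(len(x))]


def sparse_conv(x: Grid, kernel: Grid, mask: Mask) -> Tuple[Grid, int]:
    """Evaluate only positions where mask == 1 and leave the rest at zero.
    Returns the output map and the number of positions actually computed."""
    h, w = len(x), len(x[0])
    out = [[0.0] * w for _ in range(h)]
    # Gather the active coordinates once, then compute only those outputs.
    active = [(r, c) for r in range(h) for c in range(w) if mask[r][c]]
    for r, c in active:
        out[r][c] = conv_at(x, kernel, r, c)
    return out, len(active)


# Usage: an 8 x 8 single-channel activation map and a 3 x 3 kernel.
x = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
kernel = [[0.0, 1.0, 0.0], [1.0, -4.0, 1.0], [0.0, 1.0, 0.0]]
mask = keypoint_mask(8, 8, [(2, 2), (5, 6)], radius=1)
sparse, n_active = sparse_conv(x, kernel, mask)
dense = dense_conv(x, kernel)
print(f"computed {n_active} of {8 * 8} positions")  # computed 18 of 64 positions
```

Because each output position is computed independently, the work scales with the number of active positions (18 of 64 in this toy case), which is where a spatially sparse layer gets its savings. For a depthwise layer, the same routine would run once per channel with that channel's kernel. For a general $K\times K$ layer, the per-position sum would also run over input channels. A practical GPU kernel must additionally organize the gather and scatter of active positions efficiently, which this sketch ignores.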


Notes

1. http://cocodataset.org/#keypoints-eval.
2. The code at https://github.com/thomasverelst/dynconv is used to measure the throughput of the FP32 DynConv models, and our own implementation is used for the FP16 DynConv models.


Author information

Authors and Affiliations

Digital Technology Development Laboratories, NEC Corporation, Kanagawa, Japan
Youki Sada, Seiya Shibata, Yuki Kobayashi & Takashi Takenaka


Corresponding author

Correspondence to Youki Sada.

Editor information

Editors and Affiliations

University of the West of England, Bristol, UK
Elias Pimenidis
Lancaster University, Lancaster, UK
Plamen Angelov
Digital Innovation, Teesside University, Middlesbrough, UK
Chrisina Jayne
Democritus University of Thrace, Xanthi, Greece
Antonios Papaleonidas
The University of the West of England, Bristol, UK
Mehmet Aydin


About this paper

Cite this paper

Sada, Y., Shibata, S., Kobayashi, Y., Takenaka, T. (2022). RegionDrop: Fast Human Pose Estimation Using Annotation-Aware Spatial Sparsity. In: Pimenidis, E., Angelov, P., Jayne, C., Papaleonidas, A., Aydin, M. (eds) Artificial Neural Networks and Machine Learning – ICANN 2022. ICANN 2022. Lecture Notes in Computer Science, vol 13532. Springer, Cham. https://doi.org/10.1007/978-3-031-15937-4_63


DOI: https://doi.org/10.1007/978-3-031-15937-4_63
Published: 07 September 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-15936-7
Online ISBN: 978-3-031-15937-4
eBook Packages: Computer Science, Computer Science (R0)
