research-article

Enhanced Human Pose Estimation with Attention-Augmented HRNet

Authors: Junjie Zhang, Haojie Yang, Yancong Deng

IPMV '24: Proceedings of the 2024 6th International Conference on Image Processing and Machine Vision

Pages 88 - 93

https://doi.org/10.1145/3645259.3645274

Published: 03 May 2024

Abstract

Human pose estimation is a pivotal task in computer vision that aims to accurately predict the spatial locations of key body joints within an image. The task is difficult because models must handle complex poses, occlusions, and variations in body configuration, which often confound traditional pose estimation models. To improve the accuracy and robustness of human pose estimation, we introduce an Attention-Augmented HRNet architecture. The proposed model augments the original HRNet with self-attention mechanisms, which capture long-range dependencies among keypoints and focus more effectively on pivotal body regions. Experimental results show that the Attention-Augmented HRNet surpasses the baseline HRNet without attention and attains state-of-the-art performance on the COCO dataset, with an Average Precision (AP) of 74.5%.
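
The abstract does not say where or how self-attention is inserted into HRNet, so the following is only a minimal sketch of the general idea: generic scaled dot-product self-attention applied to a flattened feature map, so that every spatial position can draw on every other position regardless of distance. The identity query/key/value projections, the residual connection, and the names `self_attention` and `feature_map` are illustrative assumptions, not the authors' implementation.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(features):
    """Scaled dot-product self-attention with identity Q/K/V projections.

    `features` is a list of N feature vectors (one per spatial position of a
    flattened feature map), each of length C. Returns (outputs, weights).
    Assumption: a real model would use learned projections instead.
    """
    dim = len(features[0])
    scale = 1.0 / math.sqrt(dim)
    outputs, weights = [], []
    for query in features:
        # Similarity of this position to every position, near or far.
        scores = [scale * sum(q * k for q, k in zip(query, key))
                  for key in features]
        attn = softmax(scores)
        # Attention-weighted sum of the features at all positions.
        mixed = [sum(w * value[c] for w, value in zip(attn, features))
                 for c in range(dim)]
        # Residual connection (assumed) keeps the original backbone feature.
        outputs.append([x + m for x, m in zip(query, mixed)])
        weights.append(attn)
    return outputs, weights

# Toy 2x2 feature map with 2 channels, flattened to 4 positions.
feature_map = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
outputs, weights = self_attention(feature_map)
```

In this toy input, positions 0 and 2 carry identical features, so position 0 attends to position 2 as strongly as to itself and more strongly than to position 1. That is the long-range dependency effect the abstract refers to, shown at a very small scale.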

Published In

IPMV '24: Proceedings of the 2024 6th International Conference on Image Processing and Machine Vision

January 2024

129 pages

ISBN:9798400708473

DOI:10.1145/3645259

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

IPMV 2024

IPMV 2024: 2024 6th International Conference on Image Processing and Machine Vision

January 12 - 14, 2024

Macau, China
