Vital information is only worth one thumbnail: Towards efficient human pose estimation

Authors:

Mingli Ding

Volume 147, Issue C

https://doi.org/10.1016/j.patcog.2023.110111

Published: 04 March 2024

Abstract

In pursuit of high accuracy, existing DCNN-based approaches to human pose estimation usually train deep models with massive networks and large input images. To deploy these methods in real-time systems, current works compress the network by reducing the number of layers and channels, but such approaches are complex and generalize poorly because they require elaborate design of small-scale network structures. Based on the observation that large images contain redundant information, this paper explores the influence of image size on system complexity and proposes a novel framework called ThumbPose, which accelerates and compresses deep models for human pose estimation by running inference on thumbnail representations. In our framework, we first propose a style-supervised online downscaler that reduces an input image to a thumbnail image. Furthermore, a dual-branch auto-encoding training strategy is designed to obtain an effective and accurate thumbnail representation in a knowledge distillation manner, so that thumbnail inputs maintain the performance of original-size inputs. For heat-map-based human pose estimation, ThumbPose is an orthogonal and implementation-friendly method that not only compresses and accelerates the inference network but also yields an image downscaler, learned in a supervised manner, that can be used in other high-level tasks (e.g., detection and segmentation) in practical applications. Extensive experiments on the MS COCO dataset demonstrate the effectiveness of the proposed method: ThumbPose achieves superior performance (+1.3% AP and +0.7% AR) with negligible additional cost (<0.2 GFLOPs) compared to previous state-of-the-art methods when using small images as inputs. Moreover, experiments on MPII show that our model achieves higher accuracy (+0.2% PCKh@0.5) with minimal computation (2.5 GFLOPs) compared to superior lightweight models obtained by network compression methods.
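To make the link between image size and computation concrete, the following minimal sketch counts multiply-accumulate operations for a single convolution layer at two input resolutions. The function name `conv2d_macs`, the layer widths, and the 256x192 and 128x96 input sizes are illustrative assumptions, not values taken from the paper.

```python
def conv2d_macs(height, width, c_in, c_out, kernel=3, stride=1):
    """Multiply-accumulate count for one 'same'-padded 2D convolution.

    All arguments are hypothetical example values; the formula itself is the
    standard cost of a dense convolution without bias.
    """
    out_h = -(-height // stride)  # ceiling division gives the output height
    out_w = -(-width // stride)   # ceiling division gives the output width
    return out_h * out_w * c_in * c_out * kernel * kernel


# Assumed full-size input versus an assumed thumbnail at half the side length.
full_macs = conv2d_macs(256, 192, c_in=64, c_out=64)
thumb_macs = conv2d_macs(128, 96, c_in=64, c_out=64)

# Halving both sides quarters the spatial area, and so the per-layer cost.
ratio = full_macs / thumb_macs
print(f"full: {full_macs:,} MACs, thumbnail: {thumb_macs:,} MACs, ratio: {ratio}")
```

Because the cost of each convolution scales with the spatial area of its input, shrinking the image reduces computation across the whole network without altering its architecture, which is the kind of saving that inference on thumbnails targets.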

Highlights

• A novel framework is proposed for efficient human pose estimation.

• A style-supervised online downscaler is designed to compress input images to thumbnail ones.

• A dual-branch auto-encoding strategy is designed for refining the thumbnail representation.

• We obtain SOTA performance on COCO and MPII datasets with low computational costs.
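As a rough illustration of training "in a knowledge distillation manner", the sketch below combines a ground-truth term with a term that pulls a thumbnail branch's output toward that of an original-size branch. This is a generic distillation loss on heat-maps, not the loss used by ThumbPose; the names `mse` and `distillation_loss`, the weight `alpha`, and the toy values are all assumptions.

```python
def mse(a, b):
    """Mean squared error between two equally shaped 2D heat-maps (lists of rows)."""
    total, count = 0.0, 0
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            total += (x - y) ** 2
            count += 1
    return total / count


def distillation_loss(student, teacher, target, alpha=0.5):
    """Generic distillation loss (hypothetical, not the paper's formulation).

    student: heat-map predicted from the thumbnail input
    teacher: heat-map predicted from the original-size input
    target:  ground-truth heat-map
    alpha:   assumed weight balancing the two terms
    """
    return (1 - alpha) * mse(student, target) + alpha * mse(student, teacher)


# Toy 1x2 heat-maps: the student matches the target but differs from the teacher.
student = [[0.0, 1.0]]
teacher = [[0.0, 0.5]]
target = [[0.0, 1.0]]
loss = distillation_loss(student, teacher, target, alpha=0.5)
print(loss)
```

Setting `alpha` to 0 reduces this to ordinary supervised training, while larger values weight agreement with the original-size branch more heavily.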
