research-article

Class-level Aware Network for Human Parsing

Authors:

Yuan Xiao

CNIOT '21: Proceedings of the 2021 2nd International Conference on Computing, Networks and Internet of Things

Article No.: 37, Pages 1 - 6

https://doi.org/10.1145/3468691.3468733

Published: 07 August 2021

Abstract

Convolutional neural networks (CNNs) have shown great performance in human parsing, but they come at a high computational cost. In this paper, a novel class-level aware network (CANet), which employs an asymmetric encoder-decoder architecture, is presented to achieve reliable human parsing results in a memory-friendly way. To balance speed and accuracy in human parsing, we design the group-split-bottleneck (GS-bt) block, in which group convolution and channel split are used within the residual block. In the decoder network, the attention pyramid pooling module (APPM) is proposed to recover the details of human parsing. Moreover, a multi-class classification branch is developed to extract class-level information and revise the human parsing results. Compared to current models, our model has fewer parameters, and experiments demonstrate that the proposed CANet can reach state-of-the-art results on the PASCAL-Person-Part dataset.
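The abstract credits group convolution and channel split for the reduced parameter count. The sketch below is not the paper's GS-bt block, whose exact layout is not given here; it only counts the weights of a single convolution under each technique. The channel width (64), kernel size (3) and group count (4) are illustrative assumptions, and `conv_params` and `split_branch_params` are hypothetical helper names.

```python
def conv_params(c_in, c_out, k, groups=1):
    """Weight count of a 2D convolution with k x k kernels and no bias.

    With groups > 1, each output channel only sees c_in / groups input
    channels, so the weight count shrinks by a factor of `groups`.
    """
    if c_in % groups or c_out % groups:
        raise ValueError("channel counts must be divisible by groups")
    return (c_in // groups) * c_out * k * k


def split_branch_params(channels, k, groups):
    """Weight count when a channel split is applied first.

    Assumption: half of the channels bypass the convolution unchanged and
    the other half go through one grouped k x k convolution.
    """
    half = channels // 2
    return conv_params(half, half, k, groups=groups)


# Illustrative sizes only; they are not taken from the paper.
channels, kernel, groups = 64, 3, 4

standard = conv_params(channels, channels, kernel)
grouped = conv_params(channels, channels, kernel, groups=groups)
split_grouped = split_branch_params(channels, kernel, groups)

print("standard 3x3 conv:       ", standard)
print("grouped 3x3 conv (g=4):  ", grouped)
print("split + grouped 3x3 conv:", split_grouped)
```

Under these assumed sizes, grouping divides the weight count by the number of groups, and splitting the channels first divides it by a further factor of four, because both the input and output widths of the convolved branch are halved.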


Cited By

U. Michieli and P. Zanuttigh. 2022. Edge-Aware Graph Matching Network for Part-Based Semantic Segmentation. International Journal of Computer Vision 130, 11 (2022), 2797–2821. DOI: 10.1007/s11263-022-01671-z. Online publication date: 1-Nov-2022.
https://dl.acm.org/doi/10.1007/s11263-022-01671-z

Index Terms

Class-level Aware Network for Human Parsing
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
    2. Natural language processing
  2. Machine learning
    1. Learning paradigms
      1. Supervised learning
    2. Machine learning approaches
      1. Neural networks

Index terms have been assigned to the content through auto-classification.


Published In

CNIOT '21: Proceedings of the 2021 2nd International Conference on Computing, Networks and Internet of Things

May 2021

270 pages

ISBN:9781450389693

DOI:10.1145/3468691

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

CNIOT2021

CNIOT2021: 2021 2nd International Conference on Computing, Networks and Internet of Things

May 20 - 22, 2021

Beijing, China

Acceptance Rates

Overall Acceptance Rate 39 of 82 submissions, 48%
