research-article

Explicit State Representation Guided Video-based Pedestrian Attribute Recognition

Published: 19 December 2023

Abstract

Pedestrian attribute recognition aims to generate a structured description of pedestrians and plays an important role in surveillance. Current works usually assume that the images and the specific pedestrian states, including pedestrian occlusion and pedestrian orientation, are given. However, we argue that current works ignore the guidance of the pedestrian state and cannot achieve satisfactory performance, because the appearance feature becomes unreliable when the pedestrian state varies, which is common in practice. Therefore, this paper proposes Explicit State Representation (ExSR) guided pedestrian attribute recognition, which improves accuracy through state learning and attribute fusion among frames. First, the pedestrian state is explicitly represented by concatenating the pedestrian orientation and occlusion, which can be accurately determined by analyzing the pose. Second, a state-aware pedestrian attribute fusion method is proposed and divided into two cases: the intra-state case and the inter-state case. In the intra-state case, the appearance feature remains stable, and the attribute relations are propagated to refine the attribute predictions. Attribute relations within a single frame are exploited with a Graph Neural Network. In the inter-state case, the state changes, so attribute relation propagation is prevented, and the strengths of attribute recognition in each frame complement one another to make a reliable judgment on the invisible region. Experimental results demonstrate that ExSR outperforms state-of-the-art methods on two public databases, benefiting from the explicit introduction of the state into attribute recognition.
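
The abstract gives no implementation details, so the following is only a minimal sketch of the two ideas it names: a state built by concatenating orientation and occlusion, and fusion that treats same-state and different-state frames differently. The orientation labels, body-part names, per-attribute visibility values, and the functions `make_state` and `fuse` are all assumptions made for illustration. In particular, plain averaging stands in for the intra-state refinement, which the paper performs with a Graph Neural Network, and a highest-visibility pick stands in for the inter-state complementing step.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical label sets; the abstract does not enumerate them.
ORIENTATIONS = ("front", "back", "left", "right")
BODY_PARTS = ("head", "upper_body", "lower_body")

State = Tuple[int, ...]
Frame = Tuple[State, Dict[str, float], Dict[str, float]]


def make_state(orientation: str, occluded: Sequence[str]) -> State:
    """Concatenate a one-hot orientation with a binary occlusion vector."""
    one_hot = [1 if orientation == o else 0 for o in ORIENTATIONS]
    occlusion = [1 if part in occluded else 0 for part in BODY_PARTS]
    return tuple(one_hot + occlusion)


def fuse(frames: List[Frame]) -> Dict[str, float]:
    """Fuse per-frame attribute scores using the frame states.

    Each frame is (state, scores, visibility), where scores and visibility
    map an attribute name to a value in [0, 1]. Visibility is an assumed
    input here, not something the abstract defines.
    """
    # Intra-state case: frames sharing a state are pooled by averaging
    # (a simple stand-in for relation propagation, not the paper's GNN).
    groups: Dict[State, List[Tuple[Dict[str, float], Dict[str, float]]]] = {}
    for state, scores, visibility in frames:
        groups.setdefault(state, []).append((scores, visibility))

    per_state = []
    for members in groups.values():
        n = len(members)
        attrs = list(members[0][0])
        mean_scores = {a: sum(s[a] for s, _ in members) / n for a in attrs}
        mean_vis = {a: sum(v[a] for _, v in members) / n for a in attrs}
        per_state.append((mean_scores, mean_vis))

    # Inter-state case: nothing is averaged across states; each attribute
    # is taken from the state in which it is most visible.
    fused = {}
    for attr in per_state[0][0]:
        best_scores, _ = max(per_state, key=lambda sv: sv[1][attr])
        fused[attr] = best_scores[attr]
    return fused


if __name__ == "__main__":
    front = make_state("front", [])
    back = make_state("back", ["head"])
    frames = [
        (front, {"backpack": 0.25, "hat": 1.0}, {"backpack": 0.1, "hat": 1.0}),
        (front, {"backpack": 0.5, "hat": 0.5}, {"backpack": 0.1, "hat": 1.0}),
        (back, {"backpack": 1.0, "hat": 0.25}, {"backpack": 1.0, "hat": 0.0}),
    ]
    print(fuse(frames))
```

In this toy input the backpack is barely visible from the front and the hat is occluded from the back, so the fused result takes the backpack score from the back-facing frame and the hat score from the pooled front-facing frames.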

    Published In

    ACM Transactions on Intelligent Systems and Technology  Volume 15, Issue 1
    February 2024
    533 pages
    EISSN:2157-6912
    DOI:10.1145/3613503
    Editor: Huan Liu

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 19 December 2023
    Online AM: 26 October 2023
    Accepted: 24 August 2023
    Revised: 11 July 2023
    Received: 08 August 2022
    Published in TIST Volume 15, Issue 1

    Author Tags

    1. Video-based pedestrian attribute recognition
    2. pedestrian explicit state
    3. graph convolution network

    Qualifiers

    • Research-article

    Funding Sources

    • “Pioneer” and “Leading Goose” R&D Program of Zhejiang
    • National Natural Science Foundation of China
    • Fundamental Research Funds for the Central Universities
