Depthwise Separable Convolutional Neural Networks for Pedestrian Attribute Recognition

Authors:

Imran N. Junejo,

Naveed Ahmed

SN Computer Science, Volume 2, Issue 2

https://doi.org/10.1007/s42979-021-00493-z

Published: 14 February 2021

Abstract

Video surveillance is ubiquitous. Beyond understanding the objects in a scene, extracting human visual attributes from it has gained considerable traction in recent years. The problem is challenging even for human observers. It is also a multi-label problem: a subject in a scene can have several attributes that we hope to recognize, such as shoe type, clothing type, and whether the subject is wearing an accessory or carrying an object. Many solutions have been presented over the years, and many researchers have employed convolutional neural networks (CNNs). In this work, we propose using a Depthwise Separable Convolutional Neural Network (DS-CNN) to solve the pedestrian attribute recognition problem. The network employs depthwise separable convolution layers (DSCL) instead of regular 2D convolution layers. DS-CNN performs extremely well, especially on smaller datasets. In addition, because the network is compact, DS-CNN reduces the number of trainable parameters while making learning efficient. We evaluated our method on two benchmark pedestrian datasets, and the results show improvements over the state of the art.
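To make the parameter-reduction claim concrete, the following minimal sketch counts trainable parameters for a standard 2D convolution and for a depthwise separable one. The layer sizes (64 input channels, 128 output channels, 3 x 3 kernel) and the function names are illustrative assumptions, not values or code taken from the paper, and the bias convention (a single bias per output channel) is also an assumption.

```python
def conv2d_params(c_in, c_out, k, bias=True):
    """Trainable parameters in a standard k x k 2D convolution."""
    return k * k * c_in * c_out + (c_out if bias else 0)


def separable_conv2d_params(c_in, c_out, k, depth_multiplier=1, bias=True):
    """Trainable parameters in a depthwise separable convolution.

    A k x k depthwise filter is applied to each input channel, and a
    1 x 1 pointwise convolution then mixes the channels.
    """
    depthwise = k * k * c_in * depth_multiplier
    pointwise = c_in * depth_multiplier * c_out
    return depthwise + pointwise + (c_out if bias else 0)


# Hypothetical layer sizes, chosen only for illustration.
standard = conv2d_params(c_in=64, c_out=128, k=3)
separable = separable_conv2d_params(c_in=64, c_out=128, k=3)
ratio = standard / separable

print(f"standard conv:  {standard} parameters")
print(f"separable conv: {separable} parameters")
print(f"reduction:      {ratio:.1f}x")
```

For these assumed sizes the separable layer needs roughly one eighth of the parameters of the standard layer. The saving grows with kernel size and output channel count, which is consistent with the compactness described in the abstract.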

Published In

SN Computer Science Volume 2, Issue 2

Apr 2021

1008 pages

EISSN:2661-8907

© The Author(s), under exclusive licence to Springer Nature Singapore Pte Ltd. part of Springer Nature 2021.

Publisher

Springer-Verlag

Berlin, Heidelberg

Publication History

Published: 14 February 2021

Accepted: 29 January 2021

Received: 11 March 2020
