Attention Based CNN-ConvLSTM for Pedestrian Attribute Recognition

Yang Li; Huahu Xu; Minjie Bian; Junsheng Xiao

doi:10.3390/s20030811

Sensors (Basel). 2020 Feb 3;20(3):811. doi: 10.3390/s20030811.

Authors

Yang Li¹ ², Huahu Xu¹ ³, Minjie Bian³, Junsheng Xiao¹

Affiliations

¹ School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China.
² School of Information Technology, Shanghai Jianqiao University, Shanghai 201306, China.
³ Information Office, Shanghai University, Shanghai 200444, China.

Abstract

Pedestrian attribute recognition plays an important role in video surveillance and has become an active area of computer vision research. The task is challenging because of variations in viewpoint, illumination, resolution and occlusion. Existing pedestrian attribute recognition methods perform unsatisfactorily because they ignore the correlation between pedestrian attributes and spatial information. To address this, this paper treats the task as a spatiotemporal, sequential, multi-label image classification problem and proposes an attention-based neural network (CNN-CAtt-ConvLSTM) consisting of a convolutional neural network (CNN), channel attention (CAtt) and convolutional long short-term memory (ConvLSTM). First, a pre-trained CNN and CAtt extract the salient and correlated visual features of pedestrian attributes. Then, ConvLSTM further extracts spatial information and correlations from pedestrian attributes. Finally, pedestrian attributes are predicted with optimized sequences based on attribute image area size and importance. Extensive experiments on two common pedestrian attribute datasets, the PEdesTrian Attribute (PETA) dataset and the Richly Annotated Pedestrian (RAP) dataset, show higher performance than other state-of-the-art (SOTA) methods, supporting the effectiveness of the proposed method.

Keywords: channel attention (CAtt); convolutional long short-term memory (ConvLSTM); convolutional neural networks (CNN); multi-label classification; pedestrian attribute recognition.
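
The abstract names channel attention (CAtt) as the step that reweights CNN features before they reach the ConvLSTM, but it does not specify the exact formulation. As a minimal sketch only, the Python below assumes a squeeze-and-excitation style gate: each channel is globally average-pooled, passed through a small two-layer network, and the resulting sigmoid score rescales that channel. The function name `channel_attention`, the weight matrices `w1` and `w2`, and the toy feature map are all hypothetical and are not taken from the paper.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def channel_attention(feature_map, w1, w2):
    """Assumed squeeze-and-excitation style channel gate (not the paper's exact CAtt).

    feature_map: list of C channels, each an H x W list of lists.
    w1: hidden x C weights, w2: C x hidden weights (hypothetical; learned in practice).
    Returns (gates, rescaled_feature_map).
    """
    # Squeeze: global average pooling gives one scalar per channel.
    pooled = [
        sum(sum(row) for row in ch) / (len(ch) * len(ch[0]))
        for ch in feature_map
    ]
    # Excitation: ReLU hidden layer followed by a sigmoid gate per channel.
    hidden = [max(0.0, sum(w * p for w, p in zip(row, pooled))) for row in w1]
    gates = [sigmoid(sum(w * h for w, h in zip(row, hidden))) for row in w2]
    # Rescale: multiply every value in a channel by that channel's gate.
    rescaled = [
        [[g * v for v in row] for row in ch]
        for g, ch in zip(gates, feature_map)
    ]
    return gates, rescaled


# Toy input: two 2x2 channels, with made-up weights for illustration.
features = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[0.0, 0.0], [0.0, 0.0]],
]
w1 = [[1.0, 1.0]]        # 1 hidden unit, 2 input channels
w2 = [[2.0], [-2.0]]     # 2 output gates, 1 hidden unit
gates, rescaled = channel_attention(features, w1, w2)
```

With these toy weights the first channel receives a gate above 0.5 and the second a gate below 0.5, so the first channel is emphasized. In a trained network the weights would be learned jointly with the rest of the model, and the rescaled feature map would feed the next stage.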

MeSH terms

Algorithms
Attention / physiology*
Biometric Identification
Humans
Image Processing, Computer-Assisted
Memory, Long-Term / physiology
Neural Networks, Computer
Pattern Recognition, Automated / methods*
Pedestrians*
Recognition, Psychology / physiology*
Video Recording
