Abstract
Identifying people in video by the way they walk (i.e., their gait) is a relevant, noninvasive task in computer vision. Standard approaches typically derive gait signatures from sequences of binary energy maps of subjects extracted from images, but this process introduces a large amount of non-stationary noise that limits their efficacy. In contrast, in this paper we focus on the raw pixels, or simple functions derived from them, and let advanced learning techniques extract the relevant features. We present a comparative study of different convolutional neural network (CNN) architectures using three input modalities (i.e., gray pixels, optical flow channels and depth maps) on two widely adopted and challenging datasets: TUM-GAID and CASIA-B. In addition, we compare different early- and late-fusion methods for combining the information obtained from each modality. Our experimental results suggest that (1) raw pixel values are a competitive input modality compared with traditional state-of-the-art silhouette-based features (e.g., GEI), since they obtain equivalent or better results; (2) fusing raw pixel information with optical flow and depth maps yields state-of-the-art results on the gait recognition task at an image resolution several times smaller than previously reported; and (3) the selection and design of the CNN architecture are critical and can make the difference between state-of-the-art and poor results.
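The two fusion families compared in the paper can be sketched as follows. This is a minimal illustrative sketch, not the authors' implementation: the function names, array shapes and the score-averaging rule for late fusion are assumptions made here for clarity. Early fusion stacks the modality channels into a single input volume fed to one CNN; late fusion runs one CNN per modality and combines their per-class scores afterwards.

```python
import numpy as np

def early_fusion(gray, flow, depth):
    # Early fusion: stack the modality channels (here gray: 1 ch,
    # optical flow: 2 ch, depth: 1 ch) into one input volume that
    # a single CNN would consume. Channel axis is last.
    return np.concatenate([gray, flow, depth], axis=-1)

def late_fusion(scores_per_modality, weights=None):
    # Late fusion: each modality has its own CNN; combine their
    # per-class score vectors by a (weighted) average.
    scores = np.stack(scores_per_modality, axis=0)   # (n_modalities, n_classes)
    if weights is None:
        weights = np.full(len(scores_per_modality),
                          1.0 / len(scores_per_modality))
    return np.tensordot(weights, scores, axes=1)     # (n_classes,)

# Example: a 60x60 frame with per-modality channels.
gray = np.zeros((60, 60, 1))
flow = np.zeros((60, 60, 2))
depth = np.zeros((60, 60, 1))
fused_input = early_fusion(gray, flow, depth)        # shape (60, 60, 4)

# Example: averaging two per-class score vectors.
fused_scores = late_fusion([np.array([0.2, 0.8]),
                            np.array([0.6, 0.4])])   # [0.4, 0.6]
```

The design trade-off studied in the paper follows from this structure: early fusion lets a single network learn cross-modal correlations from the first layer, while late fusion keeps per-modality networks independent and only merges their decisions.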
Acknowledgements
This work has been funded by project TIC-1692 (Junta de Andalucía). We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan X Pascal GPU used for this research. Portions of the research in this paper use the CASIA Gait Database collected by Institute of Automation, Chinese Academy of Sciences.
Ethics declarations
Conflict of interest
This work has been funded by a research project of Junta de Andalucía, Spain. Moreover, Francisco M. Castro and Nicolás Guil are working for the University of Málaga, Manuel J. Marín-Jiménez is working for the University of Córdoba, and Nicolás Pérez de la Blanca is working for the University of Granada.
Cite this article
Castro, F.M., Marín-Jiménez, M.J., Guil, N. et al. Multimodal feature fusion for CNN-based gait recognition: an empirical comparison. Neural Comput & Applic 32, 14173–14193 (2020). https://doi.org/10.1007/s00521-020-04811-z