
Dense 3D-Convolutional Neural Network for Person Re-Identification in Videos

Published: 24 January 2019

Abstract

Person re-identification aims to identify a given pedestrian across a network of non-overlapping cameras at different times and places. Existing approaches mainly focus on matching pedestrians in still images; little attention has been paid to re-identifying pedestrians in videos. Compared to images, video clips contain the motion patterns of pedestrians, which are crucial to person re-identification. Moreover, consecutive video frames show a pedestrian's appearance in different body poses and from different viewpoints, providing valuable information for addressing challenges such as pose variation, occlusion, and viewpoint change. In this article, we propose a Dense 3D-Convolutional Network (D3DNet) to jointly learn spatio-temporal and appearance representations for person re-identification in videos. The D3DNet consists of multiple three-dimensional (3D) dense blocks and transition layers. The 3D dense blocks enlarge the receptive fields of visual neurons in both the spatial and temporal dimensions, yielding discriminative appearance representations as well as short-term and long-term motion patterns of pedestrians without requiring an additional motion estimation module. Moreover, we formulate a loss function consisting of an identification loss and a center loss to minimize intra-class variance and maximize inter-class variance simultaneously, addressing the challenge of large intra-class variance and small inter-class variance. Extensive experiments on two real-world video datasets for person re-identification, MARS and iLIDS-VID, demonstrate the effectiveness of the proposed approach.
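
The joint objective described in the abstract can be illustrated with a minimal sketch. It assumes the identification loss is a softmax cross-entropy over identity logits and the center loss is half the squared Euclidean distance between a feature and its identity's center, as in the usual center-loss formulation. The abstract does not state how the two terms are weighted, so the weight `lam`, the function names, and the toy inputs below are all hypothetical and are not taken from the article.

```python
import math


def identification_loss(logits, label):
    """Softmax cross-entropy for one sample, using log-sum-exp for stability."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def center_loss(feature, center):
    """Half the squared Euclidean distance between a feature and its class center."""
    return 0.5 * sum((f - c) ** 2 for f, c in zip(feature, center))


def joint_loss(features, logits, labels, centers, lam=0.01):
    """Batch mean of identification loss plus lam times center loss.

    lam is an assumed weighting; the abstract does not give a value.
    """
    total = 0.0
    for x, z, y in zip(features, logits, labels):
        total += identification_loss(z, y) + lam * center_loss(x, centers[y])
    return total / len(labels)


if __name__ == "__main__":
    # Toy batch: two clip-level features with identity labels 0 and 1.
    features = [[1.0, 2.0], [0.0, 1.0]]
    logits = [[2.0, 0.5], [0.1, 1.5]]
    labels = [0, 1]
    centers = {0: [0.8, 1.9], 1: [0.2, 1.1]}
    print(joint_loss(features, logits, labels, centers, lam=0.1))
```

In this sketch the cross-entropy term pushes features of different identities apart, while the center term pulls each feature toward the center of its own identity, which is the intuition behind the abstract's intra-class and inter-class variance claim. How the centers are updated during training is left out here.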

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 15, Issue 1s
Special Section on Deep Learning for Intelligent Multimedia Analytics and Special Section on Multi-Modal Understanding of Social, Affective and Subjective Attributes of Data
January 2019
265 pages
ISSN:1551-6857
EISSN:1551-6865
DOI:10.1145/3309769
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 24 January 2019
Accepted: 01 June 2018
Revised: 01 April 2018
Received: 01 October 2017
Published in TOMM Volume 15, Issue 1s

Author Tags

  1. Person re-identification
  2. deep learning
  3. network structure

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • National Key R&D Program of China
  • National Natural Science Foundation of China (NSFC)

Cited By

  • (2024) STFE: A Comprehensive Video-Based Person Re-Identification Network Based on Spatio-Temporal Feature Enhancement. IEEE Transactions on Multimedia 26, 7237-7249. DOI: 10.1109/TMM.2024.3362136. Online publication date: 2024
  • (2024) A Video Is Worth Three Views: Trigeminal Transformers for Video-Based Person Re-Identification. IEEE Transactions on Intelligent Transportation Systems 25:9, 12818-12828. DOI: 10.1109/TITS.2024.3386914. Online publication date: Sep-2024
  • (2024) HASI: Hierarchical Attention-Aware Spatio–Temporal Interaction for Video-Based Person Re-Identification. IEEE Transactions on Circuits and Systems for Video Technology 34:6, 4973-4988. DOI: 10.1109/TCSVT.2023.3340428. Online publication date: Jun-2024
  • (2023) Frequency Information Disentanglement Network for Video-Based Person Re-Identification. IEEE Transactions on Image Processing 32, 4287-4298. DOI: 10.1109/TIP.2023.3296901. Online publication date: 2023
  • (2023) Deep learning algorithms for person re-identification: sate-of-the-art and research challenges. Multimedia Tools and Applications 83:8, 22005-22054. DOI: 10.1007/s11042-023-16286-w. Online publication date: 10-Aug-2023
  • (2022) Self-supervised human semantic parsing for video-based person re-identification. JUSTC 52:9, 5. DOI: 10.52396/JUSTC-2021-0212. Online publication date: 2022
  • (2022) Recent Advances in Vision-Based On-Road Behaviors Understanding: A Critical Survey. Sensors 22:7, 2654. DOI: 10.3390/s22072654. Online publication date: 30-Mar-2022
  • (2022) A Survey of Human Gait-Based Artificial Intelligence Applications. Frontiers in Robotics and AI 8. DOI: 10.3389/frobt.2021.749274. Online publication date: 3-Jan-2022
  • (2022) Efficient Text-based Person Search via Single-stage Identity-guided Attribute Parsing and Alignment. 2022 26th International Conference on Pattern Recognition (ICPR), 4111-4117. DOI: 10.1109/ICPR56361.2022.9956569. Online publication date: 21-Aug-2022
  • (2022) Temporal Complementarity-Guided Reinforcement Learning for Image-to-Video Person Re-Identification. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 7309-7318. DOI: 10.1109/CVPR52688.2022.00717. Online publication date: Jun-2022
