Dense 3D-Convolutional Neural Network for Person Re-Identification in Videos

Authors:

Yongdong Zhang

ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 15, Issue 1s

Article No.: 8, Pages 1 - 19

https://doi.org/10.1145/3231741

Published: 24 January 2019

Abstract

Person re-identification aims at identifying a given pedestrian across non-overlapping multi-camera networks at different times and places. Existing person re-identification approaches mainly focus on matching pedestrians in images, while little attention has been paid to re-identifying pedestrians in videos. Compared to images, video clips contain the motion patterns of pedestrians, which are crucial to person re-identification. Moreover, consecutive video frames present pedestrian appearance under different body poses and from different viewpoints, providing valuable information for addressing the challenges of pose variation, occlusion, viewpoint change, and so on. In this article, we propose a Dense 3D-Convolutional Network (D3DNet) to jointly learn spatio-temporal and appearance representations for person re-identification in videos. The D3DNet consists of multiple three-dimensional (3D) dense blocks and transition layers. The 3D dense blocks enlarge the receptive fields of visual neurons in both the spatial and temporal dimensions, yielding discriminative appearance representations as well as short-term and long-term motion patterns of pedestrians without requiring an additional motion estimation module. Moreover, we formulate a loss function consisting of an identification loss and a center loss that simultaneously minimizes intra-class variance and maximizes inter-class variance, addressing the challenge of large intra-class variance and small inter-class variance. Extensive experiments on two real-world video datasets for person re-identification, MARS and iLIDS-VID, demonstrate the effectiveness of the proposed approach.
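The abstract does not list D3DNet's layer configuration, so the following is only a sketch of the bookkeeping behind two of its claims, not the authors' implementation. It assumes DenseNet-style connectivity (Huang et al., 2017), in which each layer of a block receives the concatenation of all earlier feature maps, and it uses hypothetical values throughout: 64 input channels, a growth rate of 32, four 3x3x3 convolutions per block, a compression factor of 0.5, and a 2x2x2 average-pooling transition. The helper names (`Window3d`, `dense_block_channels`, `receptive_field`, and so on) are illustrative.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Window3d:
    """A 3D convolution or pooling window, given as (time, height, width)."""
    kernel: Tuple[int, int, int]
    stride: Tuple[int, int, int] = (1, 1, 1)


def dense_block_channels(in_channels: int, num_layers: int, growth_rate: int) -> List[int]:
    """Channels fed into each layer of a dense block.

    Layer l receives the block input concatenated with the outputs of the
    l layers before it, each contributing growth_rate feature maps.
    """
    return [in_channels + l * growth_rate for l in range(num_layers)]


def dense_block_output_channels(in_channels: int, num_layers: int, growth_rate: int) -> int:
    # The block output concatenates the input and all num_layers outputs.
    return in_channels + num_layers * growth_rate


def transition_channels(channels: int, compression: float) -> int:
    # Hypothetical DenseNet-BC style 1x1x1 convolution that shrinks channels.
    return int(channels * compression)


def receptive_field(windows: List[Window3d]) -> Tuple[int, int, int]:
    """Receptive field (frames, rows, cols) of the last window on the input clip."""
    rf = [1, 1, 1]
    jump = [1, 1, 1]  # input-space distance between neighbouring outputs
    for w in windows:
        for d in range(3):
            rf[d] += (w.kernel[d] - 1) * jump[d]
            jump[d] *= w.stride[d]
    return (rf[0], rf[1], rf[2])


# Hypothetical layout: two dense blocks of four 3x3x3 convolutions,
# separated by a transition with 2x2x2 average pooling (stride 2).
conv = Window3d((3, 3, 3))
pool = Window3d((2, 2, 2), (2, 2, 2))
block1 = [conv] * 4
print(dense_block_channels(64, 4, 32))                                    # [64, 96, 128, 160]
print(transition_channels(dense_block_output_channels(64, 4, 32), 0.5))   # 96
print(receptive_field(block1))                                            # (9, 9, 9)
print(receptive_field(block1 + [pool] + block1))                          # (26, 26, 26)
```

Under these assumptions one block already covers 9 frames, and a second block after the pooling transition covers 26, which illustrates how stacking 3D dense blocks can reach both short-term and longer-term motion without a separate motion estimation module. Replacing the kernel with `(1, 3, 3)`, i.e. a purely spatial convolution applied frame by frame, leaves the temporal receptive field at a single frame.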

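The abstract names the two loss terms but not their exact form or weighting. A minimal sketch, assuming softmax cross-entropy over training identities for the identification loss and the center loss of Wen et al. (2016): L = L_id + λ·L_C, with L_C = ½ Σ_i ‖x_i − c_{y_i}‖², where x_i is the feature of clip i, c_{y_i} is the learned center of its identity, and λ is a balancing weight whose value in D3DNet is not given here. Plain Python lists stand in for tensors, and the function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = List[float]


def softmax_cross_entropy(logits: Sequence[Vector], labels: Sequence[int]) -> float:
    """Identification loss: softmax cross-entropy summed over the mini-batch."""
    total = 0.0
    for z, y in zip(logits, labels):
        m = max(z)  # shift by the max logit for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(v - m) for v in z))
        total += log_sum_exp - z[y]
    return total


def center_loss(features: Sequence[Vector], labels: Sequence[int],
                centers: Sequence[Vector]) -> float:
    """L_C = 1/2 * sum_i ||x_i - c_{y_i}||^2, summed over the mini-batch."""
    return 0.5 * sum(
        sum((a - b) ** 2 for a, b in zip(x, centers[y]))
        for x, y in zip(features, labels)
    )


def joint_loss(logits, features, labels, centers, lam: float) -> float:
    """L = L_id + lam * L_C; lam is a hypothetical balancing weight."""
    return softmax_cross_entropy(logits, labels) + lam * center_loss(features, labels, centers)


def update_centers(features: Sequence[Vector], labels: Sequence[int],
                   centers: Sequence[Vector], alpha: float) -> List[Vector]:
    """Wen et al.'s rule: c_j <- c_j - alpha * sum_{i: y_i=j}(c_j - x_i) / (1 + n_j)."""
    new_centers = [list(c) for c in centers]
    for j, c in enumerate(centers):
        members = [x for x, y in zip(features, labels) if y == j]
        if not members:
            continue  # classes absent from the batch keep their center
        for d in range(len(c)):
            delta = sum(c[d] - x[d] for x in members) / (1 + len(members))
            new_centers[j][d] = c[d] - alpha * delta
    return new_centers


# Toy mini-batch: two clips from identities 0 and 1 (values are illustrative).
labels = [0, 1]
feats = [[1.0, 0.0], [0.0, 2.0]]
logits = [[2.0, 0.5], [0.1, 1.5]]
centers = [[0.0, 0.0], [0.0, 0.0]]
print(joint_loss(logits, feats, labels, centers, lam=0.1))
centers = update_centers(feats, labels, centers, alpha=0.5)
print(centers)  # [[0.25, 0.0], [0.0, 0.5]]
```

The cross-entropy term separates identities while the center term pulls each feature toward its identity center, which is how the combined loss targets small intra-class and large inter-class variance. In a training loop, `update_centers` would run once per mini-batch next to the gradient step on the network weights, with `alpha` acting as the center learning rate.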

Index Terms

Dense 3D-Convolutional Neural Network for Person Re-Identification in Videos
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
        Object identification
      2. Computer vision tasks
        Visual content-based indexing and retrieval
  2. Machine learning
    1. Machine learning approaches
      1. Neural networks

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications Volume 15, Issue 1s

Special Section on Deep Learning for Intelligent Multimedia Analytics and Special Section on Multi-Modal Understanding of Social, Affective and Subjective Attributes of Data

January 2019

265 pages

ISSN:1551-6857

EISSN:1551-6865

DOI:10.1145/3309769

Editor:
Alberto Del Bimbo
University of Firenze, Italy

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 24 January 2019

Accepted: 01 June 2018

Revised: 01 April 2018

Received: 01 October 2017

Published in TOMM Volume 15, Issue 1s

Qualifiers

Research-article
Research
Refereed

Funding Sources

National Key R&D Program of China
National Natural Science Foundation of China (NSFC)

Bibliometrics

Total Citations: 47

Total Downloads: 797

Downloads (last 12 months): 27

Downloads (last 6 weeks): 4

Reflects downloads up to 26 Sep 2024

Cited By

Yang X, Wang X, Liu L, Wang N, Gao X. (2024). STFE: A Comprehensive Video-Based Person Re-Identification Network Based on Spatio-Temporal Feature Enhancement. IEEE Transactions on Multimedia 26, 7237-7249. Online publication date: 2024.
https://doi.org/10.1109/TMM.2024.3362136

Liu X, Zhang P, Yu C, Qian X, Yang X, Lu H. (2024). A Video Is Worth Three Views: Trigeminal Transformers for Video-Based Person Re-Identification. IEEE Transactions on Intelligent Transportation Systems 25:9, 12818-12828. Online publication date: Sep-2024.
https://doi.org/10.1109/TITS.2024.3386914

Chen S, Da H, Wang D, Zhang X, Yan Y, Zhu S. (2024). HASI: Hierarchical Attention-Aware Spatio–Temporal Interaction for Video-Based Person Re-Identification. IEEE Transactions on Circuits and Systems for Video Technology 34:6, 4973-4988. Online publication date: Jun-2024.
https://doi.org/10.1109/TCSVT.2023.3340428

Liu L, Yang X, Wang N, Gao X. (2023). Frequency Information Disentanglement Network for Video-Based Person Re-Identification. IEEE Transactions on Image Processing 32, 4287-4298. Online publication date: 2023.
https://doi.org/10.1109/TIP.2023.3296901

Yadav A, Vishwakarma D. (2023). Deep learning algorithms for person re-identification: sate-of-the-art and research challenges. Multimedia Tools and Applications 83:8, 22005-22054. Online publication date: 10-Aug-2023.
https://doi.org/10.1007/s11042-023-16286-w

Wu W, Liu J. (2022). Self-supervised human semantic parsing for video-based person re-identification. JUSTC 52:9, 5. Online publication date: 2022.
https://doi.org/10.52396/JUSTC-2021-0212

Trabelsi R, Khemmar R, Decoux B, Ertaud J, Butteau R. (2022). Recent Advances in Vision-Based On-Road Behaviors Understanding: A Critical Survey. Sensors 22:7, 2654. Online publication date: 30-Mar-2022.
https://doi.org/10.3390/s22072654

Harris E, Khoo I, Demircan E. (2022). A Survey of Human Gait-Based Artificial Intelligence Applications. Frontiers in Robotics and AI 8. Online publication date: 3-Jan-2022.
https://doi.org/10.3389/frobt.2021.749274

Liu T, Zhu C, Yang L. (2022). Efficient Text-based Person Search via Single-stage Identity-guided Attribute Parsing and Alignment. 2022 26th International Conference on Pattern Recognition (ICPR), 4111-4117. Online publication date: 21-Aug-2022.
https://doi.org/10.1109/ICPR56361.2022.9956569

Wu W, Liu J, Zheng K, Sun Q, Zha Z. (2022). Temporal Complementarity-Guided Reinforcement Learning for Image-to-Video Person Re-Identification. 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 7309-7318. Online publication date: Jun-2022.
https://doi.org/10.1109/CVPR52688.2022.00717
