3D-2D deep convolutional neural network (DCNN) Cascade for robust video face identification

Kim, Kyeong Tae; Lee, Bumshik; Choi, Jae Young

doi:10.1007/s11042-020-09495-0

Published: 25 September 2020

Volume 80, pages 4023–4036, (2021)

Multimedia Tools and Applications

Abstract

This paper proposes a novel video face identification method, the “3D-2D-DCNN cascade”, which serially combines 3D and 2D deep convolutional neural networks (DCNNs) for robust video face recognition (FR). In our method, an input video (face) sequence is first divided into a number of sub-video sequences, and each sub-video sequence is fed to the 3D-DCNN to obtain a set of class-confidence scores for the input video sequence. These class-confidence scores are aggregated in a novel way to form a class-confidence matrix. A key characteristic of our method is that this class-confidence matrix is used to fine-tune the 2D-DCNN, which is serially linked to the 3D-DCNN, to obtain the final face identification results. To verify the proposed method, two popular video identification benchmarks, the COX Face and YTC databases, were used. Compared with the best reported recognition results on these two benchmarks, the proposed method achieves better or comparable recognition performance.
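
The first stage described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the clip length, the stride, or the aggregation rule, so this sketch assumes fixed-length overlapping clips and assumes that the per-clip score vectors are simply stacked as rows. The names `split_into_subsequences`, `class_confidence_matrix`, and `toy_score_clip` are hypothetical, and `toy_score_clip` is only a placeholder for a trained 3D-DCNN.

```python
import math
from typing import Callable, List

Frame = List[float]  # hypothetical stand-in for one face image (flattened pixels)


def split_into_subsequences(frames: List[Frame], clip_len: int, stride: int) -> List[List[Frame]]:
    """Divide a frame sequence into fixed-length sub-video sequences (assumed scheme)."""
    last_start = len(frames) - clip_len
    return [frames[s:s + clip_len] for s in range(0, last_start + 1, stride)]


def softmax(logits: List[float]) -> List[float]:
    """Turn raw scores into class-confidence scores that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def class_confidence_matrix(
    frames: List[Frame],
    score_clip: Callable[[List[Frame]], List[float]],
    clip_len: int = 4,
    stride: int = 2,
) -> List[List[float]]:
    """Score every sub-video sequence and stack the score vectors as rows.

    Row-wise stacking is an assumption; the paper's aggregation may differ.
    """
    clips = split_into_subsequences(frames, clip_len, stride)
    return [softmax(score_clip(clip)) for clip in clips]


def toy_score_clip(clip: List[Frame]) -> List[float]:
    """Placeholder for the 3D-DCNN: three class logits from mean pixel intensity."""
    pixel_count = sum(len(frame) for frame in clip)
    mean = sum(sum(frame) for frame in clip) / pixel_count
    return [mean, 1.0 - mean, 0.5]


# Ten toy frames of four pixels each; clip_len=4 and stride=2 give four clips.
frames = [[i / 10.0] * 4 for i in range(10)]
matrix = class_confidence_matrix(frames, toy_score_clip)
```

Under these assumptions, `matrix` has one row per sub-video sequence and one column per class. In the cascade described above, a matrix of this kind would then serve as the input used to fine-tune the 2D-DCNN, a step this sketch does not model.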

Funding

This research was supported by the Hankuk University of Foreign Studies Research Fund and by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education under Grant 2018R1D1A1A09082615.

Author information

Authors and Affiliations

Pattern Recognition and Machine Intelligence Laboratory, Division of Computer & Electronic Systems Engineering, Hankuk University of Foreign Studies, 81, Oedae-ro, Mohyeon-myeon, Cheoin-gu, Yongin-si, Gyeonggi-do, 17305, Republic of Korea
Kyeong Tae Kim & Jae Young Choi
Department of Information and Communications Engineering, Chosun University, 61452, Gwangju, Republic of Korea
Bumshik Lee

Corresponding author

Correspondence to Jae Young Choi.

About this article

Cite this article

Kim, K.T., Lee, B. & Choi, J.Y. 3D-2D deep convolutional neural network (DCNN) Cascade for robust video face identification. Multimed Tools Appl 80, 4023–4036 (2021). https://doi.org/10.1007/s11042-020-09495-0

Received: 16 February 2020
Revised: 19 July 2020
Accepted: 29 July 2020
Published: 25 September 2020
Issue Date: January 2021
DOI: https://doi.org/10.1007/s11042-020-09495-0
