A Bayesian Scene-Prior-Based Deep Network Model for Face Verification
Abstract
1. Introduction
- We propose a scene model based on the Bayesian deep network technique, which can infer several complicated scenes for the face-verification task.
- A new unsupervised face-verification model is developed on the basis of the scene transfer learning technique.
- Experiments on two challenging datasets, LFW and YTF, validate the proposed model when training samples are scarce.
2. Previous Work
3. The Proposed Methodology
3.1. The Bayesian Scene-Prior-Based Deep Network Model
3.2. Scene Inference
- (1) Perform convolution and pooling on .
- (2) Determine a mixed feature space or a mixed image space, as shown in Equation (11). By using the -norm, the mixed image space can be derived by the following equation:
- (3) Decide . The term is in general obtained by integrating over the hidden variables and s.
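Step (3) is a marginalization over hidden variables. As a minimal sketch, assuming the hidden variable is a discrete scene index with a finite set of candidate scenes, the category posterior can be approximated by summing scene-conditional predictions weighted by the scene posteriors. The function name `marginalize_over_scenes` and the toy probabilities are hypothetical, and this is not the authors' implementation.

```python
def marginalize_over_scenes(scene_post, cat_given_scene):
    """Return p(C | x) = sum_k p(s_k | x) * p(C | x, s_k).

    scene_post: list of K scene posteriors p(s_k | x), summing to 1.
    cat_given_scene: K lists, each holding p(C | x, s_k) over the categories.
    """
    n_cat = len(cat_given_scene[0])
    marginal = [0.0] * n_cat
    for weight, probs in zip(scene_post, cat_given_scene):
        for c, p in enumerate(probs):
            marginal[c] += weight * p
    return marginal


# Toy example (assumed values): two hypothetical scenes, three categories.
scene_post = [0.75, 0.25]
cat_given_scene = [[0.8, 0.1, 0.1], [0.2, 0.6, 0.2]]
posterior = marginalize_over_scenes(scene_post, cat_given_scene)
print(posterior)
```

Because each scene-conditional distribution sums to one and the scene posteriors sum to one, the marginal is again a valid distribution over categories.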
3.3. Hyperparameter Optimization
3.4. Overlapping Distributions’ Transform
3.5. Scene Backward Propagation
4. Experiments and Results
4.1. Datasets and Evaluation
4.2. Training Process for the New Model
4.3. The Semantic Features’ Extraction
4.4. Distribution of Semantic Features
4.5. Comparison with Other Models
5. Conclusions and Discussions
Author Contributions
Funding
Acknowledgments
Conflicts of Interest
Appendix A. Scene Dictionary Learning
Algorithm 1 Scene dictionary learning
Input: X, datasets of detected and aligned faces
Output: S, scene dictionary for all individuals
Appendix B. Scene Inference and Model Training
Algorithm 2 Scene inference and model training
Input: X, datasets of detected and aligned faces; S, scene dictionary for all individuals
Output: Enlarged datasets and a newly trained CNN model
References
1. Sun, Y.; Wang, X.; Tang, X. Deep learning face representation from predicting 10,000 classes. In Proceedings of the 27th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2014), Columbus, OH, USA, 23–28 June 2014; pp. 1891–1898.
2. Sun, Y.; Wang, X.; Tang, X. Sparsifying neural network connections for face recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas Valley, NV, USA, 26 June–1 July 2016.
3. Taigman, Y.; Yang, M.; Ranzato, M.; Wolf, L. Web scale training for face identification. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015.
4. Zhu, Z.; Luo, P.; Wang, X.; Tang, X. Deep Learning Identity Preserving Face Space. In Proceedings of the 2013 IEEE International Conference on Computer Vision, Sydney, NSW, Australia, 1–8 December 2013; pp. 113–120.
5. Tran, L.; Yin, X.; Liu, X. Disentangled Representation Learning GAN for Pose Invariant Face Recognition. In Proceedings of the 2017 IEEE Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017.
6. Chen, W.; Liu, C.H. Transfer between pose and expression training in face recognition. Vis. Res. 2009, 49, 368–373.
7. Chen, B.C.; Chen, C.S.; Hsu, W.H. Cross age reference coding for age invariant face recognition and retrieval. In Computer Vision ECCV 2014; Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T., Eds.; Springer: Cham, Switzerland, 2014; pp. 768–783.
8. Cheng, Y.; Jiao, L.; Cao, X.; Li, Z. Illumination insensitive features for face recognition. Vis. Comput. 2017, 33, 1483–1493.
9. Ruiz del Solar, J.; Verschae, R.; Correa, M. Recognition of Faces in Unconstrained Environments: A Comparative Study. EURASIP J. Adv. Signal Process. 2009, 2009, 184617.
10. Huang, G.B.; Learned-Miller, E. Labeled Faces in the Wild: Updates and New Reporting Procedures; (UM-CS-2014-003), Technical Report; University of Massachusetts Amherst: Amherst, MA, USA, 2014.
11. Deng, W.; Zheng, L.; Ye, Q.; Murphy, K.; Kang, G.; Yang, Y.; Jiao, J. Image-Image Domain Adaptation with Preserved Self-Similarity and Domain-Dissimilarity for Person Re-identification. arXiv, 2017; arXiv:1711.07027.
12. Fei, L.; Perona, P. A Bayesian hierarchical model for learning natural scene categories. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), San Diego, CA, USA, 20–25 June 2005; Volume 2, pp. 524–531.
13. Chen, L.C.; Barron, J.T.; Papandreou, G.; Murphy, K.; Yuille, A.L. Semantic image segmentation with task-specific edge detection using cnns and a discriminatively trained domain transform. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas Valley, NV, USA, 26 June–1 July 2016; pp. 4545–4554.
14. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas Valley, NV, USA, 26 June–1 July 2016; pp. 770–778.
15. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.E.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going Deeper with Convolutions. arXiv, 2014; arXiv:1409.4842.
16. Liu, J.; Deng, Y.; Bai, T.; Huang, C. Targeting ultimate accuracy: Face recognition via deep embedding. arXiv, 2015; arXiv:1506.07310.
17. Schroff, F.; Kalenichenko, D.; Philbin, J. FaceNet: A Unified Embedding for Face Recognition and Clustering. arXiv, 2015; arXiv:1503.03832.
18. Taigman, Y.; Yang, M.; Ranzato, M.; Wolf, L. Deepface: Closing the gap to human level performance in face verification. In Proceedings of the 27th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2014), Columbus, OH, USA, 23–28 June 2014; pp. 1701–1708.
19. Sun, Y.; Liang, D.; Wang, X.; Tang, X. DeepID3: Face Recognition with Very Deep Neural Networks. arXiv, 2015; arXiv:1502.00873.
20. Raudys, S.; Pikelis, V. On dimensionality, sample size, classification error, and complexity of classification algorithm in pattern recognition. IEEE Trans. Pattern Anal. Mach. Intell. 1980, 3, 242–252.
21. Salakhutdinov, R.; Tenenbaum, J.B.; Torralba, A. Learning with Hierarchical Deep Models. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 1958–1971.
22. Zheng, S.; Jayasumana, S.; Romera Paredes, B.; Vineet, V.; Su, Z.; Du, D.; Huang, C.; Torr, P.H.S. Conditional random fields as recurrent neural networks. In Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015; pp. 1529–1537.
23. Zhang, B.; Perina, A.; Li, Z.; Murino, V.; Liu, J.; Ji, R. Bounding multiple gaussians uncertainty with application to object tracking. Int. J. Comput. Vis. 2016, 118, 364–379.
24. Wolf, L.; Hassner, T.; Maoz, I. Face recognition in unconstrained videos with matched background similarity. In Proceedings of the 24th IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2011), Colorado Springs, CO, USA, 20–25 June 2011; pp. 529–534.
25. Yi, D.; Lei, Z.; Liao, S.; Li, S.Z. Learning Face Representation from Scratch. arXiv, 2014; arXiv:1411.7923.
26. Srivastava, R.K.; Greff, K.; Schmidhuber, J. Training very deep networks. arXiv, 2015; arXiv:1507.06228.
27. Parkhi, O.M.; Vedaldi, A.; Zisserman, A. Deep face recognition. In Proceedings of the 2015 British Machine Vision Conference, Swansea, UK, 7–10 September 2015.
28. Van Der Maaten, L. Accelerating tSNE Using Tree based Algorithms. J. Mach. Learn. Res. 2014, 15, 3221–3245.
29. Arashloo, S.R.; Kittler, J. Class Specific Kernel Fusion of Multiple Descriptors for Face Verification Using Multiscale Binarised Statistical Image Features. IEEE Trans. Inf. Forensics Secur. 2014, 9, 2100–2109.
30. Xu, J.F.; Luu, K.; Savvides, M. Spartans: Single Sample Periocular Based Alignment Robust Recognition Technique Applied to Non Frontal Scenarios. IEEE Trans. Image Process. 2015, 24, 4780–4795.
31. Amos, B.; Ludwiczuk, B.; Satyanarayanan, M. OpenFace: A General Purpose Face Recognition Library with Mobile Applications; Technical report, CMU CS 16 118; CMU School of Computer Science: Pittsburgh, PA, USA, 2016.
32. Tran, A.; Hassner, T.; Masi, I.; Medioni, G. Regressing Robust and Discriminative 3D Morphable Models with a very Deep Neural Network. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017.
33. Masi, I.; Tran, A.T.; Leksut, J.T.; Hassner, T.; Medioni, G.G. Do We Really Need to Collect Millions of Faces for Effective Face Recognition? arXiv, 2016; arXiv:1603.07057.
34. Wen, Y.; Zhang, K.; Li, Z.; Qiao, Y. A discriminative feature learning approach for deep face recognition. In Proceedings of the European Conference on Computer Vision 2016, Amsterdam, The Netherlands, 11–14 October 2016; pp. 499–515.
35. Liu, W.; Wen, Y.; Yu, Z.; Li, M.; Raj, B.; Song, L. Sphereface: Deep hypersphere embedding for face recognition. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017.
36. Qi, X.; Zhang, L. Face Recognition via Centralized Coordinate Learning. arXiv, 2018; arXiv:1801.05678.
37. Hu, G.; Yang, H.; Yuan, Y.; Zhang, Z.; Lu, Z.; Mukherjee, S.S.; Hospedales, T.; Robertson, N.M.; Yang, Y. Attribute enhanced face recognition with neural tensor fusion networks. In Proceedings of the International Conference on Computer Vision (ICCV 2017), Venice, Italy, 22–29 October 2017.
38. Xi, M.; Chen, L.; Polajnar, D.; Tong, W. Local binary pattern network: A deep learning approach for face recognition. In Proceedings of the 2016 IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA, 25–28 September 2016; pp. 3224–3228.
39. Wang, F.; Liu, W.; Liu, H.; Cheng, J. Additive Margin Softmax for Face Verification. arXiv, 2018; arXiv:1801.05599.
| Symbol | Notation |
|---|---|
| s = (, ) | Scene; number is unknown in advance |
| | Two feature spaces, the pure and the mixed space |
| C | Category |
| p(,) | PDF (probability density function) |
| | The overlapping image space |
| | The statistics (; ) |
| | The pixel value at position |
| | The PDF of the kth scene |
| i, j, k | The image, category, and scene orders |
| | Features extracted from the image |
Name | Type | Filter Size/Stride | Output Size | #Params |
---|---|---|---|---|
Conv11 | Conv | (3, 3, 1) | (100, 100, 32) | 280 |
Conv12 | Conv | (3, 3, 1) | (100, 100, 64) | 18,000 |
Pool1 | Maxpooling | (2, 2, 2) | (50, 50, 64) | |
Conv21 | Conv | (3, 3, 1) | (50, 50, 64) | 36,000 |
Conv22 | Conv | (3, 3, 1) | (50, 50, 128) | 72,000 |
Pool2 | Maxpooling | (2, 2, 2) | (25, 25, 128) | |
Conv31 | Conv | (3, 3, 1) | (25, 25, 96) | 108,000 |
Conv32 | Conv | (3, 3, 1) | (25, 25, 192) | 162,000 |
Pool3 | Maxpooling | (2, 2, 2) | (13, 13, 192) | |
Conv41 | Conv | (3, 3, 1) | (13, 13, 128) | 216,000 |
Conv42 | Conv | (3, 3, 1) | (13, 13, 256) | 288,000 |
Pool4 | Maxpooling | (2, 2, 2) | (7, 7, 256) | |
Conv51 | Conv | (3, 3, 1) | (7, 7, 160) | 360,000 |
Conv52 | Conv | (3, 3, 1) | (7, 7, 320) | 450,000 |
Pool5 | AVGpooling | (7, 7, 1) | (1, 1, 320) | |
Dropout | Dropout | | (1, 1, 320) | 3,305,000 |
Fc6 | Fullyconnect | | 10,575 | |
Cost1 | Softmax | | 10,575 | |
KL | Generate | | 10,575 | 2,000 |
Total | | | | 5,017,000 |
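The parameter and output-size columns of the architecture table can be cross-checked with a little arithmetic. The sketch below assumes a single-channel (grayscale) input, bias-free 3×3 convolutions, and 2×2 stride-2 pooling with ceiling rounding; none of these are stated in the table, but they are consistent with its numbers. The helper names `conv_weights` and `pooled_sizes` are hypothetical.

```python
import math

# (name, input channels, output channels) for the ten 3x3 convolutions,
# read off the output column of the table. The single input channel is an
# assumption (grayscale input), inferred from Conv11's parameter count.
CONV_LAYERS = [
    ("Conv11", 1, 32), ("Conv12", 32, 64),
    ("Conv21", 64, 64), ("Conv22", 64, 128),
    ("Conv31", 128, 96), ("Conv32", 96, 192),
    ("Conv41", 192, 128), ("Conv42", 128, 256),
    ("Conv51", 256, 160), ("Conv52", 160, 320),
]
FEATURE_DIM = 320    # channels after Pool5
NUM_CLASSES = 10575  # output size of Fc6


def conv_weights(c_in, c_out, kernel=3):
    """Weights of a kernel x kernel convolution; biases ignored (assumption)."""
    return kernel * kernel * c_in * c_out


def pooled_sizes(size, n_pools):
    """Spatial size after each 2x2, stride-2 pooling with ceiling rounding."""
    sizes = [size]
    for _ in range(n_pools):
        sizes.append(math.ceil(sizes[-1] / 2))
    return sizes


conv_counts = {name: conv_weights(c_in, c_out) for name, c_in, c_out in CONV_LAYERS}
fc_count = FEATURE_DIM * NUM_CLASSES
spatial = pooled_sizes(100, 4)

for name, count in conv_counts.items():
    print(f"{name}: {count} weights = {count / 1024:g} x 1024")
print(f"Fc6: {fc_count} weights, about {round(fc_count / 1024)} x 1024")
print("spatial sizes:", spatial)
```

Under these assumptions the counts reproduce the table when one unit is read as 1024 weights, and the 3,305,000 entry corresponds to the 320 × 10,575 weights of the fully connected layer.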
Method | LFW | YTF | Protocol | Images | Networks |
---|---|---|---|---|---|
CNN-3DMM estimation [32] | 92.35% | 88.80% | Unrestricted | 0.5 M | 1 |
Casia [25] | 97.73% | 92.24% | Unrestricted | 1.0 M | 1 |
Pose/shape/expression augmentation [33] | 98.07% | N/A | Unrestricted | 2.5 M | 1 |
VGGFace [27] | 98.95% | 97.30% | Unrestricted | 2.6 M | 1 |
Discriminative [34] | 99.28% | 94.90% | Unrestricted | 0.7 M | 1 |
SphereFace [35] | 99.42% | 95.00% | Unrestricted | 0.5 M | 1 |
DeepID [1:3] [19] | 99.53% | 93.20% | Unrestricted | 0.3 M | 200 |
CCL with AAM [36] | 99.58% | 95.28% | Unrestricted | 0.5 M | 1 |
Facenet [17] | 99.63% | 99.63% | Unrestricted | 200 M | 1 |
GTNN [37] | 99.65% | N/A | Unrestricted | 6.2 M | 2 |
Baidu [16] | 99.77% | N/A | Unrestricted | 1.3 M | 10 |
LBPNet [38] | 94.04% | N/A | Unsupervised | 0.5 M | 1 |
Deepface [18] | 95.20% | 91.40% | Unsupervised | 4 M | 1 |
Casia [25] | 97.30% | 90.60% | Unsupervised | 0.5 M | 1 |
MRF-FUSION-CGKDA [29] | 98.94% | 93.20% | Unsupervised | 0.5 M | 5 |
AM-Softmax w/o FN [39] | 99.12% | N/A | Unsupervised | 0.5 M | 1 |
Ours | 99.2% | 94.30% | Unsupervised | 0.5 M | 1 |
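The LFW and YTF columns report verification accuracy on face pairs. A minimal sketch of how such a figure could be computed is given below, assuming cosine similarity between feature vectors and a fixed decision threshold. The source does not state the similarity measure or the threshold, so both are assumptions, and `cosine_similarity`, `verification_accuracy`, and the toy features are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def verification_accuracy(pairs, labels, threshold):
    """Fraction of pairs whose same/different decision matches the label.

    A pair is declared "same identity" when its cosine similarity reaches
    the threshold (an assumed decision rule, not taken from the paper).
    """
    correct = 0
    for (f1, f2), same in zip(pairs, labels):
        predicted_same = cosine_similarity(f1, f2) >= threshold
        correct += int(predicted_same == same)
    return correct / len(pairs)


# Toy 2-D features (assumed values): two matching and two non-matching pairs.
pairs = [
    ([1.0, 0.0], [0.9, 0.1]),
    ([1.0, 0.0], [0.0, 1.0]),
    ([0.5, 0.5], [0.6, 0.4]),
    ([1.0, 0.2], [-1.0, 0.1]),
]
labels = [True, False, True, False]
accuracy = verification_accuracy(pairs, labels, threshold=0.5)
print(f"accuracy = {accuracy:.2%}")
```

In practice the threshold would be selected on held-out folds rather than fixed by hand.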
© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Wang, H.; Song, W.; Liu, W.; Song, N.; Wang, Y.; Pan, H. A Bayesian Scene-Prior-Based Deep Network Model for Face Verification. Sensors 2018, 18, 1906. https://doi.org/10.3390/s18061906