AD-VAE: Adversarial Disentangling Variational Autoencoder
Abstract
1. Introduction
- We propose a novel framework, AD-VAE, that combines the ability of a VAE to disentangle identity representations with the capacity of a GAN to generate identity-preserving prototypes.
- The AD-VAE achieves state-of-the-art results on four controlled datasets (AR, E-YaleB, CAS-PEAL, and FERET) and the uncontrolled dataset LFW, demonstrating its robustness in handling variations in pose, illumination, and occlusion.
- Unlike other methods, AD-VAE accomplishes this without requiring external pre-trained encoders, making it a self-contained solution for SSPP FR.
2. Materials and Methods
2.1. Related Works
2.2. Background
2.2.1. Generative Adversarial Networks
2.2.2. Variational Autoencoders
2.2.3. Variation Disentangling Generative Adversarial Networks
2.3. The Proposed Method
- Identity term (generator side): enable the identity classifier to classify the generated prototype image as having the same identity label as the input image.
- Variation term (generator side): enable the variation detector to detect that there are no variations in the generated prototype.
- Adversarial term (generator side): fool the discriminator D into classifying the generated prototype as a real prototype.
- Reconstruction term: enable the generator to generate an image as close as possible to the real prototype image.
- Prior-matching term: enable the generator to generate an image such that the distribution of the latent code stays as close as possible to the prior distribution of the noise vector z.
- Identity term (discriminator side): predict the correct identity of the input image as given by its identity label.
- Variation term (discriminator side): predict the correct occurrence of variation in the input image as given by its variation label.
- Adversarial term (discriminator side): predict the real prototype image as real and the generated prototype image as fake (a minimal sketch of how these objectives can be combined follows this list).
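A minimal PyTorch-style sketch of how these objectives could be combined is given below. The module names (netG implied, netD, netC, netV), the choice of class index 0 for "no variation", and the unweighted sum of terms are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch of the generator-side and discriminator-side objectives described above.
import torch
import torch.nn.functional as F

def generator_objective(netD, netC, netV, proto_fake, proto_real, id_labels, mu, logvar):
    """Terms that update the encoder/generator."""
    logits_fake = netD(proto_fake)
    adv = F.binary_cross_entropy_with_logits(logits_fake, torch.ones_like(logits_fake))  # fool D
    idl = F.cross_entropy(netC(proto_fake), id_labels)                    # keep the input identity
    var = F.cross_entropy(netV(proto_fake), torch.zeros_like(id_labels))  # assumed "no variation" class 0
    rec = F.l1_loss(proto_fake, proto_real)                               # pixel-wise closeness to prototype
    kld = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())        # keep latent code near the prior
    return adv + idl + var + rec + kld

def discriminator_objective(netD, netC, netV, proto_real, proto_fake, x, id_labels, var_labels):
    """Terms that update the discriminator and the auxiliary classifiers."""
    real_logits, fake_logits = netD(proto_real), netD(proto_fake.detach())
    adv = (F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits))
           + F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits)))
    idl = F.cross_entropy(netC(x), id_labels)   # correct identity of the input image
    var = F.cross_entropy(netV(x), var_labels)  # correct variation label of the input image
    return adv + idl + var
```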
3. Results
3.1. Data Collection, Pre-Processing, and Feature Selection
- AR [28] consists of 126 identities, with 26 images per subject covering expression, illumination, and occlusion variations. From this dataset, we use a subset of 100 identities, randomly choosing 50 identities for the training set and 50 for the test set (the identity-level split procedure is sketched after this list).
- Extended Yale B (E-YaleB) [29] consists of 38 identities captured under a wide range of lighting conditions, including variations in light intensity (from low to high), different types of lighting (natural, artificial, and directional), and various light angles (e.g., frontal, lateral, and top–down). Because of the low number of subjects, following [7], we add the AR lighting subset to E-YaleB to extend the number of identities. We randomly choose 100 identities from the mixed dataset for the training set and the remaining 38 identities for the test set.
- FERET [30] consists of 1199 identities with variations in gender, age, and ethnicity. From this dataset, we use a subset of 200 identities containing only four pose variations. We randomly choose 150 identities for the training set and the remaining 50 for the test set.
- CAS-PEAL [31] consists of 1040 identities with variations in pose, occlusion, and age. From this dataset, we use a subset of 300 identities from the normal and accessory categories, each with one neutral image and six images wearing different glasses and hats. We randomly choose 200 identities for the training set and the remaining 100 for the test set.
- LFW [32] consists of 5749 identities collected in an uncontrolled environment, with a wide range of expressions, poses, illuminations, and other variations. We use a subset of 158 identities with more than ten images per subject from the aligned version of LFW (LFW-a). For evaluation, we choose 50 identities containing neutral face images for the test set and the other 108 for the training set.
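For the controlled datasets, the random identity-level splits above can be implemented as in the following sketch; the function name and the label-list interface are assumptions, and whole identities (not individual images) are assigned to one side of the split.

```python
# Sketch of an identity-level train/test split (e.g., 50/50 identities on AR).
import random

def split_identities(labels, n_train, seed=0):
    """labels: one identity label per image; returns image indices per split."""
    ids = sorted(set(labels))
    random.Random(seed).shuffle(ids)
    train_ids = set(ids[:n_train])
    train_idx = [i for i, y in enumerate(labels) if y in train_ids]
    test_idx = [i for i, y in enumerate(labels) if y not in train_ids]
    return train_idx, test_idx
```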
- Image resizing: All images are resized to 64 × 64 pixels to match the input dimensions of the network (a minimal preprocessing sketch is given below).
- Normalization: Pixel values are normalized to the range [0, 1] to improve convergence during training.
- Alignment: For the LFW dataset, we use the aligned version (LFW-a) to reduce variations caused by misalignment.
- Handling missing values: All datasets used in this study contain complete data, with no missing values, eliminating the need for further data imputation.
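A minimal torchvision pipeline matching these steps might look as follows; the exact transform classes are an assumption, since the text only specifies the target size and value range.

```python
# Resize to the 64 x 64 network input and scale pixel values to [0, 1].
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.Resize((64, 64)),  # match the network input dimensions
    transforms.ToTensor(),        # PIL image -> float tensor with values in [0, 1]
])
```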
- Latent code generation: The encoder generates the mean (μ) and variance (σ²) of the latent-space distribution. The latent code (c) is then sampled from this distribution using the reparameterization trick, which keeps c differentiable with respect to the network’s parameters and thus allows gradient-based optimization during training. A noise vector (z) is sampled independently from a Gaussian distribution for variation modeling (see the sketch after this list).
- Feature concatenation: The generator combines c and z into a single input vector to create identity-preserving prototypes with controlled variations.
- Representation dimensionality: For all datasets, the latent dimension is set to 100, ensuring a consistent feature representation across datasets.
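The sampling and concatenation steps above can be sketched as follows; the module interfaces are assumptions, while the latent dimension of 100 follows the text.

```python
# Reparameterization trick and generator input construction (PyTorch sketch).
import torch

LATENT_DIM = 100  # latent dimension used for all datasets

def reparameterize(mu, logvar):
    """c = mu + sigma * eps: sampling stays differentiable w.r.t. mu and logvar."""
    std = torch.exp(0.5 * logvar)
    return mu + torch.randn_like(std) * std

def build_generator_input(mu, logvar):
    """Concatenate the identity code c with an independent Gaussian noise vector z."""
    c = reparameterize(mu, logvar)
    z = torch.randn(c.size(0), LATENT_DIM, device=c.device)  # variation noise
    return torch.cat([c, z], dim=1)  # single input vector for the generator
```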
3.2. Evaluation in Single Sample Face Recognition
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
SSPP | Single Sample Per Person |
FR | Face Recognition |
PROG-AD-VAE | Progressive Adversarial Disentangling Variational Autoencoder |
VAE | Variational Autoencoder |
SRC | Sparse Representation Classifier |
CRC | Collaborative Representation Classifier |
PCA | Principal Component Analysis |
CJR-RACF | Class-level joint representation with regional adaptive convolution features |
JCR-ACF | Joint and Collaborative Representation with local Adaptive Convolution Feature |
DMMA | Discriminative Multi-manifold Analysis |
PCRC | Patch Based CRC |
SVDL | Sparse Variation Dictionary Learning |
SLRC | Superposed Linear Representation Classifier |
S3RC | Semi-supervised sparse representation classifier |
VD-GAN | Variation Disentangling Generative Adversarial Network |
References
- Lahasan, B.; Lutfi, S.L.; San-Segundo, R. A survey on techniques to handle face recognition challenges: Occlusion, single sample per subject and expression. Artif. Intell. Rev. 2017, 52, 949–979.
- Liu, F.; Chen, D.; Wang, F.; Li, Z.; Xu, F. Deep learning based single sample face recognition: A survey. Artif. Intell. Rev. 2023, 56, 2723–2748.
- Minaee, S.; Abdolrashidi, A.; Su, H.; Bennamoun, M.; Zhang, D. Biometrics recognition using deep learning: A survey. Artif. Intell. Rev. 2023, 56, 8647–8695.
- Zhao, W.; Chellappa, R.; Phillips, P.J.; Rosenfeld, A. Face recognition: A literature survey. ACM Comput. Surv. 2003, 35, 399–458.
- Deng, W.; Hu, J.; Guo, J. In Defense of Sparsity Based Face Recognition. In Proceedings of the 2013 IEEE Conference on Computer Vision and Pattern Recognition, Portland, OR, USA, 23–28 June 2013; pp. 399–406.
- Lu, J.; Tan, Y.P.; Wang, G. Discriminative Multimanifold Analysis for Face Recognition from a Single Training Sample per Person. IEEE Trans. Pattern Anal. Mach. Intell. 2013, 35, 39–51.
- Pang, M.; Wang, B.; Cheung, Y.m.; Chen, Y.; Wen, B. VD-GAN: A Unified Framework for Joint Prototype and Representation Learning From Contaminated Single Sample per Person. IEEE Trans. Inf. Forensics Secur. 2021, 16, 2246–2259.
- Li, S.; Li, H. Deep Generative Modeling Based on VAE-GAN for 3D Indoor Scene Synthesis. Int. J. Comput. Games Technol. 2023, 2023, 3368647.
- Cheng, M.; Fang, F.; Pain, C.; Navon, I. An advanced hybrid deep adversarial autoencoder for parameterized nonlinear fluid flow modelling. Comput. Methods Appl. Mech. Eng. 2020, 372, 113375.
- Mak, H.W.L.; Han, R.; Yin, H.H.F. Application of Variational AutoEncoder (VAE) Model and Image Processing Approaches in Game Design. Sensors 2023, 23, 3457.
- Gao, Y.; Ma, J.; Yuille, A.L. Semi-Supervised Sparse Representation Based Classification for Face Recognition with Insufficient Labeled Samples. IEEE Trans. Image Process. 2017, 26, 2545–2560.
- Deng, W.; Hu, J.; Wu, Z.; Guo, J. From one to many: Pose-Aware Metric Learning for single-sample face recognition. Pattern Recognit. 2018, 77, 426–437.
- Gu, J.; Hu, H.; Li, H. Patch-based alignment-free generic sparse representation for pose-robust face recognition. In Proceedings of the 2016 IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA, 25–28 September 2016; pp. 3006–3010.
- Abdelmaksoud, M.; Nabil, E.; Farag, I.; Hameed, H.A. A Novel Neural Network Method for Face Recognition with a Single Sample Per Person. IEEE Access 2020, 8, 102212–102221.
- Ding, Y.; Liu, F.; Tang, Z.; Zhang, T. Uniform Generic Representation for Single Sample Face Recognition. IEEE Access 2020, 8, 158281–158292.
- Hu, X.; Peng, S.; Wang, L.; Yang, Z.; Li, Z. Surveillance video face recognition with single sample per person based on 3D modeling and blurring. Neurocomputing 2017, 235, 46–58.
- Yang, M.; Wen, W.; Wang, X.; Shen, L.; Gao, G. Adaptive Convolution Local and Global Learning for Class-Level Joint Representation of Facial Recognition with a Single Sample Per Data Subject. IEEE Trans. Inf. Forensics Secur. 2020, 15, 2469–2484.
- Adjabi, I. Combining hand-crafted and deep-learning features for single sample face recognition. In Proceedings of the 2022 7th International Conference on Image and Signal Processing and their Applications (ISPA), Mostaganem, Algeria, 8–9 May 2022; pp. 1–6.
- Tran, L.; Yin, X.; Liu, X. Disentangled Representation Learning GAN for Pose-Invariant Face Recognition. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 22–25 July 2017; pp. 1283–1292.
- Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Networks. arXiv 2014, arXiv:1406.2661.
- Plumerault, A.; Borgne, H.L.; Hudelot, C. AVAE: Adversarial Variational Auto Encoder. arXiv 2020, arXiv:2012.11551.
- Lee, W.; Kim, D.; Hong, S.; Lee, H. High-Fidelity Synthesis with Disentangled Representation. arXiv 2020, arXiv:2001.04296.
- Chen, X.; Duan, Y.; Houthooft, R.; Schulman, J.; Sutskever, I.; Abbeel, P. InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets. arXiv 2016, arXiv:1606.03657.
- Pang, M.; Wang, B.; Ye, M.; Cheung, Y.m.; Chen, Y.; Wen, B. DisP+V: A Unified Framework for Disentangling Prototype and Variation From Single Sample per Person. IEEE Trans. Neural Netw. Learn. Syst. 2023, 34, 867–881.
- Tran, L.; Yin, X.; Liu, X. Representation Learning by Rotating Your Faces. IEEE Trans. Pattern Anal. Mach. Intell. 2019, 41, 3007–3021.
- Kingma, D.P.; Welling, M. Auto-Encoding Variational Bayes. arXiv 2013, arXiv:1312.6114.
- Gimenez, J.R.; Zou, J. A Unified f-divergence Framework Generalizing VAE and GAN. arXiv 2022, arXiv:2205.05214.
- Martinez, A.; Benavente, R. The AR Face Database; Technical Report No. 24; Computer Vision Center, Universitat Autònoma de Barcelona: Barcelona, Spain, 1998.
- Georghiades, A.; Belhumeur, P.; Kriegman, D. From few to many: Illumination cone models for face recognition under variable lighting and pose. IEEE Trans. Pattern Anal. Mach. Intell. 2001, 23, 643–660.
- Phillips, P.; Moon, H.; Rizvi, S.; Rauss, P. The FERET evaluation methodology for face-recognition algorithms. IEEE Trans. Pattern Anal. Mach. Intell. 2000, 22, 1090–1104.
- Gao, W.; Cao, B.; Shan, S.; Chen, X.; Zhou, D.; Zhang, X.; Zhao, D. The CAS-PEAL Large-Scale Chinese Face Database and Baseline Evaluations. IEEE Trans. Syst. Man Cybern. Part A Syst. Hum. 2008, 38, 149–161.
- Huang, G.B.; Mattar, M.; Berg, T.; Learned-Miller, E. Labeled Faces in the Wild: A Database for Studying Face Recognition in Unconstrained Environments; Tech. Report 07-49; University of Massachusetts: Amherst, MA, USA, 2008.
- Radford, A.; Metz, L.; Chintala, S. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. arXiv 2015, arXiv:1511.06434.
- Turk, M.; Pentland, A. Face recognition using eigenfaces. In Proceedings of the 1991 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Maui, HI, USA, 3–6 June 1991; pp. 586–591.
- Wright, J.; Yang, A.Y.; Ganesh, A.; Sastry, S.S.; Ma, Y. Robust Face Recognition via Sparse Representation. IEEE Trans. Pattern Anal. Mach. Intell. 2009, 31, 210–227.
- Zhang, L.; Yang, M.; Feng, X. Sparse representation or collaborative representation: Which helps face recognition? In Proceedings of the 2011 International Conference on Computer Vision, Barcelona, Spain, 6–13 November 2011; pp. 471–478.
- Zhu, P.; Zhang, L.; Hu, Q.; Shiu, S.C.K. Multi-scale Patch Based Collaborative Representation for Face Recognition with Margin Distribution Optimization. In Proceedings of the Computer Vision—ECCV 2012; Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C., Eds.; Springer: Berlin/Heidelberg, Germany, 2012; pp. 822–835.
- Yang, M.; Van Gool, L.; Zhang, L. Sparse Variation Dictionary Learning for Face Recognition with a Single Training Sample per Person. In Proceedings of the 2013 IEEE International Conference on Computer Vision, Sydney, Australia, 1–8 December 2013; pp. 689–696.
- Deng, W.; Hu, J.; Guo, J. Face Recognition via Collaborative Representation: Its Discriminant Nature and Superposed Representation. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 40, 2513–2521.
- Yang, M.; Wang, X.; Zeng, G.; Shen, L. Joint and collaborative representation with local adaptive convolution feature for face recognition with single sample per person. Pattern Recognit. 2017, 66, 117–128.
- Zhao, K.; Xu, J.; Cheng, M.M. RegularFace: Deep Face Recognition via Exclusive Regularization. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 1136–1144.
- Deng, J.; Guo, J.; Yang, J.; Xue, N.; Kotsia, I.; Zafeiriou, S.P. ArcFace: Additive Angular Margin Loss for Deep Face Recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 5962–5979.
- Karras, T.; Aittala, M.; Hellsten, J.; Laine, S.; Lehtinen, J.; Aila, T. Training Generative Adversarial Networks with Limited Data. arXiv 2020, arXiv:2006.06676.
- Kim, M.; Liu, F.; Jain, A.; Liu, X. DCFace: Synthetic Face Generation with Dual Condition Diffusion Model. arXiv 2023, arXiv:2304.07060.
- Shoshan, A.; Bhonker, N.; Kviatkovsky, I.; Medioni, G. GAN-Control: Explicitly Controllable GANs. arXiv 2021, arXiv:2101.02477.
Encoder and Discriminator D

| Layer | Input/Output Channels | Filter/Stride/Padding |
|---|---|---|
| Conv2d-1 | 3/64 | 4 × 4 / 2 / 1 |
| Conv2d-2 | 64/128 | 4 × 4 / 2 / 1 |
| Conv2d-3 | 128/256 | 4 × 4 / 2 / 1 |
| Conv2d-4 | 256/512 | 4 × 4 / 2 / 1 |
| Final layers (Encoder) | Flatten | Fully connected, output = mean and variance of the latent code |
| Final layers (Discriminator D) | Flatten | Fully connected, output = real/fake prediction |
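A PyTorch sketch of this table is given below; the LeakyReLU activations and the single-logit discriminator head are assumptions beyond what the table specifies.

```python
# Shared 64x64x3 -> 4x4x512 convolutional trunk used by both encoder and discriminator.
import torch.nn as nn

def conv_trunk():
    return nn.Sequential(
        nn.Conv2d(3, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
        nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
        nn.Conv2d(128, 256, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
        nn.Conv2d(256, 512, 4, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
        nn.Flatten(),  # 512 * 4 * 4 = 8192 features
    )

class Encoder(nn.Module):
    """Outputs the mean and log-variance of the latent code c."""
    def __init__(self, latent_dim=100):
        super().__init__()
        self.trunk = conv_trunk()
        self.fc_mu = nn.Linear(512 * 4 * 4, latent_dim)
        self.fc_logvar = nn.Linear(512 * 4 * 4, latent_dim)

    def forward(self, x):
        h = self.trunk(x)
        return self.fc_mu(h), self.fc_logvar(h)

class Discriminator(nn.Module):
    """Outputs a single real/fake logit per image."""
    def __init__(self):
        super().__init__()
        self.trunk = conv_trunk()
        self.fc_out = nn.Linear(512 * 4 * 4, 1)

    def forward(self, x):
        return self.fc_out(self.trunk(x))
```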
Generator/Decoder

| Layer | Input/Output Channels | Filter/Stride/Padding |
|---|---|---|
| Fully connected (input: concatenation of c and z) | | |
| Reshape(512 × 4 × 4) → BatchNorm2d → ReLU | | |
| ConvTranspose2d-1 | 512/256 | 4 × 4 / 2 / 1 |
| ConvTranspose2d-2 | 256/128 | 4 × 4 / 2 / 1 |
| ConvTranspose2d-3 | 128/64 | 4 × 4 / 2 / 1 |
| ConvTranspose2d-4 | 64/3 | 4 × 4 / 2 / 1 |
| Tanh | | |
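The generator/decoder table can likewise be sketched in PyTorch; the input dimension (latent code plus noise, assumed 100 + 100) and the BatchNorm/ReLU placement inside each transposed-convolution block are assumptions beyond what the table states.

```python
# DCGAN-style generator: fully connected -> reshape -> four ConvTranspose2d blocks -> Tanh.
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, in_dim=200):  # assumed dim(c) + dim(z) = 100 + 100
        super().__init__()
        self.fc = nn.Linear(in_dim, 512 * 4 * 4)
        self.net = nn.Sequential(
            nn.BatchNorm2d(512), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(512, 256, 4, stride=2, padding=1), nn.BatchNorm2d(256), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.BatchNorm2d(128), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.BatchNorm2d(64), nn.ReLU(inplace=True),
            nn.ConvTranspose2d(64, 3, 4, stride=2, padding=1), nn.Tanh(),  # 64 x 64 x 3 output image
        )

    def forward(self, cz):
        h = self.fc(cz).view(-1, 512, 4, 4)  # Reshape(512 x 4 x 4) from the table
        return self.net(h)
```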
| Methods | AR | E-YaleB&AR | CAS-PEAL | FERET |
|---|---|---|---|---|
| PCA | | | | |
| VAE | | | | |
| SRC | | | | |
| CRC | | | | |
| DMMA | | | | |
| PCRC | | | | |
| SVDL | | | | |
| SLRC | | | | |
| S3RC | | | | |
| VD-GAN | | | | |
| AD-VAE | | | | |
| Methods | Recognition Rate (%) |
|---|---|
| JCR-ACF | 86.0 |
| RegularFace | 83.7 |
| ArcFace | 92.3 |
| CJR-RACF | 95.5 |
| VD-GANLcnn | 98.4 |
| AD-VAE | |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Silva, A.; Farias, R. AD-VAE: Adversarial Disentangling Variational Autoencoder. Sensors 2025, 25, 1574. https://doi.org/10.3390/s25051574