Abstract
Face recognition plays a crucial role in applications ranging from security to personal convenience. Recent work in this domain has emphasized the importance of recognizing individuals based on age-related facial features. This paper presents a comprehensive evaluation of two deep learning architectures for age-based face recognition: Siamese Convolutional Networks (SCNs) and Vision Transformers (ViTs). Convolutional Neural Networks (CNNs), which are critical in modern face recognition, serve as the backbone of SCNs. SCNs are specifically designed to detect similarities between input pairs by emphasizing the local features that are crucial for age-related distinctions. In contrast, ViTs adapt the transformer architecture, initially developed for natural language processing, to images; they have demonstrated promising performance in image recognition and show an aptitude for capturing global image context. This work investigates how well these distinct architectures discern age-related variations in facial data. Performance comparisons were conducted on three established SCN models and two ViT architectures. The results revealed that the best-performing SCNs focused primarily on the mouth, nose, and eye regions, indicating their reliance on local features for age estimation. Notably, the ViT models outperformed the SCNs despite lacking explicit feature localization. This suggests that a holistic understanding of the facial context may be more effective than focusing solely on isolated features for age-based recognition.
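As a rough illustration of the two ideas contrasted in the abstract, the sketch below shows (a) a Siamese comparison, in which one shared encoder embeds both inputs and the distance between the embeddings measures similarity, and (b) the patch splitting that a ViT applies to an image before attention is computed over all patches. It is a toy written in plain Python, not the models evaluated in the paper: the random linear encoder, the contrastive loss, the margin value, and every function name are assumptions introduced only for this example.

```python
import math
import random


def make_encoder(in_dim, out_dim, seed=0):
    """Build a toy encoder (one random linear layer + ReLU).

    Hypothetical stand-in for a CNN backbone; a Siamese network applies
    the SAME encoder, with the same weights, to both inputs of a pair.
    """
    rng = random.Random(seed)
    weights = [[rng.uniform(-1.0, 1.0) for _ in range(in_dim)]
               for _ in range(out_dim)]

    def encode(x):
        return [max(0.0, sum(w * v for w, v in zip(row, x)))
                for row in weights]

    return encode


def euclidean_distance(a, b):
    """Distance between two embeddings; small means 'similar'."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def contrastive_loss(distance, same_identity, margin=1.0):
    """Contrastive loss (an assumed choice, not taken from the paper).

    Matching pairs are pulled together; non-matching pairs are pushed
    apart until they are at least `margin` away from each other.
    """
    if same_identity:
        return distance ** 2
    return max(0.0, margin - distance) ** 2


def split_into_patches(image, patch_size):
    """Split a 2D grid into flattened, non-overlapping square patches.

    Assumes both image dimensions are divisible by patch_size. A ViT
    embeds each patch as a token and attends over all tokens at once,
    which is where its global view of the image comes from.
    """
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            patches.append([image[r][c]
                            for r in range(top, top + patch_size)
                            for c in range(left, left + patch_size)])
    return patches


# Siamese comparison: one shared encoder, two inputs.
encoder = make_encoder(in_dim=4, out_dim=3)
face_a = [0.2, 0.8, 0.1, 0.5]
face_b = [0.2, 0.8, 0.1, 0.5]   # identical to face_a
face_c = [0.9, 0.1, 0.7, 0.0]
d_same = euclidean_distance(encoder(face_a), encoder(face_b))
d_diff = euclidean_distance(encoder(face_a), encoder(face_c))

# ViT-style tokenization of a 4x4 "image" into four 2x2 patches.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
patches = split_into_patches(image, patch_size=2)
```

Because the encoder is shared, identical inputs always map to the same embedding and therefore to a distance of zero. The patch list, by contrast, carries no built-in notion of locality; any relationship between patches has to be learned by the attention layers.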
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Mertens, P.J., Ngxande, M. (2024). Age-Related Face Recognition Using Siamese Networks and Vision Transformers. In: Gerber, A. (eds) South African Computer Science and Information Systems Research Trends. SAICSIT 2024. Communications in Computer and Information Science, vol 2159. Springer, Cham. https://doi.org/10.1007/978-3-031-64881-6_7
DOI: https://doi.org/10.1007/978-3-031-64881-6_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-64880-9
Online ISBN: 978-3-031-64881-6
eBook Packages: Computer Science, Computer Science (R0)