Abstract
Although CNN-based face alignment algorithms have achieved promising results, their alignment accuracy still suffers on faces with severe occlusions and large poses. This is mainly because of (1) their inability to model long-range dependencies and construct effective face shape constraints, and (2) the limited size of labeled facial datasets. To address these problems, this study proposes a transformer-based data distillation semi-supervised face alignment algorithm. The transformer-based heatmap detection network introduces a transformer to model face shape constraint relationships more effectively, thereby improving the algorithm's robustness under partial occlusion. Moreover, a quality-aware pseudolabeled sample distillation network is designed to evaluate the quality of the pseudolabeled data generated by the transformer-based heatmap detection network, which helps the transformer acquire the inductive biases inherent in CNNs. This study also proposes an intensive training strategy that exploits more unlabeled data without manual intervention, further improving the performance of the transformer-based heatmap detection network. Experimental results on the 300W, AFLW, and 300VW datasets demonstrate the superiority of our method over state-of-the-art face alignment methods.
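To make the pseudolabel-filtering idea more concrete, the following is a minimal sketch under stated assumptions, not the authors' implementation. The abstract describes a learned quality-aware distillation network; here, purely for illustration, that network is replaced by a hand-crafted proxy: each landmark is decoded from its heatmap by taking the highest-scoring pixel, and a face's quality score is the mean of those peak values. The function names `decode_heatmap`, `pseudo_label`, and `select_pseudo_labels`, and the 0.5 threshold, are hypothetical.

```python
from typing import List, Sequence, Tuple

Heatmap = List[List[float]]  # H x W grid of per-pixel scores for one landmark


def decode_heatmap(heatmap: Heatmap) -> Tuple[int, int, float]:
    """Return (x, y, peak) for the highest-scoring pixel of one landmark heatmap."""
    best_x, best_y, best_val = 0, 0, float("-inf")
    for y, row in enumerate(heatmap):
        for x, val in enumerate(row):
            if val > best_val:  # strict '>' keeps the first maximum on ties
                best_x, best_y, best_val = x, y, val
    return best_x, best_y, best_val


def pseudo_label(heatmaps: Sequence[Heatmap]) -> Tuple[List[Tuple[int, int]], float]:
    """Decode all landmarks of one face.

    The quality score is a hypothetical proxy (mean peak confidence),
    standing in for the learned quality-aware network described in the paper.
    """
    if not heatmaps:
        raise ValueError("expected at least one landmark heatmap")
    coords, peaks = [], []
    for hm in heatmaps:
        x, y, peak = decode_heatmap(hm)
        coords.append((x, y))
        peaks.append(peak)
    return coords, sum(peaks) / len(peaks)


def select_pseudo_labels(
    batch: Sequence[Tuple[str, Sequence[Heatmap]]], threshold: float = 0.5
) -> List[Tuple[str, List[Tuple[int, int]], float]]:
    """Keep only unlabeled samples whose proxy quality score clears the threshold."""
    kept = []
    for sample_id, heatmaps in batch:
        coords, quality = pseudo_label(heatmaps)
        if quality >= threshold:
            kept.append((sample_id, coords, quality))
    return kept


if __name__ == "__main__":
    # Two toy 3x3 heatmaps per face: one sharp prediction, one diffuse one.
    confident = [[[0.0, 0.1, 0.0], [0.1, 0.9, 0.1], [0.0, 0.1, 0.0]]] * 2
    blurry = [[[0.2, 0.3, 0.2], [0.3, 0.3, 0.2], [0.2, 0.2, 0.2]]] * 2
    batch = [("img_a", confident), ("img_b", blurry)]
    for sample_id, coords, quality in select_pseudo_labels(batch, threshold=0.5):
        print(sample_id, coords, round(quality, 3))
```

A fixed confidence threshold is the simplest possible filter; in a learned setup such as the one the abstract describes, a trained quality estimator would take the place of the averaging step in `pseudo_label`, and the retained samples would be fed back as additional training data.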
Data Availability Statement
The 300W dataset is openly available from the Intelligent Behaviour Understanding Group (Accession 300W at https://ibug.doc.ic.ac.uk/resources/300-W/). The AFLW dataset is openly available from Graz University of Technology (Accession AFLW at https://www.tugraz.at/institute/icg/research/team-bischof/lrs/downloads/aflw/). The 300VW dataset is openly available from the Intelligent Behaviour Understanding Group (Accession 300VW at https://ibug.doc.ic.ac.uk/resources/300-VW/).
Acknowledgements
This research benefited from grants from the National Science Foundation of China (Grant No. 62002233) and the National Natural Science Foundation of China (Grant No. 62372335).
Ethics declarations
Competing Interests
This work was supported by National Science Foundation of China (Grant No. 62002233) and National Natural Science Foundation of China (Grant No. 62372335).
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ma, J., Li, X., Li, J. et al. Quality-aware face alignment using high-resolution spatial dependencies. Multimed Tools Appl 83, 42165–42187 (2024). https://doi.org/10.1007/s11042-023-17295-5