Abstract
To address local occlusion in practical dynamic expression recognition, this paper first introduces a facial restoration network that combines a Vision Transformer (ViT) with a GAN; the network accurately locates missing facial regions and restores them in detail and efficiently. Second, for the recognition task, a more robust dynamic expression recognition network is trained by cascading a ViT with a Two-Stream CNN, effectively combining the ViT's feature-extraction capability with the Two-Stream CNN's ability to capture spatio-temporal features. Finally, combining the two networks enables efficient recognition of dynamically occluded expressions. Extensive experiments demonstrate that the facial image restoration network, trained on the CelebA and VGG Face2 datasets, outperforms other networks on small and medium occlusions. Expression recognition experiments on the AFEW and MMI datasets show that the proposed network achieves accuracies of 54.95% and 81.2%, respectively, for dynamic expression recognition, surpassing mainstream networks. Moreover, the restoration network outperforms mainstream networks in handling occlusions and yields an average accuracy improvement of 5.34% on occluded expression recognition.
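The two-stage pipeline described above (restore occluded regions first, then extract ViT features and fuse spatial and temporal streams) can be sketched as follows. This is a minimal, dependency-free illustration of the data flow only, not the paper's implementation: the restoration, ViT, and stream modules are hypothetical stand-ins (mean-fill inpainting, patch average-pooling, and simple pooled scores).

```python
# Hypothetical sketch of the paper's pipeline: occluded frames are restored,
# then ViT-style features feed a spatial stream (per-frame appearance) and a
# temporal stream (frame-to-frame motion). All modules are toy stand-ins.

def restore_frame(frame, occlusion_mask):
    """Stand-in for the ViT+GAN restoration network: fill masked pixels
    with the mean of the visible ones (a trivial inpainting baseline)."""
    visible = [p for p, m in zip(frame, occlusion_mask) if not m]
    fill = sum(visible) / len(visible) if visible else 0.0
    return [fill if m else p for p, m in zip(frame, occlusion_mask)]

def vit_features(frame, patch=4):
    """Stand-in for ViT patch features: average-pool fixed-size patches."""
    return [sum(frame[i:i + patch]) / patch for i in range(0, len(frame), patch)]

def spatial_stream(feats):
    # Appearance score for one frame: mean of its patch features.
    return sum(feats) / len(feats)

def temporal_stream(prev_feats, feats):
    # Motion score: mean absolute feature difference between adjacent frames.
    return sum(abs(a - b) for a, b in zip(prev_feats, feats)) / len(feats)

def recognize(frames, masks):
    """Full cascade: restore -> ViT features -> two-stream scores.
    Returns the (spatial, temporal) scores a classifier head would fuse."""
    restored = [restore_frame(f, m) for f, m in zip(frames, masks)]
    feats = [vit_features(f) for f in restored]
    spatial = sum(spatial_stream(f) for f in feats) / len(feats)
    temporal = sum(temporal_stream(a, b) for a, b in zip(feats, feats[1:]))
    temporal /= max(len(feats) - 1, 1)
    return spatial, temporal
```

In the actual system each stand-in would be a trained network, and the two stream outputs would be fused by a classification head rather than returned directly.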
Acknowledgements
This work was supported by the State Grid Corporation Technology Guide Project (5700-202218185A-1-1-ZN).
© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
Cite this paper
Liang, M., Zhang, M., Liu, K., Li, X., Wang, Y. (2024). Dynamic Occlusion Expression Recognition Based on Improved GAN. In: Jin, H., Pan, Y., Lu, J. (eds) Artificial Intelligence and Machine Learning. IAIC 2023. Communications in Computer and Information Science, vol 2058. Springer, Singapore. https://doi.org/10.1007/978-981-97-1277-9_14
Publisher Name: Springer, Singapore
Print ISBN: 978-981-97-1276-2
Online ISBN: 978-981-97-1277-9