Abstract
Facial expression recognition in-the-wild is essential for various interactive computing applications. In particular, "Learning from Synthetic Data" is an important topic in the facial expression recognition task. In this paper, we propose a multi-task learning-based facial expression recognition approach in which the emotion and appearance perspectives of facial images are jointly learned. We also present our experimental results on the validation and test sets of the LSD challenge introduced in the 4th Affective Behavior Analysis in-the-Wild competition. Our method achieved a mean F1 score of 71.82 on the validation set and 35.87 on the test set, ranking third on the final leaderboard.
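The multi-task idea described above can be sketched as a shared backbone feeding two heads, one predicting the emotion class and one producing an appearance representation. This is a minimal illustrative sketch, not the authors' actual architecture: the plain CNN backbone, the layer sizes, the 128-dimensional appearance embedding, and the assumption of the six basic expression classes used in the LSD challenge are all our own simplifications.

```python
import torch
import torch.nn as nn

class MultiTaskFER(nn.Module):
    """Toy multi-task model: a shared backbone with an emotion
    classification head and an auxiliary appearance-embedding head."""

    def __init__(self, num_emotions: int = 6, appearance_dim: int = 128):
        super().__init__()
        # Small stand-in backbone; a real system would use a pretrained network.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.emotion_head = nn.Linear(64, num_emotions)
        self.appearance_head = nn.Linear(64, appearance_dim)

    def forward(self, x):
        feats = self.backbone(x)          # shared features for both tasks
        return self.emotion_head(feats), self.appearance_head(feats)

model = MultiTaskFER()
images = torch.randn(4, 3, 112, 112)      # batch of 4 face crops
emotion_logits, appearance_emb = model(images)
print(emotion_logits.shape, appearance_emb.shape)
```

During training, the two heads would typically be optimized jointly with a weighted sum of a classification loss and an appearance-related loss on the shared features.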
J.-Y. Jeong, Y.-G. Hong, S. Hong and J. Oh—Contributed equally to this work.
Acknowledgement
This work was supported by the NRF grant funded by the Korea government (MSIT) (No. 2021R1F1A1059665), by the Basic Research Program through the NRF grant funded by the Korea government (MSIT) (No. 2020R1A4A1017775), and by the Korea Institute for Advancement of Technology (KIAT) grant funded by the Korea government (MOTIE) (P0017123, The Competency Development Program for Industry Specialist).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Jeong, JY. et al. (2023). Ensemble of Multi-task Learning Networks for Facial Expression Recognition In-the-Wild with Learning from Synthetic Data. In: Karlinsky, L., Michaeli, T., Nishino, K. (eds) Computer Vision – ECCV 2022 Workshops. ECCV 2022. Lecture Notes in Computer Science, vol 13806. Springer, Cham. https://doi.org/10.1007/978-3-031-25075-0_5
DOI: https://doi.org/10.1007/978-3-031-25075-0_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-25074-3
Online ISBN: 978-3-031-25075-0
eBook Packages: Computer Science; Computer Science (R0)