Abstract
Facial expression recognition (FER) is essential for intelligent human-computer interaction, but the widespread wearing of masks during the COVID-19 pandemic has made unimodal, face-only techniques less effective. Multimodal approaches, which combine cues from several modalities, are more robust for recognizing emotions when the face is partially occluded. This study proposes a deep-learning-based multimodal methodology that recognizes emotions from masked facial expressions together with vocal expressions. The approach uses two standard datasets, M-LFW-F and CREMA-D, to capture facial and vocal emotional cues, and trains a multimodal neural network on them using fusion techniques. The proposed approach achieved an accuracy of 79.05%, compared with 68.76% for the unimodal baseline, demonstrating that multimodal fusion outperforms unimodal techniques for facial expression recognition under masked conditions and highlighting the potential of multimodal techniques for FER in challenging scenarios.
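To make the fusion idea concrete, below is a minimal sketch of the kind of two-branch, feature-level fusion network the abstract describes. It is an illustrative assumption rather than the authors' published architecture: all layer sizes, input shapes, and names are invented for the example, and the six output classes follow CREMA-D's emotion categories.

```python
# Minimal sketch of feature-level fusion for masked-face + voice emotion
# recognition. Illustrative assumption only, NOT the architecture published
# in the paper; all shapes and layer sizes are arbitrary.
import torch
import torch.nn as nn

class FusionFER(nn.Module):
    def __init__(self, num_classes: int = 6):  # CREMA-D has six emotion labels
        super().__init__()
        # Face branch: encodes a 100x100 grayscale masked-face image.
        self.face_branch = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(32 * 25 * 25, 128), nn.ReLU(),
        )
        # Voice branch: encodes a 64x64 log-mel spectrogram of the utterance.
        self.voice_branch = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(32 * 16 * 16, 128), nn.ReLU(),
        )
        # Fusion head: concatenated per-modality features -> emotion logits.
        self.classifier = nn.Sequential(
            nn.Linear(128 + 128, 64), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(64, num_classes),
        )

    def forward(self, face: torch.Tensor, voice: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([self.face_branch(face), self.voice_branch(voice)], dim=1)
        return self.classifier(fused)

# Smoke test with random tensors shaped like one batch of faces and spectrograms.
model = FusionFER()
logits = model(torch.randn(2, 1, 100, 100), torch.randn(2, 1, 64, 64))
print(logits.shape)  # torch.Size([2, 6])
```

Concatenation is the simplest feature-level fusion strategy; a decision-level alternative would instead train each branch with its own classifier and combine the per-modality predictions (e.g., by averaging).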
Availability of data and materials
The datasets used in this study, M-LFW-F and CREMA-D, are publicly available.
Funding
This research received no specific grant from any funding agency in the public, private, or not-for-profit sectors.
Ethics declarations
Conflict of interest/Competing interests
The authors declare that they have no conflicts of interest to report regarding the present study.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Shahzad, H., Bhatti, S., Jaffar, A. et al. Enhancing masked facial expression recognition with multimodal deep learning. Multimed Tools Appl 83, 73911–73921 (2024). https://doi.org/10.1007/s11042-024-18362-1
DOI: https://doi.org/10.1007/s11042-024-18362-1