
Enhancing masked facial expression recognition with multimodal deep learning

Published in Multimedia Tools and Applications


Abstract

Facial expression recognition (FER) is an essential field for intelligent human-computer interaction, but the widespread use of face masks during the COVID-19 pandemic has made unimodal, vision-only techniques less effective. Multimodal approaches, which combine information from multiple modalities, are more robust at recognizing emotions, and the need to recognize human emotions from facial expressions accurately remains significant. This study proposes a multimodal methodology based on deep learning for recognizing facial expressions under masks together with vocal expressions. The approach uses two standard datasets, M-LFW-F and CREMA-D, to capture facial and vocal emotional cues, and the combined data were used to train a multimodal neural network with fusion techniques. The proposed approach achieved an accuracy of 79.05%, compared with 68.76% for the unimodal approach, demonstrating that multimodal fusion outperforms unimodal techniques in facial expression recognition under masked conditions and highlighting its potential for improving FER in challenging scenarios.
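Since the full architecture is behind the paywall, the following is only a minimal sketch of the feature-level fusion idea the abstract describes: a visual branch for masked-face images and an audio branch for vocal cues, whose embeddings are concatenated before a shared classifier. All layer sizes, input shapes, and the six-class output (matching CREMA-D's six emotion categories) are illustrative assumptions, not the authors' actual design.

```python
import torch
import torch.nn as nn

class MultimodalFusionNet(nn.Module):
    """Illustrative two-branch fusion network: a CNN over masked-face
    images and an MLP over acoustic features, fused by concatenation."""

    def __init__(self, num_classes: int = 6):
        super().__init__()
        # Visual branch: small CNN over 48x48 grayscale face crops (assumed size)
        self.visual = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(64 * 12 * 12, 128), nn.ReLU(),
        )
        # Audio branch: MLP over a fixed-length acoustic feature vector
        # (e.g., a 128-d spectrogram embedding; dimensionality is assumed)
        self.audio = nn.Sequential(
            nn.Linear(128, 128), nn.ReLU(),
        )
        # Feature-level fusion: concatenate branch embeddings, then classify
        self.classifier = nn.Sequential(
            nn.Linear(128 + 128, 64), nn.ReLU(),
            nn.Linear(64, num_classes),
        )

    def forward(self, face: torch.Tensor, audio: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([self.visual(face), self.audio(audio)], dim=1)
        return self.classifier(fused)

# Smoke test with random tensors shaped like one batch of four samples
model = MultimodalFusionNet()
logits = model(torch.randn(4, 1, 48, 48), torch.randn(4, 128))
print(logits.shape)  # torch.Size([4, 6])
```

Concatenating branch embeddings before a shared classifier is the simplest fusion strategy; the reported gain over the unimodal baseline (79.05% vs. 68.76%) is consistent with the audio branch compensating for facial features hidden by the mask.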




Availability of data and materials

The datasets used in this study are publicly available.


Funding

This research received no specific grant from any funding agency in the public, private, or not-for-profit sectors.

Author information


Corresponding author

Correspondence to H.M. Shahzad.

Ethics declarations

Conflict of interest/Competing interests

The authors declare that they have no conflicts of interest to report regarding the present study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Shahzad, H., Bhatti, S., Jaffar, A. et al. Enhancing masked facial expression recognition with multimodal deep learning. Multimed Tools Appl 83, 73911–73921 (2024). https://doi.org/10.1007/s11042-024-18362-1



  • DOI: https://doi.org/10.1007/s11042-024-18362-1
