Hybrid multi-modal emotion recognition framework based on InceptionV3DenseNet

Published: 27 March 2023

Abstract

Emotion recognition is one of the most complex research areas because individuals express emotional cues through several modalities, such as audio, facial expressions, and language. Recognizing emotion from a single modality is not always feasible, as each modality can be degraded by several factors, and existing models fall short of the accuracy needed to identify individuals' expressions precisely. In this paper, a novel hybrid multi-modal emotion recognition framework, InceptionV3DenseNet, is proposed to improve recognition accuracy. Initially, contextual features are extracted from the video, audio, and text modalities. From the video modality, shot length, lighting key, motion, and color features are extracted; zero-crossing rate, Mel-frequency cepstral coefficients (MFCC), energy, and pitch are extracted from the audio modality; and unigram, bigram, and TF-IDF features are extracted from the textual modality. This step yields high-level features with strong generalization capability. The extracted features are fused using multi-set integrated canonical correlation analysis (MICCA) and provided as input to the proposed hybrid network model. MICCA captures the correlations among the multimodal features, improving performance within a single learning phase. The proposed hybrid deep learning model then classifies emotional states with an emphasis on accuracy and reliability. Simulations are conducted on the MATLAB platform and evaluated on the MELD and RAVDESS datasets. The results show that the proposed model is more efficient and accurate than the compared models, attaining an overall accuracy of 74.87% on MELD and 95.25% on RAVDESS.
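
To make the feature extraction stage concrete, the sketch below computes the four audio cues named in the abstract (zero-crossing rate, MFCCs, energy, and pitch) and the unigram/bigram TF-IDF text features. The library choices (librosa, scikit-learn) and all parameter values here are illustrative assumptions; the paper reports a MATLAB implementation, so this is a minimal Python sketch, not the authors' code.

import numpy as np
import librosa
from sklearn.feature_extraction.text import TfidfVectorizer

def audio_features(path, sr=16000, n_mfcc=13):
    """Fixed-length vector of the four audio cues named in the abstract."""
    y, _ = librosa.load(path, sr=sr)
    zcr = librosa.feature.zero_crossing_rate(y).mean()                   # zero-crossing rate
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc).mean(axis=1)  # MFCCs, time-averaged
    energy = float(np.mean(y ** 2))                                      # mean short-term energy
    pitch = float(np.nanmean(librosa.yin(y, fmin=65, fmax=400, sr=sr)))  # YIN pitch track, averaged
    return np.concatenate([[zcr, energy, pitch], mfcc])

def text_features(utterances):
    """Unigram + bigram TF-IDF matrix over a list of utterance transcripts."""
    vectorizer = TfidfVectorizer(ngram_range=(1, 2))                     # unigrams and bigrams
    return vectorizer.fit_transform(utterances)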
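
For the fusion stage, a simplified multi-set canonical projection can stand in for MICCA. The sketch below solves the classical MAXVAR-style generalized eigenproblem over the joint covariance of all modalities; the exact MICCA formulation differs in its integration scheme, so treat this as a conceptual stand-in under that assumption, not a reimplementation of the paper's method.

import numpy as np
from scipy.linalg import eigh

def multiset_projection(views, dim=32, reg=1e-3):
    """views: list of (n_samples, d_i) arrays, one per modality (video/audio/text).
    Returns one projection matrix per view, mapping it into a shared dim-D space."""
    views = [v - v.mean(axis=0) for v in views]          # center each modality
    n = views[0].shape[0]
    X = np.hstack(views)                                 # stack modalities column-wise
    C = X.T @ X / n                                      # joint covariance (all cross-view blocks)
    D = np.zeros_like(C)                                 # block-diagonal within-view covariance
    start = 0
    for v in views:
        d = v.shape[1]
        D[start:start + d, start:start + d] = v.T @ v / n + reg * np.eye(d)
        start += d
    _, vecs = eigh(C, D)                                 # generalized eigenproblem C w = lam D w
    W = vecs[:, ::-1][:, :dim]                           # directions with the largest eigenvalues
    projections, start = [], 0
    for v in views:
        d = v.shape[1]
        projections.append(W[start:start + d])           # split stacked solution per modality
        start += d
    return projections

The fused representation fed to the classifier could then be, for example, the concatenation of the (centered) projected modalities: np.hstack([v @ p for v, p in zip(views, projections)]).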
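
Finally, a sketch of one plausible way to pair InceptionV3 and DenseNet backbones with the fused features in a single classification head, written with Keras. The abstract does not spell out the hybrid architecture, so the combination strategy, layer sizes, and the seven-class output (MELD's emotion label set) are assumptions for illustration only.

import tensorflow as tf

def build_hybrid(num_classes=7, frame_shape=(224, 224, 3), fused_dim=32):
    frames = tf.keras.Input(shape=frame_shape, name="video_frame")       # representative video frame
    fused = tf.keras.Input(shape=(fused_dim,), name="fused_features")    # MICCA-fused multimodal vector
    inc = tf.keras.applications.InceptionV3(include_top=False, pooling="avg")(frames)
    den = tf.keras.applications.DenseNet121(include_top=False, pooling="avg")(frames)
    x = tf.keras.layers.Concatenate()([inc, den, fused])                 # hybrid feature vector
    x = tf.keras.layers.Dense(256, activation="relu")(x)
    x = tf.keras.layers.Dropout(0.5)(x)
    out = tf.keras.layers.Dense(num_classes, activation="softmax")(x)    # emotion class probabilities
    return tf.keras.Model(inputs=[frames, fused], outputs=out)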



Published In

Multimedia Tools and Applications  Volume 82, Issue 26
Nov 2023
1559 pages

Publisher

Kluwer Academic Publishers

United States

Publication History

Published: 27 March 2023
Accepted: 02 March 2023
Revision received: 05 October 2022
Received: 21 March 2022

Author Tags

  1. Multi-modal emotion recognition
  2. Audio features
  3. Video features
  4. Textual features
  5. Feature extraction
  6. Feature fusion
  7. Classification
  8. Deep learning

Qualifiers

  • Research-article
