
A Review of Key Technologies for Emotion Analysis Using Multimodal Information

Published in Cognitive Computation

Abstract

Emotion analysis, an integral aspect of human–machine interaction, has advanced significantly in recent years. With the rise of multimodal data sources such as speech, text, and images, there is a pressing need for a comprehensive review of the key elements of this field. This paper examines emotion analysis across multimodal data sources encompassing speech, text, images, and physiological signals, and provides a curated overview of relevant literature, academic forums, and competitions. Emphasis is placed on unimodal processing methods, including preprocessing, feature extraction, and tools for speech, text, images, and physiological signals. We then discuss multimodal data fusion techniques, covering early, late, model-level, and hybrid fusion strategies. Key findings underscore the importance of analyzing emotions across multiple modalities. Detailed discussions of emotion elicitation, expression, and representation models are presented. We also identify challenges such as dataset creation, modality synchronization, model efficiency, limited-data scenarios, cross-domain applicability, and the handling of missing modalities, and we offer practical solutions and suggestions for addressing them. Multimodal emotion analysis has numerous applications, ranging from driver sentiment detection to medical evaluation. This review serves as a resource for both scholars and industry practitioners: it sheds light on the current state of research, highlights promising directions for future work, and is expected to support subsequent advances in deep multimodal emotion analysis for real-world deployment.
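The abstract distinguishes early, late, model-level, and hybrid fusion strategies. As a point of reference only, the minimal sketch below (not drawn from the paper; the random toy features, four-class label set, and scikit-learn classifiers are illustrative assumptions) contrasts early (feature-level) fusion, which concatenates unimodal features before a single classifier, with late (decision-level) fusion, which combines the predictions of per-modality classifiers.

```python
# Illustrative sketch of early vs. late fusion (assumed setup, not the paper's method).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy unimodal features for 200 samples: 40-dim audio and 60-dim text vectors,
# standing in for e.g. MFCC statistics and sentence embeddings.
X_audio = rng.normal(size=(200, 40))
X_text = rng.normal(size=(200, 60))
y = rng.integers(0, 4, size=200)  # four hypothetical emotion classes

# Early (feature-level) fusion: concatenate modality features, train one classifier.
X_early = np.concatenate([X_audio, X_text], axis=1)
early_clf = LogisticRegression(max_iter=1000).fit(X_early, y)
early_pred = early_clf.predict(X_early)

# Late (decision-level) fusion: train one classifier per modality,
# then average their class-probability outputs.
audio_clf = LogisticRegression(max_iter=1000).fit(X_audio, y)
text_clf = LogisticRegression(max_iter=1000).fit(X_text, y)
late_probs = (audio_clf.predict_proba(X_audio) + text_clf.predict_proba(X_text)) / 2
late_pred = late_probs.argmax(axis=1)
```

Model-level and hybrid strategies, as discussed in the review, instead fuse intermediate representations inside a learned model or combine several of these schemes.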



Data Availability

The data are available from the corresponding author on reasonable request.


Funding

This work was supported by the National Natural Science Foundation of China (61771299).

Author information


Contributions

Xianxun Zhu drafted and wrote the manuscript. Rui Wang served as the corresponding author, overseeing and coordinating the entire study. Chaopeng Guo created figures and charts for the article. Heyang Feng retrieved and organized relevant literature. Yao Huang typeset and formatted the manuscript. Yichen Feng and Xiangyang Wang assisted with language editing. All authors have reviewed and approved the final version of the manuscript.

Corresponding author

Correspondence to Rui Wang.

Ethics declarations

Ethical Approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Conflict of Interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhu, X., Guo, C., Feng, H. et al. A Review of Key Technologies for Emotion Analysis Using Multimodal Information. Cogn Comput (2024). https://doi.org/10.1007/s12559-024-10287-z

