
A Review of Key Technologies for Emotion Analysis Using Multimodal Information

Published in Cognitive Computation

Abstract

Emotion analysis, an integral aspect of human–machine interaction, has advanced significantly in recent years. With the rise of multimodal data sources such as speech, text, and images, there is a pressing need for a comprehensive review of the key elements of this field. This paper examines emotion analysis across multimodal data sources encompassing speech, text, images, and physiological signals, and provides a curated overview of relevant literature, academic forums, and competitions. Emphasis is placed on unimodal processing methods, including preprocessing, feature extraction, and tools for speech, text, images, and physiological signals. We then discuss multimodal data fusion techniques, covering early, late, model-level, and hybrid fusion strategies. Key findings underscore the importance of analyzing emotions across multiple modalities. Detailed discussions of emotion elicitation, expression, and representation models are presented. We also identify challenges such as dataset creation, modality synchronization, model efficiency, limited-data scenarios, cross-domain applicability, and the handling of missing modalities, and we offer practical solutions and suggestions for addressing them. Multimodal emotion analysis has numerous applications, ranging from driver sentiment detection to medical evaluation. This review serves as a resource for both scholars and industry practitioners: it sheds light on the current state of research, highlights promising directions for future work, and is expected to support subsequent advances in deep multimodal emotion analysis for real-world deployment.
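The abstract distinguishes early, late, model-level, and hybrid fusion strategies. As a point of reference only, the minimal sketch below (not drawn from the paper; the random toy features, four-class label set, and scikit-learn classifiers are illustrative assumptions) contrasts early (feature-level) fusion, which concatenates unimodal features before a single classifier, with late (decision-level) fusion, which combines the predictions of per-modality classifiers.

```python
# Illustrative sketch of early vs. late fusion (assumed setup, not the paper's method).
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy unimodal features for 200 samples: 40-dim audio and 60-dim text vectors,
# standing in for e.g. MFCC statistics and sentence embeddings.
X_audio = rng.normal(size=(200, 40))
X_text = rng.normal(size=(200, 60))
y = rng.integers(0, 4, size=200)  # four hypothetical emotion classes

# Early (feature-level) fusion: concatenate modality features, train one classifier.
X_early = np.concatenate([X_audio, X_text], axis=1)
early_clf = LogisticRegression(max_iter=1000).fit(X_early, y)
early_pred = early_clf.predict(X_early)

# Late (decision-level) fusion: train one classifier per modality,
# then average their class-probability outputs.
audio_clf = LogisticRegression(max_iter=1000).fit(X_audio, y)
text_clf = LogisticRegression(max_iter=1000).fit(X_text, y)
late_probs = (audio_clf.predict_proba(X_audio) + text_clf.predict_proba(X_text)) / 2
late_pred = late_probs.argmax(axis=1)
```

Model-level and hybrid strategies, as discussed in the review, instead fuse intermediate representations inside a learned model or combine several of these schemes.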



Data Availability

The data are available from the corresponding author on reasonable request.


Funding

This work was supported by the National Natural Science Foundation of China (61771299).

Author information


Contributions

Xianxun Zhu drafted and wrote the manuscript. Rui Wang served as the corresponding author, overseeing and coordinating the entire study. Chaopeng Guo created figures and charts for the article. Heyang Feng retrieved and organized relevant literature. Yao Huang typeset and formatted the manuscript. Yichen Feng and Xiangyang Wang assisted with language editing. All authors have reviewed and approved the final version of the manuscript.

Corresponding author

Correspondence to Rui Wang.

Ethics declarations

Ethical Approval

This article does not contain any studies with human participants or animals performed by any of the authors.

Conflict of Interest

The authors declare no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhu, X., Guo, C., Feng, H. et al. A Review of Key Technologies for Emotion Analysis Using Multimodal Information. Cogn Comput (2024). https://doi.org/10.1007/s12559-024-10287-z

