
Modality translation-based multimodal sentiment analysis under uncertain missing modalities

Published: 01 January 2024

Abstract

Multimodal sentiment analysis (MSA) with uncertain missing modalities poses a new challenge in sentiment analysis. To address this problem, efficient MSA models that account for missing modalities have been proposed. However, existing studies adopt only the concatenation operation for feature fusion, ignoring the deep interactions between different modalities. Moreover, they fail to exploit the text modality, which yields better accuracy in sentiment analysis than the other modalities. To tackle these issues, we propose a modality translation-based MSA model (MTMSA) that is robust to uncertain missing modalities. First, for multimodal data (text, visual, and audio) with uncertain missing entries, the visual and audio modalities are translated into the text modality by a modality translation module, and the translated visual features, translated audio features, and encoded text are fused into missing joint features (MJFs). Next, the MJFs are encoded by a transformer encoder module under the supervision of a pre-trained model (the transformer-based modality translation network, TMTN), so that the transformer encoder produces joint features for uncertain missing modalities that approximate those of complete modalities. The encoded MJFs are then fed into a transformer decoder module to learn the long-term dependencies between different modalities. Finally, sentiment classification is performed on the outputs of the transformer encoder module. Extensive experiments on two popular benchmark datasets (CMU-MOSI and IEMOCAP) demonstrate that MTMSA outperforms eight representative baseline models.
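
To make the described pipeline concrete, below is a minimal PyTorch sketch of the MTMSA data flow (modality translation, fusion into MJFs, supervised transformer encoding, decoding, classification). All module names, feature dimensions, and layer counts are illustrative assumptions, and the TMTN supervision and decoder objectives are omitted; this is a sketch of the abstract's description, not the authors' implementation.

import torch
import torch.nn as nn

class MTMSASketch(nn.Module):
    # Illustrative sketch only: names, dimensions, and depths are assumptions.
    def __init__(self, d_text=768, d_visual=35, d_audio=74, d_model=128,
                 n_heads=4, n_layers=2, n_classes=2):
        super().__init__()
        # Modality translation: project visual and audio features into the
        # text feature space (the paper uses a dedicated translation module).
        self.visual_to_text = nn.Linear(d_visual, d_model)
        self.audio_to_text = nn.Linear(d_audio, d_model)
        self.text_encoder = nn.Linear(d_text, d_model)  # e.g., on top of BERT features
        enc_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=n_layers)
        dec_layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers=n_layers)
        self.classifier = nn.Linear(d_model, n_classes)

    def forward(self, text, visual, audio):
        # Fuse translated visual, translated audio, and encoded text into
        # missing joint features (MJFs); missing modalities would arrive as
        # placeholder (e.g., zero-filled) sequences.
        mjf = torch.cat([self.text_encoder(text),
                         self.visual_to_text(visual),
                         self.audio_to_text(audio)], dim=1)
        # Encoder output; in the paper this is additionally supervised by the
        # pre-trained TMTN so it approximates complete-modality joint features.
        joint = self.encoder(mjf)
        # Decoder models long-term cross-modal dependencies over the encoded
        # MJFs (its training objective is not specified in the abstract).
        _decoded = self.decoder(joint, joint)
        # Per the abstract, classification uses the encoder outputs.
        return self.classifier(joint.mean(dim=1))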

Highlights

We propose a multimodal sentiment analysis model for uncertain missing modalities.
We translate the visual and audio modalities into the text modality to improve modality quality.
We apply a pre-trained model to supervise training so the model handles uncertain missing data (a loss sketch follows below).
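
As a rough sketch of how the pre-trained model could supervise the encoder (third highlight above), a feature-matching loss is one plausible form. The abstract does not specify the TMTN objective, so everything below, including the MSE choice and the loss weighting, is an assumption.

import torch.nn.functional as F

def tmtn_supervision_loss(joint_missing, joint_complete):
    # Push joint features computed from incomplete inputs toward those the
    # frozen, pre-trained TMTN produces from complete inputs; detach() keeps
    # gradients from flowing into the teacher.
    return F.mse_loss(joint_missing, joint_complete.detach())

# Hypothetical training step (names illustrative):
#   joint_missing  = model.encoder(mjf_from_incomplete_inputs)
#   joint_complete = tmtn(complete_inputs)   # frozen, pre-trained teacher
#   loss = task_loss + lambda_sup * tmtn_supervision_loss(joint_missing, joint_complete)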




          Published In

Information Fusion, Volume 101, Issue C, January 2024, 481 pages

Publisher

Elsevier Science Publishers B. V., Netherlands


          Author Tags

          1. Multimodal sentiment analysis
          2. Uncertain missing modalities
          3. Modality translation
          4. Transformer

          Qualifiers

          • Research-article

Cited By

• (2024) A unified self-distillation framework for multimodal sentiment analysis with uncertain missing modalities, in: Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence and Thirty-Sixth Conference on Innovative Applications of Artificial Intelligence and Fourteenth Symposium on Educational Advances in Artificial Intelligence, pp. 10074–10082, https://doi.org/10.1609/aaai.v38i9.28871 (published 20 Feb 2024).
• (2024) Modality Weights Based Fusion Model for Social Perception Prediction in Video, Audio, and Text, in: Proceedings of the 5th Multimodal Sentiment Analysis Challenge and Workshop: Social Perception and Humor, pp. 12–19, https://doi.org/10.1145/3689062.3689085 (published 28 Oct 2024).
• (2024) Leveraging Knowledge of Modality Experts for Incomplete Multimodal Learning, in: Proceedings of the 32nd ACM International Conference on Multimedia, pp. 438–446, https://doi.org/10.1145/3664647.3681683 (published 28 Oct 2024).
• (2024) Language-Guided Visual Prompt Compensation for Multi-Modal Remote Sensing Image Classification with Modality Absence, in: Proceedings of the 32nd ACM International Conference on Multimedia, pp. 5161–5170, https://doi.org/10.1145/3664647.3681563 (published 28 Oct 2024).
• (2024) A multimodal shared network with a cross-modal distribution constraint for continuous emotion recognition, Engineering Applications of Artificial Intelligence 133 (Part D), https://doi.org/10.1016/j.engappai.2024.108413 (published 1 Jul 2024).
