
Modality translation-based multimodal sentiment analysis under uncertain missing modalities

Published: 01 January 2024

Abstract

Multimodal sentiment analysis (MSA) with uncertain missing modalities poses a new challenge in sentiment analysis. To address this problem, efficient MSA models that account for missing modalities have been proposed. However, existing studies adopt only the concatenation operation for feature fusion, ignoring the deep interactions between different modalities. Moreover, they fail to exploit the text modality, which yields better accuracy in sentiment analysis than the other modalities. To tackle these issues, we propose a modality translation-based MSA model (MTMSA) that is robust to uncertain missing modalities. First, for multimodal data (text, visual, and audio) with uncertain missing entries, the visual and audio modalities are translated into the text modality by a modality translation module, and the translated visual features, translated audio features, and encoded text are fused into missing joint features (MJFs). Next, the MJFs are encoded by a transformer encoder module under the supervision of a pre-trained model (the transformer-based modality translation network, TMTN), so that the transformer encoder produces joint features for uncertain missing modalities that approximate those of complete modalities. The encoded MJFs are then fed into a transformer decoder module to learn the long-term dependencies between different modalities. Finally, sentiment classification is performed on the outputs of the transformer encoder module. Extensive experiments on two popular benchmark datasets (CMU-MOSI and IEMOCAP) demonstrate that MTMSA outperforms eight representative baseline models.
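
To make the described pipeline concrete, below is a minimal PyTorch sketch of the MTMSA data flow (modality translation, fusion into MJFs, supervised transformer encoding, decoding, classification). All module names, feature dimensions, and layer counts are illustrative assumptions, and the TMTN supervision and decoder objectives are omitted; this is a sketch of the abstract's description, not the authors' implementation.

import torch
import torch.nn as nn

class MTMSASketch(nn.Module):
    # Illustrative sketch only: names, dimensions, and depths are assumptions.
    def __init__(self, d_text=768, d_visual=35, d_audio=74, d_model=128,
                 n_heads=4, n_layers=2, n_classes=2):
        super().__init__()
        # Modality translation: project visual and audio features into the
        # text feature space (the paper uses a dedicated translation module).
        self.visual_to_text = nn.Linear(d_visual, d_model)
        self.audio_to_text = nn.Linear(d_audio, d_model)
        self.text_encoder = nn.Linear(d_text, d_model)  # e.g., on top of BERT features
        enc_layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(enc_layer, num_layers=n_layers)
        dec_layer = nn.TransformerDecoderLayer(d_model, n_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(dec_layer, num_layers=n_layers)
        self.classifier = nn.Linear(d_model, n_classes)

    def forward(self, text, visual, audio):
        # Fuse translated visual, translated audio, and encoded text into
        # missing joint features (MJFs); missing modalities would arrive as
        # placeholder (e.g., zero-filled) sequences.
        mjf = torch.cat([self.text_encoder(text),
                         self.visual_to_text(visual),
                         self.audio_to_text(audio)], dim=1)
        # Encoder output; in the paper this is additionally supervised by the
        # pre-trained TMTN so it approximates complete-modality joint features.
        joint = self.encoder(mjf)
        # Decoder models long-term cross-modal dependencies over the encoded
        # MJFs (its training objective is not specified in the abstract).
        _decoded = self.decoder(joint, joint)
        # Per the abstract, classification uses the encoder outputs.
        return self.classifier(joint.mean(dim=1))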

Highlights

We propose a multimodal sentiment analysis model for uncertain missing modalities.
We translate the visual and audio modalities into the text modality to improve modality quality.
We apply a pre-trained model to supervise training so the model handles uncertain missing data (a loss sketch follows below).
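
As a rough sketch of how the pre-trained model could supervise the encoder (third highlight above), a feature-matching loss is one plausible form. The abstract does not specify the TMTN objective, so everything below, including the MSE choice and the loss weighting, is an assumption.

import torch.nn.functional as F

def tmtn_supervision_loss(joint_missing, joint_complete):
    # Push joint features computed from incomplete inputs toward those the
    # frozen, pre-trained TMTN produces from complete inputs; detach() keeps
    # gradients from flowing into the teacher.
    return F.mse_loss(joint_missing, joint_complete.detach())

# Hypothetical training step (names illustrative):
#   joint_missing  = model.encoder(mjf_from_incomplete_inputs)
#   joint_complete = tmtn(complete_inputs)   # frozen, pre-trained teacher
#   loss = task_loss + lambda_sup * tmtn_supervision_loss(joint_missing, joint_complete)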




          Published In

Information Fusion, Volume 101, Issue C, January 2024, 481 pages

Publisher

Elsevier Science Publishers B. V., Netherlands


          Author Tags

          1. Multimodal sentiment analysis
          2. Uncertain missing modalities
          3. Modality translation
          4. Transformer

          Qualifiers

          • Research-article

Cited By

• (2024) A unified self-distillation framework for multimodal sentiment analysis with uncertain missing modalities, in: Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence and Thirty-Sixth Conference on Innovative Applications of Artificial Intelligence and Fourteenth Symposium on Educational Advances in Artificial Intelligence, pp. 10074–10082, https://doi.org/10.1609/aaai.v38i9.28871 (published 20 Feb 2024).
• (2024) Modality Weights Based Fusion Model for Social Perception Prediction in Video, Audio, and Text, in: Proceedings of the 5th Multimodal Sentiment Analysis Challenge and Workshop: Social Perception and Humor, pp. 12–19, https://doi.org/10.1145/3689062.3689085 (published 28 Oct 2024).
• (2024) Leveraging Knowledge of Modality Experts for Incomplete Multimodal Learning, in: Proceedings of the 32nd ACM International Conference on Multimedia, pp. 438–446, https://doi.org/10.1145/3664647.3681683 (published 28 Oct 2024).
• (2024) Language-Guided Visual Prompt Compensation for Multi-Modal Remote Sensing Image Classification with Modality Absence, in: Proceedings of the 32nd ACM International Conference on Multimedia, pp. 5161–5170, https://doi.org/10.1145/3664647.3681563 (published 28 Oct 2024).
• (2024) A multimodal shared network with a cross-modal distribution constraint for continuous emotion recognition, Engineering Applications of Artificial Intelligence 133 (Part D), https://doi.org/10.1016/j.engappai.2024.108413 (published 1 Jul 2024).
