Abstract
A central research problem in multimodal sentiment analysis is modelling inter-modality dynamics, yet most existing work does not address this aspect sufficiently. In this study, we propose MSA-AFN, a multimodal attention fusion network that accounts for both the relationships among modalities and the differences in their contributions to the task. Specifically, during feature extraction we consider not only the relationship between audio and text, but also the contribution of temporal features to the task. During multimodal fusion, a soft attention mechanism weights the feature representation of each modality according to its contribution to the task, and the weighted representations are then concatenated. We evaluate the proposed approach on the Chinese multimodal sentiment analysis dataset CH-SIMS. Results show that our model outperforms the comparison models. Moreover, adding our component to several baselines improves their performance by 0.28% to 9.5%.
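The fusion step described above can be illustrated with a minimal sketch, assuming a standard PyTorch setup; this is not the authors' released implementation. Each modality representation receives a soft-attention score, the scores are normalised with a softmax, and the weighted representations are concatenated before a final regressor. The module name, hidden size, and the text/audio/video input dimensions below are illustrative assumptions, not values taken from the paper.

```python
import torch
import torch.nn as nn

class SoftAttentionFusion(nn.Module):
    """Hypothetical soft-attention fusion over modality-level features."""

    def __init__(self, dim_text, dim_audio, dim_video, hidden=64):
        super().__init__()
        # Project each modality to a common size so they can be compared.
        self.proj = nn.ModuleDict({
            "text": nn.Linear(dim_text, hidden),
            "audio": nn.Linear(dim_audio, hidden),
            "video": nn.Linear(dim_video, hidden),
        })
        # One scalar attention score per modality.
        self.score = nn.Linear(hidden, 1)
        # Regressor over the weighted, concatenated modalities.
        self.out = nn.Linear(3 * hidden, 1)

    def forward(self, text, audio, video):
        feats = [torch.tanh(self.proj[k](x))
                 for k, x in (("text", text), ("audio", audio), ("video", video))]
        # (batch, 3) attention weights, one per modality.
        scores = torch.cat([self.score(f) for f in feats], dim=-1)
        weights = torch.softmax(scores, dim=-1)
        # Scale each modality by its contribution, then concatenate.
        weighted = [w.unsqueeze(-1) * f
                    for w, f in zip(weights.unbind(dim=-1), feats)]
        return self.out(torch.cat(weighted, dim=-1))

# Usage with a batch of 4 utterance-level features (dimensions are made up).
model = SoftAttentionFusion(dim_text=768, dim_audio=33, dim_video=709)
pred = model(torch.randn(4, 768), torch.randn(4, 33), torch.randn(4, 709))
print(pred.shape)  # torch.Size([4, 1])
```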
Data Availability
The dataset analysed during the current study is available in the MMSA repository at https://github.com/thuiar/MMSA or https://github.com/thuiar/Self-MM.
Acknowledgements
This research was supported by the National Natural Science Foundation of China (No. 61672190).
Ethics declarations
Conflicts of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Luo, Y., Wu, R., Liu, J. et al. Attention fusion network for multimodal sentiment analysis. Multimed Tools Appl 83, 8207–8217 (2024). https://doi.org/10.1007/s11042-023-15762-7