
Hybrid Representation and Decision Fusion towards Visual-textual Sentiment

Published: 01 April 2023

Abstract

The rising use of online media has changed the social habits of the public: users have gradually become accustomed to sharing daily experiences and publishing personal opinions on social networks. Social data carrying emotions and attitudes provide significant decision support for numerous sentiment analysis tasks. Conventional sentiment analysis methods consider only the textual modality and break down in multimodal scenarios, while common multimodal approaches focus only on the interactive relationship between modalities and ignore the unique intra-modal information. This work proposes a hybrid fusion network that captures both inter-modal and intra-modal features. First, in the intermediate fusion stage, a multi-head visual attention extracts accurate semantic and sentimental information from textual embedding representations with the assistance of visual features. Then, in the late fusion stage, multiple base classifiers are trained to learn independent and diverse discriminative information from the different modal representations. The final decision is obtained by combining the decision supports of the base classifiers through a decision fusion method. To improve the generalization of the hybrid fusion network, a similarity loss is employed to inject decision diversity into the whole model. Empirical results on multimodal datasets demonstrate that the proposed model achieves higher accuracy and better generalization than baseline multimodal sentiment analysis methods.
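The abstract describes the architecture only at a high level. The sketch below is a minimal, hypothetical PyTorch rendering of the three ideas it names: multi-head visual attention over textual embeddings (intermediate fusion), per-representation base classifiers whose decisions are fused (late fusion), and a similarity loss that encourages decision diversity. All layer sizes, the averaging-based decision fusion, and the cosine form of the similarity loss are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HybridFusionSketch(nn.Module):
    """Toy hybrid fusion model: intermediate visual attention + late decision fusion."""

    def __init__(self, text_dim=768, img_dim=512, d_model=256, n_heads=4, n_classes=3):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, d_model)
        self.img_proj = nn.Linear(img_dim, d_model)
        # Intermediate fusion: multi-head attention in which the visual feature
        # queries the textual token embeddings (assumed form of "visual attention").
        self.visual_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Late fusion: independent base classifiers, one per representation.
        self.text_clf = nn.Linear(d_model, n_classes)
        self.img_clf = nn.Linear(d_model, n_classes)
        self.fused_clf = nn.Linear(d_model, n_classes)

    def forward(self, text_tokens, img_feat):
        # text_tokens: (B, L, text_dim), e.g. token embeddings from a text encoder
        # img_feat:    (B, img_dim),     e.g. a pooled CNN image feature
        t = self.text_proj(text_tokens)                       # (B, L, d_model)
        v = self.img_proj(img_feat).unsqueeze(1)              # (B, 1, d_model)
        fused, _ = self.visual_attn(query=v, key=t, value=t)  # (B, 1, d_model)

        # Independent decisions from intra-modal and inter-modal representations.
        logits = [
            self.text_clf(t.mean(dim=1)),      # text-only classifier
            self.img_clf(v.squeeze(1)),        # image-only classifier
            self.fused_clf(fused.squeeze(1)),  # visually attended text classifier
        ]
        probs = torch.stack([F.softmax(l, dim=-1) for l in logits])  # (3, B, C)

        # Decision fusion: average the class distributions of the base classifiers.
        final = probs.mean(dim=0)

        # Similarity loss (one plausible form): penalise pairwise similarity
        # between base-classifier outputs so their decisions stay diverse.
        pairs = [(0, 1), (0, 2), (1, 2)]
        sim_loss = sum(F.cosine_similarity(probs[i], probs[j], dim=-1).mean()
                       for i, j in pairs) / len(pairs)
        return final, logits, sim_loss
```

In such a setup the base classifiers would be trained jointly with the task loss on the fused decision, with the similarity term added as a regularizer to keep their outputs from collapsing onto one another.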



Published In

ACM Transactions on Intelligent Systems and Technology  Volume 14, Issue 3
June 2023
451 pages
ISSN:2157-6904
EISSN:2157-6912
DOI:10.1145/3587032
  • Editor: Huan Liu

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 April 2023
Online AM: 06 February 2023
Accepted: 23 January 2023
Revised: 13 October 2022
Received: 04 April 2021
Published in TIST Volume 14, Issue 3


Author Tags

  1. Decision fusion
  2. multimodal
  3. representation fusion
  4. social network

Qualifiers

  • Research-article


Cited By

  • (2024) An Efficient Aspect-based Sentiment Classification with Hybrid Word Embeddings and CNN Framework. International Journal of Sensors, Wireless Communications and Control 14, 1 (45–54). DOI: 10.2174/0122103279275188231205094007. Online publication date: Mar-2024.
  • (2024) SKEDS — An external knowledge supported logistic regression approach for document-level sentiment classification. Expert Systems with Applications: An International Journal 238, Part D. DOI: 10.1016/j.eswa.2023.121987. Online publication date: 27-Feb-2024.
  • (2024) Dynamic multi-scale topological representation for enhancing network intrusion detection. Computers and Security 135, C. DOI: 10.1016/j.cose.2023.103516. Online publication date: 10-Jan-2024.
  • (2024) Abstraction and decision fusion architecture for resource-aware image understanding with application on handwriting character classification. Applied Soft Computing 162 (111813). DOI: 10.1016/j.asoc.2024.111813. Online publication date: Sep-2024.
  • (2023) Active learning for data streams: a survey. Machine Learning 113, 1 (185–239). DOI: 10.1007/s10994-023-06454-2. Online publication date: 20-Nov-2023.
