
Hybrid Representation and Decision Fusion towards Visual-textual Sentiment

Published: 01 April 2023

Abstract

The rising use of online media has changed the social habits of the public: users have gradually become accustomed to sharing daily experiences and publishing personal opinions on social networks. Social data carrying emotions and attitudes provide significant decision support for numerous sentiment analysis tasks. Conventional sentiment analysis methods consider only the textual modality and break down in multimodal scenarios, while common multimodal approaches focus only on the interactive relationship between modalities and ignore the unique intra-modal information. This work proposes a hybrid fusion network that captures both inter-modal and intra-modal features. First, in the intermediate fusion stage, a multi-head visual attention extracts accurate semantic and sentimental information from textual embedding representations with the assistance of visual features. Then, in the late fusion stage, multiple base classifiers are trained to learn independent and diverse discriminative information from the different modal representations. The final decision is obtained by combining the decision supports of the base classifiers through a decision fusion method. To improve the generalization of the hybrid fusion network, a similarity loss is employed to inject decision diversity into the whole model. Empirical results on multimodal datasets demonstrate that the proposed model achieves higher accuracy and better generalization than baseline multimodal sentiment analysis methods.
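The abstract describes the architecture only at a high level. The sketch below is a minimal, hypothetical PyTorch rendering of the three ideas it names: multi-head visual attention over textual embeddings (intermediate fusion), per-representation base classifiers whose decisions are fused (late fusion), and a similarity loss that encourages decision diversity. All layer sizes, the averaging-based decision fusion, and the cosine form of the similarity loss are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HybridFusionSketch(nn.Module):
    """Toy hybrid fusion model: intermediate visual attention + late decision fusion."""

    def __init__(self, text_dim=768, img_dim=512, d_model=256, n_heads=4, n_classes=3):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, d_model)
        self.img_proj = nn.Linear(img_dim, d_model)
        # Intermediate fusion: multi-head attention in which the visual feature
        # queries the textual token embeddings (assumed form of "visual attention").
        self.visual_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Late fusion: independent base classifiers, one per representation.
        self.text_clf = nn.Linear(d_model, n_classes)
        self.img_clf = nn.Linear(d_model, n_classes)
        self.fused_clf = nn.Linear(d_model, n_classes)

    def forward(self, text_tokens, img_feat):
        # text_tokens: (B, L, text_dim), e.g. token embeddings from a text encoder
        # img_feat:    (B, img_dim),     e.g. a pooled CNN image feature
        t = self.text_proj(text_tokens)                       # (B, L, d_model)
        v = self.img_proj(img_feat).unsqueeze(1)              # (B, 1, d_model)
        fused, _ = self.visual_attn(query=v, key=t, value=t)  # (B, 1, d_model)

        # Independent decisions from intra-modal and inter-modal representations.
        logits = [
            self.text_clf(t.mean(dim=1)),      # text-only classifier
            self.img_clf(v.squeeze(1)),        # image-only classifier
            self.fused_clf(fused.squeeze(1)),  # visually attended text classifier
        ]
        probs = torch.stack([F.softmax(l, dim=-1) for l in logits])  # (3, B, C)

        # Decision fusion: average the class distributions of the base classifiers.
        final = probs.mean(dim=0)

        # Similarity loss (one plausible form): penalise pairwise similarity
        # between base-classifier outputs so their decisions stay diverse.
        pairs = [(0, 1), (0, 2), (1, 2)]
        sim_loss = sum(F.cosine_similarity(probs[i], probs[j], dim=-1).mean()
                       for i, j in pairs) / len(pairs)
        return final, logits, sim_loss
```

In such a setup the base classifiers would be trained jointly with the task loss on the fused decision, with the similarity term added as a regularizer to keep their outputs from collapsing onto one another.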



Published In

ACM Transactions on Intelligent Systems and Technology  Volume 14, Issue 3
June 2023
451 pages
ISSN:2157-6904
EISSN:2157-6912
DOI:10.1145/3587032
  • Editor: Huan Liu

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 01 April 2023
Online AM: 06 February 2023
Accepted: 23 January 2023
Revised: 13 October 2022
Received: 04 April 2021
Published in TIST Volume 14, Issue 3


Author Tags

  1. Decision fusion
  2. multimodal
  3. representation fusion
  4. social network

Qualifiers

  • Research-article


Cited By

  • (2024) An Efficient Aspect-based Sentiment Classification with Hybrid Word Embeddings and CNN Framework. International Journal of Sensors, Wireless Communications and Control 14, 1 (45–54). DOI: 10.2174/0122103279275188231205094007. Online publication date: Mar-2024.
  • (2024) SKEDS — An external knowledge supported logistic regression approach for document-level sentiment classification. Expert Systems with Applications: An International Journal 238, Part D. DOI: 10.1016/j.eswa.2023.121987. Online publication date: 27-Feb-2024.
  • (2024) Dynamic multi-scale topological representation for enhancing network intrusion detection. Computers and Security 135, C. DOI: 10.1016/j.cose.2023.103516. Online publication date: 10-Jan-2024.
  • (2024) Abstraction and decision fusion architecture for resource-aware image understanding with application on handwriting character classification. Applied Soft Computing 162 (111813). DOI: 10.1016/j.asoc.2024.111813. Online publication date: Sep-2024.
  • (2023) Active learning for data streams: a survey. Machine Learning 113, 1 (185–239). DOI: 10.1007/s10994-023-06454-2. Online publication date: 20-Nov-2023.
