Abstract
With the popularity of smart devices and online social media platforms, people express their views in various modalities such as text, images, and audio. Recent research in sentiment analysis is therefore no longer limited to a single modality of information; instead, it combines all available modalities to predict sentiment more accurately. Multimodal sentiment analysis (MSA) is the process of extracting sentiment from multiple modalities such as text, images, and audio. Existing research works predict the sentiment of each modality independently and then leverage these individual predictions to derive the final sentiment. This paper presents an MSA approach for obtaining the final sentiment of an image-text tweet using multimodal decision-level fusion that incorporates both the features of individual modalities and the inter-modal semantic relations. A dataset is prepared from an existing benchmark MSA dataset by annotating each tweet as a whole with its final sentiment after assessing all the modalities. The proposed approach is evaluated on this dataset and compared with state-of-the-art MSA methods. An in-depth analysis of the comparison results shows that the proposed approach outperforms existing methods in terms of accuracy and F1-score.
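The decision-level (late) fusion described above can be illustrated with a minimal sketch: independent text and image classifiers each emit a sentiment probability distribution, and a fusion rule combines them into the final label. The weights below are hypothetical placeholders, not the paper's learned fusion; the paper additionally incorporates inter-modal semantic relations, which this sketch omits.

```python
import numpy as np

def late_fusion(p_text: np.ndarray, p_image: np.ndarray,
                w_text: float = 0.6, w_image: float = 0.4) -> int:
    """Combine per-modality sentiment distributions by weighted averaging.

    p_text, p_image: probability vectors over (negative, neutral, positive),
    e.g. from independently trained text and image classifiers. The weights
    here are illustrative assumptions only.
    """
    fused = w_text * p_text + w_image * p_image
    return int(np.argmax(fused))  # index of the predicted sentiment class

# Example: the text classifier leans positive, the image classifier leans neutral.
p_t = np.array([0.1, 0.2, 0.7])
p_i = np.array([0.2, 0.5, 0.3])
print(late_fusion(p_t, p_i))  # -> 2 (positive)
```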
Data Availability Statement
We collected our data from the MVSA-Single (MVSA-S) dataset, which is publicly available and described in Section 3. The collected data was further annotated to fit our current research work; this annotated dataset is available from the corresponding author on request.
Ethics declarations
Conflict of Interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Chakraborty, D., Rudrapal, D. & Bhattacharya, B. A multimodal sentiment analysis approach for tweets by comprehending co-relations between information modalities. Multimed Tools Appl 83, 50061–50085 (2024). https://doi.org/10.1007/s11042-023-17569-y