
Self-Adaptive Representation Learning Model for Multi-Modal Sentiment and Sarcasm Joint Analysis

Published: 11 January 2024

Abstract

Sentiment and sarcasm are intimately related and complex, as sarcasm often deliberately elicits an emotional response to achieve a specific purpose. The main challenges in multi-modal joint sentiment and sarcasm detection are fusing multi-modal representations and modeling the intrinsic relationship between sentiment and sarcasm. To address these challenges, we propose a single-input-stream self-adaptive representation learning model (SRLM) for joint sentiment and sarcasm recognition. Specifically, we divide the image into blocks to learn serialized image features and fuse them with textual features as the input to the target model. We then introduce an adaptive representation learning network that uses a gating mechanism for sarcasm and sentiment classification. In this framework, each task is equipped with a dedicated expert network responsible for learning task-specific information, while shared expert knowledge is acquired and weighted through the gating network. Finally, comprehensive experiments on two publicly available datasets, Memotion and MUStARD, demonstrate the effectiveness of the proposed model compared to state-of-the-art baselines, revealing a notable improvement in performance on both the sentiment and sarcasm tasks.
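The gated multi-task design the abstract describes (per-task dedicated experts plus shared experts mixed by a gating network) resembles a mixture-of-experts head. Below is a minimal PyTorch sketch of such a head, not the authors' released code; the class name GatedMultiTaskHead, the layer sizes, the number of experts, and the exact wiring of shared and dedicated experts are illustrative assumptions.

```python
# Sketch of a gated multi-task head: shared experts weighted per task by a
# gating network, plus one dedicated expert per task. All names and sizes
# are assumptions, not the paper's specification.
import torch
import torch.nn as nn


class GatedMultiTaskHead(nn.Module):
    def __init__(self, dim=768, num_shared_experts=4, hidden=256):
        super().__init__()
        # Shared experts: learn knowledge useful to both tasks.
        self.shared = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, hidden), nn.ReLU())
             for _ in range(num_shared_experts)]
        )
        # Dedicated experts: task-specific information for each task.
        self.sent_expert = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU())
        self.sarc_expert = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU())
        # Per-task gates produce softmax weights over the shared experts.
        self.sent_gate = nn.Linear(dim, num_shared_experts)
        self.sarc_gate = nn.Linear(dim, num_shared_experts)
        self.sent_cls = nn.Linear(hidden, 3)  # e.g. negative / neutral / positive
        self.sarc_cls = nn.Linear(hidden, 2)  # sarcastic / not sarcastic

    def forward(self, x):
        # x: (batch, dim) fused representation of serialized image-block
        # features and text, e.g. the pooled output of a single-input-stream
        # transformer encoder (an assumption about the upstream model).
        shared_out = torch.stack([e(x) for e in self.shared], dim=1)  # (B, E, H)
        sent_w = torch.softmax(self.sent_gate(x), dim=-1).unsqueeze(-1)  # (B, E, 1)
        sarc_w = torch.softmax(self.sarc_gate(x), dim=-1).unsqueeze(-1)
        # Each task combines gated shared knowledge with its dedicated expert.
        sent_feat = (sent_w * shared_out).sum(dim=1) + self.sent_expert(x)
        sarc_feat = (sarc_w * shared_out).sum(dim=1) + self.sarc_expert(x)
        return self.sent_cls(sent_feat), self.sarc_cls(sarc_feat)


# Usage: one fused feature vector per example yields logits for both tasks.
head = GatedMultiTaskHead()
sent_logits, sarc_logits = head(torch.randn(8, 768))
```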


Cited By

  • (2025) Supposititious Sarcasm Detection and Sentiment Analysis Coping Hindi Language in Social Networks Harnessing Zipf-Mandelbrot Probabilistic Optimisation and Perplexity Entropy Learning. ACM Transactions on Asian and Low-Resource Language Information Processing. DOI: 10.1145/3712061. Online publication date: 16-Jan-2025.
  • (2024) Textual Context guided Vision Transformer with Rotated Multi-Head Attention for Sentiment Analysis. Companion Proceedings of the ACM Web Conference 2024, 1823–1830. DOI: 10.1145/3589335.3651968. Online publication date: 13-May-2024.


        Published In

        ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 20, Issue 5
        May 2024, 650 pages
        EISSN: 1551-6865
        DOI: 10.1145/3613634
        Editor: Abdulmotaleb El Saddik

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Publication History

        Published: 11 January 2024
        Online AM: 05 December 2023
        Accepted: 27 November 2023
        Revised: 23 November 2023
        Received: 27 September 2023
        Published in TOMM Volume 20, Issue 5


        Author Tags

        1. Multi-modal sentiment analysis
        2. sarcasm detection
        3. representation learning
        4. multi-task learning

        Qualifiers

        • Research-article

        Funding Sources

        • Researchers Supporting Project
        • King Saud University, Riyadh, Saudi Arabia
        • Foundation of Key Laboratory of Dependable Service Computing in Cyber-Physical-Society (Ministry of Education), Chongqing University
        • National Science Foundation of China
        • Fellowship from the China Postdoctoral Science Foundation

