Multi-Modal Sarcasm Detection with Interactive In-Modal and Cross-Modal Graphs

Authors: Bin Liang, Chenwei Lou, Xiang Li, Lin Gui, Min Yang, Ruifeng Xu

MM '21: Proceedings of the 29th ACM International Conference on Multimedia

Pages 4707 - 4715

https://doi.org/10.1145/3474085.3475190

Published: 17 October 2021

Abstract

Sarcasm is a sophisticated linguistic act used to express incongruity in someone's implied sentiment, and it is pervasive on social media platforms. Compared with sarcasm detection on text alone, multi-modal sarcasm detection is better suited to rapidly growing social media platforms, where people are interested in creating multi-modal messages. For tweets on Twitter that consist of text and images, the key to improving multi-modal sarcasm detection is determining the incongruity relations between text and image. In this paper, we investigate multi-modal sarcasm detection from a novel perspective: we determine the sentiment inconsistencies within a modality and across modalities by constructing heterogeneous in-modal and cross-modal graphs (InCrossMGs) for each multi-modal example. Based on these graphs, we explore an interactive graph convolutional network (GCN) structure that jointly and interactively learns the incongruity relations of the in-modal and cross-modal graphs to determine the significant clues for sarcasm detection. Experimental results demonstrate that our proposed model achieves state-of-the-art performance in multi-modal sarcasm detection.
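To make the graph-based idea concrete, the following is a minimal sketch of one standard GCN propagation step, ReLU(D^{-1/2}(A + I)D^{-1/2} H W), applied once to an in-modal graph and once to a cross-modal graph over the same nodes. It is not the InCrossMGs model: the abstract does not specify how edges are built or how the two graphs interact, so the toy adjacency matrices, the node features, the identity weight matrix, and the helper names `normalize_adjacency`, `matmul`, and `gcn_layer` are all assumptions made for illustration.

```python
import math


def normalize_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a square, non-negative adjacency matrix."""
    n = len(adj)
    # Add self-loops so every node keeps its own features and has degree >= 1.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(adj, feats, weight):
    """One graph convolution step followed by ReLU."""
    hidden = matmul(matmul(normalize_adjacency(adj), feats), weight)
    return [[max(0.0, v) for v in row] for row in hidden]


# Hypothetical toy example: nodes 0 and 1 are text tokens, node 2 is an image patch.
in_modal = [[0, 1, 0],
            [1, 0, 0],
            [0, 0, 0]]      # edge only between the two text tokens
cross_modal = [[0, 0, 1],
               [0, 0, 1],
               [1, 1, 0]]   # edges only between text tokens and the image patch
feats = [[1.0, 0.0],
         [0.0, 1.0],
         [1.0, 1.0]]        # made-up 2-dimensional node features
weight = [[1.0, 0.0],
          [0.0, 1.0]]       # identity weights, chosen only to keep the numbers readable

h_in = gcn_layer(in_modal, feats, weight)
h_cross = gcn_layer(cross_modal, feats, weight)
```

In this sketch the in-modal pass mixes the two text tokens with each other and leaves the image patch unchanged, while the cross-modal pass mixes each text token with the image patch. How the paper combines the two kinds of representation is not described in the abstract and is left out here.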

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing
      1. Information extraction
  2. Machine learning
    1. Learning paradigms
      1. Supervised learning
        1. Supervised learning by classification
2. Information systems
  1. Information retrieval
    1. Retrieval tasks and goals
      1. Clustering and classification
      2. Sentiment analysis
  2. Information systems applications
    1. Multimedia information systems

Published In

MM '21: Proceedings of the 29th ACM International Conference on Multimedia

October 2021

5796 pages

ISBN: 9781450386517

DOI: 10.1145/3474085

General Chairs: Heng Tao Shen (University of Electronic Science and Technology of China, China), Yueting Zhuang (Zhejiang University, China), John R. Smith (IBM, USA)

Program Chairs: Yang Yang (University of Electronic Science and Technology of China, China), Pablo Cesar (CWI & TU Delft, The Netherlands), Florian Metze (Facebook, Inc., USA), Balakrishnan Prabhakaran (University of Texas at Dallas, USA)

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMM: ACM Special Interest Group on Multimedia

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

UK EPSRC
National Natural Science Foundation of China
Youth Innovation Promotion Association of CAS China
Guangdong Province Covid-19 Pandemic Control Research Funding
Shenzhen Foundational Research Funding
China Postdoctoral Science Foundation
Shenzhen Science and Technology Innovation Program

Conference

MM '21: ACM Multimedia Conference

Sponsor: SIGMM

October 20 - 24, 2021

Virtual Event, China

Acceptance Rates

Overall Acceptance Rate 995 of 4,171 submissions, 24%

Article Metrics

Total Citations: 34
Total Downloads: 1,139
Downloads (Last 12 months): 275
Downloads (Last 6 weeks): 23

Reflects downloads up to 03 Oct 2024
Cited By

Pokhriyal H, Jain G (2024). Sarcasm Detection. Harnessing Artificial Emotional Intelligence for Improved Human-Computer Interactions (197-221). Online publication date: 6-Jun-2024.
https://doi.org/10.4018/979-8-3693-2794-4.ch012
Zhong W, Zhang Z, Wu Q, Xue Y, Cai Q (2024). A Semantic Enhancement Framework for Multimodal Sarcasm Detection. Mathematics 12:2 (317). Online publication date: 18-Jan-2024.
https://doi.org/10.3390/math12020317
Fu H, Liu H, Wang H, Xu L, Lin J, Jiang D (2024). Multi-Modal Sarcasm Detection with Sentiment Word Embedding. Electronics 13:5 (855). Online publication date: 23-Feb-2024.
https://doi.org/10.3390/electronics13050855
Liu H, Yang B, Yu Z (2024). A Multi-View Interactive Approach for Multimodal Sarcasm Detection in Social Internet of Things with Knowledge Enhancement. Applied Sciences 14:5 (2146). Online publication date: 4-Mar-2024.
https://doi.org/10.3390/app14052146
Ou L, Li Z, Gurrin C, Kongkachandra R, Schoeffmann K, Dang-Nguyen D, Rossetto L, Satoh S, Zhou L (2024). Modeling Multi-Task Joint Training of Aggregate Networks for Multi-Modal Sarcasm Detection. Proceedings of the 2024 International Conference on Multimedia Retrieval (833-841). Online publication date: 30-May-2024.
https://dl.acm.org/doi/10.1145/3652583.3658015
Zhang Y, Yu Y, Zhao D, Li Z, Wang B, Hou Y, Tiwari P, Qin J (2024). Learning Multitask Commonness and Uniqueness for Multimodal Sarcasm Detection and Sentiment Analysis in Conversation. IEEE Transactions on Artificial Intelligence 5:3 (1349-1361). Online publication date: Mar-2024.
https://doi.org/10.1109/TAI.2023.3298328
Kang L, Liu J, Ye D, Zhou Z (2024). Context-Aware Dual Attention Network for Multimodal Sarcasm Detection. ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (12777-12781). Online publication date: 14-Apr-2024.
https://doi.org/10.1109/ICASSP48485.2024.10448377
Ren Y, Wang J, Liu J, Liu P, Li H, Zhu H, Sun L (2024). A Relation-Aware Heterogeneous Graph Transformer on Dynamic Fusion for Multimodal Classification Tasks. ICASSP 2024 - 2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (7855-7859). Online publication date: 14-Apr-2024.
https://doi.org/10.1109/ICASSP48485.2024.10446972
Al-Sabri R, Gao J, Chen J, Oloulade B, Wu Z (2024). AutoAMS: Automated attention-based multi-modal graph learning architecture search. Neural Networks 179 (106427). Online publication date: Nov-2024.
https://doi.org/10.1016/j.neunet.2024.106427
Yu B, Wang H, Xi Z (2024). Multifaceted and deep semantic alignment network for multimodal sarcasm detection. Knowledge-Based Systems 301 (112298). Online publication date: Oct-2024.
https://doi.org/10.1016/j.knosys.2024.112298
