DOI: 10.1145/3589334.3648143
Research Article | Open Access

Contrastive Learning for Multimodal Classification of Crisis related Tweets

Published: 13 May 2024

Abstract

Multimodal tasks require learning a joint representation of the constituent modalities of the data. Contrastive learning learns such a joint representation using a contrastive loss. For example, CLIP takes image-caption pairs as input and is trained to maximize the similarity between an image and its corresponding caption, while minimizing the similarity for arbitrary image-caption pairs. This approach operates on the premise that the caption depicts the image's content. However, this assumption does not always hold for tweets that contain both text and images: previous studies have indicated that the relationship between the image and the text in a tweet is more intricate. We study the effectiveness of pre-trained multimodal contrastive learning models, specifically CLIP and ALIGN, on the task of classifying multimodal crisis-related tweets. Our experiments on two publicly available datasets, CrisisMMD and DMD, show that despite the intricate image-text relationships in tweets, pre-trained contrastive learning models fine-tuned with task-specific data produce better results than prior approaches to the multimodal classification of crisis-related tweets. Additionally, the experiments show that the contrastive learning models are effective in low-data few-shot and cross-domain settings.
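The abstract describes the CLIP training objective (matched image-caption pairs pulled together, arbitrary pairs pushed apart) and the idea of fine-tuning a pre-trained contrastive model on task-specific crisis tweet data. The sketch below illustrates both pieces in PyTorch with the Hugging Face CLIP checkpoint. It is not the authors' code: the classification head, its sizes, and names such as CrisisTweetClassifier are illustrative assumptions, not details taken from the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F
from transformers import CLIPModel, CLIPProcessor

def clip_style_contrastive_loss(image_emb, text_emb, temperature=0.07):
    # Symmetric InfoNCE loss as described in the abstract: the diagonal of the
    # similarity matrix holds the true image-caption pairs; every off-diagonal
    # entry is treated as a mismatched (negative) pair.
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature           # (batch, batch)
    targets = torch.arange(logits.size(0), device=logits.device)
    loss_i2t = F.cross_entropy(logits, targets)               # image -> caption
    loss_t2i = F.cross_entropy(logits.t(), targets)           # caption -> image
    return (loss_i2t + loss_t2i) / 2

class CrisisTweetClassifier(nn.Module):
    # Fine-tuning sketch: encode tweet text and image with a pre-trained CLIP
    # model, fuse the two embeddings, and train a small classification head
    # (e.g. informative vs. not informative) on task-specific data.
    def __init__(self, num_classes, clip_name="openai/clip-vit-base-patch32"):
        super().__init__()
        self.clip = CLIPModel.from_pretrained(clip_name)
        dim = self.clip.config.projection_dim                 # shared embedding size
        self.head = nn.Sequential(
            nn.Linear(2 * dim, 256),
            nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, input_ids, attention_mask, pixel_values):
        text_emb = self.clip.get_text_features(input_ids=input_ids,
                                               attention_mask=attention_mask)
        image_emb = self.clip.get_image_features(pixel_values=pixel_values)
        return self.head(torch.cat([text_emb, image_emb], dim=-1))

# Usage sketch: preprocess one (tweet text, image) pair and score it.
# processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
# model = CrisisTweetClassifier(num_classes=2)
# inputs = processor(text=["flooded street after the storm"], images=pil_image,
#                    return_tensors="pt", padding=True, truncation=True)
# logits = model(**inputs)

Concatenation is only one simple fusion choice used here for illustration; the paper evaluates CLIP and ALIGN specifically, and an ALIGN checkpoint could presumably be substituted for the CLIP encoder in the same fashion.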

Supplemental Material

MP4 file: Supplemental video


Cited By

  • (2025) Informative task classification with concatenated embeddings using deep learning on crisisMMD. International Journal of Computers and Applications 47(2), 123-140. https://doi.org/10.1080/1206212X.2024.2447066 (online publication date: 8 January 2025)
  • (2024) Interactive Event Sifting using Bayesian Graph Neural Networks. 2024 IEEE International Workshop on Information Forensics and Security (WIFS), 1-5. https://doi.org/10.1109/WIFS61860.2024.10810718 (online publication date: 2 December 2024)


Published In

WWW '24: Proceedings of the ACM Web Conference 2024
May 2024
4826 pages
ISBN:9798400701719
DOI:10.1145/3589334
This work is licensed under a Creative Commons Attribution 4.0 International License.


Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 13 May 2024


Author Tags

  1. contrastive learning
  2. crisis related tweets
  3. disaster
  4. humanitarian
  5. multimodal classification

Qualifiers

  • Research-article

Conference

WWW '24: The ACM Web Conference 2024
May 13-17, 2024
Singapore, Singapore

Acceptance Rates

Overall Acceptance Rate 1,899 of 8,196 submissions, 23%


Article Metrics

  • Downloads (last 12 months): 613
  • Downloads (last 6 weeks): 87
Reflects downloads up to 02 Feb 2025

