
Towards Cross-lingual Social Event Detection with Hybrid Knowledge Distillation

Online AM: 27 August 2024

Abstract

Graph neural networks (GNNs) have recently shown promising performance on social event detection tasks. However, most studies focus on monolingual data in languages with abundant training samples, leaving lesser-spoken languages relatively unexplored. In this work, we therefore present a GNN-based framework that integrates cross-lingual word embeddings into graph knowledge distillation to detect events in low-resource language data streams. To achieve this, our novel cross-lingual knowledge distillation framework, called CLKD, exploits prior knowledge learned from similar threads in English to compensate for the paucity of annotated data. Specifically, to extract sufficient useful knowledge, we propose a hybrid distillation method that transfers both feature-wise and relation-wise information. To transfer both kinds of knowledge effectively, we add a cross-lingual module to the feature-wise distillation to eliminate the language gap, and we selectively choose beneficial relations in the relation-wise distillation to avoid distraction caused by the teacher's misjudgments. The proposed CLKD framework also supports different configurations to suit both offline and online settings. Experiments on real-world datasets show that the framework is highly effective at detecting events in languages where training samples are scarce.
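The abstract describes the hybrid objective only at a high level. As a rough illustration, the following minimal PyTorch sketch shows one plausible shape for such a loss; it is not taken from the paper. The function name hybrid_distillation_loss, the linear lingual_map standing in for the cross-lingual module, and the boolean teacher_correct mask standing in for the selective choice of relations are all hypothetical.

```python
import torch
import torch.nn.functional as F
from torch import nn


def hybrid_distillation_loss(student_feats: torch.Tensor,
                             teacher_feats: torch.Tensor,
                             lingual_map: nn.Module,
                             teacher_correct: torch.Tensor,
                             alpha: float = 0.5) -> torch.Tensor:
    """Illustrative hybrid (feature-wise + relation-wise) distillation loss.

    student_feats:   (N, d_s) message embeddings from the low-resource student GNN
    teacher_feats:   (N, d_t) message embeddings from the English teacher GNN
    lingual_map:     e.g. nn.Linear(d_s, d_t); a learned cross-lingual mapping
                     applied before feature matching to bridge the language gap
    teacher_correct: (N,) bool mask, True where the teacher's prediction is trusted
    """
    # Feature-wise distillation: align mapped student features with the
    # (frozen) teacher features.
    feat_loss = F.mse_loss(lingual_map(student_feats), teacher_feats.detach())

    # Relation-wise distillation: match pairwise cosine-similarity structure.
    s_norm = F.normalize(student_feats, dim=1)
    t_norm = F.normalize(teacher_feats, dim=1)
    s_rel = s_norm @ s_norm.T
    t_rel = (t_norm @ t_norm.T).detach()

    # Selective relations: keep only pairs where the teacher is trusted on
    # both endpoints, so the student does not imitate structure built on
    # the teacher's misjudgments.
    mask = teacher_correct.unsqueeze(0) & teacher_correct.unsqueeze(1)
    rel_loss = F.mse_loss(s_rel[mask], t_rel[mask])

    return alpha * feat_loss + (1.0 - alpha) * rel_loss
```

The mask is where the abstract's "selectively choose beneficial relations" shows up in this sketch: any pair anchored on a sample the teacher misjudges is excluded from the relation-wise term.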


Published In

ACM Transactions on Knowledge Discovery from Data (Just Accepted)
EISSN: 1556-472X

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Online AM: 27 August 2024
Accepted: 21 August 2024
Revised: 04 May 2024
Received: 11 November 2023


Author Tags

  1. Social event detection
  2. low-resource languages
  3. cross-lingual knowledge distillation
  4. graph neural network

Qualifiers

  • Research-article

