Graph Embeddings for Abusive Language Detection

Cécillon, Noé; Labatut, Vincent; Dufour, Richard; Linarès, Georges

doi:10.1007/s42979-020-00413-7

Original Research
Published: 12 January 2021

Volume 2, article number 37 (2021)

SN Computer Science

Noé Cécillon¹ (ORCID: orcid.org/0000-0002-9889-0931), Vincent Labatut¹, Richard Dufour¹ & Georges Linarès¹

Abstract

Abusive behaviors are common on online social networks. The increasing frequency of anti-social behaviors forces the hosts of online platforms to find new solutions to address this problem. Automating the moderation process has thus received a lot of interest in the past few years. Various methods have been proposed, most based on the exchanged content, and one relying on the structure and dynamics of the conversation. The latter has the advantage of being language-independent; however, it relies on a hand-crafted set of topological measures that are computationally expensive and not necessarily suited to all situations. In the present paper, we propose to use recent graph embedding approaches to automatically learn representations of conversational graphs depicting message exchanges. We compare two categories: node vs. whole-graph embeddings. We experiment with a total of 8 approaches and apply them to a dataset of online messages. We also study more precisely which aspects of the graph structure are leveraged by each approach. Our study shows that the representation produced by certain embeddings captures the information conveyed by specific topological measures, but misses other aspects.
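
To make the distinction between node-level and whole-graph representations concrete, here is a minimal sketch. It is not one of the 8 approaches evaluated in the article. It uses Weisfeiler-Lehman relabeling, the kind of structural feature that graph2vec-style whole-graph embeddings build on, and it assumes a deliberately simplified conversational graph: the `(author, addressee)` message format and both function names are hypothetical, chosen only for illustration.

```python
from collections import Counter, defaultdict


def build_conversation_graph(messages):
    """Build an undirected graph from (author, addressee) pairs.

    Assumption: users are nodes and one edge links each author to the
    user they address. This is a simplified stand-in for a real
    conversational graph.
    """
    adj = defaultdict(set)
    for author, addressee in messages:
        adj[author].add(addressee)
        adj[addressee].add(author)
    return dict(adj)


def wl_features(adj, iterations=2):
    """Return (graph_bag, node_labels) after Weisfeiler-Lehman relabeling.

    node_labels: one structural label per node (node-level view).
    graph_bag:   counts of all labels seen (whole-graph view).
    """
    # Initial label of a node: its degree, written as a string.
    labels = {v: str(len(nbrs)) for v, nbrs in adj.items()}
    bag = Counter(labels.values())
    for _ in range(iterations):
        # New label: own label plus the sorted labels of the neighbors.
        labels = {
            v: labels[v] + "(" + ",".join(sorted(labels[u] for u in adj[v])) + ")"
            for v in adj
        }
        bag.update(labels.values())
    return bag, labels


if __name__ == "__main__":
    # Hypothetical toy conversation: three users replying in a triangle,
    # plus one user who only addresses "ann".
    messages = [("ann", "bob"), ("bob", "cat"), ("cat", "ann"), ("dan", "ann")]
    graph = build_conversation_graph(messages)
    graph_bag, node_labels = wl_features(graph)
    print("node-level label of 'ann':", node_labels["ann"])
    print("whole-graph feature counts:", dict(graph_bag))
```

The node-level label describes only the neighborhood of one user (for instance, the author of the message under review), while the bag of labels summarizes the entire conversation. Neither depends on user names or on message text, which is what makes structure-based representations language-independent.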

Author information

Authors and Affiliations

Laboratoire Informatique d’Avignon-LIA EA 4128, Avignon Université, Avignon, France
Noé Cécillon, Vincent Labatut, Richard Dufour & Georges Linarès

Corresponding author

Correspondence to Noé Cécillon.

Ethics declarations

Conflict of interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article is part of the topical collection “Social Media Analytics and its Evaluation” guest edited by Thomas Mandl, Sandip Modha and Prasenjit Majumder.

About this article

Cite this article

Cécillon, N., Labatut, V., Dufour, R. et al. Graph Embeddings for Abusive Language Detection. SN COMPUT. SCI. 2, 37 (2021). https://doi.org/10.1007/s42979-020-00413-7

Received: 06 May 2020
Accepted: 24 November 2020
Published: 12 January 2021
DOI: https://doi.org/10.1007/s42979-020-00413-7
