DOI: 10.1145/3603287.3651218
short-paper

Evaluation of Different Machine Learning and Deep Learning Techniques for Hate Speech Detection

Published: 27 April 2024

Abstract

Detecting online hate speech is important for creating safer online spaces. In this paper, we evaluate the performance of several machine learning (ML) and deep learning (DL) models in detecting hate speech on three different datasets. Among traditional ML algorithms, we evaluate Support Vector Machines (SVM), Naive Bayes, Decision Trees, Random Forests, and Logistic Regression. Among deep learning models, we evaluate Convolutional Neural Networks (CNN), Long Short-Term Memory (LSTM) networks, and the pre-trained BERT transformer model. Our experiments show that BERT outperformed all other models, with F1-scores of 90.6% on one dataset and 89.7% and 88.2% on the other two. CNN and LSTM came next, outperforming the traditional ML algorithms with F1-scores over 80% on all three datasets. Among the traditional ML models, SVM performed best, with a highest F1-score of 75.6%.
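The abstract compares models by F1-score, the harmonic mean of precision and recall. The sketch below shows how that metric can be computed for a binary hate/non-hate labeling. It is an illustration only: the abstract does not say how the authors averaged F1 across classes, so the binary, positive-class form is an assumption, and the function name `f1_score_binary` and the toy labels are hypothetical and are not data from the paper.

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """Return the F1-score for the `positive` class (assumed binary setting)."""
    pairs = list(zip(y_true, y_pred))
    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Guard against division by zero when a class is never predicted or never present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Toy labels (hypothetical): 1 = hate speech, 0 = not hate speech.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
score = f1_score_binary(y_true, y_pred)
print(f"F1 = {score:.3f}")
```

On these toy labels there are three true positives, one false positive, and one false negative, so precision and recall are both 0.75 and F1 is 0.75. A reported figure such as 90.6% corresponds to a value of 0.906 on this scale.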

Cited By

  • (2024) Hyperparameter Tuning of Pre-Trained Architectures for Multi-Modal Cyberbullying Detection. Advancing Cyber Security Through Quantum Cryptography, 465-502. DOI: 10.4018/979-8-3693-5961-7.ch017. Online publication date: 4-Oct-2024

      Published In

      ACMSE '24: Proceedings of the 2024 ACM Southeast Conference
      April 2024
      337 pages
      ISBN: 9798400702372
      DOI: 10.1145/3603287
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. BERT
      2. deep learning
      3. hate speech
      4. machine learning
      5. text classification

      Qualifiers

      • Short-paper
      • Research
      • Refereed limited

      Conference

      ACM SE '24: 2024 ACM Southeast Conference
      April 18-20, 2024
      Marietta, GA, USA

      Acceptance Rates

      ACMSE '24 Paper Acceptance Rate 44 of 137 submissions, 32%;
      Overall Acceptance Rate 502 of 1,023 submissions, 49%

      Article Metrics

      • Downloads (last 12 months): 79
      • Downloads (last 6 weeks): 6
      Reflects downloads up to 27 Jan 2025
