DOI: 10.1145/3605098.3635964
research-article

A Large Language Model Approach to Detect Hate Speech in Political Discourse Using Multiple Language Corpora

Published: 21 May 2024

Abstract

In an era of unprecedented digital connectivity, hate speech has become a focal point of public debate. Digital communication platforms have fundamentally changed how hate speech spreads: social media and messaging apps disseminate it rapidly, and the anonymity of the internet makes the problem worse. Computational methods have emerged as valuable tools for identifying and mitigating hate speech on social media. In this work, we use five corpora covering English, Italian, Filipino, German, and Turkish. We propose a Large Language Model (GPT-3) enhanced with Cross-Lingual Learning to improve hate speech detection in English and Italian. Our investigation uses a strategy called JL/CL+, which combines Joint Learning (JL) and Cascade Learning (CL). Even with data that exhibit lexical disparities, our approach achieves an F1-score of 96.58% for English and 92.05% for Italian.
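The abstract names the JL/CL+ strategy without detailing how the two components are combined. The Python sketch below illustrates one plausible reading, which is an assumption and is not taken from the paper: Joint Learning (JL) merges several language corpora into a single training set, Cascade Learning (CL) then continues training on the target language alone, and JL/CL+ chains the two, with a first joint stage over all auxiliary corpora plus the target corpus. The names `Example`, `joint_learning`, `jl_cl_plus_schedule`, and `f1_score` are hypothetical, and the step that actually fine-tunes GPT-3 on each stage is deliberately left out. The `f1_score` helper computes the standard binary F1 (the harmonic mean of precision and recall), which is the metric the abstract reports.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    """One labelled post (hypothetical schema, not the paper's data format)."""
    text: str
    label: int  # 1 = hate speech, 0 = not hate speech
    lang: str   # language code, e.g. "en", "it"


def joint_learning(corpora: dict[str, list[Example]], langs: list[str]) -> list[Example]:
    """JL (as assumed here): concatenate the corpora of several languages."""
    merged: list[Example] = []
    for lang in langs:
        merged.extend(corpora[lang])
    return merged


def jl_cl_plus_schedule(corpora: dict[str, list[Example]], target: str) -> list[list[Example]]:
    """Hypothetical JL/CL+ schedule: a joint stage over every corpus,
    then a cascade stage that fine-tunes on the target language only."""
    if target not in corpora:
        raise ValueError(f"no corpus for target language {target!r}")
    auxiliary = sorted(lang for lang in corpora if lang != target)
    joint_stage = joint_learning(corpora, auxiliary + [target])
    cascade_stage = list(corpora[target])
    return [joint_stage, cascade_stage]


def f1_score(y_true: list[int], y_pred: list[int], positive: int = 1) -> float:
    """Binary F1: harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # precision or recall is zero (or undefined)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

For example, with English, German, and Italian corpora and Italian as the target, the first stage holds every example and the second holds only the Italian ones; `f1_score([1, 1, 0, 0], [1, 0, 1, 0])` returns 0.5 because precision and recall are both 0.5. Keeping the schedule separate from the training call lets the same data plan feed any fine-tuning back end.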



    Published In

    SAC '24: Proceedings of the 39th ACM/SIGAPP Symposium on Applied Computing
    April 2024
    1898 pages
    ISBN: 9798400702433
    DOI: 10.1145/3605098
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. hate speech
    2. large language model
    3. cross-lingual learning
    4. machine learning
    5. natural language processing

    Qualifiers

    • Research-article

    Conference

    SAC '24

    Acceptance Rates

    Overall Acceptance Rate 1,650 of 6,669 submissions, 25%

    Article Metrics

    • Total citations: 0
    • Total downloads: 45
    • Downloads (last 12 months): 45
    • Downloads (last 6 weeks): 29
    Reflects downloads up to 02 Sep 2024
