Detecting Homophobic Speech in Soccer Tweets Using Large Language Models and Explainable AI

  • Conference paper
Social Networks Analysis and Mining (ASONAM 2024)

Abstract

Homophobic speech is a form of hate speech. Social media allows hate speech to spread rapidly and widely and, unlike offline hate speech, online content can persist indefinitely, prolonging its impact. Because of these adverse effects, policymakers have called on online platforms to do more to moderate and remove hate speech, including homophobic content. Although homophobic hate speech is prevalent in online soccer discourse, few studies examine this empirical context, and fewer still examine the use of Large Language Models (LLMs) to detect such speech. This study addresses this gap by proposing a text classification pipeline for homophobic speech. We introduce H-DICT, a new general dictionary for identifying potentially homophobic content in documents, and use it to curate and manually label a dataset of homophobic and non-homophobic samples from Twitter discourse on the UEFA European Football Championships (the Euros). We fine-tune and evaluate five LLMs based on the BERT architecture (BERT, DistilBERT, RoBERTa, BERT Hate, and RoBERTa Offensive) and use Integrated Gradients, an explainable AI technique, to explain each model’s predictions. RoBERTa Offensive, an LLM fine-tuned specifically for detecting offensive language, achieved the best performance of the five models.
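
To make the explanation step concrete, the following is a minimal sketch of Integrated Gradients on a toy model. It is an illustration only, not the pipeline described in the abstract: the paper applies the technique to fine-tuned BERT-based models, whereas this sketch assumes a small logistic bag-of-words scorer so that it runs with the standard library alone. The vocabulary, weights, bias, the placeholder token `slur_a`, and all function names are hypothetical and are not taken from the paper or from H-DICT.

```python
import math

# Hypothetical toy vocabulary and weights (assumptions for illustration only;
# they do not come from the paper, its dataset, or H-DICT).
VOCAB = ["great", "match", "slur_a", "referee"]
WEIGHTS = [-0.5, 0.1, 2.0, 0.3]
BIAS = -1.0


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(x):
    """Probability of the positive class for a feature vector x."""
    return sigmoid(sum(w * v for w, v in zip(WEIGHTS, x)) + BIAS)


def gradient(x):
    """Analytic gradient of predict(x) with respect to each input feature."""
    p = predict(x)
    return [p * (1.0 - p) * w for w in WEIGHTS]


def integrated_gradients(x, baseline=None, steps=200):
    """Approximate Integrated Gradients with a midpoint Riemann sum.

    Gradients are averaged along the straight line from the baseline to x,
    then scaled by (x - baseline) feature by feature.
    """
    if baseline is None:
        baseline = [0.0] * len(x)  # all-zero input, i.e. an empty text
    delta = [xi - bi for xi, bi in zip(x, baseline)]
    avg_grad = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [bi + alpha * di for bi, di in zip(baseline, delta)]
        for i, g in enumerate(gradient(point)):
            avg_grad[i] += g / steps
    return [di * gi for di, gi in zip(delta, avg_grad)]


def encode(text):
    """Count how often each vocabulary term occurs in a whitespace-split text."""
    tokens = text.lower().split()
    return [float(tokens.count(term)) for term in VOCAB]


x = encode("great match slur_a")
attributions = integrated_gradients(x)
for term, score in zip(VOCAB, attributions):
    print(f"{term:8s} {score:+.4f}")
```

The attributions sum, up to numerical error, to the difference between the model output on the input and on the baseline, which is the completeness property of Integrated Gradients. Terms absent from the input receive an attribution of zero because they do not differ from the baseline.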

Notes

  1. https://hatebase.org/

  2. https://github.com/GutoL/H-DICT

  3. https://huggingface.co/bert-base-uncased

  4. https://huggingface.co/distilbert/distilbert-base-uncased

  5. https://huggingface.co/FacebookAI/roberta-base

  6. https://huggingface.co/IMSyPP/hate_speech_en

  7. https://huggingface.co/cardiffnlp/twitter-roberta-base-offensive

Acknowledgment

The research in this paper was partially funded by the UK Arts and Humanities Research Council and the Irish Research Council (Grant Number AH/W001624/1) and the Federation Internationale de l’Automobile.

Author information

Corresponding author

Correspondence to Guto Leoni Santos.

Copyright information

© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Santos, G.L. et al. (2025). Detecting Homophobic Speech in Soccer Tweets Using Large Language Models and Explainable AI. In: Aiello, L.M., Chakraborty, T., Gaito, S. (eds) Social Networks Analysis and Mining. ASONAM 2024. Lecture Notes in Computer Science, vol 15211. Springer, Cham. https://doi.org/10.1007/978-3-031-78541-2_30

  • DOI: https://doi.org/10.1007/978-3-031-78541-2_30

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-78540-5

  • Online ISBN: 978-3-031-78541-2

  • eBook Packages: Computer Science, Computer Science (R0)
