Abstract
Homophobic speech is a form of hate speech. Social media enables hate speech to spread rapidly and widely, and unlike offline hate speech, online hate speech can persist indefinitely, thereby prolonging its impact. Because of this adverse impact, policymakers have called on online platforms to do more to moderate and remove hate speech, including homophobic content. While homophobic hate speech is prevalent in online soccer discourse, there are few studies on this empirical context in general, and specifically on the use of Large Language Models (LLMs) for detecting such speech. This study addresses this gap by proposing a text classification pipeline for homophobic speech. We introduce H-DICT, a new general dictionary for identifying potentially homophobic content in documents, and use this dictionary to curate and manually label a dataset of homophobic and non-homophobic samples from the UEFA European Football Championships (the Euros) discourse on Twitter. We fine-tune and evaluate five LLMs based on the BERT architecture (BERT, DistilBERT, RoBERTa, BERT Hate, and RoBERTa Offensive) and use Integrated Gradients, an explainable AI technique, to explain each model’s predictions. RoBERTa Offensive, an LLM fine-tuned specifically for detecting offensive language, performed best of the five models.
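The dictionary-based curation step described above can be pictured with a minimal sketch. It is not the authors' implementation: the contents of H-DICT are not given here, so `PLACEHOLDER_TERMS`, `build_pattern`, and `flag_candidates` are hypothetical names, and whole-word, case-insensitive matching is an assumed matching rule.

```python
import re

# Hypothetical placeholder entries; the actual H-DICT terms are not listed
# in this text, so neutral stand-ins are used instead.
PLACEHOLDER_TERMS = ["slur_a", "slur_b", "multi word slur"]


def build_pattern(terms):
    """Compile one case-insensitive regex matching any term as a whole word or phrase."""
    # Longer terms first, so a phrase wins over a shorter term it contains.
    escaped = sorted((re.escape(t) for t in terms), key=len, reverse=True)
    # Lookarounds require that no word character touches either side of the term.
    return re.compile(r"(?<!\w)(?:" + "|".join(escaped) + r")(?!\w)", re.IGNORECASE)


def flag_candidates(tweets, pattern):
    """Return (tweet, matched_terms) pairs for tweets containing at least one term."""
    flagged = []
    for tweet in tweets:
        matches = [m.group(0).lower() for m in pattern.finditer(tweet)]
        if matches:
            flagged.append((tweet, matches))
    return flagged


if __name__ == "__main__":
    sample = [
        "What a goal!",
        "He is such a SLUR_A honestly",
        "slur_amazing save",
        "a multi word slur here",
    ]
    for tweet, terms in flag_candidates(sample, build_pattern(PLACEHOLDER_TERMS)):
        print(terms, "<-", tweet)
```

A dictionary hit only marks a tweet as a candidate. In the pipeline described above, the candidates are still labelled manually as homophobic or non-homophobic before any model is fine-tuned.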
Acknowledgment
The research in this paper was partially funded by the UK Arts and Humanities Research Council and the Irish Research Council (Grant Number AH/W001624/1) and the Federation Internationale de l’Automobile.
Copyright information
© 2025 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Santos, G.L. et al. (2025). Detecting Homophobic Speech in Soccer Tweets Using Large Language Models and Explainable AI. In: Aiello, L.M., Chakraborty, T., Gaito, S. (eds) Social Networks Analysis and Mining. ASONAM 2024. Lecture Notes in Computer Science, vol 15211. Springer, Cham. https://doi.org/10.1007/978-3-031-78541-2_30
DOI: https://doi.org/10.1007/978-3-031-78541-2_30
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-78540-5
Online ISBN: 978-3-031-78541-2
eBook Packages: Computer Science, Computer Science (R0)