Abstract
The volume of data generated on social networks makes moderating user-written text too complex to perform manually. Racism and xenophobia are among the most prominent problems on these platforms. Although several studies use natural language processing techniques to build predictive models that detect racist or xenophobic texts, few of them address the Spanish language. In this paper we present a solution based on deep learning and, more specifically, on transfer learning to detect racist and xenophobic messages in Spanish. For this purpose, a dataset was collected from the social network Twitter using data mining techniques and, after preprocessing, its messages were labelled as racist or non-racist. Two BERT-based models were trained: BETO and mBERT. The results are promising, with the best-performing model reaching 85.14% accuracy.
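The abstract does not say which preprocessing steps were applied or how accuracy was computed, so the following is only a minimal sketch under stated assumptions. The cleaning rules (placeholders for URLs and user mentions, whitespace collapsing), the function names `preprocess_tweet` and `accuracy`, and the label encoding (1 = racist, 0 = non-racist) are all hypothetical choices made for illustration, not the authors' pipeline.

```python
import re
from typing import Sequence

# Hypothetical cleaning rules: the abstract does not list the actual steps.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
WHITESPACE_RE = re.compile(r"\s+")


def preprocess_tweet(text: str) -> str:
    """Replace URLs and user mentions with placeholders, then collapse whitespace."""
    text = URL_RE.sub("URL", text)
    text = MENTION_RE.sub("@USER", text)
    return WHITESPACE_RE.sub(" ", text).strip()


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions equal to the gold labels.

    Assumed encoding: 1 = racist, 0 = non-racist.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("label sequences must have the same length")
    if len(y_true) == 0:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)
```

A cleaned message would then be passed to a tokenizer and classifier, and `accuracy` applied to the predicted labels on a held-out test split. Under this definition, 85.14% accuracy means roughly 85 of every 100 test messages were classified correctly.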
Funding
This research was funded by the Junta de Castilla y León, grant number LE014G18.
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Benitez-Andrades, J.A. et al. (2022). BERT Model-Based Approach for Detecting Racism and Xenophobia on Twitter Data. In: Garoufallou, E., Ovalle-Perandones, MA., Vlachidis, A. (eds) Metadata and Semantic Research. MTSR 2021. Communications in Computer and Information Science, vol 1537. Springer, Cham. https://doi.org/10.1007/978-3-030-98876-0_13
DOI: https://doi.org/10.1007/978-3-030-98876-0_13
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-98875-3
Online ISBN: 978-3-030-98876-0
eBook Packages: Computer Science, Computer Science (R0)