BERT Model-Based Approach for Detecting Racism and Xenophobia on Twitter Data

Benitez-Andrades, José Alberto; González-Jiménez, Álvaro; López-Brea, Álvaro; Benavides, Carmen; Aveleira-Mata, Jose; Alija-Pérez, José-Manuel; García-Ordás, María Teresa

doi:10.1007/978-3-030-98876-0_13

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1537)

Included in the following conference series:

Research Conference on Metadata and Semantics Research

Abstract

The volume of data generated on social networks makes moderating user-written text complex and impossible to do manually. Racism and xenophobia are among the most prominent problems on these networks. Although some studies use natural language processing techniques to build predictive models that detect racist or xenophobic texts, few of them address the Spanish language. In this paper we present a solution based on deep learning and, more specifically, on transfer learning, to detect racist and xenophobic messages in Spanish. For this purpose, a dataset was collected from the social network Twitter using data mining techniques and, after preprocessing, its messages were labelled as racist or non-racist. The trained models, BETO and mBERT, are both based on BERT. The results are promising: the best-performing model reached 85.14% accuracy.
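As a rough illustration of two steps the abstract mentions, preprocessing tweets and reporting accuracy, the following Python sketch may help. It is not the authors' pipeline: the abstract does not list the preprocessing steps, so the clean-up rules here (placeholder tokens for URLs and user mentions, whitespace collapsing) and the helper names `clean_tweet` and `accuracy` are assumptions chosen for the example.

```python
import re

# Assumed clean-up rules; the abstract does not specify the actual preprocessing.
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")


def clean_tweet(text: str) -> str:
    """Replace URLs and user mentions with placeholders and collapse whitespace."""
    text = URL_RE.sub("URL", text)        # URLs first, so an '@' inside a URL is not touched
    text = MENTION_RE.sub("@USER", text)  # anonymise user handles
    return " ".join(text.split())         # collapse runs of whitespace


def accuracy(y_true, y_pred) -> float:
    """Fraction of predictions that match the gold labels (1 = racist, 0 = non-racist)."""
    if len(y_true) != len(y_pred):
        raise ValueError("label lists must have the same length")
    if not y_true:
        raise ValueError("label lists must not be empty")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


if __name__ == "__main__":
    print(clean_tweet("Hola @juan  mira https://t.co/abc ahora"))
    print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))
```

In a setup like the one the abstract describes, the cleaned text would be passed to a BERT tokenizer and a fine-tuned classifier, and a metric such as `accuracy` would be computed on a held-out set.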

Funding

This research was funded by the Junta de Castilla y León grant number LE014G18.

Author information

Authors and Affiliations

SALBIS Research Group, Department of Electric, Systems and Automatics Engineering, Universidad de León, Campus of Vegazana s/n, 24071, León, Spain
José Alberto Benitez-Andrades & Carmen Benavides
Universidad de León, Campus of Vegazana s/n, 24071, León, Spain
Álvaro González-Jiménez & Álvaro López-Brea
SECOMUCI Research Group, Escuela de Ingenierías Industrial e Informática, Universidad de León, Campus de Vegazana s/n, 24071, León, Spain
Jose Aveleira-Mata, José-Manuel Alija-Pérez & María Teresa García-Ordás

Corresponding author

Correspondence to José Alberto Benitez-Andrades.

Editor information

Editors and Affiliations

International Hellenic University, Thessaloniki, Greece
Emmanouel Garoufallou
Complutense University of Madrid, Madrid, Spain
María-Antonia Ovalle-Perandones
University College London, London, UK
Andreas Vlachidis

Cite this paper

Benitez-Andrades, J.A. et al. (2022). BERT Model-Based Approach for Detecting Racism and Xenophobia on Twitter Data. In: Garoufallou, E., Ovalle-Perandones, MA., Vlachidis, A. (eds) Metadata and Semantic Research. MTSR 2021. Communications in Computer and Information Science, vol 1537. Springer, Cham. https://doi.org/10.1007/978-3-030-98876-0_13

DOI: https://doi.org/10.1007/978-3-030-98876-0_13
Published: 01 April 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-98875-3
Online ISBN: 978-3-030-98876-0
eBook Packages: Computer Science, Computer Science (R0)
