
Detecting Misogyny and Xenophobia in Spanish Tweets Using Language Technologies

Published: 14 March 2020
    Abstract

    Misogyny and xenophobia are among the most serious social problems today. As the use of social media grows, hatred toward women and immigrants becomes easier to express and can harm social media users. It is therefore important to develop systems that detect hateful comments automatically. In this article, we analyze hate speech against women and immigrants in Spanish tweets by conducting classification experiments with different approaches. We also create language resources suited to hate speech detection in Spanish.
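    The abstract does not say which classifiers the experiments use. As a hedged illustration of the kind of supervised text classification involved, here is a minimal sketch of multinomial naive Bayes with add-one (Laplace) smoothing in plain Python. The tokenizer, the class name `NaiveBayesTweetClassifier`, and the four toy tweets are hypothetical assumptions made for illustration; they are not the authors' code, features, or data.

```python
import math
import re
from collections import Counter

# Hypothetical toy tweets, NOT taken from the paper's corpus (1 = hateful, 0 = not hateful).
TOY_TEXTS = [
    "fuera inmigrantes no los queremos aquí",
    "las mujeres no deberían opinar",
    "bienvenidos los inmigrantes a la ciudad",
    "las mujeres lideran el cambio",
]
TOY_LABELS = [1, 1, 0, 0]


def tokenize(text):
    """Lowercase and keep runs of Spanish letters; punctuation, digits, '#' and '@' are dropped."""
    return re.findall(r"[a-záéíóúüñ]+", text.lower())


class NaiveBayesTweetClassifier:
    """Hypothetical multinomial naive Bayes classifier with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        # Per-class word frequencies over all training tweets of that class
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        doc_counts = Counter(labels)
        # log P(class), estimated from class frequencies in the training set
        self.log_prior = {c: math.log(doc_counts[c] / len(labels)) for c in self.classes}
        self.vocab = set().union(*self.word_counts.values())
        self.total_words = {c: sum(self.word_counts[c].values()) for c in self.classes}
        return self

    def log_scores(self, text):
        """Unnormalized log P(class | text) per class; words unseen in training are ignored."""
        tokens = [t for t in tokenize(text) if t in self.vocab]
        scores = {}
        for c in self.classes:
            denom = self.total_words[c] + self.alpha * len(self.vocab)
            scores[c] = self.log_prior[c] + sum(
                math.log((self.word_counts[c][t] + self.alpha) / denom) for t in tokens
            )
        return scores

    def predict(self, text):
        """Return the class with the highest log score."""
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


if __name__ == "__main__":
    model = NaiveBayesTweetClassifier().fit(TOY_TEXTS, TOY_LABELS)
    for tweet in ["no queremos inmigrantes aquí", "bienvenidos a la ciudad"]:
        print(tweet, "->", model.predict(tweet))
```

    Naive Bayes is used here only because it fits in a few lines with no third-party dependencies. A real system would typically compare several classifiers and, as the author tags suggest, might combine them in an ensemble or add lexicon-based features.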

    Information

    Published In

    ACM Transactions on Internet Technology, Volume 20, Issue 2
    Special Section on Emotions in Conflictual Social Interactions and Regular Papers
    May 2020
    256 pages
    ISSN:1533-5399
    EISSN:1557-6051
    DOI:10.1145/3386441
    Editor: Ling Liu
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 14 March 2020
    Accepted: 01 October 2019
    Revised: 01 August 2019
    Received: 01 March 2019
    Published in TOIT Volume 20, Issue 2

    Author Tags

    1. Misogyny detection
    2. classifier ensemble
    3. hate speech classification
    4. lexicon
    5. machine learning
    6. social media
    7. text mining
    8. xenophobia detection

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • REDES project
    • European Regional Development Fund (Fondo Europeo de Desarrollo Regional, FEDER)
    • LIVING-LANG
