DOI: 10.1145/3639233.3639247
research-article

Exploring the Impact of Lexicon-based Knowledge Transfer for Hate Speech Detection in Indonesia Code-Mixed Languages

Published: 05 March 2024
Abstract

    In this study, we examine how external knowledge from a lexicon affects knowledge transfer aimed at mitigating the language-shift problem in hate speech detection for code-mixed languages. To this end, we constructed a lexicon based on findings from previous studies and then evaluated several machine learning models under various scenarios to assess the lexicon's impact. The experimental results show that incorporating lexicon features improves hate speech detection on code-mixed text. In particular, a lexicon that combines both implicit and explicit entries yields the best results in this investigation. This research offers insight into hate speech detection in code-mixed Indonesian languages and contributes to more robust systems for fostering a safer and more inclusive online environment. By leveraging the lexicon and examining the interplay between implicit and explicit elements of hate speech, the study advances our understanding of the challenges hate speech poses in Indonesian code-mixed languages and paves the way for more effective detection mechanisms.
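The abstract does not specify how the lexicon features are fed to the models. One common way to use a lexicon as external knowledge is to append lexicon-match counts to an ordinary bag-of-words vector, so that the same feature pipeline can be run with no lexicon, an explicit-only lexicon, an implicit-only lexicon, or both. The Python sketch below illustrates that idea under stated assumptions: the lexicon entries (`slur_a`, `coded_x`, and so on) are hypothetical placeholders rather than the authors' lexicon, and the helper names `tokenize`, `lexicon_features`, `featurize`, and `SCENARIOS` are illustrative, not taken from the paper.

```python
import re
from collections import Counter

# Hypothetical placeholder entries; the paper's actual lexicon is not reproduced here.
EXPLICIT = {"slur_a", "slur_b"}    # overtly abusive terms (placeholder)
IMPLICIT = {"coded_x", "coded_y"}  # coded or indirect terms (placeholder)

TOKEN_RE = re.compile(r"\w+")


def tokenize(text: str) -> list[str]:
    """Lowercase word tokenizer; a simplifying assumption for code-mixed text."""
    return TOKEN_RE.findall(text.lower())


def lexicon_features(tokens: list[str], lexicons: list[set[str]]) -> list[float]:
    """One hit count per lexicon, plus the share of tokens matching any lexicon."""
    if not lexicons:
        return []  # "no lexicon" scenario: contribute no extra features
    counts = [float(sum(t in lex for t in tokens)) for lex in lexicons]
    any_hit = sum(any(t in lex for lex in lexicons) for t in tokens)
    return counts + [any_hit / max(len(tokens), 1)]


def featurize(text: str, vocab: list[str], lexicons: list[set[str]]) -> list[float]:
    """Bag-of-words counts over a fixed vocabulary, concatenated with lexicon features."""
    tokens = tokenize(text)
    bow = Counter(tokens)
    return [float(bow[w]) for w in vocab] + lexicon_features(tokens, lexicons)


# Illustrative scenarios echoing the comparison described in the abstract.
SCENARIOS = {
    "no_lexicon": [],
    "explicit_only": [EXPLICIT],
    "implicit_only": [IMPLICIT],
    "explicit_and_implicit": [EXPLICIT, IMPLICIT],
}

if __name__ == "__main__":
    vocab = ["kamu", "slur_a", "coded_x"]
    for name, lexs in SCENARIOS.items():
        print(name, featurize("Kamu slur_a coded_x coded_x", vocab, lexs))
```

For the sample sentence, the combined scenario yields `[1.0, 1.0, 2.0, 1.0, 2.0, 0.75]`: three bag-of-words counts, one explicit hit, two implicit hits, and three of four tokens matching a lexicon. The resulting vectors could be passed to any standard classifier; the sketch makes no claim about which models or feature weighting the authors actually used.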

        Information

        Published In

        ACM Other conferences
        NLPIR '23: Proceedings of the 2023 7th International Conference on Natural Language Processing and Information Retrieval
        December 2023
        336 pages
        ISBN:9798400709227
        DOI:10.1145/3639233
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

        Publisher

        Association for Computing Machinery

        New York, NY, United States

        Author Tags

        1. code-mixed languages
        2. hate speech detection
        3. knowledge transfer
        4. lexicon

        Qualifiers

        • Research-article
        • Research
        • Refereed limited

        Conference

        NLPIR 2023

        Article Metrics

        • Total Citations: 0
        • Total Downloads: 9
        • Downloads (Last 12 months): 9
        • Downloads (Last 6 weeks): 1
        Reflects downloads up to 27 Jul 2024