research-article

Unravelling social media racial discriminations through a semi-supervised approach

Published: 01 February 2022

    Highlights

    Machine learning models were used to detect cyber-racism during the COVID-19 pandemic.
    Cyber-racism detection was based on negative English tweets.
    Random Forest with bagging emerged as the best detection classifier.
    Top themes of cyber-racism: Eating habit, Xenophobia and Political hatred.
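    The highlights state that detection was run only on tweets judged negative by sentiment analysis. The source does not name the sentiment tool, so the following is a minimal sketch under an assumed approach: a toy lexicon-based scorer that keeps tweets scoring below zero. The word lists `POSITIVE` and `NEGATIVE` and the functions `polarity` and `negative_tweets` are hypothetical and are not the authors' method.

```python
import re

# Hypothetical toy lexicon for illustration only; the study's sentiment tool is not specified.
POSITIVE = {"good", "great", "love", "safe", "thanks"}
NEGATIVE = {"bad", "hate", "blame", "awful", "disgusting"}


def polarity(text: str) -> int:
    """Score a tweet as (#positive hits - #negative hits) against the toy lexicon."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)


def negative_tweets(tweets: list[str]) -> list[str]:
    """Keep only tweets with a strictly negative polarity score."""
    return [t for t in tweets if polarity(t) < 0]


if __name__ == "__main__":
    sample = [
        "Stay safe everyone, thanks to the nurses",
        "I hate them and blame them for all of this",
        "Working from home today",
    ]
    print(negative_tweets(sample))
```

    A lexicon scorer like this ignores negation and sarcasm, so a real pipeline would likely substitute a trained or more sophisticated rule-based sentiment model at this step.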

    Abstract

    The study investigated cyber-racism on social media during the recent Coronavirus (COVID-19) pandemic using a semi-supervised approach. Specifically, several machine learning models were trained to detect cyber-racism, followed by topic modelling using Latent Dirichlet Allocation (LDA). Twitter data were gathered using the hashtags "Chinese virus" and "Kung Flu" during March 2020, resulting in 7,454 clean tweets. Negative tweets, extracted using sentiment analysis, were annotated into three classes (Racism, Sarcasm/Irony and Others) and used to train several machine learning models. Experimental results show that Random Forest with bagging consistently outperformed Random Forest, J48 and Support Vector Machine, with an accuracy of 78.1% (Racism versus Sarcasm/Irony) and 77.9% (Racism versus Others). LDA revealed three distinct topics in tweets identified as racist, namely Eating habit, Political hatred and Xenophobia. The consistent detection performance of the evaluated models indicates their reliability in detecting cyber-racism patterns in textual communications.
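    Bagging (bootstrap aggregating) trains each base model on a sample drawn with replacement from the training set and combines their predictions by majority vote. The study bagged Random Forest; to keep this sketch short and dependency-free, the version below swaps in a one-feature decision stump as the base learner and assumes tweets have already been converted into binary word-presence vectors. All names (`train_stump`, `predict_stump`, `bagging_fit`, `bagging_predict`, `accuracy`) and the toy data are hypothetical and are not the authors' implementation.

```python
import random
from collections import Counter


def _majority(labels, default):
    """Most common label in `labels`, or `default` if there are none."""
    counts = Counter(labels)
    return counts.most_common(1)[0][0] if counts else default


def train_stump(X, y):
    """Fit a one-feature stump: (feature index, label if present, label if absent)."""
    default = _majority(y, None)
    best, best_err = None, None
    for j in range(len(X[0])):
        on = _majority([lbl for x, lbl in zip(X, y) if x[j]], default)
        off = _majority([lbl for x, lbl in zip(X, y) if not x[j]], default)
        err = sum((on if x[j] else off) != lbl for x, lbl in zip(X, y))
        if best_err is None or err < best_err:  # strict '<' keeps the lowest index on ties
            best, best_err = (j, on, off), err
    return best


def predict_stump(stump, x):
    j, on, off = stump
    return on if x[j] else off


def bagging_fit(X, y, n_estimators=25, seed=0):
    """Train each base stump on a bootstrap sample drawn with replacement."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(train_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models


def bagging_predict(models, x):
    """Aggregate the base models by majority vote."""
    return Counter(predict_stump(m, x) for m in models).most_common(1)[0][0]


def accuracy(models, X, y):
    """Fraction of samples whose voted label matches the annotation."""
    return sum(bagging_predict(models, x) == lbl for x, lbl in zip(X, y)) / len(y)


if __name__ == "__main__":
    # Toy, hand-made binary feature vectors; NOT data from the study.
    X = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [0, 0, 1], [0, 1, 1], [0, 1, 0]]
    y = ["Racism", "Racism", "Racism", "Others", "Others", "Others"]
    models = bagging_fit(X, y)
    print(accuracy(models, X, y))
```

    The accuracy printed here is measured on the training data only. Replacing `train_stump` with a stronger base learner approximates the Random-Forest-with-bagging configuration named in the abstract; scikit-learn's `BaggingClassifier`, for example, can wrap a `RandomForestClassifier` for the same purpose.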

    Cited By

    • (2024) Emotional Intelligence Attention Unsupervised Learning Using Lexicon Analysis for Irony-based Advertising. ACM Transactions on Asian and Low-Resource Language Information Processing 23(1), 1–19. DOI: 10.1145/3580496. Online publication date: 15-Jan-2024.

            Information

            Published In

            Telematics and Informatics, Volume 67, Issue C
            Feb 2022
            98 pages

            Publisher

            Pergamon Press, Inc.

            United States

            Author Tags

            1. Cyber-racism
            2. Machine learning
            3. Topic modelling
            4. Sentiment analysis
            5. Social media
