“HOT” ChatGPT: The Promise of ChatGPT in Detecting and Discriminating Hateful, Offensive, and Toxic Comments on Social Media

Published: 12 March 2024

Abstract

Harmful textual content is pervasive on social media, poisoning online communities and discouraging participation. A common approach to this issue is to develop detection models that rely on human annotations. However, the tasks required to build such models expose annotators to harmful and offensive content and can be costly and time-consuming. Generative AI models have the potential to understand and detect harmful textual content. We used ChatGPT to investigate this potential and compared its performance with MTurker annotations for three frequently discussed concepts related to harmful textual content on social media: Hateful, Offensive, and Toxic (HOT). We designed five prompts to interact with ChatGPT and conducted four experiments eliciting HOT classifications. Our results show that ChatGPT achieves approximately 80% accuracy against MTurker annotations, and that its classifications of non-HOT comments agree with the human annotations more consistently than its classifications of HOT comments. Our findings also suggest that ChatGPT's classifications align with the provided HOT definitions, although it treats “hateful” and “offensive” as subsets of “toxic.” Moreover, the choice of prompt used to interact with ChatGPT affects its performance. Based on these insights, our study offers several implications for employing ChatGPT to detect HOT content, particularly regarding the reliability and consistency of its performance, its understanding of and reasoning about the HOT concepts, and the impact of prompts on its performance. Overall, our study provides guidance on the potential of generative AI models for moderating large volumes of user-generated textual content on social media.
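As a concrete illustration of the workflow the abstract describes, the sketch below shows one way to elicit HOT labels from a ChatGPT-family model through the OpenAI API and to score them against human annotations with simple percent agreement. The prompt wording, label format, and model choice (gpt-3.5-turbo) are assumptions made for this sketch; the paper's five prompts and experimental setup are not reproduced here.

```python
# Minimal sketch (not the paper's exact prompts or pipeline): ask an OpenAI
# chat model whether a comment is hateful, offensive, or toxic, then compare
# model labels with human annotations via percent agreement.
# Assumes the `openai` Python SDK (v1+) and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

# Illustrative prompt; the paper evaluates five different prompt designs.
PROMPT = (
    "Classify the following social media comment. "
    "Answer with three yes/no labels in exactly this form: "
    "'hateful=<yes|no>; offensive=<yes|no>; toxic=<yes|no>'.\n\n"
    "Comment: {comment}"
)

def classify_hot(comment: str) -> str:
    """Return the model's raw HOT labels for one comment."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed model for illustration
        messages=[{"role": "user", "content": PROMPT.format(comment=comment)}],
        temperature=0,  # reduce run-to-run variation in the labels
    )
    return resp.choices[0].message.content.strip()

def percent_agreement(model_labels: list[str], human_labels: list[str]) -> float:
    """Share of comments where model and human labels match exactly."""
    matches = sum(m == h for m, h in zip(model_labels, human_labels))
    return matches / len(human_labels)

# Example:
#   classify_hot("This is such a dumb take.")
#   -> e.g. "hateful=no; offensive=yes; toxic=yes"
```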




    Published In

ACM Transactions on the Web, Volume 18, Issue 2
May 2024, 378 pages
EISSN: 1559-114X
DOI: 10.1145/3613666
Editor: Ryen White

    Publisher

Association for Computing Machinery, New York, NY, United States

    Publication History

    Published: 12 March 2024
    Online AM: 02 February 2024
    Accepted: 12 December 2023
    Revised: 30 September 2023
    Received: 03 May 2023
    Published in TWEB Volume 18, Issue 2


    Author Tags

    1. Generative AI
    2. ChatGPT
    3. hate speech
    4. offensive language
    5. online toxicity
    6. MTurker annotation
    7. prompt engineering

    Qualifiers

    • Research-article

    Funding Sources

    • National Science Foundation

    Article Metrics

• Downloads (last 12 months): 1,367
• Downloads (last 6 weeks): 228
Reflects downloads up to 16 Oct 2024.


    Cited By

• The Effects of Social Approval Signals on the Production of Online Hate: A Theoretical Explication. Communication Research. DOI: 10.1177/00936502241278944. Online publication date: 14-Sep-2024.
• Algorithms Against Antisemitism? In Antisemitism in Online Communication, 205–236. DOI: 10.11647/obp.0406.08. Online publication date: 21-Jun-2024.
• User Voices, Platform Choices: Social Media Policy Puzzle with Decentralization Salt. In Extended Abstracts of the 2024 CHI Conference on Human Factors in Computing Systems, 1–10. DOI: 10.1145/3613905.3650799. Online publication date: 11-May-2024.
• Moderating New Waves of Online Hate with Chain-of-Thought Reasoning in Large Language Models. In 2024 IEEE Symposium on Security and Privacy (SP), 788–806. DOI: 10.1109/SP54263.2024.00181. Online publication date: 19-May-2024.
• Investigating ChatGPT on Reddit by Using Lexicon-Based Sentiment Analysis. In 2024 International Conference on Information Technology Research and Innovation (ICITRI), 65–70. DOI: 10.1109/ICITRI62858.2024.10699128. Online publication date: 5-Sep-2024.
• MLHS-CGCapNet: A Lightweight Model for Multilingual Hate Speech Detection. IEEE Access 12, 106631–106644. DOI: 10.1109/ACCESS.2024.3434664. Online publication date: 2024.
• Leveraging Transfer Learning for Hate Speech Detection in Portuguese Social Media Posts. IEEE Access 12, 101374–101389. DOI: 10.1109/ACCESS.2024.3430848. Online publication date: 2024.
• Generative AI for Cyber Security: Analyzing the Potential of ChatGPT, DALL-E, and Other Models for Enhancing the Security Space. IEEE Access 12, 53497–53516. DOI: 10.1109/ACCESS.2024.3385107. Online publication date: 2024.
• Promoting Positive Discourse: Advancing AI-Powered Content Moderation with Explainability and User Rephrasing. In 2024 International Conference on Advances in Computing, Communication and Applied Informatics (ACCAI), 1–6. DOI: 10.1109/ACCAI61061.2024.10601796. Online publication date: 9-May-2024.
• LLMs Red Teaming. In Large Language Models in Cybersecurity, 213–223. DOI: 10.1007/978-3-031-54827-7_24. Online publication date: 12-Apr-2024.
