“HOT” ChatGPT: The Promise of ChatGPT in Detecting and Discriminating Hateful, Offensive, and Toxic Comments on Social Media

Published: 12 March 2024

Abstract

Harmful textual content is pervasive on social media, poisoning online communities and discouraging participation. A common approach to this issue is to develop detection models that rely on human annotations. However, the tasks required to build such models expose annotators to harmful and offensive content and can be costly and time-consuming. Generative AI models have the potential to understand and detect harmful textual content. We used ChatGPT to investigate this potential and compared its performance with MTurker annotations for three frequently discussed concepts related to harmful textual content on social media: Hateful, Offensive, and Toxic (HOT). We designed five prompts to interact with ChatGPT and conducted four experiments eliciting HOT classifications. Our results show that ChatGPT achieves approximately 80% accuracy against MTurker annotations, and that its classifications of non-HOT comments agree with the human annotations more consistently than its classifications of HOT comments. Our findings also suggest that ChatGPT's classifications align with the provided HOT definitions, although it treats “hateful” and “offensive” as subsets of “toxic.” Moreover, the choice of prompt used to interact with ChatGPT affects its performance. Based on these insights, our study offers several implications for employing ChatGPT to detect HOT content, particularly regarding the reliability and consistency of its performance, its understanding of and reasoning about the HOT concepts, and the impact of prompts on its performance. Overall, our study provides guidance on the potential of generative AI models for moderating large volumes of user-generated textual content on social media.
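As a concrete illustration of the workflow the abstract describes, the sketch below shows one way to elicit HOT labels from a ChatGPT-family model through the OpenAI API and to score them against human annotations with simple percent agreement. The prompt wording, label format, and model choice (gpt-3.5-turbo) are assumptions made for this sketch; the paper's five prompts and experimental setup are not reproduced here.

```python
# Minimal sketch (not the paper's exact prompts or pipeline): ask an OpenAI
# chat model whether a comment is hateful, offensive, or toxic, then compare
# model labels with human annotations via percent agreement.
# Assumes the `openai` Python SDK (v1+) and OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

# Illustrative prompt; the paper evaluates five different prompt designs.
PROMPT = (
    "Classify the following social media comment. "
    "Answer with three yes/no labels in exactly this form: "
    "'hateful=<yes|no>; offensive=<yes|no>; toxic=<yes|no>'.\n\n"
    "Comment: {comment}"
)

def classify_hot(comment: str) -> str:
    """Return the model's raw HOT labels for one comment."""
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",  # assumed model for illustration
        messages=[{"role": "user", "content": PROMPT.format(comment=comment)}],
        temperature=0,  # reduce run-to-run variation in the labels
    )
    return resp.choices[0].message.content.strip()

def percent_agreement(model_labels: list[str], human_labels: list[str]) -> float:
    """Share of comments where model and human labels match exactly."""
    matches = sum(m == h for m, h in zip(model_labels, human_labels))
    return matches / len(human_labels)

# Example:
#   classify_hot("This is such a dumb take.")
#   -> e.g. "hateful=no; offensive=yes; toxic=yes"
```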




    Published In

ACM Transactions on the Web, Volume 18, Issue 2
May 2024, 378 pages
EISSN: 1559-114X
DOI: 10.1145/3613666
Editor: Ryen White

    Publisher

Association for Computing Machinery, New York, NY, United States

    Publication History

    Published: 12 March 2024
    Online AM: 02 February 2024
    Accepted: 12 December 2023
    Revised: 30 September 2023
    Received: 03 May 2023
    Published in TWEB Volume 18, Issue 2


    Author Tags

    1. Generative AI
    2. ChatGPT
    3. hate speech
    4. offensive language
    5. online toxicity
    6. MTurker annotation
    7. prompt engineering

    Qualifiers

    • Research-article

    Funding Sources

    • National Science Foundation

    Article Metrics

• Downloads (last 12 months): 1,367
• Downloads (last 6 weeks): 228
Reflects downloads up to 16 Oct 2024.


    Cited By

• The Effects of Social Approval Signals on the Production of Online Hate: A Theoretical Explication. Communication Research. DOI: 10.1177/00936502241278944. Online publication date: 14-Sep-2024.
• Algorithms Against Antisemitism? In Antisemitism in Online Communication, 205–236. DOI: 10.11647/obp.0406.08. Online publication date: 21-Jun-2024.
• User Voices, Platform Choices: Social Media Policy Puzzle with Decentralization Salt. In Extended Abstracts of the 2024 CHI Conference on Human Factors in Computing Systems, 1–10. DOI: 10.1145/3613905.3650799. Online publication date: 11-May-2024.
• Moderating New Waves of Online Hate with Chain-of-Thought Reasoning in Large Language Models. In 2024 IEEE Symposium on Security and Privacy (SP), 788–806. DOI: 10.1109/SP54263.2024.00181. Online publication date: 19-May-2024.
• Investigating ChatGPT on Reddit by Using Lexicon-Based Sentiment Analysis. In 2024 International Conference on Information Technology Research and Innovation (ICITRI), 65–70. DOI: 10.1109/ICITRI62858.2024.10699128. Online publication date: 5-Sep-2024.
• MLHS-CGCapNet: A Lightweight Model for Multilingual Hate Speech Detection. IEEE Access 12, 106631–106644. DOI: 10.1109/ACCESS.2024.3434664. Online publication date: 2024.
• Leveraging Transfer Learning for Hate Speech Detection in Portuguese Social Media Posts. IEEE Access 12, 101374–101389. DOI: 10.1109/ACCESS.2024.3430848. Online publication date: 2024.
• Generative AI for Cyber Security: Analyzing the Potential of ChatGPT, DALL-E, and Other Models for Enhancing the Security Space. IEEE Access 12, 53497–53516. DOI: 10.1109/ACCESS.2024.3385107. Online publication date: 2024.
• Promoting Positive Discourse: Advancing AI-Powered Content Moderation with Explainability and User Rephrasing. In 2024 International Conference on Advances in Computing, Communication and Applied Informatics (ACCAI), 1–6. DOI: 10.1109/ACCAI61061.2024.10601796. Online publication date: 9-May-2024.
• LLMs Red Teaming. In Large Language Models in Cybersecurity, 213–223. DOI: 10.1007/978-3-031-54827-7_24. Online publication date: 12-Apr-2024.
