ToxicChat: Unveiling Hidden Challenges of Toxicity Detection in Real-World User-AI Conversation

Lin, Zi; Wang, Zihan; Tong, Yongqi; Wang, Yangkun; Guo, Yuxin; Wang, Yujia; Shang, Jingbo

arXiv:2310.17389 (cs)

[Submitted on 26 Oct 2023]

Abstract: Despite remarkable advances that large language models have achieved in chatbots, maintaining a non-toxic user-AI interactive environment has become increasingly critical. However, previous efforts in toxicity detection have been mostly based on benchmarks derived from social media content, leaving the unique challenges inherent to real-world user-AI interactions insufficiently explored. In this work, we introduce ToxicChat, a novel benchmark based on real user queries from an open-source chatbot. This benchmark contains rich, nuanced phenomena that can be tricky for current toxicity detection models to identify, revealing a significant domain difference compared to social media content. Our systematic evaluation of models trained on existing toxicity datasets shows their shortcomings when applied to the unique domain of ToxicChat. Our work illuminates the potentially overlooked challenges of toxicity detection in real-world user-AI conversations. In the future, ToxicChat can be a valuable resource to drive further advancements toward building a safe and healthy environment for user-AI interactions.

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2310.17389 [cs.CL]
	(or arXiv:2310.17389v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2310.17389
Journal reference:	EMNLP findings 2023

Submission history

From: Zi Lin
[v1] Thu, 26 Oct 2023 13:35:41 UTC (8,017 KB)
