CharacterEval: A Chinese Benchmark for Role-Playing Conversational Agent Evaluation

Tu, Quan; Fan, Shilong; Tian, Zihang; Yan, Rui

arXiv:2401.01275 (cs)

[Submitted on 2 Jan 2024 (v1), last revised 9 Jan 2024 (this version, v2)]

Abstract: Recently, the advent of large language models (LLMs) has revolutionized generative agents. Among them, Role-Playing Conversational Agents (RPCAs) attract considerable attention due to their ability to emotionally engage users. However, the absence of a comprehensive benchmark impedes progress in this field. To bridge this gap, we introduce CharacterEval, a Chinese benchmark for comprehensive RPCA assessment, complemented by a tailored high-quality dataset. The dataset comprises 1,785 multi-turn role-playing dialogues, encompassing 23,020 examples and featuring 77 characters derived from Chinese novels and scripts. It was carefully constructed, beginning with initial dialogue extraction via GPT-4, followed by rigorous human-led quality control, and enhanced with in-depth character profiles sourced from Baidu Baike. CharacterEval employs a multifaceted evaluation approach, encompassing thirteen targeted metrics on four dimensions. Comprehensive experiments on CharacterEval demonstrate that Chinese LLMs exhibit more promising capabilities than GPT-4 in Chinese role-playing conversation. Source code, data source and reward model will be publicly accessible at this https URL.

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2401.01275 [cs.CL]
	(or arXiv:2401.01275v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2401.01275

Submission history

From: Tu Quan
[v1] Tue, 2 Jan 2024 16:20:40 UTC (579 KB)
[v2] Tue, 9 Jan 2024 18:54:05 UTC (579 KB)
