Face4RAG: Factual Consistency Evaluation for Retrieval Augmented Generation in Chinese

Xu, Yunqi; Cai, Tianchi; Jiang, Jiyan; Song, Xierui

doi:10.1145/3637528.3671656

arXiv:2407.01080 (cs)

[Submitted on 1 Jul 2024 (v1), last revised 3 Jul 2024 (this version, v2)]

Abstract: The prevalence of factual inconsistency errors in conventional Retrieval Augmented Generation (RAG) motivates the study of Factual Consistency Evaluation (FCE). Although various FCE methods have been proposed, they are evaluated on datasets generated by specific Large Language Models (LLMs). Without a comprehensive benchmark, it remains unexplored how these FCE methods perform on other LLMs with different error distributions or even unseen error types, since they may fail to detect the error types generated by other LLMs. To fill this gap, we propose Face4RAG, the first comprehensive FCE benchmark for RAG that is independent of the underlying LLM. Our benchmark consists of a synthetic dataset built upon a carefully designed typology of factual inconsistency errors and a real-world dataset constructed from six commonly used LLMs, enabling evaluation of FCE methods on specific error types or real-world error distributions. On the proposed benchmark, we find that existing FCE methods fail to detect logical fallacies, that is, mismatches of logic structure between the answer and the retrieved reference. To fix this issue, we further propose a new method called L-Face4RAG with two novel designs: logic-preserving answer decomposition and fact-logic FCE. Extensive experiments show that L-Face4RAG substantially outperforms previous methods for factual inconsistency detection on a wide range of tasks, notably beyond the RAG task that originally motivated it. Both the benchmark and our proposed method are publicly available.
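To make the idea of evaluating an FCE method per error type concrete, here is a minimal sketch in Python. It is not the Face4RAG code or the L-Face4RAG method: the `Sample` record, the `accuracy_by_error_type` helper, the `bag_of_words_judge` baseline, and the labels `"none"` and `"fabrication"` are all hypothetical, and the English toy sentences only stand in for the Chinese data. The sketch shows how a judge that only checks whether the answer's words appear in the reference can score well on one error type while missing a logical fallacy entirely.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, List, NamedTuple


class Sample(NamedTuple):
    """One benchmark item (hypothetical schema, not the paper's format)."""
    reference: str    # retrieved reference text
    answer: str       # generated answer to be checked
    error_type: str   # illustrative label; "none" marks a consistent answer
    consistent: bool  # gold label: is the answer supported by the reference?


def bag_of_words_judge(reference: str, answer: str) -> bool:
    """Toy baseline: 'consistent' if every answer word occurs in the reference.

    It ignores word order and logical structure on purpose, so it cannot
    tell 'A, so B' apart from 'B, so A'.
    """
    ref_words = set(re.findall(r"\w+", reference.lower()))
    ans_words = set(re.findall(r"\w+", answer.lower()))
    return ans_words <= ref_words


def accuracy_by_error_type(
    samples: List[Sample],
    judge: Callable[[str, str], bool],
) -> Dict[str, float]:
    """Fraction of samples judged correctly, grouped by error type."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for s in samples:
        total[s.error_type] += 1
        if judge(s.reference, s.answer) == s.consistent:
            correct[s.error_type] += 1
    return {t: correct[t] / total[t] for t in total}


REFERENCE = "It rained, so the match was cancelled."

samples = [
    # Consistent answer: fully supported by the reference.
    Sample(REFERENCE, "The match was cancelled.", "none", True),
    # Logical fallacy: same facts, but cause and effect are swapped.
    Sample(REFERENCE, "The match was cancelled, so it rained.",
           "logical_fallacy", False),
    # Unsupported content (label name is an assumption for this sketch).
    Sample(REFERENCE, "The match was moved to Sunday.", "fabrication", False),
]

scores = accuracy_by_error_type(samples, bag_of_words_judge)
for error_type, acc in scores.items():
    print(f"{error_type}: {acc:.2f}")
```

On these three toy samples the word-overlap judge is right on the consistent and fabricated answers but wrong on the swapped cause and effect, which is the kind of gap a per-error-type breakdown exposes and an aggregate score can hide.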

Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2407.01080 [cs.CL]
	(or arXiv:2407.01080v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2407.01080
Journal reference:	KDD 2024 (oral)
Related DOI:	https://doi.org/10.1145/3637528.3671656

Submission history

From: Yunqi Xu
[v1] Mon, 1 Jul 2024 08:35:04 UTC (1,305 KB)
[v2] Wed, 3 Jul 2024 12:49:34 UTC (1,305 KB)
