Benchmark Data Contamination of Large Language Models: A Survey

Xu, Cheng; Guan, Shuhao; Greene, Derek; Kechadi, M-Tahar

arXiv:2406.04244v1 (cs)

[Submitted on 6 Jun 2024]

Abstract: The rapid development of Large Language Models (LLMs) like GPT-4, Claude-3, and Gemini has transformed the field of natural language processing. However, it has also given rise to a significant issue known as Benchmark Data Contamination (BDC). BDC occurs when language models inadvertently incorporate evaluation benchmark information through their training data, leading to inaccurate or unreliable performance measurements during evaluation. This paper reviews the complex challenge of BDC in LLM evaluation and explores alternative assessment methods to mitigate the risks associated with traditional benchmarks. It also examines open challenges and future directions in mitigating BDC risks, highlighting the complexity of the issue and the need for innovative solutions to ensure the reliability of LLM evaluation in real-world applications.

Comments: 31 pages, 7 figures, 3 tables
Subjects: Computation and Language (cs.CL)
Cite as: arXiv:2406.04244 [cs.CL] (or arXiv:2406.04244v1 [cs.CL] for this version)
DOI: https://doi.org/10.48550/arXiv.2406.04244

Submission history

From: Cheng Xu
[v1] Thu, 6 Jun 2024 16:41:39 UTC (5,816 KB)