Can Many-Shot In-Context Learning Help LLMs as Evaluators? A Preliminary Empirical Study

Song, Mingyang; Zheng, Mao; Luo, Xuan

arXiv:2406.11629 (cs)

[Submitted on 17 Jun 2024 (v1), last revised 17 Sep 2024 (this version, v4)]

Abstract: Using Large Language Models (LLMs) as evaluators to assess the performance of LLMs has recently garnered attention. However, this approach is affected by potential biases in LLMs, raising concerns about the accuracy and reliability of the evaluation results. To mitigate this issue, we propose and study two many-shot in-context learning (ICL) prompt templates that help LLM evaluators reduce these biases: Many-Shot with Reference (MSwR) and Many-Shot without Reference (MSoR). Concretely, the former uses in-context examples with model-generated rationales as guidance, and the latter uses in-context examples without such rationales. Based on the designed prompts, we investigate the impact of scaling the number of in-context examples on the consistency and quality of the evaluation results. Experimental results show that advanced LLMs, such as GPT-4o, perform better in the many-shot regime than in the zero-shot regime. Furthermore, we reveal the symbol bias hidden in the selection bias of LLMs and propose a simple yet effective approach to mitigate the bias. Experimental results further verify the effectiveness of the symbol bias mitigation approach.
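
The difference between the two prompt variants described in the abstract can be illustrated with a minimal sketch. It is not the authors' template: the `Example` schema, the field labels, the instruction wording, and the `build_prompt` helper are all hypothetical, and the only detail taken from the abstract is that MSwR in-context examples carry a model-generated rationale while MSoR examples do not.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Example:
    """One pairwise-comparison in-context example (hypothetical schema)."""
    question: str
    answer_a: str
    answer_b: str
    verdict: str                     # "A" or "B"
    rationale: Optional[str] = None  # model-generated rationale, if any


def format_example(ex: Example, with_reference: bool) -> str:
    """Render one example; the rationale line appears only in MSwR mode."""
    lines = [
        f"Question: {ex.question}",
        f"Answer A: {ex.answer_a}",
        f"Answer B: {ex.answer_b}",
    ]
    if with_reference and ex.rationale:
        lines.append(f"Rationale: {ex.rationale}")
    lines.append(f"Verdict: {ex.verdict}")
    return "\n".join(lines)


def build_prompt(examples: List[Example], question: str, answer_a: str,
                 answer_b: str, num_shots: int, with_reference: bool) -> str:
    """Assemble instruction + num_shots examples + the pair to be judged.

    with_reference=True sketches MSwR, False sketches MSoR (assumed layout).
    num_shots=0 gives a zero-shot prompt.
    """
    instruction = "Compare the two answers and reply with A or B."
    shots = [format_example(ex, with_reference) for ex in examples[:num_shots]]
    target = "\n".join([
        f"Question: {question}",
        f"Answer A: {answer_a}",
        f"Answer B: {answer_b}",
        "Verdict:",
    ])
    return "\n\n".join([instruction, *shots, target])
```

Scaling the number of in-context examples then amounts to varying `num_shots` over a pool of examples and comparing the evaluator's outputs across settings; how the examples are selected and how consistency and quality are measured is left open here, since the abstract does not specify it.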

Comments:	work in progress
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2406.11629 [cs.CL]
	(or arXiv:2406.11629v4 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2406.11629

Submission history

From: Mingyang Song
[v1] Mon, 17 Jun 2024 15:11:58 UTC (221 KB)
[v2] Mon, 24 Jun 2024 16:02:21 UTC (235 KB)
[v3] Sun, 30 Jun 2024 13:31:24 UTC (221 KB)
[v4] Tue, 17 Sep 2024 14:04:27 UTC (312 KB)
