AHP-Powered LLM Reasoning for Multi-Criteria Evaluation of Open-Ended Responses

Lu, Xiaotian; Li, Jiyi; Takeuchi, Koh; Kashima, Hisashi

Computer Science > Computation and Language

arXiv:2410.01246 (cs)

[Submitted on 2 Oct 2024]

Abstract: Question answering (QA) tasks have been extensively studied in natural language processing (NLP). Unlike close-ended questions with definitive answers, answers to open-ended questions are highly diverse and difficult to quantify, and they cannot simply be judged correct or incorrect. Although large language models (LLMs) have demonstrated strong capabilities across a wide range of tasks, they perform relatively poorly when evaluating answers to open-ended questions. In this study, we propose a method that combines LLMs with the analytic hierarchy process (AHP) to assess answers to open-ended questions. First, we use LLMs to generate multiple evaluation criteria for a question. Then, under each criterion, an LLM compares the answers pairwise, and the AHP is used to calculate a score for each answer. We conducted experiments on four datasets using both GPT-3.5-turbo and GPT-4. Our results indicate that our approach aligns more closely with human judgment than four baseline methods. Additionally, we explored how the number of criteria, the choice of model, and the dataset affect the results.
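The scoring step can be illustrated with a minimal sketch of the classic AHP formulation, assuming Saaty-style reciprocal pairwise comparison matrices, priorities taken from the principal eigenvector (approximated here by power iteration), and a weighted sum across criteria. This is not the authors' implementation: the abstract does not say how criteria are weighted or how LLM judgments are mapped to numbers, so the function names, the equal criterion weights, and the hard-coded judgments below are hypothetical. In the described method, each judgment would come from an LLM prompt rather than being written by hand.

```python
def comparison_matrix(n, judgments):
    """Build an n x n reciprocal pairwise comparison matrix for one criterion.

    `judgments` maps (i, j) to a positive value a_ij meaning "answer i is
    a_ij times as good as answer j" (hypothetical encoding of an LLM verdict).
    The reciprocal entry a_ji = 1 / a_ij is filled in automatically.
    """
    m = [[1.0] * n for _ in range(n)]
    for (i, j), value in judgments.items():
        m[i][j] = float(value)
        m[j][i] = 1.0 / value
    return m


def priority_vector(m, iterations=100, tol=1e-12):
    """Approximate the principal eigenvector of m by power iteration.

    The result is normalized to sum to 1, giving the local priorities
    (scores) of the answers under a single criterion.
    """
    n = len(m)
    w = [1.0 / n] * n
    for _ in range(iterations):
        nxt = [sum(m[i][k] * w[k] for k in range(n)) for i in range(n)]
        total = sum(nxt)
        nxt = [x / total for x in nxt]
        if max(abs(a - b) for a, b in zip(nxt, w)) < tol:
            return nxt
        w = nxt
    return w


def ahp_scores(n_answers, judgments_per_criterion, criterion_weights=None):
    """Combine per-criterion priorities into one global score per answer."""
    k = len(judgments_per_criterion)
    if criterion_weights is None:
        # Assumption: all criteria weighted equally (not stated in the abstract).
        criterion_weights = [1.0 / k] * k
    scores = [0.0] * n_answers
    for weight, judgments in zip(criterion_weights, judgments_per_criterion):
        local = priority_vector(comparison_matrix(n_answers, judgments))
        for a in range(n_answers):
            scores[a] += weight * local[a]
    return scores


# Hypothetical judgments for 3 answers under 2 criteria (e.g. "accuracy", "clarity").
accuracy = {(0, 1): 3, (0, 2): 5, (1, 2): 2}
clarity = {(0, 1): 1 / 2, (0, 2): 1, (1, 2): 2}
scores = ahp_scores(3, [accuracy, clarity])
print([round(s, 3) for s in scores])
```

Power iteration is used only to keep the sketch dependency-free; an eigen-decomposition routine or the geometric-mean approximation of row entries would serve the same purpose. Because each local priority vector sums to 1 and the criterion weights sum to 1, the global scores also sum to 1 and can be compared directly across answers.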

Comments:	Accepted for EMNLP 2024 Findings
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI)
Cite as:	arXiv:2410.01246 [cs.CL]
	(or arXiv:2410.01246v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2410.01246

Submission history

From: Xiaotian Lu
[v1] Wed, 2 Oct 2024 05:22:07 UTC (788 KB)