On Evaluating the Integration of Reasoning and Action in LLM Agents with Database Question Answering

Nan, Linyong; Zhang, Ellen; Zou, Weijin; Zhao, Yilun; Zhou, Wenfei; Cohan, Arman

arXiv:2311.09721 (cs)

[Submitted on 16 Nov 2023]

Abstract: This study introduces a new long-form database question answering dataset designed to evaluate how Large Language Models (LLMs) interact with a SQL interpreter. The task requires LLMs to strategically generate multiple SQL queries to retrieve sufficient data from a database, to reason over the retrieved context, and to synthesize it into a comprehensive analytical narrative. Our findings show that this task is highly challenging even for the state-of-the-art GPT-4 model. We propose and evaluate two interaction strategies, and provide a fine-grained analysis of the individual stages within the interaction. We identify two primary bottlenecks that hinder effective interaction: the capacity for planning and the ability to generate multiple SQL queries. To assess answer quality accurately, we introduce a multi-agent evaluation framework that simulates the academic peer-review process, improving the precision and reliability of our evaluations. This framework allows for a more nuanced understanding of the strengths and limitations of current LLMs in complex retrieval and reasoning tasks.
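
The interaction described in the abstract (plan several SQL queries, execute them against a database, then write a narrative from the results) can be illustrated with a minimal sketch. This is not the authors' implementation or either of their two interaction strategies. The function names, the `sales` table, and the stub planner and writer standing in for LLM calls are all hypothetical, and SQLite is assumed as the SQL interpreter only for convenience.

```python
import sqlite3
from typing import Callable, List, Tuple, Union

# An observation pairs a query with its rows, or with an error message string.
Observation = Tuple[str, Union[list, str]]


def run_queries(conn: sqlite3.Connection, queries: List[str]) -> List[Observation]:
    """Execute each SQL query and collect its rows or its error message."""
    observations: List[Observation] = []
    for query in queries:
        try:
            observations.append((query, conn.execute(query).fetchall()))
        except sqlite3.Error as exc:
            # Keep the failure as context rather than aborting the whole run.
            observations.append((query, f"error: {exc}"))
    return observations


def answer_question(
    conn: sqlite3.Connection,
    question: str,
    plan_queries: Callable[[str], List[str]],
    write_narrative: Callable[[str, List[Observation]], str],
) -> str:
    """Plan queries, execute them, then synthesize an answer from the results."""
    queries = plan_queries(question)            # hypothetical LLM planning step
    observations = run_queries(conn, queries)   # SQL interpreter step
    return write_narrative(question, observations)  # hypothetical LLM writing step


def build_demo_db() -> sqlite3.Connection:
    """Create a tiny in-memory database (hypothetical schema and data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO sales VALUES (?, ?)",
        [("north", 100.0), ("north", 50.0), ("south", 70.0)],
    )
    return conn


def stub_planner(question: str) -> List[str]:
    """Stand-in for an LLM that decides which queries to issue."""
    return [
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region",
        "SELECT COUNT(*) FROM sales",
    ]


def stub_writer(question: str, observations: List[Observation]) -> str:
    """Stand-in for an LLM that turns query results into a narrative."""
    lines = [f"Question: {question}"]
    for query, result in observations:
        lines.append(f"{query} -> {result}")
    return "\n".join(lines)


if __name__ == "__main__":
    demo_conn = build_demo_db()
    print(answer_question(demo_conn, "How do regions compare?", stub_planner, stub_writer))
```

In a real agent the planner and writer would be model calls, and the planner might be invoked repeatedly with earlier results as context. The sketch fixes a single plan-then-execute pass only to keep the control flow visible.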

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2311.09721 [cs.CL]
	(or arXiv:2311.09721v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2311.09721

Submission history

From: Linyong Nan [view email]
[v1] Thu, 16 Nov 2023 09:55:07 UTC (340 KB)