Retrieval is Accurate Generation

Cao, Bowen; Cai, Deng; Cui, Leyang; Cheng, Xuxin; Bi, Wei; Zou, Yuexian; Shi, Shuming

Computer Science > Computation and Language

arXiv:2402.17532v2 (cs)

[Submitted on 27 Feb 2024 (v1), revised 29 Feb 2024 (this version, v2), latest version 16 Mar 2024 (v3)]


Authors: Bowen Cao, Deng Cai, Leyang Cui, Xuxin Cheng, Wei Bi, Yuexian Zou, Shuming Shi


Abstract: Standard language models generate text by selecting tokens from a fixed, finite, and standalone vocabulary. We introduce a novel method that selects context-aware phrases from a collection of supporting documents. One of the most significant challenges for this paradigm shift is determining the training oracles, because a string of text can be segmented in various ways and each segment can be retrieved from numerous possible documents. To address this, we propose to initialize the training oracles using linguistic heuristics and, more importantly, bootstrap the oracles through iterative self-reinforcement. Extensive experiments show that our model not only outperforms standard language models on a variety of knowledge-intensive tasks but also demonstrates improved generation quality in open-ended text generation. For instance, compared to the standard language model counterpart, our model raises the accuracy from 23.47% to 36.27% on OpenbookQA, and improves the MAUVE score from 42.61% to 81.58% in open-ended text generation. Remarkably, our model also achieves the best performance and the lowest latency among several retrieval-augmented baselines. In conclusion, we assert that retrieval is more accurate generation and hope that our work will encourage further research on this new paradigm shift.
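The abstract attributes the difficulty of choosing training oracles to two sources of ambiguity: a target string can be segmented in many ways, and each segment can be found in many documents. The following toy sketch makes that concrete. It assumes a hypothetical three-document collection `DOCS`, whitespace tokenization, and exact phrase matching; every name in it is illustrative. `greedy_longest_match` is a simple stand-in heuristic chosen for this example, not the linguistic heuristics or the iterative self-reinforcement procedure described by the authors.

```python
from __future__ import annotations

import math
from functools import lru_cache

# Hypothetical toy collection of supporting documents (not from the paper).
DOCS = {
    "d1": "the cat sat on the mat",
    "d2": "a cat sat on a chair",
    "d3": "on the mat lay a dog",
}


def phrase_sources(phrase: str, docs: dict[str, str]) -> list[str]:
    """Return ids of documents containing `phrase` as a contiguous token span."""
    target = phrase.split()
    n = len(target)
    hits = []
    for doc_id, text in docs.items():
        toks = text.split()
        if any(toks[i:i + n] == target for i in range(len(toks) - n + 1)):
            hits.append(doc_id)
    return hits


def all_segmentations(tokens: tuple[str, ...], docs: dict[str, str]):
    """Enumerate every split of `tokens` into phrases found in at least one document.

    Each segmentation is a list of (phrase, source_doc_ids) pairs.
    """
    @lru_cache(maxsize=None)
    def helper(start: int):
        if start == len(tokens):
            return [[]]
        results = []
        for end in range(start + 1, len(tokens) + 1):
            phrase = " ".join(tokens[start:end])
            sources = phrase_sources(phrase, docs)
            if sources:  # only phrases that can actually be retrieved
                for rest in helper(end):
                    results.append([(phrase, sources)] + rest)
        return results

    return helper(0)


def count_oracles(segmentations) -> int:
    """Count (segmentation, source-document) combinations, i.e. candidate oracles."""
    return sum(math.prod(len(srcs) for _, srcs in seg) for seg in segmentations)


def greedy_longest_match(tokens: tuple[str, ...], docs: dict[str, str]):
    """Illustrative heuristic (an assumption): take the longest retrievable phrase at each step."""
    segs, i = [], 0
    while i < len(tokens):
        for end in range(len(tokens), i, -1):
            phrase = " ".join(tokens[i:end])
            sources = phrase_sources(phrase, docs)
            if sources:
                segs.append((phrase, sources))
                i = end
                break
        else:
            # No document contains this token: fall back to the token itself.
            segs.append((tokens[i], []))
            i += 1
    return segs


if __name__ == "__main__":
    target = tuple("a cat sat on the mat".split())
    segs = all_segmentations(target, DOCS)
    print(len(segs), count_oracles(segs))
    print(greedy_longest_match(target, DOCS))
```

For the target `a cat sat on the mat`, this toy collection admits 30 segmentations and 471 (segmentation, source) combinations, while the greedy heuristic commits to a single one: `a cat sat on` from `d2`, then `the mat`, which still has two candidate sources (`d1` and `d3`). Even at this scale the space of possible oracles is large, which motivates refining an initial heuristic choice rather than enumerating every option.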

Comments:	ICLR 2024
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2402.17532 [cs.CL]
	(or arXiv:2402.17532v2 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2402.17532

Submission history

From: Bowen Cao
[v1] Tue, 27 Feb 2024 14:16:19 UTC (6,916 KB)
[v2] Thu, 29 Feb 2024 07:56:14 UTC (6,916 KB)
[v3] Sat, 16 Mar 2024 04:31:47 UTC (6,916 KB)
