SAIL: Search-Augmented Instruction Learning

Luo, Hongyin; Chuang, Yung-Sung; Gong, Yuan; Zhang, Tianhua; Kim, Yoon; Wu, Xixin; Fox, Danny; Meng, Helen; Glass, James

Computer Science > Computation and Language

arXiv:2305.15225v1 (cs)

[Submitted on 24 May 2023 (this version), latest version 25 Jun 2023 (v2)]

Abstract: Large language models (LLMs) have been significantly improved by instruction fine-tuning, but still lack transparency and the ability to utilize up-to-date knowledge and information. In this work, we propose search-augmented instruction learning (SAIL), which grounds the language generation and instruction following abilities on complex search results generated by in-house and external search engines. With an instruction tuning corpus, we collect search results for each training case from different search APIs and domains, and construct a new search-grounded training set containing (instruction, grounding information, response) triplets. We then fine-tune the LLaMA-7B model on the constructed training set. Since the collected results contain unrelated and disputing languages, the model needs to learn to ground on trustworthy search results, filter out distracting passages, and generate the target response. The search result-denoising process entails explicit trustworthy information selection and multi-hop reasoning, since the retrieved passages might be informative but not contain the instruction-following answer. Experiments show that the fine-tuned SAIL-7B model has a strong instruction-following ability, and it performs significantly better on transparency-sensitive tasks, including open-ended question answering and fact checking.
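As an illustration of the (instruction, grounding information, response) triplets described in the abstract, the following is a minimal Python sketch of how one such training example might be assembled. It is not the authors' implementation: the class names `SearchResult` and `GroundedExample`, the functions `build_grounded_example` and `to_training_text`, the `max_results` parameter, and the prompt layout are all hypothetical, chosen only to make the idea concrete.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SearchResult:
    source: str   # hypothetical label for the search API or domain
    passage: str  # retrieved text; may be unrelated or contradictory


@dataclass
class GroundedExample:
    instruction: str
    grounding: str
    response: str


def build_grounded_example(
    instruction: str,
    results: List[SearchResult],
    response: str,
    max_results: int = 3,  # assumed cutoff, not taken from the paper
) -> GroundedExample:
    """Attach the top search results to one instruction-tuning case."""
    kept = results[:max_results]
    # Noisy passages are kept on purpose: no relevance filtering is applied here.
    grounding = "\n".join(
        f"Search result ({i}) [{r.source}]: {r.passage}"
        for i, r in enumerate(kept, start=1)
    )
    return GroundedExample(instruction, grounding, response)


def to_training_text(example: GroundedExample) -> str:
    """Serialize a triplet into one fine-tuning string (assumed layout)."""
    return (
        f"{example.grounding}\n\n"
        f"Instruction: {example.instruction}\n\n"
        f"Response: {example.response}"
    )
```

In this sketch the retrieved passages are passed through unfiltered, which mirrors the abstract's point that the model itself must learn to select trustworthy results and ignore distracting ones.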

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2305.15225 [cs.CL]
	(or arXiv:2305.15225v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2305.15225

Submission history

From: Hongyin Luo
[v1] Wed, 24 May 2023 15:07:30 UTC (1,711 KB)
[v2] Sun, 25 Jun 2023 17:56:37 UTC (1,712 KB)
