Reducing Distraction in Long-Context Language Models by Focused Learning

Wu, Zijun; Liu, Bingyuan; Yan, Ran; Chen, Lei; Delteil, Thomas

arXiv:2411.05928 (cs)

[Submitted on 8 Nov 2024]

Abstract: Recent advancements in Large Language Models (LLMs) have significantly enhanced their capacity to process long contexts. However, effectively utilizing this long context remains a challenge due to the issue of distraction, where irrelevant information dominates lengthy contexts, causing LLMs to lose focus on the most relevant segments. To address this, we propose a novel training method that enhances LLMs' ability to discern relevant information through a unique combination of retrieval-based data augmentation and contrastive learning. Specifically, during fine-tuning with long contexts, we employ a retriever to extract the most relevant segments, serving as augmented inputs. We then introduce an auxiliary contrastive learning objective to explicitly ensure that outputs from the original context and the retrieved sub-context are closely aligned. Extensive experiments on long single-document and multi-document QA benchmarks demonstrate the effectiveness of our proposed method.
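The two ingredients named in the abstract, retrieving a relevant sub-context and aligning its output with that of the full context, can be illustrated with a small sketch. The abstract does not specify the retriever, the representations being compared, or the exact form of the contrastive loss, so everything below is an assumption made for illustration: `retrieve_top_k` is a hypothetical word-overlap retriever, `info_nce` is a standard InfoNCE-style loss swapped in as one plausible contrastive objective, and the vectors stand in for whatever model outputs the method actually aligns.

```python
import math


def retrieve_top_k(question, segments, k=2):
    """Hypothetical toy retriever: rank segments by word overlap with the question.

    The abstract does not name the retriever; this scoring rule is an assumption.
    """
    q_words = set(question.lower().split())
    scored = [
        (len(q_words & set(seg.lower().split())), i)
        for i, seg in enumerate(segments)
    ]
    # Highest overlap first; ties are broken by original position.
    top = sorted(scored, key=lambda s: (-s[0], s[1]))[:k]
    # Restore document order so the sub-context reads naturally.
    return [segments[i] for _, i in sorted(top, key=lambda s: s[1])]


def cosine(u, v):
    """Cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(full_reprs, sub_reprs, temperature=0.1):
    """InfoNCE-style loss (an assumed stand-in for the paper's objective).

    full_reprs[i] and sub_reprs[i] form a positive pair: the representation
    of example i under the full context and under its retrieved sub-context.
    The other sub_reprs in the batch act as negatives.
    """
    losses = []
    for i, anchor in enumerate(full_reprs):
        logits = [cosine(anchor, s) / temperature for s in sub_reprs]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        losses.append(log_denominator - logits[i])
    return sum(losses) / len(losses)


segments = [
    "the cat sat on the mat",
    "paris is the capital of france",
    "stocks fell sharply today",
]
sub_context = retrieve_top_k("what is the capital of france", segments, k=1)

# Toy representations: aligned pairs give a loss near zero,
# swapped pairs give a large loss.
full = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = info_nce(full, [[1.0, 0.0], [0.0, 1.0]])
swapped_loss = info_nce(full, [[0.0, 1.0], [1.0, 0.0]])
```

In a fine-tuning loop, a loss of this kind would typically be added to the usual language-modeling loss with a weighting coefficient. That combination and the weight are assumptions here; the abstract describes the contrastive objective only as auxiliary.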

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2411.05928 [cs.CL]
	(or arXiv:2411.05928v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2411.05928

Submission history

From: Zijun Wu
[v1] Fri, 8 Nov 2024 19:27:42 UTC (289 KB)
