Found in the Middle: Calibrating Positional Attention Bias Improves Long Context Utilization

Hsieh, Cheng-Yu; Chuang, Yung-Sung; Li, Chun-Liang; Wang, Zifeng; Le, Long T.; Kumar, Abhishek; Glass, James; Ratner, Alexander; Lee, Chen-Yu; Krishna, Ranjay; Pfister, Tomas

arXiv:2406.16008v1 (cs)

[Submitted on 23 Jun 2024 (this version), latest version 3 Jul 2024 (v2)]

Abstract: Large language models (LLMs), even when specifically trained to process long input contexts, struggle to capture relevant information located in the middle of their input. This phenomenon is known as the lost-in-the-middle problem. In this work, we make three contributions. First, we set out to understand the factors that cause this phenomenon. In doing so, we establish a connection between lost-in-the-middle and LLMs' intrinsic attention bias: LLMs exhibit a U-shaped attention bias in which tokens at the beginning and at the end of the input receive higher attention, regardless of their relevance. Second, we mitigate this positional bias through a calibration mechanism, found-in-the-middle, that allows the model to attend to contexts faithfully according to their relevance, even when they are in the middle. Third, we show that found-in-the-middle not only achieves better performance in locating relevant information within a long context, but also leads to improved retrieval-augmented generation (RAG) performance across various tasks, outperforming existing methods by up to 15 percentage points. These findings open up future directions in understanding LLM attention bias and its potential consequences.
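
The idea of calibrating away a positional bias can be pictured with a small toy sketch. It is not the authors' implementation: the additive model (observed attention = relevance + positional bias), the quadratic U-shape, and every function name below are assumptions made purely for illustration. The sketch estimates the bias at each position from relevance-free filler and subtracts it from the observed scores.

```python
# Toy illustration only. The additive model and all names here are
# hypothetical; the abstract does not specify the calibration formula.

def u_shaped_bias(n_docs, strength=1.0):
    """Assumed positional bias: largest at both ends, zero in the middle."""
    mid = (n_docs - 1) / 2
    return [strength * ((i - mid) / mid) ** 2 for i in range(n_docs)]

def observed_attention(relevance, bias):
    """Assumed model: attention at a position = relevance + positional bias."""
    return [r + b for r, b in zip(relevance, bias)]

def calibrate(observed, baseline):
    """Subtract the attention that relevance-free filler gets at each position."""
    return [o - b for o, b in zip(observed, baseline)]

def argmax(values):
    """Index of the largest value (first one on ties)."""
    return max(range(len(values)), key=lambda i: values[i])

n_docs = 9
relevance = [0.0] * n_docs
relevance[4] = 0.5  # the only relevant document sits in the middle

bias = u_shaped_bias(n_docs)
observed = observed_attention(relevance, bias)

# Baseline: the same positions filled with documents of zero relevance.
baseline = observed_attention([0.0] * n_docs, bias)
calibrated = calibrate(observed, baseline)

top_before = argmax(observed)    # an edge position wins on raw scores
top_after = argmax(calibrated)   # the middle document wins after calibration
```

In this toy setting the raw scores rank an edge document first even though it is irrelevant, while the calibrated scores recover the middle document. Real attention scores need not decompose additively, so this should be read as intuition rather than as a description of the method.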

Comments:	ACL Findings 2024
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2406.16008 [cs.CL]
	(or arXiv:2406.16008v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2406.16008

Submission history

From: Cheng-Yu Hsieh
[v1] Sun, 23 Jun 2024 04:35:42 UTC (1,762 KB)
[v2] Wed, 3 Jul 2024 17:40:00 UTC (1,762 KB)
