Distributed Rule Vectors is A Key Mechanism in Large Language Models' In-Context Learning

Zheng, Bowen; Ma, Ming; Lin, Zhongqiao; Yang, Tianming

arXiv:2406.16007 (cs)

[Submitted on 23 Jun 2024]

Abstract: Large Language Models (LLMs) have demonstrated remarkable abilities, one of the most important being In-Context Learning (ICL). With ICL, LLMs can derive the underlying rule from a few demonstrations and provide answers that comply with the rule. Previous work hypothesized that the network creates a "task vector" in specific positions during ICL. Patching the "task vector" allows LLMs to achieve zero-shot performance similar to few-shot learning. However, we discover that such "task vectors" do not exist in tasks where the rule has to be defined through multiple demonstrations. Instead, the rule information provided by each demonstration is first transmitted to its answer position and forms its own rule vector. Importantly, all the rule vectors contribute to the output in a distributed manner. We further show that the rule vectors encode a high-level abstraction of rules extracted from the demonstrations. These results are further validated in a series of tasks that rely on rules dependent on multiple demonstrations. Our study provides novel insights into the mechanism underlying ICL in LLMs, demonstrating how ICL may be achieved through an information aggregation mechanism.

Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2406.16007 [cs.CL]
	(or arXiv:2406.16007v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2406.16007

Submission history

From: Ming Ma
[v1] Sun, 23 Jun 2024 04:29:13 UTC (1,430 KB)
