An Explanation of In-context Learning as Implicit Bayesian Inference

Xie, Sang Michael; Raghunathan, Aditi; Liang, Percy; Ma, Tengyu

Computer Science > Computation and Language

arXiv:2111.02080 (cs)

[Submitted on 3 Nov 2021 (v1), last revised 21 Jul 2022 (this version, v6)]

Abstract: Large language models (LMs) such as GPT-3 have the surprising ability to do in-context learning, where the model learns to do a downstream task simply by conditioning on a prompt consisting of input-output examples. The LM learns from these examples without being explicitly pretrained to learn. Thus, it is unclear what enables in-context learning. In this paper, we study how in-context learning can emerge when pretraining documents have long-range coherence. Here, the LM must infer a latent document-level concept to generate coherent next tokens during pretraining. At test time, in-context learning occurs when the LM also infers a shared latent concept between examples in a prompt. We prove when this occurs despite a distribution mismatch between prompts and pretraining data in a setting where the pretraining distribution is a mixture of HMMs. In contrast to messy large-scale datasets used to train LMs capable of in-context learning, we generate a small-scale synthetic dataset (GINC) where Transformers and LSTMs both exhibit in-context learning. Beyond the theory, experiments on GINC exhibit large-scale real-world phenomena including improved in-context performance with model scaling (despite the same pretraining loss), sensitivity to example order, and instances where zero-shot is better than few-shot in-context learning.
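The Bayesian reading in the abstract can be summarized as: a model trained on a mixture of latent concepts predicts the next token roughly as p(next | prompt) = Σ_θ p(next | prompt, θ) · p(θ | prompt), so conditioning on more examples sharpens the posterior over the shared concept θ. The sketch below performs that computation *explicitly* on a toy mixture of two hypothetical HMM "concepts" (a forward cycle and a backward cycle over 4 tokens). It is not the paper's GINC generator or its proofs: the concepts, vocabulary size, probabilities, and helper names (`cyclic_hmm`, `forward_filter`, `concept_posterior`) are illustrative assumptions, and it ignores the prompt/pretraining distribution mismatch the paper analyzes. The paper's claim is that the LM does this inference implicitly.

```python
import numpy as np

def cyclic_hmm(n_states, step, p_follow=0.9, p_emit=0.97):
    """Hypothetical toy 'concept': an HMM whose hidden state usually advances
    by `step` around a cycle and usually emits its own index as the token."""
    S = n_states
    A = np.full((S, S), (1 - p_follow) / (S - 1))    # off-cycle transitions
    for s in range(S):
        A[s, (s + step) % S] = p_follow              # preferred transition
    B = np.full((S, S), (1 - p_emit) / (S - 1))      # emission noise
    np.fill_diagonal(B, p_emit)                      # state s mostly emits token s
    pi = np.full(S, 1.0 / S)                         # uniform initial state
    return pi, A, B

def forward_filter(obs, pi, A, B):
    """Scaled forward algorithm: returns log p(obs | HMM) and the filtered
    distribution p(h_T | obs) over the last hidden state."""
    alpha = pi * B[:, obs[0]]
    c = alpha.sum()
    log_lik, alpha = np.log(c), alpha / c
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]
        c = alpha.sum()
        log_lik += np.log(c)
        alpha = alpha / c
    return log_lik, alpha

def concept_posterior(obs, concepts, prior):
    """Explicit Bayesian inference over concepts: posterior p(theta | obs) and
    the posterior-weighted next-token distribution."""
    log_liks, next_dists = [], []
    for pi, A, B in concepts:
        ll, alpha = forward_filter(obs, pi, A, B)
        log_liks.append(ll)
        next_dists.append((alpha @ A) @ B)           # p(next token | obs, theta)
    log_post = np.log(prior) + np.array(log_liks)
    log_post -= log_post.max()                       # numerical stability
    post = np.exp(log_post)
    post /= post.sum()
    return post, post @ np.array(next_dists)

concepts = [cyclic_hmm(4, +1), cyclic_hmm(4, -1)]    # forward vs. backward cycle
prior = np.array([0.5, 0.5])
prompt = [0, 1, 2, 3, 0, 1]                          # consistent with the forward cycle

posterior, predictive = concept_posterior(prompt, concepts, prior)
print("posterior over concepts:", posterior.round(4))
print("next-token distribution:", predictive.round(3))
```

With this prompt, nearly all posterior mass lands on the forward-cycle concept and the most likely next token is 2; shortening the prompt to a single token leaves the posterior at the uniform prior, which loosely mirrors the abstract's point that examples help by identifying the shared latent concept.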

Comments:	ICLR 2022
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2111.02080 [cs.CL]
	(or arXiv:2111.02080v6 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2111.02080

Submission history

From: Sang Michael Xie
[v1] Wed, 3 Nov 2021 09:12:33 UTC (766 KB)
[v2] Sun, 14 Nov 2021 23:16:56 UTC (783 KB)
[v3] Sat, 18 Dec 2021 17:52:54 UTC (998 KB)
[v4] Thu, 24 Mar 2022 20:06:34 UTC (1,279 KB)
[v5] Wed, 4 May 2022 21:01:33 UTC (1,397 KB)
[v6] Thu, 21 Jul 2022 07:44:13 UTC (1,397 KB)