The Factorization Curse: Which Tokens You Predict Underlie the Reversal Curse and More

Kitouni, Ouail; Nolte, Niklas; Bouchacourt, Diane; Williams, Adina; Rabbat, Mike; Ibrahim, Mark

Computer Science > Machine Learning

arXiv:2406.05183 (cs)

[Submitted on 7 Jun 2024]

Abstract: Today's best language models still struggle with hallucinations: factually incorrect generations, which impede their ability to reliably retrieve information seen during training. The reversal curse, where models cannot recall information when probed in a different order than was encountered during training, exemplifies this in information retrieval. We reframe the reversal curse as a factorization curse: a failure of models to learn the same joint distribution under different factorizations. Through a series of controlled experiments with increasing levels of realism, including WikiReversal, a setting we introduce to closely simulate a knowledge-intensive finetuning task, we find that the factorization curse is an inherent failure of the next-token prediction objective used in popular large language models. Moreover, we demonstrate that reliable information retrieval cannot be solved with scale, reversed tokens, or even naive bidirectional-attention training. Consequently, various approaches to finetuning on specialized data would necessarily provide mixed results on downstream tasks, unless the model has already seen the right sequence of tokens. Across five tasks of varying levels of complexity, our results uncover a promising path forward: factorization-agnostic objectives can significantly mitigate the reversal curse and hint at improved knowledge storage and planning capabilities.
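The phrase "the same joint distribution under different factorizations" can be made concrete with a small sketch. The following is only an illustration under stated assumptions, not the paper's experimental setup: the toy corpus and the helper names (joint, marginal, conditional) are hypothetical. It shows that one joint distribution p(a, b) can be written either as p(a) p(b | a) (left-to-right) or as p(b) p(a | b) (right-to-left), so a model that only ever learns the first factor has not necessarily learned the second.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: (name, description) pairs, always seen in this order.
PAIRS = [
    ("Paris", "capital of France"),
    ("Tokyo", "capital of Japan"),
    ("Paris", "capital of France"),
]

def joint(pairs):
    """Empirical joint distribution p(a, b) over ordered pairs."""
    counts = Counter(pairs)
    total = sum(counts.values())
    return {pair: c / total for pair, c in counts.items()}

def marginal(p_joint, index):
    """Marginal over position `index` (0 for a, 1 for b)."""
    m = defaultdict(float)
    for pair, p in p_joint.items():
        m[pair[index]] += p
    return dict(m)

def conditional(p_joint, given_index, given_value):
    """Distribution of the other position, conditioned on one position."""
    other = 1 - given_index
    p_given = marginal(p_joint, given_index)[given_value]
    return {
        pair[other]: p / p_given
        for pair, p in p_joint.items()
        if pair[given_index] == given_value
    }

p = joint(PAIRS)
p_a, p_b = marginal(p, 0), marginal(p, 1)

# Left-to-right factor p(b | a) and right-to-left factor p(a | b).
forward = conditional(p, 0, "Paris")
backward = conditional(p, 1, "capital of France")

# Both factorizations reconstruct the same joint for every observed pair.
for (a, b), p_ab in p.items():
    left_to_right = p_a[a] * conditional(p, 0, a)[b]
    right_to_left = p_b[b] * conditional(p, 1, b)[a]
    assert abs(left_to_right - p_ab) < 1e-12
    assert abs(right_to_left - p_ab) < 1e-12
```

Here the backward factor is computed from the full joint. A model trained only with next-token prediction on sequences in the (a, b) order is optimized for the forward factor alone, which is the gap the abstract describes.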

Comments:	18 pages, 7 figures
Subjects:	Machine Learning (cs.LG); Artificial Intelligence (cs.AI); Computation and Language (cs.CL)
Cite as:	arXiv:2406.05183 [cs.LG]
	(or arXiv:2406.05183v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2406.05183

Submission history

From: Ouail Kitouni
[v1] Fri, 7 Jun 2024 18:00:37 UTC (6,100 KB)
