Reuse, Don't Retrain: A Recipe for Continued Pretraining of Language Models

Parmar, Jupinder; Satheesh, Sanjev; Patwary, Mostofa; Shoeybi, Mohammad; Catanzaro, Bryan

arXiv:2407.07263 (cs)

[Submitted on 9 Jul 2024]

Abstract: As language models have scaled both their number of parameters and pretraining dataset sizes, the computational cost for pretraining has become intractable except for the most well-resourced teams. This increasing cost makes it ever more important to be able to reuse a model after it has completed pretraining, allowing the model's abilities to improve further without needing to train from scratch. In this work, we detail a set of guidelines that cover how to design efficacious data distributions and learning rate schedules for continued pretraining of language models. When applying these findings within a continued pretraining run on top of a well-trained 15B parameter model, we show an improvement of 9% in average model accuracy compared to the baseline of continued training on the pretraining set. The resulting recipe provides a practical starting point with which to begin developing language models through reuse rather than retraining.

Comments:	Preprint. Under review
Subjects:	Computation and Language (cs.CL)
Cite as:	arXiv:2407.07263 [cs.CL]
	(or arXiv:2407.07263v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2407.07263

Submission history

From: Jupinder Parmar
[v1] Tue, 9 Jul 2024 22:37:59 UTC (409 KB)
