Learning to Grow Pretrained Models for Efficient Transformer Training

Wang, Peihao; Panda, Rameswar; Hennigen, Lucas Torroba; Greengard, Philip; Karlinsky, Leonid; Feris, Rogerio; Cox, David Daniel; Wang, Zhangyang; Kim, Yoon

arXiv:2303.00980 (cs)

[Submitted on 2 Mar 2023]

Abstract: Scaling transformers has led to significant breakthroughs in many domains, giving rise to a paradigm in which larger versions of existing models are trained and released on a periodic basis. New instances of such models are typically trained completely from scratch, despite the fact that they are often just scaled-up versions of their smaller counterparts. How can we use the implicit knowledge in the parameters of smaller, extant models to enable faster training of newer, larger models? This paper describes an approach for accelerating transformer training by learning to grow pretrained transformers, where we learn to linearly map the parameters of the smaller model to initialize the larger model. For tractable learning, we factorize the linear transformation as a composition of (linear) width- and depth-growth operators, and further employ a Kronecker factorization of these growth operators to encode architectural knowledge. Extensive experiments across both language and vision transformers demonstrate that our learned Linear Growth Operator (LiGO) can save up to 50% of the computational cost of training from scratch, while consistently outperforming strong baselines that also reuse smaller pretrained models to initialize larger models.
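To make the factorization described in the abstract concrete, the following is a minimal NumPy sketch, not the paper's implementation. It assumes each layer is a single square weight matrix, that width growth takes the form B W Aᵀ (which is equivalent to applying the Kronecker product A ⊗ B to the column-major vectorization of W), and that depth growth forms each new layer as a linear combination of the widened layers. The names `grow_width`, `grow_depth`, `grow`, `A`, `B`, and `D` are hypothetical, and the factors are random placeholders here, whereas the abstract describes them as learned.

```python
import numpy as np

def grow_width(W, A, B):
    """Widen a (d1, d1) matrix to (d2, d2) via B @ W @ A.T.

    A and B are (d2, d1) expansion factors (assumed form, for illustration).
    """
    return B @ W @ A.T

def grow_depth(layers, D):
    """Build L2 layers as linear combinations of L1 layers.

    layers: (L1, d, d) stacked weights; D: (L2, L1) mixing weights.
    """
    return np.einsum("lk,kij->lij", D, layers)

def grow(small_layers, A, B, D):
    """Compose width growth (per layer) with depth growth (across layers)."""
    widened = np.stack([grow_width(W, A, B) for W in small_layers])
    return grow_depth(widened, D)

rng = np.random.default_rng(0)
L1, L2, d1, d2 = 2, 4, 3, 5            # small/large depth and width
small = rng.normal(size=(L1, d1, d1))  # stand-in for pretrained weights
A = rng.normal(size=(d2, d1))          # placeholder factors; the paper learns these
B = rng.normal(size=(d2, d1))
D = rng.normal(size=(L2, L1))

big = grow(small, A, B, D)
print(big.shape)

# Kronecker view of width growth: vec(B W A^T) = (A kron B) vec(W),
# where vec stacks columns (Fortran order).
kron_result = np.kron(A, B) @ small[0].flatten(order="F")
direct_result = grow_width(small[0], A, B).flatten(order="F")
print(np.allclose(kron_result, direct_result))
```

The factored form stores two (d2, d1) matrices per width operator instead of a dense (d2², d1²) matrix, which is the kind of saving that makes learning the map tractable.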

Comments:	International Conference on Learning Representations (ICLR), 2023
Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2303.00980 [cs.LG]
	(or arXiv:2303.00980v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2303.00980

Submission history

From: Peihao Wang
[v1] Thu, 2 Mar 2023 05:21:18 UTC (1,726 KB)
