A Mean Field Ansatz for Zero-Shot Weight Transfer

Chen, Xingyuan; Kuang, Wenwei; Deng, Lei; Han, Wei; Bai, Bo; Reis, Goncalo dos

arXiv:2408.08681 (cs)

[Submitted on 16 Aug 2024]

Abstract: The pre-training cost of large language models (LLMs) is prohibitive. One cutting-edge approach to reducing this cost is zero-shot weight transfer, known in some cases as model growth, which transfers the weights trained in a small model to a large model. However, the theory behind weight transfer remains poorly understood. In this paper, inspired by prior applications of mean field theory to neural network dynamics, we introduce a mean field ansatz to provide a theoretical explanation for weight transfer. Specifically, we propose the row-column (RC) ansatz under the mean field point of view, which describes the measure structure of the weights in a neural network (NN) and admits closed measure dynamics. Thus, under proper assumptions, the weights of NNs of different sizes admit a common distribution, and weight transfer methods can be viewed as sampling methods. We empirically validate the RC ansatz on simple MLP examples and on LLMs such as GPT-3 and Llama-3.1. We show that the mean field point of view is adequate under suitable assumptions, which provides theoretical support for zero-shot weight transfer.
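To make the "weight transfer as sampling" reading concrete, the following is a minimal toy sketch and not the paper's RC ansatz or any method taken from it. It assumes a two-layer MLP in mean-field scaling, f(x) = (1/N) Σ_i a_i tanh(w_i · x), and uses random weights as a stand-in for trained ones. The function names `mean_field_mlp` and `transfer_by_sampling` are hypothetical, introduced only for illustration.

```python
import numpy as np

def mean_field_mlp(x, W, a):
    """Two-layer MLP in mean-field scaling: f(x) = (1/N) * sum_i a_i * tanh(w_i . x)."""
    return np.tanh(x @ W.T) @ a / W.shape[0]

def transfer_by_sampling(W, a, n_large, rng):
    """Build a wider net by drawing hidden neurons (w_i, a_i) jointly, with
    replacement, from the small net's empirical neuron distribution."""
    idx = rng.integers(0, W.shape[0], size=n_large)
    return W[idx], a[idx]

rng = np.random.default_rng(0)
d, n_small, n_large = 4, 64, 65536

# Assumption: random weights stand in for the weights of a trained small model.
W_small = rng.normal(size=(n_small, d))
a_small = rng.normal(size=n_small)
x = rng.normal(size=(8, d))  # a small batch of inputs

W_large, a_large = transfer_by_sampling(W_small, a_small, n_large, rng)

# Both nets average over neurons drawn from the same distribution, so their
# outputs should be close, with the gap shrinking as n_large grows.
gap = np.max(np.abs(mean_field_mlp(x, W_small, a_small)
                    - mean_field_mlp(x, W_large, a_large)))
print(f"max output gap between small and sampled large net: {gap:.4f}")
```

Because of the 1/N normalization, the output is an average over neurons, so a wider network whose neurons are drawn from the same distribution computes approximately the same function. Replicating every neuron the same number of times, instead of sampling at random, reproduces the small network's output exactly in this toy setting.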

Comments:	40 pages, 6 figures, 1 table
Subjects:	Machine Learning (cs.LG); Numerical Analysis (math.NA); Probability (math.PR)
Cite as:	arXiv:2408.08681 [cs.LG]
	(or arXiv:2408.08681v1 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2408.08681

Submission history

From: Gonçalo dos Reis Dr.
[v1] Fri, 16 Aug 2024 11:53:52 UTC (1,692 KB)
