Mitigating Catastrophic Forgetting in Language Transfer via Model Merging

Alexandrov, Anton; Raychev, Veselin; Müller, Mark Niklas; Zhang, Ce; Vechev, Martin; Toutanova, Kristina

Computer Science > Machine Learning

arXiv:2407.08699 (cs)

[Submitted on 11 Jul 2024 (v1), last revised 16 Jul 2024 (this version, v2)]

Abstract: As open-weight large language models (LLMs) achieve ever more impressive performance across a wide range of tasks in English, practitioners aim to adapt these models to different languages. However, such language adaptation is often accompanied by catastrophic forgetting of the base model's capabilities, severely limiting the usefulness of the resulting model. We address this issue by proposing Branch-and-Merge (BaM), a new adaptation method based on iteratively merging multiple models, each fine-tuned on a subset of the available training data. BaM is based on the insight that this yields lower-magnitude but higher-quality weight changes, reducing forgetting of the source domain while maintaining learning on the target domain. We demonstrate in an extensive empirical study on Bulgarian and German that BaM can significantly reduce forgetting while matching or even improving target domain performance compared to both standard continued pretraining and instruction fine-tuning across different model architectures.
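
The branch, fine-tune, and merge loop described in the abstract can be illustrated with a minimal toy sketch. It is not the authors' implementation. The abstract does not specify the merge rule, the number of branches, or how the data is split, so the choices below are assumptions made for illustration: models are represented as flat lists of floats, branches are merged by element-wise averaging, and the names `branch_and_merge`, `average_weights`, `toy_fine_tune`, and `branches_per_iter` are hypothetical.

```python
from typing import Callable, List, Sequence

Weights = List[float]


def average_weights(models: Sequence[Weights]) -> Weights:
    """Merge models by element-wise mean (an assumed merge rule)."""
    n = len(models)
    return [sum(ws) / n for ws in zip(*models)]


def branch_and_merge(
    base: Weights,
    data_slices: Sequence[Sequence[float]],
    fine_tune: Callable[[Weights, Sequence[float]], Weights],
    branches_per_iter: int = 2,
) -> Weights:
    """Iteratively branch the current model, fine-tune each branch on its
    own data slice, and merge the branches back into a single model."""
    current = list(base)
    for start in range(0, len(data_slices), branches_per_iter):
        group = data_slices[start:start + branches_per_iter]
        # Every branch starts from the same current weights.
        branches = [fine_tune(list(current), s) for s in group]
        # The merged model becomes the starting point of the next iteration.
        current = average_weights(branches)
    return current


def toy_fine_tune(weights: Weights, data_slice: Sequence[float],
                  lr: float = 0.5) -> Weights:
    """Stand-in for real training: move each weight toward the slice mean."""
    target = sum(data_slice) / len(data_slice)
    return [w + lr * (target - w) for w in weights]


if __name__ == "__main__":
    slices = [[1.0, 1.0], [3.0, 3.0], [2.0], [4.0]]
    merged = branch_and_merge([0.0, 0.0], slices, toy_fine_tune)
    print(merged)
```

In this toy setting each branch moves only part of the way toward its own slice, and averaging the branches keeps the merged weights closer to the starting point than any single branch trained on all slices in sequence would be. With a real LLM, the same loop would operate on model parameter tensors and a real training routine in place of `toy_fine_tune`.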

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2407.08699 [cs.LG]
	(or arXiv:2407.08699v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2407.08699

Submission history

From: Anton Alexandrov
[v1] Thu, 11 Jul 2024 17:32:40 UTC (1,213 KB)
[v2] Tue, 16 Jul 2024 07:48:06 UTC (1,213 KB)
