MAP: Low-compute Model Merging with Amortized Pareto Fronts via Quadratic Approximation

Li, Lu; Zhang, Tianyu; Bu, Zhiqi; Wang, Suyuchen; He, Huan; Fu, Jie; Wu, Yonghui; Bian, Jiang; Chen, Yong; Bengio, Yoshua

arXiv:2406.07529 (cs)

[Submitted on 11 Jun 2024 (v1), last revised 19 Oct 2024 (this version, v4)]

Abstract: Model merging has emerged as an effective approach to combine multiple single-task models into a multitask model. This process typically involves computing a weighted average of the model parameters without any additional training. Existing model-merging methods focus on enhancing average task accuracy. However, interference and conflicts between the objectives of different tasks can lead to trade-offs during the merging process. In real-world applications, a set of solutions with various trade-offs can be more informative, helping practitioners make decisions based on diverse preferences. In this paper, we introduce a novel and low-compute algorithm, Model Merging with Amortized Pareto Front (MAP). MAP efficiently identifies a Pareto set of scaling coefficients for merging multiple models, reflecting the trade-offs involved. It amortizes the substantial computational cost of evaluations needed to estimate the Pareto front by using quadratic approximation surrogate models derived from a pre-selected set of scaling coefficients. Experimental results on vision and natural language processing tasks demonstrate that MAP can accurately identify the Pareto front, providing practitioners with flexible solutions to balance competing task objectives. We also introduce Bayesian MAP for scenarios with a relatively low number of tasks and Nested MAP for situations with a high number of tasks, further reducing the computational cost of evaluation.
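To make the idea in the abstract concrete, here is a minimal sketch of the general technique it names: fit one quadratic surrogate per task from a small, pre-selected set of scaling coefficients, then read an approximate Pareto front off the cheap surrogates instead of evaluating every merged model. This is an illustration under stated assumptions, not the authors' implementation. The function names (`quad_features`, `fit_quadratic_surrogate`, `predict`, `pareto_mask`) and the stand-in metric `toy_metric` are hypothetical, the fit is plain least squares, and higher metric values are assumed to be better.

```python
import numpy as np

def quad_features(c):
    """Map coefficients c of shape (n, k) to features [1, c_i, c_i * c_j for i <= j]."""
    c = np.atleast_2d(np.asarray(c, dtype=float))
    n, k = c.shape
    rows, cols = np.triu_indices(k)
    cross = (c[:, :, None] * c[:, None, :])[:, rows, cols]
    return np.hstack([np.ones((n, 1)), c, cross])

def fit_quadratic_surrogate(c, y):
    """Least-squares fit of one task metric y (n,) as a quadratic function of c."""
    w, *_ = np.linalg.lstsq(quad_features(c), y, rcond=None)
    return w

def predict(w, c):
    """Evaluate a fitted quadratic surrogate at coefficients c."""
    return quad_features(c) @ w

def pareto_mask(scores):
    """Boolean mask of non-dominated rows of scores (n, tasks); higher is better."""
    mask = np.ones(scores.shape[0], dtype=bool)
    for i in range(scores.shape[0]):
        dominated_by = (np.all(scores >= scores[i], axis=1)
                        & np.any(scores > scores[i], axis=1))
        mask[i] = not dominated_by.any()
    return mask

def toy_metric(c, center):
    """Hypothetical stand-in for an expensive evaluation of a merged model."""
    return 1.0 - np.sum((c - center) ** 2, axis=1)

rng = np.random.default_rng(0)

# Step 1: evaluate each task only at a small pre-selected set of coefficients.
c_train = rng.uniform(0.0, 1.0, size=(20, 2))
centers = [np.array([0.8, 0.2]), np.array([0.3, 0.9])]  # tasks prefer different c
surrogates = [fit_quadratic_surrogate(c_train, toy_metric(c_train, m))
              for m in centers]

# Step 2: query the cheap surrogates on a dense grid of candidate coefficients.
axis = np.linspace(0.0, 1.0, 41)
grid = np.array([[a, b] for a in axis for b in axis])
pred = np.column_stack([predict(w, grid) for w in surrogates])

# Step 3: keep the coefficient settings whose predicted metrics are non-dominated.
front = grid[pareto_mask(pred)]
print(f"{len(front)} of {len(grid)} candidate coefficient settings are on the front")
```

In this sketch only the 20 training points would require real model evaluations; the 1,681 grid candidates are scored by the surrogates alone, which is the sense in which the evaluation cost is amortized. With two coefficients each surrogate has six parameters, so the number of sampled points must be at least that large for the least-squares fit to be well determined.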

Subjects:	Machine Learning (cs.LG)
Cite as:	arXiv:2406.07529 [cs.LG]
	(or arXiv:2406.07529v4 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2406.07529

Submission history

From: Lu Li
[v1] Tue, 11 Jun 2024 17:55:25 UTC (25,322 KB)
[v2] Tue, 18 Jun 2024 06:24:11 UTC (19,771 KB)
[v3] Mon, 2 Sep 2024 20:42:08 UTC (29,377 KB)
[v4] Sat, 19 Oct 2024 00:49:52 UTC (26,149 KB)
