Diversifying the Expert Knowledge for Task-Agnostic Pruning in Sparse Mixture-of-Experts

Zhang, Zeliang; Liu, Xiaodong; Cheng, Hao; Xu, Chenliang; Gao, Jianfeng

arXiv:2407.09590 (cs)

[Submitted on 12 Jul 2024]

Abstract: By increasing model parameters but activating them sparsely when performing a task, the Mixture-of-Experts (MoE) architecture significantly improves the performance of Large Language Models (LLMs) without increasing the inference cost. However, the memory consumption due to the growing number of experts presents a challenge to the deployment of these models in many real-world settings. Our empirical study reveals that some experts encode redundant knowledge during pre-training. We thus propose a method of grouping and pruning similar experts to improve the model's parameter efficiency. We validate the effectiveness of our method by pruning two state-of-the-art MoE models, Mixtral-8x7B and Mixtral-8x22B. Evaluation shows that our method outperforms other model pruning methods on a range of natural language tasks. To facilitate future research, we will release our code and the pruned MoE models.
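The abstract describes grouping and pruning similar experts but does not state the similarity measure or the merge rule. The sketch below is only an illustration of that general idea under loud assumptions: it uses cosine similarity over flattened expert weights, greedy average-linkage grouping, and a simple mean to collapse each group into one expert. The function names `group_similar_experts` and `merge_groups` are hypothetical and are not taken from the paper or its code release.

```python
import numpy as np


def group_similar_experts(expert_weights, num_groups):
    """Greedily merge the most similar groups until num_groups remain.

    Assumption: similarity is cosine similarity between flattened weight
    matrices; the paper may use a different measure.
    """
    flat = np.stack([w.ravel() for w in expert_weights])
    unit = flat / np.linalg.norm(flat, axis=1, keepdims=True)
    sim = unit @ unit.T  # pairwise cosine similarity between experts

    groups = [[i] for i in range(len(expert_weights))]
    while len(groups) > num_groups:
        best_score, best_pair = -np.inf, None
        for a in range(len(groups)):
            for b in range(a + 1, len(groups)):
                # Average-linkage similarity between two groups.
                score = sim[np.ix_(groups[a], groups[b])].mean()
                if score > best_score:
                    best_score, best_pair = score, (a, b)
        a, b = best_pair
        groups[a] = groups[a] + groups[b]
        del groups[b]
    return groups


def merge_groups(expert_weights, groups):
    """Collapse each group into one expert by averaging its members.

    Assumption: a plain mean is used here purely for illustration.
    """
    return [np.mean([expert_weights[i] for i in g], axis=0) for g in groups]


# Toy example: experts 0 and 2 are near-duplicates, as are experts 1 and 3.
base_a = np.eye(3)
base_b = np.ones((3, 3)) - np.eye(3)
experts = [base_a, base_b, base_a + 0.01, base_b + 0.01]

groups = group_similar_experts(experts, num_groups=2)
merged = merge_groups(experts, groups)
print(groups)       # [[0, 2], [1, 3]]
print(len(merged))  # 2
```

In this toy setting four experts shrink to two, which halves the expert parameters that must be held in memory. A real MoE layer would also need its router adjusted so that tokens previously sent to a removed expert reach the merged one; that step is omitted here.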

Comments:	13 pages, 6 figures
Subjects:	Computation and Language (cs.CL); Machine Learning (cs.LG)
Cite as:	arXiv:2407.09590 [cs.CL]
	(or arXiv:2407.09590v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2407.09590

Submission history

From: Xiaodong Liu
[v1] Fri, 12 Jul 2024 17:25:02 UTC (1,696 KB)
