InternEvo: Efficient Long-sequence Large Language Model Training via Hybrid Parallelism and Redundant Sharding

Chen, Qiaoling; Gu, Diandian; Wang, Guoteng; Chen, Xun; Xiong, YingTong; Huang, Ting; Hu, Qinghao; Jin, Xin; Wen, Yonggang; Zhang, Tianwei; Sun, Peng

Computer Science > Distributed, Parallel, and Cluster Computing

arXiv:2401.09149 (cs)

[Submitted on 17 Jan 2024 (v1), last revised 22 Jan 2024 (this version, v3)]

Abstract: Large language models (LLMs) with long sequences begin to power more and more fundamentally new applications we use every day. Existing methods for long-sequence LLM training are neither efficient nor compatible with commonly-used training algorithms such as FlashAttention. We design InternEvo to address these issues. InternEvo decouples all of the sharding dimensions into a new hierarchical space, and systematically analyzes the memory and communication cost of LLM training. Then, it generates an effective hybrid parallelism strategy. We design a new selective overlap mechanism to mitigate the communication overhead introduced by the hybrid parallelism. We also implement memory management techniques to reduce GPU memory fragmentation. Evaluation results show that InternEvo generates parallelization strategies that match or outperform existing methods in model FLOPs utilization.

Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC)
Cite as:	arXiv:2401.09149 [cs.DC]
	(or arXiv:2401.09149v3 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.2401.09149

Submission history

From: Qiaoling Chen
[v1] Wed, 17 Jan 2024 11:47:59 UTC (7,374 KB)
[v2] Fri, 19 Jan 2024 08:56:41 UTC (4,604 KB)
[v3] Mon, 22 Jan 2024 06:40:44 UTC (4,604 KB)
