Scale up with Order: Finding Good Data Permutations for Distributed Training

Guo, Wentao; Pham, Khiem; Lu, Yucheng; Yuan, Tiancheng; Ruan, Charlie F.; De Sa, Christopher

Computer Science > Machine Learning

arXiv:2302.00845v2 (cs)

[Submitted on 2 Feb 2023 (v1), revised 6 Mar 2023 (this version, v2), latest version 21 Dec 2023 (v5)]

Abstract: Gradient Balancing (GraB) is a recently proposed technique that finds provably better data permutations when training models for multiple epochs over a finite dataset. By minimizing the discrepancy of the gradients on adjacently selected examples, it converges at a faster rate than the widely adopted Random Reshuffling. However, GraB relies on critical assumptions such as small batch sizes and centralized data, leaving open the question of how to order examples at large scale, i.e., distributed learning with decentralized data. To address this limitation, we propose D-GraB, an algorithm that orders examples in a parallel setting with negligible overhead. D-GraB enjoys linear speedup, converging at rate $\tilde{O}((mnT)^{-2/3})$ on smooth non-convex objectives and $\tilde{O}((mnT)^{-2})$ under the Polyak-Łojasiewicz (PL) condition, where $n$ denotes the number of parallel workers, $m$ the number of examples per worker, and $T$ the number of epochs. D-GraB thus benefits from both data ordering and parallelism. Empirically, on applications including GLUE, CIFAR10, and WikiText-2, D-GraB outperforms naive parallel GraB and Distributed Random Reshuffling in both training and validation performance.
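To make "minimizing the discrepancy of the gradients on adjacently selected examples" concrete, here is a minimal single-worker sketch of greedy sign balancing in the style of the original GraB work. This is an illustrative assumption, not D-GraB: the parallel ordering procedure is not described in this abstract. The sketch also simplifies in two ways. It treats each example's gradient as a fixed vector, whereas real training recomputes gradients as parameters change. It centers each gradient with a given mean, whereas GraB uses the previous epoch's stale mean. The function `balance_order` and its parameters are hypothetical names chosen for illustration.

```python
import math
import random
from typing import List, Sequence


def _norm(v: Sequence[float]) -> float:
    """Euclidean norm of a vector stored as a list."""
    return math.sqrt(sum(x * x for x in v))


def balance_order(
    order: Sequence[int],
    grads: Sequence[Sequence[float]],
    stale_mean: Sequence[float],
) -> List[int]:
    """Return the next epoch's permutation via greedy sign balancing.

    order      -- current epoch's permutation of example indices (hypothetical input)
    grads      -- grads[i] is a fixed gradient vector for example i (simplifying assumption)
    stale_mean -- mean gradient used to center each example's gradient
    """
    running = [0.0] * len(stale_mean)   # running signed sum of centered gradients
    next_order = [0] * len(order)
    left, right = 0, len(order) - 1     # "+1" examples fill from the front, "-1" from the back
    for idx in order:
        centered = [g - mu for g, mu in zip(grads[idx], stale_mean)]
        plus = [s + c for s, c in zip(running, centered)]
        minus = [s - c for s, c in zip(running, centered)]
        # Pick the sign that keeps the running sum smaller (ties go to the front).
        if _norm(plus) <= _norm(minus):
            running = plus
            next_order[left] = idx
            left += 1
        else:
            running = minus
            next_order[right] = idx     # back half ends up in reversed visiting order
            right -= 1
    return next_order


if __name__ == "__main__":
    random.seed(0)
    n, d = 8, 3
    toy_grads = [[random.gauss(0.0, 1.0) for _ in range(d)] for _ in range(n)]
    mean = [sum(g[j] for g in toy_grads) / n for j in range(d)]
    print(balance_order(list(range(n)), toy_grads, mean))
```

On a 1-D toy input `balance_order([0, 1, 2, 3], [[1.0], [1.0], [-1.0], [-1.0]], [0.0])` returns `[0, 2, 3, 1]`. In that order every prefix sum of centered gradients stays within 1 in magnitude, whereas the original order's prefix sums reach 2. Filling positive-sign examples from the front and negative-sign ones from the back is what lets the two halves cancel each other in the next epoch.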

Subjects:	Machine Learning (cs.LG); Distributed, Parallel, and Cluster Computing (cs.DC); Optimization and Control (math.OC)
Cite as:	arXiv:2302.00845 [cs.LG]
	(or arXiv:2302.00845v2 [cs.LG] for this version)
	https://doi.org/10.48550/arXiv.2302.00845

Submission history

From: Wentao Guo
[v1] Thu, 2 Feb 2023 03:15:29 UTC (6,743 KB)
[v2] Mon, 6 Mar 2023 23:04:27 UTC (6,744 KB)
[v3] Mon, 29 May 2023 22:36:53 UTC (7,836 KB)
[v4] Wed, 6 Dec 2023 05:49:55 UTC (7,883 KB)
[v5] Thu, 21 Dec 2023 19:41:57 UTC (7,883 KB)
