Shuffle-Exchange Brings Faster: Reduce the Idle Time During Communication for Decentralized Neural Network Training

Yang, Xiang

Computer Science > Distributed, Parallel, and Cluster Computing

arXiv:2007.00433 (cs)

[Submitted on 1 Jul 2020 (v1), last revised 16 Jul 2020 (this version, v2)]

Title:Shuffle-Exchange Brings Faster: Reduce the Idle Time During Communication for Decentralized Neural Network Training

Authors:Xiang Yang

View PDF

Abstract:As a crucial scheme to accelerate the deep neural network (DNN) training, distributed stochastic gradient descent (DSGD) is widely adopted in many real-world applications. In most distributed deep learning (DL) frameworks, DSGD is implemented with Ring-AllReduce architecture (Ring-SGD) and uses a computation-communication overlap strategy to address the overhead of the massive communications required by DSGD. However, we observe that although $O(1)$ gradients are needed to be communicated per worker in Ring-SGD, the $O(n)$ handshakes required by Ring-SGD limits its usage when training with many workers or in high latency network. In this paper, we propose Shuffle-Exchange SGD (SESGD) to solve the dilemma of Ring-SGD. In the cluster of 16 workers with 0.1ms Ethernet latency, SESGD can accelerate the DNN training to $1.7 \times$ without losing model accuracy. Moreover, the process can be accelerated up to $5\times$ in high latency networks (5ms).

Comments:	NeurIPS 2020 Under Review
Subjects:	Distributed, Parallel, and Cluster Computing (cs.DC)
Cite as:	arXiv:2007.00433 [cs.DC]
	(or arXiv:2007.00433v2 [cs.DC] for this version)
	https://doi.org/10.48550/arXiv.2007.00433

Submission history

From: Xiang Yang [view email]
[v1] Wed, 1 Jul 2020 12:38:48 UTC (2,153 KB)
[v2] Thu, 16 Jul 2020 03:24:11 UTC (2,154 KB)

Computer Science > Distributed, Parallel, and Cluster Computing

Title:Shuffle-Exchange Brings Faster: Reduce the Idle Time During Communication for Decentralized Neural Network Training

Submission history

Access Paper:

References & Citations

DBLP - CS Bibliography

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators

Computer Science > Distributed, Parallel, and Cluster Computing

Title:Shuffle-Exchange Brings Faster: Reduce the Idle Time During Communication for Decentralized Neural Network Training

Submission history

Access Paper:

References & Citations

DBLP - CS Bibliography

BibTeX formatted citation

Bookmark

Bibliographic and Citation Tools

Code, Data and Media Associated with this Article

Demos

Recommenders and Search Tools

arXivLabs: experimental projects with community collaborators