DOI: 10.1145/3293883.3299818

Accelerating distributed stochastic gradient descent with adaptive periodic parameter averaging: poster

Published: 16 February 2019

Abstract

Communication overhead is a well-known performance bottleneck in distributed Stochastic Gradient Descent (SGD), which is a popular algorithm to perform optimization in large-scale machine learning tasks. In this work, we propose a practical and effective technique, named Adaptive Periodic Parameter Averaging, to reduce the communication overhead of distributed SGD, without impairing its convergence property.
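The communication-saving idea behind periodic parameter averaging can be illustrated with a small simulation. The sketch below implements plain fixed-period (K-step) averaging on a toy quadratic objective: each worker runs local SGD and parameters are synchronized only every `period` steps, so one averaging round replaces `period` per-step communications. The adaptive period selection the poster proposes is not modeled here, and all function names and constants are illustrative, not taken from the paper.

```python
import numpy as np

def periodic_averaging_sgd(workers=4, steps=100, period=10, lr=0.1, dim=5, seed=0):
    """Simulate distributed SGD on the shared quadratic loss f(x) = ||x - target||^2.

    Each worker takes local SGD steps on noisy gradients; every `period` steps
    the parameters are averaged across workers (one all-reduce per period,
    instead of one per step as in fully synchronous SGD).
    """
    rng = np.random.default_rng(seed)
    target = rng.normal(size=dim)
    # All workers start from the same initial point.
    params = [np.zeros(dim) for _ in range(workers)]
    comm_rounds = 0
    for t in range(1, steps + 1):
        for w in range(workers):
            # Noisy gradient of ||x - target||^2 (noise stands in for minibatch sampling).
            grad = 2.0 * (params[w] - target) + 0.1 * rng.normal(size=dim)
            params[w] -= lr * grad
        if t % period == 0:
            # Parameter averaging step (the communication event).
            avg = np.mean(params, axis=0)
            params = [avg.copy() for _ in range(workers)]
            comm_rounds += 1
    return np.mean(params, axis=0), target, comm_rounds
```

With `steps=100` and `period=10` the simulation performs only 10 communication rounds yet still converges close to the optimum, which is the trade-off the poster's adaptive scheme tunes automatically.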


Cited By

  • (2022) "Recursive SQL and GPU-support for in-database machine learning." Distributed and Parallel Databases 40(2-3), 205-259. DOI: 10.1007/s10619-022-07417-7. Published 9 Jul 2022.
  • (2020) "Prague: High-Performance Heterogeneity-Aware Asynchronous Decentralized Training." Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, 401-416. DOI: 10.1145/3373376.3378499. Published 9 Mar 2020.
  • (2019) "Hardware Resource Analysis in Distributed Training with Edge Devices." Electronics 9(1), 28. DOI: 10.3390/electronics9010028. Published 26 Dec 2019.


    Published In

    PPoPP '19: Proceedings of the 24th Symposium on Principles and Practice of Parallel Programming
    February 2019
    472 pages
    ISBN:9781450362252
    DOI:10.1145/3293883
    Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Qualifiers

    • Poster

    Conference

    PPoPP '19

    Acceptance Rates

    PPoPP '19 Paper Acceptance Rate: 29 of 152 submissions (19%)
    Overall Acceptance Rate: 230 of 1,014 submissions (23%)

