DOI: 10.1145/2834892.2834893

Asynchronous parallel stochastic gradient descent: a numeric core for scalable distributed machine learning algorithms

Published: 15 November 2015

Abstract

The implementation of the vast majority of machine learning (ML) algorithms boils down to solving a numerical optimization problem. In this context, Stochastic Gradient Descent (SGD) methods have long proven to deliver good results, both in terms of convergence and accuracy. Recently, several parallelization approaches have been proposed to scale SGD to very large ML problems. At their core, most of these approaches follow a MapReduce scheme.
This paper presents a novel parallel updating algorithm for SGD that utilizes the asynchronous single-sided communication paradigm. Compared to existing methods, Asynchronous Parallel Stochastic Gradient Descent (ASGD) provides faster convergence, with linear scalability and stable accuracy.
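
To make the update scheme concrete, the following is a minimal, illustrative sketch of asynchronous parallel SGD: several workers repeatedly apply the usual SGD step w ← w − η·∇f_i(w) to a shared parameter vector without any barrier or lock between them. This is not the authors' GPI/GASPI-based implementation; the use of Python threads on a shared NumPy array, and all names below (async_sgd, n_workers, eta), are assumptions chosen purely to illustrate the unsynchronized update pattern on a single node, whereas the paper's ASGD targets distributed-memory systems with one-sided communication.

import threading
import numpy as np

def async_sgd(X, y, n_workers=4, eta=0.01, epochs=5):
    """Least-squares model fitted by lock-free, asynchronous SGD (illustration only)."""
    n, d = X.shape
    w = np.zeros(d)  # shared parameter vector, updated in place by all workers

    def worker(rows):
        rng = np.random.default_rng()
        for _ in range(epochs):
            for i in rng.permutation(rows):
                # gradient of the per-sample loss 0.5 * (x_i . w - y_i)^2
                grad = (X[i] @ w - y[i]) * X[i]
                # asynchronous update: no lock, no barrier; workers may interleave freely
                w[:] = w - eta * grad

    chunks = np.array_split(np.arange(n), n_workers)  # each worker owns one data shard
    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return w

# Usage on synthetic data: recover w_true from y = X @ w_true.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 10))
w_true = rng.normal(size=10)
y = X @ w_true
print(np.linalg.norm(async_sgd(X, y) - w_true))  # small residual indicates convergence

Because updates are applied without synchronization, workers may occasionally overwrite each other's progress; the paper's claim, per the abstract above, is that with an asynchronous single-sided communication scheme such updates still converge quickly, scale linearly, and retain stable accuracy.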




        Published In

        MLHPC '15: Proceedings of the Workshop on Machine Learning in High-Performance Computing Environments
        November 2015
        40 pages
        ISBN: 9781450340069
        DOI: 10.1145/2834892
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].


        Publisher

        Association for Computing Machinery

        New York, NY, United States



        Author Tags

        1. high performance computing
        2. optimization
        3. stochastic gradient descent

        Qualifiers

        • Research-article

        Conference

        SC15

        Acceptance Rates

        MLHPC '15 Paper Acceptance Rate: 5 of 7 submissions, 71%
        Overall Acceptance Rate: 5 of 7 submissions, 71%

        Cited By

        • (2024) LBB: load-balanced batching for efficient distributed learning on heterogeneous GPU cluster. The Journal of Supercomputing 80:9, 12247-12272. DOI: 10.1007/s11227-023-05886-w. Online publication date: 1-Jun-2024.
        • (2023) An invitation to distributed quantum neural networks. Quantum Machine Intelligence 5:2. DOI: 10.1007/s42484-023-00114-3. Online publication date: 15-Jun-2023.
        • (2023) Accelerating Massively Distributed Deep Learning Through Efficient Pseudo-Synchronous Update Method. International Journal of Parallel Programming 52:3, 125-146. DOI: 10.1007/s10766-023-00759-4. Online publication date: 13-Nov-2023.
        • (2022) A Fast Edge-Based Synchronizer for Tasks in Real-Time Artificial Intelligence Applications. IEEE Internet of Things Journal 9:5, 3825-3837. DOI: 10.1109/JIOT.2021.3100295. Online publication date: 1-Mar-2022.
        • (2021) PCJ Java library as a solution to integrate HPC, Big Data and Artificial Intelligence workloads. Journal of Big Data 8:1. DOI: 10.1186/s40537-021-00454-6. Online publication date: 26-Apr-2021.
        • (2021) Convolutional neural nets in chemical engineering: Foundations, computations, and applications. AIChE Journal 67:9. DOI: 10.1002/aic.17282. Online publication date: 18-May-2021.
        • (2020) Estimation of Constant Speed Time for Railway Vehicles by Stochastic Gradient Descent Algorithm. Sakarya University Journal of Computer and Information Sciences 3:3, 355-365. DOI: 10.35377/saucis.03.03.805598. Online publication date: 30-Dec-2020.
        • (2020) Dual-Way Gradient Sparsification for Asynchronous Distributed Deep Learning. Proceedings of the 49th International Conference on Parallel Processing, 1-10. DOI: 10.1145/3404397.3404401. Online publication date: 17-Aug-2020.
        • (2019) Demystifying Parallel and Distributed Deep Learning. ACM Computing Surveys 52:4, 1-43. DOI: 10.1145/3320060. Online publication date: 30-Aug-2019.
        • (2019) Asynchronous Federated Learning for Geospatial Applications. ECML PKDD 2018 Workshops, 21-28. DOI: 10.1007/978-3-030-14880-5_2. Online publication date: 8-Mar-2019.
