DOI: 10.1145/2834892.2834893

Asynchronous parallel stochastic gradient descent: a numeric core for scalable distributed machine learning algorithms

Published: 15 November 2015

Abstract

The implementation of the vast majority of machine learning (ML) algorithms boils down to solving a numerical optimization problem. In this context, Stochastic Gradient Descent (SGD) methods have long proven to deliver good results, both in terms of convergence and accuracy. Recently, several parallelization approaches have been proposed to scale SGD to very large ML problems. At their core, most of these approaches follow a MapReduce scheme.
This paper presents a novel parallel updating algorithm for SGD that utilizes the asynchronous single-sided communication paradigm. Compared to existing methods, Asynchronous Parallel Stochastic Gradient Descent (ASGD) provides faster convergence, with linear scalability and stable accuracy.
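
To make the update scheme concrete, the following is a minimal, illustrative sketch of asynchronous parallel SGD: several workers repeatedly apply the usual SGD step w ← w − η·∇f_i(w) to a shared parameter vector without any barrier or lock between them. This is not the authors' GPI/GASPI-based implementation; the use of Python threads on a shared NumPy array, and all names below (async_sgd, n_workers, eta), are assumptions chosen purely to illustrate the unsynchronized update pattern on a single node, whereas the paper's ASGD targets distributed-memory systems with one-sided communication.

import threading
import numpy as np

def async_sgd(X, y, n_workers=4, eta=0.01, epochs=5):
    """Least-squares model fitted by lock-free, asynchronous SGD (illustration only)."""
    n, d = X.shape
    w = np.zeros(d)  # shared parameter vector, updated in place by all workers

    def worker(rows):
        rng = np.random.default_rng()
        for _ in range(epochs):
            for i in rng.permutation(rows):
                # gradient of the per-sample loss 0.5 * (x_i . w - y_i)^2
                grad = (X[i] @ w - y[i]) * X[i]
                # asynchronous update: no lock, no barrier; workers may interleave freely
                w[:] = w - eta * grad

    chunks = np.array_split(np.arange(n), n_workers)  # each worker owns one data shard
    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return w

# Usage on synthetic data: recover w_true from y = X @ w_true.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 10))
w_true = rng.normal(size=10)
y = X @ w_true
print(np.linalg.norm(async_sgd(X, y) - w_true))  # small residual indicates convergence

Because updates are applied without synchronization, workers may occasionally overwrite each other's progress; the paper's claim, per the abstract above, is that with an asynchronous single-sided communication scheme such updates still converge quickly, scale linearly, and retain stable accuracy.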




        Published In

        MLHPC '15: Proceedings of the Workshop on Machine Learning in High-Performance Computing Environments
        November 2015
        40 pages
        ISBN: 9781450340069
        DOI: 10.1145/2834892
        Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].


        Publisher

        Association for Computing Machinery

        New York, NY, United States



        Author Tags

        1. high performance computing
        2. optimization
        3. stochastic gradient descent

        Qualifiers

        • Research-article

        Conference

        SC15

        Acceptance Rates

        MLHPC '15 Paper Acceptance Rate: 5 of 7 submissions, 71%
        Overall Acceptance Rate: 5 of 7 submissions, 71%

        Cited By

        • (2024) LBB: load-balanced batching for efficient distributed learning on heterogeneous GPU cluster. The Journal of Supercomputing 80:9, 12247-12272. DOI: 10.1007/s11227-023-05886-w. Online publication date: 1-Jun-2024.
        • (2023) An invitation to distributed quantum neural networks. Quantum Machine Intelligence 5:2. DOI: 10.1007/s42484-023-00114-3. Online publication date: 15-Jun-2023.
        • (2023) Accelerating Massively Distributed Deep Learning Through Efficient Pseudo-Synchronous Update Method. International Journal of Parallel Programming 52:3, 125-146. DOI: 10.1007/s10766-023-00759-4. Online publication date: 13-Nov-2023.
        • (2022) A Fast Edge-Based Synchronizer for Tasks in Real-Time Artificial Intelligence Applications. IEEE Internet of Things Journal 9:5, 3825-3837. DOI: 10.1109/JIOT.2021.3100295. Online publication date: 1-Mar-2022.
        • (2021) PCJ Java library as a solution to integrate HPC, Big Data and Artificial Intelligence workloads. Journal of Big Data 8:1. DOI: 10.1186/s40537-021-00454-6. Online publication date: 26-Apr-2021.
        • (2021) Convolutional neural nets in chemical engineering: Foundations, computations, and applications. AIChE Journal 67:9. DOI: 10.1002/aic.17282. Online publication date: 18-May-2021.
        • (2020) Estimation of Constant Speed Time for Railway Vehicles by Stochastic Gradient Descent Algorithm. Sakarya University Journal of Computer and Information Sciences 3:3, 355-365. DOI: 10.35377/saucis.03.03.805598. Online publication date: 30-Dec-2020.
        • (2020) Dual-Way Gradient Sparsification for Asynchronous Distributed Deep Learning. Proceedings of the 49th International Conference on Parallel Processing, 1-10. DOI: 10.1145/3404397.3404401. Online publication date: 17-Aug-2020.
        • (2019) Demystifying Parallel and Distributed Deep Learning. ACM Computing Surveys 52:4, 1-43. DOI: 10.1145/3320060. Online publication date: 30-Aug-2019.
        • (2019) Asynchronous Federated Learning for Geospatial Applications. ECML PKDD 2018 Workshops, 21-28. DOI: 10.1007/978-3-030-14880-5_2. Online publication date: 8-Mar-2019.
