HOGWILD!: a lock-free approach to parallelizing stochastic gradient descent

Published: 12 December 2011

Abstract

Stochastic Gradient Descent (SGD) is a popular algorithm that can achieve state-of-the-art performance on a variety of machine learning tasks. Several researchers have recently proposed schemes to parallelize SGD, but all require performance-destroying memory locking and synchronization. Using novel theoretical analysis, algorithms, and implementation, this work shows that SGD can be implemented without any locking. We present an update scheme called HOGWILD! that allows processors to access shared memory with the possibility of overwriting each other's work. We show that when the associated optimization problem is sparse, meaning most gradient updates modify only small parts of the decision variable, HOGWILD! achieves a nearly optimal rate of convergence. We demonstrate experimentally that HOGWILD! outperforms alternative schemes that use locking by an order of magnitude.
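The update scheme described in the abstract can be sketched in a few lines: several threads draw examples and apply sparse gradient updates to a shared weight vector with no locking of any kind. The sketch below is illustrative only, not the authors' implementation; it uses a least-squares objective, Python threads, and a hypothetical `hogwild_sgd` helper, and exploits the fact that each sparse example touches only a few coordinates, so unsynchronized writes rarely collide.

```python
import threading
import numpy as np

def hogwild_sgd(X, y, n_threads=4, epochs=10, lr=0.05):
    """Lock-free parallel SGD sketch for least squares, 0.5*(x.w - y)^2.

    Every thread reads and writes the shared vector ``w`` directly,
    with no locks: updates may occasionally overwrite one another,
    which is exactly the behavior HOGWILD! tolerates.
    """
    n, d = X.shape
    w = np.zeros(d)  # shared state, updated without synchronization

    def worker(rows):
        for _ in range(epochs):
            for i in rows:
                xi = X[i]
                grad = (xi @ w - y[i]) * xi   # per-example gradient
                nz = np.nonzero(xi)[0]        # only the coordinates xi touches
                w[nz] -= lr * grad[nz]        # unsynchronized sparse write

    # Partition the examples across threads and run them concurrently.
    chunks = np.array_split(np.arange(n), n_threads)
    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return w
```

A real implementation would run on cores without a global interpreter lock and sample examples with replacement; the point of the sketch is only the structure of the update: sparse gradient, nonzero coordinates only, no mutex around the shared vector.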




Published In

NIPS'11: Proceedings of the 24th International Conference on Neural Information Processing Systems, December 2011, 2752 pages.

Publisher: Curran Associates Inc., Red Hook, NY, United States
