Scaling distributed machine learning with the parameter server

Authors: David G. Andersen, Alexander J. Smola, Vanja Josifovski, Eugene J. Shekita, Bor-Yiing Su

OSDI'14: Proceedings of the 11th USENIX conference on Operating Systems Design and Implementation

Pages 583 - 598

Published: 06 October 2014

Abstract

We propose a parameter server framework for distributed machine learning problems. Both data and workloads are distributed over worker nodes, while the server nodes maintain globally shared parameters, represented as dense or sparse vectors and matrices. The framework manages asynchronous data communication between nodes, and supports flexible consistency models, elastic scalability, and continuous fault tolerance.
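The division of labour described above (workers hold data shards, servers hold the shared parameters) can be illustrated with a minimal sketch. It is not the framework's implementation: the names `ParameterServer`, `pull`, `push`, `worker_step` and `log_loss` are hypothetical, the toy data is made up, and everything runs sequentially in one process, so asynchronous communication, consistency models, elasticity and fault tolerance are not modelled. The learning task is a tiny sparse logistic regression, chosen only because the abstract names that problem.

```python
import math


class ParameterServer:
    """Hypothetical stand-in for a server node: stores a sparse weight vector by key."""

    def __init__(self, learning_rate=0.5):
        self.weights = {}  # key -> weight; missing keys are treated as 0.0
        self.learning_rate = learning_rate

    def pull(self, keys):
        # Return only the requested entries of the shared parameter vector.
        return {k: self.weights.get(k, 0.0) for k in keys}

    def push(self, gradient):
        # Apply a sparse gradient sent by a worker (plain gradient descent step).
        for k, g in gradient.items():
            self.weights[k] = self.weights.get(k, 0.0) - self.learning_rate * g


def predict(w, features):
    # Logistic prediction for one sparse example.
    z = sum(w[k] * v for k, v in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def worker_step(server, shard):
    """One worker iteration: pull the weights the shard needs, compute, push."""
    keys = {k for features, _ in shard for k in features}
    w = server.pull(keys)
    grad = {}
    for features, label in shard:
        p = predict(w, features)
        for k, v in features.items():
            grad[k] = grad.get(k, 0.0) + (p - label) * v / len(shard)
    server.push(grad)


def log_loss(server, data):
    # Average logistic loss over all examples, using the current server weights.
    total = 0.0
    for features, label in data:
        p = predict(server.pull(features.keys()), features)
        total -= label * math.log(p) + (1 - label) * math.log(1 - p)
    return total / len(data)


# Made-up data: two worker shards of (sparse features, label) pairs.
shards = [
    [({"a": 1.0, "b": 1.0}, 1), ({"c": 1.0}, 0)],
    [({"a": 1.0}, 1), ({"b": 1.0, "c": 1.0}, 0)],
]
all_data = [example for shard in shards for example in shard]

server = ParameterServer()
loss_before = log_loss(server, all_data)
for _ in range(20):
    for shard in shards:  # workers take turns here; a real system runs them concurrently
        worker_step(server, shard)
loss_after = log_loss(server, all_data)
print(loss_before, loss_after)
```

Each worker touches only the keys that occur in its own shard, which is the reason a sparse key-value representation keeps communication small in this sketch.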

To demonstrate the scalability of the proposed framework, we show experimental results on petabytes of real data with billions of examples and parameters on problems ranging from Sparse Logistic Regression to Latent Dirichlet Allocation and Distributed Sketching.


Published In

OSDI'14: Proceedings of the 11th USENIX conference on Operating Systems Design and Implementation

October 2014, 676 pages

ISBN: 9781931971164

Program Chairs: Jason Flinn (University of Michigan), Hank Levy (University of Washington)

Sponsors: USENIX Assoc

In-Cooperation: SIGOPS: ACM Special Interest Group on Operating Systems

Publisher: USENIX Association, United States

Cited By

Erben A, Mayer R, Jacobsen H (2024). How Can We Train Deep Learning Models Across Clouds and Continents? An Experimental Study. Proceedings of the VLDB Endowment, 17:6, pages 1214-1226. Online publication date: 1-Feb-2024.
https://dl.acm.org/doi/10.14778/3648160.3648165

Liu K, Jiang Z, Zhang J, Guo S, Zhang X, Bai Y, Dong Y, Luo F, Zhang Z, Wang L, Shi X, Xu H, Bai Y, Song D, Wei H, Li B, Pan Y, Pan T, Huang T, Sekar V, Yu M, Seneviratne A, Veitch D (2024). R-Pingmesh: A Service-Aware RoCE Network Monitoring and Diagnostic System. Proceedings of the ACM SIGCOMM 2024 Conference, pages 554-567. Online publication date: 4-Aug-2024.
https://dl.acm.org/doi/10.1145/3651890.3672264

Phalak C, Chahal D, Ramesh M, Singhal R, Balsamo S, Knottenbelt W, Abad C, Shang W (2024). Towards Geo-Distributed Training of ML Models in a Multi-Cloud Environment. Companion of the 15th ACM/SPEC International Conference on Performance Engineering, pages 211-217. Online publication date: 7-May-2024.
https://dl.acm.org/doi/10.1145/3629527.3651422

Zhang S, Diao L, Wu C, Cao Z, Wang S, Lin W (2024). HAP: SPMD DNN Training on Heterogeneous GPU Clusters with Automated Program Synthesis. Proceedings of the Nineteenth European Conference on Computer Systems, pages 524-541. Online publication date: 22-Apr-2024.
https://dl.acm.org/doi/10.1145/3627703.3629580

Zhao B, Xu W, Liu S, Tian Y, Wang Q, Wu W, Tsafrir D, Musuvathi M, Gupta R, Abu-Ghazaleh N (2024). Training Job Placement in Clusters with Statistical In-Network Aggregation. Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1, pages 420-434. Online publication date: 27-Apr-2024.
https://dl.acm.org/doi/10.1145/3617232.3624863

Guerraoui R, Gupta N, Pinot R (2024). Byzantine Machine Learning: A Primer. ACM Computing Surveys, 56:7, pages 1-39. Online publication date: 9-Apr-2024.
https://dl.acm.org/doi/10.1145/3616537

Jiang Z, Li X, Peng T, Li H, Hong J, Zhang J, Gong X (2024). Hybrid-Memcached: A Novel Approach for Memcached Persistence Optimization With Hybrid Memory. IEEE Transactions on Computers, 73:7, pages 1866-1874. Online publication date: 4-Apr-2024.
https://dl.acm.org/doi/10.1109/TC.2024.3385279

Jiang Z, Gu J, Zhu H, Pan D, Oh A, Naumann T, Globerson A, Saenko K, Hardt M, Levine S (2023). Pre-RMSNorm and Pre-CRMSNorm transformers. Proceedings of the 37th International Conference on Neural Information Processing Systems, pages 45777-45793. Online publication date: 10-Dec-2023.
https://dl.acm.org/doi/10.5555/3666122.3668105

Tang Y, Ding Z, Jankov D, Yuan B, Bourgeois D, Jermaine C, Krause A, Brunskill E, Cho K, Engelhardt B, Sabato S, Scarlett J (2023). Auto-differentiation of relational computations for very large scale machine learning. Proceedings of the 40th International Conference on Machine Learning, pages 33581-33598. Online publication date: 23-Jul-2023.
https://dl.acm.org/doi/10.5555/3618408.3619806

Sun T, Wang Q, Li D, Wang B, Krause A, Brunskill E, Cho K, Engelhardt B, Sabato S, Scarlett J (2023). Momentum ensures convergence of SIGNSGD under weaker assumptions. Proceedings of the 40th International Conference on Machine Learning, pages 33077-33099. Online publication date: 23-Jul-2023.
https://dl.acm.org/doi/10.5555/3618408.3619781
