Scaling distributed machine learning with the parameter server

Published: 06 October 2014

Abstract

We propose a parameter server framework for distributed machine learning problems. Both data and workloads are distributed over worker nodes, while the server nodes maintain globally shared parameters, represented as dense or sparse vectors and matrices. The framework manages asynchronous data communication between nodes, and supports flexible consistency models, elastic scalability, and continuous fault tolerance.
To demonstrate the scalability of the proposed framework, we show experimental results on petabytes of real data with billions of examples and parameters on problems ranging from Sparse Logistic Regression to Latent Dirichlet Allocation and Distributed Sketching.
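
The division of labor described in the abstract can be illustrated with a minimal single-process sketch. It assumes a sparse key-to-value parameter vector held by one server object and two worker data shards training a tiny sparse logistic regression. The names `ParameterServer`, `pull`, `push`, and `worker_step`, the learning rate, and the toy data are all hypothetical choices made for illustration and are not taken from the framework itself. The workers here run one after another, so the sketch does not model asynchronous communication, consistency models, or fault tolerance.

```python
import math
from collections import defaultdict


class ParameterServer:
    """Holds the globally shared parameters as a sparse vector (key -> value)."""

    def __init__(self):
        self.weights = defaultdict(float)

    def pull(self, keys):
        # Return only the entries a worker asks for; unseen keys read as 0.0.
        return {k: self.weights.get(k, 0.0) for k in keys}

    def push(self, grads, lr=0.1):
        # Apply a worker's sparse gradient with a plain gradient-descent step.
        for k, g in grads.items():
            self.weights[k] -= lr * g


def worker_step(server, shard):
    """One pass over a worker's local shard of (sparse_features, label) pairs."""
    keys = {k for x, _ in shard for k in x}
    w = server.pull(keys)
    grads = defaultdict(float)
    for x, y in shard:
        z = sum(w[k] * v for k, v in x.items())
        p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of label 1
        for k, v in x.items():
            grads[k] += (p - y) * v     # logistic-loss gradient
    server.push(grads)


# Toy data: each worker owns one shard; features are sparse dicts.
shards = [
    [({0: 1.0, 2: 1.0}, 1), ({1: 1.0}, 0)],
    [({2: 1.0, 3: 1.0}, 1), ({1: 1.0, 3: 1.0}, 0)],
]

server = ParameterServer()
for _ in range(50):
    for shard in shards:  # sequential in this sketch; real workers run concurrently
        worker_step(server, shard)
```

Each worker touches only the keys present in its own shard, which is what keeps the exchanged vectors sparse in this toy setting.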

Published In

OSDI'14: Proceedings of the 11th USENIX conference on Operating Systems Design and Implementation
October 2014
676 pages
ISBN:9781931971164

Sponsors

  • USENIX Association

Publisher

USENIX Association

United States
