Research article
Just move it!: dynamic parameter allocation in action

Published: 01 July 2021

Abstract

Parameter servers (PSs) ease the implementation of distributed machine learning systems, but their performance can fall behind that of single-machine baselines due to communication overhead. We demonstrate Lapse, an open-source PS with dynamic parameter allocation. Previous work has shown that dynamic parameter allocation can improve PS performance by up to two orders of magnitude and lead to near-linear speed-ups over single-machine baselines. This demonstration illustrates how Lapse is used and why it can provide order-of-magnitude speed-ups over other PSs. To do so, this demonstration interactively analyzes and visualizes what dynamic parameter allocation looks like in action.
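The idea behind dynamic parameter allocation can be illustrated with a toy sketch: a parameter server partitions the model's parameters across nodes, workers access them with pull/push operations, and a relocation primitive moves a parameter's ownership to the node that is about to access it intensively, turning expensive remote accesses into cheap local ones. This is a minimal, hypothetical sketch for intuition only — the class and method names (`ParameterServer`, `pull`, `push`, `localize`) are illustrative assumptions, not Lapse's actual API.

```python
# Toy sketch of a parameter server with dynamic parameter allocation.
# All names here are hypothetical; this is not Lapse's real interface.

class Node:
    def __init__(self, node_id):
        self.node_id = node_id
        self.store = {}          # parameters this node currently owns

class ParameterServer:
    def __init__(self, num_nodes):
        self.nodes = [Node(i) for i in range(num_nodes)]
        self.owner = {}          # parameter key -> owning node id

    def init_param(self, key, value, node_id=0):
        self.owner[key] = node_id
        self.nodes[node_id].store[key] = value

    def pull(self, node_id, key):
        # A remote pull models a network round trip; a local pull is cheap.
        remote = self.owner[key] != node_id
        return self.nodes[self.owner[key]].store[key], remote

    def push(self, node_id, key, delta):
        remote = self.owner[key] != node_id
        self.nodes[self.owner[key]].store[key] += delta
        return remote

    def localize(self, node_id, key):
        # Dynamic parameter allocation: move ownership of `key` to
        # `node_id`, so that subsequent pulls/pushes become local.
        old = self.owner[key]
        if old != node_id:
            self.nodes[node_id].store[key] = self.nodes[old].store.pop(key)
            self.owner[key] = node_id

ps = ParameterServer(num_nodes=2)
ps.init_param("w", 1.0, node_id=0)
_, remote = ps.pull(1, "w")      # node 1 accesses a parameter owned by node 0
ps.localize(1, "w")              # relocate it before a burst of accesses
_, local = ps.pull(1, "w")       # the same access is now local
```

In a static-allocation PS, node 1's repeated accesses to `"w"` would each pay the remote cost; relocating the parameter first is what enables the near-linear speed-ups the abstract refers to.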


Cited By

  • HET-GMP: A Graph-based System Approach to Scaling Large Embedding Model Training. Proceedings of the 2022 International Conference on Management of Data (SIGMOD '22), pp. 470--480. doi:10.1145/3514221.3517902
  • NuPS: A Parameter Server for Machine Learning with Non-Uniform Parameter Access. Proceedings of the 2022 International Conference on Management of Data (SIGMOD '22), pp. 481--495. doi:10.1145/3514221.3517860


Published In

Proceedings of the VLDB Endowment, Volume 14, Issue 12, July 2021, 587 pages. ISSN 2150-8097.

Publisher

VLDB Endowment
