DOI: 10.1145/3318464.3386126
short-paper

Elastic Machine Learning Algorithms in Amazon SageMaker

Published: 31 May 2020
  • Abstract

    There is a large body of research on scalable machine learning (ML). Nevertheless, training ML models on large, continuously evolving datasets is still a difficult and costly undertaking for many companies and institutions. We discuss such challenges and derive requirements for an industrial-scale ML platform. Next, we describe the computational model behind Amazon SageMaker, which is designed to meet such challenges. SageMaker is an ML platform provided as part of Amazon Web Services (AWS), and supports incremental training, resumable and elastic learning as well as automatic hyperparameter optimization. We detail how to adapt several popular ML algorithms to its computational model. Finally, we present an experimental evaluation on large datasets, comparing SageMaker to several scalable, JVM-based implementations of ML algorithms, which we significantly outperform with regard to computation time and cost.

      Published In

      SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data
      June 2020
      2925 pages
      ISBN:9781450367356
      DOI:10.1145/3318464

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. elastic machine learning
      2. scalable machine learning

      Qualifiers

      • Short-paper

      Conference

      SIGMOD/PODS '20
      Acceptance Rates

      Overall Acceptance Rate 785 of 4,003 submissions, 20%
