DOI: 10.1145/3426745.3431338
Research Article

Maggy: Scalable Asynchronous Parallel Hyperparameter Search

Published: 01 December 2020

Abstract

Running extensive experiments is essential for building Machine Learning (ML) models. Such experiments usually require the iterative execution of many trials with varying run times. In recent years, Apache Spark has become the de facto industry standard for parallel data processing, in which iterative processes are implemented within the bulk-synchronous parallel (BSP) execution model. The BSP approach is also used to parallelize ML trials in Spark. However, BSP task synchronization barriers prevent the asynchronous execution of trials, which reduces the number of trials that can be run within a given computational budget. In this paper, we introduce Maggy, an open-source framework based on Spark that executes ML trials asynchronously in parallel, with the ability to early-stop poorly performing trials. In our experiments, we compare Maggy with the BSP execution of parallel trials in Spark and show that, for random hyperparameter search on a convolutional neural network trained on the Fashion-MNIST dataset, Maggy reduces the time required to execute a fixed number of trials by 33% to 58%, without any loss in final model accuracy.
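The core idea of the abstract can be illustrated outside of Spark. The following is a minimal, hypothetical Python sketch (standard library only; it does not use Maggy's or Spark's actual APIs, and all names such as run_trial and MedianStopper are invented for illustration): trials run asynchronously on a worker pool with no synchronization barrier between them, and a simple median rule stops under-performing trials early.

```python
# Hypothetical sketch of asynchronous parallel hyperparameter search with
# early stopping. Standard library only; does NOT reflect Maggy's real API.
import random
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed

class MedianStopper:
    """Early-stop a trial whose metric falls below the median of its peers
    at the same epoch (a common heuristic, not necessarily Maggy's rule)."""
    def __init__(self, min_peers=4):
        self._history = {}            # epoch -> list of reported metrics
        self._lock = threading.Lock()
        self._min_peers = min_peers

    def report(self, epoch, metric):
        """Record a metric; return False if the trial should stop early."""
        with self._lock:
            peers = self._history.setdefault(epoch, [])
            peers.append(metric)
            if len(peers) >= self._min_peers:
                median = sorted(peers)[len(peers) // 2]
                if metric < median:
                    return False      # under-performer: stop early
            return True

def run_trial(trial_id, lr, stopper):
    """Toy stand-in for a training loop reporting per-epoch metrics."""
    acc = 0.0
    for epoch in range(10):
        acc += random.uniform(0.0, lr)        # fake "training" progress
        if not stopper.report(epoch, acc):
            return trial_id, lr, acc, "stopped early"
    return trial_id, lr, acc, "finished"

if __name__ == "__main__":
    stopper = MedianStopper()
    trials = [random.uniform(0.01, 0.5) for _ in range(16)]  # random search
    # Asynchronous execution: each worker picks up a new trial as soon as
    # it is free; there is no global barrier between trials as in BSP.
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(run_trial, i, lr, stopper)
                   for i, lr in enumerate(trials)]
        for future in as_completed(futures):
            print(future.result())
```

Under BSP, by contrast, all four workers would have to finish their current trials before any could start the next batch; removing that barrier, plus stopping weak trials early, is the effect the abstract quantifies as a 33% to 58% reduction in wall-clock time.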

Supplementary Material

MP4 File (3426745.3431338.mp4)
Presentation of "Maggy: Scalable Asynchronous Parallel Hyperparameter Search" at the 1st Workshop on Distributed Machine Learning (DistributedML '20), co-located with CoNEXT 2020, December 1, 2020, Barcelona, Spain.




Published In

DistributedML'20: Proceedings of the 1st Workshop on Distributed Machine Learning
December 2020
46 pages
ISBN:9781450381826
DOI:10.1145/3426745


Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. Asynchronous Hyperparameter Optimization
  2. Machine Learning
  3. Scalable Hyperparameter Search

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

CoNEXT '20

Acceptance Rates

Overall acceptance rate: 5 of 10 submissions (50%)

Article Metrics

  • Downloads (last 12 months): 21
  • Downloads (last 6 weeks): 4
Reflects downloads up to 30 Aug 2024


Cited By

  • (2024) MFRLMO: Model-free reinforcement learning for multi-objective optimization of Apache Spark. ICST Transactions on Scalable Information Systems 11(5). https://doi.org/10.4108/eetsis.4764. Online publication date: 20-Feb-2024.
  • (2023) Tuning parameters of Apache Spark with Gauss–Pareto-based multi-objective optimization. Knowledge and Information Systems 66(2), 1065-1090. https://doi.org/10.1007/s10115-023-02032-z. Online publication date: 13-Dec-2023.
  • (2022) Scalable Artificial Intelligence for Earth Observation Data Using Hopsworks. Remote Sensing 14(8), 1889. https://doi.org/10.3390/rs14081889. Online publication date: 14-Apr-2022.
  • (2022) Accelerate Model Parallel Deep Learning Training Using Effective Graph Traversal Order in Device Placement. In Distributed Applications and Interoperable Systems, 114-130. https://doi.org/10.1007/978-3-031-16092-9_8. Online publication date: 6-Sep-2022.
  • (2021) AutoAblation. In Proceedings of the 1st Workshop on Machine Learning and Systems, 55-61. https://doi.org/10.1145/3437984.3458834. Online publication date: 26-Apr-2021.
  • (2021) ExtremeEarth Meets Satellite Data From Space. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 14, 9038-9063. https://doi.org/10.1109/JSTARS.2021.3107982. Online publication date: 2021.
  • (2021) Jespipe: A Plugin-Based, Open MPI Framework for Adversarial Machine Learning Analysis. In 2021 IEEE International Conference on Big Data (Big Data), 3663-3670. https://doi.org/10.1109/BigData52589.2021.9671385. Online publication date: 15-Dec-2021.
  • (2021) Distributed training and scalability for the particle clustering method UCluster. EPJ Web of Conferences 251, 02054. https://doi.org/10.1051/epjconf/202125102054. Online publication date: 23-Aug-2021.
