DOI: 10.1145/3426745.3431338
Research Article

Maggy: Scalable Asynchronous Parallel Hyperparameter Search

Published: 01 December 2020

Abstract

Running extensive experiments is essential for building Machine Learning (ML) models. Such experiments usually require the iterative execution of many trials with varying run times. In recent years, Apache Spark has become the de facto industry standard for parallel data processing, in which iterative processes are implemented within the bulk-synchronous parallel (BSP) execution model. The BSP approach is also used to parallelize ML trials in Spark. However, BSP task synchronization barriers prevent the asynchronous execution of trials, which reduces the number of trials that can be run within a given computational budget. In this paper, we introduce Maggy, an open-source framework based on Spark that executes ML trials asynchronously in parallel, with the ability to early-stop poorly performing trials. In our experiments, we compare Maggy with the BSP execution of parallel trials in Spark and show that, for random hyperparameter search on a convolutional neural network trained on the Fashion-MNIST dataset, Maggy reduces the time required to execute a fixed number of trials by 33% to 58%, without any loss in final model accuracy.
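The core idea of the abstract can be illustrated outside of Spark. The following is a minimal, hypothetical Python sketch (standard library only; it does not use Maggy's or Spark's actual APIs, and all names such as run_trial and MedianStopper are invented for illustration): trials run asynchronously on a worker pool with no synchronization barrier between them, and a simple median rule stops under-performing trials early.

```python
# Hypothetical sketch of asynchronous parallel hyperparameter search with
# early stopping. Standard library only; does NOT reflect Maggy's real API.
import random
import threading
from concurrent.futures import ThreadPoolExecutor, as_completed

class MedianStopper:
    """Early-stop a trial whose metric falls below the median of its peers
    at the same epoch (a common heuristic, not necessarily Maggy's rule)."""
    def __init__(self, min_peers=4):
        self._history = {}            # epoch -> list of reported metrics
        self._lock = threading.Lock()
        self._min_peers = min_peers

    def report(self, epoch, metric):
        """Record a metric; return False if the trial should stop early."""
        with self._lock:
            peers = self._history.setdefault(epoch, [])
            peers.append(metric)
            if len(peers) >= self._min_peers:
                median = sorted(peers)[len(peers) // 2]
                if metric < median:
                    return False      # under-performer: stop early
            return True

def run_trial(trial_id, lr, stopper):
    """Toy stand-in for a training loop reporting per-epoch metrics."""
    acc = 0.0
    for epoch in range(10):
        acc += random.uniform(0.0, lr)        # fake "training" progress
        if not stopper.report(epoch, acc):
            return trial_id, lr, acc, "stopped early"
    return trial_id, lr, acc, "finished"

if __name__ == "__main__":
    stopper = MedianStopper()
    trials = [random.uniform(0.01, 0.5) for _ in range(16)]  # random search
    # Asynchronous execution: each worker picks up a new trial as soon as
    # it is free; there is no global barrier between trials as in BSP.
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(run_trial, i, lr, stopper)
                   for i, lr in enumerate(trials)]
        for future in as_completed(futures):
            print(future.result())
```

Under BSP, by contrast, all four workers would have to finish their current trials before any could start the next batch; removing that barrier, plus stopping weak trials early, is the effect the abstract quantifies as a 33% to 58% reduction in wall-clock time.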

Supplementary Material

MP4 File (3426745.3431338.mp4)
Presentation of "Maggy: Scalable Asynchronous Parallel Hyperparameter Search" at the 1st Workshop on Distributed Machine Learning (DistributedML '20), co-located with CoNEXT 2020, December 1, 2020, Barcelona, Spain.




Published In

DistributedML'20: Proceedings of the 1st Workshop on Distributed Machine Learning
December 2020
46 pages
ISBN:9781450381826
DOI:10.1145/3426745


Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

  1. Asynchronous Hyperparameter Optimization
  2. Machine Learning
  3. Scalable Hyperparameter Search

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

CoNEXT '20

Acceptance Rates

Overall acceptance rate: 5 of 10 submissions (50%)

Article Metrics

  • Downloads (last 12 months): 21
  • Downloads (last 6 weeks): 4
Reflects downloads up to 30 Aug 2024


Cited By

  • (2024) MFRLMO: Model-free reinforcement learning for multi-objective optimization of Apache Spark. ICST Transactions on Scalable Information Systems 11(5). https://doi.org/10.4108/eetsis.4764. Online publication date: 20-Feb-2024.
  • (2023) Tuning parameters of Apache Spark with Gauss–Pareto-based multi-objective optimization. Knowledge and Information Systems 66(2), 1065-1090. https://doi.org/10.1007/s10115-023-02032-z. Online publication date: 13-Dec-2023.
  • (2022) Scalable Artificial Intelligence for Earth Observation Data Using Hopsworks. Remote Sensing 14(8), 1889. https://doi.org/10.3390/rs14081889. Online publication date: 14-Apr-2022.
  • (2022) Accelerate Model Parallel Deep Learning Training Using Effective Graph Traversal Order in Device Placement. In Distributed Applications and Interoperable Systems, 114-130. https://doi.org/10.1007/978-3-031-16092-9_8. Online publication date: 6-Sep-2022.
  • (2021) AutoAblation. In Proceedings of the 1st Workshop on Machine Learning and Systems, 55-61. https://doi.org/10.1145/3437984.3458834. Online publication date: 26-Apr-2021.
  • (2021) ExtremeEarth Meets Satellite Data From Space. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 14, 9038-9063. https://doi.org/10.1109/JSTARS.2021.3107982. Online publication date: 2021.
  • (2021) Jespipe: A Plugin-Based, Open MPI Framework for Adversarial Machine Learning Analysis. In 2021 IEEE International Conference on Big Data (Big Data), 3663-3670. https://doi.org/10.1109/BigData52589.2021.9671385. Online publication date: 15-Dec-2021.
  • (2021) Distributed training and scalability for the particle clustering method UCluster. EPJ Web of Conferences 251, 02054. https://doi.org/10.1051/epjconf/202125102054. Online publication date: 23-Aug-2021.
