DOI: 10.1007/978-3-030-86523-8_17

Article

Bayesian Optimization with a Prior for the Optimum

Published: 13 September 2021

Abstract

While Bayesian Optimization (BO) is a popular method for optimizing expensive black-box functions, it fails to leverage the experience of domain experts. As a result, BO wastes function evaluations on bad design choices (e.g., machine learning hyperparameters) that the expert already knows perform poorly. To address this issue, we introduce Bayesian Optimization with a Prior for the Optimum (BOPrO). BOPrO allows users to inject their knowledge into the optimization process in the form of priors about which parts of the input space will yield the best performance, rather than BO's standard priors over functions, which are much less intuitive for users. BOPrO then combines these priors with BO's standard probabilistic model to form a pseudo-posterior that is used to select which points to evaluate next. We show that BOPrO is around 6.67× faster than state-of-the-art methods on a common suite of benchmarks, and achieves new state-of-the-art performance on a real-world hardware design application. We also show that BOPrO converges faster even when the priors for the optimum are not entirely accurate, and that it robustly recovers from misleading priors.
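The mechanism the abstract describes — combining a user-supplied prior over promising inputs with a data-driven model into a pseudo-posterior that guides point selection — can be sketched as follows. This is a minimal, hypothetical illustration, not the paper's exact formulation: the toy objective, the TPE-style density-ratio surrogate, and the prior-decay schedule are all assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def objective(x):
    # toy black-box function to minimize; true optimum at x = 0.3
    return (x - 0.3) ** 2

def user_prior(x):
    # expert's belief about where the optimum lies: a Gaussian bump
    # centered at 0.25 (close to, but not exactly at, the optimum)
    return np.exp(-0.5 * ((x - 0.25) / 0.2) ** 2)

def kde(points, x, h=0.1):
    # crude Gaussian kernel density estimate over observed points
    if len(points) == 0:
        return np.ones_like(x)
    return np.exp(-0.5 * ((x[:, None] - points[None, :]) / h) ** 2).sum(axis=1) + 1e-9

def surrogate(x_cand, xs, ys):
    # density-ratio model in the spirit of TPE: density of "good"
    # observations divided by density of "bad" ones
    gamma = np.quantile(ys, 0.3)
    return kde(xs[ys <= gamma], x_cand) / kde(xs[ys > gamma], x_cand)

def pseudo_posterior(x_cand, xs, ys, t, beta=10.0):
    # combine the user's prior with the data-driven surrogate; the
    # prior's influence decays as evaluations accumulate (this decay
    # schedule is an assumption, not the paper's exact formula)
    return user_prior(x_cand) ** (beta / (beta + t)) * surrogate(x_cand, xs, ys)

# a few random initial evaluations
xs = rng.uniform(0.0, 1.0, size=4)
ys = objective(xs)

for t in range(20):
    cand = rng.uniform(0.0, 1.0, size=256)   # random candidate pool
    x_next = cand[np.argmax(pseudo_posterior(cand, xs, ys, t))]
    xs = np.append(xs, x_next)
    ys = np.append(ys, objective(x_next))

best = xs[np.argmin(ys)]
print(f"best x found: {best:.3f}")  # should land near the optimum at 0.3
```

Even though the prior here is deliberately misplaced at 0.25, the surrogate's growing influence pulls the search toward the true optimum at 0.3, loosely mirroring the robustness-to-inaccurate-priors behavior the abstract reports.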


Cited By

• (2024) Consolidated learning: a domain-specific model-free optimization strategy with validation on metaMIMIC benchmarks. Machine Learning 113(7), 4925–4949. DOI: 10.1007/s10994-023-06359-0 (1 Jul 2024)
• (2023) PFNs4BO. In: Proceedings of the 40th International Conference on Machine Learning, 25444–25470. DOI: 10.5555/3618408.3619464 (23 Jul 2023)
• (2023) BaCO: A Fast and Portable Bayesian Compiler Optimization Framework. In: Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 4, 19–42. DOI: 10.1145/3623278.3624770 (25 Mar 2023)
• (2023) Homunculus: Auto-Generating Efficient Data-Plane ML Pipelines for Datacenter Networks. In: Proceedings of the 28th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3, 329–342. DOI: 10.1145/3582016.3582022 (25 Mar 2023)
• (2022) Learning Skill-based Industrial Robot Tasks with User Priors. In: 2022 IEEE 18th International Conference on Automation Science and Engineering (CASE), 1485–1492. DOI: 10.1109/CASE49997.2022.9926713 (20 Aug 2022)


Published In

Machine Learning and Knowledge Discovery in Databases. Research Track: European Conference, ECML PKDD 2021, Bilbao, Spain, September 13–17, 2021, Proceedings, Part III
Sep 2021, 856 pages
ISBN: 978-3-030-86522-1
DOI: 10.1007/978-3-030-86523-8
Editors: Nuria Oliver, Fernando Pérez-Cruz, Stefan Kramer, Jesse Read, Jose A. Lozano

Publisher

Springer-Verlag, Berlin, Heidelberg
