
HyperNOMAD: Hyperparameter Optimization of Deep Neural Networks Using Mesh Adaptive Direct Search

Published: 26 June 2021

Abstract

The performance of deep neural networks is highly sensitive to the choice of the hyperparameters that define the structure of the network and the learning process. When facing a new application, tuning a deep neural network is a tedious and time-consuming process that is often described as a “dark art,” which motivates automating the calibration of these hyperparameters. Derivative-free optimization is a field that develops methods designed to optimize time-consuming functions without relying on derivatives. This work introduces the HyperNOMAD package, an extension of the NOMAD software that applies the MADS algorithm [7] to simultaneously tune the hyperparameters responsible for both the architecture and the learning process of a deep neural network (DNN). This generic approach allows considerable flexibility in the exploration of the search space by taking advantage of categorical variables. HyperNOMAD is tested on the MNIST, Fashion-MNIST, and CIFAR-10 datasets and achieves results comparable to the current state of the art.
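To make the blackbox framing concrete, the sketch below treats one complete training run as a function of its hyperparameters that returns a single scalar (validation error), which is the only quantity a derivative-free solver such as MADS needs. It is a minimal illustration, assuming a tiny PyTorch model on synthetic data; the function name `blackbox`, its arguments, and the network inside are hypothetical and do not reproduce the HyperNOMAD interface.

```python
# Minimal, hypothetical sketch of the blackbox view of hyperparameter tuning:
# a full training run is wrapped as a function of its hyperparameters and
# returns a scalar that a derivative-free solver such as MADS can minimize.
import torch
import torch.nn as nn


def blackbox(n_layers: int, width: int, lr: float, batch_size: int) -> float:
    """Train a small fully connected classifier and return validation error."""
    torch.manual_seed(0)
    # Synthetic stand-in for a dataset such as MNIST (784 inputs, 10 classes).
    x = torch.randn(2048, 784)
    y = x[:, :10].argmax(dim=1)
    x_train, y_train = x[:1536], y[:1536]
    x_val, y_val = x[1536:], y[1536:]

    # Architectural hyperparameters define the structure of the network.
    layers, d = [], 784
    for _ in range(n_layers):
        layers += [nn.Linear(d, width), nn.ReLU()]
        d = width
    layers.append(nn.Linear(d, 10))
    model = nn.Sequential(*layers)

    # Optimization hyperparameters define the learning process.
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(5):
        for i in range(0, len(x_train), batch_size):
            xb, yb = x_train[i:i + batch_size], y_train[i:i + batch_size]
            opt.zero_grad()
            loss_fn(model(xb), yb).backward()
            opt.step()

    with torch.no_grad():
        err = (model(x_val).argmax(dim=1) != y_val).float().mean().item()
    return err  # the only information the derivative-free solver ever sees


# A solver such as MADS simply evaluates this function at trial points:
print(blackbox(n_layers=2, width=64, lr=0.1, batch_size=128))
```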

References

[1]
M. A. Abramson. 2004. Mixed variable optimization of a load-bearing thermal insulation system using a filter pattern search algorithm. Optimization and Engineering 5, 2 (2004), 157–177.
[2]
M. A. Abramson, C. Audet, J. W. Chrissis, and J. G. Walston. 2009. Mesh adaptive direct search algorithms for mixed variable optimization. Optimization Letters 3, 1 (2009), 35–47.
[3]
M. A. Abramson, C. Audet, and J. E. Dennis Jr. 2007. Filter pattern search algorithms for mixed variable constrained optimization problems. Pacific Journal of Optimization 3, 3 (2007), 477–500. http://www.ybook.co.jp/online/pjoe/vol3/pjov3n3p477.html.
[4]
C. Audet, V. Béchard, and S. Le Digabel. 2008. Nonsmooth optimization through mesh adaptive direct search and variable neighborhood search. Journal of Global Optimization 41, 2 (2008), 299–318.
[5]
C. Audet, C.-K. Dang, and D. Orban. 2014. Optimization of algorithms with OPAL. Mathematical Programming Computation 6, 3 (2014), 233–254.
[6]
C. Audet and J. E. Dennis Jr. 2001. Pattern search algorithms for mixed variable programming. SIAM Journal on Optimization 11, 3 (2001), 573–594. http://link.aip.org/link/?SJE/11/573/1
[7]
C. Audet and J. E. Dennis Jr. 2006. Mesh adaptive direct search algorithms for constrained optimization. SIAM Journal on Optimization 17, 1 (2006), 188–217.
[8]
C. Audet and W. Hare. 2017. Derivative-Free and Blackbox Optimization. Springer International Publishing, Cham, Switzerland.
[9]
C. Audet, S. Le Digabel, and C. Tribes. 2019. The mesh adaptive direct search algorithm for granular and discrete variables. SIAM Journal on Optimization 29, 2 (2019), 1164–1189.
[10]
C. Audet and D. Orban. 2006. Finding optimal algorithmic parameters using derivative-free optimization. SIAM Journal on Optimization 17, 3 (2006), 642–664.
[11]
C. Audet and C. Tribes. 2018. Mesh-based Nelder-Mead algorithm for inequality constrained optimization. Computational Optimization and Applications 71, 2 (2018), 331–352.
[12]
B. Baker, O. Gupta, N. Naik, and R. Raskar. 2016. Designing Neural Network Architectures Using Reinforcement Learning. Technical Report. arXiv. http://arxiv.org/abs/1611.02167
[13]
P. Balaprakash, M. Salim, T. Uram, V. Vishwanath, and S. Wild. 2018. DeepHyper: Asynchronous hyperparameter search for deep neural networks. In 2018 IEEE 25th International Conference on High Performance Computing (HiPC’18). IEEE, Bengaluru, India, 42–51.
[14]
Y. Bengio. 2012. Practical recommendations for gradient-based training of deep architectures. In Neural Networks: Tricks of the Trade. Springer, Berlin, 437–478.
[15]
J. Bergstra, R. Bardenet, Y. Bengio, and B. Kégl. 2011. Algorithms for hyper-parameter optimization. In Advances in Neural Information Processing Systems. Curran Associates Inc., Red Hook, NY, 2546–2554.
[16]
J. Bergstra and Y. Bengio. 2012. Random search for hyper-parameter optimization. Journal of Machine Learning Research 13 (2012), 281–305.
[17]
J. Bergstra, D. Yamins, and D. D. Cox. 2013. Making a science of model search: Hyperparameter optimization in hundreds of dimensions for vision architectures. In Proceedings of the 30th International Conference on International Conference on Machine Learning (ICML’13), Vol. 28. JMLR.org, Atlanta, GA, I–115–I–123. http://dl.acm.org/citation.cfm?id=3042817.3042832
[18]
T. Bosc. 2016. Learning to Learn Neural Networks. Technical Report. arXiv. http://arxiv.org/abs/1610.06072
[19]
L. Bottou. 2012. Stochastic Gradient Descent Tricks. Lecture Notes in Computer Science (LNCS), Vol. 7700. Springer, Berlin, 430–445. https://www.microsoft.com/en-us/research/publication/stochastic-gradient-tricks/
[20]
X. Bouthillier, C. Tsirigotis, F. Corneau-Tremblay, P. Delaunay, R. Askari, D. Suhubdy, M. Noukhovitch, D. Serdyuk, A. Bergeron, P. Henderson, P. Lamblin, M. Bronzi, and C. Beckham. 2019. Oríon - Asynchronous Distributed Hyperparameter Optimization. Retrieved September 19, 2020, from https://github.com/Epistimio/orion.
[21]
A. R. Conn, K. Scheinberg, and L. N. Vicente. 2009. Global convergence of general derivative-free trust-region algorithms to first and second order critical points. SIAM Journal on Optimization 20, 1 (2009), 387–415.
[22]
A. Deshpande. 2019. A Beginner’s Guide to Understanding Convolutional Neural Networks. https://adeshpande3.github.io/adeshpande3.github.io/A-Beginner’s-Guide-To-Understanding-Convolutional-Neural-Networks
[23]
G. Diaz, A. Fokoue, G. Nannicini, and H. Samulowitz. 2017. An effective algorithm for hyperparameter optimization of neural networks. IBM Journal of Research and Development 61, 4 (2017), 9:1–9:11.
[24]
J. Duchi, E. Hazan, and Y. Singer. 2011. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research 12 (2011), 2121–2159.
[25]
T. Elsken, J. H. Metzen, and F. Hutter. 2018. Neural Architecture Search: A Survey. Technical Report. arXiv. http://arxiv.org/abs/1808.05377
[26]
T. Elsken, J. H. Metzen, and F. Hutter. 2019. Efficient Multi-Objective Neural Architecture Search via Lamarckian Evolution. In International Conference on Learning Representations (ICLR’19). New Orleans, LA. https://openreview.net/forum?id=ByME42AqK7
[27]
M. Feurer, A. Klein, K. Eggensperger, J. Springenberg, M. Blum, and F. Hutter. 2015. Efficient and robust automated machine learning. In Advances in Neural Information Processing Systems 28, C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett (Eds.). Curran Associates, Inc., Montreal, Canada, 2962–2970. http://papers.nips.cc/paper/5872-efficient-and-robust-automated-machine-learning.pdf.
[28]
H. Ghanbari and K. Scheinberg. 2017. Black-Box Optimization in Machine Learning with Trust Region Based Derivative Free Algorithm. Technical Report. arXiv. http://arxiv.org/abs/1703.06925
[29]
D. Golovin, B. Solnik, S. Moitra, G. Kochanski, J. Karro, and D. Sculley. 2017. Google Vizier: A service for black-box optimization. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, Association for Computing Machinery, New York, NY, 1487–1495.
[30]
M. Hassan. 2019. VGG16: Convolutional Network for Classification and Detection. https://neurohive.io/en/popular-networks/vgg16/.
[31]
R. Hooke and T. A. Jeeves. 1961. “Direct search” solution of numerical and statistical problems. Journal of the Association for Computing Machinery 8, 2 (1961), 212–229.
[32]
F. Hutter, H. H. Hoos, and K. Leyton-Brown. 2011. Sequential model-based optimization for general algorithm configuration. In International Conference on Learning and Intelligent Optimization. Springer, Berlin, 507–523.
[33]
Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. 2014. Caffe: Convolutional architecture for fast feature embedding. In Proceedings of the 22nd ACM International Conference on Multimedia. ACM, Association for Computing Machinery, New York, NY, 675–678.
[34]
D. P. Kingma and J. L. Ba. 2015. Adam: A Method for Stochastic Optimization. Technical Report. arXiv. https://arxiv.org/abs/1412.6980
[35]
M. Kokkolaras, C. Audet, and J. E. Dennis Jr. 2001. Mixed variable optimization of the number and composition of heat intercepts in a thermal insulation system. Optimization and Engineering 2, 1 (2001), 5–29.
[36]
A. Krizhevsky and G. Hinton. 2009. Learning Multiple Layers of Features from Tiny Images. Technical Report. Citeseer.
[37]
S. Le Digabel. 2011. Algorithm 909: NOMAD: Nonlinear optimization with the MADS algorithm. ACM Transactions on Mathematical Software 37, 4 (2011), 44:1–44:15.
[38]
S. Le Digabel and S. M. Wild. 2015. A Taxonomy of Constraints in Simulation-Based Optimization. Technical Report G-2015-57. Les cahiers du GERAD. http://www.optimization-online.org/DB_HTML/2015/05/4931.html
[39]
Y. A. LeCun, L. Bottou, G. B. Orr, and K. R. Müller. 2012. Efficient BackProp. Springer, Berlin, 9–48.
[40]
Y. LeCun and C. Cortes. 2010. MNIST handwritten digit database. http://yann.lecun.com/exdb/mnist/
[41]
S. Levine, P. Pastor, A. Krizhevsky, J. Ibarz, and D. Quillen. 2018. Learning hand-eye coordination for robotic grasping with deep learning and large-scale data collection. International Journal of Robotics Research 37, 4–5 (2018), 421–436.
[42]
L. Li, K. Jamieson, G. DeSalvo, A. Rostamizadeh, and A. Talwalkar. 2018. Hyperband: A novel bandit-based approach to hyperparameter optimization. Journal of Machine Learning Research 18 (2018), 1–52.
[43]
G. Litjens, T. Kooi, B. E. Bejnordi, A. A. A. Setio, F. Ciompi, M. Ghafoorian, J. A. Van Der Laak, B. Van Ginneken, and C. L. Sánchez. 2017. A survey on deep learning in medical image analysis. Medical Image Analysis 42 (2017), 60–88.
[44]
J. Liu, N. Ploskas, and N. V. Sahinidis. 2018. Tuning BARON using derivative-free optimization algorithms. Journal of Global Optimization 74, 4 (2018), 611–637.
[45]
P. R. Lorenzo, J. Nalepa, M. Kawulok, L. S. Ramos, and J. R. Pastor. 2017. Particle swarm optimization for hyper-parameter selection in deep neural networks. In Proceedings of the Genetic and Evolutionary Computation Conference. ACM, Association for Computing Machinery, New York, NY, 481–488.
[46]
I. Loshchilov and F. Hutter. 2016. CMA-ES for Hyperparameter Optimization of Deep Neural Networks. Technical Report. arXiv. http://arxiv.org/abs/1604.07269
[47]
A. R. Mello, J. de Matos, M. R. Stemmer, A. de Souza Britto Jr, and A. L. Koerich. 2019. A Novel Orthogonal Direction Mesh Adaptive Direct Search Approach for SVM Hyperparameter Tuning. Technical Report. arXiv. http://arxiv.org/abs/1904.11649
[48]
A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, and S. Chintala. 2019. PyTorch: An imperative style, high-performance deep learning library. In Advances in Neural Information Processing Systems 32, H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett (Eds.). Curran Associates, Inc., New York, NY, 8024–8035. http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf.
[49]
V. Pavlovsky. 2019. Introduction to Convolutional Neural Networks. https://www.vaetas.cz/posts/intro-convolutional-neural-networks.
[50]
F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research 12 (2011), 2825–2830.
[51]
M. Porcelli and Ph. L. Toint. 2017. BFO, a trainable derivative-free brute force optimizer for nonlinear bound-constrained optimization and equilibrium computations with continuous and discrete variables. ACM Transactions on Mathematical Software 44, 1 (2017), 6:1–6:25.
[52]
M. J. D. Powell. 2009. The BOBYQA Algorithm for Bound Constrained Optimization without Derivatives. Technical Report DAMTP 2009/NA06. Department of Applied Mathematics and Theoretical Physics, University of Cambridge, Cambridge, England. http://www.damtp.cam.ac.uk/user/na/NA_papers/NA2009_06.pdf.
[53]
E. Real, A. Aggarwal, Y. Huang, and Q. V. Le. 2018. Regularized Evolution for Image Classifier Architecture Search. Technical Report. arXiv. http://arxiv.org/abs/1802.01548
[54]
K. Simonyan and A. Zisserman. 2014. Very Deep Convolutional Networks for Large-Scale Image Recognition. Technical Report. arXiv. http://arxiv.org/abs/1409.1556
[55]
S. C. Smithson, G. Yang, W. J. Gross, and B. H. Meyer. 2016. Neural networks designing neural networks: Multi-objective hyper-parameter optimization. In 2016 IEEE/ACM International Conference on Computer-Aided Design (ICCAD’16). IEEE, Association for Computing Machinery, New York, NY, 1–8.
[56]
J. Snoek, H. Larochelle, and R. P. Adams. 2012. Practical Bayesian optimization of machine learning algorithms. In Advances in Neural Information Processing Systems 25 (NIPS’12). Curran Associates Inc., Red Hook, NY, 2960–2968. https://dash.harvard.edu/handle/1/11708816
[57]
M. Suganuma, S. Shirakawa, and T. Nagao. 2017. A genetic programming approach to designing convolutional neural network architectures. In Proceedings of the Genetic and Evolutionary Computation Conference. ACM, Association for Computing Machinery, New York, NY, 497–504.
[58]
T. Tieleman and G. Hinton. 2012. Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural networks for machine learning. 26–31 pages. https://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf.
[59]
V. Torczon. 1997. On the convergence of pattern search algorithms. SIAM Journal on Optimization 7, 1 (1997), 1–25.
[60]
M. Wistuba, N. Schilling, and L. Schmidt-Thieme. 2018. Scalable Gaussian process-based transfer surrogates for hyperparameter optimization. Machine Learning 107, 1 (2018), 43–78.
[61]
H. Xiao, K. Rasul, and R. Vollgraf. 2017. Fashion-MNIST: A Novel Image Dataset for Benchmarking Machine Learning Algorithms. Technical Report. arXiv. http://arxiv.org/abs/1708.07747
[62]
Yelp. 2014. Metric Optimization Engine. https://github.com/Yelp/MOE.
[63]
S. R. Young, D. C. Rose, T. P. Karnowski, S. H. Lim, and R. M. Patton. 2015. Optimizing deep learning hyper-parameters through an evolutionary algorithm. In Proceedings of the Workshop on Machine Learning in High-Performance Computing Environments. ACM, Association for Computing Machinery, New York, NY, 1–5.
[64]
A. Zela, A. Klein, S. Falkner, and F. Hutter. 2018. Towards Automated Deep Learning: Efficient Joint Neural Architecture and Hyperparameter Search. Technical Report. arXiv. http://arxiv.org/abs/1807.06906
[65]
B. Zoph and Q. V. Le. 2016. Neural Architecture Search with Reinforcement Learning. Technical Report. arXiv. http://arxiv.org/abs/1611.01578
[66]
B. Zoph, V. Vasudevan, J. Shlens, and Q. V. Le. 2018. Learning transferable architectures for scalable image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE, Salt Lake City, UT, 8697–8710.

      Published In

      ACM Transactions on Mathematical Software, Volume 47, Issue 3
      September 2021, 251 pages
      ISSN: 0098-3500
      EISSN: 1557-7295
      DOI: 10.1145/3472960
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 26 June 2021
      Accepted: 01 February 2021
      Revised: 01 February 2021
      Received: 01 June 2019
      Published in TOMS Volume 47, Issue 3


      Author Tags

      1. Deep neural networks
      2. blackbox optimization
      3. categorical variables
      4. derivative-free optimization
      5. hyperparameter optimization
      6. mesh adaptive direct search
      7. neural architecture search

      Qualifiers

      • Research-article
      • Research
      • Refereed

      Funding Sources

      • NSERC Alliance
