Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
Skip to main content

Analysing the Overfit of the Auto-sklearn Automated Machine Learning Tool

  • Conference paper
  • First Online:
Machine Learning, Optimization, and Data Science (LOD 2019)

Part of the book series: Lecture Notes in Computer Science ((LNISA,volume 11943))

Abstract

With the ever-increasing number of pre-processing and classification algorithms, manually selecting the best algorithm and their best hyper-parameter settings (i.e. the best classification workflow) is a daunting task. Automated Machine Learning (Auto-ML) methods have been recently proposed to tackle this issue. Auto-ML tools aim to automatically choose the best classification workflow for a given dataset. In this work we analyse the predictive accuracy and overfit of the state-of-the-art auto-sklearn tool, which iteratively builds a classification ensemble optimised for the user’s dataset. This work has 3 contributions. First, we measure 3 types of auto-sklearn’s overfit, involving the differences of predictive accuracies measured on different data subsets: two parts of the training set (for learning and internal validation of the model) and the hold-out test set used for final evaluation. Second, we analyse the distribution of types of classification models selected by auto-sklearn across all 17 datasets. Third, we measure correlations between predictive accuracies on different data subsets and different types of overfitting. Overall, substantial degrees of overfitting were found in several datasets, and decision tree ensembles were the most frequently selected types of models.

This is a preview of subscription content, log in via an institution to check access.

Access this chapter

Subscribe and save

Springer+ Basic
$34.99 /Month
  • Get 10 units per month
  • Download Article/Chapter or eBook
  • 1 Unit = 1 Article or 1 Chapter
  • Cancel anytime
Subscribe now

Buy Now

Chapter
USD 29.95
Price excludes VAT (USA)
  • Available as PDF
  • Read on any device
  • Instant download
  • Own it forever
eBook
USD 39.99
Price excludes VAT (USA)
  • Available as EPUB and PDF
  • Read on any device
  • Instant download
  • Own it forever
Softcover Book
USD 54.99
Price excludes VAT (USA)
  • Compact, lightweight edition
  • Dispatched in 3 to 5 business days
  • Free shipping worldwide - see info

Tax calculation will be finalised at checkout

Purchases are for personal use only

Institutional subscriptions

Similar content being viewed by others

References

  1. Feurer, M., Eggensperger, K., Falkner, S., Lindauer, M., Hutter, F.: Practical automated machine learning for the automl challenge 2018. In: International Workshop on Automatic Machine Learning, ICML 2018, pp. 1–12 (2018)

    Google Scholar 

  2. Feurer, M., Hutter, F.: Towards further automation in AutoML. In: ICML AutoML Workshop, p. 13 (2018)

    Google Scholar 

  3. Feurer, M., Klein, A., Eggensperger, K., Springenberg, J., Blum, M., Hutter, F.: Efficient and robust automated machine learning. In: Advances in Neural Information Processing Systems, vol. 28, pp. 2962–2970. Curran Associates, Inc. (2015)

    Google Scholar 

  4. Guyon, I., et al.: A brief review of the ChaLearn AutoML challenge: any-time any-dataset learning without human intervention. In: Workshop on Automatic Machine Learning, pp. 21–30 (2016)

    Google Scholar 

  5. Japkowicz, N., Shah, M.: Evaluating Learning Algorithms A Classification Perspective. Cambridge University Press, Cambridge (2011)

    Book  Google Scholar 

  6. Kordík, P., Černỳ, J., Frỳda, T.: Discovering predictive ensembles for transfer learning and meta-learning. Mach. Learn. 107(1), 177–207 (2018)

    Article  MathSciNet  Google Scholar 

  7. Kotthoff, L., Thornton, C., Hoos, H.H., Hutter, F., Leyton-Brown, K.: Auto-WEKA 2.0: automatic model selection and hyperparameter optimization in WEKA. J. Mach. Learn. Res. 18(1), 826–830 (2017)

    MathSciNet  MATH  Google Scholar 

  8. Mohr, F., Wever, M., Hüllermeier, E.: Ml-Plan: automated machine learning via hierarchical planning. Mach. Learn. 107(8–10), 1495–1515 (2018)

    Article  MathSciNet  Google Scholar 

  9. Yao, Q., et al.: Taking human out of learning applications: a survey on automated machine learning. CoRR abs/1810.13306 (2018)

    Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Corresponding author

Correspondence to Fabio Fabris .

Editor information

Editors and Affiliations

Rights and permissions

Reprints and permissions

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Check for updates. Verify currency and authenticity via CrossMark

Cite this paper

Fabris, F., Freitas, A.A. (2019). Analysing the Overfit of the Auto-sklearn Automated Machine Learning Tool. In: Nicosia, G., Pardalos, P., Umeton, R., Giuffrida, G., Sciacca, V. (eds) Machine Learning, Optimization, and Data Science. LOD 2019. Lecture Notes in Computer Science(), vol 11943. Springer, Cham. https://doi.org/10.1007/978-3-030-37599-7_42

Download citation

  • DOI: https://doi.org/10.1007/978-3-030-37599-7_42

  • Published:

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-37598-0

  • Online ISBN: 978-3-030-37599-7

  • eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics