DOI: 10.1145/2001576.2001844

Tuned data mining: a benchmark study on different tuners

Published: 12 July 2011

Abstract

The complex, often redundant and noisy data in real-world data mining (DM) applications frequently lead to inferior results when out-of-the-box DM models are applied. A tuning of parameters is essential to achieve high-quality results. In this work we aim at tuning parameters of the preprocessing and the modeling phase conjointly. The framework TDM (Tuned Data Mining) was developed to facilitate the search for good parameters and the comparison of different tuners. It is shown that tuning is of great importance for high-quality results. Surrogate-model based tuning utilizing the Sequential Parameter Optimization Toolbox (SPOT) is compared with other tuners (CMA-ES, BFGS, LHD) and evidence is found that SPOT is well suited for this task. In benchmark tasks like the Data Mining Cup (DMC) tuned models achieve remarkably better ranks than their untuned counterparts.
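The surrogate-model-based tuning the abstract describes follows the sequential parameter optimization pattern: evaluate an initial design of parameter settings, fit a cheap surrogate model of the objective, and then repeatedly evaluate the surrogate's most promising candidate. A minimal sketch of that loop, assuming a toy one-dimensional objective and a quadratic surrogate (the paper's tooling is the SPOT R package; everything below is an illustrative stand-in, not the SPOT implementation):

```python
def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for c in range(col, 4):
                M[r][c] -= f * M[col][c]
    x = [0.0] * 3
    for r in (2, 1, 0):
        x[r] = (M[r][3] - sum(M[r][c] * x[c] for c in range(r + 1, 3))) / M[r][r]
    return x

def fit_quadratic(xs, ys):
    """Least-squares fit of y ~ a*x^2 + b*x + c via the normal equations."""
    n = len(xs)
    S = lambda p: sum(x ** p for x in xs)
    Sy = lambda p: sum((x ** p) * y for x, y in zip(xs, ys))
    A = [[S(4), S(3), S(2)], [S(3), S(2), S(1)], [S(2), S(1), n]]
    return solve3(A, [Sy(2), Sy(1), Sy(0)])

def spo_tune(objective, lo, hi, n_init=5, n_iter=5):
    """Sequential parameter optimization sketch: evaluate an initial design,
    then repeatedly fit a surrogate and evaluate its predicted minimizer."""
    xs = [lo + (hi - lo) * i / (n_init - 1) for i in range(n_init)]
    ys = [objective(x) for x in xs]
    for _ in range(n_iter):
        a, b, c = fit_quadratic(xs, ys)
        # Candidate = surrogate minimizer when convex, else the box midpoint.
        cand = -b / (2 * a) if a > 1e-12 else (lo + hi) / 2
        cand = min(max(cand, lo), hi)  # clip to the search box
        xs.append(cand)
        ys.append(objective(cand))
    best = min(range(len(xs)), key=lambda i: ys[i])
    return xs[best], ys[best]

# Toy "tuning" problem: one hyperparameter whose optimum is at 0.3.
best_x, best_y = spo_tune(lambda x: (x - 0.3) ** 2, 0.0, 1.0)
```

In real tuning the objective is an expensive cross-validated model error, the design has many dimensions (preprocessing and model parameters jointly, as in TDM), and SPOT uses richer surrogates than a quadratic; the loop structure, however, is the same.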

References

[1]
T. Bartz-Beielstein. SPOT: An R package for automatic and interactive tuning of optimization algorithms by sequential parameter optimization. Technical Report arXiv:1006.4645. CIOP Technical Report 05--10, Cologne University of Applied Sciences, Jun 2010.
[2]
T. Bartz-Beielstein, C. Lasarczyk, and M. Preuß. Sequential parameter optimization. In B. McKay et al., editors, Proceedings 2005 Congress on Evolutionary Computation (CEC'05), Edinburgh, Scotland, volume 1, pages 773--780, Piscataway NJ, 2005. IEEE Press.
[3]
T. Bartz-Beielstein, C. Lasarczyk, and M. Preuß. The sequential parameter optimization toolbox. In T. Bartz-Beielstein et al., editors, Experimental Methods for the Analysis of Optimization Algorithms, pages 337--360. Springer, Berlin, Heidelberg, New York, 2010.
[4]
B. Bischl. The mlr package: Machine learning in R. http://mlr.r-forge.r-project.org, accessed 14.04.2011.
[5]
B. Bischl, O. Mersmann, and H. Trautmann. Resampling methods in model validation. In T. Bartz-Beielstein et al., editors, Workshop WEMACS joint to PPSN2010, number TR10--2-007 in Technical Reports, TU Dortmund, 2010.
[6]
L. Breiman. Random forests. Machine Learning, 45(1):5--32, 2001.
[7]
C. Broyden, J. Dennis, and J. Moré. On the local and superlinear convergence of quasi-Newton methods. IMA Journal of Applied Mathematics, 12(3):223--245, 1973.
[8]
R. Byrd, P. Lu, J. Nocedal, and C. Zhu. A limited memory algorithm for bound constrained optimization. SIAM Journal on Scientific Computing, 16(5):1190--1208, 1995.
[9]
P. Domingos. MetaCost: A general method for making classifiers cost-sensitive. In Proceedings of the Fifth International Conference on Knowledge Discovery and Data Mining (KDD-99), pages 195--215, 1999.
[10]
D. Goldberg. Genetic Algorithms in Search, Optimization, and Machine Learning. Addison-Wesley, 1989.
[11]
N. Hansen. The CMA evolution strategy: a comparing review. In J. Lozano, P. Larranaga, I. Inza, and E. Bengoetxea, editors, Towards a new evolutionary computation. Advances on estimation of distribution algorithms, pages 75--102. Springer, 2006.
[12]
F. Jurecka. Automated metamodeling for efficient multi-disciplinary optimization of complex automotive structures. In 7th European LS-DYNA Conference, Salzburg, Austria, 2009.
[13]
S. Kögel. Data Mining Cup DMC. http://www.data-mining-cup.de, accessed 14.04.2011.
[14]
W. Konen. The TDM framework: Tuned data mining in R. CIOP Technical Report 01--11, Cologne University of Applied Sciences, Jan 2011.
[15]
W. Konen, P. Koch, O. Flasch, and T. Bartz-Beielstein. Parameter-tuned data mining: A general framework. In F. Hoffmann and E. Hüllermeier, editors, Proceedings 20. Workshop Computational Intelligence. Universitätsverlag Karlsruhe, 2010.
[16]
W. Konen, P. Koch, O. Flasch, T. Bartz-Beielstein, M. Friese, and B. Naujoks. Tuned data mining: A benchmark study on different tuners. CIOP Technical Report 02--11, Cologne University of Applied Sciences, Feb 2011.
[17]
W. Konen, T. Zimmer, and T. Bartz-Beielstein. Optimized modeling of fill levels in stormwater tanks using CI-based parameter selection schemes (in German). at-Automatisierungstechnik, 57(3):155--166, 2009.
[18]
A. Liaw and M. Wiener. Classification and regression by randomForest. R News, 2:18--22, 2002. http://CRAN.R-project.org/doc/Rnews/.
[19]
H. Liu and L. Yu. Toward integrating feature selection algorithms for classification and clustering. IEEE Transactions on Knowledge and Data Engineering, 17:491--502, 2005.
[20]
I. Mierswa. Rapid Miner. http://rapid-i.com, accessed 14.04.2011.
[21]
R. Mikut, O. Burmeister, M. Reischl, and T. Loose. Die MATLAB-Toolbox Gait-CAD. In R. Mikut and M. Reischl, editors, Proceedings 16. Workshop Computational Intelligence, pages 114--124. Universitätsverlag Karlsruhe, 2006.
[22]
V. Nannen and A. E. Eiben. Efficient relevance estimation and value calibration of evolutionary algorithm parameters. In IEEE Congress on Evolutionary Computation, pages 103--110, 2007.
[23]
B. Schölkopf and A. J. Smola. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press, Cambridge, MA, USA, December 2002.
[24]
S. K. Smit and A. E. Eiben. Comparing Parameter Tuning Methods for Evolutionary Algorithms. In IEEE Congress on Evolutionary Computation (CEC), pages 399--406, May 2009.
[25]
A. Stuhlsatz, J. Lippel, and T. Zielke. Feature extraction for simple classification. In Proc. Int. Conf. on Pattern Recognition (ICPR), Istanbul, Turkey, page 23, 2010.
[26]
E. Talbi. A taxonomy of hybrid metaheuristics. Journal of Heuristics, 8(5):541--564, 2002.
[27]
V. N. Vapnik. Statistical Learning Theory. Wiley-Interscience, September 1998.
[28]
C. Wolf, D. Gaida, A. Stuhlsatz, T. Ludwig, S. McLoone, and M. Bongards. Predicting organic acid concentration from UV/vis spectro measurements - a comparison of machine learning techniques. Trans. Inst. of Measurement and Control, 2011.
[29]
C. Wolf, D. Gaida, A. Stuhlsatz, S. McLoone, and M. Bongards. Organic acid prediction in biogas plants using UV/vis spectroscopic online-measurements. Life System Modeling and Intelligent Computing, 97:200--206, 2010.


Information

Published In

GECCO '11: Proceedings of the 13th Annual Conference on Genetic and Evolutionary Computation
July 2011
2140 pages
ISBN:9781450305570
DOI:10.1145/2001576
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States



Author Tags

  1. data mining
  2. parameter tuning
  3. sequential parameter optimization

Qualifiers

  • Research-article

Conference

GECCO '11

Acceptance Rates

Overall acceptance rate: 1,669 of 4,410 submissions (38%)

Article Metrics

  • Downloads (last 12 months): 3
  • Downloads (last 6 weeks): 0
Reflects downloads up to 14 Oct 2024

Cited By

  • (2023) Mind the Gap: Measuring Generalization Performance Across Multiple Objectives. Advances in Intelligent Data Analysis XXI, pages 130--142. DOI: 10.1007/978-3-031-30047-9_11. Online publication date: 1-Apr-2023
  • (2022) A Literature Survey on Offline Automatic Algorithm Configuration. Applied Sciences, 12(13):6316. DOI: 10.3390/app12136316. Online publication date: 21-Jun-2022
  • (2021) Fake It Till You Make It: Guidelines for Effective Synthetic Data Generation. Applied Sciences, 11(5):2158. DOI: 10.3390/app11052158. Online publication date: 28-Feb-2021
  • (2020) Tuning Hyper-Parameters of Machine Learning Methods for Improving the Detection of Hate Speech. Advances on Smart and Soft Computing, pages 71--78. DOI: 10.1007/978-981-15-6048-4_7. Online publication date: 20-Oct-2020
  • (2019) Effect of the Sampling of a Dataset in the Hyperparameter Optimization Phase over the Efficiency of a Machine Learning Algorithm. Complexity, 2019(1). DOI: 10.1155/2019/6278908. Online publication date: 4-Feb-2019
  • (2019) Hyperparameter Optimization. Automated Machine Learning, pages 3--33. DOI: 10.1007/978-3-030-05318-5_1. Online publication date: 18-May-2019
  • (2018) Autotune. Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pages 443--452. DOI: 10.1145/3219819.3219837. Online publication date: 19-Jul-2018
  • (2017) Multi-objective evolution of machine learning workflows. 2017 IEEE Symposium Series on Computational Intelligence (SSCI), pages 1--8. DOI: 10.1109/SSCI.2017.8285357. Online publication date: Nov-2017
  • (2017) Kernel Construction and Feature Subset Selection in Support Vector Machines. Simulated Evolution and Learning, pages 605--616. DOI: 10.1007/978-3-319-68759-9_49. Online publication date: 14-Oct-2017
  • (2012) Tuning and evolution of support vector kernels. Evolutionary Intelligence, 5(3):153--170. DOI: 10.1007/s12065-012-0073-8. Online publication date: 4-May-2012
