Abstract
One of the most important challenges in supervised learning is how to evaluate the quality of the models evolved by different machine learning techniques. Up to now, we have relied on measures obtained by running the methods on a wide test bed composed of real-world problems. Nevertheless, the unknown inherent characteristics of these problems and the bias of learners may lead to inconclusive results. This paper discusses the need to work under a controlled scenario and advocates artificial data set generation. It provides a list of ingredients and some ideas on how to guide such generation, and presents promising results from an evolutionary multi-objective approach that incorporates data complexity estimates.
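As a rough illustration of what a data complexity estimate can look like, the sketch below computes the maximum Fisher's discriminant ratio over features for a two-class problem, a measure commonly used in the data complexity literature. It is not the paper's method: the generator `make_two_class`, its `separation` parameter, and the sample sizes are hypothetical choices made only to show that the estimate moves as the artificial data set is made easier or harder.

```python
import random
from statistics import mean, pvariance


def fisher_ratio(class_a, class_b):
    """Maximum Fisher's discriminant ratio over all features (two classes).

    For each feature: (mean_a - mean_b)^2 / (var_a + var_b).
    Larger values suggest the classes are easier to separate on some feature.
    """
    ratios = []
    # zip(*rows) turns a list of samples into a sequence of feature columns.
    for feat_a, feat_b in zip(zip(*class_a), zip(*class_b)):
        gap = (mean(feat_a) - mean(feat_b)) ** 2
        spread = pvariance(feat_a) + pvariance(feat_b)
        if spread == 0:
            # Degenerate feature: constant within both classes.
            ratios.append(float("inf") if gap > 0 else 0.0)
        else:
            ratios.append(gap / spread)
    return max(ratios)


def make_two_class(n, separation, rng):
    """Hypothetical generator: two 2-D Gaussian blobs shifted along feature 0."""
    a = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]
    b = [(rng.gauss(separation, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]
    return a, b


rng = random.Random(0)  # fixed seed so the run is repeatable
easy = fisher_ratio(*make_two_class(500, 4.0, rng))
hard = fisher_ratio(*make_two_class(500, 0.5, rng))
print(f"well separated: {easy:.2f}, overlapping: {hard:.2f}")
```

A generator guided by such an estimate could, in principle, search for data sets whose measured complexity matches a target value; here the separation is simply set by hand.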
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Macià, N., Orriols-Puig, A., Bernadó-Mansilla, E. (2009). Beyond Homemade Artificial Data Sets. In: Corchado, E., Wu, X., Oja, E., Herrero, Á., Baruque, B. (eds) Hybrid Artificial Intelligence Systems. HAIS 2009. Lecture Notes in Computer Science, vol 5572. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-02319-4_73
DOI: https://doi.org/10.1007/978-3-642-02319-4_73
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-02318-7
Online ISBN: 978-3-642-02319-4
eBook Packages: Computer Science (R0)