DOI: 10.1145/2001576.2001759
research-article

Predicting problem difficulty for genetic programming applied to data classification

Published: 12 July 2011

Abstract

During the development of applied systems, an important problem that must be addressed is that of choosing the correct tools for a given domain or scenario. This general task has been addressed by the genetic programming (GP) community by attempting to determine the intrinsic difficulty that a problem poses for a GP search. This paper presents an approach to predict the performance of GP applied to data classification, one of the most common problems in computer science. The novelty of the proposal is to extract statistical descriptors and complexity descriptors of the problem data, and from these estimate the expected performance of a GP classifier. We derive two types of predictive models: linear regression models and symbolic regression models evolved with GP. The experimental results show that both approaches provide good estimates of classifier performance, using synthetic and real-world problems for validation. In conclusion, this paper shows that it is possible to accurately predict the expected performance of a GP classifier using a set of descriptors that characterize the problem data.
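
The pipeline outlined in the abstract (describe each problem with a few numbers, then regress observed performance on those numbers) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the two descriptors (a Fisher discriminant ratio and a nearest-neighbour disagreement rate), the synthetic Gaussian problems, and in particular the target value, which is the training error of a simple nearest-centroid rule standing in for the error of a GP classifier. All function names are hypothetical.

```python
import numpy as np


def make_problem(rng, separation, n=200, dim=2):
    """Synthetic two-class problem: unit Gaussians whose means differ along feature 0."""
    y = np.repeat([0, 1], n // 2)
    X = rng.normal(size=(n, dim))
    X[y == 1, 0] += separation
    return X, y


def fisher_ratio(X, y):
    """Statistical descriptor (assumed): largest per-feature Fisher discriminant ratio."""
    a, b = X[y == 0], X[y == 1]
    num = (a.mean(axis=0) - b.mean(axis=0)) ** 2
    den = a.var(axis=0) + b.var(axis=0) + 1e-12  # guard against zero variance
    return float(np.max(num / den))


def nn_disagreement(X, y):
    """Complexity descriptor (assumed): share of points whose nearest neighbour has the other label."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)  # a point is not its own neighbour
    return float(np.mean(y[d.argmin(axis=1)] != y))


def descriptors(X, y):
    """Descriptor vector for one problem (illustrative choice, not the paper's set)."""
    return np.array([fisher_ratio(X, y), nn_disagreement(X, y)])


def centroid_error(X, y):
    """Placeholder target: training error of a nearest-centroid rule (NOT a GP classifier)."""
    c0, c1 = X[y == 0].mean(axis=0), X[y == 1].mean(axis=0)
    pred = np.linalg.norm(X - c1, axis=1) < np.linalg.norm(X - c0, axis=1)
    return float(np.mean(pred.astype(int) != y))


def fit_linear(D, t):
    """Ordinary least squares with an intercept column appended to the descriptors."""
    A = np.column_stack([D, np.ones(len(D))])
    w, *_ = np.linalg.lstsq(A, t, rcond=None)
    return w


def predict(w, D):
    """Predicted error for each descriptor row in D."""
    return np.column_stack([D, np.ones(len(D))]) @ w


# Build a family of problems of varying difficulty, describe each one, and fit the model.
rng = np.random.default_rng(0)
problems = [make_problem(rng, s) for s in np.linspace(0.0, 4.0, 30)]
D = np.array([descriptors(X, y) for X, y in problems])
t = np.array([centroid_error(X, y) for X, y in problems])
w = fit_linear(D, t)
print("mean squared error of the fit:", np.mean((predict(w, D) - t) ** 2))
```

In a real study the target vector `t` would hold measured GP classifier performance on each problem, and the fit would be evaluated on held-out problems rather than on the problems used for fitting.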



      Published In

      GECCO '11: Proceedings of the 13th annual conference on Genetic and evolutionary computation
      July 2011
      2140 pages
      ISBN:9781450305570
      DOI:10.1145/2001576

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. classification
      2. genetic programming
      3. performance prediction

      Qualifiers

      • Research-article

      Conference

      GECCO '11

      Acceptance Rates

      Overall Acceptance Rate 1,669 of 4,410 submissions, 38%

      Cited By

      • (2023) Models to classify the difficulty of genetic algorithms to solve continuous optimization problems. Natural Computing 23:2 (431-451). DOI: 10.1007/s11047-022-09936-9. Online publication date: 12-Jan-2023
      • (2019) How Complex Is Your Classification Problem? ACM Computing Surveys 52:5 (1-34). DOI: 10.1145/3347711. Online publication date: 13-Sep-2019
      • (2016) Prediction of expected performance for a genetic programming classifier. Genetic Programming and Evolvable Machines 17:4 (409-449). DOI: 10.1007/s10710-016-9265-9. Online publication date: 1-Dec-2016
      • (2014) Performance Classification of Genetic Algorithms on Continuous Optimization Problems. Nature-Inspired Computation and Machine Learning (1-12). DOI: 10.1007/978-3-319-13650-9_1. Online publication date: 2014
      • (2013) Searching for novel clustering programs. Proceedings of the 15th annual conference on Genetic and evolutionary computation (1093-1100). DOI: 10.1145/2463372.2463505. Online publication date: 6-Jul-2013
      • (2013) Identification of epilepsy stages from ECoG using genetic programming classifiers. Computers in Biology and Medicine 43:11 (1713-1723). DOI: 10.1016/j.compbiomed.2013.08.016. Online publication date: 1-Nov-2013
      • (2013) Searching for novel classifiers. Proceedings of the 16th European conference on Genetic Programming (145-156). DOI: 10.1007/978-3-642-37207-0_13. Online publication date: 3-Apr-2013
      • (2013) Analysis and Classification of Epilepsy Stages with Genetic Programming. EVOLVE - A Bridge between Probability, Set Oriented Numerics, and Evolutionary Computation II (57-70). DOI: 10.1007/978-3-642-31519-0_4. Online publication date: 2013
      • (2013) Locality in Continuous Fitness-Valued Cases and Genetic Programming Difficulty. EVOLVE - A Bridge between Probability, Set Oriented Numerics, and Evolutionary Computation II (41-56). DOI: 10.1007/978-3-642-31519-0_3. Online publication date: 2013
      • (2013) Preliminary Study of Bloat in Genetic Programming with Behavior-Based Search. EVOLVE - A Bridge between Probability, Set Oriented Numerics, and Evolutionary Computation IV (293-305). DOI: 10.1007/978-3-319-01128-8_19. Online publication date: 2013
