Abstract
The k-nearest neighbors classifier is a widely used classification method that has proven to be very effective in supervised learning tasks. In this paper, a fuzzy rough set method for prototype selection, focused on optimizing the behavior of this classifier, is presented. To further improve its performance, the method is hybridized with an evolutionary feature selection method, yielding a competent data reduction algorithm for the 1-nearest neighbor classifier. The hybridization is performed in the training phase: within a cycle, the solution of each preprocessing technique is used as the starting condition of the other. The results of the experimental study, validated with nonparametric statistical tests, show that the new hybrid approach obtains very promising results with respect to both classification accuracy and reduction of the training set size.
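The cycle described in the abstract, in which each preprocessing step starts from the other's latest solution, can be illustrated with a minimal sketch. It is not the method of the paper: the fuzzy rough prototype selector is replaced here by a simple Wilson-style edit with k = 1, and the evolutionary feature selector by a greedy backward elimination. All function names, the toy data, and the number of cycles are assumptions made for illustration only.

```python
def dist(a, b, mask):
    """Squared Euclidean distance over the features enabled in mask."""
    return sum((x - y) ** 2 for x, y, m in zip(a, b, mask) if m)


def loo_accuracy(data, rows, mask):
    """Leave-one-out 1-NN accuracy on all of data, using rows as prototypes."""
    hits = 0
    for i, (x, y) in enumerate(data):
        candidates = [j for j in rows if j != i]
        if not candidates:
            continue
        nearest = min(candidates, key=lambda j: dist(x, data[j][0], mask))
        hits += data[nearest][1] == y
    return hits / len(data)


def select_prototypes(data, mask):
    """Stand-in prototype selection: keep instances whose nearest neighbor
    (under the current feature mask) carries the same label."""
    all_rows = list(range(len(data)))
    keep = []
    for i, (x, y) in enumerate(data):
        others = [j for j in all_rows if j != i]
        nearest = min(others, key=lambda j: dist(x, data[j][0], mask))
        if data[nearest][1] == y:
            keep.append(i)
    return keep or all_rows  # never return an empty prototype set


def select_features(data, rows, mask):
    """Stand-in feature selection: drop a feature only when doing so
    strictly improves 1-NN accuracy with the current prototypes."""
    mask = list(mask)
    best = loo_accuracy(data, rows, mask)
    for f in range(len(mask)):
        if not mask[f] or sum(mask) == 1:
            continue
        trial = mask[:f] + [False] + mask[f + 1:]
        score = loo_accuracy(data, rows, trial)
        if score > best:
            mask, best = trial, score
    return mask


def hybrid_reduce(data, cycles=2):
    """Alternate both steps; each one starts from the other's latest result."""
    rows = list(range(len(data)))
    mask = [True] * len(data[0][0])
    for _ in range(cycles):
        mask = select_features(data, rows, mask)
        rows = select_prototypes(data, mask)
    return rows, mask


# Toy data: feature 0 separates the classes, feature 1 is noise,
# and the last instance is deliberately mislabeled.
toy = [
    ([0.0, 5.0], 0), ([0.1, 1.0], 0), ([0.2, 9.0], 0),
    ([1.0, 5.2], 1), ([1.1, 0.9], 1), ([1.2, 9.1], 1),
    ([0.45, 3.0], 1),
]
rows, mask = hybrid_reduce(toy)
print(rows, mask)
```

On this toy set the feature step discards the noisy second feature first, and only then can the prototype step isolate and remove the mislabeled instance, which is the kind of mutual benefit the hybridization aims for.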
Notes
The experiments were carried out on a machine with a dual-core 3.20 GHz processor and 2 GB of RAM, running the Fedora 4 operating system.
Cite this article
Derrac, J., Verbiest, N., García, S. et al. On the use of evolutionary feature selection for improving fuzzy rough set based prototype selection. Soft Comput 17, 223–238 (2013). https://doi.org/10.1007/s00500-012-0888-3
DOI: https://doi.org/10.1007/s00500-012-0888-3