Abstract
This paper proposes FEAST, a novel feature subset selection algorithm based on association rule mining. FEAST first mines association rules from a data set. It then identifies relevant and interactive feature values using constraint association rules whose consequent is the target concept, and detects redundant feature values using constraint association rules whose antecedent and consequent are each a single feature value. Finally, it eliminates the redundant feature values and obtains the feature subset by mapping the remaining relevant feature values to their corresponding features. The efficiency and effectiveness of FEAST are tested on both synthetic and real-world data sets. Using three different types of classifiers (Naive Bayes, C4.5 and PART), its classification results are compared with those of four other representative feature subset selection algorithms (CFS, FCBF, INTERACT and the association-based FSBAR). The results on the synthetic data sets show that FEAST effectively identifies irrelevant and redundant features while retaining interactive ones. The results on the real-world data sets show that FEAST outperforms the other feature subset selection algorithms in terms of average classification accuracy and Win/Draw/Loss record.
This work is supported by the National Natural Science Foundation of China under grant 61070006.
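The pipeline outlined in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the brute-force rule enumeration, the function name `select_features`, the thresholds `min_sup` and `min_conf`, the cap `max_len` on antecedent length, the restriction to minimal rules, and the choice to drop the consequent value of a redundancy rule are all assumptions made for illustration, and the toy data set is invented.

```python
from itertools import combinations, product


def support(rows, items):
    """Fraction of rows that contain every (feature, value) pair in items."""
    return sum(all(r[f] == v for f, v in items) for r in rows) / len(rows)


def confidence(rows, antecedent, consequent):
    """Estimated P(consequent | antecedent); 0.0 if the antecedent never occurs."""
    s = support(rows, antecedent)
    return support(rows, tuple(antecedent) + tuple(consequent)) / s if s else 0.0


def select_features(rows, target, min_sup=0.2, min_conf=0.9, max_len=2):
    """Hypothetical sketch of rule-based feature subset selection."""
    features = [f for f in rows[0] if f != target]
    items = sorted({(f, r[f]) for r in rows for f in features})
    classes = sorted({(target, r[target]) for r in rows})

    # Step 1: rules "feature values -> class" mark relevant values; rules with
    # several values in the antecedent capture interacting features.
    relevant, found = set(), []
    for k in range(1, max_len + 1):
        for ante in combinations(items, k):
            if len({f for f, _ in ante}) < k:
                continue  # at most one value per feature in an antecedent
            for c in classes:
                # Assumption: keep only minimal rules, so a value that merely
                # rides along with an already sufficient antecedent is ignored.
                if any(c == c2 and s < set(ante) for s, c2 in found):
                    continue
                if (support(rows, ante + (c,)) >= min_sup
                        and confidence(rows, ante, (c,)) >= min_conf):
                    relevant.update(ante)
                    found.append((frozenset(ante), c))

    # Step 2: rules "single value -> single value" mark redundancy.
    # Assumption: the consequent value is the one that gets dropped.
    kept = set(relevant)
    for a in sorted(relevant):
        for b in sorted(relevant):
            if (a[0] != b[0] and a in kept and b in kept
                    and support(rows, (a, b)) >= min_sup
                    and confidence(rows, (a,), (b,)) >= min_conf):
                kept.discard(b)

    # Step 3: map the surviving feature values back to their features.
    return sorted({f for f, _ in kept})


# Invented toy data: y = x1 XOR x2 (interaction), x1_copy duplicates x1
# (redundant), and noise is independent of y (irrelevant).
rows = [
    {"x1": a, "x2": b, "x1_copy": a, "noise": n, "y": a ^ b}
    for a, b, n in product([0, 1], repeat=3)
]
print(select_features(rows, "y"))
```

On this toy data no single feature value predicts `y`, so only the two-value antecedents over `x1` and `x2` (or its copy) produce rules, which is the kind of interaction the abstract says FEAST is meant to retain. Enumerating all antecedents is exponential in `max_len`; a practical version would rely on a frequent-pattern miner instead.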
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Wang, G., Song, Q. (2012). Selecting Feature Subset via Constraint Association Rules. In: Tan, PN., Chawla, S., Ho, C.K., Bailey, J. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2012. Lecture Notes in Computer Science, vol 7302. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30220-6_26
DOI: https://doi.org/10.1007/978-3-642-30220-6_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-30219-0
Online ISBN: 978-3-642-30220-6
eBook Packages: Computer Science, Computer Science (R0)