Selecting Feature Subset via Constraint Association Rules

Wang, Guangtao; Song, Qinbao

doi:10.1007/978-3-642-30220-6_26

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7302)

Included in the following conference series:

Pacific-Asia Conference on Knowledge Discovery and Data Mining

Abstract

This paper proposes FEAST, a novel feature selection algorithm based on association rule mining. The algorithm first mines association rules from a data set. It then identifies relevant and interactive feature values using the constraint association rules whose consequent is the target concept, and detects redundant feature values using the constraint association rules whose antecedent and consequent are each a single feature value. Finally, it eliminates the redundant feature values and obtains the feature subset by mapping the remaining relevant feature values to their corresponding features. The efficiency and effectiveness of FEAST were tested on both synthetic and real-world data sets. Using three different types of classifiers (Naive Bayes, C4.5, and PART), its classification results were compared with those of four other representative feature subset selection algorithms (CFS, FCBF, INTERACT, and the association-rule-based FSBAR). The results on the synthetic data sets show that FEAST effectively identifies irrelevant and redundant features while retaining interactive ones. The results on the real-world data sets show that FEAST outperformed the other feature subset selection algorithms in terms of average classification accuracy and Win/Draw/Loss record.
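The pipeline described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the mining procedure or thresholds, so the brute-force rule enumeration, the `min_sup`, `min_conf`, and `max_len` parameters, the pruning of rules that merely extend a shorter rule, the two-directional redundancy test, and the names `support` and `feast_sketch` are all assumptions made for illustration.

```python
from itertools import combinations, product

def support(rows, items):
    """Fraction of rows containing every (column, value) pair in items."""
    return sum(all(r[c] == v for c, v in items) for r in rows) / len(rows)

def feast_sketch(rows, target, min_sup=0.2, min_conf=0.9, max_len=2):
    """Hypothetical illustration of rule-based feature selection."""
    features = [c for c in rows[0] if c != target]
    items = sorted({(f, r[f]) for r in rows for f in features})
    classes = sorted({r[target] for r in rows})

    # Step 1: mine rules "antecedent -> class value"; the feature values in
    # their antecedents are treated as relevant (or interactive if len > 1).
    rules, relevant = set(), set()
    for k in range(1, max_len + 1):
        for ante in combinations(items, k):
            if len({f for f, _ in ante}) < k:  # at most one value per feature
                continue
            s_ante = support(rows, ante)
            if s_ante == 0:
                continue
            for c in classes:
                # Assumption: skip rules that merely extend a shorter rule.
                if any((frozenset(sub), c) in rules
                       for j in range(1, k) for sub in combinations(ante, j)):
                    continue
                s_rule = support(rows, ante + ((target, c),))
                if s_rule >= min_sup and s_rule / s_ante >= min_conf:
                    rules.add((frozenset(ante), c))
                    relevant.update(ante)

    # Step 2: single-value rules a -> b and b -> a (assumed test) mark b
    # as redundant with respect to a.
    redundant = set()
    ordered = sorted(relevant)
    for i, a in enumerate(ordered):
        if a in redundant:
            continue
        for b in ordered[i + 1:]:
            if b[0] == a[0] or b in redundant:
                continue
            s_ab = support(rows, (a, b))
            if (s_ab >= min_sup
                    and s_ab / support(rows, (a,)) >= min_conf
                    and s_ab / support(rows, (b,)) >= min_conf):
                redundant.add(b)

    # Step 3: map the surviving feature values back to their features.
    return sorted({f for f, _ in relevant - redundant})

# Toy data: cls = A xor B (interaction), C copies A (redundant), D is noise.
rows = [{"A": a, "B": b, "C": a, "D": d, "cls": a ^ b}
        for a, b, d in product([0, 1], repeat=3)]
selected = feast_sketch(rows, "cls")
print(selected)
```

On this toy data the sketch keeps the interacting features A and B, drops C as a duplicate of A, and never admits the noise feature D, because no rule involving D reaches the confidence threshold. A practical version would likely replace the exhaustive enumeration with a frequent-pattern miner.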

This work is supported by the National Natural Science Foundation of China under grant 61070006.

Author information

Authors and Affiliations

Dept. of Computer Science and Technology, Xi’an Jiaotong University, China
Guangtao Wang & Qinbao Song

Editor information

Editors and Affiliations

Department of Computer Science and Engineering, Michigan State University, 428 S. Shaw Lane, 48824-1226, East Lansing, MI, USA
Pang-Ning Tan
School of Information Technologies, University of Sydney, 1 Cleveland St., 2006, Sydney, NSW, Australia
Sanjay Chawla
Faculty of Computing and Informatics, Jalan Multimedia, Multimedia University, 63100, Cyberjaya, Selangor, Malaysia
Chin Kuan Ho
Department of Computing and Information Systems, The University of Melbourne, 111 Barry Street, 3053, Melbourne, VIC, Australia
James Bailey

Cite this paper

Wang, G., Song, Q. (2012). Selecting Feature Subset via Constraint Association Rules. In: Tan, PN., Chawla, S., Ho, C.K., Bailey, J. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2012. Lecture Notes in Computer Science, vol 7302. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30220-6_26

DOI: https://doi.org/10.1007/978-3-642-30220-6_26
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-30219-0
Online ISBN: 978-3-642-30220-6
eBook Packages: Computer Science, Computer Science (R0)
