
Selecting Feature Subset via Constraint Association Rules

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2012)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 7302)

Abstract

In this paper, a novel feature selection algorithm, FEAST, is proposed based on association rule mining. The algorithm first mines association rules from a data set. It then identifies relevant and interactive feature values using the constraint association rules whose consequent is the target concept, and detects redundant feature values using the constraint association rules whose consequent and antecedent are both single feature values. Finally, it eliminates the redundant feature values and obtains the feature subset by mapping the relevant feature values to their corresponding features. The efficiency and effectiveness of FEAST were tested on both synthetic and real-world data sets, and the classification results of three different types of classifiers (Naive Bayes, C4.5, and PART) were compared against those obtained with four representative feature subset selection algorithms (CFS, FCBF, INTERACT, and the association-based FSBAR). The results on the synthetic data sets show that FEAST can effectively identify irrelevant and redundant features while preserving interactive ones. The results on the real-world data sets show that FEAST outperforms the other feature subset selection algorithms in terms of average classification accuracy and Win/Draw/Loss records.

This work is supported by the National Natural Science Foundation of China under grant 61070006.
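The selection steps described in the abstract can be sketched in code. The following is a minimal illustration only, not the paper's implementation: it assumes the association rules have already been mined, represents each rule as an (antecedent, consequent) pair of (feature, value) items, and uses a hypothetical target-concept name `"class"`; the paper's actual rule representation and thresholds may differ.

```python
TARGET = "class"  # illustrative name for the target concept

def feast_select(rules):
    """Sketch of FEAST's selection steps over pre-mined association rules.

    rules: list of (antecedent, consequent), where antecedent is an
    iterable of (feature, value) items and consequent is a single
    (feature, value) item.
    """
    # Step 1: relevant (and interactive) feature values appear in the
    # antecedents of rules whose consequent is the target concept.
    relevant_values = set()
    for antecedent, consequent in rules:
        if consequent[0] == TARGET:
            relevant_values.update(antecedent)

    # Step 2: redundancy is detected from rules whose antecedent and
    # consequent are both a single non-target feature value: if a
    # relevant value implies another value, the implied value is
    # redundant given the first.
    redundant_values = set()
    for antecedent, consequent in rules:
        items = list(antecedent)
        if (len(items) == 1 and consequent[0] != TARGET
                and items[0] in relevant_values
                and items[0] not in redundant_values):
            redundant_values.add(consequent)

    # Step 3: eliminate redundant values, then map the surviving
    # feature values back to their features.
    kept = relevant_values - redundant_values
    return {feature for feature, _ in kept}

# Usage with a toy, hand-written rule set:
rules = [
    ({("outlook", "sunny"), ("windy", "true")}, ("class", "no")),
    ({("humidity", "high")}, ("class", "no")),
    ({("humidity", "high")}, ("temp", "hot")),  # temp=hot redundant
]
print(feast_select(rules))  # {"outlook", "windy", "humidity"}
```

Here `temp` is dropped because its only appearance is as the consequent of a single-value rule implied by the relevant value `humidity=high`, while the interactive pair `outlook`/`windy` survives because it occurs jointly in a rule predicting the target.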



References

  1. Asuncion, A., Newman, D.J.: UCI Machine Learning Repository (2007), http://archive.ics.uci.edu/ml/

  2. Chen, G., Liu, H., Yu, L., Wei, Q., Zhang, X.: A new approach to classification based on association rule mining. Decision Support Systems 42(2), 674–689 (2006)

  3. Dash, M., Liu, H.: Feature selection for classification. Intelligent Data Analysis 1(3), 131–156 (1997)

  4. Dash, M., Liu, H.: Consistency-based search in feature selection. Artificial Intelligence 151(1–2), 155–176 (2003)

  5. Fleuret, F.: Fast binary feature selection with conditional mutual information. Journal of Machine Learning Research 5, 1531–1555 (2004)

  6. Frank, E., Witten, I.H.: Generating accurate rule sets without global optimization. In: Proceedings of the 15th International Conference on Machine Learning, pp. 144–151. Morgan Kaufmann (1998)

  7. Gheyas, I.A., Smith, L.S.: Feature subset selection in large dimensionality domains. Pattern Recognition 43(1), 5–13 (2010)

  8. Guyon, I., Elisseeff, A.: An introduction to variable and feature selection. Journal of Machine Learning Research 3, 1157–1182 (2003)

  9. Hall, M.A.: Correlation-based feature selection for discrete and numeric class machine learning. In: Proceedings of the 17th International Conference on Machine Learning, pp. 359–366. Morgan Kaufmann (2000)

  10. Han, J.: CPAR: Classification based on predictive association rules. In: Proceedings of the 3rd SIAM International Conference on Data Mining, vol. 3, pp. 331–335. Society for Industrial and Applied Mathematics (2003)

  11. Han, J., Pei, J., Yin, Y., Mao, R.: Mining frequent patterns without candidate generation: A frequent-pattern tree approach. Data Mining and Knowledge Discovery 8(1), 53–87 (2004)

  12. Jakulin, A., Bratko, I.: Testing the significance of attribute interactions. In: Proceedings of the 21st International Conference on Machine Learning, pp. 409–416. ACM (2004)

  13. John, G.H., Kohavi, R., Pfleger, K.: Irrelevant features and the subset selection problem. In: Proceedings of the 11th International Conference on Machine Learning, pp. 121–129 (1994)

  14. John, G.H., Langley, P.: Estimating continuous distributions in Bayesian classifiers. In: Proceedings of the 11th Conference on Uncertainty in Artificial Intelligence, vol. 1, pp. 338–345 (1995)

  15. Klemettinen, M., Mannila, H., Ronkainen, P., Toivonen, H., Verkamo, A.I.: Finding interesting rules from large sets of discovered association rules. In: Proceedings of the 3rd International Conference on Information and Knowledge Management, pp. 401–407. ACM (1994)

  16. Kohavi, R., John, G.H.: Wrappers for feature subset selection. Artificial Intelligence 97(1–2), 273–324 (1997)

  17. Koller, D., Sahami, M.: Toward optimal feature selection. In: Proceedings of the International Conference on Machine Learning, pp. 284–292 (1996)

  18. Kononenko, I.: Estimating attributes: Analysis and extensions of RELIEF. In: Bergadano, F., De Raedt, L. (eds.) ECML 1994. LNCS, vol. 784, pp. 171–182. Springer, Heidelberg (1994)

  19. Li, W., Han, J., Pei, J.: CMAR: Accurate and efficient classification based on multiple class-association rules. In: Proceedings of the IEEE International Conference on Data Mining, pp. 369–376. IEEE Computer Society (2001)

  20. Liu, H., Setiono, R.: A probabilistic approach to feature selection: a filter solution. In: Proceedings of the 13th International Conference on Machine Learning. Morgan Kaufmann (1996)

  21. Park, H., Kwon, H.C.: Extended Relief algorithms in instance-based feature filtering. In: Proceedings of the 6th International Conference on Advanced Language Processing and Web Information Technology (ALPIT 2007), pp. 123–128. IEEE Computer Society (2007)

  22. Quinlan, J.R.: C4.5: Programs for Machine Learning. Morgan Kaufmann, San Francisco (1993)

  23. Scanlon, P., Potamianos, G., Libal, V., Chu, S.M.: Mutual information based visual feature selection for lipreading. In: Proceedings of the 8th International Conference on Spoken Language Processing, pp. 857–860 (2004)

  24. Tan, P.N., Steinbach, M., Kumar, V.: Introduction to Data Mining. Pearson Addison Wesley, Boston (2006)

  25. Witten, I.H., Frank, E.: Data Mining: Practical Machine Learning Tools and Techniques. Morgan Kaufmann (2005)

  26. Xie, J., Wu, J., Qian, Q.: Feature selection algorithm based on association rules mining method. In: Proceedings of the 8th IEEE/ACIS International Conference on Computer and Information Science, pp. 357–362. IEEE (2009)

  27. Yu, L., Liu, H.: Feature selection for high-dimensional data: A fast correlation-based filter solution. In: Proceedings of the 20th International Conference on Machine Learning, vol. 20, pp. 856–863 (2003)

  28. Yu, L., Liu, H.: Efficient feature selection via analysis of relevance and redundancy. Journal of Machine Learning Research 5, 1205–1224 (2004)

  29. Zhao, Z., Liu, H.: Searching for interacting features in subset selection. Intelligent Data Analysis 13(2), 207–228 (2009)


Copyright information

© 2012 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Wang, G., Song, Q. (2012). Selecting Feature Subset via Constraint Association Rules. In: Tan, P.N., Chawla, S., Ho, C.K., Bailey, J. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2012. Lecture Notes in Computer Science, vol 7302. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30220-6_26

  • DOI: https://doi.org/10.1007/978-3-642-30220-6_26

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-30219-0

  • Online ISBN: 978-3-642-30220-6

  • eBook Packages: Computer Science, Computer Science (R0)
