
Supportive Utility of Irrelevant Features in Data Preprocessing

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2007)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4426)


Abstract

The learning performance of many classification algorithms degrades when irrelevant features are introduced. Feature selection is the process of choosing an optimal subset of features and removing irrelevant ones. However, many feature selection algorithms filter out attributes based solely on their irrelevance to the learning task, without considering the hidden supportive information they may provide to other attributes, that is, whether such attributes are truly irrelevant or potentially relevant. In the medical domain, a symptom is regarded as irrelevant only if it provides neither explicit nor supportive information for disease diagnosis, so traditional feature selection methods may be unsuitable for handling this critical problem. In this paper, we propose a new method that selects not only the relevant features but also the latently useful irrelevant attributes, by measuring their supportive importance to other attributes. The empirical results compare the performance of various classification algorithms on twelve real-life datasets from the UCI repository.
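The abstract does not specify how "supportive importance" is measured. As a hedged illustration only, one information-theoretic reading (an assumption, not necessarily the authors' measure) treats a feature X as supportive of another feature Y when conditioning on X raises Y's mutual information with the class C, i.e. I(Y;C|X) > I(Y;C). The classic XOR case shows why this matters: if C = x1 XOR x2, each of x1 and x2 carries zero information about C on its own, yet together they determine it exactly. The sketch below assumes already-discrete features; the names `entropy`, `mutual_info`, `cond_mutual_info`, `select_features`, and the threshold `eps` are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(*columns):
    """Joint Shannon entropy, in bits, of one or more discrete columns."""
    rows = list(zip(*columns))
    n = len(rows)
    return -sum((k / n) * log2(k / n) for k in Counter(rows).values())


def mutual_info(x, c):
    """I(X;C) = H(X) + H(C) - H(X,C)."""
    return entropy(x) + entropy(c) - entropy(x, c)


def cond_mutual_info(y, c, x):
    """I(Y;C|X) = H(Y,X) + H(C,X) - H(Y,C,X) - H(X)."""
    return entropy(y, x) + entropy(c, x) - entropy(y, c, x) - entropy(x)


def select_features(features, target, eps=1e-9):
    """Hypothetical rule (not the paper's): keep features relevant to the
    class, plus individually irrelevant features that raise another
    feature's information about the class, I(Y;C|X) > I(Y;C)."""
    relevant = {name for name, col in features.items()
                if mutual_info(col, target) > eps}
    supportive = set()
    for name, col in features.items():
        if name in relevant:
            continue
        for other, other_col in features.items():
            if other == name:
                continue
            # Information gain about the class that `other` obtains once `name` is known.
            gain = cond_mutual_info(other_col, target, col) - mutual_info(other_col, target)
            if gain > eps:
                supportive.add(name)
                break
    return relevant, supportive


# Toy data: the class is x1 XOR x2; "copy" duplicates the class; "constant" never varies.
features = {
    "x1": [0, 0, 1, 1],
    "x2": [0, 1, 0, 1],
    "copy": [0, 1, 1, 0],
    "constant": [0, 0, 0, 0],
}
target = [0, 1, 1, 0]

relevant, supportive = select_features(features, target)
print(sorted(relevant), sorted(supportive))  # ['copy'] ['x1', 'x2']
```

A plain relevance filter would discard x1 and x2 here, because each has zero mutual information with the class; the supportive check keeps them while still rejecting the constant feature. The pairwise loop is quadratic in the number of features, and a tiny `eps` only suits this noise-free toy; on real data, continuous attributes would need discretization and a more robust threshold.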



Editor information

Zhi-Hua Zhou, Hang Li, Qiang Yang


Copyright information

© 2007 Springer Berlin Heidelberg

About this paper

Cite this paper

Chao, S., Li, Y., Dong, M. (2007). Supportive Utility of Irrelevant Features in Data Preprocessing. In: Zhou, ZH., Li, H., Yang, Q. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2007. Lecture Notes in Computer Science, vol 4426. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71701-0_42


  • DOI: https://doi.org/10.1007/978-3-540-71701-0_42

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-71700-3

  • Online ISBN: 978-3-540-71701-0

  • eBook Packages: Computer Science (R0)
