DOI:10.1145/956750.956840
Article

Efficiently handling feature redundancy in high-dimensional data

Published: 24 August 2003

Abstract

High-dimensional data poses a severe challenge for data mining. Feature selection is a frequently used technique in pre-processing high-dimensional data for successful data mining. Traditionally, feature selection is focused on removing irrelevant features. However, for high-dimensional data, removing redundant features is equally critical. In this paper, we provide a study of feature redundancy in high-dimensional data and propose a novel correlation-based approach to feature selection within the filter model. The extensive empirical study using real-world data shows that the proposed approach is efficient and effective in removing redundant and irrelevant features.
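
The abstract does not say which correlation measure or search procedure the paper uses, so the following is only a minimal sketch of the general idea of a correlation-based filter that removes both irrelevant and redundant features. It assumes absolute Pearson correlation as the measure, a hypothetical relevance threshold `relevance_min`, and a greedy rule under which a feature is treated as redundant when an already-kept feature correlates with it at least as strongly as it correlates with the class. The function names and the toy data are illustrative and are not taken from the paper.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def select_features(columns, target, relevance_min=0.1):
    """Greedy filter: drop irrelevant features, then drop redundant ones.

    columns: dict mapping feature name -> list of numeric values.
    target: list of numeric class labels.
    relevance_min: assumed cutoff below which a feature counts as irrelevant.
    """
    # Relevance: absolute correlation between each feature and the class.
    relevance = {name: abs(pearson(col, target)) for name, col in columns.items()}

    # Keep only relevant features, most relevant first.
    ranked = sorted(
        (name for name in columns if relevance[name] >= relevance_min),
        key=lambda name: relevance[name],
        reverse=True,
    )

    selected = []
    for name in ranked:
        # Assumed redundancy rule: a kept feature explains this one at least
        # as well as this one explains the class.
        redundant = any(
            abs(pearson(columns[name], columns[kept])) >= relevance[name]
            for kept in selected
        )
        if not redundant:
            selected.append(name)
    return selected


target = [0, 0, 0, 1, 1, 1]
columns = {
    "a": [1, 2, 3, 8, 9, 10],        # strongly relevant
    "a_noisy": [2, 1, 4, 7, 10, 9],  # relevant, but largely duplicates "a"
    "c": [0, 1, 0, 1, 0, 1],         # weakly relevant, not covered by "a"
    "noise": [2, 5, 3, 4, 1, 5],     # uncorrelated with the class
}
print(select_features(columns, target))  # ['a', 'c']
```

In this toy run `noise` is dropped as irrelevant and `a_noisy` is dropped as redundant with `a`. Because the procedure only scores features against the class and against each other, without training a classifier, it stays within the filter model mentioned in the abstract.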

Cited By

  • (2023) Multimodal Fusion Interactions: A Study of Human and Automatic Quantification. Proceedings of the 25th International Conference on Multimodal Interaction, pp. 425-435. DOI:10.1145/3577190.3614151. Online publication date: 9-Oct-2023
  • (2023) Intrusion Detection System based on GRU. 2023 7th International Conference on Electrical, Mechanical and Computer Engineering (ICEMCE), pp. 889-893. DOI:10.1109/ICEMCE60359.2023.10490953. Online publication date: 20-Oct-2023
  • (2023) Feature ranking chi-square method to improve the epileptic seizure prediction by employing machine learning algorithms. Waves in Random and Complex Media, pp. 1-27. DOI:10.1080/17455030.2023.2226246. Online publication date: 6-Jul-2023

Published In

KDD '03: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
August 2003
736 pages
ISBN:1581137370
DOI:10.1145/956750
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. feature selection
  2. high-dimensional data
  3. redundancy

Qualifiers

  • Article

Conference

KDD03

Acceptance Rates

KDD '03 Paper Acceptance Rate 46 of 298 submissions, 15%;
Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 12
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 30 Aug 2024

Citations

Cited By

  • (2023) IoT data analytics in dynamic environments. Engineering Applications of Artificial Intelligence 116:C. DOI:10.1016/j.engappai.2022.105366. Online publication date: 20-Jan-2023
  • (2023) TL-CNN-IDS: transfer learning-based intrusion detection system using convolutional neural network. The Journal of Supercomputing 79:15, pp. 17562-17584. DOI:10.1007/s11227-023-05347-4. Online publication date: 8-May-2023
  • (2022) Machine Learning Based Multimodal Neuroimaging Genomics Dementia Score for Predicting Future Conversion to Alzheimer’s Disease. Journal of Alzheimer's Disease 87:3, pp. 1345-1365. DOI:10.3233/JAD-220021. Online publication date: 31-May-2022
  • (2022) NeuChain. Proceedings of the VLDB Endowment 15:11, pp. 2585-2598. DOI:10.14778/3551793.3551816. Online publication date: 29-Sep-2022
  • (2022) DQDF. Proceedings of the VLDB Endowment 15:4, pp. 949-957. DOI:10.14778/3503585.3503602. Online publication date: 14-Apr-2022
  • (2022) MTH-IDS: A Multitiered Hybrid Intrusion Detection System for Internet of Vehicles. IEEE Internet of Things Journal 9:1, pp. 616-632. DOI:10.1109/JIOT.2021.3084796. Online publication date: 1-Jan-2022
  • (2022) COVID-19 lung infection detection using deep learning with transfer learning and ResNet101 features extraction and selection. Waves in Random and Complex Media, pp. 1-24. DOI:10.1080/17455030.2022.2091807. Online publication date: 28-Jun-2022
