DOI: 10.1145/2339530.2339544
research-article

Mining emerging patterns by streaming feature selection

Published: 12 August 2012

Abstract

Building an accurate emerging pattern classifier from a high-dimensional dataset is challenging, and the problem becomes harder still when the whole feature space is unavailable before learning starts. This paper presents a new technique for mining emerging patterns using streaming feature selection. We model high feature dimensionality with streaming features: features arrive and are processed one at a time. As each feature arrives, we evaluate it online to determine whether it is useful for mining predictive emerging patterns (EPs), exploiting the relationship between feature relevance and EP discriminability (the predictive ability of an EP). This relationship guides an online EP mining process. The approach can mine EPs from a high-dimensional dataset even when its entire feature set is unavailable before learning. Experiments on a broad range of datasets validate the effectiveness of the proposed approach against other well-established methods in terms of predictive accuracy, number of patterns, and running time.
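
The abstract does not give the relevance test or the mining procedure, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes discrete features, uses empirical mutual information as a stand-in relevance measure, and mines only single-item patterns ranked by growth rate (support in the target class divided by support in the remaining classes). The function names and the thresholds `min_relevance` and `min_growth` are hypothetical.

```python
from collections import Counter
from math import inf, log2


def mutual_information(column, labels):
    """Empirical mutual information (bits) between a discrete feature and the labels."""
    n = len(labels)
    joint = Counter(zip(column, labels))
    px, py = Counter(column), Counter(labels)
    return sum(c / n * log2(c * n / (px[x] * py[y])) for (x, y), c in joint.items())


def growth_rate(column, labels, value, target):
    """Support of `feature == value` in the target class over its support elsewhere."""
    in_target = [v for v, y in zip(column, labels) if y == target]
    in_rest = [v for v, y in zip(column, labels) if y != target]
    supp_target = in_target.count(value) / len(in_target) if in_target else 0.0
    supp_rest = in_rest.count(value) / len(in_rest) if in_rest else 0.0
    if supp_rest == 0.0:
        return inf if supp_target > 0.0 else 0.0
    return supp_target / supp_rest


def mine_streaming(feature_stream, labels, target, min_relevance=0.1, min_growth=2.0):
    """Process features one at a time; mine patterns only from features judged relevant.

    Assumption: relevance is a mutual-information threshold, chosen for illustration.
    """
    selected, patterns = [], []
    for name, column in feature_stream:
        if mutual_information(column, labels) < min_relevance:
            continue  # irrelevant feature: discard without mining it
        selected.append(name)
        for value in sorted(set(column)):
            rate = growth_rate(column, labels, value, target)
            if rate >= min_growth:
                patterns.append(((name, value), rate))
    return selected, patterns


# Toy data: six instances, three features arriving in order.
labels = ["pos", "pos", "pos", "neg", "neg", "neg"]
stream = [
    ("f1", [1, 1, 1, 0, 0, 0]),  # perfectly separates the classes
    ("f2", [1, 1, 1, 1, 1, 1]),  # constant, carries no information
    ("f3", [1, 1, 0, 0, 0, 0]),  # partially informative
]
selected, patterns = mine_streaming(stream, labels, target="pos")
```

Because each feature is scored on arrival and discarded immediately if it fails the test, the sketch never needs the full feature set in memory, which is the property the abstract emphasizes. A fuller version would also combine selected features into multi-item patterns and handle redundancy among them.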


Published In

KDD '12: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
August 2012
1616 pages
ISBN:9781450314626
DOI:10.1145/2339530

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. emerging patterns
  2. feature relevance
  3. streaming features

Qualifiers

  • Research-article

Conference

KDD '12
Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

Article Metrics

  • Downloads (Last 12 months): 4
  • Downloads (Last 6 weeks): 0
Reflects downloads up to 15 Jan 2025

Cited By

  • (2023) Survey on Imbalanced Dataset Classification—Machine Learning. Intelligent Systems and Sustainable Computing, 207-216. DOI: 10.1007/978-981-99-4717-1_19. Online publication date: 3-Oct-2023.
  • (2022) Discriminant Analysis on a Stream of Features. Engineering Applications of Neural Networks, 223-234. DOI: 10.1007/978-3-031-08223-8_19. Online publication date: 10-Jun-2022.
  • (2020) A Review of Supervised Classification based on Contrast Patterns: Applications, Trends, and Challenges. Journal of Grid Computing. DOI: 10.1007/s10723-020-09526-y. Online publication date: 4-Oct-2020.
  • (2019) Cost-Sensitive Pattern-Based classification for Class Imbalance problems. IEEE Access, 7, 60411-60427. DOI: 10.1109/ACCESS.2019.2913982. Online publication date: 2019.
  • (2017) EPACO: a novel ant colony optimization for emerging patterns based classification. Cluster Computing, 21(1), 453-467. DOI: 10.1007/s10586-017-0894-4. Online publication date: 18-May-2017.
  • (2016) Discriminative Sequential Pattern Mining for Software Failure Detection. Proceedings of the 10th International Conference on Informatics and Systems, 153-158. DOI: 10.1145/2908446.2908453. Online publication date: 9-May-2016.
  • (2016) Study of the impact of resampling methods for contrast pattern based classifiers in imbalanced databases. Neurocomputing, 175(PB), 935-947. DOI: 10.1016/j.neucom.2015.04.120. Online publication date: 29-Jan-2016.
  • (2015) Classification with Streaming Features: An Emerging-Pattern Mining Approach. ACM Transactions on Knowledge Discovery from Data, 9(4), 1-31. DOI: 10.1145/2700409. Online publication date: 1-Jun-2015.
  • (2015) DFP-SEPSF: A dynamic frequent pattern tree to mine strong emerging patterns in streamwise features. Engineering Applications of Artificial Intelligence, 37, 54-70. DOI: 10.1016/j.engappai.2014.08.010. Online publication date: Jan-2015.
  • (2014) A twitter recruitment intelligent system: association rule mining for smoking cessation. Social Network Analysis and Mining, 4(1). DOI: 10.1007/s13278-014-0212-6. Online publication date: 12-Aug-2014.