Mining top-K covering rule groups for gene expression data

Published: 14 June 2005 Publication History

Abstract

In this paper, we propose a novel algorithm to discover the top-k covering rule groups for each row of gene expression profiles. Several experiments on real bioinformatics datasets show that the new top-k covering rule mining algorithm is orders of magnitude faster than previous association rule mining algorithms. Furthermore, we propose a new classification method, RCBT. The RCBT classifier is constructed from the top-k covering rule groups. The rule groups generated for building RCBT are bounded in number. This is in contrast to existing rule-based classification methods like CBA [19], which, despite generating an excessive number of redundant rules, are still unable to cover some training data with the discovered rules. Experiments show that the RCBT classifier can match or outperform other state-of-the-art classifiers on several benchmark gene expression datasets. In addition, the top-k covering rule groups themselves directly provide insights into the mechanisms responsible for diseases.

References

[1] R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In Proc. 1994 Int. Conf. Very Large Data Bases (VLDB'94), pages 487--499, Sept. 1994.
[2] T. R. Anderson and T. A. Slotkin. Maturation of the adrenal medulla--iv. effects of morphine. Biochem Pharmacol, August 1975.
[3] P. Baldi and S. Brunak. Bioinformatics: The Machine Learning Approach. MIT Press, 1998.
[4] R. J. Bayardo and R. Agrawal. Mining the most interesting rules. In Proc. of ACM SIGKDD, 1999.
[5] K. S. Bose and R. H. Sarma. Delineation of the intimate details of the backbone conformation of pyridine nucleotide coenzymes in aqueous solution. Biochem Biophys Res Commun, October 1975.
[6] G. Cong, A. K. H. Tung, X. Xu, F. Pan, and J. Yang. Farmer: Finding interesting rule groups in microarray datasets. In 23rd ACM International Conference on Management of Data, 2004.
[7] C. Creighton and S. Hanash. Mining gene expression databases for association rules. Bioinformatics, 19, 2003.
[8] S. Doddi, A. Marathe, S. Ravi, and D. Torney. Discovery of association rules in medical data. Med. Inform. Internet. Med., 26:25--33, 2001.
[9] G. Dong, X. Zhang, L. Wong, and J. Li. Caep: Classification by aggregating emerging patterns. Discovery Science, 1999.
[10] D. J. Glenn and R. A. Maurer. Mrg1 binds to the lim domain of lhx2 and may function as a coactivator to stimulate glycoprotein hormone α-subunit gene expression. J Biol Chem, 274, December 1999.
[11] J. Han and J. Pei. Mining frequent patterns by pattern growth: methodology and implications. KDD Exploration, 2, 2000.
[12] J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate generation. In Proc. 2000 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD'00), 2000.
[13] D. Jiang, J. Pei, M. Ramanathan, C. Tang, and A. Zhang. Mining coherent gene clusters from gene-sample-time microarray data. In KDD, pages 430--439, 2004.
[14] D. Jiang, J. Pei, and A. Zhang. A general approach to mining quality pattern-based clusters from gene expression data. In DASFAA 2005. To appear.
[15] T. Joachims. Making large-scale svm learning practical. Advances in Kernel Methods - Support Vector Learning, 1999. http://svmlight.joachims.org/.
[16] M. Kasai, J. Guerrero-Santoro, R. Friedman, E. S. Leman, R. H. Getzenberg, and D. B. DeFranco. The group 3 lim domain protein paxillin potentiates androgen receptor transactivation in prostate cancer cell lines. Cancer Research, 63:4927--4935, August 2003.
[17] S. Kurimoto, N. Moriyama, K. Takata, S. A. Nozaw, Y. Aso, and H. Hirano. Detection of a glycosphingolipid antigen in bladder cancer cells with monoclonal antibody mrg-1. Histochem J., 1995.
[18] J. Li and L. Wong. Identifying good diagnostic genes or genes groups from gene expression data by using the concept of emerging patterns. Bioinformatics, 18:725--734, 2002.
[19] B. Liu, W. Hsu, and Y. Ma. Integrating classification and association rule mining. In Proc. 1998 Int. Conf. Knowledge Discovery and Data Mining (KDD'98), 1998.
[20] B. Liu, W. Hsu, and Y. Ma. Pruning and summarizing the discovered associations. In ACM KDD, 1999.
[21] M. Nagata, H. Fujita, H. Ida, H. Hoshina, T. Inoue, Y. Seki, M. Ohnishi, T. Ohyama, S. Shingaki, M. Kaji, T. Saku, and R. Takagi. Identification of potential biomarkers of lymph node metastasis in oral squamous cell carcinoma by cdna microarray analysis. International Journal of Cancer, 106:683--689, June 2003.
[22] R. Ng, L. V. S. Lakshmanan, J. Han, and A. Pang. Exploratory mining and pruning optimizations of constrained associations rules. In Proc. 1998 ACM-SIGMOD Int. Conf. Management of Data (SIGMOD'98), 1998.
[23] F. Pan, G. Cong, A. K. H. Tung, J. Yang, and M. J. Zaki. Carpenter: Finding closed patterns in long biological datasets. In Proc. 2003 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'03), 2003.
[24] N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. Discovering frequent closed itemsets for association rules. In Proc. 7th Int. Conf. Database Theory (ICDT'99), Jan. 1999.
[25] J. Pei, X. Zhang, M. Cho, H. Wang, and P. S. Yu. Maple: A fast algorithm for maximal pattern-based clustering. In ICDM, pages 259--266, 2003.
[26] J. L. Pfaltz and C. M. Taylor. Closed set mining of biological data. Workshop on Data Mining in Bioinformatics, pages 43--48, 2002.
[27] J. R. Quinlan. Bagging, boosting, and C4.5. In Proc. 1996 Nat. Conf. Artificial Intelligence (AAAI'96), volume 1, pages 725--730, Portland, OR, Aug. 1996.
[28] R. Rastogi and K. Shim. Mining optimized association rules with categorical and numeric attributes. In Int. Conf. on Data Engineering, 1998.
[29] F. Rioult, J.-F. Boulicaut, B. Crémilleux, and J. Besson. Using transposition for pattern discovery from microarray data. In DMKD, pages 73--79, 2003.
[30] J. Wang, J. Han, and J. Pei. Closet+: Searching for the best strategies for mining frequent closed itemsets. In Proc. 2003 ACM SIGKDD Int. Conf. on Knowledge Discovery and Data Mining (KDD'03), 2003.
[31] M. Zaki and C. Hsiao. Charm: An efficient algorithm for closed association rule mining. In Proc. of SDM 2002, 2002.

Published In

SIGMOD '05: Proceedings of the 2005 ACM SIGMOD international conference on Management of data
June 2005
990 pages
ISBN:1595930604
DOI:10.1145/1066157
  • Conference Chair: Fatma Ozcan
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Article

Conference

SIGMOD/PODS05

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Article Metrics

  • Downloads (Last 12 months): 8
  • Downloads (Last 6 weeks): 1

Reflects downloads up to 01 Jan 2025.

Cited By

  • (2021) A Novel Pruning Strategy for Mining Discriminative Patterns. Iranian Journal of Science and Technology, Transactions of Electrical Engineering. DOI: 10.1007/s40998-020-00397-3. Online publication date: 5-Jan-2021.
  • (2020) Efficient Learning with Exponentially-Many Conjunctive Precursors for Interpretable Spatial Event Forecasting. IEEE Transactions on Knowledge and Data Engineering, 32:10 (1923-1935). DOI: 10.1109/TKDE.2019.2912187. Online publication date: 1-Oct-2020.
  • (2018) DPPred: An Effective Prediction Framework with Concise Discriminative Patterns. IEEE Transactions on Knowledge and Data Engineering, 30:7 (1226-1239). DOI: 10.1109/TKDE.2017.2757476. Online publication date: 1-Jul-2018.
  • (2018) A rule-based classification of short message service type. 2018 2nd International Conference on Inventive Systems and Control (ICISC) (1139-1142). DOI: 10.1109/ICISC.2018.8398982. Online publication date: Jan-2018.
  • (2018) New automatic fuzzy relational clustering algorithms using multi-objective NSGA-II. Information Sciences, 448-449 (112-133). DOI: 10.1016/j.ins.2018.03.025. Online publication date: Jun-2018.
  • (2018) Polygene-based evolutionary algorithms with frequent pattern mining. Frontiers of Computer Science: Selected Publications from Chinese Universities, 12:5 (950-965). DOI: 10.1007/s11704-016-6104-3. Online publication date: 1-Oct-2018.
  • (2018) Association rule mining algorithms on high-dimensional datasets. Artificial Life and Robotics, 23:3 (420-427). DOI: 10.1007/s10015-018-0437-y. Online publication date: 1-Sep-2018.
  • (2018) Frequent Itemset Mining in High Dimensional Data: A Review. Computational Science and Technology (325-334). DOI: 10.1007/978-981-13-2622-6_32. Online publication date: 28-Aug-2018.
  • (2018) Rule-Based Classification. Encyclopedia of Database Systems (3265-3268). DOI: 10.1007/978-1-4614-8265-9_559. Online publication date: 7-Dec-2018.
  • (2018) Frequent Itemsets and Association Rules. Encyclopedia of Database Systems (1536-1541). DOI: 10.1007/978-1-4614-8265-9_171. Online publication date: 7-Dec-2018.
