Research article, DOI: 10.1145/1557019.1557092

Correlated itemset mining in ROC space: a constraint programming approach

Published: 28 June 2009

Abstract

Correlated or discriminative pattern mining is concerned with finding the highest scoring patterns w.r.t. a correlation measure (such as information gain). By reinterpreting correlation measures in ROC space and formulating correlated itemset mining as a constraint programming problem, we obtain new theoretical insights with practical benefits. More specifically, we contribute 1) an improved bound for correlated itemset miners, 2) a novel iterative pruning algorithm to exploit the bound, and 3) an adaptation of this algorithm to mine all itemsets on the convex hull in ROC space. The algorithm does not depend on a minimal frequency threshold and is shown to outperform several alternative approaches by orders of magnitude, both in runtime and in memory requirements.
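To make the ROC-space view in the abstract concrete, here is a minimal Python sketch. It treats an itemset that covers p of P positive and n of N negative examples as the point (n/N, p/P) in ROC space, scores it with information gain (the measure the abstract names as an example), and computes the upper convex hull of a set of such points. The function `classic_bound` illustrates the well-known convexity-based bound from earlier work on correlated pattern mining; it is not the improved bound this paper contributes, and all function names here are hypothetical, chosen only for illustration.

```python
from math import log2


def entropy(a, b):
    """Binary entropy (in bits) of a class distribution with counts a and b."""
    total = a + b
    h = 0.0
    for c in (a, b):
        if c > 0:
            h -= (c / total) * log2(c / total)
    return h


def information_gain(p, n, P, N):
    """Information gain of a pattern covering p of P positives and n of N negatives."""
    total = P + N
    covered = p + n
    return (entropy(P, N)
            - covered / total * entropy(p, n)
            - (total - covered) / total * entropy(P - p, N - n))


def roc_point(p, n, P, N):
    """ROC coordinates (false positive rate, true positive rate) of a pattern."""
    return (n / N, p / P)


def classic_bound(p, n, P, N):
    """Convexity-based upper bound on the score of any specialisation.

    A superset itemset covers some (p', n') with p' <= p and n' <= n.
    Information gain is convex in (p, n) and zero on the diagonal, so its
    maximum over that rectangle is reached at (p, 0) or (0, n).
    """
    return max(information_gain(p, 0, P, N), information_gain(0, n, P, N))


def roc_upper_hull(points):
    """Upper convex hull of ROC points, including the corners (0, 0) and (1, 1)."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull = []
    for x, y in pts:
        # Drop the last hull point while it lies on or below the new segment.
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (y - y1) - (y2 - y1) * (x - x1)
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append((x, y))
    return hull


if __name__ == "__main__":
    P, N = 10, 10
    print(roc_point(7, 4, P, N))
    print(information_gain(7, 4, P, N), classic_bound(7, 4, P, N))
    print(roc_upper_hull([(0.1, 0.6), (0.3, 0.5), (0.5, 0.9)]))
```

In a branch-and-bound miner, a branch can be pruned when a bound such as `classic_bound` falls below the best score found so far; a tighter bound prunes more of the search space, which is the kind of improvement the abstract refers to.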




Published In

KDD '09: Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
June 2009, 1426 pages
ISBN: 9781605584959
DOI: 10.1145/1557019

Publisher

Association for Computing Machinery, New York, NY, United States


Author Tags

1. ROC analysis
2. constraint programming
3. itemset mining

Conference

KDD09

Acceptance Rates

Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
