DOI: 10.1145/1150402.1150438

Fast mining of high dimensional expressive contrast patterns using zero-suppressed binary decision diagrams

Published: 20 August 2006

Abstract

Contrast patterns are an important way of comparing multi-dimensional datasets. Such patterns capture regions of high difference between two classes of data, and are useful both to human experts and for constructing classifiers. However, mining such patterns is particularly challenging when the number of dimensions is large. This paper describes a new technique for mining several varieties of contrast pattern, based on Zero-Suppressed Binary Decision Diagrams (ZBDDs), a powerful data structure for manipulating sparse data. We study the mining of both simple contrast patterns, such as emerging patterns, and more novel and complex contrasts, which we call disjunctive emerging patterns. A performance study demonstrates that our ZBDD technique is highly scalable, substantially improves on state-of-the-art mining for emerging patterns, and can be effective for discovering complex contrasts from datasets with thousands of attributes.
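
To make the notion of an emerging pattern concrete, here is a minimal brute-force sketch. It is not the ZBDD-based technique the paper describes. It assumes the usual definition, in which a pattern's growth rate is the ratio of its support in one class (here called positive) to its support in the other (negative), and a pattern that never occurs in the negative class is a jumping emerging pattern. The toy transactions, the threshold, and the names `support`, `growth_rate`, and `emerging_patterns` are illustrative assumptions.

```python
from itertools import combinations
from math import inf


def support(pattern, dataset):
    """Fraction of transactions in `dataset` that contain every item of `pattern`."""
    if not dataset:
        return 0.0
    return sum(1 for t in dataset if pattern <= t) / len(dataset)


def growth_rate(pattern, negative, positive):
    """Support ratio positive/negative; inf when the pattern never occurs in `negative`."""
    neg, pos = support(pattern, negative), support(pattern, positive)
    if pos == 0:
        return 0.0
    return inf if neg == 0 else pos / neg


def emerging_patterns(negative, positive, min_growth, max_len=3):
    """Enumerate itemsets of up to `max_len` items with growth rate >= `min_growth`.

    Exhaustive enumeration is exponential in the number of items, which is
    why high-dimensional data calls for a more compact representation.
    """
    items = sorted(set().union(*negative, *positive))
    found = {}
    for k in range(1, max_len + 1):
        for combo in combinations(items, k):
            pattern = frozenset(combo)
            rate = growth_rate(pattern, negative, positive)
            if rate >= min_growth:
                found[pattern] = rate
    return found


# Hypothetical toy data: each transaction is a set of items.
negative = [frozenset({"a", "b"}), frozenset({"a", "c"}), frozenset({"b", "c"})]
positive = [frozenset({"a", "b", "c"}), frozenset({"a", "b"}), frozenset({"c", "d"})]

patterns = emerging_patterns(negative, positive, min_growth=1.5)
# Jumping emerging patterns have zero support in the negative class.
jumping = {p for p, rate in patterns.items() if rate == inf}
```

On this toy input no single item among `a`, `b`, and `c` distinguishes the classes, yet the pair `{a, b}` is twice as frequent in the positive class. Contrasts can therefore sit in item combinations, and the space of combinations grows exponentially with the number of attributes.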

    Published In

    KDD '06: Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
    August 2006
    986 pages
    ISBN:1595933395
    DOI:10.1145/1150402

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. contrast patterns
    2. disjunctive emerging patterns
    3. zero-suppressed binary decision diagrams

    Qualifiers

    • Article

    Conference

    KDD06

    Acceptance Rates

    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

    Cited By

    • (2024) Enumerating Minimum Feedback Vertex Sets in directed graphs with union-cat trees. RAIRO - Theoretical Informatics and Applications 58 (18). DOI: 10.1051/ita/2024015. Online publication date: 19-Dec-2024.
    • (2023) Rule Mining for Correcting Classification Models. 2023 IEEE International Conference on Data Mining (ICDM) (1331-1336). DOI: 10.1109/ICDM58522.2023.00170. Online publication date: 1-Dec-2023.
    • (2023) A distributed evolutionary fuzzy system-based method for the fusion of descriptive emerging patterns in data streams. Information Fusion 91 (412-423). DOI: 10.1016/j.inffus.2022.10.028. Online publication date: Mar-2023.
    • (2022) Effective Mining of Contrast Hybrid Patterns from Nominal-numerical Mixed Data. Advanced Data Mining and Applications (352-367). DOI: 10.1007/978-3-031-22064-7_26. Online publication date: 24-Nov-2022.
    • (2021) Storing Set Families More Compactly with Top ZDDs. Algorithms 14:6 (172). DOI: 10.3390/a14060172. Online publication date: 31-May-2021.
    • (2021) Mining Rare Recurring Events in Network Traffic using Second Order Contrast Patterns. 2021 International Joint Conference on Neural Networks (IJCNN) (1-8). DOI: 10.1109/IJCNN52387.2021.9533918. Online publication date: 18-Jul-2021.
    • (2020) Identifying insufficient data coverage in databases with multiple relations. Proceedings of the VLDB Endowment 13:12 (2229-2242). DOI: 10.14778/3407790.3407821. Online publication date: 14-Sep-2020.
    • (2019) Exploiting the Power of Group Differences: Using Patterns to Solve Data Analysis Problems. Synthesis Lectures on Data Mining and Knowledge Discovery 11:1 (1-146). DOI: 10.2200/S00897ED1V01Y201901DMK016. Online publication date: 22-Feb-2019.
    • (2019) Frequent Itemset Mining. Business and Consumer Analytics: New Ideas (269-304). DOI: 10.1007/978-3-030-06222-4_6. Online publication date: 31-May-2019.
    • (2018) Multi-core Decision Diagrams. Handbook of Parallel Constraint Reasoning (509-545). DOI: 10.1007/978-3-319-63516-3_13. Online publication date: 6-Apr-2018.
