Abstract
All frequent itemset mining algorithms rely heavily on the monotonicity principle for pruning. This principle allows candidate itemsets to be excluded from the expensive counting phase. In this paper, we present sound and complete deduction rules to derive bounds on the support of an itemset. Based on these deduction rules, we construct a condensed representation of all frequent itemsets by removing those itemsets whose support can be derived, resulting in the so-called Non-Derivable Itemsets (NDI) representation. We also present connections between our proposal and other recent proposals for condensed representations of frequent itemsets. Experiments on real-life datasets show the effectiveness of the NDI representation, making the search for frequent non-derivable itemsets a useful and tractable alternative to mining all frequent itemsets.
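To make the idea of deriving bounds on a support concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not spell out the rules, so the sketch assumes they take the inclusion-exclusion form: for every proper subset X of an itemset I, the sum over all J with X ⊆ J ⊂ I of (-1)^(|I \ J| + 1) · supp(J) is an upper bound on supp(I) when |I \ X| is odd and a lower bound when it is even. The helper names `subsets` and `support_bounds` and the four-transaction toy database are hypothetical.

```python
from itertools import combinations


def subsets(items, max_size):
    """Yield every subset of `items` with at most `max_size` elements."""
    pool = sorted(items)
    for k in range(max_size + 1):
        for combo in combinations(pool, k):
            yield frozenset(combo)


def support_bounds(itemset, supp):
    """Return (lower, upper) bounds on the support of `itemset`.

    `supp` maps every proper subset of `itemset` (as a frozenset) to its
    support; supp[frozenset()] is the number of transactions.
    Assumed rule form: inclusion-exclusion over all J with X <= J < I.
    """
    i = frozenset(itemset)
    lower, upper = 0, supp[frozenset()]  # trivial bounds
    for x in subsets(i, len(i) - 1):  # every proper subset X of I
        rest = i - x
        delta = 0
        for extra in subsets(rest, len(rest) - 1):  # J = X + extra, J != I
            j = x | extra
            # sign is (-1)^(|I \ J| + 1): positive when |I \ J| is odd
            sign = 1 if (len(i) - len(j)) % 2 == 1 else -1
            delta += sign * supp[j]
        if len(rest) % 2 == 1:
            upper = min(upper, delta)  # odd |I \ X| gives an upper bound
        else:
            lower = max(lower, delta)  # even |I \ X| gives a lower bound
    return lower, upper


# Hypothetical toy database of four transactions over items a, b, c.
transactions = [frozenset("abc"), frozenset("ab"),
                frozenset("ac"), frozenset("bc")]

# Supports of all itemsets with at most two items, counted directly.
supp = {s: sum(1 for t in transactions if s <= t) for s in subsets("abc", 2)}

print(support_bounds("ab", supp))   # (2, 3): bounds differ, not derivable
print(support_bounds("abc", supp))  # (1, 1): bounds coincide, derivable
```

When the lower and upper bounds coincide, as for {a, b, c} here, the support follows from the supports of the subsets alone, so the itemset can be left out of the representation without losing information.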
Additional information
Communicated by Geoffrey Webb.
Rights and permissions
Open Access. This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
About this article
Cite this article
Calders, T., Goethals, B. Non-derivable itemset mining. Data Min Knowl Disc 14, 171–206 (2007). https://doi.org/10.1007/s10618-006-0054-6
DOI: https://doi.org/10.1007/s10618-006-0054-6