Non-derivable itemset mining

Calders, Toon; Goethals, Bart

doi:10.1007/s10618-006-0054-6

Open access
Published: 26 January 2007

Volume 14, pages 171–206 (2007)
Data Mining and Knowledge Discovery

Toon Calders¹,² & Bart Goethals²

Abstract

All frequent itemset mining algorithms rely heavily on the monotonicity principle for pruning. This principle makes it possible to exclude candidate itemsets from the expensive counting phase. In this paper, we present sound and complete deduction rules to derive bounds on the support of an itemset. Based on these deduction rules, we construct a condensed representation of all frequent itemsets by removing those itemsets whose support can be derived, resulting in the so-called Non-Derivable Itemsets (NDI) representation. We also relate our proposal to other recent proposals for condensed representations of frequent itemsets. Experiments on real-life datasets show the effectiveness of the NDI representation, making the search for frequent non-derivable itemsets a useful and tractable alternative to mining all frequent itemsets.
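The deduction rules can be illustrated with a small sketch. For an itemset I and any subset X of I, inclusion-exclusion counts the transactions that contain all of X and none of I ∖ X as Σ_{X ⊆ J ⊆ I} (-1)^{|J ∖ X|} supp(J), and this count cannot be negative. Isolating the supp(I) term gives δ_X(I) = Σ_{X ⊆ J ⊊ I} (-1)^{|I ∖ J|+1} supp(J), which is an upper bound on supp(I) when |I ∖ X| is odd and a lower bound when it is even. With |I ∖ X| = 1 the rule reduces to supp(I) ≤ supp(X), the monotonicity principle. An itemset is derivable when the tightest lower and upper bounds coincide. The Python below is a minimal brute-force sketch under stated assumptions: transactions are sets of string items, every itemset is counted directly (no candidate pruning, so it is not the authors' NDI algorithm), and the names `subsets`, `support`, `bounds` and `non_derivable_itemsets` are hypothetical.

```python
from itertools import combinations

# Hypothetical helper names; a brute-force illustration, not the authors' NDI algorithm.


def subsets(items):
    """Yield every subset of the tuple `items` as a frozenset, including the empty set."""
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)


def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def bounds(itemset, supp):
    """Tightest (lower, upper) bounds on supp(itemset) from the deduction rules.

    `supp` must map every proper subset of `itemset` (as a frozenset) to its support.
    For each X within I: delta = sum over X <= J < I of (-1)^(|I - J| + 1) * supp(J);
    it is an upper bound if |I - X| is odd and a lower bound if |I - X| is even.
    """
    I = frozenset(itemset)
    lower, upper = 0, float("inf")
    for X in subsets(tuple(sorted(I))):
        delta = 0
        for Y in subsets(tuple(sorted(I - X))):
            J = X | Y
            if J != I:  # only proper subsets of I contribute to the rule
                delta += (-1) ** (len(I - J) + 1) * supp[J]
        if len(I - X) % 2 == 1:
            upper = min(upper, delta)
        else:
            lower = max(lower, delta)
    return lower, upper


def non_derivable_itemsets(transactions, minsup):
    """Frequent itemsets whose deduced lower and upper bounds differ, with their supports."""
    items = tuple(sorted(set().union(*transactions)))
    supp = {S: support(S, transactions) for S in subsets(items)}
    ndi = {}
    for S in subsets(items):
        if not S:
            continue  # the empty itemset (support = number of transactions) only feeds the rules
        lower, upper = bounds(S, supp)
        if lower != upper and supp[S] >= minsup:
            ndi[S] = supp[S]
    return ndi


db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a"}]
supp = {S: support(S, db) for S in subsets(("a", "b", "c"))}
print(bounds(frozenset("abc"), supp))  # (1, 1): supp({a, b, c}) is derivable
print(bounds(frozenset("ab"), supp))   # (2, 3): supp({a, b}) must be counted
print(frozenset("abc") in non_derivable_itemsets(db, minsup=1))  # False
```

On this five-transaction example the rules pin supp({a, b, c}) to exactly 1, so {a, b, c} is left out of the representation even at a minimum support of 1, whereas {a, b} only receives the interval [2, 3] and therefore has to be counted.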

Author information

Authors and Affiliations

Eindhoven Technical University, Eindhoven, The Netherlands
Toon Calders
University of Antwerp, Antwerp, Belgium
Toon Calders & Bart Goethals

Corresponding author

Correspondence to Toon Calders.

Additional information

Communicated by Geoffrey Webb.

Rights and permissions

Open Access This is an open access article distributed under the terms of the Creative Commons Attribution Noncommercial License (https://creativecommons.org/licenses/by-nc/2.0), which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.

About this article

Cite this article

Calders, T., Goethals, B. Non-derivable itemset mining. Data Min Knowl Disc 14, 171–206 (2007). https://doi.org/10.1007/s10618-006-0054-6

Received: 06 December 2005
Accepted: 23 June 2006
Published: 26 January 2007
Issue Date: February 2007
DOI: https://doi.org/10.1007/s10618-006-0054-6
