Abstract
Since its introduction, frequent-pattern mining has been the subject of numerous studies, including studies on incremental updating. Many existing incremental mining algorithms are Apriori-based and are not easily adaptable to FP-tree-based frequent-pattern mining. In this paper, we propose a novel tree structure, called CanTree (canonical-order tree), that captures the content of the transaction database and orders tree nodes according to some canonical order. Because of this ordering, the CanTree can be easily maintained when database transactions are inserted, deleted, and/or modified. For example, the CanTree does not require adjustment, merging, and/or splitting of tree nodes during maintenance. No rescan of the entire updated database or reconstruction of a new tree is needed for incremental updating. Experimental results show the effectiveness of our CanTree in the incremental mining of frequent patterns. Moreover, the applicability of CanTrees is not confined to incremental mining; CanTrees are also applicable to other frequent-pattern mining tasks, including constrained mining and interactive mining.
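The maintenance property described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes lexicographic item order as the canonical order, and the names `Node`, `CanTreeSketch`, `insert`, `delete`, and `paths` are hypothetical. Because the item order is fixed and does not depend on item frequencies, an update only touches the counts along a single path.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Tuple


@dataclass
class Node:
    """One tree node: an item, its count, and children keyed by item."""
    item: str = ""
    count: int = 0
    children: Dict[str, "Node"] = field(default_factory=dict)


class CanTreeSketch:
    """Prefix tree whose paths follow a fixed item order.

    Assumption: the canonical order is lexicographic order on item names.
    """

    def __init__(self) -> None:
        self.root = Node()

    def insert(self, transaction: Iterable[str]) -> None:
        """Add one transaction by incrementing counts along its path."""
        node = self.root
        for item in sorted(set(transaction)):
            node = node.children.setdefault(item, Node(item))
            node.count += 1

    def delete(self, transaction: Iterable[str]) -> None:
        """Remove one transaction that was previously inserted."""
        path = []
        node = self.root
        # First pass: check that the whole path exists before changing anything.
        for item in sorted(set(transaction)):
            child = node.children.get(item)
            if child is None:
                raise KeyError(f"transaction not in tree at item {item!r}")
            path.append((node, child))
            node = child
        # Second pass: decrement counts and prune a subtree that drops to zero.
        for parent, child in path:
            child.count -= 1
            if child.count == 0:
                del parent.children[child.item]
                break

    def paths(self) -> List[Tuple[Tuple[str, ...], int]]:
        """List every node as (path from root, count), in canonical order."""
        out: List[Tuple[Tuple[str, ...], int]] = []

        def walk(node: Node, prefix: Tuple[str, ...]) -> None:
            for item in sorted(node.children):
                child = node.children[item]
                path = prefix + (item,)
                out.append((path, child.count))
                walk(child, path)

        walk(self.root, ())
        return out
```

In this sketch a modified transaction would be handled as a `delete` of the old version followed by an `insert` of the new one. The resulting tree depends only on the set of transactions currently stored, not on the order in which they arrived, which is why no node needs to be moved, merged, or split.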
Author information
Carson K.-S. Leung received his B.Sc. (Honours), M.Sc., and Ph.D. degrees, all in computer science, from the University of British Columbia, Canada. Currently, he is an Assistant Professor at the University of Manitoba, Canada. His research interests include databases, data mining, and data warehousing. His work has been published in refereed journals and conferences such as ACM Transactions on Database Systems (TODS), IEEE International Conference on Data Engineering (ICDE), and IEEE International Conference on Data Mining (ICDM).
Quamrul I. Khan received his B.Sc. degree in computer science from North South University, Bangladesh, in 2001. He then worked as a Test Engineer and a Software Engineer for a few years before he started his current M.Sc. degree program in computer science at the University of Manitoba under the academic supervision of Dr. C. K.-S. Leung.
Zhan Li received her B.Eng. degree in computer engineering from Harbin Engineering University, China, in 2002. Currently, she is pursuing her M.Sc. degree in computer science at the University of Manitoba under the academic supervision of Dr. C. K.-S. Leung.
Tariqul Hoque received his B.Sc. degree in computer science from North South University, Bangladesh, in 2001. Currently, he is pursuing his M.Sc. degree in computer science at the University of Manitoba under the academic supervision of Dr. C. K.-S. Leung.
Cite this article
Leung, C.KS., Khan, Q.I., Li, Z. et al. CanTree: a canonical-order tree for incremental frequent-pattern mining. Knowl Inf Syst 11, 287–311 (2007). https://doi.org/10.1007/s10115-006-0032-8
DOI: https://doi.org/10.1007/s10115-006-0032-8