Finding Frequent Closed Itemsets in Sliding Window in Linear Time

Published: 01 October 2008

Abstract

One of the most well-studied problems in data mining is computing the collection of frequent itemsets in large transactional databases. Since the introduction of the famous Apriori algorithm [14], many other algorithms have been proposed to find frequent itemsets. Among them, the approach of mining closed itemsets has attracted much interest in the data mining community. Algorithms taking this approach include TITANIC [8], CLOSET+ [6], DCI-Closed [4], CFI-Stream [3], GC-Tree [15], and TGC-Tree [16]. Of these, CFI-Stream, GC-Tree, and TGC-Tree are online algorithms that work in sliding-window environments. According to the performance evaluation in [16], GC-Tree [15] is the fastest. In this paper, an improved algorithm based on GC-Tree is proposed, whose computational complexity is proved to be a linear combination of the average transaction size and the average closed itemset size. The algorithm is based on the essential theorem presented in Sect. 4.2. Empirically, the new algorithm is several orders of magnitude faster than the state-of-the-art algorithm, GC-Tree.

References

[1] S.B. Yahia, T. Hamrouni, and E.M. Nguifo, “Frequent closed itemset based algorithms: A thorough structural and analytical survey,” ACM SIGKDD Explorations Newsletter, vol.8, no.1, pp.93–104, June 2006.
[2] C. Lin, D. Chiu, Y. Wu, and A.L.P. Chen, “Mining frequent itemsets from data streams with a time-sensitive sliding window,” Proc. SIAM International Conference on Data Mining, pp.68–79, April 2005.
[3] N. Jiang and L. Gruenwald, “CFI-stream: Mining closed frequent itemsets in data streams,” Proc. ACM SIGKDD Conference, pp.592–597, Aug. 2006.
[4] C. Lucchese, S. Orlando, and R. Perego, “DCI closed: A fast and memory efficient algorithm to mine frequent closed itemsets,” Proc. IEEE ICDM Workshop on Frequent Itemset Mining Implementations (FIMI'04), CEUR Workshop Proceedings, vol.126, Nov. 2004.
[5] C. Lucchese, S. Orlando, and R. Perego, “Fast and memory efficient mining of frequent closed itemsets,” IEEE Journal Transactions of Knowledge and Data Engineering (TKDE), vol.18, no.1, pp.21–36, Jan. 2006.
[6] J. Wang, J. Han, and J. Pei, “CLOSET+: Searching for the best strategies for mining frequent closed itemsets,” Proc. ACM SIGKDD Conference, pp.236–245, Aug. 2003.
[7] M.J. Zaki and C. Hsiao, “CHARM: An efficient algorithm for closed itemsets mining,” Proc. SIAM International Conference on Data Mining, pp.457–473, April 2002.
[8] G. Stumme, R. Taouil, Y. Bastide, N. Pasquier, and L. Lakhal, “Computing iceberg concept lattices with TITANIC,” Journal of Knowledge and Data Engineering (KDE), vol.2, no.42, pp.189–222, 2002.
[9] M.J. Zaki and K. Gouda, “Fast vertical mining using diffsets,” Technical Report 01-1, Computer Science Dept., Rensselaer Polytechnic Institute, March 2001.
[10] C. Lucchese, S. Orlando, P. Palmerini, R. Perego, and F. Silvestri, “KDCI: A multistrategy algorithm for mining frequent sets,” Proc. IEEE ICDM Workshop on Frequent Itemset Mining Implementations (FIMI'03), CEUR Workshop Proceedings, vol.126, Nov. 2003.
[11] S. Orlando, P. Palmerini, R. Perego, and F. Silvestri, “Adaptive and resource-aware mining of frequent sets,” Proc. IEEE International Conference on Data Mining, pp.338–345, Nov. 2002.
[12] N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal, “Discovering frequent closed itemsets for association rules,” Proc. International Conference on Database Theory (ICDT'99), pp.398–416, Jan. 1999.
[13] J. Pei, J. Han, and R. Mao, “CLOSET: An efficient algorithm for mining frequent closed itemsets,” Proc. ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, pp.21–30, May 2000.
[14] R. Agrawal and R. Srikant, “Fast algorithms for mining association rules,” Proc. International Conference on Very Large Databases, pp.487–499, Sept. 1994.
[15] J. Chen and S. Li, “GC-tree: A fast online algorithm for mining frequent closed itemsets,” Proc. PAKDD Workshop of HPDMA, pp.457–468, May 2007.
[16] J. Chen and B. Zhou, “TGC-tree: An online algorithm tracing closed itemset and transaction set simultaneously,” Proc. Large Scale Knowledge Resources, pp.38–50, March 2008.
[17] G. Grahne and J. Zhu, “Efficiently using prefix-trees in mining frequent itemsets,” Proc. IEEE ICDM Workshop on Frequent Itemset Mining Implementations (FIMI'03), CEUR Workshop Proceedings, vol.90, Nov. 2003.
[18] T. Uno, T. Asai, Y. Uchida, and H. Arimura, “LCM: An efficient algorithm for enumerating frequent closed item sets,” Proc. IEEE ICDM Workshop on Frequent Itemset Mining Implementations (FIMI'03), CEUR Workshop Proceedings, vol.90, Nov. 2003.

Published In

IEICE - Transactions on Information and Systems, Volume E91-D, Issue 10
October 2008
161 pages
ISSN: 0916-8532
EISSN: 1745-1361

Publisher

Oxford University Press, Inc.
United States

Author Tags

1. association rules
2. closed itemsets
3. data mining
4. frequent itemsets
5. online algorithm
