Abstract
Mining frequent itemsets over data streams has emerged as an active research topic in recent years. Previous approaches generally use a fixed support threshold to discover patterns in the stream. In practice, however, the threshold changes to meet users' needs and to suit the characteristics of the incoming data. In a non-streaming environment, changing the threshold implies re-mining the entire set of transactions. In a stream, the "look-once" nature of the data means that discarded transactions are no longer available, so re-mining is impossible. We therefore propose a method for variable support mining of frequent itemsets over data streams. A synopsis vector is constructed to maintain statistics of past transactions and is invoked only when necessary. Experimental results show that our approach is efficient and scalable for variable support mining in data streams.
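To make the problem setting concrete, the following is a minimal sketch of answering queries at different support thresholds from a summary built in a single pass over the stream. It is not the synopsis vector proposed in the paper, whose construction the abstract does not describe. The class `StreamSummary`, its `max_size` parameter, and the sample transactions are hypothetical, and exact counting of every small itemset is an assumption made for illustration that would not scale to real streams.

```python
from collections import Counter
from itertools import combinations


class StreamSummary:
    """Hypothetical one-pass summary: exact counts of itemsets up to max_size."""

    def __init__(self, max_size=2):
        self.max_size = max_size
        self.counts = Counter()
        self.n_transactions = 0

    def add(self, transaction):
        # Each transaction is seen once; only the counts are retained.
        items = sorted(set(transaction))
        self.n_transactions += 1
        for k in range(1, self.max_size + 1):
            for itemset in combinations(items, k):
                self.counts[itemset] += 1

    def frequent(self, min_support):
        # The threshold is applied at query time, so it can vary per query
        # without revisiting the discarded transactions.
        threshold = min_support * self.n_transactions
        return {s: c for s, c in self.counts.items() if c >= threshold}


# Illustrative stream of four transactions (made-up data).
stream = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b"]]
summary = StreamSummary(max_size=2)
for t in stream:
    summary.add(t)

high = summary.frequent(0.75)  # stricter threshold
low = summary.frequent(0.5)    # relaxed threshold, same summary
```

Both queries read the same stored counts; lowering the threshold from 0.75 to 0.5 returns additional itemsets without a second pass over the data.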
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lin, MY., Hsueh, SC., Hwang, SK. (2006). Variable Support Mining of Frequent Itemsets over Data Streams Using Synopsis Vectors. In: Ng, WK., Kitsuregawa, M., Li, J., Chang, K. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2006. Lecture Notes in Computer Science, vol 3918. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11731139_84
DOI: https://doi.org/10.1007/11731139_84
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33206-0
Online ISBN: 978-3-540-33207-7
eBook Packages: Computer Science (R0)