Abstract
Frequent pattern mining has emerged as an important task in data stream mining, and a number of algorithms have been proposed for it. These algorithms usually work in two steps: the first calculates the frequency of itemsets while monitoring each arrival in the data stream, and the second outputs the frequent itemsets according to the user's requirements. Because each transaction in the stream yields a large number of item combinations, the first step is time-consuming. For high-speed streams of long transactions, there may therefore not be enough time to process every arriving transaction, which reduces mining accuracy. In this paper, we propose a new approach to this problem. Our approach is lazy: it delays the calculation of itemset frequencies until the second step, so the first step only stores the necessary information for each transaction and avoids missing any arriving transaction. To improve accuracy, we propose monitoring the items that are most likely to be frequent. This allows many candidate itemsets to be pruned, which accounts for the good performance of DELAY, the algorithm we designed based on this method. A comprehensive experimental study shows that our algorithm achieves improvements over the existing algorithms LossyCounting and FDPM, especially for long-transaction data streams.
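The two-step lazy idea described in the abstract can be illustrated with a minimal Python sketch. This is not the DELAY algorithm itself, whose details the abstract does not give. The class name `LazyMiner`, the methods `observe` and `frequent_itemsets`, the `max_size` cap, and the choice to keep every transaction in an unbounded list are all assumptions made for illustration. Step one does only cheap per-item bookkeeping on arrival; step two enumerates itemsets on request, after discarding items whose own count is already below the support threshold.

```python
from collections import Counter
from itertools import combinations


class LazyMiner:
    """Toy lazy miner (hypothetical): store on arrival, count itemsets on request."""

    def __init__(self):
        self.transactions = []        # step-one storage; unbounded here (assumption)
        self.item_counts = Counter()  # cheap per-item monitoring

    def observe(self, transaction):
        # Step one: work proportional to the transaction length,
        # with no enumeration of item combinations.
        items = frozenset(transaction)
        self.transactions.append(items)
        self.item_counts.update(items)

    def frequent_itemsets(self, min_support, max_size=3):
        # Step two: runs only when the user asks for results.
        threshold = min_support * len(self.transactions)
        # An itemset can only be frequent if each of its items is frequent,
        # so infrequent items are dropped before enumeration.
        frequent_items = {i for i, c in self.item_counts.items() if c >= threshold}
        counts = Counter()
        for t in self.transactions:
            kept = sorted(t & frequent_items)
            for k in range(1, min(max_size, len(kept)) + 1):
                counts.update(combinations(kept, k))
        return {s: c for s, c in counts.items() if c >= threshold}
```

Because pruning happens before enumeration, a long transaction whose items are mostly rare contributes few candidate combinations. A real stream miner would also need to bound the memory used in step one, which this sketch does not attempt.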
This work was supported in part by the National Natural Science Foundation of China under Grants No. 70471006, 70621061, 60496325, and 60573092.
Copyright information
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Yang, H., Liu, H., He, J. (2007). DELAY: A Lazy Approach for Mining Frequent Patterns over High Speed Data Streams. In: Alhajj, R., Gao, H., Li, J., Li, X., Zaïane, O.R. (eds) Advanced Data Mining and Applications. ADMA 2007. Lecture Notes in Computer Science, vol 4632. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73871-8_2
DOI: https://doi.org/10.1007/978-3-540-73871-8_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73870-1
Online ISBN: 978-3-540-73871-8
eBook Packages: Computer Science (R0)