DELAY: A Lazy Approach for Mining Frequent Patterns over High Speed Data Streams

Yang, Hui; Liu, Hongyan; He, Jun

doi:10.1007/978-3-540-73871-8_2


Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4632)

Included in the following conference series:

International Conference on Advanced Data Mining and Applications


Abstract

Frequent pattern mining has emerged as an important task in data stream mining, and a number of algorithms have been proposed for it. These algorithms usually work in two steps: the first calculates the frequency of itemsets as each transaction of the data stream arrives, and the second outputs the frequent itemsets according to the user's requirements. Because each transaction in the stream gives rise to a large number of item combinations, the first step is time-consuming. Therefore, for high-speed data streams with long transactions, there may not be enough time to process every arriving transaction, which reduces mining accuracy. In this paper, we propose a new approach to this issue. Our approach is lazy: it delays calculating the frequency of each itemset until the second step. The first step therefore only stores the necessary information for each transaction, so that no transaction arriving in the stream is missed. To improve accuracy, we propose monitoring the items that are most likely to be frequent. This allows many candidate itemsets to be pruned, which leads to the good performance of DELAY, the algorithm designed on the basis of this method. A comprehensive experimental study shows that our algorithm achieves some improvements over the existing algorithms LossyCounting and FDPM, especially for data streams with long transactions.
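The abstract describes the lazy idea only at a high level. The following is a minimal Python sketch of that idea under our own assumptions, not the authors' DELAY algorithm: arrival-time work is limited to storing each transaction and updating single-item counts, and itemset counting is deferred until results are requested, restricted to items that are individually frequent. The class `LazyStreamMiner` and its methods are hypothetical. Unlike a practical stream miner, it keeps every transaction in memory and enumerates all subsets of the retained items, and it uses exact item counts rather than the paper's "most likely to be frequent" monitoring.

```python
from collections import Counter
from itertools import combinations


class LazyStreamMiner:
    """Hypothetical illustration of lazy (deferred) itemset counting.

    Not the DELAY algorithm itself: it stores every transaction and
    enumerates subsets exhaustively, which a real stream miner would avoid.
    """

    def __init__(self):
        self.transactions = []        # step 1 keeps only the raw transactions
        self.item_counts = Counter()  # cheap per-item counts (assumed "monitored" items)

    def add(self, transaction):
        """Arrival-time work: linear in the transaction length."""
        items = frozenset(transaction)
        self.transactions.append(items)
        self.item_counts.update(items)

    def frequent_itemsets(self, min_support):
        """Step 2: count itemsets only when results are requested.

        min_support is a fraction of the transactions seen so far.
        Returns a dict mapping sorted item tuples to support counts.
        Items must be mutually comparable (they are sorted).
        """
        threshold = min_support * len(self.transactions)
        # Support is anti-monotone: an itemset cannot be frequent unless
        # every item in it is frequent, so infrequent items are pruned first.
        frequent_items = {i for i, c in self.item_counts.items() if c >= threshold}
        counts = Counter()
        for t in self.transactions:
            kept = sorted(t & frequent_items)
            for k in range(1, len(kept) + 1):
                for combo in combinations(kept, k):
                    counts[combo] += 1
        return {s: c for s, c in counts.items() if c >= threshold}


miner = LazyStreamMiner()
for t in [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "d"}]:
    miner.add(t)
print(miner.frequent_itemsets(0.5))
# {('a',): 3, ('b',): 3, ('c',): 2, ('a', 'b'): 2, ('a', 'c'): 2}
```

In this example, item `d` occurs only once, so it is dropped before any combinations are formed. This illustrates how monitoring individual items can prune candidate itemsets before the expensive counting step.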

This work was supported in part by the National Natural Science Foundation of China under Grant Nos. 70471006, 70621061, 60496325 and 60573092.


References

Metwally, A., Agrawal, D., Abbadi, A.E.: Efficient Computation of Frequent and Top-k Elements in Data Streams. In: Eiter, T., Libkin, L. (eds.) ICDT 2005. LNCS, vol. 3363, Springer, Heidelberg (2004)
Bayardo Jr., R.J.: Efficiently Mining Long Patterns from Databases. In: Proceedings of the ACM SIGKDD Conference (1998)
Wang, H., Fan, W., Yu, P.S., Han, J.: Mining Concept-Drifting Data Streams using Ensemble Classifiers. In: ACM SIGKDD Int’l Conf. on Knowledge Discovery and Data Mining (August 2003)
Agrawal, R., Imielinski, T., Swami, A.: Mining Association Rules between Sets of Items in Large Databases. In: Int’l Conf. on Management of Data (May 1993)
Zhu, Y., Shasha, D.: StatStream: Statistical Monitoring of Thousands of Data Streams in Real Time. In: Int’l Conf. on Very Large Data Bases (2002)
Chi, Y., Wang, H., Yu, P.S., Muntz, R.R.: Moment: Maintaining Closed Frequent Itemsets over a Stream Sliding Window. In: IEEE Int’l Conf. on Data Mining (November 2004)
Chang, J.H., Lee, W.S.: A Sliding Window Method for Finding Recently Frequent Itemsets over Online Data Streams. Journal of Information Science and Engineering (2004)
Cheng, J., Ke, Y., Ng, W.: Maintaining Frequent Itemsets over High-Speed Data Streams. In: Ng, W.-K., Kitsuregawa, M., Li, J., Chang, K. (eds.) PAKDD 2006. LNCS (LNAI), vol. 3918, Springer, Heidelberg (2006)
Manku, G.S., Motwani, R.: Approximate Frequency Counts over Data Streams. In: Int’l Conf. on Very Large Databases (2002)
Chang, J.H., Lee, W.S., Zhou, A.: Finding Recent Frequent Itemsets Adaptively over Online Data Streams. In: ACM SIGKDD Int’l Conf. on Knowledge Discovery and Data Mining (August 2003)
Giannella, C., Han, J., Pei, J., Yan, X., Yu, P.S.: Mining Frequent Patterns in Data Streams at Multiple Time Granularities. In: Data Mining: Next Generation Challenges and Future Directions, AAAI/MIT Press, Cambridge (2003)
Li, H.-F., Lee, S.-Y., Shan, M.-K.: An Efficient Algorithm for Mining Frequent Itemsets over the Entire History of Data Streams. In: Int’l Workshop on Knowledge Discovery in Data Streams (September 2004)
Charikar, M., Chen, K., Farach-Colton, M.: Finding Frequent Items in Data Streams. Theoretical Computer Science (2004)
Lin, C.-H., Chiu, D.-Y., Wu, Y.-H., Chen, A.L.P.: Mining Frequent Itemsets from Data Streams with a Time-Sensitive Sliding Window. In: SIAM Int’l Conf. on Data Mining (April 2005)
Yu, J.X., Chong, Z.H., Lu, H.J., Zhou, A.Y.: False positive or false negative: Mining frequent Itemsets from high speed transactional data streams. In: Nascimento, M.A., Kossmann, D. (eds.) VLDB 2004. Proc. of the 30th Int’l Conf. on Very Large Data Bases, pp. 204–215. Morgan Kaufmann Publishers, Toronto (2004)
Pasquier, N., Bastide, Y., Taouil, R., Lakhal, L.: Discovering frequent closed itemsets for association rules. In: Beeri, C., Buneman, P. (eds.) ICDT 1999. LNCS, vol. 1540, pp. 398–416. Springer, Heidelberg (1998)
Bayardo Jr., R.J.: Efficiently mining long patterns from databases. In: Haas, L.M., Tiwary, A. (eds.) Proceedings of the 1998 ACM SIGMOD International Conference on Management of Data. SIGMOD Record, vol. 27(2), pp. 85–93. ACM Press, New York (1998)
Han, J., Pei, J., Yin, Y., Mao, R.: Mining frequent patterns without candidate generation: A frequent-pattern tree approach. Data Mining and Knowledge Discovery (2003)
Zaki, M.J.: Scalable algorithms for association mining. IEEE Transactions on Knowledge and Data Engineering 12(3), 372–390 (2000)
Agrawal, R., Srikant, R.: Fast algorithms for mining association rules. In: Proc. of 20th Intl. Conf. on Very Large Data Bases, pp. 487–499 (1994)


Author information

Authors and Affiliations

Information School, Renmin University of China, Beijing, 100872, China
Hui Yang & Jun He
School of Economics and Management, Tsinghua University, Beijing, 100084, China
Hongyan Liu


Editor information

Editors and Affiliations

Computer Science Department, University of Calgary, Calgary, AB, Canada
Reda Alhajj
School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
Hong Gao
School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
Jianzhong Li
School of Information Technology and Electronic Engineering, The University of Queensland, Queensland, Australia
Xue Li
Department of Computing Science, University of Alberta, Edmonton, AB, Canada
Osmar R. Zaïane


About this paper

Cite this paper

Yang, H., Liu, H., He, J. (2007). DELAY: A Lazy Approach for Mining Frequent Patterns over High Speed Data Streams. In: Alhajj, R., Gao, H., Li, J., Li, X., Zaïane, O.R. (eds) Advanced Data Mining and Applications. ADMA 2007. Lecture Notes in Computer Science(), vol 4632. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73871-8_2


DOI: https://doi.org/10.1007/978-3-540-73871-8_2
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73870-1
Online ISBN: 978-3-540-73871-8
eBook Packages: Computer Science, Computer Science (R0)
