DELAY: A Lazy Approach for Mining Frequent Patterns over High Speed Data Streams

  • Conference paper
Advanced Data Mining and Applications (ADMA 2007)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 4632)

Abstract

Frequent pattern mining has emerged as an important task in data stream mining, and a number of algorithms have been proposed for it. These algorithms usually work in two steps: the first calculates the frequency of itemsets as each transaction arrives in the data stream, and the second outputs the frequent itemsets on user request. Because each transaction in the stream gives rise to a large number of item combinations, the first step is time-consuming. For high-speed data streams with long transactions, there may therefore not be enough time to process every arriving transaction, which reduces mining accuracy. In this paper, we propose a new approach to this problem. It is a lazy approach that delays the calculation of the frequency of each itemset to the second step, so the first step only stores the necessary information for each transaction and thus avoids missing any transaction that arrives in the data stream. To improve accuracy, we propose monitoring the items that are most likely to be frequent. This allows many candidate itemsets to be pruned, which accounts for the good performance of DELAY, the algorithm we designed based on this method. A comprehensive experimental study shows that our algorithm achieves some improvements over the existing algorithms LossyCounting and FDPM, especially for long-transaction data streams.
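
The two-step idea in the abstract can be illustrated with a small sketch. This is not the DELAY algorithm itself, whose data structures are not given here; it is a toy under stated assumptions. The class `LazyItemsetMiner` and its methods are hypothetical names, and the sketch keeps every transaction in memory, which a real stream algorithm could not do. It only shows the division of labour: on arrival, the work is linear in the transaction length (a transaction with n items has 2^n - 1 non-empty subsets, so enumerating them eagerly is what becomes expensive), and itemset counting happens only when results are requested, restricted to items that are themselves frequent.

```python
from collections import Counter
from itertools import combinations


class LazyItemsetMiner:
    """Toy two-step miner: cheap bookkeeping on arrival, counting on demand.

    Hypothetical illustration only; it stores all transactions, which is an
    assumption made for brevity and is not viable for an unbounded stream.
    """

    def __init__(self):
        self.transactions = []        # step 1: transactions stored as-is
        self.item_counts = Counter()  # step 1: single-item counts only

    def add(self, transaction):
        """Step 1: linear work per arrival, no itemset enumeration."""
        items = frozenset(transaction)
        self.transactions.append(items)
        self.item_counts.update(items)

    def frequent_itemsets(self, min_support):
        """Step 2: count itemsets on request, using only frequent items."""
        threshold = min_support * len(self.transactions)
        frequent_items = {
            item for item, count in self.item_counts.items() if count >= threshold
        }
        counts = Counter()
        for transaction in self.transactions:
            # Prune infrequent items before enumerating subsets.
            kept = sorted(transaction & frequent_items)
            for size in range(1, len(kept) + 1):
                for itemset in combinations(kept, size):
                    counts[itemset] += 1
        return {s: c for s, c in counts.items() if c >= threshold}


miner = LazyItemsetMiner()
for t in [["a", "b", "c"], ["a", "b"], ["a", "c", "d"], ["b", "e"]]:
    miner.add(t)
result = miner.frequent_itemsets(min_support=0.5)
print(result)
```

Pruning by frequent items is safe because every subset of a frequent itemset is itself frequent, so an itemset containing an infrequent item can never reach the threshold. In the example, items `d` and `e` each occur once in four transactions and are dropped before any combination is generated.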

This work was supported in part by the National Natural Science Foundation of China under Grant Nos. 70471006, 70621061, 60496325 and 60573092.

Copyright information

© 2007 Springer Berlin Heidelberg

About this paper

Cite this paper

Yang, H., Liu, H., He, J. (2007). DELAY: A Lazy Approach for Mining Frequent Patterns over High Speed Data Streams. In: Alhajj, R., Gao, H., Li, J., Li, X., Zaïane, O.R. (eds) Advanced Data Mining and Applications. ADMA 2007. Lecture Notes in Computer Science, vol 4632. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73871-8_2

  • DOI: https://doi.org/10.1007/978-3-540-73871-8_2

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-73870-1

  • Online ISBN: 978-3-540-73871-8

  • eBook Packages: Computer Science, Computer Science (R0)
