DELISP: Efficient Discovery of Generalized Sequential Patterns by Delimited Pattern-Growth Technology

  • Conference paper
Advances in Knowledge Discovery and Data Mining (PAKDD 2002)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2336)

Abstract

An active research topic in data mining is the discovery of sequential patterns, which finds all frequent sub-sequences in a sequence database. Most studies specify no time constraints, such as maximum or minimum gaps between adjacent elements of a pattern, so the resulting patterns may be uninteresting. In addition, a data sequence is rigidly defined as containing a pattern only when each element of the pattern is contained in a distinct element of the sequence. This limitation may cause useful patterns to be missed in some applications, because the items of an element are sometimes spread across adjoining elements within a specified time period, or time window. We therefore propose a pattern-growth approach for mining generalized sequential patterns. Our approach reduces the size of sub-databases through bounded and windowed projection techniques. Bounded projections keep only sub-sequences that satisfy the time-gap constraints, and windowed projections save non-redundant sub-sequences that satisfy the sliding time-window constraint. Furthermore, the delimited growth technique directly generates constraint-satisfying patterns and speeds up the growing process. Empirical evaluations show that the proposed approach has good linear scalability and outperforms the well-known GSP algorithm in the discovery of generalized sequential patterns.
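
The three constraints named in the abstract (minimum gap, maximum gap, and sliding time window) can be illustrated with a brute-force containment check. This is a minimal sketch and not the DELISP algorithm itself: it assumes the usual GSP-style formulation, in which each pattern element may be matched by the union of consecutive data elements whose timestamps differ by at most the window size, the minimum gap is measured from the end of one match to the start of the next, and the maximum gap from the start of one match to the end of the next. The function name `contains`, its parameter names, and the `(timestamp, itemset)` representation are hypothetical choices made for the example.

```python
from typing import List, Optional, Set, Tuple

# A data sequence: (timestamp, itemset) pairs sorted by increasing timestamp.
DataSequence = List[Tuple[int, Set[str]]]


def contains(data: DataSequence,
             pattern: List[Set[str]],
             min_gap: float = 0,
             max_gap: float = float("inf"),
             window: float = 0) -> bool:
    """Brute-force test of whether `data` contains `pattern` under the
    assumed min-gap, max-gap and sliding-window constraints."""

    def search(i: int, prev_l: Optional[int], prev_u: Optional[int]) -> bool:
        if i == len(pattern):
            return True  # every pattern element has been matched
        start = 0 if prev_u is None else prev_u + 1
        for l in range(start, len(data)):
            t_l = data[l][0]
            # min-gap: this match must start more than min_gap
            # after the previous match ended.
            if prev_u is not None and t_l - data[prev_u][0] <= min_gap:
                continue
            merged: Set[str] = set()
            for u in range(l, len(data)):
                t_u = data[u][0]
                # sliding window: merged elements span at most `window`.
                if t_u - t_l > window:
                    break
                # max-gap: this match must end within max_gap
                # of the start of the previous match.
                if prev_l is not None and t_u - data[prev_l][0] > max_gap:
                    break
                merged |= data[u][1]
                if pattern[i] <= merged and search(i + 1, l, u):
                    return True
        return False

    return search(0, None, None)


# Items "a" and "b" arrive in adjoining elements, one time unit apart.
example = [(1, {"a"}), (2, {"b"}), (10, {"c"})]
target = [{"a", "b"}, {"c"}]
print(contains(example, target))                        # no window
print(contains(example, target, window=1))              # window merges a and b
print(contains(example, target, window=1, max_gap=5))   # c arrives too late
```

With no window, the element containing both "a" and "b" cannot be matched, because the two items never occur in the same data element; a window of one time unit lets the two adjoining elements be treated as one. The check enumerates every candidate match and is meant only to make the definitions concrete, whereas the abstract's projection techniques exist precisely to avoid this kind of exhaustive search.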

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Lin, MY., Lee, SY., Wang, SS. (2002). DELISP: Efficient Discovery of Generalized Sequential Patterns by Delimited Pattern-Growth Technology. In: Chen, MS., Yu, P.S., Liu, B. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2002. Lecture Notes in Computer Science, vol 2336. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-47887-6_19

  • DOI: https://doi.org/10.1007/3-540-47887-6_19

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-43704-8

  • Online ISBN: 978-3-540-47887-4

  • eBook Packages: Springer Book Archive
