Abstract
An active research topic in data mining is the discovery of sequential patterns, that is, finding all frequent sub-sequences in a sequence database. Most studies specify no time constraints, such as maximum or minimum gaps between adjacent elements of a pattern, so the resulting patterns may be uninteresting. In addition, a data sequence is rigidly defined to contain a pattern only when each element of the pattern is contained in a distinct element of the sequence. This limitation may miss useful patterns in some applications, because the items of an element are sometimes spread across adjoining elements within a specified time period, or time window. We therefore propose a pattern-growth approach for mining generalized sequential patterns. Our approach reduces the size of sub-databases through bounded and windowed projection techniques. Bounded projections keep only sub-sequences that satisfy the time-gap constraints, and windowed projections save non-redundant sub-sequences that satisfy the sliding time-window constraint. Furthermore, the delimited growth technique directly generates constraint-satisfying patterns and speeds up the growing process. Empirical evaluations show that the proposed approach exhibits good linear scalability and outperforms the well-known GSP algorithm in the discovery of generalized sequential patterns.
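To make the three time constraints concrete, the following is a minimal sketch of a brute-force containment test. It is not the DELISP algorithm, and the abstract does not give the exact definitions. The sketch assumes a commonly used formulation: each pattern element must be covered by the union of consecutive data elements whose time span is at most the window, the time from the end of one matched group to the start of the next must exceed the minimum gap, and the time from the start of one matched group to the end of the next must not exceed the maximum gap. The names `contains`, `min_gap`, `max_gap`, and `window` are hypothetical.

```python
from typing import FrozenSet, List, Sequence, Tuple

# A data sequence is a time-ordered list of (timestamp, itemset) pairs.
Element = Tuple[float, FrozenSet[str]]


def contains(
    data: Sequence[Element],
    pattern: List[FrozenSet[str]],
    min_gap: float = 0,
    max_gap: float = float("inf"),
    window: float = 0,
) -> bool:
    """Brute-force check (assumed definition, see lead-in) of whether
    `data` contains `pattern` under the given time constraints.
    `data` must be sorted by ascending timestamp."""
    n = len(data)

    def search(p_idx: int, start: int, prev_l: float, prev_u: float) -> bool:
        if p_idx == len(pattern):
            return True  # every pattern element has been matched
        target = pattern[p_idx]
        for l in range(start, n):
            l_time = data[l][0]
            # Minimum gap: this group must start strictly more than
            # min_gap after the previous group ended.
            if p_idx > 0 and l_time - prev_u <= min_gap:
                continue
            covered = set()
            for u in range(l, n):
                u_time = data[u][0]
                # Sliding window: the group may span at most `window`.
                if u_time - l_time > window:
                    break
                # Maximum gap: this group must end within max_gap of
                # the start of the previous group.
                if p_idx > 0 and u_time - prev_l > max_gap:
                    break
                covered |= data[u][1]
                if target <= covered and search(p_idx + 1, u + 1, l_time, u_time):
                    return True
        return False

    return search(0, 0, 0.0, 0.0)


# Illustrative data only: items a and b occur one time unit apart.
example = [(1, frozenset("a")), (2, frozenset("b")), (10, frozenset("c"))]
pat = [frozenset("ab"), frozenset("c")]
print(contains(example, pat))                      # no window
print(contains(example, pat, window=1))            # a and b merged by window
print(contains(example, pat, window=1, max_gap=5)) # c arrives too late
```

With `window=0` and no gap limits the test reduces to the rigid definition, in which each pattern element must sit inside a single data element. The search is exponential in the worst case and is meant only to illustrate the constraints, not to suggest an efficient mining strategy.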
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Lin, MY., Lee, SY., Wang, SS. (2002). DELISP: Efficient Discovery of Generalized Sequential Patterns by Delimited Pattern-Growth Technology. In: Chen, MS., Yu, P.S., Liu, B. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2002. Lecture Notes in Computer Science, vol 2336. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-47887-6_19
DOI: https://doi.org/10.1007/3-540-47887-6_19
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43704-8
Online ISBN: 978-3-540-47887-4
eBook Packages: Springer Book Archive