Article

Probabilistic discovery of time series motifs

Published: 24 August 2003

Abstract

Several important time series data mining problems reduce to the core task of finding approximately repeated subsequences in a longer time series. In an earlier work, we formalized the idea of approximately repeated subsequences by introducing the notion of time series motifs. Two limitations of this work were the poor scalability of the motif discovery algorithm, and the inability to discover motifs in the presence of noise. Here we address these limitations by introducing a novel algorithm inspired by recent advances in the problem of pattern discovery in biosequences. Our algorithm is probabilistic in nature, but as we show empirically and theoretically, it can find time series motifs with very high probability even in the presence of noise or "don't care" symbols. Not only is the algorithm fast, but it is an anytime algorithm, producing likely candidate motifs almost immediately, and gradually improving the quality of results over time.
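
The abstract does not spell out the mechanism, so the following is only a rough sketch of how a probabilistic, anytime motif search of this general kind might look. It assumes (this is not stated in the abstract) that each sliding window is discretized into a short symbolic word and that candidate motifs are found by random projection, as in biosequence motif finders: a random subset of word positions is hashed on each iteration, and pairs of windows that land in the same bucket accumulate collision counts. The names `to_word`, `candidate_motifs`, `BREAKPOINTS`, and all parameters are hypothetical and chosen for illustration; this is not the authors' implementation.

```python
import random
from collections import defaultdict
from itertools import combinations
from statistics import mean, pstdev

# Assumed breakpoints: split a standard normal into 4 equiprobable regions (symbols a-d).
BREAKPOINTS = (-0.6745, 0.0, 0.6745)


def to_word(window, word_len):
    """z-normalise a window, average it into word_len segments, map each to a symbol."""
    mu, sigma = mean(window), pstdev(window)
    sigma = sigma or 1.0  # flat window: avoid division by zero
    z = [(x - mu) / sigma for x in window]
    seg = len(z) // word_len  # any leftover tail points are ignored in this sketch
    word = []
    for i in range(word_len):
        avg = mean(z[i * seg:(i + 1) * seg])
        word.append("abcd"[sum(avg > b for b in BREAKPOINTS)])
    return "".join(word)


def candidate_motifs(series, window, word_len, mask_size, iterations, seed=0):
    """Yield the current best ((i, j), collision_count) after every iteration."""
    rng = random.Random(seed)
    words = [to_word(series[i:i + window], word_len)
             for i in range(len(series) - window + 1)]
    collisions = defaultdict(int)
    for _ in range(iterations):
        # Random projection: keep only mask_size randomly chosen word positions.
        mask = sorted(rng.sample(range(word_len), mask_size))
        buckets = defaultdict(list)
        for pos, w in enumerate(words):
            buckets["".join(w[j] for j in mask)].append(pos)
        for members in buckets.values():
            for i, j in combinations(members, 2):
                if j - i >= window:  # skip overlapping (trivial) matches
                    collisions[(i, j)] += 1
        if collisions:
            # Anytime behaviour: a usable candidate is available after each pass.
            yield max(collisions.items(), key=lambda kv: kv[1])
```

Because `candidate_motifs` is a generator, a caller can stop consuming it at any point and keep the most recent candidate, which mirrors the "anytime" property described above. Positions masked out in a given iteration behave like "don't care" symbols: two windows whose words differ only there still collide. In a fuller version the top-scoring pairs would be verified against the raw series with a real distance measure.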


    Published In

    KDD '03: Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
    August 2003
    736 pages
    ISBN:1581137370
    DOI:10.1145/956750

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. data mining
    2. motifs
    3. randomized algorithms
    4. time series

    Qualifiers

    • Article

    Conference

    KDD03

    Acceptance Rates

    KDD '03 Paper Acceptance Rate 46 of 298 submissions, 15%;
    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%
