
Finding the most unusual time series subsequence: algorithms and applications

Published: 01 January 2007

Abstract

In this work we introduce the new problem of finding time series discords. Time series discords are subsequences of a longer time series that are maximally different from all the rest of the time series subsequences. They thus capture the sense of the most unusual subsequence within a time series. While discords have many uses for data mining, they are particularly attractive as anomaly detectors because they require only one intuitive parameter (the length of the subsequence), unlike most anomaly detection algorithms, which typically require many parameters. While the brute force algorithm to discover time series discords is quadratic in the length of the time series, we show a simple algorithm that is three to four orders of magnitude faster than brute force, while guaranteed to produce identical results. We evaluate our work with a comprehensive set of experiments on diverse data sources including electrocardiograms, space telemetry, respiration physiology, anthropological and video datasets.
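
As a rough illustration of the quadratic brute-force search mentioned in the abstract, the following minimal sketch compares every subsequence against every other one and keeps the one whose nearest neighbour is farthest away. It is not the authors' code. The Euclidean distance, the rule that overlapping subsequences are skipped, and the name `brute_force_discord` are all assumptions made for this example; the abstract specifies none of them.

```python
import math


def brute_force_discord(series, n):
    """Return (start, distance) for the length-n subsequence whose nearest
    non-overlapping neighbour is farthest away.

    Assumptions (not taken from the abstract): plain Euclidean distance,
    and candidates that overlap the query are ignored.
    """
    m = len(series) - n + 1  # number of candidate subsequences
    best_loc, best_dist = None, -1.0
    for i in range(m):
        nearest = math.inf
        for j in range(m):
            if abs(i - j) < n:  # assumed rule: skip overlapping subsequences
                continue
            d = math.sqrt(
                sum((series[i + k] - series[j + k]) ** 2 for k in range(n))
            )
            nearest = min(nearest, d)
        # Keep the subsequence with the largest nearest-neighbour distance.
        if nearest != math.inf and nearest > best_dist:
            best_loc, best_dist = i, nearest
    return best_loc, best_dist


# Toy data: a periodic 0/1 signal with a single spike at index 9.
toy = [0, 1] * 10
toy[9] = 5
loc, dist = brute_force_discord(toy, 4)
print(loc, dist)
```

The two nested loops over all subsequence start positions are what make this search quadratic in the length of the series; the only parameter is the subsequence length `n`, in line with the single-parameter claim above.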



Published In

Knowledge and Information Systems  Volume 11, Issue 1
January 2007
128 pages

Publisher

Springer-Verlag

Berlin, Heidelberg


Author Tags

  1. Anomaly detection
  2. Clustering
  3. Time series data mining

Qualifiers

  • Article
