
Series2Graph: graph-based subsequence anomaly detection for time series

Published: 01 July 2020

Abstract

Subsequence anomaly detection in long sequences is an important problem with applications in a wide range of domains. However, the approaches proposed so far in the literature have severe limitations: they either require prior domain knowledge that is used to design the anomaly discovery algorithms, or become cumbersome and expensive to use in situations with recurrent anomalies of the same type. In this work, we address these problems and propose an unsupervised method suitable for domain-agnostic subsequence anomaly detection. Our method, Series2Graph, is based on a graph representation of a novel low-dimensionality embedding of subsequences. Series2Graph needs neither labeled instances (like supervised techniques) nor anomaly-free data (like zero-positive learning techniques), and identifies anomalies of varying lengths. The experimental results, on the largest set of synthetic and real datasets used to date, demonstrate that the proposed approach correctly identifies single and recurrent anomalies without any prior knowledge of their characteristics, outperforming by a large margin several competing approaches in accuracy, while being up to orders of magnitude faster.
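
To make the general idea of scoring subsequences through a graph more concrete, the following is a minimal illustrative sketch and not the Series2Graph algorithm itself. The abstract does not specify the embedding or the scoring rule, so everything here is an assumption: the embedding is a crude stand-in (binned mean and binned net change of each window, for data assumed to lie in [-1, 1]), nodes are the resulting symbols, edge weights count transitions between consecutive windows, and a query window is scored by the inverse of the mean weight of the edges it traverses. All function names and parameters are hypothetical.

```python
import math
from collections import Counter


def embed(window, bins=4, lo=-1.0, hi=1.0):
    """Map a window to a coarse symbol: (binned mean, binned net change).

    Assumption: values lie roughly in [lo, hi]. This is a stand-in
    embedding, not the one used in the paper.
    """
    def to_bin(value, low, high):
        # Clamp into [low, high], then quantize into `bins` equal-width cells.
        ratio = (min(max(value, low), high) - low) / (high - low)
        return min(int(ratio * bins), bins - 1)

    mean = sum(window) / len(window)
    change = window[-1] - window[0]
    return to_bin(mean, lo, hi), to_bin(change, 2 * lo, 2 * hi)


def build_graph(series, length):
    """Return the node visited by each window and the transition counts."""
    nodes = [embed(series[i:i + length]) for i in range(len(series) - length + 1)]
    edges = Counter(zip(nodes, nodes[1:]))  # edge weight = number of traversals
    return nodes, edges


def anomaly_scores(series, length, query_length):
    """Score each query window by how rarely its edges are traversed."""
    span = query_length - length  # edges covered by one query window
    if span < 1:
        raise ValueError("query_length must exceed length")
    nodes, edges = build_graph(series, length)
    scores = []
    for start in range(len(nodes) - span):
        path = zip(nodes[start:start + span], nodes[start + 1:start + span + 1])
        weights = [edges[edge] for edge in path]
        scores.append(len(weights) / sum(weights))  # inverse of mean weight
    return scores


# Toy data: a sine wave (period 20) with a flat segment injected at 200..214.
series = [math.sin(2 * math.pi * i / 20) for i in range(400)]
for i in range(200, 215):
    series[i] = 0.0

scores = anomaly_scores(series, length=10, query_length=20)
best = max(range(len(scores)), key=scores.__getitem__)
print("highest-scoring window starts at index", best)
```

On this toy input the highest-scoring window overlaps the injected flat segment, because the windows inside it map to nodes and edges that the regular sine pattern rarely or never visits. No labels and no anomaly-free training data are involved, which mirrors the unsupervised setting described in the abstract.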



Published In

Proceedings of the VLDB Endowment, Volume 13, Issue 12
August 2020
1710 pages
ISSN: 2150-8097

Publisher

VLDB Endowment


Author Tags

  1. data series
  2. outliers
  3. subsequence anomalies
  4. time series

Qualifiers

  • Research-article


