research-article

Generic and Scalable Framework for Automated Time-series Anomaly Detection

Published: 10 August 2015

Abstract

This paper introduces a generic and scalable framework for automated anomaly detection on large-scale time-series data. Early detection of anomalies plays a key role in maintaining the consistency of a person's data and in protecting corporations against malicious attackers. Current state-of-the-art anomaly detection approaches suffer from limited scalability, use-case restrictions, difficulty of use, and a large number of false positives. Our system at Yahoo, EGADS, uses a collection of anomaly detection and forecasting models with an anomaly filtering layer for accurate and scalable anomaly detection on time series. We compare our approach against other anomaly detection systems on real and synthetic data with varying time-series characteristics. We found that our framework allows for a 50-60% improvement in precision and recall for a variety of use cases. Both the data and the framework are being open-sourced. The open-sourcing of the data, in particular, represents a first-of-its-kind effort to establish a standard benchmark for anomaly detection.
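
The abstract describes a pipeline of forecasting models, anomaly detection, and a filtering layer. The following is only a minimal sketch of that general idea and not the EGADS implementation or its API: the moving-average forecaster, the median-based residual threshold, and the rule-based filter (as well as every function and parameter name) are assumptions chosen for illustration.

```python
from statistics import mean, median


def moving_average_forecast(series, window=3):
    """Hypothetical forecasting model: predict each point as the mean of
    the previous `window` observations (the observed value is used while
    there is not yet enough history)."""
    return [
        series[i] if i < window else mean(series[i - window:i])
        for i in range(len(series))
    ]


def candidate_anomalies(series, window=3, k=5.0):
    """Flag indices whose forecast residual exceeds k times the median
    absolute residual (an assumed, simple scoring rule)."""
    forecasts = moving_average_forecast(series, window)
    residuals = [abs(x - f) for x, f in zip(series, forecasts)]
    scale = max(median(residuals), 1e-9)  # avoid a zero threshold
    return [i for i, r in enumerate(residuals) if r > k * scale]


def filter_anomalies(series, candidates, min_change=0.5):
    """Assumed stand-in for a filtering layer: keep only candidates whose
    value differs from the series median by more than `min_change`
    (as a fraction of that median)."""
    baseline = median(series)
    floor = max(abs(baseline), 1e-9)
    return [i for i in candidates
            if abs(series[i] - baseline) > min_change * floor]


def detect(series, window=3, k=5.0, min_change=0.5):
    """Run the three illustrative stages: forecast, score, filter."""
    return filter_anomalies(series, candidate_anomalies(series, window, k),
                            min_change)


if __name__ == "__main__":
    data = [10, 10, 11, 10, 10, 50, 10, 11, 10, 10]
    print("candidates:", candidate_anomalies(data))
    print("after filtering:", detect(data))
```

In this toy series the spike also contaminates the next few moving-average forecasts, so the scoring stage alone flags the points that follow it; the filter stage discards those, which illustrates one way a filtering layer can reduce false positives.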

Supplementary Material

MP4 File (p1939.mp4)

Cited By

  • (2024) Deep-SDM: A Unified Computational Framework for Sequential Data Modeling Using Deep Learning Models. Software, 3:1 (47-61). DOI: 10.3390/software3010003. Online publication date: 28-Feb-2024
  • (2024) Predicting Machine Failures from Multivariate Time Series: An Industrial Case Study. Machines, 12:6 (357). DOI: 10.3390/machines12060357. Online publication date: 22-May-2024
  • (2024) Unsupervised Anomaly Detection of Intermittent Demand for Spare Parts Based on Dual-Tailed Probability. Electronics, 13:1 (195). DOI: 10.3390/electronics13010195. Online publication date: 2-Jan-2024

    Published In

    KDD '15: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
    August 2015
    2378 pages
    ISBN:9781450336642
    DOI:10.1145/2783258

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. anomaly detection
    2. scalable anomaly detection
    3. time-series

    Qualifiers

    • Research-article

    Conference

    KDD '15

    Acceptance Rates

    KDD '15 Paper Acceptance Rate 160 of 819 submissions, 20%;
    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

    Article Metrics

    • Downloads (Last 12 months): 468
    • Downloads (Last 6 weeks): 34
    Reflects downloads up to 30 Aug 2024

    Cited By

    • (2024) Pupillometry and autonomic nervous system responses to cognitive load and false feedback: an unsupervised machine learning approach. Frontiers in Neuroscience, 18. DOI: 10.3389/fnins.2024.1445697. Online publication date: 30-Aug-2024
    • (2024) Assessment of the technical state of mining machinery and devices with the use of diagnostic methods. Production Engineering Archives, 30:2 (266-272). DOI: 10.30657/pea.2024.30.26. Online publication date: 26-May-2024
    • (2024) Univariate Time Series Anomaly Detection Based on Hierarchical Attention Network. Tsinghua Science and Technology, 29:4 (1181-1193). DOI: 10.26599/TST.2023.9010073. Online publication date: Aug-2024
    • (2024) An Experimental Evaluation of Anomaly Detection in Time Series. Proceedings of the VLDB Endowment, 17:3 (483-496). DOI: 10.14778/3632093.3632110. Online publication date: 20-Jan-2024
    • (2024) Graph Time-series Modeling in Deep Learning: A Survey. ACM Transactions on Knowledge Discovery from Data, 18:5 (1-35). DOI: 10.1145/3638534. Online publication date: 28-Feb-2024
    • (2024) Pre-trained KPI Anomaly Detection Model Through Disentangled Transformer. Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (6190-6201). DOI: 10.1145/3637528.3671522. Online publication date: 25-Aug-2024
    • (2024) Revisiting VAE for Unsupervised Time Series Anomaly Detection: A Frequency Perspective. Proceedings of the ACM Web Conference 2024 (3096-3105). DOI: 10.1145/3589334.3645710. Online publication date: 13-May-2024
