Time series anomaly detection via clustering-based representation

  • Original Paper
  • Evolving Systems

Abstract

Time series anomaly detection is an important field of data science. Anomalies can be detected with statistical, distance-based, clustering-based, or density-based approaches. Distance-based methods are generally straightforward, but their effectiveness depends on how well they handle the distribution of data points. To address this challenge, a preprocessing step is used to convert the underlying time series into a more useful format. In this paper, a novel clustering-based representation of time series is proposed. This representation is then used to compute anomaly scores and detect anomalies. Experimental studies on synthetic and real datasets show that the proposed method outperforms other methods by up to 75% on five standard performance metrics.
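To make the "representation first, distance-based scoring second" idea concrete, the following is a minimal sketch and not the clustering-based representation proposed in the paper. It assumes Piecewise Aggregate Approximation (PAA) as a stand-in representation and a simple k-nearest-neighbour distance as the anomaly score. The function names `paa` and `knn_scores`, the toy sine signal, and the injected spike are all hypothetical choices made for illustration.

```python
import math


def paa(series, n_segments):
    """Piecewise Aggregate Approximation: mean of each equal-width segment."""
    n = len(series)
    if not 1 <= n_segments <= n:
        raise ValueError("n_segments must be between 1 and len(series)")
    means = []
    for i in range(n_segments):
        start = i * n // n_segments
        end = (i + 1) * n // n_segments
        chunk = series[start:end]
        means.append(sum(chunk) / len(chunk))
    return means


def knn_scores(values, k=3):
    """Score each value by its mean absolute distance to its k nearest peers."""
    if not 1 <= k < len(values):
        raise ValueError("k must be between 1 and len(values) - 1")
    scores = []
    for i, v in enumerate(values):
        dists = sorted(abs(v - w) for j, w in enumerate(values) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


# Toy data (assumption): a sine wave with period 20 and a short injected spike.
series = [math.sin(2 * math.pi * t / 20) for t in range(200)]
for t in range(105, 110):
    series[t] += 5.0

segments = paa(series, n_segments=10)   # one mean per full sine period
scores = knn_scores(segments, k=3)      # higher score = more anomalous
most_anomalous = scores.index(max(scores))
print("most anomalous segment:", most_anomalous)
```

Each segment here spans exactly one sine period, so normal segments average to roughly zero and the segment containing the spike stands apart. With other segment widths the segment means would vary with phase, which is one reason the choice of representation matters for distance-based scoring.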

Data availability

The Yahoo S5 datasets analyzed during the current study are available at https://webscope.sandbox.yahoo.com/catalog.php?datatype=s&did=70, and the Synthetic datasets were generated with the https://github.com/KDD-OpenSource/agots repository. The Sin dataset is generated by Eq. (8).

Notes

  1. For a more formal definition see Definition 1 in Sect. 2.

  2. Piecewise Aggregate Approximation.

  3. Symbolic Aggregate approXimation.

  4. Discrete Fourier Transform.

  5. Discrete Wavelet Transform.

  6. Singular Value Decomposition.

  7. Principal Component Analysis.

  8. Gaussian Mixture Models.

  9. Stochastic Outlier Selection.

  10. Clustering-Based Local Outlier Factor.

  11. Isolation Forest.

  12. It should be noted that the clustering-based representation mechanism is completely different from the clustering-based anomaly detection approaches discussed in Sect. 3.

  13. Optimal Sequence Clustering algorithm.

  14. https://webscope.sandbox.yahoo.com/catalog.php?datatype=s&did=70.

  15. https://github.com/KDD-OpenSource/agots.

  16. https://github.com/ir1979/CUBOID.

  17. https://pyts.readthedocs.io.

  18. https://pycaret.readthedocs.io.

Author information

Corresponding author

Correspondence to Reza Mortazavi.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix: Tables of Scenario 2

The numerical results in Tables 5, 6, 7, and 8 are reported with 95% confidence intervals.

Table 5 Scenario 2—Performance indices of different methods for Sin dataset
Table 6 Scenario 2—Performance indices of different methods for Yahoo dataset
Table 7 Scenario 2—Performance indices of different methods for Synthetic dataset
Table 8 Scenario 2—F-score indices for all datasets
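As a rough illustration of how an F-score and a 95% confidence interval over repeated runs can be computed, consider the sketch below. The source does not state how the intervals in these tables were obtained, so the normal approximation (z = 1.96) used here is an assumption. The helper names `f_score` and `mean_ci95` and the per-run counts are hypothetical.

```python
import math
import statistics


def f_score(tp, fp, fn, beta=1.0):
    """F-beta score from true positive, false positive, and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def mean_ci95(samples):
    """Mean and half-width of a 95% interval (normal approximation, assumed)."""
    mean = statistics.mean(samples)
    half_width = 1.96 * statistics.stdev(samples) / math.sqrt(len(samples))
    return mean, half_width


# Made-up (tp, fp, fn) counts for three runs, for illustration only.
runs = [(8, 2, 2), (7, 3, 3), (9, 1, 1)]
f_values = [f_score(tp, fp, fn) for tp, fp, fn in runs]
mean, half_width = mean_ci95(f_values)
print(f"F-score = {mean:.3f} ± {half_width:.3f}")
```

For a small number of runs, a Student t quantile would give a wider and more appropriate interval than the fixed 1.96 factor.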

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Enayati, E., Mortazavi, R., Basiri, A. et al. Time series anomaly detection via clustering-based representation. Evolving Systems 15, 1115–1136 (2024). https://doi.org/10.1007/s12530-023-09543-8

  • DOI: https://doi.org/10.1007/s12530-023-09543-8
