Abstract
Anomaly detection for high-dimensional time series is a difficult problem because of its vast search space. In general high-dimensional data, anomalies often manifest in subspaces rather than in the whole data space, and finding the exact solution (i.e., the anomalous subspaces) requires an \(O(2^N)\) combinatorial search, where \(N\) denotes the number of dimensions. In this paper, we present a novel and practical unsupervised anomaly retrieval system that retrieves anomalies from a large volume of high-dimensional transactional time series. Our system consists of two integrated modules: a subspace searching module and a time series discord mining module. For the subspace searching module, we propose two approximate searching methods that find quality anomalous subspaces orders of magnitude faster than the brute-force solution. For the discord mining module, we adopt a simple yet effective nearest neighbor method. The proposed system is implemented and evaluated on both synthetic and real-world transactional data. The results indicate that our anomaly retrieval system can localize high-quality anomaly candidates in seconds, making it practical for use in a production environment.
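To make the two modules concrete, the following is a minimal sketch of the brute-force baseline that the abstract contrasts against, not the approximate methods proposed in the paper. It rests on several assumptions that do not come from the source: a subspace is modeled as a non-empty subset of dimensions, the subspace time series is the element-wise sum of its dimensions, and a discord is scored by the raw Euclidean distance to its nearest non-overlapping neighbor. The names `discord_score` and `brute_force_subspaces` are hypothetical.

```python
import math
from itertools import combinations


def discord_score(series, m):
    """Return (score, start) of the top discord of length m.

    The discord is the subsequence whose nearest non-overlapping
    neighbor is farthest away. Raw Euclidean distance is an
    assumption made for this sketch.
    """
    n_sub = len(series) - m + 1
    best_score, best_idx = -1.0, -1
    for i in range(n_sub):
        nearest = math.inf
        for j in range(n_sub):
            if abs(i - j) < m:  # skip overlapping (trivial) matches
                continue
            d = math.dist(series[i:i + m], series[j:j + m])
            nearest = min(nearest, d)
        if nearest != math.inf and nearest > best_score:
            best_score, best_idx = nearest, i
    return best_score, best_idx


def brute_force_subspaces(dims, m):
    """Score all 2^N - 1 non-empty subsets of the N dimensions.

    Summing the selected dimensions is a hypothetical aggregation
    chosen only to illustrate the combinatorial cost.
    """
    names = sorted(dims)
    results = []
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            agg = [sum(vals) for vals in zip(*(dims[n] for n in subset))]
            score, idx = discord_score(agg, m)
            results.append((score, subset, idx))
    results.sort(key=lambda r: r[0], reverse=True)
    return results


# Toy data: three dimensions, one of which carries an injected spike.
pattern = [1, 2, 3, 2]
dims = {"a": pattern * 6, "b": pattern * 6, "c": [1] * 24}
dims["b"][13] = 50  # injected anomaly

ranked = brute_force_subspaces(dims, m=4)
for score, subset, idx in ranked[:3]:
    print(f"subspace={subset} discord_start={idx} score={score:.2f}")
```

With N dimensions the outer loops visit 2^N - 1 candidate subspaces, and each candidate pays a quadratic nearest neighbor scan, which is why the exhaustive search becomes impractical as N grows and an approximate search is attractive.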
Notes
1. The time series is generated from the hourly withdrawn amount.
2. Other inputs of A() (i.e., count(), \(w = 1\,\text{h}\) and \(h = 1\,\text{h}\)) are omitted for brevity.
Copyright information
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
He, J., Yeh, C.C.M., Wu, Y., Wang, L., Zhang, W. (2021). Mining Anomalies in Subspaces of High-Dimensional Time Series for Financial Transactional Data. In: Dong, Y., Kourtellis, N., Hammer, B., Lozano, J.A. (eds) Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track. ECML PKDD 2021. Lecture Notes in Computer Science, vol 12978. Springer, Cham. https://doi.org/10.1007/978-3-030-86514-6_2
DOI: https://doi.org/10.1007/978-3-030-86514-6_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-86513-9
Online ISBN: 978-3-030-86514-6
eBook Packages: Computer Science, Computer Science (R0)