
Mining Anomalies in Subspaces of High-Dimensional Time Series for Financial Transactional Data

  • Conference paper
Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track (ECML PKDD 2021)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12978)

Abstract

Anomaly detection for high-dimensional time series is a difficult problem because of its vast search space. For general high-dimensional data, anomalies often manifest in subspaces rather than in the whole data space, and finding the exact solution (i.e., the anomalous subspaces) requires an \(O(2^N)\) combinatorial search, where N denotes the number of dimensions. In this paper, we present a novel and practical unsupervised anomaly retrieval system that retrieves anomalies from a large volume of high-dimensional transactional time series. Our system consists of two integrated modules: a subspace searching module and a time series discord mining module. For the subspace searching module, we propose two approximate searching methods that find quality anomalous subspaces orders of magnitude faster than the brute-force solution. For the discord mining module, we adopt a simple yet effective nearest-neighbor method. The proposed system is implemented and evaluated on both synthetic and real-world transactional data. The results indicate that our anomaly retrieval system can localize high-quality anomaly candidates in seconds, making it practical for use in a production environment.
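To make the \(O(2^N)\) cost concrete, the following is a minimal sketch of the brute-force baseline that the abstract contrasts against, paired with a textbook nearest-neighbor discord score. It is not the authors' implementation, and it does not show their two approximate search methods. The function names (`znorm`, `discord_score`, `brute_force_subspaces`), the z-normalized Euclidean distance, and the choice to build a subspace's series by summing the selected columns are all assumptions made for illustration.

```python
from itertools import combinations

import numpy as np


def znorm(x):
    """Z-normalize a 1-D array; a constant window maps to all zeros."""
    sd = x.std()
    return (x - x.mean()) / sd if sd > 0 else np.zeros_like(x)


def discord_score(series, m):
    """Return (score, start) of the top discord of length m.

    A subsequence's score is the distance to its nearest non-overlapping
    neighbor; the discord is the subsequence that maximizes this score.
    """
    series = np.asarray(series, dtype=float)
    n = len(series) - m + 1
    subs = np.array([znorm(series[i:i + m]) for i in range(n)])
    # Pairwise Euclidean distances between all subsequences: O(n^2 * m).
    dists = np.linalg.norm(subs[:, None, :] - subs[None, :, :], axis=2)
    # Exclusion zone: ignore trivial matches that overlap the query window.
    idx = np.arange(n)
    dists[np.abs(idx[:, None] - idx[None, :]) < m] = np.inf
    nn = dists.min(axis=1)
    nn[~np.isfinite(nn)] = -np.inf  # windows with no valid neighbor
    start = int(nn.argmax())
    return float(nn[start]), start


def brute_force_subspaces(data, m):
    """Score all 2^N - 1 non-empty column subsets of a (T x N) array.

    Assumption: a subspace's time series is the sum of its columns.
    Returns (score, subset, start) tuples sorted by descending score.
    """
    data = np.asarray(data, dtype=float)
    n_dims = data.shape[1]
    results = []
    for k in range(1, n_dims + 1):
        for subset in combinations(range(n_dims), k):
            agg = data[:, list(subset)].sum(axis=1)
            score, start = discord_score(agg, m)
            results.append((score, subset, start))
    results.sort(key=lambda r: r[0], reverse=True)
    return results


if __name__ == "__main__":
    t = np.arange(64)
    data = np.column_stack([
        np.sin(2 * np.pi * t / 8),
        np.cos(2 * np.pi * t / 8),
        np.sin(2 * np.pi * t / 4),
    ])
    data[30:34, 1] += 3.0  # inject an anomaly into dimension 1 only
    for score, subset, start in brute_force_subspaces(data, m=8)[:3]:
        print(subset, start, round(score, 3))
```

Each added dimension doubles the number of subsets the loop visits, which is why an exhaustive search of this kind becomes impractical as N grows and why an approximate search is attractive.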


Notes

  1. The time series is generated from the hourly withdrawn amount.

  2. Other inputs of A() (i.e., count(), \(w = 1\,\text{h}\), and \(h = 1\,\text{h}\)) are omitted for brevity.


Author information


Corresponding author

Correspondence to Jingzhu He.


Copyright information

© 2021 Springer Nature Switzerland AG

About this paper


Cite this paper

He, J., Yeh, CC.M., Wu, Y., Wang, L., Zhang, W. (2021). Mining Anomalies in Subspaces of High-Dimensional Time Series for Financial Transactional Data. In: Dong, Y., Kourtellis, N., Hammer, B., Lozano, J.A. (eds) Machine Learning and Knowledge Discovery in Databases. Applied Data Science Track. ECML PKDD 2021. Lecture Notes in Computer Science, vol 12978. Springer, Cham. https://doi.org/10.1007/978-3-030-86514-6_2


  • DOI: https://doi.org/10.1007/978-3-030-86514-6_2


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-86513-9

  • Online ISBN: 978-3-030-86514-6

  • eBook Packages: Computer Science, Computer Science (R0)
