WinDrift: Early Detection of Concept Drift Using Corresponding and Hierarchical Time Windows

  • Conference paper
Data Mining (AusDM 2022)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 1741)

Abstract

In today’s interconnected society, large volumes of time-series data are collected from real-time applications and used for data-driven decision-making. Over time, the statistical characteristics of this data may change, a phenomenon known as concept drift. Concept drift can be detected using a concept drift detector. An ideal detector should detect drift both accurately and efficiently, but these properties are not easy to achieve together. To address this gap, this research presents a novel drift detection method, WinDrift (WD). The foundation of WD is the early detection of concept drift using corresponding and hierarchical time windows. To assess drift, the proposed method uses two-sample hypothesis tests based on the Kolmogorov-Smirnov (KS) statistical distance. These tests are carried out on sliding windows configured on multiple hierarchical levels; on each level, drift is assessed by comparing the statistical distance between two windows that cover corresponding time periods. To evaluate the efficacy of WD, 4 real datasets and 10 reproducible synthetic datasets are used. A comparison with 5 existing state-of-the-art drift detection methods demonstrates that WinDrift detects drift efficiently, with minimal false alarms and efficient use of computational resources. The synthetic datasets and the WD code designed for this work have been made publicly available at https://github.com/naureenaqvi/windrift.
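
The windowing idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (their code is at the repository linked above); it only assumes that the stream is split into consecutive cycles of equal length, that each hierarchical level is a window size, and that a window is compared with the window at the same position in the previous cycle. The function names, the window sizes, the significance level `alpha`, and the large-sample critical value used for the KS test are all assumptions made for illustration.

```python
import math
import random
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample KS distance: largest gap between the two empirical CDFs."""
    sa, sb = sorted(a), sorted(b)
    d = 0.0
    for x in sa + sb:
        fa = bisect_right(sa, x) / len(sa)  # empirical CDF of a at x
        fb = bisect_right(sb, x) / len(sb)  # empirical CDF of b at x
        d = max(d, abs(fa - fb))
    return d


def ks_drift(a, b, alpha=0.05):
    """Flag drift when D exceeds the large-sample critical value (assumed test rule)."""
    n, m = len(a), len(b)
    critical = math.sqrt(-0.5 * math.log(alpha / 2)) * math.sqrt((n + m) / (n * m))
    return ks_statistic(a, b) > critical


def hierarchical_drift(previous_cycle, current_cycle, window_sizes, alpha=0.05):
    """For each level (window size), test windows at the same positions in two cycles.

    Returns {window_size: [drift flag per window position]}.
    """
    report = {}
    for size in window_sizes:
        flags = []
        for start in range(0, len(current_cycle) - size + 1, size):
            old = previous_cycle[start:start + size]
            new = current_cycle[start:start + size]
            flags.append(ks_drift(old, new, alpha))
        report[size] = flags
    return report


# Hypothetical data: the second half of the current cycle has a shifted mean.
rng = random.Random(0)
previous_cycle = [rng.gauss(0, 1) for _ in range(400)]
current_cycle = ([rng.gauss(0, 1) for _ in range(200)]
                 + [rng.gauss(3, 1) for _ in range(200)])

report = hierarchical_drift(previous_cycle, current_cycle, window_sizes=[100, 200, 400])
for size, flags in report.items():
    print(size, flags)
```

In this sketch the smallest windows localise the change to the later positions of the cycle, while the largest window confirms it at the level of the whole cycle, which is the intuition behind testing corresponding periods on several levels.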

Notes

  1. Intel(R) Xeon(R) CPU i7-9750H 2.60 GHz. System type: 64-bit OS, x64-based processor.

Author information

Corresponding author

Correspondence to Naureen Naqvi.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Naqvi, N., Rehman, S.U., Islam, M.Z. (2022). WinDrift: Early Detection of Concept Drift Using Corresponding and Hierarchical Time Windows. In: Park, L.A.F., et al. Data Mining. AusDM 2022. Communications in Computer and Information Science, vol 1741. Springer, Singapore. https://doi.org/10.1007/978-981-19-8746-5_6

  • DOI: https://doi.org/10.1007/978-981-19-8746-5_6

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-19-8745-8

  • Online ISBN: 978-981-19-8746-5

  • eBook Packages: Computer Science, Computer Science (R0)
