
Combining Diverse Meta-Features to Accurately Identify Recurring Concept Drift in Data Streams

Published: 12 May 2023

Abstract

Learning from streaming data is challenging because the distribution of incoming data may change over time, a phenomenon known as concept drift. The predictive patterns, or experience, learned under one distribution may become irrelevant as conditions change, but may become relevant again when those conditions recur. Adaptive learning methods adapt a classifier to concept drift by identifying which distribution, or concept, is currently present in order to determine which experience is relevant. Identifying a concept requires some representation to be stored for comparison, and the quality of that representation is key to accurate identification. Existing concept representations are based on meta-features, efficient univariate summaries of a concept. However, no single meta-feature can fully represent a concept, leading to severe accuracy loss when an existing representation cannot describe the concept drift that occurs. To avoid these failure cases, we propose the first general framework for combining a diverse range of meta-features into a single representation. We solve two main challenges. First, we present a method of efficiently computing, storing, and querying an arbitrary set of meta-features as a single representation, and show that a combination of meta-features can avoid failure cases seen with existing methods. Second, we present the first method for dynamically learning which meta-features distinguish concepts in any given dataset, significantly improving performance. Our proposed approach enables state-of-the-art feature selection methods, such as mutual information, to be applied to concept representation meta-features for the first time. We investigate tradeoffs between memory budget and classification performance, observing accuracy increases of up to 16% by dynamically weighting the contribution of each meta-feature.
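
The general idea in the abstract can be illustrated with a minimal sketch. It is not the authors' framework: the three meta-features (mean, standard deviation, lag-1 autocorrelation), the Fisher-style spread ratio used for weighting, the weighted Euclidean distance, and every function name below are assumptions chosen only to keep the example short. It shows a window being summarised as one vector of meta-features, weights being learned from how well each meta-feature separates stored concepts, and a new window being matched to the nearest stored concept.

```python
import math
from statistics import mean, pstdev


def meta_features(window):
    """Summarise a window of numbers as [mean, std, lag-1 autocorrelation]."""
    m = mean(window)
    s = pstdev(window)
    if s == 0:
        acf1 = 0.0  # autocorrelation is undefined without variance; use 0.0
    else:
        pairs = zip(window, window[1:])
        acf1 = sum((a - m) * (b - m) for a, b in pairs) / (len(window) * s * s)
    return [m, s, acf1]


def learn_weights(samples, eps=1e-6):
    """Weight each meta-feature by between-concept spread over within-concept spread.

    samples maps a concept id to a list of fingerprint vectors.
    This ratio is an illustrative stand-in, not the paper's weighting scheme.
    """
    groups = list(samples.values())
    concept_means = [[mean(col) for col in zip(*fps)] for fps in groups]
    weights = []
    for j, col in enumerate(zip(*concept_means)):
        between = pstdev(col)
        within = mean([pstdev([fp[j] for fp in fps]) for fps in groups])
        weights.append(between / (within + eps))
    total = sum(weights)
    if total == 0:  # nothing separates the concepts: fall back to uniform weights
        return [1.0 / len(weights)] * len(weights)
    return [w / total for w in weights]


def distance(u, v, weights):
    """Weighted Euclidean distance between two fingerprints."""
    return math.sqrt(sum(w * (a - b) ** 2 for a, b, w in zip(u, v, weights)))


def identify(window, stored, weights):
    """Return the id of the stored concept closest to the window's fingerprint."""
    fp = meta_features(window)
    return min(stored, key=lambda cid: distance(fp, stored[cid], weights))


# Toy data: concept A oscillates around 0.5, concept B around 5.5.
samples = {
    "A": [meta_features([0, 1, 0, 1, 0, 1, 0, 1]),
          meta_features([1, 0, 1, 0, 1, 0, 1, 0])],
    "B": [meta_features([5, 5, 6, 6, 5, 5, 6, 6]),
          meta_features([6, 6, 5, 5, 6, 6, 5, 5])],
}
# One stored fingerprint per concept: the mean of its sample fingerprints.
stored = {cid: [mean(col) for col in zip(*fps)] for cid, fps in samples.items()}
weights = learn_weights(samples)

print(identify([0, 1, 1, 0, 1, 0, 0, 1], stored, weights))
print(identify([5, 6, 6, 5, 5, 6, 5, 6], stored, weights))
```

In this toy data the standard deviation is identical for both concepts, so it receives zero weight, while the mean and autocorrelation carry the comparison. A single meta-feature that happens not to differ between concepts cannot distinguish them, which is the failure case the abstract describes.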

Published In

ACM Transactions on Knowledge Discovery from Data, Volume 17, Issue 8
September 2023
348 pages
ISSN: 1556-4681
EISSN: 1556-472X
DOI: 10.1145/3596449

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 12 May 2023
Online AM: 07 March 2023
Accepted: 27 February 2023
Revised: 30 December 2022
Received: 17 June 2022
Published in TKDD Volume 17, Issue 8

Author Tags

  1. Data Streaming
  2. adaptive learning
  3. meta-features

Qualifiers

  • Research-article

Funding Sources

  • Marsden Fund Council from New Zealand Government funding

Cited By

  • (2024) Detecting IoT Anomalies Using Fuzzy Subspace Clustering Algorithms. Applied Sciences 14:3 (1264). DOI: 10.3390/app14031264. Online publication date: 2-Feb-2024
  • (2024) Entropy-based concept drift detection in information systems. Knowledge-Based Systems 290 (111596). DOI: 10.1016/j.knosys.2024.111596. Online publication date: Apr-2024
  • (2024) A comprehensive analysis of concept drift locality in data streams. Knowledge-Based Systems 289:C. DOI: 10.1016/j.knosys.2024.111535. Online publication date: 25-Jun-2024
  • (2024) Neural networks for intelligent multilevel control of artificial and natural objects based on data fusion: A survey. Information Fusion 110 (102427). DOI: 10.1016/j.inffus.2024.102427. Online publication date: Oct-2024
  • (2024) On metafeatures’ ability of implicit concept identification. Machine Learning. DOI: 10.1007/s10994-024-06612-0. Online publication date: 18-Sep-2024
  • (2024) Transfer learning for concept drifting data streams in heterogeneous environments. Knowledge and Information Systems 66:5 (2799-2857). DOI: 10.1007/s10115-023-02043-w. Online publication date: 18-Jan-2024
  • (2024) Application Research of Multi-label Learning Under Concept Drift. Communications, Signal Processing, and Systems (399-408). DOI: 10.1007/978-981-99-7502-0_44. Online publication date: 18-Apr-2024
  • (2023) Real-Time Anomaly Detection with Subspace Periodic Clustering Approach. Applied Sciences 13:13 (7382). DOI: 10.3390/app13137382. Online publication date: 21-Jun-2023
  • (2023) An Intuitionistic Fuzzy-Rough Set-Based Classification for Anomaly Detection. Applied Sciences 13:9 (5578). DOI: 10.3390/app13095578. Online publication date: 30-Apr-2023
  • (2023) A Mixed Clustering Approach for Real-Time Anomaly Detection. Applied Sciences 13:7 (4151). DOI: 10.3390/app13074151. Online publication date: 24-Mar-2023
