research-article

Online semi-supervised active learning ensemble classification for evolving imbalanced data streams

Published: 02 July 2024

Abstract

Concept drift is a core challenge in data stream classification. Although many drift adaptation methods have been proposed, most assume that labels are available for all data, which is impractical in many real-world applications. In addition, the absence of labels makes it difficult to obtain the imbalance ratio of an imbalanced data stream in time, which gives inaccurate guidance for resampling and leads to poor generalization. To tackle these joint challenges, an online semi-supervised active learning method is proposed to classify imbalanced data streams with concept drift. A newly arrived instance is first added to the sliding window and then assigned a pseudo label according to its nearest cluster. Meanwhile, a semi-supervised clustering algorithm provides its own predicted label. Based on these two predicted labels, a cluster-based query strategy provides the criteria for evaluating and selecting representative instances. More specifically, the uncertainty and importance of an instance are defined to jointly evaluate its representativeness. After the true labels of the selected representative instances are obtained, the ensemble classifier is updated with all instances in the current sliding window. Experimental results on 13 synthetic and real data streams indicate that the proposed method outperforms six comparative methods on both G-mean and Recall under various labeling budgets.
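
The steps named in the abstract can be pictured with a minimal Python sketch. It is not the authors' algorithm: the abstract does not give the definitions of uncertainty and importance, so the query rule below is a stand-in that simply asks for a label when the two predicted labels disagree and the labeling budget allows it. The function names, the `(center, label)` centroid layout, and the budget rule are all assumptions made for illustration. The last function computes G-mean in its standard form, the geometric mean of the per-class recalls.

```python
import math

def nearest_cluster_label(x, centroids):
    """Pseudo label: the label of the centroid closest to x (Euclidean distance).
    `centroids` is a list of (center_vector, label) pairs (hypothetical layout)."""
    center, label = min(centroids, key=lambda c: math.dist(x, c[0]))
    return label

def should_query(pseudo_label, clustering_label, labels_spent, seen, budget):
    """Stand-in query rule (an assumption, not the paper's criterion):
    request the true label only when the two predictions disagree and the
    number of labels already requested is below budget * instances seen."""
    within_budget = labels_spent < budget * seen
    return within_budget and pseudo_label != clustering_label

def g_mean(y_true, y_pred):
    """Geometric mean of per-class recalls; 0.0 if any class has zero recall."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, t in enumerate(y_true) if t == c]
        hits = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(hits / len(idx))
    return math.prod(recalls) ** (1.0 / len(recalls))
```

Because G-mean multiplies the recalls of all classes, a classifier that ignores the minority class scores zero however accurate it is on the majority class, which is presumably why it is reported alongside Recall for imbalanced streams.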

Highlights

An improved active learning strategy is presented to select representative data.
An improved semi-supervised clustering is developed to learn from unlabeled data.
A novel combination of active learning and semi-supervised learning is proposed.

Published In

Applied Soft Computing  Volume 155, Issue C
Apr 2024
860 pages

Publisher

Elsevier Science Publishers B. V.

Netherlands

Author Tags

  1. Semi-supervised
  2. Active learning
  3. Concept drift
  4. Imbalance
  5. Data stream

Qualifiers

  • Research-article
