Abstract
Increasing access to very large, non-stationary datasets in many real-world problems has made classical data mining algorithms impractical and created a need for new online classification algorithms. Online learning from data streams has several distinguishing features: sequential access to the data, limits on time and space complexity, and the occurrence of concept drift. Because data streams are potentially unbounded, it is hard to label every observed instance, so semi-supervised approaches appear better suited to the problem. In this paper we present a new semi-supervised ensemble learning algorithm for data streams. The algorithm uses the majority vote of its learners to label unlabeled instances. An empirical study demonstrates that the proposed algorithm is comparable with state-of-the-art semi-supervised online algorithms.
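The majority-vote labeling step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details are not given here; it only shows the general idea of assigning a pseudo-label to an unlabeled instance when enough ensemble members agree. The names `majority_vote_label`, `min_agreement`, and `training_buffer`, the toy threshold learners, and the agreement threshold itself are all assumptions made for illustration.

```python
from collections import Counter
from typing import Callable, Hashable, Optional, Sequence

# Hypothetical learner type: maps a feature vector to a class label.
Learner = Callable[[Sequence[float]], Hashable]


def majority_vote_label(
    learners: Sequence[Learner],
    x: Sequence[float],
    min_agreement: float = 0.5,
) -> Optional[Hashable]:
    """Return the label most learners vote for, or None if agreement is too weak."""
    if not learners:
        return None
    votes = Counter(learner(x) for learner in learners)
    label, count = votes.most_common(1)[0]
    # Assumed rule: accept the vote only if the winning share exceeds the threshold.
    if count / len(learners) > min_agreement:
        return label
    return None


# Three toy threshold classifiers standing in for ensemble members.
learners = [
    lambda x: int(x[0] > 0.3),
    lambda x: int(x[0] > 0.5),
    lambda x: int(x[0] > 0.7),
]

# A tiny stream; None marks an unlabeled instance.
stream = [([0.9], None), ([0.1], None), ([0.6], 1)]

training_buffer = []
for x, y in stream:
    if y is None:
        y = majority_vote_label(learners, x)  # pseudo-label from the ensemble
    if y is not None:
        training_buffer.append((x, y))  # instances usable for later updates
```

In a real stream setting the pseudo-labeled instances would feed incremental updates of the learners, and the agreement threshold would trade off label noise against the amount of unlabeled data used.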
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ahmadi, Z., Beigy, H. (2012). Semi-supervised Ensemble Learning of Data Streams in the Presence of Concept Drift. In: Corchado, E., Snášel, V., Abraham, A., Woźniak, M., Graña, M., Cho, SB. (eds) Hybrid Artificial Intelligent Systems. HAIS 2012. Lecture Notes in Computer Science, vol 7209. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28931-6_50
DOI: https://doi.org/10.1007/978-3-642-28931-6_50
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-28930-9
Online ISBN: 978-3-642-28931-6
eBook Packages: Computer Science, Computer Science (R0)