Concepts Seeds Gathering and Dataset Updating Algorithm for Handling Concept Drift

Published: 01 April 2015

Abstract

In data mining, a change in the data distribution over time is known as concept drift. In this research, the authors introduce a new approach, the Concepts Seeds Gathering and Dataset Updating algorithm (CSG-DU), which gives traditional classification models the ability to adapt to concept drift as time passes. CSG-DU is concerned with discovering new concepts in a data stream and aims to increase the classification accuracy of any classification model when the underlying concepts change. The proposed approach has been tested on synthetic and real datasets. The experiments show that, after applying the authors' approach, classification accuracy increased from low values to high and acceptable ones. Finally, CSG-DU has been compared with the Set Formation for Delayed Labeling algorithm (SFDL), an approach that handles sudden and gradual concept drift. CSG-DU outperforms SFDL in terms of classification accuracy.
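To make the notion of concept drift concrete, the following is a minimal toy sketch in Python. It is not the CSG-DU algorithm, whose details the abstract does not give. It assumes a one-dimensional stream, a hypothetical nearest-centroid classifier, and a sudden drift in which the labelling rule is inverted. All function names and parameters are illustrative assumptions. The sketch only shows why a model trained before a drift loses accuracy and why retraining on data from the new concept restores it.

```python
import random


def make_stream(n, concept, rng):
    """Draw n points x in [0, 1); the labelling rule depends on the concept.

    Concept "A": label 1 when x > 0.5. Concept "B": the rule is inverted,
    which simulates a sudden concept drift (an assumption for this sketch).
    """
    data = []
    for _ in range(n):
        x = rng.random()
        label = int(x > 0.5) if concept == "A" else int(x <= 0.5)
        data.append((x, label))
    return data


def train_centroids(data):
    """Fit a nearest-centroid model: the mean of x for each class label."""
    sums = {0: 0.0, 1: 0.0}
    counts = {0: 0, 1: 0}
    for x, y in data:
        sums[y] += x
        counts[y] += 1
    return {c: sums[c] / counts[c] for c in sums if counts[c] > 0}


def predict(model, x):
    """Return the class whose centroid is closest to x."""
    return min(model, key=lambda c: abs(x - model[c]))


def accuracy(model, data):
    """Fraction of examples in data that the model labels correctly."""
    return sum(predict(model, x) == y for x, y in data) / len(data)


rng = random.Random(0)  # fixed seed so the run is repeatable
old_train = make_stream(500, "A", rng)
old_test = make_stream(500, "A", rng)
new_train = make_stream(500, "B", rng)
new_test = make_stream(500, "B", rng)

stale_model = train_centroids(old_train)    # trained before the drift
updated_model = train_centroids(new_train)  # retrained on the new concept

acc_before = accuracy(stale_model, old_test)
acc_stale = accuracy(stale_model, new_test)
acc_updated = accuracy(updated_model, new_test)

print(f"before drift, stale model:  {acc_before:.3f}")
print(f"after drift, stale model:   {acc_stale:.3f}")
print(f"after drift, updated model: {acc_updated:.3f}")
```

In this toy setting the stale model is nearly always wrong after the drift because the labels are inverted, while the model trained on recent data recovers. Real streams rarely drift this cleanly, and deciding which examples to keep in the training set is the part an approach such as CSG-DU has to solve.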

Cited By

  • (2016) Predicting recurring concepts on data-streams by means of a meta-model and a fuzzy similarity function. Expert Systems with Applications: An International Journal, 46:C (87-105). 10.1016/j.eswa.2015.10.022. Online publication date: 15-Mar-2016

Published In

International Journal of Decision Support System Technology  Volume 7, Issue 2
April 2015
83 pages
ISSN:1941-6296
EISSN:1941-630X

Publisher

IGI Global

United States

Author Tags

  1. Classification
  2. Concept Drift
  3. Concept Drift Detection
  4. Data Mining
  5. Data Stream Mining
  6. Machine Learning

Qualifiers

  • Article
