Concept Drifting Detection on Noisy Streaming Data in Random Ensemble Decision Trees

  • Conference paper
Machine Learning and Data Mining in Pattern Recognition (MLDM 2009)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5632)

Abstract

Although many inductive learning algorithms have been developed for concept-drifting data streams, especially ones based on ensemble classification models, few can detect the different types of concept drift in noisy streaming data with low time and space overhead. Motivated by this, this paper proposes a new classification algorithm for Concept drifting Detection based on an ensemble model of Random Decision Trees (called CDRDT). Extensive studies with synthetic and real streaming data demonstrate that, compared with several representative classification algorithms for concept-drifting data streams, CDRDT not only detects potential concept changes in noisy data streams effectively and efficiently, but also performs much better in runtime and space consumption while improving predictive accuracy. The proposed algorithm thus provides a useful, lightweight reference for classifying noisy concept-drifting data streams.

Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Li, P., Hu, X., Liang, Q., Gao, Y. (2009). Concept Drifting Detection on Noisy Streaming Data in Random Ensemble Decision Trees. In: Perner, P. (ed.) Machine Learning and Data Mining in Pattern Recognition. MLDM 2009. Lecture Notes in Computer Science, vol 5632. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03070-3_18

  • DOI: https://doi.org/10.1007/978-3-642-03070-3_18

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-03069-7

  • Online ISBN: 978-3-642-03070-3

  • eBook Packages: Computer Science (R0)
