DOI: 10.1145/967900.968033

Forest trees for on-line data

Published: 14 March 2004

Abstract

This paper presents a hybrid adaptive system for inducing a forest of trees from data streams. The Ultra Fast Forest Tree system (UFFT) is an incremental, online algorithm that processes each example in constant time and uses the Hoeffding bound to decide when to install a splitting test in a leaf, turning it into a decision node. The system is designed for continuous data. It uses analytical techniques to choose the splitting criteria and information gain to estimate the merit of each candidate splitting test. The number of examples required to evaluate the splitting criteria is statistically sound, being based on the Hoeffding bound. For multiclass problems, the algorithm builds a binary tree for each possible pair of classes, leading to a forest of trees. During the training phase the algorithm maintains a short-term memory: a fixed number of the most recent examples from the data stream are kept in a data structure that supports constant-time insertion and deletion. When a test is installed, a leaf is transformed into a decision node with two descendant leaves, whose sufficient statistics are initialized with the examples in the short-term memory that fall at those leaves. We study the behavior of UFFT on different problems. The experimental results show that UFFT is competitive with a batch decision tree learner on large and medium datasets.
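
As an illustration of the split decision mentioned in the abstract, the following is a minimal sketch of a Hoeffding-bound test in the style commonly used by Hoeffding-tree learners. The abstract does not give UFFT's exact rule, so the function names, the default `delta`, and the choice of `value_range = 1.0` (the range of information gain, in bits, for a two-class problem) are assumptions made for this example only.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Return epsilon such that, with probability at least 1 - delta, the true
    mean of a random variable with the given range lies within epsilon of the
    mean observed over n independent samples."""
    if n <= 0:
        raise ValueError("n must be positive")
    return math.sqrt(value_range ** 2 * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float, n: int,
                 delta: float = 1e-6, value_range: float = 1.0) -> bool:
    """Hypothetical helper: install a splitting test once the observed
    advantage of the best candidate over the runner-up exceeds epsilon.

    value_range = 1.0 assumes information gain measured in bits on a
    two-class problem, matching the pairwise binary trees in the abstract.
    """
    epsilon = hoeffding_bound(value_range, delta, n)
    return (best_gain - second_gain) > epsilon


if __name__ == "__main__":
    # A large gap is accepted with few examples; a small gap needs more.
    print(should_split(0.30, 0.10, n=1000))
    print(should_split(0.30, 0.25, n=1000))
    print(should_split(0.30, 0.25, n=10000))
```

Because epsilon shrinks proportionally to 1/sqrt(n), a leaf whose two best candidate tests are close in merit simply waits for more examples before splitting, which is what keeps the per-example cost bounded.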


Published In

SAC '04: Proceedings of the 2004 ACM symposium on Applied computing
March 2004
1733 pages
ISBN:1581138121
DOI:10.1145/967900
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Hybrid Forest of Trees
  2. data streams

Qualifiers

  • Article

Conference

SAC04: The 2004 ACM Symposium on Applied Computing
March 14 - 17, 2004
Nicosia, Cyprus

Acceptance Rates

Overall Acceptance Rate 1,650 of 6,669 submissions, 25%

Article Metrics

  • Downloads (Last 12 months): 10
  • Downloads (Last 6 weeks): 2
Reflects downloads up to 17 Oct 2024

Cited By

  • (2023) Active Weighted Aging Ensemble for Drifted Data Stream Classification. Information Sciences. DOI: 10.1016/j.ins.2023.02.046. Online publication date: Feb-2023
  • (2022) Hierarchical clustering for multiple nominal data streams with evolving behaviour. Complex & Intelligent Systems 8:2 (1737-1761). DOI: 10.1007/s40747-021-00634-0. Online publication date: 7-Jan-2022
  • (2021) An overview of complex data stream ensemble classification. Journal of Intelligent & Fuzzy Systems (1-29). DOI: 10.3233/JIFS-211100. Online publication date: 9-Aug-2021
  • (2021) High Throughput Hardware for Hoeffding Tree Algorithm with Adaptive Naive Bayes Predictor. 2021 6th International Conference for Convergence in Technology (I2CT) (1-6). DOI: 10.1109/I2CT51068.2021.9418100. Online publication date: 2-Apr-2021
  • (2021) The Effects of Abrupt Changing Data in CART Inference Models. Trends and Applications in Information Systems and Technologies (214-223). DOI: 10.1007/978-3-030-72651-5_21. Online publication date: 29-Mar-2021
  • (2019) Anomaly Detections for Manufacturing Systems Based on Sensor Data—Insights into Two Challenging Real-World Production Settings. Sensors 19:24 (5370). DOI: 10.3390/s19245370. Online publication date: 5-Dec-2019
  • (2019) Building Autonomic Elements from Video-Streaming Servers. Journal of Network and Systems Management. DOI: 10.1007/s10922-019-09503-1. Online publication date: 16-Jul-2019
  • (2018) A review. International Journal of Information and Communication Technology 12:1-2 (162-174). DOI: 10.5555/3193269.3193279. Online publication date: 1-Jan-2018
  • (2018) Online bagging for recommender systems. Expert Systems 35:4. DOI: 10.1111/exsy.12303. Online publication date: 11-Jul-2018
  • (2018) Two birds with one stone. Neurocomputing 277:C (149-160). DOI: 10.1016/j.neucom.2017.03.094. Online publication date: 14-Feb-2018
