
Decision Trees for Uncertain Data

Published: 01 January 2011

Abstract

Traditional decision tree classifiers work with data whose values are known and precise. We extend such classifiers to handle data with uncertain information. Value uncertainty arises in many applications during the data collection process. Example sources of uncertainty include measurement/quantization errors, data staleness, and multiple repeated measurements. With uncertainty, the value of a data item is often represented not by one single value, but by multiple values forming a probability distribution. Rather than abstracting uncertain data by statistical derivatives (such as mean and median), we discover that the accuracy of a decision tree classifier can be much improved if the "complete information" of a data item (taking into account the probability density function (pdf)) is utilized. We extend classical decision tree building algorithms to handle data tuples with uncertain values. Extensive experiments show that the resulting classifiers are more accurate than those using value averages. Since processing pdfs is computationally more costly than processing single values (e.g., averages), decision tree construction on uncertain data is more CPU-demanding than that for certain data. To tackle this problem, we propose a series of pruning techniques that can greatly improve construction efficiency.
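
To make the contrast between averaging and using the full pdf concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' algorithm: it assumes each pdf is approximated by a few (value, probability) sample points, that a tuple is divided fractionally between the two branches of a split according to the probability mass on each side, and that split quality is measured by entropy. All names (`entropy`, `split_entropy`, `to_mean`) and the toy data are hypothetical.

```python
import math
from collections import defaultdict


def entropy(class_weights):
    """Shannon entropy (in bits) of a dict mapping class label -> weight."""
    total = sum(class_weights.values())
    if total == 0:
        return 0.0
    h = 0.0
    for w in class_weights.values():
        if w > 0:
            p = w / total
            h -= p * math.log2(p)
    return h


def split_entropy(tuples, z):
    """Weighted entropy of the two branches after splitting at threshold z.

    Each tuple is (samples, label), where samples is a list of
    (value, probability) pairs approximating the attribute's pdf.
    The probability mass at values <= z goes to the left branch and
    the remaining mass goes to the right branch (assumed scheme).
    """
    left, right = defaultdict(float), defaultdict(float)
    for samples, label in tuples:
        p_left = sum(p for v, p in samples if v <= z)
        left[label] += p_left
        right[label] += 1.0 - p_left
    n_left, n_right = sum(left.values()), sum(right.values())
    n = n_left + n_right
    return (n_left / n) * entropy(left) + (n_right / n) * entropy(right)


def to_mean(tuples):
    """Collapse each pdf to its expected value (the averaging baseline)."""
    return [([(sum(v * p for v, p in s), 1.0)], label) for s, label in tuples]


# Hypothetical toy data: both tuples have mean 0 but different spreads.
data = [
    ([(-3.0, 0.5), (3.0, 0.5)], "A"),  # wide, two-point distribution
    ([(0.0, 1.0)], "B"),               # all mass at a single value
]

print("averaging baseline:", split_entropy(to_mean(data), z=-1.0))
print("distribution-based:", split_entropy(data, z=-1.0))
```

In this toy case both tuples collapse to the same average, so the averaging baseline cannot separate the classes at the chosen split point, while the distribution-based version can send part of tuple "A" to a pure branch and obtains a lower entropy. Evaluating many candidate split points against every pdf sample is also where the extra computational cost mentioned in the abstract would come from.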

Reviews

Aris Gkoulalas-Divanis

Value uncertainty is widespread in collected data that contains inaccuracies, such as multiple repeated values or measurement errors. In these cases, typical data mining tasks such as classification may produce erroneous models. Though some have proposed simple methods for dealing with noisy data (for example, replacing the measured value with the average or the median), such methods are too drastic and can greatly reduce the accuracy of classification. In this paper, the authors extend classical decision tree building algorithms (based on the framework of C4.5) to cope with uncertain numerical data, where the values of the data items are not crisp, but are described by a range of values leading to a probability density function (pdf). Through extensive experimentation, the authors demonstrate that, when suitable pdfs are used to describe the data items, the accuracy of the decision tree classifiers can be much higher than that of classifiers built using value averages. On the downside, the proposed approaches are much more computationally demanding than traditional approaches, since they require processing of the pdfs. To cope with this limitation, the authors propose three pruning algorithms that improve efficiency to the point that execution times are an order of magnitude higher than those of classical decision tree building algorithms. Overall, this interesting paper makes a tangible contribution to the state of the art.

Online Computing Reviews Service

Published In

IEEE Transactions on Knowledge and Data Engineering, Volume 23, Issue 1
January 2011
160 pages

Publisher

IEEE Educational Activities Department

United States

Author Tags

  1. Uncertain data
  2. decision tree
  3. classification
  4. data mining

Qualifiers

  • Research-article

Cited By

  • (2023) Data Collection and Analysis in Physical Education Practical Teaching Based on Internet of Things. International Journal of Information Technology and Web Engineering 18:1 (1-15). DOI: 10.4018/IJITWE.332857. Online publication date: 26-Oct-2023
  • (2023) Relational Query Synthesis ⋈ Decision Tree Learning. Proceedings of the VLDB Endowment 17:2 (250-263). DOI: 10.14778/3626292.3626306. Online publication date: 1-Oct-2023
  • (2022) Deep Belief Neural Network (DBNN)-Based Categorization of Uncertain Data Streams. International Journal of Software Innovation 10:1 (1-18). DOI: 10.4018/IJSI.312262. Online publication date: 25-Oct-2022
  • (2022) Enhancement of email spam detection using improved deep learning algorithms for cyber security. Journal of Computer Security 30:2 (231-264). DOI: 10.3233/JCS-200111. Online publication date: 1-Jan-2022
  • (2022) A novel approach to market segmentation selection using artificial intelligence techniques. The Journal of Supercomputing 79:2 (1235-1262). DOI: 10.1007/s11227-022-04666-2. Online publication date: 22-Jul-2022
  • (2022) Automated image and video object detection based on hybrid heuristic-based U-net segmentation and faster region-convolutional neural network-enabled learning. Multimedia Tools and Applications 82:3 (3459-3484). DOI: 10.1007/s11042-022-13216-0. Online publication date: 7-Jul-2022
  • (2022) Handling uncertainty in SBSE: a possibilistic evolutionary approach for code smells detection. Empirical Software Engineering 27:6. DOI: 10.1007/s10664-022-10142-5. Online publication date: 24-Jun-2022
  • (2022) A least squares twin support vector machine method with uncertain data. Applied Intelligence 53:9 (10668-10684). DOI: 10.1007/s10489-022-03897-3. Online publication date: 22-Aug-2022
  • (2022) Automated fruit grading using optimal feature selection and hybrid classification by self-adaptive chicken swarm optimization: grading of mango. Neural Computing and Applications 34:2 (1285-1306). DOI: 10.1007/s00521-021-06473-x. Online publication date: 1-Jan-2022
  • (2021) Decision Tree Learning for Uncertain Clinical Measurements. IEEE Transactions on Knowledge and Data Engineering 33:9 (3199-3211). DOI: 10.1109/TKDE.2020.2967378. Online publication date: 1-Sep-2021
