A Survey On Decision Tree Algorithms of Classification in Data Mining
Abstract: As computer technology and computer network technology develop, the amount of data in the information industry grows rapidly. It is necessary to analyze this large amount of data and extract useful knowledge from it. The process of extracting useful knowledge from huge sets of incomplete, noisy, fuzzy and random data is called data mining. Decision tree classification is one of the most popular data mining techniques; decision trees use divide and conquer as their basic learning strategy. A decision tree is a structure that includes a root node, branches, and leaf nodes. Each internal node denotes a test on an attribute, each branch denotes an outcome of the test, and each leaf node holds a class label. The topmost node in the tree is the root node. This paper focuses on the decision tree algorithms ID3, C4.5 and CART, and on their characteristics, challenges, advantages and disadvantages.
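The tree structure just described is easy to render directly in code. The following is a minimal illustrative sketch (all names and the toy tree are invented for exposition, not taken from the paper):

    # A minimal sketch of the structure described above: an internal node
    # tests an attribute, each branch is one outcome of the test, and a
    # leaf holds a class label.
    class Node:
        def __init__(self, attribute=None, branches=None, label=None):
            self.attribute = attribute      # None for a leaf node
            self.branches = branches or {}  # test outcome -> child Node
            self.label = label              # class label (leaf nodes only)

    def classify(node, case):
        while node.attribute is not None:   # walk from the root to a leaf
            node = node.branches[case[node.attribute]]
        return node.label

    # Toy tree: the root node tests "outlook".
    tree = Node("outlook", {
        "sunny": Node(label="play"),
        "rainy": Node("windy", {"yes": Node(label="stay in"),
                                "no": Node(label="play")}),
    })
    print(classify(tree, {"outlook": "rainy", "windy": "yes"}))  # -> stay in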
Handling each attribute with a different cost.

Handling training data with missing attribute values: C4.5 allows attribute values to be marked as '?' for missing. Missing attribute values are not used in gain and entropy calculations.

Handling both continuous and discrete attributes: to handle continuous attributes, C4.5 creates a threshold and then splits the list into those whose attribute value is above the threshold and those that are less than or equal to it.

Pruning trees after creation: C4.5 goes back through the tree once it has been created and attempts to remove branches that are not needed by replacing them with leaf nodes.

Table 1: Comparisons between different Decision Tree Algorithms

Features       | ID3                                      | C4.5                       | CART
Type of data   | Categorical                              | Continuous and categorical | Continuous and nominal attributes
Speed          | Low                                      | Faster than ID3            | Average
Boosting       | Not supported                            | Not supported              | Supported
Pruning        | No                                       | Pre-pruning                | Post-pruning
Missing values | Cannot deal with                         | Cannot deal with           | Can deal with
Formula        | Information entropy and information gain | Split info and gain ratio  | Gini diversity index
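To make the Formula row of Table 1 concrete, the sketch below computes the three split criteria for a toy binary test; the data and function names are illustrative, not from the paper.

    import math
    from collections import Counter

    def entropy(labels):
        # Information entropy of a list of class labels (ID3/C4.5).
        n = len(labels)
        return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

    def gini(labels):
        # Gini diversity index of a list of class labels (CART).
        n = len(labels)
        return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

    def information_gain(labels, partitions):
        # ID3's criterion: entropy reduction achieved by a test.
        n = len(labels)
        return entropy(labels) - sum(len(p) / n * entropy(p) for p in partitions)

    def gain_ratio(labels, partitions):
        # C4.5's criterion: information gain normalised by split information.
        n = len(labels)
        split_info = -sum((len(p) / n) * math.log2(len(p) / n)
                          for p in partitions if p)
        return information_gain(labels, partitions) / split_info if split_info else 0.0

    # Ten training cases split by a binary test into two branches.
    labels = ['yes'] * 6 + ['no'] * 4
    branches = [['yes'] * 5 + ['no'], ['yes'] + ['no'] * 3]
    print(entropy(labels), gini(labels))       # ~0.971, 0.48
    print(information_gain(labels, branches))  # ~0.256
    print(gain_ratio(labels, branches))        # ~0.264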
3.2 C4.5's tree-construction algorithm differs in several respects from CART, for instance:

Tests in CART are always binary, but C4.5 allows two or more outcomes.

CART uses the Gini index to rank tests, whereas C4.5 uses information-based criteria.

CART prunes trees with a cost-complexity model whose parameters are estimated by cross-validation, whereas C4.5 uses a single-pass algorithm derived from binomial confidence limits.

This brief discussion has not mentioned what happens when some of a case's values are unknown: CART looks for surrogate tests that approximate the outcomes when the tested attribute has an unknown value, whereas C4.5 apportions the case probabilistically among the outcomes.

4. CART Algorithm

CART stands for Classification And Regression Trees. It was introduced by Breiman in 1984 and builds both classification and regression trees. Classification tree construction in CART is based on binary splitting of the attributes. CART is also based on Hunt's algorithm and can be implemented serially. The Gini index is used as the splitting measure when selecting the splitting attribute. CART differs from other Hunt's-based algorithms in that it can also be used for regression analysis with the help of regression trees. The regression analysis …
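As a sketch of the binary, Gini-ranked splits described above, the following scans candidate thresholds on one numeric attribute and keeps the purest split; the data and names are illustrative, not from the paper.

    from collections import Counter

    def gini(labels):
        # Gini diversity index; 0 means the node is pure.
        n = len(labels)
        return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

    def best_binary_split(values, labels):
        # CART-style search: try each threshold between distinct attribute
        # values and keep the one with the lowest weighted Gini index.
        best_t, best_score = None, float('inf')
        n = len(labels)
        for t in sorted(set(values))[:-1]:
            left = [l for v, l in zip(values, labels) if v <= t]
            right = [l for v, l in zip(values, labels) if v > t]
            score = len(left) / n * gini(left) + len(right) / n * gini(right)
            if score < best_score:
                best_t, best_score = t, score
        return best_t, best_score

    values = [2.0, 3.5, 4.1, 5.0, 6.2, 7.3]
    labels = ['no', 'no', 'no', 'yes', 'yes', 'yes']
    print(best_binary_split(values, labels))  # -> (4.1, 0.0): a pure split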
5. Decision Tree Learning Software

Some software packages for analyzing data and for working with commonly used decision tree learning data sets are discussed below.

WEKA: The WEKA (Waikato Environment for Knowledge Analysis) workbench is a set of data mining tools developed by the machine learning group at the University of Waikato, New Zealand. For easy access to this functionality, it contains a collection of visualization tools and algorithms for data analysis and predictive modeling, together with graphical user interfaces. WEKA runs on the Windows, Linux and Mac operating systems and provides various association, classification and clustering algorithms. All of WEKA's techniques are predicated on the assumption that the data is available as a single flat file or relation, where each data point is described by a fixed number of attributes (normally numeric or nominal). It also provides pre-processors such as attribute selection algorithms and filters. WEKA provides J48, with which we can construct trees with EBP, REP or no pruning.
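WEKA and J48 are Java-based; as a rough Python analogue (a sketch assuming scikit-learn is available, not WEKA's API), scikit-learn's DecisionTreeClassifier implements an optimized CART, so its post-pruning is cost-complexity based rather than J48's EBP or REP:

    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier, export_text

    X, y = load_iris(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    # ccp_alpha > 0 turns on cost-complexity post-pruning; 0 leaves the
    # tree unpruned, roughly mirroring J48's pruned/unpruned choice.
    tree = DecisionTreeClassifier(criterion="gini", ccp_alpha=0.01, random_state=0)
    tree.fit(X_train, y_train)
    print(export_text(tree))
    print("test accuracy:", tree.score(X_test, y_test))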
GATree: GATree (Genetically Evolved Decision Trees) uses genetic algorithms to directly evolve classification decision trees. Instead of using binary strings, it adopts a natural representation of the problem by using a binary tree structure. The evaluation version of GATree is available on request to the authors. To generate decision trees, we can set various parameters such as the number of generations, the population size, and the crossover and mutation probabilities.
Alice d'ISoft: The Alice d'ISoft software for data mining by decision tree is a powerful and inviting tool that allows the creation of segmentation models. For the business user, this software makes it possible to explore data online, interactively and directly. Alice d'ISoft works on the Windows operating system, and its evaluation version is available on request to the authors.

… cover categories in remote sensing, binary tree with genetic algorithm for land cover classification.

Web Applications: Chen et al. presented a decision tree learning approach to diagnosing failures in large Internet sites. Bonchi et al. proposed decision trees for intelligent web caching.