
HOW DECISION TREE ALGORITHM WORKS

January 30, 2017 | Rahul Saxena | Data Science, Machine Learning

Decision Tree Algorithm

Introduction to Decision Tree Algorithm

The Decision Tree algorithm belongs to the family of supervised learning algorithms. Unlike many other supervised learning algorithms, it can be used to solve both regression and classification problems.

The general motive of using a Decision Tree is to create a training model that can predict the class or value of the target variable by learning decision rules inferred from prior data (the training data).

Decision Trees are easier to understand than most other classification algorithms. The decision tree algorithm tries to solve the problem by using a tree representation: each internal node of the tree corresponds to an attribute, and each leaf node corresponds to a class label.

Decision Tree Algorithm Pseudocode

1. Place the best attribute of the dataset at the root of the tree.
2. Split the training set into subsets. Subsets should be made in such a way that each subset contains data with the same value for an attribute.
3. Repeat step 1 and step 2 on each subset until you find leaf nodes in all the branches of the tree (a minimal code sketch of these steps follows below).

Decision Tree classifier, Image credit: www.packtpub.com
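The following is a minimal Python sketch of these three steps, not the author's code: it assumes the training data is a list of dicts with already-categorical values, and it takes the scoring function for the "best attribute" as a parameter, since the selection measures are only introduced later in the post.

from collections import Counter

def build_tree(rows, attributes, target, score_split):
    """Recursively build a decision tree as a nested dict {attribute: {value: subtree}}."""
    labels = [row[target] for row in rows]
    # Step 3: stop when the subset is pure (or no attributes remain); this becomes a leaf
    # labelled with the majority class of the subset.
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Step 1: place the best-scoring attribute at this node.
    best = max(attributes, key=lambda a: score_split(rows, a, target))
    # Step 2: split the rows into subsets, one per value of the chosen attribute.
    tree = {best: {}}
    remaining = [a for a in attributes if a != best]
    for value in set(row[best] for row in rows):
        subset = [row for row in rows if row[best] == value]
        tree[best][value] = build_tree(subset, remaining, target, score_split)
    return tree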

http://dataaspirant.com/2017/01/30/how-decision-tree-algorithm-works/ 2/16
11/20/2017 How Decision Tree Algorithm works

In decision trees, to predict a class label for a record we start from the root of the tree. We compare the value of the root attribute with the record's corresponding attribute and, on the basis of this comparison, follow the branch for that value and jump to the next node.

We continue comparing the record's attribute values with the other internal nodes of the tree until we reach a leaf node, whose value is the predicted class. Now that we know how a modeled decision tree can be used to predict the target class or value, let's understand how we can create the decision tree model.
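A matching prediction sketch, again only an illustration: it walks a nested-dict tree of the kind built above from the root down to a leaf. The example tree and record are hypothetical, not taken from the post.

def predict(tree, record):
    """Follow the record's attribute values from the root until a leaf (a class label) is reached."""
    while isinstance(tree, dict):
        attribute = next(iter(tree))                  # the attribute tested at this node
        tree = tree[attribute][record[attribute]]     # follow the branch for the record's value
    return tree

# Hypothetical example tree and record:
toy_tree = {"Outlook": {"Sunny": {"Humidity": {"High": "No", "Normal": "Yes"}},
                        "Overcast": "Yes",
                        "Rain": "No"}}
print(predict(toy_tree, {"Outlook": "Sunny", "Humidity": "Normal"}))   # -> Yes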

Assumptions while creating Decision Tree

Below are some of the assumptions we make while building a Decision Tree:

At the beginning, the whole training set is considered as the root.
Feature values are preferred to be categorical. If the values are continuous, they are discretized prior to building the model.
Records are distributed recursively on the basis of attribute values.
The order in which attributes are placed as the root or as internal nodes of the tree is decided using some statistical approach.

Decision tree model example. Image Credit: http://zhanpengfang.github.io/

Decision Trees follow a Sum of Products (SOP) representation. In the images above, you can see how the questions "Can we accept the new job offer?" and "Do we use a computer daily?" are answered by traversing from the root node to a leaf node.


It is a sum of products representation. The Sum of Products (SOP) is also known as Disjunctive Normal Form. For a class, every branch from the root of the tree to a leaf node carrying that class is a conjunction (product) of attribute values, and the different branches ending in that class form a disjunction (sum).
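For instance, a hypothetical version of the job-offer tree (the attribute names and thresholds below are invented for illustration, not taken from the image) could be read off as:

Accept  <=>  (Salary >= 50k AND Commute <= 1 hr)
         OR  (Salary >= 50k AND Commute > 1 hr AND Remote work = yes)

Each parenthesized conjunction is one root-to-leaf path ending in the "Accept" class, and the OR over those paths is the disjunction.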

The primary challenge in implementing a decision tree is identifying which attribute to consider as the root node and at each level. Handling this is known as attribute selection. We have different attribute selection measures to identify the attribute that should be considered as the root node at each level.

The popular attribute selection measures:

Information gain
Gini index

Attribute Selection

If the dataset consists of n attributes, then deciding which attribute to place at the root or at the different levels of the tree as internal nodes is a complicated step. Simply selecting a random node as the root does not solve the issue; a random approach may give us bad results with low accuracy.

To solve this attribute selection problem, researchers devised criteria such as information gain and the Gini index. These criteria calculate a value for every attribute. The values are sorted, and attributes are placed in the tree following that order, i.e., the attribute with the highest value (in the case of information gain) is placed at the root.

While using information gain as a criterion, we assume attributes to be categorical, and for the Gini index, attributes are assumed to be continuous.

Information Gain

By using information gain as a criterion, we try to estimate the information contained in each attribute, borrowing a few ideas from information theory.

The randomness or uncertainty of a random variable X is measured by its entropy.


Consider a binary classification problem with only two classes, positive and negative.

If all examples are positive or all are negative, then the entropy is zero, i.e., low.
If half of the records are of the positive class and half are of the negative class, then the entropy is one, i.e., high.

By calculating the entropy measure of each attribute, we can calculate its information gain. Information gain is the expected reduction in entropy obtained by splitting the records on that attribute: the entropy of the target minus the weighted average entropy of the subsets produced by the split.
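As a concrete sketch of these two ideas (assuming records are dicts of already-discretized, categorical values, as in the example that follows), entropy and information gain could be computed like this; the information_gain function can also be plugged into the earlier build_tree sketch as score_split.

import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy of the subsets split on `attribute`."""
    before = entropy([row[target] for row in rows])
    after = 0.0
    for value in set(row[attribute] for row in rows):
        subset = [row[target] for row in rows if row[attribute] == value]
        after += (len(subset) / len(rows)) * entropy(subset)   # weight by subset size
    return before - after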

To get a clear understanding of calculating information gain and entropy, we will try to work them out on a sample dataset.

Example: Construct a Decision Tree by using "information gain" as a criterion

We are going to use this data sample. Let's try to use information gain as a criterion. Here, we have 5 columns, out of which 4 columns contain continuous data and the 5th column consists of class labels.

The A, B, C, D attributes can be considered as predictors, and the E column's class labels can be considered as the target variable. To construct a decision tree from this data, we have to convert the continuous data into categorical data.

We have chosen some random values to categorize each attribute:

A        B        C        D
>= 5.0   >= 3.0   >= 4.2   >= 1.4
< 5.0    < 3.0    < 4.2    < 1.4

There are 2 steps for calculating information gain for each attribute:

1. Calculate entropy of Target.


2. The entropy for every attribute A, B, C, D needs to be calculated. Using the information gain formula, we subtract this entropy from the entropy of the target; the result is the information gain.

The entropy of the target: we have 8 records with the negative class and 8 records with the positive class, so we can directly estimate the entropy of the target as 1.

Variable E
Positive   Negative
8          8

Calculating the entropy using the formula:

E(8,8) = -1 * ( p(+ve)*log2(p(+ve)) + p(-ve)*log2(p(-ve)) )
       = -1 * ( (8/16)*log2(8/16) + (8/16)*log2(8/16) )
       = 1
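The same value falls out of the entropy helper sketched earlier:

print(entropy(["positive"] * 8 + ["negative"] * 8))   # 1.0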

Information gain for Var A

Var A has the value >= 5 for 12 records out of 16, and 4 records have the value < 5.

For Var A >= 5 & class == positive: 5/12
For Var A >= 5 & class == negative: 7/12
Entropy(5,7) = -1 * ( (5/12)*log2(5/12) + (7/12)*log2(7/12) ) = 0.9799
For Var A < 5 & class == positive: 3/4
For Var A < 5 & class == negative: 1/4
Entropy(3,1) = -1 * ( (3/4)*log2(3/4) + (1/4)*log2(1/4) ) = 0.81128


Entropy(Target, A) = P(>=5) * E(5,7) + P(<5) * E(3,1)


= (12/16) * 0.9799 + (4/16) * 0.81128 = 0.937745
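Reusing the helpers sketched earlier on just the discretized A column and the class label gives the same number (the row construction below simply encodes the counts above):

rows_a = ([{"A": ">=5", "E": "positive"}] * 5 + [{"A": ">=5", "E": "negative"}] * 7 +
          [{"A": "<5",  "E": "positive"}] * 3 + [{"A": "<5",  "E": "negative"}] * 1)
print(information_gain(rows_a, "A", "E"))   # about 0.0622, i.e. 1 - 0.9377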

Information gain for Var B

Var B has the value >= 3 for 12 records out of 16, and 4 records have the value < 3.

For Var B >= 3 & class == positive: 8/12
For Var B >= 3 & class == negative: 4/12
Entropy(8,4) = -1 * ( (8/12)*log2(8/12) + (4/12)*log2(4/12) ) = 0.9183
For Var B < 3 & class == positive: 0/4
For Var B < 3 & class == negative: 4/4
Entropy(0,4) = -1 * ( (0/4)*log2(0/4) + (4/4)*log2(4/4) ) = 0

Entropy(Target, B) = P(>=3) * E(8,4) + P(<3) * E(0,4)
                   = (12/16) * 0.9183 + (4/16) * 0 = 0.6887

Information gain for Var C

Var C has the value >= 4.2 for 6 records out of 16, and 10 records have the value < 4.2.

For Var C >= 4.2 & class == positive: 0/6
For Var C >= 4.2 & class == negative: 6/6
Entropy(0,6) = 0
For Var C < 4.2 & class == positive: 8/10
For Var C < 4.2 & class == negative: 2/10
Entropy(8,2) = 0.72193

Entropy(Target, C) = P(>=4.2) * E(0,6) + P(< 4.2) * E(8,2)


= (6/16) * 0 + (10/16) * 0.72193 = 0.4512

Information gain for Var D


Var D has the value >= 1.4 for 5 records out of 16, and 11 records have the value < 1.4.

For Var D >= 1.4 & class == positive: 0/5
For Var D >= 1.4 & class == negative: 5/5
Entropy(0,5) = 0
For Var D < 1.4 & class == positive: 8/11
For Var D < 1.4 & class == negative: 3/11
Entropy(8,3) = -1 * ( (8/11)*log2(8/11) + (3/11)*log2(3/11) ) = 0.84532

Entropy(Target, D) = P(>=1.4) * E(0,5) + P(< 1.4) * E(8,3)


= 5/16 * 0 + (11/16) * 0.84532 = 0.5811575

Var A     Target Positive   Target Negative
>= 5.0    5                 7
< 5.0     3                 1
Information Gain of A = 0.062255

Var B     Target Positive   Target Negative
>= 3.0    8                 4
< 3.0     0                 4
Information Gain of B = 0.3113

Var C     Target Positive   Target Negative
>= 4.2    0                 6
< 4.2     8                 2
Information Gain of C = 0.5488

Var D     Target Positive   Target Negative
>= 1.4    0                 5
< 1.4     8                 3
Information Gain of D = 0.41884


From the above information gain calculations, we can build the decision tree by placing the attributes on the tree according to their values.

The attribute with the best value compared to the others should be positioned as the root; here, attribute C has the highest information gain. A branch with entropy 0 should be converted to a leaf node, and a branch with entropy greater than 0 needs further splitting.

Gini Index

Gini Index is a metric to measure how often a randomly chosen element would
be incorrectly identi ed. It means an attribute with lower gini index should be
preferred.
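A minimal sketch of the Gini impurity and the weighted Gini index of a split, following the same conventions as the information-gain helpers above (lower is better):

from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())

def gini_index(rows, attribute, target):
    """Weighted Gini impurity of the subsets obtained by splitting `rows` on `attribute`."""
    result = 0.0
    for value in set(row[attribute] for row in rows):
        subset = [row[target] for row in rows if row[attribute] == value]
        result += (len(subset) / len(rows)) * gini(subset)   # weight by subset size
    return result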

Example: Construct a Decision Tree by using "Gini index" as a criterion

We are going to use the same data sample that we used for the information gain example. Let's try to use the Gini index as a criterion. Here, we have 5 columns, out of which 4 columns contain continuous data and the 5th column consists of class labels.


The A, B, C, D attributes can be considered as predictors, and the E column's class labels can be considered as the target variable. To construct a decision tree from this data, we have to convert the continuous data into categorical data.

We have chosen some random values to categorize each attribute:

A        B        C        D
>= 5.0   >= 3.0   >= 4.2   >= 1.4
< 5.0    < 3.0    < 4.2    < 1.4

Gini Index for Var A

Var A has the value >= 5 for 12 records out of 16, and 4 records have the value < 5.

For Var A >= 5 & class == positive: 5/12
For Var A >= 5 & class == negative: 7/12
gini(5,7) = 1 - ( (5/12)^2 + (7/12)^2 ) = 0.4860
For Var A < 5 & class == positive: 3/4
For Var A < 5 & class == negative: 1/4
gini(3,1) = 1 - ( (3/4)^2 + (1/4)^2 ) = 0.375

Weighting and summing each of the Gini indices:

Gini(Target, A) = (12/16) * 0.486 + (4/16) * 0.375 = 0.45825
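Using the gini_index helper sketched above on the same discretized A column gives the same number:

rows_a = ([{"A": ">=5", "E": "positive"}] * 5 + [{"A": ">=5", "E": "negative"}] * 7 +
          [{"A": "<5",  "E": "positive"}] * 3 + [{"A": "<5",  "E": "negative"}] * 1)
print(gini_index(rows_a, "A", "E"))   # about 0.458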

Gini Index for Var B

Var B has the value >= 3 for 12 records out of 16, and 4 records have the value < 3.

For Var B >= 3 & class == positive: 8/12
For Var B >= 3 & class == negative: 4/12
gini(8,4) = 1 - ( (8/12)^2 + (4/12)^2 ) = 0.446
For Var B < 3 & class == positive: 0/4
For Var B < 3 & class == negative: 4/4
gini(0,4) = 1 - ( (0/4)^2 + (4/4)^2 ) = 0

Gini Index for Var C

Var C has the value >= 4.2 for 6 records out of 16, and 10 records have the value < 4.2.

For Var C >= 4.2 & class == positive: 0/6
For Var C >= 4.2 & class == negative: 6/6
gini(0,6) = 1 - ( (0/6)^2 + (6/6)^2 ) = 0
For Var C < 4.2 & class == positive: 8/10
For Var C < 4.2 & class == negative: 2/10
gini(8,2) = 1 - ( (8/10)^2 + (2/10)^2 ) = 0.32

Gini Index for Var D

Var D has the value >= 1.4 for 5 records out of 16, and 11 records have the value < 1.4.

For Var D >= 1.4 & class == positive: 0/5
For Var D >= 1.4 & class == negative: 5/5
gini(0,5) = 1 - ( (0/5)^2 + (5/5)^2 ) = 0
For Var D < 1.4 & class == positive: 8/11
For Var D < 1.4 & class == negative: 3/11
gini(8,3) = 1 - ( (8/11)^2 + (3/11)^2 ) = 0.397

Var A     Target Positive   Target Negative
>= 5.0    5                 7
< 5.0     3                 1
Gini Index of A = 0.45825

Var B     Target Positive   Target Negative
>= 3.0    8                 4
< 3.0     0                 4
Gini Index of B = 0.3345

Var C     Target Positive   Target Negative
>= 4.2    0                 6
< 4.2     8                 2
Gini Index of C = 0.2

Var D     Target Positive   Target Negative
>= 1.4    0                 5
< 1.4     8                 3
Gini Index of D = 0.273

Attribute C has the lowest Gini index, so it is the preferred attribute for the root split, which agrees with the information gain result above.

Overfitting

Overfitting is a practical problem while building a decision tree model. The model is considered to be overfitting when the algorithm continues to go deeper and deeper in the tree to reduce the training set error but ends up with an increased test set error, i.e., the prediction accuracy of our model goes down. It generally happens when the tree builds many branches due to outliers and irregularities in the data.

Two approaches we can use to avoid overfitting are:

Pre-Pruning
Post-Pruning

Pre-Pruning


In pre-pruning, the tree construction is stopped a bit early: we prefer not to split a node if its goodness measure is below a threshold value. However, it is difficult to choose an appropriate stopping point.
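In libraries, pre-pruning usually shows up as stopping criteria passed to the learner. A sketch with scikit-learn (not part of the original post; X and y stand for any feature matrix and label vector):

from sklearn.tree import DecisionTreeClassifier

# Each parameter is a pre-pruning rule: stop splitting early instead of growing the full tree.
clf = DecisionTreeClassifier(
    criterion="entropy",          # or "gini"
    max_depth=4,                  # never grow deeper than 4 levels
    min_samples_split=10,         # do not split nodes holding fewer than 10 records
    min_impurity_decrease=0.01,   # require a minimum goodness of split before splitting
)
# clf.fit(X, y)   # X and y are assumed to exist; fitting is shown only as a sketch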

Post-Pruning

In post-pruning, we first go deeper and deeper in the tree to build a complete tree. If the tree shows the overfitting problem, then pruning is done as a post-pruning step. We use cross-validation data to check the effect of our pruning: using the cross-validation data, we test whether expanding a node makes an improvement or not.

If it shows an improvement, then we can continue expanding that node. But if it shows a reduction in accuracy, then the node should not be expanded, i.e., it should be converted to a leaf node.
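One common post-pruning scheme available in scikit-learn is minimal cost-complexity pruning. The sketch below, an illustration rather than the post's own method, grows a full tree, then tries increasing amounts of pruning and keeps whichever candidate scores best on held-out data.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic data stands in for the sample used in the post; any dataset works the same way.
X, y = make_classification(n_samples=400, n_features=8, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

# Grow the complete tree first, then prune it back with increasing ccp_alpha values.
full_tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
path = full_tree.cost_complexity_pruning_path(X_train, y_train)

best_alpha, best_score = 0.0, 0.0
for alpha in path.ccp_alphas:
    pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
    score = pruned.score(X_val, y_val)   # accuracy on held-out validation data
    if score >= best_score:
        best_alpha, best_score = alpha, score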

Decision Tree Algorithm Advantages and Disadvantages

Advantages:

1. Decision Trees are easy to explain and result in a set of rules.
2. They follow the same approach that humans generally follow while making decisions.
3. The interpretation of a complex Decision Tree model can be simplified by visualization, so even a non-expert can understand the logic.
4. The number of hyper-parameters to be tuned is almost none.

Disadvantages:

1. There is a high probability of overfitting in a Decision Tree.
2. It generally gives lower prediction accuracy on a dataset compared to other machine learning algorithms.
3. Information gain in a decision tree with categorical variables gives a biased response toward attributes with a greater number of categories.
4. Calculations can become complex when there are many class labels.



I hope you like this post. If you have any questions, feel free to comment below. If you want me to write about a particular topic, do tell me in the comments below.
