Decision trees in Machine Learning

Decision Trees
CART algorithm
Khan

Introduction
Decision trees
A decision tree is a model that splits the data
by making decisions based on a series
of conditions (questions).

Decision tree algorithm
● One widely used decision tree algorithm is CART (Classification and Regression Trees).
● Decision trees are used for
○ Classification and
○ Regression
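Here is a minimal sketch, assuming scikit-learn is installed (the deck's code uses it later). Its `DecisionTreeClassifier` predicts class labels, and its `DecisionTreeRegressor` predicts continuous values. The six-sample data set is made up for illustration.

```python
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Hypothetical toy data: one numeric feature per sample.
X = [[1.0], [2.0], [3.0], [10.0], [11.0], [12.0]]

# Classification: the target is a discrete class label.
y_class = [0, 0, 0, 1, 1, 1]
clf = DecisionTreeClassifier(random_state=0).fit(X, y_class)
class_pred = clf.predict([[2.5], [10.5]])

# Regression: the target is continuous; each leaf predicts the mean target of its samples.
y_reg = [1.0, 1.2, 0.9, 5.0, 5.3, 4.8]
reg = DecisionTreeRegressor(max_depth=1, random_state=0).fit(X, y_reg)
reg_pred = reg.predict([[2.5], [10.5]])

print("class predictions:", class_pred)
print("regression predictions:", reg_pred)
```

Both estimators grow the tree in the same way. They differ only in the impurity used to score splits and in what a leaf stores: a class for the classifier, a mean value for the regressor.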

Decision tree components
● Root node
○ The first node of the tree; it uses the split with the maximum information gain.
● Node
○ An internal node tests a condition (question) whose outcomes lead to child nodes.
● Leaf
○ A terminal node that holds the final decision (end point) reached by answering the conditions along the path.
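The components above can be sketched as a small data structure. This is only an illustration: the `Node` class, the `predict` helper, and the hand-picked thresholds and labels are hypothetical and do not come from any library. Each internal node asks "is `x[feature] <= threshold`?". A prediction starts at the root and follows the answers down to a leaf.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Node:
    """One node of a decision tree.

    Internal nodes (the root included) store a condition "x[feature] <= threshold?";
    leaves store only the final prediction.
    """
    feature: Optional[int] = None       # index of the feature the condition tests
    threshold: Optional[float] = None   # condition: x[feature] <= threshold
    left: Optional["Node"] = None       # child followed when the condition is true
    right: Optional["Node"] = None      # child followed when the condition is false
    prediction: Optional[str] = None    # set only on leaves

    def is_leaf(self) -> bool:
        return self.prediction is not None


def predict(node: Node, x: Sequence[float]) -> str:
    """Start at the root and answer one condition per node until a leaf is reached."""
    while not node.is_leaf():
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.prediction


# Hypothetical hand-built tree: thresholds and class labels are illustrative only.
root = Node(
    feature=0, threshold=2.5,
    left=Node(prediction="A"),
    right=Node(
        feature=1, threshold=1.75,
        left=Node(prediction="B"),
        right=Node(prediction="C"),
    ),
)

for sample in ([1.4, 0.2], [4.5, 1.5], [5.5, 2.1]):
    print(sample, "->", predict(root, sample))
```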

Information Gain (IG)
Each node is split so that the split yields the maximum information gain (IG).
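For a binary split, the information gain used in the worked example below can be written as follows. Here $f$ is the feature being split on, $D_p$ is the parent data set with $N_p$ samples, $D_{\text{left}}$ and $D_{\text{right}}$ are the two children, and $I$ is any of the impurity measures listed next.

```latex
IG(D_p, f) = I(D_p) - \frac{N_{\text{left}}}{N_p}\, I(D_{\text{left}}) - \frac{N_{\text{right}}}{N_p}\, I(D_{\text{right}})
```

For example, split B under classification error gives $IG_E = 0.5 - \frac{60}{80} \times \frac{1}{3} - \frac{20}{80} \times 0 = 0.25$.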

Impurity Metrics
IG is calculated from the impurity of the parent node and of each child produced by the split. Common impurity measures are:
1. Gini index ($I_G$)
2. Entropy ($I_H$)
3. Classification error ($I_E$)
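The three impurity measures listed above can be written as short Python functions of a node's class counts. This is a sketch with illustrative function names. A class with $p_j = 0$ is treated as contributing 0 to the entropy.

```python
import math


def proportions(counts):
    """Turn class counts, e.g. (40, 40), into class proportions p_j."""
    total = sum(counts)
    return [c / total for c in counts]


def gini(counts):
    """Gini index: I_G = 1 - sum_j p_j^2."""
    return 1.0 - sum(p * p for p in proportions(counts))


def entropy(counts):
    """Entropy: I_H = -sum_j p_j log2 p_j (classes with p_j = 0 are skipped)."""
    return sum(-p * math.log2(p) for p in proportions(counts) if p > 0)


def classification_error(counts):
    """Classification error: I_E = 1 - max_j p_j."""
    return 1.0 - max(proportions(counts))


parent = (40, 40)  # two classes, uniform distribution
for name, impurity in [("Gini", gini), ("Entropy", entropy),
                       ("Classification error", classification_error)]:
    print(f"{name}: {impurity(parent):.3f}")
```

For the uniform parent node, the Gini index and classification error are both 0.5 and the entropy is 1. For a pure node such as (20, 0), all three are 0.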

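Principle of splitting nodes
● The root node is split on the feature that gives the maximum information gain.
● Adding more nodes (growing a deeper tree) makes the tree prone to overfitting.
● By default, splitting continues until every leaf is pure (all its samples belong to one of the possible outcomes).
● Pruning removes branches that split on features of low importance, which reduces overfitting.
● In practice, the Gini index and entropy usually give very similar results.
● For two classes with a uniform distribution (p = 0.5 each), entropy reaches its maximum value of 1.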
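The splitting principles above can be sketched as a tiny recursive tree builder. It greedily picks the split with maximum IG, keeps splitting until the leaves are pure, and accepts an optional depth limit as a guard against overfitting. This is a simplified illustration, not the full CART algorithm. It uses the Gini index and tries every observed feature value as a threshold. The helper names and the toy data are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(X, y):
    """Greedy search: try every feature and every observed value as a threshold;
    return (feature, threshold, gain) with the largest information gain."""
    parent = gini(y)
    best = (None, None, 0.0)
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue  # a split must produce two non-empty children
            gain = (parent - len(left) / len(y) * gini(left)
                    - len(right) / len(y) * gini(right))
            if gain > best[2]:
                best = (f, t, gain)
    return best


def build(X, y, depth=0, max_depth=None):
    """Split recursively until a node is pure, no split has positive gain,
    or the optional depth limit is reached; leaves hold the majority class."""
    if gini(y) == 0.0 or (max_depth is not None and depth >= max_depth):
        return max(set(y), key=y.count)
    feature, threshold, _ = best_split(X, y)
    if feature is None:
        return max(set(y), key=y.count)
    left = [(r, l) for r, l in zip(X, y) if r[feature] <= threshold]
    right = [(r, l) for r, l in zip(X, y) if r[feature] > threshold]
    return {
        "feature": feature,
        "threshold": threshold,
        "left": build([r for r, _ in left], [l for _, l in left], depth + 1, max_depth),
        "right": build([r for r, _ in right], [l for _, l in right], depth + 1, max_depth),
    }


def predict(tree, x):
    """Follow the conditions from the root until a leaf label is reached."""
    while isinstance(tree, dict):
        tree = tree["left"] if x[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree


# Hypothetical toy data: two numeric features, two classes.
X = [[2.0, 1.0], [3.0, 1.5], [3.5, 0.5], [6.0, 2.0], [7.0, 2.5], [6.5, 0.8]]
y = ["a", "a", "a", "b", "b", "a"]
tree = build(X, y)
print(tree)
```

On this data, the second feature at threshold 1.5 separates the classes perfectly, so the greedy search picks it for the root and both children are already pure.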

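Split A
Parent data set → 40 samples of class 1 and 40 samples of class 2
Child 1 (left) → 30 samples of class 1 and 10 samples of class 2
Child 2 (right) → 10 samples of class 1 and 30 samples of class 2
Split B
Parent data set → 40 samples of class 1 and 40 samples of class 2
Child 1 (left) → 20 samples of class 1 and 40 samples of class 2
Child 2 (right) → 20 samples of class 1 and 0 samples of class 2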

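Classification error: $I_E = 1 - \max_j p_j$
$I_E(D_p) = 1 - \frac{40}{80} = 1 - 0.5 = 0.5$
A: $I_E(D_{\text{left}}) = 1 - \frac{30}{40} = 1 - \frac{3}{4} = 0.25$
A: $I_E(D_{\text{right}}) = 1 - \frac{30}{40} = 1 - \frac{3}{4} = 0.25$
A: $IG_E = 0.5 - \frac{40}{80} \times 0.25 - \frac{40}{80} \times 0.25 = 0.5 - 0.125 - 0.125 = 0.25$
B: $I_E(D_{\text{left}}) = 1 - \frac{40}{60} = 1 - \frac{2}{3} = \frac{1}{3}$
B: $I_E(D_{\text{right}}) = 1 - \frac{20}{20} = 1 - 1 = 0$
B: $IG_E = 0.5 - \frac{60}{80} \times \frac{1}{3} - \frac{20}{80} \times 0 = 0.5 - 0.25 - 0 = 0.25$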

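Gini index: $I_G = 1 - \sum_j p_j^2$
$I_G(D_p) = 1 - \left( \left(\frac{40}{80}\right)^2 + \left(\frac{40}{80}\right)^2 \right) = 1 - (0.5^2 + 0.5^2) = 0.5$
A: $I_G(D_{\text{left}}) = 1 - \left( \left(\frac{30}{40}\right)^2 + \left(\frac{10}{40}\right)^2 \right) = 1 - \left(\frac{9}{16} + \frac{1}{16}\right) = \frac{3}{8} = 0.375$
A: $I_G(D_{\text{right}}) = 1 - \left( \left(\frac{10}{40}\right)^2 + \left(\frac{30}{40}\right)^2 \right) = 1 - \left(\frac{1}{16} + \frac{9}{16}\right) = \frac{3}{8} = 0.375$
A: $IG_G = 0.5 - \frac{40}{80} \times 0.375 - \frac{40}{80} \times 0.375 = 0.125$
B: $I_G(D_{\text{left}}) = 1 - \left( \left(\frac{20}{60}\right)^2 + \left(\frac{40}{60}\right)^2 \right) = 1 - \left(\frac{1}{9} + \frac{4}{9}\right) = 1 - \frac{5}{9} = \frac{4}{9} \approx 0.44$
B: $I_G(D_{\text{right}}) = 1 - \left( \left(\frac{20}{20}\right)^2 + \left(\frac{0}{20}\right)^2 \right) = 1 - (1 + 0) = 0$
B: $IG_G = 0.5 - \frac{60}{80} \times 0.44 - \frac{20}{80} \times 0 = 0.5 - 0.33 - 0 \approx 0.17$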

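Entropy: $I_H = -\sum_j p_j \log_2 p_j$
$I_H(D_p) = -\left(0.5 \log_2 0.5 + 0.5 \log_2 0.5\right) = 1$
A: $I_H(D_{\text{left}}) = -\left(\frac{30}{40} \log_2 \frac{30}{40} + \frac{10}{40} \log_2 \frac{10}{40}\right) \approx 0.81$
A: $I_H(D_{\text{right}}) = -\left(\frac{10}{40} \log_2 \frac{10}{40} + \frac{30}{40} \log_2 \frac{30}{40}\right) \approx 0.81$
A: $IG_H = 1 - \frac{40}{80} \times 0.81 - \frac{40}{80} \times 0.81 = 0.19$
B: $I_H(D_{\text{left}}) = -\left(\frac{20}{60} \log_2 \frac{20}{60} + \frac{40}{60} \log_2 \frac{40}{60}\right) \approx 0.92$
B: $I_H(D_{\text{right}}) = -\left(\frac{20}{20} \log_2 \frac{20}{20} + 0\right) = 0$
B: $IG_H = 1 - \frac{60}{80} \times 0.92 - \frac{20}{80} \times 0 = 0.31$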
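The three sets of calculations can be reproduced together in floating point. In this sketch the helper names are illustrative. Each impurity function takes the class proportions $p_j$ of a node, and `info_gain` applies the weighted-children formula used in the calculations above.

```python
import math


def classification_error(p):
    """I_E = 1 - max_j p_j."""
    return 1.0 - max(p)


def gini(p):
    """I_G = 1 - sum_j p_j^2."""
    return 1.0 - sum(pj ** 2 for pj in p)


def entropy(p):
    """I_H = -sum_j p_j log2 p_j, skipping classes with p_j = 0."""
    return sum(-pj * math.log2(pj) for pj in p if pj > 0)


def info_gain(impurity, parent, left, right):
    """IG = I(D_p) - N_left/N_p * I(D_left) - N_right/N_p * I(D_right); inputs are class counts."""
    def probs(counts):
        return [c / sum(counts) for c in counts]

    n = sum(parent)
    return (impurity(probs(parent))
            - sum(left) / n * impurity(probs(left))
            - sum(right) / n * impurity(probs(right)))


parent = (40, 40)
splits = {"A": ((30, 10), (10, 30)), "B": ((20, 40), (20, 0))}
for metric in (classification_error, gini, entropy):
    gains = {name: round(info_gain(metric, parent, left, right), 3)
             for name, (left, right) in splits.items()}
    print(metric.__name__, gains)
```

Classification error gives both splits the same gain. The Gini index and entropy give split B a higher gain than split A.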

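Comparison of all Impurity Metrics
(Figure: the three impurity metrics compared; entropy is drawn scaled, Scaled Entropy = Entropy / 2.)
The Gini index takes intermediate impurity values, lying between the classification error and the scaled entropy.
In the example above, classification error cannot distinguish split A from split B (both give $IG_E = 0.25$), whereas the Gini index and entropy both prefer split B, which produces a pure child node.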
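The comparison can be reproduced numerically for a two-class node in which class 1 has proportion p. This sketch prints the classification error, the Gini index, and the scaled entropy (entropy / 2) for p = 0, 0.1, ..., 1. At every p, the Gini value lies between the other two. The function name `impurities` is illustrative.

```python
import math


def impurities(p):
    """Impurities of a two-class node where class 1 has proportion p."""
    error = 1.0 - max(p, 1.0 - p)
    gini = 1.0 - (p ** 2 + (1.0 - p) ** 2)
    entropy = sum(-q * math.log2(q) for q in (p, 1.0 - p) if q > 0)
    return error, gini, entropy / 2  # entropy scaled by 1/2 for comparison


for i in range(11):
    p = i / 10
    e, g, h = impurities(p)
    print(f"p={p:.1f}  error={e:.3f}  gini={g:.3f}  scaled entropy={h:.3f}")
```

All three measures are 0 for a pure node (p = 0 or 1) and reach their maximum of 0.5 at p = 0.5. The printed columns could be passed to Matplotlib's `plt.plot` to redraw the comparison figure.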

Pros :
● Simple to understand, interpret, and visualize.
● Works with both numerical and categorical data.
● Requires little data preparation from users (for example, features do not need to be scaled).
● Nonlinear relationships between features do not affect tree performance.
● Able to handle irrelevant attributes, because splitting on them yields an information gain close to 0.

Cons :
● Can grow overly complex, deep trees that overfit the training data.
● Unstable: a small variation in the input data can produce a completely different tree.
● Because tree construction is greedy, it may not find the globally best tree for a data set.

Applications :
1. Business Management
2. Customer Relationship Management
3. Fraudulent Statement Detection
4. Engineering
5. Energy Consumption
6. Fault Diagnosis
7. Healthcare Management


Let’s code now
Data used : Iris from Sklearn
Plots : Matplotlib
Splits : Two features at a time
File : dtree.py
Link to code : Click here for code
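The original dtree.py is not reproduced here. The following is a hypothetical sketch of the same setup, assuming scikit-learn is installed: it loads the Iris data from scikit-learn, trains a decision tree on two features at a time, and reports test accuracy for every feature pair. The 70/30 split, `max_depth=4`, and the `random_state` values are assumptions.

```python
from itertools import combinations

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=1, stratify=iris.target)

scores = {}
for i, j in combinations(range(iris.data.shape[1]), 2):
    # Train on two features only, so the tree's splits can be drawn in 2-D.
    tree = DecisionTreeClassifier(criterion="gini", max_depth=4, random_state=1)
    tree.fit(X_train[:, [i, j]], y_train)
    pair = (iris.feature_names[i], iris.feature_names[j])
    scores[pair] = tree.score(X_test[:, [i, j]], y_test)
    print(f"{pair[0]} + {pair[1]}: test accuracy = {scores[pair]:.2f}")
```

For the plots, scikit-learn's `sklearn.tree.plot_tree` can draw a fitted tree with Matplotlib.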
