Decision Tree Classification
A decision tree builds classification or regression models in the form of a tree structure. It breaks
down a dataset into smaller and smaller subsets while at the same time an associated decision
tree is incrementally developed. The final result is a tree with decision nodes and leaf nodes.
A decision node (e.g., Outlook) has two or more branches (e.g., Sunny, Overcast and Rainy).
The topmost decision node in a tree, which corresponds to the best predictor, is called the root node.
Decision trees can handle both categorical and numerical data.
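As a rough illustration of this structure, here is a minimal Python sketch that holds such a tree as nested dictionaries; the attribute names follow the weather example used later in this section, while the sub-splits under Sunny and Rainy are illustrative placeholders only:

# A tiny hand-built tree: a key such as "Outlook" is a decision node, its child
# keys ("Sunny", "Overcast", "Rainy") are branches, and plain strings are leaf
# nodes (class labels). The sub-splits under Sunny and Rainy are placeholders.
tree = {
    "Outlook": {
        "Overcast": "Yes",
        "Sunny": {"Humidity": {"High": "No", "Normal": "Yes"}},
        "Rainy": {"Windy": {"True": "No", "False": "Yes"}},
    }
}

def classify(node, instance):
    """Walk from the root node to a leaf node for one instance (a dict of attribute values)."""
    while isinstance(node, dict):
        attribute = next(iter(node))          # the decision node, e.g. "Outlook"
        node = node[attribute][instance[attribute]]
    return node                               # a leaf node, e.g. "Yes"

print(classify(tree, {"Outlook": "Sunny", "Humidity": "High"}))   # -> No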
Information theory
Information theory was introduced by Claude Shannon. It quantifies entropy, a key measure of
information, usually expressed as the average number of bits needed to store or communicate one
symbol in a message.
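For a rough sense of the "average bits per symbol" reading, a minimal sketch (the symbol probabilities here are made up for illustration):

import math

def entropy(probabilities):
    """Shannon entropy in bits, given the probability of each symbol."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

print(entropy([0.25, 0.25, 0.25, 0.25]))   # four equally likely symbols: 2.0 bits per symbol
print(entropy([0.97, 0.01, 0.01, 0.01]))   # a very predictable source: about 0.24 bits per symbol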
ID3 Algorithm:
ID3 is based on a greedy search through the space of all possible branches, with no backtracking.
Entropy
A decision tree is built top-down from a root node and involves partitioning the data into
subsets that contain instances with similar values (homogeneous). The ID3 algorithm uses entropy to
calculate the homogeneity of a sample. If the sample is completely homogeneous the entropy is
zero, and if the sample is equally divided it has an entropy of one.
entropy(p1, p2, …, pn) = −p1 log2(p1) − p2 log2(p2) − ⋯ − pn log2(pn)
To build a decision tree, we need to calculate two types of entropy using frequency tables: the
entropy of the target computed from the frequency table of one attribute (the target itself), and the
entropy of the target computed from the frequency table of two attributes (the target and the
attribute being considered for the split).
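A minimal Python sketch of the entropy calculation from such a frequency table of class counts, matching the E(yes, no) notation used in the worked example below (the function name is my own):

import math

def entropy_from_counts(*counts):
    """Entropy of a sample given its class frequency counts, e.g. E(9, 5)."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

print(entropy_from_counts(5, 0))   # completely homogeneous sample -> 0.0
print(entropy_from_counts(7, 7))   # equally divided sample -> 1.0
print(entropy_from_counts(3, 2))   # the E(3, 2) case worked out below -> ~0.971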
Information Gain
The information gain is based on the decrease in entropy after a dataset is split on an attribute:
Gain = Entropy(before the split) − Expected New Entropy(after the split). Constructing a decision
tree is all about finding the attribute that returns the highest information gain (i.e., the most
homogeneous branches).
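A possible sketch of the gain calculation, reusing entropy_from_counts from the sketch above; per-branch class counts stand in for the frequency table of the attribute being considered:

def expected_entropy(branch_counts):
    """Weighted entropy after a split; branch_counts is a list of per-branch class counts."""
    total = sum(sum(branch) for branch in branch_counts)
    return sum((sum(branch) / total) * entropy_from_counts(*branch)
               for branch in branch_counts)

def information_gain(parent_counts, branch_counts):
    """Decrease in entropy obtained by splitting the parent sample on an attribute."""
    return entropy_from_counts(*parent_counts) - expected_entropy(branch_counts)

# Hypothetical split of a 9-Yes / 5-No sample into branches of (3,2), (4,0) and (2,3):
print(information_gain((9, 5), [(3, 2), (4, 0), (2, 3)]))   # ~0.247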
Step 1: Calculate the entropy of the target.
Step 2: Split the dataset on each attribute and calculate the expected new entropy of the split;
subtracting this from the entropy before the split gives the information gain for that attribute.
Step 3: Choose the attribute with the largest information gain as the decision node, divide the
dataset by its branches and repeat the same process on every branch.
Step 4a: A branch with an entropy of 0 is a leaf node.
Step 4b: A branch with an entropy of more than 0 needs further splitting.
Step 5: The ID3 algorithm is run recursively on the non-leaf branches until all data is classified.
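A compact recursive sketch of these steps, reusing the information_gain helper from the sketches above; it performs the greedy search with no backtracking described earlier, and the column names in the closing comment are hypothetical:

from collections import Counter

def id3(rows, target, attributes):
    """rows: list of dicts; target: name of the class column; attributes: candidate split columns."""
    labels = [row[target] for row in rows]
    if len(set(labels)) == 1:                  # entropy 0: this branch is a leaf node (Step 4a)
        return labels[0]
    if not attributes:                         # nothing left to split on: majority-class leaf
        return Counter(labels).most_common(1)[0][0]

    def gain(attribute):                       # information gain of splitting on this attribute
        branches = {}
        for row in rows:
            branches.setdefault(row[attribute], Counter())[row[target]] += 1
        branch_counts = [tuple(c.values()) for c in branches.values()]
        return information_gain(tuple(Counter(labels).values()), branch_counts)

    best = max(attributes, key=gain)           # Step 3: largest information gain
    remaining = [a for a in attributes if a != best]
    return {best: {value: id3([row for row in rows if row[best] == value], target, remaining)
                   for value in {row[best] for row in rows}}}

# e.g. id3(rows, "Play", ["Outlook", "Temp", "Humidity", "Windy"]) on a weather table
# (hypothetical column names) returns a nested dictionary like the one sketched earlier.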
Decision Tree to Decision Rules
A decision tree can easily be transformed into a set of rules by mapping each path from the root
node to a leaf node, one by one.
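A possible sketch of this mapping over the nested-dictionary representation used in the earlier sketches, where every root-to-leaf path becomes one IF ... THEN ... rule:

def tree_to_rules(node, conditions=()):
    """Collect one IF ... THEN ... rule for every root-to-leaf path in a nested-dict tree."""
    if not isinstance(node, dict):             # a leaf node ends one rule
        return ["IF " + " AND ".join(conditions) + " THEN " + str(node)]
    attribute = next(iter(node))
    rules = []
    for value, child in node[attribute].items():
        rules += tree_to_rules(child, conditions + ("{} = {}".format(attribute, value),))
    return rules

# With the illustrative tree sketched earlier this prints rules such as
# "IF Outlook = Overcast THEN Yes" and "IF Outlook = Sunny AND Humidity = High THEN No".
for rule in tree_to_rules(tree):
    print(rule)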
Overall Entropy
Outlook
E(Sunny) = 0.971
E(Rainy) = 0.971
Expected New Entropy for Outlook = 0.693
Temp
Expected New Entropy for Temp = 0.9109
The gain from splitting on Outlook is the largest, so we take Outlook as the decision node and divide
the dataset by its branches.
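As a numerical check on these figures, assuming the usual 14-instance weather data with Outlook branch class counts of (3, 2), (4, 0) and (2, 3) and 9 Yes / 5 No overall (these counts are an assumption, not stated above), the helpers sketched earlier give:

overall     = entropy_from_counts(9, 5)                    # ~0.940
after_split = expected_entropy([(3, 2), (4, 0), (2, 3)])   # (5/14)(0.971) + (4/14)(0) + (5/14)(0.971) ~ 0.693
print(overall, after_split, overall - after_split)         # gain for Outlook ~ 0.247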
Sunny
Overall Entropy
E(Sunny): Yes - 3; No – 2
E(3,2) = E(0.6, 0.4) = -[(0.6) log2 (0.6) + (0.4) log2 (0.4)] = -[(0.6) (-0.737) + (0.4) (-1.322)] = 0.971
Temp
E(2,1) = E(0.67, 0.33) = -[(0.67) log2 (0.67) + (0.33) log2 (0.33)] = -[(0.67) (-0.577) + (0.33) (-1.599)] = 0.914
E(1,1) = E(0.5, 0.5) = -[(0.5) log2 (0.5) + (0.5) log2 (0.5)] = -[(0.5) (-1) + (0.5) (-1)] = 1.0
Humidity
Expected New Entropy for Humidity = (2/5) (1.0) + (3/5) (0.914) = 0.9484
Windy: the expected new entropy is computed from its frequency table in the same way.
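The helpers sketched earlier reproduce the Sunny-branch figures above; exact logarithms give 0.918 where the rounded working above uses 0.914, and the Humidity branch class counts are inferred from the weights shown:

print(entropy_from_counts(3, 2))             # ~0.971, entropy of the Sunny subset
print(entropy_from_counts(2, 1))             # ~0.918 (shown as 0.914 above with rounded logs)
print(entropy_from_counts(1, 1))             # 1.0
print(expected_entropy([(1, 1), (2, 1)]))    # Humidity: (2/5)(1.0) + (3/5)(0.918) ~ 0.951, vs 0.9484 above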
Rainy
Overall Entropy
E(Rainy): Yes - 2; No - 3
E(2,3) = 0.971
Now compute the expected new entropy for Temp, Humidity and Windy.
Temp
E(0,2) = 0
E(1,1) = 1
E(1,0) = 0
Expected New Entropy for Temp= (2/5) (0) + (2/5)(1.0) + (1/5)(0)= 0.40
Humidity
E(0,3) = 0
E(2,0) = 0
Expected New Entropy for Humidity = (3/5) (0) + (2/5) (0) = 0
Windy
false: Yes - 1; No - 2
E(1,2) = 0.914
true: Yes - 1; No - 1
E(1,1) = 1
Expected New Entropy for Windy = (3/5) (0.914) + (2/5) (1) = 0.9484
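Applying the same helpers to the branch counts listed above for the Rainy subset confirms these figures and shows that Humidity, with an expected new entropy of 0, gives the largest information gain on this branch:

rainy    = entropy_from_counts(2, 3)                     # ~0.971, entropy of the Rainy subset
temp     = expected_entropy([(0, 2), (1, 1), (1, 0)])    # 0.40
humidity = expected_entropy([(0, 3), (2, 0)])            # 0.0: both Humidity branches are pure
windy    = expected_entropy([(1, 2), (1, 1)])            # ~0.951 (0.9484 above with the rounded 0.914)
print(rainy - temp, rainy - humidity, rainy - windy)     # Humidity gives the largest gain on the Rainy branch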