Classification Trees - CART and CHAID
U DINESH KUMAR
Classification/Decision Trees
• Introduction to decision trees.
• The root node is split into two or more branches (called edges) using a split criterion, resulting in internal nodes. Each internal node has exactly one incoming edge (a minimal sketch follows below).
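To make the structure concrete, here is a minimal sketch (not from the original slides) that fits a CART tree with scikit-learn's DecisionTreeClassifier; the data is synthetic and every name in it is illustrative.

```python
# Minimal CART sketch on synthetic data (scikit-learn's
# DecisionTreeClassifier implements the CART algorithm).
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical binary-classification data, purely for illustration.
X, y = make_classification(n_samples=1000, n_features=4,
                           n_informative=2, random_state=42)

# Gini impurity is CART's default split criterion.
tree = DecisionTreeClassifier(criterion="gini", max_depth=3,
                              random_state=42)
tree.fit(X, y)

# The printout shows the root split, the internal nodes (each with
# exactly one incoming edge), and the leaves.
print(export_text(tree, feature_names=[f"x{i}" for i in range(4)]))
```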
Gini Impurity Calculation

Number of classes = 2 (say 0 and 1)

$$\text{Gini}(k) = \sum_{j=1}^{2} P(j \mid k)\,\bigl(1 - P(j \mid k)\bigr) = 2\,P(1 \mid k)\,\bigl(1 - P(1 \mid k)\bigr)$$

A smaller value implies less impurity.
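A quick numeric sketch of the formula (assuming Python with NumPy; the probabilities are made up):

```python
import numpy as np

def gini(p):
    """Gini impurity of node k given class probabilities p = [P(1|k), P(2|k), ...]."""
    p = np.asarray(p, dtype=float)
    return float(np.sum(p * (1.0 - p)))

# Two classes: Gini(k) = 2 * P(1|k) * (1 - P(1|k))
print(gini([0.5, 0.5]))  # 0.5  -> maximum impurity for two classes
print(gini([0.9, 0.1]))  # 0.18 -> smaller value, less impurity
```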
Entropy Calculation

Number of classes = 2 (say 0 and 1)

$$\text{Entropy}(k) = -\sum_{j=1}^{2} P(j \mid k)\,\log P(j \mid k)$$
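The companion sketch for entropy (same assumptions as the Gini example; base-2 logarithms are one common convention, since the slide leaves the base unspecified):

```python
import numpy as np

def entropy(p):
    """Entropy of node k given class probabilities; uses log base 2."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # treat 0 * log(0) as 0
    return float(-np.sum(p * np.log2(p)))

print(entropy([0.5, 0.5]))  # 1.0    -> maximum entropy for two classes
print(entropy([0.9, 0.1]))  # ~0.469 -> purer node, lower entropy
```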
χ² Statistic

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

where O is the observed frequency of a cell, and the expected frequency is

$$E = \frac{\text{row sum} \times \text{column sum}}{\text{total sum}}$$
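A one-cell sketch of the expected-frequency formula, using the margins implied by the table that follows (row sum 274 = 135 + 139, column sum 300 = 135 + 165):

```python
# Expected frequency of the (0DM, Y=1) cell: E = row sum * column sum / total.
row_sum, col_sum, total = 274, 300, 1000
E = row_sum * col_sum / total
print(E)  # 82.2, matching the expected frequency in the table below
```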
Chi-Square Test of Independence

Cell     Observed frequency (O)   Expected frequency (E)   (O - E)^2 / E
0DM-1             135                      82.2                33.91
0DM-0             139                     191.8                14.53
N0DM-1            165                     217.8                12.80
N0DM-0            561                     508.2                 5.48
Total            1000                    1000.0                66.73

P-value = 3.10E-16
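The same numbers can be reproduced with SciPy (a sketch; correction=False switches off the Yates continuity correction so the statistic matches the hand calculation above):

```python
import numpy as np
from scipy.stats import chi2_contingency

# Observed 2x2 table: rows = 0DM / N0DM, columns = Y=1 / Y=0.
observed = np.array([[135, 139],
                     [165, 561]])

chi2, p, dof, expected = chi2_contingency(observed, correction=False)
print(chi2)      # ~66.73
print(p)         # ~3.1e-16
print(expected)  # [[ 82.2 191.8] [217.8 508.2]]
```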
Expected Frequencies

                     Response Variable
Predictor         Y = 1      Y = 0      Total
0DM = 1           62.44     146.56        209
0DM = 0          176.56     414.44        591
Total            239.00     561.00        800
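Each expected count here is just the outer product of the row and column totals divided by the grand total; a quick check (assuming NumPy):

```python
import numpy as np

row_totals = np.array([209, 591])   # 0DM = 1, 0DM = 0
col_totals = np.array([239, 561])   # Y = 1, Y = 0
total = 800

expected = np.outer(row_totals, col_totals) / total
print(expected)  # [[ 62.44 146.56] [176.56 414.44]] (rounded)
```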
CHAID Tree

Business rules:
• Node 7: 90% accuracy
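scikit-learn does not ship a CHAID implementation, but the same kind of node-level business rule can be read off any fitted tree; a hedged sketch using CART as a stand-in (the node numbering and the 90% figure on the slide are specific to the original dataset):

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier, export_text

# Illustrative data; the original slide's CHAID tree is not reproducible here.
X, y = make_classification(n_samples=800, n_features=4, random_state=0)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# Each leaf in the printout is a candidate business rule; a leaf whose
# majority class covers ~90% of its samples corresponds to a rule such
# as "Node 7, 90% accuracy".
print(export_text(tree, feature_names=["x0", "x1", "x2", "x3"]))
```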