Classification and Regression Trees (CART - III) : DR A. Ramesh
Dr A. RAMESH
DEPARTMENT OF MANAGEMENT STUDIES
Agenda
Example
Problem Description:
Han, J., Pei, J. and Kamber, M., 2011. Data mining: concepts and techniques. Elsevier.
Import Relevant Libraries and Loading Data File
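The slide's code is not reproduced in this text. As a minimal sketch, assuming pandas and the 14-tuple AllElectronics `buys_computer` data from Han, Pei and Kamber (2011), the data can be loaded into a DataFrame as follows. The column names are assumptions, and the inline string stands in for the lecture's data file, whose name is not given.

```python
import pandas as pd

# Assumption: the AllElectronics tuples from Han, Pei and Kamber (2011),
# typed in directly instead of being read from the lecture's data file.
RAW = """
youth        high    no   fair       no
youth        high    no   excellent  no
middle_aged  high    no   fair       yes
senior       medium  no   fair       yes
senior       low     yes  fair       yes
senior       low     yes  excellent  no
middle_aged  low     yes  excellent  yes
youth        medium  no   fair       no
youth        low     yes  fair       yes
senior       medium  yes  fair       yes
youth        medium  yes  excellent  yes
middle_aged  medium  no   excellent  yes
middle_aged  high    yes  fair       yes
senior       medium  no   excellent  no
"""
# Hypothetical column names: four attributes plus the class label
COLUMNS = ["age", "income", "student", "credit_rating", "buys_computer"]

# One row per line; split() absorbs the alignment whitespace
df = pd.DataFrame([line.split() for line in RAW.strip().splitlines()],
                  columns=COLUMNS)

print(df.head())
print(df["buys_computer"].value_counts())
```

With the real file, `pd.read_csv(path)` would replace the inline string.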
Methods used in Data Encoding
• fit_transform(): fits the label encoder and returns the encoded labels.
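A small illustration of `fit_transform()`, assuming scikit-learn's `sklearn.preprocessing.LabelEncoder` (the usual home of this method). The encoder sorts the distinct values and assigns codes 0, 1, 2, … in that order, which is why middle_aged, senior and youth receive 0, 1 and 2.

```python
from sklearn.preprocessing import LabelEncoder

ages = ["youth", "youth", "middle_aged", "senior", "senior"]

le = LabelEncoder()
codes = le.fit_transform(ages)   # learn the sorted classes, then encode

print(le.classes_.tolist())      # ['middle_aged', 'senior', 'youth']
print(codes.tolist())            # [2, 2, 0, 1, 1]
# inverse_transform maps codes back to the original labels
print(le.inverse_transform([0, 2]).tolist())  # ['middle_aged', 'youth']
```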
Data Encoding Procedure
Data Encoding
Structuring Dataframe
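A sketch of the encoding procedure and the resulting fully numeric DataFrame, under the same assumptions (scikit-learn `LabelEncoder`, hypothetical column names, data typed in from Han et al.). One encoder per column is kept so the codes can be decoded later; the printed mappings should agree with the coding scheme given later in the lecture.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Assumption: AllElectronics tuples from Han, Pei and Kamber (2011)
RAW = """
youth        high    no   fair       no
youth        high    no   excellent  no
middle_aged  high    no   fair       yes
senior       medium  no   fair       yes
senior       low     yes  fair       yes
senior       low     yes  excellent  no
middle_aged  low     yes  excellent  yes
youth        medium  no   fair       no
youth        low     yes  fair       yes
senior       medium  yes  fair       yes
youth        medium  yes  excellent  yes
middle_aged  medium  no   excellent  yes
middle_aged  high    yes  fair       yes
senior       medium  no   excellent  no
"""
COLUMNS = ["age", "income", "student", "credit_rating", "buys_computer"]
df = pd.DataFrame([line.split() for line in RAW.strip().splitlines()],
                  columns=COLUMNS)

# Encode each column with its own LabelEncoder and collect the integer
# codes into a new DataFrame with the same row index.
encoders = {}
encoded = pd.DataFrame(index=df.index)
for col in COLUMNS:
    encoders[col] = LabelEncoder()
    encoded[col] = encoders[col].fit_transform(df[col])

# Show the value -> code mapping learned for every column
for col, le in encoders.items():
    mapping = {value: code for code, value in enumerate(le.classes_.tolist())}
    print(col, mapping)
print(encoded.head())
```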
Independent and Dependent Variables Selection
Build the Decision Tree Model without Splitting
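A minimal sketch of variable selection and model building without a train/test split, assuming scikit-learn's `DecisionTreeClassifier` with `criterion="gini"`; `random_state=0` is an arbitrary choice. Because the encoded age codes are middle_aged = 0, senior = 1, youth = 2, a threshold of 0.5 on age separates middle_aged from {youth, senior}, the same root split derived by hand in this lecture.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

# Assumption: AllElectronics tuples from Han, Pei and Kamber (2011)
RAW = """
youth        high    no   fair       no
youth        high    no   excellent  no
middle_aged  high    no   fair       yes
senior       medium  no   fair       yes
senior       low     yes  fair       yes
senior       low     yes  excellent  no
middle_aged  low     yes  excellent  yes
youth        medium  no   fair       no
youth        low     yes  fair       yes
senior       medium  yes  fair       yes
youth        medium  yes  excellent  yes
middle_aged  medium  no   excellent  yes
middle_aged  high    yes  fair       yes
senior       medium  no   excellent  no
"""
COLUMNS = ["age", "income", "student", "credit_rating", "buys_computer"]
df = pd.DataFrame([line.split() for line in RAW.strip().splitlines()],
                  columns=COLUMNS)
encoded = pd.DataFrame({c: LabelEncoder().fit_transform(df[c]) for c in COLUMNS})

# Independent variables (attributes) and dependent variable (class label)
X = encoded.drop(columns="buys_computer")
y = encoded["buys_computer"]

# CART with the Gini criterion, fitted on all 14 tuples (no split)
clf = DecisionTreeClassifier(criterion="gini", random_state=0)
clf.fit(X, y)

root_feature = X.columns[clf.tree_.feature[0]]
print("root split:", root_feature, "<=", clf.tree_.threshold[0])
print("training accuracy:", clf.score(X, y))
```

Since the 14 tuples are all distinct, a fully grown tree fits them exactly, so a training accuracy of 1.0 says nothing about performance on unseen data.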
Visualizing Decision Tree
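For a text-only view of the fitted tree, scikit-learn's `export_text` can be used (an assumption; the slide may rely on a graphical renderer instead). This sketch rebuilds the same model so that it runs on its own.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier, export_text

# Assumption: AllElectronics tuples from Han, Pei and Kamber (2011)
RAW = """
youth        high    no   fair       no
youth        high    no   excellent  no
middle_aged  high    no   fair       yes
senior       medium  no   fair       yes
senior       low     yes  fair       yes
senior       low     yes  excellent  no
middle_aged  low     yes  excellent  yes
youth        medium  no   fair       no
youth        low     yes  fair       yes
senior       medium  yes  fair       yes
youth        medium  yes  excellent  yes
middle_aged  medium  no   excellent  yes
middle_aged  high    yes  fair       yes
senior       medium  no   excellent  no
"""
COLUMNS = ["age", "income", "student", "credit_rating", "buys_computer"]
df = pd.DataFrame([line.split() for line in RAW.strip().splitlines()],
                  columns=COLUMNS)
encoded = pd.DataFrame({c: LabelEncoder().fit_transform(df[c]) for c in COLUMNS})
X = encoded.drop(columns="buys_computer")
y = encoded["buys_computer"]

clf = DecisionTreeClassifier(criterion="gini", random_state=0).fit(X, y)

# Indented text rendering: one line per split condition or leaf class
text = export_text(clf, feature_names=list(X.columns))
print(text)
```

The first line should read `|--- age <= 0.50`, i.e. the middle_aged versus {youth, senior} root split.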
Decision Tree Visualization
Interpretation of the CART Output
Calculation of Gini(D)
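• We first use the following equation for the Gini index to compute the impurity of D:
$Gini(D) = 1 - \sum_{i=1}^{m} p_i^2$, where $p_i$ is the probability that a tuple in D belongs to class $C_i$ (estimated by $|C_{i,D}|/|D|$) and the sum runs over the m classes.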
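Worked out for this data set: D holds 14 tuples, 9 with buys_computer = yes and 5 with buys_computer = no (the income partition counts below add up to these totals).

```latex
Gini(D) = 1 - \sum_{i=1}^{2} p_i^2
        = 1 - \left(\frac{9}{14}\right)^2 - \left(\frac{5}{14}\right)^2
        = 1 - \frac{81}{196} - \frac{25}{196}
        = \frac{90}{196} \approx 0.459
```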
Income Attribute
Tuples in partition D1
• Income ∈ {low, medium}:
  Class (buys_computer)   Low + Medium
  Yes                     3 + 4 = 7
  No                      1 + 2 = 3
Tuples in partition D2
• Income ∈ {high}:
  Class (buys_computer)   High
  Yes                     2
  No                      2
Gini index for income attribute
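For instance, the split income ∈ {low, medium} weights the impurity of each partition by its share of the 14 tuples, using the counts from the two partition tables (D1: 7 yes, 3 no; D2: 2 yes, 2 no):

```latex
Gini_{income \in \{low, medium\}}(D)
  = \frac{|D_1|}{|D|} Gini(D_1) + \frac{|D_2|}{|D|} Gini(D_2)
  = \frac{10}{14}\left(1 - \left(\frac{7}{10}\right)^2 - \left(\frac{3}{10}\right)^2\right)
  + \frac{4}{14}\left(1 - \left(\frac{2}{4}\right)^2 - \left(\frac{2}{4}\right)^2\right)
  = \frac{10}{14}(0.42) + \frac{4}{14}(0.5) \approx 0.443
```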
Gini index for income attribute
Gini index for income attribute
Gini index for income attribute
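• $Gini_{income \in \{low, medium\}}(D) = 0.443 = Gini_{income \in \{high\}}(D)$
• $Gini_{income \in \{high, medium\}}(D) = 0.450 = Gini_{income \in \{low\}}(D)$
• $Gini_{income \in \{high, low\}}(D) = 0.458 = Gini_{income \in \{medium\}}(D)$
• The best binary split on income is therefore {low, medium} vs. {high}, with the minimum Gini index of 0.443.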
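The three values can be checked with a few lines of plain Python. The (yes, no) counts per income value are read off the partition tables; `gini` and `split_gini` are hypothetical helper names.

```python
# (yes, no) counts of buys_computer for each income value
COUNTS = {"low": (3, 1), "medium": (4, 2), "high": (2, 2)}


def gini(yes, no):
    """Gini impurity of a node holding `yes` and `no` tuples."""
    n = yes + no
    return 1.0 - (yes / n) ** 2 - (no / n) ** 2


def split_gini(counts, left_values):
    """Weighted Gini index of the binary split left_values vs. the rest."""
    left = [counts[v] for v in counts if v in left_values]
    right = [counts[v] for v in counts if v not in left_values]
    total = sum(y + n for y, n in counts.values())
    result = 0.0
    for part in (left, right):
        yes = sum(y for y, _ in part)
        no = sum(n for _, n in part)
        result += (yes + no) / total * gini(yes, no)
    return result


for left in ({"low", "medium"}, {"high", "medium"}, {"high", "low"}):
    print(f"{sorted(left)} vs. the rest: {split_gini(COUNTS, left):.3f}")
```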
Gini index for Age attribute
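Assuming the class counts of the Han et al. data (youth and senior together: 5 yes, 5 no; middle_aged: 4 yes, 0 no), the best binary split on age works out as follows; it reproduces the 0.357 used to choose the root node.

```latex
Gini_{age \in \{youth, senior\}}(D)
  = \frac{10}{14}\left(1 - \left(\frac{5}{10}\right)^2 - \left(\frac{5}{10}\right)^2\right)
  + \frac{4}{14}\left(1 - \left(\frac{4}{4}\right)^2 - \left(\frac{0}{4}\right)^2\right)
  = \frac{10}{14}(0.5) + 0 \approx 0.357
```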
Gini index for student attribute
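Student has only two values, so there is a single binary split. Assuming the Han et al. counts (student = yes: 6 yes, 1 no; student = no: 3 yes, 4 no):

```latex
Gini_{student \in \{yes\}}(D)
  = \frac{7}{14}\left(1 - \left(\frac{6}{7}\right)^2 - \left(\frac{1}{7}\right)^2\right)
  + \frac{7}{14}\left(1 - \left(\frac{3}{7}\right)^2 - \left(\frac{4}{7}\right)^2\right)
  = \frac{7}{14}\cdot\frac{12}{49} + \frac{7}{14}\cdot\frac{24}{49} \approx 0.367
```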
Gini index for credit_rating attribute
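Likewise for credit_rating, assuming the Han et al. counts (fair: 6 yes, 2 no; excellent: 3 yes, 3 no):

```latex
Gini_{credit\_rating \in \{fair\}}(D)
  = \frac{8}{14}\left(1 - \left(\frac{6}{8}\right)^2 - \left(\frac{2}{8}\right)^2\right)
  + \frac{6}{14}\left(1 - \left(\frac{3}{6}\right)^2 - \left(\frac{3}{6}\right)^2\right)
  = \frac{8}{14}(0.375) + \frac{6}{14}(0.5) \approx 0.429
```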
Choosing the root node
The attribute with the minimum Gini index is chosen as the splitting attribute for the root node, i.e., age: $Gini_{age \in \{youth, senior\}}(D) = 0.357 = Gini_{age \in \{middle\_aged\}}(D)$.
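The root-node choice can be sketched as an exhaustive search: for every attribute, try each binary grouping of its values, keep the grouping with the lowest weighted Gini index, then pick the attribute whose best grouping scores lowest. This is a plain-Python sketch over the Han et al. tuples (typed in), with hypothetical helper names, not scikit-learn's implementation.

```python
from itertools import combinations

# Assumption: AllElectronics tuples from Han, Pei and Kamber (2011);
# fields are age, income, student, credit_rating, buys_computer.
RAW = """
youth        high    no   fair       no
youth        high    no   excellent  no
middle_aged  high    no   fair       yes
senior       medium  no   fair       yes
senior       low     yes  fair       yes
senior       low     yes  excellent  no
middle_aged  low     yes  excellent  yes
youth        medium  no   fair       no
youth        low     yes  fair       yes
senior       medium  yes  fair       yes
youth        medium  yes  excellent  yes
middle_aged  medium  no   excellent  yes
middle_aged  high    yes  fair       yes
senior       medium  no   excellent  no
"""
ROWS = [tuple(line.split()) for line in RAW.strip().splitlines()]
ATTRIBUTES = ["age", "income", "student", "credit_rating"]


def gini(labels):
    """Gini impurity of a list of 'yes'/'no' class labels."""
    if not labels:
        return 0.0
    p_yes = labels.count("yes") / len(labels)
    return 1.0 - p_yes ** 2 - (1.0 - p_yes) ** 2


def best_binary_split(rows, col):
    """Try every binary grouping of column `col`; return (gini, subset)."""
    values = sorted({row[col] for row in rows})
    best = None
    for k in range(1, len(values)):
        for subset in combinations(values, k):
            left = [row[-1] for row in rows if row[col] in subset]
            right = [row[-1] for row in rows if row[col] not in subset]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, set(subset))
    return best


scores = {attr: best_binary_split(ROWS, i) for i, attr in enumerate(ATTRIBUTES)}
for attr, (score, subset) in scores.items():
    print(f"{attr:<14} {score:.3f}  {sorted(subset)} vs. the rest")
root = min(scores, key=lambda attr: scores[attr][0])
print("root node:", root)
```

scikit-learn instead places thresholds on the label-encoded codes, so it can only reach groupings of consecutive codes; for income (high = 0, low = 1, medium = 2) the {low} vs. {high, medium} split is not reachable that way.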
Gini index for different attributes for the sample of 10
• After separating the 4 samples that belong to age = middle_aged, 10 samples remain:
Gini index for different attributes for the sample of 10
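The same search, restricted to the 10 youth and senior tuples, indicates why student becomes the next split: its best grouping scores 0.32, against 0.375 for income, 0.417 for credit_rating and 0.48 for age. A sketch under the same assumptions (Han et al. tuples typed in, hypothetical helper names):

```python
from itertools import combinations

# Assumption: AllElectronics tuples from Han, Pei and Kamber (2011)
RAW = """
youth        high    no   fair       no
youth        high    no   excellent  no
middle_aged  high    no   fair       yes
senior       medium  no   fair       yes
senior       low     yes  fair       yes
senior       low     yes  excellent  no
middle_aged  low     yes  excellent  yes
youth        medium  no   fair       no
youth        low     yes  fair       yes
senior       medium  yes  fair       yes
youth        medium  yes  excellent  yes
middle_aged  medium  no   excellent  yes
middle_aged  high    yes  fair       yes
senior       medium  no   excellent  no
"""
ROWS = [tuple(line.split()) for line in RAW.strip().splitlines()]
ATTRIBUTES = ["age", "income", "student", "credit_rating"]


def gini(labels):
    """Gini impurity of a list of 'yes'/'no' class labels."""
    if not labels:
        return 0.0
    p_yes = labels.count("yes") / len(labels)
    return 1.0 - p_yes ** 2 - (1.0 - p_yes) ** 2


def best_binary_split(rows, col):
    """Lowest weighted Gini over all binary groupings of column `col`."""
    values = sorted({row[col] for row in rows})
    best = None
    for k in range(1, len(values)):
        for subset in combinations(values, k):
            left = [row[-1] for row in rows if row[col] in subset]
            right = [row[-1] for row in rows if row[col] not in subset]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
            if best is None or score < best[0]:
                best = (score, set(subset))
    return best


# Drop the pure middle_aged partition; 10 youth/senior tuples remain
remaining = [row for row in ROWS if row[0] != "middle_aged"]
scores = {attr: best_binary_split(remaining, i) for i, attr in enumerate(ATTRIBUTES)}
for attr, (score, subset) in scores.items():
    print(f"{attr:<14} {score:.3f}  {sorted(subset)} vs. the rest")
best = min(scores, key=lambda attr: scores[attr][0])
print("next split:", best)
```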
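Drawing the CART
[Figure: partial CART. Root node: age ({youth, senior} branch shown); this branch splits into yes / no branches (student = yes / no, per the following slides), whose subtrees are still undetermined (???).]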
For branch Student = No
• Omit the marked rows (data entries), i.e., those with either age = middle_aged or student = yes.
• A total of 5 rows remain.
Gini index for different attributes for branch Student = No
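Assuming the Han et al. tuples, the 5 rows of this branch are 1 yes and 4 no: the three youth rows are all no, and the two senior rows split 1 yes / 1 no. The candidate splits then score as follows, so age (0.2) is chosen, matching the age node drawn for this branch:

```latex
Gini_{age \in \{youth\}}(D_{student=no})
  = \frac{3}{5}\left(1 - 0^2 - 1^2\right)
  + \frac{2}{5}\left(1 - \left(\frac{1}{2}\right)^2 - \left(\frac{1}{2}\right)^2\right) = 0.2

Gini_{credit\_rating \in \{fair\}}(D_{student=no})
  = \frac{3}{5}\left(1 - \left(\frac{1}{3}\right)^2 - \left(\frac{2}{3}\right)^2\right)
  + \frac{2}{5}\left(1 - 0^2 - 1^2\right) \approx 0.267

Gini_{income \in \{high\}}(D_{student=no})
  = \frac{2}{5}\left(1 - 0^2 - 1^2\right)
  + \frac{3}{5}\left(1 - \left(\frac{1}{3}\right)^2 - \left(\frac{2}{3}\right)^2\right) \approx 0.267
```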
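Drawing the CART
[Figure: partial CART. Root node: age ({youth, senior} branch); student = yes subtree still undetermined (???); student = no subtree split on age, whose two children are still undetermined (???).]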
For branch Student = Yes
• Omit the marked rows (data entries), i.e., those with either age = middle_aged or student = no.
• A total of 5 rows remain.
Gini index for different attributes for branch Student = Yes
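Under the same assumption about the data, the 5 rows of the student = yes branch are 4 yes and 1 no, and credit_rating gives the lowest Gini index (0.2), matching the credit_rating node drawn for this branch:

```latex
Gini_{credit\_rating \in \{fair\}}(D_{student=yes})
  = \frac{3}{5}\left(1 - 1^2 - 0^2\right)
  + \frac{2}{5}\left(1 - \left(\frac{1}{2}\right)^2 - \left(\frac{1}{2}\right)^2\right) = 0.2

Gini_{age \in \{youth\}}(D_{student=yes})
  = \frac{2}{5}\left(1 - 1^2 - 0^2\right)
  + \frac{3}{5}\left(1 - \left(\frac{2}{3}\right)^2 - \left(\frac{1}{3}\right)^2\right) \approx 0.267

Gini_{income \in \{low\}}(D_{student=yes})
  = \frac{3}{5}\left(1 - \left(\frac{2}{3}\right)^2 - \left(\frac{1}{3}\right)^2\right)
  + \frac{2}{5}\left(1 - 1^2 - 0^2\right) \approx 0.267
```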
Drawing the CART
[Figure: partial CART. Root node: age ({youth, senior} branch); student = yes → credit_rating; student = no → age.]
Coding scheme
Age: Youth = 2, Middle Age = 0, Senior = 1
Student: Yes = 1, No = 0
Income: High = 0, Low = 1, Medium = 2
Credit rating: Fair = 1, Excellent = 0
Buys computer (class): Yes = 1, No = 0
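• Repeat the splitting process until all the leaf nodes are obtained; the final output is the decision tree (decision classifier) shown in the figure.
[Figure: final decision tree. Root node: age, with branches {youth, senior} and middle_aged; lower splits include credit_rating (excellent / fair) and age (senior / youth); leaves show the values of the dependent variable (no / yes), and callouts mark the number of yes and no tuples and the sample size.]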
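Continuing the splits until every leaf is pure gives a tree that can be written as nested rules. The sketch below is a reconstruction from the Han et al. data, not a copy of the slide's figure: `predict` is a hypothetical name, and the last split in the student = yes, credit_rating = excellent node uses age (as the Senior/Youth labels suggest), although income separates those two tuples equally well.

```python
# Assumption: AllElectronics tuples from Han, Pei and Kamber (2011)
RAW = """
youth        high    no   fair       no
youth        high    no   excellent  no
middle_aged  high    no   fair       yes
senior       medium  no   fair       yes
senior       low     yes  fair       yes
senior       low     yes  excellent  no
middle_aged  low     yes  excellent  yes
youth        medium  no   fair       no
youth        low     yes  fair       yes
senior       medium  yes  fair       yes
youth        medium  yes  excellent  yes
middle_aged  medium  no   excellent  yes
middle_aged  high    yes  fair       yes
senior       medium  no   excellent  no
"""
ROWS = [tuple(line.split()) for line in RAW.strip().splitlines()]


def predict(age, income, student, credit_rating):
    """Hypothetical hand-coded final tree; income is never tested."""
    if age == "middle_aged":                 # pure leaf after the root split
        return "yes"
    if student == "yes":                     # {youth, senior}, student = yes
        if credit_rating == "fair":
            return "yes"
        return "no" if age == "senior" else "yes"   # credit_rating = excellent
    if age == "youth":                       # {youth, senior}, student = no
        return "no"
    return "yes" if credit_rating == "fair" else "no"   # age = senior


# Every training tuple should land in a leaf of its own class
for row in ROWS:
    print(row[:4], "->", predict(*row[:4]), "(actual:", row[4] + ")")
correct = sum(predict(*row[:4]) == row[4] for row in ROWS)
print(f"{correct}/{len(ROWS)} tuples classified correctly")
```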
Splitting Dataset
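A sketch of the split with scikit-learn's `train_test_split`; `test_size=0.3` and `random_state=1` are assumed values, since the lecture's settings are not recoverable from this text. With 14 tuples, a 30% test share leaves 9 tuples for training and 5 for testing.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder

# Assumption: AllElectronics tuples from Han, Pei and Kamber (2011)
RAW = """
youth        high    no   fair       no
youth        high    no   excellent  no
middle_aged  high    no   fair       yes
senior       medium  no   fair       yes
senior       low     yes  fair       yes
senior       low     yes  excellent  no
middle_aged  low     yes  excellent  yes
youth        medium  no   fair       no
youth        low     yes  fair       yes
senior       medium  yes  fair       yes
youth        medium  yes  excellent  yes
middle_aged  medium  no   excellent  yes
middle_aged  high    yes  fair       yes
senior       medium  no   excellent  no
"""
COLUMNS = ["age", "income", "student", "credit_rating", "buys_computer"]
df = pd.DataFrame([line.split() for line in RAW.strip().splitlines()],
                  columns=COLUMNS)
encoded = pd.DataFrame({c: LabelEncoder().fit_transform(df[c]) for c in COLUMNS})
X = encoded.drop(columns="buys_computer")
y = encoded["buys_computer"]

# Assumed settings: 30% of the tuples held out for testing, fixed seed
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)
print(len(X_train), "training tuples,", len(X_test), "test tuples")
```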
Build the Decision Tree Model
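Fitting the Gini-based tree on the training part only, under the same assumptions (typed-in Han et al. data, assumed split settings, scikit-learn `DecisionTreeClassifier`):

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

# Assumption: AllElectronics tuples from Han, Pei and Kamber (2011)
RAW = """
youth        high    no   fair       no
youth        high    no   excellent  no
middle_aged  high    no   fair       yes
senior       medium  no   fair       yes
senior       low     yes  fair       yes
senior       low     yes  excellent  no
middle_aged  low     yes  excellent  yes
youth        medium  no   fair       no
youth        low     yes  fair       yes
senior       medium  yes  fair       yes
youth        medium  yes  excellent  yes
middle_aged  medium  no   excellent  yes
middle_aged  high    yes  fair       yes
senior       medium  no   excellent  no
"""
COLUMNS = ["age", "income", "student", "credit_rating", "buys_computer"]
df = pd.DataFrame([line.split() for line in RAW.strip().splitlines()],
                  columns=COLUMNS)
encoded = pd.DataFrame({c: LabelEncoder().fit_transform(df[c]) for c in COLUMNS})
X = encoded.drop(columns="buys_computer")
y = encoded["buys_computer"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)   # assumed split settings

# Grow the tree on the 9 training tuples only
clf = DecisionTreeClassifier(criterion="gini", random_state=0)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)               # predicted codes for the test tuples
print("depth:", clf.get_depth(), "leaves:", clf.get_n_leaves())
print("predictions:", y_pred.tolist())
```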
Evaluating the Model
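A sketch of the evaluation with scikit-learn's `accuracy_score`, `confusion_matrix` and `classification_report`, again with the assumed split settings. With only 5 test tuples, the numbers swing strongly with the random split, so no particular accuracy should be read into them.

```python
import pandas as pd
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier

# Assumption: AllElectronics tuples from Han, Pei and Kamber (2011)
RAW = """
youth        high    no   fair       no
youth        high    no   excellent  no
middle_aged  high    no   fair       yes
senior       medium  no   fair       yes
senior       low     yes  fair       yes
senior       low     yes  excellent  no
middle_aged  low     yes  excellent  yes
youth        medium  no   fair       no
youth        low     yes  fair       yes
senior       medium  yes  fair       yes
youth        medium  yes  excellent  yes
middle_aged  medium  no   excellent  yes
middle_aged  high    yes  fair       yes
senior       medium  no   excellent  no
"""
COLUMNS = ["age", "income", "student", "credit_rating", "buys_computer"]
df = pd.DataFrame([line.split() for line in RAW.strip().splitlines()],
                  columns=COLUMNS)
encoded = pd.DataFrame({c: LabelEncoder().fit_transform(df[c]) for c in COLUMNS})
X = encoded.drop(columns="buys_computer")
y = encoded["buys_computer"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)   # assumed split settings
clf = DecisionTreeClassifier(criterion="gini", random_state=0).fit(X_train, y_train)

y_pred = clf.predict(X_test)
acc = accuracy_score(y_test, y_pred)
# Rows = actual class, columns = predicted class, in the order no (0), yes (1)
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
print("accuracy:", acc)
print(cm)
print(classification_report(y_test, y_pred, labels=[0, 1],
                            target_names=["no", "yes"], zero_division=0))
```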
Visualizing Decision Tree
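A graphical view can be produced by exporting the fitted tree to Graphviz DOT text with scikit-learn's `export_graphviz(..., out_file=None)`, which returns the DOT source as a string; rendering it to an image (for example with the Graphviz `dot` tool) is left out here. The class names assume the encoding no = 0, yes = 1, and the split settings are the same assumed values as before.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.tree import DecisionTreeClassifier, export_graphviz

# Assumption: AllElectronics tuples from Han, Pei and Kamber (2011)
RAW = """
youth        high    no   fair       no
youth        high    no   excellent  no
middle_aged  high    no   fair       yes
senior       medium  no   fair       yes
senior       low     yes  fair       yes
senior       low     yes  excellent  no
middle_aged  low     yes  excellent  yes
youth        medium  no   fair       no
youth        low     yes  fair       yes
senior       medium  yes  fair       yes
youth        medium  yes  excellent  yes
middle_aged  medium  no   excellent  yes
middle_aged  high    yes  fair       yes
senior       medium  no   excellent  no
"""
COLUMNS = ["age", "income", "student", "credit_rating", "buys_computer"]
df = pd.DataFrame([line.split() for line in RAW.strip().splitlines()],
                  columns=COLUMNS)
encoded = pd.DataFrame({c: LabelEncoder().fit_transform(df[c]) for c in COLUMNS})
X = encoded.drop(columns="buys_computer")
y = encoded["buys_computer"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1)   # assumed split settings
clf = DecisionTreeClassifier(criterion="gini", random_state=0).fit(X_train, y_train)

# out_file=None makes export_graphviz return the DOT source as a string
dot = export_graphviz(clf, out_file=None, feature_names=list(X.columns),
                      class_names=["no", "yes"], filled=True, rounded=True)
print(dot)
```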
Decision Tree Visualization
Thank You