Example Decision Tree

Gini Index

Gini index is a metric that measures how often a randomly chosen element would be incorrectly identified. This means an attribute with a lower Gini index should be preferred.
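To make the definition concrete, here is a minimal Python sketch (not part of the original document) that computes the Gini index of a node from its positive and negative record counts:

    # Gini index of a node containing `pos` positive and `neg` negative records:
    # gini = 1 - sum of squared class probabilities.
    def gini(pos, neg):
        total = pos + neg
        if total == 0:
            return 0.0
        return 1 - ((pos / total) ** 2 + (neg / total) ** 2)

    print(gini(5, 7))   # ~0.486, as in the Var A calculation below
    print(gini(3, 1))   # 0.375, as in the Var A calculation below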

Example: Construct a Decision Tree using the Gini index as the criterion

We are going to use the same data sample that we used for the information gain example. Let's try to use the Gini index as a criterion. Here, we have 5 columns; 4 of them contain continuous data and the 5th contains the class labels.

Attributes A, B, C, and D can be considered predictors, and the class labels in column E can be considered the target variable. To construct a decision tree from this data, we have to convert the continuous data into categorical data.

We have chosen the following threshold values to categorize each attribute:

A        B        C        D
>= 5     >= 3.0   >= 4.2   >= 1.4
<  5     <  3.0   <  4.2   <  1.4
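As an illustration of this conversion step, the sketch below bins a continuous attribute against one of these thresholds; the sample values are arbitrary placeholders, not the document's data set:

    import numpy as np

    # Convert a continuous attribute to two categories using a threshold
    # (5 for A, 3.0 for B, 4.2 for C, 1.4 for D).
    def categorize(values, threshold):
        values = np.asarray(values, dtype=float)
        return np.where(values >= threshold, ">= " + str(threshold), "< " + str(threshold))

    # Placeholder values for attribute A, for illustration only.
    print(categorize([4.9, 5.0, 6.3], 5))   # ['< 5' '>= 5' '>= 5']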


Gini Index for Var A

Var A has a value >= 5 for 12 of the 16 records and a value < 5 for the remaining 4 records.

- For Var A >= 5 & class == positive: 5/12
- For Var A >= 5 & class == negative: 7/12
  o gini(5,7) = 1 - ( (5/12)^2 + (7/12)^2 ) = 0.486
- For Var A < 5 & class == positive: 3/4
- For Var A < 5 & class == negative: 1/4
  o gini(3,1) = 1 - ( (3/4)^2 + (1/4)^2 ) = 0.375

Weighting each branch's Gini index by its share of the records and summing:

gini(Target, A) = (12/16) * 0.486 + (4/16) * 0.375 = 0.45825
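A quick Python check of this weighted sum (a sketch using the counts above; with unrounded Gini values the result is 0.45833, while 0.45825 above uses the rounded 0.486):

    def gini(pos, neg):
        total = pos + neg
        return 1 - ((pos / total) ** 2 + (neg / total) ** 2)

    weighted_a = (12 / 16) * gini(5, 7) + (4 / 16) * gini(3, 1)
    print(round(weighted_a, 5))   # 0.45833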

Gini Index for Var B

Var B has a value >= 3.0 for 12 of the 16 records and a value < 3.0 for the remaining 4 records.

- For Var B >= 3.0 & class == positive: 8/12
- For Var B >= 3.0 & class == negative: 4/12
  o gini(8,4) = 1 - ( (8/12)^2 + (4/12)^2 ) = 0.444
- For Var B < 3.0 & class == positive: 0/4
- For Var B < 3.0 & class == negative: 4/4
  o gini(0,4) = 1 - ( (0/4)^2 + (4/4)^2 ) = 0

gini(Target, B) = (12/16) * 0.444 + (4/16) * 0 = 0.333

Gini Index for Var C

Var C has a value >= 4.2 for 6 of the 16 records and a value < 4.2 for the remaining 10 records.

- For Var C >= 4.2 & class == positive: 0/6
- For Var C >= 4.2 & class == negative: 6/6
  o gini(0,6) = 1 - ( (0/6)^2 + (6/6)^2 ) = 0
- For Var C < 4.2 & class == positive: 8/10
- For Var C < 4.2 & class == negative: 2/10
  o gini(8,2) = 1 - ( (8/10)^2 + (2/10)^2 ) = 0.32

gini(Target, C) = (6/16) * 0 + (10/16) * 0.32 = 0.2

Gini Index for Var D

Var D has a value >= 1.4 for 5 of the 16 records and a value < 1.4 for the remaining 11 records.

- For Var D >= 1.4 & class == positive: 0/5
- For Var D >= 1.4 & class == negative: 5/5
  o gini(0,5) = 1 - ( (0/5)^2 + (5/5)^2 ) = 0
- For Var D < 1.4 & class == positive: 8/11
- For Var D < 1.4 & class == negative: 3/11
  o gini(8,3) = 1 - ( (8/11)^2 + (3/11)^2 ) = 0.397

gini(Target, D) = (5/16) * 0 + (11/16) * 0.397 = 0.273

Summary of the Gini calculations:

Var A           Target
              Positive  Negative
    >= 5.0       5         7
    <  5.0       3         1
Gini Index of A = 0.45825

Var B           Target
              Positive  Negative
    >= 3.0       8         4
    <  3.0       0         4
Gini Index of B = 0.333

Var C           Target
              Positive  Negative
    >= 4.2       0         6
    <  4.2       8         2
Gini Index of C = 0.2

Var D           Target
              Positive  Negative
    >= 1.4       0         5
    <  1.4       8         3
Gini Index of D = 0.273

Var C has the lowest weighted Gini index, so it is the best attribute to use for the root split.
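The following Python sketch (illustrative, not from the original document) reproduces these weighted Gini values from the contingency counts above and selects the attribute with the lowest one:

    # (positive, negative) counts per branch of each attribute, taken from the tables above.
    splits = {
        "A": [(5, 7), (3, 1)],   # >= 5.0 branch, < 5.0 branch
        "B": [(8, 4), (0, 4)],   # >= 3.0 branch, < 3.0 branch
        "C": [(0, 6), (8, 2)],   # >= 4.2 branch, < 4.2 branch
        "D": [(0, 5), (8, 3)],   # >= 1.4 branch, < 1.4 branch
    }

    def gini(pos, neg):
        total = pos + neg
        return 1 - ((pos / total) ** 2 + (neg / total) ** 2)

    def weighted_gini(branches):
        n = sum(p + q for p, q in branches)
        return sum((p + q) / n * gini(p, q) for p, q in branches)

    for name, branches in splits.items():
        print(name, round(weighted_gini(branches), 4))   # A 0.4583, B 0.3333, C 0.2, D 0.2727

    print("Best split by Gini:", min(splits, key=lambda a: weighted_gini(splits[a])))   # C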
Entropy

Example: Construct a Decision Tree using information gain as the criterion


We are going to use this data sample. Let's try to use information gain as a criterion. Here, we have 5 columns; 4 of them contain continuous data and the 5th contains the class labels.

Attributes A, B, C, and D can be considered predictors, and the class labels in column E can be considered the target variable. To construct a decision tree from this data, we have to convert the continuous data into categorical data.

We have chosen the following threshold values to categorize each attribute:

A        B        C        D
>= 5     >= 3.0   >= 4.2   >= 1.4
<  5     <  3.0   <  4.2   <  1.4

There are 2 steps for calculating the information gain for each attribute:

1. Calculate the entropy of the target.

2. Calculate the entropy of the target with respect to each attribute A, B, C, and D. Using the information gain formula, we subtract this attribute entropy from the entropy of the target; the result is the information gain:

   Information Gain(Target, X) = Entropy(Target) - Entropy(Target, X)

The entropy of the target: we have 8 records with the negative class and 8 records with the positive class, so the entropy of the target is exactly 1.

Variable E

Positive Negative
8 8

Calculating entropy using the formula:

E(8,8) = -1 * ( p(+ve) * log2(p(+ve)) + p(-ve) * log2(p(-ve)) )
       = -1 * ( (8/16) * log2(8/16) + (8/16) * log2(8/16) )
       = 1
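The same calculation as a small Python sketch (log2 from the standard library; by convention 0 * log2(0) is treated as 0):

    from math import log2

    # Entropy of a node containing `pos` positive and `neg` negative records.
    def entropy(pos, neg):
        total = pos + neg
        return sum(-c / total * log2(c / total) for c in (pos, neg) if c > 0)

    print(entropy(8, 8))   # 1.0, the entropy of the target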

Information gain for Var A

Var A has a value >= 5 for 12 of the 16 records and a value < 5 for the remaining 4 records.

- For Var A >= 5 & class == positive: 5/12
- For Var A >= 5 & class == negative: 7/12
  o Entropy(5,7) = -1 * ( (5/12)*log2(5/12) + (7/12)*log2(7/12) ) = 0.9799
- For Var A < 5 & class == positive: 3/4
- For Var A < 5 & class == negative: 1/4
  o Entropy(3,1) = -1 * ( (3/4)*log2(3/4) + (1/4)*log2(1/4) ) = 0.81128

Entropy(Target, A) = P(>=5) * E(5,7) + P(<5) * E(3,1)
                   = (12/16) * 0.9799 + (4/16) * 0.81128 = 0.937745

Information Gain(Target, A) = 1 - 0.937745 = 0.062255
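In Python, this weighted entropy and the resulting information gain for Var A look as follows (a sketch; the small difference from 0.937745 comes from rounding the branch entropies in the text):

    from math import log2

    def entropy(pos, neg):
        total = pos + neg
        return sum(-c / total * log2(c / total) for c in (pos, neg) if c > 0)

    e_split_a = (12 / 16) * entropy(5, 7) + (4 / 16) * entropy(3, 1)
    print(round(e_split_a, 6))        # ~0.937721
    print(round(1 - e_split_a, 6))    # information gain for A, ~0.062279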

Information gain for Var B

Var B has a value >= 3.0 for 12 of the 16 records and a value < 3.0 for the remaining 4 records.

- For Var B >= 3.0 & class == positive: 8/12
- For Var B >= 3.0 & class == negative: 4/12
  o Entropy(8,4) = -1 * ( (8/12)*log2(8/12) + (4/12)*log2(4/12) ) = 0.91830
- For Var B < 3.0 & class == positive: 0/4
- For Var B < 3.0 & class == negative: 4/4
  o Entropy(0,4) = -1 * ( (0/4)*log2(0/4) + (4/4)*log2(4/4) ) = 0

Entropy(Target, B) = P(>=3) * E(8,4) + P(<3) * E(0,4)
                   = (12/16) * 0.91830 + (4/16) * 0 = 0.68872

Information Gain(Target, B) = 1 - 0.68872 = 0.31128

Information gain for Var C


Var C has a value >= 4.2 for 6 of the 16 records and a value < 4.2 for the remaining 10 records.

- For Var C >= 4.2 & class == positive: 0/6
- For Var C >= 4.2 & class == negative: 6/6
  o Entropy(0,6) = 0
- For Var C < 4.2 & class == positive: 8/10
- For Var C < 4.2 & class == negative: 2/10
  o Entropy(8,2) = 0.72193

Entropy(Target, C) = P(>=4.2) * E(0,6) + P(< 4.2) * E(8,2)
                   = (6/16) * 0 + (10/16) * 0.72193 = 0.4512

Information Gain(Target, C) = 1 - 0.4512 = 0.5488

Information gain for Var D

Var D has a value >= 1.4 for 5 of the 16 records and a value < 1.4 for the remaining 11 records.

- For Var D >= 1.4 & class == positive: 0/5
- For Var D >= 1.4 & class == negative: 5/5
  o Entropy(0,5) = 0
- For Var D < 1.4 & class == positive: 8/11
- For Var D < 1.4 & class == negative: 3/11
  o Entropy(8,3) = -1 * ( (8/11)*log2(8/11) + (3/11)*log2(3/11) ) = 0.84532

Entropy(Target, D) = P(>=1.4) * E(0,5) + P(< 1.4) * E(8,3)
                   = (5/16) * 0 + (11/16) * 0.84532 = 0.5811575

Information Gain(Target, D) = 1 - 0.5811575 = 0.41884

Summary of the information gain calculations:

Var A           Target
              Positive  Negative
    >= 5.0       5         7
    <  5.0       3         1
Information Gain of A = 0.062255

Var B           Target
              Positive  Negative
    >= 3.0       8         4
    <  3.0       0         4
Information Gain of B = 0.31128

Var C           Target
              Positive  Negative
    >= 4.2       0         6
    <  4.2       8         2
Information Gain of C = 0.5488

Var D           Target
              Positive  Negative
    >= 1.4       0         5
    <  1.4       8         3
Information Gain of D = 0.41884
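As with the Gini summary, a short Python sketch (illustrative only) reproduces these information gains from the contingency counts and selects the attribute with the highest gain:

    from math import log2

    # (positive, negative) counts per branch of each attribute, taken from the tables above.
    splits = {
        "A": [(5, 7), (3, 1)],
        "B": [(8, 4), (0, 4)],
        "C": [(0, 6), (8, 2)],
        "D": [(0, 5), (8, 3)],
    }

    def entropy(pos, neg):
        total = pos + neg
        return sum(-c / total * log2(c / total) for c in (pos, neg) if c > 0)

    target_entropy = entropy(8, 8)   # 1.0

    def information_gain(branches):
        n = sum(p + q for p, q in branches)
        return target_entropy - sum((p + q) / n * entropy(p, q) for p, q in branches)

    for name, branches in splits.items():
        print(name, round(information_gain(branches), 4))   # A 0.0623, B 0.3113, C 0.5488, D 0.4188

    print("Best split by information gain:", max(splits, key=lambda a: information_gain(splits[a])))   # C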

From the above information gain calculations, we can build a decision tree. We should place the attributes on the tree according to their values.

The attribute with the best value compared to the others is placed at the root; here, Var C has the highest information gain (and also the lowest Gini index from the earlier section), so it becomes the root node. A branch with entropy 0 is converted to a leaf node, while a branch with entropy greater than 0 needs further splitting. A minimal sketch of this recursive procedure is shown below.
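A minimal recursive sketch of that procedure (an illustration under the assumption that the records are available as a list of dicts keyed by column name; it is not code from the original document):

    from math import log2
    from collections import Counter

    def entropy(labels):
        total = len(labels)
        return sum(-c / total * log2(c / total) for c in Counter(labels).values())

    def information_gain(records, attribute, target):
        gain = entropy([r[target] for r in records])
        for value in set(r[attribute] for r in records):
            subset = [r[target] for r in records if r[attribute] == value]
            gain -= len(subset) / len(records) * entropy(subset)
        return gain

    def build_tree(records, attributes, target):
        labels = [r[target] for r in records]
        # A branch with entropy 0 (all labels identical) becomes a leaf node;
        # we also stop when no attributes are left to split on.
        if entropy(labels) == 0 or not attributes:
            return Counter(labels).most_common(1)[0][0]
        # The attribute with the highest information gain becomes the root of this subtree.
        best = max(attributes, key=lambda a: information_gain(records, a, target))
        tree = {best: {}}
        for value in set(r[best] for r in records):
            subset = [r for r in records if r[best] == value]
            remaining = [a for a in attributes if a != best]
            # A branch with entropy greater than 0 is split further by recursion.
            tree[best][value] = build_tree(subset, remaining, target)
        return tree

For the categorized sample above, one would call build_tree(records, ["A", "B", "C", "D"], "E"), where records holds the 16 rows as dicts.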

Reference: https://dataaspirant.com/2017/01/30/how-decision-tree-algorithm-works/
