03 - Decision Tree - Hunt Algorithm

Hunt's algorithm is a decision tree induction algorithm developed in 1966. It builds a decision tree recursively by splitting the training records at each node into purer subsets based on an attribute test condition. If all records at a node belong to the same class, it becomes a leaf node labeled with that class. Otherwise, the records are partitioned into smaller subsets using an attribute test, creating a child node for each outcome. This process is repeated recursively on each child node until stopping criteria are met. The algorithm aims to construct a tree that accurately predicts class labels for new data instances.

Hunt’s Algorithm



Decision Tree Induction Algorithms
A number of algorithms exist:
• Hunt’s
– Hunt's Algorithm (1966)
• Quinlan’s
– ID3: Iterative Dichotomizer 3 (1975), uses Entropy
– C4.5 / C4.8 / C5.0 (1993), uses Entropy
• Breiman’s
– CART: Classification And Regression Trees (1984), uses Gini
• Kass’s
– CHAID: CHi-squared Automatic Interaction Detector (1980), uses the Chi-squared test
• IBM:
– Mehta: SLIQ, Supervised Learning In Quest (1996), uses Gini
– Shafer: SPRINT, Scalable PaRallelizable INduction of decision Trees (1996), uses Gini
Hunt’s Algorithm
• In Hunt’s algorithm, a decision tree is
grown in a recursive fashion by partitioning
the training records successively into purer
subsets



Hunt’s Algorithm
• Let Dt be the set of training records that are
associated with node t and y = {y1, y2, · · · , yc}
be the class labels. The following is a recursive
definition of Hunt’s algorithm.

• Step 1: If all the records in Dt belong to the
same class yt, then t is a leaf node labeled as
yt.



Hunt’s Algorithm
• Step 2: If Dt contains records that belong to
more than one class, an attribute test
condition is used to partition the records into
smaller subsets. A child node is then created
for each outcome of the test condition. The
records in Dt are distributed to the children
based upon their outcomes. This procedure is
repeated for each child node.



Hunt’s Algorithm
Dt = {training records @ node t}
• If Dt = {records from different classes}
– Split Dt into smaller subsets via an attribute test
– Apply the same rules to each subset
• If Dt = {records from a single class yt}
– Set node t = leaf node with class label yt
• If Dt = {} (empty)
– Set node t = leaf node with the default class label yd
• Recursively apply the above criteria until no more training records are left (see the sketch below)
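A minimal sketch of this recursion, assuming records stored as Python dicts with a "label" key and a caller-supplied choose_split function (both illustrative names, not part of the slides); an extra stop when no attributes remain is added so the recursion always terminates:

```python
# Sketch of Hunt's recursion. The attribute-test selection strategy
# (choose_split) is left to the caller; the slides discuss it separately.
from collections import Counter

def hunt(records, attributes, default_label, choose_split):
    """Grow a decision tree from `records` (list of dicts with a 'label' key)."""
    # Dt is empty -> leaf node with the default class label yd
    if not records:
        return {"leaf": default_label}

    labels = [r["label"] for r in records]
    majority = Counter(labels).most_common(1)[0][0]

    # All records in the same class (or no attributes left to test) -> leaf node
    if len(set(labels)) == 1 or not attributes:
        return {"leaf": majority}

    # Mixed classes -> apply an attribute test and split Dt into smaller subsets
    attr = choose_split(records, attributes)            # e.g. by information gain
    children = {}
    for value in {r[attr] for r in records}:            # one child per outcome
        subset = [r for r in records if r[attr] == value]
        remaining = [a for a in attributes if a != attr]
        children[value] = hunt(subset, remaining, majority, choose_split)
    return {"split_on": attr, "children": children}

# Usage idea (loan example): hunt(records, ["home_owner"], "No",
#                                 choose_split=lambda recs, attrs: attrs[0])
```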
Example
• Consider the problem of predicting whether a
loan applicant will succeed in repaying her
loan obligations or become delinquent, and
subsequently, default on her loan.

• The training set used for predicting borrowers
who will default on their loan payments is as
follows.



Figure 1: Training set for the loan default problem.


Example
• A training set for this problem can be
constructed by examining the historical
records of previous loan borrowers.
• In the training set shown in Figure 1, each
record contains the personal information of a
borrower along with a class label indicating
whether the borrower has defaulted on her
loan payments.



Example
• The initial tree for the classification problem
contains a single node with class label
Defaulted = No as illustrated below:

Figure 1a: Step 1


• This means that most of the borrowers had
successfully repaid their loans.
• However, the tree needs to be refined since
the root node contains records from both
classes.
Example
• The records are subsequently divided into smaller
subsets based on the outcomes of the Home
Owner test condition, as shown in Figure below:

Figure 1b: Step 2

• The reason for choosing this attribute test
condition instead of others is an implementation
issue that will be discussed later.



Example
• For now, we can assume that this is the best
criterion for splitting the data at this point.

• Hunt’s algorithm is then applied recursively
to each child of the root node.

• From the training set given in Figure 1, notice
that all borrowers who are home owners had
successfully repaid their loans.
Example
• As a result, the left child of the root is a leaf
node labeled as Defaulted = No, as shown in
Figure 1b.

• For the right child of the root node, we need
to continue applying the recursive step of
Hunt’s algorithm until all the records belong
to the same class.



Example
• This recursive step is shown in Figures 1c and
1d below:

Figure 1c: Step 3. Figure 1d: Step 4.



Example
• Combining all the steps, the complete decision tree is as follows:



Design Issues of Decision Tree
Induction
• How to split the training records? - Each
recursive step of the tree growing process
requires an attribute test condition to divide
the records into smaller subsets.
• To implement this step, the algorithm must
provide a method for specifying the test
condition for different attribute types as well as
an objective measure for evaluating the
goodness of each test condition.
Design Issues of Decision Tree
Induction
• When to stop splitting? A stopping condition is
needed to terminate the tree growing
process.

• A possible strategy is to continue expanding a
node until all the records belong to the same
class or until all the records have identical
attribute values.



How to Split an Attribute
• Before automatically creating a decision tree,
you can choose from several splitting
functions that are used to determine which
attribute to split on. The following splitting
functions are available:
– Random - The attribute to split on is chosen
randomly.
– Information Gain - The attribute to split on is the
one that has the maximum information gain.
How to Split an Attribute
– Gain Ratio - Selects the attribute with the highest
information gain to number of input values ratio.
The number of input values is the number of
distinct values of an attribute occurring in the
training set.

– GINI - The attribute whose split yields the lowest
weighted GINI index (i.e. the largest reduction in
impurity) is chosen. The GINI index is a measure of
the impurity of the examples (see the sketch below).

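As a rough illustration of the GINI measure (a sketch with hypothetical helper names, not code from any of the tools listed above):

```python
# Sketch: GINI impurity of a set of class labels, and the weighted GINI of a split.
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum_i p_i^2, where p_i is the fraction of class i."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def gini_of_split(subsets):
    """Weighted Gini over the subsets produced by a candidate split (lower is purer)."""
    total = sum(len(s) for s in subsets)
    return sum(len(s) / total * gini(s) for s in subsets)

# Example: 9 "yes" / 5 "no" records, split into one pure and one mixed subset
print(round(gini(["yes"] * 9 + ["no"] * 5), 3))                          # 0.459
print(round(gini_of_split([["yes"] * 4, ["yes"] * 5 + ["no"] * 5]), 3))  # 0.357
```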


Training Dataset
Age Income Student CreditRating BuysComputer
<=30 high no fair no
<=30 high no excellent no
31 - 40 high no fair yes
>40 medium no fair yes
>40 low yes fair yes
>40 low yes excellent no
31 - 40 low yes excellent yes
<=30 medium no fair no
<=30 low yes fair yes
>40 medium yes fair yes
<=30 medium yes excellent yes
31 - 40 medium no excellent yes
31 - 40 high yes fair yes
>40 medium no excellent no
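For reference in the worked example below, the same 14 records can be written as a small Python structure (a direct transcription of the table above; the variable names are mine):

```python
# The 14-record training set from the table above.
# Each record: (age, income, student, credit_rating, buys_computer)
training_data = [
    ("<=30",  "high",   "no",  "fair",      "no"),
    ("<=30",  "high",   "no",  "excellent", "no"),
    ("31-40", "high",   "no",  "fair",      "yes"),
    (">40",   "medium", "no",  "fair",      "yes"),
    (">40",   "low",    "yes", "fair",      "yes"),
    (">40",   "low",    "yes", "excellent", "no"),
    ("31-40", "low",    "yes", "excellent", "yes"),
    ("<=30",  "medium", "no",  "fair",      "no"),
    ("<=30",  "low",    "yes", "fair",      "yes"),
    (">40",   "medium", "yes", "fair",      "yes"),
    ("<=30",  "medium", "yes", "excellent", "yes"),
    ("31-40", "medium", "no",  "excellent", "yes"),
    ("31-40", "high",   "yes", "fair",      "yes"),
    (">40",   "medium", "no",  "excellent", "no"),
]
attributes = ["age", "income", "student", "credit_rating"]  # last column is the class label
```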
Resultant Decision Tree



Attribute Selection Measure:
Information Gain (ID3/C4.5)
• The attribute selection measure used in ID3,
based on Claude Shannon’s work on
information theory.
• If our data is split into classes according to
fractions {p1,p2…, pm} then the entropy is
measured as the info required to classify any
arbitrary tuple as follows:
E(p_1, p_2, \ldots, p_m) = -\sum_{i=1}^{m} p_i \log_2(p_i)
Attribute Selection Measure:
Information Gain (ID3/C4.5) (cont…)
• The information measure is essentially the
same as entropy
• At the root node the information is as follows:
9 5
info [9,5]  E  , 
 14 14 
9 9 5  5
  log 2    log 2  
14  14  14  14 
 0.94
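This value can be checked numerically with a short sketch (the entropy helper is my own name, not from the slides):

```python
# Check: entropy / information of the class distribution [9 yes, 5 no].
from math import log2

def entropy(counts):
    """E(p1,...,pm) = -sum(pi * log2(pi)), with pi the class fractions."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

print(round(entropy([9, 5]), 3))  # 0.94
```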
Attribute Selection Measure:
Information Gain (ID3/C4.5) (cont…)
• To measure the information at a particular
attribute we measure info for the various
splits of that attribute
• For instance, with the age attribute, look at the
distribution of ‘Yes’ and ‘No’ samples for each
value of age, and compute the expected
information for each of these distributions.
• For age “<=30”, for example, there are 2 ‘Yes’
and 3 ‘No’ samples.



Attribute Selection Measure:
Information Gain (ID3/C4.5) (cont…)
• At the age attribute the information is as
follows:
info([2,3], [4,0], [3,2]) = (5/14)\,info(2,3) + (4/14)\,info(4,0) + (5/14)\,info(3,2)
  = (5/14)\,[-(2/5) \log_2(2/5) - (3/5) \log_2(3/5)]
  + (4/14)\,[-(4/4) \log_2(4/4) - (0/4) \log_2(0/4)]
  + (5/14)\,[-(3/5) \log_2(3/5) - (2/5) \log_2(2/5)]
  = 0.694

(taking 0 \cdot \log_2(0) = 0 by convention)
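The same weighted average can be verified with a short sketch (helper names are mine):

```python
# Check: expected information (weighted entropy) of the age split [2,3], [4,0], [3,2].
from math import log2

def entropy(counts):
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def expected_info(partitions):
    """Weighted average entropy over the subsets produced by an attribute test."""
    total = sum(sum(p) for p in partitions)
    return sum(sum(p) / total * entropy(p) for p in partitions)

print(round(expected_info([[2, 3], [4, 0], [3, 2]]), 3))  # 0.694
```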
Attribute Selection Measure:
Information Gain (ID3/C4.5) (cont…)
• In order to determine which attributes we
should use at each node we measure the
information gained in moving from one node
to another and choose the one that gives us
the most information



Attribute Selection By Information
Gain Example
• Class P: BuysComputer = “yes”
• Class N: BuysComputer = “no”
– I(p, n) = I(9, 5) =0.940
• Compute the entropy for age:
Counting the ‘yes’ (pi) and ‘no’ (ni) records in the training table above for each value of age:

Age       pi   ni   I(pi, ni)
<=30      2    3    0.971
31 – 40   4    0    0
>40       3    2    0.971
Attribute Selection By Information
Gain Computation
E(age) = (5/14) I(2,3) + (4/14) I(4,0) + (5/14) I(3,2) = 0.694

• Here 5/14 means “age <=30” has 5 out of 14 samples,
with 2 yes and 3 no. Hence:

Gain(age) = I(9,5) - E(age) = 0.940 - 0.694 = 0.246

Similarly:

Gain(income) = 0.029
Gain(student) = 0.151
Gain(credit_rating) = 0.048
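Since age has the largest gain, it is the attribute chosen for the first split. As a final check, the sketch below recomputes all four gains from the training_data list defined earlier under the Training Dataset table (helper names are mine; values differ from the slides in the third decimal place because the slides round intermediate entropies):

```python
# Information gain of each attribute, computed from training_data and attributes
# (both defined in the earlier snippet; the class label is the last column).
from collections import Counter, defaultdict
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def information_gain(attr_index):
    labels = [row[-1] for row in training_data]
    groups = defaultdict(list)                   # class labels grouped by attribute value
    for row in training_data:
        groups[row[attr_index]].append(row[-1])
    expected = sum(len(g) / len(training_data) * entropy(g) for g in groups.values())
    return entropy(labels) - expected

for i, name in enumerate(attributes):
    print(f"Gain({name}) = {information_gain(i):.3f}")
# Gain(age) ~ 0.247, Gain(income) ~ 0.029, Gain(student) ~ 0.152, Gain(credit_rating) ~ 0.048
```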
