Lesson 7: Supervised Methods (Decision Trees) Algorithms
Important terminology
Root Node: The attribute at this node is used for dividing the data into two or more
sets. The feature at the root node is selected based on attribute selection
techniques.
Branch or Sub-Tree: A part of the entire decision tree is called a branch or
sub-tree.
Splitting: Dividing a node into two or more sub-nodes based on if-else
conditions.
Decision Node: A sub-node that is split into further sub-nodes is called a
decision node.
Leaf or Terminal Node: A node at the end of the decision tree that cannot be
split into further sub-nodes.
Pruning: Removing a sub-node from the tree is called pruning.
The root node feature is selected based on the results of the Attribute Selection
Measure (ASM).
The ASM is applied repeatedly until a leaf (terminal) node is reached that cannot be
split into further sub-nodes.
What is an Attribute Selection Measure (ASM)?
An Attribute Selection Measure is a technique used in the data mining process for
data reduction. Data reduction is necessary for better analysis and prediction of
the target variable. Two common measures are:
1. Gini index
2. Information Gain (ID3)
Gini index
The Gini index (or Gini impurity) measures the probability of a particular variable
being wrongly classified when it is chosen at random. It ranges from 0, when all
elements in a node belong to a single class, towards higher values as the elements
become more evenly distributed across the classes.
Gini index = 1 - Σ (Pi)^2, where Pi is the probability of an object being classified into a particular class.
When the Gini index is used as the criterion for selecting the feature for the
root node, the feature with the lowest Gini index is chosen.
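
As a minimal sketch (not taken from this lesson), the Gini index of a set of class labels can be computed in Python as follows; the function name and the use of collections.Counter are illustrative choices.

from collections import Counter

def gini_impurity(labels):
    # Gini impurity = 1 - sum of squared class probabilities (Pi)
    total = len(labels)
    counts = Counter(labels)
    return 1.0 - sum((count / total) ** 2 for count in counts.values())

# A pure node has Gini index 0; an evenly mixed two-class node has 0.5.
print(gini_impurity(["yes", "yes", "yes"]))       # 0.0
print(gini_impurity(["yes", "no", "yes", "no"]))  # 0.5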
Information Gain (ID3)
Entropy is the main concept behind this algorithm.
o The measure that helps determine which feature or attribute gives the maximum
information about a class is called Information Gain, and the algorithm built
around it is ID3.
o By using this method, we can reduce the level of entropy from the root node to
the leaf nodes.
Algorithms used while training decision trees:
ID3, C4.5, CART and pruning
1. ID3: Ross Quinlan is credited with the development of ID3, which is shorthand
for “Iterative Dichotomiser 3.” This algorithm leverages entropy and information
gain as metrics to evaluate candidate splits. Quinlan published research on this
algorithm in 1986.
2. C4.5: This algorithm is considered a later iteration of ID3, which was also
developed by Quinlan. It can use either information gain or the gain ratio to
evaluate split points within the decision tree.
It breaks down a dataset into smaller and smaller subsets while at the same time
an associated decision tree is incrementally developed. The final result is a tree
with decision nodes and leaf nodes.
A decision node (e.g., Outlook) has two or more branches (e.g., Sunny, Overcast
and Rainy). A leaf node (e.g., Play) represents a classification or decision. The
topmost decision node in a tree, which corresponds to the best predictor, is called
the root node. Decision trees can handle both categorical and numerical data.
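
As a rough, illustrative sketch of how C4.5's gain ratio adjusts information gain (the helper names and example numbers below are assumptions, not taken from this lesson):

import math

def split_information(branch_sizes):
    # Entropy of the split itself; penalises attributes with many small branches.
    total = sum(branch_sizes)
    return -sum((n / total) * math.log2(n / total) for n in branch_sizes if n > 0)

def gain_ratio(information_gain, branch_sizes):
    # C4.5 prefers the attribute with the highest gain ratio.
    si = split_information(branch_sizes)
    return information_gain / si if si > 0 else 0.0

# Illustrative values: a three-way split (e.g. Sunny/Overcast/Rainy) of 14 examples.
print(gain_ratio(0.25, [5, 4, 5]))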
Algorithm
The core algorithm for building decision trees, called ID3 and developed by J. R.
Quinlan, employs a top-down, greedy search through the space of possible branches
with no backtracking. ID3 uses entropy and information gain to construct a decision
tree. In the ZeroR model there is no predictor; in the OneR model we try to find the
single best predictor; naive Bayes includes all predictors using Bayes' rule and the
assumption of independence between predictors; a decision tree, by contrast, includes
all predictors with dependence assumptions between predictors.
Entropy
Entropy is the measure of uncertainty in the data. The aim is to reduce the
entropy and maximize the information gain.
o The feature carrying the most information is considered important by the
algorithm and is used for training the model.
o By using information gain you are actually using entropy.
A decision tree is built top-down from a root node and involves partitioning the
data into subsets that contain instances with similar values (homogeneous).
The ID3 algorithm uses entropy to calculate the homogeneity of a sample.
o NB: If the sample is completely homogeneous the entropy is zero, and if the
sample is equally divided it has an entropy of one.
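
A minimal Python sketch of the entropy calculation, Entropy(S) = -Σ Pi log2(Pi), illustrating the two extremes mentioned above (the function name is an illustrative choice):

import math
from collections import Counter

def entropy(labels):
    # Shannon entropy of a set of class labels, in bits.
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total) for c in Counter(labels).values())

# Completely homogeneous sample -> entropy 0; equally divided sample -> entropy 1.
print(entropy(["yes", "yes", "yes", "yes"]))  # 0.0
print(entropy(["yes", "yes", "no", "no"]))    # 1.0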
Information Gain
Information gain is based on the decrease in entropy after a dataset is split on
an attribute. Constructing a decision tree is all about finding the attribute that
returns the highest information gain (i.e., the most homogeneous branches).
Step 1: Calculate the entropy of the target attribute for the whole dataset (the
entropy before the split).
Step 2: The dataset is then split on the different attributes. The entropy for each
branch is calculated and then added proportionally (weighted by branch size) to get
the total entropy for the split. The resulting entropy is subtracted from the
entropy before the split. The result is the information gain, or decrease in entropy.
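
An illustrative sketch of Step 2 in Python, using an entropy helper like the one above (the names parent_labels and branches are assumptions, not from the lesson):

import math
from collections import Counter

def entropy(labels):
    # Shannon entropy, in bits (as in the sketch above).
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(parent_labels, branches):
    # Entropy before the split minus the weighted entropy of the branches.
    total = len(parent_labels)
    weighted = sum(len(b) / total * entropy(b) for b in branches)
    return entropy(parent_labels) - weighted

# Splitting 10 examples (6 yes / 4 no) into two branches:
parent = ["yes"] * 6 + ["no"] * 4
branches = [["yes"] * 5 + ["no"], ["yes"] + ["no"] * 3]
print(information_gain(parent, branches))  # about 0.26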
Step 3: Choose the attribute with the largest information gain as the decision node,
divide the dataset by its branches, and repeat the same process on every branch.
Step 4a: A branch with an entropy of 0 is a leaf node.
Step 4b: A branch with entropy greater than 0 needs further splitting.
Step 5: The ID3 algorithm is run recursively on the non-leaf branches, until all
data is classified.
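
A compact, illustrative sketch of the recursive ID3 loop described in Steps 2 to 5; the dictionary-based tree, the data layout, and the function names are assumptions rather than the lesson's own code:

import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total) for c in Counter(labels).values())

def id3(rows, target, attributes):
    # rows: list of dicts; target: class column name; attributes: candidate columns.
    labels = [row[target] for row in rows]
    if len(set(labels)) == 1:          # Step 4a: a pure branch becomes a leaf node
        return labels[0]
    if not attributes:                 # no attributes left: majority-class leaf
        return Counter(labels).most_common(1)[0][0]

    def gain(attr):                    # Step 2: information gain for one attribute
        total = len(rows)
        split_entropy = 0.0
        for value in set(r[attr] for r in rows):
            branch = [r[target] for r in rows if r[attr] == value]
            split_entropy += len(branch) / total * entropy(branch)
        return entropy(labels) - split_entropy

    best = max(attributes, key=gain)   # Step 3: attribute with the largest gain
    tree = {best: {}}
    for value in set(r[best] for r in rows):   # Steps 4b and 5: recurse on each branch
        subset = [r for r in rows if r[best] == value]
        tree[best][value] = id3(subset, target, [a for a in attributes if a != best])
    return tree

# Tiny Outlook/Play example in the spirit of the lesson's weather illustration.
data = [
    {"Outlook": "Sunny", "Play": "No"},
    {"Outlook": "Sunny", "Play": "No"},
    {"Outlook": "Overcast", "Play": "Yes"},
    {"Outlook": "Rainy", "Play": "Yes"},
]
print(id3(data, "Play", ["Outlook"]))
# e.g. {'Outlook': {'Sunny': 'No', 'Overcast': 'Yes', 'Rainy': 'Yes'}}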
A decision tree can easily be transformed into a set of rules by mapping the paths
from the root node to the leaf nodes one by one.
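
For example, a small Outlook tree like the one sketched above maps to a set of if-else rules, one per root-to-leaf path (illustrative only):

def play(outlook):
    # Each root-to-leaf path of the Outlook tree becomes one rule.
    if outlook == "Sunny":
        return "No"
    elif outlook == "Overcast":
        return "Yes"
    else:  # Rainy
        return "Yes"

print(play("Overcast"))  # Yes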
http://youtube.com/watch?v=pRaKQC_DKLM