Lecture Notes 3

A decision tree is a supervised machine learning algorithm that uses a flowchart-like tree structure to solve both classification and regression problems. It builds the tree structure by recursively splitting the sample data into purer child nodes based on a splitting criterion, such as entropy, until a stopping condition is reached. The internal nodes represent attributes, branches denote the attribute values or rules, and leaf nodes hold the predictive outcomes. It is a versatile and interpretable algorithm suitable for problems with discrete target variables and attribute values.


Decision Tree

A decision tree is one of the most powerful supervised learning algorithms, used for both classification and regression tasks. It builds a flowchart-like tree structure where each internal node denotes a test on an attribute, each branch represents an outcome of the test, and each leaf node (terminal node) holds a class label. It is constructed by recursively splitting the training data into subsets based on the values of the attributes until a stopping criterion is met, such as the maximum depth of the tree or the minimum number of samples required to split a node.
During training, the decision tree algorithm selects the best attribute to split the data based on a metric such as entropy or Gini impurity, which measures the level of impurity or randomness in the subsets. The goal is to find the attribute that maximizes the information gain, i.e. the reduction in impurity after the split; the standard formulas are given below for reference.
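
For reference, for a set S in which class i appears with proportion p_i, the standard definitions are:

Entropy(S) = − Σ_i p_i log2(p_i)
Gini(S) = 1 − Σ_i p_i^2
InformationGain(S, A) = Entropy(S) − Σ_v (|S_v| / |S|) · Entropy(S_v)

where S_v is the subset of S for which attribute A takes the value v.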

What is a Decision Tree?

A decision tree is a flowchart-like tree structure where each internal node denotes a feature, each branch denotes a decision rule, and each leaf node denotes the outcome of the algorithm. It is a versatile supervised machine-learning algorithm used for both classification and regression problems, and it is one of the most powerful algorithms. It is also the building block of Random Forest, which trains many trees on different subsets of the training data and is itself one of the most powerful algorithms in machine learning.

Decision Tree Terminologies

Some of the common Terminologies used in Decision Trees are as follows:

 Root Node: It is the topmost node in the tree, which represents the complete
dataset. It is the starting point of the decision-making process.
 Decision/Internal Node: A node that symbolizes a choice regarding an input
feature. Branching off of internal nodes connects them to leaf nodes or other
internal nodes.
 Leaf/Terminal Node: A node without any child nodes that indicates a class
label or a numerical value.
 Splitting: The process of splitting a node into two or more sub-nodes using a
split criterion and a selected feature.
 Branch/Sub-Tree: A subsection of the decision tree that starts at an internal node
and ends at the leaf nodes.
 Parent Node: The node that divides into one or more child nodes.
 Child Node: The nodes that emerge when a parent node is split.
 Impurity: A measurement of the target variable’s homogeneity in a subset of
data. It refers to the degree of randomness or uncertainty in a set of examples.
The Gini index and entropy are two commonly used impurity measurements in
decision trees for classification tasks.
 Variance: Variance measures how much the predicted values deviate from the
target values across the samples in a node. It is used for regression problems in
decision trees. Mean squared error, mean absolute error, Friedman's MSE, or half
Poisson deviance are commonly used to measure this variance for regression tasks
in a decision tree.
 Information Gain: Information gain is a measure of the reduction in impurity
achieved by splitting a dataset on a particular feature in a decision tree. The
splitting criterion is determined by the feature that offers the greatest
information gain; it is used to determine the most informative feature to split on
at each node of the tree, with the goal of creating pure subsets (see the sketch
after this list).
 Pruning: The process of removing branches from the tree that do not provide
any additional information or lead to overfitting.
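
To make the impurity and information-gain terminology above concrete, here is a minimal Python sketch (our own illustration, not code from these notes; the function names are assumptions):

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy: sum of -p * log2(p) over the class proportions in `labels`."""
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total) for c in Counter(labels).values())

def gini(labels):
    """Gini impurity: 1 - sum of p^2 over the class proportions in `labels`."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())

def information_gain(parent_labels, child_label_subsets):
    """Reduction in entropy achieved by splitting the parent into the given subsets."""
    total = len(parent_labels)
    weighted_children = sum((len(s) / total) * entropy(s) for s in child_label_subsets)
    return entropy(parent_labels) - weighted_children

# A perfectly informative split of a 50/50 parent into two pure children:
parent = ["yes", "yes", "no", "no"]
children = [["yes", "yes"], ["no", "no"]]
print(entropy(parent))                     # 1.0
print(gini(parent))                        # 0.5
print(information_gain(parent, children))  # 1.0
```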

Attribute Selection Measures:

Construction of Decision Tree: A tree can be “learned” by splitting the source set into
subsets based on Attribute Selection Measures. Attribute selection measure (ASM) is a
criterion used in decision tree algorithms to evaluate the usefulness of different attributes
for splitting a dataset. The goal of ASM is to identify the attribute that will create the most
homogeneous subsets of data after the split, thereby maximizing the information gain. This
process is repeated on each derived subset in a recursive manner called recursive
partitioning. The recursion terminates when all examples in the subset at a node have
the same value of the target variable, or when splitting no longer adds value to the predictions. The
construction of a decision tree classifier does not require any domain knowledge or
parameter setting and therefore is appropriate for exploratory knowledge discovery.
Decision trees can handle high-dimensional data.
How does the Decision Tree algorithm Work?

The decision tree operates by analyzing a record from the data set to predict its classification. It
starts at the tree’s root node, where the algorithm compares the value of the root
attribute with the corresponding attribute value of the record. Based on the
comparison, it follows the matching branch and moves to the next node.
The algorithm repeats this comparison at every subsequent node and continues the
process until it reaches a leaf node of the tree. The complete mechanism can be better
explained through the algorithm given below.
 Step-1: Begin the tree with the root node, say S, which contains the complete dataset.
 Step-2: Find the best attribute in the dataset using an Attribute Selection Measure (ASM).
 Step-3: Divide S into subsets corresponding to the possible values of the best attribute.
 Step-4: Generate the decision tree node that contains the best attribute.
 Step-5: Recursively build new subtrees from the subsets created in Step 3. Continue this
process until the nodes cannot be split any further; such final nodes are called leaf nodes.
(A runnable sketch of this workflow is given below.)
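
For readers who want to see this workflow end to end, the widely used scikit-learn library implements it directly. The snippet below is a minimal sketch assuming scikit-learn is installed; the tiny weather-style dataset and its integer encoding are invented purely for illustration (encoding categories as integers is a simplification):

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy, hypothetical training data (not from these notes):
# features = [outlook, humidity, wind], encoded as integers
# outlook: 0=sunny, 1=cloudy, 2=rainy; humidity: 0=normal, 1=high; wind: 0=weak, 1=strong
X = [[0, 1, 0], [0, 1, 1], [1, 1, 0], [2, 0, 0], [2, 0, 1], [1, 0, 1], [0, 0, 0]]
y = ["no",      "no",      "yes",     "yes",     "no",      "yes",     "yes"]

# criterion="entropy" uses information gain as the attribute selection measure (Step 2);
# max_depth and min_samples_split act as stopping criteria (Step 5).
clf = DecisionTreeClassifier(criterion="entropy", max_depth=3, min_samples_split=2)
clf.fit(X, y)

# Inspect the learned rules and classify a new record.
print(export_text(clf, feature_names=["outlook", "humidity", "wind"]))
print(clf.predict([[1, 0, 0]]))   # a cloudy, normal-humidity, weak-wind day
```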

Advantages of the Decision Tree:


1. It is simple to understand, as it follows the same process a human follows while
making a decision in real life.
2. It can be very useful for solving decision-related problems.
3. It helps to think through all the possible outcomes of a problem.
4. It requires less data cleaning than many other algorithms.
Disadvantages of the Decision Tree:
1. A decision tree can contain many layers, which makes it complex.
2. It may have an overfitting issue, which can be mitigated by constraining the tree or by
using the Random Forest algorithm (see the sketch below).
3. With more class labels, the computational complexity of the decision tree may increase.
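
As mentioned in point 2, overfitting is usually tamed either by constraining a single tree (stopping criteria / pruning) or by training an ensemble such as a Random Forest. A minimal sketch with scikit-learn, assuming placeholder training data X and y:

```python
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

# Option 1: constrain a single tree so it cannot grow arbitrarily deep or fine-grained.
pruned_tree = DecisionTreeClassifier(max_depth=4, min_samples_leaf=5)

# Option 2: a Random Forest trains many trees on bootstrap samples and random feature
# subsets and averages their votes, which reduces the variance of a single deep tree.
forest = RandomForestClassifier(n_estimators=100, random_state=0)

# Both are trained the same way on some training data X, y (placeholders here):
# pruned_tree.fit(X, y)
# forest.fit(X, y)
```
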
What are appropriate problems for Decision tree learning?

Although a variety of decision tree learning methods have been developed with somewhat
differing capabilities and requirements, decision tree learning is generally best suited to
problems with the following characteristics:
1. Instances are represented by attribute-value pairs:
In the world of decision tree learning, we commonly use attribute-value pairs to represent
instances. An instance is defined by a predetermined group of attributes, such as
temperature, and its corresponding value, such as hot. Ideally, we want each attribute to
have a finite set of distinct values, like hot, mild, or cold. This makes it easy to construct
decision trees. However, more advanced versions of the algorithm can accommodate
attributes with continuous numerical values, such as representing temperature with a
numerical scale.
2. The target function has discrete output values:
The decision tree method is most often used to learn Boolean-valued functions, i.e.
classifications such as yes or no. Decision tree methods extend readily to learning
functions with more than two possible output values. A more substantial extension
allows learning target functions with real-valued outputs, although decision trees are
used less often in this setting.
3. Disjunctive descriptions may be required:
Decision trees naturally represent disjunctive expressions: each path from the root to a
leaf is a conjunction of attribute tests, and the tree as a whole is the disjunction of these paths.
4. The training data may contain errors:
Decision tree learning methods are robust to errors, both errors in the classification of
the training examples and errors in the attribute values that describe these examples.
5. The training data may contain missing attribute values:
Decision tree methods can be used even when some training examples have unknown
values for some attributes. For instance, the level of humidity throughout the day may
be known for only a specific subset of the training examples.
Practical issues in learning decision trees include:
 Determining how deeply to grow the decision tree,
 Handling continuous attributes,
 Choosing an appropriate attribute selection measure,
 Handling training data with missing attribute values,
 Handling attributes with differing costs, and
 Improving computational efficiency.

To build a decision tree, the CART (Classification and Regression Tree) algorithm is commonly used.
It works by selecting the best split at each node based on metrics like Gini impurity or
information gain. Here are the basic steps of the CART algorithm (a code sketch of the
split search follows the steps):
1. Start with the complete training dataset at the root node of the tree.
2. Determine the impurity of the data with respect to each feature present in the dataset. Impurity
can be measured using metrics like the Gini index or entropy for classification, and mean
squared error, mean absolute error, friedman_mse, or half Poisson deviance for
regression.
3. Select the feature that results in the highest information gain or impurity reduction
when splitting the data.
4. For each possible value of the selected feature, split the dataset into two subsets (left and
right), one where the feature takes on that value, and another where it does not. The split
should be designed to create subsets that are as pure as possible with respect to the target
variable.
5. Based on the target variable, determine the impurity of each resulting subset.
6. For each subset, repeat steps 2–5 recursively until a stopping condition is met. For example,
the stopping condition could be a maximum tree depth, a minimum number of samples
required to make a split, or a minimum impurity threshold.
7. Assign the majority class label for classification tasks, or the mean value for regression
tasks, to each terminal node (leaf node) in the tree.
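
To make steps 2–4 concrete, the sketch below (our own minimal illustration using the Gini index; the function names and toy data are assumptions, not part of these notes) searches every feature and threshold for the binary split that yields the largest impurity reduction:

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((labels.count(c) / total) ** 2 for c in set(labels))

def best_split(rows, labels):
    """Return (feature_index, threshold, gain) giving the largest impurity reduction."""
    parent_impurity = gini(labels)
    best = (None, None, 0.0)
    for f in range(len(rows[0])):
        for threshold in set(row[f] for row in rows):
            left = [lab for row, lab in zip(rows, labels) if row[f] <= threshold]
            right = [lab for row, lab in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue                      # skip degenerate splits
            weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            gain = parent_impurity - weighted
            if gain > best[2]:
                best = (f, threshold, gain)
    return best

# Hypothetical data: one numeric feature, perfectly separable at <= 2.0
rows = [[1.0], [2.0], [3.0], [4.0]]
labels = ["no", "no", "yes", "yes"]
print(best_split(rows, labels))   # (0, 2.0, 0.5): split feature 0 at <= 2.0
```
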
Strengths and Weaknesses of the Decision Tree Approach

The strengths of decision tree methods are:

 Decision trees are able to generate understandable rules.


 Decision trees perform classification without requiring much computation.
 Decision trees are able to handle both continuous and categorical variables.
 Decision trees provide a clear indication of which fields are most important for
prediction or classification.
 Ease of use: Decision trees are simple to use and don’t require a lot of technical
expertise, making them accessible to a wide range of users.
 Scalability: Decision trees can handle large datasets and can be easily parallelized to
improve processing time.
 Missing value tolerance: Decision trees are able to handle missing values in the data,
making them a suitable choice for datasets with missing or incomplete data.
 Handling non-linear relationships: Decision trees can handle non-linear relationships
between variables, making them a suitable choice for complex datasets.
 Ability to handle imbalanced data: Decision trees can handle imbalanced datasets,
where one class is heavily represented compared to the others, by weighting the
importance of individual nodes based on the class distribution.
The weaknesses of decision tree methods are:

 Decision trees are less appropriate for estimation tasks where the goal is to predict the
value of a continuous attribute.
 Decision trees are prone to errors in classification problems with many classes and a
relatively small number of training examples.
 Decision trees can be computationally expensive to train. The process of growing a
decision tree is computationally expensive. At each node, each candidate splitting field
must be sorted before its best split can be found. In some algorithms, combinations of
fields are used, and a search must be made for optimal combining weights. Pruning
algorithms can also be expensive since many candidate sub-trees must be formed and
compared.
 Decision trees are prone to overfitting the training data, particularly when the tree is
very deep or complex. This can result in poor performance on new, unseen data.
 Small variations in the training data can result in different decision trees being
generated, which can be a problem when trying to compare or reproduce results.
 Many decision tree algorithms do not handle missing data well and require imputation
or deletion of records with missing values.
 The initial splitting criteria used in decision tree algorithms can lead to biased trees,
particularly when dealing with unbalanced datasets or rare classes.
 Decision trees are limited in their ability to represent complex relationships between
variables, particularly when dealing with nonlinear or interactive effects.
 Decision trees can be sensitive to the scaling of input features, particularly when using
distance-based metrics or decision rules that rely on comparisons between values.
Example of Decision Tree

Let’s understand decision trees with the help of an example:

Decision trees are drawn upside down, which means the root is at the top and this
root is then split into several nodes. In layman's terms, decision trees are nothing but a
bunch of if-else statements: the tree checks whether a condition is true and, if it is, goes
to the next node attached to that decision (a toy hand-written version is sketched below).
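
As a toy illustration of that "bunch of if-else statements" view, here is a hypothetical hand-written version of the weather tree discussed below (the sunny/humidity rule is an assumption added for completeness):

```python
def play(outlook, humidity, wind):
    # A hand-coded decision tree for the weather example (hypothetical rules).
    if outlook == "cloudy":
        return "yes"                              # pure node: always play
    elif outlook == "sunny":
        return "no" if humidity == "high" else "yes"
    else:                                         # rainy
        return "yes" if wind == "weak" else "no"

print(play("rainy", "normal", "weak"))   # -> "yes"
```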

In the diagram below, the tree will first ask: what is the weather? Is it sunny, cloudy,
or rainy? Depending on the answer, it will go to the next feature, which is humidity or
wind. It will then check whether the wind is strong or weak; if the wind is weak and it
is rainy, then the person may go and play.
Did you notice anything in the above flowchart? We see that if the weather is
cloudy then the decision is always to play. Why didn't it split more? Why did it stop there?

To answer this question, we need to know about a few more concepts, like entropy,
information gain, and the Gini index. But in simple terms, the output in the training
dataset is always "yes" for cloudy weather; since there is no disorder here, we don't
need to split the node further.

The goal is to decrease the uncertainty or disorder in the dataset, and for this we use decision trees.

Now you must be thinking: how do I know what the root node should be? What should
the decision nodes be? When should I stop splitting? To decide this, there is a
metric called “entropy”, which measures the amount of uncertainty in the dataset.
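
For a quick worked check (using a hypothetical label distribution, not data from these notes): a node whose 14 examples split into 9 "yes" and 5 "no" has entropy −(9/14)·log2(9/14) − (5/14)·log2(5/14) ≈ 0.94, while a pure node (all "yes") has entropy 0, which is exactly why the cloudy branch above becomes a leaf. The same numbers can be reproduced with a few lines of Python:

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels: sum of -p * log2(p) over the classes."""
    total = len(labels)
    return sum(-(c / total) * math.log2(c / total) for c in Counter(labels).values())

# Hypothetical node with 9 "yes" and 5 "no" examples:
mixed = ["yes"] * 9 + ["no"] * 5
print(round(entropy(mixed), 3))   # 0.94 -> still impure, worth splitting further

# Pure node (e.g. all cloudy-weather examples are "yes"):
print(entropy(["yes"] * 4))       # 0.0  -> no disorder, so it becomes a leaf
```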
