Decision Tree Algorithm

DATA SCIENCE

Decision Trees

Agenda

1. Introduction to Decision Trees

2. Anatomy of Decision Trees

3. Working Principle

4. Example of a Decision Tree

5. Advantages, Disadvantages & Practices

6. Hyperparameters & Tuning

7. Overfitting & its solutions

8. Extensions of Decision Trees

9. Conclusion
Introduction to Decision Trees

What is a Decision Tree?
 A supervised machine learning algorithm used for both classification and regression tasks.
 It mimics human decision-making by splitting data into branches at decision nodes based on feature values.

Example of a Decision Tree

Why use Decision Trees?
 Easy to understand and interpret.
 Can handle both numerical & categorical data.
 Requires little data preprocessing.

Characteristics of DT
Non-linear
Non-parametric (no assumptions about the data / no learned parameters)
Low bias and high variance model
Prone to overfitting
Acts as a basic feature selection algorithm
Anatomy of a Decision Tree

Root Node
 The starting point that represents the entire dataset.
 Splits into two or more branches.

Decision Nodes
 Intermediate nodes where data is split based on feature values.
 Represents the feature or question that splits the data.
Leaf Nodes
 End nodes that represent the final decision or output.
 No further splitting occurs.
Working Principle

Splitting Criteria:
 Decision Trees split data based on criteria like Gini Index, Information Gain, or Mean Squared Error (for regression).

Gini Impurity:
 Measures the likelihood of an incorrect classification of a randomly chosen instance.
 Lower Gini Impurity values are preferred for splits.

Information Gain:
 Measures the reduction in entropy after a dataset is split on an attribute.
 Higher Information Gain values are preferred.
Mean Squared Error (MSE):
 Used in regression trees to minimize the error between predicted and actual
values.
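
As a rough illustration of these splitting criteria, the sketch below computes Gini impurity, entropy-based information gain, and MSE with plain NumPy. It is a minimal teaching sketch, not the exact implementation used by any library's splitter.

import numpy as np

def gini_impurity(labels):
    # Gini impurity = 1 - sum(p_k^2), where p_k is the proportion of class k in the node.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    # Entropy = -sum(p_k * log2(p_k)); a pure node has entropy 0.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, left, right):
    # Reduction in entropy after splitting the parent node into two children,
    # each child weighted by its share of the parent's samples.
    n = len(parent)
    children = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(parent) - children

def mse(values):
    # Regression criterion: mean squared error around the node mean.
    values = np.asarray(values, dtype=float)
    return np.mean((values - values.mean()) ** 2)

# A pure node has Gini 0.0; a perfectly mixed binary node has Gini 0.5.
print(gini_impurity([0, 0, 1, 1]))                     # 0.5
print(information_gain([0, 0, 1, 1], [0, 0], [1, 1]))  # 1.0 (perfect split)
print(mse([1.0, 2.0, 3.0]))                            # ~0.667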
Recursion:
 The tree grows recursively by splitting data at each node until a stopping criterion is
met (e.g., maximum depth, minimum samples per leaf).
Pruning:
 Post-processing step to remove unnecessary branches from the tree to prevent
overfitting.
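
To make the recursive growth and stopping criteria concrete, here is a minimal scikit-learn sketch; the Iris dataset and the specific limits are illustrative assumptions, not part of the original slides.

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)

# Stopping criteria such as max_depth and min_samples_leaf halt the recursive splitting.
tree = DecisionTreeClassifier(
    criterion="gini",      # splitting criterion
    max_depth=3,           # do not grow past depth 3
    min_samples_leaf=5,    # every leaf must hold at least 5 samples
    random_state=0,
)
tree.fit(X, y)

# Print the learned rules: root node, decision nodes, and leaf nodes.
print(export_text(tree, feature_names=load_iris().feature_names))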
Advantages, Disadvantages & Practices

Advantages
 Easy to interpret and visualize.
 Can handle both categorical and numerical data.
 Requires little data preprocessing (e.g., no need for feature scaling).

Disadvantages
 Prone to overfitting, especially with deep trees.
 Can be biased towards dominant classes in an imbalanced dataset.
 Small changes in data can lead to large changes in the tree structure.

Practical Use Cases:
 Credit Scoring
 Medical Diagnosis
 Customer Churn Prediction
 Recommendation Systems

Libraries & Tools:
 Scikit-Learn
 XGBoost (for boosted decision trees)
 Spark MLlib
Hyperparameters & Tuning

Hyperparameters
 Hyperparameters are settings / configurations that control the behavior of a machine learning model and the learning process.
 They influence how the model learns and generalizes to new data.

1. Criterion: The function to measure the quality of a split (e.g., Gini, Entropy, MSE, MAE).

2. max_depth: Limits the depth of the tree.

3. min_samples_split: Minimum number of samples required to split a node.

4. min_samples_leaf: Minimum number of samples required to be at a leaf node.

5. max_features: The number of features to consider when looking for the best split.

6. max_leaf_nodes: The maximum number of leaf nodes in the tree.

7. Splitter: The strategy used to choose the split at each node.

Tuning Techniques
 Grid Search

 Random Search

 Cross-validation
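
As an illustration of the hyperparameters and tuning techniques above, the following is a hedged scikit-learn sketch combining Grid Search with cross-validation; the dataset and parameter grid are illustrative assumptions, not recommended defaults.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Illustrative grid over a few of the hyperparameters listed above.
param_grid = {
    "criterion": ["gini", "entropy"],
    "max_depth": [3, 5, 10, None],
    "min_samples_split": [2, 10, 50],
    "min_samples_leaf": [1, 5, 20],
}

# Grid Search with 5-fold cross-validation; Random Search would swap in RandomizedSearchCV.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid,
    cv=5,
    scoring="accuracy",
    n_jobs=-1,
)
search.fit(X, y)

print(search.best_params_)
print(search.best_score_)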
Overfitting and its Solutions

Overfitting:
 Decision Trees are prone to overfitting, especially when the tree is too deep. They may capture noise in the training data, leading to poor generalization to new data.
 Techniques like pruning (reducing the size of the tree) or setting a maximum depth limit are used to mitigate overfitting.

Pruning Methods:

 Pre-pruning: Limit tree growth by setting conditions like maximum depth or minimum samples per leaf.

 Post-pruning: Reduce the size of the tree after it has been fully grown by removing
branches that have little importance.

Cross-Validation:

 Helps in selecting the right model complexity by splitting the dataset into training and
validation sets.
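
A minimal sketch of post-pruning with scikit-learn's cost-complexity pruning (the ccp_alpha parameter); pre-pruning corresponds to the depth and leaf-size limits shown earlier. The dataset and the simple held-out split are illustrative assumptions.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Compute the cost-complexity pruning path: candidate ccp_alpha values for the full tree.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

# Refit one pruned tree per alpha and keep the one that generalizes best on held-out data.
best_alpha, best_score = 0.0, 0.0
for alpha in path.ccp_alphas:
    pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
    score = pruned.score(X_test, y_test)
    if score > best_score:
        best_alpha, best_score = alpha, score

print(f"best ccp_alpha = {best_alpha:.5f}, held-out accuracy = {best_score:.3f}")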
Extensions of Decision Trees

Random Forest (Bagging):

 An ensemble method that builds multiple decision trees and merges them together for
a more accurate and stable prediction.

 Builds multiple independent trees and averages their predictions, focusing on reducing
variance.

Gradient Boosting (Boosting):

 Sequentially builds trees, where each new tree corrects the errors of the previous ones.

 Focuses on reducing bias by correcting the errors of previous trees.

XGBoost (Boosting):

 An efficient implementation of Gradient Boosting for large-scale datasets.

 An optimized and regularized version of Gradient Boosting that includes enhancements like regularization, parallel processing, and more efficient tree pruning.
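
The sketch below loosely compares a single tree with the ensemble extensions named above using scikit-learn and 5-fold cross-validation. XGBoost is omitted because it lives in the separate xgboost package; the dataset choice is an illustrative assumption, and the scores are not a benchmark.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

models = {
    "single decision tree": DecisionTreeClassifier(random_state=0),
    "random forest (bagging)": RandomForestClassifier(n_estimators=200, random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)  # 5-fold cross-validated accuracy
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")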
Random Forest

Characteristics:

 A bagging-type ensemble algorithm

 Constructs a forest of multiple decision trees

 Randomness (random subset of features, bootstrap sampling)

 Stability – low variance, handles collinearity

 Scalability – parallel processing on high-dimensional data

 Robustness (resistance to overfitting)

 Interpretability – feature importance (see the sketch below)
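
As a small illustration of the interpretability point, a fitted RandomForestClassifier exposes impurity-based feature importances; the dataset below is an illustrative assumption.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(data.data, data.target)

# Rank features by mean impurity decrease across the forest's trees.
ranked = sorted(
    zip(data.feature_names, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranked[:5]:
    print(f"{name}: {importance:.3f}")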


Conclusion

Summary of Key Points:

 Decision Trees are a versatile and powerful tool for both classification and regression.

 Understanding the tree's structure and tuning it properly is key to preventing overfitting and improving model performance.

Next Steps:

 Explore Decision Trees in-depth with hands-on exercises.

 Learn about ensemble methods like Random Forest and Gradient Boosting.
Backup Slide

Books:

a) Statistics & EDA Book: Think Stats
https://www.amazon.in/Think-Stats-Allen-B-Downey/dp/9351108562/ref=sr_1_1?keywords=think+stats&qid=1647758541&s=books&sprefix=Think+stat%2Cstripbooks%2C196&sr=1-1

b) Feature Engineering: Feature Engineering
https://www.amazon.in/Feature-Engineering-Machine-Learning-Principles/dp/9352137116/ref=sr_1_4?crid=OENMJAZHTYG0&keywords=feature+engineering&qid=1647758574&s=books&sprefix=feature+engineering+%2Cstripbooks%2C189&sr=1-4

c) Machine Learning Best Textbooks:
- ISLR: https://www.amazon.in/Introduction-Statistical-Learning-Applications-Statistics/dp/1461471370/ref=sr_1_3?keywords=element+of+statistical+learning&qid=1647758362&s=books&sprefix=element+of+stat%2Cstripbooks%2C207&sr=1-3
- ML with Scikit-learn: https://www.amazon.in/Machine-Learning-Python-Cookbook-Preprocessing/dp/9352137302/ref=sr_1_7?keywords=oreilly+machine+learning&qid=1647758445&s=books&sprefix=O+reilly+Machin%2Cstripbooks%2C186&sr=1-7
- https://www.amazon.in/Introducing-Data-Science-Machine-Learning/dp/9351199371/ref=sr_1_13?crid=25KZ60O8L2WAI&keywords=machine+learning+books&qid=1647758496&s=books&sprefix=machine+learning+books%2Cstripbooks%2C210&sr=1-13

Course:
https://www.coursera.org/specializations/machine-learning#courses
Backup Slide: Tree Splitting Architecture

Backup Slide: Tree Boundary Building
