Decision Tree Algorithm

DATA SCIENCE

Decision Trees

Agenda

1. Introduction to Decision Trees

2. Anatomy of Decision Trees

3. Working Principle

4. Example of a Decision Tree

5. Advantages, Disadvantages & Practices

6. Hyperparameters & Tuning

7. Overfitting & its solutions

8. Extensions of Decision Trees

9. Conclusion
Introduction to Decision Trees

What is a Decision Tree?
 A supervised machine learning algorithm used for both classification and regression tasks.
 It mimics human decision-making by splitting data into branches at decision nodes based on feature values.

Example of a Decision Tree

Why use Decision Trees?
 Easy to understand and interpret.
 Can handle both numerical & categorical data.
 Requires little data preprocessing.

Characteristics of DT
Non-linear
Non-parametric (no assumptions about the data / no learned parameters)
Low bias and high variance model
Prone to overfitting
Acts as a basic feature selection algorithm
Anatomy of a Decision Tree

Root Node
 The starting point that represents the entire dataset.
 Splits into two or more branches.

Decision Nodes
 Intermediate nodes where data is split based on feature values.
 Represents the feature or question that splits the data.
Leaf Nodes
 End nodes that represent the final decision or output.
 No further splitting occurs.
Working Principle

Splitting Criteria:
 Decision Trees split data based on criteria like Gini Index, Information Gain, or Mean Squared Error (for regression).

Gini Impurity:
 Measures the likelihood of an incorrect classification of a randomly chosen instance.
 Lower Gini Impurity values are preferred for splits.

Information Gain:
 Measures the reduction in entropy after a dataset is split on an attribute.
 Higher Information Gain values are preferred.
Mean Squared Error (MSE):
 Used in regression trees to minimize the error between predicted and actual
values.
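
As a rough illustration of these splitting criteria, the sketch below computes Gini impurity, entropy-based information gain, and MSE with plain NumPy. It is a minimal teaching sketch, not the exact implementation used by any library's splitter.

import numpy as np

def gini_impurity(labels):
    # Gini impurity = 1 - sum(p_k^2), where p_k is the proportion of class k in the node.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    # Entropy = -sum(p_k * log2(p_k)); a pure node has entropy 0.
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, left, right):
    # Reduction in entropy after splitting the parent node into two children,
    # each child weighted by its share of the parent's samples.
    n = len(parent)
    children = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(parent) - children

def mse(values):
    # Regression criterion: mean squared error around the node mean.
    values = np.asarray(values, dtype=float)
    return np.mean((values - values.mean()) ** 2)

# A pure node has Gini 0.0; a perfectly mixed binary node has Gini 0.5.
print(gini_impurity([0, 0, 1, 1]))                     # 0.5
print(information_gain([0, 0, 1, 1], [0, 0], [1, 1]))  # 1.0 (perfect split)
print(mse([1.0, 2.0, 3.0]))                            # ~0.667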
Recursion:
 The tree grows recursively by splitting data at each node until a stopping criterion is
met (e.g., maximum depth, minimum samples per leaf).
Pruning:
 Post-processing step to remove unnecessary branches from the tree to prevent
overfitting.
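
To make the recursive growth and stopping criteria concrete, here is a minimal scikit-learn sketch; the Iris dataset and the specific limits are illustrative assumptions, not part of the original slides.

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)

# Stopping criteria such as max_depth and min_samples_leaf halt the recursive splitting.
tree = DecisionTreeClassifier(
    criterion="gini",      # splitting criterion
    max_depth=3,           # do not grow past depth 3
    min_samples_leaf=5,    # every leaf must hold at least 5 samples
    random_state=0,
)
tree.fit(X, y)

# Print the learned rules: root node, decision nodes, and leaf nodes.
print(export_text(tree, feature_names=load_iris().feature_names))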
Advantages, Disadvantages & Practices

Advantages
 Easy to interpret and visualize.
 Can handle both categorical and numerical data.
 Requires little data preprocessing (e.g., no need for feature scaling).

Disadvantages
 Prone to overfitting, especially with deep trees.
 Can be biased towards dominant classes in an imbalanced dataset.
 Small changes in data can lead to large changes in the tree structure.

Practical Use Cases:
 Credit Scoring
 Medical Diagnosis
 Customer Churn Prediction
 Recommendation Systems

Libraries & Tools:
 Scikit-Learn
 XGBoost (for boosted decision trees)
 Spark MLlib
Hyperparameters & Tuning

Hyperparameters
 Hyperparameters are settings / configurations that control the behavior of a machine learning model and the learning process.
 They influence how the model learns and generalizes to new data.

1. Criterion: The function to measure the quality of a split (e.g., Gini, Entropy, MSE, MAE).

2. max_depth: Limits the depth of the tree.

3. min_samples_split: Minimum number of samples required to split a node.

4. min_samples_leaf: Minimum number of samples required to be at a leaf node.

5. max_features: The number of features to consider when looking for the best split.

6. max_leaf_nodes: The maximum number of leaf nodes in the tree.

7. Splitter: The strategy used to choose the split at each node.

Tuning Techniques
 Grid Search

 Random Search

 Cross-validation
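
As an illustration of the hyperparameters and tuning techniques above, the following is a hedged scikit-learn sketch combining Grid Search with cross-validation; the dataset and parameter grid are illustrative assumptions, not recommended defaults.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Illustrative grid over a few of the hyperparameters listed above.
param_grid = {
    "criterion": ["gini", "entropy"],
    "max_depth": [3, 5, 10, None],
    "min_samples_split": [2, 10, 50],
    "min_samples_leaf": [1, 5, 20],
}

# Grid Search with 5-fold cross-validation; Random Search would swap in RandomizedSearchCV.
search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid,
    cv=5,
    scoring="accuracy",
    n_jobs=-1,
)
search.fit(X, y)

print(search.best_params_)
print(search.best_score_)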
Overfitting and its Solutions

Overfitting:
 Decision Trees are prone to overfitting, especially when the tree is too deep. They may capture noise in the training data, leading to poor generalization to new data.
 Techniques like pruning (reducing the size of the tree) or setting a maximum depth limit are used to mitigate overfitting.

Pruning Methods:

 Pre-pruning: Limit tree growth by setting conditions like maximum depth or minimum samples per leaf.

 Post-pruning: Reduce the size of the tree after it has been fully grown by removing
branches that have little importance.

Cross-Validation:

 Helps in selecting the right model complexity by splitting the dataset into training and
validation sets.
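
A minimal sketch of post-pruning with scikit-learn's cost-complexity pruning (the ccp_alpha parameter); pre-pruning corresponds to the depth and leaf-size limits shown earlier. The dataset and the simple held-out split are illustrative assumptions.

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Compute the cost-complexity pruning path: candidate ccp_alpha values for the full tree.
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

# Refit one pruned tree per alpha and keep the one that generalizes best on held-out data.
best_alpha, best_score = 0.0, 0.0
for alpha in path.ccp_alphas:
    pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
    score = pruned.score(X_test, y_test)
    if score > best_score:
        best_alpha, best_score = alpha, score

print(f"best ccp_alpha = {best_alpha:.5f}, held-out accuracy = {best_score:.3f}")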
Extensions of Decision Trees

Random Forest (Bagging):

 An ensemble method that builds multiple decision trees and merges them together for
a more accurate and stable prediction.

 Builds multiple independent trees and averages their predictions, focusing on reducing
variance.

Gradient Boosting (Boosting):

 Sequentially builds trees, where each new tree corrects the errors of the previous ones.

 Focuses on reducing bias by correcting the errors of previous trees.

XGBoost (Boosting):

 An efficient implementation of Gradient Boosting for large-scale datasets.

 An optimized and regularized version of Gradient Boosting that includes enhancements like regularization, parallel processing, and more efficient tree pruning.
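
The sketch below loosely compares a single tree with the ensemble extensions named above using scikit-learn and 5-fold cross-validation. XGBoost is omitted because it lives in the separate xgboost package; the dataset choice is an illustrative assumption, and the scores are not a benchmark.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

models = {
    "single decision tree": DecisionTreeClassifier(random_state=0),
    "random forest (bagging)": RandomForestClassifier(n_estimators=200, random_state=0),
    "gradient boosting": GradientBoostingClassifier(random_state=0),
}

for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)  # 5-fold cross-validated accuracy
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")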
Random Forest

Characteristics:

 A bagging-type ensemble algorithm

 Constructs a forest of multiple decision trees

 Randomness (random subset of features, bootstrap sampling)

 Stability – low variance, handles collinearity

 Scalability – parallel processing on high-dimensional data

 Robustness (resistance to overfitting)

 Interpretability – feature importance (see the sketch below)
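
As a small illustration of the interpretability point, a fitted RandomForestClassifier exposes impurity-based feature importances; the dataset below is an illustrative assumption.

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

data = load_breast_cancer()
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(data.data, data.target)

# Rank features by mean impurity decrease across the forest's trees.
ranked = sorted(
    zip(data.feature_names, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranked[:5]:
    print(f"{name}: {importance:.3f}")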


Conclusion

Summary of Key Points:

 Decision Trees are a versatile and powerful tool for both classification and regression.

 Understanding the tree's structure and tuning it properly is key to preventing overfitting and improving model performance.

Next Steps:

 Explore Decision Trees in-depth with hands-on exercises.

 Learn about ensemble methods like Random Forest and Gradient Boosting.
Backup Slide

Books:

a) Statistics & EDA Book: Think Stats
https://www.amazon.in/Think-Stats-Allen-B-Downey/dp/9351108562/ref=sr_1_1?keywords=think+stats&qid=1647758541&s=books&sprefix=Think+stat%2Cstripbooks%2C196&sr=1-1

b) Feature Engineering: Feature Engineering
https://www.amazon.in/Feature-Engineering-Machine-Learning-Principles/dp/9352137116/ref=sr_1_4?crid=OENMJAZHTYG0&keywords=feature+engineering&qid=1647758574&s=books&sprefix=feature+engineering+%2Cstripbooks%2C189&sr=1-4

c) Machine Learning Best Textbooks:
- ISLR: https://www.amazon.in/Introduction-Statistical-Learning-Applications-Statistics/dp/1461471370/ref=sr_1_3?keywords=element+of+statistical+learning&qid=1647758362&s=books&sprefix=element+of+stat%2Cstripbooks%2C207&sr=1-3
- ML with Scikit-learn: https://www.amazon.in/Machine-Learning-Python-Cookbook-Preprocessing/dp/9352137302/ref=sr_1_7?keywords=oreilly+machine+learning&qid=1647758445&s=books&sprefix=O+reilly+Machin%2Cstripbooks%2C186&sr=1-7
- https://www.amazon.in/Introducing-Data-Science-Machine-Learning/dp/9351199371/ref=sr_1_13?crid=25KZ60O8L2WAI&keywords=machine+learning+books&qid=1647758496&s=books&sprefix=machine+learning+books%2Cstripbooks%2C210&sr=1-13

Course:
https://www.coursera.org/specializations/machine-learning#courses
Backup Slide: Tree Splitting Architecture

Backup Slide: Tree Boundary Building
