Overfitting &
Underfitting
ML HUB
By : Soumit Kar
Overfitting is a modeling error which occurs when a function is fit too closely to a limited set of data points. It is the
result of a model that is overly complex relative to the amount of training data available. An overfitted model is inaccurate
on new data because it has effectively memorized the existing data points, noise included.
Overfitting = Low bias + High variance
Underfitting is a modeling error which occurs when a function does not fit the data points well enough. It is the
result of a model that is too simple to capture the underlying pattern, or that has not been trained enough. An underfitted
model is inaccurate because its trend does not reflect the reality of the data.
Underfitting = High bias + Low variance
Bias: The difference between the expected (average) prediction of the
model and the actual value.
Variance: How much the prediction for a given point varies between
different realisations of the model (for example, the same model trained on different samples of the data).
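The two definitions above can be estimated by simulation. The following is a minimal sketch, not part of the original slides: it assumes a hypothetical ground-truth function `true_f`, Gaussian noise, and polynomial models fitted with NumPy. Each repeat draws a fresh training set, which plays the role of one "realisation" of the model.

```python
import numpy as np

rng = np.random.default_rng(0)

def true_f(x):
    # Hypothetical ground-truth function, chosen only for this illustration.
    return np.sin(2 * np.pi * x)

def bias_variance_at(x0, degree, n_train=15, n_repeats=300, noise=0.3):
    """Estimate squared bias and variance of a polynomial fit at the point x0."""
    preds = np.empty(n_repeats)
    for i in range(n_repeats):
        # One realisation of the model: fit on a freshly sampled training set.
        x = rng.uniform(0.0, 1.0, n_train)
        y = true_f(x) + rng.normal(0.0, noise, n_train)
        coeffs = np.polyfit(x, y, degree)
        preds[i] = np.polyval(coeffs, x0)
    bias_sq = (preds.mean() - true_f(x0)) ** 2  # (average prediction - actual value)^2
    variance = preds.var()                      # spread of predictions across realisations
    return bias_sq, variance

for degree in (1, 3, 9):
    b, v = bias_variance_at(x0=0.25, degree=degree)
    print(f"degree={degree}: bias^2={b:.4f}, variance={v:.4f}")
```

Under these assumptions, the straight line (degree 1) should show the largest bias, while the degree-9 polynomial should show a much larger variance.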
Regression: If the final “best fit” line passes through every single data point by
forming an unnecessarily complex curve, then the model is likely overfitting.
[Figure: regression fits as the degree of the polynomial increases. Appropriate fit: low bias, low variance. Overfitting: low bias, high variance.]
                 Overfitting    Appropriate fitting
Train accuracy   Increases      Appropriate
Test accuracy    Decreases      Appropriate
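The effect of increasing the polynomial degree can be reproduced with a small experiment. This is a sketch under stated assumptions: the data come from a hypothetical noisy sine curve, and mean squared error (lower is better) stands in for the accuracy shown in the slides.

```python
import numpy as np

rng = np.random.default_rng(42)

def make_data(n, noise=0.2):
    # Hypothetical data: a sine curve plus Gaussian noise.
    x = rng.uniform(0.0, 1.0, n)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, noise, n)
    return x, y

x_train, y_train = make_data(15)
x_test, y_test = make_data(200)

def mse(coeffs, x, y):
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

results = {}
for degree in (1, 3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    results[degree] = (mse(coeffs, x_train, y_train), mse(coeffs, x_test, y_test))
    print(f"degree={degree}: train MSE={results[degree][0]:.4f}, "
          f"test MSE={results[degree][1]:.4f}")
```

Training error can only go down as the degree grows, because each higher-degree model contains the lower-degree ones. The test error is the number to watch: a gap that widens between the two is the usual symptom of overfitting.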
Classification: If every single training point is correctly classified by forming a very complex
decision boundary, then there is a good chance that the model is overfitting.
[Figure: the green line shows overfitting; the black line shows an appropriate fit.]
Overfitting: low bias, high variance
Appropriate fitting: low bias, low variance
Underfitting
Regression: As shown in the figure, the data points are laid out in a given pattern, but the model is unable to fit
the data properly because its complexity is too low.
Classification: As shown in the figure, the model is trained to separate the circles from the crosses. However, it
is unable to do so because a straight-line decision boundary cannot properly classify either of the two classes.
Fixes for Overfitting
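Cross Validation: A technique for assessing how well the results of a statistical analysis generalize to an
independent data set. The data is split into several folds; the model is trained on all but one fold and evaluated on the held-out fold, in rotation.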
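A minimal sketch of k-fold cross-validation, written with NumPy only. The polynomial model, the synthetic data, and the helper name `k_fold_mse` are assumptions made for illustration; they do not come from the slides.

```python
import numpy as np

def k_fold_mse(x, y, degree, k=5, seed=0):
    """Mean validation MSE of a polynomial of the given degree over k folds."""
    rng = np.random.default_rng(seed)
    indices = rng.permutation(len(x))      # shuffle once, then split into folds
    folds = np.array_split(indices, k)
    scores = []
    for i in range(k):
        val_idx = folds[i]                 # held-out fold
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        coeffs = np.polyfit(x[train_idx], y[train_idx], degree)
        residuals = np.polyval(coeffs, x[val_idx]) - y[val_idx]
        scores.append(np.mean(residuals ** 2))
    return float(np.mean(scores))

# Hypothetical data: a noisy sine curve.
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 1.0, 40)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, 40)

cv_scores = {d: k_fold_mse(x, y, d) for d in (1, 3, 9)}
best_degree = min(cv_scores, key=cv_scores.get)
print(cv_scores, "-> lowest cross-validated error at degree", best_degree)
```

Because every point is used for validation exactly once, the averaged score is a less optimistic estimate of generalization than the training error.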
Early Stopping: Early-stopping rules provide guidance on how many iterations can be run before the learner begins to
overfit, typically by monitoring the error on a held-out validation set.
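One common early-stopping rule is "patience": keep the best weights seen so far and stop once the validation loss has not improved for a fixed number of iterations. The sketch below applies it to plain gradient descent on a linear model with polynomial features; the function name, the learning rate, and the data are all assumptions chosen for illustration.

```python
import numpy as np

def train_with_early_stopping(X_train, y_train, X_val, y_val,
                              lr=0.05, max_iters=5000, patience=20):
    """Gradient descent on squared error, keeping the weights with the best validation loss."""
    w = np.zeros(X_train.shape[1])
    best_w, best_val, best_iter = w.copy(), np.inf, 0
    for it in range(1, max_iters + 1):
        grad = 2.0 * X_train.T @ (X_train @ w - y_train) / len(y_train)
        w -= lr * grad
        val_loss = float(np.mean((X_val @ w - y_val) ** 2))
        if val_loss < best_val:
            best_w, best_val, best_iter = w.copy(), val_loss, it
        elif it - best_iter >= patience:
            break  # no improvement for `patience` iterations: stop training
    return best_w, best_val, best_iter

# Hypothetical data: noisy sine curve, degree-9 polynomial features (x^0 ... x^9).
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 60)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, 60)
X = np.vander(x, N=10, increasing=True)
X_train, y_train = X[:40], y[:40]
X_val, y_val = X[40:], y[40:]

best_w, best_val, best_iter = train_with_early_stopping(X_train, y_train, X_val, y_val)
print(f"best validation MSE {best_val:.4f} at iteration {best_iter}")
```

If the validation loss keeps improving, the loop simply runs to `max_iters`; the rule only cuts training short when further iterations stop helping on held-out data.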
Pruning: Pruning is used extensively when building decision-tree models. It simply removes the nodes which add
little predictive power for the problem at hand.
Regularization: It adds a penalty term to the objective function that grows with the size of the model's coefficients. Minimizing
the combined objective pushes the coefficients of many variables towards zero, which reduces the penalty and the effective complexity of the model.
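As a concrete illustration, the sketch below uses an L2 (ridge) penalty, chosen here only because it has a simple closed-form solution. Note the assumption: an L2 penalty shrinks coefficients towards zero but rarely makes them exactly zero; an L1 (lasso) penalty would be needed for that. The data and the `ridge_fit` helper are hypothetical, and for simplicity the intercept is penalized too.

```python
import numpy as np

def ridge_fit(X, y, alpha):
    """Closed-form L2-regularized least squares: solve (X^T X + alpha I) w = X^T y."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ y)

# Hypothetical data: noisy sine curve with degree-9 polynomial features.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 20)
y = np.sin(np.pi * x) + rng.normal(0.0, 0.2, 20)
X = np.vander(x, N=10, increasing=True)  # columns x^0 ... x^9

for alpha in (1e-6, 1e-2, 1.0):
    w = ridge_fit(X, y, alpha)
    print(f"alpha={alpha:g}: coefficient norm = {np.linalg.norm(w):.3f}")
```

The coefficient norm falls as `alpha` grows, which is the penalty trading a slightly worse training fit for a smoother, less variable model.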
Remove features: Some algorithms have built-in feature selection. For those that don’t, you can manually
improve their generalizability by removing irrelevant input features. An interesting way to do so is to tell a
story about how each feature fits into the model. This is like the data scientist’s spin on the software engineer’s
rubber duck debugging technique, in which engineers debug their code by explaining it, line by line, to a rubber
duck.
Train with more data: It won’t work every time, but training with more data can help algorithms detect
the signal better. For example, when modelling height vs. age in children, it’s clear how sampling
more schools will help your model. Of course, that’s not always the case. If we just add more noisy data,
this technique won’t help. That’s why you should always ensure your data is clean and relevant.
Ensembling: Ensembles are machine learning methods for combining predictions from multiple separate
models.
There are a few different methods for ensembling, but the two most common are:
Bagging attempts to reduce the chance of overfitting complex models.
1. It trains a large number of “strong” learners in parallel.
2. A strong learner is a model that’s relatively unconstrained.
3. Bagging then combines all the strong learners together in order to “smooth out” their predictions.
e.g. Random Forest
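The bagging steps above can be sketched with NumPy alone. This is not a Random Forest: as an assumption for illustration, the "strong" learners are relatively unconstrained polynomials, each fitted to a bootstrap sample of hypothetical noisy sine data, and their predictions are averaged.

```python
import numpy as np

def bagged_predict(x_train, y_train, x_new, degree=6, n_models=50, seed=0):
    """Average the predictions of polynomials fitted on bootstrap samples."""
    rng = np.random.default_rng(seed)
    n = len(x_train)
    preds = np.empty((n_models, len(x_new)))
    for m in range(n_models):
        idx = rng.integers(0, n, n)  # bootstrap: draw n indices with replacement
        coeffs = np.polyfit(x_train[idx], y_train[idx], degree)
        preds[m] = np.polyval(coeffs, x_new)
    return preds.mean(axis=0), preds

# Hypothetical data: noisy sine curve.
rng = np.random.default_rng(3)
x = rng.uniform(0.0, 1.0, 40)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, 40)
x_eval = np.linspace(0.05, 0.95, 40)
target = np.sin(2 * np.pi * x_eval)

avg, preds = bagged_predict(x, y, x_eval)
mse_bagged = float(np.mean((avg - target) ** 2))
mse_individual = float(np.mean((preds - target) ** 2))  # average over the single models
print(f"bagged MSE={mse_bagged:.4f}, mean individual MSE={mse_individual:.4f}")
```

For squared error, the averaged prediction can never do worse than the average of the individual models' errors, which is the "smoothing out" effect in numerical form.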
Boosting attempts to improve the predictive flexibility of simple models.
1. It trains a large number of “weak” learners in sequence.
2. A weak learner is a constrained model (e.g. you could limit the max depth of each decision tree).
3. Each one in the sequence focuses on learning from the mistakes of the one before it.
4. Boosting then combines all the weak learners into a single strong learner.
e.g. XGBoost, Gradient Boosting, AdaBoost
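The boosting steps above can be sketched as a small gradient-boosting loop for squared error. The assumptions: the weak learner is a depth-1 regression tree (a "stump") on a single input, each new stump is fitted to the residuals of the current ensemble, and the data are hypothetical. AdaBoost and XGBoost differ in their details; this is only the basic idea.

```python
import numpy as np

def fit_stump(x, residual):
    """Best single-split regression stump (depth-1 tree) for 1-D input."""
    xs = np.unique(x)                       # sorted unique values
    thresholds = (xs[:-1] + xs[1:]) / 2.0   # midpoints keep both sides non-empty
    best = None
    for t in thresholds:
        left, right = residual[x <= t], residual[x > t]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    return best[1], best[2], best[3]

def stump_predict(stump, x):
    t, left_value, right_value = stump
    return np.where(x <= t, left_value, right_value)

def boost(x, y, n_rounds=50, learning_rate=0.1):
    base = float(y.mean())                   # start from a constant prediction
    pred = np.full(len(y), base)
    stumps, train_mse = [], []
    for _ in range(n_rounds):
        stump = fit_stump(x, y - pred)       # learn from the current mistakes
        pred = pred + learning_rate * stump_predict(stump, x)
        stumps.append(stump)
        train_mse.append(float(np.mean((y - pred) ** 2)))
    return base, stumps, train_mse

# Hypothetical data: noisy sine curve.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, 80)
y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, 80)

base, stumps, train_mse = boost(x, y)
print(f"train MSE after 1 round: {train_mse[0]:.4f}, after 50 rounds: {train_mse[-1]:.4f}")
```

Each stump on its own is a poor model, but the training error of the combined ensemble falls round after round. Left unchecked, boosting can itself overfit, so the number of rounds is usually chosen on validation data.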
Handling Underfitting
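1. Get more training data, if the current data is too sparse to reveal the pattern (more data alone will not fix a model that is too simple).
2. Increase the size or number of parameters in the model.
3. Increase the complexity of the model.
4. Increase the training time until the cost function is minimised.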
Bias Variance Tradeoff
If the algorithm is too simple (e.g. a hypothesis with a linear equation), it may have high bias and low variance
and is therefore error-prone. If the algorithm is too complex (e.g. a hypothesis with a high-degree equation), it
may have high variance and low bias, and it will not perform well on new data.
The middle ground between these two conditions is known as the bias-variance trade-off.
To build a good predictive model, you'll need to find a balance between bias and variance that
minimizes the total error.
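The "total error" can be made precise for squared-error loss. The notation below is an assumption introduced for illustration, not taken from the slides: the data follow $y = f(x) + \varepsilon$ with $\mathbb{E}[\varepsilon] = 0$ and $\operatorname{Var}(\varepsilon) = \sigma^2$, and $\hat{f}$ is the model fitted on a randomly drawn training set.

```latex
\mathbb{E}\big[(y - \hat{f}(x))^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{f}(x) - \mathbb{E}[\hat{f}(x)])^2\big]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible noise}}
```

Making the model more complex typically lowers the first term and raises the second, so the minimum of their sum lies somewhere between the two extremes; the noise term cannot be reduced by any choice of model.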
