Performance Measurement
Usman Khan
Confusion Matrix (1)
• A confusion matrix is a table that is often used
to describe the performance of a classification
model (or “classifier”) on a set of test data for
which the true values are known
• It allows the visualization of the performance of
an algorithm
Confusion Matrix (2)
• It allows easy identification of confusion
between classes e.g. one class is commonly
mislabeled as the other.
• Most performance measures are computed
from the confusion matrix.
Confusion Matrix (3)
• A confusion matrix is a summary of prediction
results on a classification problem
• The numbers of correct and incorrect
predictions are summarized as counts
and broken down by class. This breakdown is the key
to the confusion matrix
Confusion Matrix (4)
• The confusion matrix shows the ways in which
your classification model is confused when it
makes predictions
• It gives us insight not only into the errors being
made by a classifier but more importantly the
types of errors that are being made
Confusion Matrix (5)
Confusion Matrix (6)
• Here,
Class 1 : Positive
Class 2 : Negative
Definition of the Terms:
• Positive (P) : Observation is positive
(for example: is an apple).
• Negative (N) : Observation is not positive
(for example: is not an apple).
Confusion Matrix (7)
• True Positive (TP) :
Observation is positive, and is predicted to be positive.
• False Negative (FN) :
Observation is positive, but is predicted negative.
• True Negative (TN) :
Observation is negative, and is predicted to be negative.
• False Positive (FP) :
Observation is negative, but is predicted positive.
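The four terms above can be counted directly from paired true and predicted labels. The following is a minimal sketch, assuming labels are encoded as 1 (apple) and 0 (not an apple); the function name `confusion_counts` and the sample labels are hypothetical and not taken from the slides.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FN, TN, FP) for a binary classification problem."""
    tp = fn = tn = fp = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == positive:
            if predicted == positive:
                tp += 1  # positive observation, predicted positive
            else:
                fn += 1  # positive observation, predicted negative
        else:
            if predicted == positive:
                fp += 1  # negative observation, predicted positive
            else:
                tn += 1  # negative observation, predicted negative
    return tp, fn, tn, fp


# Hypothetical labels: 1 = apple, 0 = not an apple.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
print(confusion_counts(y_true, y_pred))  # (3, 1, 3, 1)
```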
Confusion Matrix (8)
• The total number of test samples is 165
Classification Rate/Accuracy
• Classification Rate or Accuracy is given by the relation:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
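As a small sketch of the accuracy computation, the share of correct predictions (TP + TN) among all predictions can be computed as follows. The counts used here are hypothetical and are not the values from the 165-sample matrix on the slide.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that were correct."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("accuracy is undefined for zero samples")
    return (tp + tn) / total


# Hypothetical counts for illustration only.
print(accuracy(tp=40, tn=45, fp=5, fn=10))  # 0.85
```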
Confusion Matrix (9)
Sensitivity and Specificity
• Sensitivity and specificity values can be used
to quantify the performance of a case
definition or the results of a diagnostic test.
• Even with a highly specific diagnostic test, if a
disease is uncommon among those people
tested, a large proportion of positive test
results will be false positive, and the positive
predictive value will be low.
Sensitivity and Specificity
• If the test is applied more selectively such that
the proportion of people tested who truly have
disease is greater, the test's predictive value will
be improved
• Thus, sensitivity and specificity are characteristics
of the test, whereas predictive values depend
both on test sensitivity and specificity and on the
disease prevalence in the population in which
the test is applied
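The dependence of the positive predictive value on prevalence can be illustrated with Bayes' rule. This is a sketch under assumed numbers: a hypothetical test with 99% sensitivity and 99% specificity, applied to populations with different disease prevalence.

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """P(disease | positive test), computed with Bayes' rule."""
    true_pos = sensitivity * prevalence                   # diseased and test positive
    false_pos = (1.0 - specificity) * (1.0 - prevalence)  # healthy but test positive
    return true_pos / (true_pos + false_pos)


# Hypothetical test characteristics; only the prevalence changes.
for prevalence in (0.001, 0.01, 0.1, 0.5):
    ppv = positive_predictive_value(0.99, 0.99, prevalence)
    print(f"prevalence={prevalence:<6} PPV={ppv:.3f}")
```

With these assumed values, the same test yields a positive predictive value of about 0.5 at 1% prevalence and 0.99 at 50% prevalence, even though sensitivity and specificity never change.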
Sensitivity and Specificity
• Sensitivity/Recall
• Sensitivity (Se) is defined as the proportion of
individuals with the disease (actual positives) that have
a positive test result: Se = TP / (TP + FN).
Sensitivity and Specificity
• Specificity
• Specificity is defined as the proportion of
individuals without the disease (actual negatives) that have
a negative test result: Specificity = TN / (TN + FP).
Precision
• To get the value of precision, we divide the total
number of correctly classified positive examples by
the total number of predicted positive examples.
• High precision indicates that an example labeled as
positive is indeed positive (small number of FP).
• Precision: the fraction of true positive examples
among the examples that the model classified as
positive. In other words, the number of true
positives divided by the number of false positives
plus true positives.
• Recall: also known as sensitivity, the fraction of
actual positive examples that the model classified as
positive. In other words, the
number of true positives divided by the number of
true positives plus false negatives.
• TP: The number of true positives classified by the model.
• FN: The number of false negatives classified by the model.
• FP: The number of false positives classified by the model.
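The two definitions translate into a pair of one-line functions. This is a minimal sketch with hypothetical counts; returning 0.0 when the denominator is zero is a convention assumed here, not something the slides specify.

```python
def precision(tp, fp):
    """True positives divided by all predicted positives."""
    predicted_positive = tp + fp
    # Assumed convention: report 0.0 when nothing was predicted positive.
    return tp / predicted_positive if predicted_positive else 0.0


def recall(tp, fn):
    """True positives divided by all actual positives."""
    actual_positive = tp + fn
    # Assumed convention: report 0.0 when there are no actual positives.
    return tp / actual_positive if actual_positive else 0.0


# Hypothetical counts for illustration only.
print(precision(tp=8, fp=2))         # 0.8
print(round(recall(tp=8, fn=4), 3))  # 0.667
```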
F1 Score
• The F-score, also called the F1-score, is a measure of a model’s accuracy
on a dataset. It is used to evaluate binary classification systems,
which classify examples into ‘positive’ or ‘negative’
• The F-score is a way of combining the precision and recall of the model,
and it is defined as the harmonic mean of the model’s precision and recall
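Written out, the harmonic mean of precision and recall takes the form below. The symbols $P$ (precision) and $R$ (recall) are shorthand introduced here; the last equality follows from substituting $P = TP/(TP+FP)$ and $R = TP/(TP+FN)$.

$$F_1 = \frac{2}{\frac{1}{P} + \frac{1}{R}} = \frac{2PR}{P + R} = \frac{2\,TP}{2\,TP + FP + FN}$$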
Calculating F-score
• Let us imagine we have a tree with ten apples
on it. Seven are ripe and three are still unripe,
but we do not know which one is which. We
have an AI which is trained to recognize which
apples are ripe for picking, and pick all the
ripe apples and no unripe apples. We would
like to calculate the F-score, and we consider
both precision and recall to be equally
important, so we use the F1-score.
The AI picks five ripe apples but also picks one unripe apple.
Confusion Matrix for Model 1
            Ripe    Unripe
Picked      5       1
Unpicked    2       2
Precision and Recall for model 1
• Precision = 0.83
• Recall = 0.71
• F1 Score = 0.77
Confusion Matrix for Model 2
            Ripe    Unripe
Picked      4       1
Unpicked    2       3
Precision and Recall for model 2
• Precision = 0.8
• Recall = 0.67
• F1 Score = 0.73
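The figures for both models can be recomputed from the two confusion matrices, treating "ripe and picked" as TP, "unripe and picked" as FP, and "ripe and unpicked" as FN. The helper name `precision_recall_f1` is hypothetical; values are printed rounded to two decimals.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) from confusion-matrix counts."""
    p = tp / (tp + fp)
    r = tp / (tp + fn)
    f1 = 2 * p * r / (p + r)  # harmonic mean of precision and recall
    return p, r, f1


# (TP, FP, FN) read off the two confusion matrices above.
models = {"Model 1": (5, 1, 2), "Model 2": (4, 1, 2)}
for name, (tp, fp, fn) in models.items():
    p, r, f1 = precision_recall_f1(tp, fp, fn)
    print(f"{name}: precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
```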
Conclusion
• High recall, low precision:
This means that most of the positive examples
are correctly recognized (low FN) but there are a
lot of false positives.
• Low recall, high precision:
This shows that we miss a lot of positive
examples (high FN) but those we predict as
positive are indeed positive (low FP)
F-score vs Accuracy
• There are a number of metrics which can be used to evaluate a binary
classification model, and accuracy is one of the simplest to understand.
Accuracy is defined as simply the number of correctly categorized
examples divided by the total number of examples. Accuracy can be useful
but does not take into account the subtleties of class imbalances, or
differing costs of false negatives and false positives.
• The F1-score is useful:
  • where there are differing costs of false positives and false negatives,
  • or where there is a large class imbalance, such as if 10% of apples on
  trees tend to be unripe. In this case the accuracy would be misleading,
  since a classifier that classifies all apples as ripe would automatically get
  90% accuracy but would be useless for real-life applications.
• The accuracy has the advantage that it is very easily interpretable, but the
disadvantage that it is not robust when the data is unevenly distributed, or
where there is a higher cost associated with a particular type of error.
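The imbalance example can be checked numerically. This sketch assumes a hypothetical orchard of 100 apples with 10% unripe, and it assumes the minority class (unripe) is treated as the positive class when computing F1; with that choice the all-ripe classifier scores 90% accuracy but an F1 of zero.

```python
# Hypothetical orchard: 90 ripe apples (label 1) and 10 unripe apples (label 0).
y_true = [1] * 90 + [0] * 10
y_pred = [1] * 100  # a classifier that calls every apple ripe

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Assumption: the minority class (unripe, label 0) is the positive class.
tp = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
fp = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
fn = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
denominator = 2 * tp + fp + fn
f1_unripe = 2 * tp / denominator if denominator else 0.0

print(accuracy, f1_unripe)  # 0.9 0.0
```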
Mean Absolute Error or MAE
• An error is the difference between the actual (true) value
and the predicted value. The absolute difference ignores
the sign of that difference.
• Hence, for a single sample, absolute error = |true value - predicted value|
• MAE takes the average of this error over every sample in a
dataset:
$$\text{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$$
where $y_i$ is the true value, $\hat{y}_i$ the predicted value, and $n$ the number of samples.
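A minimal sketch of the MAE computation in plain Python follows; the sample values are hypothetical.

```python
def mean_absolute_error(y_true, y_pred):
    """Average of the absolute differences between true and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical true and predicted values.
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]
print(mean_absolute_error(y_true, y_pred))  # 0.75
```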
Mean Squared Error or MSE
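• MSE is calculated by taking the average of the
square of the difference between the original
and predicted values of the data.
• Hence,
$$\text{MSE} = \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2$$
where $y_i$ is the original value, $\hat{y}_i$ the predicted value, and $n$ the number of samples.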
Root Mean Squared Error or RMSE
R Squared
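The squared-error metrics can be sketched together. This example assumes the standard definitions: RMSE as the square root of MSE, and R squared as one minus the ratio of the residual sum of squares to the total sum of squares around the mean. The sample values are hypothetical.

```python
import math


def mse(y_true, y_pred):
    """Mean of the squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Square root of MSE, in the same units as the target."""
    return math.sqrt(mse(y_true, y_pred))


def r_squared(y_true, y_pred):
    """1 - SS_res / SS_tot (assumed standard definition)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical true and predicted values.
y_true = [3.0, 5.0, 2.5, 7.0]
y_pred = [2.5, 5.0, 4.0, 8.0]
print(mse(y_true, y_pred))                  # 0.875
print(round(rmse(y_true, y_pred), 3))       # 0.935
print(round(r_squared(y_true, y_pred), 3))  # 0.724
```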
Where to use which Metric to determine the Performance of a
Machine Learning Model?
• MAE: It is less sensitive to outliers than MSE because it does not square
the errors, so huge errors are not punished disproportionately. It is usually
used when the performance is measured on continuous variable data. It is a
linear score: every individual difference carries equal weight in the average.
The lower the value, the better the model's performance.
• MSE: It is one of the most commonly used metrics, but it is least useful when a
single bad prediction can dominate the score, i.e. when the dataset contains
a lot of noise. It is most useful when unexpected values (too high or too low
values) matter and large errors should be penalized heavily.
• RMSE: In RMSE, the errors are squared before they are averaged, and the
square root of that average is then taken. This
implies that RMSE assigns a higher weight to larger errors, so
RMSE is more useful when large errors are present
and they drastically affect the model's performance. It avoids taking the
absolute value of the error, a trait that is useful in many mathematical
calculations. As with the other metrics, the lower the value, the better the
performance of the model.
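The different sensitivity to large errors can be seen on a small example. This is a sketch with hypothetical error values: replacing one error of 1 with an error of 10 raises MAE from 1.0 to 2.8, while RMSE rises from 1.0 to about 4.56.

```python
import math


def mae(errors):
    """Mean absolute error from a list of signed errors."""
    return sum(abs(e) for e in errors) / len(errors)


def rmse(errors):
    """Root mean squared error from a list of signed errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))


# Hypothetical prediction errors (true value minus predicted value).
clean = [1.0, -1.0, 1.0, -1.0, 1.0]
with_outlier = [1.0, -1.0, 1.0, -1.0, 10.0]

print(mae(clean), rmse(clean))                                # 1.0 1.0
print(mae(with_outlier), f"{rmse(with_outlier):.2f}")          # 2.8 4.56
```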
Cross Validation
Usman Khan
Cross Validation (1)
• A common practice in machine learning is to not use the entire data
set when training a learner.
• Some of the data is removed before training begins.
• Then when training is done, the data that was
removed can be used to test the performance of
the learned model on "new" data.
• This is the basic idea for a whole class of model
evaluation methods called cross validation
Cross Validation (2)
• Method of estimating expected prediction error
• Helps select the best-fitting model
• Helps ensure the model is not overfit
Cross Validation (3)
1) Holdout method
2) K-Fold CV
3) Leave one out CV
4) Bootstrap methods
Holdout method
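• The holdout cross validation method is the
simplest of all.
• In this method, you randomly assign data
points to two sets, usually called the training set and
the test set. The size of each set is arbitrary, although
the test set is typically smaller than the training set.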
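A holdout split can be sketched with the standard library alone. The function name `holdout_split`, the 25% test fraction, and the fixed seed are assumptions made for illustration.

```python
import random


def holdout_split(data, test_fraction=0.25, seed=0):
    """Randomly assign data points to a training set and a test set."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(data)
    rng.shuffle(shuffled)
    n_test = max(1, int(round(len(shuffled) * test_fraction)))
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)


# Hypothetical data set of 20 points.
train, test = holdout_split(range(20))
print(len(train), len(test))  # 15 5
```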
K-Fold
• K-fold cross validation is one way to improve
over the holdout method. The data set is
divided into k subsets and the holdout
method is repeated k times
• Each time, one of the k subsets is used as the
test set and the other k-1 subsets are put
together to form a training set
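The fold construction described above can be sketched as an index generator. The name `k_fold_indices` is hypothetical, and the sketch does not shuffle; in practice the data is usually shuffled before being divided into folds.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


for train_idx, test_idx in k_fold_indices(10, 3):
    print(test_idx)
```

Each data point appears in exactly one test fold and in the training set of the other k-1 folds.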
K-Fold
• Disadvantages
• ???
• Stratified K-Fold
Leave one out CV (1)
• Leave-one-out cross validation is K-fold cross
validation taken to its logical extreme, with K
equal to N, the number of data points in the set
• That means that N separate times, the function
approximator is trained on all the data except for
one point and a prediction is made for that point
• As before the average error is computed and
used to evaluate the model.
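Leave-one-out cross validation can be sketched with a deliberately trivial model. As an assumption for illustration, the "function approximator" here simply predicts the mean of its training points, and the error is the absolute difference; the data values are hypothetical.

```python
def loocv_mean_absolute_error(values):
    """Leave-one-out CV for a model that predicts the training mean."""
    errors = []
    for i, held_out in enumerate(values):
        training = values[:i] + values[i + 1:]    # all data except one point
        prediction = sum(training) / len(training)  # "train" the mean model
        errors.append(abs(held_out - prediction))   # error on the held-out point
    return sum(errors) / len(errors)                # average over N runs


# Hypothetical data set with N = 4 points, so the model is fit 4 times.
print(round(loocv_mean_absolute_error([2.0, 4.0, 6.0, 8.0]), 3))  # 2.667
```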
Leave one out CV (2)
• Specific case of K-fold validation
Leave one out CV (3)
• Disadvantages
• ???
Bootstrap (1)
• Randomly draw datasets, with replacement, from the training sample
• Each sample is the same size as the training sample
• Refit the model with the bootstrap samples
• Examine the behavior of the model across the refits
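The bootstrap steps can be sketched as follows. As an assumption for illustration, the "model" being refit is just the sample mean, the data values are hypothetical, and the percentile interval at the end is one rough way to examine the spread of the refits.

```python
import random


def bootstrap_means(sample, n_resamples=1000, seed=0):
    """Refit a trivial model (the mean) on resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(sample)
    means = []
    for _ in range(n_resamples):
        # Each bootstrap sample has the same size as the original sample.
        resample = [rng.choice(sample) for _ in range(n)]
        means.append(sum(resample) / n)
    return means


# Hypothetical training sample.
sample = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8]
means = sorted(bootstrap_means(sample))
lower, upper = means[25], means[974]  # rough 95% percentile interval
print(f"mean between {lower:.2f} and {upper:.2f}")
```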
Bootstrap (2)


Editor's Notes

  1.  The harmonic mean can be described as the reciprocal of the arithmetic mean of the reciprocals of the data