19-Performance Metrics
References:
1. Ethem Alpaydin, "Introduction to Machine Learning", MIT Press / Prentice Hall of India.
Performance evaluation
• How predictive is the model we learned?
– For regression, usually R² or MSE.
– For classification, there are many options: accuracy, precision, and recall can all be used.
– Performance on the training data (the data used to build the model) is not a good indicator of performance on future data, because new data will probably not be exactly the same as the training data! Evaluate on held-out test data instead (see the sketch after this list).
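A minimal sketch of this held-out evaluation, assuming scikit-learn and NumPy are available; the synthetic data, model choices, and variable names are illustrative only, not part of the original slides.

# Held-out evaluation: MSE and R^2 for regression, accuracy for classification.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.metrics import mean_squared_error, r2_score, accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))

# --- Regression: evaluate with MSE and R^2 on held-out data ---
y_reg = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.3, size=500)
X_tr, X_te, y_tr, y_te = train_test_split(X, y_reg, test_size=0.3, random_state=0)
reg = LinearRegression().fit(X_tr, y_tr)
print("MSE:", mean_squared_error(y_te, reg.predict(X_te)))
print("R^2:", r2_score(y_te, reg.predict(X_te)))

# --- Classification: evaluate with accuracy on held-out data ---
y_clf = (y_reg > 0).astype(int)
X_tr, X_te, y_tr, y_te = train_test_split(X, y_clf, test_size=0.3, random_state=0)
clf = LogisticRegression().fit(X_tr, y_tr)
print("Accuracy:", accuracy_score(y_te, clf.predict(X_te)))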
Measuring Classifier Performance
• For classification, a variety of measures have been proposed.
• There are four possible cases, as shown in Table 19.1 of the reference text [1].
• A confusion matrix is a table that is often used to describe the
performance of a classification model (or "classifier") on a set of test data
• For a positive example,
– if the prediction is also positive, this is a true positive;
– if the prediction is negative, this is a false negative.
• For a negative example,
– if the prediction is also negative, we have a true negative, and
– if the prediction is positive, we have a false positive.
Measuring Classifier Performance
• p = tp + fn (total actual positives); p' = tp + fp (total predicted positives)
• n = fp + tn (total actual negatives); n' = tn + fn (total predicted negatives)
• A quick sanity check of these totals is sketched below.
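A small plain-Python sketch of these totals; the counts are illustrative, borrowed from the Model 1 worked example later in this section.

# p and n are the actual class totals, p' and n' the predicted class totals.
tp, fn, fp, tn = 150, 40, 60, 250    # assumed example counts

p       = tp + fn   # actual positives
n       = fp + tn   # actual negatives
p_prime = tp + fp   # predicted positives
n_prime = tn + fn   # predicted negatives

print(p, n, p_prime, n_prime)        # 190 310 210 290
assert p + n == p_prime + n_prime    # both sum to the total number of examples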
Confusion Matrix
• A confusion matrix is a table that is often used to describe the performance of a
classification model (or "classifier") on a set of test data
• TRUE POSITIVE (TP):
– A document which is actually SPAM (positive) and classified as SPAM (positive).
• TRUE NEGATIVE (TN):
– A document which is actually NONSPAM (negative) and classified as NONSPAM (negative).
• FALSE POSITIVE (FP):
– A document which is actually NONSPAM (negative) and classified as SPAM (positive).
• FALSE NEGATIVE (FN):
– A document which is actually SPAM (positive) and classified as NONSPAM (negative).
                     Predicted: Positive    Predicted: Negative
Actual: Positive             TP                     FN
Actual: Negative             FP                     TN
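A minimal sketch of counting the four cases for the SPAM example in plain Python; the two label lists are made-up illustrations, with "SPAM" treated as the positive class.

actual    = ["SPAM", "SPAM", "NONSPAM", "NONSPAM", "SPAM", "NONSPAM"]
predicted = ["SPAM", "NONSPAM", "NONSPAM", "SPAM", "SPAM", "NONSPAM"]

# Count each cell of the confusion matrix by comparing actual vs. predicted labels.
tp = sum(a == "SPAM"    and p == "SPAM"    for a, p in zip(actual, predicted))
tn = sum(a == "NONSPAM" and p == "NONSPAM" for a, p in zip(actual, predicted))
fp = sum(a == "NONSPAM" and p == "SPAM"    for a, p in zip(actual, predicted))
fn = sum(a == "SPAM"    and p == "NONSPAM" for a, p in zip(actual, predicted))

print(f"TP={tp} TN={tn} FP={fp} FN={fn}")   # TP=2 TN=2 FP=1 FN=1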
Confusion Matrix
A confusion matrix is a table that is often used to describe the performance
of a classification model (or "classifier") on a set of test data
• True Positives (TP): cases in which we predicted positive (predicted the disease) and they actually do have the disease.
• True Negatives (TN): we predicted negative, and they don't have the disease.
• False Positives (FP): we predicted positive, but they don't actually have the disease. (Also known as a "Type I error.")
• False Negatives (FN): we predicted negative, but they actually do have the disease. (Also known as a "Type II error.")
                     Predicted: Positive    Predicted: Negative
Actual: Positive             TP                     FN
Actual: Negative             FP                     TN
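A sketch of the same bookkeeping using scikit-learn's confusion_matrix (assuming scikit-learn is installed); the label lists are illustrative, with 1 meaning "has the disease" (positive) and 0 meaning "does not" (negative).

from sklearn.metrics import confusion_matrix

y_true = [1, 1, 0, 0, 1, 0, 1, 0]   # illustrative ground-truth labels
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]   # illustrative model predictions

# For binary labels [0, 1], ravel() returns the matrix entries as tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
print(f"TP={tp} FN={fn} FP={fp} TN={tn}")   # TP=3 FN=1 FP=1 TN=3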
Precision and Recall
• Accuracy = No. of correct predictions / All predictions = (TP + TN) / (TP + TN + FP + FN)
• Consider two models evaluated on the same 500 test examples, together with a cost matrix in which a false negative costs 100, a false positive costs 1, a true positive costs −1 (a benefit), and a true negative costs 0:

Model 1              Predicted: P    Predicted: N
Actual: P                 150              40
Actual: N                  60             250
Accuracy: (150 + 250) / 500 = 80%
Cost: 150×(−1) + 40×100 + 60×1 = 3910

Model 2              Predicted: P    Predicted: N
Actual: P                 250              45
Actual: N                   5             200
Accuracy: (250 + 200) / 500 = 90%
Cost: 250×(−1) + 45×100 + 5×1 = 4255

Cost matrix          Predicted: P    Predicted: N
Actual: P                  −1             100
Actual: N                   1               0
• If we focus on accuracy, we choose Model 2 (and compromise on cost); if we focus on cost, we choose Model 1 (and compromise on accuracy). See the sketch below, which reproduces these numbers and also computes the precision and recall of both models.
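A sketch that reproduces the accuracy and cost figures above, and also computes the precision and recall this slide's title refers to (standard definitions, not part of the worked example); the cost defaults mirror the cost matrix above.

def metrics(tp, fn, fp, tn, cost_tp=-1, cost_fn=100, cost_fp=1, cost_tn=0):
    accuracy  = (tp + tn) / (tp + fn + fp + tn)
    cost      = tp * cost_tp + fn * cost_fn + fp * cost_fp + tn * cost_tn
    precision = tp / (tp + fp)   # of the predicted positives, how many are correct
    recall    = tp / (tp + fn)   # of the actual positives, how many are found
    return accuracy, cost, precision, recall

print("Model 1:", metrics(tp=150, fn=40, fp=60, tn=250))  # accuracy 0.80, cost 3910
print("Model 2:", metrics(tp=250, fn=45, fp=5, tn=200))   # accuracy 0.90, cost 4255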
ROC Curves
• A receiver operating characteristic (ROC) curve is a graphical plot that illustrates the diagnostic ability of a binary classifier as its discrimination threshold is varied.
• The diagnostic performance of a test, i.e. its ability to discriminate diseased cases from normal cases, is evaluated using ROC curve analysis.
• An ROC curve is a way to compare diagnostic tests.
• It is a plot of the true positive rate (TPR) against the false positive rate (FPR), with one point per threshold (see the sketch below).
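A minimal sketch of computing and plotting an ROC curve, assuming scikit-learn and matplotlib are available; the labels and scores are made-up classifier outputs, and each threshold on the score gives one (FPR, TPR) point on the curve.

from sklearn.metrics import roc_curve, roc_auc_score
import matplotlib.pyplot as plt

y_true  = [0, 0, 1, 1, 0, 1, 0, 1, 1, 0]       # illustrative ground truth
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.7,      # illustrative predicted
           0.55, 0.9, 0.65, 0.3]               # probabilities of the positive class

# fpr and tpr are the false/true positive rates at each score threshold.
fpr, tpr, thresholds = roc_curve(y_true, y_score)
print("AUC:", roc_auc_score(y_true, y_score))

plt.plot(fpr, tpr, marker="o")
plt.plot([0, 1], [0, 1], linestyle="--")       # chance line
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.title("ROC curve")
plt.show()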