
CLASSIFICATION METRICS IN MACHINE LEARNING
INTRODUCTION
Choosing the right classification metric is crucial for model evaluation. The Confusion Matrix is a simple yet very powerful classification metric for evaluating the performance of a classification model; it is a performance measurement for machine learning problems where the output can be two or more classes. Similarly, we have Precision, which is defined as the fraction of relevant instances among the retrieved instances; Recall, which is the fraction of the total amount of relevant instances that were actually retrieved; and F-Beta, which is the weighted harmonic mean of Precision and Recall.

We will discuss these in detail in the upcoming sections. Below are the various Classification
metrics that we should use in Machine Learning.

• Confusion Matrix
• Accuracy
• Recall (True Positive Rate, Sensitivity)
• Precision (Positive Prediction Value)
• F-Beta
• Cohen's Kappa
• ROC Curve, AUC Score
• PR Curve
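
As a quick, non-authoritative sketch of how several of these metrics can be computed (assuming scikit-learn is installed; y_true and y_pred below are made-up labels, not the output of any real model):

from sklearn.metrics import (confusion_matrix, accuracy_score,
                             precision_score, recall_score, fbeta_score)

# Made-up actual and predicted labels, for illustration only
y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

print(confusion_matrix(y_true, y_pred))     # 2x2 error matrix
print(accuracy_score(y_true, y_pred))       # Accuracy
print(precision_score(y_true, y_pred))      # Positive Prediction Value
print(recall_score(y_true, y_pred))         # True Positive Rate / Sensitivity
print(fbeta_score(y_true, y_pred, beta=1))  # F1, i.e. F-Beta with beta = 1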

It is very important to use the correct kind of metrics to find out how good the model is. If we are not using the correct metrics, then it would be really difficult to tell the efficiency of our model.
So, let's understand each metric and see which one best fits which kind of scenario.

Now let's consider a classification problem statement. There are two ways in which we can solve it:

1. PREDICTING THE CLASS LABELS

Suppose we have a binary classification problem with classes A and B. The threshold boundary in this case will by default be 0.5, as we have two classes.

So, if our predicted value is greater than 0.5 the instance will belong to class B, and if it is less than 0.5 it will belong to class A.
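
A minimal sketch of this thresholding (probs is a hypothetical list of model outputs, interpreted as the probability of class B):

probs = [0.12, 0.78, 0.50, 0.91, 0.33]             # hypothetical predicted probabilities

# Default threshold of 0.5: above it -> class B, otherwise -> class A
labels = ["B" if p > 0.5 else "A" for p in probs]
print(labels)                                       # ['A', 'B', 'A', 'B', 'A']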

2. PROBABILITY

In the case of probability outputs we also have to arrive at class labels, this time by selecting the right threshold value.

The threshold value we choose depends on the problem, on a case-by-case basis. Let's say we want to predict whether a person has cancer or not. In this case, choosing the threshold value is very critical and should be done in a proper way.

The probability approach involves the following classification metrics, which we can use for choosing the correct threshold:

i. ROC Curve
ii. AUC Score
iii. PR Curve
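
A hedged sketch of how these could be obtained with scikit-learn (y_true and y_scores are invented example values, not real model output):

from sklearn.metrics import roc_curve, roc_auc_score, precision_recall_curve

y_true   = [0, 0, 1, 1, 0, 1, 1, 0]                     # actual labels (invented)
y_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.5]    # predicted probabilities (invented)

fpr, tpr, roc_thresholds = roc_curve(y_true, y_scores)  # points on the ROC Curve
auc = roc_auc_score(y_true, y_scores)                    # AUC Score
prec, rec, pr_thresholds = precision_recall_curve(y_true, y_scores)  # PR Curve
print(auc)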

Now that we know how we can solve a classification problem, let's understand which metrics should be used for a given dataset.

a. If we have a dataset of 1,000 records split into equal halves between the classes, it is a balanced dataset. In such cases we use Accuracy as the classification metric.
b. If we have an imbalanced dataset, where the class distribution is not equal for the binary classification, then we consider Recall, Precision and F-Beta as the classification metrics.
Now that we have briefly discussed balanced and imbalanced datasets and what type of metrics should be used for each, let's understand each of them in detail.
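
As a small illustrative sketch (the labels list here is hypothetical), one quick way to check whether a dataset is balanced is to count the classes before picking a metric:

from collections import Counter

labels = ["A"] * 500 + ["B"] * 500   # 1000 hypothetical records split into equal halves
print(Counter(labels))               # Counter({'A': 500, 'B': 500}) -> balanced, Accuracy is fine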

1. CONFUSION MATRIX

In the case of binary classification, the Confusion Matrix is a 2x2 matrix: the columns along the top correspond to the actual values and the rows along the left to the predicted values. It is an error matrix which allows visualization of the performance of an algorithm.

i. The first field, corresponding to 1 for the predicted value and 1 for the actual value, is the True Positive (TP) field.
ii. Similarly, the field corresponding to 1 for the predicted value and 0 for the actual value is the False Positive (FP) field, which is also called the Type I error. The related rate is the false positive rate, FPR = FP / (FP + TN).
iii. The field corresponding to 0 for the predicted value and 1 for the actual value is the False Negative (FN) field, which is also called the Type II error. The related rate is the false negative rate, FNR = FN / (FN + TP).
iv. The field corresponding to 0 for the predicted value and 0 for the actual value is the True Negative (TN) field.

One way to remember the formula for FPR is that it compares the false positives (FP) against all the actual negative values (FP + TN).
TP and TN are the correct predictions. Our aim should always be to reduce the Type I error and the Type II error.
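
The sketch below counts the four fields by hand for made-up binary labels; note that if you use scikit-learn's confusion_matrix instead, its rows correspond to the actual labels and its columns to the predictions, i.e. the transpose of the layout described above.

y_true = [1, 0, 1, 1, 0, 1, 0, 0]   # made-up actual labels
y_pred = [1, 1, 1, 0, 0, 1, 0, 1]   # made-up predicted labels

TP = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # actual 1, predicted 1
FP = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # actual 0, predicted 1 (Type I)
FN = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # actual 1, predicted 0 (Type II)
TN = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)  # actual 0, predicted 0

FPR = FP / (FP + TN)   # false positives relative to all actual negatives
print(TP, FP, FN, TN, FPR)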

2. ACCURACY

As we discussed before, if our dataset is a balanced one then we use Accuracy as the
classification metric.

The formula for Accuracy is:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Here TP and TN are the correct predictions out of all the results.

Now, what would happen if our dataset is not balanced and we still use Accuracy as the classification metric? To understand this, let's take an example:

Suppose we have 10K records, with label A making up 9K and label B making up 1K. If we calculate Accuracy, it is obvious that we will get around 90%, because the model predicts most of the records as label A.

Clearly this is not a good way of calculating the efficiency of the model if our dataset is not
balanced.
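
A short sketch of this trap, using synthetic labels and a hypothetical model that always predicts label A:

y_true = ["A"] * 9000 + ["B"] * 1000   # 10K synthetic records: 9K of A, 1K of B
y_pred = ["A"] * 10000                 # a hypothetical model that always predicts A

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                        # 0.9 -> 90% accuracy while never detecting a single B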

So, in such situations we use Recall, Precision and F-Beta as the classification metrics.

3. RECALL (TRUE POSITIVE RATE, SENSITIVITY)

For a classification model, Recall answers the question: out of the total actual positive values, how many positives were we able to predict correctly?

Recall = TP / (TP + FN)

One thing to remember here is that in the case of Recall we deal with the False Negatives.
4. PRECISION (POSITIVE PREDICTION VALUE)

Precision answers the question: out of the total predicted positive results, how many were actually positive?

Precision = TP / (TP + FP)

One thing to remember here is that in the case of Precision we deal with the False Positives.
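
A minimal sketch of both formulas, using hypothetical counts for TP, FP and FN:

TP, FP, FN = 80, 20, 40      # hypothetical confusion-matrix counts

recall = TP / (TP + FN)      # of all actual positives, how many were found
precision = TP / (TP + FP)   # of all predicted positives, how many were correct
print(recall, precision)     # 0.666..., 0.8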

Now let's take a few examples to better understand the scenarios where we could use Precision and Recall.

Precision Example

i. Let's take the use case of Spam Detection. In this case, we mostly have to consider Precision. Let's say we got an email which is originally not spam, but the model detected it as spam, which means it is a False Positive.

In such cases, where the cost of a False Positive is high, our main focus should always be to reduce it to a minimum, so that an important email is not wrongly classified as spam.

Recall Example

Now let's say our model is tasked with predicting whether a person is covid positive or not. Suppose the model predicted the person as not having covid whereas they were actually covid positive, which is a False Negative. This might turn out to be a blunder by the model.

In such cases a False Positive won't be a very big issue, because even if the person is not covid positive but is predicted as positive, he/she can go for another test to verify the result.

But if the person has covid and is predicted as negative (False Negative) then chances are he
might not go for another test which might turn out to be a disaster.

Therefore, it’s important to use Recall in such situations.

NOTE: Our goal should always be to keep both Precision and Recall as high as possible (that is, to reduce False Positives and False Negatives); however:

i. Whenever the False Positive is of more importance with respect to the problem statement,
then use precision
ii. If the False Negative has greater importance with respect to the problem statement, then
use Recall.

Now that we have understood what Precision and Recall are, let's go ahead and understand F-Beta and where we can possibly use it.
5. F-BETA

We will encounter scenarios in which both the False Positives and the False Negatives play an important role in an imbalanced dataset. In such cases we have to consider both Recall and Precision.

So, if we are considering both these metrics, then we have to use the F-Beta score.

If the Beta value is 1, then F-Beta becomes the F1-Score. Similarly, the Beta value can also be 0.5 or 2. The general formula is:

F-Beta = (1 + β²) · (Precision · Recall) / (β² · Precision + Recall)

If β = 1 then,

F1 = 2 · (Precision · Recall) / (Precision + Recall)

This formula is the harmonic mean of Precision and Recall. Now, let's understand when to choose which values of Beta.

Case I:

If both False Positive and False Negative are equally important, then we will select Beta = 1.

Case II:

Suppose the False Positive has more impact than the False Negative; then we need to reduce the Beta value by selecting something between 0 and 1.
Ex: Beta = 0.5

Case III:

Suppose the impact of the False Negative is high (which is what Recall captures); in such cases we increase the Beta value above 1.
Ex: Beta = 2
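
A short sketch of the three cases, using scikit-learn's fbeta_score on made-up labels:

from sklearn.metrics import fbeta_score

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]   # made-up actual labels
y_pred = [1, 1, 0, 0, 0, 0, 1, 0, 1, 0]   # made-up predictions

for beta in (0.5, 1, 2):                  # Case II, Case I, Case III respectively
    print(beta, fbeta_score(y_true, y_pred, beta=beta))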
