
Laboratory Work Preparation: Lab Work 9: Binary Classification


LABORATORY WORK PREPARATION

Lab Work 9: Binary classification


Biomedical Informatics
Assoc. Prof. Tomaž Vrtovec, Ph.D.

University of Ljubljana, Faculty of Electrical Engineering Electrical Engineering, level 2


Laboratory of Imaging Technologies International course

BINARY CLASSIFICATION
What is binary classification?

Binary or binomial classification is the task of assigning each sample to one of
two classes according to a decision. In medicine, such binary decisions are the most
common, as we usually want to know whether the observed subject is diseased or healthy.
Actual condition:
- positive samples – P: the number of diseased subjects.
- negative samples – N: the number of healthy subjects.

Test results:
- positive results – RP: the number of subjects with a positive test result.
- negative results – RN: the number of subjects with a negative test result.


BINARY CLASSIFICATION
Classification of the results

According to the actual condition and the test results, the samples can be
classified into four groups:
- true positives – TP: the number of diseased subjects (P) that are correctly classified as diseased (RP).
- true negatives – TN: the number of healthy subjects (N) that are correctly classified as healthy (RN).


BINARY CLASSIFICATION
Classification of the results (2)

- false positives – FP: the number of healthy subjects (N) that are wrongly classified as diseased (RP).
- false negatives – FN: the number of diseased subjects (P) that are wrongly classified as healthy (RN).
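
The four groups can be counted directly from two lists of equal length. The following is a minimal sketch, assuming the actual condition and the test result are both coded as 1 (diseased / positive) and 0 (healthy / negative); the function name `count_groups` and the six-subject data are hypothetical and serve only as an illustration.

```python
def count_groups(actual, predicted):
    """Count TP, TN, FP and FN from actual conditions and test results.

    Both lists use 1 for diseased / positive and 0 for healthy / negative.
    """
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # diseased, tested positive
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # healthy, tested negative
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # healthy, tested positive
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # diseased, tested negative
    return tp, tn, fp, fn


# Hypothetical data for six subjects (not taken from the lab data set).
actual = [1, 1, 1, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0]

tp, tn, fp, fn = count_groups(actual, predicted)
print(f"TP = {tp}, TN = {tn}, FP = {fp}, FN = {fn}")
```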


BINARY CLASSIFICATION
Contingency table

The contingency table is a condensed description of sample classification
according to the actual condition and the test results.

                        Positive test (RP)   Negative test (RN)   Total
Diseased subjects (P)   TP                   FN                   P = TP + FN
Healthy subjects (N)    FP                   TN                   N = FP + TN
Total                   RP = TP + FP         RN = FN + TN         P + N = RP + RN
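
The row and column sums of the contingency table can be checked with a few lines of code. This is only a sketch: the function name `contingency_margins` and the counts passed to it are hypothetical, chosen for illustration.

```python
def contingency_margins(tp, fn, fp, tn):
    """Return the row and column sums of a 2 x 2 contingency table."""
    p = tp + fn    # diseased subjects (first row)
    n = fp + tn    # healthy subjects (second row)
    rp = tp + fp   # positive test results (first column)
    rn = fn + tn   # negative test results (second column)
    # Both ways of summing must give the total number of subjects.
    assert p + n == rp + rn
    return {"P": p, "N": n, "RP": rp, "RN": rn, "total": p + n}


# Hypothetical counts (not taken from the lab data set).
margins = contingency_margins(tp=30, fn=10, fp=5, tn=55)
print(margins)
```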


BINARY CLASSIFICATION
Ideal classification

In the case of an ideal classification, there are no false positives or false
negatives (FP = FN = 0): all positive test results correspond to positive
samples (RP = P = TP) and all negative test results correspond to negative
samples (RN = N = TN).


[Figure: test results of diseased (P) and healthy (N) subjects, split by threshold t into positive results (RP) and negative results (RN); only TP and TN regions appear.]


BINARY CLASSIFICATION
Real classification

In the case of a real classification, false positives and false negatives
exist (FP ≠ 0, FN ≠ 0), mostly due to:
- the influence of the biological variability of samples
- the influence of the variability of reference data
- the influence of the variability of the test

[Figure: test results of diseased (P) and healthy (N) subjects, split by threshold t into positive results (RP) and negative results (RN); TP, FN, FP and TN regions all appear.]


BINARY CLASSIFICATION
Performance measures

The measures of consistency are:
- true positive rate (TPR), or sensitivity:
  TPR = TP / P = TP / (TP + FN)
  Sensitivity is the probability that when testing a diseased subject (P) the result will be positive (RP).
- true negative rate (TNR), or specificity:
  TNR = TN / N = TN / (TN + FP)
  Specificity is the probability that when testing a healthy subject (N) the result will be negative (RN).
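
Both consistency measures are simple ratios of group counts. A minimal sketch, assuming that at least one diseased and one healthy subject is present (so neither denominator is zero); the function names and the counts are hypothetical.

```python
def sensitivity(tp, fn):
    """True positive rate: TPR = TP / (TP + FN)."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """True negative rate: TNR = TN / (TN + FP)."""
    return tn / (tn + fp)


# Hypothetical counts (not taken from the lab data set).
tpr = sensitivity(tp=30, fn=10)
tnr = specificity(tn=55, fp=5)
print(f"TPR = {tpr:.3f}, TNR = {tnr:.3f}")
```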


BINARY CLASSIFICATION
Performance measures (2)

The measures of inconsistency are:
- false positive rate (FPR), or non-specificity:
  FPR = FP / N = FP / (TN + FP)
  Non-specificity is the probability that when testing a healthy subject (N) the result will be positive (RP).
- false negative rate (FNR), or non-sensitivity:
  FNR = FN / P = FN / (TP + FN)
  Non-sensitivity is the probability that when testing a diseased subject (P) the result will be negative (RN).
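
Because every healthy subject is either a TN or an FP, and every diseased subject is either a TP or an FN, the inconsistency measures are the complements of the consistency measures. The sketch below checks this numerically; the function names and the counts are hypothetical.

```python
def false_positive_rate(fp, tn):
    """FPR = FP / (TN + FP)."""
    return fp / (tn + fp)


def false_negative_rate(fn, tp):
    """FNR = FN / (TP + FN)."""
    return fn / (tp + fn)


# Hypothetical counts (not taken from the lab data set).
tp, fn, fp, tn = 30, 10, 5, 55

fpr = false_positive_rate(fp, tn)
fnr = false_negative_rate(fn, tp)
tnr = tn / (tn + fp)
tpr = tp / (tp + fn)

# The rates within each row of the contingency table sum to 1.
assert abs(fpr + tnr - 1.0) < 1e-12
assert abs(fnr + tpr - 1.0) < 1e-12
print(f"FPR = {fpr:.3f}, FNR = {fnr:.3f}")
```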


BINARY CLASSIFICATION
Performance measures (3)

The predictive values are:
- positive predictive value (PPV):
  PPV = TP / RP = TP / (TP + FP)
  The probability that a subject that tested positive (RP) is actually diseased (P).
- negative predictive value (NPV):
  NPV = TN / RN = TN / (TN + FN)
  The probability that a subject that tested negative (RN) is actually healthy (N).
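
Unlike TPR and TNR, the predictive values are normalized by the number of test results (columns of the contingency table) rather than by the number of subjects in each actual condition (rows). A short sketch with a hypothetical function name and hypothetical counts:

```python
def predictive_values(tp, fp, tn, fn):
    """Return (PPV, NPV), each computed from one column of the contingency table."""
    ppv = tp / (tp + fp)  # among positive results (RP), the share truly diseased
    npv = tn / (tn + fn)  # among negative results (RN), the share truly healthy
    return ppv, npv


# Hypothetical counts (not taken from the lab data set).
ppv, npv = predictive_values(tp=30, fp=5, tn=55, fn=10)
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")
```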


BINARY CLASSIFICATION
Performance measures (4)

Two further measures of interest are:
- false discovery rate (FDR):
  FDR = FP / RP = FP / (TP + FP)
  The probability that a subject that tested positive (RP) is actually healthy (N).
- accuracy (ACC):
  ACC = (TP + TN) / (P + N)
  The ratio of correctly classified subjects to all subjects.
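
FDR shares its denominator with PPV, so the two sum to 1, while ACC pools both diagonal cells of the contingency table. A minimal sketch; the function names and the counts are hypothetical.

```python
def false_discovery_rate(tp, fp):
    """FDR = FP / (TP + FP)."""
    return fp / (tp + fp)


def accuracy(tp, tn, fp, fn):
    """ACC = (TP + TN) / (P + N)."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical counts (not taken from the lab data set).
tp, fn, fp, tn = 30, 10, 5, 55

fdr = false_discovery_rate(tp, fp)
acc = accuracy(tp, tn, fp, fn)
ppv = tp / (tp + fp)

# Every positive result is either a TP or an FP, so FDR = 1 - PPV.
assert abs(fdr + ppv - 1.0) < 1e-12
print(f"FDR = {fdr:.3f}, ACC = {acc:.3f}")
```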


BINARY CLASSIFICATION
ROC curve

The receiver operating characteristic (ROC) curve displays the course of
sensitivity (TPR = 1 – FNR) against non-specificity (FPR = 1 – TNR) at
different values of threshold t.

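
[Figure: ROC curve with TPR (sensitivity) on the vertical axis and FPR (non-specificity) on the horizontal axis, both running from 0 to 1; FNR and TNR are marked alongside TPR and FPR.]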
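
Each threshold t yields one point (FPR, TPR) of the ROC curve. The sketch below assumes that a result greater than or equal to t counts as a positive test, which is a convention chosen here rather than one stated in the slides; the function name `roc_point` and the eight results are hypothetical.

```python
def roc_point(actual, results, t):
    """Return (FPR, TPR) for threshold t, assuming result >= t means a positive test."""
    tp = sum(1 for a, r in zip(actual, results) if a == 1 and r >= t)
    fp = sum(1 for a, r in zip(actual, results) if a == 0 and r >= t)
    p = sum(actual)        # number of diseased subjects
    n = len(actual) - p    # number of healthy subjects
    return fp / n, tp / p


# Hypothetical results for four diseased (1) and four healthy (0) subjects.
actual = [1, 1, 1, 1, 0, 0, 0, 0]
results = [150.0, 140.0, 130.0, 110.0, 135.0, 120.0, 100.0, 90.0]

# Lowering the threshold moves the point up and to the right.
for t in (145.0, 125.0, 95.0):
    fpr, tpr = roc_point(actual, results, t)
    print(f"t = {t:5.1f}: FPR = {fpr:.2f}, TPR = {tpr:.2f}")
```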

BINARY CLASSIFICATION
ROC curve (2)
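
[Figure: three positions of the threshold (1, 2 and 3) on the test results of the subjects, each splitting them into RP and RN with the corresponding TP, TN, FP and FN groups, and the matching points 1, 2 and 3 on the ROC curve (TPR against FPR, both from 0 to 1).]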


BINARY CLASSIFICATION
ROC curve (3)

The area under the ROC curve (AUC) is equal to the probability that
a (diagnostic) test will rank a randomly chosen positive sample higher
than a randomly chosen negative sample.

The test (or its success) can therefore be labeled as:
- excellent: 1.0 ≥ AUC > 0.9
- good: 0.9 ≥ AUC > 0.8
- fair: 0.8 ≥ AUC > 0.7
- poor: 0.7 ≥ AUC > 0.6
- fail: 0.6 ≥ AUC > 0.5

[Figure: ROC curve (TPR against FPR, both from 0 to 1) with the area under the curve marked as AUC.]
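
The probabilistic reading of the AUC can be evaluated directly by comparing every diseased subject with every healthy subject. This is a sketch under two assumptions that are not stated in the slides: higher results indicate disease, and tied pairs count as one half. The function names and the data are hypothetical, and `label_test` returns `None` for AUC ≤ 0.5 because the list above assigns no label there.

```python
def auc_pairwise(actual, results):
    """Fraction of (diseased, healthy) pairs in which the diseased result is higher."""
    pos = [r for a, r in zip(actual, results) if a == 1]
    neg = [r for a, r in zip(actual, results) if a == 0]
    # A tie contributes one half (an assumed convention).
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def label_test(auc):
    """Map an AUC value to the label from the list above (None if AUC <= 0.5)."""
    for lower_bound, label in ((0.9, "excellent"), (0.8, "good"), (0.7, "fair"),
                               (0.6, "poor"), (0.5, "fail")):
        if auc > lower_bound:
            return label
    return None


# Hypothetical results for four diseased (1) and four healthy (0) subjects.
actual = [1, 1, 1, 1, 0, 0, 0, 0]
results = [150.0, 140.0, 130.0, 110.0, 135.0, 120.0, 100.0, 90.0]

auc = auc_pairwise(actual, results)
print(f"AUC = {auc:.4f} ({label_test(auc)})")
```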

BINARY CLASSIFICATION
Example

                        Positive test (RP)   Negative test (RN)   Total
Diseased subjects (P)   TP = 8               FN = 12              P = 20
Healthy subjects (N)    FP = 4               TN = 12              N = 16
Total                   RP = 12              RN = 24              P + N = 36

TPR = TP / P = TP / (TP + FN) = 8 / (8 + 12) = 40.0%
TNR = TN / N = TN / (TN + FP) = 12 / (12 + 4) = 75.0%
FPR = FP / N = FP / (TN + FP) = 4 / (12 + 4) = 25.0%
FNR = FN / P = FN / (TP + FN) = 12 / (8 + 12) = 60.0%

PPV = TP / RP = TP / (TP + FP) = 8 / (8 + 4) = 66.7%
NPV = TN / RN = TN / (TN + FN) = 12 / (12 + 12) = 50.0%
FDR = FP / RP = FP / (TP + FP) = 4 / (8 + 4) = 33.3%
ACC = (TP + TN) / (P + N) = (8 + 12) / 36 = 55.6%
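
The figures in this example can be reproduced from the four counts alone. The short script below is only a check of the arithmetic; the variable names are chosen here for illustration.

```python
# Counts from the example contingency table.
tp, fn, fp, tn = 8, 12, 4, 12

p, n = tp + fn, fp + tn      # diseased and healthy subjects
rp, rn = tp + fp, fn + tn    # positive and negative test results

measures = {
    "TPR": tp / p,
    "TNR": tn / n,
    "FPR": fp / n,
    "FNR": fn / p,
    "PPV": tp / rp,
    "NPV": tn / rn,
    "FDR": fp / rp,
    "ACC": (tp + tn) / (p + n),
}

for name, value in measures.items():
    print(f"{name} = {value:.1%}")
```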

LABORATORY WORK
Lab work 9: Binary classification

Your task will be to perform binary classification of results that represent
blood pressure measurements for a larger group of diseased and healthy
subjects according to four different methods (four diagnostic tests).

Classification will be based on comparison to reference data that defines
two classes of subjects, namely:
- a class labeled with “0” for healthy subjects
- a class labeled with “1” for diseased subjects

You will select a classification threshold and assign the subjects into groups
(TP, TN, FP and FN), and then compute the performance measures (TPR,
TNR, FPR, FNR, PPV, NPV, FDR and ACC). From the results obtained
at different classification thresholds, you will display ROC curves
and finally compute the AUC for the ROC curve.
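
The last two steps, sweeping the threshold to obtain the ROC curve and computing its area, might look as follows for a single test. This is a sketch, not the prescribed solution: it assumes that a result greater than or equal to the threshold is positive, it uses every distinct result as a threshold, and it integrates with the trapezoidal rule. The function names and the data are hypothetical.

```python
def roc_curve(actual, results):
    """Return ROC points (FPR, TPR), one per distinct threshold, from (0, 0) to (1, 1)."""
    p = sum(actual)        # diseased subjects (reference label 1)
    n = len(actual) - p    # healthy subjects (reference label 0)
    points = [(0.0, 0.0)]  # threshold above every result: no positive tests
    for t in sorted(set(results), reverse=True):
        tp = sum(1 for a, r in zip(actual, results) if a == 1 and r >= t)
        fp = sum(1 for a, r in zip(actual, results) if a == 0 and r >= t)
        points.append((fp / n, tp / p))
    return points


def auc_trapezoid(points):
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical results for four diseased (1) and four healthy (0) subjects.
actual = [1, 1, 1, 1, 0, 0, 0, 0]
results = [150.0, 140.0, 130.0, 110.0, 135.0, 120.0, 100.0, 90.0]

points = roc_curve(actual, results)
auc = auc_trapezoid(points)
print(points)
print(f"AUC = {auc:.4f}")
```

For the four diagnostic tests, the same two functions would be called once per column of results.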


LABORATORY WORK
Lab work 9: Binary classification

Subject   Reference data   Test 1 results (mmHg)   Test 2 results (mmHg)   Test 3 results (mmHg)   Test 4 results (mmHg)
⁞         ⁞                ⁞                       ⁞                       ⁞                       ⁞
71        0                85.5                    98.5                    114.8                   115.0
72        1                144.2                   70.6                    113.8                   115.0
73        0                93.4                    145.6                   117.9                   115.0
⁞         ⁞                ⁞                       ⁞                       ⁞                       ⁞

- Subject: consecutive subject number
- Reference data: actual condition (0 = healthy, 1 = diseased)
- Test 1 to Test 4 results: results of four different tests
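
As an illustration of the first step, the three rows shown above can be assigned to groups at a single threshold. The threshold of 120.0 mmHg and the rule that a result greater than or equal to the threshold is positive are assumptions made only for this sketch; the function name `classify` is hypothetical as well.

```python
# The three rows shown in the table: subject, reference, Test 1 to Test 4 (mmHg).
rows = [
    (71, 0, 85.5, 98.5, 114.8, 115.0),
    (72, 1, 144.2, 70.6, 113.8, 115.0),
    (73, 0, 93.4, 145.6, 117.9, 115.0),
]

THRESHOLD = 120.0  # hypothetical threshold in mmHg, chosen only for illustration


def classify(reference, result, t):
    """Return the group (TP, FN, FP or TN) of one result, assuming result >= t is positive."""
    positive = result >= t
    if reference == 1:
        return "TP" if positive else "FN"
    return "FP" if positive else "TN"


for subject, reference, *tests in rows:
    groups = [classify(reference, r, THRESHOLD) for r in tests]
    print(subject, groups)
```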


CONCLUSION
Discussion, comments, questions…

- What is binary classification and what is it useful for?
- Into which groups can the samples be classified?
- Describe the contingency table.
- What are the properties of an ideal classification, and what are those of a real classification?
- Describe the consistency measures for classification performance.
- Describe the inconsistency measures for classification performance.
- Describe the predictive values for classification performance.
- What other measures for classification performance exist?
- What is the relationship between TPR and FNR, and between TNR and FPR?
- Describe the ROC curve.
- What does the area under the ROC curve represent, and how can you
  label a test according to the resulting area?
