02 Logistic Regression
PhD. Msc. David C. Baldears S.
TC3007C
https://towardsdatascience.com/logistic-regression-detailed-overview-46c4da4303bc
Simple linear regression
Age and systolic blood pressure (SBP) among 33 adult women
Simple linear regression
● Relation between 2 continuous variables (SBP and age)
● Regression coefficient β1
○ Measures association between y and x
○ Amount by which y changes on average when x changes by one unit
○ Least squares method
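In the usual notation, the model these bullets describe is a straight line (ε denotes the error term, a symbol not shown in the bullets):

y = β0 + β1·x + ε

The least squares method chooses β0 and β1 to minimise the sum of squared differences between the observed and fitted values of y.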
Multiple linear regression
● Relation between a continuous variable and a set of i continuous variables
Multiple linear regression
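Written out, with x1 … xi denoting the set of predictors:

y = β0 + β1·x1 + β2·x2 + … + βi·xi + ε

where each coefficient measures the association between y and its predictor with the other predictors held constant.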
Age and signs of coronary heart disease (CD)
Example — Data
What is the relationship between age and CD?
Linear regression: not a good fit
There is an increasing relationship between age and CD, but a straight line does not fit the binary outcome well.
Logistic regression
● It is a regression-type machine learning algorithm used to solve classification problems, i.e. problems with categorical outcomes.
● It does this by predicting categorical outcomes, unlike linear regression, which predicts a continuous outcome.
● Problems with binary outcomes, such as Yes/No, 0/1, or True/False, are called classification problems.
Why Apply Logistic Regression?
Linear regression does not give a good fit line for problems that have only two outcome values.
It gives lower accuracy in prediction because, being linear in nature, it fails to fit such datasets properly.
Logistic regression
Prevalence (%) of signs of CD according to age group
Comparing LP and Logistic Regression Models
Mathematics Involved in Logistic Regression
Sigmoid Function
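The sigmoid (logistic) function maps any real number z to a value between 0 and 1 (z is the conventional symbol for its input):

σ(z) = 1 / (1 + e^(−z))

As z → +∞ the output approaches 1, and as z → −∞ it approaches 0.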
Probability
This can be expressed as a probability:
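Writing the sigmoid output as the probability π of the positive class:

π = P(y = 1 | x) = σ(β0 + β1·x) = 1 / (1 + e^(−(β0 + β1·x)))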
Probability
The odds of an event are the probability of the event occurring divided by the probability of it not occurring. Thus, if π is the probability of the event:
odds = π / (1 − π)
The odds of men voting were 0.705/0.295 = 2.39, and the log odds were ln(2.39) = 0.8712.
The odds of women voting were 0.729/0.271 = 2.69, and the log odds were ln(2.69) = 0.9896.
Note that ln(1.0) = 0, so when the odds are 1.0 (0.5/0.5) the log odds are zero.
(The odds ratio is the ratio of two such odds; here, 2.69/2.39 ≈ 1.13 for women relative to men.)
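Taking the natural log of the odds gives the log odds (the logit), which is the quantity logistic regression models as a linear function of the predictor:

ln( π / (1 − π) ) = β0 + β1·x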
Why use Logarithms?
● Logarithms have three advantages:
1. Odds vary from 0 to ∞, whereas log odds vary from −∞ to +∞ and are centered at 0. Odds less than 1 have negative log odds and odds greater than 1 have positive log odds. This accords better with the real number line, which runs from −∞ to +∞.
2. Multiplying any two numbers together is equivalent to adding their logs. Logs therefore make it possible to convert multiplicative models into additive models, a useful property for logistic regression, which is a non-linear multiplicative model when not expressed in logs.
3. A useful statistic for evaluating the fit of models is -2log likelihood (also known as
deviance). The model has to be expressed in logarithms for this to work.
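As a small numerical check of point 2, using the odds from the voting example above: ln(2.39 × 2.69) = ln(2.39) + ln(2.69) = 0.8712 + 0.9896 = 1.8608, so multiplying odds corresponds to adding log odds.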
Logistic Regression Model
● The logistic distribution constrains the estimated probabilities to lie
between 0 and 1.
● The estimated probability is:
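In the notation used earlier:

P(y = 1 | x) = 1 / (1 + e^(−(β0 + β1·x))),   P(y = 0 | x) = 1 − P(y = 1 | x)

which by construction always lies between 0 and 1.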
Cost Function of Logistic Regression
The cost, in a nutshell, can be defined as:
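Using m for the number of training examples and h(x) for the model's predicted probability (both symbols introduced here for compactness), the cost over the whole training set is:

J(β) = −(1/m) · Σi [ yi · ln(h(xi)) + (1 − yi) · ln(1 − h(xi)) ]

i.e. the average cross-entropy between the true labels and the predicted probabilities.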
Fitting equation to the data
● Linear regression: Least squares
● Logistic regression: Maximum likelihood
● Likelihood function
○ Estimates parameters β0 and β1
○ Practically easier to work with log-likelihood
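In symbols, with πi denoting the predicted probability P(yi = 1 | xi) (notation introduced here), the log-likelihood to be maximised is:

ln L(β0, β1) = Σi [ yi · ln(πi) + (1 − yi) · ln(1 − πi) ]

Maximising this is equivalent to minimising the cost function above.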
Explanation of Cases of Cost Function
Cost plotted against the model's prediction, for the case Y = 1 and for the case Y = 0.
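Written out, the two cases of the per-example cost are:

Cost(h(x), y) = −ln(h(x))        if y = 1
Cost(h(x), y) = −ln(1 − h(x))    if y = 0

so the cost is 0 when the prediction matches the label exactly and grows without bound as the prediction approaches the wrong extreme.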
Note: our model's prediction will not exceed 1 or go below 0, so that region is not a concern.
Derivative of Sigmoid Function
Sigmoid function: simplify the equation.
Derivative of Sigmoid Function
Sigmoid:
Derivative
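Carrying out the differentiation (again writing z for the sigmoid's input):

σ(z) = 1 / (1 + e^(−z))
σ'(z) = e^(−z) / (1 + e^(−z))² = σ(z) · (1 − σ(z))

This compact product form is what keeps the gradient of the cost function simple.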
Gradient Descent and Cost Function Derivatives
Starting from the cost function, and writing it for convenience in terms of the sigmoid output, we need to find its partial derivative with respect to each coefficient βj.
The derivative is worked out in two parts, A and B, which are then added together; the combined result is shown below.
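Combining the two parts gives the usual gradient of the cross-entropy cost:

∂J/∂βj = (1/m) · Σi ( h(xi) − yi ) · xij

where the sum runs over the m training examples, xij is the j-th feature of example i, and xi0 = 1 for the intercept (these index symbols are introduced here).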
Gradient Descent and Cost Function Derivatives
Hence, the update rule for β is:
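With a learning rate α (a symbol not used elsewhere in these slides), each gradient descent step is:

βj := βj − α · (1/m) · Σi ( h(xi) − yi ) · xij

applied simultaneously to every βj, until the cost stops decreasing.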
Maximum likelihood
● Iterative computing
○ Choice of an arbitrary value for the coefficients (usually 0)
○ Computing of log-likelihood
○ Variation of coefficients’ values
○ Reiteration until maximisation (plateau)
● Results
○ Maximum Likelihood Estimates (MLE) for β0 and β1
○ Estimates of P(y) for a given value of x
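The iterative procedure just described can be sketched in a few lines of Python with NumPy. The data, step size, and iteration count below are invented purely for illustration and are not from the slides; in practice a library routine would normally be used.

import numpy as np

# Toy data, made up for illustration: x is an age-like predictor, y a binary outcome.
x = np.array([25, 30, 35, 40, 45, 50, 55, 60, 65, 70], dtype=float)
y = np.array([0, 0, 0, 0, 1, 0, 1, 1, 1, 1], dtype=float)
x = (x - x.mean()) / x.std()       # standardise so one fixed step size behaves well

b0, b1 = 0.0, 0.0                  # arbitrary starting values for the coefficients (here 0)
step = 0.05                        # step size, chosen by hand for this toy example

for _ in range(5000):              # reiterate until the log-likelihood plateaus
    p = 1.0 / (1.0 + np.exp(-(b0 + b1 * x)))   # current estimate of P(y = 1 | x)
    b0 += step * np.sum(y - p)                 # gradient of the log-likelihood w.r.t. b0
    b1 += step * np.sum((y - p) * x)           # gradient of the log-likelihood w.r.t. b1

p = 1.0 / (1.0 + np.exp(-(b0 + b1 * x)))
log_lik = np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))
print(b0, b1, log_lik)             # maximum likelihood estimates and the log-likelihood at the plateau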
Multiple logistic regression
● More than one independent variable
○ Dichotomous, ordinal, nominal, continuous …
● Interpretation of βi
○ Increase in the log-odds for a one-unit increase in xi, with all the other xi held constant
○ Measures the association between xi and the log-odds, adjusted for all other xi (see the model written out below)
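In symbols, this simply extends the single-predictor logit to i predictors:

ln( π / (1 − π) ) = β0 + β1·x1 + β2·x2 + … + βi·xi

so exp(βi) is the factor by which the odds are multiplied for a one-unit increase in xi, holding the other predictors constant.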
Types of Logistic Regression
● Binary Logistic Regression: the dependent/target variable has two
distinct values
○ For example 0 or 1, malignant or benign, passed or failed, admitted or not admitted.
● Multinomial Logistic Regression: the target (dependent) variable has three or more possible values.
○ For example, the use of Chest X-ray images as features that give indication about one of
the three possible outcomes (No disease, Viral Pneumonia, COVID-19).
● Ordinal Logistic Regression: the target variable is of ordinal nature. In
this type, the categories are ordered in a meaningful manner and each
category has quantitative significance.
○ For example, the grades obtained on an exam have categories that have quantitative
significance and they are ordered. Keeping it simple, the grades can be A, B, or C.
○ “very poor”, “poor”, “good”, “very good”
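To make the binary and multinomial cases concrete, here is a minimal sketch using scikit-learn's LogisticRegression on synthetic data. The data, class meanings, and parameter values are assumptions made for this sketch, and the ordinal case is not covered here because it needs a separate library.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Binary case: two classes (e.g. benign = 0, malignant = 1). Data are synthetic.
X_bin = rng.normal(size=(200, 3))
y_bin = (X_bin[:, 0] + 0.5 * X_bin[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)
clf_bin = LogisticRegression().fit(X_bin, y_bin)
print(clf_bin.predict_proba(X_bin[:2]))      # columns: P(y = 0), P(y = 1)

# Multinomial case: three classes (e.g. no disease = 0, viral pneumonia = 1, COVID-19 = 2).
# Recent scikit-learn versions fit a multinomial model automatically when y has > 2 classes.
X_multi = rng.normal(size=(300, 3))
y_multi = (X_multi[:, 0] > 0).astype(int) + (X_multi[:, 1] > 0).astype(int)  # labels 0, 1, 2
clf_multi = LogisticRegression(max_iter=1000).fit(X_multi, y_multi)
print(clf_multi.predict_proba(X_multi[:2]))  # one column per class; each row sums to 1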
Examples
● Education sector:
○ Predicting whether a student gets admission into a university program or not, based on test scores and various other factors.
○ In e-learning platforms, predicting whether a student will complete a course on time or not, based on past activity and other statistics relevant to the problem.
● Business sector:
○ Predicting whether a credit card transaction made by a user is fraudulent or not.
● Medical sector:
○ Predicting whether a person has a disease or not, based on values obtained from test reports or other factors in general.
○ A very innovative application of Machine Learning being used by researchers is to predict
whether a person has COVID-19 or not using Chest X-ray images.
● Other applications:
○ Email Classification – Spam or not spam
○ Sentiment Analysis – Whether a person is sad or happy, based on a text message
○ Object Detection and Classification – Classifying an image as a cat image or a dog image