
Topic 6-10: Logistic Regression

Introduction - Classification Problems

• Classification problems are an important category of problems in analytics in which the response variable (Y) takes a discrete value.

• In classification problems, the primary objective is to predict the class of a customer (or the class probability) based on the values of explanatory variables or predictors.
Examples - Classification Problems

• A bank may like to classify its customers based on risk, such as low-, medium- and high-risk customers under a loan portfolio. Here the response variable Y takes 3 values (e.g., Y = 1 for low risk, Y = 2 for medium risk and Y = 3 for high risk).

• An organization may like to predict the customers who are likely to churn (here Y takes two values: Y = 1 for churn and Y = 0 for do not churn).

• Health service providers, based on diagnostic tests, may classify patients as positive, that is, presence of a disease (Y = 1), or negative, that is, absence of a disease (Y = 0).

• Customers who are likely to respond to a marketing campaign through phone calls/emails (Y = 1 will respond to the campaign; Y = 0 will not respond).

• The Human Resource Department of a firm may try to predict whether an applicant would accept a job offer (two categories: accept and do not accept).
Examples - Classification Problems

• Movie production houses may like to predict whether a movie will be a hit or not at the box office.

• Predict the outcome of any sporting event; for example, in the case of football the outcome will be Win, Draw or Loss.

• Many organizations such as banks, e-commerce and insurance companies have to deal with fraudulent transactions. They may like to predict whether a transaction is fraudulent or not.

• A few companies manipulate their accounts, so policy makers (or regulators) may like to predict whether a company is manipulating its accounts or not.

• Sentiment about a product or service in social media can be classified as positive, negative, or neutral, which enables an organization to understand sentiments about its product/service. Organizations may like to understand the reasons for negative sentiment, if it exists, and take corrective actions.
Categorical Response Variables

Examples:

• Whether or not a person smokes (binary response): Y = Smoker or Non-smoker

• Success of a medical treatment (binary response): Y = Survives or Dies

• Opinion poll responses (ordinal response): Y = Agree, Neutral or Disagree

Logistic Regression

• In many ways logistic regression is like ordinary regression. It requires a dependent variable, y, and one or more independent variables.

• Logistic regression can be used to model situations in which the dependent variable, y, may only assume two discrete values, such as 0 and 1.

• The ordinary multiple regression model is not applicable.
Why not linear regression

When the response variable has only 2 possible values, it is desirable to have a model that predicts the value either as 0 or 1, or as a probability score that ranges between 0 and 1. Linear regression does not have this capability: if you use linear regression to model a binary response variable, the resulting model may not restrict the predicted Y values to the interval between 0 and 1.
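A minimal sketch of this point in R, using hypothetical simulated data (the variable names and simulation are assumptions, not from the source): an ordinary least squares fit to a 0/1 response can produce predictions below 0 or above 1, while a logistic fit cannot.

# Fit both models to the same synthetic binary outcome
set.seed(1)
x <- seq(1, 20, by = 1)
y <- as.numeric(x + rnorm(20, sd = 3) > 10)   # hypothetical 0/1 response

lm_fit  <- lm(y ~ x)                          # ordinary least squares
glm_fit <- glm(y ~ x, family = binomial)      # binary logistic regression

new_x <- data.frame(x = c(0, 25))             # values outside the observed range
predict(lm_fit, new_x)                        # can fall below 0 or above 1
predict(glm_fit, new_x, type = "response")    # always stays within (0, 1)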
INTRODUCTION TO BINARY LOGISTIC REGRESSION

Assume that the value of Y is either 1 (conventionally known as the positive outcome) or 0 (conventionally known as the negative outcome). When there are more than two values of Y, a multinomial logistic regression model is used.

The binary logistic regression model is given by

π(Z) = P(Y = 1) = e^Z / (1 + e^Z)    (Eq. 11.1)

where Z = β0 + β1X1 + β2X2 + … + βmXm, and X1, X2, …, Xm are the independent variables.

Logistic function

The objective of classification problems is to predict the class probability, that is, the probability that an observation will belong to a particular class. The logistic function is a probability function and has an S-shaped curve, as shown in the figure (not reproduced here).
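Since the original figure did not survive extraction, here is a one-line R sketch that reproduces the S-shaped curve; plogis() is R's built-in logistic function, identical to π(Z) = e^Z / (1 + e^Z) above.

curve(plogis(x), from = -6, to = 6,
      xlab = "Z", ylab = expression(pi(Z)),
      main = "Logistic function")
abline(h = c(0, 1), lty = 2)   # the curve is bounded between 0 and 1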
Logistic Regression

The logistic regression function defined in Eq. (11.1) can be transformed as follows:

π / (1 − π) = e^Z

which can be written as:

ln(π / (1 − π)) = Z = β0 + β1X1 + … + βmXm

This equation is known as the logit (logistic probability unit) function, and π / (1 − π) is the odds of the outcome Y = 1.
Logit function

The logit function is similar to a multiple linear regression model. Such models are called generalized linear models (GLM); in a GLM the errors do not follow a normal distribution, and there exists a transformation of the outcome variable that is a linear function of the predictors. For example, consider a regression equation in which the response variable Y takes only two values (0 or 1):

Yi = β0 + β1X1i + εi

Since Yi is either 0 or 1, the error εi can take only two values: −(β0 + β1X1i) when Yi = 0, or 1 − (β0 + β1X1i) when Yi = 1. Thus, for a given value of X1i, the error can take only two values and will not follow a normal distribution.
ESTIMATION OF PARAMETERS IN LOGISTIC REGRESSION

• One of the major assumptions of the multiple linear regression model is that the residuals follow a normal distribution. However, the residuals in logistic regression will not follow a normal distribution, and thus we cannot use the method of ordinary least squares (OLS).

• Regression parameters in the case of logistic regression are estimated using the Maximum Likelihood Estimator (MLE).

• In binary logistic regression, the response variable Y takes only two values (Y = 0 and Y = 1). Let πi = P(Yi = 1) denote the probability of a positive outcome for observation i.
ESTIMATION OF PARAMETERS IN LOGISTIC REGRESSION

The probability (likelihood) function of binary logistic regression for a specific observation Yi (Yi = 0 or 1) is given by

P(Yi) = πi^Yi × (1 − πi)^(1 − Yi)

so the log-likelihood of the sample is

ln L = Σ [ Yi ln(πi) + (1 − Yi) ln(1 − πi) ]

Setting the partial derivatives of ln L with respect to β0 and β1 to zero gives the likelihood equations, Eqs. (11.12) and (11.13). Solving Eqs. (11.12) and (11.13) will yield the estimated values of β0 and β1.
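A minimal sketch of this estimation in R, on simulated data (the variable names and the simulated model are assumptions): the negative of the log-likelihood above is minimized numerically with optim(), and the result is compared against glm(), which performs the same maximum likelihood estimation via iteratively reweighted least squares.

# Simulate from a known logistic model, then recover the parameters by MLE
set.seed(2)
x <- rnorm(100)
y <- rbinom(100, 1, plogis(-0.5 + 1.2 * x))   # hypothetical data

neg_log_lik <- function(beta) {
  pi_i <- plogis(beta[1] + beta[2] * x)       # pi_i = P(Y_i = 1)
  -sum(y * log(pi_i) + (1 - y) * log(1 - pi_i))
}

optim(c(0, 0), neg_log_lik)$par               # numerical MLE of (b0, b1)
coef(glm(y ~ x, family = binomial))           # should closely agree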
Example: Space Shuttle Challenger Data

The space shuttle orbiter Challenger (Mission STS-51-L) was the 25th shuttle, launched by NASA on January 28, 1986 (Smith, 1986; Feynman, 1988). The Challenger crashed 73 seconds into its flight due to the erosion of O-rings, which were part of the solid rocket boosters of the shuttle. Before the launch, the engineers at NASA were concerned about the outside temperature, which was very low (the actual launch occurred at 36°F). The data in the table show the O-ring erosion and the launch temperature of the previous shuttle launches, where 'damage to O-ring = 1' implies there was damage to the O-ring and 'damage to O-ring = 0' implies there was no damage to the O-ring during that launch. In this case, the outcome is binary: either there is damage to the O-ring or there is not. We can develop a logistic regression model to predict the probability of erosion of the O-ring based on the launch temperature.
Dataset

(The launch-by-launch data table is not reproduced here.)

R-Code and Analysis

Using the fitted logistic regression coefficients, the probability of damage to the O-ring as a function of launch temperature is given by:

P(damage) = e^(β0 + β1 × Temp) / (1 + e^(β0 + β1 × Temp))

where β0 and β1 are the estimates from the R output (not reproduced here).
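A sketch of the analysis in R, assuming the launch data are in a data frame named challenger with columns temp (launch temperature, °F) and damage (1 = O-ring damage, 0 = no damage); the data frame and column names are assumptions, not from the source.

# Fit the binary logistic regression of O-ring damage on temperature
fit <- glm(damage ~ temp, data = challenger, family = binomial)
summary(fit)                                  # coefficients and z-tests

# Predicted probability of damage across a range of temperatures,
# including 36 F, the actual Challenger launch temperature
temps <- data.frame(temp = seq(30, 85, by = 1))
probs <- predict(fit, newdata = temps, type = "response")
plot(temps$temp, probs, type = "l",
     xlab = "Launch temperature (F)", ylab = "P(O-ring damage)")

This produces the plot discussed on the next slide.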
Plot

The probability of damage decreases as the launch temperature increases. However, we have to test the validity of the model using diagnostic tests before the model can be accepted.
INTERPRETATION OF LOGISTIC REGRESSION PARAMETERS

Interpretation of logistic regression parameters is not as simple as in the case of linear regression. Consider the logit function defined earlier:

ln(π / (1 − π)) = β0 + β1X1 + … + βmXm

A unit increase in Xk increases the log odds by βk; equivalently, it multiplies the odds π / (1 − π) by e^βk, which is why logistic regression coefficients are usually interpreted through odds ratios.
Diagnosing the Model

Hosmer-Lemeshow goodness of fit test

The Hosmer-Lemeshow (HL) test is a chi-square goodness of fit test. The HL test is constructed by dividing the data set into 10 groups (deciles). The HL test checks whether the observed and expected frequencies in each group are equal. The null and alternative hypotheses in the HL test are:

H0: The logistic regression model fits the data
H1: The logistic regression model does not fit the data

Conclusion: Since the null hypothesis is that the logistic regression model is a good fit for the data, and the p-value = 0.1411 is not significant, we claim that the logistic regression model is appropriate.
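As a sketch, the HL test can be run in R with the ResourceSelection package (one of several packages implementing the test; the choice is an assumption), where fit is the logistic model object from before:

library(ResourceSelection)
hoslem.test(fit$y, fitted(fit), g = 10)   # g = 10 gives the decile grouping
# A p-value above 0.05 (e.g., 0.1411 here) means we fail to reject H0:
# the model fits the data adequately.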
Example 2: Simmons Stores

Simmons' catalogs are expensive, and Simmons would like to send them only to those customers who have the highest probability of making a $200 purchase using the discount coupon included in the catalog. Simmons' management thinks that annual spending at Simmons Stores and whether a customer has a Simmons credit card are two variables that might be helpful in predicting whether a customer who receives the catalog will use the coupon to make a $200 purchase.
Logistic Regression

• Example: Simmons Stores

Simmons conducted a study by sending out 100 catalogs: 50 to customers who have a Simmons credit card and 50 to customers who do not have the card. At the end of the test period, Simmons noted for each of the 100 customers:
1) the amount the customer spent last year at Simmons,
2) whether the customer had a Simmons credit card, and
3) whether the customer made a $200 purchase.
A portion of the test data is shown on the next slide.
Logistic Regression

• Simmons Test Data (partial)

Customer   Annual Spending ($1000) (x1)   Simmons Credit Card (x2)   $200 Purchase (y)
1          2.291                          1                          0
2          3.215                          1                          0
3          2.135                          1                          0
4          3.924                          0                          0
5          2.528                          1                          0
6          2.473                          0                          1
7          2.384                          0                          0
8          7.076                          0                          0
9          1.182                          1                          1
10         3.345                          0                          0
Logistic Regression

• Simmons Logistic Regression Table

                                            Odds     95% CI
Predictor   Coef      SE Coef   Z       p       Ratio    Lower   Upper
Constant    -2.1464   0.5772    -3.72   0.000
Spending     0.3416   0.1287     2.66   0.008   1.41     1.09    1.81
Card         1.0987   0.4447     2.47   0.013   3.00     1.25    7.17

Log-Likelihood = -60.487
Test that all slopes are zero: G = 13.628, DF = 2, P-Value = 0.001
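A sketch of how a table like this is produced in R, assuming the full 100-customer dataset is in a data frame named simmons with columns spending (in $1000s), card (0/1) and purchase (0/1); the data frame and column names are assumptions.

fit <- glm(purchase ~ spending + card, data = simmons, family = binomial)
summary(fit)              # coefficients, standard errors, z values, p-values
exp(coef(fit))            # odds ratios (about 1.41 for Spending, 3.00 for Card)
exp(confint.default(fit)) # Wald 95% confidence intervals for the odds ratios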
Logistic Regression

• Simmons Estimated Logistic Regression Equation

ŷ = e^(-2.1464 + 0.3416x1 + 1.0987x2) / (1 + e^(-2.1464 + 0.3416x1 + 1.0987x2))
Logistic Regression

• Using the Estimated Logistic Regression Equation

For customers that spend $2000 annually and do not have a Simmons credit card:

ŷ = e^(-2.1464 + 0.3416(2) + 1.0987(0)) / (1 + e^(-2.1464 + 0.3416(2) + 1.0987(0))) = 0.1880

For customers that spend $2000 annually and do have a Simmons credit card:

ŷ = e^(-2.1464 + 0.3416(2) + 1.0987(1)) / (1 + e^(-2.1464 + 0.3416(2) + 1.0987(1))) = 0.4099
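The two probabilities above can be reproduced with plogis(), R's built-in logistic function; spending is in $1000s, so $2000 annual spending enters as x1 = 2.

plogis(-2.1464 + 0.3416 * 2 + 1.0987 * 0)   # no credit card: about 0.1880
plogis(-2.1464 + 0.3416 * 2 + 1.0987 * 1)   # credit card:    about 0.4099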
Logistic Regression

With logistic regression it is difficult to interpret the relationship between the variables because the equation is not linear, so we use a concept called the odds ratio. The odds in favor of an event occurring is defined as the probability the event will occur divided by the probability the event will not occur.

• Odds in Favor of an Event Occurring

odds = P(y = 1 | x1, x2, …, xp) / P(y = 0 | x1, x2, …, xp) = P(y = 1 | x1, x2, …, xp) / (1 − P(y = 1 | x1, x2, …, xp))

• Odds Ratio

Odds Ratio = odds1 / odds0
Logistic Regression

• Estimated Probabilities

                        Annual Spending
              $1000   $2000   $3000   $4000   $5000   $6000   $7000
Credit  Yes   0.3305  0.4099  0.4943  0.5790  0.6593  0.7314  0.7931
Card    No    0.1413  0.1880  0.2457  0.3143  0.3921  0.4758  0.5609

(The $2000 values, 0.4099 and 0.1880, were computed earlier.)
Logistic Regression

• Comparing Odds

Suppose we want to compare the odds of making a $200 purchase for customers who spend $2000 annually and have a Simmons credit card to the odds of making a $200 purchase for customers who spend $2000 annually and do not have a Simmons credit card.

estimate of odds1 = 0.4099 / (1 − 0.4099) = 0.6946
estimate of odds0 = 0.1880 / (1 − 0.1880) = 0.2315
estimate of odds ratio = 0.6946 / 0.2315 = 3.00
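The same calculation in R, reproducing the figures above; the last line shows the shortcut that the odds ratio is simply exp() of the Card coefficient from the regression table.

p1 <- 0.4099; p0 <- 0.1880
odds1 <- p1 / (1 - p1)   # about 0.6946 (with card)
odds0 <- p0 / (1 - p0)   # about 0.2315 (without card)
odds1 / odds0            # about 3.00
exp(1.0987)              # also about 3.00, from the Card coefficient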
Example 3 - Donner Party

In 1846 the Donner and Reed families left Springfield, Illinois, for California by covered wagon. In July, the Donner Party, as it became known, reached Fort Bridger, Wyoming. There its leaders decided to attempt a new and untested route to the Sacramento Valley. Having reached its full size of 87 people and 20 wagons, the party was delayed by a difficult crossing of the Wasatch Range and again in the crossing of the desert west of the Great Salt Lake. The group became stranded in the eastern Sierra Nevada mountains when the region was hit by heavy snows in late October. By the time the last survivor was rescued on April 21, 1847, 40 of the 87 members had died from famine and exposure to extreme cold.

From Ramsey, F.L. and Schafer, D.W. (2002). The Statistical Sleuth: A Course in Methods of Data Analysis (2nd ed).
Example - Donner Party - Data

     Age     Sex      Status
1    23.00   Male     Died
2    40.00   Female   Survived
3    40.00   Male     Survived
4    30.00   Male     Died
5    28.00   Male     Died
...
43   23.00   Male     Survived
44   24.00   ...      ...
Example - Donner Party - EDA

Status vs. Gender:

           Male   Female
Died        20      5
Survived    10     10

Questions of primary interest are:

1. What is the relationship between survival and sex? Are males more or less likely to survive than females?
2. What is the relationship between survival and age? Can the probability of survival be predicted as a function of age?
3. Does age affect the survival rate of males and females differently?
Example - Donner Party - EDA

Status vs. Age: (figure: age distributions for those who died vs. those who survived; not reproduced)
Example - Donner Party - ???

It seems clear that both age and gender have an effect on someone's survival; how do we come up with a model that will let us explore this relationship?

Even if we set Died to 0 and Survived to 1, this isn't something we can transform our way out of - we need something more.

One way to think about the problem - we can treat Survived and Died as successes and failures arising from a binomial distribution where the probability of a success is given by a transformation of a linear model of the predictors.
Example - Donner Party - Model

In R we fit a GLM in the same way as a linear model, except using glm instead of lm, and we must also specify the type of GLM to fit using the family argument.

summary(glm(Status ~ Age, data=donner, family=binomial))

## Call:
## glm(formula = Status ~ Age, family = binomial, data = donner)
##
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)  1.81852    0.99937   1.820   0.0688 .
## Age         -0.06647    0.03222  -2.063   0.0391 *
##
## Null deviance: 61.827 on 44 degrees of freedom
## Residual deviance: 56.291 on 43 degrees of freedom
## AIC: 60.291
##
## Number of Fisher Scoring iterations: 4
Example - Donner Party - Prediction

            Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)  1.8185    0.9994      1.82     0.0688
Age         -0.0665    0.0322     -2.06     0.0391

Model:

log(p / (1 − p)) = 1.8185 − 0.0665 × Age

Odds / probability of survival for a newborn (Age = 0):

log(p / (1 − p)) = 1.8185 − 0.0665 × 0
p / (1 − p) = exp(1.8185) = 6.16
p = 6.16 / 7.16 = 0.86
Example - Donner Party - Prediction (cont.)

Model:

log(p / (1 − p)) = 1.8185 − 0.0665 × Age

Odds / probability of survival for a 25 year old:

log(p / (1 − p)) = 1.8185 − 0.0665 × 25
p / (1 − p) = exp(0.156) = 1.17
p = 1.17 / 2.17 = 0.539

Odds / probability of survival for a 50 year old:

log(p / (1 − p)) = 1.8185 − 0.0665 × 50
p / (1 − p) = exp(−1.5065) = 0.222
p = 0.222 / 1.222 = 0.181
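Assuming the fitted model object from the earlier slide is stored as fit <- glm(Status ~ Age, data = donner, family = binomial), with "Survived" as the modeled event, the same predictions can be obtained with predict():

newdata <- data.frame(Age = c(0, 25, 50))
predict(fit, newdata)                      # log odds: 1.82, 0.156, -1.51
predict(fit, newdata, type = "response")   # probabilities: 0.86, 0.54, 0.18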
Example - Donner Party - Prediction (cont.)

(figure: the fitted curve log(p / (1 − p)) = 1.8185 − 0.0665 × Age, plotted as survival probability over Age 0-80; not reproduced)
Example - Donner Party - Interpretation

            Estimate  Std. Error  z value  Pr(>|z|)
(Intercept)  1.8185    0.9994      1.82     0.0688
Age         -0.0665    0.0322     -2.06     0.0391

• Simple interpretation is only possible in terms of log odds and log odds ratios for the intercept and slope terms.

• Intercept: the log odds of survival for a party member with an age of 0. From this we can calculate the odds or probability, but additional calculations are necessary.

• Slope: for a unit increase in age (being 1 year older), how much the log odds ratio changes; not particularly intuitive. More often than not we care only about sign and relative magnitude.
Example - Donner Party - Interpretation - Slope

log(p1 / (1 − p1)) = 1.8185 − 0.0665(x + 1)
                   = 1.8185 − 0.0665x − 0.0665

log(p2 / (1 − p2)) = 1.8185 − 0.0665x

Subtracting the second equation from the first:

log(p1 / (1 − p1)) − log(p2 / (1 − p2)) = −0.0665

log[ (p1 / (1 − p1)) / (p2 / (1 − p2)) ] = −0.0665

(p1 / (1 − p1)) / (p2 / (1 − p2)) = exp(−0.0665) = 0.94
Example - Donner Party - Age and Gender

summary(glm(Status ~ Age + Sex, data=donner, family=binomial))

## Call:
## glm(formula = Status ~ Age + Sex, family = binomial, data = donner)
##
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept)  1.63312    1.11018   1.471   0.1413
## Age         -0.07820    0.03728  -2.097   0.0359 *
## SexFemale    1.59729    0.75547   2.114   0.0345 *
## ---
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 61.827 on 44 degrees of freedom
## Residual deviance: 51.256 on 42 degrees of freedom
## AIC: 57.256
##
## Number of Fisher Scoring iterations: 4

Gender slope: When the other predictors are held constant, this is the log odds ratio between the given level (Female) and the reference level (Male).
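Because the SexFemale coefficient is a log odds ratio, exponentiating it gives the survival odds ratio for females relative to males at any fixed age:

exp(1.59729)   # about 4.94: at a given age, females have almost 5 times
               # the odds of survival of males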
Example - Donner Party - Gender Models

Just like MLR, we can plug in gender to arrive at two status vs. age models, for men and women respectively.

General model:

log(p / (1 − p)) = 1.63312 − 0.07820 × Age + 1.59729 × Sex

Male model (Sex = 0):

log(p / (1 − p)) = 1.63312 − 0.07820 × Age

Female model (Sex = 1):

log(p / (1 − p)) = 1.63312 − 0.07820 × Age + 1.59729
                 = 3.23041 − 0.07820 × Age
Example - Donner Party - Gender Models (cont.)

(figure: fitted survival probability vs. age (0-80), with separate curves for males and females; the female curve lies above the male curve at every age)
LOGISTIC REGRESSION MODEL DIAGNOSTICS

1. Omnibus test: Checks whether the explained variance in the model is significantly higher than the unexplained variance. For example, in the MLR model, the F-test is an omnibus test.
2. Wald's test: Used for checking whether an individual explanatory variable is statistically significant. Wald's test is a chi-square test.
3. Hosmer-Lemeshow test: A chi-square goodness of fit test for binary logistic regression.
4. Pseudo R2: A measure of goodness of fit of the model. It is called pseudo R2 because it does not have the same interpretation as R2 in the MLR model.
Omnibus Test (Likelihood Ratio Test)

The omnibus test for logistic regression is a likelihood ratio test: it compares the fitted model against the intercept-only (null) model to check whether the predictors, taken together, significantly improve the fit. In the Simmons output above, this is the line "Test that all slopes are zero: G = 13.628, DF = 2, P-Value = 0.001".
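A sketch of the omnibus test in R (assuming a fitted glm object named fit): the G statistic equals the drop in deviance from the null model to the fitted model.

null_fit <- update(fit, . ~ 1)              # intercept-only model
anova(null_fit, fit, test = "Chisq")        # likelihood ratio (omnibus) test
# Equivalently, G = null deviance - residual deviance, with df equal to the
# number of slope parameters; a small p-value means at least one slope is
# non-zero (e.g., G = 13.628, DF = 2, p = 0.001 in the Simmons output).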
