Logistic Regression
Eq.11.1
Where,
Thus, for a given value of $X_{1i}$, the error can take only two
values, as given in the equation above, and therefore cannot follow
a normal distribution.
ESTIMATION OF PARAMETERS IN LOGISTIC REGRESSION
Solving Eqs. (11.12) and (11.13) yields the estimated values of
$\beta_0$ and $\beta_1$.
Example 1: Space Shuttle Challenger Data
Space shuttle orbiter Challenger (Mission STS-51-L) was the
25th shuttle mission launched by NASA, on January 28, 1986
(Smith, 1986; Feynman, 1988). The Challenger broke apart 73
seconds into its flight because of the erosion of O-rings, which
were part of the shuttle's solid rocket boosters. Before the
launch, the engineers at NASA were concerned about the outside
temperature, which was very low (the actual launch occurred at
36°F). The data in the table show the O-ring erosion and the
launch temperature for the previous shuttle launches, where
‘damage to O-ring = 1’ means the O-ring was damaged during that
launch and ‘damage to O-ring = 0’ means it was not. The outcome
is therefore binary: either the O-ring is damaged or it is not.
We can develop a logistic regression model to predict the
probability of O-ring erosion from the launch temperature.
Dataset
R-Code and Analysis
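The null hypothesis of the goodness-of-fit test is that the logistic regression
model fits the data well. Since the p-value (0.1411) is larger than the usual
0.05 significance level, we do not reject this hypothesis, so there is no
evidence that the logistic regression model is inappropriate.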
Conclusion
Example 2: Simmons Stores
Simmons’ catalogs are expensive and Simmons would like to
send them to only those customers who have the highest
probability of making a $200 purchase using the discount
coupon included in the catalog.
Simmons’ management thinks that annual spending at Simmons
Stores and whether a customer has a Simmons credit card are
two variables that might be helpful in predicting whether a
customer who receives the catalog will use the coupon to make a
$200 purchase.
Predictor   Coef   SE Coef   Z   p   Odds Ratio   95% CI Lower   95% CI Upper
Log-Likelihood = -60.487
Test that all slopes are zero: G = 13.628, DF = 2, P-Value = 0.001
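The reported p-value can be checked by hand. The following is a minimal sketch in R, assuming that G is compared with a chi-square distribution with DF degrees of freedom (the usual likelihood-ratio setup). The variable names are illustrative and not part of the original output.

```r
# Values copied from the output above
G  <- 13.628   # test statistic for "all slopes are zero"
df <- 2        # number of slope coefficients tested

# Upper-tail chi-square probability
p_value <- pchisq(G, df = df, lower.tail = FALSE)
print(round(p_value, 3))
```

A small p-value here indicates that at least one of the two predictors helps explain the outcome.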
Odds Ratio
$$\text{Odds Ratio} = \frac{\text{odds}_1}{\text{odds}_0}$$
Estimated Probabilities
[Table: estimated probabilities for annual spending of $1000 to $7000]
Comparing Odds
Suppose we want to compare the odds of making a
$200 purchase for customers who spend $2000 annually
and have a Simmons credit card to the odds of making a
$200 purchase for customers who spend $2000 annually
and do not have a Simmons credit card.
$$\text{Estimate of odds}_1 = \frac{0.4099}{1 - 0.4099} = 0.6946$$
$$\text{Estimate of odds}_0 = \frac{0.1880}{1 - 0.1880} = 0.2315$$
$$\text{Estimate of odds ratio} = \frac{0.6946}{0.2315} = 3.00$$
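The same arithmetic can be reproduced in a few lines of R. This is a sketch only: it reads odds 1 as the credit card holders and odds 0 as the customers without a card, and the function and variable names are made up for illustration.

```r
# Convert a probability into odds
odds <- function(p) p / (1 - p)

p_card    <- 0.4099  # estimated probability: $2000 spending, has card
p_no_card <- 0.1880  # estimated probability: $2000 spending, no card

odds_1 <- odds(p_card)
odds_0 <- odds(p_no_card)
odds_ratio <- odds_1 / odds_0

print(round(c(odds_1 = odds_1, odds_0 = odds_0, odds_ratio = odds_ratio), 4))
```

An odds ratio of about 3 means the estimated odds of a $200 purchase are roughly three times higher for card holders at this spending level.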
Example 3 - Donner Party
In 1846 the Donner and Reed families left Springfield, Illinois,
for California by covered wagon. In July, the Donner Party, as
it became known, reached Fort Bridger, Wyoming. There its
leaders decided to attempt a new and untested route to the
Sacramento Valley. Having reached its full size of 87 people
and 20 wagons, the party was delayed by a difficult crossing of
the Wasatch Range and again in the crossing of the desert
west of the Great Salt Lake. The group became stranded in the
eastern Sierra Nevada mountains when the region was hit by
heavy snows in late October. By the time the last survivor was
rescued on April 21, 1847, 40 of the 87 members had died
from famine and exposure to extreme cold.
From Ramsey, F.L. and Schafer, D.W. (2002). The Statistical Sleuth: A Course in Methods of Data Analysis (2nd ed)
Lec 20 April 15, 2013 5 / 30
Example - Donner Party - Data
[Table: Donner Party data (variables Age, Sex, Status)]
Example - Donner Party - EDA
Status      Male   Female
Died        20     5
Survived    10     10
[Figure: comparison of the Died and Survived groups]
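The counts in the table are enough to compare survival by gender before fitting any model. Below is a minimal R sketch. The object names are illustrative, and the odds ratio it prints is a raw one that ignores age.

```r
# Counts from the table above (filled column by column)
counts <- matrix(c(20, 10, 5, 10), nrow = 2,
                 dimnames = list(Status = c("Died", "Survived"),
                                 Sex = c("Male", "Female")))

# Proportion surviving within each sex
p_survived <- counts["Survived", ] / colSums(counts)

# Odds of survival within each sex, and the female-to-male odds ratio
odds_survived <- counts["Survived", ] / counts["Died", ]
odds_ratio <- odds_survived[["Female"]] / odds_survived[["Male"]]

print(round(p_survived, 3))
print(odds_survived)
print(odds_ratio)
```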
Example - Donner Party - ???
It seems clear that both age and gender have an effect on
someone’s survival. How do we come up with a model that will let
us explore this relationship?
One way to think about the problem: we can treat Survived and Died
as successes and failures arising from a binomial distribution, where the
probability of a success is given by a transformation of a linear model
of the predictors.
Example - Donner Party - Model
In R we fit a GLM in the same way as a linear model, except that we use glm
instead of lm and must also specify the type of GLM to fit with the
family argument.
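To make the comparison between lm and glm concrete, here is a minimal sketch on simulated data. Nothing in it is the Donner data: the data frame sim, its columns, and the coefficients used to generate it are all assumptions chosen for illustration.

```r
set.seed(1)  # reproducible simulated data (NOT the Donner data)

n   <- 200
age <- runif(n, min = 15, max = 65)

# Simulate a binary outcome whose log odds are linear in age;
# the values 1.8 and -0.07 are arbitrary illustration values
p        <- plogis(1.8 - 0.07 * age)
survived <- rbinom(n, size = 1, prob = p)
sim      <- data.frame(age = age, survived = survived)

# Same formula interface for both calls;
# glm() additionally needs the family argument
fit_lm  <- lm(survived ~ age, data = sim)
fit_glm <- glm(survived ~ age, data = sim, family = binomial)

print(coef(fit_glm))
```

With family = binomial, glm uses the logit link by default, so the coefficients are on the log-odds scale.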
Model:
$$\log\left(\frac{p}{1-p}\right) = 1.8185 - 0.0665 \times \text{Age}$$
Example - Donner Party - Prediction
Model:
$$\log\left(\frac{p}{1-p}\right) = 1.8185 - 0.0665 \times \text{Age}$$
Odds / Probability of survival for a 50 year old:
$$\log\left(\frac{p}{1-p}\right) = 1.8185 - 0.0665 \times 50 = -1.5065$$
$$\frac{p}{1-p} = \exp(-1.5065) = 0.222$$
$$p = 0.222/1.222 = 0.181$$
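The prediction for a 50 year old can be reproduced step by step. This is a small R sketch using the coefficients of the model above; the variable names are illustrative.

```r
# Coefficients of the fitted model
b0 <- 1.8185
b1 <- -0.0665

age      <- 50
log_odds <- b0 + b1 * age      # linear predictor (log odds)
odds     <- exp(log_odds)      # odds of survival
p        <- odds / (1 + odds)  # probability of survival

print(round(c(log_odds = log_odds, odds = odds, p = p), 4))

# plogis() is the inverse logit and gives the probability directly
print(plogis(log_odds))
```

Changing age to another value gives the corresponding prediction for that age.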
Example - Donner Party - Prediction (cont.)
$$\log\left(\frac{p}{1-p}\right) = 1.8185 - 0.0665 \times \text{Age}$$
[Figure: fitted model plotted against Age, 0 to 80]
Example - Donner Party - Interpretation
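$$\log\left(\frac{p_1}{1-p_1}\right) = 1.8185 - 0.0665(x + 1) = 1.8185 - 0.0665x - 0.0665$$
$$\log\left(\frac{p_2}{1-p_2}\right) = 1.8185 - 0.0665x$$
$$\log\left(\frac{p_1}{1-p_1}\right) - \log\left(\frac{p_2}{1-p_2}\right) = -0.0665$$
$$\log\left(\frac{p_1}{1-p_1} \bigg/ \frac{p_2}{1-p_2}\right) = -0.0665$$
$$\frac{p_1}{1-p_1} \bigg/ \frac{p_2}{1-p_2} = \exp(-0.0665) = 0.94$$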
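The derivation shows that the odds ratio for a one-year increase in age does not depend on the starting age x. A short R sketch can check this numerically. The helper odds_at and the chosen ages are illustrative assumptions.

```r
b0 <- 1.8185
b1 <- -0.0665

# Odds of survival at a given age under the fitted model
odds_at <- function(age) exp(b0 + b1 * age)

# Odds ratio for a one-year increase, checked at several starting ages
ages   <- c(10, 30, 50)
ratios <- odds_at(ages + 1) / odds_at(ages)
print(round(ratios, 4))

# A ten-year increase multiplies the odds by exp(10 * b1)
print(round(exp(10 * b1), 3))
```

Every ratio equals exp(b1), so each additional year of age multiplies the odds of survival by about 0.94.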
Example - Donner Party - Age and Gender
summary(glm(Status ~ Age + Sex, data = donner, family = binomial))
## Call:
## glm(formula = Status ~ Age + Sex, family = binomial, data = donner)
##
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)
Gender slope: When the other predictors are held constant this is the
log odds ratio between the given level (Female) and the reference level
(Male).
Example - Donner Party - Gender Models
Just as in multiple linear regression (MLR), we can plug in each value of
Sex to obtain two status versus age models, one for men and one for women.
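General model:
$$\log\left(\frac{p_1}{1-p_1}\right) = 1.63312 - 0.07820 \times \text{Age} + 1.59729 \times \text{Sex}$$
Male model:
$$\log\left(\frac{p_1}{1-p_1}\right) = 1.63312 - 0.07820 \times \text{Age} + 1.59729 \times 0 = 1.63312 - 0.07820 \times \text{Age}$$
Female model:
$$\log\left(\frac{p_1}{1-p_1}\right) = 1.63312 - 0.07820 \times \text{Age} + 1.59729 \times 1 = 3.23041 - 0.07820 \times \text{Age}$$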
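The two models can be turned into predicted survival probabilities, and the gender slope into an odds ratio. The R sketch below uses the coefficients of the general model with Sex coded 0 for Male and 1 for Female, as in the substitutions above. The function p_survive and the chosen ages are illustrative assumptions.

```r
# Coefficients of the general model (Sex: 0 = Male, 1 = Female)
b0    <- 1.63312
b_age <- -0.07820
b_sex <- 1.59729

# Predicted probability of survival for a given age and sex indicator
p_survive <- function(age, female) plogis(b0 + b_age * age + b_sex * female)

ages <- c(20, 30, 40, 50)
pred <- data.frame(age    = ages,
                   male   = p_survive(ages, female = 0),
                   female = p_survive(ages, female = 1))
print(round(pred, 3))

# Odds ratio, Female vs. Male, at any fixed age
print(round(exp(b_sex), 2))
```

Because the Sex term only shifts the intercept, the female curve lies above the male curve at every age, and the odds ratio between the two is the same at every age.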
Example - Donner Party - Gender Models (cont.)
[Figure: fitted male and female models plotted against Age, 0 to 80]
LOGISTIC REGRESSION MODEL DIAGNOSTICS