Logistic Regression
Ŷi = b0 + b1X1i + b2X2i + … + bkXki
• Interpretation of the Slopes (referred to as Net
Regression Coefficients):
– b1 = the change in the mean of Y per unit change in X1,
taking into account the effect of X2 (or net of X2)
– b0 = the Y intercept, interpreted the same as in simple regression.
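A minimal sketch of these interpretations (the data here are synthetic and noise-free, invented for illustration, so OLS recovers the coefficients exactly):

```python
import numpy as np

# Synthetic, noise-free data generated from Y = 2 + 3*X1 - 1*X2
rng = np.random.default_rng(0)
X1 = rng.normal(size=100)
X2 = rng.normal(size=100)
Y = 2 + 3 * X1 - 1 * X2

# Design matrix with a leading column of ones for the intercept b0
X = np.column_stack([np.ones_like(X1), X1, X2])
b0, b1, b2 = np.linalg.lstsq(X, Y, rcond=None)[0]

# b1: change in the mean of Y per unit change in X1, net of X2
# b0: Y intercept, as in simple regression
print(round(b0, 3), round(b1, 3), round(b2, 3))  # 2.0 3.0 -1.0
```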
Binary Logistic Regression
• Binary logistic regression is a type of
regression analysis where the dependent
variable is a dummy variable (coded 0, 1)
• Why not just use ordinary least squares?
Ŷ = a + bX
– You would typically get the correct answers in
terms of the sign and significance of
coefficients
– However, there are three problems
Binary Logistic Regression
OLS on a dichotomous dependent variable:
Yes = 1
No = 0
[Figure: scatterplot of Y = Support Privatizing Social Security (0, 1) against X = Income, with a fitted OLS line]
Binary Logistic Regression
– However, there are three problems:
1. The error terms are heteroskedastic (the variance of
the dependent variable differs across values of the
independent variables)
2. The error terms are not normally distributed
3. Most importantly for interpretation, the predicted
probabilities can be greater than 1 or less than 0,
which is a problem for subsequent analysis.
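Problem 3 is easy to see numerically. A quick sketch (with made-up income/support data, not the GSS) fits OLS to a 0/1 outcome and shows the predicted "probabilities" escaping [0, 1]:

```python
import numpy as np

# Made-up income/support data (illustration only):
income = np.array([10., 20., 30., 40., 50., 60., 70., 80.])
support = np.array([0., 0., 0., 1., 0., 1., 1., 1.])  # Yes = 1, No = 0

# OLS fit: Y-hat = a + b*income (np.polyfit returns slope, intercept)
b, a = np.polyfit(income, support, 1)

# The sign of b is sensible: support rises with income.
print(b > 0)  # True

# But the predicted "probabilities" escape [0, 1] -- problem 3:
print(a + b * 10)   # below 0, even within the observed income range
print(a + b * 200)  # well above 1 for a large income
```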
Binary Logistic Regression
• The “logit” model solves these problems:
– ln[p/(1-p)] = a + BX
or
– p/(1-p) = e^(a + BX)
– p/(1-p) = e^a (e^B)^X
Where:
“ln” is the natural logarithm, log base e, where e = 2.71828
“p” is the probability that Y = 1 for a case, p(Y=1)
“1-p” is the probability that Y = 0 for a case, 1 - p(Y=1)
“p/(1-p)” is the odds
ln[p/(1-p)] is the log odds, or “logit”
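The definitions above can be sketched for a hypothetical probability p:

```python
import math

p = 0.75                      # p(Y=1), chosen for illustration
odds = p / (1 - p)            # p/(1-p)
logit = math.log(odds)        # ln[p/(1-p)], the log odds

print(odds)                       # 3.0 (a .75 probability is 3-to-1 odds)
print(round(logit, 4))            # 1.0986
print(round(math.exp(logit), 6))  # 3.0 -- e undoes ln, recovering the odds
```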
Binary Logistic Regression
• Logistic Distribution
[Figure: S-shaped curve of P(Y=1) against x]
• Transformed, however,
the “log odds” are linear.
[Figure: straight line of ln[p/(1-p)] against x]
Binary Logistic Regression
• The logistic regression model is simply a
non-linear transformation of the linear
regression.
• The logistic distribution is an S-shaped
distribution function (cumulative distribution
function) which is similar to the standard
normal distribution and constrains the
estimated probabilities to lie between 0
and 1.
Binary Logistic Regression
• Logistic Distribution
With the logistic transformation, we’re fitting
the “model” to the data better.
[Figure: fitted logistic curve of P(Y=1), rising from 0 to 1 as X goes from 0 to 20]
Logistic regression predicts the
natural logarithm of the odds
• Formula:
Z = ln[p/(1-p)] = B0 + B1X1 + B2X2 + B3X3 + … + e
• B’s in logistic regression are analogous to
b’s in OLS
• B1 is the average change in Z per one unit
increase in X1, controlling for the other
predictors
• We calculate changes in the log odds of
the dependent variable, not changes in the
dependent variable (as in OLS).
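This "change in the log odds" reading can be checked directly (B0 and B1 here are hypothetical, not from any model in the slides):

```python
import math

# Hypothetical model: Z = log odds = B0 + B1*X1
B0, B1 = -1.0, 0.5

def z(x1):
    return B0 + B1 * x1

# A one-unit increase in X1 changes the log odds by exactly B1...
print(z(3) - z(2))  # 0.5
# ...and multiplies the odds themselves by e^B1:
print(round(math.exp(z(3)) / math.exp(z(2)), 4))  # 1.6487
```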
Interpreting logistic regression results
• In SPSS output, look for:
1) Model chi-square (equivalent to F)
2) WALD statistics and “Sig.” for each B
3) Logistic regression coefficients (B’s)
4) Exp(B) = odds ratio
Interpreting logistic coefficients
• Identify which predictors are significant
by looking at “Sig.”
• Look at the sign of B1
* If B1 is positive, a unit increase in X1
raises the odds of the event happening,
after controlling for the other predictors
* If B1 is negative, the odds of the event
decrease with a unit increase in X1.
Interpreting the odds ratio
• Look at the column labeled Exp(B)
Exp(B) means “e to the power B,” or e^B
Called the “odds ratio” (Greek symbol: Ψ)
e is a mathematical constant used as the
“base” for natural logarithms
• In logistic regression, e^B is the factor by
which the odds change when X increases by
one unit.
Interpreting the odds ratio
• New odds / Old odds = e^B = odds ratio
• e.g. if the odds ratio for EDUC is 1.05, that
means that for every year of education, the
odds of the outcome (e.g. voting) increase by a
factor of 1.05.
• Odds ratios > 1 indicate a positive relationship
between IV and DV (the event is more likely to occur)
• Odds ratios < 1 indicate a negative relationship
between IV and DV (the event is less likely to occur)
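The EDUC example above works out like this (the starting odds of 2.0 are made up for illustration):

```python
# An odds ratio of 1.05 per year of education, as in the example:
odds_ratio = 1.05
old_odds = 2.0  # hypothetical starting odds of voting

new_odds = old_odds * odds_ratio  # one more year of education
print(new_odds)  # 2.1 -- new odds / old odds = 1.05

# The effect compounds across years: four more years -> 1.05**4
print(round(old_odds * odds_ratio ** 4, 4))  # 2.431
```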
Let’s come up with an
example … run it … and
interpret it …
Binary Logistic Regression
• A researcher is interested in the likelihood of
gun ownership in the US, and what would
predict that.
• She uses the GSS to test the following
research hypotheses:
Binary Logistic Regression
• Variables are measured as such:
Dependent:
Havegun: no gun = 0, own gun(s) = 1
Independent:
1. Sex: men = 0, women = 1
2. Age: entered as number of years
3. White: all other races = 0, white =1
4. Education: entered as number of years
SPSS: Analyze → Regression → Binary Logistic
Enter your variables; for the output below, under
Options, I checked “iteration history”
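SPSS does the estimation internally. As a rough sketch of what the maximum-likelihood iterations look like (synthetic stand-in data, not the GSS, and a hand-rolled Newton-Raphson rather than SPSS's exact routine):

```python
import numpy as np

# Synthetic stand-ins for the GSS variables (NOT real GSS data), with
# coefficients loosely inspired by the slide's results:
rng = np.random.default_rng(42)
n = 2000
female = rng.integers(0, 2, n)   # men = 0, women = 1
age = rng.integers(18, 90, n)    # years
white = rng.integers(0, 2, n)    # all other races = 0, white = 1
educ = rng.integers(8, 21, n)    # years of education

X = np.column_stack([np.ones(n), female, age, white, educ])
true_b = np.array([-2.0, -0.78, 0.02, 1.6, -0.02])
p_true = 1 / (1 + np.exp(-X @ true_b))
y = (rng.random(n) < p_true).astype(float)  # simulated gun ownership (0/1)

# Newton-Raphson maximum likelihood (the iterations behind the
# "iteration history" output):
b = np.zeros(X.shape[1])
for _ in range(25):
    p = 1 / (1 + np.exp(-X @ b))
    grad = X.T @ (y - p)                        # score vector
    hess = X.T @ (X * (p * (1 - p))[:, None])   # information matrix
    b = b + np.linalg.solve(hess, grad)

print(np.round(b, 2))  # estimates land near true_b
```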
Binary Logistic Regression
SPSS Output: Some descriptive information first…
Binary Logistic Regression
Goodness-of-fit statistics for the new model come next…
• Test of the new model vs. the intercept-only model (the null model), based
on the difference between the -2LL of each. The difference has a χ2
distribution. Is the new -2LL significantly smaller?
• -2LL = -2∑(Yi · ln[P(Yi)] + (1 - Yi) · ln[1 - P(Yi)])
• The pseudo-R2 statistics are attempts to replicate R2 using information
based on the -2 log likelihood (Cox & Snell cannot equal 1).
• The -2LL number is “ungrounded,” but it has a χ2 distribution. Smaller is
better. In a perfect model, -2 log likelihood would equal 0.
• [Table: classification assessment of the new model’s predictions]
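The -2LL formula can be computed directly (the outcomes and predicted probabilities below are made up to show the "smaller is better" behavior):

```python
import numpy as np

# -2LL = -2 * sum( Yi*ln[P(Yi)] + (1 - Yi)*ln[1 - P(Yi)] )
def minus_two_ll(y, p):
    return -2 * np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

y = np.array([1., 0., 1., 1.])  # made-up outcomes

# A near-perfect model gives a -2LL near 0 (smaller is better):
good = minus_two_ll(y, np.array([0.999, 0.001, 0.999, 0.999]))
print(round(good, 3))  # 0.008

# A weaker model gives a larger -2LL:
weak = minus_two_ll(y, np.array([0.6, 0.4, 0.6, 0.6]))
print(round(weak, 2))  # 4.09
```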
Binary Logistic Regression
Goodness-of-fit statistics for new model come next…
Remember When Assessing
Predictors, The Odds Ratio or Exp(b)…
Exp(b) = (odds after a unit change in the predictor) / (odds before a unit change in the predictor)
Binary Logistic Regression
Interpreting Coefficients…
ln[p/(1-p)] = a + b1X1 + b2X2 + b3X3 + b4X4
[Table: for each term, the variable (X1…X4, and the constant), its coefficient (b1…b4, and a), and e^b]
Binary Logistic Regression
Each coefficient changes the odds by a multiplicative amount, e^b: “Every
unit increase in X multiplies the odds by e^b.”
In the example above, e^b = Exp(B) in the last column.
New odds / Old odds = e^b = odds ratio
For Female: e^(-.780) = .458 … females are less likely to own a gun, by a factor of .458.
Age: e^(.020) = 1.020 … for every year of age, the odds of owning a gun increase by a
factor of 1.020.
White: e^(1.618) = 5.044 … Whites are more likely to own a gun, by a factor of 5.044.
Educ: e^(-.023) = .977 … not significant
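The slide's Exp(B) arithmetic checks out numerically (coefficients taken from the slide):

```python
import math

# e^b for each coefficient reported above:
coefs = {"Female": -0.780, "Age": 0.020, "White": 1.618, "Educ": -0.023}
for name, b in coefs.items():
    print(name, round(math.exp(b), 3))
# Female 0.458, Age 1.02, White 5.043, Educ 0.977
# (matches the slide's Exp(B) column up to rounding)
```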
Binary Logistic Regression
Each coefficient changes the odds by a multiplicative amount, e^b: “Every
unit increase in X multiplies the odds by e^b.”
In the example above, e^b = Exp(B) in the last column.
For Sex: e^(-.780) = .458 … If you subtract 1 from this value, you get the proportional change
in the odds caused by being female: -.542. In percent terms, the odds of
owning a gun decrease 54.2% for women.
Age: e^(.020) = 1.020 … A year increase in age increases the odds of owning a gun by 2%.
White: e^(1.618) = 5.044 … Being white increases the odds of owning a gun by 404%.
Educ: e^(-.023) = .977 … not significant
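The percent figures come from (e^b - 1) × 100; a quick check with the slide's coefficients:

```python
import math

# Percent change in the odds per unit increase in X:
def pct_change_in_odds(b):
    return (math.exp(b) - 1) * 100

print(round(pct_change_in_odds(-0.780), 1))  # -54.2 (odds drop 54.2% for women)
print(round(pct_change_in_odds(0.020), 1))   # 2.0 (2% per year of age)
print(round(pct_change_in_odds(1.618), 0))   # 404.0 (404% higher odds for whites)
```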
Here’s another example …
and another way to interpret
the results
Equation for Step 1
We can say that the odds of a patient who is treated being cured are 3.41
times higher than those of a patient who is not treated, with a 95% CI of
1.561 to 7.480.
The important thing about this confidence interval is that it doesn’t cross 1
(both values are greater than 1). This is important because values greater
than 1 mean that as the predictor variable(s) increase, so do the odds of (in
this case) being cured. Values less than 1 mean the opposite: as the
predictor increases, the odds of being cured decrease.
Output: Step 1
Binary Logistic Regression
The test you choose depends on level of measurement:
Independent Variable(s)                          Dependent Variable   Test
Two or more interval-ratio and/or dichotomous    Interval-ratio       Multiple Regression
Two or more interval-ratio and/or dichotomous    Dichotomous          Binary Logistic Regression
Building the Multiple Regression Model
Yi = β0 + β1X1i + β2X2i + … + βkXki + εi
Binary Logistic Regression
• So what are natural logs and exponents?
– If you didn’t learn about them before this class, you
obviously don’t need to know it to get your degree …
so don’t worry about it.
– But, for those who did learn it, ln(x) = y is the same as:
x = e^y
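The inverse relationship is easy to verify (the values 7 and 3 are arbitrary):

```python
import math

# ln(x) = y is the same as x = e**y: ln and exp are inverses.
x = 7.0
y = math.log(x)                 # natural log, base e = 2.71828...
print(round(math.e ** y, 9))    # 7.0 -- e**y recovers x
print(round(math.exp(math.log(3.0)), 9))  # 3.0 -- same round trip
```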
Binary Logistic Regression
• ln[p/(1-p)] = a + b1X1 + … + bkXk is the power to which you
need to raise e to get the odds:
p/(1-p) = e^(a + b1X1 + … + bkXk)
• Ergo, plug in values of X to get the odds ( = p/(1-p)).
Binary Logistic Regression
Age: e^(.020) = 1.020 … A year increase in age increases the odds of owning a gun by 2%.
How would a 10-year increase in age affect the odds? Recall that (e^b)^X is the equation component
for a variable. For 10 years, (1.020)^10 = 1.219. The odds jump by 22% for a ten-year increase
in age.
Note: You’d have to know the current prediction level for the dependent variable to know if this
percent change is actually making a big difference or not!
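The ten-year figure is just the per-year odds ratio compounded:

```python
# Compounding the per-year odds ratio for age over ten years:
per_year = 1.020            # Exp(B) for age, from the slide
ten_years = per_year ** 10  # (e^b)^X with X = 10
print(round(ten_years, 3))  # 1.219 -> odds about 22% higher
```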
Binary Logistic Regression
Note: You’d have to know the current prediction level for the dependent variable to
know if this percent change is actually making a big difference or not!
Recall that the logistic regression tells us two things at once.
• Transformed, the “log odds” are linear.
[Figure: straight line of ln[p/(1-p)] against x]
• Logistic Distribution
[Figure: S-shaped curve of P(Y=1) against x]
Binary Logistic Regression
We can also get p(Y=1) for particular cases.
Odds = p/(1-p), where p = P(Y=1)
With algebra…
Odds(1-p) = p … Odds - p(Odds) = p …
Odds = p + p(Odds) … Odds = p(1+Odds)
… Odds/(1+Odds) = p, or
p = Odds/(1+Odds)
Ln(odds) = a + bX and odds = e^(a+bX)
so…
P = e^(a+bX) / (1 + e^(a+bX))
We can therefore plug in numbers for X to get P.
If a + bX = 0, then p = .5. As a + bX gets really big, p approaches 1.
As a + bX gets really small, p approaches 0 (our model is an S curve).
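Plugging numbers in shows all three behaviors (a and b here are made up for illustration):

```python
import math

a, b = -2.0, 0.4  # hypothetical intercept and slope

def p_of(x):
    # p = e^(a+bX) / (1 + e^(a+bX))
    z = a + b * x
    return math.exp(z) / (1 + math.exp(z))

print(p_of(5))    # a + bX = 0 here, so p = 0.5 exactly
print(p_of(30))   # a + bX = 10: p is close to 1
print(p_of(-30))  # a + bX = -14: p is close to 0
```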
Binary Logistic Regression
Inferential statistics are as before:
Binary Logistic Regression
So how would I do hypothesis testing? An Example: