L5 Logistic Regression (2011)
by
Dr Lin Naing @ Mohd. Ayub Sadiq
Contents
Why Logistic Regression? (Theory)
Relationship between Hypercholesterolemia & Age
Hypercholesterolemia: Yes=1; No=0 (Dichotomous Categorical Variable)
[Figure: probability of HC (risk) plotted against age in years. HC = hypercholesterolaemia.]
Probability of HC = P
Odds of HC = P / (1- P)
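The two definitions above can be tried out numerically. The following is a minimal Python sketch, not part of the original SPSS workflow; the function names are illustrative, and the logit (log odds) is included because it is the scale used later when checking linearity.

```python
import math

def odds_from_probability(p: float) -> float:
    """Odds = P / (1 - P), as defined on the slide."""
    return p / (1.0 - p)

def logit(p: float) -> float:
    """Log odds (logit) of a probability."""
    return math.log(odds_from_probability(p))

def probability_from_logit(z: float) -> float:
    """Inverse of the logit: convert log odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

# A few example probabilities of HC (illustrative values only)
for p in (0.1, 0.5, 0.8):
    print(p, round(odds_from_probability(p), 3), round(logit(p), 3))
```

A probability of 0.5 corresponds to odds of 1 and a logit of 0; probabilities are bounded between 0 and 1, while the logit can take any real value.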
More about Odds Ratio
What is Odds Ratio?
Risk Ratio (Relative Risk)
Risk Ratio = (Risk of getting disease among exposed) / (Risk of getting disease among non-exposed)
Odds Ratio = (Odds of getting disease among exposed) / (Odds of getting disease among non-exposed)
An odds ratio of 3 means "the odds of getting the disease among the exposed are 3 times
those among the non-exposed".
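To see how the two ratios differ, here is a small Python sketch that computes both from a 2x2 table. The counts are hypothetical and chosen only for illustration; they do not come from the hypercholesterol data.

```python
def risk_ratio(a: int, b: int, c: int, d: int) -> float:
    """a, b = exposed with / without disease; c, d = non-exposed with / without disease."""
    risk_exposed = a / (a + b)
    risk_non_exposed = c / (c + d)
    return risk_exposed / risk_non_exposed

def odds_ratio(a: int, b: int, c: int, d: int) -> float:
    """Same cell layout as risk_ratio."""
    odds_exposed = a / b
    odds_non_exposed = c / d
    return odds_exposed / odds_non_exposed

# Hypothetical counts: 30 of 100 exposed and 10 of 100 non-exposed get the disease
rr = risk_ratio(30, 70, 10, 90)
or_ = odds_ratio(30, 70, 10, 90)
print(round(rr, 2), round(or_, 2))
```

With these counts the risk ratio is 3.0 but the odds ratio is larger, which is the usual pattern when the disease is not rare.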
exp(β) = Odds Ratio … WHY?
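One way to answer the "why" is the short derivation below. It is a sketch that assumes a single independent variable x coded 1 for exposed and 0 for non-exposed; the symbols beta_0 and beta_1 are the intercept and the coefficient of x.

```latex
\begin{align*}
\ln\!\left(\frac{P}{1-P}\right) &= \beta_0 + \beta_1 x \\
\ln(\text{odds} \mid x = 1) &= \beta_0 + \beta_1 \\
\ln(\text{odds} \mid x = 0) &= \beta_0 \\
\ln\!\left(\frac{\text{odds} \mid x = 1}{\text{odds} \mid x = 0}\right) &= \beta_1 \\
\text{Odds Ratio} = \frac{\text{odds} \mid x = 1}{\text{odds} \mid x = 0} &= \exp(\beta_1)
\end{align*}
```

For a numerical variable the same steps apply to x + 1 versus x, so exp(beta) is the odds ratio for a one-unit increase.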
When to use Logistic Regression?
prediction model).
Step 2: Simple logistic regression
Use the file named hypercholesterol.
Analyze >> Regression >> Binary logistic...
Enter the independent variables one at a time.
In Options, click 'CI for exp(B)'.
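For readers who want to see what 'CI for exp(B)' reports, the following Python sketch reproduces the calculation for the special case of one dichotomous independent variable, where the fitted coefficient equals the log of the 2x2 odds ratio and its Wald standard error is the square root of the sum of reciprocal cell counts. The counts are hypothetical, not taken from the hypercholesterol file, and the function name is illustrative.

```python
import math

def simple_logistic_2x2(a: int, b: int, c: int, d: int, z: float = 1.96):
    """a, b = group coded 1 with / without outcome; c, d = reference group with / without.

    Returns (b coefficient, odds ratio, (lower, upper) 95% Wald CI of the OR).
    """
    beta = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(beta - z * se)
    upper = math.exp(beta + z * se)
    return beta, math.exp(beta), (lower, upper)

# Hypothetical counts: 60 of 100 males and 30 of 100 females have HC
beta, or_, (lower, upper) = simple_logistic_2x2(60, 40, 30, 70)
print(round(beta, 3), round(or_, 2), round(lower, 2), round(upper, 2))
```

If the interval excludes 1, the variable is significant at the 5% level by the Wald test.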
It means female is 'first' and male is 'last' ('zero' comes before '1').
Therefore, if we want 'male' as the reference, we do not need to change anything.
If we want 'female' as the reference, we click 'first' and then click the 'change'
button (let's do this!).
[Output: OR & 95% CI of OR]
What has the LR test done? It compares the likelihoods of two models: the model with
only the intercept, and the model with intercept + sex.
In our study, the 2 models are significantly different, which means that sex
contributes significantly to the model. If the test were not significant, sex would
add nothing to the model.
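The comparison of the two likelihoods can be sketched in Python as follows. This is an illustration for one dichotomous independent variable with hypothetical counts, not the SPSS output; the p-value uses the fact that a chi-square statistic with 1 degree of freedom is the square of a standard normal variable.

```python
import math

def log_likelihood(events: int, total: int) -> float:
    """Binomial log-likelihood at the fitted proportion events/total (counts must be non-zero)."""
    p = events / total
    return events * math.log(p) + (total - events) * math.log(1 - p)

def lr_test_binary_predictor(a: int, b: int, c: int, d: int):
    """a, b = group 1 with / without outcome; c, d = group 0 with / without outcome."""
    # Model with intercept only: one common probability for everyone
    ll_null = log_likelihood(a + c, a + b + c + d)
    # Model with intercept + the dichotomous variable: one probability per group
    ll_full = log_likelihood(a, a + b) + log_likelihood(c, c + d)
    g = max(-2 * (ll_null - ll_full), 0.0)   # LR chi-square statistic, df = 1
    p_value = math.erfc(math.sqrt(g / 2))    # upper tail of chi-square with 1 df
    return g, p_value

# Hypothetical counts: 60 of 100 males and 30 of 100 females have HC
g, p_value = lr_test_binary_predictor(60, 40, 30, 70)
print(round(g, 2), p_value)
```

A small p-value means the model with the variable fits significantly better than the intercept-only model.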
Step 3.1: Variable selection
Step 3.2: Linear in logit for numerical independent variables
Among our independent variables, only age is a numerical variable. We have to test
whether age is linear in the logit (of the outcome). HOW?
Transform ... >> Visual binning
Step 2: Calculate the midpoints of the quartile groups
Analyze >> Compare means >> Means...
Midpoints of the four age groups: 30.5, 38.0, 41.5, 48.0
In the logistic regression model, remove age and add age4 to obtain the coefficients (b):
Age4      MidPt.   b
Age4(0)   30.5     0.00
Age4(1)   38.0     1.11
Age4(2)   41.5     1.28
Age4(3)   48.0     1.63
Step 4: Plot the midpoints and "b" to observe the linearity
Enter the midpoints and b values in SPSS:
Age4      MidPt.   b
Age4(0)   30.5     0.00
Age4(1)   38.0     1.11
Age4(2)   41.5     1.28
Age4(3)   48.0     1.63
Make a scatter plot. We call it the Linear Quartile Plot.
This will be considered 'linear', because it is very rare to get a perfectly linear
plot in practice.
[Figure: example Linear Quartile Plots, one labelled Linear and one labelled Nonlinear.]
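The slides judge linearity by eye from the scatter plot. As a rough numerical companion, the Python sketch below fits a straight line through the four (midpoint, b) pairs from the table and reports the slope and the correlation. Using the correlation this way is an assumption added here for illustration, not a rule from the lecture.

```python
import math

# Midpoints of the age quartile groups and their coefficients, from the table above
midpoints = [30.5, 38.0, 41.5, 48.0]
b_coefs = [0.00, 1.11, 1.28, 1.63]

def slope_and_correlation(x, y):
    """Least-squares slope of y on x and the Pearson correlation."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    return sxy / sxx, sxy / math.sqrt(sxx * syy)

slope, r = slope_and_correlation(midpoints, b_coefs)
print(round(slope, 3), round(r, 3))
```

A correlation close to 1 agrees with the visual impression that the points lie roughly on a line; the slope is the approximate change in logit per year of age.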
Step 4.1: Checking Interactions
• All possible 2-way interactions are checked.
• Interaction terms are created.
• Each term is added to the "Preliminary Main Effect Model" as an additional
independent variable.
• Run the model using "enter".
• If an interaction term is significant (P<0.05), there is an interaction between the
2 variables, and the appropriate model is therefore the main effect variables plus
the significant interaction term.
• Check one interaction term at a time.
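To keep track of which interaction terms still need checking, the full list of 2-way pairs can be generated as in this small Python sketch. The variable names age, sex and dm follow the slides; "exercise" is an assumed name for the exercise variable.

```python
from itertools import combinations

# Main effect variables in the preliminary model ("exercise" is an assumed name)
main_effects = ["age", "sex", "dm", "exercise"]

# Every possible 2-way interaction, to be checked one at a time
interaction_pairs = list(combinations(main_effects, 2))

for first, second in interaction_pairs:
    print(f"{first}*{second}")
```

With four main effects there are six pairs, so six separate model runs are needed.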
How to create an interaction term?
Select "age" first. Then select "dm" while you press the "control" key, so that both
variables are selected.
Run the model using "enter".
Check whether the interaction term is significant.
Do this process for one interaction term at a time (as in MLR).
[Figure label: Interaction terms]
Step 4.2: Checking Multicollinearity (MC)
• Just run the Preliminary main effect model (without interaction
terms) by using ‘enter’, and click ‘collinearity diagnostics’ in
the statistics window.
Step 5: Checking assumptions & outliers
• There is one main statistical assumption: overall model fitness.
• There are several measures of goodness-of-fit.
• We run the model (with the interaction term, if present) and check:
– (1) Hosmer-Lemeshow goodness-of-fit test
[Figure labels: Sensitivity, Specificity]
– (3) Area under the ROC (Receiver Operating Characteristic) curve: shows the model's
ability to discriminate between the 2 outcomes
1. Run the preliminary final model & save the Predicted Values (Probabilities)
2. Go to Analyze > ROC Curve...
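As a rough illustration of what the ROC procedure computes from the saved probabilities, the Python sketch below calculates the area under the curve as the proportion of (case, non-case) pairs in which the case has the higher predicted probability, together with sensitivity and specificity at one cut-off. The outcomes, probabilities and the 0.5 cut-off are all hypothetical.

```python
def area_under_roc(outcomes, probabilities):
    """AUC as the share of (case, non-case) pairs ranked correctly; ties count one half."""
    cases = [p for y, p in zip(outcomes, probabilities) if y == 1]
    non_cases = [p for y, p in zip(outcomes, probabilities) if y == 0]
    score = 0.0
    for pc in cases:
        for pn in non_cases:
            if pc > pn:
                score += 1.0
            elif pc == pn:
                score += 0.5
    return score / (len(cases) * len(non_cases))

def sensitivity_specificity(outcomes, probabilities, cutoff=0.5):
    """Classify as positive when the predicted probability is at or above the cut-off."""
    tp = sum(1 for y, p in zip(outcomes, probabilities) if y == 1 and p >= cutoff)
    fn = sum(1 for y, p in zip(outcomes, probabilities) if y == 1 and p < cutoff)
    tn = sum(1 for y, p in zip(outcomes, probabilities) if y == 0 and p < cutoff)
    fp = sum(1 for y, p in zip(outcomes, probabilities) if y == 0 and p >= cutoff)
    return tp / (tp + fn), tn / (tn + fp)

# Hypothetical outcomes (1 = HC, 0 = no HC) and saved predicted probabilities
outcomes = [1, 1, 1, 1, 0, 0, 0, 0]
probabilities = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]

auc = area_under_roc(outcomes, probabilities)
sens, spec = sensitivity_specificity(outcomes, probabilities)
print(auc, sens, spec)
```

An area of 0.5 means the model discriminates no better than chance, and 1.0 means perfect discrimination.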
The cut-off point for Cook's influence statistic is 1.0. A data point above 1.0 is
considered an influential outlier.
In our data, no value exceeds 1.0. Therefore, there is no influential outlier.
Step 6: Interpretation & data presentation
Table 5. Factors associated with hypercholesterolemia (using multiple logistic regression)
Variable        Adj. OR (95% CI OR)    X2 stat. (df) [a]   P value [a]
Age (year)      1.16 (1.08, 1.24)      16.28 (1)           <0.001
Sex
  Male          7.99 (3.19, 20.01)     21.60 (1)           <0.001
  Female        1.00                   -
DM                                     54.51 (2)           <0.001
  Uncontr.      8.16 (4.30, 15.48)     41.31 (1) [b]       <0.001 [b]
  Contr.        0.95 (0.60, 1.51)      0.05 (1) [b]        0.827 [b]
  Normal        1.00                   -
Exercise
  No            6.48 (3.19, 13.16)     33.16 (1)           <0.001
  Yes           1.00                   -
Sex*Exercise    2.99 (1.09, 8.25)      4.47 (2)            0.034
Adj. OR = Adjusted odds ratio; [a] Likelihood Ratio (LR) test; [b] Wald test
Tip: To obtain the LR test result (the table 'Model if term removed'), we have to run
forward or backward LR, and click 'at each step' in the option window. Using 'enter'
will not give the LR test result.
Sex*Exercise    2.99 (1.09, 8.25)      4.47 (2)            0.034
When there is an interaction, interpretation takes more work.
In our case, Age and DM are not involved in any interaction, so they can be
interpreted as usual, but Sex and Exercise are involved in an interaction.
Exercise & Sex interaction means ….. OR1 ≠ OR2
We need to present two ORs for Sex …
Those who do exercise: Effect of Sex (Male vs Female), OR1 = 6.96
Those who do not exercise: Effect of Sex (Male vs Female), OR2 = 22.76
Effect of Exercise (Exercise & Sex interaction means …..)
Male: those who do not do exercise vs those who do exercise, OR3 = 18.78

Effect of DM (Exercise & DM: no interaction means …..)
Uncontrolled DM vs no DM: OR5
Why does an interaction cause these different ORs?
[Diagram labels: Hyperchol., Gender, Exercise]
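One way to see where the different ORs come from is to combine the main-effect ORs with the interaction OR from Table 5. The Python sketch below assumes the coding implied by the table (Female and Exercise=Yes as reference categories, and the interaction term representing Male combined with No exercise); that coding is an assumption, since the slides do not state it.

```python
# Adjusted ORs taken from Table 5 (main model with the interaction term)
or_sex_main = 7.99        # Male vs Female at the reference level of Exercise (Yes), assumed coding
or_exercise_main = 6.48   # No exercise vs exercise at the reference level of Sex (Female), assumed coding
or_interaction = 2.99     # Sex*Exercise

# On the logit scale the coefficients add, so on the OR scale they multiply
or_sex_among_non_exercisers = or_sex_main * or_interaction
or_exercise_among_males = or_exercise_main * or_interaction

print(round(or_sex_among_non_exercisers, 2))  # effect of Sex when Exercise = No
print(round(or_exercise_among_males, 2))      # effect of Exercise among males
```

These products (about 23.9 and 19.4) are close to, but not the same as, the values of 22.76 and 18.78 reported from the selected models, presumably because those come from separately fitted subgroup models.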
Table 5. Factors associated with hypercholesterolemia (using multiple logistic regression)
Variable        Adj. OR (95% CI OR)    X2 stat. (df) [a]   P value [a]
Age (year)      1.16 (1.08, 1.24)      16.28 (1)           <0.001
OR for age (1.16): Imagine comparing two groups, one of which is one (1) year older
than the other. The older group has 1.16 times the odds of having HC of the younger
group (P<0.001).
For example, 31-year-old people have 1.16 times the odds of having HC compared with
30-year-old people.
In other words, 31-year-old people have 16% higher odds of having HC than
30-year-old people.
But in practice, a one-year difference in age is not that important. It is more
practical to interpret the OR for a group that is 5 or 10 years older.
In this case, the odds ratio for 5 years older = 1.16^5 = 2.10
The group that is 5 years older has 2.1 times the odds of having HC compared with
the younger age group.
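The rescaling of the age OR is a simple power calculation, sketched here in Python. The 10-year figure is computed the same way and is shown only as an extra illustration.

```python
def odds_ratio_for_increment(or_per_unit: float, units: float) -> float:
    """OR for a difference of `units`, given the OR for a one-unit difference."""
    return or_per_unit ** units

or_age_per_year = 1.16  # adjusted OR for age from Table 5

or_5_years = odds_ratio_for_increment(or_age_per_year, 5)
or_10_years = odds_ratio_for_increment(or_age_per_year, 10)
print(round(or_5_years, 2), round(or_10_years, 2))
```

The power is used because the coefficient is multiplied by the number of years on the logit scale, and exp(5b) equals exp(b) raised to the power 5.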
Table 5. Factors associated with hypercholesterolemia (using multiple logistic regression)
Variable        Adj. OR (95% CI OR)    X2 stat. (df) [a]   P value [a]
DM                                     54.51 (2)           <0.001
  Uncontr.      8.16 (4.30, 15.48)     41.31 (1) [b]       <0.001 [b]
  Contr.        0.95 (0.60, 1.51)      0.05 (1) [b]        0.827 [b]
  Normal        1.00                   -
Because of the interaction between Sex and Exercise, the effect of Sex should be
interpreted for each level of Exercise.
Similarly, the effect of Exercise should be interpreted for each category of Sex
(male and female). For this purpose, we have to run some specific (selected) models.
Here, we want to obtain the effect of Sex for those who do not exercise.
This is the OR of Sex among those who do not exercise. Males have 22.8 times the ...
of females.
[Output labels: n = 392; n = 408; "Those who do exercise."]
For those who do not exercise, the OR for sex (male compared with female) is 22.8,
and for those who do exercise, it is 7.0.
This shows that the effect of Sex (male vs female) differs between those who exercise
and those who do not. That is an 'interaction'.
The effect of Exercise also has to be interpreted in the same manner.
Here, we want to obtain the effect of Exercise among males.
n = 368
Table 5. Factors associated with hypercholesterolemia (using multiple logistic regression)
Variable        Adj. OR (95% CI OR)    X2 stat. (df) [a]   P value [a]
Age (year)      1.16 (1.08, 1.24)      16.28 (1)           <0.001
DM                                     54.51 (2)           <0.001
  Uncontr.      8.16 (4.30, 15.48)     41.31 (1) [b]       <0.001 [b]
  Contr.        0.95 (0.60, 1.51)      0.05 (1) [b]        0.827 [b]
  Normal        1.00                   -
(Exercise=No)
Sex
  Male          22.76 (8.16, 63.50)    35.64 (1) [b]       <0.001 [b]
  Female        1.00                   -
(Exercise=Yes)
Sex
  Male          6.96 (2.29, 21.13)     11.74 (1) [b]       0.001 [b]
  Female        1.00                   -
(Male)
Exercise
  No            18.78 (9.00, 39.18)    61.13 (1) [b]       <0.001 [b]
  Yes           1.00                   -
(Female)
Exercise
  No            6.43 (3.16, 13.07)     26.44 (1) [b]       <0.001 [b]
  Yes           1.00                   -
Adj. OR = Adjusted odds ratio; [a] Likelihood Ratio (LR) test; [b] Wald test
If the interaction involves a numerical variable, dichotomize the numerical variable
(cut-point at the median), then analyse and interpret as described earlier.
While running the selected models to interpret the interacting variables, the effects
of other variables might change (e.g. the OR of age may become non-significant). We
do NOT have to be concerned about this.
If the effect of Sex in the 2 situations (with and without exercise) differs, but the
difference is not considered important, the interaction term can be dropped at this
stage.
When running selected models, sample sizes become smaller, so the 95% CI sometimes
becomes extremely wide. Therefore, having many levels in a categorical variable is
not desirable.
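The first note above, dichotomizing a numerical variable at its median, can be sketched in Python as follows. The ages are hypothetical, and placing values equal to the median in the lower group is a choice made here, not something the slides specify.

```python
import statistics

# Hypothetical ages (years); not taken from the hypercholesterol data
ages = [28, 31, 35, 38, 40, 42, 45, 50]

cut_point = statistics.median(ages)

# 1 = above the median, 0 = at or below the median (tie handling is an assumption)
age_group = [1 if age > cut_point else 0 for age in ages]

print(cut_point, age_group)
```

The resulting two-level variable can then be used to select cases for the separate models, in the same way as Sex and Exercise.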
Questions and Answers
Reference:
Hosmer D.W., Lemeshow S. (2000). Applied logistic
regression. (2nd Ed.) New York: Wiley & Sons, Inc.