
Multilevel binary logistic regression using IBM SPSS

Mike Crowson, Ph.D.


March 2020
(last updated March 27)

Please cite this PowerPoint as:

Crowson, H. M. (2020). Multilevel binary logistic regression using IBM SPSS [PowerPoint slides]. Retrieved from https://drive.google.com/open?id=16UJsWJodaVFdxJesu7OTQFgGWtrsITzv

YouTube video address: https://youtu.be/roWTULimNPk


Logistic regression models are designed to predict the probability of a case falling in a target group (Y=1) on a binary
outcome variable. Because probabilities are bounded at 0 and 1, the relationship between the predictors in a model and the
outcome is inherently non-linear (predicted probabilities follow an S-shaped, or logistic, curve). In order to
"linearize" the relationship between the predictors and the probability of a case falling into the target group, logistic regression
analysis relies on the logit, which is a mathematical transformation of the predicted probability of target group membership.
The mathematical relationship between a probability and a logit in logistic regression is given by:

logit(Y=1) = ln[odds(Y=1)] = ln[ P(Y=1) / P(Y=0) ]
In effect, although the dependent variable in logistic regression is a binary outcome, we model the relationship between
the predictors and the probability that Y=1. This is accomplished 'indirectly' through the use of the logit link function
(which linearizes the relationship between the predictors and the outcome).

In single-level, binary logistic regression, the prediction equation is given by:

ln[odds(Y=1)] = b0 + Σ bk·Xk = b0 + b1·X1 + … + bk·Xk

As you can see, logits are predicted using a linear model. This model specifies the structural relations between the
independent variables and the predicted logits.

P(Y=1) = e^(b0 + b1·X1 + … + bk·Xk) / [1 + e^(b0 + b1·X1 + … + bk·Xk)]

This formula represents the conversion of predicted logits into predicted probabilities.
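To make these conversions concrete, here is a minimal Python sketch of the logit-to-odds-to-probability arithmetic. The coefficient values are hypothetical placeholders for illustration, not estimates from any model in this presentation:

import math

def logit_to_probability(b0, coefs, xs):
    """Convert a linear predictor (logit) into an odds and a probability."""
    # Linear model on the logit scale: b0 + b1*X1 + ... + bk*Xk
    logit = b0 + sum(b * x for b, x in zip(coefs, xs))
    odds = math.exp(logit)        # odds(Y=1) = e^logit
    prob = odds / (1 + odds)      # P(Y=1) = odds / (1 + odds)
    return logit, odds, prob

# Hypothetical intercept and single slope, evaluated at X1 = 1
logit, odds, prob = logit_to_probability(b0=-0.5, coefs=[0.8], xs=[1.0])
print(f"logit = {logit:.3f}, odds = {odds:.3f}, P(Y=1) = {prob:.3f}")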
Multilevel binary logistic regression can be used when one is modeling binary outcomes where observations are nested
within higher-level units. In general, many of the principles and strategies we see with standard multilevel modeling with
continuous outcomes are applicable in the context of multilevel binary logistic regression. A key difference is the nature
of the dependent variable: what is modeled is the predicted logit(Y=1). Additionally, there is no estimate of the Level 1
residual variance (unlike standard multilevel modeling with continuous outcomes).

The demonstrations in this presentation are based on the example and data by Sommet and Morselli (2017).

Citation:

Sommet, N., & Morselli, D. (2017). Keep Calm and Learn Multilevel Logistic Modeling: A Simplified Three-Step
Procedure Using Stata, R, Mplus, and SPSS. International Review of Social Psychology, 30(1), 203–218. DOI:
https://doi.org/10.5334/irsp.90

[This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License
(CC-BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author
and source are credited.]

The authors provided a copy of their data (referenced in their article) at:

https://figshare.com/articles/_A_Procedure_for_Multilevel_Logistic_Modeling_Appendix_Datasets_and_Syntax_Files/5350786

To save you having to search for it, I have created a modified file that can be downloaded here:
https://drive.google.com/open?id=1KtSNyBDwevvI-397LH20MKlBGi7lWswP
Sommet and Morselli (2017) cover several multilevel logistic regression models in their article using the following scenario:

You are studying predictors of whether students will report owning the Justin Bieber album, 'Purpose'. The binary dependent
variable is dummy coded 0 = does not own the album and 1 = owns the album. The students (Level 1 units) are nested within
classrooms (Level 2 units). The Level 1 predictor is student GPA centered within classrooms (gpa_cmc). The Level 2 predictor
is a dummy-coded* indicator (teacher_fan; coded 0 = not a fan, 1 = fan) of whether each student's classroom teacher is a fan
of Justin Bieber. All predictors in this demo are treated as 'scale'.

Notes: The current approach departs somewhat from Sommet and Morselli's (2017) presentation. In the syntax they
provided, they relied on multilevel multinomial logistic regression (which is technically not incorrect, as a multinomial logistic
regression reduces to a binary logistic regression when you have a binary outcome). This presentation, however, will
demonstrate how to perform multilevel binary logistic regression using the drop-down menus in SPSS (see Heck et al., 2012).
Additionally, in their presentation the authors used an effect-coded predictor ('teacher_fan_c', coded as -.5 = not a fan & .5 =
fan). My use of dummy coding* in this presentation necessarily will change the meaning of various coefficients & some
interpretations from the article. [Download the original article from: https://www.rips-irsp.com/articles/10.5334/irsp.90/]
Model 1: Intercept-only model

logit_ij = β0j                       (Level 1 equation)

β0j = γ00 + u0j                      (Level 2 equation)

logit_ij = γ00 + u0j                 (Combined equation)

This is an unconditional model where we are modeling between-classroom variation in logits. The β0j in the Level 1
equation represents a classroom intercept. In this intercept-only model, that estimate represents an unconditional
classroom mean of student logits. The Level 2 equation contains two components. The γ00 is the (only) fixed effect in the
model. It is the grand mean of the classroom means. The u0j is the difference between classroom j's intercept and the
grand mean.

There are two parameters estimated in this model: γ00 (the grand mean of classroom means) and σ²u0 (the variance of
the classroom intercepts).

Results from this model can be used to determine whether there is significant non-independence within groups on the
outcome variable.
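To illustrate the structure of this model, here is a short Python sketch that simulates data from the combined equation logit_ij = γ00 + u0j. This is a simulation for intuition only (the parameter values are hypothetical), not the authors' analysis:

import numpy as np

rng = np.random.default_rng(42)

gamma_00 = 0.10     # hypothetical grand mean of the classroom logits
var_u0 = 1.0        # hypothetical Level 2 intercept variance
n_classes, n_students = 100, 20

# Draw a random intercept deviation u0j for each classroom
u0 = rng.normal(0.0, np.sqrt(var_u0), size=n_classes)

# Combined equation: every student in classroom j shares the same predicted logit
logits = np.repeat(gamma_00 + u0, n_students)
probs = 1 / (1 + np.exp(-logits))   # inverse-logit conversion
y = rng.binomial(1, probs)          # binary outcome: owns the album (1) or not (0)

print(f"Overall proportion with Y=1: {y.mean():.3f}")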
Model 1: Intercept-only model
On this screen, you can indicate the 'Sorting order' for
any categorical predictors (i.e., predictors being treated
as factors in SPSS) or the binary outcome variable. These
are drop-downs. Currently, with them set to 'Ascending',
the reference category will be the category with the
highest value on those variables. However, the default
'Ascending' (for the sorting order for categorical targets)
can be reset to 'Descending' (Heck et al., 2014, pp. 145-
146), as we will do in this analysis. [See next slide.]

These options allow you to make changes to the degrees
of freedom method or adjust model estimates
depending on whether assumptions are met or not.

According to Heck et al. (2012), the Satterthwaite
approach to degrees of freedom can be used when there
are "few Level-2 units, unbalanced data, or more
complex covariance matrices of effects (e.g.,
unstructured)" (p. 147).
Here, I’ve reset the sorting order to ‘Descending’.

Technically, there was no need to request a sorting
order for categorical predictors, since none are
included in this model. However, I often re-set
this default when re-setting the other. By doing this,
the lowest-valued category of any factor variable
included in my analyses would be treated as the
reference category.
The Model summary table contains general model fit
information. Typically, when evaluating model fit based on
maximum likelihood estimation (MLE), one can make
comparisons among competing models via comparisons of the
model deviance (-2LL), Akaike's Information Criterion (AIC),
and/or the Bayesian Information Criterion (BIC). Whereas only the
deviance can be used when comparing nested models, the AIC
and BIC can be used to compare non-nested models.

Importantly, Heck et al. (2012; citing Hox, 2010) state:


“Because multilevel estimation procedures with categorical
outcomes are approximate only…related procedures for
comparisons between successive models are more tenuous.”
This is due to the “quasilikelihood estimation and the rescaling
of the variance that takes place each time variables are added
to a model” (p. 27). This likely explains why Heck et al. (2012)
do not appear to spend time reviewing these indices in their
coverage of multilevel binary logistic regression.

Although I include screenshots of this output moving forward,
I will not spend time on them either.
This table provides an indication of the degree to which the model
accurately predicts group membership on the binary outcome.
The expectation for the average classroom mean (logit) is γ00 = .095. [The t-test can be used to test whether this estimate
is significantly different from 0. However, this is seldom done.] The Exp(Coefficient) here is the expected odds(Y=1;
i.e., owns the Bieber album). It is computed by exponentiating the intercept: e^.095 = 1.099. We also can compute the
expected probability of a student within the average classroom owning the Bieber album as:

P(owns album) = odds(owns album) / (1 + odds(owns album)) = 1.099 / (1 + 1.099) = .524

In general, the unconditional probability of a student owning the Bieber album is 52.4%.
Here, we have output pertaining to the random effects for the model.

There is no estimate of a Level 1 residual variance [as
would be the case if we were performing the analysis
with a continuous outcome]. The variance estimate is
fixed to 1.

Here, we have the estimate of the Level 2
variance: σ²u0 = 1.133. We see it is statistically
significant. [Technically, the p-value should be half
that which is printed; see Hox, 2010.]
It is possible to compute an ICC to allow you to further assess for clustering effects. This can be done in the following
way:

ICC = σ²u0 / (σ²u0 + π²/3) = 1.133 / (1.133 + 3.29) = .256

The ICC is quite large (well above .05; see Heck et al.'s (2014) discussion that .05 is often regarded as a conventional
threshold indicating more substantial evidence of clustering).

Note: If you refer to the authors' syntax, it seems there is a slight error in their instructions for computing the ICC: they
included the square of the variance of 1.133. However, the estimate provided in the SPSS output is already the variance
of the Level 2 intercepts. (Had the estimate been the standard deviation, then it would need to be squared.)
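A quick check of this arithmetic in Python, using the Level 2 variance estimate reported above (π²/3 ≈ 3.29 is the fixed Level 1 residual variance on the latent logistic scale):

import math

var_u0 = 1.133                    # Level 2 intercept variance from the SPSS output
level1_resid = math.pi ** 2 / 3   # fixed latent-scale residual variance (about 3.29)

icc = var_u0 / (var_u0 + level1_resid)
print(f"ICC = {icc:.3f}")         # prints ICC = 0.256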
Model 2: Level 1 & Level 2 fixed predictors added

logit_ij = β0j + β1j·gpa_cmc_ij                              (Level 1 equation)

β0j = γ00 + γ01·teacher_fan_j + u0j
β1j = γ10                                                    (Level 2 equations)

logit_ij = γ00 + γ01·teacher_fan_j + γ10·gpa_cmc_ij + u0j    (Combined equation)

In this model, we are predicting the probability of student ownership of the Justin Bieber album as a function of
a student-level predictor (GPA, centered within classroom) and a teacher-level (classroom) predictor (teacher fan;
coded 0 = not a fan, 1 = fan). As a result, there are 3 fixed effects (γ00, γ01, γ10) and 1 random effect (u0j) being
estimated with the model.
Note: In their example, Sommet &
Morselli (2017) use the
'teacher_fan_c' variable, which is an
effect-coded variable (-.5 & .5). We are
using the original dummy-coded
teacher_fan variable.

logit_ij = γ00 + γ01·teacher_fan_j + γ10·gpa_cmc_ij + u0j

The intercept γ00 is the (conditional) grand mean of the classroom mean logits. Recalling that GPA was
centered at the cluster/classroom means (transforming the original GPA variable to one with a mean of 0) and
'teacher_fan' was coded 0 = not a fan and 1 = fan, we would interpret the intercept as the predicted logit(Y=1;
owns album) for a student whose GPA falls at the classroom mean in those classrooms with teachers who are not
fans of Justin Bieber. Substituting 0's for the two predictors leaves only the intercept, which exponentiates to
the odds.
The Exp(Coefficient) associated with the intercept provides the estimated odds for a student whose GPA falls at
the classroom mean in those classrooms with teachers who were not fans of Justin Bieber.

If we wish to convert this to a probability:

P(owns album) = odds(owns album) / (1 + odds(owns album)) = .546 / (1 + .546) = .353

The expectation is that a student in a class led by a teacher who is not a fan of Justin Bieber, and who scores at
the class mean on GPA, has a 35.3% chance of owning the album, 'Purpose'.
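Using the fixed-effect estimates reported for this model (intercept = -.605, teacher_fan slope = 1.729, gpa_cmc slope = .672; these values appear in the odds computation on the following slides), a short Python sketch of how predicted probabilities fall out for different predictor values, with the random intercept set to 0:

import math

# Fixed-effect estimates from the Model 2 output
b0, b_fan, b_gpa = -0.605, 1.729, 0.672

def predicted_prob(teacher_fan, gpa_cmc):
    """Predicted P(owns album), holding the random intercept at 0."""
    logit = b0 + b_fan * teacher_fan + b_gpa * gpa_cmc
    return math.exp(logit) / (1 + math.exp(logit))

for fan in (0, 1):
    for gpa in (-1.0, 0.0, 1.0):
        print(f"teacher_fan={fan}, gpa_cmc={gpa:+.1f}: "
              f"P = {predicted_prob(fan, gpa):.3f}")

# Check: teacher_fan=0, gpa_cmc=0 reproduces the .353 probability shown above.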
The regression slope for 'gpa_cmc' is positive (γ10 = .672, s.e. = .0778, p < .001) and significant. The positive regression slope
indicates that students with higher GPAs within their classrooms have a greater likelihood of owning 'Purpose' than those
with lower GPAs in their classrooms.

[Formally speaking, the regression slope is interpreted as: 'For every one-unit increase on GPA within classrooms,
students' logits(Y=1) were predicted to increase by .672'.]

--------------------------------------------------------------------------------------------------------------------------------------------------------
In general, when a regression slope coefficient is positive, you can think of it generally as indicating that with increasing
scores on X, the probability(Y=1) is also increasing. If the slope is negative, you can think of it as indicating that with
increasing scores on X, the probability(Y=1) is decreasing. If the slope is 0, this indicates no relationship between the
predictor X and the probability of Y=1.
As shown previously, we know the odds for a child scoring at the classroom mean on GPA in a classroom led by a teacher
who is not a fan: odds = .546. If we increment 'gpa_cmc' by 1 for students in the same classrooms, the predicted odds for
these students will be:

Odds(Y=1) = e^(-.605 + 1.729(0) + .672(1)) = e^.067 = 1.069

If we form a ratio of these odds, we get: OR = odds(when gpa = 1) / odds(when gpa = 0) = 1.069 / .546 = 1.958
Fortunately, we don't have to compute the odds ratio through this cumbersome route. It can easily be
computed by exponentiating the regression slope:

OR = e^.672 = 1.958

This odds ratio is the multiplicative change in odds per unit increase on the predictor, holding the remaining predictors
constant. So, we can interpret the odds ratio for 'gpa_cmc' as follows: for every one-unit increase on student GPA, the
odds of a student owning the Bieber album change by a factor of 1.958. [Since it is > 1, the odds are
increasing.]
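A quick verification in Python that the cumbersome route and the shortcut agree (coefficients from the Model 2 output):

import math

b0, b_fan, b_gpa = -0.605, 1.729, 0.672   # Model 2 fixed effects

# Cumbersome route: ratio of odds at gpa_cmc = 1 vs. gpa_cmc = 0 (teacher_fan = 0)
odds_gpa0 = math.exp(b0 + b_fan * 0 + b_gpa * 0)   # 0.546
odds_gpa1 = math.exp(b0 + b_fan * 0 + b_gpa * 1)   # 1.069
print(f"OR via ratio of odds: {odds_gpa1 / odds_gpa0:.3f}")

# Shortcut: exponentiate the regression slope
print(f"OR via exp(slope):    {math.exp(b_gpa):.3f}")   # both print 1.958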
The regression slope for 'teacher_fan' is also positive (γ01 = 1.729, s.e. = .1818, p < .001) and significant. This indicates that
students in classrooms with a teacher who is a fan are more likely to own 'Purpose' than those in classrooms with a
teacher who is not a fan.

The odds ratio for this predictor (e^1.729 = 5.6) indicates that the odds for students in classes led by a teacher who is a fan
(coded 1) are 5.6 times the odds for students in classes led by a non-fan (coded 0).
Here, we have the estimate of the Level 2 variance: σ²u0 = .519. We see it is statistically significant. [Technically, the p-value
should be half that which is printed; see Hox, 2010.]

ICC = .519 / (.519 + π²/3) = .136

With the addition of the predictors, the ICC decreased to .136. The
proportionate reduction in the ICC from the unconditional model is:
(.256 - .136)/.256 × 100% ≈ 47%.
Model 3: Allowing the Level 1 slope to randomly vary

logit_ij = β0j + β1j·gpa_cmc_ij                                            (Level 1 equation)

β0j = γ00 + γ01·teacher_fan_j + u0j
β1j = γ10 + u1j                                                            (Level 2 equations)

logit_ij = γ00 + γ01·teacher_fan_j + γ10·gpa_cmc_ij + u0j + u1j·gpa_cmc_ij (Combined equation)

For this model, the additional parameter being estimated is the Level 2 slope variance: σ²u1.
Here, I've added the Level 1
predictor.

You have the same options for
the Level 2 covariance matrix as
you do through the standard
mixed models route. I'm going to
leave this set to Variance
components (which will treat the
Level 2 variance estimates as
uncorrelated; a diagonal matrix).
Although some of the fixed-effect numbers are at variance with the previous model, they largely tell the
same story. I will leave it to the reader to interpret.

Here, we have the estimate of the Level 2 intercept variance: σ²u0 = .522. We see it is statistically significant. The Level 2 slope
variance (σ²u1 = .589) is also significant.
Brief demo of what an
unstructured Level 2 covariance
matrix looks like.

UN(1,1) = variance estimate for the intercepts
UN(2,2) = variance estimate for the slopes
UN(2,1) = estimate of the covariance between
intercepts and slopes
Model 4: Modeling a cross-level interaction (with a diagonal Level 2 covariance matrix)

logit_ij = β0j + β1j·gpa_cmc_ij                                            (Level 1 equation)

β0j = γ00 + γ01·teacher_fan_j + u0j
β1j = γ10 + γ11·teacher_fan_j + u1j                                        (Level 2 equations)

logit_ij = γ00 + γ01·teacher_fan_j + γ10·gpa_cmc_ij
           + γ11·(teacher_fan_j × gpa_cmc_ij) + u0j + u1j·gpa_cmc_ij       (Combined equation)

γ11·(teacher_fan_j × gpa_cmc_ij) is the cross-level interaction term.

When performing a moderated regression analysis, the individual effect of each constituent predictor of the
interaction term is interpreted as the predicted effect on the dependent variable when the other is fixed to 0.
Since the 'teacher_fan' predictor is coded 0 = not a fan, we interpret the slope for GPA ('gpa_cmc') as the predicted
effect on the probability of a student owning a Bieber album, but only in classrooms led by teachers who are not
fans. Substituting 0 for 'teacher_fan' into the combined equation leaves γ10 as the slope for GPA in group 0 on
'teacher_fan'.

Since the 'gpa_cmc' predictor is centered at each group mean, we interpret the slope for 'teacher_fan' as the
predicted difference in expected logits between classrooms whose teachers are not fans (coded 0) and those whose
teachers are fans (coded 1) of Justin Bieber, for students falling at their classroom mean on GPA.
We see here that the interaction between the 'teacher_fan' and 'gpa_cmc' predictors was statistically significant
(γ11 = .956, s.e. = .221, p < .001), suggesting that whether or not a teacher is a fan moderates the effect of
GPA on the probability of a student owning the Bieber album.

The regression slope for the effect of 'gpa_cmc' when 'teacher_fan' = 1 is γ10 + γ11 = .22 + .956 = 1.176.


Interpretation of regression coefficients:

The simple effect of GPA on the likelihood of a student owning the Bieber album is positive but not significant (γ10 = .22,
s.e. = .143, p = .126). The positive sign suggests that students with higher GPAs tended to be more likely to own a Bieber
album than those with lower GPAs, though not significantly so. Bear in mind that this simple effect is conditional on
students being in classrooms led by teachers who are not fans.

The simple effect of 'teacher_fan' is positive and significant (γ01 = 1.849, s.e. = .189, p < .001). This indicates that students in
classrooms with teachers who are Bieber fans (coded 1) are more likely to own a Bieber album than those in
classrooms with teachers who are not fans (coded 0). However, this observed effect is conditional on a student
falling at their classroom mean on GPA.
Interpretation of regression coefficients:

The interaction effect was positive and statistically significant (γ11 = .956, s.e. = .221, p < .001). This indicates that the effect
of GPA on the probability of a student owning a Bieber album is conditional on whether their teacher is or is not a
Bieber fan.
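A small Python sketch of the simple-slopes arithmetic implied by these estimates (values taken from the Model 4 output above; random effects set to 0):

# Fixed effects from the Model 4 output
g10 = 0.22    # simple slope of gpa_cmc when teacher_fan = 0
g11 = 0.956   # cross-level interaction (teacher_fan x gpa_cmc)

for fan in (0, 1):
    # Simple slope of GPA on the logit at each level of teacher_fan
    slope = g10 + g11 * fan
    print(f"teacher_fan={fan}: slope of gpa_cmc on the logit = {slope:.3f}")
# teacher_fan=0: 0.220 (not significant); teacher_fan=1: 1.176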
Here, we have the estimate of the Level 2 intercept variance:
σ²u0 = .549. We see it is statistically significant. The Level 2 slope
variance (σ²u1 = .423) is also significant.

We see that by adding in the Level 2 predictor of the Level 1
slope (modeled as a cross-level interaction), the slope variance
decreases by (.589 - .423)/.589 × 100% ≈ 28%.
Here, I thought I would provide you with some basics for plotting the interaction (using predicted logits).

Step 1: Re-run the previous analysis, but also select 'Save' predicted probabilities for categorical targets.

Step 2: Use the Transform → Compute function to calculate the odds for each case. This is done by computing the
ratio of the predicted probabilities saved in the SPSS file to 1 minus the predicted probabilities.

Step 3: Use the Transform → Compute function to convert the odds to logits. This is simply the natural log of
the odds.
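For readers working outside SPSS, here is the same transformation and a basic plot in Python. This assumes the saved predicted probabilities have been exported to a CSV; the file name and the columns 'pred_prob', 'gpa_cmc', and 'teacher_fan' are hypothetical placeholders:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical export of the SPSS file with the saved predicted probabilities
df = pd.read_csv("bieber_predicted.csv")

# Step 2: odds = p / (1 - p); Step 3: logit = ln(odds)
df["odds"] = df["pred_prob"] / (1 - df["pred_prob"])
df["logit"] = np.log(df["odds"])

# Plot predicted logits against gpa_cmc, separately by teacher_fan
for fan, grp in df.groupby("teacher_fan"):
    plt.scatter(grp["gpa_cmc"], grp["logit"], label=f"teacher_fan = {fan}", s=10)
plt.xlabel("GPA (centered within classroom)")
plt.ylabel("Predicted logit(Y=1)")
plt.legend()
plt.show()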
References

Heck, R. H., Thomas, S. L., & Tabata, L. N. (2014). Multilevel modeling of categorical outcomes using IBM SPSS (2nd
edition). New York: Routledge.

Heck, R. H., Thomas, S. L., & Tabata, L. N. (2012). Multilevel and longitudinal modeling with IBM SPSS. New York:
Routledge.

Hox, J. J. (2010). Multilevel analysis: Techniques and applications (2nd edition). New York: Routledge.

Sommet, N., & Morselli, D. (2017). Keep Calm and Learn Multilevel Logistic Modeling: A Simplified Three-Step
Procedure Using Stata, R, Mplus, and SPSS. International Review of Social Psychology, 30(1), 203–218. DOI:
https://doi.org/10.5334/irsp.90
