Multilevel Binary Logistic Regression 3ab
$\ln(\text{odds}(Y=1)) = b_0 + \sum b_k X_k = b_0 + b_1 X_1 + \dots + b_k X_k$
As you can see, logits are predicted using a linear model. This model specifies the structural relations between the independent variables and predicted logits.
$\text{Probability}(Y=1) = \dfrac{e^{b_0 + b_1 X_1 + \dots + b_k X_k}}{1 + e^{b_0 + b_1 X_1 + \dots + b_k X_k}}$
This formula represents the conversion of predicted logits into predicted probabilities.
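The two formulas above can be sketched in a few lines of Python. This is only an illustration: the function names and the coefficient values are hypothetical and are not taken from the example data.

```python
import math

def predicted_logit(intercept, slopes, xs):
    """Linear model for the logit: b0 + b1*X1 + ... + bk*Xk."""
    return intercept + sum(b * x for b, x in zip(slopes, xs))

def logit_to_probability(logit):
    """Convert a predicted logit into a predicted probability."""
    return math.exp(logit) / (1.0 + math.exp(logit))

# Hypothetical coefficients and predictor values, chosen only for illustration.
b0 = -1.0
slopes = [0.5, 2.0]
xs = [2.0, 0.0]

logit = predicted_logit(b0, slopes, xs)   # -1.0 + 0.5*2.0 + 2.0*0.0 = 0.0
prob = logit_to_probability(logit)        # a logit of 0 corresponds to p = .50
print(logit, prob)
```

A logit of 0 always maps to a probability of .50, positive logits map to probabilities above .50, and negative logits to probabilities below .50.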
Multilevel binary logistic regression can be used when one is modeling binary outcomes where observations are nested
within higher-level units. In general, many of the principles and strategies we see with standard multilevel modeling with
continuous outcomes are applicable in the context of multilevel binary logistic regression. A key difference is the nature
of the dependent variable: the model predicts logit(Y=1) rather than a continuous score. Additionally, unlike standard
multilevel modeling, there is no estimate of the Level 1 residual variance.
The demonstrations in this presentation are based on the example and data by Sommet and Morselli (2017).
Citation:
Sommet, N. and Morselli, D. (2017). Keep Calm and Learn Multilevel Logistic Modeling: A Simplified Three-Step
Procedure Using Stata, R, Mplus, and SPSS. International Review of Social Psychology, 30(1), 203–218, DOI:
https://doi.org/10.5334/irsp.90
[This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License
(CC-BY 4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author
and source are credited.]
To save you from having to search for the data, I have created a modified file that can be downloaded here:
https://drive.google.com/open?id=1KtSNyBDwevvI-397LH20MKlBGi7lWswP
Sommet and Morselli (2017) cover several multilevel logistic regression models in their article using the following scenario:
You are studying predictors of whether students will report owning the Justin Bieber album, ‘Purpose’. The binary dependent
variable is dummy coded 0=does not own the album and 1=owns the album. The students (Level 1 units) are nested within
classrooms (Level 2 units). The Level 1 predictor is student GPA centered within classrooms (gpa_cmc). The Level 2 predictor
is a dummy-coded* indicator (teacher_fan; coded 0=not a fan, 1=fan) of whether each student’s classroom teacher is a fan
of Justin Bieber. All predictors in this demo are treated as ‘scale’.
Notes: The current approach departs somewhat from Sommet and Morselli’s (2017) presentation. In the syntax they
provided, they relied on multilevel multinomial logistic regression (which is not incorrect, as a multinomial logistic
regression reduces to a binary logistic regression when you have a binary outcome). This presentation, however, will
demonstrate how to perform multilevel binary logistic regression using the drop-down menus in SPSS (see Heck et al., 2012).
Additionally, in their presentation the authors used an effect-coded predictor (‘teacher_fan_c’; coded as -.5 = not a fan & .5 =
fan). My use of dummy coding* in this presentation necessarily changes the meaning of various coefficients & some
interpretations from the article. [Download the original article from: https://www.rips-irsp.com/articles/10.5334/irsp.90/]
Model 1: Intercept-only model
Level 1 equation
$\text{logit}_{ij} = \beta_{0j}$
Level 2 equation
$\beta_{0j} = \gamma_{00} + \mu_{0j}$
This is an unconditional model where we are modeling between-classroom variation in logits. The $\beta_{0j}$ in the Level 1
equation represents a classroom intercept. In this intercept-only model, that estimate represents an unconditional
classroom mean of student logits. The Level 2 equation contains two components. The $\gamma_{00}$ is the (only) fixed effect in the
model. It is the grand mean of the classroom means. The $\mu_{0j}$ is the difference between classroom j’s intercept and the
grand mean.
There are two parameters estimated in this model: $\gamma_{00}$ (grand mean of classroom means) and $\mathrm{Var}(\mu_{0j})$ (variance of classroom
intercepts).
Results from this model can be used to determine whether there is significant non-independence within groups on the
outcome variable.
On this screen, you can indicate the ‘Sorting order’ for
any categorical predictors (i.e., predictors being treated
as factors in SPSS) and for the binary outcome variable. These
are drop-downs. With them set to ‘Ascending’,
the reference category will be the category with the
highest value on each of those variables. However, the default
‘Ascending’ (for the sorting order for categorical targets)
can be reset to ‘Descending’ (Heck et al., 2014, pp. 145–146),
as we will do in this analysis. [See next slide.]
In general, the unconditional probability of a student owning the Bieber album is 52.4%.
Here, we have output pertaining to random effects for the model.
ICC = .256
The ICC is quite large (well above .05; Heck et al. (2014) note that .05 is often regarded as a conventional
threshold for more substantial evidence of clustering).
Note: If you refer to the authors’ syntax, it seems there is a slight error in their instruction for computing the ICC. They
squared the variance estimate of 1.133. However, the estimate provided in the SPSS output is already the variance of the
Level 2 intercepts, so it should not be squared. (Had the estimate been the standard deviation, it would need to be squared.)
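As a rough check on the ICC, here is a minimal Python sketch. It assumes the usual latent-variable formulation for logistic models, in which the Level 1 residual variance is fixed at π²/3 (about 3.29); that formula is not shown on these slides, but it reproduces the .256 reported above. The function name is my own.

```python
import math

# Assumption: Level 1 residual variance of the logistic distribution, pi^2 / 3.
LOGISTIC_RESIDUAL_VARIANCE = math.pi ** 2 / 3

def icc(level2_variance):
    """Intraclass correlation from the Level 2 intercept variance."""
    return level2_variance / (level2_variance + LOGISTIC_RESIDUAL_VARIANCE)

intercept_variance = 1.133                      # Level 2 intercept variance from the output

icc_model1 = icc(intercept_variance)            # uses the variance as printed
icc_if_squared = icc(intercept_variance ** 2)   # what you get if the variance is squared by mistake

print(round(icc_model1, 3))       # 0.256
print(round(icc_if_squared, 3))   # 0.281
```

Squaring the variance inflates the ICC (about .281 instead of .256), which is why it matters whether the printed estimate is a variance or a standard deviation.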
Model 2: Level 1 & Level 2 fixed predictors added
Level 1 equation
$\text{logit}_{ij} = \beta_{0j} + \beta_{1j}\,\text{gpa\_cmc}_{ij}$
Level 2 equations
$\beta_{0j} = \gamma_{00} + \gamma_{01}\,\text{teacher\_fan}_j + \mu_{0j}$
$\beta_{1j} = \gamma_{10}$
Combined equation
$\text{logit}_{ij} = \gamma_{00} + \gamma_{01}\,\text{teacher\_fan}_j + \gamma_{10}\,\text{gpa\_cmc}_{ij} + \mu_{0j}$
In this model, we are predicting the probability of student ownership of the Justin Bieber album as a function of
a student-level predictor (GPA, centered within classroom) and a teacher-level (classroom) predictor (teacher fan;
coded 0=not a fan, 1=fan). As a result, there are 3 fixed effects ($\gamma_{00}$, $\gamma_{01}$, $\gamma_{10}$) and 1 random effect ($\mu_{0j}$) being estimated with
the model.
Note: In their example, Sommet & Morselli (2017) use the ‘teacher_fan_c’ variable, which is an
effect-coded variable (-.5 & .5). We are using the original dummy-coded teacher_fan variable.
The intercept is the (conditional) grand mean of the classroom mean logits. Recalling that GPA was
centered at the cluster/classroom means (transforming the original GPA variable to one with a mean of 0) and
‘teacher_fan’ was coded 0=not a fan and 1=fan, we would interpret the intercept as the predicted logit(Y=1;
own album) for a student whose GPA falls at the classroom mean in those classrooms with teachers who were not
fans of Justin Bieber.
Substituting 0 for both predictors (and setting $\mu_{0j}$ to 0) gives the predicted logit, and exponentiating it results
in the odds: $e^{-.605 + 1.729(0) + .672(0)} = e^{-.605} = .546$.
The Exp(Coefficient) associated with the intercept provides the estimated odds for a student whose GPA falls at
the classroom mean in those classrooms with teachers who were not fans of Justin Bieber.
The expectation is that a student in a class led by a teacher who is not a fan of Justin Bieber & who scores at
the class mean on GPA has a 35.3% chance of owning the album, ‘Purpose’: $.546 / (1 + .546) = .353$.
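The intercept-to-odds-to-probability chain can be checked with a short Python sketch. It uses the Model 2 intercept of -.605 that appears in the odds calculations in this presentation; the variable names are my own.

```python
import math

intercept_logit = -0.605   # Model 2 intercept: teacher_fan = 0, gpa_cmc = 0, random effect set to 0

odds = math.exp(intercept_logit)     # Exp(Coefficient) for the intercept
probability = odds / (1.0 + odds)    # convert odds to a probability

print(round(odds, 3))          # 0.546
print(round(probability, 3))   # 0.353
```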
The regression slope for ‘gpa_cmc’ is positive and significant ($\gamma_{10}$ = .672, s.e. = .0778, p < .001). The positive regression slope
indicates that students with higher GPAs within their classrooms have a greater likelihood of owning ‘Purpose’ than those
with lower GPAs in their classrooms.
[Formally speaking, the regression slope is interpreted as: ‘For every one-unit increase in GPA within classrooms,
students’ logit(Y=1) was predicted to increase by .672’.]
--------------------------------------------------------------------------------------------------------------------------------------------------------
In general, when a regression slope is positive, you can think of it as indicating that with increasing
scores on X, the probability(Y=1) is also increasing. If the slope is negative, you can think of it as indicating that with
increasing scores on X, the probability(Y=1) is decreasing. If the slope is 0, this indicates no relationship between the
predictor X and the probability of Y=1.
As shown previously, we know the odds for a student scoring at the classroom mean on GPA in a classroom led by a teacher
who is not a fan: odds = .546. If we increment ‘gpa_cmc’ by 1 for students in the same classroom, the predicted odds for
these students will be (with $\mu_{0j}$ set to 0):
$\text{Odds}(Y=1) = e^{\text{logit}} = e^{-.605 + 1.729(0) + .672(1) + \mu_{0j}} = e^{.067} = 1.069$
If we form a ratio of these odds, we get:
$OR = \dfrac{\text{odds (when gpa\_cmc} = 1)}{\text{odds (when gpa\_cmc} = 0)} = \dfrac{1.069}{.546} = 1.958$
Fortunately, we don’t have to compute the odds ratio through this cumbersome route. It can easily be
computed by exponentiating the regression slope: $OR = e^{.672} = 1.958$.
This odds ratio is the multiplicative change in odds per unit increase on the predictor, holding the remaining predictors
constant. So, we can interpret the odds ratio for ‘gpa_cmc’ as follows: For every one-unit increase in student GPA, the
odds of a student owning the Bieber album change by a factor of 1.958. [Since it is > 1, this means the odds are
increasing.]
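To see that the two routes to the odds ratio agree, here is a small Python sketch using the Model 2 coefficients quoted above (-.605, 1.729, .672). The helper name `predicted_odds` is hypothetical, and the random effect is set to 0.

```python
import math

GAMMA_00 = -0.605   # intercept
GAMMA_01 = 1.729    # teacher_fan slope
GAMMA_10 = 0.672    # gpa_cmc slope

def predicted_odds(teacher_fan, gpa_cmc):
    """Predicted odds of owning the album (random effect set to 0)."""
    logit = GAMMA_00 + GAMMA_01 * teacher_fan + GAMMA_10 * gpa_cmc
    return math.exp(logit)

# Route 1: ratio of the odds at gpa_cmc = 1 and gpa_cmc = 0 (teacher not a fan).
odds_gpa0 = predicted_odds(teacher_fan=0, gpa_cmc=0)
odds_gpa1 = predicted_odds(teacher_fan=0, gpa_cmc=1)
or_by_ratio = odds_gpa1 / odds_gpa0

# Route 2: exponentiate the slope directly.
or_by_exp = math.exp(GAMMA_10)

print(round(odds_gpa0, 3), round(odds_gpa1, 3))   # 0.546 1.069
print(round(or_by_ratio, 3), round(or_by_exp, 3)) # 1.958 1.958
```

The ratio is the same whichever value of teacher_fan is held fixed, because that term cancels when the two odds are divided.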
The regression slope for ‘teacher_fan’ is also positive and significant ($\gamma_{01}$ = 1.729, s.e. = .1818, p < .001). This indicates that
students in classrooms with a teacher who is a fan are more likely to own ‘Purpose’ than those in classrooms with a
teacher who is not a fan.
The odds ratio for this predictor ($e^{1.729} \approx 5.6$) indicates that the odds for students in classes led by a teacher who is a fan (coded 1) are 5.6
times the odds for students in classes led by a non-fan (coded 0).
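One way to get a feel for both slopes at once is to tabulate predicted probabilities for a few combinations of the predictors. The sketch below is only an illustration: it uses the Model 2 fixed effects quoted above, sets the classroom random effect to 0, and picks the GPA values (-1, 0, 1) arbitrarily.

```python
import math

GAMMA_00, GAMMA_01, GAMMA_10 = -0.605, 1.729, 0.672   # Model 2 fixed effects

def predicted_probability(teacher_fan, gpa_cmc):
    """Predicted probability of owning the album (random effect set to 0)."""
    logit = GAMMA_00 + GAMMA_01 * teacher_fan + GAMMA_10 * gpa_cmc
    return math.exp(logit) / (1.0 + math.exp(logit))

# Arbitrary illustrative grid: non-fan vs fan teacher, GPA one unit below/at/above the class mean.
for teacher_fan in (0, 1):
    for gpa_cmc in (-1, 0, 1):
        p = predicted_probability(teacher_fan, gpa_cmc)
        print(f"teacher_fan={teacher_fan}  gpa_cmc={gpa_cmc:+d}  p={p:.3f}")

teacher_fan_or = math.exp(GAMMA_01)   # odds ratio for teacher_fan
print(round(teacher_fan_or, 1))       # 5.6
```

At the classroom mean on GPA, the predicted probability is about .353 with a non-fan teacher and about .755 with a fan teacher.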
Here, we have the estimate of the Level 2 variance: $\mathrm{Var}(\mu_{0j})$ = .519. We see it is statistically significant. [Technically, the p-value
should be half of the printed value; see Hox, 2010]
ICC = .136
With the addition of the predictors, the ICC decreased to .136. The
proportionate reduction in the ICC from the unconditional model is:
(.256 - .136)/.256 * 100% ≈ 47%. [Dividing by .136 instead gives 88%, but that expresses the change relative to the
conditional ICC rather than the unconditional one.]
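The change in the ICC can be reproduced with a short Python sketch. As before, it assumes the Level 1 residual variance is fixed at π²/3, which is not shown on the slides but matches the reported ICCs. Note that a proportionate reduction from the unconditional model divides by the unconditional ICC (.256), which gives about 47%; dividing by .136 gives 88%, which is the change relative to the conditional ICC.

```python
import math

LOGISTIC_RESIDUAL_VARIANCE = math.pi ** 2 / 3   # assumed Level 1 variance, about 3.29

def icc(level2_variance):
    """Intraclass correlation from the Level 2 intercept variance."""
    return level2_variance / (level2_variance + LOGISTIC_RESIDUAL_VARIANCE)

icc_unconditional = icc(1.133)   # Model 1 intercept variance
icc_conditional = icc(0.519)     # Model 2 intercept variance

# Reduction expressed relative to the unconditional (starting) ICC.
proportion_reduced = (icc_unconditional - icc_conditional) / icc_unconditional

print(round(icc_unconditional, 3))       # 0.256
print(round(icc_conditional, 3))         # 0.136
print(round(proportion_reduced * 100))   # 47
```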
Model 3: Allowing Level 1 slope to randomly vary
Level 1 equation
$\text{logit}_{ij} = \beta_{0j} + \beta_{1j}\,\text{gpa\_cmc}_{ij}$
For this model, the additional parameter being estimated is the Level 2 slope variance, $\mathrm{Var}(\mu_{1j})$.
Here, I’ve added the Level 1
predictor.
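Level 1 equation
$\text{logit}_{ij} = \beta_{0j} + \beta_{1j}\,\text{gpa\_cmc}_{ij}$
Level 2 equations
$\beta_{0j} = \gamma_{00} + \gamma_{01}\,\text{teacher\_fan}_j + \mu_{0j}$
$\beta_{1j} = \gamma_{10} + \gamma_{11}\,\text{teacher\_fan}_j + \mu_{1j}$
Combined equation
$\text{logit}_{ij} = \gamma_{00} + \gamma_{01}\,\text{teacher\_fan}_j + \gamma_{10}\,\text{gpa\_cmc}_{ij} + \gamma_{11}\,(\text{teacher\_fan}_j \times \text{gpa\_cmc}_{ij}) + \mu_{0j} + \mu_{1j}\,\text{gpa\_cmc}_{ij}$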
Difference in expected logits between students whose teachers are and are not fans.
We see here that the interaction between the ‘teacher_fan’ and ‘gpa_cmc’ predictors was statistically significant
($\gamma_{11}$ = .956, s.e. = .221, p < .001), suggesting that whether or not a teacher is a fan moderated the effect of
GPA on the probability of a student owning the Bieber album.
The simple effect of GPA on the likelihood of a student owning the Bieber album is positive, but not significant ($\gamma_{10}$ = .22,
s.e. = .143, p = .126). The positive sign points toward students with higher GPAs being more likely to own a Bieber album
than those with lower GPAs, but because the slope is not significant we cannot conclude that GPA is related to ownership
here. Note that this effect is conditional on students being in classrooms led by teachers who are
not fans.
The simple effect of ‘teacher_fan’ is positive and significant ($\gamma_{01}$ = 1.849, s.e. = .189, p < .001). This indicates that students in
classrooms with teachers who are Bieber fans (coded 1) are more likely to own a Bieber album than those in
classrooms with teachers who are not fans (coded 0). However, this observed effect is conditional on a student
falling at their classroom mean on GPA.
Interpretation of regression coefficients:
The interaction effect was positive and statistically significant ($\gamma_{11}$ = .956, s.e. = .221, p < .001). This indicates that the effect
of GPA on the probability of a student owning a Bieber album is conditional on whether their teacher is or is not a
Bieber fan.
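To make the moderation concrete, the GPA slope can be worked out separately for each type of classroom. The Python sketch below uses only the two coefficients reported above (.22 for GPA and .956 for the interaction); the variable and function names are my own, and no standard errors or significance tests for the fan-teacher slope are computed here.

```python
import math

GPA_SLOPE = 0.22      # simple effect of gpa_cmc when teacher_fan = 0
INTERACTION = 0.956   # teacher_fan x gpa_cmc coefficient

def gpa_slope(teacher_fan):
    """GPA slope (in logits) for a given value of teacher_fan (0 or 1)."""
    return GPA_SLOPE + INTERACTION * teacher_fan

slope_non_fan = gpa_slope(0)   # 0.22
slope_fan = gpa_slope(1)       # 0.22 + 0.956 = 1.176

for label, slope in (("non-fan teacher", slope_non_fan), ("fan teacher", slope_fan)):
    print(f"{label}: slope = {slope:.3f}, odds ratio per GPA unit = {math.exp(slope):.2f}")
```

Under these estimates, a one-unit increase in GPA multiplies the odds of owning the album by about 1.25 in classrooms led by non-fans and by about 3.24 in classrooms led by fans.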
Here, we have the estimate of the Level 2 intercept variance:
$\mathrm{Var}(\mu_{0j})$ = .549. We see it is statistically significant. The Level 2 slope
variance ($\mathrm{Var}(\mu_{1j})$ = .423) is also significant.
Heck, R. H., Thomas, S. L., & Tabata, L. N. (2014). Multilevel modeling of categorical outcomes using IBM SPSS (2nd
edition). New York: Routledge.
Heck, R. H., Thomas, S. L., & Tabata, L. N. (2012). Multilevel and longitudinal modeling with IBM SPSS. New York:
Routledge.
Hox, J. J. (2010). Multilevel analysis: Techniques and applications (2nd edition). New York: Routledge.
Sommet, N. and Morselli, D. (2017). Keep Calm and Learn Multilevel Logistic Modeling: A Simplified Three-Step
Procedure Using Stata, R, Mplus, and SPSS. International Review of Social Psychology, 30(1), 203–218, DOI:
https://doi.org/10.5334/irsp.90