Julian Faraway 2024-10-14
See the introduction for an overview.
See a mostly Bayesian analysis analysis of the same data.
This example is discussed in more detail in the book Bayesian Regression Modeling with INLA
Packages used:
library(knitr)
library(here)
In Fitzmaurice and Laird, 1993, data on 537 children aged 7–10 in six Ohio cities are reported. The response is binary — does the child suffer from wheezing (indication of a pulmonary problem) where one indicates yes and zero no. This status is reported for each of four years at ages 7, 8, 9 and 10. There is also an indicator variable for whether the mother of the child is a smoker. Because we have four binary responses for each child, we expect these to be correlated and our model needs to reflect this.
Read in and examine the first two subjects worth of data:
ohio = read.csv(here("data","ohio.csv"),header = TRUE)
kable(ohio[1:8,])
resp | id | age | smoke |
---|---|---|---|
0 | 0 | -2 | 0 |
0 | 0 | -1 | 0 |
0 | 0 | 0 | 0 |
0 | 0 | 1 | 0 |
0 | 1 | -2 | 0 |
0 | 1 | -1 | 0 |
0 | 1 | 0 | 0 |
0 | 1 | 1 | 0 |
We sum the number of smoking and non-smoking mothers:
table(ohio$smoke)/4
0 1
350 187
We use this to produce the proportion of wheezing children classified by age and maternal smoking status:
xtabs(resp ~ smoke + age, ohio)/c(350,187)
age
smoke -2 -1 0 1
0 0.16000 0.14857 0.14286 0.10571
1 0.16578 0.20856 0.18717 0.13904
Age has been adjusted so that nine years old is zero. We see that wheezing appears to decline with age and that there may be more wheezing in children with mothers who smoke. But the effects are not clear and we need modeling to be sure about these conclusions.
A plausible model uses a logit link with a linear predictor of the form:
with
The random effect
Here is the model fit penalized quasi-likelihood using the lme4
package:
library(lme4)
modagh <- glmer(resp ~ age + smoke + (1|id),
family=binomial, data=ohio)
summary(modagh, correlation = FALSE)
Generalized linear mixed model fit by maximum likelihood (Laplace Approximation) ['glmerMod']
Family: binomial ( logit )
Formula: resp ~ age + smoke + (1 | id)
Data: ohio
AIC BIC logLik deviance df.resid
1597.9 1620.6 -794.9 1589.9 2144
Scaled residuals:
Min 1Q Median 3Q Max
-1.403 -0.180 -0.158 -0.132 2.518
Random effects:
Groups Name Variance Std.Dev.
id (Intercept) 5.49 2.34
Number of obs: 2148, groups: id, 537
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.374 0.275 -12.27 <2e-16
age -0.177 0.068 -2.60 0.0093
smoke 0.415 0.287 1.45 0.1485
We see that there is no significant effect due to maternal smoking.
Suppose you do not take into account the correlated response within the individuals and fit a GLM ignoring the ID random effect:
modglm <- glm(resp ~ age + smoke, family=binomial, data=ohio)
faraway::sumary(modglm)
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.8837 0.0838 -22.5 <2e-16
age -0.1134 0.0541 -2.1 0.036
smoke 0.2721 0.1235 2.2 0.028
n = 2148 p = 3
Deviance = 1819.889 Null Deviance = 1829.089 (Difference = 9.199)
We see that the effect of maternal smoking is significant (but this would be the incorrect conclusion).
Cannot fit GLMMs (other than normal response).
Cannot fit GLMMs (other than normal response).
See the discussion for the single random effect example for some introduction.
library(glmmTMB)
We can fit the model with:
gtmod <- glmmTMB(resp ~ age + smoke + (1|id),
family=binomial, data=ohio)
summary(gtmod)
Family: binomial ( logit )
Formula: resp ~ age + smoke + (1 | id)
Data: ohio
AIC BIC logLik deviance df.resid
1597.9 1620.6 -794.9 1589.9 2144
Random effects:
Conditional model:
Groups Name Variance Std.Dev.
id (Intercept) 5.49 2.34
Number of obs: 2148, groups: id, 537
Conditional model:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -3.374 0.275 -12.27 <2e-16
age -0.177 0.068 -2.60 0.0093
smoke 0.415 0.287 1.45 0.1484
Same as lme4
.
- In both cases, we do not find evidence of an effect for maternal smoking.
xfun::session_info()
R version 4.4.1 (2024-06-14)
Platform: x86_64-apple-darwin20
Running under: macOS Sonoma 14.7
Locale: en_US.UTF-8 / en_US.UTF-8 / en_US.UTF-8 / C / en_US.UTF-8 / en_US.UTF-8
Package version:
base64enc_0.1.3 boot_1.3-31 bslib_0.8.0 cachem_1.1.0 cli_3.6.3
coda_0.19-4.1 compiler_4.4.1 cpp11_0.5.0 digest_0.6.37 emmeans_1.10.4
estimability_1.5.1 evaluate_0.24.0 faraway_1.0.8 fastmap_1.2.0 fontawesome_0.5.2
fs_1.6.4 glmmTMB_1.1.10.9000 glue_1.8.0 graphics_4.4.1 grDevices_4.4.1
grid_4.4.1 here_1.0.1 highr_0.11 htmltools_0.5.8.1 jquerylib_0.1.4
jsonlite_1.8.8 knitr_1.48 lattice_0.22-6 lifecycle_1.0.4 lme4_1.1-35.5
MASS_7.3-61 Matrix_1.7-0 memoise_2.0.1 methods_4.4.1 mgcv_1.9-1
mime_0.12 minqa_1.2.8 mvtnorm_1.2-6 nlme_3.1-166 nloptr_2.1.1
numDeriv_2016.8-1.1 parallel_4.4.1 R6_2.5.1 rappdirs_0.3.3 rbibutils_2.3
Rcpp_1.0.13 RcppEigen_0.3.4.0.2 Rdpack_2.6.1 reformulas_0.3.0 rlang_1.1.4
rmarkdown_2.28 rprojroot_2.0.4 rstudioapi_0.16.0 sass_0.4.9 splines_4.4.1
stats_4.4.1 svglite_2.1.3 systemfonts_1.1.0 tinytex_0.52 TMB_1.9.15
tools_4.4.1 utils_4.4.1 xfun_0.47 xtable_1.8-4 yaml_2.3.10