1 Introduction

Generalized linear models (GLMs) encompassing fixed and random effects are traditionally used to analyze repeated measurements, cluster correlated, and multilevel data (Diggle et al. 2002; Jiang 2007; McCullagh et al. 2008; Goldstein 2011). The use of random effects can be interpreted in various forms. They account for the potential covariance structure of the observations in the assumed model. They also represent the individual deviations from the overall intercept beside any further cluster-specific variables. Intrinsically, the variances of the random effects provide a measure of how much the anticipated heterogeneity is inherited by the clustered nature of the data. In this article, we focus on testing the nullity of all variance components in the mixed-effects GLM with responses not necessarily being of a continuous nature.

Inferring the nullity (i.e. boundary value) of variance components is challenging as the asymptotic chi-square distribution of the classical tests do not hold. Various studies considered this problem over more than thirty years ago including Self and Liang (1987), Stram and Lee (1994), Crainiceanu and Ruppert (2004), Zhang et al. (2016), El-Horbaty (2018), and El-Horbaty (2022b). Methods have varied between deriving the limiting distribution or the use of the simulation-based algorithms to generate the finite sample distribution of the likelihood ratio test. Other recent methods relied on the use of exact, asymptotic, or resampling-based properties of some well-constructed test statistics. See for example Samuh et al., (2012), Nobre et al. (2013), Drikvandi et al. (2013), El-Horbaty and Hanafy (2023), and El-Horbaty (2022a). Using a likelihood ratio test statistic, Lee and Braun (2012) proposed a permutation procedure based on uncorrelated exchangeable residuals that is applicable, usually for limited number of random effects, under models with continuous responses. In general, permutation tests have proven wide applicability provided the ease of calculation of the test statistic. Unfortunately, running the likelihood ratio test is frequently associated with fitting models with complicated covariance structure under the alternative hypothesis where the log-likelihood formulation and maximization may be questioned (Lee and Braun 2012). Since score tests avoid this problem, Sinha (2009) proposed the use of a parametric bootstrap score test that applies under GLMs. However, the analytic derivation of the test statistic therein is limited to models for longitudinal data as well as being restrictively assessed under models including random intercepts.

In this article, a first-order Taylor approximation is used to linearize the working GLM. Interestingly, the linearized null model resembles the form of a linear model with residual errors having, approximately, zero mean and diagonal covariance matrix (Lovison 2014). Consequently, the null hypothesis of absence of random effects (hence nullity of their variances) is tested using an analogue of the ANOVA test statistic. A Monte Carlo algorithm is proposed such that a simple permutation method is used to generate the null distribution of the test statistic.

Importantly, we strengthen the outcomes of our study by emphasizing the following four points. First, the test statistic can be derived under GLMs with independent responses although, for the conciseness of the presentation, we will focus on providing an illustrative examples under the logistic and Poisson models. Unlike the simulation studies in Sinha (2009) that are limited to models with random intercepts, the second emphasize is on providing empirical size and power results under models with more than one random effect. Beside the good power achieved using the proposed simple test procedures, the third emphasize is on applying the competing tests to a real dataset. Our test is capable of detecting the departure from nullity compared to bootstrap score test that failed to reject the zero valued variance component. Fourth, we briefly discuss the potential applicability the proposed test when the random effects with general (i.e. unstructured) unknown covariance matrix are not necessarily uncorrelated within or between clusters. Some empirical results are provided for the fourth point.

The rest of this article is organized as follows. Section 2 states the null and alternative hypotheses of interest. The proposed test is introduced in Sect. 3 where a Monte Carlo algorithm is provided to illustrate the test procedures. An example is on how to construct the test statistic under binary logistic and Poisson regression models is provided in Sect. 4. In Sect. 5, simulation experiments are conducted to examine the performance of the proposed test. An application of the proposed test to a real dataset is presented in Sect. 6. Section 7 briefly discusses the potential use of the new test to models with sophisticated covariance structures and Sect. 8 provides the conclusion and an orientation for further research.

2 Notations and hypothesis

Although we shall focus in our illustrative examples on logistic and Poisson regression models, here we consider the general notations of a GLM. Indeed, one can easily find that the linearization step and the derivation of the test statistic apply for any canonical link function that undergo the class of GLMs. As we develop with the hypothesis on the absence of random effects, the working model simplifies to the one assuming independent observations. In Sect. 4, we provide an illustration of how the test statistic could be derived under both the logistic and the Poisson regression models.

Assume that a data set consists of \(m\) clusters with \({n}_{i}\) units within the ith cluster, \(i=1,\dots ,m\). Let \({{\varvec{y}}}_{i}={({y}_{i1},\dots ,{y}_{i{n}_{i}})}^{T}\) be the vector of response variables from the \({i}\)th cluster and assume that

$${{\varvec{y}}}_{i}|{{\varvec{u}}}_{i}\sim {f}_{{{\varvec{Y}}}_{i}|{u}_{i}}({{\varvec{y}}}_{i}|{{\varvec{u}}}_{i})$$

where \({{\varvec{u}}}_{i}\) denotes a vector of \(k\) random effects that are associated with the \({i}{\rm{th}}\) cluster and \({f}_{{{\varvec{Y}}}_{i}|{{\varvec{u}}}_{i}}({{\varvec{y}}}_{i}|{{\varvec{u}}}_{i})\) belongs to the exponential family of distributions. Given the random effects in \({{\varvec{u}}}_{i}\), that are assumed to follow a continuous multivariate distribution, the elements in \({{\varvec{y}}}_{i}\) are conditionally independent. Furthermore, \({f}_{{\varvec{Y}},{\varvec{u}}}\left({\varvec{y}},{\varvec{u}}\right)={\prod }_{i=1}^{m}{f}_{{{\varvec{Y}}}_{i},{u}_{i}}\left({{\varvec{y}}}_{i},{{\varvec{u}}}_{i}\right)\) where \({\varvec{y}}={({{\varvec{y}}}_{1}^{T},\dots ,{{\varvec{y}}}_{m}^{T})}^{T}\), \({\varvec{u}}={({{\varvec{u}}}_{1}^{T},\dots ,{{\varvec{u}}}_{m}^{T})}^{T}\). Indeed, we can model \({{\varvec{y}}}_{i}\) as

$${{\varvec{y}}}_{i}={{\varvec{\mu}}}_{i}+{{\varvec{e}}}_{i}$$
(1)

where \({{\varvec{\mu}}}_{i}={({\mu }_{i1},\dots ,{\mu }_{i{n}_{i}})}^{T}\), \({{\varvec{\mu}}}_{i}=E[{{\varvec{y}}}_{i}|{{\varvec{u}}}_{i}]\) and \({{\varvec{e}}}_{i}\) is an \({n}_{i}\)-vector of independent random error terms. Put \({{\varvec{\eta}}}_{i}={({\eta }_{i1},\dots ,{\eta }_{i{n}_{i}})}^{T}\) such that \({{\varvec{\eta}}}_{i}=g({{\varvec{\mu}}}_{i})\). The link function \(g(.)\) relates the conditional mean response \({{\varvec{\mu}}}_{i}\) to a set of linear predictors that are contained in \({{\varvec{\eta}}}_{i}\) such that

$${{\varvec{\eta}}}_{i}={{\varvec{X}}}_{i}{\varvec{\beta}}+{{\varvec{Z}}}_{i}{{\varvec{u}}}_{i}$$

where \({{\varvec{X}}}_{i}\) denotes an \({n}_{i}\times p\) matrix of explanatory variables that are associated with the unknown vector of fixed effects \({\varvec{\beta}}\) and \({{\varvec{Z}}}_{i}\) denotes an \({n}_{i}\times k\) matrix of explanatory variables that are associated with the unknown vector of random effects. Further, we assume that \({{\varvec{u}}}_{i}\sim N(0,{{\varvec{G}}}_{i})\) where \({{\varvec{G}}}_{i}\) denotes a \(k\times k\) unknown symmetric covariance matrix. The proposed test accommodates various covariance structures for \({{\varvec{G}}}_{i}\), where \(k\ge 1\), which are frequently assumed in practice as will be shown in the sequel.

Our objective is to test the simultaneous presence of the components of \({{\varvec{u}}}_{i}\), which corresponds to examining the significance of the variables in \({{\varvec{Z}}}_{i}\). That is testing

$${H}_{0}:{{\varvec{G}}}_{i}=0,$$
$${H}_{1}:{{\varvec{G}}}_{i}>0.$$

Under the special case, when \(k=1\), the vector \({{\varvec{u}}}_{i}\) reduces to a single random effect, say \({u}_{i}\), where the hypothesis to be tested reduces to

$${H}_{0}:{\tau }^{2}=0,$$
(2)
$${H}_{1}:{\tau }^{2}>0,$$
(3)

where \(var\left({u}_{i}\right)={\tau }^{2}\). Based on (2) and (3), we would be comparing the following typical specifications about \({{\varvec{\eta}}}_{i}\):

$${H}_{0}:{{\varvec{\eta}}}_{i}={{\varvec{X}}}_{i}{\varvec{\beta}},$$
$${H}_{1}:{{\varvec{\eta}}}_{i}={{\varvec{X}}}_{i}{\varvec{\beta}}+{u}_{i}{1}_{i},$$

where \({1}_{i}\) denotes an \({n}_{i}\times 1\) vector of ones. Note that the formulated hypothesis in (2) against (3) is considered in Sinha (2009) where models with only one variance component are examined in the simulation studies therein to assess the performance of the bootstrap score test.

Calculating the log-likelihood function is doable using maximum likelihood (ML) estimation under random intercept models. Unfortunately, extending model (1) and the formulated hypothesis in (2) against (3) to the case where \(k>1\) is not assessed via simulation studies in Sinha (2009). Note also that as \(k\) increases, the derivation and fitting of the log-likelihood function becomes challenging where techniques such as PQL or Laplace shall be cautiously applied (Kim et al. 2013).

Remarkably, the null hypothesis about \({\tau }^{2}\) (or generally about \({{\varvec{G}}}_{i}\)) implies that the elements of \({{\varvec{y}}}_{i}\) in the working null model are unconditionally independent by setting \({{\varvec{u}}}_{i}=0\) for \(i=1,\dots ,m\) in model (1). Thus, the working model reduces to a GLM where the log-likelihood has a closed form expression and can be evaluated using the ML method. As the majority of variance components tests are devoted to linear mixed models, our focus is on situations where \(g({{\varvec{\mu}}}_{i})\) is not the identity link function where \({{\varvec{\mu}}}_{i}=E[{{\varvec{y}}}_{i}|{{\varvec{u}}}_{i}=0]\) is satisfied when \({\tau }^{2}=0\) (or \({{\varvec{G}}}_{i}=0\)). Our particular interest is spotted on the logit and log link functions and is emphasized via our simulation studies in Sect. 5, though the proposed technique naturally extends to other types of link functions.

3 Proposed permutation test

Permutation tests are those in which permutations of data are used to generate the null distribution of the test statistic. A variance components test will have a correct nominal size when the permutations of the cluster indices are made correctly. See for example Lee and Braun (2012) for an argument on the validity of using permutation tests for testing zero variance components by permutation of the cluster indices. The test procedure is given in the end of this section. To distinguish ourselves, we propose an analog of the ANOVA test statistic where the null hypothesis of interest is where all variance components are set equal to zero and hence all random effects are absent from the model.

Note that under the null hypothesis in (2), \({{\varvec{\eta}}}_{i}\) reduces to \({{\varvec{\eta}}}_{i}^{0}={({\eta }_{i1}^{0},\dots ,{\eta }_{i{n}_{i}}^{0})}^{T}\) where \({{\varvec{\eta}}}_{i}^{0}={{\varvec{X}}}_{i}{\varvec{\beta}}\). Besides, the mean response \({{\varvec{\mu}}}_{i}\) can be particularly expressed as \({{\varvec{\mu}}}_{i}^{0}=E[{{\varvec{y}}}_{i}]\) where \({{\varvec{\mu}}}_{i}^{0}={({\mu }_{i1}^{0},\dots ,{\mu }_{i{n}_{i}}^{0})}^{T}\). Thus, the elements in \({{\varvec{y}}}_{i}\) become unconditionally independent within the ith cluster. Consequently, model (1) reduces to a GLM with \(N\) independent observations where \(N={\sum }_{i=1}^{m}{n}_{i}\), and any permutation of the cluster indices, altogether over the \(N\) observations, becomes equally likely. Consequently, a permutation test that randomly permutes the clusters indices can be used to test the variance components, provided that a suitable test statistic exists for this purpose. Lee and Braun (2012) proposed a permutation test that provides a one-sided p-value for the LRT statistic \({T}_{LRT}=-2[{l}_{0}^{ML}-{l}_{1}^{ML}]\). Following the standard permutation procedures therein, we propose using the ANOVA test statistic based on the following simple model linearization.

Assume that the null hypothesis in (2) is true. Then, the conventional model setup for the response \({{\varvec{y}}}_{i}= {{\varvec{\mu}}}_{i}^{0}+{{\varvec{e}}}_{i}\) incorporates a variance function \(V\left({{\varvec{\mu}}}_{i}^{0}\right)=diag[V\left({\mu }_{ij}^{0}\right)]\) and a dispersion function \(a(\phi )\) so that the dispersion matrix of the components of \({{\varvec{y}}}_{i}\) is given by \(a(\phi )V\left({{\varvec{\mu}}}_{i}^{0}\right)\). Assuming that \(a(\phi )\) is known (e.g. under logistic and Poisson models) we can define the adjusted response variable as

$${{{z}}}_{i}={{\varvec{\eta}}}_{i}^{0}+{{\varvec{D}}}_{i}^{-1}({{\varvec{y}}}_{i}-{{\varvec{\mu}}}_{i}^{0})$$
(4)

where \({{{z}}}_{i}={({{{z}}}_{i1},\dots ,{{{z}}}_{i{n}_{i}})}^{T}\) and \({{\varvec{D}}}_{i}=diag(\partial {\mu }_{ij}^{0}/\partial {\eta }_{ij}^{0})\). The adjusted response in (4) has expectation \(E\left[{{{z}}}_{i}\right]={{\varvec{\eta}}}_{i}^{0}\) and dispersion matrix \(\mathcal{D}\left[{{{z}}}_{i}\right]=a(\phi ){{\varvec{W}}}_{i}^{-1}\), where

$${{\varvec{W}}}_{i}=diag\left\{{\left(\partial {\mu }_{ij}^{0}/\partial {\eta }_{ij}^{0}\right)}^{2}/V\left({\mu }_{ij}^{0}\right)\right\}.$$

Rescaling \({{{z}}}_{i}\) as \({{{z}}}_{i}^{*}={{\varvec{W}}}_{i}^{1/2 }{{{z}}}_{i}\) yields

$$E\left[{{{z}}}_{i}^{*}\right]={{\varvec{\eta}}}_{i}^{*}={{\varvec{W}}}_{i}^{1/2 }{{\varvec{X}}}_{i}{\varvec{\beta}}$$

and

$$\mathcal{D}\left[{{{z}}}_{i}^{*}\right]=a\left(\phi \right){{\varvec{I}}}_{{n}_{i}}.$$

Define the vector of discrepancy measures \({{\varvec{\zeta}}}_{i}={({\zeta }_{i1},\dots ,{\zeta }_{i{n}_{i}})}^{T}\) such that \({{\varvec{\zeta}}}_{i}={{{z}}}_{i}^{*}-{{\varvec{\eta}}}_{i}^{*}\), we have the expectation \(E\left[{{\varvec{\zeta}}}_{i}\right]=0\) and the dispersion matrix \(\mathcal{D}\left[{{\varvec{\zeta}}}_{i}\right]=a(\phi ){{\varvec{I}}}_{{n}_{i}}\). If \({{\varvec{\eta}}}_{i}^{*}\) is known, a reasonable measure for assessing the null hypothesis in (2) can be constructed as

$$T_{\varvec{\zeta }} = \mathop \sum \limits_{{i = 1}}^{m} n_{i} \left( {\bar{\zeta }_{{i.}} - \bar{\zeta }_{{..}} } \right)^{2} \left/\mathop \sum \limits_{{i = 1}}^{m} \mathop \sum \limits_{{j = 1}}^{{n_{i} }} \left( {\zeta _{{ij}} - \bar{\zeta }_{{i.}} } \right)^{2}\right.$$

where \({\overline{\zeta }}_{i.}={n}_{i}^{-1}{\sum }_{j=1}^{{n}_{i}}{\zeta }_{ij}\), and \({\overline{\zeta }}_{..}={m}^{-1}{\sum }_{i=1}^{m}{\overline{\zeta }}_{i.}\). The quantity \({T}_{{\varvec{\zeta}}}\) represents an analogue measure of the ANOVA-type test statistic that compares the between and within cluster variabilities for the sake of investigating whether data clustering is beneficial. Under the above settings, the asymptotic distribution of \({T}_{{\varvec{\zeta}}}\) cannot be easily derived except in special cases such as the availability of continuous responses with normally distributed error terms. In general, when \({{\varvec{\eta}}}_{i}^{0}\) is known or can be replace by an adequate estimate, the distribution of \({T}_{{\varvec{\zeta}}}\) could be obtained using a MC algorithm based on generating many permutation samples. See Samuh et al. (2012) for a similar distribution generating procedure.

It is obvious that the adjusted response \({{{z}}}_{i}\) is a theoretical quantity that is unobservable because \({{\varvec{\eta}}}_{i}^{0}\) and the subsequent parameters in (4) are unobservable. The unknown parameters can be replaced by their ML estimates. As the corresponding estimates will not depend on the presence of the random effects, the ML estimator of \({\varvec{\beta}}\), \({{\varvec{\eta}}}_{i}\), \({{\varvec{D}}}_{i}\), and \({{\varvec{W}}}_{i}\) can be obtained under the hypothesis in (2) using the sample data. The proposed test statistic is thus modified to be given by

$${T}_{AOV}=\left.\sum\limits_{i=1}^{m}{n}_{i}{\left({\overline{\widehat{\zeta }} }_{i.}-{\overline{\widehat{\zeta }} }_{..}\right)}^{2}\right/\sum\limits_{i=1}^{m}\sum\limits_{j=1}^{{n}_{i}}{\left({\widehat{\zeta }}_{ij}-{\overline{\widehat{\zeta }} }_{i.}\right)}^{2}$$
(5)

where \({\overline{\widehat{\zeta }} }_{i.}={n}_{i}^{-1}{\sum }_{j=1}^{{n}_{i}}{\widehat{\zeta }}_{ij}\), \({\overline{\widehat{\zeta }} }_{..}={m}^{-1}{\sum }_{i=1}^{m}{\overline{\widehat{\zeta }} }_{i.}\), \({\widehat{\zeta }}_{ij}={\widehat{{z}}}_{ij}^{*}-{\widehat{\eta }}_{ij}^{*}\). Note that \({\widehat{{z}}}_{ij}^{*}\) and \({\widehat{\eta }}_{ij}^{*}\) denote, respectively, the estimators of the \({j}{\rm{th}}\) component of the vectors \({{{z}}}_{i}^{*}\) and \({{\varvec{\eta}}}_{i}^{*}\). The proposed permutation test provides the p-value for the test statistic \({T}_{AOV}\) in (5) and has a correct Type-I error rate under the null hypothesis in (2) regardless of the choice of the link function \(\mathrm{g}(.)\) or the distribution of the random effects.

Recalling that under the null hypothesis, the cluster indices are random labels and all possible permutations are equally likely. Following the procedure in Lee and Braun (2012) the cluster indices are permuted randomly and the null distribution of \({T}_{AOV}\) is generated. To maintain the structure of the original dataset, we hold the number of units within cluster unchanged. The permutation p-values can be calculated using the following Monte Carlo algorithm:

  1. (1)

    Compute the test statistic in Eq. (5) using the original sample data \(({y}_{ij},{{\varvec{x}}}_{ij})\) for \(i=1,\dots m\) and denote the test statistic by \({T}_{AOV}^{obs}\).

  2. (2)

    Randomly permute the cluster indices holding fixed the number of observations within cluster. Then, recalculate the test statistic as in Eq. (5) for the new permutation sample.

  3. (3)

    Repeat the process in the previous step B times, giving B new values of the test statistic, denoted by\({T}^{(g)}\), \(g =1, . . . , B.\)

  4. (4)

    Compute the empirical p-value as the proportion of permutation samples with \({T}^{(g)}\) greater than or equal to \({T}_{AOV}^{obs}\).

  5. (5)

    Given the nominal level \(\alpha\), reject the null hypothesis if \(\alpha\) exceeds the empirical p-value.

Because the number of permutations can be excessively large as \(N={\sum }_{i=1}^{m}{n}_{i}\) increases, the above MC procedures can provide a good approximate as a test of significance when at least \(B\) = 1000 as recommended in Manly (2006).

4 Examples

The applicability of the proposed procedure under some commonly used models is provided in some details as shown next.

4.1 Logistic mixed-effects regression

Consider a dichotomous outcome variable \({y}_{ij}\) (for \(i=1,\dots m\); \(j=1,\dots ,{n}_{i}\)). Let the linear predictor \({\eta }_{ij}={{\varvec{x}}}_{ij}^{T}{\varvec{\beta}}+{{\varvec{z}}}_{ij}^{T}{{\varvec{u}}}_{i}\) where \({{\varvec{x}}}_{ij}^{T}\) and \({{\varvec{z}}}_{ij}^{T}\) denote, respectively, the \({j}{\rm{th}}\) rows of the matrices \({{\varvec{X}}}_{i}\) (for fixed effects contained in \({\varvec{\beta}}\)) and \({{\varvec{Z}}}_{i}\) (for the random effects contained in \({{\varvec{u}}}_{i}\)). Here, we define the needed components to formulate the test statistic under the particular case of a logistic regression model for cluster correlated data. Define \({{\varvec{\mu}}}_{i}=E\left[{{\varvec{y}}}_{i}|{{\varvec{u}}}_{i}\right]={{\varvec{\pi}}}_{i}\), where \({{\varvec{\pi}}}_{i}={({\pi }_{i1},\dots ,{\pi }_{i{n}_{i}})}^{T}\) and

$${\pi }_{ij}={e}^{{\eta }_{ij}} /[1+{e}^{{\eta }_{ij}}]$$

Under \({H}_{0}:{{\varvec{G}}}_{i}=0\), let \({\pi }_{ij}^{0}={\mu }_{ij}^{0}\). Then,

$${\pi }_{ij}^{0}={e}^{{\eta }_{ij}^{0}} /[1+{e}^{{\eta }_{ij}^{0}}]$$

where \({\eta }_{ij}^{0}={{\varvec{x}}}_{ij}^{T}{\varvec{\beta}}\). To obtain \({{{z}}}_{i}\) in (4), it remains to define \(\partial {\mu }_{ij}^{0}/\partial {\eta }_{ij}^{0}={\pi }_{ij}^{0}(1-{\pi }_{ij}^{0})\) while similarly \(V\left({\mu }_{ij}^{0}\right)=\partial {\mu }_{ij}^{0}/\partial {\eta }_{ij}^{0}\). Thus, \({{\varvec{D}}}_{i}=diag\left\{{\pi }_{ij}^{0}\left(1-{\pi }_{ij}^{0}\right)\right\}={{\varvec{W}}}_{i}\).

For example, to calculate of the proposed test statistic \({T}_{AOV}\) in (5) using logit link function, consider a binary mixed model with a single intercept as a random effect

$$y_{ij} |u_{i} \sim {\text{ independent}}\;{\text{Bernoulli}}\;\;\left( {\pi_{ij} } \right),\;\;\;i = 1, \ldots ,m;j = 1, \ldots ,n_{i}$$
(6)
$${\eta }_{ij}=\mathrm{log}\{{\pi }_{ij}/(1-{\pi }_{ij})\}={\beta }_{0}+{u}_{i}$$
(7)

Since the mean responses within a cluster share a common random effect, the observations within clusters are correlated. As noticed from (5), the construction of the proposed test statistic is worked out under the null hypothesis in (2) where the cluster’s observations are, unlike (6), unconditionally independent. This is equivalent to setting \({u}_{i}=0\) in (7). Calculating the proposed test statistic only requires the cluster indices that indicate the cluster membership to be recognized. Under the null hypothesis, we have \({\eta }_{ij}^{0}=g\left({\mu }_{ij}^{0}\right)=\mathrm{log}\{{\mu }_{ij}^{0}/(1-{\mu }_{ij}^{0})\}={\beta }_{0}\), \({\mu }_{ij}^{0}={\pi }_{ij}^{0}=\mathrm{exp}\left({\beta }_{0}\right)/[1+\mathrm{exp}\left({\beta }_{0}\right)]\), \(a\left(\phi \right)=1\), \(\partial {\mu }_{ij}^{0}/{\eta }_{ij}^{0}=V\left({\mu }_{ij}^{0}\right)={\pi }_{ij}^{0}(1-{\pi }_{ij}^{0})\). Thus, we could represent the \({j}{\rm{th}}\) adjusted response variable \({{{z}}}_{ij}\) in the \({i}{\rm{th}}\) cluster as

$${{{z}}}_{ij}={\beta }_{0}+\{({y}_{ij}-{\pi }_{ij}^{0})/[{\pi }_{ij}^{0}(1-{\pi }_{ij}^{0})]\}$$

Hence,

$${\zeta }_{ij}={{{z}}}_{ij}^{*}-{\eta }_{ij}^{0*}=\frac{{y}_{ij}-{\pi }_{ij}^{0}}{\sqrt{{\pi }_{ij}^{0}\left(1-{\pi }_{ij}^{0}\right)}}$$

Replacing the unknown parameters in \({\zeta }_{ij}\) by their ML estimators yields \({\widehat{\zeta }}_{ij}\). The calculation of \({T}_{AOV}\) under (2) is ready then.

4.2 Poisson mixed-effects regression

Here we implement the same steps as in Sect. 4.1 while letting the outcome variable be defined such that \({y}_{ij}|{{\varvec{u}}}_{i}\) follows, independently, Poisson distribution with mean \({\lambda }_{ij}\). Again, let the linear predictor be defined as \({\eta }_{ij}={{\varvec{x}}}_{ij}^{T}{\varvec{\beta}}+{{\varvec{z}}}_{ij}^{T}{{\varvec{u}}}_{i}\) such that \({\eta }_{ij}=\mathrm{log}({\lambda }_{ij})\). Define \({{\varvec{\mu}}}_{i}=E\left[{{\varvec{y}}}_{i}|{{\varvec{u}}}_{i}\right]={{\varvec{\lambda}}}_{i}\), where \({{\varvec{\lambda}}}_{i}={({\lambda }_{i1},\dots ,{\lambda }_{i{n}_{i}})}^{T}\) and

$${\lambda }_{ij}=\mathrm{exp}\left({\eta }_{ij}\right)=\mathrm{exp}\left({{\varvec{x}}}_{ij}^{T}{\varvec{\beta}}+{{\varvec{z}}}_{ij}^{T}{{\varvec{u}}}_{i}\right)$$

Under \({H}_{0}:{{\varvec{G}}}_{i}=0\), \({\lambda }_{ij}^{0}={\mu }_{ij}^{0}\). Then,

$${\lambda }_{ij}^{0}=\mathrm{exp}\left({\eta }_{ij}^{0}\right)$$

where \({\eta }_{ij}^{0}={{\varvec{x}}}_{ij}^{T}{\varvec{\beta}}\). Further, we have \(\partial {\mu }_{ij}^{0}/\partial {\eta }_{ij}^{0}=\mathrm{exp}\left({\eta }_{ij}^{0}\right)={\lambda }_{ij}^{0}\) and thus \(V\left({\mu }_{ij}^{0}\right)=\mathrm{exp}\left({\eta }_{ij}^{0}\right)\). Thus, \({{\varvec{D}}}_{i}=diag\left\{\mathrm{exp}\left({\eta }_{ij}^{0}\right)\right\}={{\varvec{W}}}_{i}\).

In the special case where \({\eta }_{ij}^{0}={\beta }_{0}\) with a single random intercept \({u}_{i}\), the test statistic \({T}_{AOV}\) is calculated based on the relation

$${{{z}}}_{ij}={\beta }_{0}+\{({y}_{ij}-{\lambda }_{ij}^{0})/{\lambda }_{ij}^{0}\}$$

with \(a\left(\phi \right)=1\) and the consequent formula for \({\zeta }_{ij}\) where

$${\zeta }_{ij}={{{z}}}_{ij}^{*}-{\eta }_{ij}^{0*}=\frac{{y}_{ij}-{\lambda }_{ij}^{0}}{\sqrt{{\lambda }_{ij}^{0}}}$$

with the unknown parameters in \({\zeta }_{ij}\) being replaced by their corresponding ML estimates to yield \({\widehat{\zeta }}_{ij}\).

5 Simulation study

In this section, we explore the empirical rejection rates of the proposed test under the null and alternative hypothesis of interest. We have considered two practical settings. First, we assess the performance under a model having one variance component. For this case, we have compared the empirical size and power of the test to the parametric bootstrap score test (Sinha 2009). The latter is referred as the Score-test hereafter. Second, comparisons of the performance of the two tests are made under logistic regression models with logit link function that is expressed as a function of multiple random effects. As a common example, a two-level model with random intercept and random slope is considered. Under all simulation settings, we emphasize that the proposed test procedures described in Sect. 3 are followed while permuting the cluster indices altogether (i.e. treating all \(N={\sum }_{i=1}^{m}{n}_{i}\) observations as one body). Then, a reconstruction of the clusters is made by mimicking the corresponding form in the original sampled observations.

5.1 Testing single variance component

5.1.1 Logistic regression

We explore the finite sample performance of the proposed test using cluster correlated data with one random effect. The simulated data are generated under the binary logistic regression settings. Let the data generating process be defined according the following model

$$logit({{\varvec{\mu}}}_{i}|{u}_{i})=({\beta }_{0}+{u}_{i}){1}_{{n}_{i}}$$

where \({u}_{i}\) denotes the random cluster effect in the \(i\mathrm{th}\) cluster where \(i=1,\dots ,m\). The values of \({\beta }_{0}\) and \({\tau }^{2}\) are provided in Table 1. An empirical evaluation of the proportion of rejections of the null hypothesis, over 1000 original samples per which 1000 permutation or bootstrap samples are generated, is conducted by setting \({\tau }^{2}=0\). The nominal level at which the comparisons are made is set to be \(\alpha\) = 0.05. The adopted settings in our simulation experiment are very close to the simulation settings in Sinha (2009). In Table 1 the size of both tests is assessed when \({\tau }^{2}=0\). We note that the empirical significance levels of the tests are very close to the nominal \(\alpha\) level. The standard errors of the reported estimated levels produce 95% confidence intervals for the true level that indicate a nonsignificant difference from 0.05.

Table 1 Empirical rejection rates for the binary mixed model (Logit Link)

Results on the power of the competing tests when the random effects \({u}_{i}\) are generated from the normal distribution with zero mean and variance \({\tau }^{2}\) are also reported in Table 1. Clearly, the proposed test performs better than the Score-test. Thus, results suggest that the proposed test is a good alternative to the Score-test not only via power comparisons but also due to its ease of the derivation of the test statistic without the need to derive the log-likelihood function under the alternative hypothesis.

Beside the above results using a logit link function, we also explore the performance of the proposed procedure in the presence of non-canonical link functions. Table 2 provides some insights on the outcomes from testing the null hypothesis of interest when the probit link is assumed for the binary response variable. It is seen that no much change in the conclusion about the size of the proposed test as well as about the power comparison with the competing score test. An exception is in noting that the empirical size of the bootstrap score test seems not to be close to the nominal level (5%).

Table 2 Empirical rejection rates for the binary mixed model (Probit Link)

5.1.2 Poisson regression

Here we explore the finite sample performance of the proposed test using cluster correlated data with one random effect for count data. The simulated data are generated under the Poisson regression settings. Let the data generating process be defined according the following model

$$ln({{\varvec{\mu}}}_{i}|{u}_{i})=({\beta }_{0}+{u}_{i}){1}_{{n}_{i}}$$

Using the parameters appearing in Table 3, the empirical evaluation of the proportion of rejections of the null hypothesis mimics the settings in Sect. 5.1.1. We maintain the random effects to be generated from the normal distribution with zero mean and variance \({\tau }^{2}\). Although both tests possess acceptable size under the null hypothesis, the Score-test remains less powerful than our ANOVA-type test, indicating the superiority of our test when the response variable takes the form of counts.

Table 3 Empirical rejection rates for the Poisson mixed model

5.2 Testing multiple variance components

Here, we consider the frequently used two-level random intercept and random slope model. The logit link function is assumed where we report the results of testing all random effects under each simulation experiment. Let the data generating process be defined according to the following model

$$logit\left({\mu }_{ij}|{{\varvec{u}}}_{i}\right)=({\beta }_{0}+{\beta }_{1}{x}_{1ij})+({u}_{0i}+{x}_{1ij}{u}_{1i})$$

where \({{\varvec{u}}}_{{\varvec{i}}}\sim MVN(0,{\varvec{G}})\), \({\varvec{G}}\) denotes a \(2\times 2\) covariance covariance matrix of \({{\varvec{u}}}_{{\varvec{i}}}={({u}_{0i},{u}_{1i})}^{T}\), and \({x}_{1ij}\) is generated from \(Unif(\mathrm{0,3})\) for \(j=1,\dots ,{n}_{i}\) and \(i=1,\dots ,m\). The fixed effects are taken as \({\beta }_{0}={\beta }_{1}=0.25\). The number of clusters, cluster size, and the covariance matrices of the vector of random effects are specified in Table 4. An empirical evaluation of the proportion of rejections under the null hypothesis is conducted by setting \({\varvec{G}}=0\), where 1000 original samples (per each we generate 1000 permutation and bootstrap samples) are generated. The nominal level is taken to be \(\alpha =\hspace{0.17em}\) 0.05.

Table 4 Empirical rejection rates for the binary two-level model with random intercept and random slope

The results in Table 4 indicate that our proposed test maintains an acceptable size that is close to 0.05 when the null hypothesis is true. The Score-test maintains acceptable size too though possessing smaller power as compared to the proposed test under all settings.

To sum up this section, the simulation studies demonstrate that the proposed test is a valid test. It is shown over all the simulation settings that it has a correct Type-I error and remains powerful as well as reliable compared to the Score-test under models with single variance component. When the working logistic model involves multiple random effects, the proposed test shows consistent outcomes in terms of increasing power as the values of the variance components, the number of clusters, and the cluster sizes increase. Beside the computational feasibility, a major advantage of the proposed test is the speed in calculating the p-values in the simulations. For example, the run time to obtain a p value under a two-level model (50 clusters and 10 observations per cluster) took about 10 s for the proposed test while it takes about 35 s for the Score-test.

6 Data application

A dataset obtained from the Guidelines for Urinary Incontinence Discussion and Evaluation (GUIDE) study is used. The objective was to identify factors that were predictive of the responses to the question of whether individuals in that age group consider this accidental loss of urine a problem that interferes with their day to day activities or bothers them in other ways. The study is conducted on urinary incontinent men and women of age 76 or above. The data were collected from 137 patients in 38 medical practices, and the binary response \({y}_{ij}\) was defined as 1 if the jth patient from the \({i}\)th medical practice was “bothered” by the urinary incontinence, and 0 if not. The predictors are gender, age, dayacc (how many leaking accidents in an average week divided by 7), severe (takes values 1, 2, 3, and 4 depending on how sever is the status of the patient), and toilet (how many times to urinate during the day in the toilet).

Sinha (2004) analyzed this dataset using a binary logistic mixed model and Sinha (2009) applied the bootstrap score test to check whether the random effects variance component is zero. The mixed effect logistic model for the conditional mean response is given by

$$\mathit{logit}\left({\pi }_{ij}\right)={\beta }_{0}+{\beta }_{1}{female}_{ij}+{\beta }_{2}{age}_{ij}+{\beta }_{3}{dayacc}_{ij}+{\beta }_{4}{sever}_{ij}+ {\beta }_{5}{toilet}_{ij}+{u}_{i}$$
(8)

where \({u}_{i}\) is the \({i}{\rm{th}}\) cluster (medical practice) effect and is assumed to be independent normal with zero mean and variance component \({\tau }^{2}\). The estimates of the regression coefficients and the variance component using the ML method are reported in Sinha (2004). ML estimates under \({\tau }^{2}=0\) are reported in Sinha (2009).

We apply the Score-test, see Sect. 5 for details and abbreviations, using the ML method to test the significance of the variance component (i.e. testing \({\tau }^{2}=0\)). Besides, we apply the proposed test to compare the results. Interestingly, the p-values are as follows. From the Score-test, the p-value is 0.158 (using 1000 bootstrap samples). Thus, the test concluded no significant clustering effect on the response of the patients. The p-value from the proposed test is 0.028 based on 1000 permutation samples, suggesting that the random intercept should be maintained. Indeed, the ICC is estimated under a mixed effects logistic model to be 0.282 using the ML method. Using the penalized quasi-likelihood method, we estimate the ICC to be 0.233. Both values of the ICC provide an insight on the need to account for the clustering effect (i.e. fitting model (8)) in the data analysis as supported by the decision from our test.

7 Applicability under complex covariance structures

Here we briefly describe some situations where the derivation of the log-likelihood under the alternative hypothesis of positive definite matrix \({{\varvec{G}}}_{i}\) (i.e. \({{\varvec{G}}}_{i}>0\)) may be intractable and hence sophisticates the process of deriving the Score-test. Consider the following cases:

  1. (i)

    The unknown matrix \({{\varvec{G}}}_{i}\) is not constant across all clusters for \(i=1,\dots ,m\).

  2. (ii)

    The unknown matrix \({{\varvec{G}}}_{i}\) has unknown structure or its structure is being misspecified.

  3. (iii)

    The vectors of random effects \({{\varvec{u}}}_{i}\) are not independently distributed among clusters.

Although score tests can generally overcome the problem in case (i), it worths saying that this problem cannot be avoided when the likelihood ratio test or the Wald test are used. In most cases, such tests assume a constant value, \({{\varvec{G}}}_{i}={\varvec{G}}\), for all clusters. For case (ii) the situation is even sophesticated due to many technical issues such as the efficiency and consistency of the estimator of \({{\varvec{G}}}_{i}\) which have their consequences at least on the power of likelihood-based tests. Although score tests yet may avoid the estimation problem in the previous case too, if the problem under case (iii) holds, its consequences cannot be avoided by any of the three tests, or at least some theory is needed to enhance their practicality when clusters are not necessarily treated as independent. Note that, if the independence assumption is violated, the joint distribution of the response variable and the random effects is not the product of the joint distribution over the clusters anymore. This complicates the derivation of marginal likelihood function, the score function, and the Hessian matrix and can sophisticate the construction of the test statistic underlying each of the three likelihood based tests. We explore the performance of the bootstrap score test under case (iii). Table 5 provides some insights on the size and power under in the latter case. Let the data generating process be defined according the following model

$$logit\left({\mu }_{ij}|{u}_{i}\right)=1+{x}_{1ij}+{u}_{0i}+{z}_{1ij}{u}_{1i}+{z}_{2ij}{u}_{2i}$$

for \(i=1,\dots ,m\) and \(j=1,\dots ,{n}_{i}\), where \({x}_{1ij}\sim U(\mathrm{0,0.5})\), \({z}_{1ij}\sim U(\mathrm{0,1})\) and \({z}_{2ij}\sim U(\mathrm{1,2})\). The random effects are generated from the multivariate normal distribution wit mean zero and covariance matrix \({\varvec{G}}\) as specified in Table 5. Thus, three random effects are involved in the generating process of the binary response in this case. Assume beside the correlated clusters that the suggested working model is specified such that \(logit\left({\mu }_{ij}|{u}_{i}\right)=1+{x}_{1ij}+{u}_{0i}.\)

Table 5 Empirical rejection rates for a mixed-effects logistic model under random effects misspecification

The proposed test remains robust to such model misspecification form in terms of size and continues to gain higher power as we depart from the null hypothesis of no random effects. We also observe that the Score-test maintains acceptable size and interestingly, it gains power as we depart from the null hypothesis \({H}_{0}:{\varvec{G}}=0\). To sum up, by construction, the proposed test based on the ANOVA-like test statistic can work properly under cases (i–iii). Implementing the null hypothesis implies the absence of all random effects, which simplifies the assumptions underlying the model fitting problem (including the construction of the log-likelihood function) to the case of independent observations. Thus, one can easily use the proposed Monte Carlo algorithm holding no worries about any of those cases. The algorithm can be easily programmed using any of currently existing free software packages.

8 Conclusion

We demonstrate via simulation studies the ability to test one or more variance components under GLMs including logistic and Poisson regression models using a simple ANOVA-type test statistic. Using a linearized version of the response under the null model, the proposed test can be conducted where a unified permutation procedure is used to approximate its null distribution. The empirical Type-I error rates produced by the proposed test are very close to the target nominal level. Besides, the simulation results demonstrate that the proposed test possesses empirical powers that are higher than those produced using the bootstrap score test. The proposed test may be recommended as a natural choice for testing variance components when we relax the assumptions about the model covariance structure on which the score test, Wald test and likelihood ratio test are based. The proposed Monte Carlo algorithm can naturally extend to other link functions although the size/power performances have not been assessed before. Unfortunately, testing only a subset of the variance components is not a straightforward task via the same linearization method, due to the correlation structure of the response variables under the working null model. This may be considered as an interesting point for future research.