Abstract
Multilevel models are an increasingly popular method for analyzing data with a clustered or hierarchical structure. To use multilevel models effectively, one must have an adequately large number of clusters; otherwise, some model parameters will be estimated with bias. The goals of this paper are to (1) raise awareness of the problems associated with a small number of clusters, (2) review previous studies on multilevel models with a small number of clusters, (3) provide an illustrative simulation demonstrating how a simple model is adversely affected by small numbers of clusters, (4) provide researchers with remedies if they encounter clustered data with a small number of clusters, and (5) outline methodological topics that have yet to be addressed in the literature.
Notes
On a more technical note, population values for standard errors cannot be set directly within a simulation design, so another quantity must be used to represent the variability of estimates in the population (of which the standard error is an estimate). Although several quantities can serve this purpose, the prevailing technique in the reviewed studies was to use the variability of the parameter estimates across replications, that is, the standard deviation of the estimates over all replications. Because the same technique was implemented across studies, it is reasonable to compare these values across studies.
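As a rough illustration of this technique, the Python sketch below compares the average model-based standard error with the empirical standard error (the standard deviation of the estimates across replications) and reports their relative difference. The toy design, which estimates a single mean from normal draws rather than fitting a multilevel model, and all function names are assumptions made only for illustration.

```python
import random
import statistics


def se_relative_bias(estimates, standard_errors):
    """Relative bias of model-based SEs against the empirical SE.

    The empirical SE is the standard deviation of the point estimates
    across replications; it stands in for the unknown population SE.
    """
    empirical_se = statistics.stdev(estimates)
    mean_model_se = statistics.fmean(standard_errors)
    return (mean_model_se - empirical_se) / empirical_se


def simulate(n=10, replications=2000, sigma=1.0, seed=1):
    """Hypothetical toy design (not a multilevel model): estimate a mean from n normal draws."""
    rng = random.Random(seed)
    estimates, ses = [], []
    for _ in range(replications):
        sample = [rng.gauss(0.0, sigma) for _ in range(n)]
        estimates.append(statistics.fmean(sample))
        # Model-based SE of the mean, as reported within this replication
        ses.append(statistics.stdev(sample) / n ** 0.5)
    return estimates, ses


estimates, ses = simulate()
print(f"relative bias of SEs: {se_relative_bias(estimates, ses):+.3f}")
```

In an actual multilevel simulation, the estimates and standard errors would come from the fitted model in each replication; only the comparison step would carry over.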
As noted earlier, the level-2 variance component is included in the calculation of ICC values. For multilevel logistic models, however, the level-1 variance component is calculated differently than with continuous outcomes, and Goldstein, Browne, and Rasbash (2002) discuss four methods for its calculation. Most commonly, \( \frac{\pi^2}{3} \) (about 3.29) is substituted for the level-1 variance, because this is the variance of the standard logistic distribution (location 0, scale 1). Other methods include simulation and Taylor series expansions. The problems associated with misestimated ICC values are the same as those presented for continuous outcomes.
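The Python sketch below shows two of these approaches for a random-intercept logistic model. Here `tau00` (the level-2 intercept variance on the logit scale) and `gamma00` (the fixed intercept) are our own labels. The first function substitutes \( \pi^2/3 \) for the level-1 variance, giving \( \tau_{00} / (\tau_{00} + \pi^2/3) \). The second follows the general idea of a simulation approach: draw cluster effects, convert them to probabilities, and split the variance into a between-cluster part and a within-cluster Bernoulli part. It is an assumption-laden sketch, not a reproduction of Goldstein et al.'s exact procedure.

```python
import math
import random
import statistics

# Variance of the standard logistic distribution (location 0, scale 1), about 3.29
LOGISTIC_L1_VARIANCE = math.pi ** 2 / 3


def icc_latent(tau00):
    """Latent-scale ICC: level-2 variance over level-2 variance plus pi^2/3."""
    return tau00 / (tau00 + LOGISTIC_L1_VARIANCE)


def icc_simulation(gamma00, tau00, draws=20_000, seed=1):
    """Simulation-based ICC on the probability scale (illustrative sketch).

    Draws hypothetical random intercepts u_j ~ N(0, tau00), maps each
    cluster to p_j = logistic(gamma00 + u_j), and splits variance into
    between-cluster (variance of p_j) and within-cluster (mean of
    p_j * (1 - p_j)) components.
    """
    rng = random.Random(seed)
    sd = math.sqrt(tau00)
    probs = [1.0 / (1.0 + math.exp(-(gamma00 + rng.gauss(0.0, sd))))
             for _ in range(draws)]
    between = statistics.pvariance(probs)
    within = statistics.fmean(p * (1.0 - p) for p in probs)
    return between / (between + within)


print(f"latent ICC:     {icc_latent(1.0):.3f}")
print(f"simulation ICC: {icc_simulation(0.0, 1.0):.3f}")
```

The two values are on different scales (latent logit versus observed probability), so they generally differ even for the same model; which one is appropriate depends on how the ICC will be interpreted.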
A diffuse inverse gamma prior is commonly used when a researcher wants to use Bayesian methods while limiting the impact of the prior distribution on the posterior distribution. This is the default prior distribution for variances in more user-friendly Bayesian software programs such as Mplus.
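To see how an inverse gamma prior enters the posterior, consider the simplest conjugate case, which is an assumption made here for illustration and not a multilevel model: normal data with a known mean and an IG(a0, b0) prior on the variance. The posterior is IG(a0 + n/2, b0 + SS/2), where SS is the sum of squared deviations from the known mean, so small a0 and b0 (a "diffuse" prior) leave the posterior driven almost entirely by the data. All names below are hypothetical.

```python
def ig_posterior(data, mu, a0, b0):
    """Conjugate update: IG(a0, b0) prior on sigma^2, normal data with known mean mu."""
    ss = sum((y - mu) ** 2 for y in data)
    return a0 + len(data) / 2, b0 + ss / 2


def ig_mean(a, b):
    """Mean of an IG(a, b) distribution; defined only for a > 1."""
    if a <= 1:
        raise ValueError("IG mean requires a > 1")
    return b / (a - 1)


# Hypothetical data: only four observations, analogous to very few clusters
data = [1.0, -1.0, 2.0, -2.0]
for a0, b0 in [(0.001, 0.001), (10.0, 10.0)]:  # diffuse vs. informative prior
    a, b = ig_posterior(data, 0.0, a0, b0)
    print(f"prior IG({a0}, {b0}) -> posterior IG({a:.3f}, {b:.3f}), mean {ig_mean(a, b):.3f}")
```

With so few observations the informative prior visibly shifts the posterior, whereas the diffuse prior contributes almost nothing to the posterior parameters.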
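If models are compared with deviance (likelihood-ratio) tests and differ with respect to their fixed effects, full maximum likelihood (FML) must be used. Otherwise, the deviance will not be calculated appropriately: the REML likelihood depends on the fixed-effects specification, so REML deviances from models with different fixed effects are not comparable.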
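A deviance comparison between two nested FML fits can be sketched as follows. The deviance values are hypothetical, and the chi-square tail probability uses the standard closed forms for integer degrees of freedom so that the example needs only the Python standard library. The sketch does not address the boundary problems that arise when a variance component is the parameter being tested.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    # Odd df: start from df = 1 and add the terms for df = 3, 5, ...
    total = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for k in range(3, df + 1, 2):
        total += term
        term *= x / k
    return total


def deviance_lrt(deviance_reduced, deviance_full, df_diff):
    """Likelihood-ratio test from two FML deviances (-2 log-likelihood)."""
    stat = deviance_reduced - deviance_full
    return stat, chi2_sf(stat, df_diff)


# Hypothetical FML deviances: the full model adds one fixed effect
stat, p = deviance_lrt(1012.4, 1005.0, df_diff=1)
print(f"chi-square = {stat:.2f}, p = {p:.4f}")
```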
References
References marked with an asterisk (*) were included in the review.
*Austin, P. C. (2010). Estimating multilevel logistic regression models when the number of clusters is low: a comparison of different statistical software procedures. The International Journal of Biostatistics, 6, Article 16.
*Baldwin, S. A., & Fellingham, G. W. (2013). Bayesian methods for the analysis of small sample multilevel data with a complex variance structure. Psychological Methods, 18, 151–164.
Bell, B., Ene, M., Smiley, W., & Schoeneberger, J. (2013). A multilevel primer using SAS PROC MIXED. SAS Global Forum.
Bell, Schoeneberger, Smiley, Ene, & Leighton (2013). Doubly diminishing returns: an empirical investigation on the impact of sample size and predictor prevalence on point and interval estimates in two-level linear models. Paper presented at the Modern Modeling Methods Conference (M3). Storrs.
*Bell, B. A., Morgan, G. B., Schoeneberger, J. A., Kromrey, J. D., & Ferron, J. M. (2014). How low can you go? An investigation of the influence of sample size and model complexity on point and interval estimates in two-level linear models. Methodology: European Journal of Research Methods for the Behavioral and Social Sciences, 10, 1–11.
Bradley, J. V. (1978). Robustness? British Journal of Mathematical and Statistical Psychology, 31, 144–152.
*Browne, W. J., & Draper, D. (2006). A comparison of Bayesian and likelihood-based methods for fitting multilevel models. Bayesian Analysis, 1, 473–514.
Butar, F. B., & Lahiri, P. (2003). On measures of uncertainty of empirical Bayes small-area estimators. Journal of Statistical Planning and Inference, 112, 63–76.
*Clarke, P. (2008). When can group level clustering be ignored? Multilevel models versus single level models with sparse data. Journal of Epidemiology and Community Health, 62, 752–758.
*Cohen, J. (1998). Determining sample sizes for surveys with data analyzed by hierarchical linear models. Journal of Official Statistics, 14, 267–275.
Dedrick, R. F., Ferron, J. M., Hess, M. R., Hogarty, K. Y., Kromrey, J. D., & Lee, R. (2009). Multilevel modeling: a review of methodological issues and applications. Review of Educational Research, 79, 69–102.
*Ferron, J. M., Bell, B. A., Hess, M. R., Rendina-Gobioff, G., & Hibbard, S. T. (2009). Making treatment effect inferences from multiple-baseline data: the utility of multilevel modeling approaches. Behavior Research Methods, 41, 372–384.
Fitzmaurice, G. M., Laird, N. M., & Ware, J. H. (2012). Applied longitudinal analysis. Hoboken: Wiley.
Gardiner, J. C., Luo, Z., & Roman, L. A. (2009). Fixed effects, random effects and GEE: what are the differences? Statistics in Medicine, 28, 221–239.
Gelman, A. (2002). Prior distribution. Encyclopedia of Environmetrics.
Gelman, A. (2006). Prior distributions for variance parameters in hierarchical models (comment on article by Browne and Draper). Bayesian Analysis, 1, 515–534.
Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2013). Bayesian data analysis (3rd ed.). Boca Raton: CRC Press.
Goldstein, H., Browne, W., & Rasbash, J. (2002). Partitioning variation in multilevel models. Understanding Statistics: Statistical Issues in Psychology, Education, and the Social Sciences, 1, 223–231.
González-Manteiga, W., Lombardía, M. J., Molina, I., Morales, D., & Santamaría, L. (2007). Estimation of the mean squared error of predictors of small area linear parameters under a logistic mixed model. Computational Statistics and Data Analysis, 51, 2720–2733.
Halekoh, U., & Højsgaard, S. (2012). pbkrtest: parametric bootstrap and Kenward Roger based methods for mixed model comparison. URL http://cran.r-project.org/web/packages/pbkrtest/pbkrtest.pdf [accessed on 14 March 2014].
Hedges, L. V., & Hedberg, E. C. (2007). Intraclass correlation values for planning group randomized trials in education. Educational Evaluation and Policy Analysis, 29, 60–87.
Heo, M., & Leon, A. C. (2008). Statistical power and sample size requirements for three level hierarchical cluster randomized trials. Biometrics, 64, 1256–1262.
Hoogland, J. J., & Boomsma, A. (1998). Robustness studies in covariance structure modeling. An overview and a meta-analysis. Sociological Methods & Research, 26, 329–367.
Hox, J. J. (1998). Multilevel modeling: when and why. In I. Balderjahn, R. Mathar, & M. Schader (Eds.), Classification, data analysis, and data highways (pp. 147–154). Berlin: Springer.
Hox, J. (2010). Multilevel analysis: techniques and applications (2nd ed.). Mahwah, NJ: Erlbaum.
Hox, J., van de Schoot, R., & Matthijsse, S. (2012). How few countries will do? Comparative survey analysis from a Bayesian perspective. Survey Research Methods, 6, 87–93.
Kenward, M. G., & Roger, J. H. (1997). Small sample inference for fixed effects from restricted maximum likelihood. Biometrics, 53, 983–997.
Kenward, M. G., & Roger, J. H. (2009). An improved approximation to the precision of fixed effects from restricted maximum likelihood. Computational Statistics and Data Analysis, 53, 2583–2595.
Kim, Y., Choi, Y. K., & Emery, S. (2013). Logistic regression with multiple random effects: a simulation study of estimation methods and statistical packages. The American Statistician, 67, 171–182.
*Konstantopoulos, S. (2010). Power analysis in two-level unbalanced designs. The Journal of Experimental Education, 78, 291–317.
Kowalchuk, R. K., Keselman, H. J., Algina, J., & Wolfinger, R. D. (2004). The analysis of repeated measurements with mixed-model adjusted F tests. Educational and Psychological Measurement, 64, 224–242.
*Kreft, I. G. G. (1996). Are multilevel techniques necessary? An overview, including simulation studies. Unpublished manuscript, California State University, Los Angeles.
*Maas, C., & Hox, J. (2004). Robustness issues in multilevel regression analysis. Statistica Neerlandica, 58, 127–137.
*Maas, C. J., & Hox, J. J. (2005). Sufficient sample sizes for multilevel modeling. Methodology: European Journal of Research Methods for the Behavioral and Social Sciences, 1, 86–92.
*McNeish, D. M. (2014). Modeling sparsely clustered data: design-based, model-based, and single-level methods. Psychological Methods. DOI: 10.1037/met0000024.
*Meuleman, B., & Billiet, J. (2009). A Monte Carlo sample size study: how many countries are needed for accurate multilevel SEM? Survey Research Methods, 3, 45–58.
*Moineddin, R., Matheson, F. I., & Glazier, R. H. (2007). A simulation study of sample size for multilevel logistic regression models. BMC Medical Research Methodology, 7, 34.
*Mok, M. (1995). Sample size requirements for 2-level designs in educational research. Multilevel Modelling Newsletter, 7, 11–15.
Molenberghs, G., & Verbeke, G. (2004). Meaningful statistical model formulations for repeated measures. Statistica Sinica, 14, 989–1020.
*Paccagnella, O. (2011). Sample size and accuracy of estimates in multilevel models. Methodology: European Journal of Research Methods for the Behavioral and Social Sciences, 7, 111–120.
Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: applications and data analysis methods (2nd ed.). Thousand Oaks: Sage.
Satterthwaite, F. E. (1946). An approximate distribution of estimates of variance components. Biometrics Bulletin, 2, 110–114.
Savalei, V., & Kolenikov, S. (2008). Constrained versus unconstrained estimation in structural equation modeling. Psychological Methods, 13, 150–170.
*Scherbaum, C. A., & Ferreter, J. M. (2009). Estimating statistical power and required sample size for organizational research using multilevel modeling. Organizational Research Methods, 12, 347–367.
Searle, S. R., Casella, G., & McCulloch, C. E. (2006). Variance components. Hoboken: Wiley.
*Snijders, T., & Bosker, R. (1993). Standard errors and sample sizes for two-level research. Journal of Educational Statistics, 18, 237–259.
Snijders, T. A. B., & Bosker, R. J. (2012). Multilevel analysis: an introduction to basic and advanced multilevel modeling (2nd ed.). London: Sage.
Spilke, J., Piepho, H. P., & Hu, X. (2005). A simulation study on tests of hypotheses and confidence intervals for fixed effects in mixed models for blocked experiments with missing data. Journal of Agricultural, Biological, and Environmental Statistics, 10, 374–389.
*Stegmueller, D. (2013). How many countries for multilevel modeling? A comparison of frequentist and Bayesian approaches. American Journal of Political Science, 57, 748–761.
Stram, D. O., & Lee, J. W. (1994). Variance components testing in the longitudinal mixed effects model. Biometrics, 50, 1171–1177.
Van der Leeden, R., Busing, F., & Meijer, E. (1997, April). Applications of bootstrap methods for two-level models. Paper presented at the Multilevel Conference. Amsterdam.
Cite this article
McNeish, D.M., Stapleton, L.M. The Effect of Small Sample Size on Two-Level Model Estimates: A Review and Illustration. Educ Psychol Rev 28, 295–314 (2016). https://doi.org/10.1007/s10648-014-9287-x