The Effect of Small Sample Size on Two-Level Model Estimates: A Review and Illustration

McNeish, Daniel M.; Stapleton, Laura M.

doi:10.1007/s10648-014-9287-x

Review Article
Published: 19 October 2014

Volume 28, pages 295–314, (2016)
Educational Psychology Review

Daniel M. McNeish¹ &
Laura M. Stapleton¹

Abstract

Multilevel models are an increasingly popular method to analyze data that originate from a clustered or hierarchical structure. To effectively utilize multilevel models, one must have an adequately large number of clusters; otherwise, some model parameters will be estimated with bias. The goals for this paper are to (1) raise awareness of the problems associated with a small number of clusters, (2) review previous studies on multilevel models with a small number of clusters, (3) provide an illustrative simulation to demonstrate how a simple model becomes adversely affected by small numbers of clusters, (4) provide researchers with remedies if they encounter clustered data with a small number of clusters, and (5) outline methodological topics that have yet to be addressed in the literature.

Notes

On a more technical note, population values for standard errors cannot be directly set within a simulation design, so other values must be used to assess the variability of estimates in the population (of which the standard error is an estimate). Although there are different values that can be used for such a purpose, the prevailing technique in the reviewed studies was to use the variability of the parameter estimates across replications. Because the same technique was implemented across studies, it is reasonable to compare these values across studies.
It is important to note again that the level-2 variance component will be included in the calculation of ICC values. The level-1 variance component is calculated differently than with continuous outcomes, and Goldstein, Browne, and Rasbash (2002) discuss four methods for its calculation. Most commonly, $\frac{\pi^2}{3}$ (about 3.29) is substituted for the level-1 variance, since this is the variance of the logistic distribution when the scale is set to 1 with a location of 0. Other methods include simulation and Taylor series expansions. The problems associated with misestimated ICC values are the same as those presented with continuous outcomes.
A diffuse inverse gamma prior is commonly used when a researcher wants to utilize Bayesian methods but wants to limit the impact of the prior distribution on the posterior distribution. This is the default prior distribution for variances in more user-friendly Bayesian software programs such as Mplus.
If model comparison is undertaken and models differ with respect to fixed effects, then FML must be used. Otherwise, the deviance will not be calculated appropriately with REML.
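The ICC substitution described in the note on binary outcomes can be sketched in a few lines. This is only an illustration of the most common approach mentioned there, where $\frac{\pi^2}{3}$ stands in for the level-1 variance; the function name `latent_icc` and the example variance of 1.0 are assumptions made for the sketch and do not come from the article.

```python
import math

# Variance of the standard logistic distribution (location 0, scale 1),
# which is substituted for the level-1 variance; about 3.29.
LOGISTIC_VARIANCE = math.pi ** 2 / 3


def latent_icc(level2_variance: float) -> float:
    """ICC for a two-level logistic model using the pi^2/3 substitution.

    level2_variance is the estimated level-2 (between-cluster) variance
    component. The name and interface are hypothetical.
    """
    if level2_variance < 0:
        raise ValueError("a variance component cannot be negative")
    return level2_variance / (level2_variance + LOGISTIC_VARIANCE)


# Illustrative input only: a level-2 variance of 1.0 on the logit scale.
example_icc = latent_icc(1.0)
print(round(example_icc, 3))
```

Because the level-2 variance appears in both the numerator and the denominator, any bias in that variance estimate carries directly into the ICC, which is the point the note makes.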

References

References marked with an asterisk (*) were included in the review.

*Austin, P.C. (2010). Estimating multilevel logistic regression models when the number of clusters is low: a comparison of different statistical software procedures. The International Journal of Biostatistics, 6, Article 16.
*Baldwin, S.A., & Fellingham, G.W. (2013). Bayesian methods for the analysis of small sample multilevel data with a complex variance structure. Psychological Methods, 18, 151–164.
Bell, B., Ene, M., Smiley, W., & Schoeneberger, J. (2013). A multilevel primer using SAS Proc Mixed, SAS Global Forum.
Bell, Schoeneberger, Smiley, Ene, and Leighton (2013). Doubly diminishing returns: an empirical investigation on the impact of sample size and predictor prevalence on point and interval estimates in two-level linear models. Paper presented at the Modern Modeling Methods Conference (M3). Storrs.
*Bell, B.A., Morgan, G.B., Schoeneberger, J.A., Kromrey, J.D., & Ferron, J.M. (2014). How low can you go? An investigation of the influence of sample size and model complexity on point and interval estimates in two-level linear models. Methodology: European Journal of Research Methods for the Behavioral and Social Sciences, 10, 1–11.
Bradley, J. V. (1978). Robustness? British Journal of Mathematical and Statistical Psychology, 31, 144–152.
*Browne, W.J., & Draper, D. (2006). A comparison of Bayesian and likelihood-based methods for fitting multilevel models. Bayesian Analysis, 1, 473–514.
Butar, F. B., & Lahiri, P. (2003). On measures of uncertainty of empirical Bayes small-area estimators. Journal of Statistical Planning and Inference, 112, 63–76.
*Clarke, P. (2008). When can group level clustering be ignored? Multilevel models versus single level models with sparse data. Journal of Epidemiology and Community Health, 62, 752–758.
*Cohen, J. (1998). Determining sample sizes for surveys with data analyzed by hierarchical linear models. Journal of Official Statistics, 14, 267–275.
Dedrick, R. F., Ferron, J. M., Hess, M. R., Hogarty, K. Y., Kromrey, J. D., & Lee, R. (2009). Multilevel modeling: a review of methodological issues and applications. Review of Educational Research, 79, 69–102.
*Ferron, J.M., Bell, B.A., Hess, M.R., Rendina-Gobioff, G., & Hibbard, S.T. (2009). Making treatment effect inferences from multiple-baseline data: the utility of multilevel modeling approaches. Behavior Research Methods, 41, 372–384.
Fitzmaurice, G. M., Laird, N. M., & Ware, J. H. (2012). Applied longitudinal analysis. Hoboken: Wiley.
Gardiner, J. C., Luo, Z., & Roman, L. A. (2009). Fixed effects, random effects and GEE: what are the differences? Statistics in Medicine, 28, 221–239.
Gelman, A. (2002). Prior distribution. Encyclopedia of Environmetrics.
Gelman, A. (2006). Prior distributions for variance parameters in hierarchical models (comment on article by Browne and Draper). Bayesian Analysis, 1, 515–534.
Gelman, A., Carlin, J. B., Stern, H. S., Dunson, D. B., Vehtari, A., & Rubin, D. B. (2013). Bayesian data analysis (3rd ed.). Boca Raton: CRC press.
Goldstein, H., Browne, W., & Rasbash, J. (2002). Partitioning variation in multilevel models. Understanding Statistics: Statistical Issues in Psychology, Education, and the Social Sciences, 1, 223–231.
González-Manteiga, W., Lombardía, M. J., Molina, I., Morales, D., & Santamaría, L. (2007). Estimation of the mean squared error of predictors of small area linear parameters under a logistic mixed model. Computational Statistics and Data Analysis, 51, 2720–2733.
Halekoh, U., & Højsgaard, S. (2012). pbkrtest: parametric bootstrap and Kenward Roger based methods for mixed model comparison. URL http://cran.r-project.org/web/packages/pbkrtest/pbkrtest.pdf [accessed on 14 March 2014].
Hedges, L. V., & Hedberg, E. C. (2007). Intraclass correlation values for planning group randomized trials in education. Educational Evaluation and Policy Analysis, 29, 60–87.
Heo, M., & Leon, A. C. (2008). Statistical power and sample size requirements for three level hierarchical cluster randomized trials. Biometrics, 64, 1256–1262.
Hoogland, J. J., & Boomsma, A. (1998). Robustness studies in covariance structure modeling. An overview and a meta-analysis. Sociological Methods & Research, 26, 329–367.
Hox, J. J. (1998). Multilevel modeling: when and why. In I. Balderjahn, R. Mathar, & M. Schader (Eds.), Classification, data analysis, and data highways (pp. 147–154). Berlin: Springer.
Hox, J. (2010). Multilevel analysis: techniques and applications (2nd ed.). Mahwah, NJ: Erlbaum.
Hox, J., van de Schoot, R., & Matthijsse, S. (2012). How few countries will do? Comparative survey analysis from a Bayesian perspective. Survey Research Methods, 6, 87–93.
Kenward, M. G., & Roger, J. H. (1997). Small sample inference for fixed effects from restricted maximum likelihood. Biometrics, 53, 983–997.
Kenward, M. G., & Roger, J. H. (2009). An improved approximation to the precision of fixed effects from restricted maximum likelihood. Computational Statistics and Data Analysis, 53, 2583–2595.
Kim, Y., Choi, Y. K., & Emery, S. (2013). Logistic regression with multiple random effects: a simulation study of estimation methods and statistical packages. The American Statistician, 67, 171–182.
*Konstantopoulos, S. (2010). Power analysis in two-level unbalanced designs. The Journal of Experimental Education, 78, 291–317.
Kowalchuk, R. K., Keselman, H. J., Algina, J., & Wolfinger, R. D. (2004). The analysis of repeated measurements with mixed-model adjusted F tests. Educational and Psychological Measurement, 64, 224–242.
*Kreft, I. G. G. (1996). Are multilevel techniques necessary? An overview, including simulation studies. Unpublished manuscript, California State University, Los Angeles.
*Maas, C., & Hox, J. (2004). Robustness issues in multilevel regression analysis. Statistica Neerlandica, 58, 127–137.
*Maas, C.J., & Hox, J.J. (2005). Sufficient sample sizes for multilevel modeling. Methodology: European Journal of Research Methods for the Behavioral and Social Sciences, 1, 86–92.
*McNeish, D.M. (2014). Modeling sparsely clustered data: design-based, model-based, and single-level methods. Psychological Methods. DOI: 10.1037/met0000024.
*Meuleman, B., & Billiet, J. (2009). A Monte Carlo sample size study: how many countries are needed for accurate multilevel SEM? Survey Research Methods, 3, 45–58.
*Moineddin, R., Matheson, F.I., & Glazier, R.H. (2007). A simulation study of sample size for multilevel logistic regression models. BMC Medical Research Methodology, 7, 34.
*Mok, M. (1995). Sample size requirements for 2-level designs in educational research. Multilevel Modelling Newsletter, 7, 11–15.
Molenberghs, G., & Verbeke, G. (2004). Meaningful statistical model formulations for repeated measures. Statistica Sinica, 14, 989–1020.
*Paccagnella, O. (2011). Sample size and accuracy of estimates in multilevel models. Methodology: European Journal of Research Methods for the Behavioral and Social Sciences, 7, 111–120.
Raudenbush, S. W., & Bryk, A. S. (2002). Hierarchical linear models: applications and data analysis methods (2nd ed.). Thousand Oaks: Sage.
Satterthwaite, F. E. (1946). An approximate distribution of the estimates of variance components. Biometrics, 2, 110–114.
Savalei, V., & Kolenikov, S. (2008). Constrained versus unconstrained estimation in structural equation modeling. Psychological Methods, 13, 150–170.
*Scherbaum, C. A., & Ferreter, J. M. (2009). Estimating statistical power and required sample size for organizational research using multilevel modeling. Organizational Research Methods, 12, 347–367.
Searle, S. R., Casella, G., & McCulloch, C. E. (2006). Variance components. Hoboken: Wiley.
*Snijders, T., & Bosker, R. (1993). Standard errors and sample sizes for two-level research. Journal of Educational Statistics, 18, 237–259.
Snijders, T. A. B., & Bosker, R. J. (2012). Multilevel analysis: an introduction to basic and advanced multilevel modeling (2nd ed.). London: Sage.
Spilke, J., Piepho, H. P., & Hu, X. (2005). A simulation study on tests of hypotheses and confidence intervals for fixed effects in mixed models for blocked experiments with missing data. Journal of Agricultural, Biological, and Environmental Statistics, 10, 374–389.
*Stegmueller, D. (2013). How many countries for multilevel modeling? A comparison of frequentist and Bayesian approaches. American Journal of Political Science, 57, 748–761.
Stram, D. O., & Lee, J. W. (1994). Variance components testing in the longitudinal mixed effects model. Biometrics, 50, 1171–1177.
Van der Leeden, R., Busing, F., & Meijer, E. (1997, April). Applications of bootstrap methods for two-level models. Paper presented at the Multilevel Conference. Amsterdam.

Author information

Authors and Affiliations

Measurement, Statistics, and Evaluation Program, Department of Human Development and Quantitative Methodology, University of Maryland, 1230 Benjamin Building, College Park, MD, 20742-1115, USA
Daniel M. McNeish & Laura M. Stapleton

Corresponding author

Correspondence to Daniel M. McNeish.

About this article

Cite this article

McNeish, D.M., Stapleton, L.M. The Effect of Small Sample Size on Two-Level Model Estimates: A Review and Illustration. Educ Psychol Rev 28, 295–314 (2016). https://doi.org/10.1007/s10648-014-9287-x

Published: 19 October 2014
Issue Date: June 2016
DOI: https://doi.org/10.1007/s10648-014-9287-x
