Abstract
Post-hoc power estimates (power calculated for hypothesis tests after those tests have been performed) are sometimes requested by reviewers in an attempt to promote more rigorous designs. However, they should not be requested or reported, because they are logically invalid and practically misleading. We review the problems with post-hoc power, in particular the fact that the calculated power is a monotone function of the p value and therefore conveys no information beyond the p value itself. We then discuss situations that seem at first to call for post-hoc power analysis, such as deciding on the practical implications of a null finding, or determining whether the sample size available for a secondary data analysis is adequate for a proposed analysis, and consider alternative approaches to these goals. We make recommendations for practice in situations where clear recommendations are possible, and point out other situations in which further methodological research and discussion are required.
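The monotone relationship between post-hoc ("observed") power and the p value can be made concrete for the simplest case, a one-sided z-test with known variance: plugging the observed effect back in as if it were the true effect makes the resulting "power" a one-to-one, decreasing function of p. A minimal sketch of this calculation (the function name `posthoc_power_from_p` is ours, introduced for illustration, not from the paper):

```python
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution

def posthoc_power_from_p(p, alpha=0.05):
    """'Observed' (post-hoc) power for a one-sided z-test, obtained by
    treating the observed effect as if it were the true effect.
    It depends on the data only through the p value."""
    z_obs = _Z.inv_cdf(1.0 - p)       # test statistic implied by p
    z_crit = _Z.inv_cdf(1.0 - alpha)  # critical value at level alpha
    return _Z.cdf(z_obs - z_crit)

# Strictly decreasing in p, so it adds nothing beyond p itself:
powers = [posthoc_power_from_p(p) for p in (0.01, 0.05, 0.20, 0.50)]
assert powers == sorted(powers, reverse=True)

# A result exactly at the threshold (p = alpha) always gives
# observed power 0.5, whatever the data were.
assert abs(posthoc_power_from_p(0.05) - 0.5) < 1e-9
```

A notable consequence of this determinism is that any test landing exactly at p = α yields observed power of exactly 0.5, regardless of the study, which illustrates why reporting such a number alongside the p value is redundant.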
Acknowledgements
This research was supported in part by grant awards P50 DA010075 and P50 DA039838 from the National Institute on Drug Abuse (National Institutes of Health, United States). The content is solely the responsibility of the authors and does not necessarily represent the official views of the funding institutions mentioned above. The corresponding author thanks Dr. Joseph L. Schafer for a helpful conversation on this topic, and also thanks Dr. J. Timothy Cannon for review and feedback. The authors thank Amanda Applegate for very helpful editorial review, and also thank the anonymous reviewers for substantially improving the paper. On behalf of all authors, the corresponding author states that there are no conflicts of interest.
Ethics declarations
Ethical Approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Informed Consent
Informed consent was not required for this study.
Cite this article
Dziak, J.J., Dierker, L.C. & Abar, B. The interpretation of statistical power after the data have been gathered. Curr Psychol 39, 870–877 (2020). https://doi.org/10.1007/s12144-018-0018-1