Abstract
Post-hoc power estimates (power calculated for hypothesis tests after those tests have been performed) are sometimes requested by reviewers in an attempt to promote more rigorous designs. However, they should not be requested or reported, because they are logically invalid and practically misleading. We review the problems with post-hoc power, in particular the fact that the calculated power is a monotone function of the p value and therefore conveys no information beyond the p value itself. We then discuss situations that seem at first to call for post-hoc power analysis, such as deciding on the practical implications of a null finding, or determining whether the sample size available for a secondary data analysis is adequate for a proposed analysis, and consider alternative approaches to these goals. We make recommendations for practice in situations where clear recommendations are possible, and point out other situations in which further methodological research and discussion are required.
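The monotone relationship between post-hoc ("observed") power and the p value can be made concrete for the simplest case, a one-sided z-test with known variance: plugging the observed effect back in as if it were the true effect makes the resulting "power" a one-to-one, decreasing function of p. A minimal sketch of this calculation (the function name `posthoc_power_from_p` is ours, introduced for illustration, not from the paper):

```python
from statistics import NormalDist

_Z = NormalDist()  # standard normal distribution

def posthoc_power_from_p(p, alpha=0.05):
    """'Observed' (post-hoc) power for a one-sided z-test, obtained by
    treating the observed effect as if it were the true effect.
    It depends on the data only through the p value."""
    z_obs = _Z.inv_cdf(1.0 - p)       # test statistic implied by p
    z_crit = _Z.inv_cdf(1.0 - alpha)  # critical value at level alpha
    return _Z.cdf(z_obs - z_crit)

# Strictly decreasing in p, so it adds nothing beyond p itself:
powers = [posthoc_power_from_p(p) for p in (0.01, 0.05, 0.20, 0.50)]
assert powers == sorted(powers, reverse=True)

# A result exactly at the threshold (p = alpha) always gives
# observed power 0.5, whatever the data were.
assert abs(posthoc_power_from_p(0.05) - 0.5) < 1e-9
```

A notable consequence of this determinism is that any test landing exactly at p = α yields observed power of exactly 0.5, regardless of the study, which illustrates why reporting such a number alongside the p value is redundant.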
Acknowledgements
This research was supported in part by grant awards P50 DA010075 and P50 DA039838 from the National Institute on Drug Abuse (National Institutes of Health, United States). The content is solely the responsibility of the authors and does not necessarily represent the official views of the funding institutions mentioned above. The corresponding author thanks Dr. Joseph L. Schafer for a helpful conversation on this topic, and also thanks Dr. J. Timothy Cannon for review and feedback. The authors thank Amanda Applegate for very helpful editorial review, and also thank the anonymous reviewers for substantially improving the paper. On behalf of all authors, the corresponding author states that there are no conflicts of interest.
Ethics declarations
Ethical Approval
This article does not contain any studies with human participants or animals performed by any of the authors.
Informed Consent
Informed consent was not required for this study.
Cite this article
Dziak, J.J., Dierker, L.C. & Abar, B. The interpretation of statistical power after the data have been gathered. Curr Psychol 39, 870–877 (2020). https://doi.org/10.1007/s12144-018-0018-1