Summary
In classical statistics, the significance of comparisons (e.g., θ1 − θ2) is calibrated using the Type 1 error rate, relying on the assumption that the true difference is zero, which makes no sense in many applications. We set up a more relevant framework in which a true comparison can be positive or negative, and, based on the data, you can state “θ1 > θ2 with confidence,” “θ2 > θ1 with confidence,” or “no claim with confidence.” We focus on the Type S (for sign) error, which occurs when you claim “θ1 > θ2 with confidence” when θ2 > θ1 (or vice versa). We compute the Type S error rates for classical and Bayesian confidence statements and find that classical Type S error rates can be extremely high (up to 50%). Bayesian confidence statements are conservative, in the sense that claims based on 95% posterior intervals have Type S error rates between 0 and 2.5%. For multiple comparison situations, the conclusions are similar.
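The contrast between the two kinds of confidence statement can be illustrated with a small simulation. The sketch below is not taken from the article: it assumes a simple normal model in which the true difference θ = θ1 − θ2 is drawn from N(0, τ²) and a single estimate y is observed with known standard error σ. The function name `type_s_rates` and the parameters `tau`, `sigma`, `n_sims`, and `seed` are hypothetical names chosen for this illustration.

```python
import numpy as np


def type_s_rates(tau, sigma=1.0, n_sims=200_000, seed=0):
    """Monte Carlo Type S error rates under an ASSUMED normal model.

    Assumed model (for illustration only):
        theta ~ N(0, tau^2)        true difference theta1 - theta2
        y | theta ~ N(theta, sigma^2)   observed estimate of the difference
    Returns (classical_rate, bayes_rate); a rate is NaN if no claims were made.
    """
    rng = np.random.default_rng(seed)
    theta = rng.normal(0.0, tau, n_sims)
    y = rng.normal(theta, sigma)
    z = 1.959964  # two-sided 95% normal quantile

    # Classical claim: the 95% confidence interval y +/- z*sigma excludes zero.
    classical_claim = np.abs(y) > z * sigma

    # Bayesian claim: the 95% posterior interval excludes zero.
    # Under the assumed model the posterior of theta given y is normal with
    # mean shrink*y and variance shrink*sigma^2, shrink = tau^2/(tau^2+sigma^2).
    shrink = tau**2 / (tau**2 + sigma**2)
    post_mean = shrink * y
    post_sd = np.sqrt(shrink) * sigma
    bayes_claim = np.abs(post_mean) > z * post_sd

    # A Type S error: a claim is made and the claimed sign is wrong.
    # (The posterior mean has the same sign as y, so one mask serves both.)
    wrong_sign = np.sign(y) != np.sign(theta)

    def rate(claim):
        # Error rate conditional on a claim having been made.
        return float(wrong_sign[claim].mean()) if claim.any() else float("nan")

    return rate(classical_claim), rate(bayes_claim)


if __name__ == "__main__":
    for tau in (0.1, 0.5, 1.0, 2.0):
        classical, bayes = type_s_rates(tau)
        print(f"tau={tau}: classical={classical:.3f}, bayes={bayes:.3f}")
```

Under these assumptions, the classical rate climbs toward 50% as τ becomes small relative to σ, while the Bayesian procedure then makes very few claims (possibly none in a finite simulation, in which case the reported rate is NaN) and keeps its conditional error rate below 2.5%.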
Additional information
We thank David H. Krantz, the editor, and two referees for helpful comments. This work was supported in part by the U.S. National Science Foundation grant SBR-9708424 and Young Investigator Award DMS-9796129. The second author is a research assistant for the Fund of Scientific Research — Flanders.
About this article
Cite this article
Gelman, A., Tuerlinckx, F. Type S error rates for classical and Bayesian single and multiple comparison procedures. Computational Statistics 15, 373–390 (2000). https://doi.org/10.1007/s001800000040
DOI: https://doi.org/10.1007/s001800000040