Things I Have Learned (So Far)
ABSTRACT: This is an account of what I have learned (so far) about the application of statistics to psychology and the other sociobiomedical sciences. It includes the principles "less is more" (fewer variables, more highly targeted issues, sharp rounding off), "simple is better" (graphic representation, unit weighting for linear composites), and "some things you learn aren't so." I have learned to avoid the many misconceptions that surround Fisherian null hypothesis testing. I have also learned the importance of power analysis and the determination of just how big (rather than how statistically significant) are the effects that we study. Finally, I have learned that there is no royal road to statistical induction, that the informed judgment of the investigator is the crucial element in the interpretation of data, and that things take time.

What I have learned (so far) has come from working with students and colleagues, from experience (sometimes bitter) with journal editors and review committees, and from the writings of, among others, Paul Meehl, David Bakan, William Rozeboom, Robyn Dawes, Howard Wainer, Robert Rosenthal, and more recently, Gerd Gigerenzer, Michael Oakes, and Leland Wilkinson. Although they are not always explicitly referenced, many of you will be able to detect their footprints in what follows.
Some Things You Learn Aren't So

One of the things I learned early on was that some things you learn aren't so. In graduate school, right after World War II, I learned that for doctoral dissertations and most other purposes, when comparing groups, the proper sample size is 30 cases per group. The number 30 seems to have arisen from the understanding that with fewer than 30 cases, you were dealing with "small" samples that required specialized handling with "small-sample statistics" instead of the critical-ratio approach we had been taught. Some of us knew about these exotic small-sample statistics; in fact, one of my fellow doctoral candidates undertook a dissertation, the distinguishing feature of which was a sample of only 20 cases per group, so that he could demonstrate his prowess with small-sample statistics. It wasn't until some years later that I discovered (mind you, not invented) power analysis, one of whose fruits was the revelation that for a two-independent-group-mean comparison with n = 30 per group at the sanctified two-tailed .05 level, the probability that a medium-sized effect would be labeled as significant by the most modern methods (a t test) was only .47. Thus, it was approximately a coin flip whether one would get a significant result, even though, in reality, the effect size was meaningful. My n = 20 friend's power was rather worse (.33), but of course he couldn't know that, and he ended up with nonsignificant results, with which he proceeded to demolish an important branch of psychoanalytic theory.
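The power figures cited here can be checked directly from the noncentral t distribution. The sketch below (Python with SciPy; not part of the original article) assumes a two-tailed two-sample t test at the .05 level and a medium effect of d = .5; the helper function two_sample_power is mine, and the exact values it prints differ from the tabled .47 and .33 by about a point.

    # Power of a two-independent-group t test (two-tailed, alpha = .05)
    # for a medium effect (d = .5), checking the .47 and .33 figures above.
    from math import sqrt
    from scipy import stats

    def two_sample_power(d, n_per_group, alpha=0.05):
        df = 2 * n_per_group - 2                 # error degrees of freedom
        ncp = d * sqrt(n_per_group / 2)          # noncentrality parameter
        t_crit = stats.t.ppf(1 - alpha / 2, df)  # two-tailed critical t
        # power = P(|T| > t_crit) when T follows the noncentral t
        return stats.nct.sf(t_crit, df, ncp) + stats.nct.cdf(-t_crit, df, ncp)

    print(two_sample_power(0.5, 30))  # ~0.48, within a point of the .47 in the text
    print(two_sample_power(0.5, 20))  # ~0.34, within a point of the .33 in the text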
Less Is More

One thing I learned over a long period of time that is so is the validity of the general principle that less is more, except of course for sample size (Cohen & Cohen, 1983, pp. 169-171). I have encountered too many studies with prodigious numbers of dependent variables, or with what seemed to me far too many independent variables, or (heaven help us) both.

In any given investigation that isn't explicitly exploratory, we should be studying few independent variables and even fewer dependent variables, for a variety of reasons.

If all of the dependent variables are to be related to all of the independent variables by simple bivariate analyses or multiple regression, the number of hypothesis tests that will be performed willy-nilly is at least the product of the sizes of the two sets. Using the .05 level for many tests escalates the experimentwise Type I error rate, or, in plain English, greatly increases the chances of discovering things that aren't so. If, for example, you study 6 dependent and 10 independent variables and should find that your harvest yields 6 asterisks, you know full well that if there were no real associations in any of the 60 tests, the chance of getting one or more "significant" results is quite high (something like 1 - .95^60, which equals, coincidentally, .95), and that you would expect three spuriously significant results on the average. You then must ask yourself some embarrassing questions, such as, Well, which three are real?, or even, Is six significant significantly more than the chance-expected three? (It so happens that it isn't.)
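The arithmetic behind that parenthetical can be made concrete. The short sketch below (Python with SciPy; my addition, not the article's) assumes the 60 tests are independent, which the 1 - .95^60 figure also implicitly assumes:

    # Sixty independent tests at alpha = .05 when every null hypothesis is true.
    from scipy import stats

    n_tests, alpha = 60, 0.05
    print(1 - (1 - alpha) ** n_tests)         # ~0.95: chance of at least one "significant" result
    print(n_tests * alpha)                    # 3.0: expected number of spurious asterisks
    # Is a harvest of 6 asterisks significantly more than the chance-expected 3?
    print(stats.binom.sf(5, n_tests, alpha))  # P(X >= 6) ~ 0.08, i.e., not significant at .05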
And of course, as you've probably discovered, you're not likely to solve your multiple tests problem with the Bonferroni maneuver. Dividing .05 by 60 sets a per-test significance criterion of .05/60 = 0.00083, and therefore a critical two-sided t value of about 3.5. The effects you're dealing with may not be large enough to produce any interesting ts that high, unless you're lucky.
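The critical value mentioned here is easy to verify; the sketch below (again Python with SciPy, not the article's) assumes a moderate error df of 58, and the exact figure shifts a little as the df changes:

    # Bonferroni-corrected critical t for 60 tests at a familywise alpha of .05
    from scipy import stats

    alpha_per_test = 0.05 / 60                       # = 0.00083
    df = 58                                          # assumed error df (e.g., n = 30 per group)
    print(stats.t.ppf(1 - alpha_per_test / 2, df))   # ~3.5, the critical two-sided t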
Nor can you find salvation by doing six stepwise multiple regressions on the 10 independent variables. The amount of capitalization on chance that this entails is more than I know how to compute, but certainly more than would a simple harvest of asterisks for 60 regression coefficients (Wilkinson, 1990, p. 481).

In short, the results of this humongous study are a muddle. There is no solution to your problem. You