Journal of Clinical and Translational Research 2017; 3(S2): 1-4
Journal of Clinical and Translational Research
Journal homepage: http://www.jctres.com/en/home
EDITORIAL
The need for reporting negative results – a 90 year update
Brian D. Earp
Ethics advisory editor
Departments of Psychology and Philosophy, Yale University, New Haven, Connecticut, United States
brian.earp@gmail.com
In January of 1927, Dr. Richard D. Mudd of Detroit published a letter in the Journal of the American Medical Association, seeking to vindicate his grandfather, Dr. Samuel A. Mudd,
against charges of conspiring in a murder [1]. The victim was
U.S. President Abraham Lincoln; the murderer, actor John
Wilkes Booth (see Appendix). In this editorial, I, an erstwhile
actor, would like to vindicate my own grandfather, Dr. John
Rosslyn Earp, for a letter he published on the same day, just
one column over, in the very same issue of the journal [2]. But
I mean “vindicate” in its other sense—to prove correct—as we
shall see.
Figure 1. Photograph of John Rosslyn Earp, taken circa 1930
I never knew my grandfather. He died in 1941 at the age of
49, more than four decades before I was born. My father, his
son, hardly knew him either: he was only 7 when “Ros” passed
away from longstanding health problems, leaving him and his
siblings to the care of their mother. I had been told that
Grandpa Earp—no relation to Wyatt—was at one point the
Director of Public Health for the State of New Mexico [3]. I
knew that he’d emigrated from somewhere in England around
the turn of the last century. That, and an impression I had from
an old photographic proof balanced atop a bookcase in my
childhood home, was about it (Figure 1).
In 2013, I took a break from my acting career to study the
history and philosophy of science at the University of Cambridge.¹ My preoccupation at the time, which has not abated,
was the public and professional “crisis of confidence” affecting, among other fields, medicine and social psychology [4-6].
The term “crisis of confidence” refers to the “unprecedented
level of doubt” experienced by many contemporary scientists
about the reliability of reported findings in the literature [7].
Why all the doubt? There are several reasons. Anonymous
surveys of practicing scientists have shown widespread use of
“questionable research practices,” including “p-hacking,”
selective reporting of measures or outcomes, and HARKing
(hypothesizing after the results are known), all of which increase the likelihood of generating Type I (false-positive) errors [8-11].
Moreover, critiques have been raised about the reward structure of science, which favors non-stop “productivity” and headline-grabbing conclusions over painstaking methodology
[12-15]. And a series of high-profile apparent failures to replicate major findings from prior studies has sent shockwaves
through the scientific community [16,17].
All of this has combined to create a sense of genuine worry:
how much of what we think we know do we actually know?
Controversially, at least one prominent meta-scientist, John
Ioannidis, has estimated that “most published research findings
are false” [18].
The hardest-hit field seems to be psychology (which to its
credit has also taken up the vanguard for reform) [19,20], with
biomedicine and related disciplines trailing not so far behind
[21-23]. Since I had studied the former subject as an undergraduate student, I was familiar with an eerily similar crisis in
that field from the 1970s, as a result of which leading practitioners sought to root out problems in the way they conducted,
evaluated, and published their empirical research [24]. One of
the biggest problems to get spotlight treatment was the failure
of most journals to publish “negative” results.
¹ After I arrived, I got a phone call from my father. “You know, Brian, now
that I think about it, I seem to remember that your grandpa used to be a student
at Cambridge, too, before he came to America.” Sure enough, an email sent to
a university archivist resulted in a record for John Rosslyn Earp: he had been
at St John's, the college right next door to where I was studying at Trinity, almost exactly a century before.
In a now-famous article published in 1975, Professor Anthony Greenwald, then of Ohio State University, discussed
what he called the “Consequences of prejudice against the null
hypothesis” [25]. As he wrote, the lack of a dependable
“home” for negative findings creates “a dysfunctional research-publication system.” Not only are there “relatively few
publications on problems for which the null hypothesis is (at
least to a reasonable approximation) true,” but, even among
those, “a high proportion will erroneously reject the null hypothesis.”
In short, Greenwald identified what is now termed “publication bias” in favor of “statistically significant” findings—a
bias that has featured prominently in contemporary discussions
about the potential causes of the so-called “replication crisis”
[26–28].
The idea is simple. If 20 labs, say, run essentially the same
experiment, and only one of them gets it to “work,” chances
are good that the apparent finding from this one “lucky” lab is
actually a statistical fluke. But since journals—and especially
high-impact journals—have had a historical tendency to publish only positive findings, it is this probably-a-fluke result that
will end up enshrined in the scientific record [29].
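The arithmetic behind this example can be sketched in a few lines of Python. The sketch assumes, purely for illustration, that all 20 labs are testing an effect that does not exist, that their tests are independent, and that each lab uses the conventional significance threshold of 0.05. The function name and the threshold are assumptions introduced here, not part of the original example.

```python
def prob_at_least_one_false_positive(n_labs: int, alpha: float = 0.05) -> float:
    """Chance that at least one of n_labs independent tests of a true
    null hypothesis comes out "significant" at level alpha."""
    # Each lab avoids a false positive with probability (1 - alpha);
    # all n_labs avoid one with probability (1 - alpha) ** n_labs.
    return 1.0 - (1.0 - alpha) ** n_labs


n_labs = 20          # hypothetical number of labs, as in the example above
alpha = 0.05         # assumed significance threshold

p_any = prob_at_least_one_false_positive(n_labs, alpha)
expected_flukes = n_labs * alpha  # expected number of "lucky" labs

print(f"P(at least one significant result) = {p_any:.2f}")   # prints 0.64
print(f"Expected number of false positives = {expected_flukes:.1f}")  # prints 1.0
```

Under these assumptions, at least one lab would get the experiment to “work” by chance alone about 64% of the time, and the expected number of such flukes is exactly one.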
The “negative” results, by contrast, from the 19 other labs in
our dummy example—or perhaps the 19 previous versions of
the same study from the original lab, recast as “pilot” experiments when they didn’t pan out—won’t typically be written up
and submitted, much less published in a prominent journal.
Instead, they get “filed away” in the researcher’s bottom
drawer (the so-called “file drawer” problem), never to be seen
again [30,31].
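The file-drawer mechanism can also be illustrated with a small simulation. The following is a minimal sketch, not a model of any real literature: it assumes a true effect of exactly zero, normally distributed data with unit variance, a simple two-sided z-test at the 0.05 level, and a hypothetical rule that only significant studies get published. All names and parameter values are illustrative assumptions.

```python
import random
import statistics


def simulate_file_drawer(n_studies=2000, n_per_group=30, z_crit=1.96, seed=1927):
    """Simulate studies of a null effect and 'publish' only the significant ones."""
    rng = random.Random(seed)
    # Standard error of a difference between two group means (variance 1 each).
    se = (2.0 / n_per_group) ** 0.5
    all_effects, published = [], []
    for _ in range(n_studies):
        treated = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        diff = statistics.fmean(treated) - statistics.fmean(control)
        all_effects.append(diff)
        if abs(diff) / se > z_crit:  # two-sided z-test; the rest go in the drawer
            published.append(diff)
    return all_effects, published


all_effects, published = simulate_file_drawer()
mean_abs_all = statistics.fmean([abs(d) for d in all_effects])
mean_abs_pub = statistics.fmean([abs(d) for d in published])

print(f"Studies run:       {len(all_effects)}")
print(f"Studies published: {len(published)}")
print(f"Mean |effect|, all studies:       {mean_abs_all:.3f}")
print(f"Mean |effect|, published studies: {mean_abs_pub:.3f}")
```

In this sketch only a small fraction of studies clear the threshold, yet every one of them reports a sizeable effect that does not exist, so the average published effect is much larger than the average across all studies that were actually run.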
The literature, then, gets skewed in the direction of impressive-looking errors, which, for obvious reasons, can’t be replicated later on. In a clinical context, this “skew” may have serious ethical implications for the protection of patient health
and well-being. As the editor-in-chief of this journal notes,
“selective publication [of] trials can skew the apparent
risk-benefit ratio of the drug towards the latter and generate an
unrealistic bias, thereby potentially slanting the accuracy of
evidence-based medicine” [32].
Needless to say, medical treatments need to be based on
accurate research. Basing them on something else is not only
unethical (because of the unjustified risk it poses to patients
and study participants); it is also an extraordinary waste of
resources [33]. Selectively publishing “positive” findings
makes these problems worse.
So what can be done? In the course of researching this issue,
I stumbled across a paper with a pertinent title that I thought
might offer a solution: “The Need for Reporting Negative Results.” The source? Journal of the American Medical Association—volume 88, number 2. The year? 1927. The author? J. R.
Earp, my grandfather [2].
I had no idea he had ever written on the subject (to speak of
chills and spines is to get it right). What follows then is his
prophetic letter in full, with a few minor edits for ease of
reading:
To the Editor:—One of the things we practitioners
sometimes neglect is the reporting of failures. In THE
JOURNAL, Oct. 2, 1926, Dr. Richard L. Sutton, with
proper scientific reserve, reported the treatment of six
consecutive cases of warts with intramuscular injections
of sulpharsphenamine. As a result of this communication,
I venture to guess that not less than a hundred physicians,
perhaps several hundred, injected sulpharsphenamine
into patients with warts. Supposing that 99 per cent get
negative results, what happens? Each of them gives up
the method as a failure and does not say anything more
about it, and the treatment remains on record as an undisputed success. Possibly 1 per cent who meet with success will communicate with Dr. Sutton, so that by and by
he will have quite an impressive series of cases, comparable with the mercurochrome successes published in a
recent number of THE JOURNAL. …
To practice what I am preaching, let me now report
that on November 30, I injected 0.4 g of sulpharsphenamine [into] the left buttock of E. M. B., a girl, aged 18,
who was at that date complaining of the presence of
twenty-four warts distributed mostly over the hands and
arms. At the present date, there are twenty-eight warts,
and evidence of regressive changes in the original twenty-four has not been seen.
The problem is plain to see; the “need for reporting negative
results” is equally apparent [34]. But one-off letters to the editor by conscientious doctors like my grandfather will not suffice to address the root of the problem. What is needed is
top-down leadership from journals themselves: not only passively allowing for the submission of negative findings, but
actively welcoming them and even seeking them out. In fact, it
should be no harder to publish a high-quality study with “null”
results—including unsuccessful attempts at replication—than a
high-quality study that purports to show an effect.
There are some signs of progress. Articles with “replication”
in the title are now being published on a regular basis [35–42];
there is even a dedicated Journal of Articles in Support of the
Null Hypothesis (although it is not especially well-known).
But there is still a lot of room for improvement. In a recent
review of 1151 journals, researchers found that only 3% explicitly stated that they accepted replications; 63% did not state
as much but also did not discourage them; 33% discouraged
them implicitly by stressing novelty in solicited submissions;
and 1% actively frowned on replications by stating that they
did not publish them [43].
Against this backdrop, where does the Journal of Clinical
and Translational Research (JCTR) stand? In the founding
editorial for this journal, the editor states that JCTR encourages the publication of negative results for two main reasons in
addition to counteracting the “skewing” problem already mentioned [32]:
(1) publication of negative data, especially when obtained in a technically sound study … provides cues
as to why a certain procedure or process did not work
and steers research efforts away from failure. In that
sense, something not working can be considered ‘part’
of the mechanism.
(2) negative results prevent colleagues from conducting redundant work, saving animals and valuable
resources in the process. An expedient trajectory to
the clinical setting, during which redundancy is minimized, is ultimately beneficial for everyone involved
in translational and clinical research as well as the
target group (i.e., patients).
It is with these points in mind that I am happy to introduce,
on behalf of my co-editors Emma Bruns and Michal Heger—
as well as the entire journal staff—this special issue dedicated
entirely to the publication of negative results. Though I never
had a chance to meet him, something tells me Grandpa would
be proud.
References
[1] Mudd RD. Dr. Mudd and the death of Lincoln. JAMA.
1927;88:119.
[2] Earp JR. The need for reporting negative results. JAMA.
1927;88:119.
[3] Editor. News from the field. Am J Public Health. 1937;27:755–758.
[4] Baker M. Is there a reproducibility crisis? Nature. 2016;533:452–
454.
[5] Earp BD, Trafimow D. Replication, falsification, and the crisis of
confidence in social psychology. Front Psychol. 2015;6:1–11.
[6] Nosek BA, Errington TM. Making sense of replications. eLife.
2017;6:e23383.
[7] Pashler H, Wagenmakers E. Editors’ introduction to the special
section on replicability in psychological science: a crisis of
confidence? Perspect Psychol Sci. 2012;7:528–530.
[8] John LK, Loewenstein G, Prelec D. Measuring the prevalence of
questionable research practices with incentives for truth telling.
Psychol Sci. 2012;23:524–532.
[9] Kerr NL. HARKing: hypothesizing after the results are known.
Personal Soc Psychol Rev. 1998;2:196–217.
[10] Head ML, Holman L, Lanfear R, Kahn AT, Jennions MD. The
extent and consequences of p-hacking in science. PLOS Biol.
2015;13:e1002106.
[11] Trafimow D, Earp BD. Null hypothesis significance testing and
Type I error: the domain problem. New Ideas in Psychology.
2017;45:19-27.
[12] Nosek BA, Spies JR, Motyl M. Scientific utopia II: restructuring
incentives and practices to promote truth over publishability.
Perspect Psychol Sci. 2012;7:615–631.
[13] Munafò MR, Nosek BA, Bishop DVM, Button KS, Chambers CD,
Sert NP du, Simonsohn U, Wagenmakers E, Ware JJ, Ioannidis JPA.
A manifesto for reproducible science. Nat Hum Behav. 2017;1:1–9.
[14] Everett JAC, Earp BD. A tragedy of the (academic) commons:
interpreting the replication crisis in psychology as a social dilemma
for early-career researchers. Front Psychol. 2015;6:1–4.
[15] Earp BD. The unbearable asymmetry of bullshit. Health Watch.
2016;Spring(101):4–5.
[16] Yong E. Replication studies: bad copy. Nat News.
2012;485:298–300.
[17] Earp BD. What did the OSC replication initiative reveal about the
crisis in psychology? BMC Psychol. 2016;4:1–19.
[18] Ioannidis JPA. Why most published research findings are false.
PLoS Med. 2005;2:e124.
[19] Chambers C. The changing face of psychology. The Guardian. 2014
Jan 24. https://www.theguardian.com/science/head-quarters/2014/jan/24/the-changing-face-of-psychology
[20] LeBel EP, Vanpaemel W, McCarthy RJ, Earp BD, Elson M. A
unified framework to quantify the trustworthiness of empirical
research. PsyArXiv. 2017; https://osf.io/preprints/psyarxiv/uwmr8
[21] Engber D. Cancer research is broken. Slate. 2016 Apr 19.
http://www.slate.com/articles/health_and_science/future_tense/2016/04/biomedicine_facing_a_worse_replication_crisis_than_the_one_plaguing_psychology.html
[22] Collins FS, Tabak LA. NIH plans to enhance reproducibility.
Nature. 2014;505:612–613.
[23] Lose G, Klarskov N. Why published research is untrustworthy.
Int Urogynecol J. 2017; in press.
[24] Elms AC. The crisis of confidence in social psychology. Am
Psychol. 1975;30:967–976.
[25] Greenwald AG. Consequences of prejudice against the null
hypothesis. Psychol Bull. 1975;82:1–20.
[26] Easterbrook PJ, Gopalan R, Berlin JA, Matthews DR. Publication
bias in clinical research. The Lancet. 1991;337:867–872.
[27] Francis G. Replication, statistical consistency, and publication bias.
J Math Psychol. 2013;57:153–69.
[28] Bakker M, van Dijk A, Wicherts JM. The rules of the game called
psychological science. Perspect Psychol Sci. 2012;7:543–554.
[29] Earp BD, Wilkinson D. The publication symmetry test: a simple
editorial heuristic to combat publication bias. J Clin Transl Res.
2017; 3: in press.
[30] Rosenthal R. The file drawer problem and tolerance for null results.
Psychol Bull. 1979;86:638–41.
[31] Pautasso M. Worsening file-drawer problem in the abstracts of
natural, medical and social science databases. Scientometrics.
2010;85:193–202.
[32] Heger M. Editor’s inaugural issue foreword: perspectives on
translational and clinical research. J Clin Transl Res. 2015;1:1–5.
[33] Glasziou P, Altman DG, Bossuyt P, Boutron I, Clarke M, Julious S,
Michie S, Moher D, Wager E. Reducing waste from incomplete or
unusable reports of biomedical research. The Lancet. 2014;383:267–276.
[34] Earp BD, Everett JAC. How to fix psychology’s replication crisis.
The Chronicle of Higher Education. 2015 Oct 25.
http://www.chronicle.com/article/How-to-Fix-psychologys/233857
[35] Boekel W, Wagenmakers EJ, Belay L, Verhagen J, Brown S,
Forstmann BU. A purely confirmatory replication study of
structural brain-behavior correlations. Cortex. 2015;66:115–133.
[36] Bostyn DH, Roets A. Trust, trolleys and social dilemmas: a
replication study. J Exp Psychol Gen. 2017;146:e1–7.
[37] Castro VM, Kong SW, Clements CC, Brady R, Kaimal AJ, Doyle
AE, Robinson EB, Churchill SE, Kohane IS, Perlis RH. Absence of
evidence for increase in risk for autism or attention-deficit
hyperactivity disorder following antidepressant exposure during
pregnancy: a replication study. Transl Psychiatry. 2016;6:e708.
[38] Earp BD, Everett JAC, Madva EN, Hamlin JK. Out, damned spot:
Can the “Macbeth Effect” be replicated? Basic Appl Soc Psychol.
2014;36:91–98.
[39] Radke S, de Bruijn ERA. Does oxytocin affect mind-reading? A
replication study. Psychoneuroendocrinology. 2015;60:75–81.
[40] Renes RA, van der Weiden A, Prikken M, Kahn RS, Aarts H, van
Haren NEM. Abnormalities in the experience of self-agency in
schizophrenia: a replication study. Schizophr Res. 2015;164:210–
213.
[41] Simeoni S, Hannah R, Daisuke S, Kawakami M, Gigli GL,
Rothwell JC. Effects of quadripulse stimulation on human motor
cortex excitability: a replication study. Brain Stimul. 2016;9:148–
150.
[42] Gil-Gómez de Liaño B, Stablum F, Umiltà C. Can concurrent
memory load reduce distraction? A replication study and beyond. J
Exp Psychol Gen. 2016;145:e1.
[43] Martin GN, Clarke RM. Are psychology journals anti-replication?
A snapshot of editorial practices. Front Psychol. 2017;8:1–6.
APPENDIX
Letter from Dr. Mudd. JAMA. 1927;88:119.
Distributed under creative commons license 4.0
DOI: http://dx.doi.org/10.18053/jctres.03.2017S2.001