
A Short (Personal) Future History of Revolution 2.0

Spellman

Abstract

Crisis of replicability is one term that psychological scientists use for the current introspective phase we
are in—I argue instead that we are going through a revolution analogous to a political revolution.
Revolution 2.0 is an uprising focused on how we should be doing science now (i.e., in a 2.0 world). The
precipitating events of the revolution have already been well-documented: failures to replicate,
questionable research practices, fraud, etc. And the fact that none of these events is new to our field
has also been well-documented. I suggest four interconnected reasons as to why this time is different:
changing technology, changing demographics of researchers, limited resources, and misaligned
incentives. I then describe two reasons why the revolution is more likely to catch on this time:
technology (as part of the solution) and the fact that these concerns cut across social and life sciences—
that is, we are not alone. Neither side in the revolution has behaved well, and each has characterized
the other in extreme terms (although, of course, each has had a few extreme actors). Some suggested
reforms are already taking hold (e.g., journals asking for more transparency in methods and analysis
decisions; journals publishing replications) but the feared tyrannical requirements have, of course, not
taken root (e.g., few journals require open data; there is no ban on exploratory analyses). Still, we have
not yet made needed advances in the ways in which we accumulate, connect, and extract conclusions
from our aggregated research. However, we are now ready to move forward by adopting incremental
changes and by acknowledging the multiplicity of goals within psychological science.

Keywords: methodology, replication, scientific practices, journal practices

It has been an interesting time to be Editor of Perspectives on Psychological Science. My term (2010 to
2015) has unexpectedly coincided with some major concerns about our science. This short personal
future history describes some of what I saw at the revolution and includes my predictions of what longer
pieces about the history of psychology will say in the future.

There is a war between the ones who say there is a war and the ones who say there isn’t. – Leonard
Cohen (“There Is a War”)

Psychological science is currently going through a major introspective stage. Some people call it a “crisis”
(of confidence or of replicability), and others deny that term is applicable. I call it a revolution. It is not a
revolution in the sense of the “cognitive revolution” or of a Kuhnian paradigm shift (Kuhn, 1962)
because it is not about the content of our science. Rather, it is about the values we hold as we
conceptualize, implement, analyze, and share our science. Because this revolution relies on creating
more open interaction between people and laboratories, and because how we do our science now so
heavily depends not only on individual computers but also on the Internet, I call what is currently
happening Revolution 2.0 (Spellman, 2012b, 2013a, 2013d).
I predict that any (good) future history of this revolution will not read like a history of a scientific
revolution. Instead, it will read like a history of a political revolution. This article elucidates some of the
factors I see as common to our Revolution 2.0 and a prototypical political revolution. The analogy is not
to a political revolution like the American Revolution, involving the overthrow of an external colonial
power; rather, the analogy is to a revolution like the French or Russian Revolution, a revolution that
overturns the status quo within one country and leaves the same people to function in a differently
structured environment.

This revolution, like all revolutions, did not begin from nothing. Like all revolutions, it has precipitating
events, past harbingers, and underlying structural factors that enabled a revolution to take root now
rather than in times past. Like many revolutions there has been fear, anger, and confusion; excesses and
extremes; some (metaphorical) head rolling; and movement back to an acceptable middle that will,
incrementally, become the new way of proceeding.

Precipitating Events

Someone must have been living in a cave not to be aware of the events within psychological science (and, in
fact, in all of science) in the early 2010s that might have contributed to this revolution. They have been
documented and discussed in many places (see the many papers in the November 2012 special section
on replicability in Perspectives, Pashler & Wagenmakers, 2012, and the blogs of Ed Yong, e.g., Yong,
2012), so let me just quickly note a few:

Failures to Replicate: The recognition that many findings, especially some that were groundbreaking
and well-cited, failed to replicate across laboratories, in addition to the increasing frustration with the
inability to publish replication failures (or even successes).

Questionable Research Practices: The pointed illustration by Simmons, Nelson, and Simonsohn (2011) that, with enough leeway built into a study, researchers could show just about anything (a toy simulation after this list illustrates the point); that many in the field knew of people (themselves or others) who used these practices (John, Loewenstein, & Prelec, 2012); and that some practices (e.g., not reporting all variables measured) have been not only accepted but also encouraged within the field.

Standard Statistics: Increased dissatisfaction with the use of null hypothesis significance testing (NHST),
which intensified after the publication of Daryl Bem’s (2011) paper on precognition in a prestigious
social psychology journal.

Open Science (and Open Access): The inability to obtain the data of other researchers for reanalysis or
inclusion in meta-analyses, despite publication guidelines stating that authors should be willing to share
data for such purposes. (This concern is sometimes conflated with that of wanting scientific publications
to be publicly available for no fee. The first can be thought of as “open science” and the second as “open
access.”)

Fraud: Some high-profile cases of fraud were galvanizing early on, particularly the case of the social
psychologist Diederik Stapel, which broke in 2011. The final report about his actions (Levelt Committee,
2012) found fraud in over 50 of his publications.

Other Fields: Although psychologists, and social psychologists in particular, seemed to be suffering from
the “spotlight effect” (i.e., believing that everyone was staring at them for these mishaps), it turns out
that problems of nonreplicability had been running rampant across the biological sciences and, most
scarily, in medicine for years (Ioannidis, 2005).
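
To make the "leeway" point concrete, here is a minimal simulation sketch. It is my own illustration, not Simmons, Nelson, and Simonsohn's (2011) actual code, and every number in it (the group sizes, the single interim peek, the two dependent variables) is an arbitrary assumption; it simply shows how two common researcher degrees of freedom push the false positive rate well above the nominal 5% even when there is no true effect at all.

import numpy as np
from scipy import stats

def any_significant(x, y):
    # "Flexibility" 1: count the study as a success if either DV yields p < .05.
    return any(stats.ttest_ind(x[:, j], y[:, j]).pvalue < .05 for j in range(2))

rng = np.random.default_rng(1)
n_sims = 10_000
false_positives = 0

for _ in range(n_sims):
    # The null is true: both groups and both DVs come from the same distribution.
    a = rng.normal(size=(30, 2))
    b = rng.normal(size=(30, 2))
    # "Flexibility" 2 (optional stopping): peek after 20 per group,
    # then add 10 more per group if the result is not yet significant.
    if any_significant(a[:20], b[:20]) or any_significant(a, b):
        false_positives += 1

print(false_positives / n_sims)   # roughly .14 in my runs, not the nominal .05

With more dependent variables, more peeks, and a few optional covariates, the rate climbs higher still, which is the heart of the Simmons et al. (2011) demonstration.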

It is important to note here that what triggers a revolution need not be what the revolution is actually
about. So, for example, although I believe that the Stapel case and other revelations of fraud were
important motivators for action, they will not be key to the kinds of changes that will ultimately result
from this revolution.

Prehistory

It turns out, of course, that none of these calls for alarm or reform is particularly new (although some
may be better articulated now). Like all political revolutions, we can look with hindsight to the
unsuccessful precursors of the current movement. These events have also been documented elsewhere,
particularly in the introductory sections of many recent articles, so I mention only a few here. (The table
in the Appendix shows a comparison between the worries of the past and present.)

Psychologists have long been concerned about our statistical tools, especially NHST. Indeed, the debate about its value goes back to its adoption, and there were loud discussions about it in the 1960s and 1970s. Arguments for Bayesian analysis—or at least for supplementing NHST with other statistics—have been ongoing. In more recent decades, editors of various journals have attempted to change statistical-reporting practices, including Tony Greenwald (1976, at the Journal of Personality and Social Psychology; JPSP), Geoff Loftus (1993a, at Memory & Cognition), and James Cutting (at Psychological Science). Greenwald also
wanted authors to submit their full analyses to JPSP (e.g., entire analysis of variance [ANOVA] tables
rather than a few selected results from a multifactor ANOVA). He also told them to expect to have their
data available for sharing for at least 5 years after publication. Greenwald’s initiatives lasted all of 3
years before he was relieved of his editorial duties.

Researchers have also long expressed frustration with getting others to send them data for reanalysis or meta-analyses. Concern with the positive-result rate and the inability to publish negative findings (or any kinds of replications) has a long history as well. Relatedly, the "file drawer problem" is a well-known major flaw in all meta-analyses (Rosenthal, 1979).

In 1998, Norbert Kerr publicized the term HARKing: hypothesizing after the results are known. He
pointed out the dangers of the then-standard practice of presenting data as if they confirmed a hypothesis that you had held all along. The likely pervasiveness of this practice was documented by Bones (2012).

Worries about power are not new, nor are worries about questionable research practices. And, of
course, as psychologists, we should remember that problems of cross-laboratory replicability have
haunted us since Introspectionism during the formative years of our science.

Why Might This Time Be Different?

To understand why Revolution 2.0 began when it did, and why it might actually have lasting effects, we
need to look not only at the temporal convergence of the precipitating events, but also at the status quo
in psychological science circa 2010.

I believe that there are two major factors behind the current large push for change: changes in
technology and changes in the demographics of psychology researchers. These two factors interact with
some structural characteristics in the field—namely, limited resources and perverse incentives. I also
believe that there is a hugely underrated third factor that makes this time different—recognition that
we (psychologists) are not alone.

Changing technology (as part of the problem)

Technology now is quite different from the extant technology when “the rules of the game called
psychological science” (Bakker, van Dijk, & Wicherts, 2012) were developed. As researchers, we now
have the ability to create and access more information than ever before. When I give talks about
changing science, I often quiz my audience (Spellman, 2013d). I have asked hundreds of psychology
researchers questions such as:

Did you ever think that you could run 100 subjects in a day? How about 1,000?

Did you ever think that you could input and analyze all your data in an hour?

Did you ever think that you could sit at your desk and collect all of the articles you want to read and cite
in 1 minute?

Did you ever think that with the push of a button you could send your research to colleagues across the
globe in 1 second?

For people who started in the field before the new millennium, these transformations seem magical; as they raise their hands in answer to my questions, some people smile, some groan, and some sigh.
Perhaps that is because the great increase in the amount and speed of research enabled by the new
technologies has also created problems.

More information is a good thing—or at least most scientists think so. But increasing the speed of
acquiring information can have both beneficial and harmful consequences.

One unexpected consequence of the speed of research and dissemination was more people learning
about others’ failures to replicate studies. Yes, people used to fail to replicate studies, but such failures
were (and still often are) typically seen as a failure of the replicating researcher to properly implement
the study. Discovering that other labs had also failed to replicate the study was often the result of
fortuitous late-night conference conversation. With research speed came more attempts to replicate,
and with dissemination speed came swifter and broader communication through e-mail, Twitter, blogs,
etc. In addition, advances in data mining techniques made it simpler to study the overall research
literature itself. Statistical analyses of large swaths of the literature showed, for example, the unsettling
relation between effect sizes and sample sizes (e.g., Fraley & Vazire, 2014) and the implausibly high percentage of hypothesis-confirming studies.
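
To see why that relation is unsettling, here is a minimal simulation sketch (again my own illustration, not Fraley and Vazire's (2014) analysis; the per-group sample sizes and the assumed true effect of d = 0.2 are arbitrary). If only significant results get "published," then the smaller the study, the larger the effect it must report, producing exactly the negative relation between effect size and sample size seen in the literature.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_d = 0.2                          # assumed small true effect in every study

for n in (20, 50, 100, 400):          # per-group sample sizes
    published = []
    for _ in range(2000):             # 2,000 simulated two-group studies per n
        a = rng.normal(0.0, 1.0, n)
        b = rng.normal(true_d, 1.0, n)
        res = stats.ttest_ind(b, a)
        # Only significant, positive results are "published."
        if res.pvalue < .05 and res.statistic > 0:
            pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2)
            published.append((b.mean() - a.mean()) / pooled_sd)
    print(n, round(float(np.mean(published)), 2))

In runs like this, the average "published" effect is roughly d = .7 when n = 20 per group but close to the true d = .2 when n = 400, even though every simulated study taps the same underlying effect.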

Of course, another obvious consequence was that with more research came more studies for
publication and, in turn, more competition to produce more articles. Ultimately, articles packaging the
best stories and prettiest data were more likely to be accepted.

Changing demographics

In the last few decades, psychology has been a booming business. Psychology departments grew
throughout the 1990s and 2000s (at least before the “great recession” of 2008). Psychology and
psychologists have moved into business, law, and policy schools. Psychological findings have been
reported in visible publications—such as the New Yorker (e.g., in articles by Malcolm Gladwell and Jonah
Lehrer) and the expanded New York Times Science Times section (Clark & Illman, 2006). And
psychologists themselves have written popular books about their own research.

The demographics of psychology academia have also changed. Of course, women and minorities are still
underrepresented as faculty members—at least relative to their presence in the larger population and,
for women, relative to their presence as graduate students. But the grip of the “old boys club” is
weaker. Gone are the days when Professor A at a top university could call Professor B at another top
university and say, “I’ve got a great graduate student who needs a job next year.” And Professor B would
say, “Sure. We have a position. Tell him to pack his bags.” Perhaps the opening of our field, the
increasing number and diversity of researchers, and the variety of laboratories and types of training
have led to less implicit trust in statements such as, “I did it and so, of course, I did it correctly.” It’s not
that people suspect fraud so much as that they want more transparency in the process from people with whom they had not attended graduate school and whom they did not know. “So, how exactly did you do that?”

And, finally, the younger generation of academics was raised on the speed and connectivity of
computers and the Internet. They are used to sharing more information and doing things faster. Thus,
again, overall, there are many more people trying to publish many more studies.

Limited resources

The abundance of new researchers and new research has caused problems with various types of limited
resources. One such limited resource is the number of subjects in university subject pools. Research
demands have outpaced the growth of student bodies. Plus, there are new pressures for more diverse
participants in studies to improve generalizability. Researchers have begun using more online platforms, survey services, and, of course, Amazon Mechanical Turk to increase their sample sizes.

A second limited resource is grant money. There are many more researchers but funding availability has
stagnated. On the one hand, with the increase of researchers wanting funding for expensive fMRI (and
other brain and biological) measures, more money has gone to fewer researchers. On the other hand,
with computerized testing and simpler data entry and analysis, a great deal of research has become less
resource intensive in terms of both money and time.

But to many people, the most limited resource is printed pages in journals—the remnants of a 1.0 publishing process in a 2.0 world (see Priem, 2013). New journals have provided more outlets, but
not enough to meet the demand (at least in the eyes of the authors), and the rejection rates remain
high. Many journals have implemented triage systems. The promise of fast reviews makes the sting of
rejection hurt less, but it might exacerbate the problem of too many submissions—if it is a top journal
and its rejections are quick, then why not gamble and send a paper there first?

Short-form empirical papers in psychology—popularized by Psychological Science (which began publishing in 1990)—have become more common, with the goals of speeding publication and publishing
more research. But that format itself exacerbates other problems (Ledgerwood & Sherman, 2012). One
is the fragmented publication of research programs (and the concept of the “minimal publishable unit”).
Researchers might prefer to publish a set of experiments as two fast, short publications rather than as one slower, longer (more comprehensive) one. This does not help create an integrated science. Another
exacerbated issue is the problem of truncated reports with key omissions. When a paper has a tight
word limit, it is easy to cut method details, mentions of pilot studies or measures that “didn’t work,” or
even references. Missing method details are a source of nonreplicability, leaving out pilot studies or
measures denies readers access to the full record of what does and does not work, and cutting
references does not help create an integrated science.

There is also a puzzle: Although paper journals are limited in pages, we do not live in a purely paper
world anymore. Why can't journals publish more articles, and longer ones, by publishing electronically?
Several journals started producing online supplements, online discussions or commentaries, and even
entire online publications. The uptake for online supplements was fine early on, because they were
supplementing an accepted archival peer-reviewed paper publication. But, despite, or perhaps because
of, the limitlessness of the resource, interest in online-only publishing is mixed, and there are many
people who prefer that their thoughts appear in print.

There is a final relevant piece of the status quo to consider: Who has controlled these limited resources?
The answer, for the most part, is the established folks from Generation 1.0. They are the heads of
associations, the reviewers on grant panels, and the editors of journals. Why would they want to change
the status quo when the status quo allowed them to succeed? (I’m not saying that age and longstanding
success are the only predictors of whether someone is part of Generation 1.0 or 2.0, but I suspect huge
correlations.) I got my first associate editor position at Psychological Science in 2002. A couple of years
later, I went to a conference and ran into a well-known, well-published psychologist who had been a
friend but who had recently stopped talking to me. I asked him why. He said, “No friend has ever
rejected a paper of mine before.” My first thought was, “What a sad view of friendship,” and my second
was, “What a sad comment on the state of our science.”

Misaligned incentives

Another pressure point is the misalignment of incentives in scientific publication. We assume that researchers
want to do true, accurate, and important science and be appropriately rewarded for it. But the rewards
(credit, tenure, fame) only come if the research is published, and that research is only published if it is
novel and hypothesis-confirming and the data are “pristine” (Giner-Sorolla, 2012). There has previously
been no reward for solid research that “didn’t work” or for replication, and there has not been much
reward for so-called “incremental research.”

This situation creates “a disconnect between what is good for scientists and what is good for science”
(Nosek, Spies, & Motyl, 2012, p. 616). Even without assuming any kind of intentional fraud, scientists can
unwittingly become victims of motivated reasoning or hindsight bias: valuing data, methods, and analyses that support their hypotheses more than those that do not, and even misremembering what had been hypothesized before learning the results (Nosek et al., 2012). And if you know that other
people are engaged in questionable research practices, wouldn’t it be fair if you did it too (John et al.,
2012)?

Of course, the misalignment of incentives is not simply in publishing. Because publishing is key to jobs,
tenure and promotion, future grant success, and professional awards and status, some people might
feel the incentives for publication to be stronger than the incentives for truth-telling. The problem of
misaligned incentives in science is not a new one, but given the rest of the status quo—researchers with
expectations of sharing, more competition for limited jobs and publication outlets—the bar and the
stakes are higher than ever. (See Engel, 2015, for a game-theoretical analysis of scientific disintegrity as
a “public bad” and Diederik Stapel’s autobiography for an explanation of why even a prominent
psychologist might succumb to minor, then major, fraud.)
