Causal Empiricism in Quantitative Research
Cyrus Samii, New York University
Quantitative analysis of causal effects in political science has trended toward the adoption of “causal empiricist”
approaches. Such approaches place heavy emphasis on causal identification through experimental and natural experimental
designs and on characterizing the specific subpopulations for which effects are identified. This trend is eroding the position
of traditional regression studies as the prevailing convention for quantitative causal research in political science. This essay
clarifies what is at stake. I provide a causal empiricist critique of conventional regression studies, a statement of core pillars
of causal empiricism, and a discussion of how causal empiricism and theory interact. I propose that the trend toward causal
empiricism should be welcomed by a broad array of political scientists. The trend fits into a broader push to reimagine our
discipline in terms of collective research programs with high standards for evidence and a research division of labor.
Cyrus Samii (cds2083@nyu.edu) is assistant professor, Politics Department, New York University, 19 West 14th Street, New York, NY 10012.
1. Econometrician Angus Deaton is even more colorful in characterizing such work in economics in terms of “the magic regression machine” (Ogden
2015).
2. A pseudo-fact is not the same thing as a “stylized fact”—rather, a pseudo-fact could be understood as a statistical finding presented as a stylized fact
on the basis of an erroneous interpretation. For example, in an example below I will propose that the claim of a statistical nonrelationship between ethnic
fractionalization and civil war onset is a pseudo-fact and therefore should not enjoy the status of a stylized fact.
The Journal of Politics, volume 78, number 3. Published online May 17, 2016. http://dx.doi.org/10.1086/686690
© 2016 by the Southern Political Science Association. All rights reserved.
[…] of prevailing conventions is harsh, but I will argue below that it is justified. At the turn of the millennium, the modal quantitative research design was one in which researchers assembled data on theoretically interesting dependent and independent variables for the "universe" of cases of interest. Researchers then assessed the presumably causal relationships in these data using regressions with informally motivated sets of control variables to reduce the potential for confounding.3 The question of whether one was supposed to "believe" in the regression specification was rarely addressed, and so it is rarely clear whether the regression model, as an object, should be construed as a structural model of outcomes or as an agnostic tool for achieving causal identification.

3. I say informally motivated because in only very rare cases do researchers motivate control specifications on the basis of a structural model. See Morton (1999, 130–31) for a related discussion in political science.

This convention in quantitative causal research appears to be breaking down, and more quantitative causal research is moving toward causal empiricism. This does not represent a major change in general goals—researchers have always been interested in causal inference. Rather, it represents a major change in what researchers believe are credible ways of doing causal inference and immediately justifiable ways of describing their results.

Take, for example, quantitative research on the causes and effects of civil conflict. This subfield hosts one of the most cited quantitative empirical papers in recent decades (Fearon and Laitin 2003), but it is also a subfield where one might expect that causal empiricist approaches would be difficult to employ. I searched all quantitative papers on civil conflict that make causal claims published in American Political Science Review, American Journal of Political Science, and the Journal of Politics. Thirty of 34 papers (88%) published between 2000 and 2010 relied on the conventional multiple regression design.4 Those four standouts went further in trying to estimate causal relations by using either instrumental variables, matching, or panel fixed effects methods to try to improve causal inference. Between 2011 and 2015, conventional regression studies accounted for 32 of 48, or 67%, a marked drop. Those 16 other studies applied methods such as instrumental variables, regression discontinuity, difference in differences, or matching combined with explicit discussions of identification.5 This indicates a shift in an area where causal factors of interest are quite stubborn in terms of their manipulability. Causal empiricist research in conflict studies is substantively rich and diverse, focusing on the effectiveness of counterinsurgency strategies, effects of foreign aid, institutional and economic roots of civil war, and postwar consequences of exposure to wartime violence, among other topics. This goes to show that causal empiricist research does not have to be narrow or focus on small questions (e.g., Deaton 2010; Huber 2013). In other subfields, such as the study of voting behavior (de Rooij, Green, and Gerber 2009; Green, McGrath, and Aronow 2013) or representation and accountability (Grose 2014), the move toward causal empiricism has been more thorough.

4. The list of studies is available on the author's website.
5. The tally does not include field experimental studies on postconflict development programs, such as Avdeenko and Gilligan (2015), Beath, Christia, and Eniolopov (2013), and Fearon, Humphreys, and Weinstein (2015).

The sections below offer three considerations related to the move toward causal empiricism. First is a causal empiricist critique of what is still the prevailing convention: loosely specified and heroically interpreted regression studies. I will show that in terms of generalizability, conventional regression studies possess no special evidentiary advantage over experiments or natural experiments that estimate effects for well-defined subpopulations. Moreover, conventional regression studies are often highly compromised in terms of their internal validity. The declining prevalence of conventional regression studies relative to experiments and natural experiments is, therefore, welcome on internal validity grounds and does not represent a loss in terms of the generalizability of the findings.

Second is a statement of core pillars of causal empiricism. The goal here is to clear misconceptions and to show how causal empiricism puts emphasis on both statistical rigor and in-depth knowledge of specific cases. In doing so, causal empiricist research aims to establish credible causal facts understood for their specificity. Whether or not such facts generalize is not a question to be addressed definitively by a single study. Rather, these are questions to be addressed in research programs that consider collections of credible, specific facts in light of theoretical models. If journal editors want to be realistic in their promotion of credible causal research, they should expect quantitative studies to do less in terms of generalization and theory development and more in terms of identification. Generalization and theory development are better left to synthesis studies.

Given such realism about identification and specificity, where does this leave generalization? The section Pillars of Causal Empiricism addresses this question in terms of the relationship between causal empiricism and theoretical modeling. In debates about causal empiricism, a refrain among skeptics is that "theory is being lost" in the so-called identification revolution (Huber 2013). I address this concern by describing how causal empiricist research can fit into broader research programs that also pursue theoretical modeling. Theoretical models provide lenses for interpreting specific empirical results in terms that are generalizable.
This essay focuses on conceptual issues and does not go into identification strategies and techniques employed in causal empiricist research. For reviews covering such techniques, readers should consult Imbens and Wooldridge (2009) or Keele (2015b). Textbook treatments are given by Angrist and Pischke (2009), Dunning (2012), Hernan and Robins (2013), Imbens and Rubin (2015), and Morgan and Winship (2015). Nor do I discuss the relationships between quantitative and qualitative research, which for political science applications are covered in the contributions to Brady and Collier (2010). Finally, the focus here is on quantitative causal studies. Alternative modes of quantitative analysis include development of quantitative measures of latent or otherwise hard-to-observe phenomena, pure prediction problems, for which machine learning has contributed to recent advances (Kleinberg et al. 2015), and descriptive characterizations of trends or equilibrium relationships. My focus on quantitative causal research does not imply any disregard for these other modes of empirical research. The discussion of research programs below takes such research to be complementary to causal research. At the same time, Ashworth, Berry, and Bueno De Mesquita (2015) show that many points raised by causal empiricists are relevant for these other modes of quantitative analysis as well, and so even those who do not engage in causal research may find the discussion below interesting.

A CAUSAL EMPIRICIST CRITIQUE OF PREVAILING CONVENTIONS

I begin with a critique of the prevailing convention in quantitative causal research in political science, which is to use multiple regression methods to estimate causal relations on "general" data sets that are meant to be representative of a universe of cases of interest, whether in the form of a data set exhaustive of all units of interest (e.g., a cross-national study with all available country data) or a representative survey sample. The methodological literature has given much more attention to threats to internal validity for conventional regression studies, and these threats serve as a primary motivation for the turn to a causal empiricist approach. And yet, conventional regression studies still dominate empirical practice in quantitative political science. This may be because researchers seek the comfort of methods that have the veneer of generalizability or seem to be reliable enough for estimating multiple causal effects at once. This is a false comfort. This section explains why, both with respect to generality and internal validity.

The pseudo-generality problem

The predominant approach to quantitative research in political science is a regression study on a data set representative of a universe of cases of interest. Researchers describe the findings from such studies in general terms, and they use summary statistics for the sample at hand when reasoning about scope conditions. Consider the following very typical excerpt, from the abstract of Hartzell and Hoddie (2003, 318): "Employing the statistical methodology of survival analysis to examine the 38 civil wars resolved via the process of negotiations between 1945 and 1998, we find that the more dimensions of power sharing among former combatants specified in a peace agreement the higher is the likelihood that peace will endure." Or consider this more recent excerpt from the abstract of Prorok (2016, 70): "These propositions are tested on an original data set identifying all rebel and state leaders in all civil conflict dyads ongoing between 1980 and 2011. Results support the hypothesized relationships between leader responsibility and war outcomes." These particular authors are by no means the exception—they operate well within the predominant mode of quantitative empirical inference. But that is exactly the problem. This type of generalizing reflects a presumption that by using a data set representative of some target population ("the 38 civil wars . . .", "all rebel and state leaders in all civil conflict dyads . . ."), the study will produce findings generalizable to that population. Such a presumption is sometimes used to justify the use of the conventional regression study design over a research design based on an experiment or natural experiment that is clearly limited to a specific subpopulation.6

6. Exemplary instances of such arguments include Bardhan (2013), as cited in Aronow and Samii (2016), and Huber (2013), who compares a "traditional regression-type paper" to a hypothetical natural experiment in Sweden.

Statistically speaking, this line of reasoning is completely misguided and has led to problematic judgments about the merits of studies using "general" data relative to experiments or natural experiments using data from more tightly defined subpopulations. It is the identifying variation that determines which units in a sample contribute to an effect estimate, not the mere presence of a unit in a sample. The key concept for characterizing identifying variation is "positivity" (Hernan and Robins 2013, 29–30; Petersen et al. 2011), also known as the condition of "overlap" in covariate (i.e., control variable) distributions over values of the treatment variable (Imbens 2004).
Suppose that for a set of units indexed by $i$, one is interested in the effect of some treatment of interest $T_i$ that takes values defined by the set $\mathcal{T}$. Each unit in the population is characterized by "potential outcomes" corresponding to each treatment value, denoted by a random variable $Y_i(t)$ for all $t \in \mathcal{T}$. For each unit, the observed outcome, $Y_i$, equals the value of $Y_i(t)$ corresponding to the treatment that the unit received, $t$. Finally, suppose one controls for a vector of covariates $X_i$. The data "alone" only support causal comparisons where, in the neighborhood of a given value $X_i = x$, the sample includes overlapping units with different treatment values. It is in such neighborhoods that positivity holds.7 The values of $X$ for which positivity holds are the locations in the covariate space where one has identifying variation in the treatment.

7. That is, positivity or overlap holds for a population of interest and a set of treatment values, $\mathcal{T}' \subseteq \mathcal{T}$, if $0 < \Pr[T_i \in \mathcal{T}' \mid X_i = x] < 1$ for all $x$ with positive measure in the population of interest (Hernan and Robins 2013, 30; Imai and van Dyk 2004, 855).
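To make the positivity condition concrete, here is a minimal sketch of an overlap diagnostic for a binary treatment. The code is my illustration, not part of the original analysis; the data frame, column names, and the logistic propensity model are all assumptions.

```python
# Sketch: diagnosing positivity/overlap for a binary treatment.
# Assumes a pandas DataFrame `df` with a 0/1 treatment column "T" and
# covariate columns listed in `covs`; all names are illustrative.
import pandas as pd
from sklearn.linear_model import LogisticRegression

def overlap_diagnostic(df: pd.DataFrame, covs: list) -> pd.Series:
    # Estimate Pr[T = 1 | X] with a simple logistic model.
    model = LogisticRegression(max_iter=1000).fit(df[covs], df["T"])
    pscore = model.predict_proba(df[covs])[:, 1]
    # Positivity requires 0 < Pr[T = 1 | X = x] < 1: estimated scores
    # near 0 or 1 flag regions of the covariate space with little or
    # no identifying variation in the treatment.
    return pd.Series(pscore, index=df.index, name="pscore")

# Example: keep only units on common support, as caliper-style
# matching would, thereby consciously narrowing the inference.
# pscore = overlap_diagnostic(df, ["gdp", "population"])
# df_on_support = df[pscore.between(0.05, 0.95)]
```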
Grasping positivity and overlap is easy, as the following shows.8 Suppose at time 1 we randomly assign households in one county in northern California either to receive pamphlets on income inequality or to receive nothing, and we want to estimate the effect of the pamphlets on household members' attitudes toward redistribution. Then, at time 2 we survey not only households in that one county, but in all counties in the United States, even though none of the other counties received pamphlets. Would this research design provide credible evidence on the average effect of the pamphlets for all US households? Clearly not, because the identifying variation is limited to but a small and specific segment of the US population. This example may seem contrived, but as I show below it resembles what occurs in conventional regression studies.

8. I credit Peter Aronow for this toy example.

Where there is no overlap, one can only make comparisons with interpolated or extrapolated counterfactual potential outcomes values. King and Zeng (2006) brought the issue to political scientists' attention, characterizing interpolations and, especially, extrapolations as "model dependent," by which they meant that they were nonrobust to modeling choices that are often indefensible. By pointing out how common such model dependent estimates are in political science research, King and Zeng raised troubling questions about the validity of many generality claims in quantitative causal research in political science. They provided an algorithm for determining whether counterfactual comparisons are within the convex hull of the data, and thus in areas where positivity likely holds, although few studies seem to have applied these methods. Among those who take positivity and overlap seriously, the common reaction, and the one endorsed by Ho et al. (2007), has been to resort to other estimation methods like matching estimators (e.g., Gilligan and Sergenti 2008). Matching estimation forces the researcher immediately to confront the reality of limited overlap. By dropping cases in areas with no overlap (e.g., by using matching calipers), one consciously limits the scope of one's inference (Banerjee and Duflo 2009, 162–63).

Aronow and Samii (2016) take these points further. The conventional research design, which marries regression with a representative sample of some target population (all countries in the world, the population of the United States, etc.), typically fails to yield results that generalize to the target population or even to natural subpopulations where positivity holds. That is, conventional regression studies can be worse than our toy example of the pamphlet experiment. To see why, first suppose that one uses ordinary least squares (OLS) to estimate the average effect of $T$ on $Y$ with a regression of the form

$$Y = \alpha + \beta T + X'\gamma + \varepsilon,$$

and that the regression satisfies specification requirements such that the OLS estimate for $\beta$ indeed estimates a causal effect.9 This is even more generous than King and Zeng, for whom the dangers arose from model misspecification in areas of no overlap. Let $\tau_i$ denote the $i$-specific effect of changes in $T_i$ on $Y_i$. This embeds the very reasonable assumption that effects are heterogeneous over the units. (Without such heterogeneity, the issue of generalizability would be moot anyway!) Then, the OLS estimator, $\hat{\beta}$, obeys

$$\hat{\beta} \overset{p}{\rightarrow} \frac{E[w_i \tau_i]}{E[w_i]}, \quad \text{where } w_i = (T_i - E[T_i \mid X_i])^2.$$

9. Aronow and Samii (2016, theorem 1) provides a precise statement of the assumptions.

The $w_i$ weights are referred to as the multiple regression weights, and they characterize the extent to which a given observation contributes to $\hat{\beta}$.10 The weights are largest in areas of the covariate space where the treatment is poorly explained—that is, for units where the treatment assignment is unpredictable given observables. The multiple regression weights are particular to linear regression methods, and Aronow and Samii go further to characterize general conditions needed to produce generalizable estimates. Under such conditions, various interaction-model, response-surface modeling, and weighting techniques are capable of producing effects that generalize to the population for which positivity holds.

10. Very similar results obtain for coefficients estimated via generalized linear models (logit, probit, etc.) and random coefficient models (Aronow and Samii 2016, 256–57).
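The multiple regression weights are straightforward to compute from data. Here is a minimal sketch (my illustration, not code from Aronow and Samii): the weights are obtained by partialling the controls out of the treatment and squaring the residuals.

```python
# Sketch: multiple regression weights w_i = (T_i - E[T_i | X_i])^2,
# estimated by OLS residualization of the treatment on the controls.
import numpy as np

def regression_weights(T: np.ndarray, X: np.ndarray) -> np.ndarray:
    # Partial the controls (plus an intercept) out of the treatment.
    Z = np.column_stack([np.ones(len(T)), X])
    coef, *_ = np.linalg.lstsq(Z, T, rcond=None)
    resid = T - Z @ coef
    # Units whose treatment is least predictable from X carry the
    # most weight in the OLS coefficient on T.
    return resid ** 2

# With heterogeneous unit effects tau_i, the OLS probability limit is
# the weighted average sum(w * tau) / sum(w), not the simple average.
```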
The upshot is that the effective sample that gives rise to an effect estimated in a regression study can be quite different from the nominal sample with which one started. The effective sample is the transformation of the nominal sample after reweighting by the multiple regression weights. Aronow and Samii demonstrate with a study by Jensen (2003) on the effects of levels of democracy on foreign direct investment. Figure 1 reproduces the figure from Aronow and Samii (2016) for the Jensen example. The map on the left shows Jensen's nominal sample, which is representative of most of the world. The map on the right shows the effective sample that was the basis for the main result reported in the paper. The shading indicates the weight that each country receives in the respective sample. The first thing that strikes the eye is how radical the difference is between the nominal sample and effective sample. It really does resemble our pamphlet study example. The effective sample gives nonnegligible weight to only a few of the countries that were in the nominal sample. Here is the first indication that the results are not immediately general to the world that the nominal sample is intended to represent. The top 12 contributing countries, accounting for more than half of the total weight applied for the main estimate in the paper, are (in descending order of their weights) Uruguay, Hungary, Niger, Philippines, Argentina, Madagascar, Pakistan, Zimbabwe, Poland, Peru, Lesotho, and Belarus. At first glance this appears to be an odd grab bag of countries. Upon further consideration one notices that many of these countries had rapid regime shifts due to military coups d'etat, while others had rapid regime shifts associated with the end of the Cold War.11 As such, the study is based primarily on effects associated with these specific types of transitions.

11. I thank Ali T. Ahmed for this astute observation.

Suppose someone were to present studies that carefully sought to estimate the consequences of coups or post-Communist transitions on foreign investment. Journal reviewers operating on the basis of current conventions would, I suspect, criticize such studies for their lack of generalizability. I hope the hollowness of such criticisms is now clear: such criticisms draw an implicit comparison to either a fallacious interpretation of how conventional regression studies work or some unattainable ideal.

Conventional wisdom in political science about trade-offs between generalizability and internal validity for different research designs is based on faulty foundations. There is no clear ordering of experiments, quasi-experiments, and observational studies that use regression or other control methods in terms of the generality of their findings. In observational studies, positivity is out of the control of the researcher, and it is typically limited to an idiosyncratic subset of the population (Dunning 2008, 291). Once we isolate areas of positivity, what looks on the surface like a "general" empirical analysis is often, in reality, a comparison within a highly specific subpopulation. What is disturbing is that authors of conventional regression studies typically have no clue about where positivity holds or how regression weights this subsample, and they make no effort to be transparent about it. In experiments, by construction, one controls positivity, although the reach of experiments is limited to causal factors available for direct manipulation and to subpopulations where we can run the experiments. But at least sample summary statistics will accurately portray the effective sample for experiments. For natural experiments, researchers have become accustomed to characterizing areas of identifying variation, as with the complier subpopulation for instrumental variables studies (Abadie 2003), the subpopulation near the cutoff for regression discontinuity studies (Lee and Lemieux 2010), or the comparison cases for difference-in-differences studies (Abadie 2005; Abadie, Diamond, and Hainmueller 2010; Abadie and Gardeazabal 2003). The same scrutiny should be applied to conventional regression studies: journals should insist that their research design sections report characteristics of the effective sample—this requires only examining the residual variance in the treatment after partialing out the controls.
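To see what such reporting might look like, here is a hedged sketch that profiles an effective sample, reusing the regression_weights helper from the earlier sketch. The column names and grouping variable are illustrative assumptions, in the spirit of the Jensen reanalysis.

```python
# Sketch: reporting the effective sample behind a regression estimate,
# reusing regression_weights() from the sketch above. Column and
# grouping names are illustrative.
import pandas as pd

def effective_sample_profile(df, treatment, controls, group):
    w = pd.Series(
        regression_weights(df[treatment].to_numpy(),
                           df[controls].to_numpy()),
        index=df.index)
    w = w / w.sum()
    nominal = df[controls].mean()                  # who is in the data
    effective = df[controls].mul(w, axis=0).sum()  # who drives the estimate
    top_units = (df.assign(weight=w).groupby(group)["weight"].sum()
                   .sort_values(ascending=False).head(12))
    return nominal, effective, top_units

# e.g., effective_sample_profile(panel, "democracy",
#                                ["income", "growth"], "country")
```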
The problem of pseudo-facts

We now turn to issues of internal validity—that is, questions of whether the causal "facts" that conventional regression studies produce are actually reliable. For causal identification, conventional regression studies rely on the assumption that the control variables, $X$, are adequate to account for confounding in the relationship between a causal factor of interest, $T$, and the outcome $Y$.12 Moreover, such studies rely, to some extent, on getting functional forms correct. Many things can go wrong, and when they do the results produced from a conventional regression study are better thought of as "pseudo-facts." Below I review two important issues related to internal validity of conventional regression studies: misspecification and determination of control variables.

12. One way to formalize this identifying assumption is in terms of conditional mean independence for a nonempty set of treatment values, $\mathcal{T}'$, and a nonempty subset of the covariate space, $\mathcal{X}'$: $E[Y(t) \mid T = t, X = x] = E[Y(t) \mid X = x]$ for all $t \in \mathcal{T}'$, $x \in \mathcal{X}'$, and $0 < \Pr[T \in \mathcal{T}' \mid X = x] < 1$. This formulation is weaker than the more common assumption of the full distribution of potential outcomes, $Y(t)$, being independent of treatment, $T$, conditional on $X$, although as Imbens (2004) notes the case for conditional mean independence is rarely more compelling than the case for the stronger conditional independence assumption.

The first issue is associated with bias due to misspecification. In a paper that is well cited in political science, Achen (2005) demonstrated how misspecification for control variables undermines estimates of coefficients on treatment variables. The solution he proposed was that researchers should use formal theory and specification checks to make more deliberate functional form choices and also to define more homogeneous subpopulations within which to conduct one's analysis. But as Ho et al. (2007) explain, the problem with this "solution" is that it grants considerable latitude to researchers. As they put it, there is little credible basis to believe that conventional regression studies "are not merely demonstrations that it is possible to find a specification that fits the author's favorite hypothesis" (199; emphasis in original). Ho et al. propose matching as a more credible solution. Matching orthogonalizes treatment variables relative to control variables and thereby limits the extent to which control variable specifications affect coefficient estimates on treatment variables. Following on that work, there have been numerous methodological contributions that help to free researchers from the problems of misspecification, including advances in matching (as reviewed by Sekhon 2009) and nonparametric regression and machine learning methods (Hainmueller and Hazlett 2014; Hill 2011; Van der Laan and Rose 2011). Researchers' increasing use of such methods to make the case for the robustness of their findings is a welcome development by the standards of causal empiricism.

The second issue concerns determination of control variables, a problem for which political scientists tend to rely on faulty heuristics in spite of guidance from causality theory (Angrist and Krueger 1999, 1291–93; Imbens 2004; Pearl 2009, chap. 3; Rosenbaum 1984). Conventionally, researchers use informal substantive arguments to motivate their sets of controls, often appealing to some notion of a "standard" set of controls for an outcome of interest. The underlying statistical motivation is typically based on the concept of "omitted variables" as taught in conventional regression textbooks. Unfortunately, such textbooks provide vague guidance leading to highly problematic decisions. Two commonly referenced textbooks by Greene (2008, 133–34) and Wooldridge (2009, 87–90) define omitted variable bias in terms of omitting control variables that (i) should appear in the "correct" or "true" specification for the outcome variable and (ii) are also correlated with the causal factor of interest. What this definition omits is that the "correct" specification depends on the effect that one wants to estimate. These differences are based on the causal ordering of the treatment and control variables (Elwert and Winship 2014; Pearl 2009, 17; Rosenbaum 1984). We need to ask, are we interested in a "total" effect, or some kind of "partial" effect (Pearl 2009, 126–32; VanderWeele 2015, chap. 2)? If it is a partial effect, is it causally identified under the given specification and given assumptions that we are really willing to believe? Rather, researchers still tend to take the textbook characterization of omitted variable bias to mean that all variables correlated with treatment variables and outcomes should be controlled in order to obtain unbiased causal estimates. Rote application of the textbook characterization of omitted variables contributes to the problem of "bad control" (Angrist and Pischke 2009, 64–68)—that is, causally incoherent control for "posttreatment" variables, meaning variables that are causal descendants of the treatment of interest. The result is vagueness, if not horrendous bias and inconsistency, in the estimated causal effects.

Take Fearon and Laitin (2003), who use a regression analysis to challenge the idea that ethnic structure affects civil war risk. To do so, they examine the relationship between ethnic fractionalization and civil war onset. A headline finding of the study—one of the supposed "facts" that it establishes—is that "factors that explain which countries have been at risk for civil war are not their ethnic or religious characteristics" (75, abstract). Column 1 of table 1 replicates Fearon and Laitin's main results, showing a small and statistically insignificant coefficient on ethnic fractionalization. As measured for this study, ethnic fractionalization is an unchanging characteristic of a country.13 Now, Alesina and La Ferrara (2005) review studies showing a strong negative relationship between ethnic fractionalization and social and economic development. Thus, how should we interpret a coefficient produced from a model that includes economic and social factors that are widely believed to be affected by ethnic fractionalization? In fact, the unconditional correlation between ethnic fractionalization and civil war onset is really strong. This is shown by the bivariate regression in column 2, as well as in columns 3 and 5, which account for country-level clustering (given that ethnic fractionalization does not vary from year to year) and then also the "prior war" variable (to mimic Fearon and Laitin's approach to handling dynamics). Only when we "control" for per capita income do we get the insignificant result, as shown by columns 4 and 6.14 Now, Fearon and Laitin acknowledge this point.15 But it still raises important questions. These data exhibit the same pattern that Alesina and La Ferrara summarize—a very strong negative correlation between ethnic fractionalization and income, as shown in column 7. Thus, it is not at all clear that ethnic fractionalization is unrelated to conflict, at least in terms of its "total" effect. That such an effect may operate via effects on income does not change this basic conclusion.16

13. Ethnic fractionalization is constant for all countries in the data set except USSR/Russia and Yugoslavia, owing to those countries' break-ups.
14. Introducing any of the other control variables, on their own, does little to change the large, significant coefficient on ethnic fractionalization.
15. In their abstract, they are clear that "after controlling for per capita income, more ethnically or religiously diverse countries have been no more likely to experience significant civil violence."
16. At the same time, we should be skeptical about whether table 1 conveys any meaningful causal relationships, given that we have no reason to believe that any of the regressions succeed in either identifying the causal effect of fractionalization or applying proper functional forms.
Table 1. Replication and Auxiliary Analyses for Fearon and Laitin (2003)

[The table's coefficient columns did not survive extraction. As described in the text, columns 1–6 take civil war onset as the outcome—column 1 replicates Fearon and Laitin's main specification, column 2 is the bivariate regression on ethnic fractionalization, columns 3 and 5 add country-level clustering and the "prior war" variable, and columns 4 and 6 add per capita income—while column 7 shows the negative relationship between ethnic fractionalization and income.]

Note. Regression coefficients with standard errors in parentheses. To save space the table omits from column 1 coefficients for the following control variables: log(population), log(% mountainous), noncontiguous state, oil exporter, new state, instability, democracy, religious fractionalization, and the constant term.
* p < .05. ** p < .01. *** p < .001.
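The contrast between columns 2 and 4 is easy to reproduce in stylized form. The following simulation is my own illustration, not the Fearon and Laitin data, and all coefficients are made up: when the treatment's effect runs entirely through a posttreatment variable, "controlling" for that variable erases a real total effect.

```python
# Sketch: "bad control" of a posttreatment variable. Here ethnic
# fractionalization (frac) affects conflict only through income, so its
# total effect is real but vanishes once income is "controlled."
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
frac = rng.uniform(0, 1, n)
income = 1.0 - 0.8 * frac + rng.normal(0, 0.5, n)      # posttreatment
conflict = 0.5 - 0.6 * income + rng.normal(0, 0.5, n)

def ols(y, *regressors):
    Z = np.column_stack([np.ones(n), *regressors])
    return np.linalg.lstsq(Z, y, rcond=None)[0]

print(ols(conflict, frac))          # total effect of frac: about +0.48
print(ols(conflict, frac, income))  # with income controlled: about 0
```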
The Fearon and Laitin paper is over a decade old, but problems of bad control remain ubiquitous, meaning that we should doubt a tremendous amount of the purported facts established by quantitative political scientists. Acharya, Blackwell, and Sen (2016) find that over half of quantitative papers published in top political science journals since 2010 suffer from "bad control." They review methods developed by Robins (1997) and VanderWeele (2015) for getting at what many researchers seem to really want—a type of partial effect known as the "controlled direct effect." Generally speaking, valid estimation of such an effect requires more than just plopping posttreatment variables into a linear regression specification. Because of this, along with other endogeneity concerns, there is no good reason to think that the coefficient in column 1 of table 1 captures a meaningful partial effect either.

Other "omitted variables" fallacies arise in interpreting the consequences of changes to the set of control variables. Suppose we have a study suggesting that $T$ affects $Y$ on average, controlling for $X$. Another researcher comes along and suggests that, actually, we need also to control for the variable $W$ in addition to $X$, and in doing so, the estimated effect of $T$ is now very small. The conventional interpretation would invoke the logic of "omitted variables," concluding that the original study probably did a poor job of estimating the average effect of $T$ and the second study provides an improvement. Is this a reasonable conclusion? The results from Aronow and Samii (2016), discussed above, would have us wonder whether inclusion of $W$ may have merely shifted the effective sample toward a subpopulation for which the effect of $T$ is weak. In that case there may have been nothing wrong with the first study. The analysis by Achen (2005) would have us wonder whether the change is a result of misspecification for $X$ or $W$. A third possibility is bias amplification: there was residual confounding in the first regression, but the second has only amplified the bias and made things worse (Clarke 2005; Pearl 2010). Once we consider these possibilities alongside the conventional "omitted variables" interpretation, it is clear that the change in the coefficient on $T$ has at least four explanations for it, each being difficult if not impossible to distinguish!
A better conclusion is that the conventional regression studies are deeply problematic in terms of their causal content. Reflecting on the indeterminacies that plague the search for control variables in quantitative political science research, Clarke (2005) argued for "substituting research design for control variables" and to "test broad theories in narrow, focused, controlled circumstances" (349–50). This is precisely the reorientation that the causal empiricist turn is trying to establish.

PILLARS OF CAUSAL EMPIRICISM

Causal empiricism is an approach to quantitative empirical analysis that pursues well-identified and specific causal facts. The pillars of causal empiricism that distinguish it from the prevailing convention include (i) realism about whether a research design is adequate to identify a causal effect and (ii) realism about the specificity of empirical results. Neither statistical technique nor the goal of causal inference distinguishes causal empiricism from the prevailing convention that the previous section examined: causal empiricist research is sometimes based on regression techniques, and conventional regression studies regularly aim to make causal inferences. If someone asks, "What is it that makes a causal empiricist study special?," the answer should be "careful use of an identification strategy research design and interpretation of the specificity of the results." The following subsections develop these ideas about the pillars of causal empiricism.

Identification by design

Conditions for causal identification are easy to state (positivity, conditional independence) but realizing them is not easy. Sekhon (2009, 503) writes poignantly, "Without an experiment, a natural experiment, a discontinuity, or some other strong design, no amount of econometric or statistical modeling can make the move from correlation to causation persuasive. This conclusion has implications for the kind of causal questions we are able to answer with some rigor. Clear, manipulable treatments and rigorous designs are essential. And the only designs I know of that can be mass produced with relative success rely on random assignment. Rigorous observational studies are important and needed. But I do not know how to mass produce them." Thus, neither the data-analysis technique, sample, nor control variables make the identification but rather the research design and the way that it exploits identifying variation (Freedman 1991; Rosenbaum 1999). Using an instrumental variables estimator does not imply causal identification if exclusion does not hold. Using a matching estimator does not imply causal identification if there is no plausible basis for conditionally exogenous treatment assignment. We must be able to answer the question, For two units of causal observation that are identical in terms of all important background characteristics, how could it be that they might differ in the treatments they receive?17 This requires understanding the processes through which treatment values are determined—what Imbens and Rubin (2015, 34) describe as the "assignment mechanism." Multiple regression is frequently used to analyze randomized experiments. The causal credibility is qualitatively higher than in a multiple regression study in which the regressor of interest is not randomly assigned.

17. Implicit here is the assumption that the "units of causal observation" are ones for which the "stable unit treatment value assumption" (SUTVA) holds (Aronow and Samii 2015; Imbens and Rubin 2015, 11–12).

But nature rarely provides sources of identifying variation, and experiments require considerable effort. For this reason, causal empiricism demands that empirical studies give extraordinary attention to analyzing and characterizing sources of identifying variation. The arguments should be careful about what kinds of effects are identified and they should draw on intimate knowledge and evidence regarding the topic (Titiunik and Sekhon 2012). Substantively rich identification debates should be welcomed, such as the debates between Albouy (2012) and Acemoglu, Johnson, and Robinson (2001, 2012) over the use of settler mortality as an instrument for colonial era institutional investments, or between Ferwerda and Miller (2014, 2015) and Kocher and Monteiro (2015) over the exogeneity of the Vichy-German administrative border placement. Credibility of empirical findings depends on being able to stand up to critiques based on in-depth knowledge.

The logic of causal empiricism is sometimes described as the study of "the effects of causes" rather than the "causes of effects" (Holland 1986). This is based on realism about the difficulty of causal identification. To hope, much less demand, that a single paper investigate the effects of multiple treatments is a very tall order. In terms of identification, what would be required are factorial experiments or whatever analogues there may be among natural experiments. For natural experiments, each treatment would need its own specific, in-depth evidence to make the identification credible.18 This will be a hard idea to accept for those steeped in the prevailing convention, where researchers regularly attempt the heroic feat of trying to evaluate the causes of multiple effects in one analysis—usually in one regression. These analyses may turn up intriguing correlations. But what the causal empiricist asks is that audiences recognize the large gulf in the credibility of causal facts established via strong identification research designs and those produced through conventional regression studies.

18. Adjudicating between causal mechanisms is, however, much more in line with the "effects of causes" approach.
Specific causal facts

Causal empiricism is an approach that is realistic about the specificity of the causal estimates that we can obtain. This is an implication of the fact that causal identification is difficult to obtain. The local average treatment effect (LATE) theorem (Angrist, Imbens, and Rubin 1996) is a formal expression of such realism about specificity. The LATE theorem states that under a set of basic identifying conditions, an instrumental variable identifies the average causal effect for the subpopulation of units whose treatment status is in fact moved by the instrument. Summary statistics describing this subpopulation can be computed using the kappa-weighting results of Abadie (2003). The result from Aronow and Samii (2016) described above is a LATE-type result, showing that under the relevant identifying assumptions, linear regression estimates are consistent for the average causal effect local to a subpopulation whose traits can be characterized by reweighting the nominal sample by the multiple regression weights. Similarly, regression discontinuity identifies effects local to the relevant cut points, matching with calipers identifies effects local to the region of common covariate support, experiments identify effects local to the typically nonrepresentative sample of experimental subjects, and so on. Only under highly ideal circumstances, which are unlikely to apply in political science research, will we obtain empirical estimates that are immediately general to some "global" population. The realistic conclusion to draw is that all quantitative empirical results that we encounter are "local" (Angrist and Pischke 2010, 23–24).

Such realism is sure to make many political scientists uncomfortable. But under prevailing conventions, generalization from highly specific results is rampant with little recognition that this is actually what is going on. As such, much research is blind to the assumptions needed for such generalization. This blindness is harmful in that causal questions are not given any more scrutiny than can be explored in a single contribution due to the sense that causal estimates from one study actually answer the causal question generally. It leads journal editors to reject studies that focus on obtaining well-identified, if specific, estimates of important causal quantities for lack of novelty or for their specificity. Armed with a better understanding of how empirical results are always local and assumptions necessary for effect homogeneity are often dubious, we should welcome the pursuit of opportunities to estimate important, but not necessarily novel, causal effects for new subpopulations. This is how one tests and bounds the scope of theoretical claims.
CAUSAL EMPIRICISM AND THEORY

Some have charged that "theory is being lost" amid the turn to causal empiricism and that causal empiricism leads to the pursuit of only narrow questions (Huber 2013). In a way, this is a valid concern, at least when it comes to evaluating what an individual paper aims to accomplish. But to evaluate causal empiricism in these terms misses the point. Causal empiricism forces realism about what we should expect from paper-length research contributions and therefore forces us to think in terms of research programs. Above, I explained why we need to disabuse ourselves of two fantasies: (i) a single, paper-length empirical analysis is likely to yield nonspecific causal facts that generalize without strong assumptions, and (ii) there are ready techniques to produce such facts at will for populations of one's choosing. In a causal empiricist paper, the absence of novel theorizing does not have to mean that "theory is being lost" but rather that theory is being held constant as we go about the difficult business of trying to do credible causal inference. This orientation respects not only the difficulty of causal inference, but it also respects the difficulty of developing good theory that is capable of explaining a variety of facts.

Causal empiricism emphasizes research design in pursuit of causal identification. Causal empiricist observational studies expend considerable space to justify identification. Experimental studies typically have less to prove on this front. Nonetheless, experimental papers tend to use considerable space to describe the research design. Naturally this will leave less room to do other things, such as presenting new theoretical models. But even if extra space were granted, any model building should be done under realism about specificity. A model that makes claims consistent with the evidence from one study is limited in its generality by the study's effective sample. This suggests that model building is typically more compelling when it synthesizes results from numerous studies, hence the proposal to work with existing models at times rather than always proposing a new model (Angrist and Pischke 2010, 23). The point is that there is no inherent tension between causal empiricism and theoretical modeling. An empiricist research program builds up to general knowledge through incremental accretion of credible findings across a diversity of settings (Keele 2015a, 104). This can happen either organically as researchers happen to discover new opportunities for empirical work, or consciously by deploying, across a variety of contexts, a set of studies designed to allow for comparative analysis of causal estimates. A recent issue of the American Economic Journal: Applied Economics focusing on six field experiments on micro-credit is exemplary (Banerjee, Karlan, and Zinman 2015), as are the "Metaketa" research programs managed by the Evidence in Governance and Politics network (http://egap.org/metaketa). Theoretical modeling could be conducted in synthesis pieces in which more evidence can be assessed than what is contained in a single paper-length empirical analysis.19

19. Dehejia, Pop-Eleches, and Samii (2015) discuss the current state of the art in the statistical synthesis of experimental and natural experimental results.

Such syntheses could consider not only credible quantitative causal research, but other types of empirical research such as descriptive statistical analyses, ethnographies, and other qualitative studies. Such a collection of credible facts establishes the puzzles and contours that theorists can use to assess the usefulness of behavioral models and develop new hypotheses to guide further empirical research. As Gehlbach (2015) discusses, it is unreasonable to think any individual or any single paper could do all of these things well, which motivates the need for division of labor in a research program.

An experiment or natural experiment is especially interesting if it provides an opportunity to assess the value of competing models of causal mechanisms. Empirical analyses do not "prove" or "disprove" models—as Clarke and Primo (2012) discuss, to take this as the goal of empirical work is generally nonsensical, and even more so once we appreciate that all estimates are "local." Rather, credible empirical work clarifies situations where one or another model is useful. When certain models tend to guide policy or other decisions, it is crucial for empirical research to demarcate scope conditions, expose areas where model propositions fail, and establish the need for richer models. In labor economics, the research program on minimum wage laws has followed such an evolution, driven by causally well-identified studies and prompting new models (see Schmitt [2013] for a review).

An excellent venue for theoretical framing is a research design and analysis plan, where one can specify how a research design and empirical analyses allow one to assess competing models. At present, research design and analysis plans tend to focus mostly on statistics.20 They should do much more theoretical framing, answering the question, what is at stake for competing models in the analyses being proposed? At conferences and seminars researchers should be spending much more time discussing these kinds of model-framing research design and analysis plans—arguably, it is at this stage that broad feedback is more critical than after the results are in.21

20. See Humphreys, Sanchez de la Sierra, and van der Windt (2013), Monogan (2013), and other contributions to the 2013 Political Analysis symposium on research registration.
21. Nyhan (2015) takes this idea even further, proposing that journals could make publication decisions on the basis of research design and analysis plans, so that contributions are assessed on the basis of their theoretical framing and methods, rather than on the basis of whether they find "significant" results. The cognition and neuroscience journal Cortex has begun to apply this model in their "registered reports" section. A few inter-institutional research working groups, including the Working Group in African Political Economy (WGAPE), Experiments in Governance and Politics (EGAP), and Northeast Workshop in Empirical Political Science (NEWEPS), regularly devote space on conference agendas to research designs and analysis plans.

A second theory-related critique comes from "structuralists" who find that causal empiricist research puts too little effort into interpreting experiments and natural experiments on the basis of fully specified behavioral models (Deaton 2010; Heckman and Urzua 2010; Wolpin 2013). The structural approach to causality is indeed different in its goal of using data from specific settings to estimate parameters of general models of behavior (that is, "causes of effects" models; Heckman 2010, 361). The debate about causal inference between those working in the structuralist versus empiricist (or "reduced form") traditions is mostly among economists; in political science journals, structural estimation is still almost exclusively applied in latent factor measurement (e.g., voter ideal points; Quinn, Martin, and Whitford 1999) or addressing confounding due to strategic interaction (following Signorino 1999). Nonetheless, my expectation is that as the prevailing convention described above continues to wither, the relationship between causal empiricism and structural causal analysis will become more important in political science.

I have sympathy for the structuralist view and believe that there are fruitful ways to bridge these two approaches. First, we should be clear on where the two approaches tend to agree. Current structural analyses, in economics at least, also emphasize identification and clear definition of counterfactual comparisons (Heckman 2010). Gone are the days when someone could get away with relying heavily on structural assumptions to identify an average causal effect in data that are plagued by endogeneity problems.22 The key difference today, I would say, is in the two approaches' respective treatments of the specificity issue.23 With structural estimation, identifying variation defines the opportunity to estimate model parameters (or combinations of such parameters), which are presumed to be invariant and therefore permit simulation of counterfactuals and generalization to new settings. For causal empiricists, counterfactual comparisons are limited to what the data identify directly and generalization occurs only after a set of facts are obtained. Moreover, there is typically no a priori reason to believe invariance assumptions for structural models. The logic of the LATE theorem and other "localness" results extends immediately to attempts to estimate parameters in structural models, as Angrist, Graddy, and Imbens (2000) show in a nonparametric analysis of a simultaneous-equations supply and demand model.

22. That ship began to sail as early as the publication of Lalonde (1986). Of course, instrumental variables have their origin in work on identifying structural models. But until recently exclusion restrictions were assumed in a manner that was very fast and loose and, by current standards, quite unconvincing (Angrist and Pischke 2010).
23. Angrist and Krueger (1999, 1280) draw a similar distinction.
So even if one identifies structural parameters from a given study, generalizability remains an open question. What is nice about the localness results is that they provide ways to characterize the sets of units that contribute to parameter estimates.

A bridge between the two approaches is to view structural analysis as a tool for theoretical framing and interpretation that can inform the evolution of the research program. Consider the analysis by Wolpin (2013, 127–33) of the Project STAR class size experiment (Krueger 1999). His analysis shows that the causal relations identified by the experiment may be too coarse to make predictions about what would happen if class size reductions were applied on a larger scale. This analysis defines what further research is necessary to answer questions about consequences of scaling up. An example of structural analysis for theoretical framing from political science is by Brollo and Nannicini (2012), who unpack a causal effect identified by a regression discontinuity design in a study of political alignment and federal transfers. More ambitious would be efforts toward "structural synthesis." Causal empiricist research would deliver a set of credible empirical results for which contextual conditions are clearly stated. Then, one would assess the restrictions these findings imply on parameter values for behavioral models. In a 2011 special issue of the Journal of African Economies, Fafchamps (2011), Harrison (2011), McKenzie (2011), and Wantchekon and Guardado (2011) discuss structural analysis of randomized controlled trials. Wolpin (2013) gives structural interpretations of a variety of experimental and natural-experimental results for examples in labor economics. Chetty (2009) discusses ways to derive "sufficient statistics" from experimental and natural experimental results to inform model-based welfare analysis. Tamer (2010) reviews methods for using empirical results to bound parameter combinations for structural models. Keniston (2011) gives a nice example of using structural methods to reanalyze experimental results to develop richer counterfactual implications. Current methods training should prepare students to analyze how behavioral models and causal identification relate.
CONCLUSION
Our journals continue to churn out conventional regression studies that try to estimate causal effects and then interpret them in general terms.24 This reflects how most political scientists are trained. The first aim of this essay was to lay out the causal empiricist critique of such work: at best, regressions identify causal effects that are specific only to certain subpopulations. And this they only do if the regression methods are applied sensibly. Conventional practice does not lead to sensible use of regression, and so even some of the most seminal findings from recent political science research are dubious.
24. As I was writing this essay, the most recent issue of the American Political Science Review (November 2015) contained four quantitative causal studies, examining causal effects of discrimination, government transparency, remittances, and terrorism. All four were conventional regression studies in the style described above.
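The subpopulation-specificity point can be made concrete with a short simulation. The sketch below is illustrative, with an invented data-generating process; it reproduces the result discussed in Aronow and Samii (2016) that regressing an outcome on a treatment while controlling for a covariate yields a conditional-variance-weighted average of heterogeneous effects, not the average treatment effect.

import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Two equal-share strata; treatment probability and treatment effect differ by stratum.
x = rng.integers(0, 2, n)                  # stratum indicator
p = np.where(x == 0, 0.1, 0.5)             # Pr(D = 1 | X)
d = (rng.random(n) < p).astype(float)
tau = np.where(x == 0, 1.0, 3.0)           # stratum-specific effects; the ATE is 2.0
y = x + tau * d + rng.normal(size=n)

# OLS of y on d, controlling for x.
Z = np.column_stack([np.ones(n), d, x])
coef = np.linalg.lstsq(Z, y, rcond=None)[0]

# The regression estimand weights each stratum's effect by Var(D | X) = p(1 - p).
w = np.array([0.1 * 0.9, 0.5 * 0.5])
print(f"OLS coefficient on D: {coef[1]:.2f}")  # about 2.47, not the ATE of 2.0
print(f"variance-weighted average of effects: {np.average([1.0, 3.0], weights=w):.2f}")

The average treatment effect here is 2.0 by construction, but the regression converges on roughly 2.47 because the stratum in which treatment varies more receives more weight; the regression answers a question about a particular effective subpopulation.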
Causal empiricism represents a more realistic approach to quantitative causal research, emphasizing the importance of good research design for causal identification and the specificity of the causal facts that are obtained even in the best of circumstances. The second aim of this essay was to clarify these pillars of causal empiricism.
Empirical contributions need to devote more space to research design and characterization of the subpopulations for which effects are identified. As such, a single empirical contribution should only devote a limited amount of space to theory development. The gains from such revisions to the concept of an individual empirical contribution should be the accumulation of more credible findings. Once a set of such findings accumulates, we should be in a much better position to evaluate theoretical models in terms of the scope of their usefulness. The fascination with theoretical novelty in every empirical paper should be replaced with more appreciation of work that brings increasingly refined empirical scrutiny to bear on existing theoretical models. This should excite modelers as well because it would provide for them a richer set of facts to use when considering new directions. The third aim of this essay was to propose that credible empirical research should interact with theoretical models as part of research programs and a division of labor. The next chapters in the "credibility revolution" may very well be in further synthesis of causal inference and behavioral modeling.
My focus on quantitative causal research does not imply a disregard for descriptive quantitative research or qualitative research. Descriptive regression studies and analyses of trends can define puzzles that establish research programs. For example, the negative relationship between ethnic diversity and development described above was the product of important scientific contributions to measurement and description, and it establishes an intriguing puzzle. The argument here is that in trying to move from intriguing relationships to causal statements, credibility demands of researchers much more effort and care in establishing sets of well-identified empirical results and interpreting the specificity of their findings than is the case under the prevailing convention.

ACKNOWLEDGMENTS
Thanks to Pablo Argote for excellent research assistance and to Jeffery Jenkins, Peter Aronow, Jonathon Baron, Neal Beck, Kevin Bryan, Andrew Clarke, Michael Gilligan, Macartan Humphreys, Helen Milner, Elizabeth Levy Paluck, Brenton Peterson, and Sam Plapinger for helpful discussions.
REFERENCES
Abadie, Alberto. 2003. "Semiparametric Instrumental Variable Estimation of Treatment Response Models." Journal of Econometrics 113:231–63.
Abadie, Alberto. 2005. "Semiparametric Difference-in-Differences Estimators." Review of Economic Studies 72 (1): 1–19.
Abadie, Alberto, Alexis Diamond, and Jens Hainmueller. 2010. "Synthetic Control Methods for Comparative Case Studies: Estimating the Effect of California's Tobacco Control Program." Journal of the American Statistical Association 105 (490): 493–505.
Abadie, Alberto, and Javier Gardeazabal. 2003. "The Economic Costs of Conflict: A Case Study of the Basque Country." American Economic Review 93 (1): 113–32.
Acemoglu, Daron, Simon Johnson, and James A. Robinson. 2001. "The Colonial Origins of Comparative Development: An Empirical Investigation." American Economic Review 91 (5): 1369–1401.
Acemoglu, Daron, Simon Johnson, and James A. Robinson. 2012. "The Colonial Origins of Comparative Development: An Empirical Investigation: Reply." American Economic Review 102 (6): 3077–3110.
Acharya, Avidit, Matthew Blackwell, and Maya Sen. 2016. "Explaining Causal Findings without Bias: Detecting and Assessing Direct Effects." American Political Science Review (forthcoming).
Achen, Christopher H. 2005. "Let's Put Garbage-Can Regressions and Garbage-Can Probits Where They Belong." Conflict Management and Peace Science 22 (4): 327–39.
Albouy, David Y. 2012. "The Colonial Origins of Comparative Development: An Empirical Investigation: Comment." American Economic Review 102 (6): 3059–76.
Alesina, Alberto, and Eliana La Ferrara. 2005. "Ethnic Diversity and Economic Performance." Journal of Economic Literature 43 (3): 762–800.
Angrist, Joshua D., and Alan B. Krueger. 1999. "Empirical Strategies in Labor Economics." In Orley C. Ashenfelter and David Card, eds., Handbook of Labor Economics, vol. 3. Amsterdam: North Holland.
Angrist, Joshua D., Guido W. Imbens, and Donald B. Rubin. 1996. "Identification of Causal Effects Using Instrumental Variables." Journal of the American Statistical Association 91 (434): 444–55.
Angrist, Joshua D., and Jorn-Steffen Pischke. 2009. Mostly Harmless Econometrics: An Empiricist's Companion. Princeton, NJ: Princeton University Press.
Angrist, Joshua D., and Jorn-Steffen Pischke. 2010. "The Credibility Revolution in Empirical Economics: How Better Research Design Is Taking the Con out of Econometrics." Journal of Economic Perspectives 24 (2): 3–30.
Angrist, Joshua D., Kathryn Graddy, and Guido W. Imbens. 2000. "The Interpretation of Instrumental Variables Estimators in Simultaneous Equations Models with an Application to the Demand for Fish." Review of Economic Studies 67:499–527.
Aronow, Peter M., and Cyrus Samii. 2015. "Estimating Average Causal Effects Under General Interference." Unpublished manuscript, Yale University and New York University.
Aronow, Peter M., and Cyrus Samii. 2016. "Does Regression Produce Representative Estimates of Causal Effects?" American Journal of Political Science 60 (1): 250–67.
Ashworth, Scott, Christopher R. Berry, and Ethan Bueno De Mesquita. 2015. "All Else Equal in Theory and Data (Big or Small)." PS: Political Science and Politics 48 (1): 89–94.
Avdeenko, Alexandra, and Michael J. Gilligan. 2015. "International Interventions to Build Social Capital: Evidence from a Field Experiment in Sudan." American Political Science Review 109 (3): 427–49.
Banerjee, Abhijit V., Dean Karlan, and Jonathan Zinman. 2015. "Six Randomized Evaluations of Microcredit: Introduction and Further Steps." American Economic Journal: Applied Economics 7 (1): 1–21.
Banerjee, Abhijit V., and Esther Duflo. 2009. "The Experimental Approach to Development Economics." Annual Review of Economics 1:151–78.
Bardhan, Pranab. 2013. "Little, Big: Two Ideas about Fighting Global Poverty." Boston Review (May/June), online journal.
Beath, Andrew, Fotini Christia, and Ruben Enikolopov. 2013. "Empowering Women through Development Aid: Evidence from a Field Experiment in Afghanistan." American Political Science Review 107:540–57.
Brady, Henry E., and David Collier, eds. 2010. Rethinking Social Inquiry: Diverse Tools, Shared Standards. 2nd ed. Lanham, MD: Rowman & Littlefield.
Brollo, Fernanda, and Tommaso Nannicini. 2012. "Tying Your Enemy's Hands in Close Races: The Politics of Federal Transfers in Brazil." American Political Science Review 106 (4): 742–61.
Chetty, Raj. 2009. "Sufficient Statistics for Welfare Analysis: A Bridge between Structural and Reduced-Form Methods." Annual Review of Economics 1:451–87.
Clarke, Kevin A. 2005. "The Phantom Menace: Omitted Variable Bias in Econometric Research." Conflict Management and Peace Science 22 (4): 341–52.
Clarke, Kevin A., and David M. Primo. 2012. A Model Discipline: Political Science and the Logic of Representations. Oxford: Oxford University Press.
Deaton, Angus. 2010. "Instruments, Randomization, and Learning about Development." Journal of Economic Literature 48:424–55.
Dehejia, Rajeev H., Cristian Pop-Eleches, and Cyrus Samii. 2015. "From Local to Global: External Validity in a Fertility Natural Experiment." NBER Working Paper 21459, National Bureau of Economic Research, Cambridge, MA.
de Rooij, Eline A., Donald P. Green, and Alan S. Gerber. 2009. "Field Experiments on Political Behavior and Collective Action." Annual Review of Political Science 12:389–95.
Dunning, Thad. 2008. "Improving Causal Inference: Strengths and Limitations of Natural Experiments." Political Research Quarterly 61 (2): 282–93.
Dunning, Thad. 2012. Natural Experiments in the Social Sciences: A Design-Based Approach. Cambridge: Cambridge University Press.
Elwert, Felix, and Christopher Winship. 2014. "Endogenous Selection Bias: The Problem of Conditioning on a Collider Variable." Annual Review of Sociology 40:31–53.
Fafchamps, Marcel. 2011. "Randomised Controlled Trials or Structural Models (or Both . . . or Neither . . .)?" Journal of African Economies 20 (4): 596–99.
Fearon, James D., and David Laitin. 2003. "Ethnicity, Insurgency, and Civil War." American Political Science Review 97:75–90.
Fearon, James D., Macartan Humphreys, and Jeremy M. Weinstein. 2015. "How Does Development Assistance Affect Collective Action Capacity? Results from a Field Experiment in Post-Conflict Liberia." American Political Science Review 109 (3): 450–69.
Ferwerda, Jeremy, and Nicholas L. Miller. 2014. "Political Devolution and Resistance to Foreign Rule: A Natural Experiment." American Political Science Review 108 (3): 642–60.
Ferwerda, Jeremy, and Nicholas L. Miller. 2015. "Rail Lines and Demarcation Lines: A Response." SSRN Working Paper 2628508.
Freedman, David A. 1991. "Statistical Models and Shoe Leather." Sociological Methodology 21:291–313.
Gehlbach, Scott. 2015. "The Fallacy of Multiple Methods." Comparative Politics Newsletter 25 (2): 11–12.
Gilligan, Michael J., and Ernest J. Sergenti. 2008. "Does Peacekeeping Cause Peace? Using Matching to Improve Causal Inference." Quarterly Journal of Political Science 3:89–122.
Green, Donald P., Mary C. McGrath, and Peter M. Aronow. 2013. "Field Experiments and the Study of Voter Turnout." Journal of Elections, Public Opinion and Parties 23 (1): 27–48.
Greene, William H. 2008. Econometric Analysis. 6th ed. Upper Saddle River, NJ: Pearson.
Grose, Christian R. 2014. "Field Experimental Work on Political Institutions." Annual Review of Political Science 17:355–70.
Hainmueller, Jens, and Chad Hazlett. 2014. "Kernel Regularized Least Squares: Reducing Misspecification Bias with a Flexible and Interpretable Machine Learning Approach." Political Analysis 22 (2): 143–68.
Harrison, Glenn W. 2011. "Randomisation and Its Discontents." Journal of African Economies 20 (4): 626–52.
Hartzell, Carolyn, and Matthew Hoddie. 2003. "Institutionalizing Peace: Power Sharing and Post-Civil War Conflict Management." American Journal of Political Science 47:318–32.
Heckman, James J. 2010. "Building Bridges between Structural and Program Evaluation Approaches to Evaluating Policy." Journal of Economic Literature 48 (2): 356–98.
Heckman, James J., and Sergio Urzua. 2010. "Comparing IV with Structural Models: What Simple IV Can and Cannot Identify." Journal of Econometrics 156 (1): 27–37.
Hernan, Miguel A., and James M. Robins. 2013. Causal Inference. Boca Raton, FL: Chapman & Hall/CRC.
Hill, Jennifer. 2011. "Bayesian Nonparametric Modeling for Causal Inference." Journal of Computational and Graphical Statistics 20 (1): 217–40.
Ho, Daniel E., Kosuke Imai, Gary King, and Elizabeth A. Stuart. 2007. "Matching as Nonparametric Preprocessing for Reducing Model Dependence in Parametric Causal Inference." Political Analysis 15 (3): 199–236.
Holland, Paul W. 1986. "Statistics and Causal Inference." Journal of the American Statistical Association 81 (396): 945–60.
Huber, John. 2013. "Is Theory Getting Lost in the 'Identification Revolution'?" The Monkey Cage. http://themonkeycage.org/2013/06/is-theory-getting-lost-in-the-identification-revolution/.
Humphreys, Macartan, and Raul Sanchez de la Sierra. 2013. "Fishing, Commitment, and Communication: A Proposal for Comprehensive Nonbinding Research Registration." Political Analysis 21 (1): 1–20.
Imai, Kosuke, and David A. van Dyk. 2004. "Causal Inference with General Treatment Regimes: Generalizing the Propensity Score." Journal of the American Statistical Association 99 (467): 854–66.
Imbens, Guido W. 2004. "Nonparametric Estimation of Average Treatment Effects under Exogeneity: A Review." Review of Economics and Statistics 86 (1): 4–29.
Imbens, Guido W. 2010. "Better LATE than Nothing: Some Comments on Deaton (2009) and Heckman and Urzua (2009)." Journal of Economic Literature 48:399–423.
Imbens, Guido W., and Donald B. Rubin. 2015. Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction. Cambridge: Cambridge University Press.
Imbens, Guido W., and Jeffrey M. Wooldridge. 2009. "Recent Developments in the Econometrics of Program Evaluation." Journal of Economic Literature 47 (1): 5–86.
Jensen, Nathan M. 2003. "Democratic Governance and Multinational Corporations: Political Regimes and Inflows of Foreign Direct Investment." International Organization 57:587–616.
Keele, Luke. 2015a. "The Discipline of Identification." PS: Political Science and Politics 48 (1): 102–6.
Keele, Luke. 2015b. "The Statistics of Causal Inference: A View from Political Methodology." Political Analysis 23 (3): 313–35.
Keniston, Daniel E. 2011. "Experimental vs. Structural Estimates of the Return to Capital in Microenterprises." Unpublished manuscript, Yale University.
King, Gary, and Langche Zeng. 2006. "The Dangers of Extreme Counterfactuals." Political Analysis 14 (2): 131–59.
Kleinberg, Jon, Jens Ludwig, Sendhil Mullainathan, and Ziad Obermeyer. 2015. "Prediction Policy Problems." American Economic Review: Papers and Proceedings 105 (5): 491–95.
Kocher, Matthew Adam, and Nuno P. Monteiro. 2015. "What's in a Line? Natural Experiments and the Line of Demarcation in WWII Occupied France." SSRN Working Paper 2555716.
Krueger, Alan B. 1999. "Experimental Estimates of Education Production Functions." Quarterly Journal of Economics 114 (2): 497–532.
Lalonde, Robert J. 1986. "Evaluating the Econometric Evaluations of Training Programs with Experimental Data." American Economic Review 76 (4): 604–20.
Lee, David S., and Thomas Lemieux. 2010. "Regression Discontinuity Designs in Economics." Journal of Economic Literature 48:281–355.
Manski, Charles F. 1995. Identification Problems in the Social Sciences. Cambridge, MA: Harvard University Press.
McKenzie, David. 2011. "How Can We Learn Whether Firm Policies Are Working in Africa? Challenges (and Solutions?) for Experiments and Structural Models." Journal of African Economies 20 (4): 600–625.
Monogan, James E. 2013. "A Case for Registering Studies of Political Outcomes: An Application in the 2010 House Elections." Political Analysis 21 (1): 21–37.
Morgan, Stephen L., and Christopher Winship. 2015. Counterfactuals and Causal Inference: Methods and Principles for Social Research. 2nd ed. Cambridge: Cambridge University Press.
Morton, Rebecca B. 1999. Methods and Models: A Guide to the Empirical Analysis of Formal Models in Political Science. Cambridge: Cambridge University Press.
Nyhan, Brendan. 2015. "Increasing the Credibility of Political Science Research: A Proposal for Journal Reforms." PS: Political Science and Politics 48 (S1): 78–83.
Ogden, Timothy. 2015. "Experimental Conversations: Angus Deaton." Medium.com (accessed October 13, 2015).
Pearl, Judea. 2009. Causality: Models, Reasoning, and Inference. 2nd ed. Cambridge: Cambridge University Press.
Pearl, Judea. 2010. "On a Class of Bias-Amplifying Variables that Endanger Effect Estimates." In Peter Grunwald and Peter Spirtes, eds., Proceedings of UAI. Corvallis, OR: AUAI, 417–24.
Petersen, Maya L., Kristin E. Porter, Susan Gruber, Yue Wang, and Mark J. Van der Laan. 2011. "Positivity." In Mark J. Van der Laan and Sherri Rose, eds., Targeted Learning: Causal Inference for Observational and Experimental Data. New York: Springer, 162–86.
Prorok, Alyssa K. 2016. "Leader Incentives and Civil War Outcomes." American Journal of Political Science 60 (1): 70–84.
Quinn, Kevin M., Andrew D. Martin, and Andrew B. Whitford. 1999. "Voter Choice in Multi-Party Democracies: A Test of Competing Theories and Models." American Journal of Political Science 43 (4): 1231–47.
Robins, James M. 1997. "Causal Inference from Complex Longitudinal Data." In M. Berkane, ed., Latent Variable Modeling and Applications to Causality. New York: Springer-Verlag, 69–117.
Rosenbaum, Paul R. 1984. "The Consequences of Adjustment for a Concomitant Variable That Has Been Affected by the Treatment." Journal of the Royal Statistical Society, Series A 147 (5): 656–66.
Rosenbaum, Paul R. 1999. "Choice as an Alternative to Control in Observational Studies." Statistical Science 14 (3): 259–304.
Rubin, Donald B. 2008. "For Objective Causal Inference, Design Trumps Analysis." Annals of Applied Statistics 2 (3): 808–40.
Schmitt, John. 2013. Why Does the Minimum Wage Have No Discernible Effect on Employment? Washington, DC: Center for Economic and Policy Research.
Sekhon, Jasjeet S. 2009. "Opiates for the Matches: Matching Methods for Causal Inference." Annual Review of Political Science 12 (1): 487–508.
Signorino, Curtis S. 1999. "Strategic Interaction and the Statistical Analysis of International Conflict." American Political Science Review 93 (2): 279–97.
Tamer, Elie. 2010. "Partial Identification in Econometrics." Annual Review of Economics 2:167–95.
Titiunik, Rocio, and Jasjeet Sekhon. 2012. "When Natural Experiments Are Neither Natural Nor Experiments." American Political Science Review 106 (1): 35–57.
Van der Laan, Mark, and Sherri Rose. 2011. Targeted Learning: Causal Inference for Observational and Experimental Data. New York: Springer.
VanderWeele, Tyler J. 2015. Explanation in Causal Inference: Methods for Mediation and Interaction. Oxford: Oxford University Press.
Wantchekon, Leonard, and Jenny Guardado. 2011. "Methodology Update: Randomised Controlled Trials, Structural Models and the Study of Politics." Journal of African Economies 20 (4): 653–72.
Wolpin, Kenneth I. 2013. The Limits of Inference without Theory. Cambridge, MA: MIT Press.
Wooldridge, Jeffrey M. 2009. Introductory Econometrics: A Modern Approach. 4th ed. Mason, OH: South-Western.