Difference-in-Differences With Interference: Ruonan Xu
Ruonan Xu∗
Abstract
In many scenarios, such as the evaluation of place-based policies, poten-
tial outcomes are not only dependent upon the unit’s own treatment but also
its neighbors’ treatment. Despite this, “difference-in-differences” (DID) type
estimators typically ignore such interference among neighbors. I show in this
paper that the canonical DID estimators generally fail to identify interesting
causal effects in the presence of neighborhood interference. To incorporate in-
terference structure into DID estimation, I propose doubly robust estimators
for the direct average treatment effect on the treated as well as the average
spillover effects under a modified parallel trends assumption. The approach in
this paper relaxes common restrictions in the literature, such as partial inter-
ference and correctly specified spillover functions. Moreover, robust inference
is discussed based on the asymptotic distribution of the proposed estimators.
1 Introduction
According to the stable unit treatment value assumption (SUTVA), potential outcomes
depend only on one’s own treatment assignment. In many cases, SUTVA fails
due to an unknown interference structure among neighbors. In environmental
economics, urban economics, criminal justice, and many other fields of the social
sciences, place-based policies often generate spillover effects. One example is the
minimum wage increase in Seattle studied by Jardim, Long, Plotnick, van Inwegen, Vigdor, and Wething
(2022). Through the channels of competition for workers in the regional labor market
and the possible relocation of businesses, they find significant spillover
effects on wages and hours up to a 40-minute drive from the Seattle city limits.
∗ ruonan.xu@rutgers.edu, Rutgers University
When spillover effects are of interest, one often needs to observe the entire pop-
ulation. For example, we can typically collect information about all counties in
the United States. In the example above, Jardim et al. (2022) use administrative
employment records in the state of Washington. If we take sampling from the super-
population/infinite population approach literally, what we are estimating turns out
to be the spillover effect in a researcher’s sample instead of in the population from
which the sample is drawn unless interactions are restricted within clusters of friends
or household members.
In this paper, I study the “difference-in-differences” (DID) type estimators that
allow interference from a finite population perspective, where inference is condi-
tional on covariates and the whole population is observed. This approach is clos-
est to the conditional inference discussed by Abadie, Imbens, and Zheng (2014) and
Jin and Rothenhäusler (2024) without reference to a superpopulation of covariates.
Conditional treatment effect parameters have also been mentioned in Abadie and Imbens
(2002), Imbens (2004), and Balzer, Petersen, and van der Laan (2015). Recently,
Viviano (2024) adopts the same inference framework when studying optimal treat-
ment allocation under network interference. The conditional inference approach
adopted here allows arbitrary spatial correlation and nonstationarity of covariates.
Meanwhile, stochastic potential outcomes allow the possibility of modeling the con-
ditional means of the outcome variables. Consequently, the proposed estimators
below are more robust to model specification with a straightforward causal interpre-
tation. One could legitimately argue that researchers should not stick to a single
inference framework. That said, many attribute variables containing locational in-
formation and neighborhood characteristics, such as landlocked status, are deemed
non-stochastic for spatial data. I therefore consider the current approach a natural
starting point for studying population interference/spillover effects.
One challenge of incorporating interference is the modeling of spillover functions.
There is no guarantee that the spillover function will be specified correctly since
the true interference structure is rarely known. It is common practice to come up
with a functional form of the spillover pattern within a fixed neighborhood bound-
ary. One leading example is to specify exposure as the average value of treatment
statuses of neighbors within a distance d of a unit i. (Exposure mapping is a “func-
tion that maps an assignment vector and unit-specific traits to an exposure value”;
see Aronow and Samii (2017) for detailed definition.) This particular functional
specification might be chosen because of its straightforward presentation and policy
relevance. That said, potentially misspecified exposure mapping prevents the ap-
plication of standard causal inference techniques, especially in the context of DID,
where there is not much guidance on how to incorporate interference to begin with.
To overcome misspecification, I consider the expected direct treatment effect at
certain neighborhood exposure levels as the parameter of interest inspired by Sävje
(2024). The causal estimands I define coincide with the exact direct treatment effect
when the exposure mapping is correctly specified and remain well-defined even under
misspecification. The proposed estimators are consistent for the estimands regardless
of the correct specification of the exposure mapping.
Namely, practitioners can still use the spillover function they choose based on
their domain/institutional knowledge. Nevertheless, the estimands defined and the
estimators proposed in the current paper provide robustness with respect to misspec-
ification of the chosen spillover function. Given the cutoff of the spillover effect is
typically unknown, the chosen neighborhood boundary can also differ from the true
one. Applying the device of approximate neighborhood interference (ANI) in Leung
(2022) to spatial data, the only requirement of the data generating process is that
treatments assigned to units further from i have a smaller, but possibly nonzero, ef-
fect on i’s response. In addition, the assignment variables are allowed to be spatially
correlated as is often the case in practice with spatial data.
Putting all the pieces together, I propose doubly robust estimators for the direct
treatment effect and spillover effect. The proposed doubly robust estimator is a
modified version of the augmented inverse probability weighting (AIPW) estimator,
which only requires correct specification of either the propensity scores of treat-
ment/exposure or the conditional mean of the outcomes. The conditional inference
approach in the current paper leads to a different variance-covariance matrix which
may require a new variance estimator when necessary.
Besides the main contribution above, there are two other sets of results worth
mentioning. In Section 3.1 below and Section B in the online appendix, I study
the identification of canonical DID estimators available in the literature. I provide
conditions under which canonical estimators can still identify meaningful causal es-
timands. This discussion alone would be of interest to practitioners. In Section C
of the online appendix, I clarify what toolkit practitioners can use by comparing
various dimension reduction approaches in the interference literature. I also show
how these dimension reduction approaches relate to the ANI device. The proposed
doubly robust estimators are applied in Section 6 to study the policy effect of special
economic zones (SEZ) in China. Section D of the online appendix summarizes the
detailed steps for direct effect estimation.
Literature: Most of the methodological literature studies spillover effects in a
single cross section of experimental data and assumes partial interference or limits in-
terference to immediate neighbors. Additionally, they assume that the function of de-
pendence on neighbors’ treatments is known and correctly specified. See, for instance,
Hudgens and Halloran (2008) and Aronow and Samii (2017). Delgado and Florax
(2015), Clarke (2017), and Butts (2021) allow interference in DID estimation in a
two-way fixed effects (TWFE) estimating equation (often without covariates) from
a superpopulation perspective. Huber and Steinmayr (2021) also propose a DID ap-
proach to estimate spillover effect and total effect within a superpopulation frame-
work.1 All four papers mentioned above share some or all restrictions of the general
interference literature. Design-based DID estimation has been studied by Arkhangelsky, Imbens, Lei, an
(2021), Athey and Imbens (2022), and Rambachan and Roth (2022), but they keep
the SUTVA. Sant’Anna and Zhao (2020) have proposed AIPW estimators in the
DID context, maintaining SUTVA and the superpopulation perspective.
1 Their potential outcomes are defined as functions of individual and regional treatments, where
individual treatment status is a function of the regional treatment. Therefore, Huber and Steinmayr
(2021) is more applicable to studies of local equilibrium effects.
2 Setup
2.1 Environment
I start with the relatively simple setting of panel data with two time periods; t =
1, 2 stands for the time period before and after treatment respectively. Consider a
sequence of lattices of (possibly) unevenly placed locations in Rd , {DM ⊆ Rd , d ≥ 1},
where M indexes the sequence of finite populations. Because I consider the case
where the sample coincides with the population for spatial data, I let the population
size |DM | diverge to infinity in deriving the asymptotic properties, where |V | denotes
the cardinality of a finite subset V ⊆ DM .
I briefly summarize the notation used throughout the paper. I adopt the metric
$\rho(i, j) = \max_{1 \le l \le d} |j_l - i_l|$ in the space $\mathbb{R}^d$, where $i_l$ is the $l$-th component of $i$. The
distance between any subsets $K, V \subseteq D_M$ is defined as
$\rho(K, V) = \inf\{\rho(i, j) : i \in K \text{ and } j \in V\}$. For any random vector $X$,
$\|X\|_p = \left(E\|X\|^p\right)^{1/p}$, $p \ge 1$, denotes its
$L_p$-norm. Lastly, $C$ denotes a generic positive constant that may vary under different
circumstances.
For each unit $i$ in the population, there is a stochastic assignment variable $W_i \in \{0, 1\}$,
a vector of fixed attributes $z_i = (z_i^{ind}, z_i^{neigh})$ that possibly includes attributes
of $i$’s neighborhood $z_i^{neigh}$ in addition to individual characteristics $z_i^{ind}$, and a vector
of stochastic unobservables $U_{it}$. The potential outcome function for any $i \in D_M$ is
defined as $h_{it}(\cdot) : \{0, 1\}^{|D_M|} \times \mathbb{R}^{\dim(z_i)} \times \mathbb{R}^{\dim(U_{it})} \to \mathbb{R}$. I emphasize the treatment
vector of the entire population by denoting the potential outcomes as
$y_{it}(w_i, w_{-i}) = h_{it}(w_i, w_{-i}, z_i, U_{it})$, where $w_{-i} = \{w_j, j \in D_M, j \ne i\}$.2 The dependence of the
potential outcomes on the fixed attributes and stochastic unobservables is indicated
by its $i, t$ subscript. The realized potential outcomes are denoted by $Y_{it} = y_{it}(W)$.
Notice that $(W, z, Y, U) = \{(W_i, z_i, Y_{it}(\cdot), U_{it}), i \in D_M, M \ge 1\}$ are triangular
arrays of random fields defined on a probability space $(\Omega, \mathcal{F}, P)$. Exposure mapping
is defined by the function $G_i = G(i, W_{-i}) \in \mathcal{G}$, where $\mathcal{G}$ is a discrete set. Therefore,
G(·) maps the treatment status of all units except i to an exposure value.
2 As in Manski (2013), the potential outcome function defined here can be considered as the
response function, namely the reduced form of structural equations where the structural potential
outcome may depend on other units’ treatments as well as outcomes.
The construction of the G(·) function deserves some explanation. In an ideal
scenario, empirical researchers would like to come up with a functional form of
G(·) that captures actual interactions among neighbors as well as conveys clear
causal explanations. Because of the unknown interference structure and the high
dimensional treatment assignment vector of the entire population, choosing a G(·)
that achieves both goals is challenging.3 For instance, a leading choice is
$G_i = \sum_{j \in D_M, j \ne i} A_{ij} W_j \big/ \sum_{j \in D_M, j \ne i} A_{ij}$, where $A_{ij} = 1$ if the distance between units i and
j is within a certain cutoff d. Besides the somewhat arbitrary cutoff d, the impact of
i’s neighbors may not be exchangeable in reality, e.g., unit l may have greater influ-
ence on i compared to unit m. That said, the specification above might still capture
meaningful policy effects. Consequently, a good starting point is to use domain
knowledge about the specific empirical question to choose a G(·) that summarizes
interesting and relevant policy implications for both direct treatment and spillover
effects. In this paper, I show how to proceed with DID estimation
with a chosen G(·) function that is potentially misspecified. For more discussion on
some common choices of G(·) and how they can be accommodated in the current
framework, I refer readers to Section C in the online appendix.
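As an illustration of the leading choice of G(·) described above, the following minimal sketch computes the share of treated neighbors within a cutoff under the metric ρ. The function names, the toy locations, and the convention of returning zero for a unit with no neighbors are assumptions made for this example and are not taken from the paper.

```python
from typing import Sequence, Tuple

Location = Tuple[float, ...]


def chebyshev(i: Location, j: Location) -> float:
    """Metric rho(i, j) = max_l |j_l - i_l|."""
    return max(abs(a - b) for a, b in zip(i, j))


def exposure_share(idx: int, locations: Sequence[Location],
                   w: Sequence[int], cutoff: float) -> float:
    """Share of treated units among the neighbors of unit `idx` within `cutoff`.

    Returns 0.0 when the unit has no neighbors (an assumed convention).
    """
    neighbors = [j for j in range(len(locations))
                 if j != idx and chebyshev(locations[idx], locations[j]) <= cutoff]
    if not neighbors:
        return 0.0
    return sum(w[j] for j in neighbors) / len(neighbors)


# Hypothetical population of four units on a plane.
locations = [(0, 0), (1, 0), (0, 1), (3, 3)]
w = [1, 1, 0, 1]
exposures = [exposure_share(i, locations, w, cutoff=1)
             for i in range(len(locations))]
```

Because the fourth unit lies outside every other unit's cutoff, changing its treatment leaves the exposure values of the first three units unchanged, even though the true potential outcomes may still depend on it.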
In line with common practices, empirical researchers can construct the exposure
mapping G(·) in the following manner, which might not necessarily coincide with the
true interference structure. Given a fixed K, define the K-neighborhood of the unit
i as
$$N(i, K) = \{j \in D_M : \rho(i, j) \le K,\ j \ne i\}.$$
Let $w_{N(i,K)} = (w_j : j \in N(i, K))$ be the treatment vector of units within i’s
K-neighborhood. There exists $K < \infty$ such that for all $w_{-i}$ and $w'_{-i}$ such that
$w_{N(i,K)} = w'_{N(i,K)}$, $G(i, w_{-i}) = G(i, w'_{-i})$. As a result, the specified exposure mapping
function restricts spillover effects within the immediate K-neighborhood of each
unit. Having said that, the actual potential outcome function does not restrict the
interference structure. Treatments of units outside of i’s K-neighborhood can
legitimately influence i’s potential outcome as long as treatments assigned to units
further from i have a smaller, but possibly nonzero, effect on i’s response. Section 4
below describes the related assumptions. This way, the exposure mapping function
is allowed to be misspecified. The G(·) function is allowed to be multidimensional,
in which case K would be the largest distance over which interference is allowed
under the specification across the fixed dimensions of G(·).
3 Manski (2013) and Basse and Airoldi (2018) formally point out that there exist no consistent
treatment effect estimators under arbitrary interference. It is therefore necessary to make dimension
reduction assumptions about the interference structure in order to identify meaningful treatment
effect parameters.
which marginalizes over the treatment assignment vector. The overall direct effect is a
natural extension of the average treatment effect on the treated (ATT) as it coincides
with the ATT when units do not interfere (yi2 (Wi , W−i ) reduces to yi2 (Wi )).
Oftentimes, in addition to a summary of the direct effects, researchers are
also interested in the direct effect at different exposure levels. The overall direct
effect is closely related to the expected direct average treatment effect on the treated
(EDATT) at exposure levels g ∈ G defined in equation (1) below.4
$$\tau(g) = \frac{1}{|D_M|} \sum_{i \in D_M} E\big[ y_{i2}(1, W_{-i}) - y_{i2}(0, W_{-i}) \mid W_i = 1, G_i = g, z_i \big] \tag{1}$$
where the expectation is taken over all possible realizations of W−i given the specified
exposure mapping G(i, W−i ) = g. The definition of the expected potential outcome
is different from what is initially proposed in Sävje (2024), in which the potential
outcome is fixed in an experimental setting and the expectation is with respect to
the assignment variables only. Not only do I split the entire treatment vector into
wi and w−i , but the stochastic nature of the potential outcomes also needs to be
taken into account. The randomness of the potential outcome function poses a
challenge to the causal interpretation of the spillover effect estimand. Section A in the
online appendix provides more detailed reasoning.
4 Their relationship is explained in equation (G.1) in Section G of the online appendix.
5 As shown in Section 3.1, the canonical DID estimand, which ignores interference, fails to identify
τ for spatially correlated assignments.
In terms of the interpretation of EDATT, if the spillover effect and the direct
effect are additively separable, we can identify the exact direct ATT even if the
spillover function is misspecified. Without additivity, we can still identify the direct
ATT that would be realized in expectation at the specified exposure level.
Under correct specification of the exposure mapping,
$$\tau(g) = \tau^*(g) = \frac{1}{|D_M|} \sum_{i \in D_M} E\big[ \tilde{y}_{i2}(1, g) - \tilde{y}_{i2}(0, g) \mid W_i = 1, G_i = g, z_i \big],$$
where ỹi2 (wi , g) = yi2 (wi , w−i ). What we identify in this case is exactly the direct
ATT at exposure level g.
the expected direct effect is lower when there is a higher proportion of neighboring
SEZ villages. This can help determine whether establishing an additional SEZ is
cost-effective.
On the basis of the SEZ dataset, I simulated how the usual DID estimates ig-
noring interference differ from the overall direct effect estimates. The logarithm of
output is first regressed on direct treatment assignment status Wi , exposure mapping
Gi defined above, as well as a list of covariates used in Lu et al. (2019) and their
interactions with Wi . The new output variables are generated as the sum of the
fitted values from the regression above and random draws from a standard normal
distribution. For each set of generated outcomes, however, I scale the coefficient on Gi
by a factor ranging from −6 to 6 in increments of one. In terms of magnitude,
the largest scaled coefficient on Gi is nearly half as large as the coefficient on Wi .
[Figure 1: τ̂canonic and τ̂ plotted against the scalar of the coefficient on G (horizontal axis from −6 to 6; vertical axis from 0.1 to 0.9).]
In Figure 1, I plot the estimates based on the usual DID method, τ̂canonic , without
considering potential interference. In addition, I report the overall direct effect
estimates, τ̂ .6 Given the model specification, τ̂ remains the same under different
scalings. Based on 100 simulations for each scaling factor, the mean of the generated
log output ranges between 9.46 and 9.96, which is close to the mean of the
actual log output of 9.75.
6 τ̂ is calculated as a weighted average of the estimates of τ (1) and τ (0). The estimator for τ (g)
is presented in Section 4 below.
In the actual dataset, τ̂canonic is 29.4% larger than τ̂ . A simple simulation study
shows that the average of τ̂canonic can be either 74.4% smaller or 116.0% larger than
the average of τ̂ in accordance with the magnitude of the spillover effect. By applying
larger scaling factors, it is possible to make the differences even more pronounced. As
shown in Remark 1 below, τcanonic is not the so-called total effect either. Therefore,
the usual DID estimates can be misleading for direct causal effects when there is
potential interference among units.
3 Identification
The first question when relaxing SUTVA is what the canonical DID estimator identi-
fies if spillover effects are incorrectly ignored. Namely, will the canonical DID estima-
tor still consistently estimate ATT in the presence of interference? Forastiere, Airoldi, and Mealli
(2021) discuss bias of the difference-in-means estimator when SUTVA is wrongly as-
sumed in observational studies on networks. To my knowledge, the literature has not
yet investigated DID type estimators. To facilitate the discussion of identification, I
impose the following assumptions.
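Assumption 1. (Overlap) ∀ i ∈ DM , there exists ε > 0 such that ε < p(zi ) < 1 − ε,
π1g (zi ) > ε, and π0g (zi ) > ε, where
$$p(z_i) = P(W_i = 1 \mid z_i), \tag{2}$$
$$\pi_{1g}(z_i) = P(G_i = g \mid W_i = 1, z_i), \tag{3}$$
and
$$\pi_{0g}(z_i) = P(G_i = g \mid W_i = 0, z_i). \tag{4}$$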
To simplify notation, I assume that the overlap assumption applies to every unit
in the population. With certain exposure mapping specifications, this might not be
plausible. An easy fix is to change the estimand by averaging over the subpopulation
where Gi can take on the value g. Failure to satisfy the overlap condition for p(zi ) is
trickier. If one is willing to move the goalpost by redefining the population, one can
drop units that always or never take treatment. The good news is that for the rede-
fined population, we can still observe the treatment assignment vector of the original
population since the treatment status of the dropped units is fixed and known. This
way, dropping the always or never takers will not affect the exposure mapping. On
the other hand, to deal with weak overlap conditions in practice without changing
the population or estimand, one can consider approaches proposed by Ma and Wang
(2020) and Man, Sant’Anna, Sasaki, and Ura (2023) to trim propensity scores and
correct the resulting bias simultaneously.
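The population redefinition described above can be sketched as follows. Units whose propensity score falls outside (ε, 1 − ε) are dropped from the averaging population, while exposure for the remaining units is still computed from the treatment vector of the original population. The propensity scores, neighbor lists, and function names are hypothetical and serve only as an illustration.

```python
def keep_units(p_scores, eps):
    """Indices of units whose propensity score satisfies eps < p < 1 - eps."""
    return [i for i, p in enumerate(p_scores) if eps < p < 1.0 - eps]


def exposure(i, neighbors, w):
    """Share of treated neighbors of unit i, using the FULL treatment vector w."""
    nbrs = neighbors[i]
    return sum(w[j] for j in nbrs) / len(nbrs) if nbrs else 0.0


# Hypothetical inputs: unit 1 is always treated, unit 3 is never treated.
p_scores = [0.5, 1.0, 0.3, 0.0]
w = [1, 1, 0, 0]
neighbors = {0: [1, 2], 1: [0], 2: [0, 3], 3: [2]}

kept = keep_units(p_scores, eps=0.05)
# Dropped units still enter the exposure of the units that remain.
exposure_kept = {i: exposure(i, neighbors, w) for i in kept}
```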
Assumption 2 requires that the potential outcome in the first time period, prior to
treatment, always equals the potential outcome with neither treatment nor spillover.
The no-anticipation assumption is standard in the literature and is sometimes
assumed implicitly.
If we focus on the case of correctly specified exposure mapping, we could impose
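the following parallel trends assumption: For any $g^* \in \mathcal{G}^*$ and $\forall\, i$,
$$
\begin{aligned}
& E\big[\tilde{y}_{i2}(0, g^*) \mid W_i = 1, G_i^* = g^*, z_i\big] - E\big[y_{i1}(0, 0) \mid W_i = 1, G_i^* = g^*, z_i\big] \\
={} & E\big[\tilde{y}_{i2}(0, g^*) \mid W_i = 0, G_i^* = g^*, z_i\big] - E\big[y_{i1}(0, 0) \mid W_i = 0, G_i^* = g^*, z_i\big],
\end{aligned} \tag{5}
$$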
where G∗i = G∗ (i, W−i ) stands for the unknown true exposure mapping and G ∗ is the
set of discrete values that G∗ can take. Notice that in the parallel trends assumption,
as no one is treated at t = 1 there is no spillover in the potential outcome function in
the first time period. Namely, moving from the first to the second time period, in the
absence of direct treatment the conditional mean of the potential outcomes for the
treated and the untreated with the same level of exposure in the second time period
follows the same trend. Equation (5) serves as our starting point for identification.
In order to provide a general framework for identifying EDATT, I impose the
following assumption instead.
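$$
\begin{aligned}
& \frac{1}{|D_M|} \sum_{i \in D_M} \Big[ E\big[y_{i2}(0, W_{-i}) \mid W_i = 1, G_i = g, z_i\big] - E\big[y_{i1}(0, 0) \mid W_i = 1, G_i = g, z_i\big] \Big] \\
={} & \frac{1}{|D_M|} \sum_{i \in D_M} \Big[ E\big[y_{i2}(0, W_{-i}) \mid W_i = 0, G_i = g, z_i\big] - E\big[y_{i1}(0, 0) \mid W_i = 0, G_i = g, z_i\big] \Big]
\end{aligned} \tag{6}
$$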
If one removes the outer average and assumes that the equality holds for each unit
i ∈ DM , Assumption 3 becomes the conventional conditional parallel trends assumption. Notice
that equation (5) and Assumption 3 can be linked by the law of iterated expectations,
regardless of the specified exposure mapping function, if the equality in equation (7) holds.
Assumption 3-A. The exposure mapping is correctly specified such that G∗i = Gi .
When G∗i = Gi , equation (7) holds trivially regardless of the spatial correlation
among the assignment variables. This is the standard setting where Assumption 3
reduces to a population average of equation (5) under correct specification of the
exposure mapping. In this case, individual assignment variables can be spatially
correlated because of clustered assignments at a more aggregate geographical level
or simply because of geographical homophily such as the distribution of natural
resources. Spatial correlation between assignment variables can also be induced
by “peer effects.” For instance, individual units take up treatments based on their
neighbors’ adoption. Therefore, the asymptotic theory in Section 4 below allows for
spatial correlation as long as weak dependence holds. When G∗i is correctly specified,
we can identify the exact direct ATT.
Assumption 3-B. Individual assignments are independent such that $W_i \perp\!\!\!\perp W_j \mid z$, $\forall\, i \ne j$.
This is the case where the exposure mapping G(·) is allowed to be arbitrarily
misspecified. It is empirically relevant because the interference structure is typi-
cally unknown. Nevertheless, the second case rules out clustered/spatially corre-
lated assignments as well as neighbors’ influence on treatment uptake after par-
tialing out individual and neighborhood characteristics. Notice that zi can contain
attributes of i’s neighborhood. Recall that G∗i = G∗ (i, W−i ). It is not hard to see
that equality (7) holds when individual assignment variables are independent con-
ditional on zi = z. If G∗ (·) is a function of the treatment vector of units within i’s
K-neighborhood, i.e., G∗i = G∗ (i, WN (i,K)), zi can be a sub-vector of z, for instance,
zi = zN (i,K) = (zjind : j ∈ N (i, K)).
Assumption 3-C. Exposure mapping is misspecified in the sense that $G_i^* \ne G_i$ and
individual assignments are correlated but equation (7) holds.
and bordering counties, Assumption 3-C holds.7
In another example, neighbors’ identities could matter. Suppose units are par-
titioned into clusters consisting of three units. The other two units in i’s clus-
ter could be assigned treatment status (0,1) or (1,0) with the same probability
under the cluster assignment mechanism.8 The exposure mapping is specified as
$G_i = \sum_{C_j = C_i, j \ne i} W_j / 2$. However, the true exposure is the assignment vector of the
two neighbors within the same cluster. Still,
$P(G_i^* = g^* \mid W_i = 1, G_i = 1/2) = P(G_i^* = g^* \mid W_i = 0, G_i = 1/2) = 1/2$
for $g^* = (1, 0)$ or $g^* = (0, 1)$, which are
two different exposures. In an example of information diffusion among villagers, the
identity of the initial information receivers could matter.
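The three-unit cluster example can be checked with exact arithmetic. The sketch below assumes a hypothetical two-point distribution for the cluster probability q_c (the paper only requires positive variance) and treats units as independent draws given q_c, as in the two-step mechanism described in the footnote.

```python
from fractions import Fraction
from itertools import product

# Hypothetical two-point distribution for the cluster-specific probability q_c.
Q_DIST = {Fraction(1, 5): Fraction(1, 2), Fraction(7, 10): Fraction(1, 2)}


def joint_prob(assignment):
    """P of an assignment tuple within one cluster: mix over q_c,
    with independent draws given q_c."""
    treated = sum(assignment)
    return sum(pq * q**treated * (1 - q)**(len(assignment) - treated)
               for q, pq in Q_DIST.items())


def prob_true_exposure(g_star, w_i):
    """P(G*_i = g_star | W_i = w_i, G_i = 1/2), where G_i = (W_j + W_k) / 2."""
    num = joint_prob((w_i,) + g_star)
    den = sum(joint_prob((w_i, wj, wk))
              for wj, wk in product((0, 1), repeat=2) if wj + wk == 1)
    return num / den


table = {(g, w): prob_true_exposure(g, w)
         for g in [(1, 0), (0, 1)] for w in (0, 1)}
# Assignments are correlated within a cluster: E[q^2] exceeds (E[q])^2.
correlated = joint_prob((1, 1)) > joint_prob((1,)) ** 2
```

Every entry of `table` equals 1/2 regardless of the own treatment status, even though assignments within a cluster are correlated through q_c.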
Lemma 3.1. Under equation (5), if any of Assumption 3-A, 3-B, or 3-C is satisfied,
Assumption 3 holds.
Lemma 3.1 summarizes the three cases underlying the general formalization of
the parallel trends assumption in Assumption 3. While the first two cases are easier
to argue, certain data generating processes may lead to the third case. Regardless,
Assumption 3 identifies the EDATT.
There is a growing literature on justification and falsification of the parallel trends
assumption under SUTVA; see, for instance, Ghanem, Sant’Anna, and Wüthrich
(2022) and Roth and Sant’Anna (2023). When parallel trends might be violated,
Rambachan and Roth (2023) present confidence sets for the identified set of treat-
ment effects. The extension of these analyses to parallel trends with interference is
outside the scope of the current paper.
7 The empirical example is used to demonstrate the third case, but it does not necessarily fit into
this specific data generating process.
8 The individual assignment mechanism consists of two steps. In the first step, each cluster has
its own assignment probability $q_c \in (0, 1)$ drawn from a distribution with variance $\sigma^2 > 0$. In the
second step, units within cluster c are assigned to treatment independently according to cluster
specific probability $q_c$.
3.1 Canonical DID
The usual ATT under the SUTVA is
$$\tilde{\tau} = \frac{1}{|D_M|} \sum_{i \in D_M} E\big[ y_{i2}(1) - y_{i2}(0) \mid W_i = 1, z_i \big].$$
Here, the potential outcomes are determined solely by unit i’s own treatment. Sup-
pose the canonical DID estimator consistently estimates
$$\tau_{canonic} = \frac{1}{|D_M|} \sum_{i \in D_M} \Big[ E(Y_{i2} - Y_{i1} \mid W_i = 1, z_i) - E(Y_{i2} - Y_{i1} \mid W_i = 0, z_i) \Big].$$
Examples include the TWFE linear estimating equation in Remark 1 in Sant’Anna and Zhao
(2020) under the additional restrictions of the data generating process therein, as well
as the inverse probability weighting (IPW) estimator in Abadie (2005). If the usual
(conditional) parallel trends assumption holds without interference, τcanonic would be
equivalent to τ̃ .
If SUTVA is violated, τ̃ is not well defined. Also, EDATT is generally determined
by the specified exposure level. As a result, I use the overall direct effect as a
benchmark for comparison.
Suppose that the parallel trends assumption (8) below holds for any g ∈ G, ∀ i.
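$$
\begin{aligned}
& E\big[y_{i2}(0, W_{-i}) \mid W_i = 1, G_i = g, z_i\big] - E\big[y_{i1}(0, 0) \mid W_i = 1, G_i = g, z_i\big] \\
={} & E\big[y_{i2}(0, W_{-i}) \mid W_i = 0, G_i = g, z_i\big] - E\big[y_{i1}(0, 0) \mid W_i = 0, G_i = g, z_i\big]
\end{aligned} \tag{8}
$$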
Using the law of iterated expectations, τ and τcanonic can be decomposed in the
following way:
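$$
\begin{aligned}
\tau = \frac{1}{|D_M|} \sum_{i \in D_M} \bigg\{ & \sum_{g \in \mathcal{G}} \Big[ E(Y_{i2} \mid W_i = 1, G_i = g, z_i) - E(Y_{i1} \mid W_i = 1, G_i = g, z_i) \Big] P(G_i = g \mid W_i = 1, z_i) \\
& - \sum_{g \in \mathcal{G}} \Big[ E(Y_{i2} \mid W_i = 0, G_i = g, z_i) - E(Y_{i1} \mid W_i = 0, G_i = g, z_i) \Big] P(G_i = g \mid W_i = 1, z_i) \bigg\}
\end{aligned}
$$
$$
\begin{aligned}
\tau_{canonic} = \frac{1}{|D_M|} \sum_{i \in D_M} \bigg\{ & \sum_{g \in \mathcal{G}} \Big[ E(Y_{i2} \mid W_i = 1, G_i = g, z_i) - E(Y_{i1} \mid W_i = 1, G_i = g, z_i) \Big] P(G_i = g \mid W_i = 1, z_i) \\
& - \sum_{g \in \mathcal{G}} \Big[ E(Y_{i2} \mid W_i = 0, G_i = g, z_i) - E(Y_{i1} \mid W_i = 0, G_i = g, z_i) \Big] P(G_i = g \mid W_i = 0, z_i) \bigg\}
\end{aligned}
$$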
Corollary 1. Proposition 1 also holds for τ ∗ (g) under the correct specification of the
exposure mapping.
Remark 1. When the exposure G takes two values zero and one, after a simple
calculation
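$$
\begin{aligned}
\tau_{canonic} = \tau + \frac{1}{|D_M|} \sum_{i \in D_M} \Big\{ & \big[ E(Y_{i2} \mid W_i = 0, G_i = 1, z_i) - E(Y_{i2} \mid W_i = 0, G_i = 0, z_i) \big] \\
& - \big[ E(Y_{i1} \mid W_i = 0, G_i = 1, z_i) - E(Y_{i1} \mid W_i = 0, G_i = 0, z_i) \big] \Big\} \\
& \cdot \big[ P(G_i = 1 \mid W_i = 1, z_i) - P(G_i = 1 \mid W_i = 0, z_i) \big]
\end{aligned}
$$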
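The decomposition in Remark 1 can be verified numerically in a stripped-down setting with binary exposure and no covariates. In the sketch below, the cell means of the outcome trend and the exposure probabilities are hypothetical inputs, and the function name is an assumption made for this example.

```python
def canonical_and_direct(cell_trend, p_g1_given_w):
    """Return (tau, tau_canonic) for binary G without covariates.

    cell_trend[(w, g)] stands for E(Y_i2 - Y_i1 | W_i = w, G_i = g).
    p_g1_given_w[w] stands for P(G_i = 1 | W_i = w).
    """
    def p(g, w):
        return p_g1_given_w[w] if g == 1 else 1.0 - p_g1_given_w[w]

    # Overall direct effect: both groups are weighted by P(G = g | W = 1).
    tau = sum((cell_trend[(1, g)] - cell_trend[(0, g)]) * p(g, 1) for g in (0, 1))
    # Canonical DID: each group is weighted by its own exposure distribution.
    tau_canonic = (sum(cell_trend[(1, g)] * p(g, 1) for g in (0, 1))
                   - sum(cell_trend[(0, g)] * p(g, 0) for g in (0, 1)))
    return tau, tau_canonic


# Hypothetical cell means and exposure probabilities.
cell_trend = {(1, 1): 1.4, (1, 0): 1.0, (0, 1): 0.6, (0, 0): 0.2}
p_g1 = {1: 0.7, 0: 0.3}

tau, tau_canonic = canonical_and_direct(cell_trend, p_g1)
# Second term of Remark 1: spillover trend among the untreated times the
# difference in exposure probabilities between treated and untreated units.
bias = (cell_trend[(0, 1)] - cell_trend[(0, 0)]) * (p_g1[1] - p_g1[0])
```

With these inputs the gap between the two estimands equals `bias`, and it vanishes when the exposure probabilities do not differ by own treatment status.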
spillover effect is moderate compared with the direct effect, the difference between
τcanonic and τ can be sizable, as shown in the motivating example in Section 2.3
above.
In the empirical literature, researchers often augment the TWFE DID estimating
equation with another binary indicator of spillover. In the interest of space, I examine
the identification of the modified TWFE estimating equations in Section B of the
online appendix.
$$\tau(g) = \frac{1}{|D_M|} \sum_{i \in D_M} \Big[ \big( \mu_{2,1g}(z_i) - \mu_{2,0g}(z_i) \big) - \big( \mu_{1,1g}(z_i) - \mu_{1,0g}(z_i) \big) \Big]. \tag{11}$$
To allow for more robustness against misspecification of the propensity scores
or the conditional means of the outcomes, the IPW-DID estimand can be extended
to an AIPW estimand. Let mt,wg (zi ) denote the model for equation (10). Denote
∆mt,g (zi ) = mt,1g (zi ) − mt,0g (zi ). Furthermore, let η(zi ), η1g (zi ), and η0g (zi ) be the
models for the propensity scores in equations (2)-(4), respectively. The doubly robust
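estimand is
$$
\begin{aligned}
\tau(g) = E_D\bigg[ & \frac{W_i}{\eta(z_i)} \frac{1\{G_i = g\}}{\eta_{1g}(z_i)} \Big( \big(Y_{i2} - m_{2,1g}(z_i)\big) - \big(Y_{i1} - m_{1,1g}(z_i)\big) \Big) \\
& - \frac{1 - W_i}{1 - \eta(z_i)} \frac{1\{G_i = g\}}{\eta_{0g}(z_i)} \Big( \big(Y_{i2} - m_{2,0g}(z_i)\big) - \big(Y_{i1} - m_{1,0g}(z_i)\big) \Big) \\
& + \Delta m_{2,g}(z_i) - \Delta m_{1,g}(z_i) \bigg].
\end{aligned} \tag{12}
$$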
Proposition 2. Under Assumptions 1-3, equation (12) recovers the EDATT, τ (g),
as long as either the models for the propensity scores or the models for the conditional
means of the outcome are correctly specified.
Corollary 2. Proposition 2 also holds for τ ∗ (g) under the correct specification of the
exposure mapping.
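A minimal sketch of the sample analog of equation (12) is given below, assuming the working models for the propensity scores and the conditional means have already been fitted and are passed in as plain functions. The function name, the data layout, and the noise-free toy population are assumptions for illustration only and do not reproduce the estimation steps in the online appendix.

```python
def aipw_did(data, g, eta, eta1g, eta0g, m):
    """Sample analog of the doubly robust estimand for tau(g).

    data: list of dicts with keys 'y1', 'y2', 'w', 'g', 'z'.
    eta(z), eta1g(z), eta0g(z): working propensity score models.
    m(t, w, z): working model for the mean of Y_it given W_i = w, G_i = g, z_i.
    """
    total = 0.0
    for u in data:
        z = u["z"]
        hit = 1.0 if u["g"] == g else 0.0
        # Residualized outcome trends under the treated and untreated models.
        res1 = (u["y2"] - m(2, 1, z)) - (u["y1"] - m(1, 1, z))
        res0 = (u["y2"] - m(2, 0, z)) - (u["y1"] - m(1, 0, z))
        treated = u["w"] * hit / (eta(z) * eta1g(z)) * res1
        control = (1 - u["w"]) * hit / ((1.0 - eta(z)) * eta0g(z)) * res0
        augment = (m(2, 1, z) - m(2, 0, z)) - (m(1, 1, z) - m(1, 0, z))
        total += treated - control + augment
    return total / len(data)


# Hypothetical noise-free population with a direct effect of 2.
data = [{"z": z, "w": w, "g": g, "y1": z, "y2": z + 1.0 + 2.0 * w}
        for z in (0.0, 1.0) for w in (0, 1) for g in (0, 1)]

half = lambda z: 0.5        # matches the assignment frequencies in `data`
wrong_ps = lambda z: 0.9    # deliberately misspecified propensity score
true_m = lambda t, w, z: z if t == 1 else z + 1.0 + 2.0 * w
zero_m = lambda t, w, z: 0.0  # deliberately misspecified outcome model

est_outcome_ok = aipw_did(data, 1, wrong_ps, wrong_ps, wrong_ps, true_m)
est_ps_ok = aipw_did(data, 1, half, half, half, zero_m)
```

In this toy population both calls return the direct effect of 2, one with only the outcome model correct and one with only the propensity scores correct, which is the double robustness property stated in Proposition 2.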
3.3 Pre-trends
In empirical research, tests for pre-trends remain common despite the caution de-
scribed in Roth (2022). A placebo DID is typically applied to multiple periods
observed before treatment by imposing a hypothetical period of adoption of treatment.
Something similar can be done in the context of interference. In the simplest
case, suppose there are two time periods t = 0, 1 prior to treatment. By imposing the
placebo treatment between time periods 0 and 1, we would like to test
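$$
\begin{aligned}
& \frac{1}{|D_M|} \sum_{i \in D_M} \Big[ E\big[y_{i1}(0, W_{-i}) \mid W_i = 1, G_i = g, z_i\big] - E\big[y_{i0}(0, 0) \mid W_i = 1, G_i = g, z_i\big] \Big] \\
={} & \frac{1}{|D_M|} \sum_{i \in D_M} \Big[ E\big[y_{i1}(0, W_{-i}) \mid W_i = 0, G_i = g, z_i\big] - E\big[y_{i0}(0, 0) \mid W_i = 0, G_i = g, z_i\big] \Big].
\end{aligned} \tag{13}
$$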
The potential outcome yi1 (0, W−i ) is not observable because no unit is treated in
time period 1. Nevertheless, under the no anticipation assumption, equation (13)
reduces to
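$$
\begin{aligned}
& \frac{1}{|D_M|} \sum_{i \in D_M} \Big[ E\big[y_{i1}(0, 0) \mid W_i = 1, G_i = g, z_i\big] - E\big[y_{i0}(0, 0) \mid W_i = 1, G_i = g, z_i\big] \Big] \\
={} & \frac{1}{|D_M|} \sum_{i \in D_M} \Big[ E\big[y_{i1}(0, 0) \mid W_i = 0, G_i = g, z_i\big] - E\big[y_{i0}(0, 0) \mid W_i = 0, G_i = g, z_i\big] \Big].
\end{aligned} \tag{14}
$$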
Namely, one can test whether subgroups defined by the combination of direct treatment
status and exposure level have different trends before actual treatment. In
practice, testing equation (14) is a joint test of both no anticipation and parallel
pre-trends.
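The subgroup comparison behind equation (14) can be sketched as a difference in mean pre-period trends between units with Wi = 1 and Wi = 0 at a given exposure level. The version below ignores covariates and computes only the point estimate of the gap, without standard errors; the data and the function name are hypothetical.

```python
from statistics import mean


def pretrend_gap(data, g):
    """Difference in mean pre-period trends (Y_i1 - Y_i0) between units with
    W_i = 1 and W_i = 0 at exposure level g (no covariates; a simplification)."""
    trends = {1: [], 0: []}
    for u in data:
        if u["g"] == g:
            trends[u["w"]].append(u["y1"] - u["y0"])
    return mean(trends[1]) - mean(trends[0])


# Hypothetical pre-treatment outcomes for periods t = 0 and t = 1.
data = [
    {"w": 1, "g": 1, "y0": 1.0, "y1": 1.5},
    {"w": 0, "g": 1, "y0": 0.0, "y1": 0.5},
    {"w": 1, "g": 0, "y0": 2.0, "y1": 3.0},
    {"w": 0, "g": 0, "y0": 1.0, "y1": 1.5},
]
gap_high = pretrend_gap(data, g=1)  # parallel pre-trends at g = 1
gap_low = pretrend_gap(data, g=0)   # diverging pre-trends at g = 0
```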
implies that the individual propensity score function and the individual conditional
mean function of the outcome remain the same across units.
To make the estimators more robust to misspecification of these functions, one can
use various moment conditions to identify the propensity scores. One option is the
covariate balancing propensity scores (CBPS) in Imai and Ratkovic (2014) or simi-
larly the inverse probability tilting estimator in Graham, de Xavier Pinto, and Egel
(2012), which can be locally more robust than the propensity scores based on maxi-
mum likelihood estimation (MLE). The alternative would be estimating all functions
semiparametrically or nonparametrically, which is left as future work.
I denote a generic moment condition for propensity scores as
\[
E_D\big[q_1(W_i, z_i, \gamma_1^*)\big] = 0 \tag{15}
\]
and
\[
E_D\big[q_2(W_i, G_i, z_i, \gamma_2^*)\big] = 0, \tag{16}
\]
where $z_i$ can contain neighbors’ attributes. For instance, the moment conditions for
CBPS are
\[
E_D\left[\frac{W_i}{P(W_i=1\mid z_i)}\,z_i-\frac{1-W_i}{1-P(W_i=1\mid z_i)}\,z_i\right]=0 \tag{17}
\]
and for $g = 1, 2, \ldots, G-1$,
\[
E_D\left[\frac{1\{G_i=g\}}{P(G_i=g\mid W_i,z_i)}\,(W_i,z_i)-\frac{1\{G_i=g-1\}}{P(G_i=g-1\mid W_i,z_i)}\,(W_i,z_i)\right]=0, \tag{18}
\]
where $P(W_i=1\mid z_i)$ is some probability for a binary response, such as
$\exp(z_i\gamma_1^*)/\big(1+\exp(z_i\gamma_1^*)\big)$,
and $P(G_i=g\mid W_i,z_i)$ is some probability for discrete choices. Similarly, generic
conditional moment conditions are denoted by
\[
E_D\big[q_3(Y_{i1}, W_i, G_i, z_i, \gamma_3^*)\big] = 0 \tag{19}
\]
and
\[
E_D\big[q_4(Y_{i2}, W_i, G_i, z_i, \gamma_4^*)\big] = 0. \tag{20}
\]
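To make the CBPS balance condition (17) concrete, the following sketch evaluates its sample analogue at a candidate coefficient vector under a logit model for P(W_i = 1 | z_i). It only evaluates the moment; solving it for the coefficients (for example by GMM) is not shown. The helper names `logistic` and `cbps_moment` are illustrative and do not come from the paper.

```python
import math


def logistic(x):
    """Logit link: exp(x) / (1 + exp(x))."""
    return 1.0 / (1.0 + math.exp(-x))


def cbps_moment(w, z, gamma):
    """Sample analogue of condition (17).

    w     : list of 0/1 treatment indicators
    z     : list of covariate vectors (include a leading 1.0 for an intercept)
    gamma : candidate logit coefficients, same length as each z[i]

    Returns the average over units of
    [W_i / p_i - (1 - W_i) / (1 - p_i)] * z_i, with p_i = logistic(z_i' gamma).
    A vector of zeros means the covariates are balanced by the weights.
    """
    n, k = len(w), len(gamma)
    total = [0.0] * k
    for wi, zi in zip(w, z):
        p = logistic(sum(a * b for a, b in zip(zi, gamma)))
        weight = wi / p - (1 - wi) / (1 - p)
        for j in range(k):
            total[j] += weight * zi[j]
    return [t / n for t in total]
```

The balance condition differs from the logit score in that it weights by inverse probabilities, which is what ties the fitted propensity scores directly to covariate balance.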
Alternatively, one can model the conditional mean for $\Delta Y_i = Y_{i2} - Y_{i1}$ and formulate
the moment condition as
\[
E_D\big[\tilde q_3(\Delta Y_i, W_i, G_i, z_i, \tilde\gamma_3^*)\big] = 0. \tag{21}
\]
If there are only a few possible values that the exposure levels Gi can take, one can
alternatively model the conditional outcomes for the subpopulation with Wi = w
and Gi = g as a function of zi , separately. Leading cases for outcome regression are
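moment conditions from (nonlinear) least squares. Lastly, the moment condition for
τ(g) is a restatement of equation (12).10 Denote $\theta_M^* = (\gamma_1^{*\prime}, \gamma_2^{*\prime}, \gamma_3^{*\prime}, \gamma_4^{*\prime}, \tau(g))'$.
\[
\begin{aligned}
E_D\big[q_5(Y_{it}, W_i, G_i, z_i, \theta_M^*)\big]
= E_D\Bigg[&\frac{W_i}{\eta(z_i)}\,\frac{1\{G_i=g\}}{\eta_{1g}(z_i)}\Big(\big(Y_{i2}-m_{2,1g}(z_i)\big)-\big(Y_{i1}-m_{1,1g}(z_i)\big)\Big)\\
&-\frac{1-W_i}{1-\eta(z_i)}\,\frac{1\{G_i=g\}}{\eta_{0g}(z_i)}\Big(\big(Y_{i2}-m_{2,0g}(z_i)\big)-\big(Y_{i1}-m_{1,0g}(z_i)\big)\Big)\\
&+\Delta m_{2,g}(z_i)-\Delta m_{1,g}(z_i)-\tau(g)\Bigg]=0
\end{aligned}
\tag{22}
\]
Let $X_i = \{Y_{it}, W_i, G_i, z_i\}$, $q(X_i, \theta) = (q_1'(\gamma_1), q_2'(\gamma_2), q_3'(\gamma_3), q_4'(\gamma_4), q_5(\theta))'$, and $\hat\Psi$
be the weighting matrix with dimensions larger than or equal to that of θ.
\[
\hat\theta = \arg\min_{\theta\in\Theta}\left[\frac{1}{|D_M|}\sum_{i\in D_M} q(X_i,\theta)\right]'\hat\Psi\left[\frac{1}{|D_M|}\sum_{i\in D_M} q(X_i,\theta)\right] \tag{23}
\]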
The GMM estimator is the solution to the finite population minimization problem
in equation (23). And the estimator of τ (g) is the last element of θ̂.
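Because the last moment is linear in τ(g), once the nuisance functions are fitted the sample analogue of (22) can be solved for τ(g) in closed form. The sketch below does this for per-unit fitted values that are assumed to be supplied by the user. It uses unnormalized weights, and it assumes that $\Delta m_{2,g} - \Delta m_{1,g}$ equals the regression DID contrast $(m_{2,1g} - m_{1,1g}) - (m_{2,0g} - m_{1,0g})$, which is my reading of the notation rather than something stated in this section. The function name `dr_tau` is hypothetical.

```python
def dr_tau(y1, y2, w, g, level, eta, eta_1g, eta_0g,
           m2_1g, m1_1g, m2_0g, m1_0g):
    """Closed-form solution of the sample analogue of (22) for tau(level).

    y1, y2          : outcomes in periods 1 and 2
    w, g            : direct treatment (0/1) and exposure level
    eta             : fitted P(W_i = 1 | z_i)
    eta_1g, eta_0g  : fitted P(G_i = level | W_i = 1, z_i) and
                      P(G_i = level | W_i = 0, z_i)
    m{t}_{w}g       : fitted conditional means of Y_t for W = w, G = level
    All nuisance inputs are lists of per-unit fitted values (assumed given).
    """
    n = len(y1)
    total = 0.0
    for i in range(n):
        in_g = 1.0 if g[i] == level else 0.0
        # Outcome-model residuals in differences, treated and control arms.
        res_treated = (y2[i] - m2_1g[i]) - (y1[i] - m1_1g[i])
        res_control = (y2[i] - m2_0g[i]) - (y1[i] - m1_0g[i])
        # Inverse probability weights (unnormalized).
        ipw_treated = w[i] * in_g / (eta[i] * eta_1g[i])
        ipw_control = (1 - w[i]) * in_g / ((1 - eta[i]) * eta_0g[i])
        # Regression DID contrast (assumed meaning of the Delta-m terms).
        reg = (m2_1g[i] - m1_1g[i]) - (m2_0g[i] - m1_0g[i])
        total += ipw_treated * res_treated - ipw_control * res_control + reg
    return total / n
```

If the outcome models fit exactly, the weighted residual terms vanish and the estimate reduces to the regression contrast; if instead the propensity scores are right, the weighted residuals correct the bias of the regression contrast. That is the double robustness the text refers to.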
I impose the following assumptions to study the asymptotic distribution of the
GMM estimator.
10 In practice, it is recommended to normalize the weights for IPW-type estimators. Changing the
moment condition with normalized propensity scores, where the weights sum to unity, does not
affect asymptotic normality of the GMM estimator. In fact, estimators with normalized weights
consistently show better finite sample performance in the simulations below.
Assumption 4. Suppose {DM } ⊆ Rd , d ≥ 1 is a sequence of finite sets such that
|DM | → ∞ as M → ∞. All elements in DM are located at distances of at least
ρ0 > 0 from each other, i.e., for all i, j ∈ DM : ρ(i, j) ≥ ρ0 ; w.l.o.g. I assume that
ρ0 > 1.
Consistent with the increasing domain asymptotics, the assumption of the mini-
mum distance ensures the expansion of the finite population region.
Assumption 5. (Approximate Neighborhood Interference) Let $W^{(i,s)} = \big(W_{N(i,s)}, W'_{D_M\setminus N(i,s)}\big)$,
where $W'$ is an independent copy of $W$, $W^{(i,s,0)} = \big(W_{N(i,s)}, 0\big)$, i.e., $W'_{D_M\setminus N(i,s)} = 0$,
and
\[
\kappa_M(s) = \max_{i\in D_M} E\Big[\big|y_{i2}(W) - y_{i2}\big(W^{(i,s,0)}\big)\big| \,\Big|\, z\Big]. \tag{24}
\]
Definition 1. A triangular array {Vi , i ∈ DM , M ≥ 1}, Vi ∈ Rν , is called ψ-
dependent if there exist uniformly bounded constants {κ̃M,s }s≥0 with κ̃M,0 = 1, and
a collection of nonrandom functions {ψh,h′ }h,h′∈N with ψh,h′ : Lν,h × Lν,h′ → [0, ∞)
such that for all (H, H ′ ) ∈ PM (h, h′ ; s) with s > 0 and all f ∈ Lν,h and f ′ ∈ Lν,h′ ,
\[
\big|\mathrm{Cov}\big(f(V_H), f'(V_{H'}) \mid z\big)\big| \le \psi_{h,h'}(f, f')\,\tilde\kappa_{M,s}.
\]
Assumption 6. Let $y_{it} = \phi(W_i, W_{-i}, z_i, U_{it})$, where φ(·) is some generic function
and $U_{it}$ denotes the unobservables. Let $\epsilon_i = (W_i, U_{i1}, U_{i2})$. The random field
$\epsilon = \{\epsilon_i, i \in D_M, M \ge 1\}$ is α-mixing under Definition 2 in Jenish and Prucha (2012).
The mixing coefficient is denoted by $\alpha_\epsilon(u, v, r) \le (u + v)\,\hat\alpha_\epsilon(r)$.
To adapt the limit theorems in Kojevnikov et al. (2021) to spatial data, I replace
the network denseness with the cardinality of the spatial sets implied by Lemma
A.1 in Jenish and Prucha (2009). As a result, Assumption 3.2 in Kojevnikov et al.
(2021) is modified as
Assumption 7.
\[
\sum_{s=1}^{\infty} s^{d-1}\,\tilde\kappa_{M,s} < \infty
\]
Assumption 7 is in line with Assumption 3(b) in Jenish and Prucha (2009) for
α-mixing random fields.
Let $\sigma_M^2 = \mathrm{Var}\big(\sum_{i\in D_M} \lambda' q(X_i, \theta_M^*) \mid z\big)$ for a nonzero vector λ. Similarly, Assumption
3.4 in Kojevnikov et al. (2021) is modified as
2+k
sd−1 max N (i; rM ) \ N (j; s − 1) κ̃M,s p → 0
σM i∈DM s=0
j∈DM ,s≤ρ(i,j)<s+1
and
\[
\frac{|D_M|^2\,\tilde\kappa_{M,r_M}^{\,1-(1/p)}}{\sigma_M} \to 0
\]
as M → ∞, where p > 4 is the constant that appears in Assumption A.1 in Appendix A.
Analogous conditions can be found in Jenish and Prucha (2009) as equations (B.18)
and (B.19) therein.
The notation used in the asymptotic distribution of the GMM estimator is intro-
duced below. Define
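where
\[
\Delta_{ehw,M} = \frac{1}{|D_M|}\sum_{i\in D_M} E\big[q(X_i,\theta_M^*)\,q(X_i,\theta_M^*)' \mid z\big], \tag{26}
\]
\[
\Delta_{E,M} = \frac{1}{|D_M|}\sum_{i\in D_M} E\big[q(X_i,\theta_M^*) \mid z\big]\,E\big[q(X_i,\theta_M^*) \mid z\big]', \tag{27}
\]
\[
\Delta_{spatial,M} = \frac{1}{|D_M|}\sum_{i\in D_M}\ \sum_{j\in D_M,\, j\neq i} E\big[q(X_i,\theta_M^*)\,q(X_j,\theta_M^*)' \mid z\big], \tag{28}
\]
and
\[
\Delta_{ES,M} = \frac{1}{|D_M|}\sum_{i\in D_M}\ \sum_{j\in D_M,\, j\neq i} E\big[q(X_i,\theta_M^*) \mid z\big]\,E\big[q(X_j,\theta_M^*) \mid z\big]'. \tag{29}
\]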
∆ehw,M and ∆spatial,M account for heteroskedasticity and spatial correlation respec-
tively, whereas ∆E,M and ∆ES,M are their finite population counterparts. Denote
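\[
R_M^* = E_D\big[\nabla_\theta q(X_i, \theta_M^*)\big]
\]
and
\[
V_M = \big(R_M^{*\prime}\Psi_M R_M^*\big)^{-1} R_M^{*\prime}\Psi_M\,\Omega_M\,\Psi_M R_M^*\,\big(R_M^{*\prime}\Psi_M R_M^*\big)^{-1}, \tag{30}
\]
where $\hat\Psi - \Psi_M \xrightarrow{p} 0$.
\[
V_M^{-1/2}\sqrt{|D_M|}\,\big(\hat\theta - \theta_M^*\big) \xrightarrow{d} N(0, I_k).
\]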
Corollary 3. Under correct specification of the exposure mapping, Theorem 4.2 holds
for $\theta_M^* = (\gamma_1^{*\prime}, \gamma_2^{*\prime}, \gamma_3^{*\prime}, \gamma_4^{*\prime}, \tau^*(g))'$.
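where
\[
\hat R = \frac{1}{|D_M|}\sum_{i\in D_M} \nabla_\theta q(X_i, \hat\theta)
\]
and
\[
\tilde\Omega(\theta) = \frac{1}{|D_M|}\sum_{s=0}^{\infty}\omega\Big(\frac{s}{b_M}\Big)\sum_{i\in D_M}\ \sum_{j\in D_M,\, s\le\rho(i,j)<s+1} q(X_i,\theta)\,q(X_j,\theta)'.
\]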
I impose the following assumption for the estimation of the variance-covariance ma-
trix.
(iii)
\[
\frac{1}{|D_M|}\sum_{s=0}^{\infty} s^{d-1}\, b_M^{2d}\, \tilde\kappa_{M,s}^{\,1-4/p} \to 0
\]
as M → ∞, where $b_M = o\big(|D_M|^{1/2d}\big)$ and p > 4 is the constant that appears in Assumption A.1
in Appendix A.
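\[
\hat V - (V_M + V_E) \xrightarrow{p} 0,
\]
where
\[
V_E = \big(R_M^{*\prime}\Psi_M R_M^*\big)^{-1} R_M^{*\prime}\Psi_M\,\Omega_E\,\Psi_M R_M^*\,\big(R_M^{*\prime}\Psi_M R_M^*\big)^{-1}
\]
and
\[
\Omega_E = \frac{1}{|D_M|}\sum_{s=0}^{\infty}\omega\Big(\frac{s}{b_M}\Big)\sum_{i\in D_M}\ \sum_{j\in D_M,\, s\le\rho(i,j)<s+1} E\big[q(X_i,\theta_M^*) \mid z\big]\,E\big[q(X_j,\theta_M^*) \mid z\big]'.
\]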
Corollary 4. Under correct specification of the exposure mapping, Theorem 4.3 holds
for $\theta_M^* = (\gamma_1^{*\prime}, \gamma_2^{*\prime}, \gamma_3^{*\prime}, \gamma_4^{*\prime}, \tau^*(g))'$.
Remark 2. When we choose kernel functions that produce a positive semi-definite
(PSD) weighting matrix, the usual SHAC variance estimator is generally conservative
for the finite population conditional spatial-correlation robust variance-covariance
matrix.
The conservativeness of the usual variance estimator for conditional variance has
also been investigated in Abadie et al. (2014) under the independence assumption
for the heteroskedasticity-robust variance matrix. I extend it to the case with spatial
correlation here when $\Omega_E$ is PSD based on a PSD kernel weighting matrix. An
exception to Remark 2 is when $E\big[q(X_i, \theta_M^*) \mid z\big] = 0$ for all $i \in D_M$. In this case, the
usual variance-covariance matrix estimator is no longer conservative as $V_E = 0$. With
heterogeneous direct treatment effect or misspecification of either the propensity
scores or conditional means, $E\big[q(X_i, \theta_M^*) \mid z\big] \neq 0$.
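The conservativeness claim in Remark 2 can be spelled out in one step. The short derivation below is a sketch that assumes $\Psi_M$ is symmetric, so that $V_E$ can be written as a quadratic form in $\Omega_E$; the shorthand $B_M$ is introduced here only for this purpose.

```latex
% Assumes \Psi_M is symmetric, so that V_E = B_M \Omega_E B_M'.
\begin{align*}
B_M &= \big(R_M^{*\prime}\Psi_M R_M^*\big)^{-1} R_M^{*\prime}\Psi_M,
\qquad V_E = B_M\,\Omega_E\,B_M', \\
\lambda' V_E \lambda &= (B_M'\lambda)'\,\Omega_E\,(B_M'\lambda) \ge 0
\quad \text{for every conformable } \lambda \text{ when } \Omega_E \text{ is PSD}, \\
\lambda'(V_M + V_E)\lambda &\ge \lambda' V_M \lambda .
\end{align*}
```

Since $\hat V$ converges to $V_M + V_E$, confidence intervals based on $\hat V$ are weakly wider than those based on $V_M$, with equality when $\Omega_E = 0$.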
That said, I would like to highlight a few points. First, because $\tilde\Omega(\hat\theta)$ is a
conservative estimator for $\Omega_M$, even if we choose $\Psi_M$ as the optimal weighting matrix
$\Omega_M^{-1}$, using $\hat\Psi = \tilde\Omega(\hat\theta)^{-1}$ in estimation is not going to achieve the most efficient GMM
estimator. The usual variance estimator is therefore conservative not only because
of the neglect of the additional terms in the variance-covariance matrix but also be-
cause the optimal weighting matrix is not consistently estimated. Of course, when
the model is just identified, the weighting matrix choice is irrelevant.
Second, unlike the finite population variance-covariance matrix in Xu and Wooldridge
(2022), the conditional spatial-correlation robust variance matrix is consistently es-
timable because it is no longer conditional on the unobserved potential outcomes.
There are different approaches one can take. However, since the usual SHAC vari-
ance estimator is known to suffer from downward bias especially when the spatial
correlation is high, it is not always necessary to estimate the smaller conditional
variance matrix.
5 Simulations
In the simulation, I show the finite sample performance of the proposed estimators
for EDATT. I consider an irregularly spaced lattice with M = 400 units. The
locations (s1,iM ,s2,iM ) are drawn once and kept fixed across replications. Each of
s1,iM and s2,iM is independently drawn from U(0, 20). The distance between units
i and j is measured by ρ(i, j) = max{|s1,iM − s1,jM |, |s2,iM − s2,jM |}. Units are
considered neighbors if ρ(i, j) ≤ 0.3 with the neighborhood structure summarized by
the normalized contiguity matrix, A. After ruling out units without neighbors, the
effective size of the subpopulation eligible for spillover reduces to 350.
I consider two-period panel data. The potential outcome function in the
first time period remains the same across different designs:
\[
y_1(0, 0) = 1 + z + e_1,
\]
where z is the individual covariate independently drawn from the standard normal
distribution and kept fixed, while $e_1$ is the first time period unobservable. There is
a single binary treatment variable $W = 1\{p(z^*) > u\}$ with $u_i \overset{i.i.d.}{\sim} U(0, 1)$. I vary
the second time period potential outcome function and the assignment probability
$p(z^*)$ in different designs summarized in Table 1 below. $z^* = (z, z_u)$, where the
vector of $z_u$ in the assignment probability is drawn from a multivariate normal
distribution with mean zero and a variance-covariance matrix equal to 0.5 raised to the
power of the distance between units. Thus, $z_u$ is a spatially correlated locational
covariate that stands for neighborhood similarity, which might be neglected in naive
estimation that assumes away spillover effects. With $e_2$ denoting the individual second
time period unobservable, $e_{i1} \mid W_i, W_{-i}, z_i \sim N(W_i z_i, 1)$, $e_{i2} \mid W_i, W_{-i}, z_i \sim N(W_i z_i, 1)$,
and $e_{i1} \perp\!\!\!\perp e_{i2} \mid W_i, W_{-i}, z_i$, ∀ i. The specified exposure mapping is denoted by
$G = 1\{AW > 0\}$, which may or may not coincide with the true interference structure.
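The neighborhood structure and the specified exposure mapping described above can be sketched in a few lines. The code builds a row-normalized contiguity matrix from the max-norm distance with the 0.3 cutoff and then forms G = 1{AW > 0}. Excluding a unit from its own neighborhood is an assumption on my part, and the function names are illustrative.

```python
def chebyshev(a, b):
    """Max-norm distance rho(i, j) between two 2-D locations."""
    return max(abs(a[0] - b[0]), abs(a[1] - b[1]))


def contiguity_matrix(locs, radius=0.3):
    """Row-normalized contiguity matrix A.

    A[i][j] > 0 iff j != i and rho(i, j) <= radius; each nonempty row sums
    to one. Units without neighbors get a row of zeros (the text drops
    such units from the subpopulation eligible for spillover).
    """
    n = len(locs)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        nbrs = [j for j in range(n)
                if j != i and chebyshev(locs[i], locs[j]) <= radius]
        for j in nbrs:
            A[i][j] = 1.0 / len(nbrs)
    return A


def exposure(A, w):
    """G_i = 1{(A W)_i > 0}: one if at least one neighbor is treated."""
    return [1 if sum(a * wj for a, wj in zip(row, w)) > 0 else 0
            for row in A]
```

In the simulation, the locations would be M = 400 draws of two independent U(0, 20) coordinates held fixed across replications; the small example in the test uses four hand-picked points instead.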
Table 1: Simulation Designs

Design   Assignment probability   Second period potential outcomes
1        p1                       Y2 = 2 + W + G + z + e2
2        p1                       Y2 = 2 + W + G + 2z + e2
3        p2                       Y2 = 2 + W + G + 2z + e2
4        p2                       Y2 = 2 + W + 0.2A ∗ Y2 + 2z + e2
5        p2                       Y2 = 2 + W + 0.2A ∗ Y2 + 2z^2 + e2
6        p2                       Y2 = 2 + W ∗ G + 2z + e2

Notes:
1. p1 = p(z) = exp(0.3z)/(1 + exp(0.3z)); p2 = p(z∗) = exp(0.3z + 0.8zu)/(1 + exp(0.3z + 0.8zu)).
2. In vector form, Y2, W, z, and e2 stand for the M × 1 vectors of Y2, W, z, and e2.
3. In designs 4 and 5, the parallel trends assumption holds approximately with the
   difference between the trends being less than 0.001.

I compare the following estimators: the canonical TWFE, Abadie’s IPW estimator,
the augmented TWFE, regression adjustment, the IPW estimator with either the
MLE or CBPS moment condition for the propensity scores, and the proposed AIPW
estimator with either the MLE or CBPS moment condition for the propensity scores.
Section F of the online appendix contains the standard deviation of the proposed
estimators and the coverage rate of the 95% confidence intervals based on the usual
SHAC standard errors for the doubly robust estimators.
For the canonical Abadie’s IPW estimator, I only include z in the logit model of
W as interference is assumed away when employing the canonical DID estimators.
As an illustration of Proposition 1, I also report Abadie’s IPW estimator with z,
Az, and zu included in the logit model, which leads to conditional independence of
W and G. The estimation of the augmented TWFE is as in equation (B.1) in the
online appendix with Si = Gi . For the estimation of the proposed IPW, regression,
and AIPW estimator accounting for spillover effect, the propensity scores for W and
G are estimated based on a logit model on z, Az, and zu and a logit model on W ,
z, Az, and zu , respectively. For the first time period data, I regress Y1 on W , z,
and W ∗ z. As for the second period data, I regress Y2 on W , z, W ∗ z, and G.
All estimators involving weighting are weighted by the normalized propensity scores.
The results are summarized across 10,000 replications.
Table 2: Expected Direct ATT

Estimator      1       2       3       4       5       6
twfe           0.998   1.259   1.355   1.333   0.884   0.909
abadie(z)      0.999   1.000   1.098   1.069   1.147   0.654
abadie(z∗)     0.997   0.997   0.999   1.028   1.031   0.607
atwfe1         1.003   1.272   1.250   1.289   0.842   1.250
atwfe0         0.998   1.248   1.270   1.296   0.771   0.270
ra1            1.001   1.001   1.001   1.041   1.139   0.692
ra0            1.001   1.001   1.001   1.041   1.139   0.692
ipw mle1       0.999   0.998   1.001   1.028   1.034   1.001
ipw mle0       1.007   1.018   1.042   1.088   1.083   0.042
ipw cbps1      0.998   0.996   1.000   1.026   1.038   1.000
ipw cbps0      1.009   1.019   1.037   1.081   1.076   0.037
dr mle1        1.002   1.002   1.002   1.028   1.037   1.001
dr mle0        0.997   0.997   0.999   1.045   1.084   0.001
dr cbps1       1.002   1.002   1.002   1.028   1.040   1.001
dr cbps0       0.997   0.997   0.999   1.042   1.072   0.001

Notes:
1. twfe stands for the canonical TWFE estimator; abadie(z) stands for the canonical
   Abadie’s IPW estimator including only z as the covariate; abadie(z∗) stands for the
   canonical Abadie’s IPW estimator using z, Az, and zu as the covariates; atwfe stands
   for the augmented TWFE estimator; ra stands for the regression adjustment estimator;
   ipw mle stands for the proposed IPW estimator with MLE moment condition for the
   propensity scores; ipw cbps stands for the proposed IPW estimator with CBPS moment
   condition for the propensity scores; dr mle stands for the proposed doubly robust
   estimator with MLE moment condition for the propensity scores; dr cbps stands for the
   proposed doubly robust estimator with CBPS moment condition for the propensity scores.
2. All estimators ending with 1 or 0 correspond to the estimator for the direct treatment
   effect at exposure levels one or zero, respectively.
3. τ(1) = τ(0) = 1 in designs 1-5; τ(1) = 1, τ(0) = 0 in design 6 with the overall direct
   effect being approximately 0.607.

According to the population generating process, the direct effects are τ(1) =
τ(0) = 1 in designs 1-5 and τ(1) = 1, τ(0) = 0 in design 6. In the last design, the
overall direct effect is approximately 0.607. The point estimates for the direct effect
are summarized in Table 2. In designs 1 and 2, neighborhood similarity does not
drive treatment assignment. As a result, the canonical Abadie’s IPW estimator with
covariate z closely estimates the overall direct effect. The canonical TWFE only
performs well in design 1 as the estimating equation of TWFE rules out z-specific
time trends, which is violated in all other designs. The augmented TWFE estimators
suffer from the same linearity restriction in their estimating equation as the regular
TWFE. With the inclusion of both z and zu , Abadie’s IPW estimates are very close
to the overall direct effect.
The proposed estimators accounting for the spillover effects all perform relatively
well. Due to the specific exposure mapping functional form, the overlap condition
holds better for exposure level one than zero. Consequently, the point estimates
for the direct effect estimator at exposure level one are slightly more accurate than
the results for the estimator at exposure level zero. It is worth mentioning that the
propensity score model for G is always misspecified. The outcome regressions are
also misspecified in designs 4-6. Nevertheless, the estimates from the proposed IPW
and doubly robust estimators are all quite close to the truth and much more accurate
than the TWFE type estimators.
The doubly robust estimators improve upon regression adjustment and IPW
alone, especially at exposure level zero. The only exception is design 5. Since the
outcome regression is more severely misspecified than in other designs, we do not
see improvement moving from IPW to AIPW. Nevertheless, the AIPW estimates
are still better than the regression adjustment estimates. Estimators with CBPS
moment condition slightly improve upon estimators with MLE moment condition.
When the overlap condition is weaker in other population generating processes,
for instance, when the assignment probability is changed to
p(z∗) = exp(z + 2zu)/(1 + exp(z + 2zu)), we can
see more noticeable improvement from using the CBPS moment condition instead
of the MLE moment condition. Moreover, the doubly robust estimator can perform
substantially better than the proposed IPW estimator at exposure level zero.
6 Empirical Illustration
I evaluate the effects of China’s special economic zones (SEZ) policy using the pro-
posed doubly robust estimators. SEZs are a prominent development strategy that
aims to foster agglomeration economies. The benefits of SEZs include corporate tax
concessions, customs duty exemptions, discounts on land use fees, and special bank
loan programs. SEZs are likely to affect neighboring non-SEZ areas through, for
instance, firm relocation or knowledge spillover.
The data for the empirical illustration come from Lu et al. (2019). There are
five waves of SEZ establishment in China. Each wave is different in nature and
targets different regions with earlier waves creating more national-level economic
zones.11 Since detailed village level data is only available starting from 2004, Lu et al.
(2019) focus on the latest wave of SEZs established between 2005 and 2008. China
established 663 SEZs at the provincial level in 2006, accounting for 42 percent of
the country’s SEZs. These SEZs cover the coastal, central, and western regions and
are considered small-scale regional SEZs. As a result, the policy effect is interpreted
as the treatment effect on villages that had not yet been treated prior to this wave.
This means that areas covered by zones from earlier waves are not included in the
finite population.
Lu et al. (2019) collect comprehensive data on China’s economic zones based on
the economic censuses conducted by China’s National Bureau of Statistics in 2004
and 2008 covering all manufacturing firms. Consequently, the entire finite popula-
tion of village-level data in 2004 and 2008 is observed, where 2004 is the period prior
to the treatment and 2008 post the treatment. The units of observation are vil-
lages, which are the most disaggregated geographical units and smaller than an SEZ.
Treated villages are referred to as SEZ villages. Unfortunately, the publicly avail-
able data from Lu et al. (2019) do not contain an identifier of villages nor distances
among villages. Nevertheless, I can match counties in which villages are located from
separate datasets published by Lu et al. (2019). In the organized dataset, there are
3,963 SEZ villages and 99,259 non-SEZ villages. It would be ideal to set certain
neighborhood boundaries for each village based on the distance between villages.
The (potentially misspecified) exposure mapping is then a function of neighboring
villages’ SEZ assignment status. In the absence of detailed geographical data, I
consider each village’s neighborhood to be its corresponding county, with its neighbors
being the other villages within the same county. Without distance measures, standard
errors are clustered at the county level, which can be considered a special case
of spatial-correlation robust inference.

11 As a result, SEZ establishment in China cannot simply be considered as staggered adoption.
According to equation (12), the outcome variables Yit include the logarithm of
capital, employment, and output of firms in a village. The direct treatment variable
Wi is equal to one if village i is located within the boundaries of SEZs and zero
otherwise. As an illustration, exposure mapping is defined in two ways. In the first
specification, Gi is equal to one if at least one of the other villages in the corresponding
county is an SEZ village; Gi is equal to zero if there is at least one other village in the
corresponding county but none of the other villages in the county are treated. As
one can see, the finite population only includes villages with eligible neighbors. In
the second specification, I define SEZ ratio as the fraction of other SEZ villages over
the total number of other villages in a county. Gi is also a binary variable equal
to one if this ratio in county c in which village i is located is above the mean of
the ratio among all counties.12 In the latter specification, villages are considered as
intensively exposed to neighbors’ economic zones if the corresponding county contains
a relatively high fraction of SEZ villages.13
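The two exposure definitions above can be sketched as follows, treating the county as the neighborhood. Both functions use leave-one-out counts, so a village's own SEZ status never enters its exposure. For the second specification the cutoff is passed in as `threshold`, because how the mean ratio across counties is computed is not spelled out here. The function names and the use of `None` for villages without eligible neighbors are illustrative choices.

```python
from collections import defaultdict


def _county_counts(county, w):
    """Number of villages and number of SEZ villages in each county."""
    n_vill, n_sez = defaultdict(int), defaultdict(int)
    for c, wi in zip(county, w):
        n_vill[c] += 1
        n_sez[c] += wi
    return n_vill, n_sez


def exposure_any(county, w):
    """First specification: 1 if at least one *other* village in the county
    is an SEZ village, 0 if other villages exist but none is treated, and
    None if the village has no other village in its county."""
    n_vill, n_sez = _county_counts(county, w)
    return [None if n_vill[c] < 2 else int(n_sez[c] - wi > 0)
            for c, wi in zip(county, w)]


def exposure_ratio(county, w, threshold):
    """Second specification: 1 if the share of SEZ villages among the other
    villages in the county exceeds `threshold` (supplied by the caller)."""
    n_vill, n_sez = _county_counts(county, w)
    out = []
    for c, wi in zip(county, w):
        if n_vill[c] < 2:
            out.append(None)
        else:
            ratio = (n_sez[c] - wi) / (n_vill[c] - 1)
            out.append(int(ratio > threshold))
    return out
```

Under the first definition a single SEZ village in a large county switches every other village to G = 1, whereas the ratio-based definition requires a sufficiently dense cluster, which is why the text describes it as more intensive exposure.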
There are four baseline village characteristics including logs of a village’s distance
from an airport and port, log of the capital-to-labor ratio, and log of the number
of firms in the village in 2004. These baseline characteristics and their interactions
with the direct treatment variable are included as regressors in moment conditions
(19) and (20). The four baseline characteristics and their leave-one-out means at the
county level are covariates in moment conditions (17) and (18) for the propensity
scores. The spillover effects are estimated analogously using equations (A.3) and
(A.4) in the online appendix.
12 Counties on average contain 148.3 villages. On average, there are 5.8 SEZ villages in a county.
13 These spillover functions are not necessarily supposed to be correctly specified.
Table 3: Direct and Spillover Effects of Special Economic Zones

                                log capital   log employment   log output
Canonical DID                      0.689          0.395           0.629
                                  (0.049)        (0.038)         (0.051)

Panel A: exposure G = 1 if at least one neighbor SEZ village in the same county
Direct effect (1)                  0.594          0.345           0.492
                                  (0.053)        (0.039)         (0.052)
Direct effect (0)                  0.634          0.400           0.567
                                  (0.058)        (0.048)         (0.062)
Spillover effect (treated)        -0.004         -0.008           0.002
                                  (0.056)        (0.047)         (0.062)
Spillover effect (untreated)       0.036          0.046           0.078
                                  (0.033)        (0.027)         (0.041)

Panel B: exposure G = 1 if county SEZ ratio above national average
Direct effect (1)                  0.438          0.219           0.302
                                  (0.070)        (0.057)         (0.072)
Direct effect (0)                  0.679          0.441           0.645
                                  (0.060)        (0.045)         (0.059)
Spillover effect (treated)        -0.138         -0.096          -0.174
                                  (0.089)        (0.073)         (0.091)
Spillover effect (untreated)       0.103          0.129           0.169
                                  (0.033)        (0.030)         (0.042)

Notes:
1. The standard errors are clustered at the county level.
2. Canonical DID is estimated using inverse probability weighted DID in Abadie (2005).
3. Panel A reports expected direct and spillover effects when the exposure mapping is defined as
   a binary variable equal to one if there is at least one neighbor SEZ village in the corresponding
   county and zero otherwise.
4. Panel B reports expected direct and spillover effects when the exposure mapping is defined as
   a binary variable equal to one if the ratio of SEZ villages in a county is above the mean ratio
   across all counties and zero otherwise.
5. Direct effect (1) and direct effect (0) are τ(1) and τ(0) respectively; spillover effect (treated)
   and spillover effect (untreated) are τ(1, 1, 0) and τ(0, 1, 0) as defined in the online appendix
   respectively.
The first row of Table 3 reports DID estimates using the IPW approach
in Abadie (2005) with the four baseline village characteristics as covariates. Be-
cause of potential spillover effects, these canonical estimates are difficult to interpret
causally. The direct effects reported in Panels A and B are mostly smaller than the
canonical DID estimates, especially for SEZ villages with exposure level one. SEZ
establishment has positive and statistically significant direct effects at the 1% level.
This implies that the SEZ villages benefit from the program by gaining investment,
employing more labor, and producing more output.
Moving from Panel A to B, by increasing the intensity of spillovers from having
more neighboring SEZ villages, the direct effects of SEZ establishment, τ (1), decrease
quite a bit for all three outcomes. On the other hand, with relatively low spillover
intensity, τ (0) increases moderately. In terms of spillover effects, having neighboring
SEZ villages does not significantly affect economic activities in SEZ villages. The only
exception is for log output, where the spillover effect under more intensive exposure is
quite negative and marginally statistically significant. By contrast, Panel B reports
that non-SEZ villages benefit from SEZ neighbors, especially if there is a sufficient
number of SEZ villages in the same county. These patterns of direct and spillover
effects are not found in Lu et al. (2019).
As suggested in Section 3.3, I also examined pre-trends under both specifications
of exposure mapping as well as classical pre-trends for canonical DID estimation.
Unfortunately, there is only one economic census period prior to treatment. As
a result, the pre-trends are tested with China’s Annual Surveys of Industrial Firms
(ASIF) used by Lu et al. (2019), which contain more data in the pre-treatment period
but only cover firms with relatively large sizes. I use ASIF data from 2004 and 2005.
Given this is not the same dataset used for the main analysis, the results presented in
Table 4 are only demonstrative. For canonical DID, only the differential pre-trend for
log of employment is statistically significant at the 10% level with small magnitude.
None of the doubly robust estimates for placebo direct effects are significant.
Table 4: Direct Effects of Special Economic Zones: Pre-trends

                                log capital   log employment   log output
Canonical DID                      0.000          0.051           0.039
                                  (0.039)        (0.029)         (0.045)

Panel A: exposure G = 1 if at least one neighbor SEZ village in the same county
Direct effect (1)                 -0.041          0.024          -0.002
                                  (0.050)        (0.046)         (0.065)
Direct effect (0)                 -0.017          0.048          -0.009
                                  (0.052)        (0.043)         (0.056)

Panel B: exposure G = 1 if county SEZ ratio above national average
Direct effect (1)                 -0.042          0.022           0.102
                                  (0.080)        (0.083)         (0.105)
Direct effect (0)                 -0.010          0.055          -0.015
                                  (0.059)        (0.048)         (0.070)

Notes:
1. ASIF data from 2004 and 2005 are used in the pre-trends test.
2. The standard errors are clustered at the county level.
3. Canonical DID is estimated using inverse probability weighted DID in Abadie (2005).
4. Panel A reports expected direct effects when the exposure mapping is defined as
   a binary variable equal to one if there is at least one neighbor SEZ village in the corresponding
   county and zero otherwise.
5. Panel B reports expected direct effects when the exposure mapping is defined as
   a binary variable equal to one if the ratio of SEZ villages in a county is above the mean ratio
   across all counties and zero otherwise.
6. Direct effect (1) and direct effect (0) are τ(1) and τ(0) respectively.
7 Conclusion
I propose doubly robust estimators for the expected direct treatment effect and
spillover effect in a DID context. The approach in the current paper is general in the
sense that misspecification of exposure mapping is allowed and interference is not
restricted within a fixed boundary of neighborhoods. Given arbitrary spillover effect,
one needs to account for spatial correlation when conducting inference. With the
entire population observed, the usual spatial-correlation robust variance estimator
could be conservative. The immediate extension of the current framework to multiple
time periods with common treatment timing is summarized in Section E in the online
appendix.
I provide identification results of the direct and spillover effect for the IPW es-
timand, outcome regression estimand, and the doubly robust estimand. From here,
researchers can approach these estimands using various parametric, semiparametric,
or nonparametric estimation methods. In the current paper, I proved the asymp-
totic properties of GMM-type parametric estimators as an illustration of estimation.
Given the inclusion of neighbors’ treatments and attributes in the propensity score
and the conditional mean functions, nonparametric estimation is attractive to allow
for arbitrary functional forms. This is left as future work.
References
Abadie, A. (2005), Semiparametric difference-in-differences estimators. Review of
Economic Studies 72(1), 1–19.
Abadie, A., Imbens, G.W., and Zheng, F. (2014), Inference for misspecified models
with fixed regressors. Journal of the American Statistical Association 109(508),
1601–1614.
Arkhangelsky, D., Imbens, G.W., Lei, L., and Luo, X. (2021), Double-robust
two-way-fixed-effects regression for panel data. Tech. rep., arXiv preprint
arXiv:2107.13737.
Aronow, P.M. and Samii, C. (2017), Estimating average causal effects under general
interference, with application to a social network experiment. Annals of Applied
Statistics 11(4), 1912–1947.
Balzer, L.B., Petersen, M.L., and van der Laan, M.J. (2015), Targeted estimation
and inference for the sample average treatment effect. Tech. rep., U.C. Berkeley
Division of Biostatistics Working Paper Series.
Basse, G.W. and Airoldi, E.M. (2018), Limitations of design-based causal inference
and A/B testing under arbitrary and network interference. Sociological Methodology
48(1), 136–151.
Delgado, M.S. and Florax, R.J. (2015), Difference-in-differences techniques for spatial
data: Local autocorrelation and spatial interaction. Economics Letters 137, 123–
126.
Forastiere, L., Airoldi, E.M., and Mealli, F. (2021), Identification and estimation of
treatment and interference effects in observational studies on networks. Journal of
the American Statistical Association 116(534), 901–918.
Gallant, A.R. and White, H. (1988), A unified theory of estimation and inference for
nonlinear dynamic models. Blackwell.
Ghanem, D., Sant’Anna, P.H., and Wüthrich, K. (2022), Selection and parallel
trends. Tech. rep., arXiv preprint arXiv:2203.09001.
Graham, B.S., de Xavier Pinto, C.C., and Egel, D. (2012), Inverse probability tilt-
ing for moment condition models with missing data. Review of Economic Studies
79(3), 1053–1079.
Hudgens, M.G. and Halloran, M.E. (2008), Toward causal inference with interference.
Journal of the American Statistical Association 103(482), 832–842.
Jardim, E.S., Long, M.C., Plotnick, R., van Inwegen, E., Vigdor, J.L., and Wething,
H. (2022), Boundary discontinuity methods and policy spillovers. Tech. rep., Na-
tional Bureau of Economic Research.
Jenish, N. and Prucha, I.R. (2009), Central limit theorems and uniform laws of large
numbers for arrays of random fields. Journal of Econometrics 150(1), 86–98.
Jenish, N. and Prucha, I.R. (2012), On spatial processes and asymptotic inference
under near-epoch dependence. Journal of Econometrics 170(1), 178–190.
Jin, Y. and Rothenhäusler, D. (2024), Tailored inference for finite populations: con-
ditional validity and transfer across distributions. Biometrika 111(1), 215–233.
Kojevnikov, D., Marmer, V., and Song, K. (2021), Limit theorems for network de-
pendent random variables. Journal of Econometrics 222(2), 882–908.
Lu, Y., Wang, J., and Zhu, L. (2019), Place-based policies, creation, and agglomeration
economies: Evidence from China’s economic zone program. American Economic
Journal: Economic Policy 11(3), 325–360.
Ma, X. and Wang, J. (2020), Robust inference using inverse probability weighting.
Journal of the American Statistical Association 115(532), 1851–1860.
Man, Y., Sant’Anna, P.H., Sasaki, Y., and Ura, T. (2023), Doubly robust estimators
with weak overlap. Tech. rep., arXiv preprint arXiv:2304.08974.
Newey, W.K. and McFadden, D. (1994), Large sample estimation and hypothesis
testing. Handbook of Econometrics 4, 2111–2245.
Roth, J. (2022), Pretest with caution: Event-study estimates after testing for parallel
trends. American Economic Review: Insights 4(3), 305–22.
Roth, J. and Sant’Anna, P.H. (2023), When is parallel trends sensitive to functional
form? Econometrica 91(2), 737–747.
Sävje, F., Aronow, P., and Hudgens, M. (2021), Average treatment effects in the
presence of unknown interference. Annals of Statistics 49(2), 673.
A Proofs
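Definition 2. The random function $g(X_i, \theta)$ is said to be Lipschitz in parameter θ on
Θ if there is $h(u) \downarrow 0$ as $u \downarrow 0$ and $b(\cdot) : W \to \mathbb{R}$ such that $\sup_{M, i\in D_M} E\,|b(X_i)| < \infty$,
and for all $\tilde\theta, \theta \in \Theta$, $\big|g(X_i, \tilde\theta) - g(X_i, \theta)\big| \le b(X_i)\,h\big(\|\tilde\theta - \theta\|\big)$, $i \in D_M$, $M \ge 1$.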
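Assumption A.1. (i) $\hat\Psi - \Psi_M \xrightarrow{p} 0$, where $\Psi_M$ is positive semidefinite; (ii) $\Theta$ is
compact; (iii) let $Q_M(\theta) = E_D[q(X_i, \theta)]' \, \Psi \, E_D[q(X_i, \theta)]$. $\{Q_M(\theta)\}$ has identifiably
unique minimizers $\{\theta_M^*\}$ on $\Theta$ as in Definition 3.2 in Gallant and White (1988);
(iv) $q(X_i, \theta)$ is continuously differentiable on $\mathrm{int}(\Theta)$, $\forall\, i, M$; (v) $q(X_i, \theta)$ is Lipschitz
in $\theta$ on $\Theta$; (vi) $\sup_{M, i \in D_M} E\big[\sup_{\theta \in \Theta} \|q(X_i, \theta)\|^p \,\big|\, z\big] < \infty$ for some $p > 4$; (vii)
$\theta_M^* \in \mathrm{int}(\Theta)$ uniformly in $M$, and $E_D[q(X_i, \theta_M^*)] = 0$; (viii) $\inf_M \lambda_{\min}(\Omega_M) > 0$,
where $\lambda_{\min}(\cdot)$ is the smallest eigenvalue; (ix) $\nabla_\theta q(X_i, \theta)$ is Lipschitz in $\theta$ on $\Theta$; (x)
$\sup_{M, i \in D_M} E\big[\sup_{\theta \in \Theta} \|\nabla_\theta q(X_i, \theta)\|^2 \,\big|\, z\big] < \infty$; (xi) $R_M^{*\prime} \Psi_M R_M^*$ is nonsingular; (xii)
let $l_i = l(X_i, \theta)$ be a generic function standing for each element of either $q(X_i, \theta)$
or $\nabla_\theta q(X_i, \theta)$. $\forall\, \theta \in \Theta$, $l(X_i, \theta)$ is Lipschitz in $X_i$ on the domain of $X_i$ such that
$\sup_{M, i \in D_M} \mathrm{Lip}(l_i) < \infty$.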
Proof of Lemma 4.1:
Denote $l_i^{(r)} = l\big(X_i^{(r)}, \theta\big) = l\big(y_{it}(W^{(i,r,0)}), G(i, W_{-i}^{(i,r,0)}), W_i, z_i, \theta\big)$. Let $f \in L_{\nu,h}$
and $f' \in L_{\nu,h'}$. Let $s > 0$ and $(H, H') \in P_M(h, h'; s)$. Define $\xi = f(l_H)$, $\zeta = f'(l_{H'})$,
$\xi^{(s)} = f\big(l_i^{(s)} : i \in H\big)$, and $\zeta^{(s)} = f'\big(l_i^{(s)} : i \in H'\big)$.
First, for s ≤ 3 max{K, ρ0 }, we have
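\[
\begin{aligned}
&\le h \|f'\|_\infty \mathrm{Lip}(f) \sup_{M, i \in D_M} E\Big[\big|l_i - l_i^{(s/3)}\big| \,\Big|\, z\Big] + h' \|f\|_\infty \mathrm{Lip}(f') \sup_{M, i \in D_M} E\Big[\big|l_i - l_i^{(s/3)}\big| \,\Big|\, z\Big] \\
&\le \big(h \|f'\|_\infty \mathrm{Lip}(f) + h' \|f\|_\infty \mathrm{Lip}(f')\big) \sup_{M, i \in D_M} \mathrm{Lip}(l_i) \sup_{M, i \in D_M} E\Big[\big\|X_i - X_i^{(s/3)}\big\| \,\Big|\, z\Big]
\end{aligned}
\tag{A.3}
\]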
Since s/3 ≥ K,
\[
\big(Y_{i1}, y_{i2}(W^{(i,s/3,0)}), G(i, W_{-i}^{(i,s/3,0)}), W_i, z_i\big) = \big(Y_{i1}, y_{i2}(W^{(i,s/3,0)}), G(i, W_{-i}), W_i, z_i\big).
\]
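As a result,
\[
E\Big[\big\|X_i - X_i^{(s/3)}\big\| \,\Big|\, z\Big] = E\Big[\big|y_{i2}(W) - y_{i2}(W^{(i,s/3,0)})\big| \,\Big|\, z\Big] \le \kappa_M(s/3). \tag{A.4}
\]
For any fixed $s$, $l_i^{(s/3)}$ is $\alpha$-mixing under Assumption 6. By Proposition 2.2 in
Kojevnikov et al. (2021), the last term in equation (A.2) is bounded by
\[
C_2\, \alpha^{l^{(s/3)}}(h, h', s) \le C_2\, \alpha^{\epsilon}\Big(h C_3 \big(\tfrac{s}{3}\big)^d, \; h' C_3 \big(\tfrac{s}{3}\big)^d, \; \tfrac{s}{3}\Big). \tag{A.5}
\]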
\[
\frac{1}{|D_M|} \sum_{i \in D_M} q(X_i, \theta) - E_D\big[q(X_i, \theta)\big] \xrightarrow{p} 0 \tag{A.7}
\]
follows from Lemma 4.1 and Theorem 3.1 in Kojevnikov et al. (2021). Next,
\[
\sup_{\theta \in \Theta} \bigg\| \frac{1}{|D_M|} \sum_{i \in D_M} q(X_i, \theta) - E_D\big[q(X_i, \theta)\big] \bigg\| \xrightarrow{p} 0 \tag{A.8}
\]
follows from Corollary 3.1 in Newey (1991) and equation (A.7) under condition (v).
Also, ED q(Xi , θ) is uniformly equicontinuous. Let
\[
\hat Q(\theta) = \bigg[\frac{1}{|D_M|} \sum_{i \in D_M} q(X_i, \theta)\bigg]' \, \hat\Psi \, \bigg[\frac{1}{|D_M|} \sum_{i \in D_M} q(X_i, \theta)\bigg].
\]
and QM (θ) is uniformly equicontinuous. The proof of equation (A.9) and the equicon-
tinuity is standard. One can follow, for instance, the proof of Theorem 3 in Jenish and Prucha
(2012).
Next, I prove the asymptotic normality. The key steps are to prove
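\[
\Omega_M^{-1/2} \frac{1}{\sqrt{|D_M|}} \sum_{i \in D_M} q(X_i, \theta_M^*) \xrightarrow{d} N(0, I_k) \tag{A.10}
\]
and
\[
\sup_{\theta \in \Theta} \bigg\| \frac{1}{|D_M|} \sum_{i \in D_M} \nabla_\theta q(X_i, \theta) - E_D\big[\nabla_\theta q(X_i, \theta)\big] \bigg\| \xrightarrow{p} 0. \tag{A.11}
\]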
Online Appendix for “Difference-in-Differences with Interference”
Ruonan Xu∗
arXiv:2306.12003v5 [econ.EM] 30 May 2024
A Spillover Effect
In addition to the expected direct average treatment effect on the treated (EDATT),
empirical researchers might also be interested in spillover effects defined in equations
(A.1) and (A.2).
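\[
\tau(1, g, g') = \frac{1}{|D_M|} \sum_{i \in D_M} \Big\{ E\big[y_{i2}(1, W_{-i}) \,\big|\, W_i = 1, G_i = g, z_i\big] - E\big[y_{i2}(1, W'_{-i}) \,\big|\, W_i = 1, G_i = g', z_i\big] \Big\} \tag{A.1}
\]
\[
\tau(0, g, g') = \frac{1}{|D_M|} \sum_{i \in D_M} \Big\{ E\big[y_{i2}(0, W_{-i}) \,\big|\, W_i = 0, G_i = g, z_i\big] - E\big[y_{i2}(0, W'_{-i}) \,\big|\, W_i = 0, G_i = g', z_i\big] \Big\} \tag{A.2}
\]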
The spillover effect contrasts the expected potential outcomes between levels g and
g ′ and could differ with or without direct treatment. A leading case would be setting
g ′ to 0. The identification of the spillover effect is more straightforward because po-
tential outcomes under direct assignment and the specified exposures are observable.
Nevertheless, I impose a further condition to facilitate causal interpretation of the
spillover effects.
∗ ruonan.xu@rutgers.edu, Rutgers University
Condition 1 $\forall\, i \in D_M$, $y_{i2}(w_i, w_{-i}) \perp\!\!\!\perp W_{-i} \mid W_i, z_i$.
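\[
\begin{aligned}
\tau(1, g, g') &= \frac{1}{|D_M|} \sum_{i \in D_M} \Big\{ E\big[y_{i2}(1, W_{-i}) \,\big|\, W_i = 1, G_i = g, z_i\big] - E\big[y_{i2}(1, W'_{-i}) \,\big|\, W_i = 1, G_i = g', z_i\big] \Big\} \\
&= \frac{1}{|D_M|} \sum_{i \in D_M} \bigg\{ \sum_{w_{-i} \in \Omega} E\big[y_{i2}(1, w_{-i}) \,\big|\, W_i = 1, W_{-i} = w_{-i}, z_i\big] \, P(W_{-i} = w_{-i} \mid G_i = g, W_i = 1, z_i) \\
&\qquad - \sum_{w'_{-i} \in \Omega} E\big[y_{i2}(1, w'_{-i}) \,\big|\, W_i = 1, W'_{-i} = w'_{-i}, z_i\big] \, P(W'_{-i} = w'_{-i} \mid G_i = g', W_i = 1, z_i) \bigg\} \\
&= \frac{1}{|D_M|} \sum_{i \in D_M} \bigg\{ \sum_{w_{-i} \in \Omega} E\big[y_{i2}(1, w_{-i}) \,\big|\, W_i = 1, z_i\big] \, P(W_{-i} = w_{-i} \mid G_i = g, W_i = 1, z_i) \\
&\qquad - \sum_{w'_{-i} \in \Omega} E\big[y_{i2}(1, w'_{-i}) \,\big|\, W_i = 1, z_i\big] \, P(W'_{-i} = w'_{-i} \mid G_i = g', W_i = 1, z_i) \bigg\},
\end{aligned}
\]
where $\Omega = \{0, 1\}^{|D_M| - 1}$. As a result, the spillover effect contrasts the expected poten-
tial outcome with direct treatment but weighted by different conditional probabilities
of neighbors’ treatment realization at either exposure $g$ or $g'$.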
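Analogously, the doubly robust estimands for the spillover effects are
\[
\begin{aligned}
\tau(1, g, g') = E_D\bigg[ &\frac{W_i}{\eta(z_i)} \frac{1\{G_i = g\}}{\eta_{1g}(z_i)} \big(Y_{i2} - m_{2,1g}(z_i)\big) + m_{2,1g}(z_i) \\
&- \frac{W_i}{\eta(z_i)} \frac{1\{G_i = g'\}}{\eta_{1g'}(z_i)} \big(Y_{i2} - m_{2,1g'}(z_i)\big) - m_{2,1g'}(z_i) \bigg]
\end{aligned}
\tag{A.3}
\]
and
\[
\begin{aligned}
\tau(0, g, g') = E_D\bigg[ &\frac{1 - W_i}{1 - \eta(z_i)} \frac{1\{G_i = g\}}{\eta_{0g}(z_i)} \big(Y_{i2} - m_{2,0g}(z_i)\big) + m_{2,0g}(z_i) \\
&- \frac{1 - W_i}{1 - \eta(z_i)} \frac{1\{G_i = g'\}}{\eta_{0g'}(z_i)} \big(Y_{i2} - m_{2,0g'}(z_i)\big) - m_{2,0g'}(z_i) \bigg].
\end{aligned}
\tag{A.4}
\]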
When the exposure mapping is correctly specified, the spillover effect reduces to
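\[
\begin{aligned}
\tau^*(1, g, g') &= \frac{1}{|D_M|} \sum_{i \in D_M} \Big\{ E\big[\tilde y_{i2}(1, g) \,\big|\, W_i = 1, G_i = g, z_i\big] - E\big[\tilde y_{i2}(1, g') \,\big|\, W_i = 1, G_i = g', z_i\big] \Big\} \\
&= \frac{1}{|D_M|} \sum_{i \in D_M} \Big\{ E\big[\tilde y_{i2}(1, g) \,\big|\, W_i = 1, z_i\big] - E\big[\tilde y_{i2}(1, g') \,\big|\, W_i = 1, z_i\big] \Big\}.
\end{aligned}
\]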
rent paper, I modify the estimating equation to be
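where $W_{it} = W_i \cdot 1\{t = 2\}$ and $S_{it} = S_i \cdot 1\{t = 2\}$. $\hat\beta_1$ and $\hat\beta_1 + \hat\beta_3 - \hat\beta_2$ estimated
from equation (B.1) would be consistent for the EDATT defined by
\[
\bar\tau(0) = \frac{1}{|D_M|} \sum_{i \in D_M} E\big[y_{i2}(1, 0) - y_{i2}(0, 0) \,\big|\, W_i = 1, S_i = 0\big]
\]
and
\[
\bar\tau(1) = \frac{1}{|D_M|} \sum_{i \in D_M} E\big[y_{i2}(1, W_{-i}) - y_{i2}(0, W_{-i}) \,\big|\, W_i = 1, S_i = 1\big]
\]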
suffers from the same drawbacks of the usual canonical TWFE regression for DID
estimation as pointed out by Remark 1 in Sant’Anna and Zhao (2020). These limi-
tations include implicitly imposing homogeneous treatment effects and homogeneous
time trends.
Remark B.1 Inspired by the modified TWFE estimating equation above, for any
specification of the exposure mapping one can instead augment two-way fixed effects
in a saturated way.
C.1 Partial Interference
The most popular approach to dimension reduction of the interference structure
is partial interference restricted within disjoint clusters. In Qu et al. (2022), the
potential outcome function is modeled as²
where $c$ is the index of a cluster, $y_{c,i}$ and $w_{c,i}$ are the potential outcome and treatment
assignment of unit $i$ in cluster $c$, and $w_{c,(i),j}$ collects the treatment assignments of unit $i$’s
neighbors in the disjoint subset $j$ of cluster $c$. Units within each of the $m$ disjoint
subsets are exchangeable. As a result, the impact of $w_{c,(i),j}$ can be summarized by $g_{c,j}$,
which measures the number of treated neighbors in subset $j$ of cluster $c$. Compared
with the assumption of fully exchangeable neighbors in cluster $c$, the partition into $m$
subsets allows for more heterogeneity in neighbors’ influence and hence a more
flexible interference structure.
If (C.1) is correctly specified, one can choose $K$ to be $\max_{c = 1, \ldots, C} \max_{i, j \in c} \rho(i, j)$.
Given bounded cluster sizes, $K$ is finite. For all $s > K$ and any $i$,
$y_i(W) - y_i(W^{(i,s)}) = 0$. Therefore, potential outcomes in the form of (C.1) can be accommo-
dated in the approach I take. A trickier question is how to partition the m subsets
within each cluster c. On top of that, partial interference might be too strong an
assumption. If either the exchangeability or the partial interference assumption does
not hold, the approach in the current paper can still identify the expected exposure
effect as long as the interference from units further away is increasingly negligible.
tion is specified as
\[
f^1\big(\{W_j\}_{j \in D_M, j \ne i}\big), \cdots, f^r\big(\{W_j\}_{j \in D_M, j \ne i}\big) \tag{C.2}
\]
radius r to approximate the effective treatment according to the spillover function.4
Below, I provide another interpretation of the ANI assumption. Under correct speci-
fication of the spillover function, the ANI approach is not too different from the local
configuration approach.
According to the metric definition in Auerbach and Tabord-Meehan (2021), for
effective treatment $g$ and $\tilde g$, if the distance $d(g, \tilde g) \le \frac{1}{1+r}$
then $G_i^r = \tilde G_i^r$. Under
Assumption 4.5 therein,
\[
|h(g_0) - h(\tilde g)| \le \phi\big(d(g_0, \tilde g)\big), \tag{C.3}
\]
Examples 2.1 and 2.2 in Auerbach and Tabord-Meehan (2021) are essentially
examples of Sections C.1 and C.2, and hence I focus on their Example 2.3 – the
linear-in-means peer effects model. Assuming correct specification,
\[
Y_i = \alpha + \delta \frac{1}{n_i} \sum_{j \in P_i} Y_j + W_i \gamma + e_i,
\]
where $P_i$ is the peer group of unit $i$ with size $n_i$. As usual, $|\delta| < 1$. The reduced
form of the potential outcome is solved to be
\[
Y_i = \lim_{S \to \infty} \sum_{s=1}^{S} h_s(G_i^s, U_i) = h(G_i, U_i)
\]
4 See Manski (2013) for the definition of “effective treatment.” The terms “exposure mapping”
from Aronow and Samii (2017) and “effective treatment” from Manski (2013) are used interchange-
ably throughout the text.
for some functions $h_s$ and $h$. Hence, for $d(g, \tilde g) \le \frac{1}{1+r}$,
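As a rough numerical illustration of why the linear-in-means model fits the approximate neighborhood interference idea, the sketch below expands the reduced form as a geometric series in the peer-averaging operator and checks how fast the truncation error shrinks. The ring network, the parameter values, and the helper names `peer_average` and `truncated_outcome` are assumptions made only for this example; they do not come from the paper or from Auerbach and Tabord-Meehan (2021).

```python
def peer_average(values, peers):
    """Average of `values` over each unit's peer group."""
    return [sum(values[j] for j in group) / len(group) for group in peers]


def truncated_outcome(base, peers, delta, order):
    """Partial sum  sum_{s=0}^{order} delta^s A^s base,
    where A is the peer-averaging operator."""
    term = list(base)
    total = list(base)
    for _ in range(order):
        term = [delta * v for v in peer_average(term, peers)]
        total = [t + v for t, v in zip(total, term)]
    return total


# Hypothetical ring of n units; each unit's peers are its two adjacent units.
n = 12
peers = [[(i - 1) % n, (i + 1) % n] for i in range(n)]
alpha, gamma, delta = 1.0, 2.0, 0.4
W = [1 if i % 3 == 0 else 0 for i in range(n)]
e = [0.0] * n  # errors switched off to isolate the role of treatment
base = [alpha + gamma * w + err for w, err in zip(W, e)]

# A long expansion stands in for the exact reduced form (I - delta*A)^{-1} base.
full = truncated_outcome(base, peers, delta, 200)

gaps = {}
for order in (1, 3, 6):
    approx = truncated_outcome(base, peers, delta, order)
    gap = max(abs(a - b) for a, b in zip(full, approx))
    # Averaging never increases the sup norm, so the tail is at most geometric.
    bound = delta ** (order + 1) / (1 - delta) * max(abs(b) for b in base)
    gaps[order] = (gap, bound)
    print(order, gap, bound)
```

Because treatments of units s steps away enter with weight of order $\delta^s$, ignoring treatments beyond a given radius changes the outcome by an amount that decays geometrically in that radius, which is the sense in which interference from distant units becomes negligible in this example.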
D Practical Guide
The following steps summarize the procedure for estimation of the EDATT. Spillover
effects can be estimated in a similar manner.
3. Set up models for the propensity scores P (Wi = 1|zi ), P (Gi = g|Wi = 1, zi ),
and P (Gi = g|Wi = 0, zi ). Also set up models for the conditional means of the
outcomes in both time periods, E(Yit |Wi = w, Gi = g, zi ).
5. Estimate the GMM model given in Step 4 and conduct spatial-correlation ro-
bust inference. The EDATT estimate is the last element of the GMM estimates.
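To make the plug-in logic of these steps concrete, here is a minimal sketch of the sample analogue of the doubly robust estimand for the EDATT at one exposure level. It is not the paper's GMM implementation and it reports no standard errors. The simulated design, the true propensity scores passed in as functions, the cell-by-cell OLS outcome models, and the names `dr_edatt` and `fit_line` are all assumptions for illustration; in particular, exposure is drawn independently here, so no actual interference is simulated.

```python
import math
import random


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def fit_line(x, y):
    """OLS of y on (1, x); returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def dr_edatt(z, W, G, Y1, Y2, g, p, pi1, pi0):
    """Sample analogue of the doubly robust estimand at exposure level g.

    p, pi1, pi0 are functions of z giving P(W=1|z), P(G=g|W=1,z), P(G=g|W=0,z).
    Outcome means m_{t,wg}(z) are fitted by OLS within each (W=w, G=g) cell.
    """
    fits = {}
    for w in (0, 1):
        idx = [i for i in range(len(z)) if W[i] == w and G[i] == g]
        zc = [z[i] for i in idx]
        fits[w] = (fit_line(zc, [Y1[i] for i in idx]),   # period 1
                   fit_line(zc, [Y2[i] for i in idx]))   # period 2
    total = 0.0
    for i in range(len(z)):
        m1_t, m2_t = (a + b * z[i] for a, b in fits[1])  # treated-cell means
        m1_c, m2_c = (a + b * z[i] for a, b in fits[0])  # control-cell means
        in_g = 1.0 if G[i] == g else 0.0
        res_t = (Y2[i] - m2_t) - (Y1[i] - m1_t)
        res_c = (Y2[i] - m2_c) - (Y1[i] - m1_c)
        total += W[i] * in_g / (p(z[i]) * pi1(z[i])) * res_t
        total -= (1 - W[i]) * in_g / ((1 - p(z[i])) * pi0(z[i])) * res_c
        total += (m2_t - m2_c) - (m1_t - m1_c)           # Delta m_2 - Delta m_1
    return total / len(z)


# Hypothetical design: constant direct effect of 2 and parallel trends given z.
random.seed(0)
p = lambda zi: expit(0.3 * zi)
pi1 = lambda zi: expit(0.2 * zi + 0.1)
pi0 = lambda zi: expit(0.2 * zi)
z, W, G, Y1, Y2 = [], [], [], [], []
for _ in range(20000):
    zi = random.gauss(0, 1)
    wi = 1 if random.random() < p(zi) else 0
    gi = 1 if random.random() < (pi1(zi) if wi else pi0(zi)) else 0
    ui = random.gauss(0, 1)  # unit fixed effect, differenced out
    z.append(zi); W.append(wi); G.append(gi)
    Y1.append(zi + ui + random.gauss(0, 1))
    Y2.append(1 + 1.5 * zi + ui + 2.0 * wi + 0.5 * gi + random.gauss(0, 1))

estimate = dr_edatt(z, W, G, Y1, Y2, 1, p, pi1, pi0)
print(estimate)
```

In this design both the propensity scores and the outcome models are correctly specified, so the estimate should land close to the true direct effect of 2. In practice the nuisance functions are estimated jointly with the EDATT in the GMM system, and inference has to account for spatial correlation.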
5 I refer readers to Auerbach and Tabord-Meehan (2021) for the introduction to notation and
more detailed derivation.
E Multiple Time Periods with Common Treatment Timing
Extension to multiple time periods is straightforward. With common treatment
timing, the simplest approach is to aggregate the time periods prior to and post
treatment into a single time period, again denoted t = 1, 2. With the aggregated
data, we can directly apply the results in the main text. Alternatively, we might
be interested in the EDATT at different time periods. Denote the time periods by
{−T , . . . , −1, 0, 1, . . . , T }. Without loss of generality, suppose treatment starts at
t = 2. For any t ≥ 2, the EDATT for time period t at exposure level g is defined as
\[
\tau_t(g) = \frac{1}{|D_M|} \sum_{i \in D_M} E\big[y_{it}(1, W_{-i}) - y_{it}(0, W_{-i}) \,\big|\, W_i = 1, G_i = g, z_i\big] \tag{E.1}
\]
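The aggregation route mentioned above can be sketched in a few lines. The example assumes that "aggregating" means taking within-unit averages of the outcome over the pre-treatment and post-treatment periods, and that every unit is observed in both windows; the function name `aggregate_periods` and the toy panel are hypothetical.

```python
from collections import defaultdict


def aggregate_periods(panel, first_treated_period):
    """Collapse long-format rows (unit, period, outcome) into two periods.

    Returns {unit: (pre_mean, post_mean)}, where 'pre' averages the periods
    before first_treated_period and 'post' averages the remaining periods.
    """
    sums = defaultdict(lambda: [0.0, 0, 0.0, 0])  # pre_sum, pre_n, post_sum, post_n
    for unit, period, y in panel:
        k = 0 if period < first_treated_period else 2
        sums[unit][k] += y
        sums[unit][k + 1] += 1
    return {u: (s[0] / s[1], s[2] / s[3]) for u, s in sums.items()}


# Toy panel with periods -1, 0, 1 before treatment and 2, 3 after it.
panel = [
    ("a", -1, 1.0), ("a", 0, 2.0), ("a", 1, 3.0), ("a", 2, 6.0), ("a", 3, 8.0),
    ("b", -1, 0.0), ("b", 0, 1.0), ("b", 1, 2.0), ("b", 2, 3.0), ("b", 3, 4.0),
]
two_period = aggregate_periods(panel, first_treated_period=2)
print(two_period)
```

The resulting pairs play the roles of $Y_{i1}$ and $Y_{i2}$ in the two-period results. Estimating $\tau_t(g)$ period by period would instead keep each post-treatment period separate.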
F Additional Simulation Results
I examine the inference performance of doubly robust estimators with finite samples
in this section. In the main text, Section 5 describes how the population is generated.
The standard deviation of the τ (1) estimates is summarized in the top panel of Table
F.1 below. Regression adjustment comes with the smallest standard deviation. It is
more interesting to see that the standard deviation of the doubly robust estimates can
be one third smaller than that of the IPW estimates. With moderate misspecification,
we can still see efficiency gains from using the doubly robust estimator.
The bottom panel of Table F.1 summarizes the coverage rate of the 95% con-
fidence interval based on the usual standard error of the doubly robust estimator
with CBPS moment conditions. In this population generating process, the EHW
standard errors work well in designs 1-3 and 6. In designs 4-5, misspecification of
the linear-in-means outcome model induces more spatial correlation. As a result, the
confidence interval based on the SHAC standard errors provides better coverage than
that based on the EHW standard errors. With a homogeneous direct treatment effect
and an effective population size of 350, we do not see overcoverage of the 95% confidence
interval based on the usual standard errors. Here, the conservativeness of the usual
standard errors is due to misspecification of the propensity scores and conditional
means. The typical downward bias of the SHAC standard errors in finite samples
also lowers the coverage rate.
I introduce an additional design with heterogeneous direct treatment effects.
There are now 900 units in the lattice. Among them, 612 units have neighbors
and are thus eligible for spillover. The individual treatment assignment probabil-
ity remains as $p(z^*) = \frac{\exp(0.3z + 0.8z_u)}{1 + \exp(0.3z + 0.8z_u)}$
but the second period potential outcomes
are $Y_2 = 2 + 3z \odot W + 0.2 A * Y_2 + 2z^2 + e_2$. As a result, τ (1) = τ (0) = 0 for
the entire population. For the subpopulation composed of units with neighbors,
τ (1) = τ (0) = −0.016. Point estimates follow similar patterns as in Table 2 in the
main text. The biases of the doubly robust estimator with CBPS moment conditions
are -0.017 and -0.021 for τ (1) and τ (0), respectively. In this design, we do see (sub-
stantive) over coverage of the 95% confidence intervals for the average direct effect
[Table F.1: Standard Deviation and Coverage of CI: τ (1). Columns are designs 1 to 6; the first panel reports the standard deviation. The table body was not recovered.]
and parameters in the other moment conditions in the GMM estimation.
Table F.2 below summarizes results for a subset of the GMM parameters. The
first five columns are coverage rates for the parameters in q2 , the moment condition
for the propensity score for G. The next five columns are coverage rates for the
parameters in the outcome regression moment condition in the second time period,
q4 . The last two columns are coverage rates for the two direct effects at exposure
levels one and zero. Because of the spatial correlation induced by spillover, the SHAC
standard errors are the appropriate ones to be considered. As expected, the EHW
standard errors can be a bit too small when spatial correlation is nonnegligible.
Because the usual standard errors tend to be conservative, the coverage rates of the
confidence interval constructed using the SHAC standard errors with appropriate
bandwidth can go above the nominal level of 0.95 with some coverage rates above
0.99.
G Additional Proofs
Proof of Proposition 1
Compare the canonical DID estimand with EDATT:
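\[
\begin{aligned}
\tau &= \frac{1}{|D_M|} \sum_{i \in D_M} E\big[y_{i2}(1, W_{-i}) - y_{i2}(0, W_{-i}) \,\big|\, W_i = 1, z_i\big] \\
&= \frac{1}{|D_M|} \sum_{i \in D_M} \sum_{g \in \mathcal{G}} E\big[y_{i2}(1, W_{-i}) - y_{i2}(0, W_{-i}) \,\big|\, W_i = 1, G_i = g, z_i\big] \, P(G_i = g \mid W_i = 1, z_i) \\
&= \frac{1}{|D_M|} \sum_{i \in D_M} \sum_{g \in \mathcal{G}} \Big\{ E\big[y_{i2}(1, W_{-i}) \,\big|\, W_i = 1, G_i = g, z_i\big] - E\big[y_{i1}(0, 0) \,\big|\, W_i = 1, G_i = g, z_i\big] \\
&\qquad - \Big[ E\big[y_{i2}(0, W_{-i}) \,\big|\, W_i = 0, G_i = g, z_i\big] - E\big[y_{i1}(0, 0) \,\big|\, W_i = 0, G_i = g, z_i\big] \Big] \Big\} \, P(G_i = g \mid W_i = 1, z_i) \\
&= \frac{1}{|D_M|} \sum_{i \in D_M} \bigg\{ \sum_{g \in \mathcal{G}} \Big[ E(Y_{i2} \mid W_i = 1, G_i = g, z_i) - E(Y_{i1} \mid W_i = 1, G_i = g, z_i) \Big] P(G_i = g \mid W_i = 1, z_i) \\
&\qquad - \sum_{g \in \mathcal{G}} \Big[ E(Y_{i2} \mid W_i = 0, G_i = g, z_i) - E(Y_{i1} \mid W_i = 0, G_i = g, z_i) \Big] P(G_i = g \mid W_i = 1, z_i) \bigg\}
\end{aligned}
\tag{G.1}
\]
\[
\begin{aligned}
\tau_{canonic} &= \frac{1}{|D_M|} \sum_{i \in D_M} \Big[ E(Y_{i2} - Y_{i1} \mid W_i = 1, z_i) - E(Y_{i2} - Y_{i1} \mid W_i = 0, z_i) \Big] \\
&= \frac{1}{|D_M|} \sum_{i \in D_M} \bigg\{ \sum_{g \in \mathcal{G}} \Big[ E(Y_{i2} \mid W_i = 1, G_i = g, z_i) - E(Y_{i1} \mid W_i = 1, G_i = g, z_i) \Big] P(G_i = g \mid W_i = 1, z_i) \\
&\qquad - \sum_{g \in \mathcal{G}} \Big[ E(Y_{i2} \mid W_i = 0, G_i = g, z_i) - E(Y_{i1} \mid W_i = 0, G_i = g, z_i) \Big] P(G_i = g \mid W_i = 0, z_i) \bigg\}
\end{aligned}
\tag{G.2}
\]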
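The only difference between (G.1) and (G.2) is the exposure distribution used to weight the control-group trends: $P(G_i = g \mid W_i = 1, z_i)$ in (G.1) versus $P(G_i = g \mid W_i = 0, z_i)$ in (G.2). A small arithmetic sketch with made-up numbers for a single covariate stratum and two exposure levels shows how this creates a gap; every value below is hypothetical.

```python
# Hypothetical trends E(Y_i2 - Y_i1 | W, G = g, z) within one covariate stratum.
trend_treated = {0: 3.0, 1: 4.0}   # W = 1, by exposure level g
trend_control = {0: 1.0, 1: 2.0}   # W = 0, by exposure level g

# Hypothetical exposure distributions, which differ by own treatment status.
p_g_given_treated = {0: 0.2, 1: 0.8}   # P(G = g | W = 1, z)
p_g_given_control = {0: 0.7, 1: 0.3}   # P(G = g | W = 0, z)

levels = (0, 1)

# (G.1): both treated and control trends are weighted by P(G = g | W = 1, z).
tau = sum((trend_treated[g] - trend_control[g]) * p_g_given_treated[g]
          for g in levels)

# (G.2): control trends are weighted by P(G = g | W = 0, z) instead.
tau_canonic = (sum(trend_treated[g] * p_g_given_treated[g] for g in levels)
               - sum(trend_control[g] * p_g_given_control[g] for g in levels))

# The gap equals the control trends weighted by the difference in exposure
# distributions; it vanishes if exposure is distributed identically in both groups.
gap = sum(trend_control[g] * (p_g_given_treated[g] - p_g_given_control[g])
          for g in levels)

print(round(tau, 6), round(tau_canonic, 6), round(gap, 6))
```

With these numbers the target is 2.0 while the canonical estimand is 2.5, and the 0.5 difference comes entirely from reweighting the control trends.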
Proof of Proposition 2
Identification of the doubly robust estimand:
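When the propensity scores are correctly specified, $\eta(z) = p(z)$, $\eta_{1g}(z) = \pi_{1g}(z)$,
and $\eta_{0g}(z) = \pi_{0g}(z)$.
\[
\begin{aligned}
&E\bigg[ \frac{W_i}{p(z_i)} \frac{1\{G_i = g\}}{\pi_{1g}(z_i)} \Big( \big(Y_{i2} - m_{2,1g}(z_i)\big) - \big(Y_{i1} - m_{1,1g}(z_i)\big) \Big) \,\bigg|\, z_i \bigg] \\
&= E\bigg[ \frac{W_i}{p(z_i)} \frac{1\{G_i = g\}}{\pi_{1g}(z_i)} \Big( \big(Y_{i2} - m_{2,1g}(z_i)\big) - \big(Y_{i1} - m_{1,1g}(z_i)\big) \Big) \,\bigg|\, z_i, W_i = 1 \bigg] P(W_i = 1 \mid z_i) \\
&= E\bigg[ \frac{1\{G_i = g\}}{\pi_{1g}(z_i)} \Big( \big(Y_{i2} - m_{2,1g}(z_i)\big) - \big(Y_{i1} - m_{1,1g}(z_i)\big) \Big) \,\bigg|\, z_i, W_i = 1, G_i = g \bigg] P(G_i = g \mid W_i = 1, z_i) \\
&= E(Y_{i2} \mid z_i, W_i = 1, G_i = g) - E(Y_{i1} \mid z_i, W_i = 1, G_i = g) - \big( m_{2,1g}(z_i) - m_{1,1g}(z_i) \big)
\end{aligned}
\tag{G.3}
\]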
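Similarly,
\[
\begin{aligned}
&E\bigg[ \frac{1 - W_i}{1 - p(z_i)} \frac{1\{G_i = g\}}{\pi_{0g}(z_i)} \Big( \big(Y_{i2} - m_{2,0g}(z_i)\big) - \big(Y_{i1} - m_{1,0g}(z_i)\big) \Big) \,\bigg|\, z_i \bigg] \\
&= E(Y_{i2} \mid z_i, W_i = 0, G_i = g) - E(Y_{i1} \mid z_i, W_i = 0, G_i = g) - \big( m_{2,0g}(z_i) - m_{1,0g}(z_i) \big)
\end{aligned}
\tag{G.4}
\]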
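Hence,
\[
\begin{aligned}
&E_D\bigg[ \frac{W_i}{p(z_i)} \frac{1\{G_i = g\}}{\pi_{1g}(z_i)} \Big( \big(Y_{i2} - m_{2,1g}(z_i)\big) - \big(Y_{i1} - m_{1,1g}(z_i)\big) \Big) \\
&\qquad - \frac{1 - W_i}{1 - p(z_i)} \frac{1\{G_i = g\}}{\pi_{0g}(z_i)} \Big( \big(Y_{i2} - m_{2,0g}(z_i)\big) - \big(Y_{i1} - m_{1,0g}(z_i)\big) \Big) + \Delta m_{2,g}(z_i) - \Delta m_{1,g}(z_i) \bigg] \\
&= \frac{1}{|D_M|} \sum_{i \in D_M} \Big\{ \big[ E(Y_{i2} \mid z_i, W_i = 1, G_i = g) - E(Y_{i1} \mid z_i, W_i = 1, G_i = g) \big] \\
&\qquad - \big[ E(Y_{i2} \mid z_i, W_i = 0, G_i = g) - E(Y_{i1} \mid z_i, W_i = 0, G_i = g) \big] \\
&\qquad - \big[ \big( m_{2,1g}(z_i) - m_{1,1g}(z_i) \big) - \big( m_{2,0g}(z_i) - m_{1,0g}(z_i) \big) \big] + \Delta m_{2,g}(z_i) - \Delta m_{1,g}(z_i) \Big\} \\
&= \frac{1}{|D_M|} \sum_{i \in D_M} E\big[ y_{i2}(1, W_{-i}) - y_{i2}(0, W_{-i}) \,\big|\, W_i = 1, G_i = g, z_i \big]
\end{aligned}
\tag{G.5}
\]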
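When the conditional means are correctly specified, $m_{t,wg}(z) = \mu_{t,wg}(z)$.
\[
\begin{aligned}
&E\bigg[ \frac{W_i}{\eta(z_i)} \frac{1\{G_i = g\}}{\eta_{1g}(z_i)} \Big( \big(Y_{i2} - \mu_{2,1g}(z_i)\big) - \big(Y_{i1} - \mu_{1,1g}(z_i)\big) \Big) \,\bigg|\, z_i \bigg] \\
&= E\bigg[ \frac{W_i}{\eta(z_i)} \frac{1\{G_i = g\}}{\eta_{1g}(z_i)} \Big( \big(Y_{i2} - \mu_{2,1g}(z_i)\big) - \big(Y_{i1} - \mu_{1,1g}(z_i)\big) \Big) \,\bigg|\, z_i, W_i = 1 \bigg] P(W_i = 1 \mid z_i) \\
&= \frac{p(z_i)}{\eta(z_i)} E\bigg[ \frac{1\{G_i = g\}}{\eta_{1g}(z_i)} \Big( \big(Y_{i2} - \mu_{2,1g}(z_i)\big) - \big(Y_{i1} - \mu_{1,1g}(z_i)\big) \Big) \,\bigg|\, z_i, W_i = 1, G_i = g \bigg] P(G_i = g \mid z_i, W_i = 1) \\
&= \frac{p(z_i)}{\eta(z_i)} \frac{\pi_{1g}(z_i)}{\eta_{1g}(z_i)} \Big[ \big( E(Y_{i2} \mid z_i, W_i = 1, G_i = g) - \mu_{2,1g}(z_i) \big) - \big( E(Y_{i1} \mid z_i, W_i = 1, G_i = g) - \mu_{1,1g}(z_i) \big) \Big] = 0
\end{aligned}
\tag{G.6}
\]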
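Analogously,
\[
E\bigg[ \frac{1 - W_i}{1 - \eta(z_i)} \frac{1\{G_i = g\}}{\eta_{0g}(z_i)} \Big( \big(Y_{i2} - \mu_{2,0g}(z_i)\big) - \big(Y_{i1} - \mu_{1,0g}(z_i)\big) \Big) \,\bigg|\, z_i \bigg] = 0 \tag{G.7}
\]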
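As a result,
\[
\begin{aligned}
&E_D\bigg[ \frac{W_i}{\eta(z_i)} \frac{1\{G_i = g\}}{\eta_{1g}(z_i)} \Big( \big(Y_{i2} - \mu_{2,1g}(z_i)\big) - \big(Y_{i1} - \mu_{1,1g}(z_i)\big) \Big) \\
&\qquad - \frac{1 - W_i}{1 - \eta(z_i)} \frac{1\{G_i = g\}}{\eta_{0g}(z_i)} \Big( \big(Y_{i2} - \mu_{2,0g}(z_i)\big) - \big(Y_{i1} - \mu_{1,0g}(z_i)\big) \Big) + \Delta\mu_{2,g}(z_i) - \Delta\mu_{1,g}(z_i) \bigg] \\
&= \frac{1}{|D_M|} \sum_{i \in D_M} \big( \Delta\mu_{2,g}(z_i) - \Delta\mu_{1,g}(z_i) \big) \\
&= \frac{1}{|D_M|} \sum_{i \in D_M} E\big[ y_{i2}(1, W_{-i}) - y_{i2}(0, W_{-i}) \,\big|\, W_i = 1, G_i = g, z_i \big]
\end{aligned}
\tag{G.8}
\]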
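\[
= \frac{1}{|D_M|} \sum_{i \in D_M} \sum_{j \in D_M} E\big[ \tilde q(X_i, \theta_M^*) \, \tilde q(X_j, \theta_M^*)' \big], \tag{G.9}
\]
where
\[
\tilde q(X_i, \theta_M^*) = q(X_i, \theta_M^*) - E\big[ q(X_i, \theta_M^*) \,\big|\, z \big] \tag{G.10}
\]
with $E\big[ \tilde q(X_i, \theta_M^*) \,\big|\, z \big] = 0$.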
Since any sequence of symmetric matrices {AN } converges to a symmetric matrix
{A0 } if and only if c′ AN c → c′ A0 c for any vectors c, we can reach our conclusion by
taking an arbitrary linear combination of q(Xi , θ). From now on, I focus on the case
of a scalar q(Xi , θ).
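\[
\big| \tilde\Omega(\hat\theta) - \Omega_M - \Omega_E \big| \le \big| \tilde\Omega(\hat\theta) - \tilde\Omega(\theta_M^*) \big| + \big| \tilde\Omega(\theta_M^*) - \Omega_M - \Omega_E \big|. \tag{G.11}
\]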
For the first term on the right hand side of (G.11), take a mean value expansion
of $\tilde\Omega(\hat\theta)$ around $\theta_M^*$. Let $\check\theta$ denote the mean value from this expansion.
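\[
\begin{aligned}
&\big| \tilde\Omega(\hat\theta) - \tilde\Omega(\theta_M^*) \big| \\
&= \bigg| (\hat\theta - \theta_M^*) \frac{1}{|D_M|} \sum_{s=0}^{\infty} \omega\Big(\frac{s}{b_M}\Big) \sum_{i \in D_M} \sum_{j \in D_M,\, s \le \rho(i,j) < s+1} \Big[ \nabla_\theta q(X_i, \check\theta)\, q(X_j, \check\theta) + q(X_i, \check\theta)\, \nabla_\theta q(X_j, \check\theta) \Big] \bigg| \\
&\le C_1 \Big| \sqrt{|D_M|}\, (\hat\theta - \theta_M^*) \Big| \frac{1}{|D_M|^{3/2}} \sum_{s=0}^{b_M} \sum_{i \in D_M} \sum_{j \in D_M,\, s \le \rho(i,j) < s+1} \sup_{\theta \in \Theta} \big| \nabla_\theta q(X_i, \theta)\, q(X_j, \theta) \big| \\
&\le C \Big| \sqrt{|D_M|}\, (\hat\theta - \theta_M^*) \Big| \frac{1}{\sqrt{|D_M|}} \bigg( \sum_{s=1}^{b_M} s^{d-1} + 1 \bigg) \frac{1}{|D_M|} \sum_{i \in D_M} \sup_{\theta \in \Theta} \big| \nabla_\theta q(X_i, \theta)\, q(X_j, \theta) \big|
\end{aligned}
\tag{G.12}
\]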
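Since
\[
\begin{aligned}
E\bigg[ \frac{1}{|D_M|} \sum_{i \in D_M} \sup_{\theta \in \Theta} \big| \nabla_\theta q(X_i, \theta)\, q(X_j, \theta) \big| \,\bigg|\, z \bigg] &\le \sup_{M, i \in D_M} E\Big[ \sup_{\theta \in \Theta} \big| \nabla_\theta q(X_i, \theta)\, q(X_j, \theta) \big| \,\Big|\, z \Big] \\
&\le \sup_{M, i \in D_M} \Big( E\Big[ \sup_{\theta \in \Theta} \big| \nabla_\theta q(X_i, \theta) \big|^2 \,\Big|\, z \Big] \Big)^{1/2} \cdot \sup_{M, i \in D_M} \Big( E\Big[ \sup_{\theta \in \Theta} \big| q(X_i, \theta) \big|^2 \,\Big|\, z \Big] \Big)^{1/2} < \infty,
\end{aligned}
\tag{G.13}
\]
\[
\frac{1}{|D_M|} \sum_{i \in D_M} \sup_{\theta \in \Theta} \big| \nabla_\theta q(X_i, \theta)\, q(X_j, \theta) \big| = O_p(1) \tag{G.14}
\]
by Markov’s inequality. Given $b_M = o\big(|D_M|^{1/(2d)}\big)$, $\frac{1}{\sqrt{|D_M|}} \sum_{s=1}^{b_M} s^{d-1} = o(1)$. Also,
$\sqrt{|D_M|}\,(\hat\theta - \theta_M^*) = O_p(1)$ by Theorem 4.2. Hence, $|\tilde\Omega(\hat\theta) - \tilde\Omega(\theta_M^*)| = o_p(1)$.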
Let
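\[
\check\Omega_M = \frac{1}{|D_M|} \sum_{s=0}^{\infty} \omega\Big(\frac{s}{b_M}\Big) \sum_{i \in D_M} \sum_{j \in D_M,\, s \le \rho(i,j) < s+1} \tilde q(X_i, \theta_M^*)\, \tilde q(X_j, \theta_M^*). \tag{G.15}
\]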
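\[
\begin{aligned}
\big| \tilde\Omega(\theta_M^*) - \Omega_E - \check\Omega_M \big| &\le 2 \bigg| \frac{1}{|D_M|} \sum_{s=0}^{\infty} \omega\Big(\frac{s}{b_M}\Big) \sum_{i \in D_M} \sum_{j \in D_M,\, s \le \rho(i,j) < s+1} E\big[ q(X_j, \theta_M^*) \,\big|\, z \big]\, \tilde q(X_i, \theta_M^*) \bigg| \\
&= o_p(1).
\end{aligned}
\tag{G.17}
\]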
Let $B_i = \sum_{s=0}^{\infty} \omega\big(\frac{s}{b_M}\big) \sum_{j \in D_M,\, s \le \rho(i,j) < s+1} E\big[ q(X_j, \theta_M^*) \,\big|\, z \big]$.
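\[
\begin{aligned}
&\bigg\| \frac{1}{|D_M|} \sum_{s=0}^{\infty} \omega\Big(\frac{s}{b_M}\Big) \sum_{i \in D_M} \sum_{j \in D_M,\, s \le \rho(i,j) < s+1} E\big[ q(X_j, \theta_M^*) \,\big|\, z \big]\, \tilde q(X_i, \theta_M^*) \bigg\|_1 \\
&\le \bigg\| \frac{1}{|D_M|} \sum_{i \in D_M} \tilde q(X_i, \theta_M^*)\, B_i \bigg\|_2 \\
&\le \bigg[ \frac{1}{|D_M|^2} \sum_{i \in D_M} E\big[ \tilde q(X_i, \theta_M^*)^2 \,\big|\, z \big] B_i^2 + \frac{1}{|D_M|^2} \sum_{i \in D_M} \sum_{j \in D_M,\, j \ne i} E\big[ \tilde q(X_i, \theta_M^*)\, \tilde q(X_j, \theta_M^*) \,\big|\, z \big] B_i B_j \bigg]^{1/2} \\
&\le \bigg[ \frac{C_1}{|D_M|} b_M^{2d} + \frac{C_2}{|D_M|^2} \sum_{i \in D_M} \sum_{s=1}^{\infty} \sum_{j \in D_M,\, s \le \rho(i,j) < s+1} \tilde\kappa_{M,s}\, B_i B_j \bigg]^{1/2} \\
&\le \bigg[ o(1) + \frac{C_2}{|D_M|} \sum_{s=1}^{\infty} s^{d-1}\, b_M^{2d}\, \tilde\kappa_{M,s} \bigg]^{1/2} = o(1).
\end{aligned}
\tag{G.18}
\]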
Hence, equation (G.17) follows from Markov’s inequality. Theorem 4.3 follows by
continuity of matrix inversion and multiplication.
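For readers who want to see the shape of the variance estimator analyzed above, the following sketch computes a kernel-weighted sum of cross products over integer distance bins for a scalar moment function, mirroring the form of (G.15). The Bartlett kernel, the Euclidean distance on coordinates, and the names `bartlett` and `shac_variance` are assumptions of this example rather than choices stated in the proof.

```python
import math


def bartlett(x):
    """Bartlett kernel: weight 1 at zero, declining linearly to 0 at |x| = 1."""
    return max(0.0, 1.0 - abs(x))


def shac_variance(q, coords, bandwidth, kernel=bartlett):
    """(1/n) * sum_s kernel(s / bandwidth) * sum_{i,j: s <= dist(i,j) < s+1} q_i * q_j.

    q holds scalar moment contributions evaluated at the estimated parameter;
    coords holds each unit's location as a tuple.
    """
    n = len(q)
    total = 0.0
    for i in range(n):
        for j in range(n):
            s = math.floor(math.dist(coords[i], coords[j]))  # integer distance bin
            weight = kernel(s / bandwidth)
            if weight > 0.0:
                total += weight * q[i] * q[j]
    return total / n


# Three units on a line at unit spacing, with made-up moment contributions.
q = [1.0, 2.0, 3.0]
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]

no_spatial = shac_variance(q, coords, bandwidth=1)    # only the i = j terms survive
with_spatial = shac_variance(q, coords, bandwidth=2)  # adjacent pairs get weight 1/2
print(no_spatial, with_spatial)
```

With a bandwidth of 1 on this lattice the estimator collapses to the average of squared contributions, the heteroskedasticity-robust case; widening the bandwidth adds cross products between neighbors, which is what captures spatial correlation. The double loop is quadratic in the number of units and is meant only to expose the formula.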
References
Agarwal, A., Cen, S., Shah, D., and Yu, C.L. (2022), Network synthetic interventions:
A framework for panel data with network interference. Tech. rep., arXiv preprint
arXiv:2210.11355.
Aronow, P.M. and Samii, C. (2017), Estimating average causal effects under general
interference, with application to a social network experiment. Annals of Applied
Statistics 11(4), 1912–1947.
Basse, G.W. and Airoldi, E.M. (2018), Limitations of design-based causal inference
and a/b testing under arbitrary and network interference. Sociological Methodology
48(1), 136–151.
Emmenegger, C., Spohn, M.L., and Bühlmann, P. (2022), Treatment effect estima-
tion from observational network data using augmented inverse probability weight-
ing and machine learning. Tech. rep., arXiv preprint arXiv:2206.14591.
Kojevnikov, D., Marmer, V., and Song, K. (2021), Limit theorems for network de-
pendent random variables. Journal of Econometrics 222(2), 882–908.
Qu, Z., Xiong, R., Liu, J., and Imbens, G. (2022), Efficient treatment effect estima-
tion in observational studies under heterogeneous partial interference. Tech. rep.,
arXiv preprint arXiv:2107.12420.