Treatment Effects with Targeting Instruments∗

Sokbae Lee† Bernard Salanié‡


arXiv:2007.10432v5 [econ.EM] 6 Dec 2024

December 5, 2024

Abstract

Multivalued treatments are commonplace in applications. We explore the use of
discrete-valued instruments to control for selection bias in this setting. Our discussion
revolves around the concept of targeting: which instruments target which treatments.
It allows us to establish conditions under which counterfactual averages and treatment
effects are point- or partially-identified for composite complier groups. We illustrate the
usefulness of our framework by applying it to data from the Head Start Impact Study.
Under a plausible positive selection assumption, we derive informative bounds that
suggest less beneficial effects of Head Start expansions than the parametric estimates
of Kline and Walters (2016).

Keywords: Identification, selection, multivalued treatments, discrete instruments, monotonicity.


∗ This is a shorter, revised version of our “Filtered and Unfiltered Treatment Effects with Targeting Instruments” (first arXiv version July 2020). We thank Josh Angrist, Junlong Feng, Len Goff, and Jim Heckman for helpful comments. This work is in part supported by the European Research Council (ERC-2014-CoG-646917-ROMIA) and the UK Economic and Social Research Council for research grant (ES/P008909/1) to the CeMMAP.

† Department of Economics, Columbia University and Centre for Microdata Methods and Practice, Institute for Fiscal Studies, sl3841@columbia.edu.

‡ Department of Economics, Columbia University and FGV EPGE, Rio de Janeiro, bsalanie@columbia.edu.

Introduction
Much of the literature on the evaluation of treatment effects has concentrated on the paradig-
matic “binary/binary” example, in which both treatment and instrument only take two val-
ues. Multivalued treatments are common in actual policy implementations, however, as are
multivalued instruments. Many different programs aim to help train job seekers for instance,
and each of them has its own eligibility rules. Tax and benefit regimes distinguish many
categories of taxpayers and eligible recipients. The choice of a college and major has many
dimensions too, and responds to a variety of financial help programs and other incentives.
Finally, more and more randomized experiments in economics resort to factorial designs.¹
Existing work on multivalued treatments under selection on observables includes Imbens
(2000), Cattaneo (2010), and Ao, Calonico, and Lee (2021) among others. As the training,
education choice, and tax-benefit examples illustrate, in non-experimental settings multival-
ued treatments are also subject to selection on unobservables. The use of instruments to eval-
uate the effects of multivalued treatments under selection on unobservables has received in-
creasing attention in recent literature. In previous work (Lee and Salanié, 2018), we analyzed
the case when enough continuous instruments are available. Identification is of course more
difficult when instruments only take discrete values. We explore in this paper the use of such
discrete-valued instruments in order to control for selection bias when evaluating discrete-
valued treatments. Our goal is to find plausible conditions on treatment assignment and on
the distribution of outcomes under which counterfactual averages and treatment effects are
point- or partially identified for various (sometimes composite) complier groups. This distin-
guishes our paper from the recent contributions of Bai, Huang, Moon, Shaikh, and Vytlacil
(2024), which focuses on population-wide average outcomes, and of Goff (2024b), which
studies identification without any assumption on outcomes.
In the binary/binary model, the analyst can often take for granted that switching on
the binary instrument makes treatment (weakly) more likely for all or no observations.
This is satisfied under the local average treatment effect (LATE)-monotonicity assumption
(e.g., Imbens and Angrist, 1994; Vytlacil, 2002; Heckman and Vytlacil, 2007a). With mul-
tiple instrument values and multiple treatments, there may be no natural ordering of in-
strument or treatment values that would give meaning to the word “monotonicity”. Since
Heckman and Pinto (2018) defined an “unordered monotonicity” property, various papers
have proposed other definitions of (qualified) monotonicity.²

¹ Muralidharan, Romero, and Wüthrich (2023) review recent applications of factorial designs.
² See Navjeevan and Pinto (2022) for a detailed analysis of some of these proposals.

Even when some sort of monotonicity holds, there exist several groups of compliers: individuals whose treatment assignment changes with the value of the instrument. The multiplicity of treatments and instruments may give rise to a bewildering number of cases, as existing literature demonstrates. Angrist and Imbens (1995) analyzed two-stage least squares
(TSLS) estimation when the treatment takes a finite number of ordered values. Closer to us,
Heckman, Urzua, and Vytlacil (2006); Heckman and Vytlacil (2007b); Heckman, Urzua, and Vytlacil
(2008) discussed the identification of treatment effects in the presence of discrete-valued
instruments when assignment to treatment can be modeled as a discrete choice model.
Several recent papers have studied the case of binary treatments with multiple instru-
ments. Mogstad, Torgovitsky, and Walters (2021) and Goff (2024a) analyzed the identifying
power of different monotonicity assumptions in this context.³ Others have studied models
with binary instruments and multivalued or continuous treatments. Torgovitsky (2015),
D’Haultfoeuille and Février (2015), Huang, Khalil, and Yildiz (2019), Caetano and Escanciano
(2021), and Feng (2024) developed identification results for different models.
In a wide-ranging contribution, Heckman and Pinto (2018) derived results on partial iden-
tification in discrete-instrument, discrete-treatment models; they also showed how additional
identifying assumptions, such as unordered monotonicity, can be applied to shrink the identi-
fied set of treatment effects for various complier groups. While their results are very general,
they are not as transparent as one would like. Our approach to this issue is different: we seek
a parsimonious framework within which we can make constructive progress, and that can still
be useful in many applications. In order to reduce the complexity of the problem, we start by
imposing an additive random-utility model (ARUM) structure. Under ARUM, the selection
into treatment depends on mean values and additive, observation-specific shocks. Some, but
not all, ARUM models satisfy the unordered monotonicity property of Heckman and Pinto
(2018), which was applied by Pinto (2021) to the Moving to Opportunity program. In many
applications, some observations are not treated; in others, another treatment value is partic-
ularly salient. We call it the “control”. Under ARUM, each treatment t generates a change
in the mean value, relative to the control, that depends on the value z of the instrument. It
is natural to speak of an instrument value z targeting a treatment value t when it maximizes
this change in mean value. Most of our paper relies on the assumption of strict targeting,
which obtains when each instrument only changes the mean values of the treatments it tar-
gets. Strict targeting holds for instance in models of imperfect compliance when the cost of
non-compliance does not depend on its nature. Some of our results also require one-to-one
targeting, where each non-zero instrument targets one treatment only, and each treatment
(apart from the control) is targeted by one instrument only.

³ Mogstad, Torgovitsky, and Walters (2020) further apply their framework of monotonicity with multiple instruments to marginal treatment effects (e.g., Heckman and Vytlacil, 2001, 2005; Carneiro, Heckman, and Vytlacil, 2011).

Our use of “targeting” instruments is similar in spirit to Section 7.3 of Heckman and Vytlacil
(2007b).⁴ We define it differently and we seek to identify a more general class of treatment
effects. The term “targeting” is inspired by the time-honored Targeting Principle.⁵ Some
policies act directly on final outcomes, and others aim to modify choices. Our use of the
term “targeting” refers to the latter. Take a Roy model in which workers choose among
occupations on the basis of their net utilities; we observe the choice of occupation and the
wage in that occupation. A safety regulation that reduces the disutility of labor for (say)
construction workers is, in our terminology, an instrument that targets the choice to be a
construction worker. Policymakers might also seek to increase average incomes by offering a
college credit. While their final aim is to increase wages (an outcome), we would say that the
college credit is an instrument that targets the choice to go to college—a treatment variable.
To illustrate, consider a typical randomized experiment with imperfect compliance: (i)
individuals are randomly assigned to a treatment branch t (including a non-treatment op-
tion 0) based on the instrument value z that they draw; (ii) some individuals self-select
into a treatment branch $t'$ that they prefer, even though they did not draw the corresponding
instrument value $z'$. In our terminology, $z$ targets $t$ and $z'$ targets $t'$. Often
this mapping is one-to-one; this is what our one-to-one targeting assumption states. Strict
targeting of a treatment branch t is more restrictive, as its name indicates. One way to
interpret it in a non-compliance context is that it is equally difficult for an average in-
dividual in the population to select into treatment t when she was not assigned to that
branch, no matter what the experimenter’s intended treatment branch was. To cite two ex-
amples, consider the interventions reported in Angrist, Lang, and Oreopoulos (2009) and in
Attanasio, Fernández, Fitzsimons, Grantham-McGregor, Meghir, and Rubio-Codina (2014; 2020).
These are 4-way factorial randomized experiments: each subject is randomly assigned to
a control group, to receive treatment 1, to receive treatment 2, or to receive both treat-
ments. By definition, this is one-to-one targeting. Compliance was very imperfect in
Angrist, Lang, and Oreopoulos (2009), and it is described as “high” in the other two pa-
pers. If subjects self-selected into treatments on the basis of their expected benefits, then
strict targeting is a natural assumption.

⁴ See also the recent contribution by Buchinsky, Gertler, and Pinto (2023), which uses revealed preference arguments.
⁵ Early references include Tinbergen (1952) and Bhagwati (1971).

Combining ARUM and assumptions on targeting allows us to point-identify the size of some complier groups and the corresponding counterfactual averages and treatment effects on any function of the outcomes, and to partially identify others. We use two examples to demonstrate the identification power and implications of ARUM and targeting. Our first example is a $2 \times T$ model where a binary instrument targets only one of $T \geq 3$ treatment values, as in Kline and Walters (2016). In our second example, three instrument values target three unordered treatment values. This $3 \times 3$ model was also studied by Kirkeboen, Leuven, and Mogstad (2016).⁶ Unlike them, we do not assume that the data contains information on next-best alternatives. Whereas the $2 \times T$ model satisfies unordered monotonicity under our strongest targeting assumptions, the $3 \times 3$ model does not.⁷
We obtain novel identification results for both examples; they lead to new estimands or
bounds for average treatment effects on various groups. Additional identifying assumptions
can refine these bounds. One example is what we call positive selection. This assumes that
the average outcome for a given treatment t is larger for some response group than for an-
other. Consider for instance the binary instrument case. It seems natural to assume that
the always-takers of a treatment get more from it than compliers who only take it if they
are incentivized to do so. Positive selection also obtains under weak assumptions in the gen-
eralized Roy model. More generally, let us return to our earlier illustration of a randomized
experiment under imperfect compliance. Consider the response group of individuals who
would end up in treatment $t'$ both when drawing $z$ and when drawing $z'$. We would expect
this response group to have better outcomes under $t'$, on average, than the response group
that exhibits perfect compliance to $z$ and $z'$ draws, assuming that these two response groups
end up in the same treatment branches for all other instrument values. This falls exactly
under our positive selection assumption. It adds identifying power in both of our leading
examples.
To illustrate the usefulness of our framework, we apply it to the Head Start Impact Study
(HSIS), a randomized experiment that sought to evaluate the value added of Head Start
preschools. Kline and Walters (2016) revisited the HSIS; they took into account the presence
of a substitute treatment (alternative preschools in this case). They found that Head Start
was only beneficial for children who would not have attended another preschool program
instead. We confirm the importance of taking into consideration alternative preschools when
evaluating Head Start. Unlike Kline and Walters (2016), we do not rely on parametric
selection models. Under a plausible positive selection assumption, our estimates suggest
that the large difference between complier groups that they find can only be rationalized
under negative selection into Head Start. As a by-product, we provide an upper bound
on the welfare effect of expanding access to Head Start. Interestingly, the estimated upper
bound turns out to be lower than the point estimate of Kline and Walters (2016); and it
yields a lower marginal value for public funds used in expanding access to Head Start.

⁶ See also more recent work by Bhuller and Sigstad (2024), Heinesen, Hvid, Kirkeboen, Leuven, and Mogstad (2022), and Nibbering, Oosterveen, and Silva (2022).
⁷ It does satisfy the weaker generalized monotonicity assumption of Bai, Huang, Moon, Shaikh, and Vytlacil (2024), however.

The paper is organized as follows. Section 1 defines our framework. In Section 2, we define
and discuss the concepts of targeting, one-to-one targeting, and strict targeting. Section 3
derives their implications for the identification of population shares, counterfactual averages,
and the effects of the treatments on various complier groups; it also defines and illustrates
positive selection. Finally, we present estimation results for Head Start in Section 4. The
Appendices contain the proofs of all propositions and lemmata, along with some additional
material.

1 The Framework
In all of the paper, we denote observations as $i = 1, \ldots, n$. Each observation consists of covariates $X_i$, instruments $Z_i$, outcome variables $Y_i$, and treatments $T_i$. We assume that the covariates $X_i$ are exogenous to treatment assignment and outcomes. Since they will not play any role in our identification strategy, we condition on the covariates throughout and we omit them from the notation. Our results should therefore be interpreted as conditional on $X$.
We assume that observations are independent and identically distributed. Random sam-
pling rules out that the treatment status of one observation influences other observations.
This further implies that the outcome for a specific observation does not impact the out-
comes of other members within the population. In other words, we rely on the Stable Unit
Treatment Value Assumption (SUTVA).
We focus in this paper on treatment variables that take discrete values, which we label $t \in \mathcal{T}$. For simplicity, we will call $T = t$ “treatment $t$”. These values do not have to be ordered; e.g., when $t = 2$ is available, it does not necessarily indicate “more treatment” than $t = 1$. We assume that the only available instruments are discrete-valued, and we label their values as $z \in \mathcal{Z}$.
We will use the standard counterfactual notation: $T_i(z)$ and $Y_i(t, z)$ denote respectively potential treatments and outcomes. $\mathbb{1}(A)$ denotes the indicator of set $A$.
The validity of the instruments requires the usual exclusion and independence restrictions:

Assumption 1 (Valid Instruments). (i) $Y_i(t, z) = Y_i(t)$ for all $(t, z)$ in $\mathcal{T} \times \mathcal{Z}$.

(ii) $Y_i(t)$ and $T_i(z)$ are independent of $Z_i$ for all $(t, z)$ in $\mathcal{T} \times \mathcal{Z}$.

Under Assumption 1, we define $T_i := T_i(Z_i)$ and $Y_i := Y_i(T_i)$.

Throughout the paper, we assume that we observe $(Y_i, T_i, Z_i)$ for each $i$.

1.1 Restricting Heterogeneity
As in most of this literature, we will need an assumption that restricts the heterogeneity in
the counterfactual mappings $T_i(z)$. In the binary/binary model, this is most often done by
imposing LATE-monotonicity. As is well-known, LATE-monotonicity imposes that (denoting
instrument values as $z = 0, 1$) (i) or (ii) must hold:

(i) for each observation $i$, $T_i(1) \geq T_i(0)$;

(ii) for each observation $i$, $T_i(0) \geq T_i(1)$.

With more than two treatment values and/or more than two instrument values, there are
many ways to restrict the heterogeneity in treatment assignment. Since treatments may not
be ordered in any meaningful way, we cannot apply the results in Angrist and Imbens (1995)
for instance. Mogstad, Torgovitsky, and Walters (2021) state several versions of monotonicity
for a binary treatment model with $|\mathcal{Z}| > 2$. They propose a “partial monotonicity” assumption
which applies binary LATE-monotonicity component by component. This requires
that the instruments be interpretable as combinations of component instruments, which is
not necessarily the case here.
To cut through this complexity, we assume from now on that assignment to treatment
can be represented by an Additive Random-Utility Model (ARUM), that is by a discrete
choice problem with additively separable errors:

$$T_i(z) = \arg\max_{t \in \mathcal{T}} \left( U_z(t) + u_{it} \right)$$

for some real numbers $U_z(t)$ which are common across observations, and random vectors $(u_{it})_{t \in \mathcal{T}}$ that are distributed independently of $Z_i$. We do not restrict the codependence of the random variables $u_{it}$. The usual models of multinomial choice belong to this family. ARUM also includes ordered treatments, for which $u_{it} \equiv \sigma(t) u_i$ for some increasing positive function $\sigma$.
In a randomized experiment with perfect compliance, we would have $U_z(t') = -\infty$ and $U_{z'}(t) = -\infty$. With imperfect compliance, these mean values are finite; if for instance $u_{it'} - u_{it} > U_z(t) - U_z(t')$, individual $i$ will get into treatment $t'$ when drawing $z$ would normally assign her to $t$.
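
As an illustration of the ARUM rule above, the following Python sketch simulates potential treatments for two instrument values and three treatments. The mean values in `U`, the standard normal shocks, the sample size, and the function name `arum_treatment` are all assumptions made for this example; they are not taken from any application discussed in the paper.

```python
import random

def arum_treatment(U_z, u_i):
    """Return the t maximizing U_z[t] + u_i[t] (ties go to the lowest t)."""
    return max(range(len(U_z)), key=lambda t: U_z[t] + u_i[t])

# Hypothetical mean values U[z][t]: 2 instrument values, 3 treatments.
# Only the mean value of treatment 1 changes with the instrument.
U = [[0.0, -1.0, -0.5],   # z = 0
     [0.0,  0.5, -0.5]]   # z = 1

rng = random.Random(0)
n = 10_000
# One shock vector per observation, drawn independently of the instrument.
shocks = [[rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(n)]

# Potential treatments (T_i(0), T_i(1)) for every observation.
potential = [[arum_treatment(U[z], u) for z in range(2)] for u in shocks]

# Share of observations taking treatment 1 under each instrument value.
share_t1 = [sum(p[z] == 1 for p in potential) / n for z in range(2)]
```

Because only the mean value of treatment 1 differs between the two rows of `U`, every simulated observation that takes treatment 1 under z = 0 also takes it under z = 1, so `share_t1[1]` cannot be smaller than `share_t1[0]` in this sketch.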
Imposing an ARUM structure will greatly simplify our discussion of treatment assignment. It incorporates a substantial restriction, however. Suppose that observation $i$ has treatment values $t$ under $z$ and $t'$ under $z'$. By the ARUM structure, this implies

$$U_z(t) + u_{it} \geq U_z(t') + u_{it'}$$

$$U_{z'}(t') + u_{it'} \geq U_{z'}(t) + u_{it}.$$

Combining these two restrictions implies an “increasing differences” property:

$$U_{z'}(t') - U_{z'}(t) \geq U_z(t') - U_z(t).$$

This inequality in turn is incompatible with the existence of an observation $j$ that has treatment values $t'$ under $z$ and $t$ under $z'$. Thus we rule out “direct two-way flows”: if a change in the value of an instrument causes an observation to shift from a treatment value $t$ to a treatment value $t'$, it can cause no other observation to switch from $t'$ to $t$. The argument above
is a special case of the general discussion in Heckman and Pinto (2018); their Theorem T-3
shows that the treatment assignment models that satisfy unordered monotonicity for each
pair of instrument values can be represented as an ARUM. Not all ARUM models satisfy
unordered monotonicity, however; unordered monotonicity excludes a more general class of
two-way flows. We will illustrate this point on one of our leading examples in Section 3.3.
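
The step from the two ARUM inequalities to the increasing differences property can be spelled out as follows. This is only a worked sketch in the notation of this section: summing the two inequalities for observation $i$ makes the shocks cancel.

```latex
\begin{aligned}
&\bigl[U_z(t) + u_{it}\bigr] + \bigl[U_{z'}(t') + u_{it'}\bigr]
  \;\geq\; \bigl[U_z(t') + u_{it'}\bigr] + \bigl[U_{z'}(t) + u_{it}\bigr] \\
&\quad\Longleftrightarrow\quad U_z(t) + U_{z'}(t') \;\geq\; U_z(t') + U_{z'}(t) \\
&\quad\Longleftrightarrow\quad U_{z'}(t') - U_{z'}(t) \;\geq\; U_z(t') - U_z(t).
\end{aligned}
```

The same computation for an observation $j$ that moves from $t'$ under $z$ to $t$ under $z'$ gives the reverse inequality. Both flows could then coexist only if the two differences were exactly equal, and in that case observation $i$ would need $u_{it'} - u_{it}$ to equal $U_z(t) - U_z(t')$ exactly, an event of probability zero when the shocks are continuously distributed as in Assumption 2.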

1.2 Assignment to Treatment


Assumption 2 defines the class of models of assignment to treatment that we analyze in this
paper.

Assumption 2 (ARUM). The treatment assignment model consists of:

1. a finite set $\mathcal{T} = \{0, 1, \ldots, |\mathcal{T}| - 1\}$;

2. a finite set of instrument values $\mathcal{Z} = \{0, 1, \ldots, |\mathcal{Z}| - 1\}$;

3. an ARUM model of treatment:

$$T_i(z) = \arg\max_{t \in \mathcal{T}} \left( U_z(t) + u_{it} \right),$$

where the vector $(u_{it})_{t \in \mathcal{T}}$ is distributed independently of $Z_i$ and has an absolutely continuous distribution with full support on $\mathbb{R}^{|\mathcal{T}|}$.

We will often refer to the $U_z(t)$ as “mean values”. This is only meant to simplify the exposition; it is consistent with, but need not refer to, preferences on the part of the agent.
Note that when $\mathcal{T} = \{0, 1\}$, Assumption 2 is just the standard monotonicity assumption, with a threshold-crossing rule

$$T_i(z) = \mathbb{1}\left( u_{i0} - u_{i1} \leq U_z(1) - U_z(0) \right).$$

If we add a third treatment value so that $\mathcal{T} = \{0, 1, 2\}$, the ARUM assumption starts to bite as it excludes direct two-way flows in the treatment model. However, the combination of Assumptions 1 and 2 is far from sufficient to identify interesting treatment effects in general. In order to better understand what is needed, we now resort to the notion of response-groups of observations, whose members share the same mapping from instruments $z$ to treatments $t$. We first state a general definition.⁸

Definition 1 (Response-vectors and Response-groups). Let $R$ be an element of the Cartesian product $\mathcal{T}^{\mathcal{Z}}$ and $R(z) \in \mathcal{T}$ denote its component for instrument value $z \in \mathcal{Z}$.

• Observation $i$ has (elemental) response-vector $R$ if and only if for all $z \in \mathcal{Z}$, $T_i(z) = R(z)$. The set $C_R$ denotes the set of observations with response-vector $R$ and we call it a response-group.

• We extend the definition in the natural way to incompletely specified mappings, where each $R(z)$ is a subset of $\mathcal{T}$. We call the corresponding response-vectors and response-groups composite. In particular, if $R(z) = \mathcal{T}$ we denote it by an asterisk in the corresponding position.

To illustrate, consider the binary instrument/binary treatment case. It has a priori $2^2 = 4$ response vectors, $R \in \{00, 01, 10, 11\}$, with corresponding response-groups $C_{00}$, $C_{01}$, $C_{10}$, $C_{11}$. In this notation, the first number refers to a treatment value with $z = 0$ and the second number with $z = 1$. For instance, $C_{01}$ refers to those with $T_i(0) = 0$ and $T_i(1) = 1$, while the composite response-group $C_{*1}$, for which $R(0) = \{0, 1\}$, represents the union of $C_{01}$ and $C_{11}$. The LATE-monotonicity assumption implies that either $C_{01}$ or $C_{10}$ is empty.
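
The bookkeeping in Definition 1 can be made concrete with a few lines of Python. This is only a sketch: the helper names `response_vectors` and `composite_group` are hypothetical, and treatments are assumed to be labeled by single digits so that a response-vector can be written as a string.

```python
from itertools import product

def response_vectors(treatments, instruments):
    """All elemental response-vectors: one treatment per instrument value."""
    return ["".join(str(t) for t in r)
            for r in product(treatments, repeat=len(instruments))]

def composite_group(pattern, treatments):
    """Expand a composite pattern such as '*1' into its elemental members.

    An asterisk in position z stands for R(z) equal to the full treatment set.
    """
    choices = [[str(t) for t in treatments] if c == "*" else [c]
               for c in pattern]
    return ["".join(r) for r in product(*choices)]

# Binary instrument, binary treatment: the four a priori response-vectors.
binary = response_vectors([0, 1], [0, 1])

# The composite group written with an asterisk in the z = 0 position.
star_one = composite_group("*1", [0, 1])
```

Here `binary` lists the vectors 00, 01, 10 and 11, and `star_one` expands to 01 and 11, matching the union described above.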

2 Targeting
We start by introducing additional assumptions on the underlying treatment model. We will
illustrate these assumptions on three examples: the “binary instrument model” or the “$2 \times T$” model; the “$3 \times 3$ model”; and a generalized Roy model. We first define them briefly.

⁸ This is analogous to the definitions in Heckman and Pinto (2018).

Example 1 (The binary instrument ($2 \times T$) model). $\mathcal{T} = \{0, 1, \ldots, T - 1\}$ and $\mathcal{Z} = \{0, 1\}$. This could for instance represent an intent-to-treat model, where agents in the control group $Z = 0$ are not treated ($T = 0$) and agents with $Z = 1$ self-select the type of the treatment $T \geq 1$ or opt out altogether ($T = 0$).

When $|\mathcal{T}| = 3$, treatment assignment can be represented in the $(u_{i1} - u_{i0}, u_{i2} - u_{i0})$ plane. The points of coordinates $P_z = (U_z(0) - U_z(1), U_z(0) - U_z(2))$ play an important role as for a given $z$,

• $T_i(z) = 0$ to the south-west of $P_z$;

• $T_i(z) = 1$ to the right of $P_z$ and below the diagonal that goes through it;

• $T_i(z) = 2$ above $P_z$ and above the diagonal that goes through it.

Treatment assignment is illustrated in Figure 1 for a given $z$, where the origin is in $P_z$. We will make recurrent use of this type of figure.
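
The geometric description above can be checked against the ARUM rule directly. In the sketch below, `region` encodes the three bullet points and `argmax_treatment` applies the arg max. The function names, the example mean values, and the convention that ties go to the lowest treatment label are assumptions of this illustration (ties have probability zero under Assumption 2).

```python
def region(x, y, U_z):
    """Treatment for the point (x, y) = (u_i1 - u_i0, u_i2 - u_i0).

    U_z = [U_z(0), U_z(1), U_z(2)] holds the mean values for a given z.
    """
    px, py = U_z[0] - U_z[1], U_z[0] - U_z[2]   # coordinates of P_z
    if x <= px and y <= py:
        return 0      # south-west of P_z
    if x > px and y - py <= x - px:
        return 1      # right of P_z, on or below the diagonal through it
    return 2          # above P_z and above the diagonal through it

def argmax_treatment(u, U_z):
    """Direct ARUM rule, ties broken toward the lowest treatment label."""
    return max(range(3), key=lambda t: U_z[t] + u[t])

# Hypothetical mean values, so that P_z = (-1, 1).
U_z = [0, 1, -1]

# Compare the two rules on an integer grid, setting u_i0 = 0.
agree = all(region(x, y, U_z) == argmax_treatment([0, x, y], U_z)
            for x in range(-3, 4) for y in range(-3, 4))
```

The grid uses integers so that the comparison involves no floating-point rounding; `agree` records whether the geometric rule and the arg max coincide at every grid point.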

Example 2 ($3 \times 3$ model). Assume that $\mathcal{Z} = \{0, 1, 2\}$ and $\mathcal{T} = \{0, 1, 2\}$. As a leading example,
Kirkeboen, Leuven, and Mogstad (2016) investigate the $3 \times 3$ model in order to analyze the
effect of students’ choice of field of study on their earnings; each instrument value shifts the
eligibility of a student for a given field. We will return to this application in Section 3.4.

Finally, our framework also includes multivalued generalized Roy models (see Eisenhauer, Heckman, and
(2015)).

Example 3 (A Generalized Roy Model). Suppose that agents choose occupations $t = 0, \ldots, |\mathcal{T}| - 1$ on the basis of their expected wages $w_i(t) = \bar{w}(t) + \eta_{it}$, net of labor disutilities that depend on the values of the instruments:

$$T_i(z) = \arg\max_{t = 0, \ldots, |\mathcal{T}| - 1} \left( w_i(t) - d_i(z, t) \right)$$

where $d_i(z, t) = \bar{d}_z(t) + v_{it}$. Potential wages are $Y_i(t) = w_i(t) + \varepsilon_{it}$. We observe $Z_i$, the chosen occupation $T_i = T_i(Z_i)$, and realized wages $Y_i = Y_i(T_i)$. If the vector of variables $\{(\eta_{it}, v_{it})\}_{t=0}^{|\mathcal{T}|-1}$ is independent of $Z_i$, this is an ARUM model with $U_z(t) = \bar{w}(t) - \bar{d}_z(t)$ and $u_{it} = \eta_{it} - v_{it}$.
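
The mapping from the Roy model to the ARUM form follows from regrouping terms in the net utility; the short derivation below only restates the definitions of Example 3.

```latex
\begin{aligned}
w_i(t) - d_i(z, t)
  &= \bigl(\bar{w}(t) + \eta_{it}\bigr) - \bigl(\bar{d}_z(t) + v_{it}\bigr) \\
  &= \underbrace{\bar{w}(t) - \bar{d}_z(t)}_{U_z(t)}
     \;+\; \underbrace{\eta_{it} - v_{it}}_{u_{it}} .
\end{aligned}
```

The first term depends on the instrument but not on the observation, and the second depends on the observation but not on the instrument, which is the additive separability that ARUM requires.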

“Targeting” will be the common thread in our analysis. Just as in general economic
discussions a policy measure may target a particular outcome, we will speak of instruments
(in the econometric sense) targeting the assignment to a particular treatment.

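Figure 1: Treatment assignment for $|\mathcal{T}| = 3$ for given $z$

[Figure: the plane with horizontal axis $u_{i1} - u_{i0}$ and vertical axis $u_{i2} - u_{i0}$, with the origin at $P_z$. The region $T_i(z) = 0$ lies to the south-west of $P_z$, the region $T_i(z) = 1$ to the right of $P_z$ and below the diagonal, and the region $T_i(z) = 2$ above $P_z$ and above the diagonal.]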

Under Assumption 2, assignment to treatment is governed by the differences in mean values $(U_z(t) - U_z(\tau))$ and by the differences in unobservables $u_{it} - u_{i\tau}$. Only the former depend on the instrument. From now on, we assume that there is a reference treatment value $t_0$ whose mean utility does not depend on the value of the instrument:

Assumption 3 (Reference Treatment). There exists $t_0 \in \mathcal{T}$ such that $z \in \mathcal{Z} \to U_z(t_0)$ is constant. Without loss of generality, we renumber treatment values so that $t_0 = 0$; and we normalize utilities with $U_z(0) = 0$ for all $z \in \mathcal{Z}$.

In many applications, $t = 0$ is a “no-treatment” value, and instruments only change the mean utilities of the other treatments. For instance, tuition subsidies, investment credits, and invitations to training programs have no effect for those who do not attend college, do not invest, or choose not to train. Assumption 3 seems natural in such cases.⁹ For a counter-example, consider a program of unconditional cash transfers with different values $z$, for which we observe the purchases $t$ of several categories of goods the following month. If a household decides to save the transfer ($t = 0$), its mean (discounted) utility will still depend on the value of the transfer $z$ that it received.¹⁰
Given Assumption 3, we will say that an instrument value $z$ targets a treatment value $t$ if it maximizes the mean utility $U_z(t) - U_z(0) = U_z(t)$.

Definition 2 (Targeted Treatments and Targeting Instruments). For any $z \in \mathcal{Z}$ and $t \in \mathcal{T}$, let

$$\bar{U}(t) \equiv \max_{z \in \mathcal{Z}} U_z(t) \quad \text{and} \quad Z^*(t) \equiv \arg\max_{z \in \mathcal{Z}} U_z(t)$$

denote the maximum value of $U_z(t)$ over $z \in \mathcal{Z}$ and the set of maximizers, respectively. If $Z^*(t)$ is not all of $\mathcal{Z}$, then we will say that the instrument values $z \in Z^*(t)$ target treatment value $t$; and we write $t \in T^*(z)$. We denote by $\mathcal{T}^*$ the set of targeted treatments and $\mathcal{Z}^* = \bigcup_{t \in \mathcal{T}^*} Z^*(t)$ the set of targeting instruments.

⁹ In the generalized Roy model (Example 3), it holds if the disutility of occupation 0 does not depend on the values of the instruments.
¹⁰ In that case one could define targeting with the function $\tilde{U}_z(t) = U_z(t) - U_z(0)$. We have not explored the consequences of this alternative definition.

Definition 2 calls for several remarks. First, Assumption 3 implies that $Z^*(0) = \mathcal{Z}$. Therefore $t = 0$ is not in $\mathcal{T}^*$; the set $\mathcal{T}^*$ may exclude other treatment values, however. If a treatment value $t$ is not targeted ($t \notin \mathcal{T}^*$), by definition the function $z \to U_z(t)$ is constant over $z \in \mathcal{Z}$, with value $\bar{U}(t)$. If an instrument value $z$ does not target any treatment ($z \notin \mathcal{Z}^*$), then $U_z(t) < \bar{U}(t)$ for every $t \in \mathcal{T}^*$. While non-targeted treatment values ($t \in \mathcal{T} \setminus \mathcal{T}^*$) have mean values that do not respond to changes in the instruments, these mean values may and in general will differ across treatments. The probability that an individual observation takes a treatment $t \in \mathcal{T} \setminus \mathcal{T}^*$ also generally depends on the value of the instrument.
It is important to note here that the values $U_z(t)$ and therefore the targeting maps $Z^*$ and $T^*$ are not observable; any assumption on targeting instruments and targeted treatments must be a priori and context-dependent. As we will see, these prior assumptions sometimes have consequences that can be tested.
Now suppose that each $z$ consists of a set of (possibly zero or negative) subsidies $S_z(t)$ for treatments $t \in \mathcal{T}$. If there is a no-subsidy regime $z = 0$ with $S_0(t) = 0$ for all $t$, it seems natural to write the mean value as $U_z(t) = U_0(t) + S_z(t)$. Then for any treatment $t$, the set $Z^*(t)$ consists of the instrument values $z$ that subsidize $t$ most heavily. As this illustration suggests, the sets $Z^*(t)$ may not be singletons, and they may well intersect. We now introduce a more restrictive definition that rules out these two possibilities.

Definition 3 (One-to-one targeting). Targeting is one-to-one when both $Z^*: \mathcal{T}^* \to \mathcal{Z}^*$ and $T^*: \mathcal{Z}^* \to \mathcal{T}^*$ are functions.
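
To make Definitions 2 and 3 concrete, the sketch below computes the targeting maps from a table of mean values and checks whether targeting is one-to-one. The baseline values `U0`, the subsidy schedule `S`, the second table `U_strict`, and the function names are all hypothetical; since the mean values are not observable in practice, this only illustrates the definitions.

```python
def targeting_maps(U):
    """Return (Z_star, T_star) for mean values U[z][t] with U[z][0] == 0.

    Z_star[t] is the set of maximizers of U[z][t] over z, recorded only when
    it is not all of Z (t is targeted). T_star[z] collects the treatments
    targeted by z.
    """
    n_z, n_t = len(U), len(U[0])
    Z_star = {}
    for t in range(n_t):
        best = max(U[z][t] for z in range(n_z))
        maximizers = {z for z in range(n_z) if U[z][t] == best}
        if len(maximizers) < n_z:
            Z_star[t] = maximizers
    T_star = {}
    for t, zs in Z_star.items():
        for z in zs:
            T_star.setdefault(z, set()).add(t)
    return Z_star, T_star

def is_one_to_one(Z_star, T_star):
    """Both targeting maps are functions: every image is a singleton."""
    return (all(len(zs) == 1 for zs in Z_star.values())
            and all(len(ts) == 1 for ts in T_star.values()))

# Hypothetical subsidy example: z = 0 is the no-subsidy regime.
U0 = [0.0, -1.0, -2.0]
S = [[0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 1.0, 2.0]]
U = [[U0[t] + S[z][t] for t in range(3)] for z in range(3)]
Z_star, T_star = targeting_maps(U)

# A second hypothetical table in which each subsidy favors one treatment.
U_strict = [[0.0, -1.0, -2.0],
            [0.0,  0.0, -2.0],
            [0.0, -1.0,  0.0]]
Z_strict, T_strict = targeting_maps(U_strict)
```

In the first table, treatment 1 is subsidized equally under z = 1 and z = 2, and z = 2 also subsidizes treatment 2, so the maximizer sets are not all singletons and they intersect. The second table gives each targeted treatment its own instrument value.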

Under one-to-one targeting, we will often write “$z = t$” if $z$ targets $t$; this is without loss
of generality. Let us illustrate these varieties of targeting on Example 2.

Table 1: Values of $U_z(t)$ in the $3 \times 3$ model

          t = 0   t = 1   t = 2
z = 0       0       a       d
z = 1       0       b       e
z = 2       0       c       f

Example 2 continued. Table 1 shows the values of $U_z(t)$ in the $3 \times 3$ model of Example 2. Suppose that $t = 1$ is targeted; choose some $z$ that targets it and relabel it as $z = 1$. This means that

$$b \geq \max(a, c) \quad \text{and} \quad b > \min(a, c).$$

If $t = 2$ is also targeted by some $z \neq 1$, we relabel this instrument value as $z = 2$. This gives

$$f \geq \max(d, e) \quad \text{and} \quad f > \min(d, e).$$

Finally, if targeting is one-to-one we have $b > \max(a, c)$ and $f > \max(d, e)$.

2.1 Consequences of One-to-One Targeting


In this subsection, we impose

Assumption 4 (One-to-one Targeting). Targeting is one-to-one.

Remember that under Assumption 4, we can relabel instrument values so that if $t$ is targeted, then it is targeted by $z = t$. Moreover, $t^*(z)$ must equal $z$.
This implies some useful restrictions on response-groups.

Proposition 1 (Response-groups under one-to-one targeting). Under Assumptions 1, 2, and 4, take a targeted treatment $t \in \mathcal{T}^*$.

(i) If an observation $i$ has $T_i(t) = 0$, then it never receives treatment $t$: $T_i(z) \neq t$ for all $z \in \mathcal{Z}$.

(ii) As a consequence, all response-groups $C_R$ with $R(t) = 0$ and $R(z) = t$ for some $z \neq t$ are empty.
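
One way to see why part (i) holds is the following sketch; it is not the proof given in the Appendices. It assumes that instrument values have been relabeled so that $z = t$ targets $t$, and it uses the normalization $U_z(0) = 0$ from Assumption 3.

```latex
\begin{aligned}
T_i(t) = 0 \;&\Longrightarrow\; U_t(0) + u_{i0} \geq U_t(t) + u_{it}
   \;\Longleftrightarrow\; u_{i0} \geq U_t(t) + u_{it}, \\
z \neq t \;&\Longrightarrow\; U_z(t) < U_t(t)
   \;\Longrightarrow\; U_z(t) + u_{it} < U_t(t) + u_{it} \leq u_{i0} = U_z(0) + u_{i0}.
\end{aligned}
```

Under any $z \neq t$, treatment 0 therefore yields strictly more than treatment $t$ for this observation, so $T_i(z) \neq t$; for $z = t$ the conclusion holds by hypothesis.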

Example 2 (continued) Return to the $3 \times 3$ model and to Table 1. Suppose that both $t = 1$ and $t = 2$ are targeted. Under the conditions of Proposition 1, we have $b > \max(a, c)$ and $f > \max(d, e)$.
Since the points $P_z$ have coordinates $(-U_z(1), -U_z(2))$,

• $P_1 = (-b, -e)$ must lie to the left of $P_0 = (-a, -d)$ and of $P_2 = (-c, -f)$,

• $P_2$ must lie below $P_0$ and $P_1$.

This is easily rephrased in terms of the response-vectors of Definition 1. First note that in the $3 \times 3$ case, there are a priori $3^3 = 27$ response-vectors, $R = 000$ to $R = 222$, with corresponding response-groups $C_{000}$ to $C_{222}$. Groups $C_{ddd}$ are “always-takers”¹¹ of treatment value $d$. All other groups are “compliers” of some kind, in that their treatment changes under some changes in the instrument. We will also pay special attention to some non-elemental groups. For instance, $C_{0*2}$ will denote the group who is assigned treatment 0 under $z = 0$ and treatment 2 under $z = 2$, and any treatment under $z = 1$. That is,

$$C_{0*2} = C_{002} \cup C_{012} \cup C_{022}.$$

¹¹ Observations in group $C_{000}$ are usually called the “never-takers”. We prefer not to break the symmetry in our notation. We hope this will not cause confusion.

Assumptions 2 and 4 together imply the emptiness of four composite groups out of the 27 possible. For any treatment value $\tau$, Proposition 1(ii) rules out group $C_{10\tau}$ since this group has $R(1) = 0$ and $R(0) = 1$. It rules out $C_{\tau 01}$ as $R(1) = 0$ and $R(2) = 1$. This eliminates the composite groups $C_{10*}$ and $C_{*01}$. The same argument applies to composite groups $C_{*20}$ and $C_{2*0}$, which have $R(2) = 0$ and $R(1) = 2$ or $R(0) = 2$.
These four composite groups correspond to 10 elemental groups12. This still leaves us with 17 elemental groups, and potentially complex assignment patterns. Consider for instance Figure 2. It shows one possible configuration for the $3 \times 3$ model; the positions for $P_0$, $P_1$ and $P_2$ are consistent with Assumptions 2 and 4.
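The count of empty and surviving elemental groups can be checked by brute force. The following is a minimal sketch, assuming the relabeling in which $z = 1$ targets $t = 1$ and $z = 2$ targets $t = 2$; the function name `ruled_out` is ours and only encodes the condition in Proposition 1(ii).

```python
from itertools import product

TREATMENTS = (0, 1, 2)  # instrument values share these labels after relabeling
TARGETED = (1, 2)       # assumption: z = t targets treatment t for t in {1, 2}

def ruled_out(r):
    """True if r = (R(0), R(1), R(2)) has R(t) = 0 and R(z) = t for some z != t."""
    return any(
        r[t] == 0 and any(r[z] == t for z in TREATMENTS if z != t)
        for t in TARGETED
    )

all_vectors = list(product(TREATMENTS, repeat=3))
empty = sorted("".join(map(str, r)) for r in all_vectors if ruled_out(r))
remaining = sorted("".join(map(str, r)) for r in all_vectors if not ruled_out(r))

print(len(all_vectors), len(empty), len(remaining))
print("empty groups:", empty)
```

The enumeration reproduces the ten elemental groups listed in footnote 12 and leaves 17 candidates, before any strict-targeting restriction is imposed.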

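Figure 2: A $3 \times 3$ example

[Figure: response-groups plotted in the plane with horizontal axis $u_{i1} - u_{i0}$ and vertical axis $u_{i2} - u_{i0}$. The points $P_0$, $P_1$, and $P_2$ delimit the regions $C_{000}$, $C_{010}$, $C_{110}$, $C_{111}$, $C_{002}$, $C_{012}$, $C_{202}$, $C_{112}$, $C_{212}$, and $C_{222}$.]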

The number of distinct response-groups (ten in this case) and the contorted shape of the
C212 and C112 groups in Figure 2 point to the difficulties we face in identifying response-
groups without further assumptions. Moreover, this is only one possible configuration: other
cases exist, which would bring up other response-groups.
Heckman and Pinto (2018, pp. 16–20), Pinto (2021), and Kirkeboen, Leuven, and Mogstad
(2016) also studied the 3 ˆ 3 model; they proposed sets of assumptions that identify some
treatment effects. The example in Heckman and Pinto (2018, pp. 16–20) is rather specific.
12
Specifically, they are: $C_{100}$, $C_{101}$, $C_{102}$, $C_{001}$, $C_{201}$, $C_{020}$, $C_{120}$, $C_{220}$, $C_{200}$, and $C_{210}$.
We show in Appendix E how to apply our framework to the Moving to Opportunity ex-
periment studied in Pinto (2021). The setup in Kirkeboen, Leuven, and Mogstad (2016) is
most similar to ours; we will return to the differences between our approach and theirs in
Section 3.3.

2.2 Strict Targeting


Figure 2 suggests that if we could make sure that $P_1$ is directly to the left of $P_0$, the shape of $C_{212}$ would become nicer, and group $C_{202}$ would be empty. Bringing $P_2$ directly under $P_0$ would have a similar effect. This translates directly into assumptions on the dependence of the $U_z(t)$ on the instruments: the first one imposes $d = e$ and the second one imposes $a = c$. This can be interpreted as policy regime $z = 1$ (resp. $z = 2$) subsidizing treatment $t = 1$ (resp. $t = 2$) only. To return to the general model, there are applications in which the instruments $z \in \mathcal{Z}^*(t)$, which maximize $U_z(t)$, do not shift assignment between the other values of the treatment. The following definition is a direct extension of this discussion.

Definition 4 (Strict Targeting of Treatment $t$). Take any targeted treatment value $t \in \mathcal{T}^*$. It is strictly targeted if the function $z \in \mathcal{Z} \to U_z(t)$ takes the same value for all instruments that do not target $t$ (the values $z \notin \mathcal{Z}^*(t)$). We denote this common value by $\underline{U}(t)$, and we will say of the instrument values $z \in \mathcal{Z}^*(t)$ that they strictly target $t$.

Suppose for instance that the data come from a randomized experiment, where the instrument value $z = t$ targets treatment $t$. If compliance is imperfect, an individual will trade off the benefits from switching to a treatment $t' \neq t$ with the costs of the effort required. Strict targeting obtains when the costs of switching to $t'$ do not depend on the value of $t$.
Under strict targeting, turning on instrument z P Z ˚ ptq promotes treatment t without
affecting the mean values Uz pt1 q of other treatment values t1 . This explains our use of the
term “strict targeting”. In this ARUM specification, an instrument in Z ˚ ptq plays the same
role as a price discount on good t in a model of demand for goods whose mean values only
depend on their own prices. In the language of program subsidies, all z P Z ˚ ptq subsidize t
at the same high rate, and all other instrument values offer the same, lower subsidy.
Note that strict targeting only bites if $\mathcal{Z}$ contains at least three instrument values. If $|\mathcal{Z}| = 2$ (one binary instrument, as in our Example 1) and say $z = 1$ targets $t$, then $\mathcal{Z} \setminus \mathcal{Z}^*(t)$ can only consist of $z = 0$ and Assumption 5 trivially holds.
Finally, we should emphasize that one-to-one targeting and strict targeting are logically independent assumptions: neither one implies the other. Consider the $3 \times 3$ model of Example 2 under one-to-one targeting; strict targeting only holds for $t = 1$ if $a = c$, and for $t = 2$ if $d = e$. On the other hand, the $3 \times 3$ model with $b > a = c$ and $e > d = f$ satisfies strict targeting but not one-to-one targeting, as $z = 1$ targets both $t = 1$ and $t = 2$.

2.3 Consequences of Strict Targeting


Now consider the general model. If a treatment $t$ is strictly targeted, then $U_z(t)$ can only take one of two values: $\bar{U}(t)$ if $z$ targets $t$, and $\underline{U}(t)$ otherwise. By definition, if $t$ is not targeted then the value of $U_z(t)$ does not depend on $z$; we also denote it $\underline{U}(t)$. We will assume in this subsection that all targeted treatments are strictly targeted:

Assumption 5 (Strict targeting). If t is in T ˚ , then t is strictly targeted.

Under strict targeting, the values of Uz ptq are given in Table 2.

Table 2: Values of $U_z(t)$ under strict targeting

                              $t \in \mathcal{T}^*(z)$    $t \notin \mathcal{T}^*(z)$
$z \in \mathcal{Z}^*$         $\bar{U}(t)$                $\underline{U}(t)$
$z \notin \mathcal{Z}^*$      -                           $\underline{U}(t)$

Consider an observation $i$ under strict targeting. If it is assigned an instrument value $z$, it can end up with one of the treatment values $t$ that $z$ targets (if any), with a value $\bar{U}(t) + u_{it}$ in the ARUM. Alternatively, if its treatment is some $t'$ that $z$ does not target, then the ARUM value will be $\underline{U}(t') + u_{it'}$. This motivates the following definition.

Definition 5 (Top targeted and top alternative treatments). Take any observation $i$ in the population.

(i) For any targeting instrument $z \in \mathcal{Z}^*$, let

$$V_i^*(z) = \max_{t \in \mathcal{T}^*(z)} \left( \bar{U}(t) + u_{it} \right)$$

and $T_i^*(z) \subset \mathcal{T}^*(z)$ denote the set of maximizers. We call the elements of $T_i^*(z)$ the top targeted treatments for observation $i$ under instrument value $z$.

(ii) Also define

$$\underline{V}_i = \max_{t \in \mathcal{T}} \left( \underline{U}(t) + u_{it} \right)$$

and let $\underline{T}_i \subset \mathcal{T}$ denote the set of maximizers. We call the elements of $\underline{T}_i$ the top alternative treatments for observation $i$.

(iii) The sets $T_i^*(z)$ and $\underline{T}_i$ are singletons13 with probability 1; we let $t_i^*(z)$ and $\underline{t}_i$ denote the top targeted treatment and the top alternative treatment.

The term “top alternative treatment” may read like a misnomer since the maximiza-
tion runs over all treatment values. The following result justifies it; more importantly, it
shows that strict targeting imposes a lot of structure on the mapping from instruments to
treatments.

Proposition 2 (Response groups under strict targeting). Let Assumptions 1, 2, 3, and 5 hold. Let $i$ be any observation in the population. For any instrument value $z$, $T_i(z)$ is either the top targeted treatment or the top alternative treatment. If $z$ is not a targeting instrument, $T_i(z)$ can only be the top alternative treatment. That is:

(i) if $z \in \mathcal{Z}^*$, then $T_i(z)$ is $t_i^*(z)$ if $V_i^*(z) > \underline{V}_i$; if $V_i^*(z) < \underline{V}_i$, then $T_i(z) = \underline{t}_i$ and $\underline{t}_i$ is not targeted by $z$.

(ii) if $z \notin \mathcal{Z}^*$, then $T_i(z)$ is $\underline{t}_i$.

Note that in a sense, all instrument values in $\mathcal{Z} \setminus \mathcal{Z}^*$ are equivalent under strict targeting. If $z$ and $z'$ are both in $\mathcal{Z} \setminus \mathcal{Z}^*$, then the functions $U_z$ and $U_{z'}$ coincide on all of $\mathcal{T}$ and the counterfactual treatments $T_i(z)$ and $T_i(z')$ must be in $\underline{T}_i$ for any observation $i$.
In the 2 ˆ 2 model, we have Ū p1q “ U1 p1q and U p0q “ Up1q “ 0. A complier is an
observation i P C01 ; it is in treatment arm t “ 0 when z “ 0 and in t˚ p1q “ 1 when z “ 1. In
our more general model, it seems natural to define a t-complier as an observation i that is
in treatment arm t when assigned an instrument value z such that t˚i pzq “ t, and only then.
This is, clearly, a composite group. Take the 3 ˆ 3 model as an example, and assume that
t˚ p1q “ 1. Then the set of 1-compliers consists of the five response-groups C010 , C012 , C111 ,
C112 , and C212 .

2.4 Strict one-to-one targeting


We now impose one-to-one targeting (Assumption 4) as well as strict targeting. Under one-
to-one targeting, the sets Z ˚ ptq and T ˚ pzq are singletons; and each targeting instrument z
can be relabeled as the treatment value t “ t˚i pzq that it targets.

Corollary 1 (Treatment assignment under strict, one-to-one targeting). Take any observation $i$. Let $A_i$ be the (possibly empty) subset of $t \in \mathcal{T}^*$ such that $T_i(t) = t$. Then under Assumptions 1 to 5,

1. $T_i(t) = \underline{t}_i$ for all $t \in \mathcal{T} \setminus A_i$;

2. if $\underline{t}_i$ is a targeted treatment, it must belong to $A_i$.

The pair $(A_i, \underline{t}_i)$ defines an elemental response group which we denote $C(A_i, \underline{t}_i)$. The family of sets $\{C(A, t) \mid A \subset \mathcal{T}^*, t \notin \mathcal{T}^* \setminus A\}$ forms a partition of the set of observations.

13
Note that this follows from our assumption that the distribution of the random vector $(u_{it})_{t \in \mathcal{T}}$ is absolutely continuous; however, it does not extend to the sets $\mathcal{Z}^*(t)$ and $\mathcal{T}^*(z)$, which can still have several elements.

Note that the CpA, tq notation is just a shortcut: every CpA, tq is an elemental group,
and every elemental group is a CpA, tq. If for instance |T | “ 6, it is just more convenient to
write Cpt1, 3u, 2q than to write C212322 .
If $\underline{t}_i \notin \mathcal{T}^*$ and the set $A_i$ is non-empty, then the observation $i$ is what one could call a strict $A_i$-complier: when the value of the instrument moves from $\mathcal{Z} \setminus A_i$ to $t \in A_i$, observation $i$ switches from its top alternative treatment $\underline{t}_i$ to the treatment $t$. In the 3-by-3 model with $\mathcal{T}^* = \{1, 2\}$, there are three groups of strict compliers: $C_{010} = C(\{1\}, 0)$, $C_{002} = C(\{2\}, 0)$, and $C_{012} = C(\{1, 2\}, 0)$.
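The correspondence between the $C(A, t)$ notation and response-vectors is mechanical, as the following sketch illustrates. It assumes that instrument values carry the same labels as treatment values (as in the relabeling above); the helper names `response_vector` and `elemental_groups` are ours.

```python
from itertools import combinations

def response_vector(A, t_bar, instruments):
    """Response vector of C(A, t_bar): R(z) = z if z is in A, else t_bar (Corollary 1)."""
    return "".join(str(z if z in A else t_bar) for z in instruments)

def elemental_groups(treatments, targeted):
    """All pairs (A, t_bar) with A a subset of the targeted treatments and
    t_bar either non-targeted or an element of A."""
    pairs = []
    for k in range(len(targeted) + 1):
        for A in combinations(targeted, k):
            for t_bar in treatments:
                if t_bar not in targeted or t_bar in A:
                    pairs.append((set(A), t_bar))
    return pairs

# The |T| = 6 illustration from the text.
six = response_vector({1, 3}, 2, range(6))

# The 3-by-3 model with targeted treatments {1, 2}.
T = (0, 1, 2)
groups_3x3 = sorted(response_vector(A, t, T) for A, t in elemental_groups(T, (1, 2)))
strict_compliers = sorted(
    response_vector(A, t, T)
    for A, t in elemental_groups(T, (1, 2))
    if A and t not in (1, 2)   # non-empty A and non-targeted top alternative
)
print(six, groups_3x3, strict_compliers)
```

In the 3-by-3 case the enumeration returns eight elemental groups, of which three are strict compliers.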
Strict one-to-one targeting brings us very close to the main identifying assumption in Heckman and Vytlacil (2007b, Assumption B-2a, p. 5006): the indicator variable $\mathbb{1}(Z = t)$ can be used as the $Z^{[t]}$ in their assumption. Heckman and Vytlacil use their Assumption B-2a to identify the effect of the preferred treatment $t$ relative to the next-best treatment. Their complier group consists of those individuals who choose treatment $t$ under $Z = z$ and another treatment under $Z = z'$. This can be a very heterogeneous group, as our examples will show. To paraphrase Heckman and Vytlacil (2007b, p. 5013): the mean effect of treatment $t$ versus the next best option is a weighted average over $t' \in \mathcal{T} \setminus \{t\}$ of the effect of treatment $t$ versus treatment $t'$, conditional on $t'$ being the next best option, weighted by the probability that $t'$ is the next best option. In contrast, we seek a complete characterization of all treatment effects that can be identified under this set of assumptions.

3 Identification
Now that we have characterized response-groups, we seek to identify the probabilities of the corresponding response-groups in the treatment model. Let $P(t|z) \equiv \Pr(T_i = t \mid Z_i = z)$ denote the generalized propensity score. Under strict, one-to-one targeting, the response-groups are easily enumerated.

Proposition 3 (Counting response-groups under strict, one-to-one targeting). Suppose that $p$ treatment values are targeted and $q$ are not. Under Assumptions 1 to 5, the number of response-groups is $N \equiv (p + 2q) \times 2^{p-1}$.

As the probabilities of the response-groups must sum to one, we have $(N - 1)$ unknowns. The data gives us the generalized propensity scores $P(t|z)$ for $(t, z) \in \mathcal{T} \times \mathcal{Z}$. The adding-up constraints $\sum_{t \in \mathcal{T}} P(t|z) = 1$ for each $z \in \mathcal{Z}$ reduce the count of independent data points to $(|\mathcal{T}| - 1) \times |\mathcal{Z}| = (p + q - 1)(p + 1)$.
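The two counts are easy to tabulate. The sketch below assumes $p \geq 1$, $|\mathcal{T}| = p + q$ and $|\mathcal{Z}| = p + 1$ as in the formula above; the brute-force count sums $q + |A|$ admissible top alternatives over all subsets $A$ of the targeted treatments. Function names are ours, and the choice $|\mathcal{T}| = 4$ for Example 1 is purely illustrative.

```python
from math import comb

def count_groups(p, q):
    """Closed form of Proposition 3: N = (p + 2q) * 2**(p - 1), for p >= 1."""
    return (p + 2 * q) * 2 ** (p - 1)

def count_groups_brute(p, q):
    """Sum over subset sizes k of (number of subsets A) * (q + k top alternatives)."""
    return sum(comb(p, k) * (q + k) for k in range(p + 1))

def count_equations(p, q):
    """Independent propensity scores: (|T| - 1) * |Z| = (p + q - 1) * (p + 1)."""
    return (p + q - 1) * (p + 1)

rows = {}
for label, p, q in [("LATE", 1, 1), ("Example 1 with |T| = 4", 1, 3), ("Example 2", 2, 1)]:
    unknowns = count_groups(p, q) - 1
    equations = count_equations(p, q)
    rows[label] = (unknowns, equations, unknowns - equations)
    print(label, rows[label])
```

The last entry of each tuple is the number of additional restrictions required for point identification of the group probabilities.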

Table 3: Identifying the sizes of the response groups under strict, one-to-one targeting

             $\mathcal{T}$                        $p$   $q$                   Unknowns                 Equations                Required
LATE         $\{0, 1\}$                           1     1                     2                        2                        0
Example 1    $\{0, 1, \ldots, |\mathcal{T}|-1\}$  1     $|\mathcal{T}| - 1$   $2(|\mathcal{T}| - 1)$   $2(|\mathcal{T}| - 1)$   0
Example 2    $\{0, 1, 2\}$                        2     1                     7                        6                        1

Table 3 shows the number of equations and the number of unknowns $(N - 1)$ for three examples. The first row has $|\mathcal{T}| = |\mathcal{Z}| = 2$; it generates the standard LATE case, where the response groups consist of never-takers ($C_{00}$), compliers ($C_{01}$), and always-takers ($C_{11}$). The second row is another case of exact identification. The third row shows that one restriction is required to identify the sizes of the response-groups for the $3 \times 3$ model. More generally, the degree of underidentification increases exponentially with the number of targeted treatments $p$. The probabilities of the different groups are linked to the generalized propensity scores by a system of linear equations. Under strict, one-to-one targeting, this system takes a simple form, as mentioned in Section 3.

Proposition 4 (Identifying equations for group sizes under strict, one-to-one targeting). Under Assumptions 1 to 5, the generalized propensity scores satisfy the following system of equations, for all $(z, t) \in \mathcal{Z} \times \mathcal{T}$:

$$P(t|z) = \sum_{A \subset \mathcal{T}^* \setminus \{z\}} \mathbb{1}(t \in \bar{A}) \Pr(i \in C(A, t)) + \sum_{A \subset \mathcal{T}^*} \mathbb{1}(t = z \in A) \sum_{\tau \in \bar{A}} \Pr(i \in C(A, \tau)) \tag{3.1}$$

where we denote $\bar{A} \equiv (\mathcal{T} \setminus \mathcal{T}^*) \cup A$.

While this may look cryptic, it is directly related to Corollary 1: the first line corresponds to $z \in \mathcal{T} \setminus A_i$ and $t = \underline{t}_i$, and the second line corresponds to $t = z \in A_i$. The set $\bar{A}_i \equiv (\mathcal{T} \setminus \mathcal{T}^*) \cup A_i$ contains the non-targeted treatments and those for which $T_i(t) = t$.

To simplify the exposition, we introduce one more element of notation. For any $z \in \mathcal{Z}$ and $t \in \mathcal{T}$, we define the conditional average outcome by $\bar{E}_z(t) \equiv E(Y_i \mathbb{1}(T_i = t) \mid Z_i = z)$. For any response-group $C$ and treatment value $t \in \mathcal{T}$, we define the group average outcome as $E(Y_i(t) \mid i \in C)$. While the conditional average outcomes $\bar{E}_z(t)$ are directly identified from the data, the group average outcomes of course are not. We do know that some of them are zero; and that they combine with the group probabilities to form the conditional average outcomes. We will repeatedly use the following identity from Heckman and Pinto (2018, Theorem T-1):

Lemma 1 (Group- and conditional average outcomes: Theorem T-1 of Heckman and Pinto (2018)). Let $z \in \mathcal{Z}$ and $t \in \mathcal{T}$. Then

$$\bar{E}_z(t) = \sum_{C = C_R \,\mid\, R(z) = t} E(Y_i(t) \mid i \in C) \Pr(i \in C).$$

In addition,

$$E(Y_i \mid Z_i = z) = \sum_{t \in \mathcal{T}} \bar{E}_z(t).$$

Under strict, one-to-one targeting, the set of response-groups $C = C_R$ such that $R(z) = t$ is as enumerated in Proposition 4: it consists of

• all $C(A, t)$ such that $A \subset \mathcal{T}^* \setminus \{z\}$ and $t \in \bar{A}$;

• and, if $t = z$, all $C(A, \tau)$ for $z \in A \subset \mathcal{T}^*$ and $\tau \in \bar{A}$.

The combination of Lemma 1 and of either Proposition 2 (under strict targeting) or Proposition 4 (under strict, one-to-one targeting) does not exhaust the empirical content of the model. A succession of papers14 has given necessary and sometimes sufficient conditions for data to be rationalized under an instrument exclusion restriction. Most recently, Bai and Tabord-Meehan (2024) characterized the sharp testable implications of ARUM under joint independence15.
To simplify the exposition, we will state our results in terms of effects of the treatment on the expectation of the outcomes; they hold, however, for any measurable function of the outcome $f(Y)$. Note that if we chose $f(Y) = \mathbb{1}(Y \leq t)$ for some value $t$, we would identify the effects of the treatment on the cumulative distribution function of the outcome. By inversion, we would recover the quantile treatment effects.

3.1 Positive Selection


We will sometimes make use of an identifying assumption that we call positive selection. It obtains when for some treatment value $t$ and response groups $C \neq C'$ that sometimes choose $t$,16 we have $E(Y_i(t) \mid i \in C) \leq E(Y_i(t) \mid i \in C')$. The identifying power of positive selection depends on the context; we will illustrate it in Corollaries 2 and 3 as well as in our application to Head Start in Section 4. A slightly different definition would replace $Y_i(t)$ with a treatment effect $Y_i(t) - Y_i(t')$; we explore this variant in Corollary 4.

14
See Balke and Pearl (1997), Kitagawa (2015), Mourifié and Wan (2017), Kédagni and Mourifié (2020) and Sun (2023).
15
That is, when $Z_i$ is independent of $\{(Y_i(t), T_i(z)) : (t, z) \in \mathcal{T} \times \mathcal{Z}\}$.

3.2 The Binary Instrument Model


Recall that with a binary instrument, strict targeting is trivially satisfied. Under one-to-one
targeting, Proposition 4 can be applied directly to some of the rows of Table 3.

3.2.1 Identification Under One-to-one Targeting

The second row of Table 3 shows that the group probabilities are just identified in our Example 1 under strict, one-to-one targeting. Proposition 4 gives $2(|\mathcal{T}| - 1)$ independent equations: for $t \neq 1$,

$$P(t|0) = \Pr(i \in C(\emptyset, t)) + \Pr(i \in C(\{1\}, t)) \quad \text{and} \quad P(t|1) = \Pr(i \in C(\emptyset, t)).$$

Moreover, $C(\emptyset, t) = C_{tt}$ for $t \neq 1$ and $C(\{1\}, t) = C_{t1}$ for all $t$.


Note that when $z$ changes from 0 to 1, the only observations that change treatment are in $C_{t1}$ for $t \neq 1$. Since the corresponding $C_{1t}$ group is empty, there are no “two-way flows” and this model satisfies the unordered monotonicity property of Heckman and Pinto (2018). Proposition 5 gives explicit formulæ for the probabilities of all $(2|\mathcal{T}| - 1)$ response groups.

Proposition 5 (Response-group probabilities in Example 1 under one-to-one targeting). Under Assumptions 1 to 5, the following probabilities are identified:

$$\begin{aligned} \Pr(C_{11}) &= P(1|0), \\ \Pr(C_{tt}) &= P(t|1) \text{ for } t \neq 1, \\ \Pr(C_{t1}) &= P(t|0) - P(t|1) \text{ for } t \neq 1. \end{aligned} \tag{3.2}$$

Since $\Pr(C_{t1}) \geq 0$, the model has $(|\mathcal{T}| - 1)$ simple testable predictions: $P(t|0) \geq P(t|1)$ for $t \neq 1$. While all the response group probabilities are point-identified, only some group average outcomes are point identified without further restrictions, as shown by Proposition 6.

16
That is, for which PrpTi “ t|i P Cq and PrpTi “ t|i P C 1 q are nonzero.

Proposition 6 (Group average outcomes in Example 1 under one-to-one targeting). Under Assumptions 1 to 5, the following group average outcomes are point-identified:

$$\begin{aligned} E[Y_i(1) \mid i \in C_{11}] &= \frac{\bar{E}_0(1)}{P(1|0)}, \\ E[Y_i(t) \mid i \in C_{tt}] &= \frac{\bar{E}_1(t)}{P(t|1)} \text{ for } t \neq 1, \\ E[Y_i(t) \mid i \in C_{t1}] &= \frac{\bar{E}_0(t) - \bar{E}_1(t)}{P(t|0) - P(t|1)} \text{ for } t \neq 1. \end{aligned}$$

However, if $|\mathcal{T}| > 2$ the standard Wald estimator only identifies a convex combination of the LATEs on the complier groups $C_{t1}$:

$$\begin{aligned} \frac{E(Y_i \mid Z_i = 1) - E(Y_i \mid Z_i = 0)}{\Pr(T_i = 1 \mid Z_i = 1) - \Pr(T_i = 1 \mid Z_i = 0)} &= \frac{(\bar{E}_1(1) - \bar{E}_0(1)) - \sum_{t \neq 1} (\bar{E}_0(t) - \bar{E}_1(t))}{P(1|1) - P(1|0)} \\ &= \sum_{t \neq 1} \alpha_t E[Y_i(1) - Y_i(t) \mid i \in C_{t1}], \end{aligned} \tag{3.3}$$

where the weights $\alpha_t = \Pr\left(i \in C_{t1} \mid i \in \bigcup_{\tau \neq 1} C_{\tau 1}\right) = (P(t|0) - P(t|1)) / (P(1|1) - P(1|0))$ are identified, positive, and sum to 1. If $|\mathcal{T}| = 2$, we have $\alpha_0 = 1$ and the familiar LATE formula

$$E(Y_i(1) - Y_i(0) \mid i \in C_{01}) = \frac{E(Y_i \mid Z_i = 1) - E(Y_i \mid Z_i = 0)}{\Pr(T_i = 1 \mid Z_i = 1) - \Pr(T_i = 1 \mid Z_i = 0)}.$$

Proposition 6 shows that we only identify a known convex combination of the p|T | ´ 1q
LATEs17 . This formula is reminiscent of Angrist and Imbens (1995, Theorem 1), which deals
with a different model in which treatments are ordered. It is possible to re-derive our identi-
fication results in Propositions 5 and 6 using the general framework of Heckman and Pinto
(2018). We provide details in Appendix D.
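To see the decomposition (3.3) at work, one can build a small population from group probabilities and group average outcomes and apply Lemma 1. The sketch below does this for $|\mathcal{T}| = 3$ with exact fractions; all numbers are hypothetical and chosen only for illustration.

```python
from fractions import Fraction as F

# Each group: response vector (R(0), R(1)), probability, and the group
# average outcomes E[Y(t) | group] for the treatments the group ever takes.
groups = {
    "C00": ((0, 0), F(3, 10), {0: F(1)}),
    "C01": ((0, 1), F(2, 10), {0: F(1), 1: F(3)}),
    "C11": ((1, 1), F(2, 10), {1: F(4)}),
    "C22": ((2, 2), F(2, 10), {2: F(2)}),
    "C21": ((2, 1), F(1, 10), {2: F(2), 1: F(5)}),
}

def P(t, z):
    """Generalized propensity score P(t|z)."""
    return sum(pr for r, pr, _ in groups.values() if r[z] == t)

def E_bar(z, t):
    """Conditional average outcome E[Y 1(T = t) | Z = z], via Lemma 1."""
    return sum(pr * m[t] for r, pr, m in groups.values() if r[z] == t)

def mean_outcome(z):
    return sum(E_bar(z, t) for t in (0, 1, 2))

wald = (mean_outcome(1) - mean_outcome(0)) / (P(1, 1) - P(1, 0))

# Right-hand side of (3.3): identified weights times group-specific LATEs.
alpha = {t: (P(t, 0) - P(t, 1)) / (P(1, 1) - P(1, 0)) for t in (0, 2)}
late = {t: groups[f"C{t}1"][2][1] - groups[f"C{t}1"][2][t] for t in (0, 2)}
combo = sum(alpha[t] * late[t] for t in (0, 2))
print(wald, alpha, late, combo)
```

In this example the Wald ratio equals $7/3$, a weighted average of the two LATEs (2 and 3) with weights $2/3$ and $1/3$.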
So far, we only imposed restrictions on the process by which treatment values are assigned to observations; this is what Goff (2024b) calls an “outcome-agnostic” approach in that it only assumes that the instruments are excluded from the outcome equations. It is possible to bound the average treatment effects in a straightforward manner if we assume that the support of the outcomes is known and bounded. One could instead add restrictions to achieve point identification of average treatment effects for the compliers. Assuming that the ATEs are all equal is one obvious solution. Another one is to assume some degree of homogeneity of group average outcomes. Alternatively, we may consider weaker conditions under which the average treatment effects for the compliers are only partially identified. We explore these ideas below.

17
We use the term “LATEs” for the average treatment effects on the various complier groups. Throughout the remainder of the paper, we assume, as is standard, that probability differences appearing in the denominator of estimands are always nonzero.

3.2.2 Adding Identification Constraints

Consider the binary instrument model with $|\mathcal{T}| \geq 3$.

Beyond One-to-one Targeting First note that the probabilities of the response-groups can be identified under weaker restrictions than one-to-one targeting. Suppose for instance that $z = 1$ targets all treatment values $t \geq 1$: we have $U_1(t) > U_0(t)$ for all $t \geq 1$. Then the complier groups $C_{t0}$ for $t \geq 1$ must be empty. To see this, suppose that $T_i(0) = t \geq 1$. This implies $U_0(t) + u_{it} > U_0(0) + u_{i0} = u_{i0}$. Adding up these inequalities gives $U_1(t) + u_{it} > u_{i0}$, and $T_i(1)$ cannot be 0.
All other groups $C_{tt'}$ may exist. This leaves $|\mathcal{T}|(|\mathcal{T}| - 1)$ unknown group probabilities, which is $|\mathcal{T}|/2$ times more than the $2(|\mathcal{T}| - 1)$ propensity scores we observe. We need $(|\mathcal{T}| - 1)(|\mathcal{T}| - 2)$ additional constraints to point-identify all group probabilities.

Single-peaked Mean Utilities Now suppose that mean utilities are “single-peaked” in the sense that the function $t \to U_1(t) - U_0(t)$ is decreasing over $t = 1, \ldots, |\mathcal{T}| - 1$. This would be a reasonable assumption if $z = 1$ makes treatment $t = 1$ more attractive and the treatments $t > 1$ are ordered by their proximity to $t = 1$.
If this holds, then the same argument as above shows that the response groups $C_{tt'}$ must be empty when $t' > t \geq 1$. This eliminates $(|\mathcal{T}| - 1)(|\mathcal{T}| - 2)/2$ response groups; we have halved the number of additional identification constraints that we need.

Positive Selection The binary instrument model gives a first example of the power of the positive selection defined in Section 3.1. Take $\tau \neq 1$ and consider the complier groups $C_{\tau 1}$: they all have $t = 1$ when $z = 1$, but they shift to it from different treatment values $\tau$ under $z = 0$. Depending on the context, there may be a plausible reason to order the corresponding group average outcomes when $t = 1$. Suppose for instance that $|\mathcal{T}| = 3$, and that

$$E[Y_i(1) \mid i \in C_{01}] \leq E[Y_i(1) \mid i \in C_{21}]. \tag{3.4}$$

In this $2 \times 3$ model under one-to-one targeting, there are five response-groups, as illustrated in Figure 3. Proposition 6 shows that the Wald estimator only identifies

$$\alpha_0 E[Y_i(1) - Y_i(0) \mid i \in C_{01}] + (1 - \alpha_0) E[Y_i(1) - Y_i(2) \mid i \in C_{21}],$$

where $\alpha_0 = (P(0|0) - P(0|1)) / (P(1|1) - P(1|0))$ is point-identified. Corollary 2 shows that adding inequality (3.4) yields bounds on the corresponding LATEs.

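Figure 3: A $2 \times 3$ model with one targeted treatment

[Figure: response-groups plotted in the plane with horizontal axis $u_{i1} - u_{i0}$ and vertical axis $u_{i2} - u_{i0}$. The points $P_0$ and $P_1$ delimit the regions $C_{00}$, $C_{01}$, $C_{11}$, $C_{21}$, and $C_{22}$.]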

Corollary 2 (Positive selection and treatment effects in the $2 \times 3$ model under one-to-one targeting). If

$$E[Y_i(1) \mid i \in C_{01}] \leq E[Y_i(1) \mid i \in C_{21}], \tag{3.5}$$

then the local average treatment effects for $C_{01}$ and $C_{21}$ are partially identified:

$$\begin{aligned} E[Y_i(1) - Y_i(0) \mid i \in C_{01}] &\leq \frac{\bar{E}_1(1) - \bar{E}_0(1)}{P(1|1) - P(1|0)} - \frac{\bar{E}_0(0) - \bar{E}_1(0)}{P(0|0) - P(0|1)}, \\ E[Y_i(1) - Y_i(2) \mid i \in C_{21}] &\geq \frac{\bar{E}_1(1) - \bar{E}_0(1)}{P(1|1) - P(1|0)} - \frac{\bar{E}_0(2) - \bar{E}_1(2)}{P(2|0) - P(2|1)}. \end{aligned} \tag{3.6}$$

Moreover, (3.5) implies the following testable prediction:

$$\frac{\bar{E}_0(2) - \bar{E}_1(2)}{P(2|0) - P(2|1)} \geq \frac{\bar{E}_0(0) - \bar{E}_1(0)}{P(0|0) - P(0|1)}. \tag{3.7}$$

If (3.5) holds at equality, then the two statements in (3.6) and the testable prediction in (3.7) also become equalities, and the two LATEs are point-identified.

The bounds on the local average treatment effects for $C_{01}$ (an upper bound) and $C_{21}$ (a lower bound) may not be sharp; on the other hand, they are easy to estimate from sample averages. It is a topic for future research to obtain the sharp bounds and develop a corresponding method for estimation and inference.
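Since the bounds in (3.6) only involve conditional average outcomes and propensity scores, they can be computed from a handful of sample or population averages. The sketch below is one way to organize the computation; the function name, the dictionary conventions, and the numerical inputs are ours and purely hypothetical.

```python
from fractions import Fraction as F

def corollary2_bounds(P, E):
    """Bounds in (3.6). P[(t, z)] = P(t|z); E[(z, t)] = E-bar_z(t)."""
    # Average of Y(1) over the pooled complier groups C01 and C21.
    y1_pooled = (E[1, 1] - E[0, 1]) / (P[1, 1] - P[1, 0])
    y0_c01 = (E[0, 0] - E[1, 0]) / (P[0, 0] - P[0, 1])   # E[Y(0) | C01]
    y2_c21 = (E[0, 2] - E[1, 2]) / (P[2, 0] - P[2, 1])   # E[Y(2) | C21]
    return {
        "upper_bound_LATE_C01": y1_pooled - y0_c01,
        "lower_bound_LATE_C21": y1_pooled - y2_c21,
        "prediction_3_7_holds": y2_c21 >= y0_c01,
    }

# Hypothetical inputs: propensity scores P(t|z) and averages E-bar_z(t).
P = {(0, 0): F(1, 2), (1, 0): F(1, 5), (2, 0): F(3, 10),
     (0, 1): F(3, 10), (1, 1): F(1, 2), (2, 1): F(1, 5)}
E = {(0, 0): F(5, 10), (0, 1): F(8, 10), (0, 2): F(6, 10),
     (1, 0): F(3, 10), (1, 1): F(19, 10), (1, 2): F(4, 10)}
bounds = corollary2_bounds(P, E)
print(bounds)
```

In sample applications the same function would be fed estimated averages; inference on the resulting bounds is a separate matter that the sketch does not address.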

3.3 The 3 ˆ 3 Model
Let us now turn to the $3 \times 3$ model of Example 2, where $\mathcal{Z}^* = \mathcal{T}^* = \{1, 2\}$ and $\mathcal{Z} = \mathcal{T} = \{0, 1, 2\}$. We assume strict one-to-one targeting: for all of our results in this section, we impose Assumptions 1 to 5; $z = 1$ targets $t = 1$ and $z = 2$ targets $t = 2$.
The set $A$ in Corollary 1 can be $\emptyset$, $\{1\}$, $\{2\}$, or $\{1, 2\}$, with corresponding values of $t$ in $\{0\}$, $\{0, 1\}$, $\{0, 2\}$ or $\{0, 1, 2\}$ respectively. The set $C(\emptyset, 0)$ corresponds to the never-takers $C_{000}$. For $A = \{1\}$ we get $C_{010}$ and $C_{111}$, and for $A = \{2\}$ we get $C_{002}$ and $C_{222}$. Finally, with $A = \{1, 2\}$ we have $C_{012}$, $C_{112}$, and $C_{212}$.

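Figure 4: Strictly one-to-one targeted treatment in the $3 \times 3$ model

[Figure: response-groups plotted in the plane with horizontal axis $u_{i1} - u_{i0}$ and vertical axis $u_{i2} - u_{i0}$. The points $P_0$, $P_1$, and $P_2$ delimit the regions $C_{000}$, $C_{010}$, $C_{111}$, $C_{002}$, $C_{012}$, $C_{112}$, $C_{212}$, and $C_{222}$.]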

These eight elemental response groups are illustrated in Figure 4, again with the origin in
P0 . Comparing Figure 4 with Figure 2 shows the identifying power of Assumption 5. Table 4
shows which groups take Ti “ t when Zi “ z.

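Table 4: Response Groups of Example 2

           $T_i(z) = 0$                                          $T_i(z) = 1$                                                         $T_i(z) = 2$
$z = 0$    $C_{000} \cup C_{010} \cup C_{002} \cup C_{012}$      $C_{111} \cup C_{112}$                                               $C_{222} \cup C_{212}$
$z = 1$    $C_{000} \cup C_{002}$                                $C_{111} \cup C_{010} \cup C_{012} \cup C_{112} \cup C_{212}$        $C_{222}$
$z = 2$    $C_{000} \cup C_{010}$                                $C_{111}$                                                            $C_{222} \cup C_{002} \cup C_{012} \cup C_{112} \cup C_{212}$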

Unlike the $2 \times 3$ model, even under strict one-to-one targeting the $3 \times 3$ model does not satisfy unordered monotonicity. One could show it with the matrix algebra in Heckman and Pinto (2018).18 It is more straightforward to note that when the instrument value changes from $z = 1$ to $z = 2$, observations in $C_{010}$ move to treatment value 0, while observations in $C_{002}$ leave treatment 0. This is the definition of a two-way flow, which violates unordered monotonicity. Since the $3 \times 3$ model has three instrument values and only two targeted treatments, Bai, Huang, Moon, Shaikh, and Vytlacil (2024, Example 4.7) shows that it satisfies their weaker general monotonicity assumption. As a consequence, the average potential outcomes $E[Y_i(d)]$ can only be restricted by identification at infinity arguments.

18
See Appendix D for details.

3.3.1 Identification in the 3 ˆ 3 Model

We know from the third row of Table 3 that one restriction is missing to point-identify
the probabilities of all eight response-groups. The following proposition shows that the
probabilities of four of the eight elemental groups are point-identified: two groups of always-
takers, and two groups of compliers. The other four probabilities are constrained by three
adding-up constraints.

Proposition 7 (Response-group probabilities in the $3 \times 3$ model under strict, one-to-one targeting). The following four probabilities are identified: $\Pr(C_{111}) = P(1|2)$, $\Pr(C_{222}) = P(2|1)$, $\Pr(C_{112}) = P(1|0) - P(1|2)$, and $\Pr(C_{212}) = P(2|0) - P(2|1)$. The remaining four response group probabilities are partially-identified and can be parameterized as: $\Pr(C_{000}) = p$, $\Pr(C_{002}) = P(0|1) - p$, $\Pr(C_{010}) = P(0|2) - p$, and $\Pr(C_{012}) = P(0|0) - P(0|1) - P(0|2) + p$, where the unknown $p$ satisfies $\max\{0, P(0|1) + P(0|2) - P(0|0)\} \leq p \leq \min\{1, P(0|1), P(0|2)\}$.
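A short routine makes the split between point- and partially-identified probabilities concrete. The sketch below assumes propensity scores stored in a dictionary keyed by `(t, z)`; the function names and the numerical values are hypothetical.

```python
from fractions import Fraction as F

def identified_probabilities(P):
    """Point-identified group probabilities of Proposition 7; P[(t, z)] = P(t|z)."""
    return {
        "C111": P[1, 2],
        "C222": P[2, 1],
        "C112": P[1, 0] - P[1, 2],
        "C212": P[2, 0] - P[2, 1],
    }

def bounds_on_p(P):
    """Interval for p = Pr(C000)."""
    lower = max(0, P[0, 1] + P[0, 2] - P[0, 0])
    upper = min(1, P[0, 1], P[0, 2])
    return lower, upper

def remaining_probabilities(P, p):
    """The four partially-identified probabilities for a given value of p."""
    return {
        "C000": p,
        "C002": P[0, 1] - p,
        "C010": P[0, 2] - p,
        "C012": P[0, 0] - P[0, 1] - P[0, 2] + p,
    }

# Hypothetical propensity scores, keyed by (t, z).
P = {(0, 0): F(1, 2), (1, 0): F(1, 4), (2, 0): F(1, 4),
     (0, 1): F(3, 10), (1, 1): F(11, 20), (2, 1): F(3, 20),
     (0, 2): F(3, 10), (1, 2): F(3, 20), (2, 2): F(11, 20)}
point = identified_probabilities(P)
lower, upper = bounds_on_p(P)
at_lower = remaining_probabilities(P, lower)
print(point, (lower, upper), at_lower)
```

Every admissible $p$ in the interval yields eight non-negative probabilities that sum to one and reproduce the same propensity scores, which is why the data alone cannot pin $p$ down.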

As before, the model has the following testable implications: $P(1|1) \geq P(1|0) \geq P(1|2)$, $P(2|2) \geq P(2|0) \geq P(2|1)$, and $P(0|0) \geq \max(P(0|1), P(0|2))$. The following proposition identifies a number of group average outcomes19.

Proposition 8 (Group average outcomes in the $3 \times 3$ model under strict, one-to-one targeting). The following group average outcomes are point-identified:

$$\begin{aligned} E[Y_i(0) \mid i \in C_{000} \cup C_{002}] &= \frac{\bar{E}_1(0)}{P(0|1)}, & E[Y_i(0) \mid i \in C_{000} \cup C_{010}] &= \frac{\bar{E}_2(0)}{P(0|2)}, \\ E[Y_i(1) \mid i \in C_{111}] &= \frac{\bar{E}_2(1)}{P(1|2)}, & E[Y_i(2) \mid i \in C_{222}] &= \frac{\bar{E}_1(2)}{P(2|1)}, \\ E[Y_i(0) \mid i \in C_{010} \cup C_{012}] &= \frac{\bar{E}_0(0) - \bar{E}_1(0)}{P(0|0) - P(0|1)}, & E[Y_i(0) \mid i \in C_{002} \cup C_{012}] &= \frac{\bar{E}_0(0) - \bar{E}_2(0)}{P(0|0) - P(0|2)}, \\ E[Y_i(1) \mid i \in C_{010} \cup C_{012} \cup C_{212}] &= \frac{\bar{E}_1(1) - \bar{E}_0(1)}{P(1|1) - P(1|0)}, & E[Y_i(1) \mid i \in C_{112}] &= \frac{\bar{E}_0(1) - \bar{E}_2(1)}{P(1|0) - P(1|2)}, \\ E[Y_i(2) \mid i \in C_{002} \cup C_{012} \cup C_{112}] &= \frac{\bar{E}_2(2) - \bar{E}_0(2)}{P(2|2) - P(2|0)}, & E[Y_i(2) \mid i \in C_{212}] &= \frac{\bar{E}_0(2) - \bar{E}_1(2)}{P(2|0) - P(2|1)}. \end{aligned}$$

19
Again, these could also be derived using the general framework of Heckman and Pinto (2018), even though the unordered monotonicity assumption is not satisfied; see Appendix D.

By itself, Proposition 8 does not allow us to identify an average treatment effect for any (even composite) response-group. Suppose for instance that we want to identify $E(Y_i(1) - Y_i(0) \mid i \in C)$ for some group $C$. Then $C$ needs to exclude $C_{111}$, $C_{112}$, and $C_{212}$, since $E(Y_i(0) \mid i \in C')$ is not identified for any group $C'$ that contains $C_{111}$, $C_{112}$, or $C_{212}$. Since we only know the mean outcome of treatment 1 for groups that contain one of these three elemental groups, the conclusion follows.

3.3.2 Using Positive Selection

Note that if we assumed $E(Y_i(1) \mid i \in C_{112}) = E(Y_i(1) \mid i \in C_{212})$, then we could combine the two equations in the fourth displayed line of Proposition 8 and the probabilities of $C_{112}$ and $C_{212}$ (which are point-identified by Proposition 7) to obtain $E(Y_i(1) \mid i \in C_{010} \cup C_{012})$. This would point-identify the average effect of treatment 1 vs treatment 0 on this composite complier group $C_{01*}$. While this assumption may be overly strong, it seems natural to impose that $Y_i(\tau)$ is on average larger in a response group that has $t = \tau$ for more values of $z$. Assumption 6 formalizes this intuition in our setting.

Assumption 6 (Positive selection in the $3 \times 3$ model). Either or both of the following assumptions hold:

$$E[Y_i(1) \mid i \in C_{112}] \geq E[Y_i(1) \mid i \in C_{212}], \tag{3.8}$$

$$E[Y_i(2) \mid i \in C_{212}] \geq E[Y_i(2) \mid i \in C_{112}]. \tag{3.9}$$

Assumption 6 states a form of positive selection into treatment, as defined in Section 3.1. Consider Equation (3.8) for instance. It says that within the group of “12-compliers” $C_{*12} = C_{012} \cup C_{112} \cup C_{212}$, those observations with $T(0) = 1$ have a larger average counterfactual $Y(1)$ than those with $T(0) = 2$. Corollary 3 shows that this gives bounds on the local average treatment effects for $C_{01*}$-compliers, with a similar result for Equation (3.9) and $C_{0*2}$-compliers.

Corollary 3 (Identifying treatment effects in the $3 \times 3$ model).

1. Under (3.8), the local average treatment effect $E[Y_i(1) - Y_i(0) \mid i \in C_{01*}]$ is at least as large as

$$\frac{(\bar{E}_1(1) - \bar{E}_0(1)) - (\bar{E}_0(0) - \bar{E}_1(0))}{P(0|0) - P(0|1)} - \frac{\bar{E}_0(1) - \bar{E}_2(1)}{P(1|0) - P(1|2)} \cdot \frac{P(2|0) - P(2|1)}{P(0|0) - P(0|1)}.$$

2. Under (3.9), the local average treatment effect $E[Y_i(2) - Y_i(0) \mid i \in C_{0*2}]$ is at least as large as

$$\frac{(\bar{E}_2(2) - \bar{E}_0(2)) - (\bar{E}_0(0) - \bar{E}_2(0))}{P(0|0) - P(0|2)} - \frac{\bar{E}_0(2) - \bar{E}_1(2)}{P(2|0) - P(2|1)} \cdot \frac{P(1|0) - P(1|2)}{P(0|0) - P(0|2)}.$$

3. In both 1 and 2, “at least as large” can be replaced with “equals” if the corresponding inequality in Assumption 6 is an equality.

3.3.3 When is Positive Selection Plausible?

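Let us focus on (3.9). Given strict one-to-one targeting, $C_{112}$ is defined by

$$\underline{U}(1) - \bar{U}(2) \leq u_{i2} - u_{i1} \leq \underline{U}(1) - \underline{U}(2), \quad u_{i1} - u_{i0} \geq -\underline{U}(1).$$

$C_{212}$ is defined by

$$\underline{U}(1) - \underline{U}(2) \leq u_{i2} - u_{i1} \leq \bar{U}(1) - \underline{U}(2), \quad u_{i2} - u_{i0} \geq -\underline{U}(2).$$

To simplify notation, define $\zeta_i = u_{i2} - u_{i1}$ and $\xi_i = u_{i2} - u_{i0}$, so that $u_{i1} - u_{i0} = \xi_i - \zeta_i$. The inequalities above can be rewritten as

• for $C_{112}$: $\underline{U}(1) - \bar{U}(2) \leq \zeta_i \leq \underline{U}(1) - \underline{U}(2)$, $\xi_i - \zeta_i \geq -\underline{U}(1)$;

• for $C_{212}$: $\underline{U}(1) - \underline{U}(2) \leq \zeta_i \leq \bar{U}(1) - \underline{U}(2)$, $\xi_i \geq -\underline{U}(2)$.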

Figure 5 plots these two groups on the $\zeta_i \times \xi_i$ plane. Group $C_{212}$ corresponds to the top-right (infinite) rectangle and group $C_{112}$ is partitioned into the two subgroups: $C_{112}^{(i)}$ is a bottom-left triangle and $C_{112}^{(ii)}$ is a top-left (infinite) rectangle.

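Figure 5: Positive Selection in the Generalized $3 \times 3$ Roy model

[Figure: the $(\zeta_i, \xi_i)$ plane, with $\zeta_i$ on the horizontal axis and $\xi_i$ on the vertical axis, showing the regions $C_{112}^{(i)}$, $C_{112}^{(ii)}$, and $C_{212}$ and the points $P_a$, $P_b$, and $P_c$.]

Notes: $P_a = (\underline{U}(1) - \bar{U}(2), -\underline{U}(2))$, $P_b = (\underline{U}(1) - \underline{U}(2), -\underline{U}(2))$, and $P_c = (\bar{U}(1) - \underline{U}(2), -\underline{U}(2))$.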

Now suppose that

Assumption 7 (Positive Codependence). (1) If $A$ and $B$ are two measurable sets in the $(\zeta, \xi)$ plane such that

$$(\zeta, \xi) \in A \text{ and } (\zeta', \xi') \in B \implies (\zeta \leq \zeta' \text{ and } \xi \leq \xi'),$$

then $E(Y_i(2) \mid (\zeta_i, \xi_i) \in A) \leq E(Y_i(2) \mid (\zeta_i, \xi_i) \in B)$.

(2) If $A$ and $B$ are two measurable sets on the real line such that

$$\zeta \in A \text{ and } \zeta' \in B \implies \zeta \leq \zeta',$$

then $E(Y_i(2) \mid \zeta_i \in A, \xi_i \geq b) \leq E(Y_i(2) \mid \zeta_i \in B, \xi_i \geq b)$ for all $b$.

As shown in Figure 5, every point in $C_{112}^{(i)}$ has lower values of both $\zeta_i$ and $\xi_i$ than any point in $C_{212}$. Therefore, by Assumption 7(1), the expected value of $Y_i(2)$ in this triangle is smaller than $E(Y_i(2) \mid i \in C_{212})$. Every point in $C_{112}^{(ii)}$ has a smaller value of $\zeta_i$ than any point in $C_{212}$ (fixing the value of $\xi_i \geq -\underline{U}(2)$ on both sides). Assumption 7(2) implies that the expected value of $Y_i(2)$ in this rectangle again is smaller than $E(Y_i(2) \mid i \in C_{212})$. Combining these two inequalities gives $E(Y_i(2) \mid i \in C_{212}) \geq E(Y_i(2) \mid i \in C_{112})$, that is (3.9).
Assumption 7 seems weak. Because both ζi “ ui2 ´ ui1 and ξi “ ui2 ´ ui0 are increasing in
ui2 , this assumption aligns well with the concept of positive selection. Suppose for instance

28
that
EpYi p2q|ui0 , ui1, ui2 q ´ EYi p2q “ a0 ui0 ` a1 ui1 ` a2 ui2 ,

where a0 , a1 , and a2 are some constants, and that pui0 , ui1 , ui2q are jointly normal and
mutually uncorrelated with the common mean 0 and the common variance 1. We interpret
pui0 , ui1 , ui2q as the underlying primitive random variables that are normalized to have mean
0 and variance 1. It is easy to derive20 that

$$E(Y_i(2) \mid \zeta_i, \xi_i) - EY_i(2) = \frac{a_2 + a_0 - 2a_1}{3}\,\zeta_i + \frac{a_2 + a_1 - 2a_0}{3}\,\xi_i.$$

Hence, in this example, Assumption 7 holds if and only if $a_2 + a_0 \ge 2a_1$ and $a_2 + a_1 \ge 2a_0$.
In summary, a sufficiently large value of a2 induces positive selection, generating patterns
similar to those of comparative advantage in generalized Roy models.
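
The coefficients in this example can be checked numerically. The sketch below assumes only the setup stated above (independent standard normal $u_{i0}, u_{i1}, u_{i2}$, so that the conditional expectation is a linear projection); the function names and the sample values of $a_0, a_1, a_2$ are illustrative and do not come from the paper.

```python
import numpy as np

def projection_coefficients(a0, a1, a2):
    """Coefficients on (zeta, xi) in E[Y(2) | zeta, xi] - E[Y(2)].

    Assumes (u0, u1, u2) are independent N(0, 1), zeta = u2 - u1, xi = u2 - u0,
    and E[Y(2) | u] - E[Y(2)] = a0*u0 + a1*u1 + a2*u2.
    """
    a = np.array([a0, a1, a2], dtype=float)
    # Rows map (u0, u1, u2) into zeta and xi.
    L = np.array([[0.0, -1.0, 1.0],    # zeta = u2 - u1
                  [-1.0, 0.0, 1.0]])   # xi   = u2 - u0
    cov_zx = L @ L.T                   # Var(zeta, xi) = [[2, 1], [1, 2]]
    cov_y = L @ a                      # Cov((zeta, xi), a'u)
    return np.linalg.solve(cov_zx, cov_y)

def closed_form(a0, a1, a2):
    """The two coefficients stated in the text."""
    return np.array([(a2 + a0 - 2 * a1) / 3, (a2 + a1 - 2 * a0) / 3])

# Illustrative values with a relatively large a2.
coef = projection_coefficients(0.2, 0.5, 1.5)
```

With these illustrative values both coefficients are nonnegative, which is the case in which the example satisfies Assumption 7.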

3.4 What do the IV estimators identify in the 3 × 3 model?


Kirkeboen, Leuven, and Mogstad (2016, hereafter KLM) used a 3 ˆ 3 model to study the
impact of the field of study on later earnings. Their Proposition 2 characterizes what two-
stage least squares (TSLS) estimators identify under different sets of assumptions. The
least stringent version combines a monotonicity assumption (Assumption 4 in KLM) and
condition (iii) in their Proposition 2, which they call “irrelevance and information on next-
best alternatives”. “Irrelevance” is a set of exclusion restrictions, while “information on
next-best alternatives” assumes the availability of additional data.

3.4.1 Monotonicity and Irrelevance

While we take quite a different path, our strict one-to-one targeting assumption turns out
to yield exactly the same identifying restrictions as the combination of monotonicity and
irrelevance in KLM. We show it in Appendix C.
This set of assumptions in itself is too weak to give two-stage least squares estimates a
simple interpretation. To see this, let $\beta_1$ and $\beta_2$ be the probability limits of the coefficients
in a regression of $Y_i$ on the indicator variables $\mathbb{1}(T_i = 1)$ and $\mathbb{1}(T_i = 2)$, with instruments $Z_i$.
Remember from Table 4 that under strict one-to-one targeting, five response-groups have
$T(1) = 1$:

1. the always-takers C111 ;

2. the “intermediate” group C112 , which has T pzq “ 1 unless z “ 2;


20
See Appendix G for details.

3. the three groups $C_{010}$, $C_{012}$, and $C_{212}$, which have $T(z) = 1$ if and only if $z = 1$.

A similar distinction applies to the groups that have T p2q “ 2; it motivates Definition 6.

Definition 6 (1-compliers and 2-compliers). We call

$$C_1 = C_{010} \cup C_{012} \cup C_{212}$$

the 1-compliers group and

$$C_2 = C_{002} \cup C_{012} \cup C_{112}$$

the 2-compliers group.

The β1 and β2 coefficients turn out to be weighted averages of the LATEs on these two
groups and on the intermediate groups C112 and C212 .

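Proposition 9 (TSLS in the 3 × 3 model under strict, one-to-one targeting). The parameters
$\beta_1$ and $\beta_2$ satisfy

$$\begin{pmatrix} \Pr(i \in C_1) & -\Pr(i \in C_{212}) \\ -\Pr(i \in C_{112}) & \Pr(i \in C_2) \end{pmatrix} \begin{pmatrix} \beta_1 \\ \beta_2 \end{pmatrix} = \begin{pmatrix} E[\{Y_i(1) - Y_i(0)\}\mathbb{1}(i \in C_1)] - E[\{Y_i(2) - Y_i(0)\}\mathbb{1}(i \in C_{212})] \\ E[\{Y_i(2) - Y_i(0)\}\mathbb{1}(i \in C_2)] - E[\{Y_i(1) - Y_i(0)\}\mathbb{1}(i \in C_{112})] \end{pmatrix}.$$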

Proposition 9 implies that β1 and β2 are weighted averages of the four local average
treatment effects on the right-hand side of this system of two equations. The weights are
functions of the four probabilities on the left-hand side, which are point identified by Propo-
sition 7. However, these weights may be positive or negative. This complicates interpretation
further21 .

3.4.2 Additional Assumptions

Next-best alternatives. Using the additional information on next-best alternatives in KLM
amounts, in our notation, to dropping the “intermediate” response-groups $C_{212}$ and $C_{112}$ from
the data. Then the system of equations in Proposition 9 becomes diagonal and it yields

$$\beta_1 = E[Y_i(1) - Y_i(0) \mid i \in C_1],$$

$$\beta_2 = E[Y_i(2) - Y_i(0) \mid i \in C_2],$$
21
Mogstad, Torgovitsky, and Walters (2021) give a set of assumptions under which the weights are positive
in a model with multiple binary instruments.

where now $C_1$ reduces to $C_{010} \cup C_{012}$ and $C_2$ reduces to $C_{002} \cup C_{012}$. This is exactly Propo-
sition 2 (iii) of KLM. Alternatively, one may simply assume that the response-groups $C_{212}$
and $C_{112}$ are empty. This is the path taken by Bhuller and Sigstad (2024)22 .

Positive Selection Additional information of the type used by KLM often is not available.
Moreover, assuming away C112 and C212 seems rather strong. On the other hand, reasonable
assumptions can be used to generate bounds on the local average treatment effects for 1-
compliers and 2-compliers. Corollary 4 illustrates this.

Corollary 4 (TSLS in the 3 × 3 model under strict, one-to-one targeting). Assume that

(3.10) $D \equiv \Pr(i \in C_1)\Pr(i \in C_2) - \Pr(i \in C_{212})\Pr(i \in C_{112}) \neq 0.$

Let
$$D_1 \equiv E(Y_i(1) - Y_i(0) \mid i \in C_1) - E(Y_i(1) - Y_i(0) \mid i \in C_{112})$$

and
$$D_2 \equiv E(Y_i(2) - Y_i(0) \mid i \in C_2) - E(Y_i(2) - Y_i(0) \mid i \in C_{212}).$$

If $D_1 D_2 > 0$, then $\beta_1 - E(Y_i(1) - Y_i(0) \mid i \in C_1)$ and $\beta_2 - E(Y_i(2) - Y_i(0) \mid i \in C_2)$ have the sign
of $D$.

Note that the KLM result of the previous paragraph is the limit case where $D_1 = D_2 = 0$.
The regularity condition (3.10) ensures that the 2 × 2 matrix that premultiplies $(\beta_1, \beta_2)'$
in Proposition 9 is invertible23 . To interpret the assumptions on signs, suppose that $D_1$ is
positive. Since $C_1 = C_{010} \cup C_{012} \cup C_{212}$, the positivity of $D_1$ states that the average effect
of treatment 1 on $C_{010} \cup C_{012} \cup C_{212}$ is at least as large as on $C_{112}$. This is a form of
positive selection that is in the same spirit as (but different from) Assumption 6. If this form
of positive selection holds for both treatments, then the TSLS estimates overestimate the
LATEs on the corresponding compliers if $D > 0$, and they underestimate them if $D < 0$.
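
To make the system in Proposition 9 concrete, the following sketch solves it for $(\beta_1, \beta_2)$ and compares the solution with the LATEs on the 1- and 2-compliers. All probabilities and treatment effects below are hypothetical inputs chosen for illustration (with $D_1 > 0$ and $D_2 > 0$); they are not estimates from the paper, and the function name `tsls_limits` is ours.

```python
import numpy as np

def tsls_limits(p, late1, late2):
    """Solve the 2x2 system of Proposition 9 for (beta1, beta2).

    p:     probabilities of the groups C1, C2, C112, C212
    late1: E[Y(1) - Y(0) | group] for the groups C1 and C112
    late2: E[Y(2) - Y(0) | group] for the groups C2 and C212
    Returns the solution and the determinant D of condition (3.10).
    """
    M = np.array([[p["C1"], -p["C212"]],
                  [-p["C112"], p["C2"]]])
    # E[{Y(t) - Y(0)} 1(i in C)] = Pr(C) * E[Y(t) - Y(0) | C]
    rhs = np.array([
        p["C1"] * late1["C1"] - p["C212"] * late2["C212"],
        p["C2"] * late2["C2"] - p["C112"] * late1["C112"],
    ])
    return np.linalg.solve(M, rhs), np.linalg.det(M)

# Hypothetical inputs (assumptions for illustration only).
p = {"C1": 0.30, "C2": 0.25, "C112": 0.05, "C212": 0.10}
late1 = {"C1": 0.40, "C112": 0.10}   # D1 = 0.30 > 0
late2 = {"C2": 0.50, "C212": 0.20}   # D2 = 0.30 > 0

beta, D = tsls_limits(p, late1, late2)
bias = beta - np.array([late1["C1"], late2["C2"]])
```

In this hypothetical configuration $D > 0$ and both entries of `bias` are positive, in line with the overestimation case described above; setting the probabilities of $C_{112}$ and $C_{212}$ to zero makes the system diagonal and returns the two LATEs exactly.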
To summarize, the TSLS estimators in the 3 ˆ 3 model are difficult to interpret unless
additional information is available and/or some additional assumptions are imposed. If
the groups C112 and C212 are indeed empty, then both the TSLS estimators and those we
obtained in Corollary 3 should identify the LATEs on the 1- and 2-compliers. Comparing
their values is a useful (if informal) way of testing the assumptions and of exploring further
the heterogeneity of the treatment effects.
22
See their Corollary 5 and Table 1 for details.
23
It holds if $C_{212}$ and $C_{112}$ have positive probability and either $C_{010} \cup C_{012}$ or $C_{002} \cup C_{012}$ has positive
probability.

4 Empirical Example: The Head Start Impact Study
We now reexamine Kline and Walters’s (2016) analysis of the Head Start Impact Study
(HSIS) using our framework. We use exactly the same data as they did; we only apply
different identifying assumptions.
Head Start is a federal program in the US that addresses various factors affecting chil-
dren’s development in low-income families. It provides early childhood education (hereafter
“preschool”) and health and nutrition services. HSIS was a longitudinal study conducted
from 2002 to 2010 to assess the program’s impact on cognitive, social-emotional, and health
outcomes. It focused on 84 communities where the demand for Head Start services was larger
than the supply. HSIS randomly assigned about 5,000 three and four year old preschool chil-
dren to either a treatment group which was offered Head Start services, or a control group
which received no such offer. Children in either group could also attend other preschool
centers if offered a slot.
The structure of HSIS is identical to that of Example 1: it is a 2 × 3 model. The
treatments here consist of no preschool (n), Head Start (h), and other preschool centers (c):
$T = \{n, h, c\}$. The instrument is binary, with a control group ($z = 0$) and a group that is
offered admission to Head Start ($z = 1$). The outcome variable is test scores, measured in
standard deviations from their mean.
In the terminology of this paper, treatment assignment satisfies strict, one-to-one tar-
geting: strict targeting as the instrument is binary, and one-to-one targeting as z “ 1 only
targets Head Start24 . Figure 6 reproduces Figure 3 in this setting. In addition to the three
always-taker groups Cnn , Ccc , and Chh , there are two complier groups: Cnh , and Cch . In
Sections 4.1 and 4.2, we focus on the LATEs on the two complier groups Cnh and Cch . Sec-
tion 4.3 embeds the model into a larger, 3 ˆ 3 model in order to evaluate the marginal value
of the public funds used in Head Start.

4.1 Group proportions and counterfactual means


Our estimates of the proportions of the two complier groups in the sample use (3.2) in
Proposition 5; they are shown in Panel A of Table 5. As expected, they coincide with those
in Kline and Walters (2016).
Panel B of Table 5 shows the counterfactual means of test scores for the complier groups,
as per Proposition 6. While ErYi pnq|i P Cnh s is negative, ErYi pcq|i P Cch s is above 0.1 stan-
dard deviation—not a negligible value in this field. This suggests that some of the children
24
Kamat (2024) analyzes HSIS using a different approach that focuses on how the choice sets available to
a child vary with the value of the instrument.

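Figure 6: The Kline and Walters (2016) Model of Preschool Choice

[Figure: the plane with $u_{ih} - u_{in}$ on the horizontal axis and $u_{ic} - u_{in}$ on the vertical axis. Regions shown: $C_{cc}$, $C_{ch}$, $C_{nn}$, $C_{nh}$, $C_{hh}$. Marked points: $P_h$, $P_n$.]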

who enter Head Start would have been at a good preschool otherwise. Kline and Walters
(2016) call this pattern the “substitution effect” of Head Start. However, Kline and Walters
(2016) do not report estimates of ErYi pnq|i P Cnh s and ErYi pcq|i P Cch s.

4.2 Treatment Effects


To fully measure the substitution effect, one needs to identify $E[Y_i(h) \mid i \in C_{nh}]$ and $E[Y_i(h) \mid i \in C_{ch}]$.
However, we know from Proposition 6 that they are only partially identified by

$$\alpha_0 E[Y_i(h) \mid i \in C_{ch}] + (1 - \alpha_0) E[Y_i(h) \mid i \in C_{nh}] = \frac{E[Y_i \mathbb{1}(T_i = h) \mid Z_i = 1] - E[Y_i \mathbb{1}(T_i = h) \mid Z_i = 0]}{P(h|1) - P(h|0)},$$

where $\alpha_0 = (P(c|0) - P(c|1))/(P(h|1) - P(h|0))$. This is exactly the formula in Kline and Walters
(2016, p. 1811): as they point out, the LATE for Head Start is a weighted average of
“subLATEs” with weights determined by the proportion of $C_{ch}$ among compliers, which is
identified from the data25 .
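
As a small worked example, the weight $\alpha_0$ can be computed from the pooled group proportions reported in Panel A of Table 5, using the fact that $\alpha_0$ is the share of $C_{ch}$ among the two complier groups. The function name is ours; only the two proportions are taken from the paper.

```python
def complier_share(p_nh, p_ch):
    """Share of c-to-h compliers among all Head Start compliers (alpha_0)."""
    return p_ch / (p_nh + p_ch)

# Pooled proportions of C_nh and C_ch from Panel A of Table 5.
alpha0 = complier_share(p_nh=0.454, p_ch=0.232)
```

With these inputs roughly one third of the compliers would otherwise have attended another preschool.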
Kline and Walters (2016) first tried to identify ErYi phq ´ Yi pcq|i P Cch s and ErYi phq ´
Yi pnq|i P Cnh s separately using interactions of the instrument with covariates or experimental
sites. They acknowledged the limitations of this approach and resorted to a parametric
selection model à la Heckman (1979) instead. They report26 estimates of the local average
treatment effects of 0.370 for Cnh and ´0.093 for Cch , with respective standard errors 0.088
and 0.154. The resulting point estimate of the difference is quite large, at 0.463 standard
deviation.
Our Corollary 2 provides an alternative approach to separating the two treatment effects.
25
Our α0 is denoted Sc in their paper.
26
See Kline and Walters (2016, Table VIII, column (4), full model).

Table 5: Proportions, Counterfactual Means and Treatment Effects by Response Groups

                                                     3-year-olds   4-year-olds   Pooled

Panel A. Proportions of Response Groups via Proposition 5
  Compliers from n to h (Cnh)                           0.505         0.393       0.454
  Compliers from c to h (Cch)                           0.198         0.272       0.232

Panel B. Counterfactual Means of Test Scores via Proposition 6
  E[Yi(n) | i ∈ Cnh]                                   -0.027        -0.116      -0.062
  E[Yi(c) | i ∈ Cch]                                    0.112         0.144       0.129

Panel C. Treatment Effects via Corollary 2
  Upper Bound on E[Yi(h) − Yi(n) | i ∈ Cnh]             0.279         0.285       0.278
                                                       (0.063)       (0.076)     (0.050)
  Lower Bound on E[Yi(h) − Yi(c) | i ∈ Cch]             0.140         0.025       0.087
                                                       (0.089)       (0.097)     (0.063)
  Upper Bound on E[Yi(h) − Yi(n) | i ∈ Cnh]
    − E[Yi(h) − Yi(c) | i ∈ Cch]                        0.139         0.260       0.191
                                                       (0.098)       (0.115)     (0.071)

Notes: Head Start (h), other centers (c), no preschool (n). Standard errors in
parentheses are clustered at the Head Start center level.

Given that compliers coming from other preschools (Cch ) had better test scores than com-
pliers not originally in preschools (Cnh ), it seems reasonable to assume that they also have
better test scores under Head Start:

(4.1) E rYi phq|i P Cch s ě E rYi phq|i P Cnh s .

This is a “positive selection” that fits within the framework of Corollary 2. It can be derived in
a simple model in which preschools, and especially Head Start, improve the outcomes of some
students; and students choose schools as a function of their expected outcome. We show in
Appendix B that this model generates positive selection under reasonable assumptions. The
pooled cohort estimates in Panel C of Table 5 indicate that the upper bound on $E[Y_i(h) -
Y_i(n) \mid i \in C_{nh}]$ is 0.28 and the lower bound on $E[Y_i(h) - Y_i(c) \mid i \in C_{ch}]$ is 0.09. The difference
between these two numbers gives an upper bound of 0.19 for the difference of these two
LATEs, with a standard error of 0.07. The testable prediction (3.7) implied by positive
selection translates here into the non-negativity of the upper bound; we cannot reject it at
any reasonable level. Conversely, negative selection (reversing the inequality (4.1)) would
make 0.19 a lower bound for the difference of the LATEs. At the same time, it would imply
that the lower bound is negative; this is soundly rejected by the data.
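
The arithmetic behind these bounds can be sketched as follows, under our reading of Corollary 2: with positive selection (4.1), the identified mixture of $E[Y_i(h) \mid i \in C_{ch}]$ and $E[Y_i(h) \mid i \in C_{nh}]$ lies between the two conditional means, so subtracting the counterfactual means of Panel B gives the two bounds. The mixture value 0.216 used below is not reported in the text; it is backed out from the pooled column of Table 5 as 0.278 + (−0.062) and should be treated as an assumption.

```python
def corollary2_bounds(mixture_h, mean_n_nh, mean_c_ch):
    """Bounds implied by E[Y(h) | C_ch] >= E[Y(h) | C_nh].

    mixture_h is the identified weighted average
    alpha0 * E[Y(h) | C_ch] + (1 - alpha0) * E[Y(h) | C_nh].
    """
    upper_nh = mixture_h - mean_n_nh   # upper bound on E[Y(h) - Y(n) | C_nh]
    lower_ch = mixture_h - mean_c_ch   # lower bound on E[Y(h) - Y(c) | C_ch]
    # The mixture cancels: the difference equals mean_c_ch - mean_n_nh.
    return upper_nh, lower_ch, upper_nh - lower_ch

# Pooled counterfactual means from Panel B of Table 5; 0.216 is backed out (see above).
upper_nh, lower_ch, diff = corollary2_bounds(0.216, -0.062, 0.129)
```

The upper bound on the difference of the two LATEs does not depend on the mixture value at all: it is simply $E[Y_i(c) \mid i \in C_{ch}] - E[Y_i(n) \mid i \in C_{nh}]$.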

Our upper bound of 0.19 is much lower than the point estimate reported by Kline and Walters
(2016). In fact, our 95% and 99% one-sided confidence intervals for

$$E[Y_i(h) - Y_i(n) \mid i \in C_{nh}] - E[Y_i(h) - Y_i(c) \mid i \in C_{ch}]$$

are $(-\infty, 0.308)$ and $(-\infty, 0.356)$. We conclude that the 0.463 estimate in Kline and Walters
(2016) may overstate the difference between the two complier groups: it can only be ratio-
nalized under negative selection, which is a much less plausible assumption.
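
The two interval endpoints can be reproduced from the pooled estimate and standard error in Panel C of Table 5. The sketch assumes a one-sided interval based on the normal approximation; the text does not spell out the construction, so this is our assumption.

```python
from statistics import NormalDist

def one_sided_upper_limit(estimate, std_error, level):
    """Upper end of a one-sided normal-approximation interval (-inf, limit)."""
    return estimate + NormalDist().inv_cdf(level) * std_error

# Pooled upper bound on the difference of LATEs and its standard error (Table 5, Panel C).
limit_95 = one_sided_upper_limit(0.191, 0.071, 0.95)
limit_99 = one_sided_upper_limit(0.191, 0.071, 0.99)
```

Rounded to three decimals, these limits match the endpoints quoted above, and the 0.463 point estimate lies above both.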

4.3 Expanding Access to Head Start


Kline and Walters (2016) sought to evaluate the welfare effect of increasing the number of
slots in Head Start, as summarized by the marginal value of public funds (MVPF). They
note that any expansion of Head Start will vacate some slots at competing preschools, which
are oversubscribed. The relaxation of this rationing must be counted as an effect of Head
Start expansions. This is what they call “rationed substitutes”27 .
The children who move from $T_i = n$ to $T_i = c$ when a slot is vacated by a child who
moves to Head Start constitute a $C_{nc}$ group that is ruled out by the 2 × 3 model. These
children increase their grades by $Y_i(c) - Y_i(n)$, whose average generates a LATE that we
denote $\mathrm{LATE}_{nc}$. Equation (9) in Kline and Walters (2016, p. 1816) shows that the value
of $\mathrm{LATE}_{nc}$ is a crucial input in the computation of the MVPF of a Head Start expansion.
Identifying it requires either data on offers to all preschools, which Kline and Walters (2016)
do not have28 , or additional modeling assumptions. They used their parametric selection
model to construct an estimate for LATEnc . Their estimate of LATEnc “ 0.294 results in a
high MVPF estimate of 2.02 (see Table IX in their paper).
We take a different approach by embedding the 2 × 3 model within a 3 × 3 model. In this
richer model, the instrument can take three values: in addition to the control group ($z = 0$)
and those offered admission to Head Start ($z = 1$), we have a new group that we denote
$z = 2$. This group is only offered admission to competing preschools because some seats were
left free by students who moved to Head Start (the $C_{ch}$ group of the binary model). Note
that this maintains strict, one-to-one targeting.
Figure 7 shows the resulting treatment assignment, using tildes to denote the complier
27
See Kline and Walters (2016, Sections V.D and IX.A) for details.
28
See footnote 19 in their paper.

groups of the 3 × 3 model29 . Using this notation, $\mathrm{LATE}_{nc}$ can be written as

$$\mathrm{LATE}_{nc} = E[Y_i(c) - Y_i(n) \mid i \in \tilde{C}_{n*c}],$$

where $\tilde{C}_{n*c} = \tilde{C}_{nnc} \cup \tilde{C}_{nhc}$ is the composite group of $n \to c$ compliers. Comparing Figure 7
with Figure 6 shows that the other complier groups of the two models are linked by

$$C_{nh} = \tilde{C}_{nh*} = \tilde{C}_{nhn} \cup \tilde{C}_{nhc},$$
$$C_{ch} = \tilde{C}_{ch*} = \tilde{C}_{chc}.$$

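Figure 7: Embedding Preschool Choice in a 3 × 3 Model

[Figure: the plane with $u_{ih} - u_{i0}$ on the horizontal axis and $u_{ic} - u_{i0}$ on the vertical axis. Regions shown: $\tilde{C}_{ccc}$, $\tilde{C}_{chc}$, $\tilde{C}_{hhc}$, $\tilde{C}_{nnc}$, $\tilde{C}_{nhc}$, $\tilde{C}_{hhh}$, $\tilde{C}_{nnn}$, $\tilde{C}_{nhn}$. Marked points: $P_h$, $P_n$, $P_c$.]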

Now consider the new group of $n \to c$ compliers. It differs from $\tilde{C}_{chc}$ in that its members
will not go to a preschool unless they are offered a slot. We show in Appendix B that the
structural model that we used in the binary instrument case predicts the following inequality:

(4.2) $E[Y_i(c) \mid i \in \tilde{C}_{n*c}] \le E[Y_i(c) \mid i \in \tilde{C}_{chc}].$

Now consider the composite response-groups $\tilde{C}_{n*c} = \tilde{C}_{nnc} \cup \tilde{C}_{nhc}$ and $\tilde{C}_{nn*} = \tilde{C}_{nnc} \cup \tilde{C}_{nnn}$.
As Figure 7 shows, they only differ by the substitution of $\tilde{C}_{nnn}$ for $\tilde{C}_{nhc}$. The former never
go to Head Start or to another preschool, while the latter are full compliers. Our structural
model generates the inequality

$$E[Y_i(n) \mid i \in \tilde{C}_{nnn}] \le E[Y_i(n) \mid i \in \tilde{C}_{nhc}]$$


29
Again, it is just Figure 4 with different notation.

which implies

(4.3) $E[Y_i(n) \mid i \in \tilde{C}_{nn*}] \le E[Y_i(n) \mid i \in \tilde{C}_{n*c}].$

Inequalities (4.2) and (4.3) again are “positive selection” assumptions that fall under our
Corollary 3.
Since $\tilde{C}_{nn*}$ coincides with $C_{nn}$ and $\tilde{C}_{chc}$ is $C_{ch}$, we already know the values of the right-
hand side of (4.2) and of the left-hand side of (4.3). Applying the same logic as in Corollary 3 gives us an upper
bound for $\mathrm{LATE}_{nc}$:

$$\mathrm{LATE}_{nc} \le E[Y_i(c) \mid i \in \tilde{C}_{chc}] - E[Y_i(n) \mid i \in \tilde{C}_{nn*}] = E[Y_i(c) \mid i \in C_{ch}] - E[Y_i(n) \mid i \in C_{nn}].$$

As the MVPF is an increasing function of $\mathrm{LATE}_{nc}$, this gives us in turn an upper bound
on its value30 . We obtain $\mathrm{LATE}_{nc} \le 0.164$ and MVPF $\le 1.55$. These upper bounds are
noticeably smaller than the point estimates that result from the parametric selection model
of Kline and Walters (2016).

Concluding Remarks
We have shown that the idea of targeting is a useful way to analyze models with multivalued
treatments and multivalued instruments. Our paper only analyzed discrete-valued instru-
ments and treatments. Some of the notions we used would extend naturally to continuous in-
struments and treatments: the definitions of targeting, one-to-one targeting, and positive se-
lection would translate directly. Strict targeting, on the other hand, is less appealing in a con-
text in which continuous values may denote intensities. Our earlier paper (Lee and Salanié,
2018) as well as Mountjoy’s (2022) can be seen as analyzing continuous-instruments/discrete-
treatments models. Extending our analysis to models with continuous treatments is an obvi-
ous topic for further research. It would also be interesting to apply the partial identification
approach of Mogstad, Santos, and Torgovitsky (2018) in our setting. Finally, there has been
a surge of recent interest in understanding the properties of OLS and 2SLS estimands when
treatment effects vary with the covariates (Blandhol, Bonney, Mogstad, and Torgovitsky,
2022; Sloczyński, 2022; Goldsmith-Pinkham, Hull, and Kolesár, 2022). We believe that the
targeting concept and the identifying assumptions explored in this paper should be relevant
in this context and that they merit further investigation.
30
Online Appendix F derives the formula for the MVPF in this model.

A Proofs for Section 3
Proof of Proposition 1. Suppose that for some $t \in T^*$, $T_i(Z^*(t)) = 0$. Then $u_{i0} > \bar{U}(t) + u_{it}$.
However, if $z \neq Z^*(t)$ then $\bar{U}(t) > U_z(t)$ under Assumption 4. Therefore $u_{i0} > U_z(t) + u_{it}$,
and $T_i(z)$ cannot be $t$.

Proof of Proposition 2. Recall that $T_i(z)$ maximizes $(U_z(t) + u_{it})$ over $t \in T$. Under strict
targeting, $U_z(t)$ is $\bar{U}(t)$ if $t \in T^*(z)$ and $\underline{U}(t)$ otherwise.
Proof of (i): Since $\bar{U}(t) > \underline{U}(t)$ if $t \in T^*$, we have

$$V_i^*(z) > \max_{t \in T^*(z)} (\underline{U}(t) + u_{it}).$$

This implies that

$$\max_{t \in T} (U_z(t) + u_{it}) = \max\left( \max_{t \in T^*(z)} (\bar{U}(t) + u_{it}),\ \max_{t \notin T^*(z)} (\underline{U}(t) + u_{it}) \right) = \max\left( V_i^*(z),\ \max_{t \in T} (\underline{U}(t) + u_{it}) \right) = \max\left( V_i^*(z), \underline{V}_i \right).$$

Moreover, if $\underline{V}_i = \underline{U}_{t_i} + u_{i,t_i}$ is the maximum and $t_i \in T^*(z)$, then a fortiori $\underline{U}_{t_i} + u_{i,t_i} >
\bar{U}_{t_i} + u_{i,t_i}$. This gives a contradiction since $\bar{U}_t > \underline{U}_t$ for all strictly targeted $t$.
Proof of (ii): If $z \notin Z^*$, then $T^*(z)$ is empty and $V_i^*(z) = -\infty$.

Proof of Corollary 1. It follows directly from Proposition 2.

Proof of Proposition 3. Consider an observation $i$. The set $A_i$ of Corollary 1 is a possibly
empty subset of $T^*$. The top alternative treatment $t_i$ can be in $A_i$ or in $T \setminus T^*$. If $A_i$ has
$a$ elements, this allows for $a + |T| - |T^*| = a + q$ values of $t_i$. Now every pair $(A_i, t_i)$ fully
defines a response-group. Since $|Z^*| = |T^*| = p$, this gives a total of

$$\sum_{a=0}^{p} \binom{p}{a} (a + q)$$

response-groups. Using the identities

$$\sum_{a=0}^{b} \binom{b}{a} = (1 + 1)^b = 2^b, \qquad \sum_{a=0}^{b} a \binom{b}{a} = b \times \sum_{a=0}^{b-1} \binom{b-1}{a} = b \times 2^{b-1},$$

we obtain a total of $(p + 2q) \times 2^{p-1}$ types.
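
The counting argument is easy to verify by brute force. The sketch below compares the direct sum over the size $a$ of $A_i$ with the closed form; the function names are ours, and the closed form is written with integer division so that it also covers $p = 0$.

```python
from math import comb

def count_response_groups(p, q):
    """Direct sum over the number a of elements of A_i."""
    return sum(comb(p, a) * (a + q) for a in range(p + 1))

def closed_form_count(p, q):
    """(p + 2q) * 2**(p - 1), kept in integers (the product below is always even)."""
    return (p + 2 * q) * 2**p // 2

# Binary instrument with three treatments: p = 1 targeted treatment, q = 2 others.
groups_2x3 = count_response_groups(1, 2)
# Ternary instrument with three treatments: p = 2, q = 1.
groups_3x3 = count_response_groups(2, 1)
```

For the 2 × 3 model this gives five response-groups (three always-taker groups and two complier groups), and for the 3 × 3 model it gives eight.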

Proof of Proposition 4. Take $z \in Z$ and $t \in T$. Consider any observation $i$ and the corre-
sponding $A_i \subset T^*$ and $t_i \in A_i \cup (T \setminus T^*)$. There are only two ways to obtain $T_i(z) = t$:

• if $z \notin A_i$, then $T_i(z) = t_i$; therefore $t_i = t$. Summing over all subsets $A$ of $T^*$ that
exclude $z$ gives the first term of (3.1).

• if $z \in A_i$ (which implies $z \in T^*$), we know that $T_i(z) = z$ no matter what the value of
$t_i$ is; hence $t$ must equal $z$. Summing over all subsets $A$ that include $z$ and all values
of $t_i \in A \cup (T \setminus T^*)$ gives the second line in (3.1).

By construction, each $C(A, t)$ completely defines the mapping from instrument values to
treatment values; therefore each $C(A, t)$ is an elemental group. Their union is clearly the set
of all observations. If $i \in C(A, t) \cap C(A', t')$, then $A' = A = A_i$ by the definition of $A_i$, and
$t' = t = t_i$. Therefore the $C(A, t)$ partition the set of observations.

Proof of Lemma 1. For the sake of completeness, we provide the proof although the first
part of the Lemma is the same as Theorem T-1 of Heckman and Pinto (2018) (applied with
$\kappa(Y) := Y$). Let
$$E_z(t|C) \equiv E(Y_i \mathbb{1}(T_i = t) \mid Z_i = z, i \in C).$$

We start from the sum over all response groups:

$$\bar{E}_z(t) = \sum_{C} E_z(t|C) \Pr(i \in C).$$

First note that if group $C$ does not have treatment $t$ under instrument $z$, it should not figure
in the sum. Now if $C = C_R$ with $R(z) = t$, we have

$$E_z(t|C) = E(Y_i \mathbb{1}(T_i = t) \mid Z_i = z, i \in C) = E(Y_i(t) \mid Z_i = z, i \in C) = E(Y_i(t) \mid i \in C).$$

The second part of the Lemma is just adding up.

Proof of Proposition 5. The proof is in the text, with the exception of PrpC11 q “ P p1|0q
which follows from the fact that the probabilities add up to 1.

Proof of Proposition 6. Lemma 1 gives $2|T|$ equations:

(A.1)
$$\bar{E}_0(1) = E(Y_i(1) \mid i \in C(\{1\}, 1)) \Pr(i \in C(\{1\}, 1)) = E(Y_i(1) \mid i \in C_{11}) \Pr(C_{11}),$$
$$\text{for } t \neq 1,\quad \bar{E}_0(t) = E(Y_i(t) \mid i \in C(\emptyset, t)) \Pr(i \in C(\emptyset, t)) + E(Y_i(t) \mid i \in C(\{1\}, t)) \Pr(i \in C(\{1\}, t)) = E(Y_i(t) \mid i \in C_{tt}) \Pr(C_{tt}) + E(Y_i(t) \mid i \in C_{t1}) \Pr(C_{t1}),$$
$$\bar{E}_1(1) = \sum_{\tau \in T} E(Y_i(1) \mid i \in C(\{1\}, \tau)) \Pr(i \in C(\{1\}, \tau)) = \sum_{\tau \in T} E(Y_i(1) \mid i \in C_{\tau 1}) \Pr(C_{\tau 1}),$$
$$\text{for } t \neq 1,\quad \bar{E}_1(t) = E(Y_i(t) \mid i \in C(\emptyset, t)) \Pr(i \in C(\emptyset, t)) = E(Y_i(t) \mid i \in C_{tt}) \Pr(C_{tt}).$$

Since Proposition 5 identifies all type probabilities, the first and fourth equations in (A.1)
give directly $E(Y_i(t) \mid i \in C_{tt})$ for all $t$. Then the second equation identifies $E(Y_i(t) \mid i \in C_{t1})$ for
$t \neq 1$.
By subtraction, we obtain

$$(\bar{E}_1(1) - \bar{E}_0(1)) - \sum_{t \neq 1} (\bar{E}_0(t) - \bar{E}_1(t)) = \sum_{t \neq 1} E[Y_i(1) - Y_i(t) \mid i \in C_{t1}] \Pr(i \in C_{t1}).$$

Combining these results with Proposition 5 and Lemma 1 yields the formula in the Propo-
sition. The denominator

$$\sum_{t \neq 1} (P(t|0) - P(t|1)) = P(1|1) - P(1|0)$$

is positive, since all terms in the sum are positive. It follows that all $\alpha_t$ weights are positive
and sum to 1.

Proof of Corollary 2. Recall from (A.1) that when $|T| = 3$,

$$\bar{E}_1(1) - \bar{E}_0(1) = E[Y_i(1) \mid i \in C_{01}] \Pr(i \in C_{01}) + E[Y_i(1) \mid i \in C_{21}] \Pr(i \in C_{21}).$$

Hence, under (3.5) we have

$$E[Y_i(1) \mid i \in C_{01}] \{\Pr(i \in C_{01}) + \Pr(i \in C_{21})\} \le \bar{E}_1(1) - \bar{E}_0(1) \le E[Y_i(1) \mid i \in C_{21}] \{\Pr(i \in C_{01}) + \Pr(i \in C_{21})\}.$$

The first conclusion of the corollary follows immediately, as

$$\Pr(i \in C_{01}) + \Pr(i \in C_{21}) = P(1|1) - P(1|0).$$

The testable prediction is a direct consequence of this chain of inequalities.

Proof of Proposition 7. It is straightforward from Figure 4 and Table 4.

Proof of Proposition 8. By Lemma 1, we obtain

$$\bar{E}_1(0) = E[Y_i(0) \mid i \in C_{000} \cup C_{002}] \Pr(i \in C_{000} \cup C_{002}),$$
$$\bar{E}_2(0) = E[Y_i(0) \mid i \in C_{000} \cup C_{010}] \Pr(i \in C_{000} \cup C_{010}),$$
$$\bar{E}_2(1) = E[Y_i(1) \mid i \in C_{111}] \Pr(i \in C_{111}),$$
$$\bar{E}_1(2) = E[Y_i(2) \mid i \in C_{222}] \Pr(i \in C_{222}),$$
$$\bar{E}_0(0) - \bar{E}_1(0) = E[Y_i(0) \mid i \in C_{010} \cup C_{012}] \Pr(i \in C_{010} \cup C_{012}),$$
$$\bar{E}_0(0) - \bar{E}_2(0) = E[Y_i(0) \mid i \in C_{002} \cup C_{012}] \Pr(i \in C_{002} \cup C_{012}),$$
$$\bar{E}_1(1) - \bar{E}_0(1) = E[Y_i(1) \mid i \in C_{010} \cup C_{012} \cup C_{212}] \Pr(i \in C_{010} \cup C_{012} \cup C_{212}),$$
$$\bar{E}_0(1) - \bar{E}_2(1) = E[Y_i(1) \mid i \in C_{112}] \Pr(i \in C_{112}),$$
$$\bar{E}_2(2) - \bar{E}_0(2) = E[Y_i(2) \mid i \in C_{002} \cup C_{012} \cup C_{112}] \Pr(i \in C_{002} \cup C_{012} \cup C_{112}),$$
$$\bar{E}_0(2) - \bar{E}_1(2) = E[Y_i(2) \mid i \in C_{212}] \Pr(i \in C_{212}).$$

Then, the result follows from the fact that all group probabilities are identified.
Proof of Corollary 3. First note that $C_{01*} = C_{010} \cup C_{012}$. Under (3.8), we have

$$E(Y_i(1)\mathbb{1}(i \in C_{010} \cup C_{012})) = E(Y_i(1)\mathbb{1}(i \in C_{010} \cup C_{012} \cup C_{212})) - E(Y_i(1)\mathbb{1}(i \in C_{212}))$$
$$= E(Y_i(1)\mathbb{1}(i \in C_{010} \cup C_{012} \cup C_{212})) - E(Y_i(1) \mid i \in C_{212}) \Pr(i \in C_{212})$$
$$\ge E(Y_i(1) \mid i \in C_{010} \cup C_{012} \cup C_{212}) \times \Pr(i \in C_{010} \cup C_{012} \cup C_{212}) - E(Y_i(1) \mid i \in C_{112}) \Pr(i \in C_{212}).$$

Replacing the probabilities and conditional expectations with their values from Proposition 7
and Proposition 8, we obtain $\Pr(i \in C_{010} \cup C_{012}) = P(0|0) - P(0|1)$ and

$$E(Y_i(1)\mathbb{1}(i \in C_{010} \cup C_{012})) \ge \bar{E}_1(1) - \bar{E}_0(1) - \frac{\bar{E}_0(1) - \bar{E}_2(1)}{P(1|0) - P(1|2)} (P(2|0) - P(2|1)).$$

Finally, writing

$$E(Y_i(1) - Y_i(0) \mid i \in C_{010} \cup C_{012}) = \frac{E(Y_i(1)\mathbb{1}(i \in C_{010} \cup C_{012}))}{\Pr(i \in C_{010} \cup C_{012})} - \frac{\bar{E}_0(0) - \bar{E}_1(0)}{P(0|0) - P(0|1)}$$

gives the result.

The proof under (3.9) is similar: we start from $C_{0*2} = C_{002} \cup C_{012}$. Under (3.9), we have

$$E(Y_i(2)\mathbb{1}(i \in C_{002} \cup C_{012})) = E(Y_i(2)\mathbb{1}(i \in C_{002} \cup C_{012} \cup C_{112})) - E(Y_i(2)\mathbb{1}(i \in C_{112}))$$
$$= E(Y_i(2)\mathbb{1}(i \in C_{002} \cup C_{012} \cup C_{112})) - E(Y_i(2) \mid i \in C_{112}) \Pr(i \in C_{112})$$
$$\ge E(Y_i(2) \mid i \in C_{002} \cup C_{012} \cup C_{112}) \times \Pr(i \in C_{002} \cup C_{012} \cup C_{112}) - E(Y_i(2) \mid i \in C_{212}) \Pr(i \in C_{112}).$$

Replacing the probabilities and conditional expectations with their values from Proposition 7
and Proposition 8, we obtain $\Pr(i \in C_{002} \cup C_{012}) = P(0|0) - P(0|2)$ and

$$E(Y_i(2)\mathbb{1}(i \in C_{002} \cup C_{012})) \ge \bar{E}_2(2) - \bar{E}_0(2) - \frac{\bar{E}_0(2) - \bar{E}_1(2)}{P(2|0) - P(2|1)} (P(1|0) - P(1|2)).$$

Finally, writing

$$E(Y_i(2) - Y_i(0) \mid i \in C_{002} \cup C_{012}) = \frac{E(Y_i(2)\mathbb{1}(i \in C_{002} \cup C_{012}))}{\Pr(i \in C_{002} \cup C_{012})} - \frac{\bar{E}_0(0) - \bar{E}_2(0)}{P(0|0) - P(0|2)}$$

gives the result.

B Positive Selection in Head Start


Let realized grades be given by

$$Y_i(t) = k_t + m_i p_t + \zeta_{it},$$

where $m_i > 0$ and $p_h > p_c > p_n = 0$.

These conditions imply that children with a larger $m_i$ benefit more from preschool, especially
from Head Start; $m_i$ does not play a role if $i$ goes to neither type of preschool. The $\zeta_{it}$ shocks
are zero mean and idiosyncratic; we suppose that each subject $i$ expects $E_i(Y_i(t)) = k_t + m_i p_t$.
Preference shocks depend positively on expected grades:

$$u_{it} = a_i + b_i E_i(Y_i(t)) + \varepsilon_{it} = a_i + b_i k_t + b_i m_i p_t + \varepsilon_{it},$$

with $b_i > 0$.
Let us define $v_{it} = u_{it} - u_{in}$; $\eta_{it} = \varepsilon_{it} - \varepsilon_{in}$; and $d_t = k_t - k_n$ for $t = c, h$. With this
specification, we have
$$v_{it} = b_i d_t + b_i m_i p_t + \eta_{it}.$$

We assume that $b_i$, $m_i$, and the random vectors $(\eta_{ic}, \eta_{ih})$ and $(\zeta_{in}, \zeta_{ic}, \zeta_{ih})$ are mutually
independent.
We will use the following lemma:

Lemma 2. Let $A(\eta_{ic}, \eta_{ih}, b_i)$ and $B(\eta_{ic}, \eta_{ih}, b_i)$ be random subsets of $\mathbb{R}$ such that

$$\sup A(\eta_{ic}, \eta_{ih}, b_i) \le \inf B(\eta_{ic}, \eta_{ih}, b_i)$$

with probability one. Then for $t = c, h$,

$$E(Y_i(t) \mid m_i \in A(\eta_{ic}, \eta_{ih}, b_i)) \le E(Y_i(t) \mid m_i \in B(\eta_{ic}, \eta_{ih}, b_i)).$$

Proof of Lemma 2. Take $t \in \{c, h\}$. Since $E(Y_i(t) \mid m_i = m) = k_t + m p_t$, it is an increas-
ing function of $m$. Fix $(\eta_{ic}, \eta_{ih}, b_i)$; obviously, the distribution of $m_i$ conditional on $m_i \in
B(\eta_{ic}, \eta_{ih}, b_i)$ first-order stochastically dominates that of $m_i$ conditional on $m_i \in A(\eta_{ic}, \eta_{ih}, b_i)$.
Therefore

$$E(Y_i(t) \mid m_i \in A(\eta_{ic}, \eta_{ih}, b_i), \eta_{ic}, \eta_{ih}, b_i) \le E(Y_i(t) \mid m_i \in B(\eta_{ic}, \eta_{ih}, b_i), \eta_{ic}, \eta_{ih}, b_i).$$

Taking the expectation over $(\eta_{ic}, \eta_{ih}, b_i)$ completes the proof.

B.1 The Binary Instrument Case
In Section 4.2, subjects who are assigned $z = 1$ receive a Head Start offer; those with $z = 0$
do not. The complier group $C_{ch}$ has

$$\underline{U}_c + v_{ic} \ge \max(0, \underline{U}_h + v_{ih}),$$
$$\bar{U}_h + v_{ih} \ge \max(0, \underline{U}_c + v_{ic}),$$

and the complier group $C_{nh}$ has

$$0 \ge \max(\underline{U}_c + v_{ic}, \underline{U}_h + v_{ih}),$$
$$\bar{U}_h + v_{ih} \ge \max(0, \underline{U}_c + v_{ic}).$$

Note that $v_{ic} \ge -\underline{U}_c$ in $C_{ch}$ and $v_{ic} \le -\underline{U}_c$ in $C_{nh}$. Since $v_{ic} = b_i d_c + b_i m_i p_c + \eta_{ic}$ and $p_c b_i > 0$,
it follows that for given $(\eta_{ic}, \eta_{ih}, b_i)$, $m_i \ge m_j$ for any $i \in C_{ch}$ and $j \in C_{nh}$. Therefore we can
apply Lemma 2 with $t = h$ to obtain

$$E(Y_i(h) \mid i \in C_{ch}) \ge E(Y_i(h) \mid i \in C_{nh}),$$

which is our version of positive selection in the binary case.

B.2 The Ternary Instrument Case


In our setup in Section 4.3, subjects who are assigned z “ 1 receive a Head Start offer, and
those who are assigned z “ 2 are offered admission in another preschool; those with z “ 0
receive neither.
First note that under our assumptions,

$$E(Y_i(n) \mid \eta_{ic}, \eta_{ih}, b_i) = k_n$$

is constant. Therefore, trivially,

$$E(Y_i(n) \mid i \in \tilde{C}_{nhc}) \ge E(Y_i(n) \mid i \in \tilde{C}_{nnn}).$$

Now let us consider the response-groups $\tilde{C}_{n*c}$ and $\tilde{C}_{chc}$. $\tilde{C}_{n*c}$ is defined by the inequalities

(B.1) $\bar{U}_c + v_{ic} > 0 > \max(\underline{U}_h + v_{ih}, \underline{U}_c + v_{ic});$

and $\tilde{C}_{chc}$ is defined by the inequalities

(B.2) $\bar{U}_h + v_{ih} > \underline{U}_c + v_{ic} > \max(\underline{U}_h + v_{ih}, 0).$

(B.1) implies that $v_{ic} = b_i d_c + b_i m_i p_c + \eta_{ic} < -\underline{U}_c$, while (B.2) implies the reverse inequality.
Here also, applying Lemma 2 with $t = c$ directly gives the conclusion:

$$E(Y_i(c) \mid i \in \tilde{C}_{n*c}) \le E(Y_i(c) \mid i \in \tilde{C}_{chc}).$$

References
Angrist, J., D. Lang, and P. Oreopoulos (2009): “Incentives and Services for College
Achievement: Evidence from a Randomized Trial,” American Economic Journal: Applied
Economics, 1(1), 136–63.

Angrist, J. D., and G. W. Imbens (1995): “Two-stage least squares estimation of aver-
age causal effects in models with variable treatment intensity,” Journal of the American
Statistical Association, 90(430), 431–442.

Ao, W., S. Calonico, and Y.-Y. Lee (2021): “Multivalued Treatments and Decompo-
sition Analysis: An Application to the WIA Program,” Journal of Business & Economic
Statistics, 39(1), 358–371.

Attanasio, O., S. Cattan, E. Fitzsimons, C. Meghir, and M. Rubio-Codina


(2020): “Estimating the Production Function for Human Capital: Results from a Ran-
domized Controlled Trial in Colombia,” American Economic Review, 110, 48–85.

Attanasio, O., C. Fernández, E. Fitzsimons, S. Grantham-McGregor,


C. Meghir, and M. Rubio-Codina (2014): “Using the Infrastructure of a Conditional
Cash Transfer Programme to Deliver a Scalable Integrated Early Child Development Pro-
gramme in Colombia: A Cluster Randomised Controlled Trial,” British Medical Journal,
349, g5785.

Bai, Y., S. Huang, S. Moon, A. M. Shaikh, and E. J. Vytlacil (2024): “On the
Identifying Power of Monotonicity for Average Treatment Effects,” arXiv 2405.14104.

Bai, Y., and M. Tabord-Meehan (2024): “Sharp Testable Implications of Encouragement


Designs,” arXiv:2411.09808, https://arxiv.org/abs/2411.09808.

Balke, A., and J. Pearl (1997): “Bounds on treatment effects from studies with imperfect
compliance,” JASA, 92, 1171–1176.

Bhagwati, J. (1971): “The generalized theory of distortions and welfare,” in Trade, balance
of payments and growth, ed. by J. Bhagwati, R. Jones, R. Mundell, and J. Vanek. North-
Holland, Amsterdam.

Bhuller, M., and H. Sigstad (2024): “2SLS with multiple treatments,” Journal of Econo-
metrics, 242(1), 1057–85.

Blandhol, C., J. Bonney, M. Mogstad, and A. Torgovitsky (2022): “When is


TSLS Actually LATE?,” Working Paper 29709, National Bureau of Economic Research.

Buchinsky, M., P. Gertler, and R. Pinto (2023): “The Economics of Mono-


tonicity Conditions: Exploring Choice Incentives in IV Models,” UCLA, mimeo,
https://www.rodrigopinto.net/.

Caetano, C., and J. C. Escanciano (2021): “Identifying Multiple Marginal Effects with
a Single Instrument,” Econometric Theory, 37(3), 464–494.

Carneiro, P., J. J. Heckman, and E. J. Vytlacil (2011): “Estimating Marginal


Returns to Education,” American Economic Review, 101(6), 2754–81.

Cattaneo, M. D. (2010): “Efficient semiparametric estimation of multi-valued treatment


effects under ignorability,” Journal of Econometrics, 155(2), 138–154.

D’Haultfoeuille, X., and P. Février (2015): “Identification of Nonseparable Triangu-


lar Models With Discrete Instruments,” Econometrica, 83(3), 1199–1210.

Eisenhauer, P., J. Heckman, and E. Vytlacil (2015): “The Generalized Roy Model
and the Cost-Benefit Analysis of Social Programs,” Journal of Political Economy, 123,
413–443.

Feng, J. (2024): “Matching points: Supplementing instruments with covariates in triangular


models,” Journal of Econometrics, 238(1), 105579.

Goff, L. (2024a): “A vector monotonicity assumption for multiple instruments,” Journal


of Econometrics, 241(1), 105735.

Goff, L. (2024b): “When is IV Identification Agnostic about Outcomes?,”


arXiv:2406.02835, https://arxiv.org/abs/2406.02835.

Goldsmith-Pinkham, P., P. Hull, and M. Kolesár (2022): “Contamination Bias in


Linear Regressions,” Working Paper 30108, National Bureau of Economic Research.

Heckman, J., and R. Pinto (2018): “Unordered Monotonicity,” Econometrica, 86(1),


1–35.

Heckman, J. J. (1979): “Sample Selection Bias as a Specification Error,” Econometrica,


47(1), 153–161.

Heckman, J. J., S. Urzua, and E. Vytlacil (2006): “Understanding instrumental


variables in models with essential heterogeneity,” Review of Economics and Statistics,
88(3), 389–432.

(2008): “Instrumental variables in models with multiple outcomes: The general


unordered case,” Annales d’économie et de statistique, 91/92, 151–174.

46
Heckman, J. J., and E. Vytlacil (2001): “Policy-relevant treatment effects,” American
Economic Review, 91(2), 107–111.
Heckman, J. J., and E. Vytlacil (2005): “Structural Equations, Treatment Effects, and
Econometric Policy Evaluation,” Econometrica, 73(3), 669–738.
Heckman, J. J., and E. Vytlacil (2007a): “Econometric Evaluation of Social Programs,
Part I: Causal Models, Structural Models and Econometric Policy Evaluation,” in Handbook
of Econometrics, ed. by J. J. Heckman, and E. Leamer, vol. 6B, chap. 70, pp. 4779–4874.
Elsevier, Amsterdam.
Heckman, J. J., and E. Vytlacil (2007b): “Econometric Evaluation of Social Programs,
Part II: Using the Marginal Treatment Effect to Organize Alternative Econometric
Estimators to Evaluate Social Programs, and to Forecast their Effects in New Environments,”
in Handbook of Econometrics, ed. by J. J. Heckman, and E. Leamer, vol. 6B, chap. 71,
pp. 4875–5143. Elsevier, Amsterdam.
Heinesen, E., C. Hvid, L. J. Kirkeboen, E. Leuven, and M. Mogstad (2022):
“Instrumental Variables with Unordered Treatments: Theory and Evidence from Returns
to Fields of Study,” Working Paper 30574, National Bureau of Economic Research.
Huang, L., U. Khalil, and N. Yildiz (2019): “Identification and estimation of a tri-
angular model with multiple endogenous variables and insufficiently many instrumental
variables,” Journal of Econometrics, 208(2), 346–366.
Imbens, G. W. (2000): “The role of the propensity score in estimating dose-response func-
tions,” Biometrika, 87(3), 706–710.
Imbens, G. W., and J. D. Angrist (1994): “Identification and Estimation of Local
Average Treatment Effects,” Econometrica, 62(2), 467–475.
Kamat, V. (2024): “Identifying the effects of a program offer with an application to Head
Start,” Journal of Econometrics, 240(1), 105679.
Kédagni, D., and I. Mourifié (2020): “Generalized instrumental inequalities: testing the
instrumental variable independence assumption,” Biometrika, 107(3), 661–675.
Kirkeboen, L. J., E. Leuven, and M. Mogstad (2016): “Field of study, earnings, and
self-selection,” Quarterly Journal of Economics, 131(3), 1057–1111.
Kitagawa, T. (2015): “A Test for Instrument Validity,” Econometrica, 83(5), 2043–2063.
Kline, P., and C. R. Walters (2016): “Evaluating public programs with close substitutes:
The case of Head Start,” Quarterly Journal of Economics, 131(4), 1795–1848.
Lee, S., and B. Salanié (2018): “Identifying effects of multivalued treatments,” Econo-
metrica, 86(6), 1939–1963.
Mogstad, M., A. Santos, and A. Torgovitsky (2018): “Using Instrumental Variables
for Inference About Policy Relevant Treatment Parameters,” Econometrica, 86(5), 1589–
1619.

Mogstad, M., A. Torgovitsky, and C. R. Walters (2020): “Policy Evaluation With
Multiple Instrumental Variables,” Working Paper 27546, National Bureau of Economic
Research.

Mogstad, M., A. Torgovitsky, and C. R. Walters (2021): “The Causal Interpretation
of Two-Stage Least Squares with Multiple Instrumental Variables,” American Economic
Review, 111(11), 3663–98.

Mountjoy, J. (2022): “Community Colleges and Upward Mobility,” American Economic
Review, 112, 2580–2630.

Mourifié, I., and Y. Wan (2017): “Testing Local Average Treatment Effect Assumptions,”
Review of Economics and Statistics, 99(2), 305–313.

Muralidharan, K., M. Romero, and K. Wüthrich (2023): “Factorial Designs, Model
Selection, and (Incorrect) Inference in Randomized Experiments,” Review of Economics
and Statistics, pp. 1–44, Just Accepted.

Navjeevan, M., and R. Pinto (2022): “Ordered, Unordered and Minimal Monotonicity
Criteria,” UCLA, mimeo, https://www.rodrigopinto.net/.

Nibbering, D., M. Oosterveen, and P. L. Silva (2022): “Clustered local average
treatment effects: fields of study and academic student progress,” Discussion Paper No.
15159.

Pinto, R. (2021): “Beyond Intention to Treat: Using the Incentives in
Moving to Opportunity to Identify Neighborhood Effects,” UCLA, mimeo,
https://www.rodrigopinto.net/.

Słoczyński, T. (2022): “Interpreting OLS estimands when treatment effects are
heterogeneous: Smaller groups get larger weights,” Review of Economics and Statistics, 104,
1–9.

Sun, Z. (2023): “Instrument validity for heterogeneous causal effect,” Journal of
Econometrics, 237, forthcoming.

Tinbergen, J. (1952): On the Theory of Economic Policy. North Holland.

Torgovitsky, A. (2015): “Identification of Nonseparable Models Using Instruments with
Small Support,” Econometrica, 83(3), 1185–1197.

Vytlacil, E. (2002): “Independence, monotonicity, and latent index models: An
equivalence result,” Econometrica, 70(1), 331–341.

Online Appendices to “Treatment Effects with Targeting Instruments”

C Proofs for Section 3.4


Let us first translate Kirkeboen, Leuven, and Mogstad’s (2016) assumptions into our notation
to show that they are equivalent to strict one-to-one targeting.
KLM impose the following in their Assumption 4:

• if $T_i(0) = 1$ then $T_i(1) = 1$

• if $T_i(0) = 2$ then $T_i(2) = 2$.

This can be viewed as a monotonicity assumption. It excludes the twelve response groups
$C_{10*}$, $C_{12*}$, $C_{2*0}$, and $C_{2*1}$.
Their Proposition 2 proves point-identification of response-groups when one of three
alternative assumptions is added to their Assumption 4. We focus here on the irrelevance
assumption in their Proposition 2 (iii), which is the weakest of the three and the one their
application relies on. In our notation, it states that:

• if ($T_i(0) \neq 1$ and $T_i(1) \neq 1$), then ($T_i(0) = 2$ iff $T_i(1) = 2$)

• if ($T_i(0) \neq 2$ and $T_i(2) \neq 2$), then ($T_i(0) = 1$ iff $T_i(2) = 1$).

These complicated statements can be simplified. Take the first part. If both $T_i(0)$ and $T_i(1)$
are not 1, then they can only be 0 or 2. Therefore we are requiring $T_i(0) = T_i(1)$. Applying
the same argument to the second part, the irrelevance assumption becomes:

• if ($T_i(0) \neq 1$ and $T_i(1) \neq 1$), then $T_i(0) = T_i(1)$

• if ($T_i(0) \neq 2$ and $T_i(2) \neq 2$), then $T_i(0) = T_i(2)$.

It therefore excludes the response-groups $C_{02*}$, $C_{20*}$, $C_{0*1}$, and $C_{1*0}$. The response-group
$C_{021}$ appears twice in this list; and four other response-groups were already ruled out by
Assumption 4. The reader can easily check that the $3^3 - 12 - (11 - 4) = 8$ response-groups
left are exactly the same as in our Figure 4.
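
The count can be checked by brute force. The following is a minimal sketch, not part of KLM's or our formal argument: it encodes a response group as the triple $(T_i(0), T_i(1), T_i(2))$ and applies Assumption 4 and the simplified irrelevance conditions as stated above. The function and variable names are illustrative.

```python
from itertools import product

# A response group is a triple (T(0), T(1), T(2)): the treatment chosen
# under instrument values z = 0, 1, 2.
def satisfies_assumption_4(group):
    t0, t1, t2 = group
    return (t0 != 1 or t1 == 1) and (t0 != 2 or t2 == 2)

def satisfies_irrelevance(group):
    t0, t1, t2 = group
    first = not (t0 != 1 and t1 != 1) or t0 == t1
    second = not (t0 != 2 and t2 != 2) or t0 == t2
    return first and second

all_groups = list(product(range(3), repeat=3))  # 3^3 = 27 candidate groups
after_a4 = [g for g in all_groups if satisfies_assumption_4(g)]
remaining = [g for g in after_a4 if satisfies_irrelevance(g)]

excluded_by_a4 = len(all_groups) - len(after_a4)
excluded_by_irrelevance_only = len(after_a4) - len(remaining)
labels = sorted("".join(map(str, g)) for g in remaining)
print(excluded_by_a4, excluded_by_irrelevance_only, labels)
```

Under these encodings the script reports 12 groups excluded by Assumption 4, a further 7 excluded by irrelevance alone, and 8 surviving groups.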

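Proof of Proposition 9. The moment conditions that define $\beta_0$, $\beta_1$ and $\beta_2$ are

\[
E\left[\left(Y_i - \beta_0 - \beta_1 \mathbb{1}(T_i = 1) - \beta_2 \mathbb{1}(T_i = 2)\right) \mathbb{1}(Z_i = z)\right] = 0 \tag{C.1}
\]

for $z = 0, 1, 2$.
Using counterfactual notation, we write

\[
Y_i = Y_i(0) + (Y_i(1) - Y_i(0)) \mathbb{1}(T_i = 1) + (Y_i(2) - Y_i(0)) \mathbb{1}(T_i = 2), \tag{C.2}
\]

which allows us to write Equation (C.1) as

\[
E\left[\left(Y_i(0) - \beta_0 + b_i(1) \mathbb{1}(T_i = 1) + b_i(2) \mathbb{1}(T_i = 2)\right) \mathbb{1}(Z_i = z)\right] = 0, \tag{C.3}
\]

where $b_i(t) \equiv Y_i(t) - Y_i(0) - \beta_t$ for $t = 1, 2$.

Now since

\[
\mathbb{1}(T_i = t) = \mathbb{1}(T_i(0) = t) + \left(\mathbb{1}(T_i(1) = t) - \mathbb{1}(T_i(0) = t)\right) \mathbb{1}(Z_i = 1)
+ \left(\mathbb{1}(T_i(2) = t) - \mathbb{1}(T_i(0) = t)\right) \mathbb{1}(Z_i = 2),
\]

we can expand

\[
\begin{aligned}
& \left[Y_i(0) - \beta_0 + b_i(1) \mathbb{1}(T_i = 1) + b_i(2) \mathbb{1}(T_i = 2)\right] \times \mathbb{1}(Z_i = z) \\
&\quad = \big[ Y_i(0) - \beta_0 + b_i(1) \mathbb{1}(T_i(0) = 1) + b_i(2) \mathbb{1}(T_i(0) = 2) \\
&\qquad + b_i(1) \left(\mathbb{1}(T_i(z) = 1) - \mathbb{1}(T_i(0) = 1)\right) + b_i(2) \left(\mathbb{1}(T_i(z) = 2) - \mathbb{1}(T_i(0) = 2)\right) \big] \times \mathbb{1}(Z_i = z).
\end{aligned}
\]

Since $Z_i$ is independent of $\{Y_i(t), T_i(z) : t, z = 0, 1, \ldots, T - 1\}$, all of the terms that multiply
$\mathbb{1}(Z_i = z)$ are independent of it. It follows that for $z = 0, 1, 2$,

\[
\begin{aligned}
E \big[ & Y_i(0) - \beta_0 + b_i(1) \mathbb{1}(T_i(0) = 1) + b_i(2) \mathbb{1}(T_i(0) = 2) \\
& + b_i(1) \left(\mathbb{1}(T_i(z) = 1) - \mathbb{1}(T_i(0) = 1)\right) + b_i(2) \left(\mathbb{1}(T_i(z) = 2) - \mathbb{1}(T_i(0) = 2)\right) \big] = 0.
\end{aligned}
\]

When $z = 0$, the second line is zero; therefore

\[
E \left( Y_i(0) - \beta_0 + b_i(1) \mathbb{1}(T_i(0) = 1) + b_i(2) \mathbb{1}(T_i(0) = 2) \right) = 0.
\]

The other two equations become

\[
E \left( b_i(1) \left(\mathbb{1}(T_i(z) = 1) - \mathbb{1}(T_i(0) = 1)\right) + b_i(2) \left(\mathbb{1}(T_i(z) = 2) - \mathbb{1}(T_i(0) = 2)\right) \right) = 0
\]

for $z = 1, 2$. Remembering that $b_i(t) = Y_i(t) - Y_i(0) - \beta_t$ for $t = 1, 2$, we obtain

\[
\begin{aligned}
& E \left[ (Y_i(1) - Y_i(0)) \left(\mathbb{1}(T_i(z) = 1) - \mathbb{1}(T_i(0) = 1)\right) + (Y_i(2) - Y_i(0)) \left(\mathbb{1}(T_i(z) = 2) - \mathbb{1}(T_i(0) = 2)\right) \right] \\
&\quad = \beta_1 E\left(\mathbb{1}(T_i(z) = 1) - \mathbb{1}(T_i(0) = 1)\right) + \beta_2 E\left(\mathbb{1}(T_i(z) = 2) - \mathbb{1}(T_i(0) = 2)\right).
\end{aligned}
\]

Proposition 9 follows after noting that given Table 4,

• the variable $\mathbb{1}(T_i(z) = 1) - \mathbb{1}(T_i(0) = 1)$ is $\mathbb{1}(i \in C_1)$ for $z = 1$ and $-\mathbb{1}(i \in C_{112})$ for
$z = 2$;

• the variable $\mathbb{1}(T_i(z) = 2) - \mathbb{1}(T_i(0) = 2)$ is $\mathbb{1}(i \in C_2)$ for $z = 2$ and $-\mathbb{1}(i \in C_{212})$ for
$z = 1$.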
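
For concreteness, substituting these indicator identities into the last display gives the two-equation system below. This is a sketch reconstructed from the steps of the proof; it should agree with the statement of Proposition 9 in the main text, which is not reproduced in this appendix.

\[
\begin{aligned}
z = 1: \quad & E\left[(Y_i(1) - Y_i(0)) \mathbb{1}(i \in C_1)\right] - E\left[(Y_i(2) - Y_i(0)) \mathbb{1}(i \in C_{212})\right]
= \beta_1 \Pr(i \in C_1) - \beta_2 \Pr(i \in C_{212}), \\
z = 2: \quad & E\left[(Y_i(2) - Y_i(0)) \mathbb{1}(i \in C_2)\right] - E\left[(Y_i(1) - Y_i(0)) \mathbb{1}(i \in C_{112})\right]
= \beta_2 \Pr(i \in C_2) - \beta_1 \Pr(i \in C_{112}).
\end{aligned}
\]

Multiplying the first equation by $\Pr(i \in C_2)$, the second by $\Pr(i \in C_{212})$, and adding eliminates $\beta_2$; the coefficient on $\beta_1$ is then $\Pr(i \in C_1) \Pr(i \in C_2) - \Pr(i \in C_{112}) \Pr(i \in C_{212})$, which we take to be the quantity called $D$ in the proof of Corollary 4.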

Proof of Corollary 4. Solving the system of equations in Proposition 9 gives, after elementary
calculations,

\[
\begin{aligned}
\beta_1 D ={}& \Pr(i \in C_{212}) \left[ E\left((Y_i(2) - Y_i(0)) \mathbb{1}(i \in C_2)\right) - E\left((Y_i(1) - Y_i(0)) \mathbb{1}(i \in C_{112})\right) \right] \\
& + \Pr(i \in C_2) \left[ E\left((Y_i(1) - Y_i(0)) \mathbb{1}(i \in C_1)\right) - E\left((Y_i(2) - Y_i(0)) \mathbb{1}(i \in C_{212})\right) \right] \\
={}& \Pr(i \in C_1) \Pr(i \in C_2) E\left(Y_i(1) - Y_i(0) \mid i \in C_1\right) - \Pr(i \in C_{112}) \Pr(i \in C_{212}) E\left(Y_i(1) - Y_i(0) \mid i \in C_{112}\right) \\
& + \Pr(i \in C_{212}) \Pr(i \in C_2) \left[ E\left(Y_i(2) - Y_i(0) \mid i \in C_2\right) - E\left(Y_i(2) - Y_i(0) \mid i \in C_{212}\right) \right].
\end{aligned}
\]

The difference of treatment effects in the last line is simply $D_2$; note that it is multiplied by
a non-negative term. Suppose for instance that $D_1, D_2 \geq 0$. Then

\[
\beta_1 D \geq \Pr(i \in C_1) \Pr(i \in C_2) E\left(Y_i(1) - Y_i(0) \mid i \in C_1\right) - \Pr(i \in C_{112}) \Pr(i \in C_{212}) E\left(Y_i(1) - Y_i(0) \mid i \in C_{112}\right). \tag{C.4}
\]

Moreover, it is easy to prove the following: define $r = (\alpha a - \beta b)/(a - b)$ with $a, b \geq 0$ and
$a \neq b$. Then

1. if $(\alpha - \beta)$ and $(a - b)$ have the same sign, $r \geq \max(\alpha, \beta)$

2. if $(\alpha - \beta)$ and $(a - b)$ have different signs, $r \leq \min(\alpha, \beta)$.
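
One way to see both results (a short verification added here for completeness) is to subtract $\alpha$ and $\beta$ from $r$ in turn:

\[
r - \alpha = \frac{\alpha a - \beta b - \alpha (a - b)}{a - b} = \frac{b (\alpha - \beta)}{a - b},
\qquad
r - \beta = \frac{\alpha a - \beta b - \beta (a - b)}{a - b} = \frac{a (\alpha - \beta)}{a - b}.
\]

Since $a, b \geq 0$, both differences are either zero or have the sign of $(\alpha - \beta)/(a - b)$. They are therefore both non-negative when $(\alpha - \beta)$ and $(a - b)$ have the same sign, and both non-positive when the signs differ.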

Now take

\[
\begin{aligned}
a &= \Pr(i \in C_1) \Pr(i \in C_2), \\
b &= \Pr(i \in C_{112}) \Pr(i \in C_{212}), \\
\alpha &= E\left(Y_i(1) - Y_i(0) \mid i \in C_1\right), \\
\beta &= E\left(Y_i(1) - Y_i(0) \mid i \in C_{112}\right).
\end{aligned}
\]

Note that $a$ and $b$ are non-negative, and $a - b = D \neq 0$. Suppose that $D > 0$ so that
Equation (C.4) becomes $\beta_1 \geq r$. Since $\alpha - \beta = D_1 \geq 0$, we can apply result 1 and we get

\[
\beta_1 \geq \max(\alpha, \beta) = \alpha = E\left(Y_i(1) - Y_i(0) \mid i \in C_1\right).
\]

If on the other hand $D$ is negative, then we have $\beta_1 \leq r$ and since $D$ and $D_1$ have different
signs result 2 gives

\[
\beta_1 \leq \min(\alpha, \beta) = \beta
\]

and a fortiori $\beta_1 \leq \alpha$.
Similar arguments apply to $\beta_2$, as well as to the case when $D_1$ and $D_2$ are non-positive.

D Revisiting the 2×3 and 3×3 Models via Heckman and Pinto (2018)
D.1 Notation
We first adapt the notation of Heckman and Pinto (2018, HP hereafter) to our framework. As in
the main text, we focus on identifying the probabilities of the various response groups
$\Pr(i \in C)$ and the group average outcomes $E(Y_i(t) \mid i \in C)$. The following population
quantities are directly identified from data for all treatment values $t$:

\[
\begin{aligned}
P_Z(t) &= \left( P(T = t \mid Z = z) \right)_{z \in \mathcal{Z}}, \\
Q_Z(t) &= \left( E(Y 1(T = t) \mid Z = z) \right)_{z \in \mathcal{Z}}.
\end{aligned}
\]

We also define $P_Z = (P_Z(t))_{t \in \mathcal{T}}$.

We choose an arbitrary ordering $(C^1, \ldots, C^S)$ of the $S$ non-empty response groups and
we define the $S$ dummy variables $c_i^s = 1(i \in C^s)$. The response vector $S$ is $\{c^1, \ldots, c^S\}$.
With this notation, our main objects of interest are

\[
\begin{aligned}
P_S &= E S, \\
Q_S(t) &= E(Y(t) S) \quad \text{for } t \in \mathcal{T},
\end{aligned}
\]

from which we obtain $\Pr(i \in C^s) = P_S^s$ and $E(Y_i(t) \mid i \in C^s) = Q_S^s(t) / P_S^s$.
As in HP, $B_t$ denotes a binary matrix with dimension $|\mathcal{Z}| \times S$ whose element in row $z$
and column $s$ equals 1 if response group $C^s$ has $T_i = t$ when $Z_i = z$, and zero otherwise.
Finally, let $B$ be the binary matrix of dimension $(|\mathcal{Z}| \cdot |\mathcal{T}|) \times S$ generated by stacking the
matrices $B_t$ vertically: $B = \left[ B_0', \ldots, B_{|\mathcal{T}| - 1}' \right]'$.

D.2 Theorem T-2 in HP


Let $M^{\dagger}$ denote the Moore-Penrose pseudo-inverse of a matrix $M$. We define

\[
K_t = I_S - B_t^{\dagger} B_t \quad \text{and} \quad K = I_S - B^{\dagger} B,
\]

where $I_S$ denotes the identity matrix of dimension $S$. Note that $K$ and $K_t$ are orthogonal
projection matrices in $\mathbb{R}^S$ that only depend on the binary matrices $B$ and $B_t$. Theorem T-2
in HP shows that

\begin{align}
P_S &= B^{\dagger} P_Z + K \lambda, \tag{D.1} \\
Q_S(t) &= B_t^{\dagger} Q_Z(t) + K_t \tilde{\lambda}, \tag{D.2}
\end{align}

where $\lambda$ and $\tilde{\lambda}$ are arbitrary $S$-dimensional vectors.

D.3 Identification in the 2 × 3 model

We can now re-derive our identification results for the 2 by 3 model using the theorems
in HP. To do so, we order the response-types as $\{C_{00}, C_{11}, C_{22}, C_{01}, C_{21}\}$. Then the binary
matrices $B_0$, $B_1$, and $B_2$ are

\[
B_0 = \begin{bmatrix} 1 & 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 & 0 \end{bmatrix}, \quad
B_1 = \begin{bmatrix} 0 & 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 1 & 1 \end{bmatrix}, \quad
B_2 = \begin{bmatrix} 0 & 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 & 0 \end{bmatrix},
\]

and

\[
B = \begin{bmatrix}
1 & 0 & 0 & 1 & 0 \\
1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 & 1 \\
0 & 0 & 1 & 0 & 1 \\
0 & 0 & 1 & 0 & 0
\end{bmatrix}.
\]

It is easy to see that $B$ has full column rank; it follows that $B^{\dagger} B = I_5$ and $K$ is the 5 by 5
matrix with all elements zero. Therefore by Theorem T-2 in HP (see equation (D.1) above),
$P_S$ is point-identified as $P_S = B^{\dagger} P_Z$.
Since

\[
B^{\dagger} = \frac{1}{6} \begin{bmatrix}
1 & 5 & 1 & -1 & 1 & -1 \\
-1 & 1 & 5 & 1 & -1 & 1 \\
1 & -1 & 1 & -1 & 1 & 5 \\
4 & -4 & -2 & 2 & -2 & 2 \\
-2 & 2 & -2 & 2 & 4 & -4
\end{bmatrix},
\]

this is not very transparent, however. To derive our identification results, we use (D.2)
instead. Note that the equation $Q_S(t) = B_t^{\dagger} Q_Z(t) + K_t \tilde{\lambda}$ holds for any function of $Y(t)$. If
we take it to be the constant function equal to 1, we get $Q_S(t) = E S = P_S$ and $Q_Z(t) = P_Z(t)$,
so that (D.2) boils down to

\[
P_S = B_t^{\dagger} P_Z(t) + K_t \tilde{\lambda} \quad \text{for all values of } t. \tag{D.3}
\]


Now

\[
B_0^{\dagger} = \begin{bmatrix} 0 & 1 \\ 0 & 0 \\ 0 & 0 \\ 1 & -1 \\ 0 & 0 \end{bmatrix}
\implies
K_0 = \begin{bmatrix}
0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1
\end{bmatrix};
\]

\[
B_1^{\dagger} = \begin{bmatrix} 0 & 0 \\ 1 & 0 \\ 0 & 0 \\ -1/2 & 1/2 \\ -1/2 & 1/2 \end{bmatrix}
\implies
K_1 = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 \\
0 & 0 & 0 & 1/2 & -1/2 \\
0 & 0 & 0 & -1/2 & 1/2
\end{bmatrix};
\]

and

\[
B_2^{\dagger} = \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ 0 & 1 \\ 0 & 0 \\ 1 & -1 \end{bmatrix}
\implies
K_2 = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0
\end{bmatrix}.
\]

Let $(e_s)_{s = 1, \ldots, 5}$ denote the standard basis vectors in $\mathbb{R}^5$. If $e_s' K_t = 0$, then (D.3)
point-identifies $\Pr(i \in C^s) = e_s' P_S = e_s' B_t^{\dagger} P_Z(t)$. Clearly,

\[
e_1' K_0 = e_4' K_0 = e_2' K_1 = e_3' K_2 = e_5' K_2 = 0;
\]

this reproduces our identification results for $P_S$ in Proposition 5:

\[
P_S = \left( e_1' B_0^{\dagger} P_Z(0), \; e_2' B_1^{\dagger} P_Z(1), \; e_3' B_2^{\dagger} P_Z(2), \; e_4' B_0^{\dagger} P_Z(0), \; e_5' B_2^{\dagger} P_Z(2) \right)'.
\]
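
These matrices can be reproduced in exact rational arithmetic. The sketch below is illustrative only: it uses the formula $B_t^{\dagger} = B_t' (B_t B_t')^{-1}$, which is valid here because each $B_t$ has full row rank (it is not a general-purpose pseudo-inverse), and all function names are our own. It lists, for each $t$, the response groups $C^s$ for which row $s$ of $K_t$ vanishes.

```python
from fractions import Fraction

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def inverse(M):
    """Gauss-Jordan inverse of a small nonsingular matrix, in exact arithmetic."""
    n = len(M)
    aug = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(M)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [x / pivot_value for x in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def pinv_full_row_rank(B):
    """Pseudo-inverse B' (B B')^{-1}; only valid when B has full row rank."""
    Bt = transpose(B)
    return matmul(Bt, inverse(matmul(B, Bt)))

def projector(B):
    """K = I - B^+ B."""
    P = matmul(pinv_full_row_rank(B), B)
    return [[Fraction(int(i == j)) - P[i][j] for j in range(len(P))]
            for i in range(len(P))]

groups = ["C00", "C11", "C22", "C01", "C21"]
B = {
    0: [[1, 0, 0, 1, 0], [1, 0, 0, 0, 0]],
    1: [[0, 1, 0, 0, 0], [0, 1, 0, 1, 1]],
    2: [[0, 0, 1, 0, 1], [0, 0, 1, 0, 0]],
}

# Groups whose row of K_t is zero, i.e. whose share is pinned down by P_Z(t).
identified = {
    t: [groups[s] for s, row in enumerate(projector(B_t)) if all(x == 0 for x in row)]
    for t, B_t in B.items()
}
print(identified)
```

Exact fractions are used instead of floating point so that the test "row of $K_t$ equals zero" is not affected by rounding.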

Returning to the counterfactual outcomes $Y(t)$, the same argument results in Proposition 6:

\[
\begin{aligned}
E\left(Y_i(0) \cdot 1[i \in C_{00}]\right) &= e_1' B_0^{\dagger} Q_Z(0), \\
E\left(Y_i(1) \cdot 1[i \in C_{11}]\right) &= e_2' B_1^{\dagger} Q_Z(1), \\
E\left(Y_i(2) \cdot 1[i \in C_{22}]\right) &= e_3' B_2^{\dagger} Q_Z(2), \\
E\left(Y_i(0) \cdot 1[i \in C_{01}]\right) &= e_4' B_0^{\dagger} Q_Z(0), \\
E\left(Y_i(2) \cdot 1[i \in C_{21}]\right) &= e_5' B_2^{\dagger} Q_Z(2).
\end{aligned}
\]

We conclude that while the first part of Theorem T-2 in HP (i.e., $P_S = B^{\dagger} P_Z + K \lambda$) is
useful to determine the degrees of identification by checking the rank of $B$, it does not yield
the most constructive form of identification. To get the objects of interest, it is better to
invoke the second part of Theorem T-2 (i.e., $Q_S(t) = B_t^{\dagger} Q_Z(t) + K_t \tilde{\lambda}$). Note that since the
2 × 3 model satisfies the unordered monotonicity assumption, we could also obtain the same
results using Theorem T-6 in HP.

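D.4 Identification in the 3 × 3 model

We now turn to our 3 by 3 model. We sort the response-types as

\[
\{C_{000}, C_{111}, C_{222}, C_{010}, C_{002}, C_{012}, C_{112}, C_{212}\}.
\]

Now

\[
B_0 = \begin{bmatrix}
1 & 0 & 0 & 1 & 1 & 1 & 0 & 0 \\
1 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
1 & 0 & 0 & 1 & 0 & 0 & 0 & 0
\end{bmatrix},
\]

\[
B_1 = \begin{bmatrix}
0 & 1 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 1 & 0 & 1 & 0 & 1 & 1 & 1 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix},
\]

\[
B_2 = \begin{bmatrix}
0 & 0 & 1 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 1 & 1 & 1 & 1
\end{bmatrix}.
\]
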

Note that

\[
B_0^{\dagger} = \begin{bmatrix}
-0.25 & 0.5 & 0.5 \\
0 & 0 & 0 \\
0 & 0 & 0 \\
0.25 & -0.5 & 0.5 \\
0.25 & 0.5 & -0.5 \\
0.75 & -0.5 & -0.5 \\
0 & 0 & 0 \\
0 & 0 & 0
\end{bmatrix}
\implies
K_0 = \begin{bmatrix}
0.25 & 0 & 0 & -0.25 & -0.25 & 0.25 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
-0.25 & 0 & 0 & 0.25 & 0.25 & -0.25 & 0 & 0 \\
-0.25 & 0 & 0 & 0.25 & 0.25 & -0.25 & 0 & 0 \\
0.25 & 0 & 0 & -0.25 & -0.25 & 0.25 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1
\end{bmatrix}.
\]

Let $(e_s)_{s = 1, \ldots, 8}$ denote the standard basis vectors in $\mathbb{R}^8$. Since $K_0$ has no zero column, none of
the $e_s' K_0$ is zero and the argument in Section D.3 shows that no population share $\Pr(i \in C^s)$
is point-identified by $B_0^{\dagger} P_Z(0)$. On the other hand,

\[
B_1^{\dagger} = \begin{bmatrix}
0 & 0 & 0 \\
0 & 0 & 1 \\
0 & 0 & 0 \\
-1/3 & 1/3 & 0 \\
0 & 0 & 0 \\
-1/3 & 1/3 & 0 \\
1 & 0 & -1 \\
-1/3 & 1/3 & 0
\end{bmatrix}
\implies
K_1 = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 2/3 & 0 & -1/3 & 0 & -1/3 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & -1/3 & 0 & 2/3 & 0 & -1/3 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & -1/3 & 0 & -1/3 & 0 & 2/3
\end{bmatrix}
\]

so that $e_2' K_1 = e_7' K_1 = 0$, which point-identifies the population shares of $C_{111}$ and $C_{112}$.
Similarly,

\[
B_2^{\dagger} = \begin{bmatrix}
0 & 0 & 0 \\
0 & 0 & 0 \\
0 & 1 & 0 \\
0 & 0 & 0 \\
-1/3 & 0 & 1/3 \\
-1/3 & 0 & 1/3 \\
-1/3 & 0 & 1/3 \\
1 & -1 & 0
\end{bmatrix}
\implies
K_2 = \begin{bmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 2/3 & -1/3 & -1/3 & 0 \\
0 & 0 & 0 & 0 & -1/3 & 2/3 & -1/3 & 0 \\
0 & 0 & 0 & 0 & -1/3 & -1/3 & 2/3 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0
\end{bmatrix}
\]

and $e_3' K_2 = e_8' K_2 = 0$ so that the shares of $C_{222}$ and $C_{212}$ are point-identified. On the other
hand, the shares of $C_{010}$, $C_{002}$, and $C_{012}$ are not identified. The results in Propositions 7
and 8 follow.
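
The same conclusions can be checked without computing any pseudo-inverse. Because $K_t$ is the orthogonal projector onto the null space of $B_t$, the condition $e_s' K_t = 0$ is equivalent to $e_s$ lying in the row space of $B_t$, that is, to appending $e_s$ to $B_t$ leaving its rank unchanged. The sketch below applies this rank test; it is an illustration under that equivalence, with helper names of our own choosing, and it rebuilds each $B_t$ from the group labels.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a small matrix by exact Gaussian elimination."""
    M = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for col in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][col] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            factor = M[i][col] / M[r][col]
            M[i] = [x - factor * y for x, y in zip(M[i], M[r])]
        r += 1
    return r

groups = ["C000", "C111", "C222", "C010", "C002", "C012", "C112", "C212"]

def B_matrix(t):
    # Row z, column s equals 1 when group s takes treatment t under Z = z.
    return [[int(int(g[1 + z]) == t) for g in groups] for z in range(3)]

def identified_shares(t):
    B_t = B_matrix(t)
    base_rank = rank(B_t)
    found = []
    for s, g in enumerate(groups):
        e_s = [int(j == s) for j in range(len(groups))]
        if rank(B_t + [e_s]) == base_rank:  # e_s is in the row space of B_t
            found.append(g)
    return found

result = {t: identified_shares(t) for t in range(3)}
print(result)
```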
Finally, note that $B_0$ has the following 2 × 2 sub-matrix:

\[
\begin{array}{c|cc}
 & C_{010} & C_{002} \\ \hline
z = 1 & 0 & 1 \\
z = 2 & 1 & 0
\end{array}
\]

where we indicate the relevant columns and rows of matrix $B_0$. Given this pattern, Theorem
T-3 and Remark 6.3 in Heckman and Pinto (2018) imply that the unordered monotonicity
assumption is not satisfied for the 3×3 model. As mentioned in the main text, switching
the instrument value from 1 to 2 causes observations in $C_{010}$ to move to treatment 0, while
those in $C_{002}$ move out of treatment 0. Recall that the ARUM structure rules out “direct
two-way flows” (that is, instrument values 1 and 2, respectively, make treatments 1 and 2
more favorable for everyone). However, the 3 × 3 model allows for “indirect two-way flows”,
where treatment 0 is not targeted by either $z = 1$ or $z = 2$. Unordered monotonicity is more
restrictive than ARUM in that it rules out both direct and indirect two-way flows.

E The 3 × 3 Model of Pinto (2021)

Pinto (2021) has proposed a 3 × 3 model of the Moving to Opportunity (MTO) experiment.
Here we use our framework to identify response-group probabilities and several counterfactual
averages.
We follow the notation in Pinto (2021). Let $\mathcal{Z} = \{z_c, z_e, z_8\}$ and $\mathcal{T} = \{t_h, t_l, t_m\}$, where

• $z_c$ refers to control families, $z_e$ those who received the experimental voucher, and $z_8$
those who received Section 8 voucher;

• $t_h$ refers to families who did not move and chose high-poverty neighborhoods, $t_l$ those
who moved to low-poverty neighborhoods, and $t_m$ those who moved to medium-poverty
neighborhoods.

There are 7 response types in Pinto (2021): the three always-taker groups $C_{hhh}$, $C_{lll}$, and
$C_{mmm}$, and four complier groups:

• $C_{hlm}$: families who choose high-poverty without vouchers, low-poverty with the
experimental voucher, and medium-poverty with Section 8 vouchers (Pinto calls this group
full-compliers);

• $C_{hll}$: families who choose high-poverty without vouchers, low-poverty with either
voucher;

• $C_{mlm}$: families who choose medium-poverty without the experimental voucher,
low-poverty with it;

• $C_{hhm}$: families who choose high-poverty without Section 8 voucher, medium-poverty
with it.

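Figure 8: MTO

[Figure: the seven response groups $C_{hhh}$, $C_{hll}$, $C_{lll}$, $C_{hlm}$, $C_{hhm}$, $C_{mlm}$, and $C_{mmm}$ shown as
regions of a plane with axes $u_{il} - u_{ih}$ and $u_{im} - u_{ih}$; the points $P_c$, $P_e$, and $P_8$ are marked.]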

The seven response groups are illustrated in Figure 8 and in Table 6.

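Table 6: Response Groups in MTO

\[
\begin{array}{l|lll}
 & T_i(z) = t_h & T_i(z) = t_l & T_i(z) = t_m \\ \hline
z = z_c & C_{hhh} \cup C_{hhm} \cup C_{hlm} \cup C_{hll} & C_{lll} & C_{mmm} \cup C_{mlm} \\
z = z_e & C_{hhh} \cup C_{hhm} & C_{lll} \cup C_{mlm} \cup C_{hlm} \cup C_{hll} & C_{mmm} \\
z = z_8 & C_{hhh} & C_{lll} \cup C_{hll} & C_{mmm} \cup C_{hhm} \cup C_{hlm} \cup C_{mlm}
\end{array}
\]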

Proposition 10 (Response-group probabilities in MTO). The following probabilities are
identified:

\[
\begin{aligned}
\Pr(C_{hhh}) &= P(t_h \mid z_8), \\
\Pr(C_{lll}) &= P(t_l \mid z_c), \\
\Pr(C_{mmm}) &= P(t_m \mid z_e), \\
\Pr(C_{hhm}) &= P(t_h \mid z_e) - P(t_h \mid z_8), \\
\Pr(C_{hll}) &= P(t_l \mid z_8) - P(t_l \mid z_c), \\
\Pr(C_{mlm}) &= P(t_m \mid z_c) - P(t_m \mid z_e), \\
\Pr(C_{hlm}) &= 1 - P(t_h \mid z_e) - P(t_l \mid z_8) - P(t_m \mid z_c).
\end{aligned} \tag{E.1}
\]

The model has the following testable implications:

\[
\begin{aligned}
P(t_h \mid z_e) &\geq P(t_h \mid z_8), \\
P(t_l \mid z_8) &\geq P(t_l \mid z_c), \\
P(t_m \mid z_c) &\geq P(t_m \mid z_e), \\
1 &\geq P(t_h \mid z_e) + P(t_l \mid z_8) + P(t_m \mid z_c).
\end{aligned} \tag{E.2}
\]

The following proposition identifies a number of group average outcomes.

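Proposition 11 (Identification in MTO). The following group average outcomes are
point-identified:

\[
\begin{aligned}
E\left[Y_i(t_h) \mid i \in C_{hhh}\right] &= \frac{\bar{E}_{z_8}(t_h)}{P(t_h \mid z_8)}, \\
E\left[Y_i(t_l) \mid i \in C_{lll}\right] &= \frac{\bar{E}_{z_c}(t_l)}{P(t_l \mid z_c)}, \\
E\left[Y_i(t_m) \mid i \in C_{mmm}\right] &= \frac{\bar{E}_{z_e}(t_m)}{P(t_m \mid z_e)}, \\
E\left[Y_i(t_h) \mid i \in C_{hhm}\right] &= \frac{\bar{E}_{z_e}(t_h) - \bar{E}_{z_8}(t_h)}{P(t_h \mid z_e) - P(t_h \mid z_8)}, \\
E\left[Y_i(t_l) \mid i \in C_{hll}\right] &= \frac{\bar{E}_{z_8}(t_l) - \bar{E}_{z_c}(t_l)}{P(t_l \mid z_8) - P(t_l \mid z_c)}, \\
E\left[Y_i(t_m) \mid i \in C_{mlm}\right] &= \frac{\bar{E}_{z_c}(t_m) - \bar{E}_{z_e}(t_m)}{P(t_m \mid z_c) - P(t_m \mid z_e)}, \\
E\left[Y_i(t_h) \mid i \in C_{hll} \cup C_{hlm}\right] &= \frac{\bar{E}_{z_c}(t_h) - \bar{E}_{z_e}(t_h)}{P(t_h \mid z_c) - P(t_h \mid z_e)}, \\
E\left[Y_i(t_l) \mid i \in C_{mlm} \cup C_{hlm}\right] &= \frac{\bar{E}_{z_e}(t_l) - \bar{E}_{z_8}(t_l)}{P(t_l \mid z_e) - P(t_l \mid z_8)}, \\
E\left[Y_i(t_m) \mid i \in C_{hhm} \cup C_{hlm}\right] &= \frac{\bar{E}_{z_8}(t_m) - \bar{E}_{z_c}(t_m)}{P(t_m \mid z_8) - P(t_m \mid z_c)}.
\end{aligned}
\]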

The proofs of Propositions 10 and 11 are straightforward; we omit the details.

F The MVPF of Extending Head Start


Recall our ternary instrument setting:

• $Z = 0$ means no offer of admission to Head Start or to another preschool;

• $Z = 1$ means an offer of admission in Head Start only;

• $Z = 2$ means an offer of admission in another preschool only.

$Z = 0$ does not preclude other ways to get into $h$ or $c$, $Z = 1$ does not preclude other ways
to get into $c$, and $Z = 2$ does not preclude other ways to get into $h$.
We denote by $p(z)$ the probability that $Z = z$. We are considering an increase in $p(1)$:
more offers of admission to Head Start. As $p(1)$ increases, we also increase $p(2)$ to maintain
the number of slots in alternative preschools constant. Like Kline and Walters (2016), we
assume that this increase in $p(2)$ only brings into alternative preschools children that would
otherwise not attend preschools.

The MVPF is the ratio of the benefits $dB$ of increasing $p(1)$ by $dp(1)$ to its budgetary
costs $dC$. We have $B = (1 - \tau) p E Y$, where $p$ is the pre-tax return to expected scores, and
$\tau$ the tax rate. Hence

\[
dB = (1 - \tau) p \, dEY.
\]

The budget costs are the subsidies ($\varphi_j$ per student) to Head Start and other preschools,
minus the tax receipts:

\[
C = \varphi_h \Pr(D = h) + \varphi_c \Pr(D = c) - \tau p E Y.
\]

Therefore, since $\Pr(D = c)$ is held constant,

\[
\text{MVPF} = \frac{(1 - \tau) p \, dEY / dp(1)}{\varphi_h \, d\Pr(D = h) / dp(1) - \tau p \, dEY / dp(1)}.
\]

In order to compute the MVPF, we start by evaluating the marginal return in expected
outcomes $dEY / dp(1)$.

F.1 The Expected Change in Outcomes


Since

\[
\Pr(D = c) = \Pr(D(0) = c) + \sum_{z = 1, 2} p(z) \left( \Pr(D(z) = c) - \Pr(D(0) = c) \right),
\]

to keep it constant we must have

\[
\frac{dp(2)}{dp(1)} = \frac{\Pr(D(0) = c) - \Pr(D(1) = c)}{\Pr(D(2) = c) - \Pr(D(0) = c)}.
\]

$D(0) = c$ implies $D(2) = c$ since $Z = 2$ targets $c$. Therefore $\Pr(D(2) = c) - \Pr(D(0) = c) =
\Pr(D(2) = c, D(0) \neq c)$. Since $Z = 1$ targets $h$, $D(1) = c$ implies $D(0) = c$; and $D(0) = c$
implies that $D(1)$ can only be $c$ or $h$. This gives us

\[
\Pr(D(0) = c) - \Pr(D(1) = c) = \Pr(D(0) = c, D(1) \neq c) = \Pr(D(0) = c, D(1) = h),
\]

which is the proportion of the group $C_{ch}$ in the 2 × 3 model. Therefore

\[
\frac{dp(2)}{dp(1)} = \frac{\Pr(i \in C_{ch})}{\Pr(D(2) = c, D(0) \neq c)}.
\]

The resulting change in expected scores is

\[
dEY = dp(1) \, E\left(Y(D(1)) - Y(D(0))\right) + dp(2) \, E\left(Y(D(2)) - Y(D(0))\right).
\]

Now an offer of Head Start ($Z = 1$) can only move children to Head Start: $D(1) \neq D(0)$
implies that $D(1) = h$. As a consequence,

\[
E\left(Y(D(1)) - Y(D(0))\right) = E\left(Y(h) - Y(D(0)) \mid D(1) = h, D(0) \neq h\right) \times \Pr(D(1) = h, D(0) \neq h)
\]

and by the same argument,

\[
E\left(Y(D(2)) - Y(D(0))\right) = E\left(Y(c) - Y(D(0)) \mid D(2) = c, D(0) \neq c\right) \times \Pr(D(2) = c, D(0) \neq c).
\]

Putting things together gives

\[
\begin{aligned}
\frac{dEY}{dp(1)} &= \Pr(D(1) = h, D(0) \neq h) \times \Big( E\left(Y(h) - Y(D(0)) \mid D(1) = h, D(0) \neq h\right) \\
&\qquad\qquad + E\left(Y(c) - Y(D(0)) \mid D(2) = c, D(0) \neq c\right) \times S_c \Big) \\
&= \Pr(D(1) = h, D(0) \neq h) \times \left( \text{LATE}_h + S_c \, \text{LATE}_c \right),
\end{aligned}
\]

where

\[
S_c = \frac{\Pr(i \in C_{ch})}{\Pr(D(1) = h, D(0) \neq h)}
\]

is, as in the text of the paper, the proportion of the $h$-compliers that come from $c$.

F.2 The MVPF


We still need to compute the denominator $d\Pr(D = h) / dp(1)$. It is

\[
\left( \Pr(D(1) = h) - \Pr(D(0) = h) \right) - \frac{dp(2)}{dp(1)} \left( \Pr(D(0) = h) - \Pr(D(2) = h) \right).
\]

The first term in the difference is $\Pr(D(1) = h, D(0) \neq h)$, the proportion of $h$-compliers.
The second term equals

\[
\frac{\Pr(i \in C_{ch})}{\Pr(D(2) = c, D(0) \neq c)} \left( \Pr(D(0) = h) - \Pr(D(2) = h) \right).
\]

Since $Z = 2$ targets $c$, the difference $\Pr(D(0) = h) - \Pr(D(2) = h)$ represents the proportion
of children who would get to Head Start under $Z = 0$ and leave it when offered admission
to another preschool ($Z = 2$) as $p(1)$ increases. Since these children can only have $D(2) = c$,
our assumption rules out this group and the second term of the difference is zero.

As in Kline and Walters (2016), $\text{LATE}_c = \text{LATE}_{nc}$; we end up with

\[
\text{MVPF} = \frac{(1 - \tau) p \left( \text{LATE}_h + S_c \, \text{LATE}_{nc} \right)}{\varphi_h - \tau p \left( \text{LATE}_h + S_c \, \text{LATE}_{nc} \right)},
\]

which happens to coincide with the formula used by Kline and Walters (2016).
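
The arithmetic of this formula is straightforward to mechanize. The following sketch evaluates it for purely hypothetical inputs (none of the numbers below are estimates from the paper or from Kline and Walters); the function name and argument names are our own.

```python
from fractions import Fraction

def mvpf(late_h, late_nc, s_c, phi_h, tau, p):
    """(1 - tau) p (LATE_h + S_c LATE_nc) / (phi_h - tau p (LATE_h + S_c LATE_nc))."""
    gain = p * (late_h + s_c * late_nc)  # pre-tax return per h-complier
    return (1 - tau) * gain / (phi_h - tau * gain)

# Hypothetical inputs, chosen only to illustrate the computation.
value = mvpf(late_h=Fraction(1, 4), late_nc=Fraction(1, 2), s_c=Fraction(2, 5),
             phi_h=5000, tau=Fraction(1, 4), p=10000)
print(value)  # 27/31
```

With these inputs the pre-tax gain per $h$-complier is 4500, so the numerator is 3375 and the denominator is 5000 − 1125 = 3875.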

G An Example of Positive Codependence


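In this section, we provide details for the example that satisfies the positive codependence
assumption in Assumption 7. Recall that we have considered the example:

\[
E\left(Y_i(2) \mid u_{i0}, u_{i1}, u_{i2}\right) - E Y_i(2) = a_0 u_{i0} + a_1 u_{i1} + a_2 u_{i2},
\]

where $a_0$, $a_1$, and $a_2$ are some constants. Rewrite

\[
\begin{aligned}
E\left(Y_i(2) \mid u_{i0}, u_{i1}, u_{i2}\right) - E Y_i(2) &= (a_0 + a_1 + a_2) u_{i0} - a_1 (u_{i2} - u_{i1}) + (a_1 + a_2)(u_{i2} - u_{i0}) \\
&= (a_0 + a_1 + a_2) u_{i0} - a_1 \zeta_i + (a_1 + a_2) \xi_i \\
&= E\left(Y_i(2) \mid u_{i0}, \zeta_i, \xi_i\right) - E Y_i(2),
\end{aligned}
\]

with $\zeta_i = u_{i2} - u_{i1}$ and $\xi_i = u_{i2} - u_{i0}$. Hence,

\[
E\left(Y_i(2) \mid \zeta_i, \xi_i\right) - E Y_i(2) = (a_0 + a_1 + a_2) E\left(u_{i0} \mid \zeta_i, \xi_i\right) - a_1 \zeta_i + (a_1 + a_2) \xi_i.
\]

Also, recall that we have assumed that $(u_{i0}, u_{i1}, u_{i2})$ are jointly normal and mutually
uncorrelated with the common mean 0 and the common variance 1. Thus,

\[
\begin{pmatrix} u_{i0} \\ \zeta_i \\ \xi_i \end{pmatrix}
= \begin{pmatrix} 1 & 0 & 0 \\ 0 & -1 & 1 \\ -1 & 0 & 1 \end{pmatrix}
\begin{pmatrix} u_{i0} \\ u_{i1} \\ u_{i2} \end{pmatrix}
\sim N \left[ \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix},
\begin{pmatrix} 1 & 0 & -1 \\ 0 & 2 & 1 \\ -1 & 1 & 2 \end{pmatrix} \right],
\]

which implies that

\[
E\left(u_{i0} \mid \zeta_i, \xi_i\right)
= \begin{pmatrix} 0 & -1 \end{pmatrix}
\begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}^{-1}
\begin{pmatrix} \zeta_i \\ \xi_i \end{pmatrix}
= \frac{1}{3} \zeta_i - \frac{2}{3} \xi_i.
\]

Combining all together yields

\[
\begin{aligned}
E\left(Y_i(2) \mid \zeta_i, \xi_i\right) - E Y_i(2)
&= (a_0 + a_1 + a_2) \left( \frac{1}{3} \zeta_i - \frac{2}{3} \xi_i \right) - a_1 \zeta_i + (a_1 + a_2) \xi_i \\
&= \frac{a_2 + a_0 - 2 a_1}{3} \zeta_i + \frac{a_2 + a_1 - 2 a_0}{3} \xi_i.
\end{aligned}
\]

Thus, in this example, Assumption 7 holds if and only if

\[
a_2 + a_0 \geq 2 a_1 \quad \text{and} \quad a_2 + a_1 \geq 2 a_0.
\]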
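
The two coefficients can be double-checked numerically. The sketch below recomputes the normal projection of $u_{i0}$ on $(\zeta_i, \xi_i)$ from the covariance matrix above and then forms the coefficients on $\zeta_i$ and $\xi_i$; it is only an arithmetic check of the example, the function names are our own, and the values of $(a_0, a_1, a_2)$ used at the end are arbitrary.

```python
from fractions import Fraction

# Covariances implied by zeta = u2 - u1 and xi = u2 - u0, where (u0, u1, u2)
# are uncorrelated with unit variance.
cov_u0_with_zeta_xi = (Fraction(0), Fraction(-1))
var_zeta_xi = ((Fraction(2), Fraction(1)), (Fraction(1), Fraction(2)))

def projection_coefficients():
    """Coefficients on (zeta, xi) in E(u0 | zeta, xi) under joint normality."""
    (a, b), (c, d) = var_zeta_xi
    det = a * d - b * c
    inv = ((d / det, -b / det), (-c / det, a / det))
    return tuple(sum(cov_u0_with_zeta_xi[k] * inv[k][j] for k in range(2))
                 for j in range(2))

def outcome_coefficients(a0, a1, a2):
    """Coefficients on (zeta, xi) in E(Y(2) | zeta, xi) - E Y(2)."""
    coef_zeta, coef_xi = projection_coefficients()
    total = a0 + a1 + a2
    return (total * coef_zeta - a1, total * coef_xi + a1 + a2)

def condition_holds(a0, a1, a2):
    """The two inequalities stated above for Assumption 7 in this example."""
    return a2 + a0 >= 2 * a1 and a2 + a1 >= 2 * a0

print(projection_coefficients(), outcome_coefficients(1, 2, 4), condition_holds(1, 2, 4))
```

For any constants, the two outcome coefficients are non-negative exactly when both inequalities hold, since they equal $(a_2 + a_0 - 2a_1)/3$ and $(a_2 + a_1 - 2a_0)/3$.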
