Applications of Random Set Theory in Econometrics: Ilya Molchanov

Applications of Random Set Theory in
Econometrics
Ilya Molchanov
Department of Mathematical Statistics and Actuarial Science, University of Bern, Sidlerstrasse 5, 3012 Bern, Switzerland; email: ilya.molchanov@stat.unibe.ch
Francesca Molinari
Department of Economics, Cornell University, 458 Uris Hall, Ithaca NY 14850, U.S.A.; email: fm72@cornell.edu. Corresponding author.
Key Words: capacity functional, Aumann expectation, support function, partial identification
Abstract The econometrics literature has in recent years shown a growing interest in the
study of partially identified models, where the object of economic and statistical interest is a set
rather than a point. Characterization of this set and development of its consistent estimators
and of inference procedures with desirable properties are the main goals of partial identification
analysis. This review introduces the fundamental tools of the theory of random sets, which
brings together elements of topology, convex geometry and probability theory to develop a
coherent mathematical framework to analyze random elements whose realizations are sets. It
then elucidates how these tools have been fruitfully applied in econometrics to reach the goals of partial identification analysis.
Annu. Rev. Econ. 2014 6 1941-1383/14/0904-????
CONTENTS
INTRODUCTION
RANDOM SET THEORY REVIEW
Random Sets
Capacity Functional and Containment Functional
Selections and Artstein’s Inequality
Aumann Expectation and Support Function
Limit Theorems for Sums of Random Sets
APPLICATIONS TO IDENTIFICATION ANALYSIS
Sharp Identification Regions
Core Determining Classes
Random Sets in the Space of Unobservables
APPLICATIONS TO INFERENCE
Estimation of Level Sets
Support Function Approach
Duality Between the Level Set Approach and the Support Function Approach
Efficiency of the Support Function Approach
CONCLUSIONS
1 INTRODUCTION
Random set theory is concerned with the development of a coherent mathematical
framework to study random objects whose realizations are sets. Such objects
appeared a long time ago in statistics and econometrics in the form of confidence
regions, which can be naturally described as random sets. The first idea of a
general random set in the form of a region that depends on chance appears in
Kolmogorov (1950), originally published in 1933. A systematic development of the theory of random sets came much later. It was stimulated by the study of correspondences and nonadditive functionals in general equilibrium theory and decision theory, and by the need in image analysis, microscopy, and materials science for statistical techniques to model random sets, estimate their parameters, filter noisy images, and classify biological images. These and related applications of set-valued random variables spurred the development of statistical models for random sets, furthered the understanding of their distributions, and led to the seminal contributions of Choquet (1953/54), Aumann (1965), and Debreu (1967), and to the first self-contained treatment of the theory of random sets by Matheron (1975). Since then the theory has expanded in several directions, including its relationship with convex geometry, limit theorems for random sets, and set-valued processes. An account of the modern theory of random sets is available in Molchanov (2005).
More recently, the development within econometrics of partial identification
analysis has provided a new and natural area of application for random set theory.
Partially identified econometric models arise when the available data and maintained assumptions do not suffice to uniquely identify the statistical functional of interest, whether finite or infinite dimensional, even as data accumulate; see Tamer (2010) for a review and Manski (2003) for a systematic treatment. For this class of models, partial identification proposes that econometric analysis should study the set of values of the statistical functional that are observationally equivalent, given the available data and credible maintained assumptions; in this article, this set of values is referred to as the functional’s
sharp identification region. The goals of the analysis are to obtain a tractable characterization of the sharp identification region, to provide methods for estimating it, and to test hypotheses and make confidence statements about it.
Conceptually, partial identification entails a shift of focus from single-valued to set-valued objects, which makes it naturally suited to the use of random set theory as a mathematical framework to conduct identification analysis and statistical inference, to unify a number of special results, and to produce novel general results. The random sets approach complements the more traditional one, based on mathematical tools for (single-valued) random vectors, which has proved extremely productive since the beginning of the research program in partial identification; see, for example, Manski (1995) for results on identification, and Horowitz and Manski (2000), Imbens and Manski (2004), Chernozhukov, Hong and Tamer (2007), and Andrews and Soares (2010) for results on statistical inference.
Lack of point identification can generally be traced back to a collection of
random variables that are consistent with the available data and maintained
assumptions. In many cases, this collection of observationally equivalent random
variables is equal to the family of selections of a properly specified random closed
set, and random set theory can be applied to describe their distribution and to
derive statistical properties of estimators that rely upon them. Specific examples
that we discuss in detail in this article include interval data and finite static games
with multiple equilibria. In the first case, the random variables consistent with the data are those that lie in the interval with probability one. In the second case, the random variables consistent with the modeling assumptions are those that represent equilibria of the game.
In order to fruitfully apply random set theory for identification and inference,
the econometrician needs to carry out three fundamental steps. First, she needs to
define the random closed set that is relevant for the problem under consideration
using all information given by the available data and maintained assumptions.
This is a delicate task, but one that is typically carried out in identification analysis regardless of whether random set theory is applied. Second, she needs to determine how the observable random variables relate to this random closed set. Often, one of two cases occurs: either the observable variables determine a random set to which the (unobservable) variable of interest belongs with probability one, as in the interval data example; or the (expectation of the) (un)observable variable belongs to (the expectation of) a random set determined by the model, as in the example of games with multiple equilibria. Finally, the econometrician needs to determine which tool from random set theory should be used. To date, applications of random set theory to econometrics have fruitfully exploited Aumann expectations and their support functions, (Choquet) capacity functionals, and laws of large numbers and central limit theorems for random sets.
We begin by reviewing these basic elements of random set theory in Section 2. We then review, in Section 3, the econometrics literature that has applied them to identification analysis. Applications to statistical inference are discussed in Section 4, and Section 5 concludes.
The goal of this review is to provide a guide, from the perspective of applications in econometrics, to the study of random set theory through comprehensive textbooks such as Molchanov (2005); this goal is further developed in our book in preparation (Molchanov and Molinari 2014). In our view, random set theory could be fruitfully incorporated into Ph.D.-level field courses in econometrics on partial identification, and in microeconomics on decision theory. Important prerequisites for the study of random set theory include measure theory and probability theory; a good knowledge of convex analysis and topology is beneficial but not essential.
2 RANDOM SET THEORY REVIEW
Throughout this article, we use capital Latin letters to denote sets and random sets, and lower case Latin letters for random vectors. We denote parameter vectors and sets of parameter vectors by θ and Θ, respectively. We let $(\Omega, \mathfrak{F}, \mathbf{P})$ denote a nonatomic probability space on which all random variables and random sets are defined, where Ω is the space of elementary events, equipped with the σ-algebra $\mathfrak{F}$ and the probability measure $\mathbf{P}$. We denote the Euclidean space by $\mathbb{R}^d$ and equip it with the Euclidean norm, denoted by $\|\cdot\|$.
The theory of random closed sets generally applies to the space of closed subsets of a locally compact Hausdorff second countable topological space $\mathbb{K}$; see Molchanov (2005). Unless otherwise specified, in this article we let $\mathbb{K} = \mathbb{R}^d$ to simplify the exposition. Denote by $\mathcal{F}$, $\mathcal{G}$, and $\mathcal{K}$, respectively, the collections of closed, open, and compact subsets of $\mathbb{R}^d$. Let $\mathbb{S}^{d-1} = \{x \in \mathbb{R}^d : \|x\| = 1\}$ and $\mathbb{B}^d = \{x \in \mathbb{R}^d : \|x\| \le 1\}$ denote, respectively, the unit sphere and the unit ball in $\mathbb{R}^d$. Given a set $A \subset \mathbb{R}^d$, let conv(A) denote its convex hull.

2.1 Random Sets
The conventional theory of random sets deals with random closed sets. An advantage of this approach is that random points (i.e., random sets that are singletons) are closed, so the theory of random closed sets includes the classical case of random points or random vectors as a special case. A random closed set is a measurable map $X : \Omega \to \mathcal{F}$, where measurability is defined by specifying the family of functionals of X that are random variables. In specifying this family, one seeks a balance between weak conditions, so that there is a large class of examples of random sets, and strict conditions, so that important functionals of random sets are random variables. This trade-off results in the following definition.
Definition 2.1. A map X from a probability space $(\Omega, \mathfrak{F}, \mathbf{P})$ to $\mathcal{F}$ is called a random closed set if
$$X^{-}(K) = \{\omega : X(\omega) \cap K \neq \emptyset\} \in \mathfrak{F}$$
for each compact set $K \subset \mathbb{R}^d$.
In other words, a random closed set is a measurable map from the given probability space to the family of closed sets equipped with the σ-algebra generated by the families of closed sets $\{F \in \mathcal{F} : F \cap K \neq \emptyset\}$ for all $K \in \mathcal{K}$. A random compact set is a random closed set that is compact with probability one, so that almost all values of X are compact sets. A convex random set is defined similarly, so that X(ω) is a closed convex set for almost all ω.
Example 2.2 (Interval data). Interval data are a commonplace problem in economics and the social sciences. Let $Y = [y_L, y_U]$ be a random interval on $\mathbb{R}$, where $y_L$ and $y_U$ are two (dependent) random variables such that $y_L \le y_U$ almost surely. If $K = [a, b]$, then
$$\{Y \cap K \neq \emptyset\} = \{y_L < a,\ y_U \ge a\} \cup \{y_L \in [a, b]\} \in \mathfrak{F}$$
because $y_L$ and $y_U$ are random variables. Measurability for all compact sets $K \subset \mathbb{R}$ follows from similar arguments.
Example 2.3 (Entry game). Consider a two-player entry game as in Tamer (2003), where each player j can choose to enter ($y_j = 1$) or to stay out of the market ($y_j = 0$). Let $\varepsilon_1, \varepsilon_2$ be two random variables, and let $\theta_1 \le 0$, $\theta_2 \le 0$ be two parameters. Let players' payoffs be $\pi_j = y_j(\theta_j y_{3-j} + \varepsilon_j)$, $j = 1, 2$. Each player enters if and only if the payoff from entering, $\theta_j y_{3-j} + \varepsilon_j$, is nonnegative. Then, for given values of $\theta_1$ and $\theta_2$, the set of pure strategy Nash equilibria, denoted $Y_\theta$, is depicted in Figure 2.1 as a function of $\varepsilon_1$ and $\varepsilon_2$. The figure shows that for $(\varepsilon_1, \varepsilon_2) \notin [0, -\theta_1) \times [0, -\theta_2)$ the equilibrium of the game is unique, while for $(\varepsilon_1, \varepsilon_2) \in [0, -\theta_1) \times [0, -\theta_2)$ the game admits multiple equilibria, namely (1, 0) and (0, 1), and the corresponding realization of $Y_\theta$ has cardinality 2. An equilibrium is guaranteed to exist because we assume $\theta_1 \le 0$, $\theta_2 \le 0$.
To see that $Y_\theta$ is a random closed set, notice that in this example one can take the carrier space to be $\mathbb{K} = \{(0, 0), (1, 0), (0, 1), (1, 1)\}$, all of whose subsets are compact. Then
$$\{Y_\theta \cap K \neq \emptyset\} = \{(\varepsilon_1, \varepsilon_2) \in G_K\} \in \mathfrak{F},$$
where $G_K$ is a Borel set determined by the chosen K. For example, if $K = \{(0, 0)\}$, then $G_K = (-\infty, 0) \times (-\infty, 0)$. Measurability follows because $\varepsilon_1$ and $\varepsilon_2$ are random variables.
2.2 Capacity Functional and Containment Functional
Definition 2.1 means that X is explored by its hitting events, that is, the events where X hits a compact set K. The corresponding hitting probabilities play an important role in the theory of random sets, hence we define them formally here, together with a closely related functional.
Definition 2.4. (i) The functional $T_X : \mathcal{K} \to [0, 1]$ given by
$$T_X(K) = \mathbf{P}\{X \cap K \neq \emptyset\}, \quad K \in \mathcal{K},$$
is called the capacity (or hitting) functional of X.
(ii) The functional $C_X : \mathcal{F} \to [0, 1]$ given by
$$C_X(F) = \mathbf{P}\{X \subset F\}, \quad F \in \mathcal{F},$$
is called the containment functional of X. We write T(K) and C(F) instead of $T_X(K)$ and $C_X(F)$ where no ambiguity occurs.
The importance of the capacity functional in random set theory stems from the fact that it uniquely determines the probability distribution of a random closed set X; see Molchanov (2005, Ch. 1, Sec. 1.2). We note that the containment functional defined on the family of all closed sets $\mathcal{F}$ yields the capacity functional extended to open sets $G = F^c$ as
$$T(G) = \mathbf{P}\{X \cap G \neq \emptyset\} = 1 - \mathbf{P}\{X \subset G^c\} = 1 - C(F),$$
and then, by approximation, determines T on all compact sets. Therefore, the containment functional defined on the family of closed sets also determines the distribution of X. If X is a random compact set, the containment functional defined on the family of compact sets suffices to determine the distribution of X.
If $X = \{\xi\}$ is a random singleton with distribution $\mathbf{P}_\xi$, then $T_X(K) = \mathbf{P}\{\xi \in K\} = \mathbf{P}_\xi(K)$ and $C_X(F) = \mathbf{P}\{\xi \in F\} = \mathbf{P}_\xi(F)$, i.e., $T_X$ and $C_X$ coincide and become the probability distribution of ξ. In particular, $T_X$ and $C_X$ are then additive, so that $T_X(K_1 \cup K_2) = T_X(K_1) + T_X(K_2)$ for disjoint $K_1$ and $K_2$, and similarly for $C_X$. In general, however, $T_X$ is a subadditive functional and $C_X$ is a superadditive one. This is because when X contains more than a single point with positive probability, it might hit two disjoint sets simultaneously, so that $T_X(K_1 \cup K_2) \le T_X(K_1) + T_X(K_2)$, and it might be a subset of the union of two disjoint sets but of neither of them alone, so that $C_X(F_1 \cup F_2) \ge C_X(F_1) + C_X(F_2)$.
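Both inequalities can be seen on a finite carrier, where every subset is compact. The following minimal sketch uses a hypothetical two-outcome distribution for X, chosen only for illustration, and exact fractions to avoid rounding.

```python
from fractions import Fraction

# Hypothetical random set on the finite carrier {0, 1, 2}: (realization, probability).
DIST = [
    (frozenset({0, 1}), Fraction(1, 2)),  # X = {0, 1} with probability 1/2
    (frozenset({2}), Fraction(1, 2)),     # X = {2} with probability 1/2
]


def capacity(K):
    """T(K) = P{X ∩ K ≠ ∅}."""
    return sum((p for s, p in DIST if s & K), Fraction(0))


def containment(F):
    """C(F) = P{X ⊂ F}."""
    return sum((p for s, p in DIST if s <= F), Fraction(0))


K1, K2 = frozenset({0}), frozenset({1})
# X = {0, 1} hits both K1 and K2, so T(K1) + T(K2) counts that outcome twice.
print(capacity(K1 | K2), capacity(K1) + capacity(K2))  # 1/2 1
# X = {0, 1} lies in K1 ∪ K2 but in neither K1 nor K2 alone.
print(containment(K1 | K2), containment(K1) + containment(K2))  # 1/2 0
```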
Example 2.5 (Interval data). Consider again the random interval $Y = [y_L, y_U]$. Then $T_Y(\{a\}) = \mathbf{P}\{y_L \le a \le y_U\}$ and $T_Y([a, b]) = \mathbf{P}\{y_L < a, y_U \ge a\} + \mathbf{P}\{y_L \in [a, b]\}$. Similarly, $C_Y([a, b]) = \mathbf{P}\{y_L \ge a, y_U \le b\}$.
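A minimal Monte Carlo sketch of Example 2.5, assuming a hypothetical distribution for the endpoints ($y_L \sim N(0, 1)$ and $y_U = y_L + \mathrm{Exp}(1)$). It checks, draw by draw, that the decomposition of the hitting event counts exactly the intervals that meet [a, b], and that the empirical containment probability does not exceed the empirical hitting probability.

```python
import random

random.seed(0)
# Hypothetical data-generating process: y_L ~ N(0, 1), y_U = y_L + Exp(1).
draws = []
for _ in range(20_000):
    y_l = random.gauss(0.0, 1.0)
    draws.append((y_l, y_l + random.expovariate(1.0)))
n = len(draws)


def T_direct(a, b):
    """Empirical P{[y_L, y_U] ∩ [a, b] ≠ ∅}, i.e. P{y_L <= b, y_U >= a}."""
    return sum(yl <= b and yu >= a for yl, yu in draws) / n


def T_example(a, b):
    """Empirical version of Example 2.5: P{y_L < a, y_U >= a} + P{y_L in [a, b]}."""
    left = sum(yl < a <= yu for yl, yu in draws)
    inside = sum(a <= yl <= b for yl, _ in draws)
    return (left + inside) / n


def C_example(a, b):
    """Empirical C_Y([a, b]) = P{y_L >= a, y_U <= b}."""
    return sum(yl >= a and yu <= b for yl, yu in draws) / n


a, b = -0.5, 0.5
print(T_direct(a, b), T_example(a, b), C_example(a, b))
```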
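Example 2.6 (Entry game). Consider the set-up in Example 2.3. Then for $K = \{(0, 1)\}$ we have $T(\{(0, 1)\}) = \mathbf{P}\{\varepsilon_1 < -\theta_1, \varepsilon_2 \ge 0\}$ and $C(\{(0, 1)\}) = \mathbf{P}\{\varepsilon_1 < -\theta_1, \varepsilon_2 \ge 0\} - \mathbf{P}\{0 \le \varepsilon_1 < -\theta_1, 0 \le \varepsilon_2 < -\theta_2\}$. For $K = \{(1, 0), (0, 1)\}$ we have
$$T(\{(1, 0), (0, 1)\}) = C(\{(1, 0), (0, 1)\}) = 1 - \mathbf{P}\{\varepsilon_1 \ge -\theta_1, \varepsilon_2 \ge -\theta_2\} - \mathbf{P}\{\varepsilon_1 < 0, \varepsilon_2 < 0\}.$$
One can similarly obtain T(K) and C(K) for each $K \subset \mathbb{K}$.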
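The formulas in Example 2.6 can be checked by simulation. The sketch below enumerates the pure-strategy equilibria for each draw of $(\varepsilon_1, \varepsilon_2)$ and compares the empirical hitting and containment probabilities of $K = \{(0, 1)\}$ with empirical versions of the expressions above. The standard normal shocks and the values $\theta_1 = \theta_2 = -1$ are hypothetical choices for illustration.

```python
import itertools
import random


def equilibria(eps, theta):
    """Pure-strategy Nash equilibria Y_theta of the entry game of Example 2.3.

    Player j enters (y_j = 1) iff the payoff from entering,
    theta_j * y_rival + eps_j, is nonnegative.
    """
    eq = set()
    for y in itertools.product((0, 1), repeat=2):
        if all(y[j] == int(theta[j] * y[1 - j] + eps[j] >= 0) for j in range(2)):
            eq.add(y)
    return frozenset(eq)


random.seed(1)
THETA = (-1.0, -1.0)  # hypothetical parameter values with theta_j <= 0
draws = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(20_000)]
sets = [equilibria(e, THETA) for e in draws]
n = len(draws)

K = frozenset({(0, 1)})
T_sim = sum(bool(Y & K) for Y in sets) / n  # P{Y_theta ∩ K ≠ ∅}
C_sim = sum(Y <= K for Y in sets) / n       # P{Y_theta ⊂ K}
T_formula = sum(e1 < -THETA[0] and e2 >= 0 for e1, e2 in draws) / n
C_formula = T_formula - sum(
    0 <= e1 < -THETA[0] and 0 <= e2 < -THETA[1] for e1, e2 in draws
) / n
print(T_sim, T_formula, C_sim, C_formula)
```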
2.3 Selections and Artstein’s Inequality
Ever since the seminal work of Aumann (1965), it has been common to think of random sets as bundles of random variables, the selections of the random sets.
Definition 2.7. For any random set X, a (measurable) selection of X is a random vector x such that x(ω) ∈ X(ω) almost surely. We denote by Sel(X) the set of all selections from X.
We often call x a measurable selection in order to emphasize the fact that x is itself measurable. Recall that a random closed set is defined on the probability space $(\Omega, \mathfrak{F}, \mathbf{P})$ and, unless stated otherwise, almost surely means $\mathbf{P}$-almost surely. A random set that is empty with positive probability clearly does not have a selection, so unless stated otherwise we assume that all random sets are almost surely non-empty, which in turn guarantees the existence of measurable selections (Molchanov, 2005, Theorem 1.2.13). One can view selections as curves lying in the tube formed by the graph of the random set X, regarded as a function of ω.
Example 2.8 (Interval data). Consider again the random interval $Y = [y_L, y_U]$. Then Sel(Y) is the family of all $\mathfrak{F}$-measurable random variables y such that $y(\omega) \in [y_L(\omega), y_U(\omega)]$ almost surely.
To tie the notion of selections to more traditional approaches in econometrics, note that each selection of Y can be represented as follows. Take a random variable r such that $\mathbf{P}\{0 \le r \le 1\} = 1$, whose distribution is left unspecified and can be any probability distribution on [0, 1], and whose dependence on $(y_L, y_U)$ is also unrestricted. Let
$$y = r y_L + (1 - r) y_U.$$
Then $y \in \mathrm{Sel}(Y)$. Tamer (2010) gives this representation of the random variables in the interval $[y_L, y_U]$.
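A minimal sketch of this representation, under a hypothetical distribution for the interval. The weight r is deliberately allowed to depend on the realized interval, which is what makes every selection representable in this form.

```python
import random

random.seed(2)


def draw_interval():
    """Hypothetical random interval: y_L ~ U(0, 1), y_U = y_L + U(0, 2)."""
    y_l = random.random()
    return y_l, y_l + 2.0 * random.random()


def selection(y_l, y_u, r):
    """y = r * y_L + (1 - r) * y_U for r in [0, 1], as in Example 2.8."""
    return r * y_l + (1.0 - r) * y_u


intervals = [draw_interval() for _ in range(10_000)]
# r may depend on the interval itself: here r = 1 (pick y_L) for wide intervals
# and r ~ U(0, 1) otherwise, so y is a non-trivial selection of Y.
ys = [selection(yl, yu, 1.0 if yu - yl > 1.0 else random.random()) for yl, yu in intervals]
tol = 1e-12  # allow for floating-point rounding
print(all(yl - tol <= y <= yu + tol for y, (yl, yu) in zip(ys, intervals)))  # True
```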
Example 2.9 (Entry game). Consider the set $Y_\theta$ plotted in Figure 2.1. Let $\Omega_M = \{\omega \in \Omega : \varepsilon(\omega) \in [0, -\theta_1) \times [0, -\theta_2)\}$. Then on $\Omega \setminus \Omega_M$ all selections of $Y_\theta$ coincide, since the equilibrium is unique there. On $\Omega_M$, $Y_\theta$ contains a rich set of selections, which can be obtained as
$$y(\omega) = \begin{cases} (0, 1) & \text{if } \omega \in \Omega_1, \\ (1, 0) & \text{if } \omega \in \Omega_2, \end{cases}$$
for all measurable partitions $\Omega_1 \cup \Omega_2 = \Omega_M$.
Artstein (1983) and Norberg (1992) provide a necessary and sufficient condition that relates the distribution of the selections of a random set X to the capacity (and containment) functional of X. This is considered a fundamental result in random set theory, because it allows one to characterize the distribution of the bundles of random vectors that constitute random sets.
Theorem 2.10 (Artstein). A probability distribution µ on $\mathbb{R}^d$ is the distribution of a selection of a random closed set X in $\mathbb{R}^d$ if and only if
$$\mu(K) \le T(K) = \mathbf{P}\{X \cap K \neq \emptyset\} \tag{2.1}$$
for all compact sets $K \subset \mathbb{R}^d$. Equivalently, if and only if
$$\mu(F) \ge C(F) = \mathbf{P}\{X \subset F\} \tag{2.2}$$
for all closed sets $F \subset \mathbb{R}^d$. If X is a random compact set, it suffices to check (2.2) for compact sets F only.
Proof. Molchanov (2005, Cor. 1.4.44) and Molchanov and Molinari (2014).
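On a finite carrier every subset is compact, so Artstein's inequality can be checked by brute force over all subsets. The sketch below does this for a hypothetical random set that equals {0, 1} or {1, 2} with equal probability. The first candidate µ is the distribution of an explicit selection; the second is not selectionable.

```python
from fractions import Fraction
from itertools import chain, combinations

CARRIER = (0, 1, 2)
# Hypothetical random set: X = {0, 1} or X = {1, 2}, each with probability 1/2.
DIST = [(frozenset({0, 1}), Fraction(1, 2)), (frozenset({1, 2}), Fraction(1, 2))]


def all_subsets(points):
    """Every subset of a finite carrier (all of them are compact)."""
    return [frozenset(c) for c in chain.from_iterable(
        combinations(points, k) for k in range(len(points) + 1))]


def T(K):
    """Capacity functional T(K) = P{X ∩ K ≠ ∅}."""
    return sum((p for s, p in DIST if s & K), Fraction(0))


def is_selectionable(mu):
    """Check Artstein's inequality (2.1): mu(K) <= T(K) for every K."""
    return all(sum((mu[x] for x in K), Fraction(0)) <= T(K) for K in all_subsets(CARRIER))


mu_ok = {0: Fraction(1, 2), 1: Fraction(0), 2: Fraction(1, 2)}  # x = 0 on {0,1}, x = 2 on {1,2}
mu_bad = {0: Fraction(0), 1: Fraction(0), 2: Fraction(1)}       # would need x = 2 when X = {0,1}
print(is_selectionable(mu_ok), is_selectionable(mu_bad))  # True False
```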
It is important to note that if µ in Theorem 2.10 is the distribution of some random vector x, it is not guaranteed that x ∈ X a.s.; for example, x can be independent of X. Theorem 2.10 means that for each such µ it is possible to construct a random vector x with distribution µ that belongs to X almost surely. In other words, one couples x and X on the same probability space. The domination condition in (2.1) can thus be traced back to the ordering, or first-order stochastic dominance, concept for random variables. Two random variables x and y are stochastically ordered if $F_x(t) \ge F_y(t)$, i.e., $\mathbf{P}\{x \le t\} \ge \mathbf{P}\{y \le t\}$ for all t. In this case it is possible to find two random variables x′ and y′ distributed as x and y, respectively, such that x′ ≤ y′ a.s. A standard way to construct these random variables is to set $x' = F_x^{-1}(u)$ and $y' = F_y^{-1}(u)$, applying the inverse cumulative distribution functions to the same uniformly distributed random variable u. One then speaks of the ordered coupling of x and y. Note that the stochastic dominance condition can also be written as $\mathbf{P}\{x \in A\} \le \mathbf{P}\{y \in A\}$ for $A = [t, \infty)$ and all $t \in \mathbb{R}$. Such a set A is increasing (or upper), i.e., x ∈ A and x ≤ y imply y ∈ A. Using the probabilities of upper sets, this domination condition can be extended to any partially ordered space. In particular, this leads to the condition for the ordered coupling of random closed sets Z and X obtained by Norberg (1992). Two random closed sets Z and X can be realized on the same probability space as random sets Z′ and X′, having the same distributions as Z and X, respectively, and such that Z′ ⊂ X′ almost surely, if and only if, for every finite family of compact sets $K_1, \dots, K_n$, the probability that Z hits each of $K_1, \dots, K_n$ is dominated by the corresponding probability for X. If Z is a singleton, say Z = {x}, this condition simplifies substantially and reduces to inequality (2.1).
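The ordered coupling of two stochastically ordered random variables can be sketched with the Python standard library's `statistics.NormalDist.inv_cdf`, which evaluates the quantile function of a normal distribution. The two normal distributions are a hypothetical example; any pair with $F_x \ge F_y$ would do.

```python
import random
from statistics import NormalDist

random.seed(3)
# Hypothetical pair: x ~ N(0, 1) and y ~ N(1, 1), so F_x(t) >= F_y(t) for all t.
F_x, F_y = NormalDist(0.0, 1.0), NormalDist(1.0, 1.0)


def ordered_coupling(n):
    """Draw (x', y') = (F_x^{-1}(u), F_y^{-1}(u)) using the same uniform u."""
    pairs = []
    for _ in range(n):
        u = random.random()
        while u == 0.0:  # inv_cdf needs 0 < u < 1; random() is in [0, 1)
            u = random.random()
        pairs.append((F_x.inv_cdf(u), F_y.inv_cdf(u)))
    return pairs


pairs = ordered_coupling(10_000)
print(all(x <= y for x, y in pairs))  # True: x' <= y' on every draw
```

Each coordinate keeps its own marginal distribution, while using a common u forces the ordering on every draw.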
If (2.1) holds, then µ is called selectionable. In what follows we refer to (2.1) as Artstein's inequality. It follows immediately that $T_X$ and $C_X$ equal, respectively, the upper and lower envelopes of all probability measures that are dominated by $T_X$ (equivalently, that dominate $C_X$). Specifically, given
$$\mathcal{P}_X = \{\mu : \mu(K) \le T_X(K)\ \forall K \in \mathcal{K}\} = \{\mu : \mu(F) \ge C_X(F)\ \forall F \in \mathcal{F}\},$$
we have
$$T_X(K) = \sup\{\mu(K) : \mu \in \mathcal{P}_X\}, \quad K \in \mathcal{K},$$
$$C_X(F) = \inf\{\mu(F) : \mu \in \mathcal{P}_X\}, \quad F \in \mathcal{F};$$
see Molchanov (2005, Theorem 1.5.13). Because of this, the functionals $T_X$ and $C_X$ are also called coherent upper and lower probabilities. In general, upper and lower probabilities are defined as envelopes of families of probability measures that do not necessarily stem from a random closed set.
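In a finite example the supremum and infimum above are attained by explicit selections: one that lands in K whenever X hits K, and one that leaves F whenever X is not contained in F. The sketch below builds both greedily for a hypothetical three-outcome random set and compares them with $T_X(K)$ and $C_X(F)$.

```python
from fractions import Fraction

CARRIER = frozenset({0, 1, 2})
# Hypothetical random set on {0, 1, 2}: (realization, probability).
DIST = [
    (frozenset({0, 1}), Fraction(1, 3)),
    (frozenset({1, 2}), Fraction(1, 3)),
    (frozenset({2}), Fraction(1, 3)),
]


def T(K):
    return sum((p for s, p in DIST if s & K), Fraction(0))


def C(F):
    return sum((p for s, p in DIST if s <= F), Fraction(0))


def greedy_selection(prefer):
    """Distribution of the selection that picks a point of X ∩ prefer when possible."""
    mu = {}
    for s, p in DIST:
        pool = (s & prefer) or s  # fall back to any point of X
        x = min(pool)             # deterministic choice within the pool
        mu[x] = mu.get(x, Fraction(0)) + p
    return mu


def prob(mu, A):
    return sum((mu.get(x, Fraction(0)) for x in A), Fraction(0))


K = frozenset({1})
F = frozenset({1, 2})
mu_up = greedy_selection(K)             # hits K whenever X does: mu(K) = T(K)
mu_low = greedy_selection(CARRIER - F)  # leaves F whenever X does: mu(F) = C(F)
print(prob(mu_up, K), T(K), prob(mu_low, F), C(F))  # 2/3 2/3 2/3 2/3
```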
2.4 Aumann Expectation and Support Function
The space of closed sets is not linear, which causes substantial difficulties in defining the expectation of a random set. One approach, inspired by Aumann (1965) and pioneered by Artstein and Vitale (1975), relies on representing a random set by the family of its selections and considering the set formed by their expectations.
If X possesses at least one integrable selection, then X is called integrable. Only existence matters here: for example, a segment on the line with one endpoint equal to zero and the other equal to a Cauchy-distributed random variable is integrable, because it possesses a selection that equals zero almost surely, even though its other endpoint is not integrable. The family of all integrable selections of X is denoted by $\mathrm{Sel}^1(X)$.
Definition 2.11. The (selection or Aumann) expectation of an integrable random closed set X is given by
$$\mathbb{E}X = \mathrm{cl}\left\{\int_\Omega x \, d\mathbf{P} : x \in \mathrm{Sel}^1(X)\right\}.$$
If X is almost surely non-empty and its norm $\|X\| = \sup\{\|x\| : x \in X\}$ is an integrable random variable, then X is said to be integrably bounded, and all its selections are integrable. In this case the family of expectations of these integrable selections is already closed, and there is no need to take the additional closure required in Definition 2.11; see Molchanov (2005, Thr. 2.1.24).
The selection expectation depends on the probability space used to define X; see Molchanov (2005, Section 2.1.2). In particular, if the probability space is non-atomic and X is integrably bounded, the selection expectation $\mathbb{E}X$ is a convex set regardless of whether X itself is convex (Molchanov, 2005, Thr. 2.1.15). This convexification property of the selection expectation implies that the expectation of the closed convex hull of X equals the closed convex hull of $\mathbb{E}X$, which in turn equals $\mathbb{E}X$. It is then natural to describe the Aumann expectation through its support function, because this function traces out a convex set's boundary, so knowing the support function is equivalent to knowing the set itself; see Figure 2.2 and equation (2.3) below.
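The convexification effect can be seen with the deterministic two-point set X = {0, 1} on a non-atomic space: the selection equal to 1 on an event of probability p has expectation p, so $\mathbb{E}X = [0, 1]$ although X is not convex. In the simulation sketch below, uniform draws ω stand in for the non-atomic space; this is an illustration, not a computation of $\mathbb{E}X$.

```python
import random

random.seed(4)
# X(omega) = {0, 1} for every omega: a non-convex (two-point) random set. A draw
# omega ~ U(0, 1) stands in for the non-atomic probability space.
omegas = [random.random() for _ in range(100_000)]


def selection_mean(p):
    """Sample mean of the selection x(omega) = 1 if omega < p else 0, which lies in {0, 1}."""
    return sum(1 if w < p else 0 for w in omegas) / len(omegas)


for p in (0.0, 0.25, 0.5, 0.9, 1.0):
    print(p, round(selection_mean(p), 3))
```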
Definition 2.12. Let K be a convex set. The support function of K is
$$h_K(u) = \sup\{\langle k, u\rangle : k \in K\}, \quad u \in \mathbb{R}^d,$$
where $\langle k, u\rangle$ denotes the scalar product.
Note that the support function is finite for all u if K is bounded, and it is sublinear (positively homogeneous and subadditive) in u. Hence, it suffices to consider it for $u \in \mathbb{B}^d$ or $u \in \mathbb{S}^{d-1}$. Moreover, if K is closed, then
$$K = \bigcap_{u \in \mathbb{B}^d} \{k : \langle k, u\rangle \le h_K(u)\} = \bigcap_{u \in \mathbb{S}^{d-1}} \{k : \langle k, u\rangle \le h_K(u)\}. \tag{2.3}$$
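For a convex polygon the supremum in Definition 2.12 is attained at a vertex, so the support function is a maximum over finitely many scalar products. The sketch below evaluates it and uses a discretized version of (2.3) as a membership test. With finitely many directions the test accepts an outer approximation of K; it is exact for the unit square used here because the grid contains the square's edge normals (up to rounding, which the tolerance absorbs).

```python
import math


def support(vertices, u):
    """h_K(u) for K = conv(vertices): the max of <v, u> over the vertices."""
    return max(v[0] * u[0] + v[1] * u[1] for v in vertices)


def unit_directions(m):
    """m equally spaced points on the unit circle S^1."""
    return [(math.cos(2 * math.pi * k / m), math.sin(2 * math.pi * k / m)) for k in range(m)]


def in_set(k, vertices, m=360, tol=1e-12):
    """Discretized (2.3): k is kept iff <k, u> <= h_K(u) for m directions u."""
    return all(k[0] * u[0] + k[1] * u[1] <= support(vertices, u) + tol
               for u in unit_directions(m))


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(in_set((0.5, 0.5), square), in_set((1.2, 0.5), square))  # True False
```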
The great advantage of working with the support function of the Aumann expectation stems from the following result.
Theorem 2.13. If an integrably bounded random set X is defined on a non-atomic probability space, or if X is almost surely convex, then
$$\mathbb{E}\,h_X(u) = h_{\mathbb{E}X}(u), \quad u \in \mathbb{R}^d.$$
Proof. Molchanov (2005, Thr. 2.1.22).
This implies that one does not need to calculate the Aumann expectation directly by looking at all selections, but can simply work with the expectation of the support function of the random set.
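On the line the unit sphere is {−1, 1}, with $h_Y(1) = y_U$ and $h_Y(-1) = -y_L$ for $Y = [y_L, y_U]$, so Theorem 2.13 gives $\mathbb{E}Y = [\mathbb{E}y_L, \mathbb{E}y_U]$. A Monte Carlo sketch, assuming a hypothetical uniform design:

```python
import math
import random

random.seed(5)
# Hypothetical random interval: y_L ~ U(0, 1), y_U = y_L + U(0, 1), so EY = [0.5, 1.0].
draws = []
for _ in range(100_000):
    y_l = random.random()
    draws.append((y_l, y_l + random.random()))
n = len(draws)


def h_interval(y_l, y_u, u):
    """Support function of [y_l, y_u] at u in R: max(u * y_l, u * y_u)."""
    return max(u * y_l, u * y_u)


def mean_h(u):
    """Sample analogue of E h_Y(u)."""
    return sum(h_interval(yl, yu, u) for yl, yu in draws) / n


# With S^0 = {-1, 1}: h_{EY}(1) = max EY and h_{EY}(-1) = -min EY.
EY_hat = (-mean_h(-1.0), mean_h(1.0))
print(EY_hat)  # close to (0.5, 1.0)
# Positive homogeneity: the estimate at u = 2.5 equals the support function of EY_hat.
print(math.isclose(mean_h(2.5), h_interval(*EY_hat, 2.5), rel_tol=1e-9))  # True
```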

2.5 Limit Theorems for Sums of Random Sets
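Consider a sequence of independent and identically distributed random sets $X_i$, $i = 1, \dots, n$, where the notion of i.i.d. in this case corresponds to the requirements that
$$\mathbf{P}\{X_1 \cap K_1 \neq \emptyset, \dots, X_n \cap K_n \neq \emptyset\} = \prod_{i=1}^{n} \mathbf{P}\{X_i \cap K_i \neq \emptyset\} \quad \forall K_1, \dots, K_n \in \mathcal{K},$$
$$\mathbf{P}\{X_i \cap K \neq \emptyset\} = \mathbf{P}\{X_j \cap K \neq \emptyset\} \quad \forall i, j,\ \forall K \in \mathcal{K}.$$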
Random set theory provides laws of large numbers and central limit theorems for Minkowski sums of i.i.d. random sets that mimic the familiar ones for random vectors. Given two sets A and B in $\mathbb{R}^d$ and scalars λ, γ in $\mathbb{R}$, define the dilated set $\lambda A = \{\lambda a : a \in A\}$, and define the Minkowski sum of the sets λA and γB as $\lambda A + \gamma B = \{\lambda a + \gamma b : a \in A,\ b \in B\}$. Minkowski summation is a commutative and associative operation. Notably, however, it is not invertible: given two sets A and B, it might be impossible to find a set C such that A + C = B (this happens, for example, if A is a ball and B is a rectangle). Hence, while for random variables one expresses limit theorems through the difference between the sample average of the variables and their expectation (normalized by a growing sequence), for random sets one considers the (normalized) Hausdorff distance between the Minkowski average of the sets and their Aumann expectation, where the Hausdorff distance between two sets A and B is defined as
$$\rho_H(A, B) = \inf\{r > 0 : A \subset B^r,\ B \subset A^r\} = \max\{d_H(A, B), d_H(B, A)\},$$
where $A^r = \{x : d(x, A) \le r\}$ denotes the r-envelope of A, and
$$d_H(A, B) = \max_{a \in A} \min_{b \in B} \|a - b\|$$
denotes the directed Hausdorff distance from A to B.
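For finite point sets, the Minkowski sum and both Hausdorff distances can be computed by brute force, as in the sketch below (it relies on `math.dist`, available since Python 3.8). For infinite sets one would need to discretize them or, for convex sets, work with support functions. The example sets are hypothetical.

```python
import math
from itertools import product


def minkowski(A, B, lam=1.0, gam=1.0):
    """lam*A + gam*B for finite point sets in R^d (points are tuples)."""
    return {tuple(lam * a_i + gam * b_i for a_i, b_i in zip(a, b)) for a, b in product(A, B)}


def directed_hausdorff(A, B):
    """d_H(A, B) = max over a in A of min over b in B of ||a - b||."""
    return max(min(math.dist(a, b) for b in B) for a in A)


def hausdorff(A, B):
    """rho_H(A, B) = max{d_H(A, B), d_H(B, A)}."""
    return max(directed_hausdorff(A, B), directed_hausdorff(B, A))


A = {(0.0, 0.0), (1.0, 0.0)}
B = {(0.0, 0.0), (0.0, 1.0)}
print(sorted(minkowski(A, B)))  # the four corners of the unit square
P, Q = {(0.0,)}, {(0.0,), (1.0,)}
print(directed_hausdorff(P, Q), directed_hausdorff(Q, P), hausdorff(P, Q))  # 0.0 1.0 1.0
```

The second example shows why the directed distance alone is not a metric: it is zero from P to Q although the sets differ.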
The limit theorems rely on three key steps. First, attention is restricted to convex random sets, and the sets are represented as elements of a functional space by means of their support functions. This is useful because the support function of a Minkowski sum of sets equals the sum of their support functions, so that
$$h_{\frac{1}{n}\sum_{i=1}^n X_i}(u) = \frac{1}{n}\sum_{i=1}^n h_{X_i}(u).$$
Second, an embedding theorem of Hörmander (1954) guarantees that the space of compact convex subsets of $\mathbb{R}^d$ endowed with the Hausdorff metric can be isometrically embedded into a closed convex cone in the space of continuous functions on the unit sphere endowed with the uniform metric, so that, for compact convex sets $X_1$ and $X_2$,
$$\rho_H(X_1, X_2) = \sup_{u \in \mathbb{S}^{d-1}} |h_{X_1}(u) - h_{X_2}(u)|.$$
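For compact intervals on the line the embedding is easy to verify: the unit sphere is {−1, 1}, so the formula reduces to $\max(|a_1 - a_2|, |b_1 - b_2|)$ for $[a_1, b_1]$ and $[a_2, b_2]$. The sketch below compares it with a brute-force Hausdorff distance between fine grids on two hypothetical intervals, a discretized check that is accurate up to the grid spacing.

```python
def h(interval, u):
    """Support function of the interval [a, b] at u in R."""
    a, b = interval
    return max(u * a, u * b)


def rho_via_support(I, J):
    """sup over u in S^0 = {-1, 1} of |h_I(u) - h_J(u)|."""
    return max(abs(h(I, u) - h(J, u)) for u in (-1.0, 1.0))


def rho_brute_force(I, J, m=201):
    """Hausdorff distance between m-point grids on I and J (a discretized check)."""
    def grid(a, b):
        return [a + (b - a) * k / (m - 1) for k in range(m)]

    def directed(S, T):
        return max(min(abs(s - t) for t in T) for s in S)

    P, Q = grid(*I), grid(*J)
    return max(directed(P, Q), directed(Q, P))


I, J = (0.0, 1.0), (0.3, 2.0)
print(rho_via_support(I, J), rho_brute_force(I, J))  # both approximately 1.0
```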
Finally, the Shapley–Folkman–Starr theorem states that for any subsets $K_1, \dots, K_n$ of $\mathbb{R}^d$,
$$\rho_H\big(K_1 + \cdots + K_n,\ \mathrm{conv}(K_1 + \cdots + K_n)\big) \le \sqrt{d}\, \max_{1 \le i \le n} \|K_i\|.$$
Because $n^{-1} \max_{1 \le i \le n} \|X_i\|$ converges to zero almost surely for a sequence of i.i.d. integrably bounded random sets $X_1, \dots, X_n$, taking a Minkowski average yields asymptotic convexification. Hence, the Hausdorff distance between the Minkowski average of not necessarily convex, but integrably bounded, sets and the Minkowski average of their convex hulls converges to zero.
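The convexification effect can be seen exactly in a deterministic example. The Minkowski average of n copies of the non-convex set {0, 1} is {0, 1/n, ..., 1}, whose Hausdorff distance to its convex hull [0, 1] is 1/(2n). This is within the bound $\sqrt{d}\,\max_i \|K_i\|/n = 1/n$ that the Shapley–Folkman–Starr theorem gives for the average (here d = 1). A sketch with exact fractions:

```python
from fractions import Fraction


def minkowski_average_of_copies(points, n):
    """(A + ... + A) / n with n copies of a finite set A of reals, computed exactly."""
    total = {Fraction(0)}
    for _ in range(n):
        total = {s + a for s in total for a in points}
    return sorted(x / n for x in total)


def distance_to_hull(sorted_pts):
    """rho_H(A, conv A) for a finite A on the line: half of the largest gap."""
    gaps = [b - a for a, b in zip(sorted_pts, sorted_pts[1:])]
    return max(gaps, default=Fraction(0)) / 2


A = {Fraction(0), Fraction(1)}  # non-convex set with ||A|| = 1, in dimension d = 1
for n in (1, 2, 5, 10):
    avg = minkowski_average_of_copies(A, n)
    # Shapley-Folkman-Starr bound for the average: sqrt(d) * max ||A|| / n = 1/n.
    print(n, distance_to_hull(avg), Fraction(1, n))
```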

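Putting these steps together, and using the triangle inequality for $\rho_H$ and the identity $\mathrm{conv}(X_1 + \cdots + X_n) = \mathrm{conv}(X_1) + \cdots + \mathrm{conv}(X_n)$, one obtains
$$\left|\rho_H\Big(\frac{1}{n}\sum_{i=1}^n X_i,\ \mathbb{E}X\Big) - \rho_H\Big(\frac{1}{n}\sum_{i=1}^n \mathrm{conv}(X_i),\ \mathbb{E}X\Big)\right| \le \rho_H\Big(\frac{1}{n}\sum_{i=1}^n X_i,\ \frac{1}{n}\sum_{i=1}^n \mathrm{conv}(X_i)\Big) \le \frac{\sqrt{d}}{n}\max_{1 \le i \le n}\|X_i\|.$$
The right-hand side converges to zero almost surely; it is $O(n^{-1})$ when $\|X\|$ is bounded, and $o_p(n^{-1/2})$ when $\mathbb{E}\|X\|^2 < \infty$.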
Hence, a law of large numbers and a central limit theorem for random continuous functions on the unit sphere (the average of i.i.d. support functions minus their expectation), together with Hörmander's embedding theorem, which converts them into results for the Hausdorff distance between Minkowski averages of convex random sets and their Aumann expectation, and with the Shapley–Folkman–Starr theorem, which allows one to lift the requirement that the sets be convex, yield the following results.
Theorem 2.14 (Law of large numbers for random sets). Let $X, X_1, X_2, \dots$ be i.i.d. integrably bounded random closed sets in $\mathbb{R}^d$. Then
$$\rho_H\Big(\frac{X_1 + \cdots + X_n}{n},\ \mathbb{E}X\Big) \to 0 \quad \text{a.s. as } n \to \infty.$$
Proof. Molchanov (2005, Thr. 3.1.6).
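For random intervals the Minkowski average is the interval of averaged endpoints, so Theorem 2.14 can be watched directly. The sketch assumes a hypothetical design with $y_L \sim U(0, 1)$ and $y_U = y_L + U(0, 1)$, for which $\mathbb{E}X = [0.5, 1]$.

```python
import random

random.seed(6)
EY = (0.5, 1.0)  # Aumann expectation of the hypothetical random interval below


def draw():
    """Hypothetical random interval: y_L ~ U(0, 1), y_U = y_L + U(0, 1)."""
    y_l = random.random()
    return y_l, y_l + random.random()


def minkowski_average(intervals):
    """(X_1 + ... + X_n) / n for intervals: [mean of left ends, mean of right ends]."""
    n = len(intervals)
    return sum(a for a, _ in intervals) / n, sum(b for _, b in intervals) / n


def hausdorff_interval(I, J):
    """rho_H between two compact intervals on the line."""
    return max(abs(I[0] - J[0]), abs(I[1] - J[1]))


for n in (10, 1_000, 100_000):
    print(n, hausdorff_interval(minkowski_average([draw() for _ in range(n)]), EY))
```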
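Theorem 2.15 (Central limit theorem for random sets). Let $X, X_1, X_2, \dots$ be i.i.d. random closed sets in $\mathbb{R}^d$ such that $\mathbb{E}\|X\|^2 < \infty$. Then
$$\sqrt{n}\, \rho_H\Big(\frac{X_1 + \cdots + X_n}{n},\ \mathbb{E}X\Big) \xrightarrow{d} \sup\{|\zeta(u)| : u \in \mathbb{S}^{d-1}\} \quad \text{as } n \to \infty,$$
where $\{\zeta(u), u \in \mathbb{S}^{d-1}\}$ is a centered, sample-continuous Gaussian random function on $\mathbb{S}^{d-1}$ with covariance $\mathbb{E}[\zeta(u)\zeta(v)] = \mathbb{E}[h_X(u)h_X(v)] - \mathbb{E}[h_X(u)]\,\mathbb{E}[h_X(v)]$.
Proof. Molchanov (2005, Thr. 3.2.1).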
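Take a hypothetical random interval with $y_L \sim U(0, 1)$ and $y_U = y_L + U(0, 1)$. Then $h_X(1) = y_U$ and $h_X(-1) = -y_L$, so the limit in Theorem 2.15 is $\max(|\zeta(1)|, |\zeta(-1)|)$ with $\mathrm{Var}\,\zeta(1) = 1/6$, $\mathrm{Var}\,\zeta(-1) = 1/12$ and $\mathrm{Cov}(\zeta(1), \zeta(-1)) = -1/12$. The sketch compares the Monte Carlo mean of $\sqrt{n}\,\rho_H$ with that of this Gaussian limit; it is a sanity check, not a formal test.

```python
import math
import random
from statistics import fmean

random.seed(7)
EY = (0.5, 1.0)  # Aumann expectation of the hypothetical interval below


def scaled_distance(n):
    """sqrt(n) * rho_H(Minkowski average of n random intervals, EY)."""
    sum_l = sum_u = 0.0
    for _ in range(n):
        y_l = random.random()        # y_L ~ U(0, 1)
        y_u = y_l + random.random()  # y_U = y_L + U(0, 1)
        sum_l += y_l
        sum_u += y_u
    return math.sqrt(n) * max(abs(sum_l / n - EY[0]), abs(sum_u / n - EY[1]))


def limit_draw():
    """One draw of max(|zeta(1)|, |zeta(-1)|) with the covariance stated above."""
    a = random.gauss(0.0, math.sqrt(1 / 12))  # centered y_L part
    b = random.gauss(0.0, math.sqrt(1 / 12))  # centered U(0, 1) increment
    return max(abs(a + b), abs(a))  # zeta(1) = a + b, zeta(-1) = -a


sims = [scaled_distance(400) for _ in range(1_000)]
limit = [limit_draw() for _ in range(20_000)]
print(round(fmean(sims), 3), round(fmean(limit), 3))
```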

3 APPLICATIONS TO IDENTIFICATION ANALYSIS
Identification analysis entails the study of what can be learned about a param-
eter of interest, given the available data and maintained modeling assumptions.
Within the partial identification paradigm, the goal is to characterize the sharp identification region, denoted $\Theta_I$ in what follows. This region exhausts all the available information, given the sampling process and the maintained modeling assumptions. Although it is sometimes easy to characterize $\Theta_I$, there exist many important problems in which a tractable characterization is difficult to obtain. It may be particularly difficult to prove sharpness, that is, to show that a conjectured region contains exactly the feasible parameter values and no others. Proving sharpness is important. If a conjectured region is not sharp, some parameter values in it are actually inconsistent with the sampling process and the maintained assumptions, and hence cannot have generated the observed data. Failure to eliminate such values weakens the model's ability to make useful predictions. It also weakens the researcher's ability to achieve point identification when it is attainable, as well as to test for model misspecification. This is true both in the context of structural analysis and in the context of reduced form analysis.
3.1 Sharp Identification Regions
Tractable characterizations of sharp identification regions have been provided in several contexts using standard tools of probability theory; see, among others, Manski (1989, 2003), Manski and Tamer (2002), and Molinari (2008). Beresteanu, Molchanov and Molinari (2011, BMM henceforth) show how to apply random set theory to yield a unified method for characterizing $\Theta_I$, including in some important settings where other approaches are less tractable. Their approach rests on the fact that in many partially identified models, the information in the data and assumptions can be expressed as requiring either that (i) a random vector belongs to a random set with probability one, or that (ii) the conditional expectation of a random vector belongs to the conditional Aumann expectation of a random set, almost surely with respect to the restriction of $\mathbf{P}$ to the conditioning σ-algebra. This immediately allows for characterizations of the elements of $\Theta_I$ through Artstein's inequality and through the support function dominance condition, respectively. We illustrate these ideas using two simple examples.
Example 3.1 (Best linear prediction with interval outcomes and covariates).
Suppose the researcher is interested in best linear prediction (BLP) of y given
x, but only observes random intervals Y = [yL , yU ] and X = [xL , xU ] such that
P{yL ≤ y ≤ yU , xL ≤ x ≤ xU } = 1. Earlier on, Horowitz, Manski, Ponomareva
and Stoye (2003, HMPS henceforth) studied this problem and provided a char-
acterization of the sharp identification region of each component of the vector θ.
The computational complexity of the problem in the HMPS formulation, however,
grows with the number of points in the support of the outcome and covariate variables, and becomes essentially infeasible if these variables are continuous, unless
one discretizes their support quite coarsely. We show here that the random sets
approach yields a characterization of ΘI that remains computationally feasible
regardless of the support of outcome and covariate variables.
Suppose X and Y are integrably bounded. Then one can obtain ΘI as the
collection of θ’s such that there are selections (x̃, ỹ) ∈ Sel(X × Y ) and associated
prediction errors ε(θ) = ỹ − θ1 − θ2 x̃, satisfying Eε(θ) = 0 and E(ε(θ)x̃) = 0.

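Hence we build the set
$$Q_\theta = \left\{ q = \begin{pmatrix} \tilde y - \theta_1 - \theta_2 \tilde x \\ (\tilde y - \theta_1 - \theta_2 \tilde x)\,\tilde x \end{pmatrix} : (\tilde x, \tilde y) \in \mathrm{Sel}(X \times Y) \right\}.$$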
We remark that Qθ is not necessarily convex.
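For given θ we can have a mean-zero prediction error uncorrelated with its
associated selection x̃ if and only if the zero vector belongs to EQθ. Convexity
of EQθ and equation (2.3) yield
$$0 \in \mathbb{E}Q_\theta \iff \langle 0, u\rangle \le h_{\mathbb{E}Q_\theta}(u) \quad \forall\, u \in B^d.$$
Using Theorem 2.13 we obtain
$$\Theta_I = \left\{\theta : 0 \le \mathbb{E}\big(h_{Q_\theta}(u)\big)\ \ \forall\, u \in B^d\right\} = \left\{\theta : \max_{u \in B^d} \big(-\mathbb{E}(h_{Q_\theta}(u))\big) = 0\right\}, \qquad (3.1)$$
where
$$h_{Q_\theta}(u) = \max_{\tilde y \in Y,\ \tilde x \in X} \left[u_1(\tilde y - \theta_1 - \theta_2 \tilde x) + u_2(\tilde y \tilde x - \theta_1 \tilde x - \theta_2 \tilde x^2)\right]$$
is an easy-to-calculate, continuous, sublinear (hence convex) function of u, regardless
of whether the variables involved are continuous or discrete.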
The optimization problem in (3.1), which determines whether θ ∈ ΘI, is a convex program,
hence easy to solve. See for example the CVX software by Grant and Boyd (2010).
It should be noted, however, that the set ΘI itself is not necessarily convex. One
then has to scan the parameter space to trace out ΘI . Ciliberto and Tamer (2009)
and Bar and Molinari (2013) propose methods to conduct this task. Projections
of ΘI on each of its components can be obtained using the support function of
this set, as shown in Kaido, Molinari and Stoye (2013).
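To make (3.1) concrete, here is a minimal Python sketch, assuming NumPy, that evaluates its sample analog for a single candidate θ when both y and x are interval-valued. It replaces the expectation by a sample mean and, instead of solving the convex program, scans a fine grid of directions on the unit circle. Restricting u to the circle loses nothing because $h_{Q_\theta}$ is positively homogeneous, but the grid makes the check approximate. The function names, grid size, tolerance, and data are illustrative choices, not part of BMM's method. For each observation the objective is linear in ỹ, so an endpoint of [yL, yU] is optimal, and for fixed ỹ it is a quadratic in x̃ whose maximum over [xL, xU] lies at an endpoint or at the stationary point.

```python
import numpy as np


def support_Q(u, theta, xL, xU, yL, yU):
    """h_{Q_theta}(u) for each observation, where
    Q_theta = {(y - t1 - t2*x, (y - t1 - t2*x)*x) : x in [xL, xU], y in [yL, yU]}."""
    u1, u2 = u
    t1, t2 = theta
    best = np.full(np.shape(xL), -np.inf)
    for y in (yL, yU):  # the objective is linear in y, so an endpoint is optimal
        # For fixed y the objective is the quadratic a*x**2 + b*x + c in x.
        a = -u2 * t2
        b = u2 * y - u1 * t2 - u2 * t1
        c = u1 * (y - t1)
        candidates = [xL, xU]
        if a != 0:
            # Stationary point of the quadratic, projected onto [xL, xU].
            candidates.append(np.clip(-b / (2 * a), xL, xU))
        for x in candidates:
            best = np.maximum(best, a * x**2 + b * x + c)
    return best


def in_sample_identified_set(theta, xL, xU, yL, yU, n_dirs=720, tol=1e-9):
    """Approximate check of min_u mean(h_Q(u)) >= 0 over a grid of unit directions.
    Illustrative only: the grid approximates the maximization over u in (3.1)."""
    for ang in np.linspace(0.0, 2.0 * np.pi, n_dirs, endpoint=False):
        u = (np.cos(ang), np.sin(ang))
        if support_Q(u, theta, xL, xU, yL, yU).mean() < -tol:
            return False
    return True


# Hypothetical data: y = 1 + 2x exactly, observed as intervals of half-width 0.5.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 1.0 + 2.0 * x
print(in_sample_identified_set((1.0, 2.0), x, x, y - 0.5, y + 0.5))  # True
print(in_sample_identified_set((0.0, 0.0), x, x, y - 0.5, y + 0.5))  # False
```

With θ = (0, 0) every selection has a strictly positive mean prediction error, so the check fails. Scanning a grid of θ values with a function of this kind traces out an estimate of ΘI, as discussed above.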
Example 3.2 (Entry game). Consider the set-up in Example 2.3. Assume we
observe data that identifies Py, the multinomial distribution of outcomes of the
game, and that the distribution of ε is known up to a finite dimensional parameter vector that is part of θ. Earlier on, Tamer (2003), Berry and Tamer (2007)
and Ciliberto and Tamer (2009) studied this problem, and provided an abstract
characterization of ΘI based on augmenting the model with an unrestricted selection mechanism that picks the equilibrium played in the region of multiplicity.
The selection mechanism is a rather general random function, which BMM later
showed generates all possible selections of the random set of equilibria. Because the
selection mechanism may constitute an infinite dimensional nuisance parameter,
dealing with it directly creates great difficulties for the computation of ΘI and for
inference. Random set theory yields a complementary approach through which
one can characterize ΘI while avoiding altogether the need to deal with the selection
mechanism. The resulting characterization is computationally tractable, can be
directly linked to existing inference methods (e.g., Andrews and Shi (2013)),
and is in the spirit of the earlier literature in partial identification that provided
tractable characterizations of sharp identification regions without making any
reference to the selection mechanism (see, e.g., Manski (2003) and Manski and
Tamer (2002)). To build intuition for how this characterization is possible, we
recall that Theorem 2.10 (Artstein’s inequality) and Theorem 2.13 (Aumann ex-
pectation and support function) allow us to characterize the distribution and
the expectation of each selection of a random set, without having to build such
selections directly.
In our simple example, if the model is correctly specified and the observed out-
comes result from pure strategy Nash play, then a candidate θ can have generated
the observed outcomes if and only if y ∈ Sel(Yθ). An immediate application of
Artstein's Theorem yields
$$\Theta_I = \{\theta : \mathbf{P}\{y \in K\} \le T_{Y_\theta}(K)\ \ \forall\, K \subset \mathcal{K}\},$$
which shows that one can verify whether θ ∈ ΘI by checking a finite number of
moment inequalities, specifically $2^n - 2$, with n the cardinality of $\mathcal{K}$. This can
potentially be a large number, but in Section 3.2 below we show that econometric
applications of random set theory similar in spirit to, but much more complex than,
the example considered here motivated econometricians to find effective ways to
substantially reduce the number of test sets K over which to check the dominance
condition.
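For a finite outcome space the Artstein check can be carried out by brute force. The Python sketch below is one way to do it, under the assumption that, for a fixed θ, the distribution of the random set Yθ is available as a finite list of realizations with probabilities. The probabilities used here are hypothetical placeholders, not values implied by the model of Example 2.3. The sketch enumerates the $2^4 - 2 = 14$ nonempty proper subsets K and compares P{y ∈ K} with $T_{Y_\theta}(K) = \mathbf{P}\{Y_\theta \cap K \neq \emptyset\}$.

```python
from itertools import combinations

# Outcome space K of the two-player entry game.
OUTCOMES = [(0, 0), (1, 0), (0, 1), (1, 1)]


def capacity(K, set_dist):
    """T(K) = P{Y meets K}; set_dist maps each realization (a frozenset) to its probability."""
    return sum(p for realization, p in set_dist.items() if realization & K)


def artstein_check(p_y, set_dist, tol=1e-12):
    """True iff P{y in K} <= T(K) for all 2**n - 2 nonempty proper subsets K."""
    for size in range(1, len(OUTCOMES)):
        for K in map(frozenset, combinations(OUTCOMES, size)):
            if sum(p_y[a] for a in K) > capacity(K, set_dist) + tol:
                return False
    return True


# Hypothetical distribution of Y_theta for one fixed theta; multiplicity only
# between (1, 0) and (0, 1), as in the entry game.
set_dist = {
    frozenset({(0, 0)}): 0.3,
    frozenset({(1, 1)}): 0.2,
    frozenset({(1, 0)}): 0.1,
    frozenset({(0, 1)}): 0.1,
    frozenset({(1, 0), (0, 1)}): 0.3,
}
p_ok = {(0, 0): 0.3, (1, 0): 0.25, (0, 1): 0.25, (1, 1): 0.2}
p_bad = {(0, 0): 0.3, (1, 0): 0.45, (0, 1): 0.05, (1, 1): 0.2}
print(artstein_check(p_ok, set_dist))   # True
print(artstein_check(p_bad, set_dist))  # False: P{y = (1, 0)} = 0.45 > T({(1, 0)}) = 0.4
```

The empty set and the whole outcome space are skipped because the corresponding inequalities hold trivially.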
BMM show that ΘI can be equivalently characterized in terms of Aumann ex-
pectation and support function, observing that if the model is correctly specified,
the multinomial distribution Py observed in the data should belong to the collec-
tion of multinomial distributions associated with each selection of Yθ . Recalling
the simple fact that the probability mass function of a discrete random variable
is equal to the expectation of properly defined indicator functions, the collection
of multinomial distributions associated with each selection of Yθ can be expressed
as an Aumann expectation. Specifically, define the set

   
 
$$Q_\theta = \left\{ q : q = \begin{pmatrix} (1-\tilde y_1)(1-\tilde y_2) \\ \tilde y_1 (1-\tilde y_2) \\ (1-\tilde y_1)\,\tilde y_2 \\ \tilde y_1 \tilde y_2 \end{pmatrix},\ \tilde y \in \mathrm{Sel}(Y_\theta) \right\}.$$
Then one can equivalently write
$$\begin{aligned}
\Theta_I &= \{\theta : P_y \in \mathbb{E}(Q_\theta)\} \\
&= \{\theta : \langle P_y, u\rangle \le h_{\mathbb{E}(Q_\theta)}(u)\ \ \forall\, u \in B^d\} \\
&= \{\theta : \langle P_y, u\rangle \le \mathbb{E}(h_{Q_\theta}(u))\ \ \forall\, u \in B^d\} \\
&= \{\theta : \max_{u \in B^d} \big[\langle P_y, u\rangle - \mathbb{E}(h_{Q_\theta}(u))\big] = 0\},
\end{aligned}$$
where the second line follows from equation (2.3), the third line follows from
Theorem 2.13, and the last line is an algebraic manipulation. The maximization
problem in the last line is a convex optimization problem, and solving it is computationally
easy. For example, Boyd and Vandenberghe (2004, p. 8) write: “We can easily
solve [convex] problems with hundreds of variables and thousands of constraints
on a current desktop computer, in at most a few tens of seconds”.
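The link between the two characterizations can be seen on the same kind of finite example. Each realization of Qθ is the set of unit vectors $e_y$ with y in the corresponding realization $Y_k$ of Yθ, so $\mathbb{E}(h_{Q_\theta}(u)) = \sum_k p_k \max_{y \in Y_k} u_y$. In the Python sketch below, which assumes NumPy and a hypothetical finitely supported distribution for Yθ, evaluating the criterion $\langle P_y, u\rangle - \mathbb{E}(h_{Q_\theta}(u))$ at the indicator direction $u = 1_K$ returns $\mathbf{P}\{y \in K\} - T_{Y_\theta}(K)$, and at $u = -1_K$ it returns $C_{Y_\theta}(K) - \mathbf{P}\{y \in K\}$. The support function condition therefore nests the capacity and containment inequalities. The random directions are for illustration only and do not replace solving the convex program.

```python
import numpy as np

# Coordinate order matches the vector q in the definition of Q_theta.
OUTCOMES = [(0, 0), (1, 0), (0, 1), (1, 1)]
INDEX = {a: i for i, a in enumerate(OUTCOMES)}


def expected_support(u, set_dist):
    """E h_{Q_theta}(u) = sum_k p_k * max_{y in Y_k} u_y, because each realization of
    Q_theta is the set of unit vectors e_y with y in the realization Y_k of Y_theta."""
    return sum(p * max(u[INDEX[a]] for a in realization)
               for realization, p in set_dist.items())


def criterion(u, p_y, set_dist):
    """<P_y, u> - E h_{Q_theta}(u); theta is in Theta_I iff this is <= 0 for every u."""
    P = np.array([p_y[a] for a in OUTCOMES])
    return float(P @ u) - expected_support(u, set_dist)


# Hypothetical distribution of Y_theta for one fixed theta.
set_dist = {
    frozenset({(0, 0)}): 0.3,
    frozenset({(1, 1)}): 0.2,
    frozenset({(1, 0)}): 0.1,
    frozenset({(0, 1)}): 0.1,
    frozenset({(1, 0), (0, 1)}): 0.3,
}
p_ok = {(0, 0): 0.3, (1, 0): 0.25, (0, 1): 0.25, (1, 1): 0.2}
p_bad = {(0, 0): 0.3, (1, 0): 0.45, (0, 1): 0.05, (1, 1): 0.2}

ind = np.array([0.0, 1.0, 0.0, 0.0])     # indicator of K = {(1, 0)}
print(criterion(ind, p_bad, set_dist))   # about 0.05 = P{y in K} - T(K) > 0
print(criterion(-ind, p_ok, set_dist))   # about -0.15 = C(K) - P{y in K}
rng = np.random.default_rng(0)
print(max(criterion(u, p_ok, set_dist) for u in rng.normal(size=(1000, 4))) <= 1e-12)  # True
```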
The Aumann-expectation-based characterization also applies easily when outcomes of the game result from mixed strategy Nash play or from other solution
concepts, by replacing the set Qθ with one collecting the multinomial distributions over outcomes of the game associated with each equilibrium mixed strategy.
However, no result to date formally establishes a characterization for
these models based on Artstein's inequalities.
BMM establish the validity of their Aumann-expectation-based approach for a
general class of econometric models, which they call “models with convex moment
predictions.” A detailed discussion of these models goes beyond the scope of
this review; our two preceding examples, however, illustrate the key features
of the random sets approach to obtaining tractable characterizations of sharp
identification regions.
In important complementary work, Galichon and Henry (2011) use the charac-
terization of ΘI based on Artstein’s inequality to study finite games of complete
information with multiple pure strategy Nash equilibria. They show that further computational simplifications can be obtained by bringing to bear different
mathematical tools from optimal transportation theory. A discussion of this theory is beyond the scope of this review.
3.2 Core Determining Classes
Artstein's inequalities (2.1) characterize distributions of selections through a
potentially large system of inequalities indexed by all compact subsets of the
carrier space. As discussed in the previous section, however, econometric applications of random set theory call for computationally tractable characterizations
of ΘI. This motivated the study of a reduced family of test sets that still suffices
to check for selectionability of a distribution. Such a reduced family of sets was
formally defined by Galichon and Henry (2006, 2011), who then implemented it in
the context of incomplete models that satisfy a monotonicity requirement. Here
we present a definition which makes use of the containment functional; a similar
definition can be given using the capacity functional.
Definition 3.3. A family of compact sets M is said to be a core determining class
for a random closed set X if any probability measure µ satisfying the inequalities
$$\mu(F) \ge C(F) = \mathbf{P}\{X \subset F\} \qquad (3.2)$$
for all F ∈ M is the distribution of a selection of X, and so (3.2) holds for all
closed sets F.
It is easy to show that a core determining class M is also distribution determining, i.e., the values of the containment functional on M uniquely determine
the distribution of the random closed set X (distribution determining classes
in random set theory correspond to the similar concept for random variables in
probability theory). However, distribution determining classes are not necessarily
core determining.
A rather easy and general core determining class is obtained as a subfamily of
all compact sets that is dense in a certain sense. For instance, in the Euclidean
space, it suffices to consider compact sets obtained as finite unions of closed balls
with rational centers and radii.
For a further reduction one should impose additional restrictions on the family
of realizations of X. Assume that X is almost surely convex. It is known that
the containment functional CX (F ), F ∈ F, uniquely determines the distribution
of X. Since a convex set X “fits inside” a convex set F, it would be natural to
expect that the probabilities CX(F) = P{X ⊂ F} for all convex closed
sets F uniquely determine the distribution of X. This is, however, the case only
if X is almost surely compact, see Molchanov (2005, Thr. 1.7.8). Even in this
case, however, the family of all convex compact sets is not a core determining
class when the random sets are of dimension greater than 1.
In some cases, most importantly for random sets being intervals on the line, it
is useful to note that X ⊂ F if and only if X ⊂ $F_X$, where
$$F_X = \bigcup_{\omega \in \Omega_0,\ X(\omega) \subset F} X(\omega)$$
for any set Ω0 of full probability. Thus, µ is the distribution of a selection of X
if and only if µ($F_X$) ≥ P{X ⊂ $F_X$} for all closed sets F.
Example 3.4 (Random interval). Let Y = [yL , yU ] be a bounded random interval
in the real line. In this case, it is useful to characterize selections by the inequalities (2.2) involving the containment functional of Y. Then µ is the distribution
of a selection of Y if and only if
$$\mu([a, b]) \ge \mathbf{P}\{Y \subset [a, b]\} = \mathbf{P}\{a \le y_L,\ y_U \le b\}$$
for all segments [a, b] ⊂ R.
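Example 3.4 can be checked numerically when both the random interval and the candidate µ are discrete. The Python sketch below, assuming NumPy and hypothetical data, uses the same idea as the set $F_X$ above: shrinking a segment [a, b] to the smallest segment that still contains the same realizations of Y leaves P{Y ⊂ [a, b]} unchanged and can only decrease µ([a, b]). It therefore suffices to let a range over the left endpoints and b over the right endpoints of the realizations.

```python
import numpy as np


def is_selection_of_interval(mu_atoms, mu_probs, yL, yU, probs, tol=1e-12):
    """Check mu([a, b]) >= P{a <= yL, yU <= b} for a discrete mu and a random interval
    with finitely many realizations [yL_k, yU_k] of probability probs_k.

    Only a in {yL_k} and b in {yU_k} are tried: shrinking [a, b] to the smallest
    interval containing the same realizations keeps the right-hand side fixed and
    can only lower the left-hand side.
    """
    mu_atoms, mu_probs = np.asarray(mu_atoms), np.asarray(mu_probs)
    yL, yU, probs = map(np.asarray, (yL, yU, probs))
    for a in np.unique(yL):
        for b in np.unique(yU):
            if b < a:
                continue
            lhs = mu_probs[(mu_atoms >= a) & (mu_atoms <= b)].sum()
            rhs = probs[(yL >= a) & (yU <= b)].sum()
            if lhs < rhs - tol:
                return False
    return True


# Hypothetical random interval: Y = [0, 1] or [2, 3], each with probability 0.5.
yL, yU, probs = [0.0, 2.0], [1.0, 3.0], [0.5, 0.5]
print(is_selection_of_interval([0.5, 2.5], [0.5, 0.5], yL, yU, probs))  # True
print(is_selection_of_interval([1.5], [1.0], yL, yU, probs))            # False
```

A point mass at 1.5 fails because µ([0, 1]) = 0 while P{Y ⊂ [0, 1]} = 0.5.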
Sometimes it is possible to partition the whole space of elementary events Ω
into several subsets such that the values of X on ω’s from disjoint subsets are
disjoint.
Theorem 3.5. Consider a partition of Ω into sets Ωi , 1 ≤ i ≤ N of positive
probability, where N may be infinite. Let Ki = ∪{X(ω) : ω ∈ Ωi } denote the
range of X(ω) for ω ∈ Ωi. Assume that the sets Ki, i ≥ 1, are disjoint. Then it suffices
to check (2.1) only for those K for which K ⊂ Ki for some i ∈ {1, . . . , N}.
Proof. Molchanov and Molinari (2014).
This result may yield a significantly reduced core determining class.
Example 3.6 (Entry game). Consider the set-up in Example 2.3. We have
shown before that Artstein's Theorem yields
$$\Theta_I = \{\theta : \mathbf{P}\{y \in K\} \le T_{Y_\theta}(K)\ \ \forall\, K \subset \mathcal{K}\}.$$
This amounts to $2^n - 2$ inequalities to check, with n the cardinality of $\mathcal{K}$ (here n = 4, so 14 inequalities). An application of Theorem 3.5, however, yields that it suffices to check
eight inequalities involving singleton sets K = {a}:
$$\Theta_I = \{\theta : C_{Y_\theta}(\{a\}) \le \mathbf{P}\{y = a\} \le T_{Y_\theta}(\{a\})\ \ \forall\, a \in \mathcal{K}\}.$$
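The reduction in Example 3.6 can be compared with the full enumeration on a small case. In the hypothetical distribution below, multiplicity arises only between (1, 0) and (0, 1), as in the entry game, so the realizations of Yθ fall into groups with disjoint ranges, which is the structure Theorem 3.5 exploits. The Python sketch checks that the eight singleton inequalities reach the same verdict as all 14 capacity inequalities on two inputs. This is an illustration, not a proof of the theorem.

```python
from itertools import combinations

OUTCOMES = [(0, 0), (1, 0), (0, 1), (1, 1)]


def capacity(K, set_dist):
    """T(K) = P{Y meets K}."""
    return sum(p for r, p in set_dist.items() if r & K)


def containment(K, set_dist):
    """C(K) = P{Y is a subset of K}."""
    return sum(p for r, p in set_dist.items() if r <= K)


def full_check(p_y, set_dist, tol=1e-12):
    """All 2**n - 2 inequalities P{y in K} <= T(K), K nonempty and proper."""
    return all(sum(p_y[a] for a in K) <= capacity(frozenset(K), set_dist) + tol
               for size in range(1, len(OUTCOMES))
               for K in combinations(OUTCOMES, size))


def singleton_check(p_y, set_dist, tol=1e-12):
    """The eight inequalities C({a}) <= P{y = a} <= T({a}) of Example 3.6."""
    return all(containment(frozenset([a]), set_dist) - tol <= p_y[a]
               <= capacity(frozenset([a]), set_dist) + tol
               for a in OUTCOMES)


# Hypothetical distribution of Y_theta for one fixed theta.
set_dist = {
    frozenset({(0, 0)}): 0.3,
    frozenset({(1, 1)}): 0.2,
    frozenset({(1, 0)}): 0.1,
    frozenset({(0, 1)}): 0.1,
    frozenset({(1, 0), (0, 1)}): 0.3,
}
p_ok = {(0, 0): 0.3, (1, 0): 0.25, (0, 1): 0.25, (1, 1): 0.2}
p_bad = {(0, 0): 0.3, (1, 0): 0.45, (0, 1): 0.05, (1, 1): 0.2}
for p_y in (p_ok, p_bad):
    print(full_check(p_y, set_dist), singleton_check(p_y, set_dist))
# True True
# False False
```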
Galichon and Henry (2011) propose to use a matching algorithm to check that
a probability measure is selectionable, using tools from optimal transportation
theory. Indeed, a random vector x is a selection of X if and only if it is possible
to match values x(ω) for ω ∈ Ω to the values of X(ω) so that x(ω) ∈ X(ω).
This yields an alternative algorithm to check the selectionability and also makes
it possible to quantify how far a random vector is from the family of selections.
3.3 Random Sets in the Space of Unobservables
In Example 3.1, for the interval data case, we have encountered a random closed
set defined in the space of unobservables, the prediction errors. Random closed
sets defined in such a space can be extremely useful for incorporating restrictions on
the unobservables in the analysis, as illustrated by Chesher and Rosen in a series
of papers, e.g. Chesher, Rosen and Smolinski (2012) and Chesher and Rosen
(2012). Here we illustrate their approach through the entry game example.
Example 3.7 (Entry game). Consider again the two-player entry game in Ex-
ample 2.3. So far we have addressed the identification problem in this model by
defining the random closed set Yθ (ε) of pure strategy Nash equilibria associated
with a given realization of ε = (ε1 , ε2 ). The random set Yθ can be viewed as a
set-valued function of ε. The inverse of this function is defined as
Ȳθ (y) = {ε : y ∈ Yθ (ε)},
see Aubin and Frankowska (1990). If y is a random element in K, then Ȳθ is a
random closed set in the space of unobservables. Then
y ∈ Sel(Yθ (ε)) ⇔ ε ∈ Sel(Ȳθ (y)),
so that using Artstein’s inequality (2.2) we obtain that a candidate distribution
for ε is the distribution of a selection of Ȳθ (y) if and only if
$$\mathbf{P}\{\varepsilon \in F\} \ge \mathbf{P}\{\bar Y_\theta(y) \subset F\}$$
for all closed sets F in the plane, which is the realization space for ε. However,
Chesher and Rosen (2012) show that this family of test sets can be reduced considerably,
to one equivalent to that obtained when working with Yθ, by observing
that the realizations of Ȳθ(y) associated with the four realizations of y ∈ K are
four rectangles, see Figure 2.1. Hence, one can construct the core determining
class in steps. (1) Let F be a proper subset of one of the four rectangles; then
P{Ȳθ (y) ⊂ F } = 0 and the inequality is automatically satisfied. (2) Take the
collection of sets F that contain one of the four rectangles but not more. Then it
suffices to check the inequality for the four sets F that equal (the closure of) each
of the rectangles; this is because a larger set F 0 in this family yields the same value
for the containment functional as F . (3) Take the collection of sets F that contain
two of the four rectangles but not more. A similar reasoning allows one to check
the inequality only on the five sets F that equal (the closure of) unions of two of
the rectangles. Observing that the realizations Ȳθ (0, 0) and Ȳθ (1, 1) are disjoint,
one obtains that the containment functional is additive on sets F = F1 ∪ F2 such
that Ȳθ (0, 0) ⊂ F1 and Ȳθ (1, 1) ⊂ F2 , and therefore inequalities involving this
set are redundant. (4) Finally, the collection of sets F that contain three of the
four rectangles can similarly be reduced to (the closure of) unions of three of the
rectangles, and therefore yield redundant inequalities.
As this example makes plain, one can often work with random sets defined
either in the space of observables or in the space of unobservables. It is then
natural to ask which might be more advantageous in practice. We believe the
answer depends on the modeling assumptions. As a rule of thumb, if the modeling
assumptions are either stochastic or shape restrictions on the observables, it is
often most useful to work with random sets defined in the space of observables. If
the modeling assumptions are either stochastic or shape restrictions in the space
of unobservables, it is often most useful to work with random sets defined in the
space of unobservables. Suppose, for example, within the two-player entry game
previously discussed, that one observes a variable v along with y. Then if the model
is correctly specified, (y, v) ∈ Sel(Yθ, v). Impose the exclusion restriction that y
is independent of v. Notice that the capacity and containment functionals of Yθ
may depend on v. Then applying Artstein's inequality one immediately gets
$$\Theta_I = \{\theta : C_{Y_\theta|v}(\{a\}) \le \mathbf{P}\{y = a\} \le T_{Y_\theta|v}(\{a\})\ \ \forall\, a \in \mathcal{K},\ v\text{-a.s.}\},$$
see Molchanov and Molinari (2014). On the other hand, suppose the exclusion
restriction is between an instrumental variable v and the unobservable ε. If the
model is correctly specified, (ε, v) ∈ Sel(Ȳθ, v), and a similar reasoning as before
yields
$$\Theta_I = \{\theta : \mathbf{P}\{\varepsilon \in F\} \ge \mathbf{P}\{\bar Y_\theta(y) \subset F \mid v\}\ \ \forall\, F \in \mathcal{M},\ v\text{-a.s.}\},$$
where M is the core determining class obtained above. For other examples see
Beresteanu, Molchanov and Molinari (2012) and Chesher and Rosen (2012).
4 APPLICATIONS TO INFERENCE
Identification arguments are always at the population level. That is, they pre-
sume that identified features of the model can be learned with certainty from
observation of the entire population. However, in practice such features need to
be estimated from a finite sample. When a model is partially identified, statis-
tical inference is particularly delicate to conduct. This is because the identified
feature of the model is a set rather than a point. The shape and size of a properly defined set estimator change with sample size, and even consistency of the
estimator becomes harder to determine. Horowitz and Manski (2000), Manski
and Tamer (2002), Imbens and Manski (2004), Chernozhukov, Hong and Tamer
(2007), and Andrews and Soares (2010), among others, have addressed the ques-
tion of how to conduct inference in partially identified models, using tools of
probability theory for random variables. A complementary approach is built on
elements of random set theory. It offers a unified approach to inference
for level sets and convex identified sets based on Wald-type statistics built on the
Hausdorff distance. The approach has been shown to be especially advantageous
when ΘI is convex, or when one is interested in inference for projections of ΘI,
because in this case the support function is a natural tool to obtain a functional
representation of the boundary of the set, or of its projections, directly.
4.1 Estimation of Level Sets
The nature of partial identification problems calls for estimation of sets that
appear as solutions to systems of inequalities. In case of one inequality, consider
the set S(t) = {s ∈ R^d : f(s) ≤ t} for a lower semicontinuous real-valued function
f and some t ∈ R. Lower semicontinuity of f is equivalent
to the closedness of all such level sets. If f is replaced by its empirical estimator
fn, then Sn(t) = {s ∈ R^d : fn(s) ≤ t} yields the plug-in estimator of S(t). If f is
a probability density function, then the set S(t) appears in cluster analysis, see
Hartigan (1975). More sophisticated estimators of S(t) using the so-called excess
mass method have been considered in Polonik (1995). Asymptotic normality of
plug-in estimators has been studied in Mason and Polonik (2009), and optimal
rates are obtained in Rigollet and Vert (2009).
It is shown in Molchanov (1998) that the plug-in estimator is strongly consistent
if S(t) equals the closure of {s : f(s) < t}. This condition is also necessary under some rather mild
technical conditions. However, this condition is violated if f has a local minimum
at level t. Most importantly, this is the case if t is the global minimum of the
function f, and S(t) is then the set argmin f of the global minimizers of the function
f. This case has been thoroughly analyzed in Chernozhukov et al. (2007), who
suggested estimating S(t) by the set {s : fn(s) ≤ t + εn}, where the non-negative
correction term εn declines to zero at an appropriate rate.
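The failure of the plug-in estimator at the minimum level, and the effect of the correction εn, can be seen in a stylized deterministic Python sketch (assuming NumPy). It assumes a hypothetical criterion f(s) = (max(|s| − 1, 0))², whose set of minimizers is [−1, 1], and mimics estimation error by a uniform upward shift of 1/√n. The slack εn = log n/√n is an illustrative choice of a sequence that vanishes more slowly than the error, not the specific rate prescribed by Chernozhukov et al. (2007).

```python
import numpy as np


def hausdorff_1d(A, B):
    """Hausdorff distance between two nonempty finite subsets of the real line."""
    D = np.abs(A[:, None] - B[None, :])
    return max(D.min(axis=1).max(), D.min(axis=0).max())


grid = np.linspace(-3.0, 3.0, 601)                # step 0.01
f = np.maximum(np.abs(grid) - 1.0, 0.0) ** 2      # hypothetical f: argmin f = [-1, 1], min f = 0
n = 10_000
f_n = f + 1.0 / np.sqrt(n)                        # stylized estimation error of f
eps_n = np.log(n) / np.sqrt(n)                    # illustrative slack, vanishes slower than the error
t = 0.0                                           # level equal to the global minimum of f

true_set = grid[f <= t]
plug_in = grid[f_n <= t]                          # S_n(t): empty, since f_n > t everywhere
corrected = grid[f_n <= t + eps_n]                # {s : f_n(s) <= t + eps_n}

print(plug_in.size)                               # 0
print(hausdorff_1d(corrected, true_set))          # positive but below 0.3 for this n
```

Here the plug-in estimator is empty however large n is, while the corrected set contains [−1, 1] and overshoots it by an amount that shrinks as εn shrinks.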
A limit theorem obtained in Molchanov (1998) for the plug-in estimator provides a limit distribution for the normalized Hausdorff distance between S(t) and
Sn(t), both intersected with any given compact set K. The limit theorem holds
under the assumptions that the normalized difference fn(s) − f(s) satisfies a limit
theorem and that f satisfies a certain smoothness condition formulated in terms
of its downside continuity modulus, i.e. the infimum of f(s′) − f(s) for s′ from a
neighborhood of s.
4.2 Support Function Approach
Beresteanu and Molinari (2008) propose to use statistics based on the Hausdorff
distance to perform estimation and inference on sharp identification regions ΘI
in the space of sets, so as to replicate the common Wald approach to these tasks
for point identified models in the space of vectors. In particular, they employ two
Wald statistics, which measure the Hausdorff distance and the directed Hausdorff
distance between the identified set and a set valued estimator, and develop large
sample and bootstrap inference procedures for these statistics.
Their results apply directly to incomplete econometric models in which ΘI is
equal to the Aumann expectation of a random set which can be constructed using random variables characterizing the model. Applying the analogy principle,
they suggest estimating ΘI through a Minkowski sample average of random sets
defined using the sample observations. The support function of the convex hull
of these random sets is used to represent the set estimator as a sample average of
elements of a functional space, so that Theorems 2.14-2.15 (law of large numbers
and central limit theorem) can be used to establish consistency of the estimator and
derive its limiting distribution with respect to the Hausdorff distance. Beresteanu
and Molinari also show that the critical values of the limiting distribution can be
consistently estimated through a straightforward bootstrap procedure. Hypotheses
about subsets of the population identification region are tested using the Wald
statistic based on the directed Hausdorff distance, and these tests are inverted
to obtain confidence sets that asymptotically cover the population identification
region with a prespecified probability. Additionally, hypotheses about the entire
population identification region, rather than only its subsets, are tested using the
Wald statistic based on the Hausdorff distance.
We illustrate Beresteanu and Molinari’s approach for the case of best linear
prediction with interval outcome data. We remark that in the case of entry
games, ΘI is not convex and therefore any statistic based on the support function
yields asymptotic statements about conv(ΘI ).
Example 4.1 (Inference for best linear prediction with interval outcomes). Sup-
pose the researcher is interested in best linear prediction of y given x. Let
(x, yL , yU ) be the observed variables, with P{yL ≤ y ≤ yU } = 1. Then the sharp
identification region of the BLP parameter vector θ can be obtained by defining the
random segment
$$G = \left\{ \begin{pmatrix} y \\ xy \end{pmatrix} : y_L \le y \le y_U \right\} \subset \mathbb{R}^2$$
and collecting the least squares coefficients associated with each (ỹ, xỹ) ∈ Sel(G):
$$\Theta_I = \left\{ \theta \in \mathbb{R}^d : \theta = \left[\mathbb{E}\begin{pmatrix} 1 & x \\ x & x^2 \end{pmatrix}\right]^{-1} \mathbb{E}\begin{pmatrix} \tilde y \\ x \tilde y \end{pmatrix},\ (\tilde y, x\tilde y) \in \mathrm{Sel}(G) \right\}, \qquad (4.1)$$
where we have assumed that G is integrably bounded (this is the case, for exam-
ple, if yL , yU , xyL , xyU are each absolutely integrable). Given a random sample
$(x_i, y_{Li}, y_{Ui})_{i=1}^n$, ΘI can be estimated using
$$\hat\Theta_n = \hat\Sigma_n^{-1}\,\frac{1}{n}(G_1 + \cdots + G_n),$$
where $\hat\Sigma_n$ is a consistent estimator of the matrix $\mathbb{E}\begin{pmatrix} 1 & x \\ x & x^2 \end{pmatrix}$ in equation (4.1). Using
Theorem 2.14, Beresteanu and Molinari establish a Slutsky-type result and, under
mild regularity conditions, show that
$$\rho_H(\hat\Theta_n, \Theta_I) = O_p(n^{-1/2}).$$
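The support function of Θ̂n has a closed form, which is what makes this approach convenient in practice. For a linear map A, $h_{AG}(u) = h_G(A^\top u)$; the support function of a Minkowski average is the average of the support functions; and for the segment $G_i$, $h_{G_i}(v) = \max(y_{Li} s_i, y_{Ui} s_i)$ with $s_i = v_1 + v_2 x_i$. The Python sketch below (assuming NumPy; function names and data are illustrative) uses these facts with $\hat\Sigma_n$ taken, as an assumption, to be the sample second-moment matrix of (1, x). It approximates the Hausdorff distance between two convex compact sets by the largest gap between their support functions over a grid of unit directions.

```python
import numpy as np


def support_theta_hat(u, x, yL, yU):
    """Support function at u of Theta_hat_n = Sigma_hat^{-1} (G_1 + ... + G_n) / n,
    where G_i = {(y, x_i * y) : yL_i <= y <= yU_i}."""
    X = np.column_stack([np.ones(len(x)), x])
    sigma_hat = X.T @ X / len(x)            # sample analog of E[(1, x)'(1, x)]
    v = np.linalg.solve(sigma_hat, u)       # h_{AG}(u) = h_G(A'u); A = sigma_hat^{-1} is symmetric
    s = v[0] + v[1] * x                     # <v, (y, x_i * y)> = y * s_i
    # Support function of a Minkowski average = average of the support functions.
    return np.mean(np.maximum(yL * s, yU * s))


def hausdorff_convex(h_A, h_B, n_dirs=3600):
    """Hausdorff distance between convex compact sets in R^2 as sup_u |h_A(u) - h_B(u)|,
    approximated on a grid of unit directions."""
    angles = np.linspace(0.0, 2.0 * np.pi, n_dirs, endpoint=False)
    return max(abs(h_A(u) - h_B(u))
               for u in np.column_stack([np.cos(angles), np.sin(angles)]))


# Hypothetical data.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([1.1, 2.9, 5.2, 6.8, 9.0])
theta_ols = np.linalg.lstsq(np.column_stack([np.ones(5), x]), y, rcond=None)[0]
u = np.array([0.6, 0.8])
print(np.isclose(support_theta_hat(u, x, y, y), u @ theta_ols))  # True: point data give OLS
d = hausdorff_convex(lambda v: support_theta_hat(v, x, y - 1.0, y + 1.0),
                     lambda v: support_theta_hat(v, x, y, y))
print(d)  # distance between the interval-data estimate and the OLS point
```

With degenerate intervals (yL = yU) the set collapses to the OLS coefficient vector, so $h(u) = \langle u, \hat\theta_{OLS}\rangle$, which gives a quick sanity check.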
In order to show that the support function process converges to a Gaussian
process, Beresteanu and Molinari need to assume that all x variables have a continuous distribution. This ensures that the set ΘI does not have flat faces, which
in turn guarantees that $h_{\Theta_I}(u)$ is differentiable in u. Therefore a functional delta
method can be employed to show, under additional mild regularity conditions,
that
$$\sqrt{n}\,\rho_H(\hat\Theta_n, \Theta_I) \xrightarrow{d} \sup_{u \in S^{d-1}} \|z(u)\|,$$
$$\sqrt{n}\,d_H(\Theta_I, \hat\Theta_n) \xrightarrow{d} \sup_{u \in S^{d-1}} (-z(u))_+,$$
where z(u) is a linear function of a centered Gaussian process with continuous sample paths.
By comparison, in the presence of flat faces in ΘI, Bontemps, Magnac and Maurin
(2012) show that the support function process converges to the sum of a Gaussian
process and a countable point process which takes nonzero values at directions
u orthogonal to the flat faces of ΘI.
A simple nonparametric bootstrap procedure that resamples from the empirical
distribution of $(x_i, y_{Li}, y_{Ui})_{i=1}^n$ consistently estimates the quantiles of the limiting
distributions of these Wald statistics. Hence, one can test hypotheses of the form
$H_0: \Theta_I = \Theta_0$ versus $H_A: \Theta_I \ne \Theta_0$ using the statistic based on the Hausdorff
distance, and hypotheses of the form $H_0': \Theta_0 \subseteq \Theta_I$ versus $H_A': \Theta_0 \not\subseteq \Theta_I$ using
the statistic based on the directed Hausdorff distance. Inverting these tests yields
confidence collections, which are unions of sets that cannot be rejected as either
equal to ΘI or as subsets of ΘI. Estimation and inference can be implemented
using standard statistical packages, including STATA.¹
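A nonparametric bootstrap for the Hausdorff statistic can be sketched along the same lines. The Python code below (assuming NumPy) resamples the triples $(x_i, y_{Li}, y_{Ui})$ with replacement, recomputes the support function of the set estimator on a grid of directions, and returns the (1 − α) quantile of $\sqrt{n}\,\sup_u |h^*(u) - \hat h(u)|$. The number of resamples, the direction grid, the seed, and the data are arbitrary illustrative settings; this is a sketch in the spirit of Beresteanu and Molinari's procedure rather than a reproduction of it.

```python
import numpy as np


def support_on_grid(U, x, yL, yU):
    """Support function of Sigma_hat^{-1} (G_1 + ... + G_n) / n at each row of U (m x 2)."""
    X = np.column_stack([np.ones(len(x)), x])
    sigma_hat = X.T @ X / len(x)
    V = np.linalg.solve(sigma_hat, U.T)                 # column j is sigma_hat^{-1} u_j
    S = V[0][:, None] + V[1][:, None] * x[None, :]      # S[j, i] = s_i for direction j
    return np.mean(np.maximum(yL * S, yU * S), axis=1)


def bootstrap_critical_value(x, yL, yU, alpha=0.05, n_boot=499, n_dirs=360, seed=0):
    """(1 - alpha) bootstrap quantile of sqrt(n) * rho_H(Theta_hat*, Theta_hat)."""
    rng = np.random.default_rng(seed)
    n = len(x)
    angles = np.linspace(0.0, 2.0 * np.pi, n_dirs, endpoint=False)
    U = np.column_stack([np.cos(angles), np.sin(angles)])
    h_hat = support_on_grid(U, x, yL, yU)
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)                # resample (x_i, yL_i, yU_i) with replacement
        h_star = support_on_grid(U, x[idx], yL[idx], yU[idx])
        stats[b] = np.sqrt(n) * np.max(np.abs(h_star - h_hat))
    return np.quantile(stats, 1.0 - alpha)


# Hypothetical data: continuous x, interval outcomes of half-width 0.5.
rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 1.0 + 2.0 * x + rng.normal(size=200)
c = bootstrap_critical_value(x, y - 0.5, y + 0.5, n_boot=199)
print(c)
```

Under these assumptions, $H_0: \Theta_I = \Theta_0$ would be rejected at level α when $\sqrt{n}\,\rho_H(\hat\Theta_n, \Theta_0)$ exceeds the returned value.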
Bontemps, Magnac and Maurin (2012) extend these results in important directions by allowing for incomplete linear moment restrictions where the number of
restrictions exceeds the number of parameters to be estimated, and generalize the
familiar Sargan test for overidentifying restrictions to partially identified models. When the number of restrictions equals the number of parameters to be
estimated, they propose a support function based statistic to test hypotheses
about each vector θ ∈ ΘI, and invert this statistic to obtain confidence sets that
asymptotically cover each element of ΘI with a prespecified probability.
Chandrasekhar, Chernozhukov, Molinari and Schrimpf (2012) significantly extend the applicability of Beresteanu and Molinari's approach to cover best linear
approximation of any function f(x) that is known to lie within two identified
bounding functions. The lower and upper functions defining the band are allowed to be any functions, including ones carrying an index, and can be estimated parametrically or non-parametrically. Because the intervals defining the
outcome variable (i.e., the extreme points of the band on f(x)) can be estimated
non-parametrically in a first stage, Chandrasekhar et al. develop a new limit
theory for the support function process, show that it is approximated by a Gaussian process, and show that the Bayesian bootstrap can be applied for
inference. They also propose a simple data jittering procedure, whereby a continuously distributed error with arbitrarily small but positive variance is added to each discrete variable in x, eliminating flat faces in ΘI. Hence they obtain valid,
albeit arbitrarily conservative, inference without ruling out discrete covariates.

¹ See: http://economics.cornell.edu/fmolinari/#Stata_SetBLP.
In the study of inference for sets defined by one smooth non-linear inequality,
Chernozhukov, Kocatulum and Menzel (2012) show that the (directed) Haus-
dorff statistic can be weighted, to enforce either exact or first order equivariance
to transformations of parameters.
4.3 Duality Between the Level Set Approach and the Support
Function Approach
Kaido (2012) further enlarges the domain of applicability of the support function
approach, by establishing a duality between level set estimators based on convex
criterion functions, and the support function of the level set estimators. This
allows one to use Hausdorff-based statistics and the support function approach
not only when ΘI is the Aumann expectation of a properly defined random closed
set, but also when such a representation is not readily available.
Kaido considers an identification region and its corresponding level set estima-
tor given, respectively, by
ΘI = {θ ∈ Θ : f (θ) = 0},
Θ̂n (t) = {θ ∈ Θ : an fn (θ) ≤ t},
where Θ is a convex subset of Rd , f is a convex, lower semicontinuous criterion
function with values in R+ and infimum at zero, f and fn satisfy the additional
regularity conditions set forth in Chernozhukov et al. (2007), an is a growing
sequence, and t ≥ 0 is properly chosen. Under these assumptions, ΘI is convex,
and Kaido (2012, Lemma 3.1) establishes that
$$h_{\hat\Theta_n(t)}(u) < b \iff \inf_{\theta \in K_{b,u} \cap \Theta} a_n f_n(\theta) > t,$$
where $K_{b,u} = \{\theta \in \mathbb{R}^d : \langle u, \theta\rangle \ge b\}$.
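The equivalence can be checked on a grid. In the Python sketch below (assuming NumPy), a hypothetical convex quadratic stands in for $a_n f_n(\theta)$ on a grid over Θ = [−2, 2]², and the support function of the level set is computed twice: directly, as the largest ⟨u, θ⟩ over the level set, and through the equivalence, by bisecting on b and asking whether the infimum of the criterion over $K_{b,u} \cap \Theta$ exceeds t. On a finite grid the two agree up to the bisection tolerance.

```python
import numpy as np

# Hypothetical convex criterion standing in for a_n * f_n(theta), on a grid over [-2, 2]^2.
g = np.linspace(-2.0, 2.0, 201)
T1, T2 = np.meshgrid(g, g)
pts = np.column_stack([T1.ravel(), T2.ravel()])
crit = pts[:, 0] ** 2 + 2.0 * pts[:, 1] ** 2
t = 1.0


def support_direct(u):
    """Largest <u, theta> over the level set {theta : crit(theta) <= t}."""
    return float((pts[crit <= t] @ u).max())


def support_via_duality(u, lo=-10.0, hi=10.0, iters=60):
    """Bisection on b using: h(u) < b  iff  inf{crit(theta) : <u, theta> >= b} > t."""
    proj = pts @ u
    for _ in range(iters):
        b = 0.5 * (lo + hi)
        in_halfspace = proj >= b                 # grid points in K_{b,u}
        if in_halfspace.any() and crit[in_halfspace].min() <= t:
            lo = b                               # infimum <= t, so h(u) >= b
        else:
            hi = b                               # infimum > t (or K_{b,u} empty), so h(u) < b
    return lo


for u in (np.array([1.0, 0.0]), np.array([0.6, 0.8]), np.array([-0.8, 0.6])):
    print(abs(support_direct(u) - support_via_duality(u)) < 1e-9)  # True
```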
Using this result, he shows how to relate the normalized support function process $Z_n(u, t) = a_n\big(h_{\hat\Theta_n(t)}(u) - h_{\Theta_I}(u)\big)$ to a localized version of the criterion
function fn, to obtain its asymptotic distribution using the notion of weak epi-convergence, see Molchanov (2005, Sec. 5.3). An application of Hörmander's embedding theorem then yields the asymptotic distribution of test statistics based
on the Hausdorff distance.
Kaido, Molinari and Stoye (2013) show that the approach of Kaido (2012) can
be extended to conduct inference on projections of ΘI even when this set is non-
convex and the identified set is estimated using a level set estimator under the
assumptions of Chernozhukov, Hong and Tamer (2007). Their method is based
on the simple observation that the projections of ΘI are equal to the projections of
conv(ΘI ), and as such when projections are the object of interest, no information
is lost due to the convexification effect of the support function approach.

4.4 Efficiency of the Support Function Approach
Kaido and Santos (2013) develop a theory of efficiency for estimation of partially
identified models defined by a finite number of convex moment inequalities of
the form E(mj(x; θ)) ≤ 0, j = 1, . . . , J, which are smooth as functionals of the
distribution of the data. The functions θ ↦ E(mj(x; θ)) are assumed to be
convex, so that ΘI = {θ : E(mj(x; θ)) ≤ 0, j = 1, . . . , J} is convex and can be
represented through its support function. Using the classic results in Bickel et al.
(1993), Kaido and Santos show that under suitable regularity conditions, the
support function admits √n-consistent regular estimation. The assumptions
rule out, in particular, (i) flat faces in ΘI that depend on parameters to be
estimated, (ii) more binding moment inequalities than parameters to be estimated
at any boundary point of ΘI, and (iii) sets ΘI with empty interior. Using the
convolution theorem, they establish that any regular estimator of the support
function must converge in distribution to the sum of a mean zero Gaussian process
G0 and an independent noise process ∆0.
Using the same reasoning as in the classical case, they call a support function
estimator semiparametrically efficient if it is regular and its asymptotic distri-
bution equals that of G0 . Hence, they obtain a semiparametric efficiency bound
for regular estimators of the support function, by deriving the covariance kernel
of G0 . Then they show that a simple plug-in estimator based on the support
function of the set of parameters satisfying the sample analog of the moment
inequalities attains this bound.
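A plug-in estimator of this kind is easy to compute when the moment functions are linear in θ. As an illustrative assumption, the Python sketch below (using NumPy) takes two interval-valued variables with moment inequalities E(yLj) − θj ≤ 0 and θj − E(yUj) ≤ 0, j = 1, 2, replaces the expectations by sample means, and computes the support function of the resulting polygon by enumerating its vertices. Kaido and Santos's framework covers general smooth convex moment functions, for which a convex solver would replace the enumeration. In this box-shaped example the answer also has the closed form $\sum_j \max(u_j \bar y_{Lj}, u_j \bar y_{Uj})$, which the usage lines use as a check.

```python
import itertools
import numpy as np


def support_polygon(u, A, b, tol=1e-9):
    """max <u, theta> subject to A @ theta <= b, theta in R^2, by vertex enumeration.
    Assumes the feasible set is a nonempty bounded polygon, so a vertex is optimal."""
    best = -np.inf
    for i, j in itertools.combinations(range(len(b)), 2):
        M = A[[i, j]]
        if abs(np.linalg.det(M)) < tol:
            continue                                   # parallel boundary lines
        v = np.linalg.solve(M, b[[i, j]])              # intersection of two boundary lines
        if np.all(A @ v <= b + tol):                   # keep feasible vertices only
            best = max(best, float(u @ v))
    return best


# Hypothetical data: two interval-valued variables; the moments E(yL_j) - theta_j <= 0
# and theta_j - E(yU_j) <= 0 are replaced by their sample analogs.
rng = np.random.default_rng(0)
yL = rng.uniform(0.0, 1.0, size=(500, 2))
yU = yL + rng.uniform(0.5, 1.5, size=(500, 2))
A = np.array([[-1.0, 0.0], [0.0, -1.0], [1.0, 0.0], [0.0, 1.0]])
b = np.concatenate([-yL.mean(axis=0), yU.mean(axis=0)])

u = np.array([0.6, -0.8])
closed_form = np.sum(np.maximum(u * yL.mean(axis=0), u * yU.mean(axis=0)))
print(np.isclose(support_polygon(u, A, b), closed_form))  # True
```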
The semiparametrically efficient estimator of the support function is used to
construct estimates of the corresponding identified set that minimize a wide class
of asymptotic loss functions based on the Hausdorff distance. In order to estimate
critical values of the limiting distribution of test statistics based on the Hausdorff
distance, Kaido and Santos propose a score-multiplier bootstrap which does not
require the support function to be recomputed for each resample of the data,
and as such is especially computationally attractive.
In addition to convex moment inequality models, Kaido and Santos’s results
imply that the estimator for best linear prediction with interval outcome data in
Example 4.1 is asymptotically efficient.
5 CONCLUSIONS
While the initial development of random set theory was in part motivated by
questions of general equilibrium analysis and decision theory, random set theory
was not introduced into econometrics until recently. The new surge of
interest in applications of random set theory to econometrics has been motivated
by partially identified models, where the identified object is a set rather than a
singleton. Researchers interested in partially identified models need to provide
tractable characterizations of sharp identification regions, and need to develop
methodologies to estimate sets, test hypotheses about (subsets of) the identification regions, and build confidence sets that cover them with a prespecified
asymptotic probability.
Each of these tasks may be simplified by the use of random set theory, and many results can be developed under a unified framework. This is because random set theory distills elements of topology, convex geometry and probability theory into a mathematical framework designed specifically to analyze random elements whose realizations are sets. The resulting tools have proven especially useful in two settings:

- for inference, when the econometric model yields a convex sharp identification region;
- for identification analysis, when the informational content of the econometric model is equivalent to the statement that (the conditional expectation of) an (un)observable variable almost surely belongs to (the conditional Aumann expectation of) a random set.
This survey has attempted to introduce the basic elements of random set theory
that have proven useful to date in econometrics, and to summarize the main
applications of random set theory within this literature. The hope is that this
review can ease further applications of random set theory in econometrics.
The random set approach to partial identification may complement a more traditional approach based on laws of large numbers and central limit theorems for random vectors, which continues to be applied very productively in the field. We have not reviewed results based on these methods, but refer the reader to Tamer (2010) and references therein for a survey of the partial identification literature.
We have also not summarized the important literature in decision theory that
employs elements of random set theory, most notably nonadditive measures and
Choquet integrals. We refer interested readers to Gilboa (2004) for a thorough
treatment of these topics.
ACKNOWLEDGMENTS We thank Charles Manski for detailed comments that substantially improved this review. Molinari gratefully acknowledges financial support from the USA NSF grant SES-0922330. Molchanov was supported by Swiss National Science Foundation grants 200021-126503 and 200021-137527.

References
Andrews, D. W. K. and Shi, X. (2013). Inference based on conditional moment inequalities, Econometrica 81: 609–666.
Andrews, D. W. K. and Soares, G. (2010). Inference for parameters defined by moment inequalities using generalized moment selection, Econometrica 78: 119–157.
Artstein, Z. (1983). Distributions of random sets and random selections, Israel Journal of Mathematics 46: 313–324.
Artstein, Z. and Vitale, R. A. (1975). A strong law of large numbers for random compact sets, Annals of Probability 3: 879–882.
Aubin, J.-P. and Frankowska, H. (1990). Set-Valued Analysis, Vol. 2 of Systems and Control: Foundations and Applications, Birkhäuser, Boston.
Aumann, R. J. (1965). Integrals of set-valued functions, Journal of Mathematical Analysis and Applications 12: 1–12.
Bar, H. Y. and Molinari, F. (2013). Computation of sets via data augmentation and support vector machines, mimeo.
Beresteanu, A., Molchanov, I. and Molinari, F. (2011). Sharp identification regions in models with convex moment predictions, Econometrica 79: 1785–1821.
Beresteanu, A., Molchanov, I. and Molinari, F. (2012). Partial identification using random set theory, Journal of Econometrics 166: 17–32. With errata at http://economics.cornell.edu/fmolinari/NOTE_BMM2012_v3.pdf.
Beresteanu, A. and Molinari, F. (2008). Asymptotic properties for a class of partially identified models, Econometrica 76: 763–814.
Berry, S. T. and Tamer, E. (2007). Identification in models of oligopoly entry, Advances in Economics and Econometrics: Theory and Applications, Ninth World Congress, Vol. II, Cambridge University Press, chapter 2, pp. 46–85.
Bickel, P. J., Klaassen, C. A., Ritov, Y. and Wellner, J. A. (1993). Efficient and Adaptive Estimation for Semiparametric Models, Springer, New York.
Bontemps, C., Magnac, T. and Maurin, E. (2012). Set identified linear models, Econometrica 80: 1129–1155.
Boyd, S. and Vandenberghe, L. (2004). Convex Optimization, Cambridge University Press, New York.
Chandrasekhar, A., Chernozhukov, V., Molinari, F. and Schrimpf, P. (2012). Inference for best linear approximations to set identified functions, CeMMAP Working Paper CWP 43/12.
Chernozhukov, V., Hong, H. and Tamer, E. (2007). Estimation and confidence regions for parameter sets in econometric models, Econometrica 75: 1243–1284.
Chernozhukov, V., Kocatulum, E. and Menzel, K. (2012). Inference on sets in finance, CeMMAP Working Paper CWP 46/12.
Chesher, A. and Rosen, A. M. (2012). Simultaneous equations models for discrete outcomes: coherence, completeness, and identification, CeMMAP Working Paper CWP 21/12.
Chesher, A., Rosen, A. M. and Smolinski, K. (2012). An instrumental variable model of multiple discrete choice, Quantitative Economics, forthcoming.
Choquet, G. (1953/54). Theory of capacities, Annales de l’Institut Fourier 5: 131–295.
Ciliberto, F. and Tamer, E. (2009). Market structure and multiple equilibria in airline markets, Econometrica 77: 1791–1828.
Debreu, G. (1967). Integration of correspondences, Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Vol. 2, University of California Press, pp. 351–372.
Galichon, A. and Henry, M. (2006). Inference in incomplete models, mimeo.
Galichon, A. and Henry, M. (2011). Set identification in models with multiple equilibria, Review of Economic Studies 78: 1264–1298.
Gilboa, I. (2004). Uncertainty in Economic Theory: Essays in Honor of David Schmeidler’s 65th Birthday, Routledge, London.
Grant, M. and Boyd, S. (2010). CVX: Matlab software for disciplined convex programming, version 1.21, http://cvxr.com/cvx.
Hartigan, J. A. (1975). Clustering Algorithms, Wiley.
Hörmander, L. (1954). Sur la fonction d’appui des ensembles convexes dans un espace localement convexe, Arkiv för Matematik 3: 181–186.
Horowitz, J. L. and Manski, C. F. (2000). Nonparametric analysis of randomized experiments with missing covariate and outcome data, Journal of the American Statistical Association 95: 77–84.
Horowitz, J. L., Manski, C. F., Ponomareva, M. and Stoye, J. (2003). Computation of bounds on population parameters when the data are incomplete, Reliable Computing 9: 419–440.
Imbens, G. W. and Manski, C. F. (2004). Confidence intervals for partially identified parameters, Econometrica 72: 1845–1857.
Kaido, H. (2012). A dual approach to inference for partially identified econometric models, Working Paper.
Kaido, H., Molinari, F. and Stoye, J. (2013). Inference for projections of identified sets, manuscript in preparation.
Kaido, H. and Santos, A. (2013). Asymptotically efficient estimation of models defined by convex moment inequalities, Econometrica, forthcoming.
Kolmogorov, A. N. (1950). Foundations of the Theory of Probability, Chelsea, New York.
Manski, C. F. (1989). Anatomy of the selection problem, Journal of Human Resources 24: 343–360.
Manski, C. F. (1995). Identification Problems in the Social Sciences, Harvard University Press, Cambridge, MA.
Manski, C. F. (2003). Partial Identification of Probability Distributions, Springer-Verlag, New York.
Manski, C. F. and Tamer, E. (2002). Inference on regressions with interval data on a regressor or outcome, Econometrica 70: 519–546.
Mason, D. M. and Polonik, W. (2009). Asymptotic normality of plug-in level set estimates, Annals of Applied Probability 19: 1108–1142.
Matheron, G. (1975). Random Sets and Integral Geometry, Wiley, New York.
Molchanov, I. (1998). A limit theorem for solutions of inequalities, Scandinavian Journal of Statistics 25: 235–242.
Molchanov, I. (2005). Theory of Random Sets, Springer, London.
Molchanov, I. and Molinari, F. (2014). Random sets in econometrics, book manuscript in preparation.
Molinari, F. (2008). Partial identification of probability distributions with misclassified data, Journal of Econometrics 144: 81–117.
Norberg, T. (1992). On the existence of ordered couplings of random sets, with applications, Israel Journal of Mathematics 77: 241–264.
Polonik, W. (1995). Measuring mass concentrations and estimating density contour clusters: an excess mass approach, Annals of Statistics 23: 855–881.
Rigollet, P. and Vert, R. (2009). Optimal rates for plug-in estimators of density level sets, Bernoulli 15: 1154–1178.
Schneider, R. (1993). Convex Bodies: The Brunn–Minkowski Theory, Cambridge University Press, Cambridge.
Tamer, E. (2003). Incomplete simultaneous discrete response model with multiple equilibria, Review of Economic Studies 70: 147–165.
Tamer, E. (2010). Partial identification in econometrics, Annual Review of Economics 2: 167–195.
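[Figure 2.1 (image not reproduced): the (ε1, ε2) plane, with thresholds −θ1 on the ε1 axis and −θ2 on the ε2 axis, partitioned into regions labeled by their equilibrium sets {(0, 0)}, {(1, 0)}, {(0, 1)}, {(1, 1)} and {(0, 1), (1, 0)}.]
Figure 2.1: The set of pure-strategy Nash equilibria of a two-player entry game as a function of ε1 and ε2.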
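The equilibrium sets in Figure 2.1 can be recovered pointwise by brute force over the four action profiles. The sketch below assumes a payoff specification for illustration: player j's payoff from entering is θ_j·y_{−j} + ε_j, with θ_j < 0, and the payoff from staying out is 0. This specification and the function name are assumptions, not code from the article.

```python
from itertools import product

def pure_nash_equilibria(theta, eps):
    """Pure-strategy Nash equilibria of a two-player entry game.

    Assumed payoffs: entering yields theta[j] * y[other] + eps[j];
    staying out yields 0. A profile y is an equilibrium if no player
    gains by switching: an entrant's payoff from entering is >= 0, and
    a non-entrant's payoff from entering would be <= 0.
    """
    equilibria = set()
    for y in product((0, 1), repeat=2):
        stable = True
        for j in (0, 1):
            gain_from_entry = theta[j] * y[1 - j] + eps[j]
            if (y[j] == 1 and gain_from_entry < 0) or (y[j] == 0 and gain_from_entry > 0):
                stable = False
        if stable:
            equilibria.add(y)
    return equilibria

theta = (-1.0, -1.0)  # assumed: a rival's entry lowers each entrant's payoff
print(sorted(pure_nash_equilibria(theta, (0.5, 0.5))))    # [(0, 1), (1, 0)]
print(sorted(pure_nash_equilibria(theta, (-1.0, -1.0))))  # [(0, 0)]
```

Under this assumed specification, the set {(0, 1), (1, 0)} arises when 0 < ε_j < −θ_j for both players. This corresponds to the multiple-equilibrium region of the figure.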
[Figure 2.2 (image not reproduced): a convex body K, a direction u, and the support value hK(u).]
Figure 2.2: The support function of K in direction u is the signed distance from the origin of the support plane to K with exterior normal vector u; the distance is negative if and only if u points into the open half-space containing the origin (Schneider 1993, p. 37).
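The following is a small numeric check of the sign convention in the caption of Figure 2.2. It assumes that K is the convex hull of finitely many points, so hK(u) = max over those points of ⟨u, x⟩, and that u is a unit vector. The helper name is hypothetical.

```python
def support_of_hull(points, u):
    """h_K(u) = max_{x in K} <u, x> for K = conv(points).

    Assumption: K is given by finitely many points; a linear function
    attains its maximum over the convex hull at one of those points.
    """
    return max(u[0] * x + u[1] * y for x, y in points)

# K = [1, 2] x [1, 2] does not contain the origin.
square = [(1, 1), (2, 1), (1, 2), (2, 2)]
print(support_of_hull(square, (1, 0)))   # 2: support line x1 = 2 lies 2 units from the origin
print(support_of_hull(square, (-1, 0)))  # -1: u points into the half-plane containing the origin
```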

