Bayes and Minimax Solutions of Sequential Decision Problems
The Econometric Society is collaborating with JSTOR to digitize, preserve and extend access to Econometrica.
The present paper deals with the general problem of sequential choice among several actions, where at each stage the options available are to stop and take a definite action or to continue sampling for more information. There are costs attached to taking inappropriate action and to sampling. A characterization of the optimum solution is obtained first under very general assumptions as to the distribution of the successive observations and the costs of sampling; then more detailed results are given for the case where the alternative actions are finite in number, the observations are drawn under conditions of random sampling, and the cost depends only on the number of observations. Explicit solutions are given for the case of two actions, random sampling, and linear cost functions.
CONTENTS

Summary
1. Construction of Bayes Solutions
   The Decision Function
   The Best Truncated Procedure
   The Best Sequential Procedure
2. Bayes Solutions for Finite Multi-Valued Decision Problems
   Statement of the Problem
   Structure of the Optimum Sequential Procedure
3. Optimum Sequential Procedure for a Dichotomy When the Cost Function Is Linear
   A Method for Determining g̲ and ḡ
   Exact Values of g̲ and ḡ for a Special Class of Double Dichotomies
4. Multi-Valued Decisions and the Theory of Games
   Examples of Dichotomies
   Examples of Trichotomies
5. Another Optimum Property of the Sequential Probability Ratio Test
6. Continuity of the Risk Function of the Optimum Test
References
SUMMARY
a priori distribution, then E L(u, a) = R(a) is the expected loss from action a, and any action, or randomized mixture of actions, which minimizes R(a) has been called by Wald a Bayes solution of the decision problem corresponding to the given a priori distribution of u.
Now suppose there is a sequence x of chance variables x_1, x_2, …, whose joint distribution is determined by u. Instead of choosing an action immediately, the statistician may decide to select a sample of x's, as this will yield partial information about u, enabling him to make a wiser selection of a. There will be a cost c_N(x) of obtaining the sample x_1, …, x_N and, in choosing a sampling procedure, the statistician must balance the expected cost against the expected amount of information to be obtained.
Formally, the possibility of making observations leaves the situation
unchanged, except that the class A of possible actions for the statistician
has been extended. His action now consists of choosing a sampling pro-
cedure T and a decision function D specifying what action a will be taken
for each possible result of the experiment. The expected loss is now
R(T, D) = l(T, D) + c(T), where l(T, D) is the expected value of
L(u, a) for the specified sampling procedure and decision rule, and c(T)
is the expected cost of the sampling procedure. A Bayes solution is now
a pair (T, D), or randomized mixture of pairs (T, D) for which R(T, D)
assumes its minimum value.
The minimizing T = T* has been implicitly characterized by Wald,
and may be described by the rule: at each stage, take another observation
if and only if there is some sequential continuation which reduces the
expected risk below its present level. The main difficulty here is that various quantities which arise are not obviously measurable: for instance, if the first observation is x_1, we must compare our present risk level, say w_1(x_1), with z(x_1) = inf w(x_1, T, D), where w(x_1, T, D) is the expected risk for any possible continuation (T, D); we take another observation if and only if w_1 > z. It is not a priori clear that z will be a measurable function of x_1, so that the set of points x_1 for which we stop may not be measurable.² Actually, z always is measurable, as we shall show.
A characterization of the minimizing T = T* is obtained for hypotheses involving a finite number of alternatives under the condition of random sampling. It consists of the following: We are given k hypotheses H_i (i = 1, 2, …, k) which have an a priori probability g_i of occurring, a risk matrix W = (w_ij) where w_ij represents the loss incurred in choosing H_j when H_i is true, and a function c(n) which represents the cost of taking n observations. It is shown that for each sample size N, there exist k convex regions S_j* in the (k − 1)-dimensional simplex spanned by the unit vectors in Euclidean k-space whose boundaries depend on the hypotheses H_i, the risk matrix W, and the cost function c_N(n) = c(N + n) − c(N). These regions have the property that if the vector g(N) whose components represent the a posteriori probability distribution of the k hypotheses lies in S_j*, the best procedure is to accept H_j without further experimentation. However, if g(N) lies in the complement of ∪_{j=1}^k S_j*, the best procedure is to continue taking observations. At any stage, the decision whether to continue or terminate sampling is uniquely determined by this sequence of k regions, and moreover this sequence of regions completely characterizes T*.
A method for determining the boundaries of these convex regions is
given for k = 2 (dichotomy) when the cost function is linear. It is shown
that in this special case, T* coincides with Wald's sequential probability
ratio test.
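As a concrete illustration, the following is a minimal sketch of such a procedure in Python. The function name, the Bernoulli densities, and the boundary values are our own hypothetical choices, not the paper's worked values: sampling continues while the a posteriori probability g of H_1 lies strictly between the two boundaries.

```python
def bayes_sprt(g, g_low, g_high, f1, f2, draw):
    """Sample while the a posteriori probability g of H1 lies strictly
    between g_low (accept H2 at or below it) and g_high (accept H1 at
    or above it); return the decision, the sample size, and the final g."""
    n = 0
    while g_low < g < g_high:
        x = draw()                        # next observation
        n += 1
        num = g * f1(x)                   # update g by Bayes' rule
        g = num / (num + (1 - g) * f2(x))
    return ("accept H1" if g >= g_high else "accept H2"), n, g

# Illustrative dichotomy: X = 1 with probability .9 under H1, .1 under H2.
f1 = lambda x: 0.9 if x == 1 else 0.1
f2 = lambda x: 0.1 if x == 1 else 0.9
```

Because g is a monotone function of the probability ratio p_1n/p_2n, fixed boundaries on g are fixed boundaries on the probability ratio, which is the sense in which T* coincides with the sequential probability ratio test.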
The minimax solutions to multi-valued decision problems are considered and methods are given for obtaining them for dichotomies. It is shown
that in general, the minimax strategy for the statistician is pure, except
when the hypotheses involve discrete variates. In the latter case, mixed
strategies will be the rule rather than the exception.
Examples of double dichotomies, binomial dichotomies, and tri-
chotomies are given to illustrate the construction of T* and the notion
of minimax solutions.
It may be remarked that the problem of optimum sequential choice
among several actions is closely allied to the economic problem of the
rational behavior of an entrepreneur under conditions of uncertainty.
At each point in time, the entrepreneur has the choice between entering
into some imperfectly liquid commitment and holding part or all of his
funds in cash pending the acquisition of additional information, the
latter being costly because of the foregone profits.
² The possibility of nonmeasurability is not considered in [1] or [4].
1. CONSTRUCTION OF BAYES SOLUTIONS

The Decision Function
We have seen that the statistician must choose a pair (T, D). It turns
out that the choice of D is independent of that of T:
LEMMA: There is a fixed sequence of decision functions D_m such that, for every T, lim_{m→∞} R(T, D_m) = inf_D R(T, D) = w(T), say.

This will be the main result of this section. It follows that the expected loss from a procedure T may be taken as w(T), since this loss may be approximated to arbitrary accuracy by appropriate choice of D_m, and
a best sequential procedure T* of a given class will be one for which
w(T*) = inf w(T) where the inf is taken over all procedures T of the
class under consideration.
We are considering, then, a chance variable u and a sequence x of chance variables x_1, x_2, …. A sequential procedure T is a sequence of disjoint sets S_0, S_1, …, S_N, …, where S_N depends only on x_1, …, x_N and is the event that the sampling procedure terminates with the sample x_1, …, x_N; we require that Σ_{N=0}^∞ P(S_N) = 1. S_0 is the event that we do not sample at all, but take some action immediately; it will have probability either 0 or 1.

A decision function D is a sequence of functions d_0, d_1(x_1), …, d_N(x_1, …, x_N), …, where each d_N assumes values in A, and specifies the action taken when sampling terminates with x_1, …, x_N. We admit only decision functions D such that L[u, d_N(x)] is for each N a measurable function.
PROOF OF LEMMA: The loss from (T, D) is G(u, x; T, D) = L[u, d_N(x)] + c_N(x) for x ∈ S_N, and E G = R(T, D). Here, c_N(x) depends only on x_1, …, x_N. Then, denoting by E_N the conditional expectation given x_1, …, x_N,

(c) r_N ≥ E_N r_n if n ≥ N.

Now define d_Nm inductively as follows: d_N1 = d′_1; d_Nm = d′_m for those values of x such that E_N L(u, d′_m) < E_N L(u, d_{N,m−1}), otherwise d_Nm = d_{N,m−1}. Then certainly (a) holds, so that lim_{m→∞} E_N L(u, d_Nm) = r_N(x) exists. Also E L(u, d_Nm) = E E_N L(u, d_Nm), so that E r_N = r. Choose any d_N and any δ > 0, and let S be the event E_N L(u, d_N) < r_N(x) − δ. Then, defining d*_m = d_N on S, d*_m = d_Nm elsewhere, we have, using (b), that R(T, D) ≥ w(T) for all D. Thus we have reduced the problem of finding Bayes solutions to the following: we are given a sequence x of chance variables x_1, x_2, …, and a sequence of nonnegative functions w_N(x_1, …, x_N); a sequential procedure T = {S_N} is to be chosen so as to minimize w(T) = Σ_N ∫_{S_N} w_N dP.
The Best Truncated Procedure

risk with the observations x_1, …, x_{N−1}. We can then decide, on the basis of N − 2 observations, whether the (N − 1)st is worth taking by comparing the present risk, w_{N−2}, with E_{N−2} α_{N−1}, the attainable risk if x_{N−1} is observed. Continuing backwards, we obtain at each stage an expected attainable risk α_k for the observations x_1, …, x_k, and a description of how to attain this risk, i.e., of when to take another observation. This is formalized in the following:
THEOREM: Let x_1, …, x_N; w_0, …, w_N be any chance variables, w_i = w_i(x_1, …, x_i). Define α_N = w_N, α_j = min (w_j, E_j α_{j+1}) for j < N, S_j = {w_i > α_i for i < j, w_j = α_j}. Then for any disjoint events B_0, …, B_N, B_i depending only on x_1, …, x_i, Σ_{i=0}^N P(B_i) = 1, we have

Σ_{j=0}^N ∫_{S_j} w_j dP ≤ Σ_{j=0}^N ∫_{B_j} w_j dP.
PROOF: We shall show that, for fixed j and any (x_1, …, x_j)-set A,

(1.3) Σ_{i≥j} ∫_{A S_i} w_i dP = ∫_A α_j dP,

and that, for fixed j, and any disjoint sets A_j, …, A_N with A_i depending only on x_1, …, x_i and A_{j+1} + ⋯ + A_N depending only on x_1, …, x_j,

(1.4) Σ_{i=j}^N ∫_{A_i} α_i dP ≥ ∫_{A_j + ⋯ + A_N} α_j dP.

Combined with α_i ≤ w_i, these give, for any procedure {B_i},

(1.5) Σ_{i=0}^N ∫_{B_i} w_i dP ≥ Σ_{j=0}^N ∫_{S_j} w_j dP.

For (1.4),

Σ_{i=j}^N ∫_{A_i} α_i dP = ∫_{A_j} α_j dP + Σ_{i=j+1}^N ∫_{A_i} α_i dP
  ≥ ∫_{A_j} α_j dP + ∫_{A_{j+1} + ⋯ + A_N} α_{j+1} dP
  ≥ ∫_{A_j} α_j dP + ∫_{A_{j+1} + ⋯ + A_N} α_j dP
  = ∫_{A_j + ⋯ + A_N} α_j dP,

where the first inequality is the induction hypothesis and the second is obtained from the fact that always α_j ≤ E_j α_{j+1}, while A_{j+1} + ⋯ + A_N depends only on x_1, …, x_j. An induction backward on j now completes the proof of (1.4).
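The backward recursion α_j = min (w_j, E_j α_{j+1}) can be sketched computationally for the simplest special case: i.i.d. binary observations, with the a posteriori probability g of H_1 as state. The function name, the stopping-risk function, and the parameters below are hypothetical illustrations, not the paper's.

```python
def attainable_risk(g, j, N, p1, p2, stop_risk, cost):
    """alpha_j as a function of the current posterior g: the smaller of the
    risk of stopping now and the conditional expected risk of taking one
    more (binary) observation, with the procedure truncated at N."""
    w = stop_risk(g)
    if j == N:
        return w                                    # alpha_N = w_N
    cont = cost                                     # cost of observation j+1
    for x in (0, 1):
        lik1 = p1 if x == 1 else 1 - p1             # P(x | H1)
        lik2 = p2 if x == 1 else 1 - p2             # P(x | H2)
        px = g * lik1 + (1 - g) * lik2              # marginal P(x)
        if px > 0:
            post = g * lik1 / px                    # posterior after seeing x
            cont += px * attainable_risk(post, j + 1, N, p1, p2, stop_risk, cost)
    return min(w, cont)

# Hypothetical dichotomy: risk of stopping now is min(w12*g, w21*(1-g)).
stop = lambda g: min(10 * g, 10 * (1 - g))
```

Here attainable_risk(g, 0, N, …) is the risk of the best procedure truncated at N; allowing a longer horizon can only lower it, which is the monotonicity used in the next subsection.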
The Best Sequential Procedure
We are given now a sequence of functions w_0, w_1, …, w_N, …, where w_N = r_N(x_1, …, x_N) + c_N(x_1, …, x_N). The sequence r_N(x) is uniformly bounded, since we supposed the original loss function L(u, a) to be bounded, and we have shown that r_N ≥ E_N r_n for n ≥ N. We shall suppose that c_N(x) is a nondecreasing sequence, c_N(x) → ∞ as N → ∞ for all x. We now construct a best sequential procedure.³

The best sequential procedure is obtained as a limit of the best truncated procedures given in the preceding section. We first define α_NN = w_N, α_jN = min (w_j, E_j α_{j+1,N}), S_jN = {w_i > α_iN for i < j, w_j = α_jN}. For fixed j, α_jN is a decreasing sequence of functions; say α_jN → α_j as N → ∞. Then α_j = min (w_j, E_j α_{j+1}). Define S_j = {w_i > α_i for i < j, w_j = α_j}. We shall prove that T* = {S_j} is a best sequential procedure, i.e., T* is a sequential procedure, and for any sequential procedure T = {B_j},

w(T*) = Σ_{j=0}^∞ ∫_{S_j} w_j dP ≤ Σ_{j=0}^∞ ∫_{B_j} w_j dP = w(T).
Now, applying the theorem of the preceding section to the truncation of T at N,

w(T_N) = Σ_{j=0}^N ∫_{S_jN} w_j dP ≤ Σ_{j=0}^{N−1} ∫_{B_j} w_j dP + ∫_{B_N + B_{N+1} + ⋯} w_N dP

for all N. Then
³ The assumption made here is somewhat weaker than Condition 6 in [1], p. 297. The only other assumption made, that L(u, a) is bounded, is Condition 1 in [1], p. 297.
Σ_{j=0}^∞ ∫_{S_j} w_j dP ≤ w(T),

letting N → ∞, and using Lebesgue's convergence theorem and the easily verified fact that the characteristic function of S_jN approaches that of S_j; and w(T*) ≤ w(T).
It remains to prove that T* really is a sequential test, i.e., Σ_{N=0}^∞ P(S_N) = 1. Now α_0N = Σ_{j=0}^N ∫_{S_jN} w_j dP remains bounded as N → ∞, while w_N ≥ c_N and c_N(x) → ∞ for every x; if sampling failed to terminate on a set of positive probability, the contribution of that set to ∫ c_N dP, and hence to α_0N, would become infinite. Hence Σ_{N=0}^∞ P(S_N) = 1.
and

(2.15) ρ(g) = min [ min_{1≤j≤k} Σ_{i=1}^k g_i w_ij, c(1) + Σ_x ( Σ_{i=1}^k g_i p_i(x) ) ρ(g(x)) ],

where g(x) is the a posteriori distribution after one observation x: the minimum risk is the smaller of the best immediate-acceptance risk and the cost of one more observation plus the expected minimum risk thereafter.
3. OPTIMUM SEQUENTIAL PROCEDURE FOR A DICHOTOMY WHEN THE COST FUNCTION IS LINEAR

We are given two alternative hypotheses H_1 and H_2, which, for the sake of simplicity, we assume are characterized respectively by two probability densities f_1(x) and f_2(x) of a random vector X in an R-dimensional Euclidean space. (If X is discrete, f_1(x) and f_2(x) will represent the probability under the respective hypotheses that X = x.) We assume that the a priori probability of H_1 is g and that of H_2 is 1 − g, where g is known. (Later we shall show how to construct the minimax sequential procedure whose average risk is independent of g.) We are also given two nonnegative numbers w_12 and w_21, where w_ij (i ≠ j; i, j = 1, 2) represents the loss incurred in accepting H_j when H_i is true. In addition we shall assume that the cost per observation is a constant c which, by a suitable change in scale, can be taken as unity. We also assume that the observations taken during the course of the experiment are independent. We define p_1n = ∏_{i=1}^n f_1(x_i) and p_2n = ∏_{i=1}^n f_2(x_i), where x_1, x_2, …, x_n represent the first, second, etc., observations.
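The a posteriori probability of H_1 after n observations depends on p_1n and p_2n only through their ratio, which is conveniently accumulated in logarithms. A small sketch, with illustrative Bernoulli densities and function names of our own:

```python
import math

def posterior_h1(g, xs, f1, f2):
    """g * p_1n / (g * p_1n + (1 - g) * p_2n), computed through the
    logarithm of the probability ratio p_1n / p_2n."""
    llr = sum(math.log(f1(x)) - math.log(f2(x)) for x in xs)
    odds = g / (1 - g) * math.exp(llr)   # a posteriori odds of H1
    return odds / (1 + odds)

f1 = lambda x: 0.75 if x == 1 else 0.25  # density under H1 (illustrative)
f2 = lambda x: 0.25 if x == 1 else 0.75  # density under H2 (illustrative)
```

With these densities, a single observation x = 1 moves g = 1/2 to 3/4, and an observation x = 0 moves it back: only the net evidence matters.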
If we apply the discussion of Section 2 to the dichotomy under consideration, we see that the convex regions S_i* (i = 1, 2) reduce to two intervals, I_1 and I_2, where I_1 consists of points g such that 0 ≤ g ≤ g̲ and I_2 consists of points g such that ḡ ≤ g ≤ 1, where g̲ < ḡ. Moreover, in view of the assumption of constant cost of observations, the boundaries g̲ and ḡ of these two intervals are independent of the number of observations taken but depend only on w_12 and w_21 (and, of course, c, which is taken as unity).⁵
The intervals I_1 and I_2 have the following properties: If the a priori probability g for H_1 belongs to I_1, then there exists no sequential procedure which will result in a smaller average risk than the risk R_1 = w_12 g of accepting H_2 without further experimentation. If the a priori probability g for H_1 belongs to I_2, there exists no sequential procedure which will result in a smaller average risk than the risk R_2 = w_21(1 − g) of accepting H_1 without any further experimentation. However, in case g̲ < g < ḡ, it pays to take at least one observation.
⁵ It is assumed here that the intervals are closed; this assumption has not yet been justified. It will be shown below (Section 3) that it is a matter of indifference whether the endpoints are included or not.
(3.2) g_n = g p_1n / [g p_1n + (1 − g) p_2n].
We now define
By symmetry, we get
(3.12) g R_1(T*) + (1 − g) R_2(T*) = w_21(1 −
where the symbol {y} stands for the smallest integer greater than or equal to y. Let x_11, x_12, …, be a sequence of observations obtained from π_1 and x_21, x_22, …, a sequence of observations obtained from π_2. We continue sampling as long as −b < Σ_{i=1}^n (x_2i − x_1i) < a. We terminate sampling as soon as for some sample size n either Σ_{i=1}^n (x_2i − x_1i) = a or Σ_{i=1}^n (x_2i − x_1i) = −b. In the former case we accept H_1, in the latter case we accept H_2.
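This stopping rule can be sketched directly as code; the step sequence is supplied by the caller, and the names are our own. Since the x's are ±1 (see the footnote below (3.24)), the differences x_2i − x_1i take the values −2, 0, 2.

```python
def walk_test(steps, a, b):
    """Continue while -b < S_n < a, where S_n is the cumulative sum of the
    differences x_2i - x_1i; accept H1 on reaching a, H2 on reaching -b.
    (The comparisons use >= and <= as a harmless generalization of the
    exact-hit condition in the text.)"""
    s = 0
    for n, z in enumerate(steps, start=1):
        s += z
        if s >= a:
            return "accept H1", n
        if s <= -b:
            return "accept H2", n
    return "no decision", len(steps)
```

For example, with boundaries a = b = 4 the step sequence 2, 0, 2 first leaves the continuation region at the third observation, accepting H_1.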
Let L(a, b | H_1) = P[Σ_{i=1}^n (x_2i − x_1i) = a] when H_1 is true, and let E(n | a, b; H_1) be the expected number of observations required to reach a decision when H_1 is true. Then, without any approximation (see [3]), we have

(3.17) E(n | a, b; H_2) = [(a + b) L(a, b | H_2) − a] / (p_2 q_1 − p_1 q_2),

(3.18) where u = p_2 q_1 / (p_1 q_2).
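Quantities such as L(a, b | H) and E(n | a, b; H) can also be checked by simulation. The sketch below is our own Monte Carlo estimate, with arbitrary p_1, p_2, for the walk just described; in the symmetric case p_1 = p_2 the absorption probability at a should be near b/(a + b).

```python
import random

def estimate_L_and_n(a, b, p1, p2, trials=20000, seed=7):
    """Estimate P(the sum hits a before -b) and the expected sample size
    for steps x2 - x1, where x_j = +1 with probability p_j, else -1."""
    rng = random.Random(seed)
    hits, total = 0, 0
    for _ in range(trials):
        s, n = 0, 0
        while -b < s < a:
            x1 = 1 if rng.random() < p1 else -1
            x2 = 1 if rng.random() < p2 else -1
            s += x2 - x1
            n += 1
        hits += s >= a
        total += n
    return hits / trials, total / trials
```

With p_1 = p_2 = 1/2 and a = b = 4, the nonzero moves form a symmetric ±2 walk, so the estimate of L should be close to 1/2 and the expected sample size close to 8.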
If we now let g = ḡ, then T_ḡ is defined by the boundaries a = 1 and b = ā − 1. Hence the average risk of going on with the optimum procedure when g = ḡ is given by

(3.21) R(ḡ | T*) = 1 + ḡ p_1 q_2 {E(n | 1, ā − 1; H_1) + w_12 [1 − L(1, ā − 1; H_1)]} + (1 − ḡ) w_21 (1 − p_2 q_1).

(3.23) ā = { log [ ḡ(1 − g̲) / (g̲(1 − ḡ)) ] / log (p_1 q_2 / p_2 q_1) }.

If the resulting quantity has a value equal to the guessed ā, the computed g̲ and ḡ are correct. If not, repeat the process.
Equations (3.20) and (3.22) can also be used to compute w_12 and w_21 for given values of g̲ and ḡ. This can be accomplished as follows. For the given g̲ and ḡ, compute ā from (3.23). Set a = ā in (3.20) and (3.22) and solve for w_12 and w_21.

The average risk of the optimum sequential procedure under consideration can be computed as a function of g as soon as g̲ and ḡ are determined and is given by

(3.24) R(g | T*) = g E(n | a, b; H_1) + (1 − g) E(n | a, b; H_2)

where each x_i takes on the value 1 with probability p_i and −1 with probability q_i = 1 − p_i (i = 1, 2).
[Figure 1. R(g) for a dichotomy with c = 1, w_12 = w_21 = 10. Minimax strategies: for Nature, the least favorable g; for the statistician, a sequential probability ratio test (a, b).]
[Figure 2. R(g). Minimax strategies: for Nature, the least favorable g; for the statistician, a mixture of a sequential probability ratio test (a, b) and acceptance of H_2 with no observations, used with stated frequencies.]
[Figure 3. R(g).]
[Figure 4. The average minimax risk as a function of the a priori distribution g for a dichotomy with c = 1, w_12 = 50, w_21 = 100. Minimax strategies: for Nature, g* = .62175; for the statistician, sequential probability ratio tests (a, b) = (3, 4) and (4, 4) with frequencies .58682 and .41318, respectively. The average minimax risk is 14.642.]
Examples of Trichotomies
EXAMPLE 1. Assume that the random variables x_1, x_2, … are independently distributed and all have the same distribution. Each x_n takes on only the values 1, 2, 3 with probabilities specified by one of the following alternative hypotheses:

Hypothesis | Event 1 | Event 2 | Event 3
H_1        |    0    |   1/2   |   1/2
H_2        |   1/2   |    0    |   1/2
H_3        |   1/2   |   1/2   |    0
Let w_ij be the loss if H_j is accepted when H_i is true; the values are given by the following table:

State of Nature | H_1 accepted | H_2 accepted | H_3 accepted
H_1             |      0       |      4       |      6
H_2             |      6       |      0       |      4
H_3             |      4       |      6       |      0
Note that both of these matrices are invariant under a cyclic permuta-
tion of the hypotheses and events. Finally, assume that the cost of each
observation is 1.
Let g_i be the a priori probability of H_i. An a priori distribution g = (g_1, g_2, g_3), with g_1 + g_2 + g_3 = 1, may be represented by a point in an equilateral triangle with unit altitudes; the distances from the point to the three sides are the values of g_1, g_2, and g_3. P_i is the point where g_i = 1 (i = 1, 2, 3).
Let R(g | T) be the average risk under sequential procedure T when the a priori probabilities are g_1, g_2, g_3. Let T_0 be the best sequential procedure where no observations are taken; let S_i^0 be the region in g-space where H_i is accepted under T_0. Let L_i(g) be the loss in accepting H_i when the a priori distribution is g. Then

L_i(g) = Σ_{j=1}^3 g_j w_ji.

That is,
[Figures 5 and 6. The regions S_i^0 in the triangle of a priori distributions with vertices P_1, P_2, P_3.]
(4.4) g_ij = g_i p_ij / Σ_{k=1}^3 g_k p_kj,

where p_ij is the probability of event j under H_i and g_ij is the a posteriori probability of H_i when event j is observed.
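The a posteriori computation (4.4) is a one-line calculation; the sketch below (function name ours) applies it to the probability matrix of Example 1, with events indexed 0, 1, 2 for the values 1, 2, 3.

```python
# Rows: hypotheses H1..H3; columns: events 1..3 (Example 1's matrix).
P = [(0.0, 0.5, 0.5),
     (0.5, 0.0, 0.5),
     (0.5, 0.5, 0.0)]

def posterior(g, event):
    """(4.4): g_i' = g_i * p_i(event) / sum_k g_k * p_k(event)."""
    total = sum(gk * P[k][event] for k, gk in enumerate(g))
    return tuple(gi * P[i][event] / total for i, gi in enumerate(g))
```

Starting from the uniform distribution (1/3, 1/3, 1/3), observing event 1 gives the a posteriori distribution (0, 1/2, 1/2), the point used in the minimax discussion below.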
essentially all that is needed to determine all the tests T*(g). Further, the regions S_i* are convex sets whose boundaries are characterized by the relations

(4.6) S_i* ⊂ S_i^0.
It is first necessary to find the optimum tests for each of the dichotomies formed by taking pairs from the trichotomy H_1, H_2, H_3. Consider the dichotomy H_1, H_2. Then g_1 + g_2 = 1. Suppose g_1 is such that it pays to take at least one observation. From (4.4), since p_i3 = 1/2 (i = 1, 2),

(4.7) g_i3 = g_i / (g_1 + g_2) = g_i.
w_12 g̲_12 = 2, or g̲_12 = 1/2;

(4.8) w_21(1 − ḡ_12) = 2, or ḡ_12 = 2/3.

(4.9) g_2 ≤ g_1 ≤ 2g_2,
From (4.13)–(4.14), (4.12), and (4.2), the boundary of S_1* in this region satisfies

(4.15) g_2 = 1/3.
The intersection of (4.15) with the line g_2 = 2g_3 occurs at the point (1/2, 1/3, 1/6), which satisfies the conditions (4.13) and so lies on the boundary of the region R_1. As R_1 is convex, it follows that the boundary of S_1* actually does intersect R_1 and there coincides with the line segment joining (2/3, 1/3, 0) and (1/2, 1/3, 1/6). The latter point satisfies also the conditions
Let R_2 be the region defined by (4.16). Then we can find as before the intersection of the boundary of S_1* with R_2; the boundary hits the line g_2 = g_3 at the point (3/7, 2/7, 2/7), which point lies in R_2. Hence the boundary of S_1* actually does intersect R_2 and there coincides with the segment joining (1/2, 1/3, 1/6) to (3/7, 2/7, 2/7). If we continue this method, it can be shown that S_1* is bounded by the polygon with vertices (2/3, 1/3, 0), (1/2, 1/3, 1/6), (3/7, 2/7, 2/7), (2/5, 1/5, 2/5), (1/2, 0, 1/2), and (1, 0, 0). It is easily verified that S_1* is actually a subset of S_1^0, as demanded by (4.6). The vertices of the polygons bounding S_2* and S_3* can be obtained by cyclic permutation of the coordinates.
For any given g, the regions S_1*, S_2*, and S_3* completely define the optimal procedure. It remains to find the minimax procedure.

As shown by (4.12), the maximum conditional expected risk given that x_1 = 1 is 3, and this occurs when g_3 ≤ g_2 ≤ 2g_3. Similarly, the maximum conditional expected risks given that x_1 = 2 and 3, respectively, are both equal to 3, and they occur when g_1 ≤ g_3 ≤ 2g_1 and g_2 ≤ g_1 ≤ 2g_2, respectively. Any g* satisfying these three conditions will be a least favorable a priori distribution; clearly, the only set of values is g_1 = g_2 = g_3 = 1/3. If x_1 = 1, the corresponding a posteriori distribution is (0, 1/2, 1/2). This is on the boundary of S_3*, so that the optimum procedure is to stop after one observation and choose H_3. In general, then, the minimax procedure is to take one observation, stop, and accept H_3 if x_1 = 1, H_1 if x_1 = 2, and H_2 if x_1 = 3. The risk associated with this test is 3, of which the cost of observation is 1, and the expected loss due to incorrect decision is 2, independent of the true a priori distribution.
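The constant risk of this one-observation procedure can be verified directly from Example 1's two tables; the sketch below merely transcribes them (the dictionary layout and names are ours).

```python
P = {"H1": (0.0, 0.5, 0.5),   # probabilities of events 1, 2, 3
     "H2": (0.5, 0.0, 0.5),
     "H3": (0.5, 0.5, 0.0)}
W = {"H1": (0, 4, 6),         # loss of accepting H1, H2, H3 when row is true
     "H2": (6, 0, 4),
     "H3": (4, 6, 0)}
ACCEPT = ("H3", "H1", "H2")   # hypothesis accepted on observing event 1, 2, 3

def risk(truth):
    """Cost 1 of the single observation plus the expected decision loss."""
    idx = {"H1": 0, "H2": 1, "H3": 2}
    return 1 + sum(p * W[truth][idx[ACCEPT[e]]] for e, p in enumerate(P[truth]))
```

The risk is 3 whichever hypothesis is true, hence 3 under every a priori distribution, as stated in the text.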
It may be of interest to note that the minimax test is not unique, the
[Figure 7. The triangle of a priori distributions, with vertices (1, 0, 0), (0, 1, 0), (0, 0, 1), for the trichotomy with c = 1 and loss matrix W = (0 4 6; 6 0 4; 4 6 0).]

[Figures 8 and 9.]
But the inequality (5.3) must hold for all values of g, 0 < g < 1. Hence
from continuity considerations we must have
and
6. CONTINUITY OF THE RISK FUNCTION OF THE OPTIMUM TEST

Let g^0 be any a priori distribution for which g_i^0 > 0 for all i, and choose 0 < δ_0 < min_i g_i^0. Let G be the region in g-space for which |g_i − g_i^0| ≤ δ_0; then g_i > 0 for all i and all g in the compact set G. Let T_0 be the test referred to in the hypothesis.

(6.2) sup_{g∈G} inf_T R(g | T) ≤ sup_{g∈G} R(g | T_0) = K < +∞,

the last inequality following since R(g | T_0) is linear and hence continuous on the compact set G.

Let 𝒯′ be the subclass of 𝒯 for which inf_{g∈G} R(g | T) ≤ K + 1. As 𝒯′ is a subset of 𝒯, inf_{T∈𝒯′} R(g | T) ≥ inf_{T∈𝒯} R(g | T). Suppose for some g′ in G,

inf_{T∈𝒯′} R(g′ | T) > inf_{T∈𝒯} R(g′ | T).

Then

positive lower bound because of the compactness of G and the fact that g_i > 0 for all g in G. As i takes on only a finite number of values, g_i has a positive uniform lower bound. Then (6.6) implies that A_i(T) is bounded from above uniformly in i and T. Let C be this upper bound. Choose any δ < δ_0, and any g such that Σ_i |g_i − g_i^0| < δ.