INFORMATION AND CONTROL
16, 123-127 (1970)
On Decisions and Information Concerning
an Unknown Parameter
JAMES F. KORSH
University of Pennsylvania, The Moore School of Electrical Engineering,
Philadelphia, Pennsylvania 19104
Let $X_1, X_2, \ldots$ be a sequence of random variables whose finite dimensional
distributions depend on a random variable $\theta$. We study the error probability
and equivocation of specific decision functions which are used to decide on $\theta$
based on a sequence of $n$ observations of the $\{X_n\}$ process. In particular, we
show that if the process is ergodic for each value of $\theta$, the error probability
and equivocation go to zero as $n$ goes to infinity. If the $\{X_n\}$ process is a Markov
chain with distinct state behavior for each value of $\theta$, then they approach
zero exponentially.
INTRODUCTION
Let $X_1, X_2, \ldots$ be a sequence of random variables. Suppose their finite
dimensional distribution functions depend on a parameter $\theta$ which is also a
random variable. In this paper it is assumed that $\theta$ may take one of $M$ values,
say $v_1, v_2, \ldots, v_M$, while the $X_n$ have finite discrete distributions. Let $P_j(\xi_n)$
denote the conditional distribution function of $\xi_n = (X_1, X_2, \ldots, X_n)$ given
that $\theta = v_j$ and assume further that $P_j(\xi_1)$ is not identical with $P_h(\xi_1)$ for
$h \ne j$. Define $H(\theta/\xi_n)$ to be $-\sum_{j=1}^{M} P(v_j/\xi_n) \log P(v_j/\xi_n)$, the average uncertainty about $\theta$ given the sample $\xi_n$. The equivocation about $\theta$ is then
$E[H(\theta/\xi_n)]$ where $E$ denotes the expectation of the random variable $H(\theta/\xi_n)$.
We assume specific decision schemes are used to decide on the value of $\theta$
after observing $\xi_n$, and investigate the behavior of the error probability and
equivocation with $n$. In particular, we show that if the $\{X_n\}$ process is ergodic
for each value of $\theta$, then the error probability and equivocation go to zero
as $n$ approaches infinity. If the $\{X_n\}$ process is a Markov Chain with distinct
state behavior for each value of $\theta$ then they approach zero exponentially
with $n$. This result generalizes that of Rényi for the case where the $X_n$ are
independent and identically distributed given the value of $\theta$. These results
imply that the Bayes risk under a bounded loss function must approach 0
as $n \to \infty$ in the ergodic case, with this convergence being exponential in
the Markov Chain case.
General Case
By the lemma on page 16 of Feinstein, $E[H(\theta/\xi_{n+1})] \le E[H(\theta/\xi_n)]$. Also,
since $\theta$ may assume only a fixed finite number of values, $H(\theta/\xi_n)$ is bounded
for all $n$ and thus $E[H(\theta/\xi_n)]$ will be too. Consequently, the sequence of
equivocations $\{E[H(\theta/\xi_n)]\}$ must converge as $n$ approaches infinity. However,
it is not difficult to construct examples for which the limit is not zero when
the only assumption is that the $\{X_n\}$ process is stationary under the condition
that $\theta = v_j$ for $1 \le j \le M$.
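One example of this kind can be checked numerically. The sketch below is an illustration, not a construction taken from the paper. It assumes a "frozen" process in which the first observation is drawn from the distribution associated with the true parameter value and every later observation repeats it. Such a process is stationary but not ergodic under each parameter value. The function names `equivocation` and `frozen_likelihoods`, the prior, and the two distributions are hypothetical choices made for the illustration.

```python
import itertools
import math

def equivocation(prior, likelihoods):
    """Expected posterior entropy of the parameter, in nats.

    prior[j]          = probability that the parameter takes its j-th value
    likelihoods[j][s] = probability of sample s under the j-th value
    """
    total = 0.0
    for s in range(len(likelihoods[0])):
        joint = [prior[j] * likelihoods[j][s] for j in range(len(prior))]
        p_s = sum(joint)
        if p_s == 0.0:
            continue  # impossible samples contribute nothing
        for w in joint:
            if w > 0.0:
                total -= w * math.log(w / p_s)
    return total

def frozen_likelihoods(p, n):
    """Probabilities of all length-n samples when X_1 ~ p and X_k = X_1 for all k."""
    probs = []
    for seq in itertools.product(range(len(p)), repeat=n):
        constant = all(x == seq[0] for x in seq)
        probs.append(p[seq[0]] if constant else 0.0)
    return probs

# Hypothetical two-valued parameter with a uniform prior.
prior = [0.5, 0.5]
p1, p2 = [0.8, 0.2], [0.3, 0.7]
values = [
    equivocation(prior, [frozen_likelihoods(p1, n), frozen_likelihoods(p2, n)])
    for n in range(1, 6)
]
```

Under these assumptions the entries of `values` coincide for every sample length, so the equivocation converges to a strictly positive limit: additional observations carry no new information about the parameter.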
Ergodic Case
Let $Y_k = [P_j(X_k)/P_h(X_k)]^{\alpha}$ for $0 < \alpha < 1$. Assume that $\theta = v_h$, $h \ne j$,
and the $\{X_n\}$ process is ergodic under the condition that $\theta = v_h$. By the
ergodic theorem, $\frac{1}{n}\sum_{k=1}^{n} Y_k$ converges to $E_h[Y_1]$ with $P_h$ probability one.
Here $E_h$ denotes conditional expectation given $\theta = v_h$.
Now $E_h[Y_1] = \sum_{y} P_j^{\alpha}(y) P_h^{1-\alpha}(y) = \lambda_\alpha$ which is a convex function of
$\alpha$ for $0 \le \alpha \le 1$. At $\alpha = 0$ and $\alpha = 1$ it has the value 1 and, thus, assumes
a minimum value with respect to $\alpha$ that is less than one. Actually, $\lambda_\alpha < 1$
for all $0 < \alpha < 1$ by Hölder's inequality. Thus, when $\theta = v_h$, $\frac{1}{n}\sum_{k=1}^{n} Y_k$
approaches $\lambda_\alpha < 1$ almost surely with respect to $P_h$ for $j \ne h$.
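The behavior of this limit as a function of the exponent can be checked numerically. The following is a minimal sketch that evaluates the sum of $P_j^{\alpha}(y) P_h^{1-\alpha}(y)$ over $y$ for two finite distributions. The function name `chernoff_coefficient` and the example distributions are assumptions introduced here for illustration only.

```python
def chernoff_coefficient(p_j, p_h, alpha):
    """Sum over y of p_j(y)**alpha * p_h(y)**(1 - alpha), for 0 <= alpha <= 1."""
    return sum((a ** alpha) * (b ** (1.0 - alpha)) for a, b in zip(p_j, p_h))

# Two hypothetical, non-identical distributions on a two-point alphabet.
p_j = [0.8, 0.2]
p_h = [0.3, 0.7]

# Values on a grid of exponents strictly between 0 and 1.
grid = [k / 10.0 for k in range(1, 10)]
interior = [chernoff_coefficient(p_j, p_h, a) for a in grid]
endpoints = (chernoff_coefficient(p_j, p_h, 0.0),
             chernoff_coefficient(p_j, p_h, 1.0))
```

For these distributions the endpoint values equal 1 and every interior value is below 1, in line with the convexity and Hölder arguments in the text.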
Consider the following decision scheme: decide $\theta = v_h$ if
(a) $\frac{1}{n}\sum_{k=1}^{n} [P_j(X_k)/P_h(X_k)]^{\alpha} < 1$ for all $j \ne h$ and
(b) the $h$ of condition (a) is unique.
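The two conditions of this scheme can be sketched in code as follows. This is an illustrative implementation under stated assumptions, not the author's: the one-dimensional distributions are taken to be strictly positive so that the ratios are defined, the exponent defaults to 1/2, and the function name `decide_ergodic` is hypothetical. The function returns `None` when no index, or more than one index, satisfies condition (a).

```python
def decide_ergodic(sample, dists, alpha=0.5):
    """Return the index h selected by conditions (a) and (b), or None.

    sample      : list of observed symbols (integers indexing the alphabet)
    dists[j][x] : probability of symbol x under the j-th parameter value
                  (assumed strictly positive)
    """
    n = len(sample)
    candidates = []
    for h in range(len(dists)):
        satisfies_a = True
        for j in range(len(dists)):
            if j == h:
                continue
            # Empirical average of the tilted likelihood ratios.
            avg = sum((dists[j][x] / dists[h][x]) ** alpha for x in sample) / n
            if avg >= 1.0:
                satisfies_a = False
                break
        if satisfies_a:
            candidates.append(h)
    # Condition (b): exactly one index may satisfy condition (a).
    return candidates[0] if len(candidates) == 1 else None

# Hypothetical distributions and samples whose frequencies match each of them.
dists = [[0.8, 0.2], [0.3, 0.7]]
choice_first = decide_ergodic([0] * 8 + [1] * 2, dists)
choice_second = decide_ergodic([0] * 3 + [1] * 7, dists)
choice_none = decide_ergodic([0] * 5 + [1] * 5, dists)
```

With these inputs the first two samples select indices 0 and 1 respectively, while the balanced sample satisfies condition (a) for neither index and yields no decision.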
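This scheme yields a correct decision unless events $A$ and $B$, corresponding to conditions (a) and (b), do not
occur when $\theta = v_h$. Thus, $P(\text{error}/\theta = v_h) = P(A^c \text{ or } B^c/\theta = v_h) \le P(A^c/\theta = v_h) + P(B^c/\theta = v_h)$. We have seen above that $P(A^c/\theta = v_h)$
approaches 0 as $n \to \infty$. Now, since $x^{-1}$ is convex for $x > 0$,

$$E_h\left\{\left[\frac{P_h(X_1)}{P_l(X_1)}\right]^{\alpha}\right\} = E_h\left\{\left(\left[\frac{P_l(X_1)}{P_h(X_1)}\right]^{\alpha}\right)^{-1}\right\} \ge \left\{E_h\left[\frac{P_l(X_1)}{P_h(X_1)}\right]^{\alpha}\right\}^{-1} = \frac{1}{\lambda_\alpha} > 1 \quad \text{for } l \ne h.$$
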
Thus, again by the ergodic theorem, $\frac{1}{n}\sum_{k=1}^{n} [P_h(X_k)/P_l(X_k)]^{\alpha}$ approaches
a limit which is greater than 1 almost surely with respect to $P_h$. Consequently, $P(B^c/\theta = v_h)$ approaches 0 as $n \to \infty$.
Thus, the probability of an error under this scheme, which is

$$\sum_{h=1}^{M} P(\theta = v_h)\, P(\text{error}/\theta = v_h),$$

goes to 0 as $n \to \infty$. In fact, from some point on all decisions will be correct
with probability one. By the theorem on p. 35 of Feinstein it follows that
$E[H(\theta/\xi_n)] \le -P_e \log P_e - (1 - P_e) \log(1 - P_e) + P_e \log(M - 1)$, where
$P_e$ is the probability of error of the above scheme. But $P_e$ goes to zero as
$n \to \infty$ so that the equivocation must also. We, thus, have the following
theorem.
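The way this upper bound forces the equivocation to zero can be seen by evaluating it for shrinking error probabilities. The sketch below is illustrative only: the function name `fano_bound` is an assumption, natural logarithms are used, and the boundary cases follow the usual convention that $0 \log 0 = 0$.

```python
import math

def fano_bound(p_e, M):
    """Upper bound on the equivocation (in nats) for error probability p_e
    and M >= 2 possible parameter values."""
    if not 0.0 <= p_e <= 1.0 or M < 2:
        raise ValueError("need 0 <= p_e <= 1 and M >= 2")
    bound = p_e * math.log(M - 1)
    # Binary entropy terms, with the convention 0 * log(0) = 0.
    if p_e > 0.0:
        bound -= p_e * math.log(p_e)
    if p_e < 1.0:
        bound -= (1.0 - p_e) * math.log(1.0 - p_e)
    return bound

# The bound for a hypothetical M = 4 and a decreasing error probability.
bounds = [fano_bound(p, 4) for p in (0.1, 0.01, 0.001, 1e-6)]
```

The entries of `bounds` decrease toward zero, which is the step used in the text to pass from a vanishing error probability to a vanishing equivocation.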
THEOREM 1. The above decision scheme yields a probability of error and
an equivocation which approach 0 as $n$ approaches $\infty$ when the $\{X_n\}$ process
is conditionally ergodic.
Markov Chain Case
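Consider the following decision scheme: decide $\theta = v_h$ if $P_h(\xi_n) > P_j(\xi_n)$
for all $j \ne h$. Then the probability of an error, $P_e$, is given by

$$P_e = \sum_{h=1}^{M} P(\theta = v_h)\, P(\text{error}/\theta = v_h).$$

Or, for any $0 < \alpha < 1$,

$$P_e = \sum_{h=1}^{M} P(\theta = v_h)\, P\left\{\frac{P_j(\xi_n)}{P_h(\xi_n)} \ge 1 \text{ for some } j \ne h \,\Big/\, \theta = v_h\right\} \le \sum_{h=1}^{M} P(\theta = v_h) \sum_{j \ne h} P\left\{\left[\frac{P_j(\xi_n)}{P_h(\xi_n)}\right]^{\alpha} \ge 1 \,\Big/\, \theta = v_h\right\}.$$

By Markov's inequality,

$$P_e \le \sum_{h=1}^{M} P(\theta = v_h) \sum_{j \ne h} E_h\left[\frac{P_j(\xi_n)}{P_h(\xi_n)}\right]^{\alpha}, \qquad 0 < \alpha < 1.$$
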
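Now,

$$E_h\left[\frac{P_j(\xi_n)}{P_h(\xi_n)}\right]^{\alpha} = \sum_{\xi_n} P_j^{\alpha}(\xi_n) P_h^{1-\alpha}(\xi_n) = \sum_{x_1} P_j^{\alpha}(x_1) P_h^{1-\alpha}(x_1) \sum_{x_2} P_j^{\alpha}(x_2/x_1) P_h^{1-\alpha}(x_2/x_1) \cdots \sum_{x_n} P_j^{\alpha}(x_n/x_{n-1}) P_h^{1-\alpha}(x_n/x_{n-1}) = a\, C_{jh}^{\,n-1}(\alpha)\, B_{jh}(\alpha)$$

with $a$ the row vector of all 1's,
under the assumption that the $\{X_n\}$ process is a Markov Chain under the
condition that $\theta = v_h$.¹ Here $B_{jh}(\alpha)$ is the column vector with entries $P_j^{\alpha}(x_1) P_h^{1-\alpha}(x_1)$ and
$C_{jh}(\alpha)$ is the matrix whose $i,k$th entry is $P_j^{\alpha}(x_i/x_k) P_h^{1-\alpha}(x_i/x_k)$ where $x_i$ and $x_k$
range over the possible values of $X_n$. Suppose for each $x$ and each $j \ne h$, $P_j(y/x)$ is not identical with $P_h(y/x)$
over all $y$. Then $\max_{x,\, j \ne h} \sum_{y} P_j^{\alpha}(y/x) P_h^{1-\alpha}(y/x) < 1$ for $0 < \alpha < 1$. Consequently, $\max_{j \ne h} \min_{0 < \alpha < 1} \lambda_{jh}(\alpha) = q$ is less than 1 where $\lambda_{jh}(\alpha)$ is the
largest eigenvalue of $C_{jh}(\alpha)$. Thus, $P_e$ goes to 0 exponentially with $q$. That
is, there exists a constant $A$ such that $P_e \le A q^n$ for all $n$.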
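The matrix form of this expectation and the role of the largest eigenvalue can be checked on a small chain. The sketch below is an illustration under assumptions not found in the paper: a two-state alphabet, hypothetical transition matrices stored so that entry `[i][k]` is the probability of moving to state `i` from state `k`, strictly positive entries so that plain power iteration converges, and helper names (`tilted_matrix`, `moment_matrix_form`, `moment_brute_force`, `largest_eigenvalue`) chosen here for illustration.

```python
import itertools

def tilted_matrix(T_j, T_h, alpha):
    """C[i][k] = T_j[i][k]**alpha * T_h[i][k]**(1 - alpha).
    Every column of T_j and T_h sums to 1."""
    m = len(T_j)
    return [[T_j[i][k] ** alpha * T_h[i][k] ** (1.0 - alpha) for k in range(m)]
            for i in range(m)]

def mat_vec(C, v):
    return [sum(row[k] * v[k] for k in range(len(v))) for row in C]

def moment_matrix_form(init_j, init_h, T_j, T_h, alpha, n):
    """All-ones row vector times C^(n-1) times the tilted initial vector."""
    v = [pj ** alpha * ph ** (1.0 - alpha) for pj, ph in zip(init_j, init_h)]
    C = tilted_matrix(T_j, T_h, alpha)
    for _ in range(n - 1):
        v = mat_vec(C, v)
    return sum(v)

def moment_brute_force(init_j, init_h, T_j, T_h, alpha, n):
    """Sum of P_j(path)**alpha * P_h(path)**(1 - alpha) over all length-n paths."""
    total = 0.0
    for path in itertools.product(range(len(init_j)), repeat=n):
        pj, ph = init_j[path[0]], init_h[path[0]]
        for prev, cur in zip(path, path[1:]):
            pj *= T_j[cur][prev]
            ph *= T_h[cur][prev]
        total += pj ** alpha * ph ** (1.0 - alpha)
    return total

def largest_eigenvalue(C, iterations=200):
    """Power iteration; assumes C has strictly positive entries."""
    v = [1.0 / len(C)] * len(C)
    lam = 0.0
    for _ in range(iterations):
        w = mat_vec(C, v)
        lam = sum(w)              # v sums to 1, so this estimates the eigenvalue
        v = [x / lam for x in w]
    return lam

# Hypothetical chains whose transition columns differ for every current state.
T_j = [[0.9, 0.2], [0.1, 0.8]]
T_h = [[0.5, 0.6], [0.5, 0.4]]
init_j, init_h = [0.6, 0.4], [0.5, 0.5]
C = tilted_matrix(T_j, T_h, 0.5)
column_sums = [sum(C[i][k] for i in range(2)) for k in range(2)]
lam = largest_eigenvalue(C)
```

For these chains the matrix form agrees with direct enumeration over paths, both column sums of `C` are below 1, and the largest eigenvalue lies between them, so the expectation decays geometrically in the sample length.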
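Since $E[H(\theta/\xi_n)] = \sum_{h=1}^{M} P(\theta = v_h) E_h[H(\theta/\xi_n)]$ it follows from the lemma
of Rényi that

$$E[H(\theta/\xi_n)] \le C \sum_{h=1}^{M} \sum_{j \ne h} \sqrt{P(\theta = v_j) P(\theta = v_h)} \sum_{\xi_n} P_j^{1/2}(\xi_n) P_h^{1/2}(\xi_n) \le C \sum_{h=1}^{M} \sum_{j \ne h} \sqrt{P(\theta = v_j) P(\theta = v_h)}\, \lambda^n \le C(M - 1) \lambda^n,$$

where $\lambda = q^{1/2}$, so that $\lambda < 1$.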
Thus, we obtain the theorem below.
THEOREM 2. The above decision function yields a probability of error and
an equivocation which approach 0 exponentially with $n$ when the $\{X_n\}$ process
is conditionally a Markov Chain with distinct state behavior.
1 The author would like to thank a referee for this version of the proof.
In conclusion, it follows that Theorems 1 and 2 must apply to Bayes
decision functions also in the case of bounded loss functions. Consequently,
the corresponding Bayes risks will behave as the error probability of the
decision function specified here.
RECEIVED: January 20, 1969; revised: October 7, 1969
REFERENCES
1. A. FEINSTEIN, "Foundations of Information Theory," McGraw-Hill Book Co.,
Inc., New York, 1958.
2. A. RÉNYI, On the amount of information concerning an unknown parameter in
a sequence of observations, Publ. Math. Inst. Hung. Acad. Sci. 9, 617-625.
3. A. RÉNYI, "On the Amount of Missing Information and the Neyman-Pearson
Lemma," in Research Papers in Statistics, F. N. David (Ed.), John Wiley and
Sons, London, 1966.