IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. IT-14, NO. 1, JANUARY 1968
On a Class of Processes Arising in Linear Estimation Theory
IAN F. BLAKE, MEMBER, IEEE, AND
Abstract: This paper considers a class of stochastic processes, called spherically invariant, which have the property that all mean-square estimation problems on them have linear solutions. It is shown that their multivariate characteristic functions are univariate functions of a quadratic form. The corresponding densities are easily found by means of the Hankel transform. Relations between spherical invariance and normality are discussed. Properties relating to the linear estimation problem are given.
I. INTRODUCTION
IT IS WELL KNOWN that all mean-square estimation problems on normal processes have linear solutions and that normal processes are closed under linear operations. In an interesting paper by Vershik,[1] it is shown that these two properties do not uniquely characterize the normal process. They do, however, characterize the class of spherically invariant processes. Vershik's principal interest in using these processes was to characterize normal processes, and he showed that, if a spherically invariant process is ergodic, then it is also normal.
It is interesting to examine the class of spherically
invariant processes further. In particular, their characteristic functions and their relations to linear estimation
theory and normal processes will be discussed in some
detail.
A random process may be defined as an indexed set of random variables $\{x_t, t \in T\}$ together with all finite-dimensional distribution functions of arbitrary collections of the $x_{t_i}$, $t_i \in T$. In practice, however, relatively few processes are defined in this manner, the major exception being the normal process. One of the difficulties with this method of definition is the mathematical intractability of general n-variate functions. For spherically invariant processes, however, characteristic functions of all orders are simply defined and mathematically tractable in that their corresponding density functions are easily found.
It is noted that the approach of defining a class of processes for which a given system is optimal, as is used here, is identical to that of Balakrishnan.[2] His results are not relevant to the specific linear problem being considered here.
Manuscript received April 6, 1967; revised August 23, 1967. This work was supported in part by the National Science Foundation under Grants GK-187 and GK-1439 and by the U. S. Army Research Office, Durham, N. C., under Contract DA-31-124-ARO-D-292.
I. F. Blake was with the Dept. of Elec. Engrg., Princeton University, Princeton, N. J. He is now with the Jet Propulsion Laboratory, Pasadena, Calif.
J. B. Thomas is with the Dept. of Elec. Engrg., Princeton University, Princeton, N. J.
JOHN B. THOMAS, FELLOW, IEEE
II. PRELIMINARIES
All random variables considered will be assumed real-valued with finite variance. In the space H of real random variables which are square integrable over some measure space, the metric is defined as
$$d(x, y) = \{E[(x - y)^2]\}^{1/2},$$
the inner product as $(x, y) = E[xy]$, and the norm as $\|x\| = \{E[x^2]\}^{1/2}$. With these definitions, H is a Hilbert space. Random variables differing only on sets of probability zero in the measure space are taken as identical.
Given a set of random variables $x_1, \ldots, x_n \in H$, the set of elements of the form
$$y = \sum_{i=1}^{n} a_i x_i$$
(with finite variance) is the linear manifold spanned by the given set. If, in addition, the presence of the sequence $\{g_n\}$ in the manifold implies that $y = \operatorname{l.i.m.}_{n \to \infty} g_n$ is also in the manifold, the manifold is said to be closed. In terms of these concepts, the solution[5],[8] to the problem of estimating $x_0$ by a linear combination of $x_1, \ldots, x_n$ is simply the projection of $x_0$ on the linear manifold spanned by $x_1, \ldots, x_n$.
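
The projection statement above can be illustrated numerically. The following is a minimal sketch, assuming zero-mean variables whose second moments $E[x_i x_j]$ and $E[x_0 x_j]$ are known: it solves the normal equations $\sum_i a_i E[x_i x_j] = E[x_0 x_j]$ and then checks that the error $x_0 - \sum_i a_i x_i$ is orthogonal to every $x_j$. The function names and the example moments are hypothetical and do not come from the paper.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix @ a = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Build an augmented copy so the caller's data is left untouched.
    aug = [list(map(float, matrix[i])) + [float(rhs[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("moment matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    # Back substitution.
    coeffs = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * coeffs[k] for k in range(row + 1, n))
        coeffs[row] = (aug[row][n] - tail) / aug[row][row]
    return coeffs


def projection_coefficients(gram, cross):
    """Return a_1..a_n with sum_i a_i E[x_i x_j] = E[x_0 x_j] for every j."""
    return solve_linear_system(gram, cross)


# Assumed example moments: E[x_i x_j] for x_1, x_2 and E[x_0 x_j].
gram = [[1.0, 0.5], [0.5, 1.0]]
cross = [0.5, 0.25]
coeffs = projection_coefficients(gram, cross)

# Orthogonality of the error x_0 - sum_i a_i x_i to each x_j.
residuals = [
    cross[j] - sum(coeffs[i] * gram[i][j] for i in range(2)) for j in range(2)
]
print(coeffs, residuals)
```

With these assumed moments the coefficient on $x_2$ vanishes, since $x_0$ is correlated with $x_2$ only through $x_1$.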
The concept of semi-independence of random variables, as introduced by Vershik,[1] is of considerable interest and can be stated as follows.
Definition: Two random variables, $x_1$ and $x_2$, will be called semi-independent if
$$E[x_i \mid x_j] = E[x_i]; \quad i, j = 1, 2; \; i \neq j.$$
This concept lies between that of two random variables being uncorrelated and that of two random variables being independent. It is entirely equivalent to each random variable being uncorrelated with an arbitrary function of the other,[1] i.e.,
$$E[x_i f(x_j)] = E\{f(x_j)\, E[x_i \mid x_j]\} = E[x_i]\, E[f(x_j)]; \quad i \neq j; \; i, j = 1, 2.$$
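
A concrete pair that is semi-independent but not independent may help fix the idea. The sketch below is an illustration under an assumption that is not in the paper: $x_1 = \cos\theta$ and $x_2 = \sin\theta$ with $\theta$ uniform on $[0, 2\pi)$, with expectations computed by averaging over an equally spaced grid of angles. It checks that $x_1$ is uncorrelated with several functions of $x_2$ (the other direction follows by symmetry), while the fourth-order moment shows that the pair is dependent.

```python
import math

N = 3600  # grid points on the circle; an arbitrary assumed resolution
thetas = [2.0 * math.pi * k / N for k in range(N)]
x1 = [math.cos(t) for t in thetas]
x2 = [math.sin(t) for t in thetas]


def mean(values):
    return sum(values) / len(values)


def cross_moment(f):
    """E[x1 f(x2)] - E[x1] E[f(x2)], averaged over the uniform angle."""
    fx2 = [f(v) for v in x2]
    return mean([a * b for a, b in zip(x1, fx2)]) - mean(x1) * mean(fx2)


# x1 is uncorrelated with several arbitrary functions of x2 ...
gaps = [cross_moment(f) for f in (lambda v: v, lambda v: v ** 2, math.exp)]

# ... yet x1 and x2 are not independent: E[x1^2 x2^2] != E[x1^2] E[x2^2].
dependence_gap = mean([a * a * b * b for a, b in zip(x1, x2)]) - mean(
    [a * a for a in x1]
) * mean([b * b for b in x2])
print(gaps, dependence_gap)
```

The dependence gap equals $1/8 - 1/4$, so this pair sits strictly between "uncorrelated" and "independent".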
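BLAKE AND THOMAS: PROCESSES IN LINEAR ESTIMATION THEORY
A well-known theorem (e.g., Ferguson[7] and Balakrishnan[2]) which is useful in linear estimation considerations is the following.
Theorem 1: A necessary and sufficient condition for
$$E[x_0 \mid x_1, \ldots, x_n] = \sum_{j=1}^{n} a_j x_j$$
is that
$$\frac{\partial}{\partial u_0} \Phi(u_0, \ldots, u_n) \Big|_{u_0 = 0} = \sum_{j=1}^{n} a_j \frac{\partial}{\partial u_j} \Phi(0, u_1, \ldots, u_n) \tag{1}$$
where
$$\Phi(u_0, \ldots, u_n) = E\Big[\exp\Big(j \sum_{i=0}^{n} u_i x_i\Big)\Big].$$
Notice that if $E[x_0 \mid x_1, \ldots, x_n] = \sum_{j=1}^{n} a_j x_j$, then the random variable
$$x_0 - \sum_{j=1}^{n} a_j x_j$$
is semi-independent of every random variable in the manifold generated by $x_1, \ldots, x_n$. In the case where the $x_1, \ldots, x_n$ are normal random variables, $x_0 - \sum_{j=1}^{n} a_j x_j$ is independent of $x_1, \ldots, x_n$.
III. SPHERICALLY INVARIANT PROCESSES
While the results of this section follow naturally from Vershik,[1] the approach is considerably different. Greater use is made of the characteristic function, which results in certain simplifications. The following lemma will serve to motivate later results.
Lemma 1: Consider the linear manifold H generated by two random variables $x_1$ and $x_2$, where $E[x_1] = E[x_2] = 0$, $E[x_1^2] = E[x_2^2]$, and $E[x_1 x_2] = 0$. If, for any two random variables $y_1$ and $y_2$ in H
$$E[y_1 \mid y_2] = \alpha y_2 \tag{2}$$
then the joint characteristic function $\Phi_x(u_1, u_2)$ of $x_1$ and $x_2$ is a function of $u_1^2 + u_2^2$ only.
Proof: Represent the random variables $y_1$ and $y_2$ by the equations
$$y_1 = a_1 x_1 + a_2 x_2, \qquad y_2 = b_1 x_1 + b_2 x_2.$$
Now attempt to find conditions on $\Phi_x(u_1, u_2)$ such that (2) is true for every $a_1$, $a_2$, $b_1$, and $b_2$. The joint characteristic function $\Phi_y(v_1, v_2)$ of $y_1$ and $y_2$ must satisfy (1) in Theorem 1. Since
$$\Phi_y(v_1, v_2) = \Phi_x(a_1 v_1 + b_1 v_2,\; a_2 v_1 + b_2 v_2)$$
then this implies that, from (1) and (2),
$$a_1 \Phi_1(b_1 v_2, b_2 v_2) + a_2 \Phi_2(b_1 v_2, b_2 v_2) = \alpha\,[\,b_1 \Phi_1(b_1 v_2, b_2 v_2) + b_2 \Phi_2(b_1 v_2, b_2 v_2)\,] \tag{3}$$
where $\Phi_i = \partial \Phi_x / \partial u_i$ and, by the projection theorem with $E[x_1^2] = E[x_2^2]$ and $E[x_1 x_2] = 0$, $\alpha = (a_1 b_1 + a_2 b_2)/(b_1^2 + b_2^2)$. It follows immediately from (3) that
$$\Phi_1(b_1 v_2, b_2 v_2)(a_1 b_2^2 - a_2 b_1 b_2) = \Phi_2(b_1 v_2, b_2 v_2)(a_1 b_1 b_2 - a_2 b_1^2).$$
Let $u_1 = b_1 v_2$, $u_2 = b_2 v_2$. Then
$$u_2 \frac{\partial}{\partial u_1} \Phi_x(u_1, u_2) = u_1 \frac{\partial}{\partial u_2} \Phi_x(u_1, u_2)$$
or
$$\frac{1}{u_1} \frac{\partial}{\partial u_1} \Phi_x(u_1, u_2) = \frac{1}{u_2} \frac{\partial}{\partial u_2} \Phi_x(u_1, u_2)$$
which implies (see the Appendix for justification) that
$$\Phi_x(u_1, u_2) = \varphi(u_1^2 + u_2^2)$$
i.e., the function is constant on circles in the $u_1$, $u_2$ plane. It is noted that the normal characteristic function is such a function. Thus in the linear manifold spanned by $x_1$, $x_2$ imposing the condition that all conditional expectations be linear leads to a characteristic function with argument $u_1^2 + u_2^2$.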
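
The condition reached in the proof of Lemma 1, namely $u_2\,\partial\Phi/\partial u_1 = u_1\,\partial\Phi/\partial u_2$, can be checked numerically for particular characteristic functions. The sketch below is only an illustration with assumed examples: it estimates the difference of the two sides by central differences for the characteristic function of two uncorrelated unit-variance normal variables, and for that of two independent Laplace variables, which are uncorrelated with equal variance but whose characteristic function is not constant on circles. All function names are hypothetical.

```python
import math


def pde_residual(phi, u1, u2, h=1e-5):
    """Central-difference estimate of u2*dPhi/du1 - u1*dPhi/du2 at (u1, u2)."""
    d1 = (phi(u1 + h, u2) - phi(u1 - h, u2)) / (2.0 * h)
    d2 = (phi(u1, u2 + h) - phi(u1, u2 - h)) / (2.0 * h)
    return u2 * d1 - u1 * d2


def normal_cf(u1, u2):
    # Two uncorrelated unit-variance normal variables: a function of u1^2 + u2^2.
    return math.exp(-0.5 * (u1 * u1 + u2 * u2))


def laplace_product_cf(u1, u2):
    # Two independent Laplace variables (assumed counterexample): not circular.
    return 1.0 / ((1.0 + u1 * u1) * (1.0 + u2 * u2))


radial_residual = pde_residual(normal_cf, 0.5, 1.0)
nonradial_residual = pde_residual(laplace_product_cf, 0.5, 1.0)
print(radial_residual, nonradial_residual)
```

The residual is zero up to discretization error in the normal case and clearly nonzero in the Laplace case, consistent with the lemma: linear conditional expectations throughout the manifold cannot hold for the latter pair.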
The following definition of a spherically invariant set of random variables is due to Vershik.[1]
Definition: Let H be a linear manifold generated by some set of random variables $\{x_n\}$. If all random variables in H which have the same variance have the same distribution function, we call the set $\{x_n\}$ spherically invariant. (If the $\{x_n\}$ are arbitrary samples from a random process $\{x_t, t \in T\}$, this defines a spherically invariant random process.) The linear manifold H generated by a spherically invariant set of random variables is itself called spherically invariant.
It is easy to use this definition to find the multivariate characteristic function of an arbitrary collection of spherically invariant random variables. The result is contained in the following lemma.
Lemma 2: The multivariate characteristic function of an arbitrary collection of n random variables $y_1, \ldots, y_n$ contained in a spherically invariant manifold H is of the form
$$\Phi(u_1, \ldots, u_n) = \varphi\Big(\sum_{i=1}^{n} \sum_{j=1}^{n} u_i u_j \sigma_i \sigma_j \rho_{ij}\Big)$$
where $E[y_i y_j] = -\sigma_i \sigma_j \varphi''(0) \rho_{ij}$, and all random variables are assumed to have zero mean.
Proof: Consider an arbitrary linear combination of the $y_i$'s, $\sum_{i=1}^{n} a_i y_i$. The variance of this combination is $\sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j E[y_i y_j]$ and its characteristic function is $\Phi(v a_1, \ldots, v a_n)$. From the definition of spherical invariance, this characteristic function is constant for all sets of $\{a_i\}$ such that $\sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j E[y_i y_j] = \text{constant}$. It follows that
$$\Phi(v a_1, \ldots, v a_n) = \varphi\Big(v^2 \sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j \sigma_i \sigma_j \rho_{ij}\Big)$$
and setting $v a_i = u_i$ yields the result stated.
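
Lemma 2 can be checked on a simple spherically invariant manifold. The sketch below is an illustration under assumptions not taken from the paper: the generators $x_1 = \cos\theta$, $x_2 = \sin\theta$ with $\theta$ uniform on the circle, and a hypothetical mixing matrix `A` giving correlated variables $y = Ax$. Since the covariance of $y$ is $AA'/2$, the lemma predicts that the characteristic function of $y$ at $v$ depends on $v$ only through $v'AA'v$. Expectations are computed by averaging over an equally spaced grid of angles.

```python
import math


def char_fn(v, A, n_grid=4096):
    """E[exp(j v'y)] for y = A x, with x uniform on the unit circle."""
    # v'y = (A'v)'x, so first form u = A'v.
    u1 = A[0][0] * v[0] + A[1][0] * v[1]
    u2 = A[0][1] * v[0] + A[1][1] * v[1]
    total = 0.0
    for k in range(n_grid):
        theta = 2.0 * math.pi * k / n_grid
        # Only the real part is accumulated; the imaginary part vanishes by symmetry.
        total += math.cos(u1 * math.cos(theta) + u2 * math.sin(theta))
    return total / n_grid


def quadratic_form(v, A):
    """v' (A A') v, which is proportional to the variance of v'y."""
    u1 = A[0][0] * v[0] + A[1][0] * v[1]
    u2 = A[0][1] * v[0] + A[1][1] * v[1]
    return u1 * u1 + u2 * u2


A = [[1.0, 0.0], [0.5, 1.0]]  # assumed mixing matrix; y1 and y2 are correlated
v_a, v_b, v_c = (1.0, 0.0), (-0.5, 1.0), (2.0, 0.0)

cf_a, cf_b, cf_c = (char_fn(v, A) for v in (v_a, v_b, v_c))
q_a, q_b, q_c = (quadratic_form(v, A) for v in (v_a, v_b, v_c))
print(q_a, q_b, q_c, cf_a, cf_b, cf_c)
```

The arguments `v_a` and `v_b` give the same value of the quadratic form and hence the same characteristic function value, while `v_c` gives a different value of both.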
Notice from this lemma that, if two spherically invariant random variables are uncorrelated, then they are also semi-independent, as may be seen easily by using Theorem 1. Vershik,[1] in his Lemma 2, shows that this property uniquely characterizes spherical invariance in the following sense: If, in a linear manifold H, whenever two random variables are uncorrelated they are also semi-independent, then H is spherically invariant.
Consider now the problem of linear estimation. It is well known that the best mean-square estimator of $x_0$ given $x_1, \ldots, x_n$ is the conditional expectation $E[x_0 \mid x_1, \ldots, x_n]$. If the conditional expectation is a linear function of the conditioning random variables, then that particular estimation problem has a linear solution. More generally, however, a time series (either discrete or continuous) is given, and the interest is in estimating a random variable, given a portion of the time series. To show that such a problem has a linear solution it is sufficient to show that the conditional expectation $E[x_0 \mid x_1, \ldots, x_n]$ is a linear function for arbitrary n and for any $x_1, \ldots, x_n$ chosen from the given portion of the time series. Part of the justification for this is the fact that (see Doob[5])
$$\lim_{n \to \infty} E[x_0 \mid x_1, \ldots, x_n] = E[x_0 \mid x_1, x_2, \ldots]$$
with probability one.
In the following theorem, by "every mean-square estimation problem on the process" is meant every mean-square estimation problem on the linear manifold generated by the given process. The following theorem was first proved by Vershik.[1] A different method of proof is given here, using the multivariate characteristic function introduced previously.
Theorem 2 (Vershik): Every mean-square estimation problem on a process has a linear solution if and only if the process is spherically invariant.
Proof: Consider a process $\{x_t, t \in T\}$, $E[x_t] = 0$, $E[x_t^2] = \sigma^2$ and the linear manifold H spanned by it.
1) Suppose H is spherically invariant. Consider any n + 1 random variables $x_0, \ldots, x_n$ in H. It is necessary to show that
$$E[x_0 \mid x_1, \ldots, x_n] = \sum_{j=1}^{n} a_j x_j.$$
As previously shown, the joint characteristic function of the n + 1 random variables is of the form $\varphi(u'\rho u)$ where $\sigma^2 = -\varphi''(0)$ and $E[x_i x_j] = \sigma^2 \rho_{ij}$. Using (1) in Theorem 1
$$\frac{\partial}{\partial u_0} \varphi(u'\rho u) \Big|_{u_0 = 0} = 2 \sum_{j=1}^{n} \rho_{0j} u_j \varphi'(u'\rho u) \Big|_{u_0 = 0}$$
and
$$\sum_{i=1}^{n} a_i \frac{\partial}{\partial u_i} \varphi(u'\rho u) \Big|_{u_0 = 0} = 2 \sum_{i=1}^{n} a_i \sum_{j=1}^{n} \rho_{ij} u_j \varphi'(u'\rho u) \Big|_{u_0 = 0}$$
where u indicates the column vector of $u_0, \ldots, u_n$. For these two equations to be equal, we must equate coefficients of $u_j$, and this yields
$$\rho_{0j} = \sum_{i=1}^{n} a_i \rho_{ij}, \quad j = 1, \ldots, n.$$
This is the same equation which the $a_i$'s must satisfy when estimating $x_0$ by a linear combination of the $x_i$'s, $i = 1, \ldots, n$; and hence, by the projection theorem, a solution for the $a_i$'s, which exists when $\rho$ is nonsingular, is unique. Therefore
$$E[x_0 \mid x_1, \ldots, x_n] = \sum_{j=1}^{n} a_j x_j$$
and estimation is linear.
2) Suppose every estimation problem in H is linear and that there exists a finite orthonormal basis $y_1, \ldots, y_n$ for H. Then the random variables $y_1$ and $x_1 = \sum_{i=2}^{n} a_i y_i$ are uncorrelated and have equal variance if $\sum_{i=2}^{n} a_i^2 = 1$. By the assumption of linear estimation and the results of Lemma 1, it is clear that
$$\Phi(u_1, u_2 a_2, \ldots, u_2 a_n) = \varphi(u_1^2 + u_2^2 a_2^2 + \cdots + u_2^2 a_n^2)$$
where $\Phi(u_1, \ldots, u_n)$ is the multivariate characteristic function of the basis elements $y_1, \ldots, y_n$. The same result is true for all sets of constants $\{a_i\}$ such that $\sum_{i=2}^{n} a_i^2 = 1$. As before, setting $u_2 a_i = v_i$ (and $u_1 = v_1$) yields the equation
$$\Phi(v_1, \ldots, v_n) = \varphi(v_1^2 + \cdots + v_n^2).$$
The form of this characteristic function is clearly closed under arbitrary linear transformations and the desired result follows.
The class of processes for which all estimation problems on the manifold generated by the process are linear has been characterized. However, there are many examples of nonspherically invariant processes for which particular estimation problems are linear. A good example of this is the discrete "linear process" for which the optimal predictor is linear, as shown in Wolff et al.[6]
IV. THE CHARACTERISTIC FUNCTION
It has been shown that if $x_1, \ldots, x_n$ are spherically invariant and have zero means, then their multivariate characteristic function is of the form
$$\Phi(u_1, \ldots, u_n) = \varphi\Big(\sum_{i=1}^{n} \sum_{j=1}^{n} u_i u_j \sigma_i \sigma_j \rho_{ij}\Big) \tag{4}$$
where $E[x_i x_j] = -\sigma_i \sigma_j \rho_{ij} \varphi''(0)$. It is often of interest to find the corresponding multivariate probability density function. First, consider the case where the $x_i$ are uncorrelated and have equal variance. The required solution is contained in the following well-known theorem of Bochner.[3]
Theorem 3: If $\psi((u_1^2 + \cdots + u_n^2)^{1/2})$ is absolutely integrable, then
$$\frac{1}{(2\pi)^n} \int_{E_n} \psi((u_1^2 + \cdots + u_n^2)^{1/2}) \exp\Big(-j \sum_{i=1}^{n} u_i x_i\Big)\, du_1 \cdots du_n$$
is a function of $r = (x_1^2 + \cdots + x_n^2)^{1/2}$ and can be expressed by the single integral
$$p(r) = \big((2\pi)^{n/2} r^{(n-2)/2}\big)^{-1} \int_0^\infty \psi(\lambda) \lambda^{n/2} J_{(n-2)/2}(\lambda r)\, d\lambda \tag{5}$$
where $J_k(t)$ denotes the Bessel function of the kth order and $\lambda = (u_1^2 + \cdots + u_n^2)^{1/2}$.
Note that this theorem involves arguments of the radius, while previous equations used arguments of the radius squared. This was for convenience only, the relationship between $\varphi(u_1^2 + \cdots + u_n^2)$ and $\psi((u_1^2 + \cdots + u_n^2)^{1/2})$ being clear.
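
The single integral (5) is easy to evaluate numerically. The sketch below is a check under stated assumptions rather than a general-purpose routine: it takes the normal case $\psi(\lambda) = \exp(-\lambda^2/2)$, for which the density is known to be $(2\pi)^{-n/2}\exp(-r^2/2)$, computes the Bessel function from its power series (adequate only for moderate arguments), and integrates by Simpson's rule on a truncated interval. The function names, the truncation point, and the step count are all hypothetical choices.

```python
import math


def bessel_j(order, x, terms=120):
    """Bessel function J_order(x) from its power series (adequate for moderate x)."""
    half = 0.5 * x
    term = half ** order / math.gamma(order + 1.0)
    total = term
    for m in range(terms):
        term *= -(half * half) / ((m + 1.0) * (m + 1.0 + order))
        total += term
    return total


def radial_density(psi, r, n, upper=10.0, steps=2000):
    """Evaluate (5) by Simpson's rule on [0, upper]; `upper` and `steps` are assumed."""
    order = 0.5 * (n - 2)
    h = upper / steps

    def integrand(lam):
        return psi(lam) * lam ** (0.5 * n) * bessel_j(order, lam * r)

    total = integrand(0.0) + integrand(upper)
    for k in range(1, steps):
        total += (4.0 if k % 2 else 2.0) * integrand(k * h)
    integral = total * h / 3.0
    return integral / ((2.0 * math.pi) ** (0.5 * n) * r ** (0.5 * (n - 2)))


def normal_psi(lam):
    # Characteristic function of uncorrelated unit-variance normal variables.
    return math.exp(-0.5 * lam * lam)


for n in (2, 3):
    numeric = radial_density(normal_psi, 1.0, n)
    exact = math.exp(-0.5) / (2.0 * math.pi) ** (0.5 * n)
    print(n, numeric, exact)
```

Replacing `normal_psi` with another absolutely integrable radial characteristic function gives the corresponding density, provided the truncation point is large enough for that function's decay.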
Theorem 3 may be used in the following manner to find the transform of (4). Write the transform as
$$p(x_1, \ldots, x_n) = \frac{1}{(2\pi)^n} \int_{E_n} \psi((u'\rho u)^{1/2}) \exp\Big(-j \sum_{i=1}^{n} u_i x_i\Big)\, du_1 \cdots du_n. \tag{6}$$
Assume $\rho$, the matrix of the quadratic form, is symmetric, nonsingular, and positive definite and let $u = Av$ with A chosen so that
$$A'\rho A = I.$$
Then
$$du = |A|\, dv$$
and
$$|A| = |\rho|^{-1/2}$$
and
$$\rho^{-1} = AA'.$$
Then (6) reads
$$p(x_1, \ldots, x_n) = \frac{1}{(2\pi)^n} \big(|\rho|^{1/2}\big)^{-1} \int_{E_n} \psi((v'v)^{1/2}) \exp(-j v'A'x)\, dv.$$
Letting the vector $A'x$ play the role of $x$ in Theorem 3, and noting that $(A'x)'(A'x) = x'\rho^{-1}x$, we obtain
$$p(x_1, \ldots, x_n) = |\rho|^{-1/2} f(r), \quad r = (x'\rho^{-1}x)^{1/2}$$
where $f(\cdot)$ and $\psi(\cdot)$ are connected by the equation
$$f(r) = \big((2\pi)^{n/2} r^{(n-2)/2}\big)^{-1} \int_0^\infty \psi(\lambda) \lambda^{n/2} J_{(n-2)/2}(\lambda r)\, d\lambda. \tag{7}$$
It is noted that (5) and (7) are very closely related to the Hankel transform defined by
$$g(y; \nu) = \int_0^\infty f(x)\, (xy)^{1/2} J_\nu(xy)\, dx$$
where $g(y; \nu)$ is the Hankel transform of $f(x)$ (see Erdelyi[4]). Such a transform has the property that it is self-reciprocal, i.e.,
$$f(x) = \mathcal{H}\{\mathcal{H}[f(x)]\}$$
where $\mathcal{H}$ is the Hankel transform operator. In the more convenient terminology of Hankel transforms then, the characteristic function and probability density are related by the following equations:
$$p(x_1, \ldots, x_n) = |\rho|^{-1/2} f(r), \quad r = (x'\rho^{-1}x)^{1/2}$$
and
$$\psi(\lambda) = \frac{(2\pi)^{n/2}}{\lambda^{(n-1)/2}}\, \mathcal{H}\big[r^{(n-1)/2} f(r)\big], \quad \lambda = (u'\rho u)^{1/2}$$
where the transform is of order $\nu = (n-2)/2$.
Consider the problem of defining a stochastic process by all orders of distribution functions or characteristic functions. In practice, there must be some simple method for defining these functions. Furthermore, they must satisfy the symmetry and consistency conditions of Kolmogorov.[9] Consider now the specific problem of defining a spherically invariant process by means of spherical density and characteristic functions. It is clear that the form of the nth-order characteristic function must be independent of n, for all n, if the consistency conditions are to be met. The order n will of course determine the order of the quadratic form in the function. Hence if the nth-order characteristic function is given by $\varphi((u'\rho u)^{1/2})$, $\varphi(\cdot)$ will not involve n, and n will enter the expression only in the order of the quadratic form $u'\rho u$. In such a situation, the corresponding nth-order probability density will, in general, be functionally dependent on n. This is quite consistent, since lower order densities are obtained by integration.
V. COMMENTS
It is clear that the notion of spherical invariance is a generalization of that of normality. The two ideas are uniquely defined by the statement: In a linear manifold in which zero correlation implies independence (semi-independence) all random variables are normal (spherically invariant).
It is interesting to examine further the relationship between spherically invariant and normal processes. Both classes have the properties that they are closed under linear operations and that conditional expectations on them are linear. A normal process is completely specified by its mean value and covariance functions; for spherically invariant processes the additional univariate function $\varphi(\cdot)$ is required to specify its characteristic functions of all orders. While spherically invariant processes do not enjoy quite the same degree of mathematical tractability as normal processes, the fact that a simple relationship exists between probability densities and characteristic functions of all orders is significant. It appears, then, that some of the properties discussed here and commonly ascribed to normal processes are attributable to the quadratic form in the normal characteristic and density functions. The exponential function, however, is vital to other considerations, such as independence and ergodicity, and accounts for many of the more remarkable properties of normal populations.
It is easy to show, for example, that two spherically invariant random variables which are independent are
also normal. This implies that if a linear manifold is generated by a set of independent random variables and all conditional expectations are linear, then the manifold contains only normal random variables.
The fact that a spherically invariant process which is ergodic is also normal[1] sheds doubt on the physical significance of non-normal spherically invariant processes. The concepts involved in the definition of spherical invariance, however, appear to be of interest in the consideration of linear estimation problems.
APPENDIX
It is required to show that if
$$\frac{1}{u_1} \frac{\partial}{\partial u_1} \Phi_x(u_1, u_2) = \frac{1}{u_2} \frac{\partial}{\partial u_2} \Phi_x(u_1, u_2) \tag{8}$$
on the whole plane, then
$$\Phi_x(u_1, u_2) = \varphi(u_1^2 + u_2^2). \tag{9}$$
Consider a curve on the $u_1$, $u_2$ plane defined by
$$\Phi_x(u_1, u_2) = c$$
where c is an arbitrary constant. The total differential is
$$dc = \frac{\partial}{\partial u_1} \Phi_x(u_1, u_2)\, du_1 + \frac{\partial}{\partial u_2} \Phi_x(u_1, u_2)\, du_2.$$
To "remain" on the curve, set dc = 0, and using (8),
$$\frac{1}{u_1} \frac{\partial}{\partial u_1} \Phi_x(u_1, u_2)\, (u_1\, du_1 + u_2\, du_2) = 0.$$
Since the partial derivative $\partial \Phi_x / \partial u_1$ will not, in general, be zero, it follows that
$$u_1\, du_1 + u_2\, du_2 = 0$$
which implies that
$$u_1^2 + u_2^2 = \text{constant}$$
and hence $\Phi_x(u_1, u_2)$ is constant on circles, i.e., (9) is true.
REFERENCES
[1] A. M. Vershik, "Some characteristic properties of Gaussian stochastic processes," Theory of Probability and Its Applications, vol. 9, pp. 353-356, 1964.
[2] A. V. Balakrishnan, "On a characterization of processes for which optimal mean-square systems are of specified form," IEEE Trans. Information Theory, vol. IT-6, pp. 490-500, September 1960.
[3] S. Bochner, Lectures on Fourier Integrals, Annals of Mathematical Studies, Study 42. Princeton, N. J.: Princeton University Press, 1959.
[4] A. Erdelyi et al., Tables of Integral Transforms, Bateman Manuscript Project, vol. 2. New York: McGraw-Hill, 1954.
[5] J. L. Doob, Stochastic Processes. New York: Wiley, 1953.
[6] S. S. Wolff, J. L. Gastwirth, and J. B. Thomas, "Linear optimum predictors," IEEE Trans. Information Theory, vol. IT-13, pp. 30-32, January 1967.
[7] T. Ferguson, "On the existence of linear regression in linear structural relations," California University Publications in Statistics, vol. 2 (1953-1958), pp. 143-165.
[8] U. Grenander and M. Rosenblatt, Statistical Analysis of Stationary Time Series. New York: Wiley, 1959.
[9] A. N. Kolmogorov, Foundations of the Theory of Probability. New York: Chelsea.
On Optimal and Suboptimal Nonlinear Filters for Discrete Inputs
ABRAHAM H. HADDAD, MEMBER, IEEE, AND JOHN B. THOMAS, FELLOW, IEEE
Abstract: The determination of minimum-mean-squared-error (MMSE) nonlinear filters usually involves formidable mathematical difficulties. These difficulties may be bypassed by restricting attention to special classes of filters or special processes. One such class is Zadeh's class $\eta_1$, which for the general case also involves mathematical difficulties. In this work two realizations of class $\eta_1$ are used for the MMSE reconstruction and filtering of a sampled signal. The cases where the filter reduces to a zero-memory nonlinearity followed by a linear filter are discussed. A suboptimum scheme composed of a zero-memory nonlinearity followed by a linear filter is considered for the reconstruction and filtering of a subclass of the separable process.
Manuscript received October 18, 1966. This work was supported by the National Science Foundation under Grants GP 1647 and GK 187; and by the U. S. Army Research Office, Durham, N. C., under Contract DA-31-124-ARO-D-292.
A. H. Haddad is with the Dept. of Elec. Engrg., University of Illinois, Urbana, Ill.
J. B. Thomas is with the Dept. of Elec. Engrg., Princeton University, Princeton, N. J.
I. INTRODUCTION
NONLINEAR filters have received increasing attention in recent years, partly due to their superior performance for non-Gaussian inputs.
One of the most widely considered classes of general nonlinear filters[“-“I
is Zadeh's class $\eta_1$. Filters of this class possess simple physical realizations in terms of combinations of zero-memory nonlinearities (ZNL) and linear filters. The minimum-mean-squared-error (MMSE) optimization of such filters requires only the second-order probability density of the input process. However, the resulting integral equations involve formidable mathematical and practical difficulties. Therefore, one approach is to resort to special subclasses of filters and to restrict suitably the input processes.
This study is concerned with the filtering and re-
A