On a class of processes arising in linear estimation theory

I. Blake

IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. IT-14, NO. 1, JANUARY 1968

IAN F. BLAKE, MEMBER, IEEE, AND JOHN B. THOMAS, FELLOW, IEEE

Abstract: This paper considers a class of stochastic processes, called spherically invariant, which have the property that all mean-square estimation problems on them have linear solutions. It is shown that their multivariate characteristic functions are univariate functions of a quadratic form. The corresponding densities are easily found by means of the Hankel transform. Relations between spherical invariance and normality are discussed. Properties relating to the linear estimation problem are given.

I. INTRODUCTION

It is well known that all mean-square estimation problems on normal processes have linear solutions and that normal processes are closed under linear operations. In an interesting paper by Vershik,[1] it is shown that these two properties do not uniquely characterize the normal process. They do, however, characterize the class of spherically invariant processes. Vershik's principal interest in using these processes was to characterize normal processes, and he showed that, if a spherically invariant process is ergodic, then it is also normal. It is interesting to examine the class of spherically invariant processes further. In particular, their characteristic functions and their relations to linear estimation theory and normal processes will be discussed in some detail.

A random process may be defined as an indexed set of random variables $\{x_t, t \in T\}$ together with all finite dimensional distribution functions of arbitrary collections of the $x_{t_i}$, $t_i \in T$. In practice, however, relatively few processes are defined in this manner, the major exception being the normal process. One of the difficulties with this method of definition is the mathematical intractability of general n-variate functions. For spherically invariant processes, however, characteristic functions of all orders are simply defined and mathematically tractable in that their corresponding density functions are easily found.

It is noted that the approach of defining a class of processes for which a given system is optimal, as is used here, is identical to that of Balakrishnan.[2] His results are not relevant to the specific linear problem being considered here.

Manuscript received April 6, 1967; revised August 23, 1967. This work was supported in part by the National Science Foundation under Grants GK-187 and GK-1439 and by the U. S. Army Research Office, Durham, N. C., under Contract DA-31-124-ARO-D-292. I. F. Blake was with the Dept. of Elec. Engrg., Princeton University, Princeton, N. J. He is now with the Jet Propulsion Laboratory, Pasadena, Calif. J. B. Thomas is with the Dept. of Elec. Engrg., Princeton University, Princeton, N. J.

II. PRELIMINARIES

All random variables considered will be assumed real-valued with finite variance. In the space H of real random variables which are square integrable over some measure space, the inner product is defined as $(x, y) = E[xy]$ and the norm as $\|x\| = \{E[x^2]\}^{1/2}$. With these definitions, H is a Hilbert space. Random variables differing only on sets of probability zero in the measure space are taken as identical.

Given a set of random variables $[x_1, \ldots, x_n] \in H$, the set of elements of the form

$$\sum_{i=1}^{n} a_i x_i$$

(with finite variance) is the linear manifold spanned by the given set. If, in addition, the presence of the sequence $\{y_n\}$ in the manifold implies that $y = \operatorname{l.i.m.}_{n} y_n$ (the limit in the mean) is also in the manifold, the manifold is said to be closed. In terms of these concepts, the solution[5],[8] to the problem of estimating $x_0$ by a linear combination of $x_1, \ldots, x_n$ is simply the projection of $x_0$ on the linear manifold spanned by $x_1, \ldots, x_n$.
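The projection statement above can be made concrete with a small numerical sketch. It is an illustration only and does not appear in the original paper. The projection of $x_0$ onto the span of $x_1, \ldots, x_n$ has coefficients $a_i$ for which the residual $x_0 - \sum_i a_i x_i$ is orthogonal to every $x_j$ under the inner product $E[xy]$, that is, $\sum_i a_i E[x_i x_j] = E[x_0 x_j]$. The function names, the use of exact rational arithmetic, and the second-moment values below are assumptions chosen for the example.

```python
from fractions import Fraction


def solve_normal_equations(R, r):
    """Solve R a = r for the projection coefficients a.

    R[i][j] plays the role of E[x_i x_j] and r[j] of E[x_0 x_j].
    Gauss-Jordan elimination with exact fractions; R is assumed
    nonsingular (a singular R raises StopIteration in the pivot search).
    """
    n = len(r)
    # Augmented matrix [R | r] with exact rational entries.
    M = [[Fraction(R[i][j]) for j in range(n)] + [Fraction(r[i])]
         for i in range(n)]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero pivot.
        pivot = next(i for i in range(col, n) if M[i][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        p = M[col][col]
        M[col] = [v / p for v in M[col]]
        # Eliminate this column from every other row.
        for i in range(n):
            if i != col and M[i][col] != 0:
                f = M[i][col]
                M[i] = [vi - f * vc for vi, vc in zip(M[i], M[col])]
    return [M[i][n] for i in range(n)]


def residual_correlations(R, r, a):
    """Return E[(x_0 - sum_i a_i x_i) x_j] for each j."""
    n = len(r)
    return [Fraction(r[j]) - sum(a[i] * Fraction(R[i][j]) for i in range(n))
            for j in range(n)]


# Hypothetical second moments for n = 2 (illustrative values only).
R = [[2, 1], [1, 2]]   # E[x_i x_j], i, j = 1, 2
r = [1, 1]             # E[x_0 x_j], j = 1, 2
a = solve_normal_equations(R, r)       # a == [1/3, 1/3] for these values
print(a, residual_correlations(R, r, a))
```

For these values the residual is uncorrelated with both $x_1$ and $x_2$, which is the orthogonality property that defines the projection.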
The concept of semi-independence of random variables, as introduced by Vershik,[1] is of considerable interest and can be stated as follows.

Definition: Two random variables, $x_1$ and $x_2$, will be called semi-independent if

$$E[x_i \mid x_j] = E[x_i]; \quad i, j = 1, 2; \; i \neq j.$$

This concept lies between that of two random variables being uncorrelated and that of two random variables being independent. It is entirely equivalent to each random variable being uncorrelated with an arbitrary function of the other,[1] i.e.,

$$E[x_i f(x_j)] = E\{f(x_j) E[x_i \mid x_j]\} = E[x_i] E[f(x_j)]; \quad i \neq j; \; i, j = 1, 2.$$

A well-known theorem (e.g., Ferguson[7] and Balakrishnan[2]) which is useful in linear estimation considerations is the following.

Theorem 1: A necessary and sufficient condition for

$$E[x_0 \mid x_1, \ldots, x_n] = \sum_{j=1}^{n} a_j x_j$$

is that

$$\frac{\partial}{\partial u_0} \Phi(u_0, \ldots, u_n) \Big|_{u_0 = 0} = \sum_{j=1}^{n} a_j \frac{\partial}{\partial u_j} \Phi(0, u_1, \ldots, u_n) \tag{1}$$

where

$$\Phi(u_0, \ldots, u_n) = E\Big[\exp\Big(j \sum_{i=0}^{n} u_i x_i\Big)\Big].$$

Notice that if $E[x_0 \mid x_1, \ldots, x_n] = \sum_{j=1}^{n} a_j x_j$, then the random variable

$$x_0 - \sum_{j=1}^{n} a_j x_j$$

is semi-independent of every random variable in the manifold generated by $x_1, \ldots, x_n$. In the case where $x_0, \ldots, x_n$ are normal random variables, $x_0 - \sum_{j=1}^{n} a_j x_j$ is independent of $x_1, \ldots, x_n$.

III. SPHERICALLY INVARIANT PROCESSES

While the results of this section follow naturally from Vershik,[1] the approach is considerably different. Greater use is made of the characteristic function, which results in certain simplifications. The following lemma will serve to motivate later results.

Lemma 1: Consider the linear manifold H generated by two random variables $x_1$ and $x_2$, where $E[x_1] = E[x_2] = 0$, $E[x_1^2] = E[x_2^2]$, and $E[x_1 x_2] = 0$. If, for any two random variables $y_1$ and $y_2$ in H,

$$E[y_1 \mid y_2] = a y_2 \tag{2}$$

then the joint characteristic function $\Phi_x(u_1, u_2)$ of $x_1$ and $x_2$ is a function of $u_1^2 + u_2^2$ only.

Proof: Represent the random variables $y_1$ and $y_2$ by the equations

$$y_1 = a_1 x_1 + a_2 x_2, \qquad y_2 = b_1 x_1 + b_2 x_2.$$

Now attempt to find conditions on $\Phi_x(u_1, u_2)$ such that (2) is true for every $a_1$, $a_2$, $b_1$, and $b_2$. The joint characteristic function $\Phi_y(v_1, v_2)$ of $y_1$ and $y_2$ must satisfy (1) in Theorem 1. Since

$$\Phi_y(v_1, v_2) = \Phi_x(a_1 v_1 + b_1 v_2, \; a_2 v_1 + b_2 v_2)$$

and the constant in (2) is $a = E[y_1 y_2]/E[y_2^2] = (a_1 b_1 + a_2 b_2)/(b_1^2 + b_2^2)$, then this implies that, from (1) and (2),

$$a_1 \Phi_1(b_1 v_2, b_2 v_2) + a_2 \Phi_2(b_1 v_2, b_2 v_2) = \frac{a_1 b_1 + a_2 b_2}{b_1^2 + b_2^2} \big[ b_1 \Phi_1(b_1 v_2, b_2 v_2) + b_2 \Phi_2(b_1 v_2, b_2 v_2) \big]. \tag{3}$$

It follows immediately from (3) that

$$\Phi_1(b_1 v_2, b_2 v_2)(a_1 b_2^2 - a_2 b_1 b_2) = \Phi_2(b_1 v_2, b_2 v_2)(a_1 b_1 b_2 - a_2 b_1^2)$$

where $\Phi_i$ denotes the partial derivative of $\Phi_x$ with respect to its ith argument. Let $u_1 = b_1 v_2$, $u_2 = b_2 v_2$. Then

$$u_2 \Phi_1(u_1, u_2) = u_1 \Phi_2(u_1, u_2)$$

or

$$u_1 \frac{\partial}{\partial u_2} \Phi_x(u_1, u_2) = u_2 \frac{\partial}{\partial u_1} \Phi_x(u_1, u_2)$$

which implies (see the Appendix for justification) that

$$\Phi_x(u_1, u_2) = \varphi(u_1^2 + u_2^2)$$

i.e., the function is constant on circles in the $u_1$, $u_2$ plane. It is noted that the normal characteristic function is such a function.

Thus in the linear manifold spanned by $x_1$, $x_2$, imposing the condition that all conditional expectations be linear leads to a characteristic function with argument $u_1^2 + u_2^2$. The following definition of a spherically invariant set of random variables is due to Vershik.[1]

Definition: Let H be a linear manifold generated by some set of random variables $\{x_\alpha\}$. If all random variables in H which have the same variance have the same distribution function, we call the set $\{x_\alpha\}$ spherically invariant. (If the $\{x_\alpha\}$ are arbitrary samples from a random process $\{x_t, t \in T\}$, this defines a spherically invariant random process.) The linear manifold H generated by a spherically invariant set of random variables is itself called spherically invariant.

It is easy to use this definition to find the multivariate characteristic function of an arbitrary collection of spherically invariant random variables. The result is contained in the following lemma.

Lemma 2: The multivariate characteristic function of an arbitrary collection of n random variables $y_1, \ldots, y_n$ contained in a spherically invariant manifold H is of the form

$$\Phi(u_1, \ldots, u_n) = \varphi\Big(\sum_{i=1}^{n} \sum_{j=1}^{n} u_i u_j \sigma_i \sigma_j \rho_{ij}\Big)$$

where $E[y_i y_j] = -\sigma_i \sigma_j \varphi''(0) \rho_{ij}$, and all random variables are assumed to have zero mean.

Proof: Consider an arbitrary linear combination of the $y_i$'s, $\sum_{i=1}^{n} a_i y_i$. The variance of this combination is $\sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j E[y_i y_j]$ and its characteristic function is $\Phi(v a_1, \ldots, v a_n)$.
From the definition of spherical invariance, this characteristic function is constant for all sets of $\{a_i\}$ such that $\sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j E[y_i y_j]$ = constant. It follows that

$$\Phi(v a_1, \ldots, v a_n) = \varphi\Big(v^2 \sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j \sigma_i \sigma_j \rho_{ij}\Big)$$

and setting $v a_i = u_i$ yields the result stated.

Notice from this lemma that, if two spherically invariant random variables are uncorrelated, then they are also semi-independent, as may be seen easily by using Theorem 1. Vershik,[1] in his Lemma 2, shows that this property uniquely characterizes spherical invariance in the following sense: If, in a linear manifold H, whenever two random variables are uncorrelated they are also semi-independent, then H is spherically invariant.

Consider now the problem of linear estimation. It is well known that the best mean-square estimator of $x_0$ given $x_1, \ldots, x_n$ is the conditional expectation $E[x_0 \mid x_1, \ldots, x_n]$. If the conditional expectation is a linear function of the conditioning random variables, then that particular estimation problem has a linear solution. More generally, however, a time series (either discrete or continuous) is given, and the interest is in estimating a random variable, given a portion of the time series. To show that such a problem has a linear solution it is sufficient to show that the conditional expectation $E[x_0 \mid x_1, \ldots, x_n]$ is a linear function for arbitrary n and for any $x_1, \ldots, x_n$ chosen from the given portion of the time series. Part of the justification for this is the fact that (see Doob[5])

$$\lim_{n \to \infty} E[x_0 \mid x_1, \ldots, x_n] = E[x_0 \mid x_1, x_2, \ldots]$$

with probability one.

In the following theorem, by "every mean-square estimation problem on the process" is meant every mean-square estimation problem on the linear manifold generated by the given process. The following theorem was first proved by Vershik.[1] A different method of proof is given here, using the multivariate characteristic function introduced previously.

Theorem 2 (Vershik): Every mean-square estimation problem on a process has a linear solution if and only if the process is spherically invariant.

Proof: Consider a process $\{x_t, t \in T\}$, $E[x_t] = 0$, $E[x_t^2] = \sigma^2$, and the linear manifold H spanned by it.

1) Suppose H is spherically invariant. Consider any n + 1 random variables $x_0, \ldots, x_n$ in H. It is necessary to show that

$$E[x_0 \mid x_1, \ldots, x_n] = \sum_{i=1}^{n} a_i x_i.$$

As previously shown, the joint characteristic function of the n + 1 random variables is of the form $\varphi(u'\rho u)$, where $\sigma^2 = -\varphi''(0)$ and $E[x_i x_j] = \sigma^2 \rho_{ij}$, and where u indicates the column vector of $u_0, \ldots, u_n$. Using (1) in Theorem 1,

$$\frac{\partial}{\partial u_0} \varphi(u'\rho u) \Big|_{u_0 = 0} = 2 \sum_{j=1}^{n} \rho_{0j} u_j \, \varphi'(u'\rho u) \Big|_{u_0 = 0}$$

and

$$\sum_{i=1}^{n} a_i \frac{\partial}{\partial u_i} \varphi(u'\rho u) \Big|_{u_0 = 0} = 2 \sum_{i=1}^{n} a_i \sum_{j=1}^{n} \rho_{ij} u_j \, \varphi'(u'\rho u) \Big|_{u_0 = 0}.$$

For these two equations to be equal, we must equate coefficients of $u_j$, and this yields

$$\rho_{0j} = \sum_{i=1}^{n} a_i \rho_{ij}, \quad j = 1, \ldots, n.$$

This is the same equation which the $a_i$'s must satisfy when estimating $x_0$ by a linear combination of the $x_i$'s, i = 1, ..., n; and hence, by the projection theorem, a solution for the $a_i$'s, which exists when $\rho$ is nonsingular, is unique. Therefore

$$E[x_0 \mid x_1, \ldots, x_n] = \sum_{i=1}^{n} a_i x_i$$

and estimation is linear.

2) Suppose every estimation problem in H is linear and that there exists a finite orthonormal basis $y_1, \ldots, y_n$ for H. Then the random variables $y_1$ and $x_1 = \sum_{i=2}^{n} a_i y_i$ are uncorrelated and have equal variance if $\sum_{i=2}^{n} a_i^2 = 1$. By the assumption of linear estimation and the results of Lemma 1, it is clear that

$$\Phi(u_1, u_2 a_2, \ldots, u_2 a_n) = \varphi(u_1^2 + u_2^2 a_2^2 + \cdots + u_2^2 a_n^2)$$

where $\Phi(u_1, \ldots, u_n)$ is the multivariate characteristic function of the basis elements $y_1, \ldots, y_n$. The same result is true for all sets of constants $\{a_i\}$ such that $\sum_{i=2}^{n} a_i^2 = 1$. As before, setting $u_2 a_i = v_i$ (with $v_1 = u_1$) yields the equation

$$\Phi(v_1, \ldots, v_n) = \varphi(v_1^2 + \cdots + v_n^2).$$

The form of this characteristic function is clearly closed under arbitrary linear transformations and the desired result follows.

The class of processes for which all estimation problems on the manifold generated by the process are linear has been characterized. However, there are many examples of nonspherically invariant processes for which particular estimation problems are linear. A good example of this is the discrete "linear process" for which the optimal predictor is linear, as shown in Wolff et al.[6]

IV. THE CHARACTERISTIC FUNCTION

It has been shown that if $x_1, \ldots, x_n$ are spherically invariant and have zero means, then their multivariate characteristic function is of the form

$$\Phi(u_1, \ldots, u_n) = \varphi\Big(\sum_{i=1}^{n} \sum_{j=1}^{n} u_i u_j \sigma_i \sigma_j \rho_{ij}\Big) \tag{4}$$

where $E[x_i x_j] = -\sigma_i \sigma_j \rho_{ij} \varphi''(0)$. It is often of interest to find the corresponding multivariate probability density function. First, consider the case where the $x_i$ are uncorrelated and have equal variance. The required solution is contained in the following well-known theorem of Bochner.[3]

Theorem 3: If $\psi((u_1^2 + \cdots + u_n^2)^{1/2})$ is absolutely integrable, then

$$p(x_1, \ldots, x_n) = (2\pi)^{-n} \int_{E_n} \psi((u_1^2 + \cdots + u_n^2)^{1/2}) \exp\Big(-j \sum_{i=1}^{n} u_i x_i\Big) \, du_1 \cdots du_n$$

is a function of $r = (x_1^2 + \cdots + x_n^2)^{1/2}$ and can be expressed by the single integral

$$p(r) = \big((2\pi)^{n/2} r^{(n-2)/2}\big)^{-1} \int_0^\infty \psi(\lambda) \lambda^{n/2} J_{(n-2)/2}(\lambda r) \, d\lambda \tag{5}$$

where $J_k(t)$ denotes the Bessel function of the kth order and $\lambda = (u_1^2 + \cdots + u_n^2)^{1/2}$.

Note that this theorem involves arguments of the radius, while previous equations used arguments of the radius squared. This was for convenience only, the relationship between $\varphi(u_1^2 + \cdots + u_n^2)$ and $\psi((u_1^2 + \cdots + u_n^2)^{1/2})$ being clear.

Theorem 3 may be used in the following manner to find the transform of (4). Write the transform as

$$p(x_1, \ldots, x_n) = (2\pi)^{-n} \int_{E_n} \psi((u'\rho u)^{1/2}) \exp\Big(-j \sum_{i=1}^{n} u_i x_i\Big) \, du_1 \cdots du_n. \tag{6}$$

Assume $\rho$ is symmetric, nonsingular, and positive definite and let $u = Av$ with A chosen so that $A'\rho A = I$. Then $du = |A| \, dv$, $|A| = |\rho|^{-1/2}$, and $\rho^{-1} = AA'$. Then (6) reads

$$p(x_1, \ldots, x_n) = \big((2\pi)^n |\rho|^{1/2}\big)^{-1} \int_{E_n} \psi((v'v)^{1/2}) \exp(-j v'A'x) \, dv.$$

Letting $\xi = A'x$, we obtain

$$p(x_1, \ldots, x_n) = |\rho|^{-1/2} f(r), \qquad r = (x'\rho^{-1}x)^{1/2}$$

where $f(\cdot)$ and $\psi(\cdot)$ are connected by the equation

$$f(r) = \big((2\pi)^{n/2} r^{(n-2)/2}\big)^{-1} \int_0^\infty \psi(\lambda) \lambda^{n/2} J_{(n-2)/2}(\lambda r) \, d\lambda. \tag{7}$$

It is noted that (5) and (7) are very closely related to the Hankel transform defined by

$$g(y; \nu) = \int_0^\infty f(x) (xy)^{1/2} J_\nu(xy) \, dx$$

where $g(y; \nu)$ is the Hankel transform of $f(x)$ (see Erdelyi[4]). Such a transform has the property that it is self-reciprocal, i.e.,

$$f(x) = \mathcal{H}_\nu\{\mathcal{H}_\nu[f(x)]\}$$

where $\mathcal{H}_\nu$ is the Hankel transform operator. In the more convenient terminology of Hankel transforms then, the characteristic function and probability density are related by the following equations:

$$r^{(n-1)/2} f(r) = (2\pi)^{-n/2} \, \mathcal{H}_{(n-2)/2}\big[\lambda^{(n-1)/2} \psi(\lambda)\big]$$

where

$$p(x_1, \ldots, x_n) = |\rho|^{-1/2} f(r), \qquad r = (x'\rho^{-1}x)^{1/2}$$

and

$$\psi(\lambda) = \frac{(2\pi)^{n/2}}{\lambda^{(n-1)/2}} \, \mathcal{H}_{(n-2)/2}\big[r^{(n-1)/2} f(r)\big], \qquad \lambda = (u'\rho u)^{1/2}.$$

Consider the problem of defining a stochastic process by all orders of distribution functions or characteristic functions. In practice, there must be some simple method for defining these functions. Furthermore, they must satisfy the symmetry and consistency conditions of Kolmogorov.[9] Consider now the specific problem of defining a spherically invariant process by means of spherical density and characteristic functions. It is clear that the form of the nth-order characteristic function must be independent of n, for all n, if the consistency conditions are to be met. The order n will of course determine the order of the quadratic form in the function. Hence if the nth-order characteristic function is given by $\varphi((u'\rho u)^{1/2})$, $\varphi(\cdot)$ will not involve n, and n will enter the expression only in the order of the quadratic form $u'\rho u$. In such a situation, the corresponding nth-order probability density will, in general, be functionally dependent on n. This is quite consistent, since lower order densities are obtained by integration.

V. COMMENTS

It is clear that the notion of spherical invariance is a generalization of that of normality. The two ideas are uniquely defined by the statement: In a linear manifold in which zero correlation implies independence (semi-independence) all random variables are normal (spherically invariant).

It is interesting to examine further the relationship between spherically invariant and normal processes. Both classes have the properties that they are closed under linear operations and that conditional expectations on them are linear. A normal process is completely specified by its mean value and covariance functions; for spherically invariant processes the additional univariate function $\varphi(\cdot)$ is required to specify its characteristic functions of all orders. While spherically invariant processes do not enjoy quite the same degree of mathematical tractability as normal processes, the fact that a simple relationship exists between probability densities and characteristic functions of all orders is significant.

It appears, then, that some of the properties discussed here and commonly ascribed to normal processes are attributable to the quadratic form in the normal characteristic and density functions. The exponential function, however, is vital to other considerations, such as independence and ergodicity, and accounts for many of the more remarkable properties of normal populations. It is easy to show, for example, that two spherically invariant random variables which are independent are also normal. This implies that if a linear manifold is generated by a set of independent random variables and all conditional expectations are linear, then the manifold contains only normal random variables.

The fact that a spherically invariant process which is ergodic is also normal[1] sheds doubt on the physical significance of non-normal spherically invariant processes. The concepts involved in the definition of spherical invariance, however, appear to be of interest in the consideration of linear estimation problems.

APPENDIX

It is required to show that if

$$u_1 \frac{\partial}{\partial u_2} \Phi(u_1, u_2) = u_2 \frac{\partial}{\partial u_1} \Phi(u_1, u_2) \tag{8}$$

on the whole plane, then

$$\Phi(u_1, u_2) = \varphi(u_1^2 + u_2^2). \tag{9}$$

Consider a curve on the $u_1$, $u_2$ plane defined by

$$\Phi(u_1, u_2) = c$$

where c is an arbitrary constant. The total differential is

$$dc = \frac{\partial}{\partial u_1} \Phi(u_1, u_2) \, du_1 + \frac{\partial}{\partial u_2} \Phi(u_1, u_2) \, du_2.$$

To "remain" on the curve, set dc = 0, and using (8),

$$\frac{1}{u_1} \frac{\partial}{\partial u_1} \Phi(u_1, u_2) \, (u_1 \, du_1 + u_2 \, du_2) = 0.$$

Since the partial differential $\partial \Phi / \partial u_1$ will not, in general, be zero, it follows that

$$u_1 \, du_1 + u_2 \, du_2 = 0$$

which implies that $u_1^2 + u_2^2$ = constant and hence $\Phi(u_1, u_2)$ is constant on circles, i.e., (9) is true.

REFERENCES

[1] A. M. Vershik, "Some characteristic properties of Gaussian stochastic processes," Theory of Probability and Its Applications, vol. 9, pp. 353-356, 1964.
[2] A. V. Balakrishnan, "On a characterization of processes for which optimal mean-square systems are of specified form," IEEE Trans. Information Theory, vol. IT-6, pp. 490-500, September 1960.
[3] S. Bochner, Lectures on Fourier Integrals, Annals of Mathematical Studies, Study 42. Princeton, N. J.: Princeton University Press, 1959.
[4] A. Erdelyi et al., Tables of Integral Transforms, Bateman Manuscript Project, vol. 2. New York: McGraw-Hill, 1954.
[5] J. L. Doob, Stochastic Processes. New York: Wiley, 1953.
[6] S. S. Wolff, J. L. Gastwirth, and J. B. Thomas, "Linear optimum predictors," IEEE Trans. Information Theory, vol. IT-13, pp. 30-32, January 1967.
[7] T. Ferguson, "On the existence of linear regression in linear structural relations," California University Publications in Statistics, vol. 2 (1953-19[illegible]), pp. [illegible]-165.
[8] U. Grenander and M. Rosenblatt, Statistical Analysis of Stationary Time Series. New York: Wiley, 1959.
[9] A. N. Kolmogorov, Foundations of the Theory of Probability. New York: [publisher and year illegible].
