
Linear Estimation in Krein Spaces - Part I: Theory

IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 41, NO. 1, JANUARY 1996

Babak Hassibi, Ali H. Sayed, Member, IEEE, and Thomas Kailath, Fellow, IEEE

Abstract — The authors develop a self-contained theory for linear estimation in Krein spaces. The derivation is based on simple concepts such as projections and matrix factorizations and leads to an interesting connection between Krein space projection and the recursive computation of the stationary points of certain second-order (or quadratic) forms. The authors use the innovations process to obtain a general recursive linear estimation algorithm. When specialized to a state-space structure, the algorithm yields a Krein space generalization of the celebrated Kalman filter with applications in several areas such as $H^\infty$ filtering and control, game problems, risk sensitive control, and adaptive filtering.

I. INTRODUCTION

In some recent explorations, we have found that $H^\infty$ estimation and control problems and several related problems (risk-sensitive estimation and control, finite memory adaptive filtering, stochastic interpretation of the KYP lemma, and others) can be studied in a simple and unified way by relating them to Kalman filtering problems, not in the usual (stochastic) Hilbert space, but in a special kind of indefinite metric space known as a Krein space (see, e.g., [9], [10]). Although the two types of spaces share many characteristics, they differ in special ways that turn out to mark the differences between the linear-quadratic-Gaussian (LQG) or $H^2$ theories and the more recent $H^\infty$ theories. The connections with the conventional Kalman filter theory will allow several of the newer numerical algorithms, developed over the last three decades, to be applied to the $H^\infty$ theories [22].

In this paper the authors develop a self-contained theory for linear estimation in Krein spaces. The ensuing theory is richer than that of the conventional Hilbert space case, which is why it yields a unified approach to the above-mentioned problems. Applications will follow in later papers.

The remainder of the paper is organized as follows. We introduce Krein spaces in Section II and define projections in Krein spaces in Section III. Contrary to the Hilbert space case, where projections always exist and are unique, the Krein-space projection exists and is unique if, and only if, a certain Gramian matrix is nonsingular. In Section IV, we first remark that while quadratic forms in Hilbert space always have minima (or maxima), in Krein spaces one can assert only that they will always have stationary points. Further conditions will have to be met for these to be minima or maxima. We explore this by first considering the problem of finding a vector $k$ to stationarize the quadratic form $\langle z - k^*y,\ z - k^*y \rangle$, where $\langle \cdot, \cdot \rangle$ is an indefinite inner product, $*$ denotes conjugate transpose, $y$ is a collection of vectors in a Krein space (which we can regard as generalized random variables), and $z$ is a vector outside the linear space spanned by the $y$. If the Gramian matrix $R_y = \langle y, y \rangle$ is nonsingular, then there is a unique stationary point $k_0^* y$, given by the projection of $z$ onto the linear space spanned by the $y$; the stationary point will be a minimum if, and only if, $R_y$ is strictly positive definite as well. In a Hilbert space, the nonsingularity of $R_y$ and its strict positive definiteness are equivalent properties, but this is not true with $y$ in a Krein space.

Now in the Hilbert space theory it is well known (motivated by a Bayesian approach to the problem) that a certain deterministic quadratic form $J(z, y)$, where now $z$ and $y$ are elements of the usual Euclidean vector space, is also minimized by $k_0^* y$ with exactly the same $k_0$ as before. In the Krein-space case, $k_0^* y$ also yields a stationary point of the corresponding deterministic quadratic form, but now this point will be a minimum if, and only if, a different condition, not $R_y > 0$, but $R_z - R_{zy} R_y^{-1} R_{yz} > 0$, is satisfied. In Hilbert space, unlike Krein space, the two conditions for a minimum hold simultaneously (see Corollary 3 in Section IV). This simple distinction turns out to be crucial in understanding the difference between $H^2$ and $H^\infty$ estimation, as we shall show in detail in Part II of this series of papers.

In this first part, we continue with the general theory by exploring the consequences of assuming that $\{z, y\}$ are based on some underlying state-space model. The major ones are a reduction in computational effort, $O(Nn^3)$ versus $O(N^3)$, where $N$ is the number of observations and $n$ is the number of states, and the possibility of recursive solutions. In fact, it will be seen that the innovations-based derivation of the Hilbert space Kalman filter extends to Krein spaces, except that now the Riccati variable $P_i$ and the innovations Gramian $R_{e,i}$ are not necessarily positive (semi)definite. The Krein space Kalman filter continues to have the interpretation of performing the triangular factorization of the Gramian matrix of the observations, $R_y$; this reduces the test for $R_y > 0$ to recursively checking that the $R_{e,i} > 0$.

Similar results are expected for the corresponding indefinite quadratic form. While global expressions for the stationary point of such quadratic forms and of the minimization

Manuscript received March 4, 1994; revised June 16, 1995. Recommended by Associate Editor at Large, B. Pasik-Duncan. This work was supported in part by the Advanced Research Projects Agency of the Department of Defense, monitored by the Air Force Office of Scientific Research under Contract F49620-93-1-0085, and in part by a grant from NSF under award MIP-9409319. B. Hassibi and T. Kailath are with the Information Systems Laboratory, Stanford University, Stanford, CA 94305 USA. A. H. Sayed is with the Department of Electrical and Computer Engineering, University of California, Santa Barbara, CA 93106 USA. Publisher Item Identifier S 0018-9286(96)00386-8.
condition were readily obtained, as previously mentioned, recursive versions are not easy to obtain. Dynamic programming arguments are the ones usually invoked, and they turn out to be algebraically more complex than the simple innovations (Gram-Schmidt orthogonalization) ideas available in the stochastic (Krein space) case. Briefly, given a possibly indefinite quadratic form, our approach is to associate with it (by inspection) a Krein-space model whose stationary point will have the same gain $k_0$ as for the deterministic problem. The Kalman filter (KF) recursions can now be invoked and give a recursive algorithm for the stationary point of the deterministic quadratic form; moreover, the condition for a minimum can also be expressed in terms of quantities easily related to the basic Riccati equations of the Kalman filter. These results are developed in Sections V and VI, with Theorems 5 and 6 being the major results.

While it is possible to pursue many of the results of this paper in greater depth, the development here is sufficient to solve several problems of interest in estimation theory. In the companion paper [1], we shall apply these results to $H^\infty$ and risk-sensitive estimation and to finite memory adaptive filtering. In a future paper we shall study various dualities and apply them to obtain dual (or so-called complementary) state-space models and to solve the $H^2$, $H^\infty$, and risk-sensitive control problems. We may mention that using these results we have also been able to develop the (possibly) numerically more attractive square root arrays and Chandrasekhar recursions for $H^\infty$ problems [22], to study robust adaptive filtering [23], to obtain a stochastic interpretation of the Kalman-Yacubovich-Popov lemma, and to study convergence issues and obtain steady-state results. The point is that the many years of experience and intuition gained from the LQG or $H^2$ theory can be used as a guide to the corresponding $H^\infty$ results.

A. Notation

A remark on the notation used in the paper. Elements in a Krein space are denoted by boldface letters, and elements in the Euclidean space of complex numbers are denoted by normal letters. Whenever the Krein-space elements and the Euclidean space elements satisfy the same set of constraints, we shall denote them by the same letters, with the former ones being bold and the latter ones being normal. (This convention is similar to the one used in probability theory, where random variables are denoted by boldface letters and their assumed values are denoted by normal letters.)

II. ON KREIN SPACES

We briefly introduce the definitions and basic properties of Krein spaces, focusing on those results that we shall need later. Detailed expositions can be found in the books [9]-[11]. Most readers will be familiar with finite-dimensional (often called Euclidean) and infinite-dimensional Hilbert spaces. Finite-dimensional (often called Minkowski) and infinite-dimensional Krein spaces share many of the properties of Hilbert spaces but differ in some important ways that we shall emphasize in the following.

Definition 1 (Krein Spaces): An abstract vector space $\{\mathcal{K}, \langle \cdot, \cdot \rangle\}$ that satisfies the following requirements is called a Krein space:

i) $\mathcal{K}$ is a linear space over $\mathcal{C}$, the complex numbers.

ii) There exists a bilinear form $\langle \cdot, \cdot \rangle \in \mathcal{C}$ on $\mathcal{K}$ such that

  a) $\langle y, x \rangle = \langle x, y \rangle^*$

  b) $\langle ax + by, z \rangle = a\langle x, z \rangle + b\langle y, z \rangle$

for any $x, y, z \in \mathcal{K}$, $a, b \in \mathcal{C}$, and where $*$ denotes complex conjugation.

iii) The vector space $\mathcal{K}$ admits a direct orthogonal sum decomposition

  $\mathcal{K} = \mathcal{K}_+ \oplus \mathcal{K}_-$

such that $\{\mathcal{K}_+, \langle \cdot, \cdot \rangle\}$ and $\{\mathcal{K}_-, -\langle \cdot, \cdot \rangle\}$ are Hilbert spaces, and $\langle x, y \rangle = 0$ for any $x \in \mathcal{K}_+$ and $y \in \mathcal{K}_-$.

Remarks:

1) Recall that Hilbert spaces satisfy not only i), ii)-a), and ii)-b) above, but also the requirement that $\langle x, x \rangle > 0$ when $x \neq 0$.

2) The fundamental decomposition of $\mathcal{K}$ defines two projection operators $P_+$ and $P_-$ such that $P_+\mathcal{K} = \mathcal{K}_+$ and $P_-\mathcal{K} = \mathcal{K}_-$. Therefore, for every $x \in \mathcal{K}$ we can write

  $x = P_+ x + P_- x = x_+ + x_-, \qquad x_\pm \in \mathcal{K}_\pm.$

Note that for every $x \in \mathcal{K}_+$ we have $\langle x, x \rangle \geq 0$, but the converse is not true: $\langle x, x \rangle \geq 0$ does not necessarily imply that $x \in \mathcal{K}_+$.

3) A vector $x \in \mathcal{K}$ will be said to be positive if $\langle x, x \rangle > 0$, neutral if $\langle x, x \rangle = 0$, or negative if $\langle x, x \rangle < 0$. Correspondingly, a subspace $\mathcal{M} \subset \mathcal{K}$ can be positive, neutral, or negative, if all its elements are so, respectively.

We now focus on linear subspaces of $\mathcal{K}$. We shall define $\mathcal{L}\{y_0, \ldots, y_N\}$ as the linear subspace of $\mathcal{K}$ spanned by the elements $y_0, y_1, \ldots, y_N$ in $\mathcal{K}$. The Gramian of the collection of elements $\{y_0, \ldots, y_N\}$ is defined as the $(N+1) \times (N+1)$ matrix

  $R_y = [\langle y_i, y_j \rangle]_{i,j=0}^{N}.$

The reflexivity property, $\langle y_i, y_j \rangle = \langle y_j, y_i \rangle^*$, shows that the Gramian is a Hermitian matrix.

It is useful to introduce some matrix notation here. We shall write the column vector of the $\{y_i\}$ as

  $y = \mathrm{col}\{y_0, y_1, \ldots, y_N\}$

and denote the above Gramian of the $\{y_i\}$ as

  $R_y = \langle y, y \rangle.$
(A useful mnemonic device for recalling this is to think of the $\{y_0, \ldots, y_N\}$ as "random variables" and their Gramian as the "covariance matrix"

  $R_y = E\, y y^*$

where $E(\cdot)$ denotes "expectation." We use the quotation marks because, in our context, the covariance matrix will generally be indefinite, so we are dealing with some kind of generalized "random variables." We do not pursue this interpretation here, since our aim is only to provide readers with a convenient device for interpreting the shorthand notation.)

Also, if we have two sets of elements $\{z_0, \ldots, z_M\}$ and $\{y_0, \ldots, y_N\}$, we shall write

  $z = \mathrm{col}\{z_0, z_1, \ldots, z_M\}$ and $y = \mathrm{col}\{y_0, y_1, \ldots, y_N\}$

and introduce the $(M+1) \times (N+1)$ cross-Gramian matrix

  $R_{zy} = \langle z, y \rangle.$

Note the property $R_{zy} = R_{yz}^*$.

We now proceed with a simple result.

Lemma 1 (Positive and Negative Linear Subspaces): Suppose $y_0, \ldots, y_N$ are linearly independent elements of $\mathcal{K}$. Then $\mathcal{L}\{y_0, \ldots, y_N\}$ is a positive (negative) subspace of $\mathcal{K}$ if, and only if, $R_y > 0$ ($R_y < 0$).

Proof: Since the $y_i$ are linearly independent, for any $z \neq 0$ in $\mathcal{L}\{y_0, \ldots, y_N\}$ there exists a unique $k \in \mathcal{C}^{N+1}$ such that $z = k^* y$. Now

  $\langle z, z \rangle = k^* \langle y, y \rangle k = k^* R_y k$

so that $\langle z, z \rangle > 0$ for all $z \in \mathcal{L}\{y_0, \ldots, y_N\}$ if, and only if, $R_y > 0$. The proof for $R_y < 0$ is similar. □

Note that any linear subspace whose Gramian has mixed inertia (both positive and negative eigenvalues) will have elements in both the positive and negative subspaces.

A. A Geometric Interpretation

Indefinite metric spaces were perhaps first introduced into the solution of physical problems via the finite-dimensional Minkowski spaces of special relativity [12], and some geometric insight may be gained by considering the special three-dimensional Minkowski space of Fig. 1, defined by the inner product

  $\langle v_1, v_2 \rangle = x_1 x_2 + y_1 y_2 - t_1 t_2$

where $v_1 = (x_1, y_1, t_1)$, $v_2 = (x_2, y_2, t_2)$, and $x_i, y_i, t_i \in \mathcal{C}$, so that $\langle v, v \rangle = x^2 + y^2 - t^2$ when $v = (x, y, t)$.

[Fig. 1. Three-dimensional Minkowski space. The (indefinite) squared norm of each vector equals $x^2 + y^2 - t^2$; the figure shows the neutral cone with the negative subspace inside it.]

In this case, we can take $\mathcal{K}_+$ to be the $x$-$y$ plane and $\mathcal{K}_-$ as the $t$-axis. The neutral subspace is given by the cone $x^2 + y^2 - t^2 = 0$, with points inside the cone belonging to the negative subspace, $x^2 + y^2 - t^2 < 0$, and points outside the cone corresponding to the positive subspace, $x^2 + y^2 - t^2 > 0$. Moreover, any plane passing through the origin but lying outside the neutral cone will have positive definite Gramian, and any line passing through the origin and inside the neutral cone will have negative definite Gramian. Also, any plane passing through the origin that intersects the neutral cone will have Gramian with mixed inertia, and any plane tangent to the cone will have singular Gramian.

Two key differences between Krein spaces and Hilbert spaces are the existence of neutral and isotropic vectors. As mentioned earlier, a neutral vector is a nonzero vector that has zero length; an isotropic vector is a nonzero vector lying in a linear subspace of $\mathcal{K}$ that is orthogonal to every element in that linear subspace. There are obviously no such vectors in Euclidean or Hilbert spaces. In the Minkowski space described above, $[1\ \ 1\ \ \sqrt{2}]$ is a neutral vector, and if one considers the linear subspace $\mathcal{L}\{[1\ \ 1\ \ \sqrt{2}],\ [\sqrt{2}\ \ 0\ \ 1]\}$, then $[1\ \ 1\ \ \sqrt{2}]$ is also an isotropic vector in this linear subspace.
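The Minkowski example is easy to check numerically. The following sketch (ours, not from the paper) evaluates the indefinite inner product with numpy and confirms that $[1\ 1\ \sqrt{2}]$ is neutral, is isotropic in the indicated subspace, and forces the Gramian to be singular:

```python
# Numerical check of the Minkowski example above (our illustration).
# The inner product is <v1, v2> = x1*x2 + y1*y2 - t1*t2.
import numpy as np

J = np.diag([1.0, 1.0, -1.0])            # Minkowski metric of Fig. 1

def inner(v1, v2):
    return v1 @ J @ v2                   # real vectors, so no conjugation

v = np.array([1.0, 1.0, np.sqrt(2.0)])
w = np.array([np.sqrt(2.0), 0.0, 1.0])

print(inner(v, v))   # 0.0 -> v is a neutral vector
print(inner(v, w))   # 0.0 -> v is orthogonal to all of L{v, w}: isotropic

# The Gramian of {v, w} is singular (the isotropic vector forces
# det R_y = 0), so L{v, w} is neither strictly positive nor negative.
Y = np.stack([v, w])
R_y = Y @ J @ Y.T
print(np.linalg.eigvalsh(R_y))           # one zero eigenvalue
```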
III. PROJECTIONS IN KREIN SPACES

An important notion in both Hilbert and Krein spaces is that of the projection onto a subspace.

Definition 2 (Projections): Given the element $z$ in $\mathcal{K}$ and the elements $\{y_0, y_1, \ldots, y_N\}$ also in $\mathcal{K}$, we define $\hat{z}$ to be the projection of $z$ onto $\mathcal{L}\{y_0, y_1, \ldots, y_N\}$ if

  $z = \hat{z} + \tilde{z}$

where $\hat{z} \in \mathcal{L}\{y_0, \ldots, y_N\}$ and $\tilde{z}$ satisfies the orthogonality condition

  $\tilde{z} \perp \mathcal{L}\{y_0, \ldots, y_N\}$   (2)

or equivalently, $\langle \tilde{z}, y_i \rangle = 0$ for $i = 0, 1, \ldots, N$.

In Hilbert space, projections always exist and are unique. In Krein space, however, this is not always the case. Indeed we have the following result, where for simplicity we shall write $\mathcal{L}\{y\}$ for $\mathcal{L}\{y_0, \ldots, y_N\}$.

Lemma 2 (Existence and Uniqueness of Projections): In the Hilbert space setting, projections always exist and are unique. In the Krein-space setting, however:

a) If the Gramian matrix $R_y = \langle y, y \rangle$ is nonsingular, then the projection of $z$ onto $\mathcal{L}\{y\}$ exists, is unique, and is given by

  $\hat{z} = \langle z, y \rangle \langle y, y \rangle^{-1} y = R_{zy} R_y^{-1} y.$   (3)

b) If the Gramian matrix $R_y = \langle y, y \rangle$ is singular, then

  i) If $\mathcal{R}(R_{yz}) \subseteq \mathcal{R}(R_y)$ (where $\mathcal{R}(A)$ denotes the column range space of the matrix $A$), the projection $\hat{z}$ exists but is nonunique. In fact, $\hat{z} = k_0^* y$, where $k_0$ is "any" solution to the linear matrix equation

    $R_y k_0 = R_{yz}.$   (4)

  ii) If $\mathcal{R}(R_{yz}) \not\subseteq \mathcal{R}(R_y)$, the projection $\hat{z}$ does not exist.

Proof: Suppose $\hat{z}$ is a projection of $z$ onto the desired space. By (2), we can write

  $z = k_0^* y + \tilde{z}$

for some $k_0 \in \mathcal{C}^{N+1}$. Since $\langle \tilde{z}, y \rangle = 0$

  $R_{zy} = \langle z, y \rangle = k_0^* \langle y, y \rangle + 0 = k_0^* R_y.$   (5)

If $R_y$ is nonsingular, then the solution for $k_0$ in (5) is unique and the projection is given by (3). If $R_y$ is singular, two things may happen: either $\mathcal{R}(R_{yz}) \subseteq \mathcal{R}(R_y)$, in which case (5) will have a nonunique solution (since any $k_1^*$ in the left null space of $R_y$ can be added to $k_0^*$), or $\mathcal{R}(R_{yz}) \not\subseteq \mathcal{R}(R_y)$, in which case the projection does not exist, since a solution to (5) does not exist.

In Hilbert spaces the projection always exists because it is always true that $\mathcal{R}(R_{yz}) \subseteq \mathcal{R}(R_y)$, or equivalently, that $\mathcal{N}(R_y) \subseteq \mathcal{N}(R_{zy})$, where $\mathcal{N}(A)$ is the right nullspace of the matrix $A$. To show this, suppose that $l \in \mathcal{N}(R_y)$. Then

  $R_y l = 0 \ \Rightarrow\ l^* R_y l = 0 \ \Rightarrow\ l^* \langle y, y \rangle l = \langle l^* y, l^* y \rangle = 0 \ \Rightarrow\ l^* y = 0$

where the last implication follows from the fact that in Hilbert spaces $\langle x, x \rangle = 0 \Rightarrow x = 0$. We now readily conclude that $\langle z, l^* y \rangle = R_{zy} l = 0$, i.e., $l \in \mathcal{N}(R_{zy})$, and hence $\mathcal{N}(R_y) \subseteq \mathcal{N}(R_{zy})$. Therefore a solution to (5) (and hence a projection) always exists in Hilbert spaces.

In Hilbert spaces the projection is also unique, because if $k_1$ and $k_2$ are two different solutions to (5), then $(k_1 - k_2)^* R_y = 0$. But the above argument shows that we must then have $(k_1 - k_2)^* y = 0$. Hence the projection

  $\hat{z} = k_1^* y = k_2^* y$

is unique. □

The proof of the above lemma shows that in Hilbert spaces the singularity of $R_y$ implies that the $\{y_i\}$ are linearly dependent, i.e.

  $\det(R_y) = 0 \ \Leftrightarrow\ k^* y = 0$ for some nonzero vector $k \in \mathcal{C}^{N+1}$.

In the Krein-space setting, all we can deduce from the singularity of $R_y$ is that there exists a linear combination of the $\{y_i\}$ that is orthogonal to every vector in $\mathcal{L}\{y_0, \ldots, y_N\}$, i.e., that $\mathcal{L}\{y_0, \ldots, y_N\}$ contains an isotropic vector. This follows by noting that, for any complex matrix $k_1$ and for any $k$ in the null space of $R_y$, we have

  $k_1^* R_y k = \langle k_1^* y, k^* y \rangle = 0$

which shows that the linear combination $k^* y$ is orthogonal to $k_1^* y$ for every $k_1$, i.e., $k^* y$ is an isotropic vector in $\mathcal{L}\{y\}$.

Standing Assumption: Since existence and uniqueness will be important for all our future results, we shall make the standing assumption that the Gramian $R_y$ is nonsingular.

A. Vector-Valued Projections

Consider the $n$-vector $z = \mathrm{col}\{z_1, \ldots, z_n\}$ composed of elements $z_i \in \mathcal{K}$, and the set $\{y_0, \ldots, y_N\}$, where $y_j \in \mathcal{K}$; project each element $z_i$ onto $\mathcal{L}\{y_0, \ldots, y_N\}$ to obtain $\hat{z}_i$. We define $\hat{z} = \mathrm{col}\{\hat{z}_1, \ldots, \hat{z}_n\}$ as the projection of $z$ onto $\mathcal{L}\{y_0, \ldots, y_N\}$. (Strictly speaking, we should call $\hat{z} \in \mathcal{K}^n$ the projection of $z \in \mathcal{K}^n$ onto $\mathcal{L}^n\{y_0, \ldots, y_N\}$, since it is an element of $\mathcal{L}^n\{y_0, \ldots, y_N\}$ and not $\mathcal{L}\{y_0, \ldots, y_N\}$. For simplicity, however, we shall generally use the looser terminology.)
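Since the projection in Lemma 2 is determined entirely by Gramians, it is simple to compute numerically. The helper below (ours; the function name and singular-case handling are our own) returns the coefficient matrix $k_0^*$, covering both cases of the lemma:

```python
# Sketch of Lemma 2 (our illustration): compute the projection
# coefficients k0* with z_hat = k0* y from the Gramians alone.
import numpy as np

def krein_projection_coeffs(R_zy, R_y):
    """Return k0* such that z_hat = k0* y (Lemma 2)."""
    R_yz = R_zy.conj().T
    if np.linalg.matrix_rank(R_y) == R_y.shape[0]:
        return R_zy @ np.linalg.inv(R_y)          # unique, Lemma 2a
    # Singular R_y: try to solve R_y k0 = R_yz, eq. (4)
    k0 = np.linalg.lstsq(R_y, R_yz, rcond=None)[0]
    if not np.allclose(R_y @ k0, R_yz):
        raise ValueError("R(R_yz) not contained in R(R_y): "
                         "no projection exists (Lemma 2b-ii)")
    return k0.conj().T                            # one of many, Lemma 2b-i
```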
It is easy to see that the results on the existence and uniqueness of projections in Lemma 2 continue to hold in the vector case as well.

In this connection, it will be useful to introduce a slight generalization of the definition of Krein spaces that was given in Section II. There, in Definition 1, we mentioned that $\mathcal{K}$ should be linear over the field of complex numbers, $\mathcal{C}$. It turns out, however, that we can replace $\mathcal{C}$ with any ring $\mathcal{S}$. In other words, the first two axioms for Krein spaces can be replaced by:

i) $\mathcal{K}$ is a linear space over the ring $\mathcal{S}$.

ii) There exists a bilinear form $\langle \cdot, \cdot \rangle \in \mathcal{S}$ on $\mathcal{K}$ such that

  a) $\langle y, x \rangle = \langle x, y \rangle^*$

  b) $\langle ax + by, z \rangle = a\langle x, z \rangle + b\langle y, z \rangle$

for any $x, y, z \in \mathcal{K}$ and $a, b \in \mathcal{S}$, and where the operation $*$ depends on the ring $\mathcal{S}$.

When the inner product $\langle \cdot, \cdot \rangle \in \mathcal{S}$ is positive, $\{\mathcal{K}, \langle \cdot, \cdot \rangle\}$ is referred to as a module. Thus the third axiom for Krein spaces can be replaced by iii):

iii) The vector space $\mathcal{K}$ admits a direct orthogonal sum decomposition

  $\mathcal{K} = \mathcal{K}_+ \oplus \mathcal{K}_-$

such that $\{\mathcal{K}_+, \langle \cdot, \cdot \rangle\}$ and $\{\mathcal{K}_-, -\langle \cdot, \cdot \rangle\}$ are modules, and $\langle x, y \rangle = 0$ for any $x \in \mathcal{K}_+$ and $y \in \mathcal{K}_-$.

The most important case for us is when $\mathcal{S}$ is a ring of complex matrices, and the operation $*$ denotes Hermitian transpose. The point of this generalization is that we can now directly define the projection of a vector $z \in \mathcal{K}^n$ onto $\mathcal{L}^n\{y_0, \ldots, y_N\}$ as an element $\hat{z} \in \mathcal{L}^n\{y_0, \ldots, y_N\}$ such that

  $\hat{z} = k_0^* y$

where $k_0$ is such that

  $0 = \langle z - k_0^* y, y \rangle = R_{zy} - k_0^* R_y.$

Finally, let us remark that to avoid additional notational burden, we shall often refrain from writing $\mathcal{K}^n$ and shall simply use the notation $\mathcal{K}$ for any Krein space. The ring $\mathcal{S}$ over which the Krein space is defined will be obvious from the context.
IV. PROJECTIONS AND QUADRATIC FORMS

In Hilbert space, projections extremize (minimize) certain quadratic forms, as we shall briefly first describe. In Krein spaces, we can in general only assert that projections stationarize such quadratic forms; further conditions need to be met for the stationary points to be extrema (minima). This will be elaborated in Section IV-A, in the context of (what we shall call) a stochastic minimization problem. In Section IV-B, we shall study a closely related quadratic form arising in what we shall call a partially equivalent deterministic minimization problem.

A. Stochastic Minimization Problems in Hilbert and Krein Spaces

Consider a collection of elements $\{y_0, \ldots, y_N\}$ in a Krein space $\mathcal{K}$ with indefinite inner product $\langle \cdot, \cdot \rangle$. Let $z = \mathrm{col}\{z_0, \ldots, z_M\}$ be some column vector of elements in $\mathcal{K}$, and consider an arbitrary linear combination of $\{y_0, \ldots, y_N\}$, say $k^* y$, where $k^* \in \mathcal{C}^{(M+1) \times (N+1)}$ and $y = \mathrm{col}\{y_0, \ldots, y_N\}$. A natural object to study is the error Gramian

  $P(k) = \langle z - k^* y, z - k^* y \rangle.$   (6)

To motivate the subsequent discussion, let us first assume that the $\{y_i\}$ and $\{z_j\}$ belong to a Hilbert space of zero-mean random variables and that their variances and cross-variances are known. In this case the inner product is $\langle z_i, y_j \rangle = E z_i y_j^*$ (where $E(\cdot)$ denotes expectation), and $P(k)$ is simply the mean-square-error (or error variance) matrix in estimating $z$ using $k^* y$, viz.

  $P(k) = E(z - k^* y)(z - k^* y)^* = \|z - k^* y\|_E^2.$

It is well known that the linear least-mean-square estimate, which minimizes $P(k)$, is given by the projection of $z$ on $\mathcal{L}\{y\}$

  $\hat{z} = k_0^* y, \qquad k_0^* = E z y^* [E y y^*]^{-1} = R_{zy} R_y^{-1}.$

The simple proof will be instructive. Thus note that

  $P(k) = \|z - k^* y\|_E^2 = \|z - \hat{z} + \hat{z} - k^* y\|_E^2 = \|z - \hat{z}\|_E^2 + \|\hat{z} - k^* y\|_E^2$

since by the definition of $\hat{z}$ it holds that

  $\langle z - \hat{z}, \hat{z} - k^* y \rangle = 0.$

Clearly, since $\hat{z} = k_0^* y$

  $P(k) \geq P(k_0)$

with equality achieved only when $k = k_0$.

This argument breaks down, however, when the elements are in a Krein space, since then we could have $\|\hat{z} - k^* y\|^2 = \|k_0^* y - k^* y\|^2 = 0$, even if $k_0 \neq k$. All we can assert is that

  $k_0^* y - k^* y =$ an isotropic vector in the linear subspace spanned by $\{y_0, \ldots, y_N\}$.

Moreover, since $\|k_0^* y - k^* y\|^2$ could be negative, it is not true that $P(k)$ will be minimized by choosing $k = k_0$. So a closer study is necessary. We shall start with a definition.

Definition 3 (Stationary Point): The matrix $k_0 \in \mathcal{C}^{(N+1) \times (M+1)}$ is said to be a stationary point of an $(M+1) \times (M+1)$ matrix quadratic form in $k$, say

  $P(k) = A + Bk + k^* B^* + k^* C k$

iff $k_0 a$ is a stationary point of the "scalar" quadratic form $a^* P(k) a$ for all complex column vectors $a \in \mathcal{C}^{M+1}$, i.e., iff

  $\dfrac{\partial\, a^* P(k) a}{\partial k}\Big|_{k = k_0} = 0.$

Now we can prove the following.

Lemma 3 (Condition for Minimum): A stationary point $k_0$ of $P(k)$ is a minimum iff for all $a \in \mathcal{C}^{M+1}$ and all $k$

  $a^* (k - k_0)^* C (k - k_0) a \geq 0, \quad \text{i.e., iff } C \geq 0.$   (7)

Moreover, it is a unique minimum iff

  $C > 0.$   (8)

Proof: Writing the Taylor series expansion of $a^* P(k) a$ around the stationary point $k_0$ yields (since $a^* P(k) a$ is quadratic in $ka$)

  $a^* P(k) a - a^* P(k_0) a = a^* (k - k_0)^* C (k - k_0) a.$

Using the above expression, we see that $k_0$ is a minimum, i.e., $a^* P(k) a - a^* P(k_0) a \geq 0$ for all $k \neq k_0$, iff (7) is satisfied. Moreover, $k_0$ will be a unique minimum, i.e., $a^* P(k) a - a^* P(k_0) a > 0$ for all $k \neq k_0$, iff (8) is satisfied. □
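For concreteness, here is a small numerical rendering (ours) of Definition 3 and Lemma 3: for $P(k) = A + Bk + k^*B^* + k^*Ck$ with $C$ nonsingular, the stationary point solves $C k_0 = -B^*$, and the eigenvalues of $C$ decide whether it is a minimum:

```python
# Our numerical illustration of Definition 3 / Lemma 3.
import numpy as np

def stationary_point(B, C):
    # d/dk* [a* P(k) a] = (B* + C k) a a* = 0 for all a  <=>  C k0 = -B*
    return -np.linalg.solve(C, B.conj().T)

C = np.diag([1.0, -1.0])         # indefinite quadratic term
B = np.array([[1.0, 2.0]])
k0 = stationary_point(B, C)      # stationary point exists (C nonsingular)
print(k0.ravel())                # [-1.  2.]
print(np.linalg.eigvalsh(C))     # mixed signs -> k0 is not a minimum, (7)
```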
B. A Partially Equivalent Deterministic Problem (Setup)

Let us now return to the error Gramian $P(k)$ in (6) and expand it as

  $P(k) = R_z - R_{zy} k - k^* R_{yz} + k^* R_y k$   (9a)

or more compactly

  $P(k) = [I \quad -k^*] \begin{bmatrix} R_z & R_{zy} \\ R_{yz} & R_y \end{bmatrix} \begin{bmatrix} I \\ -k \end{bmatrix}.$   (9b)

Note that the center matrix appearing in (9b) is the Gramian of the vector $\mathrm{col}\{z, y\}$. For this particular quadratic form, we can use the easily verified triangular factorization (recall our standing assumption that $R_y$ is nonsingular)

  $\begin{bmatrix} R_z & R_{zy} \\ R_{yz} & R_y \end{bmatrix} = \begin{bmatrix} I & R_{zy} R_y^{-1} \\ 0 & I \end{bmatrix} \begin{bmatrix} R_z - R_{zy} R_y^{-1} R_{yz} & 0 \\ 0 & R_y \end{bmatrix} \begin{bmatrix} I & 0 \\ R_y^{-1} R_{yz} & I \end{bmatrix}$   (10)

to write

  $a^* P(k) a = [a^* \quad a^* k^* - a^* R_{zy} R_y^{-1}] \begin{bmatrix} R_z - R_{zy} R_y^{-1} R_{yz} & 0 \\ 0 & R_y \end{bmatrix} \begin{bmatrix} a \\ ka - R_y^{-1} R_{yz} a \end{bmatrix}.$   (11)

Calculating the stationary point of $P(k)$ and the corresponding condition for a minimum is now straightforward. Note, moreover, that $R_y$ nonsingular implies that the stationary point is unique.

Theorem 1 (Stationary Point of the Error Gramian): When $R_y$ is nonsingular, $k_0$, the unique coefficient matrix in the projection of $z$ onto $\mathcal{L}\{y\}$

  $\hat{z} = k_0^* y, \qquad k_0 = R_y^{-1} R_{yz}$

yields the unique stationary point of the error Gramian

  $P(k) = \langle z - k^* y, z - k^* y \rangle = [I \quad -k^*] \begin{bmatrix} R_z & R_{zy} \\ R_{yz} & R_y \end{bmatrix} \begin{bmatrix} I \\ -k \end{bmatrix}$   (12)

over all $k \in \mathcal{C}^{(N+1) \times (M+1)}$. Moreover, the value of $P(k)$ at the stationary point is given by

  $P(k_0) = R_z - R_{zy} R_y^{-1} R_{yz}.$

[Fig. 2. The projection $\hat{z} = k_0^* y$ stationarizes the error Gramian $P(k) = \langle z - k^* y, z - k^* y \rangle$ over all $k^* y \in \mathcal{L}\{y\}$.]

Proof: The claims follow easily from (11) by differentiation. □

Further differentiation and use of Lemma 3 yields the following result.

Corollary 1 (Condition for a Minimum): In Theorem 1, $k_0$ is a unique minimum iff

  $R_y > 0$

i.e., $R_y$ is not only nonsingular but also positive definite.
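A short numerical sketch (ours, with made-up Gramians) shows the distinction in Theorem 1 and Corollary 1: with an indefinite $R_y$, the projection coefficient is stationary but not minimizing.

```python
# Sketch (ours) of Theorem 1 / Corollary 1 with an indefinite R_y.
import numpy as np

R_y  = np.diag([1.0, -1.0])          # nonsingular but indefinite
R_z  = np.array([[2.0]])
R_zy = np.array([[1.0, 1.0]])
R_yz = R_zy.conj().T

k0 = np.linalg.solve(R_y, R_yz)                # k0 = inv(R_y) R_yz
P0 = R_z - R_zy @ np.linalg.solve(R_y, R_yz)   # value at stationary point
print(P0)                                      # [[2.]]

def P(k):                                      # error Gramian, eq. (12)
    G = np.block([[R_z, R_zy], [R_yz, R_y]])
    T = np.vstack([np.eye(1), -k])
    return T.conj().T @ G @ T

# Perturbing k0 can make P(k) smaller: stationary, not a minimum.
print(P(k0), P(k0 + np.array([[0.0], [0.1]])))  # [[2.]] vs [[1.99]]
```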
We shall now consider what we call a partially equivalent deterministic problem. We refer to it as deterministic because it involves computing the stationary point of a certain scalar quadratic form over ordinary complex variables (not Krein space ones). Moreover, it is called partially equivalent since its solution, i.e., the stationary point, is given by the same expression as the projection of one suitably defined Krein-space vector onto another, while the condition for a minimum is different than that for the Krein-space projection.

To this end, consider the scalar second-order form

  $J(z, y) = \begin{bmatrix} z \\ y \end{bmatrix}^* \begin{bmatrix} R_z & R_{zy} \\ R_{yz} & R_y \end{bmatrix}^{-1} \begin{bmatrix} z \\ y \end{bmatrix}$   (13)

where the central matrix is the inverse of the Gramian matrix in the stochastic problem of Theorem 1 [see (9b)]. Suppose we seek the stationarizing element $z_0$ for a given $y$. [Of course, now we assume not only that $R_y$ is nonsingular, but so also the block matrix appearing in (13).] Note that $z$ and $y$ are no longer boldface, meaning that they are to be regarded as (ordinary) vectors of complex numbers.

Referring to the discussion at the beginning of Section IV-A on Hilbert spaces, the motivation for this problem is the fact that for jointly Gaussian random vectors $\{z, y\}$, the linear least-mean-squares estimate can be found as the conditional mean of the conditional density $p_{zy}(z, y)/p_y(y)$. When $\{z, y\}$ are zero-mean with covariance matrix

  $\begin{bmatrix} R_z & R_{zy} \\ R_{yz} & R_y \end{bmatrix}$

taking logarithms of the conditional density results in the quadratic form (13), which is the negative of the so-called log-likelihood function. In this case, the relation between (13) and the projection follows from the fact that the linear least-mean-squares estimate is the same as the maximum likelihood estimate [obtained by minimizing (13)]. With this motivation, we now introduce and study the quadratic form $J(z, y)$ without any reference to $\{z, y\}$ being Gaussian.

Theorem 2 (Deterministic Stationary Point): Suppose both $R_y$ and the block matrix in (13) are nonsingular. Then

a) The stationary point $z_0$ of $J(z, y)$ over $z$ is given by

  $z_0 = R_{zy} R_y^{-1} y.$

b) The value of $J(z, y)$ at the stationary point is

  $J(z_0, y) = y^* R_y^{-1} y.$

Corollary 2 (Condition for a Minimum): In Theorem 2, $z_0$ is a minimum iff

  $R_z - R_{zy} R_y^{-1} R_{yz} > 0.$

Proof: We note that [see (10)]

  $\begin{bmatrix} R_z & R_{zy} \\ R_{yz} & R_y \end{bmatrix}^{-1} = \begin{bmatrix} I & 0 \\ -R_y^{-1} R_{yz} & I \end{bmatrix} \begin{bmatrix} (R_z - R_{zy} R_y^{-1} R_{yz})^{-1} & 0 \\ 0 & R_y^{-1} \end{bmatrix} \begin{bmatrix} I & -R_{zy} R_y^{-1} \\ 0 & I \end{bmatrix}$

so that we can write

  $J(z, y) = (z - R_{zy} R_y^{-1} y)^* (R_z - R_{zy} R_y^{-1} R_{yz})^{-1} (z - R_{zy} R_y^{-1} y) + y^* R_y^{-1} y.$

It now follows by differentiation that the stationary point of $J(z, y)$ is equal to $z_0 = R_{zy} R_y^{-1} y$, and that $J(z_0, y) = y^* R_y^{-1} y$. To prove the Corollary, we differentiate once again and use Lemma 3. □

Remark 1: Comparing the results of Theorems 1 and 2 shows that the stationary point $z_0$ of the scalar quadratic form (13) is given by a formula that is exactly the same as that in Theorem 1 for the Krein-space projection of a vector $z$ onto the linear span $\mathcal{L}\{y\}$. In Theorem 2, however, there is no Krein space: $z$ and $y$ are just vectors (in general of different dimensions) in Euclidean space, and $z_0$ is not the projection of $z$ onto the vector $y$. What we have shown in Theorem 2 is that by properly defining the scalar quadratic form as in (13), using coefficient matrices $R_z$, $R_y$, $R_{zy}$, and $R_{yz}$ that are arbitrary but can be regarded as being obtained from Gramians and cross-Gramians of some Krein-space vectors $\{z, y\}$, we can calculate the stationary point using the same recipe as in Theorem 1.

Remark 2: Although the stationary points of the matrix quadratic form $P(k)$ and the scalar quadratic form $J(z, y)$ are found by the same computations, the two forms do not necessarily simultaneously have a minimum, since one requires the condition $R_y > 0$ (Corollary 1), and the other requires the condition $R_z - R_{zy} R_y^{-1} R_{yz} > 0$ (Corollary 2). This is the major difference from the classical Hilbert space context, where we have

  $R_y > 0 \ \Longleftrightarrow\ R_z - R_{zy} R_y^{-1} R_{yz} > 0.$   (14)

When (14) holds, the approaches of Theorems 1 and 2 give equivalent results.

Corollary 3 (Simultaneous Minima): For vectors $z$ and $y$ of linearly independent elements in a Hilbert space $\mathcal{H}$, the conditions $R_z - R_{zy} R_y^{-1} R_{yz} > 0$ and $R_y > 0$ occur simultaneously.

Proof: Immediate from the factorization (10). □

We shall see in more detail in Part II, and to some extent in Section VI-B of this paper, that this difference is what makes $H^\infty$ (and risk-sensitive and finite memory adaptive filtering) results different from $H^2$ results. Briefly, $H^\infty$ problems will lead directly to certain indefinite quadratic forms: to stationarize them we shall find it useful to set up the corresponding Krein-space problem and appeal to Theorem 1. While this will give an algorithm, further work will be necessary to check for the minimum condition of Theorem 2 in the $H^\infty$ problem. It is this difference that leads us to say that the deterministic problem is only partially equivalent to the stochastic problem of Section IV-A. (We may remark that we are making a distinction between equivalence and "duality": one can in fact define duals to both the above problems, but we defer this topic to another occasion.)

Remark 3: Finally, recall that Lemma 2 on the existence and uniqueness of the projection implies that the stochastic problem of Theorem 1 has a unique solution if, and only if, $R_y$ is nonsingular, thus explaining our standing assumption. The following result is the analog for the deterministic problem.

Lemma 4 (Existence of Stationarizing Solutions): The deterministic problem of Theorem 2 has a unique stationarizing solution for all $y$ if, and only if, $R_y$ is nonsingular.

Proof: Let us denote

  $\begin{bmatrix} R_z & R_{zy} \\ R_{yz} & R_y \end{bmatrix}^{-1} = \begin{bmatrix} A & B \\ B^* & C \end{bmatrix}$

so that

  $J(z, y) = z^* A z + z^* B y + y^* B^* z + y^* C y.$

If $J(z, y)$ has a unique stationarizing solution for all $y$, then $A$ must be nonsingular (since by differentiation the stationary point must satisfy the equation $A z_0 = -B y$). But the invertibility of $A$ and of the whole center matrix appearing in $J(z, y)$ imply the invertibility of the Schur complement $C - B^* A^{-1} B$. But it is easy to check that this Schur complement must be the inverse of $R_y$. Thus $R_y$ must be invertible. On the other hand, if $R_y$ is invertible, then the deterministic problem has a unique stationarizing solution, as given by Theorem 2. □
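The deterministic problem can be sketched the same way (our code, reusing the made-up Gramians of the previous sketch): the stationary point uses the identical formula, but the minimum test is now the Schur complement of Corollary 2.

```python
# Sketch (ours) of Theorem 2 / Corollary 2.
import numpy as np

def deterministic_stationary_point(R_z, R_zy, R_y, y):
    z0 = R_zy @ np.linalg.solve(R_y, y)                      # Theorem 2a
    schur = R_z - R_zy @ np.linalg.solve(R_y, R_zy.conj().T)
    is_min = bool(np.all(np.linalg.eigvalsh(schur) > 0))     # Corollary 2
    J0 = y.conj().T @ np.linalg.solve(R_y, y)                # Theorem 2b
    return z0, J0, is_min

R_y, R_z = np.diag([1.0, -1.0]), np.array([[2.0]])
R_zy = np.array([[1.0, 1.0]])
y = np.array([1.0, 2.0])
# Schur complement is 2 > 0: z0 minimizes J(z, y) even though R_y is
# indefinite -- the two minimum conditions really are different.
print(deterministic_stationary_point(R_z, R_zy, R_y, y))
```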
U Remark2: Although the stationary points of the matrix quadratic form P ( k ) and the scalar quadratic form J ( z , y) are found by the same computations, the two forms do C. Altemative Inertia Conditionsfor Minima In many cases it can be complicated to directly check for not necessarily simultaneously have a minimum, since one requires the condition R, > 0 (Corollary l), and the other the positivity condition of the deterministic problem, namely > 0 (Corollary 2). R, - R,YR;lRy, > 0. On the other hand, it is often easier requires the condition R, - R,,R;'R,, zyxw zyxwvutsr % ; I , zyxwvuts zyxwvutsrq zyxwvutsrq :[ zyxwvutsrqponml zyxwvutsrqponmlkjihgf zyxwvutsrqponml HASSIBI et al.: LINEAR ESTIMATION IN KREIN SPACES-PART I zyxwvuts 25 to compute the inertia (the number of positive, negative, and zero eigenvalues) of R, itself. This often suffices [24]. Lemma 5 (Inertia Conditions for Deterministic Minimization): If R, and R, are nonsingular, then the deterministic problem of Theorem 2 will have a minimizing solution (i.e., R, - R,,R;lR,, will be > 0) if, and only if unknown complex vectors. The output y j E C P is assumed known for all j. In many applications one is confronted with the following deterministic minimization problem: Given { yj}j",,, minimize over xo and the quadratic form zyxwvutsrq zyxwvutsrq zy I-[R,] 1I-[%] {U~}Y=~ + I-[(Ry- R y z R ~ ' R z y ) ] (15) where I-[A] denotes the negative inertia (number of negative eigenvalues) of A. When R, > 0 (rather than just being nonsingular) then we will have a minimizing solution iff I-[Ry]= I-[R, - R,,R,lR,,] (16) i.e., if, and only if, R, and R, - R,,R;lR,, have the same inertia. Proof: If R, and R, are both nonsingular, then equating the lower-upper and upper-lower block triangular factorizations of the Gramian matrix in (10) will yield the result that subject to the state-space constraints (17), and where Q, E , S, E C m x p , R, E CPxp, IIo E C n X n are (possibly indefinite) given Hermitian matrices. The above deterministic quadratic form is usually encountered in filtering problems; a special case that we shall see in the companion paper is the 23"-filtering problem where the cmxm [' zyxwvu weighting matrices are IIo, Q, = I , and R, = -;;I]. and where H, is now replaced by col{H,,L,}. Another application arises in adaptive filtering in which case we usually have U , 0 and F, = I 1151, [23]. In the general and case, however, IIo represents the penalty on the initial state, 0 R,R,,R;lR,, O I RY and {Q,, R,, S,} represents the penalty on the driving and are congruent. By Sylvester's Law that congruent matrices measurement disturbances {U,, w,}. (There is also a "dual" have the same inertia 1161, we have quadratic form that arises in control applications which we shall study elsewhere.) I-[R, - R,,R;lR,,] I-[R,] Such deterministic problems can be solved via a variety of = I- [R,] I- [ ( R y - Ry,R;'R,y)]. methods, such as dynamic programming or Lagrange multipliers (see, e.g., [5]), but we shall find it easier to use the Now if (15) holds, then I-[R, - R,,R;'R,,] = 0, so that equivalence discussed in Section IV: construct a (partially) R, - R,,R;lR,, > 0. equivalent Krein space (or stochastic) problem. To do so we Conversely if I-[R, - R,,R;lR,,] = 0, then (15) holds. first need to express the J(XO,U,y) of (18) in the form of (13) When R, > 0, we have I-[R,] = 0, and (16) follows of Section IV-B. immediately. 0 For this, we first introduce some vector notation. 
Note that The general results presented so far can be made even the states {x,} and the outputs {y,} are linear, combinations more explicit when there is more structure in the problems. In of the fundamental quantities ( 2 0 ,{U,, w,},"=,}. We introduce particular, we shall see that when we have state-space structure (the state transition matrix) both R, and R, - R,, R;' R,, are block-diagonal. Moreover, a Krein space-Kalman filter will yield a direct method for computing the inertia of R,. Thus, when we have state-space structure, it will be much easier to use the results of Lemma 5 and define than to directly check for the positivity of R, - R,,R;lR,, ~ 2 1 ~, 4 1 . '1 [". + + zyxwvutsr zyxwvutsrq V. STATE-SPACE STRUCTURE One approach at this point is to begin by assuming that the components {y,} of y arise from an underlying Krein space state-space model. To better motivate the introduction of such state-space models, however, we shall start with the following (indefinite) quadratic minimization problem. Consider a system described by the state-space equations c X ~ + I= F,x, + G,U,, Y, = HJX, + U, as the response at time j to an impulse at time k both 20 = 0 and 01, E 0). Then with <j (assuming zy 0I j5 N (17) where F, E CnXn, G, E C n X m , and H, E C p x n are given matrices and the initial state xo E Cn, the driving disturbance U , E C m , and the measurement disturbance v, E C p , are the state-space equations (17) allow us to write Ho - (; H 1@(I,0) U= zyx zyx zyxwvutsr F! .I. zyxwvutsrqp IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 41, NO. 1, JANUARY 1996 26 H2@(2,0) _I r= and -HN@iN,0) - respectively, and that the {U%, vz} form a white (uncorrelated) sequence. As mentioned before, the Krein-space elements can be thought of as some kind of generalized random variables. Now if, as was done earlier, we define : ;2 Y = COl{YO , . . .Y N 1 = col{uo, . . .UN} U 21 = COl{VO,~~ .UN} then we can use the state-space model (22a) to write I 0 0' O I O U ~ I and to see that Finally we make the change of coordinates zyxwvu zyxwv zyxwvut to obtain J(Z0, U, Y) = -U -r = E ] * { [ O I which is exactly the inverse of the central matrix appearing in expression (21) for J ( z 0 , U , y). Therefore, referring to Theorems 1 and 2, the main point is that to find the stationary point of J ( z o , u , y ) over { z o , ~ } we , can alternatively find the projection of ( 2 0 ,U} onto L{y} in the Krein-space model (22a). Now that we have identified the stochastic and deterministic problems when a state-space structure is assumed, we can give the analogs of Theorems 1 and 2. Lemma 6 (Stochastic Interpretation): Suppose z = col{xo, U} and y are related through the state-space model (22a) and (22b), and that R, given by (27) is nonsingular. Then the stationary point of the error Gramian I 0 0 I I 0 0 0 I O ] 1 0 Q S] 0 r 1 o S*R O ~ over all k*y is given by the projection I This is now of the desired form (13) (with z 2 c o l { z ~ , ~ } ) . 
Therefore, comparing with (12) in Theorem 1, we introduce a where Krein space state-space model zyxwvutsr + xJ+l FJxJ GJuJ, 0 Y3 = H J X 3 vu3 + 5j 5N (224 where the initial state, xo, and the driving and measurement disturbances, {uJ} and {vj}, are such that The condition (22b) is the Krein-space version of the usual assumption made in the stochastic (Hilbert space) state-space models, viz., that the initial condition 20 and the driving and measurement disturbances { uz,TI%} are zero-mean uncorrelated random variables with variance matrices no and Moreover this stationary point is a minimum if, and only if, €2, > 0 . We can now also give the analog result to Theorem 2. Lemma 7 (Deterministic Quadratic Form): The expression yields the stationary point of the quadratic order form ]zyxw zyxwvutsrqpon x L J J J [Q, sJ]-'[ s; RJ YJ - u3 HJXJ (29a) zyx zyxwvutsrqponmlkjih zyxwvutsrqponmlkjihg zyxwvutsrqponm zyxwvutsrqpo zyxwvu zyxwvutsr zyxwvut zyxwvutsr HASSIBI et al.: LINEAR ESTIMATION IN KREIN SPACES-PART I 21 over ZO and U = c o l ( u ~ , . . . , u N } ,and subject to the statespace constraints { ~ j + l= F In particular, when + + wj. j ~ j Gjuj, 0 5 j 5 N y j = HjZj Q-l+ 0 or Sj # 0) at A. The Conditionsfor a Minimum As mentioned earlier, the important point is that the conditions for minima in these two problems are different: R, > 0 in the stochastic problem, and - R,,R;lR,, >0 so that we have the following result. Lemma 8 ( A Conditionfor a Minimum): If Q and R S*Q-'S are invertible, a necessary and sufficient condition for the stationary point of Lemma 7 to be a minimum is that i) POIN> 0. ii) Q-l+(r*+Q-lS)(R-S*Q-lS)-'(I'+S*Q-l) > 0. When S F 0, the second condition becomes Q-'+I'*R-lr > 0. J ( p O I N > f i l N , Y ) = y*R,'y. A (r*+ Q - l S ) ( R - S*Q-lS)-l(r + S*Q-l) > 0 Sj 0, the quadratic form is The value of J ( z 0 ,U , y) (with either Sj the stationary point is M = R, Now we use another well-known fact: the (2, 2) block element of M-' is just A-l (where A-l exists since M is positivedefinite). Therefore the condition now becomes where z = col{xo,u} in the deterministic problem. In the state-space case R, is given by (27). In this section we shall explore the condition for a deterministic minimum under the state-space assumption. First note that for M we have (30) as shown at the bottom of the page. Now we know that M > 0 iff both the (1, 1) block entry in (30) and its Schur complement are positive definite. The (1, 1) block entry may be identified as the Gramian of the error 20 - & I N , i.e., The conditions of Lemma 8 need to be reduced further to provide useful computational tests. This can be done in several ways, leading to more specific tests. One interesting way is by showing that Q - l + (r* Q-'S)(R - S*Q-'S)-'(r S*Q-l) may be regarded as the Gramian matrix of the output of a so-called backward dual state-space model. This identification will be useful in studying the Hm-control problem (and in other ways), but we shall not pursue it here. Instead we shall use the altemative inertia conditions of Lemma 5 to circumvent the need for direct analysis of the matrix R, - R,, R; R,, . Recall from Lemma 5 that if R, > 0, a unique minimizing solution to the deterministic problem of Theorem 2 exists if, and only if, R, and R, - R,,R;lR,, have the same inertia. For the state-space structure that we are considering, however + + A IIo - IIoO*R[~OIIO = (20 - &JN,ZO- 3 2 0 ~ ~=) POIN. 
(31) so that after some simple algebra we have To obtain a nice form for the Schur complement of the (1, 1) block entry, say A, we have to use a little matrix algebra. Recall that + R - S*Q-lS. Using the second expression for R, and a well-known matrix inversion formula leads to the expression Thus R, - R,,R;'R,, is block-diagonal, and we have the following result. Lemma 9 (Inertia Condition for Minimum): If IIo > 0 and Q > 0, then a necessary and sufficient condition for the stationary point of Lemma 7 to be a minimum is that the matrices R, and R - S*Q-lS have the same inertia. In particular, if S 0, then R, and R must have the same inertia. As we shall see in the next section, the Krein space-Kalman filter provides the block triangular factorization of R,, and thereby allows one to easily compare the inertia of R, and x ( R - S*Q-lS)-l[O I? + S*Q-']. (32) R = - S*Q-lS. zyxw zyx zyxwvu zyxwvutsrqp 28 IEEE TRANSACRONS ON AUTOMATIC CONTROL, VOL. 41, NO 1, JANUARY 1996 VI. RECURSIVEFORMULAS factorization of the Gramian R,. To this end, let us write So far we have obtained global expressions for computing Y, = 5, e, projections and for checking the conditions for deterministic = (g,,, eo)R,;eo . + (Y2, e2-1)11,,21_1e2-1 e, and stochastic minimization. Computing the projection requires inverting the Gramian matrix R, and checking for the and collect such expressions in matrix form minimization conditions requires checking the inertia of R,, both of which require O ( N 3 )(where N is the dimension of R,) computations. The key consequence of state-space structure in Hilbert 1= space is that the computational burden of finding projections YN can be significantly reduced, to O ( N n 3 ) (where n is the dimension of the state-space model), by using the Kalman filter recursions. Moreover, the Kalman filter also recursively factors the positive definite Gramian matrix R, as LDL*, L lower triangular with unit diagonal, and D diagonal. We shall presently see that similar recursions hold in Krein where L is lower triangular with unit diagonal. Therefore, space as well, provided since the e, are orthogonal, the Gramian of y is R, is strongly nonsingular (or strongly regular) (34) R, = LReL*, where Re = Re,O@ Re,1@ . . . @ re,^. in the sense that all its (block) leading minors are nonzero. Recall that in Hilbert space if the {y2} are linearly indepen- We thus have the following result. Lemma IO (Inertia of R,): The Gramian R, of y has the dent, then R, is strictly positive definite; so that (34) holds automatically. In the Krein-space theory, we have so far only same inertia as the Gramian of the innovations, Re.The strong assumed that R, is invertible which does not necessarily imply regularity of % implies the nonsingularity of Re,,,0 5 e 5 N . (34). Recursive projection, i.e., projection onto C{y,, . . . , y,} In particular, Iz?J > 0, if and only if for all 2 , however, requires that all the (block) leading submaRe,%> 0, for all i = 0 , 1 , . . . ,N . trices of R, are nonsingular; recall also that (34) implies that R, has a unique triangular decomposition We should also point out that the value at the stationary point + + + ’ ‘ zyxwvuts zyxwvuts zyxwvuts zyxwvuts I”] (35) of the quadratic form in Theorem 2 can also be expressed in terms of the innovations Therefore, In(R,) = In(D), and in particular, I1?J > 0 iff J ( z 0 , y ) = y*RL1y = Y*L-*R,~L-’Y D > 0. This is the standard way of recursively computing the N inertia of R,. 
The standard method of recursive estimation, which also = eXR,le = e,*R,’e,. (39) =O gives a very useful geometric insight into the triangular factorization of R,, is to introduce the innovations A. The Krein Space-Kalman Filter e, = YJ - Y,, 0 Ij 5 N (36) Now we shall show that the state-space structure allows R, = LDL*. , a where y, = = the projection of y, onto C {yo, ... zyxwvu zyxwvu Note that due to the construction (36), the innovations form an orthogonal basis for C{yo, . . . ,yN} (with respect to the Krein-space inner product) which simplifies the calculation of projections. For example, we can express the projection of the fundamental quantities z 0 and uJ onto C{y,, . . . ,yN} as N 20lN = us to efficiently compute the innovations by an immediate extension of the Kalman filter. Theorem 3 (Kalman Filter in Krein Space): Consider the Krein-space state equations C(Q, e2)(e2, e2)-leZ with (37) 2=0 and N GjIN =~(UJ,4(e2,ez)% (38) 2=0 where the state-space structure may be used to calculate the above inner products recursively. Before proceeding to show this, however, let us note that any method for computing the innovations yields the triangular Assume that R, = [(y,, y,)] is strongly regular. Then the innovations can be computed via the formulas e, = y, - H,x,,0 5 a 5 N xz+1 = Fa& Kp,z(yz- &%), Kp,, = (F,P,H,* G,S,)R,t + + 20 = 0 (41) (42) (43) zyxwvutsrqponmlkjih zyxwvutsrqponmlkjihg zyxwvutsrqpon zyx HASSIBI et al.: LINEAR ESTIMATION IN KREIN SPACES-PART I 29 where and The number of computations is dominated by those in (44) and is readily seen to be O(n3)per iteration. Remark: The only difference from the conventional Kalman filter expressions is that the matrices Pa and R e , , (and, by assumption, IIo, Q , and R,) may now be indefinite. Proof: The same as in the usual Kalman filter theory (see, e.g., [13]). For completeness and to show the power of the geometric viewpoint, however, we present a simple derivation. There is absolutely no formal difference between the steps in the (usual) Hilbert space case and in the Krein-space case. Begin by noting that Pi = IIi - ci. zyxwvu zyxwvu The state-space equations (22a) show that the state variance Hi, obeys the recursion IIi+l = FiIIiF: + GiQfGf. Likewise, the orthogonality of the innovations implies that (47) will yield zyxwvutsrqp zyxwvut zyxwvuts zyxwvu + e, = y, - 9, = y, - (H& .;a) = y, - H,X, = Hap, U, + (45) where 5, is the projection of z,on L{y,, . . . ,Y,-~} and where we have defined 3, = 2, - ka.It follows readily that Re,%= ( e , , e , ) = R , + H,PaH,*, n Pa = (3a,3,). (46) Recall (see Lemma 10) that the strong nonsingularity (all leading minors nonzero) of R , implies that the {Re,,} are nonsingular (rather than positive-definite, as in the Hilbert space case). The Kalman filter can now be readily derived by using the orthogonality of the innovations and the state-space structure. Thus we first write 2 5a+11$ Subtracting the above two equations yields the desired Riccati recursion for Pi, Equations (46)-(49) constitute the Kalman filter of Theorem 3. 0 In Kalman filter theory there are many variations of the above formulas and we note one here. Let us define the filtered estimate, f i l i = the projection of zi onto L{yO,.. . ,y i } . Theorem 4 (Measurement and Time Updates): Consider the Krein state-space equations of Theorem 3 and assume that R, is strongly regular. 
In Kalman filter theory there are many variations of the above formulas, and we note one here. Let us define the filtered estimate x̂_{i|i} = the projection of x_i onto L{y_0, ..., y_i}.

Theorem 4 (Measurement and Time Updates): Consider the Krein-space state equations of Theorem 3 and assume that R_y is strongly regular. Then, when S_i ≡ 0, the filtered estimates x̂_{i|i} can be computed via the following (measurement and time update) formulas

    x̂_{i|i} = x̂_i + P_i H_i^* R_{e,i}^{-1} e_i   (measurement update)
    x̂_{i+1} = F_i x̂_{i|i}   (time update)

where e_i, R_{e,i}, and P_i are as in Theorem 3.

Corollary 4 (Filtered Recursions): The two-step recursions of Theorem 4 can be combined into the single recursion

    x̂_{i+1|i+1} = F_i x̂_{i|i} + K_{f,i+1}(y_{i+1} - H_{i+1} F_i x̂_{i|i}),   x̂_{-1|-1} = 0   (52)

where K_{f,i} ≜ P_i H_i^* R_{e,i}^{-1}.

For numerical reasons, certain square-root versions of the Kalman filter are now more often used in state-space estimation. Furthermore, for constant systems, or in fact for systems whose time-variation is structured in a certain way, the Riccati recursions and the square-root recursions, both of which take O(n^3) elementary computations (flops) per iteration, can be replaced by the more efficient Chandrasekhar recursions, which require only O(n^2) flops per iteration [17], [18]. The square-root and Chandrasekhar recursions can both be extended to the Krein-space setting, as described in [22].

Before closing this section, we note how the innovations computed in Theorem 3 can be used to determine the projections x̂_{0|i} and û_{j|i} via the formulas (37) and (38).

Lemma 11 (Computation of Inner Products): We can write

    ⟨x_0, e_j⟩ = Π_0 Φ*_{F-K_pH}(j, 0) H_j^*
    ⟨u_j, e_i⟩ = (Q_j G_j^* - S_j K_{p,j}^*) Φ*_{F-K_pH}(i, j+1) H_i^*,  i > j;   ⟨u_j, e_j⟩ = S_j;   ⟨u_j, e_i⟩ = 0,  i < j   (54)

where

    Φ_{F-K_pH}(i, j) ≜ (F_{i-1} - K_{p,i-1} H_{i-1}) ··· (F_j - K_{p,j} H_j),   Φ_{F-K_pH}(j, j) = I.

These lead to the recursions

    x̂_{0|i} = x̂_{0|i-1} + Π_0 Φ*_{F-K_pH}(i, 0) H_i^* R_{e,i}^{-1} e_i,   x̂_{0|-1} = 0   (55)

and

    û_{j|i} = û_{j|i-1} + (Q_j G_j^* - S_j K_{p,j}^*) Φ*_{F-K_pH}(i, j+1) H_i^* R_{e,i}^{-1} e_i,  i > j,   û_{j|j} = S_j R_{e,j}^{-1} e_j   (56)

where Φ_{F-K_pH}(i, j), i ≥ j, satisfies the recursion

    Φ_{F-K_pH}(i+1, j) = (F_i - K_{p,i} H_i) Φ_{F-K_pH}(i, j),   Φ_{F-K_pH}(j, j) = I.

Proof: Straightforward computation. □
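To illustrate Lemma 11, the recursion (55) for x̂_{0|i} can be run alongside the filter by propagating the closed-loop transition matrix Φ_{F-K_pH}(i, 0); here is a sketch in the same illustrative Python/NumPy style, consuming the lists produced by the filter sketch above (names again ours):

    import numpy as np

    def initial_state_estimate(F, H, Pi0, es, Res, Kps):
        # Recursion (55): xhat_{0|i} = xhat_{0|i-1} + Pi0 Phi*(i,0) H_i^* R_{e,i}^{-1} e_i,
        # with xhat_{0|-1} = 0, Phi(0,0) = I, and Phi(i+1,0) = (F_i - K_{p,i} H_i) Phi(i,0).
        n = Pi0.shape[0]
        Phi = np.eye(n)                          # Phi_{F-KpH}(0, 0) = I
        x0 = np.zeros((n, 1))                    # xhat_{0|-1} = 0
        for i in range(len(es)):
            x0 = x0 + Pi0 @ Phi.T @ H[i].T @ np.linalg.solve(Res[i], es[i])
            Phi = (F[i] - Kps[i] @ H[i]) @ Phi   # advance the transition matrix
        return x0                                # xhat_{0|N}

An entirely analogous loop over i > j, seeded at û_{j|j} = S_j R_{e,j}^{-1} e_j, implements (56) for the û_{j|i}.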
B. Recursive State-Space Estimation and Quadratic Forms

Theorems 5 and 6 below are essentially restatements of Theorems 1 and 2 when a state-space model is assumed and a recursive solution is sought. The error Gramian associated with the problem of projecting {x_0, u} onto L{y} has already been identified in Lemma 6, and (55) and (56) furnish a recursive procedure for calculating this projection. The condition for a minimum is R_y > 0, where R_y has been shown to be congruent to the (block) diagonal matrix R_e. This gives the following theorem.

Theorem 5 (Stochastic Problem): Suppose z = col{x_0, u} and y are related through the state-space model (22a) and (22b), and that R_y is strongly regular. Then the state-space estimation algorithm (55), (56) recursively computes the stationary point of the error Gramian

    ⟨z - k*y, z - k*y⟩

over all k*y, viz., the projections x̂_{0|i} and û_{j|i}. Moreover, this stationary point is a minimum if, and only if, R_{e,j} > 0 for j = 0, ..., i.

Similarly, the scalar quadratic form associated with the (partially) equivalent deterministic problem has already been identified in Lemma 7. This gives the following theorem.

Theorem 6 (Deterministic Problem): If R_y is strongly regular, then the stationary point of the quadratic form

    J_i(x_0, u, y) = x_0^* Π_0^{-1} x_0 + Σ_{j=0}^{i} [u_j; y_j - H_j x_j]^* [Q_j  S_j; S_j^*  R_j]^{-1} [u_j; y_j - H_j x_j]   (59)

over x_0 and the u_j, subject to the state-space constraints x_{j+1} = F_j x_j + G_j u_j, j = 0, 1, ..., i, can be recursively computed via (55) and (56), where the innovations e_j are computed via the recursions

    x̂_{j+1} = F_j x̂_j + K_{p,j} e_j,   x̂_0 = 0,   e_j = y_j - H_j x̂_j

with K_{p,j} = (F_j P_j H_j^* + G_j S_j) R_{e,j}^{-1}, R_{e,j} = R_j + H_j P_j H_j^*, and with P_j satisfying the Riccati recursion (44). In particular, x̂_{0|N} and û_{j|N} are the stationary points of J_N(x_0, u, y) over x_0 and the u_j, subject to the state-space constraints x_{j+1} = F_j x_j + G_j u_j, j = 0, ..., N; in the recursions, for each time i, we find x̂_{0|i} and û_{j|i}, which are the stationary points of J_i. Moreover, the value of J_i(x_0, u, y) at the stationary point is given by

    J_i(x̂_{0|i}, û_{j|i}, y) = Σ_{j=0}^{i} e_j^* R_{e,j}^{-1} e_j.

Proof: The proof follows from the basic equivalence between the deterministic and stochastic problems. The recursions for x̂_{0|i} and û_{j|i} are the same as those in the stochastic problem of Lemma 11, and the innovations e_i are found via the Krein-space Kalman filter of Theorem 3. □

As mentioned earlier, the deterministic quadratic form of Theorem 6 is often encountered in estimation problems. By appeal to Gaussian assumptions on the u_j, v_j, and x_0, and maximum-likelihood arguments, it is well known that state estimates can be obtained via a deterministic quadratic minimization problem. Here we have shown this result using simple projection arguments and have generalized it to indefinite quadratic forms. The result of Theorem 6 is probably the most important result of this paper, and we shall make frequent use of it in the companion paper [1] to solve the problems of H∞ and risk-sensitive estimation and finite-memory adaptive filtering. In those problems we shall also need to check recursively for the condition for a minimum, and therefore we now study these conditions in more detail.

Recall from Lemma 9 that the above deterministic problem has a minimum iff R_y and R - S*Q^{-1}S have the same inertia. Since R_y is congruent to the block-diagonal matrix R_e, and since R - S*Q^{-1}S is also block diagonal, the solution of the recursive stationarization problem will give a minimum at each step if, and only if, all the block-diagonal elements of R_e and R - S*Q^{-1}S have the same inertia. This leads to the following result.

Lemma 12 (Inertia Conditions for a Minimum): If Π_0 > 0, Q > 0, and R is nonsingular, then the (unique) stationary points of the quadratic forms (59), for i = 0, 1, ..., N, will each be a unique minimum iff the matrices R_{e,j} and R_j - S_j^* Q_j^{-1} S_j have the same inertia for all j = 0, 1, ..., N. In particular, when S_j ≡ 0, the condition becomes that R_{e,j} and R_j should have the same inertia for all j = 0, 1, ..., N.

The conditions of the above lemma are easy to check, since the Krein-space Kalman filter used to compute the stationary point also computes the matrices R_{e,j}. There is another condition, more frequently quoted in the H∞ literature, which we restate here (see, e.g., [4]).

Lemma 13 (Condition for a Minimum): If Π_0 > 0, Q > 0, R is invertible, Q - SR^{-1}S* > 0, and [F_j  G_j] has full rank for all j, then the quadratic forms (59) will each have a unique minimum if, and only if,

    P_j^{-1} + H_j^* R_j^{-1} H_j > 0,   j = 0, 1, ..., N.

It also follows, in the minimum case, that P_{j+1} > 0 for j = 0, 1, ..., N.

Remark: In comparison with our result in Lemma 12, we here have the additional requirement that the [F_j  G_j] must be full rank. Furthermore, we not only have to compute the P_j (which is done via the Riccati recursion of the Kalman filter), but we also have to invert P_j (and R_j) at each step and then check for the positivity of P_j^{-1} + H_j^* R_j^{-1} H_j. The test of Lemma 12 uses only quantities already present in the Kalman filter recursions, viz., R_{e,j} and R_j. Moreover, these are p x p matrices (as opposed to P_j^{-1}, which is n x n), with p typically less than n, and their inertia is easily determined via a triangular factorization. Furthermore, it can be shown [22] that even this computation can be effectively blended into the filter recursions by going to a square-root-array version of the Riccati recursion.
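Both minimum tests can be appended to the filter loop at negligible cost. The following sketch (illustrative Python/NumPy again, reusing inertia() from the earlier sketch, and assuming each P_j is invertible so that the Lemma 13 quantity is defined) checks, at every step j, the inertia condition of Lemma 12 and the positivity condition of Lemma 13:

    import numpy as np

    def minimum_tests(Res, Ps, H, Q, S, R, tol=1e-9):
        # Lemma 12: In(R_{e,j}) must equal In(R_j - S_j^* Q_j^{-1} S_j) for every j.
        # Lemma 13 (under its extra hypotheses): P_j^{-1} + H_j^* R_j^{-1} H_j > 0.
        for j in range(len(Res)):
            Dj = R[j] - S[j].T @ np.linalg.solve(Q[j], S[j])
            lemma12_ok = inertia(Res[j], tol) == inertia(Dj, tol)
            Mj = np.linalg.inv(Ps[j]) + H[j].T @ np.linalg.solve(R[j], H[j])
            lemma13_ok = inertia(Mj, tol) == (Mj.shape[0], 0, 0)   # i.e., Mj > 0
            if not (lemma12_ok and lemma13_ok):
                return False, j          # first step at which the minimum property fails
        return True, None

In practice only one of the two tests would be run; the Lemma 12 test is the cheaper, since R_{e,j} and R_j - S_j^* Q_j^{-1} S_j are p x p and already at hand, whereas the Lemma 13 test requires inverting the n x n matrix P_j.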
Here, however, for completeness, we show how Lemma 13 follows from our Lemma 12.

Proof of Lemma 13: We shall prove the lemma by induction. Consider the Hermitian matrix

    [ -Π_0^{-1}     0                H_0^*                    ;
      0             -Q_0^{-1}        Q_0^{-1} S_0             ;
      H_0           S_0^* Q_0^{-1}   R_0 - S_0^* Q_0^{-1} S_0 ].

Two different (block) triangular factorizations, lower-upper and upper-lower, of the above matrix show that the matrices

    [ -Π_0^{-1}  0  0 ; 0  -Q_0^{-1}  0 ; 0  0  R_0 + H_0 Π_0 H_0^* ]

and

    [ -[Π_0^{-1}  0; 0  Q_0^{-1}] - [H_0^*; Q_0^{-1}S_0](R_0 - S_0^* Q_0^{-1} S_0)^{-1}[H_0  S_0^* Q_0^{-1}]   0 ;  0   R_0 - S_0^* Q_0^{-1} S_0 ]

have the same inertia. Thus, since Π_0 > 0, Q_0 > 0, and Q_0 - S_0 R_0^{-1} S_0^* > 0, the matrices R_{e,0} = R_0 + H_0 Π_0 H_0^* and R_0 - S_0^* Q_0^{-1} S_0 will have the same inertia (and, by Lemma 12, we will have a minimum for J_0) iff

    Π_0^{-1} + H_0^* R_0^{-1} H_0 > 0.

Now, with some effort, we may write the first step of the Riccati recursion as

    P_1 = [F_0  G_0] ( [Π_0^{-1}  0; 0  Q_0^{-1}] + [H_0^*; Q_0^{-1}S_0] (R_0 - S_0^* Q_0^{-1} S_0)^{-1} [H_0  S_0^* Q_0^{-1}] )^{-1} [F_0  G_0]^*.

Moreover, the center matrix appearing in the above expression is congruent to

    [ Π_0^{-1} + H_0^* R_0^{-1} H_0      0 ;
      0      (Q_0 - S_0 R_0^{-1} S_0^*)^{-1} ]

and hence, in the minimum case, is positive definite. Thus, if [F_0  G_0] has full rank, we can conclude that P_1 > 0. We can now repeat the argument for the next time instant, with P_1 playing the role of Π_0, and so on. □
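The device of comparing a lower-upper with an upper-lower block triangular factorization, used twice in the above proof, is just the standard inertia additivity of Schur complements. For the record, here is the general identity in a short LaTeX sketch (a well-known fact, not specific to the matrices above):

    \begin{bmatrix} A & B \\ B^* & C \end{bmatrix}
    = \begin{bmatrix} I & 0 \\ B^* A^{-1} & I \end{bmatrix}
      \begin{bmatrix} A & 0 \\ 0 & C - B^* A^{-1} B \end{bmatrix}
      \begin{bmatrix} I & A^{-1} B \\ 0 & I \end{bmatrix}
    = \begin{bmatrix} I & B C^{-1} \\ 0 & I \end{bmatrix}
      \begin{bmatrix} A - B C^{-1} B^* & 0 \\ 0 & C \end{bmatrix}
      \begin{bmatrix} I & 0 \\ C^{-1} B^* & I \end{bmatrix}
    \;\Longrightarrow\;
    \operatorname{In}(A) + \operatorname{In}(C - B^* A^{-1} B)
    = \operatorname{In}(A - B C^{-1} B^*) + \operatorname{In}(C)

valid for Hermitian block matrices with A and C invertible. The proof above applies it with A = diag{-Π_0^{-1}, -Q_0^{-1}}, B = col{H_0^*, Q_0^{-1} S_0}, and C = R_0 - S_0^* Q_0^{-1} S_0, for which C - B^* A^{-1} B collapses to R_{e,0} = R_0 + H_0 Π_0 H_0^*.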
We close this section with yet another condition, which will be useful in control problems.

Lemma 14 (Condition for a Minimum): If, in addition to the conditions of Lemma 13, the matrices F_j - G_j S_j R_j^{-1} H_j are invertible for all j, then the deterministic problems of Theorem 6 will each have a unique minimum iff P_{N+1} > 0 and

    P_{j+1} - G_j (Q_j - S_j R_j^{-1} S_j^*) G_j^* > 0,   j = 0, 1, ..., N.

Proof: Let us first note that the Riccati recursion can be rewritten as

    P_{j+1} = F_j P_j F_j^* + G_j Q_j G_j^* - K_{p,j} R_{e,j} K_{p,j}^*
            = F̃_j (P_j - P_j H_j^* R_{e,j}^{-1} H_j P_j) F̃_j^* + G_j (Q_j - S_j R_j^{-1} S_j^*) G_j^*
            = F̃_j (P_j^{-1} + H_j^* R_j^{-1} H_j)^{-1} F̃_j^* + G_j (Q_j - S_j R_j^{-1} S_j^*) G_j^*

where F̃_j ≜ F_j - G_j S_j R_j^{-1} H_j. The proof, which uses the last of the above equalities, now follows from the sequence of congruences

    (P_j^{-1} + H_j^* R_j^{-1} H_j)^{-1} = F̃_j^{-1} [ P_{j+1} - G_j (Q_j - S_j R_j^{-1} S_j^*) G_j^* ] F̃_j^{-*}

and Lemma 13. □

VII. CONCLUDING REMARKS

We have developed a self-contained theory for linear estimation in Krein spaces. We started with the notion of projections and discussed their relation to the stationary points of certain quadratic forms encountered in a pair of partially equivalent stochastic and deterministic problems. By assuming an additional state-space structure, we then showed that projections can be computed recursively by a Krein-space Kalman filter, several applications of which are described in the companion paper [1]. The approach in all these applications is the following: given an indefinite deterministic quadratic form, to which the H∞, risk-sensitive, and finite-memory problems lead almost by inspection, one relates it to a corresponding Krein-space stochastic problem, for which the Kalman filter can be written down immediately and used to obtain recursive solutions of the original problems.

ACKNOWLEDGMENT

The authors would like to thank P. P. Khargonekar and D. J. N. Limebeer for helpful discussions during the preparation of this manuscript. Seminars by P. Park on the KYP lemma were also helpful in leading the authors to begin this research.

REFERENCES

[1] B. Hassibi, A. H. Sayed, and T. Kailath, "Linear estimation in Krein spaces - Part II: Applications," this issue, pp. 34-49.
[2] P. P. Khargonekar and K. M. Nagpal, "Filtering and smoothing in an H∞ setting," IEEE Trans. Automat. Contr., vol. 36, pp. 151-166, 1991.
[3] M. J. Grimble, "Polynomial matrix solution of the H∞ filtering problem and the relationship to Riccati equation state-space results," IEEE Trans. Signal Processing, vol. 41, no. 1, pp. 67-81, Jan. 1993.
[4] U. Shaked and Y. Theodor, "H∞-optimal estimation: A tutorial," in Proc. IEEE Conf. Decision Contr., Tucson, AZ, Dec. 1992, pp. 2278-2286.
[5] T. Basar and P. Bernhard, H∞-Optimal Control and Related Minimax Design Problems - A Dynamic Game Approach. Boston, MA: Birkhauser, 1991.
[6] D. Limebeer, B. D. O. Anderson, P. P. Khargonekar, and M. Green, "A game theoretic approach to H∞ control for time varying systems," SIAM J. Contr. Optimization, vol. 30, pp. 262-283, 1992.
[7] G. Tadmor, "H∞ in the time domain: The standard problem," in Proc. Amer. Contr. Conf., 1989, pp. 772-773.
[8] P. Whittle, Risk Sensitive Optimal Control. New York: Wiley, 1990.
[9] J. Bognar, Indefinite Inner Product Spaces. New York: Springer-Verlag, 1974.
[10] V. I. Istratescu, Inner Product Structures, Theory and Applications (Mathematics and Its Applications). Dordrecht, Holland: Reidel, 1987.
[11] I. S. Iohvidov, M. G. Krein, and H. Langer, "Introduction to the spectral theory of operators in spaces with an indefinite metric," in Mathematical Research. Berlin, Germany: Akademie-Verlag, 1982.
[12] A. Einstein, Relativity: The Special and General Theory, transl. by R. W. Lawson. New York: Crown, 1931.
[13] T. Kailath, Lectures on Wiener and Kalman Filtering. Berlin, Germany: Springer-Verlag, 1981.
[14] M. Green and D. J. N. Limebeer, Linear Robust Control. Englewood Cliffs, NJ: Prentice-Hall, 1995.
[15] A. H. Sayed and T. Kailath, "A state-space approach to adaptive RLS filtering," IEEE Signal Processing Mag., pp. 18-60, July 1994.
[16] G. H. Golub and C. F. Van Loan, Matrix Computations. Baltimore, MD: Johns Hopkins Univ. Press, 1989.
[17] M. Morf, G. S. Sidhu, and T. Kailath, "Some new algorithms for recursive estimation in constant, linear, discrete-time systems," IEEE Trans. Automat. Contr., vol. AC-19, pp. 315-323, 1974.
[18] A. H. Sayed and T. Kailath, "Extended Chandrasekhar recursions," IEEE Trans. Automat. Contr., vol. 39, pp. 2265-2269, Nov. 1994.
[19] A. E. Bryson and Y. C. Ho, Applied Optimal Control. Blaisdell, 1969.
[20] A. H. Jazwinski, Stochastic Processes and Filtering Theory. New York: Academic, 1970.
[21] B. D. O. Anderson and J. B. Moore, Optimal Filtering. Englewood Cliffs, NJ: Prentice-Hall, 1979.
[22] B. Hassibi, A. H. Sayed, and T. Kailath, "Square-root arrays and Chandrasekhar recursions for H∞ problems," submitted to IEEE Trans. Automat. Contr.; also in Proc. 33rd IEEE Conf. Decision Contr., 1994, pp. 2237-2243.
[23] B. Hassibi, A. H. Sayed, and T. Kailath, "H∞ optimality of the LMS algorithm," IEEE Trans. Signal Processing, to appear; also in Proc. 32nd IEEE Conf. Decision Contr., 1993, pp. 74-80.
[24] B. Hassibi, A. H. Sayed, and T. Kailath, "Fundamental inertia conditions for the solution of H∞ problems," in Proc. Amer. Contr. Conf., June 1995.

Babak Hassibi was born in Tehran, Iran, in 1967. He received the B.S. degree from the University of Tehran in 1989 and the M.S. degree from Stanford University, Stanford, CA, in 1993, both in electrical engineering. He is currently pursuing the Ph.D. degree at Stanford University.
From June 1992 to September 1992 he was a Summer Intern with Ricoh California Research Center, Menlo Park, CA, and from August 1994 to December 1994 he was a short-term Research Fellow at the Indian Institute of Science, Bangalore, India. His research interests include robust estimation and control, adaptive signal processing and neural networks, blind equalization of communication channels, and linear algebra.

Ali H. Sayed (S'90-M'92) was born in São Paulo, Brazil, in 1963. He received the B.S. and M.S. degrees in electrical engineering from the University of São Paulo, in 1987 and 1989, respectively. In 1992 he received the Ph.D. degree in electrical engineering from Stanford University, Stanford, CA.

From September 1992 to August 1993 he was a Research Associate with the Information Systems Laboratory at Stanford University, after which he joined the Department of Electrical and Computer Engineering at the University of California, Santa Barbara, as an Assistant Professor. His research interests include adaptive and statistical signal processing, robust filtering and control, interplays between signal processing and control methodologies, interpolation theory, and structured computations in systems and mathematics.

Dr. Sayed is a recipient of the Institute of Engineering Prize, 1987 (Brazil), the Conde Armando Alvares Penteado Prize, 1987 (Brazil), and a 1994 NSF Research Initiation Award.

Thomas Kailath (S'57-M'62-F'70) received the S.M. degree in 1959 and the Sc.D. degree in 1961 from the Massachusetts Institute of Technology.

From October 1961 to December 1962 he worked at the Jet Propulsion Laboratories, Pasadena, CA, where he also taught part-time at the California Institute of Technology. He then went to Stanford University, where he served as Director of the Information Systems Laboratory from 1971 through 1980 and as Associate Department Chairman from 1981 to 1987, and where he currently holds the Hitachi America Professorship in Engineering. He has held short-term appointments at several institutions around the world. His recent research interests include applications of signal processing, computation, and control to problems in semiconductor manufacturing and wireless communications. He is the author of Linear Systems (Englewood Cliffs, NJ: Prentice-Hall, 1980) and Lectures on Wiener and Kalman Filtering (New York: Springer-Verlag, 1981).

Dr. Kailath is a Fellow of the Institute of Mathematical Statistics and is a member of the National Academy of Engineering and the American Academy of Arts and Sciences. He has held Guggenheim, Churchill, and Royal Society fellowships, among others, and received awards from the IEEE Information Theory Society and the American Control Council, in addition to the Technical Achievement and Society Awards of the IEEE Signal Processing Society. He served as President of the IEEE Information Theory Society in 1975.