
Root Modulus Constraints in Autoregressive Model Estimation


University of Kuopio Department of Applied Physics Report Series ISSN 0788-4672

Root modulus constraints in autoregressive model estimation


M. Juntunen, J. Tervo, J.P. Kaipio

June 3, 1997

Report No. 1/97

This manuscript has been submitted to Circuits, Systems and Signal Processing.

University of Kuopio  Department of Applied Physics


P.O.Box 1627, FIN-70211 Kuopio, Finland

Abstract. The stability of autoregressive (AR) models is an important issue in many applications such as
spectral estimation, simulation and decoding of linear prediction coded (LPC) signals. There are methods
for AR parameter estimation that guarantee the stability of the model, that is, all roots of the characteristic
polynomial of the model have moduli less than unity. However, in some situations, such as the decoding
problem, models that exhibit roots with almost unit modulus are difficult to use. In this paper we
propose a method for the estimation of AR models that guarantees hyperstability, that is, the moduli
of the roots are less than or equal to some arbitrary positive number. The method is based on an iterative
minimization scheme in which the associated nonlinear constraints are linearized sequentially.

1 Introduction
An AR(p) model for a stationary process $x_t$ is of the form
$$x_t = \sum_{k=1}^{p} a_k x_{t-k} + e_t \tag{1}$$

where $a_k$ are the model parameters, $p$ is the model order and $e_t$ are the residuals of the model. If
the process is normally distributed and is or can be approximated well with an AR(p) model, the
residuals can be identified with the innovations of the process. Further, in this case the innovations
equal the prediction error process of the minimum mean square error predictor. There are several
methods for the estimation of the model order $p$ and the parameters $a_k$, see for example [14, 4, 6].
We do not discuss model order selection here.
The AR(p) model is stable if the roots $\lambda_k$, $k = 1, \ldots, p$, of the associated characteristic
polynomial
$$p(z; a) = 1 - \sum_{k=1}^{p} a_k z^{-k}, \quad z \in \mathbb{C} \tag{2}$$

have modulus less than unity. In some applications, such as when the model is used for prediction,
instability or near-instability of the model does not constitute any problems. On the other hand,
stability is often an important issue in such applications as spectral estimation, simulation of time
series and decoding of linear prediction coded (LPC) samples. In LPC an AR(p) model is formed
for a sample that is then described by the model parameters and the (possibly quantized) prediction
errors. An approximation for the original process is then obtained recursively using Eq. (1). If
the process is of narrow band type, that is, at least one of the roots has near unit modulus, the
roots that correspond to the estimated parameters may be even nearer to the unit circle. In such
cases the reconstruction of the process can turn out to be an unstable problem even when the
AR parameter estimation scheme guarantees stability. Examples of such methods that guarantee
stability are lattice methods and the Yule-Walker method [14].
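
The stability condition on the roots of Eq. (2) is easy to check numerically. The following is a minimal
Python sketch, assuming NumPy; the helper names `ar_roots` and `max_root_modulus` and the AR(2) test
coefficients are illustrative and do not come from the paper. Multiplying Eq. (2) by $z^p$ gives the
ordinary polynomial $z^p - a_1 z^{p-1} - \cdots - a_p$, whose roots are computed below.

```python
import numpy as np

def ar_roots(a):
    """Roots of z^p - a_1 z^(p-1) - ... - a_p, i.e. z^p times Eq. (2)."""
    a = np.asarray(a, dtype=float)
    return np.roots(np.concatenate(([1.0], -a)))

def max_root_modulus(a):
    """Largest root modulus; the model is stable if this is below 1."""
    return float(np.max(np.abs(ar_roots(a))))

# Illustrative AR(2) model with a conjugate root pair at 0.95*exp(+-0.2*pi*i):
# z^2 - 2 r cos(theta) z + r^2 has these roots, so a_1 = 2 r cos(theta), a_2 = -r^2.
r, theta = 0.95, 0.2 * np.pi
a_demo = np.array([2.0 * r * np.cos(theta), -r**2])

print("largest root modulus:", max_root_modulus(a_demo))
print("stable:", max_root_modulus(a_demo) < 1.0)
```

A model such as `a_demo` is stable in the classical sense but has roots close to the unit circle, which is
the narrow band situation discussed above.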
In the analysis of heart rate variability the standard approach is to model the heart rate signal
with an AR model. The spectrum of this signal consists of some well defined peaks that represent
different control mechanisms. The power ratios of these peaks are calculated directly from the
model as in [18] and subsequently used as clinical statistics [2]. In the analysis of EEG signals
there are similar applications [11]. Also in these applications the narrow-bandedness of the process
can cause meaningless results. These difficulties are related to the lack of Parseval's identity for
AR models and to the rise times of very narrow band processes [14].

* M. Juntunen and J.P. Kaipio are with Dept. of Applied Physics, University of Kuopio, and J. Tervo is with
Dept. of Computer Science and Applied Mathematics, University of Kuopio.

A further class of problems is related to time-varying AR modelling in which the time-varying
parameters are linear combinations of some predetermined basis functions. In this case the
estimated models very easily become temporarily unstable, thus effectively preventing the end use
of the models [8, 12]. The stabilization of time-varying AR models is not discussed directly in
this paper but we note that the proposed method seems to be the most feasible approach to the
stabilization of the time-varying case.
In all of the above applications a method that keeps the roots adequately far from the unit circle
would be preferable. In some cases it is perfectly adequate to adjust the offending roots directly
after the estimation, but this does not allow the other roots to adjust themselves.
We define an AR model to be hyperstable if all roots of the model's characteristic polynomial
lie within a circle of radius $\rho \le 1$ in the complex plane. In this paper we give a method for
the estimation of hyperstable AR models. Although the proposed method is applicable to other
transversal AR parameter estimation schemes, we discuss here only the nonwindowed least squares
(LS) method. The constrained least squares AR parameter estimation problem is of the form
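$$\min_{a} \|Ha - X\|^2, \quad |\lambda_k(a)| \le \rho \quad \text{for all } k = 1, \ldots, p, \tag{3}$$
where $\rho < 1$, $\lambda_k(a)$ are the roots of the polynomial $p(z; a)$ and
$$X = \begin{pmatrix} x_{p+1} \\ \vdots \\ x_T \end{pmatrix}, \quad
H = \begin{pmatrix} x_p & \cdots & x_1 \\ \vdots & \ddots & \vdots \\ x_{T-1} & \cdots & x_{T-p} \end{pmatrix}, \quad
a = \begin{pmatrix} a_1 \\ \vdots \\ a_p \end{pmatrix}. \tag{4}$$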
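
The matrices of Eq. (4) and the unconstrained LS estimate can be formed as in the following Python sketch,
which assumes NumPy. The function names `ar_ls_matrices` and `ar_ls_estimate`, the simulated AR(2) data and
the random seed are hypothetical choices made for illustration only; the sample $x_1, \ldots, x_T$ is stored
in a 0-based array.

```python
import numpy as np

def ar_ls_matrices(x, p):
    """Return H and X of Eq. (4) for a sample x_1, ..., x_T (stored 0-based)."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    X = x[p:]                                   # x_{p+1}, ..., x_T
    # Column k holds x_{t-k} for t = p+1, ..., T.
    H = np.column_stack([x[p - k:T - k] for k in range(1, p + 1)])
    return H, X

def ar_ls_estimate(x, p):
    """Unconstrained nonwindowed LS estimate, the minimizer of ||H a - X||^2."""
    H, X = ar_ls_matrices(x, p)
    a_hat, *_ = np.linalg.lstsq(H, X, rcond=None)
    return a_hat

# Illustrative data: an AR(2) process driven by white Gaussian noise.
rng = np.random.default_rng(0)
a_true = np.array([1.2, -0.6])
x = np.zeros(256)
for t in range(2, len(x)):
    x[t] = a_true[0] * x[t - 1] + a_true[1] * x[t - 2] + rng.standard_normal()

a_hat = ar_ls_estimate(x, 2)
print("LS estimate:", a_hat)
```

This estimate ignores the root modulus constraints of (3); it only provides the unconstrained solution
that the constrained problem starts from.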
The constraints are nonlinear inequality constraints for which there is no explicit form. This
constitutes the main difficulty of the problem. We solve the problem iteratively using a Gauss-Newton
type algorithm that sequentially linearizes the constraints. The rest of the paper is organized as
follows. In Section 2.1 we derive the linearization of the constraints. In Section 2.2 we transform
the linearized problem to a least distance programming (LDP) form that can be solved with the
non-negative least squares (NNLS) algorithm. This algorithm is included in most data analysis
software packages such as Matlab. In Section 3 we study an example that illustrates the
performance of the algorithm.
2 Hyperstability constrained AR estimation
2.1 Linearization of the constraints
The inequality constraints of the problem (3) can be written in vector form as
$$\left( |\lambda_{11}(a) + i\lambda_{12}(a)|, \ldots, |\lambda_{p1}(a) + i\lambda_{p2}(a)| \right)^T \le (\rho, \ldots, \rho)^T \tag{5}$$
where $\lambda_k(a) = \lambda_{k1}(a) + i\lambda_{k2}(a)$. According to Galois theory [16, p. 186] there are no explicit
formulas for the roots of a polynomial when the order of the polynomial is $p \ge 5$.
Define the constraint function
$$F(a) = \left( |\lambda_1(a)|^2 - \rho^2, \ldots, |\lambda_p(a)|^2 - \rho^2 \right)^T. \tag{6}$$
The constraints are now written as $F(a) \le 0$. The first order Taylor expansion of $F(a)$ is of the
form
$$F(a) \approx F(a_0) + J_F(a_0)(a - a_0), \tag{7}$$
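where $J_F(a_0)$ is the Jacobian of the mapping $F(a)$ at $a = a_0$. We discuss the differentiability
of $F(a)$ later. To obtain the Jacobian $J_F$ we express $F(a)$ as a composite function. Define the
mapping $\Lambda : \mathbb{R}^p \to \mathbb{R}^{2p}$
$$\Lambda(a) = (\lambda_{11}(a), \lambda_{12}(a), \lambda_{21}(a), \lambda_{22}(a), \ldots, \lambda_{p1}(a), \lambda_{p2}(a))^T. \tag{8}$$
Define further the differentiable mapping $G : \mathbb{R}^{2p} \to \mathbb{R}^p$
$$G(\Lambda) = (g(\lambda_{11}, \lambda_{12}), g(\lambda_{21}, \lambda_{22}), \ldots, g(\lambda_{p1}, \lambda_{p2}))^T \tag{9}$$
$$g(\lambda_1, \lambda_2) = \lambda_1^2 + \lambda_2^2 - \rho^2. \tag{10}$$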
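The constraint function $F(a)$ is thus the composition of the two functions
$$F(a) = (F_1(a), F_2(a), \ldots, F_p(a))^T = (G \circ \Lambda)(a) = G(\Lambda(a)). \tag{11}$$
The Jacobian of the composition $F(a)$ is obtained by the chain rule [1, p. 353]
$$J_F(a) = J_G(\Lambda(a)) \, J_\Lambda(a). \tag{12}$$
The Jacobian of $G(\Lambda)$ is obtained by direct calculation
$$J_G(\Lambda) = 2 \begin{pmatrix}
\lambda_{11} & \lambda_{12} & 0 & 0 & \cdots & 0 & 0 \\
0 & 0 & \lambda_{21} & \lambda_{22} & \cdots & 0 & 0 \\
\vdots & \vdots & & & \ddots & \vdots & \vdots \\
0 & 0 & 0 & 0 & \cdots & \lambda_{p1} & \lambda_{p2}
\end{pmatrix} \in \mathbb{R}^{p \times 2p}. \tag{13}$$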
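The Jacobian $J_\Lambda(a)$ can be written as
$$J_\Lambda(a) = \begin{pmatrix}
\dfrac{\partial \lambda_{11}}{\partial a_1} & \cdots & \dfrac{\partial \lambda_{11}}{\partial a_p} \\
\dfrac{\partial \lambda_{12}}{\partial a_1} & \cdots & \dfrac{\partial \lambda_{12}}{\partial a_p} \\
\vdots & \ddots & \vdots \\
\dfrac{\partial \lambda_{p2}}{\partial a_1} & \cdots & \dfrac{\partial \lambda_{p2}}{\partial a_p}
\end{pmatrix} \in \mathbb{R}^{2p \times p}. \tag{14}$$
Thus the components of the Jacobian $J_F(a)$ are of the form
$$(J_F(a_0))_{k,j} = 2\lambda_{k1}(a_0) \frac{\partial \lambda_{k1}(a_0)}{\partial a_j}
+ 2\lambda_{k2}(a_0) \frac{\partial \lambda_{k2}(a_0)}{\partial a_j}. \tag{15}$$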

In the following we compute $(J_F(a_0))_{k,j}$ with the aid of matrix perturbation theory. Denote
$\Delta a = a - a_0$. It is well known that the roots of a polynomial coincide with the eigenvalues of the
associated companion matrix [10, p. 147]
$$A = \begin{pmatrix}
a_1 & \cdots & \cdots & a_p \\
1 & 0 & \cdots & 0 \\
\vdots & \ddots & \ddots & \vdots \\
0 & \cdots & 1 & 0
\end{pmatrix}. \tag{16}$$
We assume that all roots are simple, which is in practice always the case when the probability
distribution of $x_t$ is continuous. Then the eigenvalues of $A$ are analytic [9, p. 71] with respect to
$a_k$ and the Jacobian $J_\Lambda(a)$ exists. Define
$$\lambda_k(\epsilon) = \lambda_{k1}(a_0 + \epsilon \Delta a) + i\lambda_{k2}(a_0 + \epsilon \Delta a). \tag{17}$$
The matrix perturbation $\Delta A$ that corresponds to the change $\Delta a$ is
$$\Delta A = \begin{pmatrix} \Delta a^T \\ 0 \end{pmatrix} \in \mathbb{R}^{p \times p}. \tag{18}$$
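Then by [7, p. 334] and [17, p. 69] we have
$$\dot{\lambda}_k(0) = \frac{w_k^H \Delta A \, v_k}{w_k^H v_k}
= \left( w_k^H v_k \right)^{-1} w_{k,1}^H v_k^T \Delta a
= \xi_k \Delta a = \langle \xi_k, \Delta a \rangle \quad \forall \, \Delta a \tag{19}$$
where $(\cdot)^H$ denotes complex conjugate transpose, $\langle \cdot, \cdot \rangle$ denotes inner product, $w_k$ and $v_k$ are the
$k$th left and right eigenvectors of the companion matrix, respectively, $w_{k,1}$ is the first component
of $w_k$ and
$$\xi_k = \left( w_k^H v_k \right)^{-1} w_{k,1}^H v_k^T. \tag{20}$$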
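On the other hand the chain rule implies
$$\dot{\lambda}_k(0) = \sum_{j=1}^{p} \frac{\partial \lambda_{k1}(a_0)}{\partial a_j} \Delta a_j
+ i \sum_{j=1}^{p} \frac{\partial \lambda_{k2}(a_0)}{\partial a_j} \Delta a_j
= \langle \nabla \lambda_{k1}(a_0) + i \nabla \lambda_{k2}(a_0), \Delta a \rangle. \tag{21}$$
By (19) and (21) we obtain
$$\xi_k = \nabla \lambda_{k1}(a_0) + i \nabla \lambda_{k2}(a_0) \tag{22}$$
$$\nabla \lambda_{k1}(a_0) = \xi_k^r \tag{23}$$
$$\nabla \lambda_{k2}(a_0) = \xi_k^i \tag{24}$$
where the superscripts $r$ and $i$ denote the real and imaginary parts, respectively.
Hence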
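$$(J_F(a_0))_{k,j} = 2\lambda_{k1}(a_0) \frac{\partial \lambda_{k1}(a_0)}{\partial a_j}
+ 2\lambda_{k2}(a_0) \frac{\partial \lambda_{k2}(a_0)}{\partial a_j} \tag{25}$$
$$= 2 \left( \lambda_k^r \xi_{k,j}^r + \lambda_k^i \xi_{k,j}^i \right), \tag{26}$$
where $\xi_{k,j}^r$ and $\xi_{k,j}^i$ are the $j$th components of $\xi_k^r$ and $\xi_k^i$, respectively.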
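
The eigenvector formula for the Jacobian can be sketched numerically as follows. This is an illustrative
Python sketch assuming NumPy and SciPy (`scipy.linalg.eig` with `left=True` returns left eigenvectors $w_k$
that satisfy $w_k^H A = \lambda_k w_k^H$); the function names `companion` and `constraint_and_jacobian` are
hypothetical, and the test coefficients are built from the nominal roots of the example in Section 3 under
the sign convention of Eq. (2).

```python
import numpy as np
from scipy.linalg import eig

def companion(a):
    """Companion matrix of Eq. (16): first row a_1..a_p, ones on the subdiagonal."""
    a = np.asarray(a, dtype=float)
    p = len(a)
    A = np.zeros((p, p))
    A[0, :] = a
    A[1:, :-1] = np.eye(p - 1)
    return A

def constraint_and_jacobian(a, rho):
    """Return F(a) of Eq. (6) and J_F(a) of Eq. (26), assuming simple roots."""
    lam, W, V = eig(companion(a), left=True, right=True)
    denom = np.sum(W.conj() * V, axis=0)            # w_k^H v_k for each k
    # xi[k, j] = conj(w_{k,1}) * v_{k,j} / (w_k^H v_k), cf. Eq. (20)
    xi = (W[0, :].conj()[:, None] * V.T) / denom[:, None]
    F = np.abs(lam) ** 2 - rho ** 2
    JF = 2.0 * (lam.real[:, None] * xi.real + lam.imag[:, None] * xi.imag)
    return F, JF

# Coefficients whose roots are 0.92*exp(+-0.2*pi*i) and 0.60*exp(+-0.32*pi*i).
half = [0.92 * np.exp(0.2j * np.pi), 0.60 * np.exp(0.32j * np.pi)]
roots = np.array(half + [np.conj(r) for r in half])
a0 = -np.real(np.poly(roots))[1:]

F0, JF0 = constraint_and_jacobian(a0, rho=0.9)
print("F(a0) =", F0)
print("J_F(a0) =\n", JF0)
```

The rows of `JF0` that belong to a complex conjugate root pair coincide, which is the rank deficiency
discussed in Section 2.2. Note that the ordering of the eigenvalues returned by the solver is arbitrary.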
We have thus obtained an LS problem with linear inequality constraints
$$\min_{\Delta a} \|H \Delta a - (X - H a_0)\|^2, \quad F(a_0) + J_F(a_0) \Delta a \le 0, \tag{27}$$
where the components of $J_F(a_0)$ are obtained from (26).
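
As a sanity check of the linearized constraint, consider the first order case $p = 1$. This worked example
is an illustration under that assumption and is not part of the original derivation. The companion matrix
reduces to a scalar, so the eigenvectors are trivial:

```latex
\begin{align*}
A &= (a_1), \qquad \lambda_1 = a_1, \qquad v_1 = w_1 = 1, \\
F(a) &= a_1^2 - \rho^2, \qquad J_F(a_0) = 2 a_{0,1}, \\
F(a_0) + J_F(a_0)\,\Delta a &= a_{0,1}^2 - \rho^2 + 2 a_{0,1}\,\Delta a_1 \le 0 .
\end{align*}
```

At a boundary point $a_{0,1} = \rho$ the linearized constraint reads $\Delta a_1 \le 0$, so the step may only
move the single root back towards the interior of the circle, as expected.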
2.2 The LDP form
The problem (27) can be solved for example with so-called projection methods [13]. However,
the row rank of $J_F(a)$ depends on the number of complex conjugate root pairs. For example, if
all coefficients are real and all roots occur as conjugate pairs, the row rank of $J_F(a)$ is $p/2$. We
next consider a systematic approach to solving the linearized-constraint problem with the
non-negative least squares (NNLS) algorithm.
Consider the linear transform (change of variables) $\Delta a \to h$ such that
$$-h = F(a_0) + J_F(a_0) \Delta a. \tag{28}$$
In the case in which $J_F(a_0)$ is invertible the problem (27) can be transformed into the NNLS form,
that is, a quadratic functional with homogeneous non-negativity constraints $h \ge 0$. However, if
$J_F(a_0)$ is not invertible, we cannot perform the transform (28) in order to write the problem in
terms of $h$.
It can be shown that in the case of a complex root pair, not only are the corresponding rows
of $J_F(a_0)$ equal, but also the constraints coincide. Thus $J_F(a_0)$ is not invertible and a more
complicated reduction process must be performed. We apply the reduction process given in [13,
pp. 165-166] that converts (27) into an LDP problem by the orthogonal transformation
$$H = Q \begin{pmatrix} R & 0 \\ 0 & 0 \end{pmatrix} K^T \tag{29}$$
where $Q$ and $K$ are orthogonal and $R$ is a diagonal matrix. By introducing the orthogonal change
of variables
$$\Delta a = K y \tag{30}$$
and denoting the first $p$ columns of $Q$ by $Q_1$ we obtain a standard LDP problem
$$\min \|z\|, \quad G z \ge h_1 \tag{31}$$
where
$$z = R y - Q_1^T (X - H a_0) \tag{32}$$
$$G = -J_F(a_0) K R^{-1} \tag{33}$$
$$h_1 = F(a_0) - G Q_1^T (X - H a_0). \tag{34}$$
The solution of the LDP problem is obtained with the method presented in [13] and the solution
of the problem (27) can be computed from (32) and (30).
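The solution to the LDP problem is obtained by writing
$$z_j = -r_j / r_{p+1}, \quad j = 1, \ldots, p, \tag{35}$$
where $r = E u - f$ is the residual of the NNLS problem $\min_{u \ge 0} \|E u - f\|$ with
$$E = \begin{pmatrix} G^T \\ h_1^T \end{pmatrix}, \quad
f = \begin{pmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix}. \tag{36}$$
For example, in Matlab we can now write u=NNLS(E, f) and form the residual r from u. A more
detailed discussion of the solution of the LDP problem with the NNLS algorithm is given in [13].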
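
The reduction of the linearized problem to LDP form and its solution with NNLS can be sketched in Python
as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes
NumPy and `scipy.optimize.nnls`, takes the orthogonal decomposition (29) to be the singular value
decomposition so that $R$ is diagonal, and assumes that $H$ has full column rank. The function names `ldp`
and `solve_linearized` and the small test problem are hypothetical.

```python
import numpy as np
from scipy.optimize import nnls

def ldp(G, h):
    """Least distance programming: minimize ||z|| subject to G z >= h."""
    G = np.atleast_2d(np.asarray(G, dtype=float))
    h = np.asarray(h, dtype=float).ravel()
    n = G.shape[1]
    E = np.vstack([G.T, h[None, :]])        # Eq. (36), shape (n + 1, m)
    f = np.zeros(n + 1)
    f[-1] = 1.0
    u, _ = nnls(E, f)                       # u >= 0 minimizing ||E u - f||
    r = E @ u - f                           # NNLS residual
    if np.linalg.norm(r) < 1e-12:
        raise ValueError("the constraints G z >= h are incompatible")
    return -r[:n] / r[n]                    # Eq. (35)

def solve_linearized(H, b, F0, JF):
    """Minimize ||H da - b|| subject to F0 + JF da <= 0, with b = X - H a0."""
    Q1, s, Kt = np.linalg.svd(np.asarray(H, dtype=float), full_matrices=False)
    K = Kt.T
    c = Q1.T @ np.asarray(b, dtype=float)   # Q_1^T (X - H a0)
    G = -(np.atleast_2d(JF) @ K) / s        # -JF K R^{-1} with R = diag(s)
    h1 = np.asarray(F0, dtype=float).ravel() - G @ c
    z = ldp(G, h1)
    y = (z + c) / s                         # invert z = R y - c
    return K @ y                            # da = K y, Eq. (30)

# Small illustrative problem: minimize (2 d1 - 2)^2 + (d2 - 1)^2 subject to d1 <= 0.5.
H_demo = np.array([[2.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
b_demo = np.array([2.0, 1.0, 5.0])
da_demo = solve_linearized(H_demo, b_demo, F0=[-0.5], JF=[[1.0, 0.0]])
print("constrained step:", da_demo)
```

Because the LDP step works with $G$ and $h_1$ directly, duplicated constraint rows from complex conjugate
root pairs need no special treatment in this sketch.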
2.3 Iteration
The solution to the original problem is sought iteratively by
$$a^{(\ell)} = a^{(\ell-1)} + \mu \, \Delta a^{(\ell)} \tag{37}$$
where $\ell$ denotes the iteration number, $\mu$ is a step size and in (27) we replace $a$ with $a^{(\ell)}$, $a_0$ with
$a^{(\ell-1)}$ and $\Delta a$ with $\Delta a^{(\ell)}$. Although it has not been explicitly denoted earlier, the quantities
of the linearized problem, such as $F$, $J_F$, $G$ and $h_1$, depend on the approximation center $a_0$, that is,
on $a^{(\ell-1)}$. Thus the algorithm is a sequence of NNLS problems.
Let $a^*$ be a local minimizer of the original problem (3). Then by the Kuhn-Tucker conditions $a^*$
is also the solution of (27) linearized at $a_0 = a^*$, since the Kuhn-Tucker conditions are given in
terms of the linearizations of the constraints. Since the constraint function is analytic by the
assumptions, if the iteration (28)-(37) reaches a stationary point, it is a local minimizer of (3).
Since the solutions to the linearized problems are not necessarily in the feasible region, each $a^{(\ell)}$
has to be projected onto the feasible region before the problem is linearized. The projection can
be done for example by calculating the roots corresponding to $a^{(\ell-1)}$ and adjusting their moduli if
necessary.
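
One way to carry out such a projection is the radial adjustment of the roots, sketched below in Python with
NumPy. The function name `project_to_radius` and the test coefficients are hypothetical, and the sign
convention is that of Eq. (2); this is one possible projection and not necessarily the one used by the
authors.

```python
import numpy as np

def project_to_radius(a, rho):
    """Scale every root with modulus above rho radially back to modulus rho."""
    a = np.asarray(a, dtype=float)
    lam = np.roots(np.concatenate(([1.0], -a)))   # roots of z^p - a_1 z^(p-1) - ...
    mod = np.abs(lam)
    scale = np.where(mod > rho, rho / np.maximum(mod, 1e-300), 1.0)
    lam_proj = lam * scale
    # Rebuild the monic polynomial and return to the sign convention of Eq. (2).
    return -np.real(np.poly(lam_proj))[1:]

# Illustrative AR(2) model with roots 0.95*exp(+-0.2*pi*i), projected to rho = 0.9.
a_in = np.array([2.0 * 0.95 * np.cos(0.2 * np.pi), -0.95**2])
a_out = project_to_radius(a_in, 0.9)
print("projected coefficients:", a_out)
```

Only the moduli are changed, so the angles of the roots, and hence the locations of the spectral peaks, are
preserved by this projection.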
The termination of the iteration can be done by observing the angle between the gradient of
the objective functional and the hyperplane that is normal to the active (linearized) constraint set.
The active set refers to those constraints in (27) for which equality holds.
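
The text does not spell out how this angle is computed, so the following Python sketch shows only one
possible reading, assuming NumPy: the negative gradient of $\|Ha - X\|^2$ is split into a part spanned by the
normals of the active linearized constraints (the active rows of $J_F$) and a part that lies in the
constraint plane, and the angle to the plane is obtained from the latter. The function name
`angle_to_active_plane` and the test data are hypothetical.

```python
import numpy as np

def angle_to_active_plane(H, X, a, N_active):
    """Angle in degrees between the negative gradient of ||H a - X||^2 and the
    plane whose normals are the rows of N_active."""
    H = np.asarray(H, dtype=float)
    X = np.asarray(X, dtype=float)
    a = np.asarray(a, dtype=float)
    N = np.atleast_2d(np.asarray(N_active, dtype=float))
    g = -2.0 * H.T @ (H @ a - X)               # negative gradient
    g_normal = np.linalg.pinv(N) @ (N @ g)     # component along the normals
    g_tangent = g - g_normal                   # component inside the plane
    cos_angle = np.linalg.norm(g_tangent) / np.linalg.norm(g)
    return float(np.degrees(np.arccos(np.clip(cos_angle, 0.0, 1.0))))

# Illustrative check with H = I and one active constraint with normal (0, 1).
H_id = np.eye(2)
angle_a = angle_to_active_plane(H_id, [1.0, 1.0], [0.0, 0.0], [[0.0, 1.0]])
angle_b = angle_to_active_plane(H_id, [0.0, 1.0], [0.0, 0.0], [[0.0, 1.0]])
print(angle_a, angle_b)
```

With this reading the iteration can be stopped once the angle is sufficiently close to 90 degrees, that is,
once the in-plane component of the gradient is negligible.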
3 An example
We study the constrained parameter estimation problem of a 4th order AR process. The sample is
of length $T = 256$ and the matrix $H$ corresponds to the prewindowed forward prediction equations.
The roots of the characteristic polynomial of the process are
$$\lambda_{1,2} = 0.92 \exp(\pm 0.20 \pi i), \quad \lambda_{3,4} = 0.60 \exp(\pm 0.32 \pi i).$$
We choose $\rho = 0.9$. The unconstrained least squares estimate gives the prediction coefficients
$$a = (-2.1225, 2.1546, -1.0797, 0.3103)$$
and the corresponding roots were
$$\lambda_{1,2} = 0.9223 \exp(\pm 0.1996 \pi i), \quad \lambda_{3,4} = 0.6039 \exp(\pm 0.3258 \pi i).$$
As the initial value of the iteration we use the polynomial coefficients that are obtained by
moving the roots that correspond to the unconstrained LS solution radially inside the feasible
region.
The step size $\mu = 0.2$ was chosen and the algorithm iterates until the inner products between the
gradient and the active linearized constraints are within $[-\delta, \delta]$. We used the tolerance $\delta = 10^{-4}$.
Table I shows the angles between the active constraint planes and the negative gradient. As can
be seen, the angle approaches $90^\circ$, i.e. the inner product of the gradient and the normal to the
plane approaches zero as a function of the iteration counter. In this case and with the chosen
step size we obtain the accuracy of 0.01 within 10 iterations.
The projections of the negative gradient, the actual constraint and the linearized constraint
onto the plane $(a_2, a_4)$ after 30 iterations are shown in Fig. 1. The unconstrained minimum and
one error contour are also shown in Fig. 1. The constrained minimum corresponds to the situation
in which the negative gradient points directly outward from the feasible region, that is, the
gradient is normal to the tangent plane of the constraints.
The coefficients and the roots corresponding to the local minimum are
$$a = (-2.1294, 2.1624, -1.0854, 0.2998)$$
$$\lambda_{1,2} = 0.900 \exp(\pm 0.201 \pi i), \quad \lambda_{3,4} = 0.608 \exp(\pm 0.313 \pi i).$$
The coefficient estimates clearly satisfy the hyperstability constraints. The roots that correspond
to the unconstrained and constrained problems are shown in Fig. 1. We observe that the addition
of the constraints moves the infeasible root pair to the edge of the feasible region and, furthermore,
the other root pair is also adjusted.

Table I: The angle (in degrees) between the active constraint plane and the negative
gradient, and the termination criterion $|\delta|$ as functions of the iteration.

Iteration      1        5        10       15       20         25         30
Angle (deg)    86.781   88.660   89.576   89.865   89.956     89.985     89.995
$|\delta|$     0.056    0.050    0.010    0.005    1×10^-3    2×10^-4    1×10^-4

4 Discussion
We have proposed an algorithm for the hyperstabilization of autoregressive models. While there
are several methods that guarantee classical stability, it seems that there have been only ad hoc
methods for the hyperstabilization of the models, such as radial pole adjustment.
Although we have discussed only AR modelling and specifically the (nonwindowed, forward
prediction) least squares estimation of the coefficients, the method is clearly applicable to other
AR parameter estimation methods, such as the Yule-Walker method. The method is also relatively
easily modified for filter design problems that require hyperstability. For adaptive algorithms
hyperstability can be implemented e.g. as in [15, 5], but block algorithms seem not to have been
proposed.
However, the main motivation for this paper was the problems that are encountered in basis
constrained time-varying autoregressive (TVARLS), or time-varying linear prediction coding
(TVLPC), modeling. In TVARLS modeling the possibility of temporarily unstable models is very great
when the signal contains transitions between narrow-band and wide-band epochs. Such transitions
occur e.g. in the modeling of EEG signals [11]. It can be said that this instability problem has
been one of the main hindrances for the use of TVARLS models. Specifically, it has been said
that "It may be possible to develop a time-varying estimation method or to determine sets of basis
functions for time-varying LPC that will necessarily lead to stable filters. This remains for the
future." [8]. This problem is also stated in [3, 12]. Since it can easily be shown that no set of basis
functions guarantees the global stability of a TVARLS model, we are left with only such methods
as the one proposed in this paper. We are currently constructing an algorithm to this end that is
based on the algorithm proposed in this paper.

Figure 1: Left: the projections of the negative gradient, the actual constraint and the
linearized constraint onto the plane $(a_2, a_4)$ after 30 iterations, the unconstrained minimum
(small circle) and one error contour (horizontal axis $a_2$, vertical axis $a_4$). Right: a section of
the circle with radius $\rho$ and the roots that correspond to the solutions of the unconstrained and
constrained ('+') problems (horizontal axis real part, vertical axis imaginary part).

References
[1] T.M. Apostol. Mathematical Analysis. Addison-Wesley, 1974.
[2] G. Baselli, S. Cerutti, S. Civardi, F. Lombardi, A. Malliani, M. Merri, M. Pagani, and G. Rizzo. Heart rate
variability signal processing: a quantitative approach as an aid to diagnosis in cardiovascular pathologies. Int
J Bio-Med Comput, 20:51-70, 1987.
[3] M. Boudaoud and L. Chaparro. Composite modeling of nonstationary signals. Journal of the Franklin Institute,
324:113-124, 1987.
[4] P.J. Brockwell and R.A. Davis. Time Series: Theory and Methods. Springer-Verlag, 1991.
[5] J. Chang and J.R. Glover. The feedback adaptive line enhancer: a constrained IIR adaptive filter. IEEE Trans
Signal Processing, 41:3161-3166, 1993.
[6] B. Choi. ARMA Model Identification. Springer-Verlag, 1992.
[7] G.H. Golub and C.F. van Loan. Matrix Computations. The Johns Hopkins University Press, 1989.
[8] M. Hall, A. Oppenheim, and A. Willsky. Time-varying parametric modeling of speech. Signal Processing,
5:267-285, 1983.
[9] L. Hörmander. An Introduction to Complex Analysis in Several Variables, volume 7. North-Holland, 1973.
[10] R.A. Horn and C.R. Johnson. Matrix Analysis. Cambridge University Press, 1985.
[11] J.P. Kaipio. Simulation and Estimation of Nonstationary EEG. PhD thesis, University of Kuopio, 1996.
[12] J.P. Kaipio and P.A. Karjalainen. Simulation of nonstationary EEG. Biol Cybern, 76, 1997.
[13] C.L. Lawson and R.J. Hanson. Solving Least Squares Problems. SIAM, 1995.
[14] S.L. Marple. Digital Spectral Analysis with Applications. Prentice-Hall, 1987.
[15] A. Nehorai and D. Starer. Adaptive pole estimation. IEEE Trans Acoust, Speech Signal Processing, 38:825-838,
1990.
[16] B.L. van der Waerden. Algebra I. Frederik Ungar, 1970.
[17] J.H. Wilkinson. The Algebraic Eigenvalue Problem. Clarendon Press, 1965.
[18] L. Zetterberg. Estimation of parameters for a linear difference equation with application to EEG analysis.
Math Biosci, 5:227-275, 1969.
