Mean-Square Performance of a Family of Affine
Projection Algorithms
Hyun-Chool Shin and Ali H. Sayed, Fellow, IEEE
Abstract—Affine projection algorithms are useful adaptive filters whose main purpose is to speed the convergence of LMS-type filters. Most analytical results on affine projection algorithms assume special regression models or Gaussian regression data. The available analyses also treat different affine projection filters separately. This paper provides a unified treatment of the mean-square error, tracking, and transient performances of a family of affine projection algorithms. The treatment relies on energy-conservation arguments and does not restrict the regressors to specific models or to a Gaussian distribution. Simulation results illustrate the analysis and the derived performance expressions.

Index Terms—Affine projection algorithm, energy conservation, learning curve, steady-state analysis, tracking analysis, transient analysis.
I. INTRODUCTION
The normalized least mean-squares (NLMS) algorithm is
a widely used adaptive algorithm due to its computational
simplicity and ease of implementation. However, colored input
signals can deteriorate its convergence speed appreciably [1],
[2]. To address this problem, Ozeki and Umeda [3] developed
the basic form of an affine projection algorithm (APA) using
affine subspace projections. APA is a useful family of adaptive filters whose main purpose is to speed the convergence of
LMS-type filters, especially for correlated data, at a computational cost that is still comparable to that of LMS. This class
of filters is particularly useful in echo cancellation applications,
e.g., [4]. While NLMS updates the weights based only on the current input vector, APA updates the weights based on the K most recent input vectors. Since [3], many variants of APA have been
devised independently from different perspectives such as the
regularized APA (R-APA) [4], the partial rank algorithm (PRA)
[5], the decorrelating algorithm (DA) [6], and NLMS with orthogonal correction factors (NLMS-OCF) [7]. We will refer to
all these algorithms as belonging to the APA family (see also
[8] and [9]).
Manuscript received October 23, 2002; revised April 11, 2003. This work was supported in part by the National Science Foundation under Grants ECS-9820765 and CCR-0208573. This work was performed while H. Shin was a visiting graduate student at the UCLA Adaptive Systems Laboratory. His work was supported in part by the Brain Korea (BK) 21 Program funded by the Ministry of Education and in part by the HY-SDR Research Center at Hanyang University under the ITRC Program of MIC, Korea. The associate editor coordinating the review of this paper and approving it for publication was Dr. Behrouz Farhang-Boroujeny.
H.-C. Shin is with the Division of Electronics and Computer Engineering, Pohang University of Science and Technology (POSTECH), Pohang, Korea.
A. H. Sayed is with the Department of Electrical Engineering, University of California, Los Angeles, CA 90095 USA (e-mail: sayed@ee.ucla.edu).
Digital Object Identifier 10.1109/TSP.2003.820077
The transient behavior of affine projection algorithms is not
as widely studied as that of NLMS. The available results have
progressed more for some variations than others, and most
analyses assume particular models for the regression data.
For example, in [10], convergence analyses in the mean and
in the mean-square senses are presented for the binormalized
data-reusing LMS (BNDR-LMS) algorithm. Although the
results show good agreement with simulations, the arguments
are based on a particular model for the input signal and are applicable only to second-order APA. Likewise, the convergence
results in [9] focus on NLMS-OCF and rely on a special model
for the input signal vector. A convergence analysis of DA is
given in [11], where the theoretical results of [6] are extended
to the evaluation of learning curves assuming a Gaussian
autoregressive input model. All these results provide useful
design guidelines. However, each APA form is usually studied
separately with specific techniques. Such distinct treatments
tend to obscure commonalities that exist among algorithms.
In this paper, we provide a unified treatment of the transient
performance of the APA family. In particular, we derive expressions for the mean-square error and tracking performances, as
well as conditions on the step-size for mean-square stability. Our
derivation relies on energy conservation arguments [12]–[18],
and it does not restrict the regression data to being Gaussian or
white. Extensive simulations at the end of the paper illustrate
the derived results.
Throughout the paper, the following notation is adopted:
‖·‖        Euclidean norm of a vector.
Tr(·)      Trace of a matrix.
diag{·}    Diagonal matrix of its entries.
(·)*       Hermitian conjugation (complex conjugation for scalars).
(·)^T      Transpose of a vector or a matrix.
det(·)     Determinant of a matrix.
λ_max(·)   Largest eigenvalue of a matrix.
ℝ⁺         Set of positive real numbers.
In addition, small boldface letters are used to denote vectors, and capital letters are used to denote matrices, e.g., w and U. The symbol I denotes the identity matrix of appropriate dimensions. All vectors are column vectors except for the input data vector, denoted by u_i, which is taken to be a row vector for convenience of notation.
The paper is organized as follows. In the next section, the data model and a review of the APA family are provided. In Section III, by examining the mean-square performance of the APA family, expressions for the steady-state mean-square error (MSE) are derived. Section IV studies the tracking ability of the APA family. In Section V, the transient performance is analyzed, and the learning behavior is then characterized. Section VI illustrates the theoretical results by giving several simulation results.
[TABLE I: The APA family for different choices of the parameters {ε, K, D}, where K and D are integers.]
II. DATA MODELS AND APA FAMILY

Consider reference data d(i) that arise from the linear model

  d(i) = u_i w^o + v(i)    (1)

where w^o is an unknown column vector that we wish to estimate, v(i) accounts for measurement noise, and u_i denotes 1 × M row input (regressor) vectors with a positive-definite covariance matrix R_u = E[u_i^* u_i]. In this paper, we focus on a general class of affine projection algorithms for estimating w^o of the form

  w_i = w_{i-1} + μ U_i^* (εI + U_i U_i^*)^{-1} (d_i − U_i w_{i-1})    (2)

where w_i is an estimate for w^o at iteration i, μ is the step size, ε is a regularization parameter, and

  U_i = col{u_i, u_{i-D}, …, u_{i-(K-1)D}}  (a K × M matrix)
  d_i = col{d(i), d(i-D), …, d(i-(K-1)D)}.

Different choices of the parameters {ε, K, D} result in different affine projection algorithms. Table I defines the parameters for some special cases. For example, the choices ε = 0 and D = 1, with K ≥ 1 arbitrary, result in the standard APA. For NLMS-OCF, it is further assumed that u_i is orthogonal to {u_{i-D}, …, u_{i-(K-1)D}}. For PRA, it is understood that w_i = w_{i-1} whenever i is not a multiple of K, i.e., the weight vector is updated once every K iterations. Most algorithms assume D = 1. Moreover, although we focus on (2), our approach can be extended to other APA algorithms such as DA, which is not covered by (2).
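To make the family concrete, here is a minimal sketch of the update (2) in Python (NumPy) for real-valued data; the function name and the interface are ours, not part of the paper:

```python
import numpy as np

def apa_update(w, U, d, mu=1.0, eps=0.0):
    """One iteration of the APA family (2):
    w <- w + mu * U^T (eps*I + U U^T)^{-1} (d - U w).
    U is K x M with rows u_i, u_{i-D}, ..., u_{i-(K-1)D};
    d is the corresponding K x 1 vector of reference samples."""
    K = U.shape[0]
    e = d - U @ w                      # error vector e_i
    G = eps * np.eye(K) + U @ U.T      # K x K regularized Gram matrix
    return w + mu * U.T @ np.linalg.solve(G, e)
```

With K = 1 and eps = 0 the update collapses to NLMS; eps > 0 gives R-APA; applying the update only once every K iterations gives PRA.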
III. MEAN-SQUARE PERFORMANCE OF APA

Our first objective is to evaluate the steady-state mean-square error performance of the APA family (2), i.e., to compute

  MSE = lim_{i→∞} E|e(i)|²

where e(i) = d(i) − u_i w_{i-1} is the output estimation error at time i. To do so, we will rely on energy-conservation arguments.

A. Energy-Conservation Relation

Let e_i = d_i − U_i w_{i-1}, and let w̃_i = w^o − w_i denote the weight-error vector. Note that e_i = U_i w̃_{i-1} + v_i for all algorithms listed in Table I, except PRA. Then, (2) becomes

  w_i = w_{i-1} + μ U_i^* (εI + U_i U_i^*)^{-1} e_i    (3)

which can be rewritten in terms of the weight-error vector as

  w̃_i = w̃_{i-1} − μ U_i^* (εI + U_i U_i^*)^{-1} e_i    (4)

If we multiply both sides of (4) by U_i from the left, we find that

  U_i w̃_i = U_i w̃_{i-1} − μ U_i U_i^* (εI + U_i U_i^*)^{-1} e_i    (5)

Introduce the a posteriori and a priori error vectors

  e_{p,i} = U_i w̃_i  and  e_{a,i} = U_i w̃_{i-1}.

Then, from (5), it holds that

  e_{p,i} = e_{a,i} − μ U_i U_i^* (εI + U_i U_i^*)^{-1} e_i    (6)

We can use (6) to solve for e_i, assuming U_i U_i^* is invertible,

  e_i = (1/μ)(εI + U_i U_i^*)(U_i U_i^*)^{-1}(e_{a,i} − e_{p,i})    (7)

and substitute into (4) to get

  w̃_i = w̃_{i-1} − U_i^* (U_i U_i^*)^{-1}(e_{a,i} − e_{p,i})

which can be rearranged as

  w̃_i + U_i^* (U_i U_i^*)^{-1} e_{a,i} = w̃_{i-1} + U_i^* (U_i U_i^*)^{-1} e_{p,i}    (8)

By evaluating the energies of both sides of this equation, we find that the following energy equality should hold:

  ‖w̃_i‖² + e_{a,i}^* (U_i U_i^*)^{-1} e_{a,i} = ‖w̃_{i-1}‖² + e_{p,i}^* (U_i U_i^*)^{-1} e_{p,i}    (9)

The important fact to emphasize is that no approximations are used to establish the energy relation (9); it is an exact relation that shows how the energies of the weight-error vectors at two successive iterations are related to the weighted energies of the a priori and a posteriori estimation error vectors. Relation (9) is the extension to the APA case of the energy-conservation relation originally derived in [12] and [13] in the context of robustness analysis and subsequently used in [15]–[18] in the context of steady-state and transient performance analysis. See also [15].
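The exactness of (9) is easy to confirm numerically. The following sketch (our own setup: white Gaussian regressors, ε = 0, and an arbitrary step size) checks the relation at every iteration up to floating-point error:

```python
import numpy as np

rng = np.random.default_rng(0)
M, K, mu = 16, 4, 0.5
w_o = rng.standard_normal(M)              # unknown system w^o
w = np.zeros(M)

for i in range(200):
    U = rng.standard_normal((K, M))       # K stacked regressors
    d = U @ w_o + 0.01 * rng.standard_normal(K)
    e = d - U @ w
    Ginv = np.linalg.inv(U @ U.T)         # (U_i U_i^*)^{-1}, eps = 0
    w_new = w + mu * U.T @ (Ginv @ e)
    wt_old, wt_new = w_o - w, w_o - w_new # weight-error vectors
    e_a, e_p = U @ wt_old, U @ wt_new     # a priori / a posteriori errors
    lhs = wt_new @ wt_new + e_a @ Ginv @ e_a
    rhs = wt_old @ wt_old + e_p @ Ginv @ e_p
    assert np.isclose(lhs, rhs)           # energy relation (9), exact
    w = w_new
```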
B. Variance Relation for Steady-State Performance

The relevance of (9) to the mean-square analysis of affine projection algorithms can be seen as follows. Taking expectations of both sides of (9), we get

  E‖w̃_i‖² + E[e_{a,i}^* (U_i U_i^*)^{-1} e_{a,i}] = E‖w̃_{i-1}‖² + E[e_{p,i}^* (U_i U_i^*)^{-1} e_{p,i}]    (10)
Taking the limit as i → ∞ and using the steady-state condition E‖w̃_i‖² = E‖w̃_{i-1}‖², we obtain

  E[e_{a,i}^* (U_i U_i^*)^{-1} e_{a,i}] = E[e_{p,i}^* (U_i U_i^*)^{-1} e_{p,i}]  as i → ∞    (11)

Substituting (6) into the right-hand side (RHS) of (11), we get

  RHS of (11) = E[e_{a,i}^* (U_i U_i^*)^{-1} e_{a,i}] − μE[e_{a,i}^* A_i e_i] − μE[e_i^* A_i e_{a,i}] + μ²E[e_i^* B_i e_i]    (12)

where we are defining

  A_i = (εI + U_i U_i^*)^{-1}  and  B_i = A_i U_i U_i^* A_i.

Using (12), equality (11) simplifies to

  E[e_{a,i}^* A_i e_i] + E[e_i^* A_i e_{a,i}] = μE[e_i^* B_i e_i]  as i → ∞    (13)

This equation can now be used to evaluate the mean-square performance of affine projection algorithms.
C. Mean-Square Performance

Introduce the noise vector

  v_i = col{v(i), v(i−D), …, v(i−(K−1)D)}.

Then, (1) gives

  e_i = e_{a,i} + v_i

and, under the often realistic assumption that

A.1) the noise v(i) is i.i.d. and statistically independent of the regression matrix U_j for all i and j,

and neglecting the dependency of w̃_{i-1} on past noises, the variance relation (13) reduces to

  2E[e_{a,i}^* A_i e_{a,i}] = μE[e_{a,i}^* B_i e_{a,i}] + μE[v_i^* B_i v_i]  as i → ∞    (14)

This expression can be used to deduce an expression for the filter MSE or, equivalently, for the filter excess mean-square error (EMSE), which is defined by

  EMSE = lim_{i→∞} E|e_a(i)|²

where e_a(i) = u_i w̃_{i-1} is the top entry of e_{a,i}. Note that, for all algorithms listed in Table I except PRA, e(i) is the top entry of e_i. For PRA, the weight vector is held constant between updates, and therefore e(i) is also equal to the top entry of e_i. Now, from (1), we get e(i) = e_a(i) + v(i), and therefore, the MSE and EMSE define each other via MSE = EMSE + σ_v².

In order to evaluate the EMSE, we need to deal with the expectations in (14). For this purpose, we shall rely on the following assumption.

A.2) At steady-state, e_{a,i} is statistically independent of U_i, and moreover,

  E[e_{a,i} e_{a,i}^*] ≈ E|e_a(i)|² · Λ

where Λ ≈ I for small μ and Λ ≈ diag{1, (1−μ)², …, (1−μ)^{2(K−1)}} for large μ.

The condition on E[e_{a,i} e_{a,i}^*] is motivated in Appendix A. Using (14) and A.2), the first term on the left-hand side (LHS) of (14) becomes

  E[e_{a,i}^* A_i e_{a,i}] = Tr(E[A_i e_{a,i} e_{a,i}^*]) ≈ E|e_a(i)|² Tr(ĀΛ)    (15)

as i → ∞. Similar manipulations can be applied to the remaining terms in (14). Thus, we get

  E[e_{a,i}^* B_i e_{a,i}] ≈ E|e_a(i)|² Tr(B̄Λ)    (16)

and

  E[v_i^* B_i v_i] = σ_v² Tr(B̄)    (17)

If we introduce the quantities (which are solely dependent on the statistics of the regression data)

  Ā = E[(εI + U_i U_i^*)^{-1}]  and  B̄ = E[A_i U_i U_i^* A_i]    (18)

then (14) becomes

  2E|e_a(i)|² Tr(ĀΛ) = μE|e_a(i)|² Tr(B̄Λ) + μσ_v² Tr(B̄)    (19)

as i → ∞, and the EMSE of the filter is therefore given by

  EMSE = μσ_v² Tr(B̄) / (2Tr(ĀΛ) − μTr(B̄Λ))    (20)

and the steady-state MSE is

  MSE = μσ_v² Tr(B̄) / (2Tr(ĀΛ) − μTr(B̄Λ)) + σ_v²    (21)
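As a sketch of how (20) and (21) can be evaluated in practice, the traces Tr(Ā) and Tr(B̄) can be estimated by ensemble averaging over the regressor distribution. The setup below is our own (white Gaussian regressors, small step size, and Λ ≈ I, i.e., the small-μ form of A.2):

```python
import numpy as np

rng = np.random.default_rng(1)
M, K, mu, eps, sig_v2 = 16, 4, 0.2, 1e-3, 1e-3

trA = trB = 0.0
n_mc = 2000
for _ in range(n_mc):
    U = rng.standard_normal((K, M))
    A = np.linalg.inv(eps * np.eye(K) + U @ U.T)   # A_i
    B = A @ U @ U.T @ A                            # B_i
    trA += np.trace(A) / n_mc
    trB += np.trace(B) / n_mc

emse = mu * sig_v2 * trB / (2 * trA - mu * trB)    # (20) with Lambda = I
print("theoretical EMSE (20):", emse)
print("theoretical MSE (21):", emse + sig_v2)
```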
Two simplifications can be made when the regularization parameter ε is small.

• If ε is small enough so that its effect can be ignored, then A_i = B_i = (U_i U_i^*)^{-1}, and the definitions of Ā and B̄ will coincide. In this case, (20) reduces to

  EMSE = μσ_v² Tr(B̄) / ((2 − μ) Tr(B̄Λ))    (22)

If we use Λ ≈ I, then (22) further collapses to EMSE ≈ μσ_v²/(2 − μ), and if we use the large-μ form of Λ, the dependence of the EMSE on K is retained.

• Another approximation assumes that ε is small and that the filter order M is large (so that the regressors are close to orthogonal) and uses

  Tr(B̄) ≈ K · E[1/‖u_i‖²]  and  Tr(B̄Λ) ≈ E[1/‖u_i‖²] Σ_{j=0}^{K−1} (1−μ)^{2j}

to get

  EMSE ≈ μσ_v² K / ((2 − μ) Σ_{j=0}^{K−1} (1−μ)^{2j})    (23)

Note that this expression for the EMSE is proportional to K. In contrast, the expression for the EMSE given in [9] does not take into account the effect of K. Simulation results in Section VI (see Figs. 7–12) show that (22) and (23) provide good approximations for filter performance for relatively small step-size μ and order K.
IV. TRACKING PERFORMANCE OF APA

A similar analysis can be used to evaluate the performance of APA in nonstationary environments. Thus, assume that d(i) = u_i w_i^o + v(i), where the unknown system w_i^o is now time-variant. It is assumed that the variation in w_i^o is according to the random-walk model (see, e.g., [1], [2], [15], and [19])

  w_i^o = w_{i-1}^o + q_i    (24)

where q_i is an i.i.d. sequence with autocorrelation matrix Q = E[q_i q_i^*] and independent of the initial conditions {w_{-1}, w_{-1}^o} and of the {u_j, v(j)} for all j. Let w̃_i = w_i^o − w_i. Then

  w̃_i = w̃_{i-1} + q_i − μ U_i^* (εI + U_i U_i^*)^{-1} e_i    (25)

If we multiply (25) by U_i from the left, we obtain that (6) still holds for the nonstationary case. Substituting (6) into (25), we get

  w̃_i − q_i + U_i^* (U_i U_i^*)^{-1} e_{a,i} = w̃_{i-1} + U_i^* (U_i U_i^*)^{-1} e_{p,i}    (26)

Evaluating the energies of both sides of (26) and taking expectations, we find that

  E‖w̃_i − q_i‖² + E[e_{a,i}^* (U_i U_i^*)^{-1} e_{a,i}] = E‖w̃_{i-1}‖² + E[e_{p,i}^* (U_i U_i^*)^{-1} e_{p,i}]    (27)

Using the random-walk model (24), we know that E[q_i^*(w̃_i − q_i)] ≈ 0, and therefore

  E‖w̃_i − q_i‖² ≈ E‖w̃_i‖² − Tr(Q)    (28)

Substituting into (27), we obtain

  E‖w̃_i‖² + E[e_{a,i}^* (U_i U_i^*)^{-1} e_{a,i}] = E‖w̃_{i-1}‖² + Tr(Q) + E[e_{p,i}^* (U_i U_i^*)^{-1} e_{p,i}]    (29)

Comparing with (10), we see that the only difference in the nonstationary case is the appearance of the additional term Tr(Q). Note that the other terms are identical. Therefore, similar manipulations to those in Section III lead to

  2E[e_{a,i}^* A_i e_{a,i}] = μE[e_{a,i}^* B_i e_{a,i}] + μE[v_i^* B_i v_i] + μ^{-1}Tr(Q)  as i → ∞    (30)

and the EMSE is then given by

  EMSE = (μ²σ_v² Tr(B̄) + Tr(Q)) / (μ(2Tr(ĀΛ) − μTr(B̄Λ)))    (31)

The two simplifications of Section III can be used to get

  EMSE = (μ²σ_v² Tr(B̄) + Tr(Q)) / (μ(2 − μ) Tr(B̄Λ))    (32)

or

  EMSE ≈ (μ²σ_v² K E[1/‖u_i‖²] + Tr(Q)) / (μ(2 − μ) E[1/‖u_i‖²] Σ_{j=0}^{K−1}(1−μ)^{2j})    (33)

From (32) and (33), we see that for a given K, there is an optimal μ that minimizes the EMSE, and for a given μ, there is an optimal K that minimizes the EMSE. Comparisons of the tracking performance among the APA family are given in Table II.
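The μ-tradeoff implied by (31)–(33) is easy to visualize with a small numerical scan. The sketch below evaluates the simplified expression (32) with Λ ≈ I, for hypothetical values of σ_v², Tr(Q), and Tr(B̄); the noise term grows with μ while the lag term decays with μ, so an interior minimizer exists:

```python
import numpy as np

sig_v2, tr_Q, tr_B = 1e-3, 1e-6, 0.5     # assumed values, for illustration

mu = np.linspace(0.05, 1.0, 500)
# (32) with Lambda ~ I: noise term + tracking (lag) term
emse = mu * sig_v2 / (2 - mu) + tr_Q / (mu * (2 - mu) * tr_B)
print("optimal step size on the grid:", mu[np.argmin(emse)])
```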
V. TRANSIENT ANALYSIS OF APA
We now study the transient (i.e., convergence and stability) performance of the APA family. This task is more challenging than studying the mean-square performance. Nevertheless, the same energy-conservation arguments of the previous section can still be used if we incorporate weighting into the energy relation and into the definition of the error quantities [14], [17], as we now explain. We will assume, without loss of generality, that ε = 0. Then, (2) becomes

  w_i = w_{i-1} + μ U_i^* (U_i U_i^*)^{-1} (d_i − U_i w_{i-1})

In the following analysis, if we substitute U_i U_i^* by εI + U_i U_i^*, then the results for ε ≠ 0 would be obtained.
A. Weighted Energy Relation

Let ē_{a,i} = U_i Σ w̃_{i-1} and ē_{p,i} = U_i Σ w̃_i. If we multiply both sides of the above recursion by U_i Σ from the left, for any Hermitian positive-definite matrix Σ, we find that the a priori and a posteriori estimation errors {ē_{a,i}, ē_{p,i}} are related via

  ē_{p,i} = ē_{a,i} − μ U_i Σ U_i^* (U_i U_i^*)^{-1} e_i    (34)
[TABLE II: EMSE of the APA family in nonstationary environments.]
Similarly to the arguments in Section III, we can get

  w̃_i + U_i^* (U_i Σ U_i^*)^{-1} ē_{a,i} = w̃_{i-1} + U_i^* (U_i Σ U_i^*)^{-1} ē_{p,i}    (35)

On each side of this identity, we have a combination of a priori and a posteriori errors. If we equate the weighted Euclidean norms (with weight Σ) of both sides of (35), we find that

  ‖w̃_i‖²_Σ + ē_{a,i}^* (U_i Σ U_i^*)^{-1} ē_{a,i} = ‖w̃_{i-1}‖²_Σ + ē_{p,i}^* (U_i Σ U_i^*)^{-1} ē_{p,i}    (36)

The special choice Σ = I reduces (36) to the energy relation (9). Moreover, since e_i = e_{a,i} + v_i, we also get

  w̃_i = (I − μ U_i^* (U_i U_i^*)^{-1} U_i) w̃_{i-1} − μ U_i^* (U_i U_i^*)^{-1} v_i    (37)
B. Weighted Variance Relation

In transient analysis, we are interested in the time evolution of E‖w̃_i‖²_Σ for some desirable choices of Σ. For this reason, rather than eliminate the effect of the weight-error vector, the contributions of the other error quantities {ē_{a,i}, ē_{p,i}} are instead expressed in terms of the weight-error vector itself. In so doing, the energy relation (36) will lead to a recursion that describes the evolution of E‖w̃_i‖²_Σ. Replacing ē_{p,i} in (36) by its equivalent expression in (34), we get

  ‖w̃_i‖²_Σ = ‖w̃_{i-1}‖²_Σ − 2μ Re{e_i^* (U_i U_i^*)^{-1} ē_{a,i}} + μ² e_i^* (U_i U_i^*)^{-1} U_i Σ U_i^* (U_i U_i^*)^{-1} e_i    (38)

Using the relation e_i = e_{a,i} + v_i, we can eliminate e_i. Since most of the cross terms disappear under A.1) and expectation, we get

  E‖w̃_i‖²_Σ = E‖w̃_{i-1}‖²_{Σ'} + μ² E[v_i^* (U_i U_i^*)^{-1} U_i Σ U_i^* (U_i U_i^*)^{-1} v_i]    (39)

In addition, E[v_i^* (U_i U_i^*)^{-1} U_i Σ U_i^* (U_i U_i^*)^{-1} v_i] = σ_v² E[Tr(Σ U_i^* (U_i U_i^*)^{-2} U_i)]. Thus, we have

  E‖w̃_i‖²_Σ = E‖w̃_{i-1}‖²_{Σ'} + μ² σ_v² E[Tr(Σ U_i^* (U_i U_i^*)^{-2} U_i)]    (40)

where

  Σ' = Σ − μ P_i Σ − μ Σ P_i + μ² P_i Σ P_i,  with  P_i = U_i^* (U_i U_i^*)^{-1} U_i    (41)

Recursion (40) provides a compact characterization of the time evolution of the weight-error variance. However, recursion
(40) is still hard to propagate due to the presence of the expectation E‖w̃_{i-1}‖²_{Σ'}, with Σ' as in (41). This expectation is difficult to evaluate due to the dependence of Σ' on U_i and of w̃_{i-1} on prior regressors. One way to overcome this difficulty is to introduce an independence assumption on the regressor sequence {U_i}, namely, to assume the following.

A.3) The matrix sequence {U_i} is independent and identically distributed.

This assumption guarantees that Σ' is independent of both w̃_{i-1} and w̃_i. Clearly, A.3) is a strong assumption (it is actually stronger than the usual independence assumption, which only requires the individual regressors u_i to be i.i.d. [1], [2]). Observe, however, from (41) that it is sufficient for our purposes to require the following.

A.3') P_i is independent of w̃_{i-1}.

This is generally a weaker assumption. In this way, recursion (40) reduces to

  E‖w̃_i‖²_Σ = E‖w̃_{i-1}‖²_{Σ̄'} + μ² σ_v² Tr(Σ E[U_i^* (U_i U_i^*)^{-2} U_i])    (42)

where now

  Σ̄' = E[Σ'] = Σ − μ P̄ Σ − μ Σ P̄ + μ² E[P_i Σ P_i],  P̄ = E[P_i]    (43)

with expectations appearing in (43). In addition, taking expectations of both sides of (37) and using assumption A.1), we obtain the following result for the evolution of the mean of the weight-error vector:

  E[w̃_i] = (I − μ P̄) E[w̃_{i-1}]    (44)

Relations (42) and (44) can be used to derive conditions for mean-square stability, as well as expressions for the steady-state MSE and mean-square deviation (MSD) of the APA family. To see this, we introduce some notation. The vec(·) notation, e.g., σ = vec(Σ), allows us to replace an arbitrary M × M matrix Σ by an M² × 1 column vector whose entries are formed by stacking the successive columns of the matrix on top of each other. On the other hand, writing vec(σ) for an M² × 1 column vector σ results in the M × M matrix whose entries are obtained from σ. Therefore, we also write Σ = vec(σ). The vec(·) notation is convenient when working with Kronecker products. The Kronecker product of two matrices A and B, say of dimensions m × n and p × q, respectively, is denoted by A ⊗ B [20]. For any matrices {A, Σ, B} of compatible dimensions, it holds that

  vec(AΣB) = (B^T ⊗ A) vec(Σ)    (45)

Applying (45) to (43), we find that it leads to the vector relation

  σ' = Fσ    (46)

where the coefficient matrix F is M² × M² and defined by

  F = I − μ(I ⊗ P̄) − μ(P̄^T ⊗ I) + μ² E[P_i^T ⊗ P_i]    (47)

We can rewrite the recursion for E‖w̃_i‖²_Σ in (42) by using the vectors {σ, σ'} instead of the matrices {Σ, Σ'}, say, as

  E‖w̃_i‖²_{vec(σ)} = E‖w̃_{i-1}‖²_{vec(Fσ)} + μ² σ_v² γ^T σ    (48)

where, for the last term, we used the fact that Tr(Σ E[U_i^* (U_i U_i^*)^{-2} U_i]) = γ^T σ, with γ = vec(E[U_i^* (U_i U_i^*)^{-2} U_i]). For compactness of notation, we drop the vec notation from the subscripts and keep the vectors so that the above is simply rewritten as

  E‖w̃_i‖²_σ = E‖w̃_{i-1}‖²_{Fσ} + μ² σ_v² γ^T σ    (49)

In addition, we restate from (44) the result for the evolution of the mean of the weight-error vector:

  E[w̃_i] = (I − μ P̄) E[w̃_{i-1}]    (50)
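The identity (45) is the workhorse of what follows; here is a quick numerical sanity check of it (column-major vec, as assumed throughout; the matrices are arbitrary random choices of ours):

```python
import numpy as np

rng = np.random.default_rng(2)
A, S, B = (rng.standard_normal((3, 3)) for _ in range(3))
vec = lambda X: X.flatten(order="F")      # stack columns
assert np.allclose(vec(A @ S @ B), np.kron(B.T, A) @ vec(S))   # identity (45)
```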
Recursion (49) shows that, in order to evaluate E‖w̃_i‖²_σ, we need to know E‖w̃_{i-1}‖²_{Fσ}, i.e., the same quantity with a weighting matrix whose entries are determined by Fσ. Now, the quantity E‖w̃_{i-1}‖²_{Fσ} can be inferred from (49) by writing the recursion for the weighting vector Fσ, i.e.,

  E‖w̃_i‖²_{Fσ} = E‖w̃_{i-1}‖²_{F²σ} + μ² σ_v² γ^T Fσ

We again find that, in order to evaluate E‖w̃_i‖²_{Fσ}, we need to know E‖w̃_{i-1}‖²_{F²σ}. The natural question is whether this procedure terminates. Fortunately, as in [14] and [17], this procedure does terminate. This is because once we write (48) by substituting σ by F^{M²−1}σ, we get

  E‖w̃_i‖²_{F^{M²−1}σ} = E‖w̃_{i-1}‖²_{F^{M²}σ} + μ² σ_v² γ^T F^{M²−1}σ

where the weighting matrix on the RHS is determined by F^{M²}σ. This term can be deduced from the prior weighting factors. Indeed, let p(x) denote the characteristic polynomial of F,

  p(x) = det(xI − F) = x^{M²} + p_{M²−1} x^{M²−1} + ⋯ + p_1 x + p_0

It is a polynomial of order M² in x with coefficients {p_k}. Now, the Cayley–Hamilton theorem guarantees that p(F) = 0, so that

  F^{M²} = −p_{M²−1} F^{M²−1} − ⋯ − p_1 F − p_0 I    (51)

Theorem 1 [Transient Performance]: Under assumptions A.1) and A.3'), the transient performance of the APA family (2) for ε = 0 is described by the state recursion

  𝒲_i = 𝒜 𝒲_{i-1} + μ² σ_v² 𝒴    (52)
where

  𝒲_i = col{E‖w̃_i‖²_σ, E‖w̃_i‖²_{Fσ}, …, E‖w̃_i‖²_{F^{M²−1}σ}}
  𝒴 = col{γ^T σ, γ^T Fσ, …, γ^T F^{M²−1}σ}

and 𝒜 is the M² × M² companion matrix

  𝒜 = | 0    1    0   ⋯  0          |
      | 0    0    1   ⋯  0          |
      | ⋮                ⋮          |
      | 0    0    0   ⋯  1          |
      | −p_0 −p_1 −p_2 ⋯ −p_{M²−1}  |

Here, σ = vec(Σ), γ = vec(E[U_i^* (U_i U_i^*)^{-2} U_i]), and the {p_k} are the coefficients of the characteristic polynomial of F. Observe that the eigenvalues of 𝒜 coincide with those of F.

[TABLE III: Stability bounds computed by Theorem 2 (Gaussian input).]
[TABLE IV: Stability bounds computed by Theorem 2 (uniform input).]
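The quantities in Theorem 1 can be formed explicitly for small filter lengths. The sketch below is our own construction (white Gaussian regressors assumed; M kept tiny since F is M² × M², and the characteristic-polynomial route is numerically delicate for larger M): it estimates P̄ and E[P_i^T ⊗ P_i] by ensemble averaging, builds F as in (47), and checks that the companion matrix 𝒜 has the same eigenvalues as F:

```python
import numpy as np

rng = np.random.default_rng(3)
M, K, mu, n_mc = 3, 2, 0.5, 500

Pbar = np.zeros((M, M))
Qk = np.zeros((M * M, M * M))
for _ in range(n_mc):
    U = rng.standard_normal((K, M))
    P = U.T @ np.linalg.solve(U @ U.T, U)      # projection P_i
    Pbar += P / n_mc
    Qk += np.kron(P, P) / n_mc                 # P_i symmetric: P_i^T kron P_i

I_M = np.eye(M)
F = np.eye(M * M) - mu * (np.kron(I_M, Pbar) + np.kron(Pbar, I_M)) + mu**2 * Qk

p = np.poly(F)                                 # char. polynomial, leading 1 first
n = M * M
Acomp = np.zeros((n, n))                       # companion matrix of Theorem 1
Acomp[:-1, 1:] = np.eye(n - 1)
Acomp[-1, :] = -p[::-1][:-1]                   # last row: -p_0, ..., -p_{n-1}
print(np.allclose(np.sort(np.linalg.eigvals(Acomp).real),
                  np.sort(np.linalg.eigvals(F).real), atol=1e-6))
```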
C. Learning Curves

The learning curve of an adaptive filter describes the time evolution of the variance E|e_a(i)|². Now, if the u_i are assumed to be i.i.d., then

  E|e_a(i)|² = E‖w̃_{i-1}‖²_{R_u}

and the learning curve can be evaluated by computing E‖w̃_{i-1}‖²_{R_u} for each i. This task can be accomplished recursively from (48) by iterating it and setting σ = vec(R_u). This yields

  E‖w̃_i‖²_σ = ‖w̃_{-1}‖²_{F^{i+1}σ} + μ² σ_v² γ^T (I + F + ⋯ + F^i) σ    (53)

That is,

  E|e_a(i)|² = ‖w̃_{-1}‖²_{a_i} + b(i)    (54)

where the vector a_i and the scalar b(i) satisfy the recursions

  a_i = F a_{i-1},  a_0 = vec(R_u)
  b(i) = b(i−1) + μ² σ_v² γ^T a_{i-1},  b(0) = 0

[Fig. 1. Simulated MSE of APA as a function of the step size μ, for Gaussian (top) and uniform (bottom) inputs and K = 1, 2, 4, 8; in both cases the curves confirm the stability bound μ ≈ 2.]
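For completeness, here is how (54) can be iterated in code. The setup is our own: white Gaussian regressors (so that R_u = I), a hypothetical initial condition w_{−1} = 0 (so that w̃_{−1} = w^o, taken here as the all-ones vector), and F and γ estimated by ensemble averaging as in Section VI:

```python
import numpy as np

rng = np.random.default_rng(5)
M, K, mu, sig_v2, n_mc = 4, 2, 0.5, 1e-3, 1000

Pbar = np.zeros((M, M)); Qk = np.zeros((M*M, M*M)); gamma = np.zeros(M*M)
for _ in range(n_mc):
    U = rng.standard_normal((K, M))
    G = np.linalg.inv(U @ U.T)
    P = U.T @ G @ U
    Pbar += P / n_mc
    Qk += np.kron(P, P) / n_mc
    gamma += (U.T @ G @ G @ U).flatten(order="F") / n_mc  # vec(U^*(UU^*)^-2 U)

I_M = np.eye(M)
F = np.eye(M*M) - mu*(np.kron(I_M, Pbar) + np.kron(Pbar, I_M)) + mu**2 * Qk

w_tilde = np.ones(M)                  # assumed w^o, with w_{-1} = 0
a = I_M.flatten(order="F")            # sigma = vec(R_u) = vec(I)
b = 0.0
mse_db = []
for i in range(300):                  # MSE(i) = E|e_a(i)|^2 + sig_v2, via (54)
    Sigma = a.reshape(M, M, order="F")
    mse_db.append(10*np.log10(w_tilde @ Sigma @ w_tilde + b + sig_v2))
    b += mu**2 * sig_v2 * (gamma @ a)
    a = F @ a
print(mse_db[0], mse_db[-1])          # initial vs. steady-state MSE in dB
```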
D. Mean-Square Stability
From (50), the convergence in the mean of the APA family is guaranteed for any μ satisfying

  0 < μ < 2/λ_max(P̄)    (55)

Moreover, recursion (49) is stable if, and only if, the matrix F is stable. Thus, let

  P = (I ⊗ P̄) + (P̄^T ⊗ I)  and  Q = E[P_i^T ⊗ P_i]

so that F = I − μP + μ²Q. The following holds.

Theorem 2 [Stability]: The convergence in the mean-square sense of the APA family is guaranteed for any μ in the range

  0 < μ < 1/max{λ(P^{-1}Q) ∈ ℝ⁺}

where P = (I ⊗ P̄) + (P̄^T ⊗ I) and Q = E[P_i^T ⊗ P_i].

The above condition on μ is in terms of the largest positive eigenvalue of P^{-1}Q when it exists. The theorem is proved in Appendix B. By combining (55) and Theorem 2, a bound on the step size for both mean and mean-square stability is obtained. Theorem 2 provides an explicit and unified stability bound for a general class of input signals and various affine projection algorithms.
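A sketch of how the bound of Theorem 2 can be computed in practice: estimate P̄ and Q by ensemble averaging (white Gaussian regressors are our assumption here), form P, and take the largest positive eigenvalue of P^{-1}Q:

```python
import numpy as np

rng = np.random.default_rng(4)
M, K, n_mc = 8, 2, 1000

Pbar = np.zeros((M, M))
Q = np.zeros((M * M, M * M))
for _ in range(n_mc):
    U = rng.standard_normal((K, M))
    Pi = U.T @ np.linalg.solve(U @ U.T, U)
    Pbar += Pi / n_mc
    Q += np.kron(Pi, Pi) / n_mc

P = np.kron(np.eye(M), Pbar) + np.kron(Pbar, np.eye(M))
lam = np.linalg.eigvals(np.linalg.solve(P, Q)).real
print("mean-square stability bound on mu:", 1.0 / lam[lam > 1e-12].max())
```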
E. Steady-State Behavior
In the above, we used the variance relation (49) to characterize the transient behavior of the APA family in terms of a state recursion. We can use the same variance relation to shed further light on the mean-square performance of the APA family. In particular, we shall re-examine the EMSE, as well as study the mean-square deviation (MSD), which is defined as

  MSD = lim_{i→∞} E‖w̃_i‖²

Assuming the step-size is chosen to guarantee filter stability, recursion (49) becomes, in steady-state,

  E‖w̃_∞‖²_σ = E‖w̃_∞‖²_{Fσ} + μ² σ_v² γ^T σ    (56)
[Fig. 2. Learning curves of the APA family for colored Gaussian input using μ = 1.0 and D = 8. (a) K = 1. (b) K = 2. (c) K = 4. (d) K = 8. Input: Gaussian AR(1), pole at 0.9; system: FIR(16).]
[Fig. 3. Learning curves of the APA family for colored uniform input using μ = 1.0 and D = 8. (a) K = 1. (b) K = 2. (c) K = 4. (d) K = 8. Input: uniform AR(1), pole at 0.5; system: FIR(16).]
[Fig. 4. Comparison of learning curves for colored Gaussian input using K = 2, μ = 1.0, and D = 8. (a) Using (54). (b) Using the results of [10]. (c) Simulation. Input: Gaussian AR(1), pole at 0.9; system: FIR(16).]
[Fig. 5. Comparison of learning curves for colored Gaussian input using K = 4, μ = 1.0, and D = 8. (a) Using (54). (b) Using the results of [9]. (c) Simulation. Input: Gaussian AR(1), pole at 0.9; system: FIR(16).]
which is equivalent to

  E‖w̃_∞‖²_{(I−F)σ} = μ² σ_v² γ^T σ    (57)

We choose σ to reduce the weight in (57) to the identity matrix. Thus, it needs to be selected as the solution to the linear system of equations (I − F)σ = vec(I), i.e., σ = (I − F)^{-1} vec(I). In this case, the weighting quantity that appears in (57) reduces to the identity matrix. Then, the left-hand side of (57) becomes the filter MSD, and (57) leads to

  MSD = μ² σ_v² γ^T (I − F)^{-1} vec(I)    (58)

In a similar way, let us evaluate the EMSE of the APA family. Note that since

  EMSE = lim_{i→∞} E‖w̃_{i-1}‖²_{R_u}

we need to evaluate E‖w̃_∞‖²_{R_u}, where the weighting factor is vec(R_u). Assume we select σ as the solution to the linear system of equations (I − F)σ = vec(R_u). In this case, the weighting quantity that appears in (57) reduces to R_u. Then, the LHS of (57) becomes the filter EMSE, and (57) leads to the desired result

  EMSE = μ² σ_v² γ^T (I − F)^{-1} vec(R_u)    (59)
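A compact sketch of (58) and (59), assuming F, γ, and R_u have been formed as in the earlier sketches (the helper name is ours):

```python
import numpy as np

def steady_state_msd_emse(F, gamma, R_u, mu, sig_v2):
    """MSD (58) and EMSE (59):
    MSD  = mu^2 * sig_v2 * gamma^T (I - F)^{-1} vec(I)
    EMSE = mu^2 * sig_v2 * gamma^T (I - F)^{-1} vec(R_u)."""
    M = R_u.shape[0]
    vec = lambda X: X.flatten(order="F")
    solve = lambda rhs: np.linalg.solve(np.eye(M * M) - F, rhs)
    msd = mu**2 * sig_v2 * gamma @ solve(vec(np.eye(M)))
    emse = mu**2 * sig_v2 * gamma @ solve(vec(R_u))
    return msd, emse
```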
VI. SIMULATION RESULTS
We illustrate the theoretical results presented in this paper by carrying out computer simulations in a channel estimation scenario. The unknown channel has 16 taps and is randomly generated. Two different types of signals, viz., Gaussian and uniformly distributed signals, are used for the input signal u(i); viz.,

  u(i) = a·u(i−1) + x(i)

which is a first-order autoregressive (AR) process with a pole at a.
[Fig. 6. Comparison of learning curves for colored Gaussian input using K = 8, μ = 1.0, and D = 8. (a) Using (54). (b) Using the results of [9]. (c) Simulation. Input: Gaussian AR(1), pole at 0.9; system: FIR(16).]
[Fig. 7. Steady-state MSE curves of the APA family for colored Gaussian input using D = 1 in stationary environments. (a) K = 1. (b) K = 2. (c) K = 4. (d) K = 8. Input: Gaussian AR(1), pole at 0.9; system: FIR(16).]
[Fig. 8. Steady-state MSE curves of the APA family for colored Gaussian input using K = 4 in stationary environments. (a) D = 1. (b) D = 4. (c) D = 8. Input: Gaussian AR(1), pole at 0.9; system: FIR(16).]
For the Gaussian case, x(i) is a white, zero-mean Gaussian random sequence having unit variance, and a is set to 0.9. As a result, a highly colored Gaussian signal is generated. For the uniform case, x(i) is a uniform random sequence between −1.0 and 1.0, and a is set to 0.5. In Tables III and IV, we evaluate the bounds in (55) and Theorem 2. These tables indicate that the stability bound on μ is approximately μ ≈ 2 for both Gaussian input (which is consistent with [9]) and uniform input signals. This fact is further verified by simulation in Fig. 1, where MSE curves are plotted as a function of the step size. The expectations involved in evaluating the bounds are estimated via ensemble averaging.

[Fig. 9. Comparison of MSE expressions when K = 1 or K = 4 and D = 1 (curves: simulation, Eq. (20), Eq. (22), Eq. (23), Eq. (59)). Input: Gaussian AR(1), pole at 0.9; system: FIR(16).]
The signal-to-noise ratio (SNR) is calculated by

  SNR = 10 log_10 (E|ū(i)|² / E|v(i)|²)

where ū(i) = u_i w^o. The measurement noise v(i) is added to ū(i) such that SNR = 30 dB. The adaptive filter and the unknown channel are assumed to have the same number of taps. All adaptive filter coefficients are initialized to zero. In addition, the regularization parameter ε is set to 0.001. The simulation results shown are obtained by ensemble averaging over 200 independent trials.
[Fig. 10. Comparison of MSE when K = 2 and D = 1 (curves: (a) simulation, (b) Eq. (20), (c) Eq. (22), (d) Eq. (23), (e) Eq. (59), (f) [8]). Input: Gaussian AR(1), pole at 0.9; system: FIR(16).]
[Fig. 11. Steady-state MSE curves of the APA family for colored uniform input using D = 1 in stationary environments. (a) K = 1. (b) K = 2. (c) K = 4. (d) K = 8. Input: uniform AR(1), pole at 0.5; system: FIR(16).]
[Fig. 12. Steady-state MSE curves of the APA family for colored uniform input using K = 4 in stationary environments. (a) D = 1. (b) D = 4. (c) D = 8. Input: uniform AR(1), pole at 0.5; system: FIR(16).]
[Fig. 13. Steady-state MSE curves of the APA family for colored Gaussian input using D = 1 in nonstationary environments. (a) K = 1. (b) K = 2. (c) K = 4. (d) K = 8. Input: Gaussian AR(1), pole at 0.9; system: FIR(16).]
A. Transient Performance
Figs. 2–6 show the learning curves of the APA family. The step size is set to μ = 1.0, and the delay parameter D is set to 8. Fig. 2 shows how close the simulation results are to the theoretical results (54), where F and γ were evaluated via ensemble averaging. The theoretical results are very close to the simulated results, although there is some discrepancy when K = 8. In Fig. 3, the colored uniform input signal is used for the simulation. For generating the input signal, a is set to 0.5, unlike the Gaussian case. In Figs. 4–6, the learning curves in Fig. 2 are compared with the theoretical results in [9] and [10].
B. Steady-State Performance
Fig. 7 shows the steady-state MSE curves of the APA family for colored Gaussian input as a function of the step size. The step size varies from 0.04 to 1.0. This range guarantees stability, as mentioned before. The theoretical results are calculated using (22), and the simulation results are obtained by averaging more than 1000 instantaneous squared errors in steady-state and then averaging over 200 independent trials. The simulation results show good agreement with the theoretical results for small step sizes but deviate from the theoretical curves for larger step sizes and larger K. The theoretical MSE in [9] is almost the same as the curve corresponding to K = 1 in Fig. 7; the MSE expression in [9] is independent of K and is therefore not able to predict the variations in MSE as a function of K. Fig. 8 shows the steady-state MSE for different delay parameters D. As D increases, the MSE decreases. To compare the EMSE expressions in Sections III and V, theoretical MSE curves using (20), (22), (23), and (59) are plotted in Fig. 9. The EMSE curves using (20) and (22) show good agreement with the simulation results.
[Fig. 14. Steady-state MSE curves of the APA family for colored Gaussian input using K = 2 in nonstationary environments. (a) D = 1. (b) D = 2. (c) D = 4. Input: Gaussian AR(1), pole at 0.9; system: FIR(16).]
[Fig. 15. Steady-state MSE curves of the APA family for colored uniform input using D = 1 in nonstationary environments. (a) K = 1. (b) K = 2. (c) K = 4. (d) K = 8. Input: uniform AR(1), pole at 0.9; system: FIR(16).]
[Fig. 16. Steady-state MSE curves of the APA family for colored uniform input using K = 2 in nonstationary environments. (a) D = 1. (b) D = 2. (c) D = 4. Input: uniform AR(1), pole at 0.9; system: FIR(16).]
Fig. 10 shows a comparison of MSE with [10]. Figs. 11 and 12 present the results for a colored uniform input signal.

C. Tracking Performance

Figs. 13–16 show the steady-state MSE tracking performance of the APA family in a nonstationary environment. The steady-state tracking MSE in (31) is not a monotonically increasing function of μ. Therefore, there exists an optimal value of the step size that minimizes the MSE in the nonstationary case. To see this, the range of the step size is again set from 0.04 to 1.0. We are using an i.i.d. sequence q_i with autocorrelation matrix Q = σ_q² I. Fig. 13 shows the theoretical and simulated results for colored Gaussian input for different values of K. For a given K, there exists an optimal μ that minimizes the MSE, and for a given μ, there exists an optimal K that minimizes the MSE. Fig. 14 shows the tracking performance for different values of D. The simulation results show the dependence of the tracking performance on D. Figs. 15 and 16 show the theoretical and simulated results for a colored uniform input signal.

VII. CONCLUSIONS
In this paper, we carried out a rather detailed mean-square performance evaluation of the family of affine projection algorithms under the assumptions A.1), A.2), and A.3'). Using energy-conservation arguments, we were able to derive expressions for the steady-state mean-square error and mean-square deviation without restricting the distribution of the input data to being Gaussian or white and without assuming any particular model for the input signals. Both stationary and nonstationary environments were considered. We also characterized the transient behavior of the filters by means of a first-order state-space model, whose stability was shown to determine the mean-square stability of the adaptive filter. Several simulation results were included to illustrate the application of the theory. In particular, it was seen that there is a relatively good match between theory and practice.
APPENDIX A
EVALUATION OF E[e_{a,i} e_{a,i}^*]

Recall that the a priori and a posteriori error vectors are defined by

  e_{a,i} = U_i w̃_{i-1} = col{u_i w̃_{i-1}, u_{i-1} w̃_{i-1}, …, u_{i-K+1} w̃_{i-1}}
  e_{p,i} = U_i w̃_i = col{u_i w̃_i, u_{i-1} w̃_i, …, u_{i-K+1} w̃_i}

where we are assuming D = 1 and ε = 0 without loss of generality. From (6), we know that

  e_{p,i} = e_{a,i} − μe_i = (1 − μ)e_{a,i} − μv_i

when ε is small. Then, the following relations hold for the entries of e_{a,i}: for j = 1, …, K − 1, the jth entry of e_{a,i} coincides with the (j − 1)th entry of e_{p,i-1}, so that

  [e_{a,i}]_j = (1 − μ)[e_{a,i-1}]_{j-1} − μv(i − j)

From these relations, we also get

  [e_{a,i}]_j = (1 − μ)^j e_a(i − j) − (1 − (1 − μ)^j) v(i − j)

but since, in steady-state, E|e_a(i − j)|² ≈ E|e_a(i)|², and neglecting the off-diagonal terms of E[e_{a,i} e_{a,i}^*], we find that

  E[e_{a,i} e_{a,i}^*] ≈ Λ_1 E|e_a(i)|² + Λ_2 σ_v²    (60)

where the diagonal matrices (Λ_1, Λ_2) are given by

  Λ_1 = diag{1, (1 − μ)², …, (1 − μ)^{2(K−1)}}
  Λ_2 = diag{0, (1 − (1 − μ))², …, (1 − (1 − μ)^{K−1})²}

Note that when μ is small, Λ_1 ≈ I and Λ_2 ≈ 0. In addition, when μ is close to 1 and when the SNR is high, the term Λ_2 σ_v² is negligible relative to Λ_1 E|e_a(i)|², so that (60) agrees with our assumption A.2). Expression (60) suggests that other choices for Λ are possible for assumption A.2). However, simulations show that the simpler conditions in A.2) lead to good results.

APPENDIX B
PROOF OF THEOREM 2

From properties of Kronecker products, we know that the eigenvalues of P_i^T ⊗ P_i are all the products λ_j(P_i)λ_k(P_i), where the λ_j(P_i) denote the eigenvalues of P_i. Since each P_i is a projection matrix, its eigenvalues are non-negative, so Q = E[P_i^T ⊗ P_i] is non-negative definite. Moreover, since the covariance matrix of the regressors is positive definite, P = (I ⊗ P̄) + (P̄^T ⊗ I) is positive definite. Now, we want to determine conditions on μ in order to guarantee the stability of F = I − μP + μ²Q. Following the same argument used in [17, App. A], we can establish the condition

  0 < μ < 1/max{λ(P^{-1}Q) ∈ ℝ⁺}

ACKNOWLEDGMENT

The authors would like to thank Prof. W.-J. Song for his support of the first author's visit to the UCLA Adaptive Systems Laboratory.

REFERENCES
[1] B. Widrow and S. D. Stearns, Adaptive Signal Processing. Englewood
Cliffs, NJ: Prentice-Hall, 1985.
[2] S. Haykin, Adaptive Filter Theory, 3rd ed. Upper Saddle River, NJ: Prentice-Hall, 1996.
[3] K. Ozeki and T. Umeda, “An adaptive filtering algorithm using an orthogonal projection to an affine subspace and its properties,” Electron.
Commun. Jpn., vol. 67-A, no. 5, pp. 19–27, 1984.
[4] S. L. Gay and J. Benesty, Acoustic Signal Processing for Telecommunication. Boston, MA: Kluwer, 2000.
[5] S. G. Kratzer and D. R. Morgan, “The partial-rank algorithm for adaptive
beamforming,” in Proc. SPIE Int. Soc. Opt. Eng., vol. 564, 1985, pp.
9–14.
[6] M. Rupp, “A family of adaptive filter algorithms with decorrelating
properties,” IEEE Trans. Signal Processing, vol. 46, pp. 771–775, Mar.
1998.
[7] S. G. Sankaran and A. A. (Louis) Beex, “Normalized LMS algorithm
with orthogonal correction factors,” in Proc. 31st Annu. Asilomar Conf.
Signals, Syst., Comput., Pacific Grove, CA, Nov. 1997, pp. 1670–1673.
[8] D. R. Morgan and S. G. Kratzer, “On a class of computationally efficient, rapidly converging, generalized NLMS algorithms,” IEEE Signal
Processing Lett., vol. 3, pp. 245–247, Aug. 1996.
[9] S. G. Sankaran and A. A. (Louis) Beex, “Convergence behavior of
affine projection algorithms,” IEEE Trans. Signal Processing, vol. 48,
pp. 1086–1096, Apr. 2000.
[10] J. Apolinário, Jr., M. L. R. Campos, and P. S. R. Diniz, “Convergence analysis of the binormalized data-reusing LMS algorithm,” IEEE Trans. Signal Processing, vol. 48, pp. 3235–3242, Nov. 2000.
[11] N. J. Bershad, D. Linebarger, and S. McLaughlin, “A stochastic analysis of the affine projection algorithm for Gaussian autoregressive inputs,” in Proc. ICASSP, Salt Lake City, UT, 2001, pp. 3837–3840.
[12] A. H. Sayed and M. Rupp, “A time-domain feedback analysis of adaptive algorithms via the small gain theorem,” Proc. SPIE, vol. 2563, pp.
458–469, July 1995.
[13] M. Rupp and A. H. Sayed, “A time-domain feedback analysis of filtered-error adaptive gradient algorithms,” IEEE Trans. Signal Processing, vol. 44, pp. 1428–1439, June 1996.
[14] A. H. Sayed, Fundamentals of Adaptive Filtering. New York: Wiley,
2003.
[15] N. R. Yousef and A. H. Sayed, “A unified approach to the steady-state and tracking analyses of adaptive filters,” IEEE Trans. Signal Processing, vol. 49, pp. 314–324, Feb. 2001.
[16] ——, “Ability of adaptive filters to track carrier offsets and random channel nonstationarities,” IEEE Trans. Signal Processing, vol. 50, pp. 1533–1544, July 2002.
[17] T. Y. Al-Naffouri and A. H. Sayed, “Transient analysis of data-normalized adaptive filters,” IEEE Trans. Signal Processing, vol. 51, pp.
639–652, Mar. 2003.
[18] ——, “Transient analysis of adaptive filters with error nonlinearities,” IEEE Trans. Signal Processing, vol. 51, pp. 653–663, Mar. 2003.
[19] E. Eweda, “Comparison of RLS, LMS, and sign algorithms for tracking
randomly time-varying channels,” IEEE Trans. Signal Processing, vol.
42, pp. 2937–2944, Nov. 1994.
[20] A. Graham, Kronecker Products and Matrix Calculus With Applications. New York: Halsted, 1981.
Hyun-Chool Shin was born in Seoul, Korea, in
1974. He received the B.Sc. and M.Sc. degrees in
electronic and electrical engineering from Pohang
University of Science and Technology (POSTECH),
Pohang, Korea, in 1997 and 1999, respectively.
Since 1997, he has been a Research Assistant
with the Department of Electronic and Electrical
Engineering, POSTECH, where he is currently
pursuing the Ph.D. degree.
His research interests include adaptive filter
theory and methods applied to channel equalization
and identification.
Ali H. Sayed (F’01) received the Ph.D. degree
in electrical engineering in 1992 from Stanford
University, Stanford, CA.
He is currently Professor and Vice Chair of
electrical engineering at the University of California,
Los Angeles. He is also the Principal Investigator of the UCLA Adaptive Systems Laboratory
(www.ee.ucla.edu/asl). He has over 190 journal and
conference publications, is the author of the textbook
Fundamentals of Adaptive Filtering (New York:
Wiley, 2003), is coauthor of the research monograph
Indefinite Quadratic Estimation and Control (Philadelphia, PA: SIAM, 1999)
and of the graduate-level textbook Linear Estimation (Englewood Cliffs,
NJ: Prentice-Hall, 2000). He is also co-editor of the volume Fast Reliable
Algorithms for Matrices with Structure (Philadelphia, PA: SIAM, 1999). He
is a member of the editorial boards of the SIAM Journal on Matrix Analysis
and Its Applications and the International Journal of Adaptive Control and
Signal Processing and has served as coeditor of special issues of the journal
Linear Algebra and Its Applications. He has contributed several articles to
engineering and mathematical encyclopedias and handbooks and has served
on the program committees of several international meetings. He has also
consulted with industry in the areas of adaptive filtering, adaptive equalization,
and echo cancellation. His research interests span several areas, including
adaptive and statistical signal processing, filtering and estimation theories,
signal processing for communications, interplays between signal processing
and control methodologies, system theory, and fast algorithms for large-scale
problems.
Dr. Sayed is the recipient of the 1996 IEEE Donald G. Fink Award and of a 2002 Best Paper Award from the IEEE Signal Processing Society in the area of Signal Processing Theory and Methods, and he is coauthor of two papers that received Best Student Paper awards at international meetings. He is also a member of the technical committees on
at international meetings. He is also a member of the technical committees on
Signal Processing Theory and Methods (SPTM) and on Signal Processing for
Communications (SPCOM), both of the IEEE Signal Processing Society. He is
a member of the editorial board of the IEEE SIGNAL PROCESSING MAGAZINE.
He has also served twice as Associate Editor of the IEEE TRANSACTIONS ON
SIGNAL PROCESSING, of which he is now serving as Editor-in-Chief.