IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 39, NO. 2, FEBRUARY 1991
in a variety of white additive input noise distributions, namely, for
uniform, Gaussian, and mixed noise. The mixed noise that has been
used in the simulations consists of zero mean unit variance white
Gaussian noise and 10% impulsive noise taking values +10 and
-10. Such an input noisy signal is shown in Fig. 2(a). The size of
the signal vector is 256 samples in all cases. The optimal filter for
the mixed Gaussian and impulsive noise is not known exactly.
However, it is close to the median filter, as has been found by
using (5) and by evaluating R numerically. The adaptive L-filter
has been chosen to have n = 5 coefficients. In all experiments the
initial filter coefficients have been chosen as $a_i = 1/5$, $i = 1, \ldots, 5$,
so that the initial L-filter is equivalent to the moving average
filter. The value of the step size chosen for the LMS L-filter in the
case of the mixed noise is p =
This step is smaller than that
given by (20) and guarantees stability. The update of the filter coefficients was done according to (15). However, since the noise distribution is symmetric about the mean of the distribution, the
unbiasedness condition [11]
$$\sum_{i=1}^{n} a_i = 1$$
has been enforced at each step of the algorithm by scaling the coefficients obtained by (15). The output of the adaptive LMS L-filter
is shown in Fig. 2(b). It is clearly seen that its performance is very
good after a short adaptation period. The plot of the filter coefficients $a_1$, $a_3$, $a_5$, corresponding to the minimum, median, and maximum, is shown in Fig. 3(a). As expected, the filter
coefficients $a_1$, $a_5$ decrease and $a_3$ increases. This fact indicates
that the adaptive L-filter tends to the median in this case. The coefficient estimation error $J(a, i)$ is defined as follows:
(39)
where $a_j^{\mathrm{opt}}$, $j = 1, \ldots, n$, is the optimal set of coefficients. This
error function is plotted in Fig. 3(b). It clearly decreases with time.
However, the convergence speed of the LMS algorithm is relatively slow. The performance of the adaptive RLS L-filter in the
presence of mixed impulsive and Gaussian noise is shown in Fig.
3(b), (c). By observing Fig. 3(b) it is seen that it has much faster
convergence than the LMS filter. It also converges to the median
filter, as is indicated in Fig. 3(c), because the coefficients $a_1$, $a_5$
tend to zero and the coefficient $a_3$ tends to 1.
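The experiment described above can be sketched in a few lines. This is only an illustration under stated assumptions: the update (15) is not reproduced in this excerpt, so the standard LMS form (coefficients plus step size times error times the ordered window) is assumed, and the function name `lms_l_filter` and its parameters are hypothetical. The sketch starts from the moving average ($a_i = 1/n$) and rescales the coefficients after every step so that they sum to one.

```python
def lms_l_filter(noisy, desired, n=5, mu=1e-4):
    """Adaptive L-filter sketch: LMS-style update on the ORDERED window.

    noisy   -- observed samples
    desired -- reference samples used to form the error (an assumption here)
    n       -- odd window length (n = 5 in the text)
    mu      -- step size (the value used in the paper is not legible)
    """
    half = n // 2
    a = [1.0 / n] * n                      # initial filter = moving average
    outputs = []
    for k in range(half, len(noisy) - half):
        ordered = sorted(noisy[k - half:k + half + 1])   # order statistics
        y = sum(ai * xi for ai, xi in zip(a, ordered))   # L-filter output
        err = desired[k] - y
        # assumed LMS form of the coefficient update
        a = [ai + mu * err * xi for ai, xi in zip(a, ordered)]
        # enforce the unbiasedness condition sum(a) = 1 by scaling
        total = sum(a)
        a = [ai / total for ai in a]
        outputs.append(y)
    return a, outputs
```

With `mu=0.0` the coefficients never move, so the output is exactly the 5-point moving average of the sorted (equivalently, unsorted) window, which is a convenient sanity check.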
Both RLS and LMS adaptive L-filters have also been tested for
the cases of white additive uniform and Gaussian noise. As expected, they converge to the midpoint filter and to the arithmetic
mean, respectively.
An open question which is still under investigation is the theoretical
study of the convergence properties of the algorithms that have
been presented in this correspondence. We feel that tighter bounds
on the convergence rate can be imposed.
REFERENCES
[1] T. Alexander, Adaptive Signal Processing. Berlin: Springer, 1986.
[2] M. Bellanger, Adaptive Digital Filters and Signal Analysis. Marcel Dekker, 1987.
[3] J. D. Proakis and D. G. Manolakis, Introduction to Digital Signal Processing. New York: Macmillan, 1988.
[4] G. L. Sicuranza and G. Ramponi, “Adaptive nonlinear digital filters using distributed arithmetic,” IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-34, no. 3, pp. 518-526, June 1986.
[5] J. C. Stapleton and S. C. Bass, “Adaptive noise cancellation for a class of nonlinear dynamic reference signals,” in Proc. Int. Symp. Circuits Syst., 1984, pp. 268-271.
[6] I. Pitas and A. N. Venetsanopoulos, “Nonlinear order statistic filters for image filtering and edge detection,” Signal Processing, vol. 10, pp. 395-413, 1986.
[7] R. Bernstein, “Adaptive nonlinear filters for simultaneous removal of different kinds of noise in images,” IEEE Trans. Circuits Syst., vol. CAS-34, no. 11, pp. 1275-1291, Nov. 1987.
[8] X. Z. Sun and A. N. Venetsanopoulos, “Adaptive schemes for noise filtering and edge detection by use of local statistics,” IEEE Trans. Circuits Syst., vol. CAS-35, no. 1, pp. 57-69, Jan. 1988.
[9] H. A. David, Order Statistics. New York: Wiley, 1981.
[10] I. Pitas and A. N. Venetsanopoulos, Nonlinear Digital Filters: Principles and Applications. Kluwer Academic, 1990.
[11] A. C. Bovik, T. S. Huang, and D. C. Munson, “A generalization of median filtering using combinations of order statistics,” IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-31, no. 6, pp. 1342-1349, Dec. 1983.
[12] A. S. Householder, The Theory of Matrices in Numerical Analysis. Waltham, MA: Blaisdell, 1964.
Blind Equalization of Digital Communication
Channels Using High-Order Moments
Boaz Porat and Benjamin Friedlander
Abstract-This correspondence describes new algorithms for blind
equalization of digital communication channels of the QAM type. The
algorithms use the fourth-order statistical moments of the symbol sequence to explicitly estimate the channel impulse response. The estimated impulse response is used, in turn, to construct a linear mean-square error equalizer.
V. CONCLUSIONS
A nonlinear adaptive L-filter is presented that can easily adapt to
the noise probability distributions. It can be used for the filtering
of both long-tailed and short-tailed distributions. It performs well
in the cases where linear adaptive filters fail, e.g., in the case of
impulsive noise. Two different algorithms to update the filter coefficients have been presented. They have a close resemblance to the
LMS and RLS adaptive algorithms used in the adaptive linear FIR
filters. The LMS algorithm can be easily implemented and has low
computational complexity, but it converges slowly to the optimum.
Faster convergence rates can be obtained by using the RLS
adaptation algorithm. However, this algorithm has a much higher
computational complexity. This fact restricts its use in real-time
applications and in image processing. In general, the adaptive
L-filters have definite advantages over their linear counterparts. The
extra computational complexity needed involves only the calculation of running ordering. However, this computational load is not
large if special running ordering algorithms or structures are used.
I. INTRODUCTION
The most common approach to adaptive equalization of digital
communication channels is by using known training sequences. The
equalizer is typically a linear transversal (FIR) filter, the coefficients of which are adjusted to minimize the mean-square error.
After the initial training, the equalizer is normally switched to a
decision-directed mode, where the detected symbols are considered
Manuscript received May 7, 1989; revised April 16, 1990. This work
was supported by the National Science Foundation under Grant ISI-87600
95 and by the Army Research Office under Contract DAAL 03-89-C-0007.
B. Porat is with the Department of Electrical Engineering, Technion-Israel Institute of Technology, Haifa 32000, Israel.
B. Friedlander is with Signal Processing Technology, Ltd., Palo Alto,
CA 94303.
IEEE Log Number 9041114.
1053-587X/91/0200-0522$01.00 © 1991 IEEE
as the true ones, and the error is used to adjust the equalizer gains
as in the training mode.
The situation where an initial training sequence is not available at
the receiver is known as the blind equalization problem. In cases
where the intersymbol interference and the noise are sufficiently
small so that “the eye is open,” decision-directed equalization is
possible even without an initial training sequence. However, in the
presence of severe intersymbol interference (“when the eye is
closed”), other approaches are necessary.
Most of the existing works on blind equalization use nonquadratic cost functions, and recursively minimize these functions with
respect to the equalizer’s parameters. Typically, the adaptation is
done by gradient-type algorithms, e.g., the LMS or its numerous
variations. Some important works on the blind equalization problem are [1]-[8].
In this correspondence we propose two algorithms for blind
channel equalization for general QAM signals. These algorithms
are based on the fourth-order statistical moments of the received
data sequence. The first of the two is the linear least squares type
algorithm, which uses ideas put forward by Giannakis and Mendel
[9]. The second algorithm is the nonlinear least squares type, and
is based on the previous works of the present authors, mainly [10]-[12].
channel parameters $\{h_k\}$ from the output sequence $\{y_t, 1 \le t \le T\}$. The estimated parameters are then used to reconstruct (detect)
the symbol stream $\{u_t\}$ by means of a linear equalizer.
III. LINEAR LEAST SQUARES ESTIMATION ALGORITHM
In this section we derive a simple linear least squares algorithm
for estimating the channel response $\{h_k\}$ from the second- and
fourth-order moments of the output sequence. The algorithm is
based on the approach of Giannakis and Mendel [9], with proper
modifications necessary for the complex noncausal model at hand.
The algorithm is not optimal in any sense, but its performance can
be quite satisfactory in cases where the intersymbol interference is
not severe, or when the time variation of the channel is very slow,
so that a large number of data points can be accumulated for a
single estimate of the channel parameters. This algorithm is also
essential to the nonlinear least squares algorithm described in the
next section, where it is used for initialization purposes.
Let us denote
$$r(m) = E\{x_t^* x_{t+m}\}; \quad d(m) = E\{x_t^* x_{t+m}^* x_{t+m}^2\}; \quad c(m) = d(m) - 2r(0)r(m).$$
(4)
The $\{r(m)\}$ are the covariances of the process. The $\{d(m)\}$ are
the so-called diagonal fourth-order moments, while the $\{c(m)\}$
are the corresponding diagonal fourth-order cumulants.
It is easy to show, using standard techniques, that $\{r(m), c(m)\}$
are given by the following expressions:
II. PROBLEM FORMULATION
The communication signals considered in this paper are of the
quadrature amplitude modulation (QAM) type. A QAM symbol can
be described as a complex number, belonging to a discrete set in
the complex plane, called the symbol constellation.
Let $\{u_t\}$ denote the symbol stream. The $u_t$ are assumed to be
independent identically distributed random variables. Each $u_t$ can
take one of M possible values with probability 1/M. The symbol
constellation is assumed to possess sufficient symmetry such that
all odd-ordered moments are zero. The even-order moments will
be denoted by
The communication channel is assumed to be linear. In reality,
it is usually time varying. However, we make the common assumption that the time variation is sufficiently slow compared to
the rate at which the channel parameters are to be estimated. In
other words, the channel is assumed to be time-invariant during the
observation interval.
The equivalent discrete-time complex impulse response of the
channel is denoted by $\{h_k\}$. This includes the transmit and receive
filters, the channel itself, and the sampler (one sample per symbol
is assumed). Thus, in the absence of noise, the output symbol sequence is given by
$$x_t = \sum_{k} h_k u_{t-k}.$$
(2)
$$r(m) = \gamma_{1,1} \sum_k h_k^* h_{k+m}; \quad c(m) = \mu_{2,2} \sum_k h_k^* h_{k+m}^* h_{k+m}^2$$
(5)
where $\mu_{2,2} = \gamma_{2,2} - 2\gamma_{1,1}^2$.
Let $C(z)$ be the z transform of $\{c(m)\}$, i.e.,
$$C(z) = \mu_{2,2} \sum_m \sum_k h_k^* h_{k+m}^* h_{k+m}^2 z^{-m} = \mu_{2,2} H^*(z^{-1}) H_3(z)$$
(6)
where $H(z)$ is the transfer function of the channel, and $H_3(z)$ is
the z transform of $\{h_l^* h_l^2; -q_1 \le l \le q_2\}$.
Let $R(z)$ be the z transform of $\{r(m)\}$, i.e.,
$$R(z) = \gamma_{1,1} \sum_m \sum_k h_k^* h_{k+m} z^{-m} = \gamma_{1,1} H^*(z^{-1}) H(z).$$
(7)
We get from (6) and (7)
$$C(z)H(z) + \varepsilon H_3(z)R(z) = 0$$
(8)
where $\varepsilon = -\mu_{2,2}/\gamma_{1,1}$.
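The step from (6) and (7) to (8) can be written out explicitly. The sketch below assumes that (6) has the form $C(z) = \mu_{2,2} H^*(z^{-1}) H_3(z)$, which is what taking the z transform of the cumulant expression for $c(m)$ gives when $H_3(z)$ is the z transform of $\{h_l^* h_l^2\}$; the symbols are those of the surrounding text.

```latex
\begin{align*}
C(z)\,H(z) &= \mu_{2,2}\, H^*(z^{-1})\, H_3(z)\, H(z) \\
           &= \frac{\mu_{2,2}}{\gamma_{1,1}}\, H_3(z)\,
              \bigl[\gamma_{1,1}\, H^*(z^{-1})\, H(z)\bigr] \\
           &= \frac{\mu_{2,2}}{\gamma_{1,1}}\, H_3(z)\, R(z)
            = -\varepsilon\, H_3(z)\, R(z),
\qquad \varepsilon = -\frac{\mu_{2,2}}{\gamma_{1,1}} .
\end{align*}
```

Moving the right-hand side to the left gives $C(z)H(z) + \varepsilon H_3(z)R(z) = 0$. The common factor $H^*(z^{-1})$ is what allows the unknown channel to be isolated linearly.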
Since the impulse response $\{h_k\}$ is finite, we can write (8) explicitly as
$$\left(\sum_{m=-q}^{q} c(m) z^{-m}\right)\left(\sum_{k=-q_1}^{q_2} h_k z^{-k}\right) + \varepsilon \left(\sum_{m=-q}^{q} r(m) z^{-m}\right)\left(\sum_{k=-q_1}^{q_2} h_k^* h_k^2 z^{-k}\right) = 0.$$
(9)
The channel impulse response is infinite in general. However, as
commonly seen in the communication literature, we approximate
it by a finite impulse response. We also assume an additive complex Gaussian white noise with variance $\sigma_v^2$, so the actual sampler's
output is
$$y_t = \sum_{k=-q_1}^{q_2} h_k u_{t-k} + v_t.$$
(3)
The order of the impulse response will be denoted by $q = q_1 + q_2$.
Noncausality is introduced into the model in order to enable us to
assume that the impulse response is effectively centered at $h_0$, i.e.,
that the ideal response is $x_t = u_t$. Alternatively, we could have used
a causal model and assumed that the impulse response is centered
at some $h_L$ ($L > 0$). There is no loss of generality in assuming that
$h_0 = 1$, and scaling the symbol sequence $\{u_t\}$ accordingly. This
scaling can be complex in general, i.e., it can include both amplitude scaling and phase shift.
The problem we address in this paper is the estimation of the
Let us define
$$q_k = \varepsilon h_k^* h_k^2; \quad -q_1 \le k \le q_2.$$
(10)
We treat the $\{q_k\}$ as free parameters, even though in reality they
depend on the impulse response parameters $\{h_k\}$. Using our assumption that $h_0 = 1$, we can rewrite (9) in the following form:
$$\sum_{\substack{k=-q_1 \\ k \ne 0}}^{q_2} h_k c(l-k) + \sum_{k=-q_1}^{q_2} q_k r(l-k) = -c(l); \quad -(2q_1 + q_2) \le l \le (2q_2 + q_1).$$
(11)
This is an overdetermined set of equations, with $3q + 1$ equations
and $2q + 1$ unknowns (regarding the $q_k$ as independent of the $h_k$).
The coefficient matrix and the right-hand side vector of this set
depend on the moments $\{r(m), 0 \le m \le q\}$ (since $r(-m) = r^*(m)$)
and $\{c(m), -q \le m \le q\}$. If $\{c(m), r(m)\}$ were
known, it could be solved exactly. The channel estimation algorithm uses, instead, estimates of these quantities, computed from
the measurements as follows:
channel parameters $\theta$ is very difficult to express analytically.
Therefore, estimation based on global minimization of (17) cannot
be regarded as a practical algorithm.
Suppose, however, that we can find a consistent estimate of
$\Sigma(\theta)$, computed directly from the measurements $\{y_t, 0 \le t \le T\}$,
and denote this estimate by $\hat{\Sigma}$. Let $\hat{\theta}$ be the global minimizer of the
cost function
$$\psi(\theta) = [s(\theta) - \hat{s}]^H \hat{\Sigma}^{-1} [s(\theta) - \hat{s}].$$
(18)
In [12] it was shown that, subject to some regularity conditions
(formally defined in [12]), this estimate is also asymptotically minimum variance, i.e., it achieves the same asymptotic variance as
the global minimizer of (17). Thus, nonlinear minimization of (18)
can lead to a practical estimation algorithm, provided we can find a
consistent estimate, $\hat{\Sigma}$, of $\Sigma(\theta)$.
Similarly to [12], the entries of the matrix $\hat{\Sigma}$ can be computed as
follows:
$$T \cdot \operatorname{cov}\{\hat{r}(k), \hat{r}^*(l)\}, \qquad T \cdot \operatorname{cov}\{\hat{r}(k), \hat{d}^*(l)\}$$
When the sample moments $\{\hat{r}(m), \hat{c}(m)\}$ are substituted in
(11), we obtain a set of equations of the form
$$Ax = -b.$$
(13)
The individual entries of the matrix $A$ and the vectors $x$ and $b$ are
obvious from (11). This set is solved in the least squares sense,
i.e.,
$$x = -(A^H A)^{-1} A^H b.$$
(14)
The estimates $\{\hat{h}_k, \hat{q}_k\}$ are finally extracted from the components
of $x$.
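The linear least squares step can be illustrated with exact moments, for which the text notes that the set can be solved exactly. The sketch below is an illustration under stated assumptions, not the authors' implementation: the row layout is the one implied by equating the coefficient of $z^{-l}$ in (9) with $h_0 = 1$, the default values $\gamma_{1,1} = 1$ and $\mu_{2,2} = -1$ are those of a unit-modulus quaternary constellation, and the helper names `solve_linear`, `moments_from_channel`, and `estimate_channel` are hypothetical. In practice the sample moments would replace the exact ones.

```python
def solve_linear(M, v):
    """Solve the square complex system M z = v (Gaussian elimination, partial pivoting)."""
    n = len(v)
    aug = [list(M[i]) + [v[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    z = [0j] * n
    for i in reversed(range(n)):
        s = aug[i][n] - sum(aug[i][j] * z[j] for j in range(i + 1, n))
        z[i] = s / aug[i][i]
    return z


def moments_from_channel(h, gamma11=1.0, mu22=-1.0):
    """Exact r(m), c(m) for taps h (list index 0 holds h_{-q1}); keys m = -q..q."""
    h = [complex(x) for x in h]
    q = len(h) - 1
    g = [x.conjugate() * x * x for x in h]          # h_k^* h_k^2

    def corr(a, b, m):                              # sum_k conj(a_k) b_{k+m}
        return sum(a[i].conjugate() * b[i + m]
                   for i in range(len(a)) if 0 <= i + m < len(b))

    r = {m: gamma11 * corr(h, h, m) for m in range(-q, q + 1)}
    c = {m: mu22 * corr(h, g, m) for m in range(-q, q + 1)}
    return r, c


def estimate_channel(r, c, q1, q2):
    """Least squares solution x = -(A^H A)^{-1} A^H b; returns taps h_{-q1..q2}."""
    q = q1 + q2
    ks = list(range(-q1, q2 + 1))
    h_idx = [k for k in ks if k != 0]               # h_0 = 1 is not an unknown
    rows, rhs = [], []
    for l in range(-(2 * q1 + q2), 2 * q2 + q1 + 1):        # 3q + 1 equations
        rows.append([c.get(l - k, 0j) for k in h_idx] +
                    [r.get(l - k, 0j) for k in ks])         # 2q + 1 unknowns
        rhs.append(c.get(l, 0j))
    n = len(rows[0])
    AhA = [[sum(row[i].conjugate() * row[j] for row in rows) for j in range(n)]
           for i in range(n)]
    Ahb = [-sum(row[i].conjugate() * b for row, b in zip(rows, rhs))
           for i in range(n)]
    x = solve_linear(AhA, Ahb)
    return x[:q1] + [1 + 0j] + x[q1:q]
```

For example, `estimate_channel(*moments_from_channel([0.3 + 0.1j, 1, -0.2 + 0.4j]), 1, 1)` should return the three taps again, up to rounding. Solving the normal equations directly is the simplest choice for a sketch; a QR-based solver would be numerically preferable for longer channels.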
The values of $\gamma_{1,1}$, $\mu_{2,2}$ can be estimated, if desired, by
IV. NONLINEAR LEAST SQUARES ESTIMATION ALGORITHM
In this section we describe a nonlinear least squares algorithm
that is asymptotically minimum variance in a sense defined below.
The algorithm is based on ideas developed in [12]. Because of its
improved accuracy, the algorithm is useful in cases where the
channel has severe intersymbol interference, and/or the number of
data points available for a single estimate is relatively small. On
the other hand, the nonlinear algorithm requires considerably more
computations than the linear one, which may limit its use in some
real-time applications.
Let $\theta$ denote the vector of unknown parameters: the channel impulse response $\{h_k\}$ and possibly the moments of the symbol constellation $\gamma_{1,1}$, $\mu_{2,2}$. Let us denote by $s$ a vector consisting of some
subset of the moments $\{r(m); 0 \le m \le q\}$ and $\{d(m); -q \le
m \le q\}$. Similarly, we will denote by $\hat{s}$ the corresponding vector
of estimated moments, defined in (12a)-(12c). The dimension of $s$
is required to be equal to or larger than the dimension of $\theta$. In the
extreme case, $s$ consists of the entire set of moments, and then the
entries of $\hat{s}$ are exactly the moments used in the linear least squares
algorithm described in Section III.
Let $\Sigma(\theta)$ be the asymptotic normalized covariance matrix of
the vector $\hat{s}$, i.e.,
$$\Sigma(\theta) = \lim_{T \to \infty} T \cdot E\{(\hat{s} - s)(\hat{s} - s)^H\}$$
(16)
where $(\cdot)^H$ denotes complex conjugate transposition.
Let $V(\theta)$ be the nonlinear cost function
$$V(\theta) = [s(\theta) - \hat{s}]^H \Sigma^{-1}(\theta) [s(\theta) - \hat{s}].$$
(17)
In [10] it was shown that the global minimizer of $V(\theta)$ is an asymptotically minimum variance estimate of $\theta$ in the class of estimates
which are "well behaved" functions of $\hat{s}$ (in a sense formally defined in [10]). However, the dependence of the matrix $\Sigma(\theta)$ on the
where
$$i_1 = \min(0, k) - \max(0, l) - q, \qquad i_2 = \max(0, k) - \min(0, l) + q.$$
(19d)
The nonlinear least squares estimation algorithm can now be described as follows.
i) Compute the estimated moments from (12a)-(12c).
ii) Compute the estimated covariances of the estimated moments
from (19a)-(19c) and use them to construct the matrix $\hat{\Sigma}$.
iii) Run the linear least squares algorithm described in Section
III and obtain the estimates $\{\hat{h}_k, \hat{\gamma}_{1,1}, \hat{\mu}_{2,2}\}$. Build the vector $\hat{\theta}_0$
from these estimates. $\hat{\theta}_0$ serves as an initial condition to the nonlinear minimization that follows.
iv) Use some nonlinear minimization procedure to minimize the
cost function $\psi(\theta)$ with respect to $\theta$, starting with the above initial
condition.
V. THE EQUALIZER
Equalization schemes can be broadly classified into two categories: linear equalizers, and decision-directed equalizers [13].
Blind equalization is characterized by relatively inaccurate estimation of the channel (compared to equalization based on a training
sequence), especially during the initial phase. Under such circumstances, decision-directed equalization tends to yield relatively high
error rates, and may even lead to divergence. For this reason, we
have chosen to test the channel estimators described in the previous
sections with a linear equalizer. Of course, after initial convergence (when the "eye" becomes sufficiently open), it may be desirable to switch to decision-directed equalization. Switching to
decision-directed mode is considered "safe" when the intersymbol
Fig. 1. Comparison between the experimental and the theoretical ISI.
Fig. 2. The symbol constellation before equalization (horizontal axis: real part).
interference (ISI) gets below a certain threshold.¹ This threshold is
in the range of -15 dB [8] for quaternary QAM signals, and may
be lower for multilevel QAM signals.
The equalizer is chosen as a linear transversal (FIR) filter whose
coefficients are computed so as to minimize the mean-square error
(MSE). Let $\{g_k; -K \le k \le K\}$ denote the impulse response of
the equalizer corresponding to the estimated channel response $\{\hat{h}_k\}$.
We denote the column vector of the $\{g_k\}$ by $g$.
Let $e$ be a column vector of dimension $2K + q + 1$ with 1 in
the $(K + q_1 + 1)$th position and zeros elsewhere. Let $P$ be the $(2K + q + 1) \times (2K + 1)$ matrix whose $(i, j)$th entry is $\hat{h}_{i-j-q_1}$ (or
zero, if the subscript $i - j - q_1$ is outside the range $[-q_1, q_2]$).
It is well known [13, pp. 205-217] that the minimum mean-square
error equalizer $g$ is given by
Fig. 3. The symbol constellation after equalization (horizontal axis: real part).
VI. AN EXAMPLE
In this section we illustrate the performance of the blind equalization method proposed in this paper by an example. The channel
impulse response was taken as { 2 - 0.4j, 1.5 + 1.8j, 1 , 1.2 1.3j, 0.8
1.6j }. This channel has an initial ISI of +12 dB, i.e.,
it represents a case of extremely severe distortion. The signal was
quaternary QAM, and the SNR was 40 dB. The length of the equalizer was taken as 65.
Fig. 1 shows the mean residual ISI (after equalization), obtained
from 100 Monte Carlo runs, for ten values of data length, up to
20 000 (marked by the circles in the graph). Also shown in this
figure is the corresponding theoretical ISI (the computation of which
is not discussed in this correspondence). As we see, the actual performance matches the theoretical one very well.
Figs. 2 and 3 show the received and the equalized symbol constellations, respectively, of a single simulation. Here we used
15 000 data points to estimate the channel, of which 2000 are shown
in the figures. It is interesting to observe the apparent rotation of
the reconstructed constellation, due to the imperfect estimation of
the channel. Fig. 4 shows the frequency responses of the channel,
before and after equalization. As we see, the response of the equalized channel is almost flat, except for the notch at frequency 0.47.
VII. CONCLUSIONS
We have presented two methods for blind equalization of QAM
channels, based on the fourth-order moments computed from the
¹The ISI is defined as the sum of square magnitudes of all nonzero terms
of the total impulse response (channel plus equalizer), divided by the square
magnitude of the zero term.
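The ISI definition in this footnote is easy to check numerically. The sketch below is illustrative only; the function names `convolve` and `isi_db` and the `center` argument (the list position of the zero-lag term) are hypothetical. Before equalization the total response is the channel itself. The signs of the last two taps of the example channel are not legible in this copy, so the signs used in the comment below are an arbitrary assumption; the ISI depends only on the tap magnitudes, so the result is unaffected.

```python
import math


def convolve(a, b):
    """Total impulse response of two cascaded FIR responses (channel, equalizer)."""
    out = [0j] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def isi_db(total_response, center):
    """ISI in dB: energy of all terms except the zero-lag term, over the zero-lag energy.

    Raises ValueError from math.log10 if there is no residual energy at all.
    """
    main = abs(total_response[center]) ** 2
    rest = sum(abs(t) ** 2 for i, t in enumerate(total_response) if i != center)
    return 10.0 * math.log10(rest / main)


# Example channel; the signs of the last two taps are ASSUMED (illegible in the source).
channel = [2 - 0.4j, 1.5 + 1.8j, 1 + 0j, 1.2 - 1.3j, 0.8 + 1.6j]
initial_isi = isi_db(channel, center=2)   # roughly +12 dB, as stated in the text
```

After equalization, the same function would be applied to `convolve(channel, equalizer)` with `center` set to the position of the combined zero-lag term.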
Fig. 4. The frequency responses of the channel before and after equalization (horizontal axis: frequency; one panel labeled PHASE).
received symbol stream. Both methods estimate the channel parameters explicitly, and use the estimated parameters to construct a
linear MSE equalizer.
The first equalizer is based on the least squares solution of a
linear set of equations. The performance of this equalizer is not
optimal in any sense, but it is adequate for channels with mild intersymbol interference, or when the number of data points available for estimating the channel response is very large. Another
application of the linear least squares algorithm is to provide initial
estimates to the second, nonlinear least squares algorithm.
The second equalizer is based on nonlinear minimization of a
certain cost function, which takes into account the second-order
statistical properties of the estimated moments. This algorithm is
asymptotically minimum variance in a sense defined in Section IV.
Its convergence rate is very fast, but its computational complexity
is considerably higher than that of the first equalizer.
ing, based on energy. As such, it is an almost immediately usable,
reasonably robust, and easily implemented algorithm. However,
the necessary parameter setting and adjusting detracts from its ease
of implementation. This correspondence will suggest an improved
method of determining the probability density function of the energy in the recording, from which energy level based threshold parameters can be automatically determined. The second part will
clarify and correct the flowcharts presented in the original paper.
The notation will be that of Lamel’s flowchart.
REFERENCES
[1] Y. Sato, “A method of self-recovering equalization for multilevel amplitude-modulation systems,” IEEE Trans. Commun., vol. COM-23, no. 6, pp. 679-682, June 1975.
[2] A. Benveniste, M. Goursat, and G. Ruget, “Robust identification of a nonminimum phase system: Blind adjustment of a linear equalizer in data communications,” IEEE Trans. Automat. Contr., vol. AC-25, no. 3, pp. 385-399, June 1980.
[3] A. Benveniste and M. Goursat, “Blind equalizers,” IEEE Trans. Commun., vol. COM-32, no. 8, pp. 871-883, Aug. 1984.
[4] D. N. Godard, “Self-recovering equalization and carrier tracking in two-dimensional data communication systems,” IEEE Trans. Commun., vol. COM-28, no. 11, pp. 1867-1875, Nov. 1980.
[5] G. B. Foschini, “Equalizing without altering or detecting data,” AT&T Tech. J., vol. 64, no. 8, pp. 1885-1911, Oct. 1985.
[6] J. R. Treichler and B. G. Agee, “A new approach to multipath correction of constant modulus signals,” IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-31, no. 2, pp. 459-472, Apr. 1983.
[7] J. R. Treichler and M. G. Larimore, “New processing techniques based on the constant-modulus adaptive algorithm,” IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-33, no. 2, pp. 420-431, Apr. 1985.
[8] Z. Pritzker and A. Feuer, “The variable length stochastic gradient algorithm,” submitted for publication.
[9] G. B. Giannakis and J. M. Mendel, “Identification of nonminimum phase systems using higher order statistics,” IEEE Trans. Acoust., Speech, Signal Processing, vol. 37, no. 3, pp. 360-377, Mar. 1989.
[10] B. Porat and B. Friedlander, “Performance analysis of parameter estimation based on high-order moments,” J. Adaptive Contr. Signal Processing, vol. 3, pp. 191-229, 1989.
[11] B. Friedlander and B. Porat, “Adaptive IIR filtering based on high-order statistics,” IEEE Trans. Acoust., Speech, Signal Processing, vol. 37, no. 4, pp. 485-495, Apr. 1989.
[12] B. Friedlander and B. Porat, “Asymptotically optimal estimation of MA and ARMA parameters of non-Gaussian processes from high-order moments,” IEEE Trans. Automat. Contr., vol. 35, no. 1, pp. 27-35, Jan. 1990.
[13] A. P. Clark, Equalizers for Digital Modems. London: Pentech, 1985.
II. AUTOMATIC DETERMINATION OF THRESHOLDS
The level-based parameters depend on the ambient noise and can
be automatically set by examining the energy level distribution in
a single recording. For recordings with noise quieter than 35 dB
below the speech, the energy level thresholds $K_1$ through $K_4$, which
are required for the algorithm in Lamel et al.'s paper to run, can
be determined from the distribution of energy. Lamel suggests generating a histogram to estimate the noise level, but a procedure
given in [1] can also estimate the voicing signal level and is simpler, faster, and more precise. The procedure is as follows. First,
sort the frames' energies so that $R(i) < R(i + 1)$. Then, the
probability density of the energy $R$ between $R(i)$ and $R(i + J)$ is
inversely proportional to $R(i + J) - R(i)$, i.e.,
$$\mathrm{PDF}[i + J/2] = \frac{\text{constant}}{R(i + J) - R(i)}$$
(1)
where PDF is an estimate of the probability density of the energy
in the entire recording. The interval J should be chosen such that
the resulting distribution shows 2 modes-one for noise, and one
for voiced speech. A reasonable value for J is the number of frames
corresponding to 0.25 s.
From the distribution, Lamel’s K,, the “averaged” background
noise level, can be estimated as the mode (the local maximum of
PDF from (1)) of the distribution among the lowest 10 dB, and the
typical voicing level, U of Table I, can be assumed to be the mode
of the highest 15 dB. The rest of the K thresholds can be estimated
as shown in Table I. Care should be taken to ensure that K2 is
greater than K3. This is no problem if the signal-to-noise ratio of
the recording is over 30 dB.
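The sorted-energy density estimate and the two mode searches described above can be sketched as follows. This is a minimal illustration under assumptions that are not in the original: energies are taken to be in dB, the constant is set to J/N so that the values approximate a density, "the lowest 10 dB" is read as the band from the minimum energy to 10 dB above it (and similarly for the highest 15 dB), and the function names `energy_pdf` and `mode_in_band` are hypothetical.

```python
def energy_pdf(energies_db, J):
    """Density estimate from sorted frame energies.

    Returns (energy_level, density) pairs, where the density for the span
    R(i)..R(i+J) is constant / (R(i+J) - R(i)) and is attached to R(i + J/2).
    """
    R = sorted(energies_db)
    n = len(R)
    const = J / n            # assumed normalisation: each span holds J of n frames
    points = []
    for i in range(n - J):
        width = R[i + J] - R[i]
        if width <= 0:
            continue         # tied energies give zero width; skipped in this sketch
        points.append((R[i + J // 2], const / width))
    return points


def mode_in_band(points, lo, hi):
    """Energy level with the largest estimated density inside [lo, hi]."""
    band = [p for p in points if lo <= p[0] <= hi]
    return max(band, key=lambda p: p[1])[0]


def noise_and_voicing_levels(energies_db, J):
    """Noise mode in the lowest 10 dB and voicing mode in the highest 15 dB."""
    pts = energy_pdf(energies_db, J)
    e_min, e_max = min(energies_db), max(energies_db)
    return mode_in_band(pts, e_min, e_min + 10.0), mode_in_band(pts, e_max - 15.0, e_max)
```

The text recommends choosing J as the number of frames in about 0.25 s; the remaining thresholds would then be derived from the two returned levels as in Table I, which is not reproduced here.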
Comments on “An Improved Endpoint Detector for
Isolated Word Recognition”
Ben Reaves
Abstract-Robust word boundary detection remains an unsolved
problem. This correspondence presents an automatic threshold setting
algorithm and corrects a paper on single utterance word boundary detection in a quiet environment, promising great improvement in the
accuracy of fully automatic word recognition and assistance in the hand
labeling of endpoints.
I. INTRODUCTION
In their paper,¹ Lamel et al. published a detailed flowchart for
determining the boundaries of a single utterance within a record-
Manuscript received October 5, 1989; revised April 30, 1990.
The author is employed by the Speech Technology Laboratory (a Division of Panasonic Technologies Inc.), Santa Barbara, CA, and is working
at Matsushita Electrical Industrial Company, Central Research Laboratory,
Moriguchi, 570, Japan.
IEEE Log Number 9041120.
¹L. F. Lamel, L. R. Rabiner, A. E. Rosenberg, and J. G. Wilpon, IEEE
Trans. Acoust., Speech, Signal Processing, vol. ASSP-29, pp. 777-785,
Aug. 1981.
III. CLARIFICATIONS AND CORRECTIONS
Because of some inconsistencies between the notation of the text
and that of the flowchart in the original paper, it was not clear how
to set some of the time-based parameters. Table II will be of assistance here.
The following corrections are necessary for obtaining the results
published in Lamel et al.’s paper.¹ They refer to Figs. 1 through
5.
1) In Fig. 1, on the path from the “YES” output of “K1 < E(l) ≤ K2,” a box labeled “CHECK 1,” with L and 1 as arguments, is inserted.
2) The “CHECK 1” subroutine, Fig. 2, has an “I = I - 1”
box inserted between the “YES” output of the “ 1 > L” decision
and the “2.”
3) The left side decision of Fig. 3 is j ‘ j - 1 = I. The prime
is removed from the corresponding figure in Lamel et al.’s work.
4) The lower “LEVEL CHECK” subroutine of Fig. 4 shows K + 1
as indices for PB and PE, not K as in Lamel et al.¹
5) Fig. 5 shows a “NO” decision from “a # 1 & b # L”
which was omitted in Lamel et al.
6) The ∅ in the first section, labeled “Pulse Reordering,” of
Fig. 5(b) of Lamel et al.¹ represents a zero. Only here was zero
represented by ∅; elsewhere it was represented by 0.