Notes
UW Summer 2006
Sequences
As mentioned, a sequence is an ordered list of numbers, such as x = {0, 1, 2, 0, 1, 2}. But sequences are usually indexed, which means that we associate the sequence with another set of equal length that contains sequential numbers, for example n = {-1, 0, 1, 2, 3, 4}.

Sequence notation

We use the following notation for a sequence: x[n] = {0, 1, 2, 0, 1, 2} for n = -1,...,4.

Displaying a sequence

We can graphically display this sequence as follows:
[Figure: stem plot of x[n] = {0, 1, 2, 0, 1, 2} for n = -1,...,4]
If we had defined the sequence on another index set, for example x = {0, 1, 2, 0, 1, 2} for n = -3,...,2,
SPHSC 503 Speech Signal Processing

then the sequence would be displayed as

[Figure: stem plot of x[n] = {0, 1, 2, 0, 1, 2} for n = -3,...,2]
Note that a sequence with the same values but with different indexes is a different sequence.

Sequence conventions

1. Specifying a sequence without specifying an index usually means the sequence starts at n = 0. For example, x[n] = {0, 1, 2, 0, 1, 2} means x[n] = {0, 1, 2, 0, 1, 2} for n = 0,...,5.
2. A sequence is assumed to be 0 outside the specified indices.

Some important sequences

Unit sample sequence, also known as the impulse sequence. Notation:
δ[n] = { 1, n = 0
       { 0, n ≠ 0

Graphical:

[Figure: stem plot of δ[n], a single sample of height 1 at n = 0]
Unit step sequence. Notation:

u[n] = { 1, n ≥ 0
       { 0, n < 0

Graphical:

[Figure: stem plot of u[n], samples of height 1 for all n ≥ 0]
Sinusoids

For example x[n] = sin(2πn/8):

[Figure: stem plot of sin(2πn/8) for n = -8,...,8]
Periodic sequences

Defined as x[n] = x[n + N] for all n and some integer N. For example x[n] = sin(2πn/8); see the figure above.
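The periodicity condition can be verified numerically. The notes use Matlab, but as an illustration, here is a hypothetical Python/NumPy sketch of the same check:

```python
import numpy as np

# Check that x[n] = sin(2*pi*n/8) satisfies x[n] = x[n + N] with N = 8.
n = np.arange(-8, 9)
x = np.sin(2 * np.pi * n / 8)
x_shifted = np.sin(2 * np.pi * (n + 8) / 8)

print(np.allclose(x, x_shifted))  # True: the sequence repeats with period 8
```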
Systems
We can represent a system in the following diagram:
x[n] (input) → T (system) → y[n] (output)
and the following notation: y[n] = T{x[n]}. In this representation, T can be anything. But if we don't put any further restrictions on T, then there's not much more we can say about the system. Therefore, we usually look at special kinds of systems in digital signal processing.
Special system: a linear system

Given a system T, y[n] = T{x[n]}:

x[n] → T → y[n]
If the input x[n] is multiplied (scaled) by a constant a, then the output of a linear system will be scaled by the same amount: T{a·x[n]} = a·T{x[n]} = a·y[n].

a·x[n] → T → a·y[n]
Furthermore, if it is known about the system T that y1[n] = T{x1[n]} and y2[n] = T{x2[n]}:

x1[n] → T → y1[n]
x2[n] → T → y2[n]
If two inputs are added, the output of a linear system will be the two outputs added: T{x1[n] + x2[n]} = T{x1[n]} + T{x2[n]} = y1[n] + y2[n]

x1[n] + x2[n] → T → y1[n] + y2[n]
These two properties of multiplication and addition of a linear system can be combined into the property of superposition:
T{a·x1[n] + b·x2[n]} = T{a·x1[n]} + T{b·x2[n]} = a·T{x1[n]} + b·T{x2[n]} = a·y1[n] + b·y2[n]

a·x1[n] + b·x2[n] → T → a·y1[n] + b·y2[n]
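Superposition can be demonstrated concretely. The system below is a hypothetical example (a first-difference system, not one from the notes), sketched in Python/NumPy rather than the course's Matlab:

```python
import numpy as np

def T(x):
    """A simple LTI system for illustration: first difference y[n] = x[n] - x[n-1]."""
    x = np.asarray(x, dtype=float)
    return x - np.concatenate(([0.0], x[:-1]))  # assume x[n] = 0 for n < 0

x1 = np.array([0.0, 1.0, 2.0, 0.0, 1.0, 2.0])
x2 = np.array([1.0, 0.0, 0.0, 1.0, 0.0, 0.0])
a, b = 3.0, -2.0

lhs = T(a * x1 + b * x2)      # system applied to the combined input
rhs = a * T(x1) + b * T(x2)   # outputs scaled and added after the system
print(np.allclose(lhs, rhs))  # True: superposition holds
```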
Special system: a time-invariant (a.k.a. shift-invariant) system

Given a system T, y[n] = T{x[n]}:

x[n] → T → y[n]
If the input x[n] is delayed (shifted) by N samples, then the output of a time-invariant system will be shifted by the same amount: T{x[n − N]} = y[n − N].

x[n − N] → T → y[n − N]
Special system: a linear, time-invariant (LTI, or LSI) system

A linear, time-invariant system is a system that combines the properties of linearity and time-invariance. Consider an LTI system T. Suppose we know about T that when the input x[n] is the impulse sequence, i.e., x[n] = δ[n], the output is y_δ[n] = {1, 1} for n = 0, 1.
With this information, we are able to determine the output of this LTI system for all other possible input sequences. Let's see how. Given some input sequence x[n], for example x[n] = {0, 1, 2, 0, 1, 2} for n = 0,...,5, we define the following set of sequences (see also the figure below):
x_i[n] = { x[n], if n = i
         { 0,    otherwise
Using these sequences, we can reconstruct the input sequence x[n] by summing all of these sequences, x[n] = Σ_i x_i[n], as illustrated in the figure below.
[Figure: stem plots of the sequences x_0[n], x_1[n], ..., x_5[n]; each keeps a single sample of x[n] and is zero elsewhere]
Note that the xi [n] sequences are all scaled and shifted impulse sequences:
x_i[n] = { x[n], if n = i
         { 0,    otherwise     =  δ[n − i] · x[i]

(δ[n − i] is the shift; x[i] is the scaling.)
Therefore, we can reconstruct the input sequence by summing these scaled and shifted impulse sequences: x[n] = Σ_i x_i[n] = Σ_i δ[n − i]·x[i]
For each of the x_i[n] sequences, we can find the output of the LTI system, as follows. It was given about the system that when the input is the impulse sequence, the output of the system is T{δ[n]} = y_δ[n]. By applying the time-invariance property to this expression, we find that the output for a shifted impulse is T{δ[n − i]} = y_δ[n − i]. By applying the scaling property of linearity, we find that the output for the scaled (and shifted) impulse is T{δ[n − i]·x[i]} = y_δ[n − i]·x[i]. So for the sum of the x_i[n] sequences we can find the output of the LTI system by using the additive property of linearity:

T{x[n]} = T{ Σ_i δ[n − i]·x[i] }
        = Σ_i T{δ[n − i]·x[i]}
        = Σ_i T{δ[n − i]}·x[i]
        = Σ_i y_δ[n − i]·x[i]
Given the system's response y_δ[n] to the impulse δ[n]:

[Figure: each scaled, shifted impulse x_i[n] = δ[n − i]·x[i] produces the scaled, shifted response y_δ[n − i]·x[i]; summing the x_i[n] reconstructs x[n], and summing the responses gives y[n]]
Conclusion about LTI systems: impulse response

Given the impulse response of an LTI system:
[Figure: the impulse sequence δ[n] and the system's impulse response y_δ[n]]
Then, as we've seen in the example above, the output of the system for any input can be found using the LTI properties of scaling, shifting and summing. A linear, time-invariant system is therefore completely described by its impulse response:
x[n] (input) → [impulse response y_δ[n]] → y[n] (output)
If we have two identical systems, then their impulse responses must be the same. And vice versa, if we have two systems with identical impulse responses, then they are the same system.
The convolution sum

As illustrated by the example above, any sequence can be viewed as a sum of scaled, shifted impulse sequences: x[n] = Σ_i δ[n − i]·x[i]
As a result, an LTI system's output can be viewed as a sum of scaled, shifted impulse responses: y[n] = Σ_i y_δ[n − i]·x[i]
This special summation is called the convolution sum. It is usually written using the following conventions:
- Notation for the impulse response: h[n]
- Variable of summation: k
- Range of summation: negative infinity to infinity
- Short-hand notation: h[n] * x[n]

y[n] = Σ_{k=-∞}^{∞} h[n − k]·x[k] = h[n] * x[n]
The convolution sum is one way to implement a system in Matlab. To compute the output of a system for a certain input sequence, you can evaluate the convolution sum for all values of n, using the system's impulse response and the input sequence.
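As a sketch of that idea outside Matlab, the convolution sum can be evaluated directly with two nested loops, using the impulse response y_δ[n] = {1, 1} and the input x[n] = {0, 1, 2, 0, 1, 2} from the example above (Python/NumPy, hypothetical):

```python
import numpy as np

h = np.array([1.0, 1.0])                      # impulse response from the example
x = np.array([0.0, 1.0, 2.0, 0.0, 1.0, 2.0])  # input for n = 0,...,5

# Direct evaluation of y[n] = sum_k h[n-k] x[k]
y_direct = np.zeros(len(x) + len(h) - 1)
for n in range(len(y_direct)):
    for k in range(len(x)):
        if 0 <= n - k < len(h):
            y_direct[n] += h[n - k] * x[k]

y_lib = np.convolve(x, h)                     # library routine, same result
print(y_direct)                               # [0. 1. 3. 2. 1. 3. 2.]
```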
[Figure: the impulse response h[n] = {-1, 2, -1} for n = -1, 0, 1]

y[n] = Σ_{k=-∞}^{∞} h[n − k]·x[k]
Let the input to this system be the complex frequency sequence x[n] = e^{jωn}. In this equation, j stands for the imaginary unit, which you may remember is defined as j = √(-1), and the variable ω is a number between 0 and π that indicates frequency (more on this later). Then, the output of the system will be
y[n] = Σ_{k=-∞}^{∞} h[n − k] e^{jωk}          (let r = n − k)
     = Σ_{r=-∞}^{∞} h[r] e^{jω(n − r)}
     = e^{jωn} · Σ_{r=-∞}^{∞} h[r] e^{-jωr}
     = x[n] · Σ_{r=-∞}^{∞} h[r] e^{-jωr}
We see that the output of an LTI system to a complex frequency sequence is the same complex frequency sequence, multiplied by the complex constant Σ_{r=-∞}^{∞} h[r] e^{-jωr}.
We call this a constant because it does not depend on n. But it does depend on the frequency variable ω. We can therefore write it as a function of ω:

H(ω) = Σ_{r=-∞}^{∞} h[r] e^{-jωr}
We can evaluate this function for many values of ω to determine how the system with impulse response h[n] responds to all kinds of frequencies.
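For the high-pass example h[n] = {-1, 2, -1} on n = -1, 0, 1, this evaluation can be sketched directly from the sum (a Python/NumPy sketch; the course itself uses Matlab's freqz for this):

```python
import numpy as np

h = np.array([-1.0, 2.0, -1.0])   # impulse response of the high-pass example
nh = np.array([-1, 0, 1])         # its index vector

w = np.linspace(0, np.pi, 512)    # frequency grid from 0 to pi
# H(w) = sum_r h[r] * exp(-j*w*r)
H = np.array([np.sum(h * np.exp(-1j * wi * nh)) for wi in w])

print(abs(H[0]))    # 0: low frequencies are suppressed
print(abs(H[-1]))   # 4: high frequencies are passed (amplified)
```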
[Figure: the impulse response h[n] = {-1, 2, -1} and the magnitude of its frequency response H(ω) for ω between 0 and π]
(Footnote: Mathematicians use the symbol i for the imaginary unit, but electrical engineers prefer j.)
From the frequency response we can see that this system:
- suppresses low frequencies: |H(ω)| is small when ω is close to zero
- passes high frequencies: |H(ω)| is large when ω is close to π
Thus, this system is a high-pass filter. A few notes regarding the frequency response:
The sum Σ_{r=-∞}^{∞} h[r] e^{-jωr} is called the discrete-time Fourier transform (DTFT) of the impulse response h[n]. Matlab has a built-in function to quickly evaluate the discrete-time Fourier transform of a sequence. This function is called fft, which stands for Fast Fourier Transform.
The function H(ω) is complex-valued, which makes it a little complicated to plot. Most frequently, people plot the magnitude of H(ω); that is what is done in the frequency response plot above. Sometimes the phase of H(ω) is also of interest, in which case it is plotted in a separate plot. Another way to plot H(ω) is to separately plot the real and imaginary parts, but this is less common.
LTI systems are also completely described by their frequency response, i.e., the output of the system when the input is a complex frequency sequence, for all possible complex frequency sequences. To find the frequency response of a system, we compute the discrete-time Fourier transform of its impulse response:

H(ω) = Σ_{n=-∞}^{∞} h[n] e^{-jωn}
Frequency analysis
In this lecture, we will study frequency analysis of systems, sequences and signals. We will look at some of the frequency analysis functions available in Matlab, and discuss the details of those functions.
[...] transform of the impulse response to find the frequency response exactly. And since the system is LTI, it will behave exactly according to the frequency response, even for sums of tones or other more complicated inputs.

The freqz function

To compute the frequency response of a system, Matlab has the function freqz. We'll discuss this function by looking at an example.
>> h = [-1 2 -1];   % define the impulse response
>> nh = [-1 0 1];   % ... and its index vector
>> N = 1024;        % an additional parameter for freqz
>> figure, stem(nh,h), title('h[n]')
>> figure, freqz(h,1,N), title('H(w)')
[Figure: freqz output for h: magnitude (dB) and phase (degrees) against normalized frequency]
Here we see the frequency response of the high-pass filter we studied before, as generated by freqz. We will now address some of the features of this frequency response.

Magnitude response and phase response

As you can see, the frequency response consists of two plots: a magnitude plot (top) and a phase plot (bottom). As we saw in the previous lecture, the frequency response of a system is a complex function of frequency. That means that for each frequency, the frequency response is a complex number. It is kind of hard to plot complex numbers directly in Matlab, and therefore we separate the complex numbers into their magnitude and phase, and plot those separately. We refer to those plots as the magnitude response and phase response. Occasionally, we separate complex numbers into their real and imaginary parts, and plot those separately, but the magnitude/phase representation is usually easier to interpret.

Magnitude in decibels (dB)

You may notice that the magnitude is plotted on a decibel (dB) scale. The decibel scale is a relative logarithmic scale. For example, -20 on the decibel scale indicates a quantity that is 10 times smaller than the reference, -40 corresponds to 100 times smaller, -60 corresponds to 1000 times smaller, etc. Zero on the decibel scale means that the quantity is the same as the reference value, and positive values mean that the quantity is bigger than the reference value (20 dB = 10 times bigger, 40 dB = 100 times bigger, etc). When there is no explicit reference value, the reference value is usually taken to be 1.

(Footnote: The third parameter of freqz, N, determines how many points of the frequency response H(ω) are evaluated for the plot. A value of N = 1024 gives a nice smooth plot of H(ω).)

Normalized frequency

When we use freqz to plot the frequency response of an LTI system, it uses a normalized frequency axis. Normalized frequencies are frequencies from 0 to π. In the plot above, the x-axis runs from 0 to 1, but the x-axis label reads (×π rad/sample). This means that when we read an x-value off the graph, we should multiply it by π to get the true x-value. For example, the frequency response in the plot crosses 0 dB around 0.33π rad/sample. But why exactly do we use normalized frequency? Well, when we analyze the frequency response of a system, we don't know anything about a sampling frequency. In the example above, we only know that the impulse response h[n] = {-1, 2, -1} for n = -1, 0, 1. And as we've seen in homework 1, we need a sampling frequency to convert a sequence index to a point in time. Without a sampling frequency, the best we can do is to use a normalized frequency representation with frequencies between 0 and π.
Suppose we want to apply this system to a speech signal that is sampled at 10 kHz. In that case, we can supply the sampling frequency of the signal to freqz. freqz can then use this sampling frequency to convert the normalized frequencies to real frequencies, and we can get an idea of how this system will affect the real frequencies of the speech signal.
>> figure, freqz(h,1,N,10000), title('H(f), fs=10000')
[Figures: freqz magnitude and phase responses on real frequency axes, for fs = 10000 Hz (0-5000 Hz) and for fs = 16000 Hz (0-8000 Hz)]
In both cases, the normalized frequency is converted to real frequencies. A normalized frequency of 0 rad/sample always corresponds to 0 Hz, and a normalized frequency of π rad/sample corresponds to Fs/2 Hz (either 5000 or 8000 Hz in the examples above). The formula for converting a normalized frequency ω to a real frequency F is

F = ω · Fs / (2π)
So when this system is applied to signals sampled at 10 kHz, it attenuates frequencies below 0.33π rad/sample = 1650 Hz and amplifies frequencies above that. At 16 kHz, that threshold is at 2640 Hz.
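The conversion behind those numbers is just F = ω·Fs/(2π); a small Python sketch (the helper name is ours, not the course's):

```python
from math import pi

def normalized_to_hz(w, fs):
    """Convert a normalized frequency w (rad/sample) to a real frequency in Hz."""
    return w * fs / (2 * pi)

print(normalized_to_hz(0.33 * pi, 10000))  # ~1650 Hz
print(normalized_to_hz(0.33 * pi, 16000))  # ~2640 Hz
print(normalized_to_hz(pi, 10000))         # 5000 Hz, i.e. fs/2
```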
The phase of this system is a line

An important characteristic of the phase response of this system is that it is a line. A linear phase response like this means that the system is a particularly nice system that has no phase distortion. We will learn more about this when we discuss infinite impulse response systems.

The second parameter of freqz is 1

You may have noticed that, so far, we've set the second parameter to 1 when we used freqz. You may have wondered why we do that. The reason is the subject of the next section.

Finite impulse response (FIR) systems and infinite impulse response (IIR) systems

Before going into a discussion about finite impulse response (FIR) systems and infinite impulse response (IIR) systems, it may be helpful to take a step back and give an overview of the systems we have discussed.

[Diagram: all systems, containing the linear systems and the time-invariant systems; their overlap is the LTI systems]
We have seen that an LTI system is completely characterized by its impulse response. We can therefore distinguish different types of LTI systems based on properties of their impulse response. The most prominent feature of the impulse response is its length: it can either be infinite or finite. An example of an infinite impulse response is h[n] = (1/2)^n · u[n]
[Figure: u[n], (1/2)^n, and the infinite impulse response h[n] = (1/2)^n·u[n], for n = -10,...,20]
A problem with systems that have an infinite impulse response is that we can't work with them on computers, because computers have a finite amount of memory. As a consequence, we can never store an infinite impulse response on a computer. We therefore restrict ourselves to a special kind of IIR system that can be implemented using two finite sequences, according to the following diagram:

x[n] → [hb[n]] → (+) → y[n]
                  ↑
               [ha[n]]  (feedback from y[n])
We call this kind of system a rational system. The elements in the dashed box are called a feedback loop or simply feedback. Without feedback, a rational system reduces to a regular, finite impulse response (FIR) system, with impulse response hb[n]. An example of a rational system is the following:
>> b = [1];                  % define hb[n]
>> a = [1 0 0.81];           % define ha[n], feedback
>> imp = [1 zeros(1,20)];    % create 21-point impulse sequence
>> h = filter(b,a,imp);      % determine impulse response
>> nh = 0:20;                % index vector for impulse response
>> figure, stem(nh,h)        % plot impulse response
>> figure, freqz(b,a)        % plot frequency response
[Figure: impulse response and freqz frequency response of the rational system]
The frequency response of this rational system has the same features as the frequency response of the high-pass filter shown earlier. The only difference is that the phase response of this system is a curve instead of a line. This means that this system has a non-linear phase response, which is a characteristic of all rational systems. As a result, this system is not so nice and has some phase distortion. Despite their phase distortion, rational systems are still very interesting because they are often much more efficient and have much less delay than equivalent FIR systems.

The reason that we used 1 for the second parameter of freqz in the earlier examples now becomes clear. It basically makes ha[n] equal to the impulse sequence. And as we have seen in Exercise 2.1a, convolution with the impulse sequence produces an output sequence that is identical to the input sequence. It is in fact the identity system, and using the identity system in the feedback loop ensures that it has no effect on the output.

Given this classification of LTI systems, we can complete the systems overview diagram:
[Diagram: LTI systems split into IIR systems, with the rational systems as a subset, and FIR systems]
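What Matlab's filter call in the example above computes is the difference equation of the rational system. A Python/NumPy sketch of the same recursion (hypothetical, mirroring b = [1], a = [1 0 0.81]):

```python
import numpy as np

b = np.array([1.0])               # feed-forward coefficients, hb
a = np.array([1.0, 0.0, 0.81])    # feedback coefficients, ha (a[0] = 1)
imp = np.zeros(21)
imp[0] = 1.0                      # 21-point impulse sequence

# Difference equation: a[0]*y[n] = sum_k b[k]*x[n-k] - sum_{k>=1} a[k]*y[n-k]
y = np.zeros_like(imp)
for n in range(len(imp)):
    acc = sum(b[k] * imp[n - k] for k in range(len(b)) if n - k >= 0)
    acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
    y[n] = acc / a[0]

print(y[:5])  # 1, 0, -0.81, 0, 0.6561: a decaying, alternating impulse response
```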
[Figure: result of plotting a complex FFT vector X directly with plot(X)]
The problem here is that X is a complex vector. The figure above shows how Matlab plots complex numbers. We immediately get better results when we take the magnitude of X using the abs function before plotting it:
>> X = fft(x);
>> plot(abs(X));
[Figure: plot of abs(X): the magnitude spectrum, with a mirrored copy in the upper half of the axis]
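The mirrored upper half seen in the figure above is a property of the FFT of any real signal: the magnitude at bin k equals the magnitude at bin N − k. A hypothetical Python/NumPy sketch (the course itself works in Matlab):

```python
import numpy as np

fs = 10000
n = np.arange(512)
x = np.sin(2 * np.pi * 1000 * n / fs)    # a 1 kHz tone sampled at 10 kHz

mag = np.abs(np.fft.fft(x))

# Mirror symmetry for real input: |X[k]| == |X[N-k]|
print(np.allclose(mag[1:], mag[:0:-1]))  # True

# Keep only the first half (0 ... fs/2) and label the bins in Hz
half = mag[: len(x) // 2]
freqs = np.arange(len(half)) * fs / len(x)
print(freqs[np.argmax(half)])            # close to 1000 Hz
```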
Still, the x-axis is not in units of Hz, nor in units of radians/sample for normalized frequency. And the spectrum appears to be mirrored halfway through. It turns out that a little more work is necessary to create a nice plot of the frequency spectrum of a sequence. In this course, we will use a function called spec to take care of that for us. To plot a frequency spectrum, we can simply type
>> spec(x) % plot frequency spectrum of x[n]
[Figure: spec(x) output: magnitude on a linear scale against normalized frequency]
This gets rid of the mirrored copy in the spectrum, and puts appropriate labels on the axes. By default, spec plots the magnitude of the spectrum on a linear scale, but it can be instructed to plot it on a dB scale by adding an optional flag:
>> spec(x,'db')
[Figure: spec(x,'db') output: magnitude in dB against normalized frequency]
Since we haven't specified a sampling frequency, spec treats the speech signal as a sequence with unknown sampling frequency and plots it on a normalized frequency axis, like freqz. We can specify the sampling frequency as the second parameter:
>> spec(x,fs,'db')
[Figure: spec(x,fs,'db') output: magnitude in dB with a frequency axis in Hz up to fs/2 = 5000 Hz]
The frequency response of a system is the DTFT of its impulse response,

H(ω) = Σ_{n=-∞}^{∞} h[n] e^{-jωn}

Similarly, for a sequence x[n], its long-term frequency spectrum is defined as the DTFT of the sequence:

X(ω) = Σ_{n=-∞}^{∞} x[n] e^{-jωn}.
Theoretically, we must know the sequence x[n] for all values of n (from n = -∞ to n = ∞) in order to compute its frequency spectrum. Fortunately, all terms where x[n] = 0 do not matter in the sum, and therefore an equivalent expression for the sequence's spectrum is
X(ω) = Σ_{n=0}^{N−1} x[n] e^{-jωn}
Here we've assumed that the sequence starts at 0 and is N samples long. This tells us that we can apply the DTFT to only the non-zero samples of x[n] and still obtain the sequence's true spectrum X(ω). But what is the correct mathematical expression to compute the spectrum over a short section of the sequence, that is, over only part of the non-zero samples of the sequence?

Window sequence

It turns out that the mathematically correct way to do that is to multiply the sequence x[n] by a window sequence w[n] that is non-zero only for n = 0,...,L−1, where L, the length of the window, is smaller than the length N of the sequence x[n]:

x_w[n] = x[n]·w[n]
The following figure illustrates how a window sequence w[n] is applied to the sequence x[n]:
[Figure: the sequence x[n], the window sequence w[n], and the windowed sequence x_w[n] = x[n]·w[n], for n = 0,...,100]
As the figure shows, the windowed sequence is shorter in length than the original sequence. So we can further truncate the DTFT of the windowed sequence:
X_w(ω) = Σ_{n=0}^{L−1} (x[n]·w[n]) e^{-jωn}.
Using this windowing technique, we can select any section of arbitrary length of the input sequence x[n] by choosing the length and location of the window accordingly. The only question that remains is: how does the window sequence w[n] affect the short-term frequency spectrum?

Effect of the window

To answer that question, we need to introduce an important property of the Fourier transform. The diagram below illustrates the property graphically:

I. Implementation of an LTI system in the time domain: x[n] → convolution with h[n] → y[n]
II. Equivalent implementation of an LTI system in the frequency domain: x[n] → DTFT → X(ω) → multiplication, Y(ω) = X(ω)·H(ω) → IDTFT → y[n]
The two implementations of an LTI system are equivalent: they will give the same output for the same input. Hence, convolution in the time domain = multiplication in the frequency domain:

y[n] = x[n] * h[n]  ⇔  Y(ω) = X(ω)·H(ω)
And since the time domain and the frequency domain are each other's dual in the Fourier transform, it is also true that multiplication in the time domain = convolution in the frequency domain:

x_w[n] = x[n]·w[n]  ⇔  X_w(ω) = X(ω) * W(ω).

This shows that multiplying the sequence x[n] with the window sequence w[n] in the time domain is equivalent to convolving the spectrum of the sequence, X(ω), with the spectrum of the window, W(ω). The result of the convolution of the spectra in the frequency domain is that the spectrum of the sequence is smeared by the spectrum of the window. This is best illustrated by the example in the figure below:
[Figure: a window sequence w[n] and the magnitude (dB) of its spectrum W(ω)]
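The duality above can be checked numerically: convolving two sequences in the time domain and multiplying their zero-padded FFTs give the same result. A Python/NumPy sketch (hypothetical; sequence values borrowed from the earlier example):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 0.0, 1.0, 2.0])
h = np.array([1.0, 1.0])

# Convolution in the time domain ...
y_time = np.convolve(x, h)

# ... equals multiplication in the frequency domain,
# provided both transforms are zero-padded to the full output length.
N = len(x) + len(h) - 1
y_freq = np.fft.ifft(np.fft.fft(x, N) * np.fft.fft(h, N)).real

print(np.allclose(y_time, y_freq))  # True
```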
Choice of window Because the window determines the spectrum of the windowed sequence to a great extent, the choice of the window is important. Matlab supports a number of common windows, each with their own strengths and weaknesses. Some common choices of windows are shown below.
[Figures: sequence (left) and magnitude spectrum in dB (right) for the rectangular, triangular, Hamming, Hann, and Kaiser windows]
All windows share the same characteristics. Their spectrum has a peak, called the main lobe, and ripples to the left and right of the main lobe called the side lobes. The width of the main lobe and the relative height of the side lobes are different for each window. The main lobe width determines how accurate a window is able to resolve different frequencies: wider is less accurate. The side lobe height determines how much spectral leakage the window has. We'll learn more about these terms in the next lecture. An important thing to realize is that we can't have short-term frequency analysis without a window. Even if we don't explicitly use a window, we are implicitly using a rectangular window.
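Main-lobe and side-lobe levels can be measured numerically from a zero-padded FFT of the window. A hypothetical Python/NumPy sketch comparing the rectangular and Hamming windows (the course itself does this visually in Matlab):

```python
import numpy as np

L, Nfft = 64, 4096
rect = np.ones(L)
hamming = 0.54 - 0.46 * np.cos(2 * np.pi * np.arange(L) / (L - 1))

def highest_sidelobe_db(w):
    """Peak level of the largest side lobe, in dB relative to the main lobe."""
    W = np.abs(np.fft.fft(w, Nfft))
    S = 20 * np.log10(W / W.max() + 1e-12)[: Nfft // 2]
    i = 1
    while S[i + 1] < S[i]:   # walk down the main lobe to its first null
        i += 1
    return S[i:].max()       # largest peak beyond the main lobe

print(highest_sidelobe_db(rect))     # about -13 dB
print(highest_sidelobe_db(hamming))  # about -42 dB: lower side lobes, less leakage
```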
Parameters of the short-term frequency spectrum

Besides the type of window (rectangular, Hamming, etc.), there are two other factors in Matlab that control the short-term frequency spectrum: window length and the number of frequency sample points.
The window length controls the fundamental trade-off between time resolution and frequency resolution of the short-term spectrum, irrespective of the window's shape. A long window gives poor time resolution, but good frequency resolution. Conversely, a short window gives good time resolution, but poor frequency resolution. For example, a 250 ms long window can, roughly speaking, resolve frequency components when they are 4 Hz or more apart (1/0.250 = 4), but it can't tell where in those 250 ms those frequency components occurred. On the other hand, a 10 ms window can only resolve frequency components when they are 100 Hz or more apart (1/0.010 = 100), but the uncertainty in time about the location of those frequencies is only 10 ms. The result of short-term spectral analysis using a long window is referred to as a narrowband spectrum (because a long window has a narrow main lobe), and the result of short-term spectral analysis using a short window is called a wideband spectrum. In short-term spectral analysis of speech, the window length is often chosen with respect to the fundamental period of the speech signal, i.e., the duration of one period of the fundamental frequency. A common choice for the window length is either less than 1 times the fundamental period, or greater than 2-3 times the fundamental period. Examples of narrowband and wideband short-term spectral analysis of speech are given in the figures below.
[Figures: narrowband and wideband short-term spectra of speech, magnitude in dB against frequency in Hz]
The other factor controlling the short-term spectrum in Matlab is the number of points at which the frequency spectrum H(ω) is evaluated. The number of points is usually equal to the length of the window. Sometimes a greater number of points is chosen to obtain a smoother looking spectrum. Evaluating H(ω) at fewer points than the window length is possible, but very rare.
Time-frequency domain: Spectrogram

An important use of short-term spectral analysis is the short-time Fourier transform or spectrogram of a signal. The spectrogram of a sequence is constructed by computing the short-term spectrum of a windowed version of the sequence, then shifting the window over to a new location and repeating this process until the entire sequence has been analyzed. The whole process is illustrated in the figure below:
[Figure: successive steps of the spectrogram computation: the shifting window, the windowed sequence, and the resulting short-term spectrum at each step]
Together, these short-term spectra (bottom row) make up the spectrogram, and are typically shown in a two-dimensional plot, where the horizontal axis is time, the vertical axis is frequency, and magnitude is the color or intensity of the plot. For example:
[Figure: spectrogram of a speech signal, time (0-0.6 s) against frequency (0-5000 Hz)]
The appearance of the spectrogram is controlled by a third parameter: window overlap. Window overlap determines how much the window is shifted between repeated computations of the short-term spectrum. Common choices for window overlap are 50% or 75% of the window length. For example, if the window length is 200 samples and window overlap is 50%, the window would be shifted over 100 samples between each short-term spectrum. In the case that the overlap was 75%, the window would be shifted over 50 samples.
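The relation between overlap and window shift (hop size) is simple arithmetic; a small Python sketch (the helper name is ours, not the course's):

```python
def hop_size(window_length, overlap_fraction):
    """Samples the window advances between consecutive short-term spectra."""
    return round(window_length * (1 - overlap_fraction))

print(hop_size(200, 0.50))  # 100 samples
print(hop_size(200, 0.75))  # 50 samples
```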
The choice of window overlap depends on the application. When a temporally smooth spectrogram is desirable, window overlap should be 75% or more. When computation should be kept at a minimum, no overlap or 50% overlap are good choices. If computation is not an issue, you could even compute a new short-term spectrum for every sample of the sequence. In that case, window overlap = window length − 1, and the window would only shift 1 sample between the spectra. But doing so is wasteful when analyzing speech signals, because the spectrum of speech does not change at such a high rate. It is more practical to compute a new spectrum every 20-50 ms, since that is the rate at which the speech spectrum changes.
Length of the window and fundamental frequency In a wideband spectrogram (i.e., using a window shorter than the fundamental period), the fundamental frequency of the speech signal resolves in time. That means that you can't really tell what the fundamental frequency is by looking at the frequency axis, but you can see energy fluctuations at the rate of the fundamental frequency along the time axis. In a narrowband spectrogram (i.e., using a window 2-3 times the fundamental period), the fundamental frequency resolves in frequency, i.e., you can see it as an energy peak along the frequency axis. See for example the figures below:
[Figures: wideband and narrowband spectrograms of the same speech signal, time (0-0.6 s) against frequency (0-4000 Hz)]
Spectrogram modification
The spectrogram is not only a great tool to analyze (speech) signals, it is also often used to modify signals in various ways. The basic idea is to multiply each column in the spectrogram by a weighting vector. Each column in the spectrogram is a short-term spectrum of the signal, and by multiplying a column of the spectrogram by a weighting vector we modify the short-term spectrum of the signal. The weighting vector can be the same for each column in the spectrogram, which essentially is the same as applying an LTI system with a frequency response given by the weighting vector. For example, if we want to apply a band-pass filter to the signal, we can multiply each column of the spectrogram by the weighting vector:
[Figure: a band-pass weighting vector (weight 0 to 1) plotted against frequency, 0-5000 Hz.]
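As a sketch of the idea (Python/NumPy rather than Matlab; the spectrogram size and pass-band bins are made up for illustration), weighting every column by the same vector is a single broadcasted multiply:

```python
import numpy as np

rng = np.random.default_rng(0)
S = rng.random((129, 40))      # toy spectrogram: 129 frequency bins x 40 frames
w = np.zeros(129)              # hypothetical band-pass weighting vector
w[20:61] = 1.0                 # pass bins 20-60, zero out the rest
S_mod = S * w[:, None]         # same weights applied to every column
print(S_mod.shape)             # (129, 40)
```

Rows outside the pass-band become zero in every frame, while rows inside it are unchanged, which is exactly the LTI band-pass behavior described above.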
[Figure: spectrogram before filtering; frequency 0-5000 Hz, time 0.1-0.6 s.]
A very powerful way to modify the spectrogram, however, is to use a different weighting vector for each column of the spectrogram. This allows us to apply a time-varying filter to the signal, which cannot be done with an LTI system. For example, we can apply different filters to different parts of the signal:
[Figures: spectrograms before and after applying a time-varying filter (frequency 0-5000 Hz, time 0.1-0.7 s), together with the weighting vectors, including a low-pass filter, applied to different parts of the signal.]
This technique has many uses: LTI filtering and time-varying filtering, signal separation, noise suppression, etc. It basically gives us the ability to carve out any part of a given spectrogram by attenuating undesired parts of the spectrogram or setting them to zero. But once we have modified the spectrogram, how do we convert it back to a waveform? How can we reconstruct a signal from the modified spectrogram?
This process seems easy enough, but there are some catches. Undoing the window is not as easy as it sounds, unless the window was the rectangular window. When we compute a spectrogram, we multiply each short section of the signal with the window. Therefore, to undo the window, we need to divide each short section by the window. But most windows taper to zero at their endpoints, and dividing samples of the signal by values close to zero is an operation that is very sensitive to noise. To avoid this problem, the spectrogram of a signal is computed and later reconstructed using windows that sum to one, like the sine window in this example:
[Figure: overlapping sine windows across samples 0-1200, shown for the analysis and reconstruction stages.]
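The "windows sum to one" condition can be checked numerically. The sketch below is Python/NumPy rather than Matlab, and it uses a periodic Hann window (an assumption; the notes use a sine window) because a Hann window sums to exactly one at 50% overlap away from the signal edges:

```python
import numpy as np

N, hop = 256, 128                                # window length, 50% overlap
n = np.arange(N)
w = 0.5 * (1 - np.cos(2 * np.pi * n / N))        # periodic Hann window
total = np.zeros(4 * N)
for start in range(0, len(total) - N + 1, hop):  # overlap-add the windows
    total[start:start + N] += w
interior = total[N:-N]                           # ignore partial coverage at the edges
print(np.allclose(interior, 1.0))                # True: the windows sum to one
```

Because the overlapped windows sum to one, simply adding the windowed sections back together reconstructs the interior of the signal without having to divide by near-zero window values.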
But the most important catch in reconstructing a signal from a modified spectrogram is the following. The columns in the spectrogram are dependent, because they come from overlapping short sections of the signal. For example, when we use 50% overlap between windows, each sample of the signal is represented in two columns of the spectrogram:
[Figure: a signal (amplitude vs. time index n = 0-200) and its spectrogram (frequency in π rad/sample) computed with 50% window overlap, showing each sample covered by two windows.]
When we reconstruct the signal from the spectrogram, those two columns must agree on the value of that sample. But arbitrary modifications of the spectrogram may break those agreements, or dependencies, between the columns of the spectrogram. The only modification that does not break the dependencies is applying the same weighting vector to all columns.

In the case of arbitrary modifications, we can still use the reconstruction procedure described above to reconstruct a time-domain signal. But then the two columns will not agree on the value of the sample, and the sample's value will be a weighted average of the values that each column assigns to it. As a result, the spectrogram of the reconstructed signal is not exactly the modified spectrogram that we created; it is only a close approximation of it. This inaccuracy in reconstruction is usually accepted, because the approximation is often quite good and the technique is so valuable. In extreme cases, such as highly random modifications, however, the approximation may not be as good and could cause audible artifacts.

Of course, this reconstruction problem could be avoided altogether by using non-overlapping windows to compute and reconstruct the spectrogram. But in that case other problems arise, such as discontinuities at the boundaries of adjacent windows, which are easily audible. Although overlapping windows are not a perfect solution, they guarantee a certain smoothness of the reconstructed signal and are therefore preferable to non-overlapping windows.
In today's lab we will see an even more powerful version of this technique, one that does not need an estimate of the noise spectrum in advance. It is able to estimate the noise spectrum from the signal itself, and it can even update its estimate over time to track noise with a time-varying spectrum. A very powerful technique indeed!
The analog signal x(t) is converted by an analog-to-digital (A/D) converter to the sequence x[n], which is processed by an LTI system with output sequence y[n]. The output sequence y[n] is then converted by a digital-to-analog (D/A) converter to an analog signal y(t).

The task of the A/D converter is two-fold. First, it must sample the analog signal at a regular rate. This step is often treated as an independent conversion step called continuous-time to discrete-time conversion, or C/D conversion. Second, the A/D converter must quantize the samples to a finite number of signal levels that can be represented on the digital platform on which the LTI system is implemented. The task of the D/A converter is to convert the samples back to an analog, continuous-time signal.

The A/D converter (including its C/D converter) and the D/A converter must perform their tasks in such a way that they are transparent to the LTI system. That means the entire system, indicated by the dashed box in the diagram above, should have a frequency response that is identical to the frequency response of the LTI system, unaffected by the A/D and D/A conversion.
Sampling
The first conversion performed by an A/D converter is sampling. In this conversion step, a continuous-time signal is converted to a discrete-time signal. A discrete-time signal is a signal that is sampled, but that still can take on any signal value at its sample points. In contrast, a digital signal is a sampled signal that is quantized and can only take on a finite set of discrete signal values at its sample points. The sampling conversion step is also called C/D conversion.
[Figure: a continuous-time signal (amplitude vs. time, 0-1 s) and the corresponding discrete-time signal (amplitude vs. index n = 0-100).]
Theoretically, the sampling or C/D conversion consists of two steps:
1. Keep the signal values at the sampling times, and set the signal to zero everywhere else.
2. Normalize the time axis from seconds to sample index.
The first step can be viewed as multiplying an analog signal x(t) by an analog impulse train s(t), resulting in a sampled analog signal xs(t), as illustrated by the figure below:
[Figure: an analog signal x(t) and an analog impulse train s(t), both plotted over 0-1 s.]
The spacing of the impulses of the impulse train is called the sampling period, denoted by the symbol T, and is the reciprocal of the sampling frequency Fs, i.e., T = 1 / Fs. Multiplying the signal by an impulse train in the time domain is the same as convolving the spectrum of the signal with the spectrum of the impulse train. We saw this "multiplication in time is convolution in frequency" property of the Fourier transform before, when we discussed multiplying a sequence with a window to obtain its short-term spectrum. To understand the effect of the multiplication here, we need to know the Fourier transform of the impulse train. Without proof, we will state that the Fourier transform of an impulse train in time is an impulse train in frequency. The impulse trains in time and frequency are related according to the figure below.
[Figure: an impulse train s(t) with unit amplitude and spacing T in time, and its Fourier transform S(f), an impulse train with spacing Fs and amplitude Fs in frequency.]
A spacing of T = 1 / Fs seconds of impulses with an amplitude of 1 in the time domain results in a spacing of Fs Hz of impulses of amplitude Fs in the frequency domain. Therefore, given the spectrum of an analog signal, we can find the spectrum of the sampled analog signal by convolving it with the spectrum of the impulse train:
[Figure: the spectrum X(f) of the analog signal, the impulse-train spectrum S(f), and their convolution Xs(f), in which X(f) is repeated at multiples of Fs with amplitude scaled by Fs.]
The signal's spectrum is shifted and weighted by the impulses in the spectrum of the impulse train. As a result, the sampled analog signal's spectrum is periodic, and its period is equal to the sampling frequency.

The next step in the C/D converter is normalization of the time axis. Before normalization, the sampled analog signal xs(t) has non-zero values at t = ...,-2T,-T,0,T,2T,... . To normalize these sampling times, we divide t by the sampling period T, so that the signal now has non-zero values at t = ...,-2,-1,0,1,2,... . The signal values at these time instances are then treated as the values of the sequence x[n] at the corresponding index. As a result, the relationship between the analog signal x(t) and the sequence x[n] is x[n] = x(nT).

Normalization of the time axis by T in the time domain causes a normalization of the frequency axis by Fs in the frequency domain. That leaves the frequency axis in units of cycles per sample, which can be converted to radians per sample by realizing that 1 cycle = 2π radians. The spectrum of the sequence is therefore the periodic spectrum of the sampled analog signal, with the frequency axis expressed in radians per sample.
This explains why the spectrum of any sequence is periodic with a period of 2π.
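This 2π-periodicity can be verified directly from the definition of the discrete-time Fourier transform; here is a small Python/NumPy check on an arbitrary sequence (the particular sequence and frequency are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(64)             # an arbitrary sequence
n = np.arange(len(x))

def dtft(x, w):
    # evaluate X(e^{jw}) = sum_n x[n] e^{-jwn} at a single frequency w
    return np.sum(x * np.exp(-1j * w * n))

w0 = 0.7
same = np.allclose(dtft(x, w0), dtft(x, w0 + 2 * np.pi))
print(same)  # True: e^{-j(w + 2*pi)n} = e^{-jwn} for every integer n
```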
Restoration
To obtain a continuous-time signal from a discrete-time signal, the D/A converter needs to undo the effect of sampling by reversing all the steps done in sampling. This is illustrated graphically in the following figure:

[Figure: the sequence x[n] (index n = 0-100) with its periodic spectrum Xs(ω); the sampled analog signal xs(t) with spectrum Xs(f); and the reconstructed analog signal x(t) (time 0-1 s) with spectrum X(f).]
First, the sample values of the sequence are placed on a time axis to obtain the sampled analog signal xs(t) with spectrum Xs(f). This scales the frequency axis back from units of cycles per sample to Hz. Then a low-pass filter, Hlp(f), is applied to the sampled analog signal to remove all periodic copies of the spectrum except for the copy centered at 0 Hz. This low-pass filter must have a gain of T to correct the change in gain incurred during sampling. The low-pass filter is called an interpolating filter or anti-imaging filter, because it interpolates between the samples in the time domain and removes the periodic images of the spectrum in the frequency domain.
Aliasing
Because sampling introduces periodicity in the spectrum, we run the risk that the periodic copies of the spectrum will overlap with one another. This happens when the sampling frequency is too small, as illustrated in the figure below.

[Figure: two sampling cases. Left: X(f), S(f), and Xs(f) with a sufficiently large Fs, so the periodic copies of the spectrum do not overlap. Right: the same spectra with Fs too small, so the copies overlap.]
On the left, the sampling frequency is chosen large enough for the signal, and no overlap occurs between the periodic copies of the spectrum. On the right, however, the sampling frequency is chosen too small for the signal, and overlap occurs between the copies of the spectrum. This artifact is known as aliasing distortion, or just aliasing.

To give a simple numerical example, consider a 9 Hz sinusoid sampled at 10 Hz. The 9 Hz sinusoid has two impulses in its spectrum, at +9 Hz and -9 Hz. When sampling at 10 Hz, the spectrum becomes periodic with a period of 10 Hz. That means that copies of the sinusoid's impulses appear at f = ...,-21,-11,-1,9,19,29,... and at f = ...,-29,-19,-9,1,11,21,..., and aliasing has occurred. Even if we don't perform any processing on the sampled sequence, we will apply a low-pass filter to the signal during reconstruction, which will keep only the impulses at f = -1 and f = 1. This shows that the 9 Hz sinusoid becomes a 1 Hz sinusoid when sampled at 10 Hz.
[Figure: a 9 Hz sinusoid sampled at 10 Hz over 0-1 s; the samples fall exactly on a 1 Hz sinusoid.]
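The 9 Hz example can be checked numerically: sampled at 10 Hz, a 9 Hz cosine produces exactly the same samples as a 1 Hz cosine (a Python/NumPy sketch; the course uses Matlab):

```python
import numpy as np

Fs = 10.0                            # sampling frequency (Hz)
n = np.arange(30)                    # 3 seconds of samples
x9 = np.cos(2 * np.pi * 9 * n / Fs)  # 9 Hz cosine, sampled at 10 Hz
x1 = np.cos(2 * np.pi * 1 * n / Fs)  # 1 Hz cosine, sampled at 10 Hz
aliased = np.allclose(x9, x1)
print(aliased)  # True: the 9 Hz tone aliases to 1 Hz
```

The identity behind this is cos(2π·9n/10) = cos(2πn − 2πn/10) = cos(2πn/10), since adding whole turns of 2πn leaves a cosine unchanged at integer n.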
When we choose our sampling frequency too small, aliasing will occur, and the C/D and D/C conversion will no longer be transparent. In fact, when aliasing occurs the overall system becomes non-linear, because aliasing creates frequencies in the output that were not present in the input, which linear systems can't do. Sometimes it is not desirable to choose a sampling frequency high enough to avoid aliasing, for example because it makes subsequent processing of the digital signal slower (there are simply more samples to process). To prevent aliasing when the sampling frequency is too low, we must use an anti-aliasing filter. This is a low-pass filter that removes from the input signal the frequencies that would otherwise cause aliasing distortion. When we use an anti-aliasing filter, the overall system remains a linear system, and the overall frequency response is the combined response of the anti-aliasing filter and the digital LTI system.
Quantization
The second step in the analog-to-digital (A/D) converter is quantization. During the process of quantization, the infinite precision in signal level of an analog signal is converted to a finite precision in signal level for a digital signal. This idea is illustrated in the following figure:
[Figure: a discrete-time signal x[n] and its quantized version (amplitude vs. index n = 0-100).]
The difference between the analog signal and the quantized signal is called quantization error. On today's computers, as well as in Matlab, signal quantization plays a minimal role, because computers can almost always represent signals using 4 billion different signal levels, and often even orders of magnitude more than that. When we can use that many signal levels, the quantization error becomes very small and imperceptible. It may not always be possible to represent signals using that many levels in hearing aids and other small digital devices with limited memory and/or processing power. In that case, quantization introduces quantization noise into the system.

Like all types of noise, quantization noise is characterized by its frequency spectrum. The shape of the quantization noise spectrum depends on the type of A/D converter used. We will briefly discuss two A/D converters and their quantization noise spectra.
1. The sample-and-hold A/D converter. This converter samples analog signals close to the lowest sampling rate that avoids aliasing. At each sampling time, it measures the signal level of the analog signal and holds it constant for the remainder of the sampling period. This allows the quantizer to match the signal level to an internal table of possible signal levels, and to output the quantized signal level. This type of A/D converter produces quantization noise with a flat spectrum, which means that the quantization noise is equally present at all frequencies.
2. The Sigma-Delta A/D converter. This converter samples analog signals at a very high sampling rate, often 64 times or more the rate used by the sample-and-hold A/D converter. At each sampling time, it compares the current signal level with the signal level at the previous sampling time in a somewhat complicated way. If that comparison turns out positive, the Sigma-Delta converter outputs +1, and if it turns out negative it outputs -1.
The reason for this strange design is that the quantization noise of this converter no longer has a flat spectrum. Instead, quantization noise is low for low frequencies and increases gradually with higher frequencies. Combined with the 64 times over-sampling, this noise-shaping is a very nice property. It allows subsequent processing with a low-pass filter and downsampling by a factor of 64. The result is a very accurate representation of the input analog signal with very little quantization noise.
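A uniform quantizer and its error bound can be sketched as follows (Python/NumPy; a simplified mid-rise quantizer over [-1, 1), not a model of either converter described above):

```python
import numpy as np

def quantize(x, bits):
    # uniform quantizer over [-1, 1): 2**bits levels, step size 2 / 2**bits
    step = 2.0 / 2 ** bits
    return np.clip(np.round(x / step) * step, -1.0, 1.0 - step)

n = np.arange(1000)
x = 0.9 * np.sin(2 * np.pi * n / 100)      # test signal, safely inside [-1, 1)
xq = quantize(x, 16)
err = np.max(np.abs(x - xq))
print(err <= (2.0 / 2 ** 16) / 2 + 1e-12)  # True: error at most half a step
```

The quantization error of such a rounding quantizer never exceeds half a step, which is why adding bits (halving the step per bit) makes the error shrink so quickly.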
[Figure: three frequency responses plotted over 0-5 kHz.]
Such a filter can't be obtained in practice, because it requires an infinitely long impulse response (and one that can't be modeled by a rational system). Therefore, a practical filter will only approximate the ideal filter and have non-ideal properties such as ripple in the pass-band and stop-band, a non-zero stop-band, and a transition band. To obtain a minimum-order, or minimum-length, FIR filter, we must be aware of certain design requirements that will increase the length of the filter:
- designing an extreme low-pass, high-pass, or narrow band-pass filter
- steep and narrow transition bands
- high stop-band suppression
- small pass-band ripple
These factors can be summarized in a single line: the more an FIR filter must behave like an ideal brick-wall filter, the longer it needs to be.
[Figure: impulse response of an ideal low-pass filter with cut-off frequency 0.1π, plotted for n = -100 to 100.]

h_d[n] = sin(0.1πn) / (πn)
Once we have the analytic expression for the ideal desired impulse response, we can truncate it to the filter length that we desire. Of course, the longer we make the filter, the better we will approximate the ideal frequency response. Truncation of the ideal impulse response has one drawback, which is illustrated in the following figures:
[Figure: frequency responses of truncated low-pass filters of length 7 and length 21.]
These figures show the frequency response of the truncated impulse response at increasing lengths. As the length of the impulse response increases, the approximation of the ideal response becomes better, but the height of the ripples around the band edge does not decrease. This effect is known as the Gibbs phenomenon, and it does not disappear even as the filter length is increased further. To understand this problem and find a solution for it, we use the old idea that truncating a sequence is the same as multiplying the sequence by a rectangular window. And multiplication of two sequences in time is convolution of their spectra in frequency. The figure below visualizes the convolution of the spectrum of a rectangular window with the ideal desired frequency response of our filter.
[Figure: the frequency response of an ideal low-pass filter, the spectrum of a rectangular window, and their convolution, which produces ripples around the band edge (normalized frequency in π rad/sample).]
As we can see, the side-lobes of the rectangular window are causing the ripples around the band edge. And as we saw earlier when discussing windows: when we increase the length of the rectangular window, we decrease the width of its main-lobe, but we do not lower its side-lobes. To get lower side-lobes, we need to taper the ideal impulse response with a window instead of truncating it with a rectangular window. As the figure below shows, this significantly reduces the height of the band-edge ripples, at the expense of a wider transition band.
[Figure: frequency response of the truncated ideal impulse response (left) versus the tapered ideal impulse response (right); tapering lowers the band-edge ripples but widens the transition (normalized frequency in π rad/sample).]
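This truncation-versus-tapering comparison can be sketched numerically (Python/NumPy rather than Matlab; the length, cut-off, and stop-band edge are assumptions for illustration, and a Hamming window stands in for "a window"):

```python
import numpy as np

M = 51                                  # filter length (assumed)
n = np.arange(M) - (M - 1) // 2         # indices centered on 0
wc = 0.1 * np.pi                        # cut-off frequency, 0.1*pi rad/sample
hd = (wc / np.pi) * np.sinc(wc * n / np.pi)  # ideal LPF response sin(wc*n)/(pi*n)

h_rect = hd                             # truncation = rectangular window
h_hamm = hd * np.hamming(M)             # tapering with a Hamming window

# evaluate |H(e^{jw})| on a dense grid and compare peak stop-band levels
w = np.linspace(0, np.pi, 4096)
mag = lambda h: np.abs(np.exp(-1j * np.outer(w, np.arange(M))) @ h)
stop = w > 0.2 * np.pi                  # assumed stop-band edge
better = mag(h_hamm)[stop].max() < mag(h_rect)[stop].max()
print(better)  # True: tapering lowers the stop-band ripple
```

The rectangular-window design keeps its roughly -21 dB side-lobe ripple no matter the length, while the Hamming-tapered design trades a wider transition for much lower ripple.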
Frequency sampling (Matlab: fir2)
It may not always be easy (or possible) to find an analytic expression for the desired frequency response of our filter. In that case, we can use the frequency sampling technique to design an FIR filter. This technique works as follows. Suppose we have the desired frequency response of our filter. Instead of deriving an analytic expression for the impulse response of that filter, we sample the desired frequency response at N equispaced frequencies. We then take the inverse discrete-time Fourier transform of those samples to obtain the length-N FIR filter.
This technique yields an FIR filter that exactly meets our design requirements at the N frequency sample points. But outside of those points, the frequency response has the same ripple problem as we observed with the windowing technique.
[Figure: an ideal frequency response with its N = 16 samples, the resulting impulse response (n = 0-20), and the effective amplitude response, which ripples between the sample points (normalized frequency in π rad/sample).]
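A minimal sketch of the frequency sampling idea (Python/NumPy; Matlab's fir2 adds windowing and interpolation on top of this basic mechanism, and the band here is an arbitrary choice):

```python
import numpy as np

N = 16
k = np.arange(N)
# desired low-pass response sampled at N equispaced frequencies
# (chosen symmetric, Hd[k] = Hd[N-k], so the impulse response comes out real)
Hd = np.where(np.minimum(k, N - k) <= 3, 1.0, 0.0)
h = np.real(np.fft.ifft(Hd))                 # length-N FIR filter (zero-phase form)
# the design matches the desired response exactly at the N sample frequencies
exact = np.allclose(np.fft.fft(h, N).real, Hd, atol=1e-12)
print(exact)  # True
```

Between those N frequencies, however, the response is free to ripple, which is exactly the problem described above.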
We can reduce the height of the ripples by multiplying the impulse response by a window, but then we're no longer guaranteed that the FIR filter meets our design requirements exactly at the N frequency sample points. Another way to get more control over the height of the ripples is by incorporating transition bands in our design, and allowing the frequency samples in the transition bands to take on any value.
[Figure: an ideal frequency response sampled with a transition band, and the corresponding impulse response (n = 0-40).]
Then, we can freely choose the values of the frequency samples in the transition band to find an impulse response that minimizes the height of the ripple. Solutions to this optimization problem are available in table form for a selected number of designs. Matlab's Signal Processing Toolbox has the function firls, which can solve the optimization problem for any design:
[Figure: an ideal frequency response with optimized transition-band samples, and the corresponding impulse response (n = 0-40).]
Equiripple design (Matlab: firpm)
Although the windowing and frequency sampling design methods are easy to understand and implement, they have serious drawbacks:
- The band edges of a design cannot be specified precisely; instead, we have to accept whatever band edge locations we obtain after the design.
- The amount of pass-band ripple and stop-band ripple can't be controlled independently at the same time.
- The ripples are not uniformly distributed over the band intervals; they are higher near band edges and smaller away from band edges.
To solve these problems, Parks and McClellan developed what is now known as the Parks-McClellan algorithm. Their algorithm allows us to specify the actual band edges for our pass-band and stop-band, which the resulting design meets exactly, so there are no surprises there. It also allows independent control over the pass-band and stop-band ripple. And it distributes the ripples uniformly over the band intervals, which reduces the filter order required to satisfy the same specifications. The algorithm is too complicated to cover in detail in this course, but we will see how to use it in today's lab.
Practical quantization
Some more research into quantization uncovered some practical information about it:
- CDs use 16-bit quantization (65,536 signal levels; good)
- Digital telephones use 8-bit quantization (256 signal levels; OK/poor for speech)
- Signals generated in Matlab use 64 bits (more than 18 billion billion signal levels)
- Wav-files use 16-bit quantization by default, and are capable of 32 bits (more than 4 billion signal levels)
- If a signal uses the maximum range in signal levels (-1 to +1), then its average power level is roughly 0.7, and 16-bit quantization noise is of the order 20·log10((2 / 2^16) / 0.7) ≈ -87 dB
- For some other numbers of bits used in quantization, quantization noise is of the order: 1-bit = 3 dB, 2-bit = -3 dB, 4-bit = -15 dB, 8-bit = -39 dB, 16-bit = -87 dB, 32-bit = -184 dB, 64-bit = -376 dB
- If a signal doesn't use the maximum signal level range, then quantization noise becomes more audible
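The dB figures in the last two bullets can be reproduced with a one-line function (Python; the 0.7 average signal level is the rough figure used in the notes):

```python
import math

def quant_noise_db(bits, rms=0.7):
    # quantization step (2 / 2**bits) relative to the signal level, in dB
    return 20 * math.log10((2.0 / 2 ** bits) / rms)

for b in (1, 2, 4, 8, 16, 32, 64):
    print(b, round(quant_noise_db(b)))   # reproduces 3, -3, -15, ..., -376 dB
```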
We will ignore the precise details of these steps, and focus on the characteristics of the various types of IIR filters and the Matlab functions that generate them (according to the steps above).

Butterworth (Matlab: buttord and butter)
A Butterworth filter is characterized by the property that its magnitude response is flat in both the pass-band and the stop-band, and is monotonically decreasing, i.e., it has no ripples. As we can see in the magnitude-squared plot of the Butterworth filter, the magnitude response at F = 0 is always 1 for all N. The magnitude response at the cut-off frequency is always 1/√2 (i.e., -3 dB) for all N. As the order of the filter increases, the Butterworth filter approaches an ideal low-pass filter.

Chebyshev Type I (Matlab: cheb1ord and cheby1) and Chebyshev Type II (Matlab: cheb2ord and cheby2)
There are two types of Chebyshev filters. The Chebyshev Type I filters have an equiripple response in the pass-band, while the Chebyshev Type II filters have an equiripple response in the stop-band. Recall our discussion of equiripple FIR filters, where we saw that we can obtain lower-order filters that meet our design requirements when we choose a filter with equiripple rather than monotonic behavior. Likewise, Chebyshev filters provide a lower order than Butterworth filters for the same specifications.

Elliptic (Matlab: ellipord and ellip)
Elliptic filters exhibit equiripple behavior in the pass-band as well as in the stop-band. They are similar in magnitude response characteristics to the FIR equiripple filters. Therefore, elliptic filters are optimum filters in that they achieve the minimum order N for given specifications, or alternatively, achieve the sharpest transition band for a given order N.
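The Butterworth properties above follow from its magnitude-squared response, a standard result stated here for reference (Ω_c denotes the cut-off frequency):

```latex
|H(j\Omega)|^2 = \frac{1}{1 + (\Omega/\Omega_c)^{2N}}
```

At Ω = 0 this equals 1 for every order N, and at Ω = Ω_c it equals 1/2 (a magnitude of 1/√2, or -3 dB) for every N, which is exactly the behavior seen in the magnitude plots.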
In practice, we must choose among Butterworth, Chebyshev, and elliptic filters. The choice depends on both the filter order, which influences processing speed and implementation complexity, and the phase characteristics, which control the distortion.

When using an IIR filter with a non-linear phase response, it is usually not easy to tell from the phase response how the filter will distort a signal. Another form of filter analysis, called group delay, gives more insight into the distortion of the filter. The fourth plot of the IIR filter plots shows the group delay of each filter. In a group delay plot, the horizontal axis is frequency in Hz or normalized frequency in radians/sample, and the vertical axis is group delay in samples. Group delay is computed as the negative derivative of the phase response with respect to frequency. Group delay indicates how much nearby frequencies in a frequency region will be delayed by the filter. Nearby frequencies in a signal are not perceived individually, but as an average tone with an amplitude envelope. Group delay determines how much the filter will delay the envelope in each frequency region. This is illustrated by the example below:
[Figure: six panels illustrating group delay. Top row: component 1, x1(t) = cos(2π·37t)/4 + cos(2π·40t) + cos(2π·43t)/4, and component 2, x2(t) = cos(2π·77t)/4 + cos(2π·80t) + cos(2π·83t)/4. Middle row: outputs of a linear-phase filter, in which each cosine's phase shift is proportional to its frequency, so both envelopes are delayed by the same amount. Bottom row: outputs of a non-linear-phase filter, in which the phase shifts are not proportional to frequency, so the envelopes are delayed by different amounts.]
Imagine a signal with two components: a 40 Hz sinusoid with a 3 Hz envelope (top left) and an 80 Hz sinusoid with a 3 Hz envelope (top right). When this signal is filtered with an LTI system with a linear phase response (= constant group delay), then the envelope of both components of the signal is delayed by the same amount (second row, left and right), and the signal is not distorted. When the signal is filtered with an LTI system with a non-linear phase response, then the envelope of both components of the signal will be delayed by a different amount, causing distortion in the signal. If a linear phase response is desired, but an IIR filter must be used, there are two ways to improve the phase response of the IIR filter.
Forward-backward filtering
It is possible to remove all of the phase effects of an IIR filter with a non-linear phase response by reversing the filtered signal in time, running it through the filter again, and reversing the twice-filtered signal in time again. This technique of forward-backward filtering is based on a property of the Fourier transform: the phase response of the backward filtering operation precisely cancels the phase response of the forward filtering operation. The magnitude response of both passes is the same, so the twice-filtered signal's magnitude spectrum is multiplied by the filter's magnitude response twice. The filtfilt function in Matlab's Signal Processing Toolbox implements forward-backward filtering. There are two catches:
- The IIR filter used in forward-backward filtering must be designed with very little pass-band ripple, but it may have half the stop-band suppression. Because the filter is applied twice, the pass-band ripple will double, as will the stop-band suppression.
- It cannot be applied in real-time, because the entire signal must be known in advance in order to reverse it in time and run it backward through the filter.

All-pass filters
Another way to improve the phase response of an IIR filter is to cascade it with, i.e., follow it by, an all-pass filter. An all-pass filter has a magnitude response that is exactly 1 at all frequencies; in other words, it does not change the magnitude of any frequencies present in the input. All-pass systems are designed specifically for their phase response. It is usually possible to design an all-pass filter that compensates the non-linear phase response of an IIR filter in its pass-band. The overall system may not be exactly linear phase, but it will be closer than without the all-pass filter. This technique can be used in real-time applications, because no signal reversal is required.
Cascading filters will, however, increase the total delay in the overall system.
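Both techniques can be illustrated with a short numerical sketch. This is my own example in Python with SciPy, not from the notes (filtfilt itself also handles edge padding, which this naive version omits): forward-backward filtering yields a zero-phase overall response, and a first-order all-pass section has unit magnitude at every frequency.

```python
import numpy as np
from scipy.signal import butter, freqz, lfilter

# --- Forward-backward filtering (what filtfilt automates) ---
b, a = butter(4, 0.2)             # 4th-order Butterworth: non-linear phase

x = np.zeros(1025)
x[512] = 1.0                      # impulse far from the signal edges

y = lfilter(b, a, x)              # forward pass
y = lfilter(b, a, y[::-1])[::-1]  # reverse, filter again, reverse back

# Zero phase overall: the composite impulse response is symmetric
# about the original impulse position, i.e. no phase distortion.
sym_err = np.max(np.abs(y - y[::-1]))
print(sym_err)                    # close to machine precision

# --- First-order all-pass section: H(z) = (c + z^-1) / (1 + c z^-1) ---
c = 0.5
w, H = freqz([c, 1.0], [1.0, c], worN=512)
mag_err = np.max(np.abs(np.abs(H) - 1.0))
print(mag_err)                    # magnitude is 1 at every frequency
```

Note that the all-pass section leaves every magnitude untouched while contributing only phase, which is exactly the degree of freedom needed to flatten the group delay of the IIR filter it follows.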
[Figure: magnitude response (dB, left) and group delay (in samples, right) of Butterworth low-pass filters of order N = 1, 2, and 10]
[Figure: magnitude response (dB, left) and group delay (in samples, right) of Chebyshev Type I low-pass filters of order N = 1, 2, and 10]
[Figure: magnitude response (dB, left) and group delay (in samples, right) of Chebyshev Type II low-pass filters of order N = 1, 2, and 10]
[Figure: magnitude response (dB, left) and group delay (in samples, right) of elliptic low-pass filters of order N = 1, 2, and 10]
Introduction
In the second week of this course we discussed short-term frequency analysis of speech signals in the form of the short-time Fourier transform and the spectrogram. Those analysis tools are generic in the sense that they do not require any prior knowledge about a signal to be applied successfully. This is a great strength of these techniques and one of the main reasons they have been applied so widely to many kinds of signals. On the other hand, precisely because they are so general, these techniques do not give very specific information about speech signals. If we are mainly interested in frequency analysis of speech signals, we may want to use techniques that yield speech-specific information, such as the fundamental frequency and the formant frequencies. In this lecture we discuss a common model of speech signals that is used to perform more specific frequency analysis of speech, and use that model to perform linear prediction of speech signals.
Text and figures copied/adapted from Spoken Language Processing, X. Huang et al.
properly adjusted, the reduced pressure allows the cords to come together, and the cycle is repeated. This condition of sustained oscillation occurs for voiced sounds, and is illustrated in the figure below.
This glottal excitation can be further separated into an impulse train that drives a glottal pulse FIR filter g[n]:
For unvoiced sounds, the airflow between the lungs and the vocal tract is obstructed only slightly, if at all, by the vocal cords. In that case, the glottal excitation consists mostly of turbulence, which is modeled as random noise:
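A minimal numerical sketch of these excitation models, in Python (the pulse shape and parameter values below are illustrative assumptions, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)

def impulse_train(n_samples, period):
    """Unit impulses every `period` samples (the pitch period)."""
    e = np.zeros(n_samples)
    e[::period] = 1.0
    return e

# Crude stand-in for the glottal pulse g[n]: a short raised-cosine bump.
# (A realistic glottal pulse is smoother and asymmetric.)
g = 0.5 * (1.0 - np.cos(2 * np.pi * np.arange(20) / 20))

n = 800
period = 80                       # e.g. 100 Hz pitch at Fs = 8000 Hz

# Voiced: impulse train filtered by the glottal pulse filter g[n].
voiced = np.convolve(impulse_train(n, period), g)[:n]
# Unvoiced: turbulence modeled as white noise.
unvoiced = rng.standard_normal(n)
# Mixed excitation (e.g. voiced fricatives): sum of both components.
mixed = voiced + 0.3 * unvoiced

# The voiced excitation repeats with the pitch period:
print(np.allclose(voiced[period:2 * period], voiced[2 * period:3 * period]))  # True
```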
This model of the glottal excitation is a decent approximation, but it fails on voiced fricatives, because those sounds contain both a periodic component and an aspirated component. In this case, a mixed excitation model can be applied, using the sum of an impulse train and random noise.
Lossless tube concatenation
A widely used model for speech production is based on the assumption that the vocal tract can be represented as a concatenation of lossless tubes, as shown in the figure below:
The constant cross-sectional areas A1, A2, ..., A5 of the tubes approximate the continuous area function A(x) of the vocal tract. This model ignores a number of properties of the vocal tract, such as its three-dimensional bend, its elasticity, its viscosity, and thermal conduction. By leaving those aspects out of the model, the sound waves in the tubes satisfy a pair of differential equations, which can be solved to find the system that models the vocal tract frequency response. In general, the concatenation of N lossless tubes results in an IIR system with an N-th order feedback sequence and a feed-forward sequence that is only a gain. The N-th order
feedback sequence causes at most N/2 resonances, or formants, in the vocal tract. These resonances occur when a given frequency gets trapped in the vocal tract because it is reflected at the lips and then again at the glottis. The number of tubes required to accurately model the formants in a speech signal generated by a given vocal tract depends on the physical length of the vocal tract L, the sampling frequency of the speech signal Fs, and the speed of sound c, as follows:

N = 2 L Fs / c

For example, for Fs = 8000 Hz, c = 34000 cm/s, and L = 17 cm, the average length of an adult male vocal tract, we obtain N = 8, or equivalently 4 formants. Experimentally, the vocal tract system has been observed to have approximately 1 formant per kHz. Shorter vocal tract lengths (females or children) have fewer resonances per kHz, and vice versa.
Source-filter models of speech production
For a total model of human speech production, we combine the glottal excitation with the lossless tube concatenation model into a mixed excitation model, as shown below:
[Figure: source-filter model of speech production, with vocal tract system H]
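The lossless-tube discussion above can be checked numerically. A sketch in Python (the reflection-coefficient values below are made up for illustration; the step-up recursion is the standard way to go from junction reflection coefficients to all-pole filter coefficients):

```python
import numpy as np

# --- Tube count / formant count: N = 2 * L * Fs / c ---
Fs = 8000       # sampling frequency in Hz
c = 34000       # speed of sound in cm/s
L = 17          # adult male vocal tract length in cm

N = 2 * L * Fs / c          # number of lossless tubes
formants = N / 2            # at most N/2 formants
print(N, formants)          # 8.0 tubes, 4.0 formants

# --- Stability of the resulting all-pole filter ---
def reflection_to_allpole(ks):
    """Step-up recursion: turn reflection coefficients k_1..k_N (one per
    tube junction) into the coefficients of A(z) = 1 + a_1 z^-1 + ... +
    a_N z^-N, the denominator of the all-pole vocal tract filter 1/A(z)."""
    a = np.array([1.0])
    for k in ks:
        a = np.concatenate([a, [0.0]]) + k * np.concatenate([[0.0], a[::-1]])
    return a

# Made-up reflection coefficients for an 8-tube model, all |k| < 1.
ks = [0.7, -0.5, 0.3, -0.2, 0.4, -0.1, 0.2, -0.3]
a = reflection_to_allpole(ks)

# With every |k| < 1, all poles lie inside the unit circle: stable filter.
print(np.max(np.abs(np.roots(a))) < 1.0)  # True
```

The same all-pole form is what linear prediction estimates directly from the speech signal, which is why the lossless-tube model underlies the analysis in the rest of this lecture.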