Module3 SSP
ECE3028
BY
DR. K. GOWRI
ASSISTANT PROFESSOR,
DEPARTMENT OF ECE,
PRESIDENCY UNIVERSITY,
BANGALORE
Module 3
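• For sustained unvoiced speech, assuming white-noise excitation with unit power, the power spectrum of the output is the squared magnitude of the frequency response of the system (vocal tract followed by radiation) that shapes the noise.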
• Thus, it is to be expected that the Fourier spectrum of the output would reflect
the properties of the excitation, the vocal tract and radiation frequency
responses.
• However, although vowels and fricatives can be sustained for several seconds
with little variation, natural speech is continually changing in time.
Introduction- Frequency Domain Representations
• Thus the standard Fourier representations that are appropriate for periodic,
transient, or stationary random signals are not directly applicable to the
representation of speech signals.
• However, temporal properties such as energy, zero crossings, and correlation vary slowly (the short-time analysis principle), so they can be assumed to be fixed over time intervals on the order of 10 to 40 msec.
• In order to study spectral properties of speech signals, we will define a time-
varying Fourier transform, which we generally refer to as the short-time Fourier
transform (STFT).
• Use of the STFT is therefore termed short-time Fourier analysis (STFA).
• The STFT is invertible in the sense that, with certain constraints, we can recover
the original sampled signal by a process that we term short-time Fourier
synthesis (STFS).
• Indeed, STFA/STFS provides a representation of the speech waveform that can
serve as the basis for many types of speech processing including coding and
various types of signal enhancement.
• This is depicted in Figure 2, which shows that the processing can be controlled
by “side information” extracted by other means from the speech signal.
Discrete time Fourier Analysis
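• The discrete-time Fourier transform (DTFT) of a discrete-time signal, x[n], and its inverse are given by the pair of equations,
$$X(e^{j\omega}) = \sum_{n=-\infty}^{\infty} x[n]\, e^{-j\omega n}$$
$$x[n] = \frac{1}{2\pi}\int_{-\pi}^{\pi} X(e^{j\omega})\, e^{j\omega n}\, d\omega \qquad (1)$$
• where ω is the normalized frequency variable of $X(e^{j\omega})$ in units of radians.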
• The discrete Fourier transform (DFT) is inherently a representation of periodic sequences, but it is applicable to finite-length sequences if care is taken to ensure that one period is precisely equal to the desired finite-length sequence.
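• The DFT and its inverse are given by the equations,
$$X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j(2\pi/N)kn}, \qquad k = 0, 1, \ldots, N-1$$
$$x[n] = \frac{1}{N}\sum_{k=0}^{N-1} X[k]\, e^{j(2\pi/N)kn}, \qquad n = 0, 1, \ldots, N-1$$
• The DFT and DTFT can both be used as mathematical representations of a finite-length sequence; specifically, the DFT and the DTFT of a finite-length sequence are related by,
$$X[k] = X(e^{j\omega})\Big|_{\omega = 2\pi k/N}, \qquad k = 0, 1, \ldots, N-1$$
that is, the DFT is a sampled (in frequency) version of the DTFT.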
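• The sampling relation between the DFT and the DTFT can be checked numerically. The following is a minimal sketch in Python with NumPy; the helper name `dtft` and the random test signal are assumptions introduced only for illustration.

```python
import numpy as np

def dtft(x, omega):
    """DTFT of a finite-length sequence x[n], n = 0..len(x)-1,
    evaluated at each frequency in omega (radians)."""
    n = np.arange(len(x))
    return np.exp(-1j * np.outer(omega, n)) @ x

rng = np.random.default_rng(0)           # arbitrary test signal (assumption)
N = 8
x = rng.standard_normal(N)

omega_k = 2 * np.pi * np.arange(N) / N   # DFT frequencies 2*pi*k/N
X_sampled_dtft = dtft(x, omega_k)        # DTFT sampled at omega_k
X_dft = np.fft.fft(x)                    # N-point DFT

max_err = np.max(np.abs(X_sampled_dtft - X_dft))
print(max_err)                           # expected to be near machine precision
```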
SHORT-TIME FOURIER ANALYSIS
• Define the time-dependent, or short-time, Fourier transform (STFT) as,
$$X_{\hat{n}}(e^{j\hat{\omega}}) = \sum_{m=-\infty}^{\infty} w[\hat{n} - m]\, x[m]\, e^{-j\hat{\omega} m} \qquad (1)$$
• where $w[\hat{n} - m]$ is a real window sequence whose purpose is to determine the portion of the input signal that receives emphasis at a particular time index, $\hat{n}$.
• The time-dependent Fourier transform is a complex function of two variables:
1. the time index, $\hat{n}$, which is discrete,
2. the frequency variable, $\hat{\omega}$, which is continuous and periodic, with period 2π.
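• The STFT definition can be evaluated directly for one time index and one frequency. The sketch below assumes a causal L-point window stored as `w[0..L-1]` and a signal that is zero outside its stored samples; the function name `stft_point` and the test signal are hypothetical choices for illustration.

```python
import numpy as np

def stft_point(x, w, n_hat, omega_hat):
    """Evaluate sum_m w[n_hat - m] x[m] exp(-j*omega_hat*m) for a causal
    window w[0..L-1]; x is treated as zero outside 0..len(x)-1."""
    L = len(w)
    m_lo = max(0, n_hat - L + 1)          # first m with w[n_hat - m] nonzero
    m_hi = min(len(x) - 1, n_hat)         # last m with w[n_hat - m] nonzero
    m = np.arange(m_lo, m_hi + 1)
    return np.sum(w[n_hat - m] * x[m] * np.exp(-1j * omega_hat * m))

x = np.cos(0.3 * np.pi * np.arange(200))  # sinusoid at 0.3*pi rad (assumption)
w = np.hamming(40)                        # 40-point Hamming window (assumption)

a = stft_point(x, w, 100, 0.3 * np.pi)              # at the sinusoid frequency
b = stft_point(x, w, 100, 0.3 * np.pi + 2 * np.pi)  # one period (2*pi) later
c = stft_point(x, w, 100, 0.8 * np.pi)              # far from the sinusoid
print(abs(a - b), abs(a), abs(c))
```

• Here `a` and `b` agree because the STFT is periodic in the frequency variable with period 2π, and `abs(a)` is much larger than `abs(c)` because the window emphasizes a segment whose energy is concentrated near 0.3π.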
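• A plot showing the domain of the two variables is given in Figure 3, with
1. $\hat{n}$ shown for the range 0 ≤ $\hat{n}$ ≤ 8 ($\hat{n}$ is defined for all integer values, but only a few are shown in this figure) and
2. $\hat{\omega}$ shown for 0 ≤ $\hat{\omega}$ < 2π (since the STFT is periodic in $\hat{\omega}$ with period 2π). Alternatively, we could use the range −π < $\hat{\omega}$ ≤ π.
• An alternative form of Eq. (1) is obtained by a change of summation index, which yields the expression,
$$X_{\hat{n}}(e^{j\hat{\omega}}) = \sum_{m=-\infty}^{\infty} w[m]\, x[\hat{n} - m]\, e^{-j\hat{\omega}(\hat{n} - m)} = e^{-j\hat{\omega}\hat{n}} \sum_{m=-\infty}^{\infty} x[\hat{n} - m]\, w[m]\, e^{j\hat{\omega} m}$$
FIGURE 3 Domain of variables $\hat{n}$ and $\hat{\omega}$ in the definition of the STFT.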
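• If we define
$$\tilde{X}_{\hat{n}}(e^{j\hat{\omega}}) = \sum_{m=-\infty}^{\infty} x[\hat{n} - m]\, w[m]\, e^{j\hat{\omega} m} = \sum_{m=-\infty}^{\infty} x[\hat{n} + m]\, w[-m]\, e^{-j\hat{\omega} m},$$
then the STFT can be written as $X_{\hat{n}}(e^{j\hat{\omega}}) = e^{-j\hat{\omega}\hat{n}}\, \tilde{X}_{\hat{n}}(e^{j\hat{\omega}})$.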
DTFT Interpretation
DFT Implementation
DTFT Interpretation
• Consider $X_{\hat{n}}(e^{j\hat{\omega}})$ as the DTFT of the sequence $w[\hat{n} - m]x[m]$, −∞ < m < ∞, for fixed $\hat{n}$.
• The time-dependent Fourier transform is a function of the time index, $\hat{n}$, which takes on all integer values so as to "slide" the window, $w[\hat{n} - m]$, along the sequence, x[m].
• This is depicted in Figure 4, which shows x[m] and $w[\hat{n} - m]$ as functions of m for several values of $\hat{n}$.
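• Since $X_{\hat{n}}(e^{j\hat{\omega}})$ is the DTFT of $w[\hat{n} - m]x[m]$, the windowed sequence can be recovered by the inverse DTFT,
$$w[\hat{n} - m]\, x[m] = \frac{1}{2\pi}\int_{-\pi}^{\pi} X_{\hat{n}}(e^{j\hat{\omega}})\, e^{j\hat{\omega} m}\, d\hat{\omega} \qquad (4)$$
• Note that the integration could be over any interval of length 2π (e.g., 0 to 2π).
• Now if w[0] ≠ 0, Eq. (4) can be evaluated for m = $\hat{n}$, thereby obtaining,
$$x[\hat{n}] = \frac{1}{2\pi\, w[0]}\int_{-\pi}^{\pi} X_{\hat{n}}(e^{j\hat{\omega}})\, e^{j\hat{\omega}\hat{n}}\, d\hat{\omega} \qquad (5)$$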
DFT Implementation
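• For the N frequencies ω_k = 2πk/N, define
$$\tilde{X}_{\hat{n}}[k] = \sum_{m=0}^{L-1} x[\hat{n} + m]\, w[-m]\, e^{-j(2\pi/N)km}, \qquad k = 0, 1, \ldots, N-1,$$
• where the window is chosen to be an L-point non-causal window such that w[−m] ≠ 0 only in 0 ≤ m ≤ L − 1 and L ≤ N.
• It follows that the STFT at time $\hat{n}$ and frequencies ω_k = 2πk/N is,
$$X_{\hat{n}}[k] = X_{\hat{n}}(e^{j(2\pi/N)k}) = e^{-j(2\pi/N)k\hat{n}}\, \tilde{X}_{\hat{n}}[k], \qquad k = 0, 1, \ldots, N-1 \qquad (6)$$
• The sum defining $\tilde{X}_{\hat{n}}[k]$ should be recognized as the N-point DFT of the windowed sequence $\tilde{x}_{\hat{n}}[m] = x[\hat{n} + m]w[-m]$ for 0 ≤ m ≤ L − 1 (zero-padded to length N when N > L).
• Hence $\tilde{X}_{\hat{n}}[k]$ can be computed efficiently by an FFT algorithm if N is a power of two or some other highly composite number.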
Steps for DFT
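• Thus, to compute the sampled STFT, we iterate the following steps:
• 1. Select a set of L samples starting at $\hat{n}$. (For a causal window, we would take sample $\hat{n}$ and the L − 1 samples preceding $\hat{n}$.)
• 2. Multiply by the window w[−m] to form $\tilde{x}_{\hat{n}}[m] = x[\hat{n} + m]w[-m]$, for m = 0, 1, ... , L − 1.
• 3. Compute the N-point DFT $\tilde{X}_{\hat{n}}[k]$ of $\tilde{x}_{\hat{n}}[m]$ using a fast (FFT) algorithm.
• 4. If the magnitude and phase of $X_{\hat{n}}[k]$ are required, use Eq. (6). Otherwise, note that $|X_{\hat{n}}[k]| = |\tilde{X}_{\hat{n}}[k]|$, so the phase factor can be skipped when only the magnitude is needed.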
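• The steps above can be sketched as follows. This is an illustrative sketch, not a definitive implementation: the array `w_rev` is assumed to hold the non-causal window values w[−m] for m = 0, ..., L − 1, and the function name, signal, and sizes are hypothetical. The result is compared with a direct evaluation of the STFT sum.

```python
import numpy as np

def stft_frame_fft(x, w_rev, n_hat, N):
    """One STFT frame via the FFT. w_rev[m] holds w[-m], m = 0..L-1 (L <= N)."""
    L = len(w_rev)
    segment = x[n_hat:n_hat + L]                 # step 1: L samples from n_hat
    x_tilde = segment * w_rev                    # step 2: apply the window
    X_tilde = np.fft.fft(x_tilde, n=N)           # step 3: N-point DFT (zero-padded)
    k = np.arange(N)
    X = np.exp(-2j * np.pi * k * n_hat / N) * X_tilde   # step 4: phase factor
    return X_tilde, X

rng = np.random.default_rng(1)                   # arbitrary test signal (assumption)
x = rng.standard_normal(200)
L, N, n_hat = 32, 64, 50
w_rev = np.hamming(L)                            # assumed window values w[-m]

X_tilde, X = stft_frame_fft(x, w_rev, n_hat, N)

# Direct evaluation of sum_m w[n_hat - m] x[m] exp(-j*omega_k*m) for comparison
m = np.arange(n_hat, n_hat + L)
omega_k = 2 * np.pi * np.arange(N) / N
X_direct = np.array([np.sum(w_rev[m - n_hat] * x[m] * np.exp(-1j * wk * m))
                     for wk in omega_k])

err = np.max(np.abs(X - X_direct))
mag_err = np.max(np.abs(np.abs(X) - np.abs(X_tilde)))
print(err, mag_err)                              # both expected near machine precision
```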
Sampling Rates in Time and Frequency
• The STFT is a complex two-dimensional representation of the one-dimensional real speech signal x[n].
• That is, $X_n(e^{j\hat{\omega}})$ is a function of both the discrete index n, which represents time, and the continuous normalized radian analysis frequency $\hat{\omega}$.
• As such, $X_n(e^{j\hat{\omega}})$ is like a (complex-valued) two-dimensional image with one discrete dimension (n) and one continuous dimension ($\hat{\omega}$).
• Figure 5a shows the region of support for $X_n(e^{j\hat{\omega}})$ in two dimensions.
• A basic consideration in the digital implementation of systems for STFA is the rate at which $X_n(e^{j\hat{\omega}})$ should be sampled in both the time and frequency dimensions to provide an unaliased representation of $X_n(e^{j\hat{\omega}})$ from which x[n] can be exactly recovered.
FIGURE 5 Domain of STFT variables $\hat{\omega}$ and n for (a) the case with no sampling and (b) the case with sampling based on the bandwidth and time width of the lowpass window, w[n].
• Figure 5b shows the discrete region of support when $X_n(e^{j\hat{\omega}})$ is sampled in time with an interval of R samples and in frequency at the frequencies ω_k = 2πk/N, as in,
$$X_{rR}[k] = X_{rR}(e^{j(2\pi/N)k}) = \sum_{m=-\infty}^{\infty} w[rR - m]\, x[m]\, e^{-j(2\pi/N)km} \qquad (7)$$
• where k = 0, 1, ... , N − 1 and r ranges over the integers.
• The summation limits are set by the window; for example, for an L-point causal Hamming window, the limits would be finite, ranging from m = rR − L + 1 to m = rR.
• R and N should be chosen carefully, so that the speech signal can be reconstructed from its sampled STFT.
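• A minimal sketch of the sampled STFT for a causal L-point window, using a hop of R samples and N frequency samples per frame. The function name `sampled_stft`, the choice to skip frames whose window would extend before the first sample, and the values of L, R, and N are assumptions made for illustration.

```python
import numpy as np

def sampled_stft(x, w, R, N):
    """Return (r_values, frames) with frames[i, k] = X_{rR}[k] for a causal
    window w[0..L-1]. Only frames with rR - L + 1 >= 0 are computed."""
    L = len(w)
    r_values = np.arange(int(np.ceil((L - 1) / R)), (len(x) - 1) // R + 1)
    k = np.arange(N)
    frames = []
    for r in r_values:
        start = r * R - L + 1                    # first sample under the window
        segment = x[start:start + L] * w[::-1]   # w[rR - m] x[m], m = start..rR
        # The FFT indexes the segment from 0, so restore the phase for m = start
        frames.append(np.exp(-2j * np.pi * k * start / N) * np.fft.fft(segment, n=N))
    return r_values, np.array(frames)

rng = np.random.default_rng(2)                   # arbitrary test signal (assumption)
x = rng.standard_normal(400)
L, R, N = 100, 25, 128                           # illustrative sizes (assumption)
w = np.hamming(L)

r_values, frames = sampled_stft(x, w, R, N)

# Check the first computed frame against the defining sum
r = r_values[0]
m = np.arange(r * R - L + 1, r * R + 1)
direct = np.array([np.sum(w[r * R - m] * x[m] * np.exp(-2j * np.pi * kk * m / N))
                   for kk in range(N)])
frame_err = np.max(np.abs(frames[0] - direct))
print(frames.shape, frame_err)
```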
• Sampling rates lower than the theoretical minimum rate required to avoid aliasing can be used in either the time or the frequency dimension, and x[n] can still be exactly recovered from the aliased (under-sampled) short-time transform.
• Such under-sampled representations are quite useful in applications such as spectral estimation, pitch and formant analysis, and digital speech spectrograms, as well as in speech and audio coding.
Sampling Rate of $X_n(e^{j\hat{\omega}})$ in Time
• The linear filtering interpretation is used to determine the required sampling rate in the time dimension.
• For a fixed value of $\hat{\omega}$, $X_n(e^{j\hat{\omega}})$ is the output of a linear filter with impulse response w[n] whose input is the frequency-shifted signal $x[n]e^{-j\hat{\omega} n}$.
• The DTFT of the window, $W(e^{j\omega})$, has the properties of a (non-ideal) low-pass filter frequency response.
• Let us denote the effective bandwidth of the analysis window as B Hz.
• Thus the sequence $X_n(e^{j\hat{\omega}})$ (as a function of n with $\hat{\omega}$ fixed) has bandwidth determined by the DTFT of the window, and therefore, according to the sampling theorem, it must be sampled at a rate of at least 2B samples/sec to avoid aliasing.
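• As an example, consider an (L = 2M + 1)-point Hamming window,
$$w[n] = 0.54 - 0.46\cos(\pi n/M), \qquad 0 \le n \le 2M, \quad \text{and } w[n] = 0 \text{ otherwise.}$$
• Then the approximate filter bandwidth of $W(e^{j\omega})$ in terms of analog frequencies is,
$$B = \frac{2F_s}{L}\ \text{Hz},$$
so the STFT must be sampled in time at a rate of at least 2B = 4F_s/L samples/sec, i.e., at least once every R = L/4 samples.
Sampling Rate in STFT Frequency $\hat{\omega}$
• Since the STFT is periodic in $\hat{\omega}$ with period 2π, only the N frequencies ω_k = 2πk/N over one period need to be sampled. The windowed segment x[m]w[rR − m], which has at most L nonzero samples, can be recovered exactly from these N samples by an inverse DFT if N ≥ L.
• Otherwise time aliasing will occur; i.e., the inverse DFT evaluated in the interval rR − L + 1 ≤ m ≤ rR will be composed of a sum of shifted (by N) copies of x[m]w[rR − m].
• Thus, for the example of a Hamming window of duration L = 100 samples, we require $X_{rR}(e^{j\hat{\omega}})$ to be evaluated for at least 100 uniformly spaced frequencies.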
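• The time-aliasing effect of taking fewer than L frequency samples can be illustrated numerically. In this sketch a windowed segment of length L = 12 (an assumed value) has its DTFT sampled at N = 16 and at N = 8 frequencies; the helper name `sample_dtft` is hypothetical.

```python
import numpy as np

def sample_dtft(v, N):
    """Sample the DTFT of v[n], n = 0..len(v)-1, at omega_k = 2*pi*k/N."""
    n = np.arange(len(v))
    omega_k = 2 * np.pi * np.arange(N) / N
    return np.exp(-1j * np.outer(omega_k, n)) @ v

rng = np.random.default_rng(3)                   # arbitrary segment (assumption)
L = 12
v = rng.standard_normal(L) * np.hamming(L)       # windowed segment of length L

rec_ok = np.fft.ifft(sample_dtft(v, 16)).real    # N = 16 >= L: no time aliasing
rec_alias = np.fft.ifft(sample_dtft(v, 8)).real  # N = 8 < L: time aliasing

# With N = 8, samples v[8..11] fold back onto v[0..3] (copies shifted by N)
expected_alias = v[:8].copy()
expected_alias[:4] += v[8:]

print(np.allclose(rec_ok[:L], v), np.allclose(rec_alias, expected_alias))
```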
Total Sampling Rate
• We can determine the total number of samples of $X_n(e^{j\hat{\omega}})$ that must be computed per second to give an unaliased representation of the original signal x[n].
• The minimum sampling rate of $X_n(e^{j\hat{\omega}})$ in the time dimension is 2B, where B is the frequency bandwidth of the window, and the minimum number of samples in the frequency dimension is L, the time width of the window. The total sampling rate is therefore,
$$SR = 2B \cdot L\ \text{samples/sec} \qquad (8)$$
• For most practical windows, B can be represented as a multiple of (Fs/L), where Fs is the sampling frequency of x[n]; i.e.,
$$B = C\,\frac{F_s}{L}\ \text{Hz} \qquad (9)$$
where C is a constant of proportionality, so that SR = 2C·Fs samples/sec.
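• As a worked example of the total sampling rate, assume a Hamming window, for which the bandwidth is approximately B = 2Fs/L (i.e., C = 2), with the illustrative values Fs = 10 kHz and L = 100 samples (the value of Fs is an assumption chosen only for the arithmetic):

```latex
\begin{align*}
B  &= C\,\frac{F_s}{L} = 2 \cdot \frac{10000}{100} = 200\ \text{Hz} \\
2B &= 400\ \text{samples/sec in time}
      \;\Rightarrow\; R \le \frac{F_s}{2B} = \frac{10000}{400} = 25\ \text{samples} = \frac{L}{4} \\
SR &= 2B \cdot L = 400 \cdot 100 = 40000\ \text{samples/sec} = 2C\,F_s = 4F_s
\end{align*}
```

• Under these assumptions, the unaliased STFT carries four times as many samples per second as the original signal.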
FILTER BANK SUMMATION METHOD OF SYNTHESIS
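• Consider the STFT sampled in frequency,
$$X_n(e^{j\omega_k}) = \sum_{m=-\infty}^{\infty} w[n - m]\, x[m]\, e^{-j\omega_k m},$$
• where for uniform sampling in frequency, ω_k = 2πk/N, with k = 0, 1, ... , N − 1.
• These are the standard DFT frequencies, so we focus on the STFS equation,
$$y[n] = \frac{1}{N}\sum_{k=0}^{N-1} X_n(e^{j\omega_k})\, e^{j\omega_k n} \qquad (10)$$
• This is simply the inverse DFT of $X_n(e^{j\omega_k})$ evaluated at the particular time n.
• The process of STFS from the point of view of the linear filtering interpretation of the STFT is discussed in this topic.
• The method of synthesis that emerges from this interpretation is called the filter bank summation (FBS) method of short-time synthesis.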
• Before exploring the filter bank interpretation, observe that if N ≥ L, where L is the window length, the inverse DFT of $X_n(e^{j\omega_k})$ produces w[n − m]x[m] for n − m inside the region of support of the window.
• If m = n,
$$y[n] = \frac{1}{N}\sum_{k=0}^{N-1} X_n(e^{j\omega_k})\, e^{j\omega_k n} = w[0]\, x[n] \qquad (11)$$
• That is, Eq. (11) exactly reconstructs x[n] to within a constant multiplier, and if w[0] > 0, we can divide by w[0] to obtain x[n].
• Therefore, for all n, Eq. (10) is the desired synthesis equation and w[0] is the scale factor on the synthesized output.
• $X_n(e^{j\omega_k})$ has the interpretation of low-pass filtering following frequency down-shifting by ω_k.
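• The reconstruction property y[n] = w[0]x[n] for N ≥ L can be checked numerically. The sketch below assumes a causal L-point window stored as `w[0..L-1]` and the 1/N-normalized synthesis sum (inverse DFT at time n); the function name `fbs_synthesis` and the test values are hypothetical.

```python
import numpy as np

def fbs_synthesis(x, w, N):
    """For each n >= L-1, form X_n(e^{j*omega_k}), k = 0..N-1, for a causal
    window w[0..L-1], then evaluate (1/N) * sum_k X_n(e^{j*omega_k}) e^{j*omega_k*n}."""
    L = len(w)
    if N < L:
        raise ValueError("this sketch assumes N >= L")
    k = np.arange(N)
    y = np.zeros(len(x), dtype=complex)
    for n in range(L - 1, len(x)):
        start = n - L + 1
        segment = x[start:n + 1] * w[::-1]       # w[n - m] x[m], m = start..n
        X_n = np.exp(-2j * np.pi * k * start / N) * np.fft.fft(segment, n=N)
        y[n] = np.mean(X_n * np.exp(2j * np.pi * k * n / N))
    return y

rng = np.random.default_rng(4)                   # arbitrary test signal (assumption)
x = rng.standard_normal(64)
L = N = 16
w = np.hamming(L)                                # w[0] = 0.08 is nonzero

y = fbs_synthesis(x, w, N)
recon_err = np.max(np.abs(y[L - 1:] - w[0] * x[L - 1:]))
print(recon_err)                                 # expected near machine precision
```

• Dividing `y` by `w[0]` then recovers the input for all n at which a full window of samples is available.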
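• With a change of summation variable (replacing m by n − m), we have the alternative form,
$$X_n(e^{j\omega_k}) = \sum_{m=-\infty}^{\infty} w[m]\, x[n - m]\, e^{-j\omega_k (n - m)} = e^{-j\omega_k n} \sum_{m=-\infty}^{\infty} x[n - m]\, w[m]\, e^{j\omega_k m} \qquad (12)$$
• With the definition,
$$h_k[n] = w[n]\, e^{j\omega_k n} \qquad (13)$$
• Eq. (12) becomes,
$$X_n(e^{j\omega_k}) = e^{-j\omega_k n} \sum_{m=-\infty}^{\infty} x[n - m]\, h_k[m] \qquad (14)$$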
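• Recall that the window w[n] has the properties of a low-pass filter.
• Eq. (14) can be interpreted as in Figure 6 as a band-pass filter with impulse response $h_k[n] = w[n]e^{j\omega_k n}$ followed by frequency down-shifting by modulation with the complex exponential $e^{-j\omega_k n}$.
• The frequency response of the k-th band-pass filter is the window response shifted to ω_k,
$$H_k(e^{j\omega}) = W(e^{j(\omega - \omega_k)}) \qquad (16)$$
• If we consider the entire collection of band-pass filters, each having the same input and their outputs added together as in Figure 9, the composite frequency response relating y[n] to x[n] is,
$$\tilde{H}(e^{j\omega}) = \frac{1}{N}\sum_{k=0}^{N-1} H_k(e^{j\omega}) = \frac{1}{N}\sum_{k=0}^{N-1} W(e^{j(\omega - \omega_k)}) \qquad (19)$$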
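• For a window of duration L ≤ N, the composite frequency response is constant,
$$\tilde{H}(e^{j\omega}) = \frac{1}{N}\sum_{k=0}^{N-1} W(e^{j(\omega - \omega_k)}) = w[0] \qquad (20)$$
• To derive Eq. (20), recall that $W(e^{j\omega_k})$ is the DFT of the window.
• Therefore, the inverse DFT of $W(e^{j\omega_k})$ is,
$$\frac{1}{N}\sum_{k=0}^{N-1} W(e^{j\omega_k})\, e^{j\omega_k n} = \sum_{r=-\infty}^{\infty} w[n + rN] \qquad (21)$$
• For a window of duration L ≤ N, evaluating Eq. (21) at n = 0 leaves only the r = 0 term, w[0]; applying the same result to the modulated window $w[n]e^{-j\omega n}$ gives Eq. (20) for every ω.
• We have used the concept of a bank of filters to confirm a very important result that we have already observed from the DFT viewpoint.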
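• The flatness of the composite frequency response can be checked numerically. This sketch assumes a causal 16-point Hamming window and the 1/N-normalized sum of shifted window responses; the function name `composite_response` and the frequency grid are hypothetical. It compares N = 16 (N ≥ L) with N = 8 (N < L).

```python
import numpy as np

def composite_response(w, N, omega):
    """(1/N) * sum_k W(e^{j(omega - omega_k)}) for a causal window w[0..L-1]."""
    n = np.arange(len(w))
    omega_k = 2 * np.pi * np.arange(N) / N
    H = np.zeros(len(omega), dtype=complex)
    for wk in omega_k:
        # W evaluated at the shifted frequencies (omega - omega_k)
        H += np.exp(-1j * np.outer(omega - wk, n)) @ w
    return H / N

w = np.hamming(16)                               # L = 16 (assumption)
omega = np.linspace(0, 2 * np.pi, 64, endpoint=False)

H_flat = composite_response(w, 16, omega)        # N >= L: constant, equal to w[0]
H_alias = composite_response(w, 8, omega)        # N <  L: varies with omega

print(np.allclose(H_flat, w[0]), np.allclose(H_alias, w[0]))
```

• With N = 8 the response is no longer flat for this window because the window sample w[8] also contributes, which is the frequency-domain counterpart of time aliasing.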
• Under the conditions that
• 1. w[n] has finite duration L, and
• 2. N ≥ L,
• the sequence x[n] can be reconstructed exactly from the time-dependent Fourier transform sampled in the time dimension (n) at the sampling rate of the input signal and sampled in the frequency dimension at N ≥ L equally spaced frequencies over 0 ≤ $\hat{\omega}$ < 2π.
• There are many ways to achieve exact reconstruction, even when N < L.
• For example, if the window is the infinite sequence,
$$w[n] = \frac{\sin(\pi n/N)}{\pi n}, \qquad -\infty < n < \infty \qquad (22)$$
• corresponding to the impulse response of an ideal lowpass filter with cutoff frequency π/N,
• then the composite frequency response $\tilde{H}(e^{j\omega})$ would be constant, independent of ω, for 0 ≤ ω < 2π.
• This situation is depicted in Figure 9, which shows the composite response for N = 6 equally spaced ideal filters each of total width 2π/N.
• It is clear that exact reconstruction is achieved for this window even though it has infinite length.
• where we have introduced a set of complex gain coefficients, P[k], for each of
the N filter bank channels.
• These complex gain coefficients are shown in Figure 10
• i.e., the composite impulse response is simply the window sequence sampled at
intervals of N samples.
• This is depicted in Figure 11.
• Figure 11a shows the sequence p[n].
• Figure 11b shows w[n] as given in Eq. (7.75)
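• The statement that the composite impulse response is the window sampled at intervals of N samples can be illustrated numerically. The sketch assumes unit channel gains (P[k] = 1 for all k), channel impulse responses of the form w[n]·exp(jω_k n), and a 1/N normalization of the sum; the window and sizes are hypothetical.

```python
import numpy as np

L, N = 16, 8                                     # N < L so several samples survive
w = np.hamming(L)                                # causal window w[0..L-1] (assumption)
n = np.arange(L)

# Sum the band-pass impulse responses h_k[n] = w[n] * exp(j*omega_k*n), k = 0..N-1
h_composite = np.zeros(L, dtype=complex)
for k in range(N):
    h_composite += w * np.exp(2j * np.pi * k * n / N)
h_composite /= N                                 # assumed 1/N normalization

# The window sampled at intervals of N: nonzero only where n is a multiple of N
expected = np.where(n % N == 0, w, 0.0)

print(np.allclose(h_composite, expected))
```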
• Thus, apart from a scale factor and a delay of r0N samples, the output of the system for time-dependent Fourier analysis and synthesis is an exact replica of the input sequence.