Werdibaji
Werdibaji
Werdibaji
Release 0.0
Sascha Spors
2 Random Signals 27
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.1.1 Statistical Signal Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2.1.2 Random Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.1.3 Properties of Random Processes and Random Signals . . . . . . . . . . . . . . . . . . . 29
2.2 Cumulative Distribution Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.2.1 Univariate Cumulative Distribution Function . . . . . . . . . . . . . . . . . . . . . . . 29
2.2.2 Bivariate Cumulative Distribution Function . . . . . . . . . . . . . . . . . . . . . . . . 30
2.3 Probability Density Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.3.1 Univariate Probability Density Function . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.3.2 Bivariate Probability Density Function . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.4 Ensemble Averages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.4.1 First Order Ensemble Averages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.4.2 Second Order Ensemble Averages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.5 Stationary Random Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.5.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.5.2 Cumulative Distribution Functions and Probability Density Functions . . . . . . . . . . 38
2.5.3 First Order Ensemble Averages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.5.4 Cross- and Auto-Correlation Function . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.6 Weakly Stationary Random Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.6.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
i
2.6.2 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.7 Higher Order Temporal Averages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
2.8 Ergodic Random Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.9 Weakly Ergodic Random Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.9.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.9.2 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.10 Auto-Correlation Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
2.10.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
2.10.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
2.10.3 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
2.11 Auto-Covariance Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
2.12 Cross-Correlation Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
2.12.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
2.12.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
2.12.3 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
2.13 Cross-Covariance Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
2.14 Power Spectral Density . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
2.14.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
2.14.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
2.14.3 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
2.15 Cross-Power Spectral Density . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
2.16 Important Distributions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
2.16.1 Uniform Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
2.16.2 Normal Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
2.16.3 Laplace Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
2.16.4 Amplitude Distribution of a Speech Signal . . . . . . . . . . . . . . . . . . . . . . . . . 62
2.17 White Noise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
2.17.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
2.17.2 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
2.18 Superposition of Random Signals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
2.18.1 Cumulative Distribution and Probability Density Function . . . . . . . . . . . . . . . . 66
2.18.2 Linear Mean . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
2.18.3 Auto-Correlation Function and Power Spectral Density . . . . . . . . . . . . . . . . . . 67
2.18.4 Cross-Correlation Function and Cross Power Spectral Density . . . . . . . . . . . . . . 68
2.18.5 Additive White Gaussian Noise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
ii
3.11.2 Transfer Function of the Wiener Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
3.11.3 Wiener Deconvolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
3.11.4 Interpretation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
5 Quantization 99
5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
5.1.1 Model of the Quantization Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
5.1.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
5.1.3 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
5.2 Characteristic of a Linear Uniform Quantizer . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
5.2.1 Mid-Tread Chacteristic Curve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
5.2.2 Mid-Rise Chacteristic Curve . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
5.3 Quantization Error of a Linear Uniform Quantizer . . . . . . . . . . . . . . . . . . . . . . . . . 105
5.3.1 Signal-to-Noise Ratio . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
5.3.2 Model for the Quantization Error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
5.3.3 Uniformly Distributed Signal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
5.3.4 Harmonic Signal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
5.3.5 Normally Distributed Signal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
5.3.6 Laplace Distributed Signal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
5.4 Requantization of a Speech Signal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
5.4.1 Requantization to 8 bit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
5.4.2 Requantization to 6 bit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
5.4.3 Requantization to 4 bit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
5.4.4 Requantization to 2 bit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
5.5 Spectral Shaping of the Quantization Noise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
5.5.1 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
5.6 Oversampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
5.6.1 Ideal Analog-to-Digital Conversion . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
5.6.2 Nyquist Sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
5.6.3 Oversampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
5.6.4 Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
5.6.5 Anti-Aliasing Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
5.7 Non-Linear Requantization of a Speech Signal . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
5.7.1 Quantization Characteristic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
5.7.2 Signal-to-Noise Ratio . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
5.7.3 Requantization of a Speech Sample . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
iii
6.2 Fast Convolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
6.2.1 Convolution of Finite-Length Signals . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
6.2.2 Linear Convolution by Periodic Convolution . . . . . . . . . . . . . . . . . . . . . . . . 125
6.2.3 The Fast Convolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
6.3 Segmented Convolution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
6.3.1 Overlap-Add Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
6.3.2 Overlap-Save Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134
6.3.3 Practical Aspects and Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
6.4 Quantization Effects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
6.4.1 Quantization of Filter Coefficients . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
6.4.2 Quantization of Signals and Operations . . . . . . . . . . . . . . . . . . . . . . . . . . 140
10 Literature 195
11 Contributors 197
iv
Digital Signal Processing, Release 0.0
This collection contains the lecture notes to the masters course Digital Signal Processing, Institute of Communica-
tions Engineering, Universität Rostock read by Sascha Spors. The notes are provided as Jupyter notebooks using
IPython 3 as Open Educational Resource. Feel free to contact me if you have questions or suggestions.
• Reference Card Discrete Signals and Systems
• Reference Card Random Signals and LTI Systems
Table of Contents 1
Digital Signal Processing, Release 0.0
2 Table of Contents
CHAPTER 1
The analysis of the spectral properties of a signal plays an important role in signal processing. Some application
examples are
• Spectrum analyzer
• Detection of (harmonic) signals
• Estimation of fundamental frequency and harmonics
• Spectral suppression: acoustic echo suppression, noise reduction, ...
Spectral analysis often applies the discrete Fourier transformation (DFT) onto discrete finite-length signals in
order to determine the spectrum and its magnitude.
Spectral leakage is a fundamental effect of the DFT. It limits the ability to detect harmonic signals in signal
mixtures. In order to discuss the properties of the DFT, the transition from the Fourier transform applied to an
analytic continuous signal to the DFT applied to a sampled finite-length signal is investigated.
We first consider the spectrum of one single harmonic signal. For the continuous case this is given by the complex
exponential function
𝑥(𝑡) = e j 𝜔0 𝑡
where 𝜔0 = 2𝜋𝑓 denotes its angular frequency. The Fourier-Transformation of the exponential function is
∫︁∞
𝑋(j 𝜔) = 𝑥(𝑡) e −j 𝜔 𝑡 d𝑡 = 2𝜋 𝛿(𝜔 − 𝜔0 )
−∞
The spectrum consists of a single Dirac impulse, hence a clearly isolated and distinguishable event.
Now lets consider sampled signals. The discrete exponential signal is derived from its continuous counterpart by
equidistant sampling 𝑥[𝑘] := 𝑥(𝑘𝑇 ) with the sampling interval 𝑇
𝑥[𝑘] = e j Ω0 𝑘
3
Digital Signal Processing, Release 0.0
where Ω0 = 𝜔0 𝑇 denotes the normalized angular frequency. The discrete-time Fourier transform (DTFT) is the
Fourier transformation of a sampled signal. For the exponential signal it is given as
∞
∑︁ ∞
∑︁
𝑋(e j Ω ) = 𝑥[𝑘] e −j Ω 𝑘 = 2𝜋 𝛿((Ω − Ω0 ) − 2 𝜋 𝑛)
𝑘=−∞ 𝑛=−∞
The spectrum of the DTFT is periodic due to sampling. As a consequence, the transformation of the discrete
exponential signal consists of a series Dirac impulses. For the region of interest −𝜋 < Ω ≤ 𝜋 the spectrum
consists of a clearly isolated and distinguishable event, as for the continuous case.
The DTFT cannot be realized in practice, since is requires the knowledge of the signal 𝑥[𝑘] for all time instants 𝑘.
The DFT can be derived from the DTFT in two steps
1. truncation (windowing) of the signal
2. sampling of the DTFT spectrum
The consequences of these two steps are investigated in the following two sections.
Truncation of the signal 𝑥[𝑘] to a length of 𝑁 samples is modeled by multiplying the signal with a window function
𝑤[𝑘] of length 𝑁
where 𝑥𝑁 [𝑘] denotes the truncated signal. Its spectrum 𝑋𝑁 (e j Ω ) can be derived from the multiplication theorem
of the DTFT as
1
𝑋𝑁 (e j Ω ) = 𝑋(e j Ω ) ~ 𝑊 (e j Ω )
2𝜋
where ~ denotes the cyclic/circular convolution. For a hard truncation of the signal to 𝑁 samples the window
function 𝑤[𝑘] = rect𝑁 [𝑘] yields
𝑁 −1 sin( 𝑁2Ω )
𝑊 (e j Ω ) = e −j Ω 2 ·
sin( Ω2 )
Introducing the DTFT of the exponential signal into above findings, exploiting the properties of Dirac impulses
and the cyclic convolution allows to derive the DTFT of a truncated exponential signal
𝑁 −1 sin( 𝑁 (Ω−Ω0)
)
𝑋𝑁 (e j Ω ) = e −j (Ω−Ω0 ) 2 · 2
sin( (Ω−Ω
2
0)
)
# plot spectrum
plt.figure(figsize = (10, 8))
Excercise
• Change the frequency Om0 of the signal and rerun the cell. What happens?
• Change the length N of the signal and rerun the cell. What happens?
The maximum absolute value of the spectrum is located at the frequency Ω0 . It should become clear that truncation
of the exponential signal leads to a broadening of the spectrum. The shorter the signal the wider the mainlobe
becomes.
The DFT can be derived from the DTFT 𝑋𝑁 (e j Ω ) of the truncated signal by sampling the DTFT equiangularly
at (angles) Ω = 𝜇 2𝜋
𝑁
𝑋[𝜇] = 𝑋𝑁 (e j Ω )⃒Ω=𝜇 2𝜋
⃒
𝑁
The sampling of the DTFT is illustrated in the following example. Note, the normalized angular frequency Ω0 has
been expressed in terms of the periodicity 𝑃 of the exponential signal Ω0 = 𝑃 2𝜋
𝑁 .
# plot spectra
plt.figure(figsize = (10, 8))
plt.hold(True)
plt.show()
Exercise
• Change the periodicity P of the exponential signal and rerun the cell. What happens if the periodicity is an
integer? Why?
• Change the length N of the DFT? What happens?
• What conclusions can be drawn for the analysis of exponential signals by the DFT?
You should have noticed that for an exponential signal whose periodicity is an integer 𝑃 ∈ N, the DFT consists
of a discrete Dirac pulse 𝑋[𝜇] = 𝛿[𝜇 − 𝑃 ]. In this case, the sampling points coincide with the maximum of the
main lobe or the zeros of the DTFT. For non-integer 𝑃 , hence non-periodic exponential signals with respect to the
signal length 𝑁 , the DFT has additional contributions. The shorter the length 𝑁 , the wider these contributions
are spread in the spectrum. This smearing effect is known as leakage effect of the DFT. This effect limits the
achievable frequency resolution of the DFT when analyzing signals mixtures with more than one exponential
signal. This is illustrated in the following.
In order to discuss the implications of the leakage effect when analyzing signal mixtures, the superposition of
two exponential signals with different amplitudes and frequencies is considered. For convenience, a function is
defined that calculates and plots the magnitude spectrum
In [3]: def dft_signal_mixture(N, A1, P1, A2, P2):
# N: length of signal/DFT
# A1, P1, A2, P2: amplitude and periodicity of 1st/2nd complex exponential
# plot spectrum
plt.figure(figsize = (10, 8))
plt.stem(mu, abs(X))
plt.title(r'Absolute value of the DFT of a signal mixture')
plt.xlabel(r'$\mu$')
plt.ylabel(r'$|X[\mu]|$')
plt.axis([0, N, -0.5, N+5]);
Lets first consider the case that the frequencies of the two exponentials are rather apart
In [4]: dft_signal_mixture(32, 1, 10.3, 1, 15.2)
Investigating the magnitude spectrum one could conclude that the signal consists of two major contributions at the
frequencies 𝜇1 = 10 and 𝜇2 = 15. Now lets take a look at a situation when the frequencies are closer together
In [5]: dft_signal_mixture(32, 1, 10.3, 1, 10.9)
From visual inspection of the spectrum it is rather unclear if the mixture consists from one or two exponential
signals. So far the levels of both signals where chosen equal.
Lets consider the case that the second signal has a much lower level that the first one. The frequencies have been
chosen equal to the first example
In [6]: dft_signal_mixture(32, 1, 10.3, 0.1, 15.2)
Now the contribution of the second exponential is hidden in the spread spectrum of the first exponential.
For the discussion of the leakage effect in the previous section, a hard truncation of the signal 𝑥[𝑘] by a rectangular
window 𝑤[𝑘] = rect𝑁 [𝑘] was assumed. Also other window functions are used for spectral analysis. The resulting
properties depend on the spectrum 𝑊 (e j Ω ) of the window function, since the spectrum of the windowed signal is
1
given as 𝑋𝑁 (e j Ω ) = 2𝜋 𝑋(e j Ω ) ~ 𝑊 (e j Ω ). Different window functions have different properties. For instance
with respect to the capability to distinguish two neighboring signals (frequency resolution) or to detect two signals
where one is much weaker (sidelobe level). Since these two aspects counteract for typical window functions, the
choice of a suitable window depends heavily on the application. We therefore take a look at frequently applied
window functions and their properties.
In order to investigate the windows, a function is defined which computes and plots the spectrum of a given
window function. The spectrum 𝑊 (e j Ω ) is approximated numerically by the DFT.
In [1]: %matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import scipy.signal as sig
def dft_window_function(w):
N = len(w)
plt.figure()
plt.plot(mu, 20*np.log10(np.abs(W)))
plt.xlabel(r'$\Omega$')
plt.ylabel(r'$| W(e^{j \Omega}) |$ in dB')
plt.axis([-np.pi, np.pi, -100, 5])
plt.grid()
The rectangular window 𝑤[𝑘] = rect𝑁 [𝑘] takes all samples with equal weight into account. The main lobe of its
magnitude spectrum is narrow, but the level of the side lobes is rather high. It has the highest frequency selectivity.
In [2]: dft_window_function(np.ones(64))
For an odd window length 2𝑁 − 1, the triangular window can be expressed as the convolution of two rectangular
windows 𝑤[𝑘] = rect𝑁 [𝑘] * rect𝑁 [𝑘]. The main lobe is wider as for the rectangular window, but the level of the
side lobes decays faster.
In [3]: dft_window_function(sig.triang(63))
The function for the analysis of a superposition of two exponential signals from the previous section is extended
by windowing
In [7]: def dft_signal_mixture_window(N, A1, P1, A2, P2, w):
# N: length of signal/DFT
# A1, P1, A2, P2: amplitude and periodicity of 1st/2nd complex exponential
# window applied to the signal
# plot spectrum
plt.figure(figsize = (10, 8))
plt.stem(mu, abs(X))
plt.title(r'Absolute value of the DFT of a signal mixture')
plt.xlabel(r'$\mu$')
plt.ylabel(r'$|X[\mu]|$')
plt.axis([0, N, -0.5, abs(X).max()+5]);
Now the last example is re-investigated by using a Blackman window which features a high suppression of the
sidelobes. The second exponential signal with the lower level now becomes visible in the spectrum.
In [8]: dft_signal_mixture_window(32, 1, 10.3, 0.1, 15.2, np.blackman(32))
Exercise
• Examine the effect of the other window functions for small/large frequency and level differences. What
window function is best suited for what situation?
1.3 Zero-Padding
1.3.1 Concept
Let’s assume a signal 𝑥𝑁 [𝑘] of finite length 𝑁 , for instance a windowed signal 𝑥𝑁 [𝑘] = 𝑥[𝑘] · rect𝑁 [𝑘]. The
discrete Fourier transformation (DFT) of 𝑥𝑁 [𝑘] reads
𝑁
∑︁−1
𝜇𝑘
𝑋𝑁 [𝜇] = 𝑥𝑁 [𝑘] 𝑤𝑁
𝑘=0
2𝜋
where 𝑤𝑁 = e−j 𝑁 denotes the kernel of the DFT. For a sampled time-domain signal, the distance in frequency
between two neighboring coefficients is given as ∆𝑓 = 𝑓𝑁𝑠 , where 𝑓𝑠 = 𝑇1 denotes the sampling frequency.
Hence, if 𝑁 is increased the distance between neighboring frequencies is decreased. This leads to the concept of
zero-padding in spectral analysis. Here the signal 𝑥𝑁 [𝑘] of finite length is filled up with (M-N) zero values to a
total length 𝑀 ≥ 𝑁
T
𝑥𝑀 [𝑘] = [𝑥[0], 𝑥[1], . . . , 𝑥[𝑁 − 1], 0, . . . , 0]
𝑓𝑠
The DFT 𝑋𝑀 [𝜇] of 𝑥𝑀 [𝑘] has now a decreased distance between neighboring frequencies ∆𝑓 = 𝑀.
1.3. Zero-Padding 17
Digital Signal Processing, Release 0.0
The question arises what influence zero-padding has on the spectrum and if it can enhance spectral analysis. On
first sight it seems that the frequency resolution is higher, however do we get more information for the signal? In
order to discuss this, a short numerical example is shown followed by a derivation of the relations between the
spectrum 𝑋𝑀 [𝑘] with zero-padding and 𝑋𝑁 [𝑘] without zero-padding.
Example
The following example computes and plots the magnitude spectra of a truncated complex exponential signal
𝑥𝑁 [𝑘] = e j Ω0 𝑘 · rect𝑁 [𝑘] and its zero-padded version 𝑥𝑀 [𝑘].
In [1]: %matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
# plot spectra
plt.figure(figsize = (10, 6))
plt.subplot(121)
plt.stem(np.arange(N),np.abs(XN))
plt.title(r'DFT of $e^{j \Omega_0 k}$ without zero-padding')
plt.xlabel(r'$\mu$')
plt.ylabel(r'$|X_N[\mu]|$')
plt.axis([0, N, 0, 18])
plt.subplot(122)
plt.stem(np.arange(M),np.abs(XM))
plt.title(r'DFT of $e^{j \Omega_0 k}$ with zero-padding')
plt.xlabel(r'$\mu$')
plt.ylabel(r'$|X_M[\mu]|$')
plt.axis([0, M, 0, 18]);
Exercise
• Check the two spectra carefully for relations. Are there common coefficients for the case 𝑀 = 2𝑁 ?
• Increase the length M of the zero-padded signal 𝑥𝑀 [𝑘]. Can you get additional information from the spec-
trum?
Lets step back to the discrete time Fourier transformation (DTFT) of the finite-length signal 𝑥𝑁 [𝑘] without zero-
padding
∞
∑︁ 𝑁
∑︁−1
𝑋𝑁 (e j Ω ) = 𝑥𝑁 [𝑘] e −j Ω 𝑘 = 𝑥𝑁 [𝑘] e− j Ω 𝑘
𝑘=−∞ 𝑘=0
𝑁 −1
2𝜋
∑︁
jΩ
𝑥𝑁 [𝑘] e −j 𝜇 𝑁 𝑘
⃒
𝑋𝑁 [𝜇] = 𝑋𝑁 (e ) Ω=𝜇 2𝜋 =
⃒
𝑁
𝑘=0
Since the DFT coefficients 𝑋𝑁 [𝜇] are sampled equidistantly (or rather equiangularly on the unit circle) from the
DTFT 𝑋𝑁 (e j Ω ), we can reconstruct the DTFT of 𝑥𝑁 [𝑘] from the DFT coefficients by interpolation. Introduce
the inverse DFT of 𝑋𝑁 [𝜇]
𝑁 −1
1 ∑︁ 2𝜋
𝑥𝑁 [𝑘] = 𝑋𝑁 [𝜇] e j 𝑁 𝜇 𝑘
𝑁 𝜇=0
𝜇=0
𝑁
𝑘=0 𝑘=0
1.3. Zero-Padding 19
Digital Signal Processing, Release 0.0
reveals the relation between 𝑋𝑁 (e j Ω ) and 𝑋𝑁 [𝜇]. The last sum can be solved analytically yielding the so called
periodic sinc function (aliased sinc function, Dirichlet kernel) psinc𝑁 (Ω) and a phase shift. This results in
𝑁 −1 (Ω− 2𝜋 𝜇)(𝑁 −1)
∑︁ 𝑁 2𝜋
𝑋𝑁 (e j Ω ) = 𝑋𝑁 [𝜇] · e− j 2 · psinc𝑁 (Ω − 𝜇)
𝜇=0
𝑁
where
1 sin( 𝑁2 Ω)
psinc𝑁 (Ω) =
𝑁 sin( 12 Ω)
Example
This example illustrates the interpolation of 𝑋𝑁 [𝜇] using the relation derived above. Using above definition, the
periodic sinc function is not defined at Ω = 2𝜋 𝑛 for 𝑛 ∈ Z. This is resolved by taking its limit value.
In [2]: N = 16 # length of the signal
M = 1024 # number of frequency points for DTFT
Om0 = 5.33*(2*np.pi/N) # frequency of exponential signal
# plot spectra
plt.figure(figsize = (10, 8))
plt.hold(True)
1.3.3 Relation between Discrete Fourier Transformations with and without Zero-
Padding
It was already outlined above that the DFT is related to the DTFT by sampling. Since the zero-paded signal
𝑥𝑀 [𝑘] differs to 𝑥𝑁 [𝑘] only with respect to the additional zeros, its DFT 𝑋𝑀 [𝜇] is given by resampling the DTFT
interpolation of 𝑋𝑁 [𝜂], i.e. 𝑋𝑁 (ej Ω ) at Ω = 2𝜋
𝑀𝜇
𝑁 −1 ( 2𝜋 𝜇− 2𝜋 𝜂)(𝑁 −1)
(︂ )︂
∑︁
−j 𝑀 𝑁 2𝜋 2𝜋
𝑋𝑀 [𝜇] = 𝑋𝑁 [𝜂] · e 2 · psinc𝑁 𝜇− 𝜂
𝜂=0
𝑀 𝑁
for 𝜇 = 0, 1, . . . , 𝑀 − 1.
Above equation relates the spectrum 𝑋𝑁 [𝜇] of the original signal 𝑥𝑁 [𝑘] to the spectrum 𝑋𝑀 [𝜇] of the zero-padded
signal 𝑥𝑀 [𝑘]. It essentially constitutes a bandlimited interpolation of the coefficients 𝑋𝑁 [𝜇].
All spectral information of a signal of finite length 𝑁 is already contained in its spectrum derived from a DFT of
length 𝑁 . By applying zero-padding and a longer DFT, the frequency resolution is only virtually increased. The
additional coefficients are related by bandlimited interpolation to the original ones. In general, zero-padding does
not bring a benefit in spectral analysis. It may bring a benefit in special applications, for instance when estimating
the frequency of an isolated harmonic signal from its spectrum.
Example
The following example shows that the coefficients 𝑋𝑀 [𝜇] of the spectrum after zero-padding can be derived from
the spectrum 𝑋𝑁 [𝜂] by interpolation.
1.3. Zero-Padding 21
Digital Signal Processing, Release 0.0
# plot spectra
plt.figure(figsize = (10, 6))
plt.subplot(121)
plt.stem(np.arange(N),np.abs(XN))
plt.title(r'DFT of $e^{j \Omega_0 k}$ without zero-padding')
plt.xlabel(r'$\mu$')
plt.ylabel(r'$|X_N[\mu]|$')
plt.axis([0, N, 0, 18])
plt.subplot(122)
plt.stem(np.arange(M),np.abs(XM))
plt.title(r'Interpolated spectrum')
plt.xlabel(r'$\mu$')
plt.ylabel(r'$|X_M[\mu]|$')
plt.axis([0, M, 0, 18]);
Exercise
• Compare the interpolated spectrum to the spectrum with zero padding from the first example.
• Estimate the frequency Ω0 of the exponential signal from the interpolated spectrum. How could you increase
the accuracy of your estimate?
The DFT is not very well suited for the analysis of instationary signals when applied to the entire signal. Practical
signals, for instance an antenna signal, cannot be analyzed in an on-line manner by the DFT. This motivates to
split a long signal into segments and compute the DFT on these segments. This transformation is known as the
short-time Fourier transformation (STFT).
The STFT 𝑋[𝜇, 𝑛] of a signal 𝑥[𝑘] is defined as
∑︁−1
𝑛+𝑁
𝑘𝜇
𝑋[𝜇, 𝑛] = 𝑥[𝑘] 𝑤[𝑘 − 𝑛] 𝑤𝑁
𝑘=𝑛
2𝜋
where 𝑤𝑁 = e −j 𝑁 denotes the kernel of the DFT and 𝑤[𝑘] a window function of length 𝑁 which is normalized
∑︀𝑁 −1
by 𝑘=0 𝑤[𝑘] = 1. Starting from 𝑘 = 𝑛, the signal 𝑥[𝑘] is windowed by 𝑤[𝑘] to a segment of length 𝑁 . This
windowed segment is then transformed by a DFT of length 𝑁 .
The STFT has many applications in digital signal processing. For instance in the spectral analysis of signals or the
processing of instationary signals. The resulting spectrum 𝑋[𝜇, 𝑛] depends on the frequency index 𝜇 and the time
index 𝑛. It is therefore also termed as time-frequency domain and techniques using the STFT as time-frequency
processing.
The properties of the STFT depend on
• the length 𝑁 of the segments,
• the overlap between the segments, and
• the window function 𝑤[𝑘].
The size 𝑁 of the segments and the window function influence the spectral and temporal resolution of the STFT.
The time index 𝑛 of the STFT can be increased by an arbitrary step size. The step size determines the overlap
between two consecutive STFTs. For instance the spectra 𝑋[𝜇, 𝑛] and 𝑋[𝜇, 𝑛 + 1] have 𝑁 − 1 overlapping
samples. The overlap is sometimes given as percentage of the segment length 𝑁 .
The magnitude |𝑋[𝜇, 𝑛]| is known as the spectrogram of a signal. It is frequently used to analyze signals in the
time-frequency domain. For instance by a spectrum analyzer. The following example computes the spectrogram
of an unknown signal
In [1]: %matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import scipy.signal as sig
# generate signal
k = np.arange(N)
x = sig.chirp(k, 0, N, .5)
Exercise
• Can you characterize the signal from its spectrogram? How would it sound like?
• Change the segment length L and the overlap overlap between segments. Rerun the cell. What changes
in the spectrogram?
Random Signals
2.1 Introduction
Random signals are signals whose values are not (or only to a limited extend) predictable. Frequently used
alternative terms are
• stochastic signals
• non-deterministic signals
Random signals play an important role in various fields of signal processing and communications. This is due
to the fact that only random signals carry information. A signal which is observed by some receiver has to have
unknown contributions in order to represent information.
Random signals are often classified as useful/desired and disturbing/interfering signals. For instance
• useful signals: data, speech, music, ...
• disturbing signals: thermal noise at a resistor, amplifier noise, quantization noise, ...
Practical signals are frequently modeled as a combination of a useful signal and an additive noise component.
As the values of a random signal cannot be foreseen, the properties of random signals are described by the their
statistical characteristics. For instance by average values.
Statistical signal processing treats signals as random processes, in contrary to the assumption of deterministic
signals in traditional signal processing. Two prominent application examples involving random signals are
The measurement of physical quantities is often subject to additive noise and distortions by e.g. the amplifier. The
aim of statistical signal processing is to estimate the physical quantity from the observed sensor data.
27
Digital Signal Processing, Release 0.0
Communication channel
In communications a message is send over a channel distorting the signal by e.g. multipath propagation. Additive
noise is present at the receiver due to background and amplifier noise. The aim of statistical signal processing is
to estimate the message sent from the received message.
A random process is a stochastic process which generates an ensemble of random signals. A random process
• provides a mathematical model for an ensemble of random signals
• generates different sample functions with specific common properties
Its is important to differentiate between
• ensemble: collection of all possible signals of a random process
• sample function: one specific random signal
An example for a random process is speech produced by humans. Here the ensemble is composed from the speech
signals produced by all humans on earth, one particular speech signal produced by one person at a specific time is
a sample function.
The following example shows sample functions of a continuous amplitude real-valued random process. All sample
functions have the same characteristics with respect to certain statistical properties.
In [1]: %matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
Exercise
• What is different, what is common between the sample functions?
• Rerun the cell. What changes now?
A random process can be characterized by the statistical properties of its amplitude values. Cumulative distribution
functions (CDFs) are one possibility.
The univariate CDF 𝑃𝑥 (𝜃, 𝑘) of a continuous-amplitude real-valued random signal 𝑥[𝑘] is defined as
𝑃𝑥 (𝜃, 𝑘) := Pr{𝑥[𝑘] ≤ 𝜃}
where Pr{·} denotes the probability that the given condition holds. The univariate CDF quantifies the probability
that for a fixed time index 𝑘 the condition 𝑥[𝑘] ≤ 𝜃 holds for the entire ensemble. It has the following properties
which can be concluded directly from its definition
lim 𝑃𝑥 (𝜃, 𝑘) = 0
𝜃→−∞
and
lim 𝑃𝑥 (𝜃, 𝑘) = 1
𝜃→∞
Hence, the probability that a continuous-amplitude random signal takes a specific value 𝑥[𝑘] = 𝜃 is not defined by
the CDF. This motivates the use of the probability density function introduced later.
The bivariate or joint CDF 𝑃𝑥𝑦 (𝜃𝑥 , 𝜃𝑦 , 𝑘𝑥 , 𝑘𝑦 ) of two continuous-amplitude real-valued random signals 𝑥[𝑘] and
𝑦[𝑘] is defined as
The joint CDF quantifies the probability that for a fixed 𝑘𝑥 the condition 𝑥[𝑘𝑥 ] ≤ 𝜃𝑥 and for a fixed 𝑘𝑦 the
condition 𝑦[𝑘𝑦 ] ≤ 𝜃𝑦 holds for the entire ensemble of sample functions.
Propability density functions (PDFs) describe the probability for a random signal to take on a given value.
The univariate PDF 𝑝𝑥 (𝜃, 𝑘) of a continuous-amplitude real-valued random signal 𝑥[𝑘] is defined as the derivative
of the univariate CDF
𝜕
𝑝𝑥 (𝜃, 𝑘) = 𝑃𝑥 (𝜃, 𝑘)
𝜕𝜃
Due to the properties of the CDF and the definition of the PDF, it has the following properties
𝑝𝑥 (𝜃, 𝑘) ≥ 0
and
∫︁∞
𝑝𝑥 (𝜃, 𝑘) d𝜃 = 𝑃𝑥 (∞, 𝑘) = 1
−∞
∫︁𝜃
𝑃𝑥 (𝜃, 𝑘) = 𝑝𝑥 (𝜃, 𝑘) d𝜃
−∞
In the process of calculating a histogram, the entire range of amplitude values of a random signal is split into a
series of intervals (bins). Then it is counted how many values of the signal fall into these intervals. This constitutes
a numerical approximation of the PDF.
In the following example the histogram of an ensemble is calculated for each time index 𝑘. The CDF is approxi-
mated by the cumulative sum over the histogram bins.
In [1]: %matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
Exercise
• Change the parameters N and bins and rerun the cell. What changes? Why?
In numerical simulations of random processes only a finite number of sample functions and temporal samples can
be considered. This holds also for the number of intervals (bins) used for the histogram. As a result, numerical
approximations of the CDF/PDF will be subject to statistical uncertainties that typically will become smaller if
the number of sample functions is increased.
The bivariate or joint PDF 𝑝𝑥𝑦 (𝜃𝑥 , 𝜃𝑦 , 𝑘𝑥 , 𝑘𝑦 ) of two continuous-amplitude real-valued random signals 𝑥[𝑘] and
𝑦[𝑘] is defined as
𝜕2
𝑝𝑥𝑦 (𝜃𝑥 , 𝜃𝑦 , 𝑘𝑥 , 𝑘𝑦 ) := 𝑃𝑥𝑦 (𝜃𝑥 , 𝜃𝑦 , 𝑘𝑥 , 𝑘𝑦 )
𝜕𝜃𝑥 𝜕𝜃𝑦
The joint PDF quantifies the joint probability that 𝑥[𝑘] takes the value 𝜃𝑥 and that 𝑦[𝑘] takes the value 𝜃𝑦 for the
entire ensemble of sample functions.
If 𝑥[𝑘] = 𝑦[𝑘] the bivariate PDF 𝑝𝑥𝑥 (𝜃1 , 𝜃2 , 𝑘1 , 𝑘2 ) describes the probability that a random signal takes the value
𝜃1 at time instance 𝑘1 and the value 𝜃2 at time instance 𝑘2 . Hence, 𝑝𝑥𝑥 (𝜃1 , 𝜃2 , 𝑘1 , 𝑘2 ) provides insights into the
temporal dependencies of a random signal 𝑥[𝑘].
Ensemble averages characterize the average properties of a sample function across the population of all possible
sample functions of the ensemble. We distinguish between first and higher-order ensemble averages. The former
considers the properties of the sample functions for one particular time instant 𝑘, while the latter take different
signals at different time instants into account.
Definition
The first order ensemble average of a continuous-amplitude real-valued random signal 𝑥[𝑘] is defined as
𝑁 −1
1 ∑︁
𝐸{𝑓 (𝑥[𝑘])} = lim 𝑓 (𝑥𝑛 [𝑘])
𝑁 →∞ 𝑁
𝑛=0
where 𝐸{·} denotes the expectation operator, 𝑥𝑛 [𝑘] the 𝑛-th sample function and 𝑓 (·) an arbitrary mapping func-
tion. It is evident from the definition that the ensemble average can only be given exactly for random processes
where the internal structure is known. For practical random processes, like e.g. speech, the ensemble average can
only be approximated by a finite but sufficient large number 𝑁 of sample functions.
The ensemble average can also be given in terms of the univariate probability density function (PDF)
∫︁∞
𝐸{𝑓 (𝑥[𝑘])} = 𝑓 (𝜃) 𝑝𝑥 (𝜃, 𝑘) d𝜃
−∞
Properties
1. The ensemble averages for two different time instants 𝑘1 and 𝑘2 differ in general
̸ 𝐸{𝑓 (𝑥[𝑘2 ])}
𝐸{𝑓 (𝑥[𝑘1 ])} =
2. For a linear mapping 𝑓 (𝑥[𝑘]) = 𝑥[𝑘], the ensemble average is a linear operation
𝐸{𝐴𝑥[𝑘] + 𝐵𝑦[𝑘]} = 𝐴 · 𝐸{𝑥[𝑘]} + 𝐵 · 𝐸{𝑦[𝑘]}
Linear mean
The choice of the mapping function 𝑓 (·) determines the property of the random process which is characterized by
the ensemble average. The linear mean, which is given for 𝑓 (𝑥[𝑘]) = 𝑥[𝑘], is the arithmetic mean value across the
sample functions for a given time instant 𝑘.
Introducing 𝑓 (𝑥[𝑘]) = 𝑥[𝑘] into the definition of the ensemble average yields
∫︁∞
𝜇𝑥 [𝑘] = 𝐸{𝑥[𝑘]} = 𝜃 𝑝𝑥 (𝜃, 𝑘) d𝜃
−∞
where 𝜇𝑥 [𝑘] is a common abbreviation of the linear mean. A process with 𝜇𝑥 [𝑘] = 0 is termed zero-mean. Note
that 𝜇𝑥 should not be confused with the frequency index variable of the DFT.
Quadratic mean
It quantifies the mean instantaneous power of a sample function for a given time index 𝑘.
Variance
where 𝜎𝑥2 [𝑘] is a common abbreviation of the variance, 𝜎𝑥 [𝑘] is known as the standard deviation. The variance
characterizes how far the amplitude values of a random signal are spread out from its mean value.
The variance can be given in terms of the linear and quadratic mean as
Excercise
• Derive the relation above from the definitions and properties of the first order ensemble average
The following example shows the linear and quadratic mean, and variance of a random process. Since in practice
only a limited number 𝑁 of sample functions can be evaluated numerically these properties are only approxi-
mated/estimated.
In [1]: %matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
# plot results
plt.rc('figure', figsize=(10, 3))
plt.figure()
plt.stem(x[0, :])
plt.title(r'Sample function $x_0[k]$')
plt.xlabel(r'$k$')
plt.ylabel(r'$x_0[k]$')
plt.axis([0, K, -3, 3])
plt.figure()
plt.stem(mu)
plt.title(r'Estimated linear mean $\hat{\mu}_x[k]$ ')
plt.xlabel(r'$k$')
plt.ylabel(r'$\mu_x[k]$')
plt.axis([0, K, -1.5, 1.5])
plt.figure()
plt.stem(qu)
plt.title(r'Estimated quadratic mean')
plt.xlabel(r'$k$')
plt.ylabel(r'$E\{x^2[k]\}$')
plt.axis([0, K, 0, 2.5])
plt.figure()
plt.stem(sigma)
plt.title(r'Estimated variance $\hat{\sigma}^2_x[k]$')
plt.xlabel(r'$k$')
plt.ylabel(r'$\sigma_x^2[k]$')
plt.axis([0, K, 0, 1.5]);
Exercise
• What does the linear and quadratic mean, and the variance tell you about the general behavior of the sample
functions?
• Change the number N of sample functions and rerun the cell. What influence has a decrease/increase of the
sample functions on the estimated ensemble averages?
Definition
The second order ensemble average of two continuous-amplitude real-valued random signals 𝑥[𝑘] and 𝑦[𝑘] is
defined as
𝑁 −1
1 ∑︁
𝐸{𝑓 (𝑥[𝑘𝑥 ], 𝑦[𝑘𝑦 ])} := lim 𝑓 (𝑥𝑛 [𝑘𝑥 ], 𝑦𝑛 [𝑘𝑦 ])
𝑁 →∞ 𝑁
𝑛=0
Cross-correlation function
The cross-correlation function (CCF) of two random signals 𝑥[𝑘] and 𝑦[𝑘] is defined as the second order ensemble
average for a linear mapping 𝑓 (𝑥[𝑘𝑥 ], 𝑦[𝑘𝑦 ]) = 𝑥[𝑘𝑥 ] · 𝑦[𝑘𝑦 ]
It characterizes the statistical dependencies of two random signals at two different time instants.
Auto-correlation function
The auto-correlation function (ACF) of a random signal 𝑥[𝑘] is defined as the second order ensemble average for
a linear mapping 𝑓 (𝑥[𝑘𝑥 ], 𝑦[𝑘𝑦 ]) = 𝑥[𝑘1 ] · 𝑥[𝑘2 ]
It characterizes the statistical dependencies between the samples of a random signal at two different time instants.
2.5.1 Definition
When the statistical properties of a random process do not depend on the time index 𝑘, this process is termed as
*stationary random process*. This can be expressed formally as
where ∆ ∈ Z denotes an arbitrary (temporal) shift. From this definition it becomes clear that
• random signals of finite length and
• deterministic signals
cannot be stationary random processes in a strict sense. However, in practice it is often assumed to be sufficient if
above condition holds within the finite length of a random signal.
It follows from above definition of a stationary process, that the univariate cumulative distribution function (CDF)
of a stationary random process does not depend on the time index 𝑘
𝑃𝑥 (𝜃, 𝑘) = 𝑃𝑥 (𝜃)
the same holds for the univariate probability density function (PDF). The bivariate CDF of two stationary random
signals 𝑥[𝑘] and 𝑦[𝑘] depends only on the difference 𝜅 = 𝑘𝑥 − 𝑘𝑦
For a first order ensemble average of a stationary process the following relation must hold
𝜇𝑥 [𝑘] = 𝜇𝑥
Introducing the PDF’s properties of a stationary process into the definition of the cross-correlation function (CCF)
and auro-correlation function (ACF) it follows
and
2.6.1 Definition
The definition of a stationary random process in the previous section must hold for any mapping function 𝑓 (·).
This cannot be checked in a strict sense for practical random processes. For a weakly (wide sense) stationary
random process the conditions for stationarity must hold only for linear mappings. This leads to the following two
conditions a weakly stationary random process has to fulfill
and
2.6.2 Example
From above definition of a weakly stationary process it is evident that is sufficient to check the time dependence
of the linear mean 𝑥𝜇 [𝑘] and the auto-correlation function 𝜙𝑥𝑥 [𝑘1 , 𝑘2 ] of a random process. Both quantities are
calculated and plotted for two different random processes.
In [1]: %matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
plt.subplot(121)
plt.stem(mu)
plt.title(r'Estimate of linear mean $\hat{\mu}_x[k]$')
plt.xlabel(r'$k$')
plt.ylabel(r'$\hat{\mu}[k]$')
plt.axis([0, L, -1.5, 1.5])
plt.subplot(122)
plt.pcolor(np.arange(L), np.arange(L), acf, vmin=-2, vmax=2)
plt.title(r'Estimate of ACF $\hat{\varphi}_{xx}[k_1, k_2]$')
plt.xlabel(r'$k_1$')
plt.ylabel(r'$k_2$')
plt.colorbar()
plt.autoscale(tight=True)
Exercise
• Which process can be assumed to be weakly stationary? Why?
• Increase the number N of sample functions. Do the results support your intial assumption?
Ensemble averages are defined as the average across all sample functions 𝑥𝑛 [𝑘] for particular time indexes. So far
we did not consider temporal averaging to characterize a random signal. For a stationary process, the higher order
temporal average along the 𝑛-th sample function is defined as
𝐾
1 ∑︁
𝑓 (𝑥𝑛 [𝑘], 𝑥𝑛 [𝑘 − 𝜅1 ], 𝑥𝑛 [𝑘 − 𝜅2 ], . . . ) = lim 𝑓 (𝑥𝑛 [𝑘], 𝑥𝑛 [𝑘 − 𝜅1 ], 𝑥𝑛 [𝑘 − 𝜅2 ], . . . )
𝐾→∞ 2𝐾 + 1
𝑘=−𝐾
An ergodic process is a stationary random process whose higher order temporal averages of all sample functions
are equal to the ensemble averages
This implies that all higher order temporal averages are equal. Any sample function from the process represents
the average statistical properties of the entire process. The ensemble averages for a stationary and ergodic random
process are given by the temporal averages of one sample function. This result is very important for the practical
computation of statistical properties of random signals.
2.9.1 Definition
As for a weakly stationary process, the conditions for ergodicity have to hold only for linear mappings 𝑓 (·). Under
the assumption of a weakly stationary process, the following two conditions have to be met by a weakly (wide
sense) ergodic random process
and
𝑥𝑛 [𝑘] = 𝐸{𝑥[𝑘]} ∀𝑛
2.9.2 Example
In the following example, the linear mean and autocorrelation function are computed as ensemble and temporal
averages for three random processes. The plots show the estimated temporal averages 𝜇 ˆ𝑥,𝑛 and 𝜙ˆ𝑥𝑥,𝑛 [𝜅] on the
right side of the sample functions 𝑥𝑛 [𝑘]. Note, the linear mean as temporal average is a scalar value 𝜇
ˆ𝑥,𝑛 which
has been plotted by a bar plot. The ensemble averages 𝜇 ˆ[𝑘] and 𝜙[𝑘
ˆ 1 , 𝑘2 ] are shown below the sample functions
to indicate the averaging across sample functions.
In [2]: L = 64 # number of random samples
N = 10000 # number of sample functions
for n in range(2):
plt.figure(figsize = (10, 5))
plt.subplot(131)
plt.stem(x[n, :])
plt.title(r'Sample function $x_%d[k]$'%n)
plt.xlabel(r'$k$')
plt.ylabel(r'$x_%d[k]$'%n)
plt.axis([0, L, -4, 4])
plt.subplot(132)
plt.bar(-0.4, mut[n])
plt.title(r'Linear mean $\hat{\mu}_{x,%d}$'%n)
plt.ylabel(r'$\hat{\mu}_{x,%d}$'%n)
plt.axis([-.5, .5, -1.5, 1.5])
plt.subplot(133)
plt.stem(kappa, acft[n, :])
plt.title(r'Autocorrelation $\hat{\varphi}_{xx,%d}[\kappa]$'%n)
plt.xlabel(r'$\kappa$')
plt.ylabel(r'$\hat{\varphi}_{xx,%d}[\kappa]$'%n)
plt.axis([-L//2, L//2, -30, 150])
Random Process 2
In [4]: compute_plot_results(x2)
Random Process 3
In [5]: compute_plot_results(x3)
Exercise
• Which process can be assumed to be stationary and/or ergodic? Why?
The auto-correlation function (ACF) characterizes the temporal dependencies of a random signal 𝑥[𝑘]. It is an
important measure for the analysis of signals in communications engineering, coding and system identification.
2.10.1 Definition
where 𝜅 is commonly chosen as sample index instead of 𝑘 in order to indicate that it denotes a shift/lag. The ACF
quantifies the similarity of a signal with a shifted version of itself. It has high values for high similarity and low
values for low similarity.
If the process is additionally weakly ergodic, the ACF can be computed by averaging along one sample function
𝐾
1 ∑︁
𝜙𝑥𝑥 [𝜅] = lim 𝑥[𝑘] · 𝑥[𝑘 − 𝜅]
𝐾→∞ 2𝐾 + 1
𝑘=−𝐾
Note that the normalization on the left side of the sum is discarded in some definitions of the ACF. Above sum-
mation resembles strongly the definition of the discrete convolution. For a random signal 𝑥𝑁 [𝑘] = rect𝑁 [𝑘] · 𝑥[𝑘]
of finite length 𝑁 and by exploiting the properties of a weakly ergodic random process one yields
𝑁 −1
1 ∑︁ 1
𝜙𝑥𝑥 [𝜅] = 𝑥𝑁 [𝑘] · 𝑥𝑁 [𝑘 − 𝜅] = 𝑥𝑁 [𝑘] * 𝑥𝑁 [−𝑘]
𝑁 𝑁
𝑘=0
where the ACF 𝜙𝑥𝑥 [𝜅] = 0 for |𝜅| > 𝑁 −1. Hence, the ACF can be computed by (fast) convolution of the random
signal with a time reversed version of itself.
Note in practical implementations (e.g. Python), the computed ACF is stored in a vector of length 2𝑁 − 1. The
positive indexes 0, 1, . . . , 2𝑁 − 1 of this vector cannot be directly interpreted as 𝜅. The indexes of the vector have
to be shifted by 𝑁 − 1.
2.10.2 Properties
The following properties of the ACF can be deduced from its definition
1. The ACF 𝜙𝑥𝑥 [𝜅] has a maximum for 𝜅 = 0. It is given as
This is due to the fact that the signal is equal to itself for 𝜅 = 0. Please note that for periodic random signals
more than one maximum will be present.
2. The ACF is a function with even symmetry
3. For typical random signals, the ACF approaches the limiting value
The similarity of a typical random signal is often low for large lags 𝜅.
2.10.3 Example
The following example computes and plots the ACF for a speech signal.
In [1]: %matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import wavfile
# plot ACF
fig = plt.figure(figsize = (10, 8))
plt.stem(kappa, acf)
plt.xlabel(r'$\kappa$')
plt.ylabel(r'$\hat{\varphi}_{xx}[\kappa]$')
plt.axis([-K, K, 1.1*min(acf), 1.1*max(acf)]);
plt.grid()
Exercise
The auto-covariance function is the ACF for the zero-mean random signal 𝑥[𝑘] − 𝜇𝑥 . It is given as
The cross-correlation function (CCF) is a measure of similarity that two random signals 𝑥[𝑘] and 𝑦[𝑘 − 𝜅] have
with respect to the temporal shift 𝜅 ∈ Z.
2.12.1 Definition
The CCF of two continuous-amplitude real-valued weakly stationary processes 𝑥[𝑘] and 𝑦[𝑘] is given as
If 𝑥[𝑘] and 𝑦[𝑘] are weakly ergodic processes, the CCF can be computed by averaging along one sample function
𝐾
1 ∑︁
𝜙𝑥𝑦 [𝜅] = lim 𝑥[𝑘] · 𝑦[𝑘 − 𝜅]
𝐾→∞ 2𝐾 + 1
𝑘=−𝐾
For random signals 𝑥𝑁 [𝑘] = rect𝑁 [𝑘] · 𝑥[𝑘] and 𝑦𝑀 [𝑘] = rect𝑀 [𝑘] · 𝑌 [𝑘] of finite lengths 𝑁 and 𝑀 one yields
𝑁 −1
1 ∑︁ 1
𝜙𝑥𝑦 [𝜅] = 𝑥[𝑘] · 𝑦[𝑘 − 𝜅] = 𝑥[𝑘] * 𝑦[−𝑘]
𝑁 𝑁
𝑘=0
where the CCF 𝜙𝑥𝑦 [𝜅] = 0 for 𝜅 < −(𝑀 − 1) and 𝜅 > 𝑁 − 1. The CCF can be computed by (fast) convolution of
one random signal with a time reversed version of the other random signal. Note in practical implementations (e.g.
Python), the computed CCF is stored in a vector of length 𝑁 + 𝑀 − 1. The positive indexes 0, 1, . . . , 𝑁 + 𝑀 − 1
of this vector cannot be directly interpreted as 𝜅. The indexes of the vector have to be shifted by 𝑀 − 1.
Above definitions hold also for the CCF 𝜙𝑦𝑥 [𝜅] when exchanging 𝑥[𝑘] with 𝑦[𝑘] and 𝑁 with 𝑀 .
2.12.2 Properties
1. For an exchange of the two random signals, the CCF exhibits the following symmetry
𝜙𝑥𝑦 [𝜅] = 𝜇𝑥 · 𝜇𝑦
2.12.3 Example
The following example computes the CCF for two uncorrelated random signals
In [18]: K = 1024 # length of random signals
# compute CCF
ccf = 1/len(x) * np.correlate(x, y, mode='full')
kappa = np.arange(-(K-1), 2*K)
# plot CCF
plt.figure(figsize = (10, 8))
plt.stem(kappa, ccf)
plt.title('Cross-correlation function')
plt.ylabel(r'$\varphi_{xy}[\kappa]$')
plt.xlabel(r'$\kappa$')
plt.axis([kappa[0], kappa[-1], 0, 1.1*max(ccf)]);
plt.grid()
Mean of signal x[k]: 2.049598
Mean of signal y[k]: 0.944363
Exercise
• Why does the CCF of two finite length signals have this trapezoid like shape?
• What would be its theoretic value for signals of infinte length?
The cross-covariance function is the CCF for the zero-mean random signals 𝑥[𝑘] − 𝜇𝑥 and 𝑦[𝑘] − 𝜇𝑦 . It is given
as
Exercise
• How would the plot for 𝜓𝑥𝑦 [𝜅] look like for above example?
The power spectral density (PSD) is the Fourier transformation of the auto-correlation function (ACF).
2.14.1 Definition
where ℱ* {·} denotes the discrete-time Fourier transformation (DTFT). The PSD quantifies the power per fre-
quency for a random signal.
2.14.2 Properties
The properties of the PSD can be deduced from the properties of the ACF and the DTFT as
1. From the symmetry of the ACF it follows
Φ𝑥𝑥 (e j Ω ) = Φ𝑥𝑥 (e −j Ω )
The last relation can be found by introducing the definition of the inverse DTFT.
2.14.3 Example
In this example the PSD Φ𝑥𝑥 (e j Ω ) of a speech signal 𝑥[𝑘] is computed by applying a discrete Fourier transfor-
Ω
mation (DFT) to the auto-correlation function. For better interpretation of the PSD, the frequency axis 𝑓 = 2𝜋 𝑓𝑠
has been chosen, where 𝑓𝑠 denotes the sampling frequency of the signal.
In [1]: %matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from scipy.io import wavfile
# plot PSD
plt.figure(figsize = (10, 8))
plt.plot(f, np.abs(psd))
plt.title('Estimated power spectral density')
plt.ylabel(r'$\hat{\Phi}_{xx}(e^{j \Omega})$')
plt.xlabel(r'$f$')
plt.axis([0, 2000, 0, 1.1*max(np.abs(psd))]);
plt.grid()
Exercise
• What does the PSD tell you about the spectral contents of a speech signal?
The cross-power spectral density is the Fourier transformation of the cross-correlation function (CCF). It is defined
as follows
The symmetries of Φ𝑥𝑦 (e j Ω ) can be derived from the symmetries of the CCF and the DTFT as
Analytic cumulative distribution functions (CDFs) and probability density functions (PDFs) are frequently used
as models for practical random processes. They allow to describe the statistical properties of a random process by
a few parameters. These parameters are fitted to an actual random process and are used in algorithms for statistical
signal processing. For the following, weakly stationary random processes are assumed.
Definition
where 𝑎 and 𝑏 denote the lower and upper bound for the amplitude of the random signal 𝑥[𝑘]. The uniform
distribution assumes that all amplitudes between these bounds occur with the same probability. The CDF can be
derived from the PDF by integration over 𝜃
⎧
⎨0
⎪ for 𝜃 < 𝑎
𝑃𝑥 (𝜃) = 𝜃−𝑎
𝑏−𝑎 for 𝑎 ≤ 𝜃 < 𝑏
⎪
1 for 𝜃 ≥ 𝑏
⎩
𝑎+𝑏
𝜇𝑥 =
2
and the variance is
(𝑏 − 𝑎)2
𝜎𝑥2 =
12
In order to plot the PDF and CDF of the various distributions, a function is defined
In [1]: %matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
plt.subplot(121)
plt.plot(x, distr.pdf(x))
plt.xlabel(r'$\theta$')
plt.ylabel(r'$p_x(\theta)$')
plt.title('PDF')
plt.grid()
plt.subplot(122)
plt.plot(x, distr.cdf(x))
plt.xlabel(r'$\theta$')
plt.ylabel(r'$P_x(\theta)$')
plt.title('CDF')
plt.grid()
The PDF/CDF for a uniformly distributed random signal with 𝑎 = 0 and 𝑏 = 1 is plotted
In [2]: plot_pdf_cdf(np.linspace(-.5, 1.5, num=1000), stats.uniform)
Example
Most software frameworks for numerical mathematics provide functions to generate random samples with
a defined PDF. So does ‘Numpy <http://docs.scipy.org/doc/numpy/reference/routines.random.html>‘__ or
‘scipy.stats <http://docs.scipy.org/doc/scipy/reference/stats.html#continuous-distributions>‘__. We again
first define a function that computes and plots the PDF and CDF of a given random signal.
In [3]: def compute_plot_pdf_cdf(x, nbins=100):
Exercise
• Why is the estimate of the CDF smoother that the estimate of the PDF?
• How can the upper 𝑏 and lower bound 𝑎 be changed in above code?
• What changes if you change the length of the random signal or the number nbins of histogram bins?
Definition
𝑝𝑥 (𝜃) = √ e 2𝜎𝑥2
2𝜋𝜎𝑥
where 𝜇𝑥 and 𝜎𝑥2 denote the linear mean and variance, respectively. Normal distributions are often used to repre-
sent random variables whose distributions are not known. The central limit theorem states that averages of random
variables independently drawn from independent distributions become normally distributed when the number of
random variables is sufficiently large. As a result, random signals that are expected to be the sum of many inde-
pendent processes often have distributions that are nearly normal. The CDF can be derived by integration over
𝜃
∫︁𝜃 (𝜁−𝜇 )2
1 − 2𝜎2𝑥
𝑃𝑥 (𝜃) = √ e 𝑥 d𝜁 (2.2)
2𝜋𝜎𝑥
¯∞
(︂ (︂ )︂)︂
1 𝜃 − 𝜇𝑥
= 1 + erf √ (2.3)
2 2𝜎𝑥
Example
For the standard zero-mean uniform distribution we get the following numerical results when drawing a large
number of random samples
In [6]: compute_plot_pdf_cdf(stats.norm.rvs(size=100000), nbins=100)
Linear mean: 0.004293
Variance: 0.998803
Exercise
• How can the linear mean 𝜇𝑥 and the variance 𝜎𝑥2 be changed?
• Assume you want to model zero-mean measurement noise with a given power 𝑃 . How do you have to chose
the parameters of the normal distribution?
Definition
where 𝜇𝑥 and 𝜎𝑥2 denote the linear mean and variance, respectively. Laplace distributions are often used to model
the PDF of a speech or music signal. The CDF can be derived by integration over 𝜃
{︃ √ 𝜃−𝜇𝑥
1
e 2 𝜎𝑥 for 𝜃 ≤ 𝜇𝑥
𝑃𝑥 (𝜃) = 2 1 −√2 𝜃−𝜇𝑥
1 − 2e 𝜎𝑥 for 𝜃 > 𝜇𝑥
Example
For the standard zero-mean Laplace distribution we get the following numerical results when drawing a large
numer of random samples
In [8]: compute_plot_pdf_cdf(stats.laplace(scale=1/np.sqrt(2)).rvs(size=10000), nbins=10
Linear mean: -0.015657
Variance: 1.004444
Lets take a look at the PDF/CDF of a speech signal in order to see if we can model it by one of the PDFs introduced
above.
In [9]: from scipy.io import wavfile
fs, x = wavfile.read('../data/speech_8k.wav')
x = np.asarray(x, dtype=float)/2**15
compute_plot_pdf_cdf(x, nbins=100)
Linear mean: -0.000067
Variance: 0.018548
Exercise
• Which analytic PDF/CDF can be used to model a speech signal?
• How would you chose the parameters of the distribution to fit the data?
2.17.1 Definition
White noise is a random signal with a constant power spectral density (PSD). White noise draws its name from
the analogy to white light. It refers typically to a model of random signals, like e.g. measurement noise. For a
zero-mean random signal 𝑥[𝑘], its PSD reads
Φ𝑥𝑥 (e j Ω ) = 𝑁0
where 𝑁0 denotes the power per frequency. The auto-correlation function of white noise can be derived by inverse
discrete-time Fourier transformation (DTFT) of the PSD
Hence, neighboring samples 𝑘 and 𝑘 + 1 are uncorrelated and have no statistical dependencies. The probability
density function (PDF) of white noise is not necessarily normally distributed. Hence, it is necessary to additionally
state the amplitude distribution when classifying a signal as white noise.
2.17.2 Example
Toolboxes for numerical mathematics like Numpy or scipy.stats provide functions to draw random uncorre-
lated samples from various PDFs. In order to evaluate this, a function is defined to compute and plot the PDF and
CDF for a given random signal 𝑥[𝑘].
In [1]: %matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
# plot PSD
plt.figure(figsize = (10, 6))
plt.subplot(121)
plt.hist(x, nbins, normed=True)
plt.title('Estimated PDF')
plt.xlabel(r'$\theta$')
plt.ylabel(r'$\hat{p}_x(\theta)$')
# plot ACF
plt.subplot(122)
plt.stem(kappa, acf)
plt.title('Estimated ACF')
plt.ylabel(r'$\hat{\varphi}_{xx}[\kappa]$')
plt.xlabel(r'$\kappa$')
plt.axis([-acf_range, acf_range, 0, 1.1*max(acf)]);
plt.grid()
For samples drawn from a zero-mean uniform distribution the PDF and ACF are estimated as
In [2]: compute_plot_pdf_acf(np.random.uniform(size=10000)-1/2)
For samples drawn from a zero-mean Laplace distribution the PDF and ACF are estimated as
In [3]: compute_plot_pdf_acf(np.random.laplace(size=10000))
Exercise
• Do both random processes represent white noise?
• How does the ACF change if you lower the length size of the random signal. Why?
is a frequently applied operation in statistical signal processing. For instance to model the distortions of a mea-
surement procedure or communication channel. We assume that the statistical properties of the real-valued signals
𝑥[𝑘] and 𝑛[𝑘] are known. We are interested in the statistical properties of 𝑦[𝑘], as well as the joint statistical
properties between the signals and their superposition 𝑦[𝑘]. For the following derivations it is assumed that 𝑥[𝑘]
and 𝑛[𝑘] are drawn from weakly stationary real-valued random processes.
The cumulative distribution function (CDF) 𝑃𝑦 (𝜃) of 𝑦[𝑘] is given by rewriting it in terms of the joint probability
density function (PDF) 𝑝𝑥𝑛 (𝜃𝑥 , 𝜃𝑛 )
∫︁∞ 𝜃−𝜃
∫︁ 𝑛
𝑃𝑦 (𝜃) = Pr{𝑦[𝑘] ≤ 𝜃} = Pr{(𝑥[𝑘] + 𝑛[𝑘]) ≤ 𝜃} = 𝑝𝑥𝑛 (𝜃𝑥 , 𝜃𝑛 ) d𝜃𝑥 d𝜃𝑛
−∞ −∞
since the inner integral on the right hand side of 𝑃𝑦 (𝜃) can be interpreted as the inverse operation to the derivation
with respect to 𝜃.
An important special case is that 𝑥[𝑘] and 𝑛[𝑘] are uncorrelated. Under this assumption the joint PDF 𝑝𝑥𝑛 (𝜃𝑥 , 𝜃𝑛 )
can be written as 𝑝𝑥𝑛 (𝜃𝑥 , 𝜃𝑛 ) = 𝑝𝑥 (𝜃𝑥 ) · 𝑝𝑛 (𝜃𝑛 ). For 𝑝𝑦 (𝜃) follows
Hence, the PDF of the superposition is given by the convolution of the PDFs of both signals.
Example
The following example estimates the PDF of a superposition of two uncorrelated signals drawn from uniformly
distributed white noise sources with 𝑎 = 0 and 𝑏 = 1.
In [1]: %matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
Exercise
• Check the result of the numerical simulation by calculating the theoretical PDF of 𝑦[𝑘]
The linear mean 𝜇𝑦 of the superposition is derived by introducing 𝑦[𝑘] = 𝑥[𝑘] + 𝑛[𝑘] into the definition of the
linear mean and exploiting the linearity of the expectation operator as
The linear mean of the superposition of two random signals is the superposition of its linear means.
The ACF is computed in the same manner as above by inserting the superposition into its definition and rearranging
terms
The ACF of the superposition of two random signals is given as the superposition of all auto- and cross-correlation
functions (CCFs) of the two random signals. The power spectral density (PSD) is derived by discrete-time Fourier
transformation (DTFT) of the ACF
This can be simplified further by exploiting the symmetry property of the CCFs 𝜙𝑥𝑛 [𝜅] = 𝜙𝑛𝑥 [−𝜅] and the DTFT
for real-valued signals as
The CCF 𝜙𝑛𝑦 [𝜅] between the random signal 𝑛[𝑘] and the superposition 𝑦[𝑘] is derived again by introducing the
superposition into the definition of the CCF
𝜙𝑛𝑦 [𝜅] = 𝐸{𝑛[𝑘] · (𝑥[𝑘 − 𝜅] + 𝑛[𝑘 − 𝜅])} = 𝜙𝑛𝑥 [𝜅] + 𝜙𝑛𝑛 [𝜅]
It is given as the superposition of the CCF between the two random signals and the ACF of 𝑛[𝑘]. The cross PSD
is derived by applying a DTFT to 𝜙𝑛𝑦 [𝜅]
The CCF 𝜙𝑥𝑦 [𝜅] and cross PSD Φ𝑥𝑦 (e j Ω ) can be derived by exchanging the signals 𝑛[𝑘] and 𝑥[𝑘]
𝜙𝑥𝑦 [𝜅] = 𝐸{𝑥[𝑘] · (𝑥[𝑘 − 𝜅] + 𝑛[𝑘 − 𝜅])} = 𝜙𝑥𝑥 [𝜅] + 𝜙𝑥𝑛 [𝜅]
and
In order to model the effect of distortions it is often assumed that a random signal 𝑥[𝑘] is distorted by additive
normal distributed white noise resulting in the observed signal 𝑦[𝑘] = 𝑥[𝑘] + 𝑛[𝑘]. It is furthermore assumed that
the noise 𝑛[𝑘] is uncorrelated to the signal 𝑥[𝑘]. This model is known as additive white Gaussian noise (AWGN)
model.
For zero-mean random processes it follows 𝜙𝑥𝑛 [𝜅] = 𝜙𝑛𝑥 [𝜅] = 0 and 𝜙𝑛𝑛 [𝜅] = 𝑁0 · 𝛿[𝜅] from the properties of
the AWGN model. Introducing this into the findings for additive random signals yields the following relations for
the AWGN model
The PSDs are given as the DTFT of these results. The AWGN model is frequently applied in communications as
well as measurement of physical quantities to cope for background, sensor and amplifier noise.
Example
For the following numerical example, the disturbance of a harmonic signal 𝑥[𝑘] = cos[Ω0 𝑘] by unit variance
AWGN is considered.
In [2]: N = 1024 # length of signals
K = 20 # maximum lag for ACF/CCF
# generate signals
x = np.cos(20*2*np.pi/N*np.arange(N))
n = np.random.normal(size=N)
# superposition of signals
y = x + n
# plot results
kappa = np.arange(-(K-1), K)
plt.figure(figsize=(10, 4))
plt.subplot(121)
plt.stem(kappa, acf)
plt.title('ACF of superposition')
plt.xlabel(r'$\kappa$')
plt.ylabel(r'$\varphi_{yy}[\kappa]$')
plt.axis([-K, K, -.5, 1.1*np.max(acf)])
plt.subplot(122)
plt.stem(kappa, ccf)
plt.title('CCF between superposition and noise')
plt.xlabel(r'$\kappa$')
plt.ylabel(r'$\varphi_{ny}[\kappa]$')
plt.axis([-K, K, -.2, 1.1]);
Exercise
• Derive the theoretic result for 𝜙𝑥𝑥 [𝜅]
• Based in this, can you explain the results of the numerical simulation?
3.1 Introduction
The response of a system 𝑦[𝑘] = ℋ{𝑥[𝑘]} to a random input signal 𝑥[𝑘] is the foundation of statistical signal
processing. In the following we limit ourselves to linear-time invariant (LTI) systems.
Let’s assume that the statistical properties of the input signal 𝑥[𝑘] are known, for instance its first and second order
ensemble averages. Let’s further assume that the impulse response ℎ[𝑘] or the transfer function 𝐻(𝑒 j Ω ) of the
LTI system is given. We are looking for the statistical properties of the output signal 𝑦[𝑘] and the joint properties
between the input 𝑥[𝑘] and output 𝑦[𝑘] signal.
The question arises if the output signal 𝑦[𝑘] of an LTI system is (weakly) stationary or stationary and ergodic for
an input signal 𝑥[𝑘] exhibiting the same properties.
Let’s assume that the input signal 𝑥[𝑘] originates from a stationary random process. According to the definition of
stationarity the following relation must hold
where ∆ ∈ Z denotes an arbitrary (temporal) shift. The condition for time-invariance of a system reads
By introducing this into the right hand side of the definition of stationarity for the output signal 𝑦[𝑘] and recalling
that 𝑦[𝑘] = ℋ{𝑥[𝑘]} we can show that
where 𝑔(·) denotes an arbitrary mapping function that may differ from 𝑓 (·). From the equation above, it can be
concluded that the output signal of an LTI system for a (weakly) stationary input signal is also (weakly) stationary.
The same reasoning can also be applied to a (weakly) ergodic input signal.
Summarizing, for an input signal 𝑥[𝑘] that is
• (weakly) stationary, the output signal 𝑦[𝑘] is (weakly) stationary and the in- and output is jointly (weakly)
stationary
• (weakly) ergodic, the output signal 𝑦[𝑘] is (weakly) ergodic and the in- and output is jointly (weakly) ergodic
71
Digital Signal Processing, Release 0.0
This implies for instance, that for a weakly stationary input signal measures like the auto-correlation function
(ACF) can also be applied to the output signal.
3.2.1 Example
The following example computes and plots estimates of the linear mean 𝜇[𝑘] and auto-correlation function (ACF)
𝜙[𝑘1 , 𝑘2 ] for the in- and output of an LTI system. The input 𝑥[𝑘] is drawn from a normal distributed white noise
process.
In [1]: %matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
plt.subplot(121)
plt.stem(mu)
plt.title(r'Estimate of linear mean $\hat{\mu}[k]$')
plt.xlabel(r'$k$')
plt.ylabel(r'$\hat{\mu}[k]$')
plt.axis([0, L, -1.5, 1.5])
plt.subplot(122)
plt.pcolor(np.arange(L), np.arange(L), acf, vmin=-2, vmax=2)
plt.title(r'Estimate of ACF $\hat{\varphi}[k_1, k_2]$')
plt.xlabel(r'$k_1$')
plt.ylabel(r'$k_2$')
plt.colorbar()
plt.autoscale(tight=True)
Exercise
• Is the in- and output signal weakly stationary?
• Can the output signal 𝑦[𝑘] assumed to be white noise?
In the following we aim at finding a relation between the linear mean 𝜇𝑥 [𝑘] of the input signal 𝑥[𝑘] and the linear
mean 𝜇𝑦 [𝑘] of the output signal 𝑦[𝑘] = ℋ{𝑥[𝑘]} of an LTI system.
Let’s first impose no restrictions in terms of stationarity to the input signal. The linear mean of the output signal
is then given as
where ℎ[𝑘] denotes the impulse response of the system. Since the convolution and the ensemble average are linear
operations, and ℎ[𝑘] is a deterministic signal this can be rewritten as
Hence, the linear mean of the output signal 𝜇𝑦 [𝑘] is given as the convolution of the linear mean of the input signal
𝜇𝑥 [𝑘] with the impulse response ℎ[𝑘] of the system.
Example
Exercise
• Can you estimate the impulse response ℎ[𝑘] of the system from above plots of 𝜇
ˆ𝑥 [𝑘] and 𝜇
ˆ𝑦 [𝑘]?
• You can check your results by plotting the impulse response ℎ[𝑘], for instance with the command
plt.stem(h).
For a (weakly) stationary process, the linear mean of the input signal 𝜇𝑥 [𝑘] = 𝜇𝑥 does not depend on the time
index 𝑘. For a (weakly) stationary input signal, also the output signal of the system is (weakly) stationary. Using
the result for the non-stationary case above yields
𝜇𝑦 = 𝜇𝑥 * ℎ[𝑘] = 𝜇𝑥 · 𝐻(e j Ω )⃒
⃒
Ω=0
where 𝐻(e j Ω ) = ℱ* {ℎ[𝑘]} denotes the discrete time Fourier transformation (DTFT) of the impulse response.
Hence, the linear mean of a (weakly) stationary input signal is weighted by the transmission characteristics for
the constant (i.e. DC, Ω = 0) component of the LTI system. This implies that the output signal to a zero-mean
𝜇𝑥 = 0 input signal is also zero-mean 𝜇𝑦 = 0.
The auto-correlation function (ACF) 𝜙𝑦𝑦 [𝜅] of the output signal of an LTI system 𝑦[𝑘] = ℋ{𝑥[𝑘]} is derived. It
is assumed that the input signal is a weakly stationary real-valued random process and that the LTI system has a
real-valued impulse repsonse ℎ[𝑘] ∈ R.
Introducing the output relation 𝑦[𝑘] = ℎ[𝑘] * 𝑥[𝑘] of an LTI system into the definition of the auto-correlation
where the deterministic function 𝜙ℎℎ [𝜅] is frequently termed as filter ACF. This is related to the link between ACF
and convolution. The result above is known as the Wiener-Lee theorem. It states that the ACF of the output 𝜙𝑦𝑦 [𝜅]
of an LTI system is given by the convolution of the input signal’s ACF 𝜙𝑥𝑥 [𝜅] with the filter ACF 𝜙ℎℎ [𝜅].
3.4.1 Example
Let’s assume that the input signal 𝑥[𝑘] of an LTI system with impulse response ℎ[𝑘] = rect𝑁 [𝑘] is normal dis-
tributed white noise. Hence, 𝜙𝑥𝑥 [𝜅] = 𝑁0 𝛿[𝜅] can be introduced into the Wiener-Lee theorem yielding
# plot ACF
plt.figure(figsize = (10, 6))
plt.stem(kappa, acf)
plt.title('Estimated ACF of output signal $y[k]$')
plt.ylabel(r'$\hat{\varphi}_{yy}[\kappa]$')
plt.xlabel(r'$\kappa$')
plt.axis([-K, K, 1.2*min(acf), 1.1*max(acf)]);
plt.grid()
Exercise
• Why is the estimated ACF 𝜙ˆ𝑦𝑦 [𝜅] of the output signal not exactly equal to its theoretic result 𝜙𝑦𝑦 [𝜅] given
above?
• Change the number of samples L and rerun the cell. What changes?
The cross-correlation functions (CCFs) 𝜙𝑥𝑦 [𝜅] and 𝜙𝑦𝑥 [𝜅] between the in- and output signal of an LTI system
𝑦[𝑘] = ℋ{𝑥[𝑘]} are derived. As for the ACF it is assumed that the input signal originates from a weakly stationary
real-valued random process and that the LTI system’s impulse response is real-valued, i.e. ℎ[𝑘] ∈ R.
Introducing the convolution into the definition of the CCF and rearranging the terms yields
The CCF 𝜙𝑥𝑦 [𝜅] between in- and output is given as the time-reversed impulse response of the system convolved
with the ACF of the input signal.
The same calculus applied to the CCF between out- and input results in
Hence, the CCF 𝜙𝑦𝑥 [𝜅] between out- and input is given as the impulse response of the system convolved with the
ACF of the input signal.
The CCFs of an LTI system play an important role in the measurement of the impulse response ℎ[𝑘] of an unknown
system. This is illustrated in the following.
Let’s assume that the input signal 𝑥[𝑘] of the unknown LTI system is white noise. The ACF of the input signal is
then given as 𝜙𝑥𝑥 [𝜅] = 𝑁0 · 𝛿[𝜅]. According to the relation derived above, the CCF between out- and input for
this special choice of input signal gets
Hence, the impulse response is derived by computing the CCF between out- and input, for white noise as input
signal 𝑥[𝑘].
3.6.1 Example
The application of the CCF for system identification is illustrated in the following
In [2]: %matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import scipy.signal as sig
def plot_correlation_function(cf):
cf = cf[N-K-1:N+K-1]
kappa = np.arange(-len(cf)//2,len(cf)//2)
plt.stem(kappa, cf)
plt.xlabel(r'$\kappa$')
plt.axis([-K, K, -0.2, 1.1*max(cf)])
plt.figure()
plot_correlation_function(acfy)
plt.title('Estimated ACF of output signal')
plt.ylabel(r'$\hat{\varphi}_{yy}[\kappa]$')
plt.figure()
plt.hold(True)
plot_correlation_function(ccfyx)
plt.plot(np.arange(len(h)), h, 'g-')
plt.title('Estimated CCF and true impulse response')
plt.ylabel(r'$\hat{\varphi}_{yx}[\kappa]$, $h[k]$');
Exercise
• Why is the estimated CCF 𝜙ˆ𝑦𝑥 [𝑘] not exactly equal to the impulse response ℎ[𝑘] of the system?
• What changes if you change the number of samples N of the input signal?
The propagation of sound from one position (e.g. transmitter) to another (e.g. receiver) conforms reasonable well
to the properties of a linear time-invariant (LTI) system. Consequently, the impulse response ℎ[𝑘] characterizes the
propagation of sound between theses two positions. Impulse responses have various applications in acoustics. For
instance as head-related impulse responses (HRIRs) or room impulse responses (RIRs) for the characterization of
room acoustics.
The following example demonstrates how an acoustic impulse response can be measured with correlation-based
system identification techniques using the soundcard of a computer. The module sounddevice provides access to
the soundcard via PortAudio.
In [1]: %matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import scipy.signal as sig
import sounddevice as sd
We generate white noise with an uniform distribution between ±0.5 as the excitation signal 𝑥[𝑘]
In [2]: fs = 44100 # sampling rate
T = 5 # length of the measurement signal in sec
Tr = 2 # length of the expected system response in sec
The measurement signal 𝑥[𝑘] has to be played through the output of the soundcard and the response 𝑦[𝑘] has to
be captured synchronously by the input of the soundcard. Since the length of the played/captured signal has to be
equal, the measurement signal 𝑥[𝑘] is zero-padded so that the captured signal 𝑦[𝑘] includes the system response.
Be sure not to overdrive the speaker and the microphone by keeping the input level well below 0 dB.
In [3]: x = np.concatenate((x, np.zeros(Tr*fs)))
y = sd.playrec(x, fs, channels=1)
sd.wait()
y = np.squeeze(y)
The acoustic impulse response is estimated by cross-correlation 𝜙𝑦𝑥 [𝜅] of the output with the input signal. Since
1
the cross-correlation function (CCF) for finite-length signals is given as 𝜙𝑦𝑥 [𝜅] = 𝐾 ·𝑦[𝜅]*𝑥[−𝜅], the computation
of the CCF can be speeded up with the fast convolution method.
In [4]: h = 1/len(y) * sig.fftconvolve(y, x[::-1], mode='full')
h = h[fs*(T+Tr):fs*(T+2*Tr)]
For a weakly stationary real-valued random process 𝑥[𝑘], the power spectral density (PSD) Φ𝑥𝑥 (e j Ω ) is given as
the discrete-time Fourier transformation (DTFT) of the auto-correlation function (ACF) 𝜙𝑥𝑥 [𝜅]
∞
∑︁
Φ𝑥𝑥 (e j Ω ) = 𝜙𝑥𝑥 [𝜅] e −j Ω 𝜅
𝜅=−∞
Under the assumption of a real-valued LTI system with impulse response ℎ[𝑘] ∈ R, the power spectral density
(PSD) Φ𝑦𝑦 (e j Ω ) of the output signal of an LTI system 𝑦[𝑘] = ℋ{𝑥[𝑘]} is derived by the DTFT of the ACF of the
output signal 𝜙𝑦𝑦 [𝜅]
∞
∑︁
Φ𝑦𝑦 (e j Ω ) = ℎ[𝜅] * ℎ[−𝜅] *𝜙𝑥𝑥 [𝜅] e −j Ω 𝜅 (3.8)
𝜅=−∞
⏟ ⏞
𝜙ℎℎ [𝜅]
= 𝐻(e jΩ
) · 𝐻(e −j Ω ) · Φ𝑥𝑥 (e j Ω ) = |𝐻(e j Ω )|2 · Φ𝑥𝑥 (e j Ω ) (3.9)
The PSD of the output signal Φ𝑦𝑦 (e j Ω ) of an LTI system is given by the PSD of the input signal Φ𝑥𝑥 (e j Ω )
multiplied with the squared magnitude |𝐻(e j Ω )|2 of the transfer function of the system.
The cross-power spectral densities Φ𝑦𝑥 (e j Ω ) and Φ𝑥𝑦 (e j Ω ) between the in- and output of an LTI system are given
by the DTFT of the cross-correlation functions (CCF) 𝜙𝑦𝑥 [𝜅] and 𝜙𝑥𝑦 [𝜅]. Hence,
∞
∑︁
Φ𝑦𝑥 (e jΩ
)= ℎ[𝜅] * 𝜙𝑥𝑥 [𝜅] e −j Ω 𝜅 = Φ𝑥𝑥 (e j Ω ) · 𝐻(e j Ω )
𝜅=−∞
and
∞
∑︁
Φ𝑥𝑦 (e j Ω ) = ℎ[−𝜅] * 𝜙𝑥𝑥 [𝜅] e −j Ω 𝜅 = Φ𝑥𝑥 (e j Ω ) · 𝐻(e −j Ω )
𝜅=−∞
Using the result above for the cross-power spectral density Φ𝑦𝑥 (e j Ω ) between out- and input, and the relation of
the CCF of finite-length signals to the convolution one yields
Φ𝑦𝑥 (e j Ω ) 1
𝐾 𝑌 (e
jΩ
) · 𝑋(e −j Ω ) 𝑌 (e j Ω )
𝐻(e j Ω ) = = 1 =
Φ𝑥𝑥 (e j Ω ) 𝐾 𝑋(e
j Ω ) · 𝑋(e −j Ω ) 𝑋(e j Ω )
holding for Φ𝑥𝑥 (e j Ω ) ̸= 0 and 𝑋(e j Ω ) ̸= 0. Hence, the transfer function 𝐻(e j Ω ) of an unknown system can be
derived by dividing the spectrum of the output signal 𝑌 (e j Ω ) through the spectrum of the input signal 𝑋(e j Ω ).
This is equal to the definition of the transfer function. However, care has to be taken that the spectrum of the input
signal does not contain zeros.
Above relation can be realized by the discrete Fourier transformation (DFT) by taking into account that a multi-
plication of two spectra 𝑋[𝜇] · 𝑌 [𝜇] results in the cyclic/periodic convolution 𝑥[𝑘] ~ 𝑦[𝑘]. Since we aim at a linear
convolution, zero-padding of the in- and output signal has to be applied.
3.10.1 Example
We consider the estimation of the impulse response ℎ[𝑘] = ℱ*−1 {𝐻(e j Ω )} of an unknown system using the
spectral division method. Normal distributed white noise with variance 𝜎𝑛2 = 1 is used as input signal 𝑥[𝑘]. In
order to show the effect of sensor noise, normally distributed white noise 𝑛[𝑘] with the variance 𝜎𝑛2 = 0.01 is
added to the output signal 𝑦[𝑘] = 𝑥[𝑘] * ℎ[𝑘] + 𝑛[𝑘].
In [1]: %matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
Exercise
• Change the length N of the input signal. What happens?
• Change the variance 𝜎𝑛2 of the additive noise. What happens?
The Wiener filter, named after *Nobert Wiener*, aims at estimating an unknown random signal by filtering a
noisy observation of the signal. It has a wide variety of applications in noise reduction, system identification,
deconvolution and signal detection. For instance, the Wiener filter can be used to denoise audio signals, like
speech, or to remove noise from a picture.
The following signal model is underlying the following derivation of the Wiener filter
The random signal 𝑠[𝑘] is subject to distortion by the linear time-invariant (LTI) system 𝐺(e j Ω ) and additive
noise 𝑛[𝑘], resulting in the observed signal 𝑥[𝑘] = 𝑠[𝑘] * 𝑔[𝑘] + 𝑛[𝑘]. The additive noise 𝑛[𝑘] is assumed to be
uncorrelated from 𝑠[𝑘]. It is furthermore assumed that all random signals are weakly stationary. This distortion
model holds for many practical problems, like e.g. the measurement of a physical quantity by a sensor.
The goal of the Wiener filter is to design the LTI system 𝐻(e j Ω ) such that the output signal 𝑦[𝑘] matches 𝑠[𝑘] as
best as possible. In order to quantify this, the error signal
which is also known as *mean squared error* (MSE). We aim at the minimization of the MSE between the original
signal 𝑠[𝑘] and its estimate 𝑦[𝑘].
At first, the Wiener filter shall only have access to the observed signal 𝑥[𝑘] and some statistical measures. It is
assumed that the cross-power spectral density Φ𝑥𝑠 (e j Ω ) between the observed signal 𝑥[𝑘] and the original signal
𝑠[𝑘], and the power spectral density (PSD) of the observed signal Φ𝑥𝑥 (e j Ω ) are known. This knowledge can either
be gained by estimating both from measurements taken at an actual system or by using suitable statistical models.
The optimal filter is found by minimizing the MSE 𝐸{|𝑒[𝑘]|2 } with respect to the transfer function 𝐻(e j Ω ). The
solution of this optimization problem goes beyond the scope of this notebook and can be found in the literature,
e.g. [Girod et. al]. The transfer function of the Wiener filter is given as
Φ𝑠𝑥 (e j Ω ) Φ𝑥𝑠 (e −j Ω )
𝐻(e j Ω ) = j Ω
=
Φ𝑥𝑥 (e ) Φ𝑥𝑥 (e j Ω )
No knowledge on the actual distortion process is required. Only the PSDs Φ𝑠𝑥 (e j Ω ) and Φ𝑥𝑥 (e j Ω ) have to be
known in order to estimate 𝑠[𝑘] from 𝑥[𝑘] in the minimum MSE sense. Care has to be taken that the filter 𝐻(e j Ω )
is causal and stable in practical applications.
Example
The following example considers the estimation of the original signal from a distorted observation. It is assumed
that the original signal is 𝑠[𝑘] = sin[Ω0 𝑘] which is distorted by an LTI system and additive normally distributed
zero-mean white noise with 𝜎𝑛2 = 0.1. The PSDs Φ𝑠𝑥 (e j Ω ) and Φ𝑥𝑥 (e j Ω ) are estimated from 𝑠[𝑘] and 𝑥[𝑘] using
the Welch technique. The Wiener filter is applied to the observation 𝑥[𝑘] in order to compute the estimate 𝑦[𝑘] of
𝑠[𝑘].
In [1]: %matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import scipy.signal as sig
plt.figure(figsize=(10, 4))
plt.subplot(121)
plt.plot(Om, 20*np.log10(np.abs(.5*Pxx)), label=r'$| \Phi_{xx}(e^{j \Omega}) |$
plt.plot(Om, 20*np.log10(np.abs(.5*Psx)), label=r'$| \Phi_{sx}(e^{j \Omega}) |$
plt.title('(Cross) PSDs')
plt.xlabel(r'$\Omega$')
plt.legend()
plt.axis([0, np.pi, -60, 40])
plt.grid()
# plot signals
idx = np.arange(500, 600)
plt.figure(figsize=(10, 4))
plt.plot(idx, x[idx], label=r'observed signal $x[k]$')
plt.plot(idx, s[idx], label=r'original signal $s[k]$')
plt.plot(idx, y[idx], label=r'estimated signal $y[k]$')
plt.title('Signals')
plt.xlabel(r'$k$')
plt.axis([idx[0], idx[-1], -1.5, 1.5])
plt.legend()
plt.grid()
Exercise
• Take a look at the PSDs and the resulting transfer function of the Wiener filter. How does the Wiener filter
remove the noise from the observed signal?
• Change the frequency Om0 of the original signal 𝑠[𝑘] and the noise power N0 of the additive noise. What
changes?
As discussed above, the general formulation of the Wiener filter is based on the knowledge of the PSDs Φ𝑠𝑥 (e j Ω )
and Φ𝑥𝑥 (e j Ω ) characterizing the distortion process and the observed signal respectively. These PSDs can be
derived from the PSDs of the original signal Φ𝑠𝑠 (e j Ω ) and the noise Φ𝑛𝑛 (e j Ω ), and the transfer function 𝐺(e j Ω )
of the distorting system.
Under the assumption that 𝑛[𝑘] is uncorrelated from 𝑠[𝑘] the PSD Φ𝑠𝑥 (e j Ω ) can be derived as
Φ𝑠𝑥 (e j Ω ) = Φ𝑠𝑠 (e j Ω ) · 𝐺(e −j Ω )
and
Φ𝑥𝑥 (e j Ω ) = Φ𝑠𝑠 (e j Ω ) · |𝐺(e j Ω )|2 + Φ𝑛𝑛 (e j Ω )
Introducing these results into the general formulation of the Wiener filter yields
Φ𝑠𝑠 (e j Ω ) · 𝐺(e −j Ω )
𝐻(e j Ω ) =
Φ𝑠𝑠 (e j Ω ) · |𝐺(e j Ω )|2 + Φ𝑛𝑛 (e j Ω )
This specialization is also known as *Wiener deconvolution filter*. The filter can be derived from the PSDs of
the original signal and the noise, and the transfer function of the distorting system. This form is especially useful
when the PSDs can be modeled by analytic probabilty density functions (PSDs). For instance, the additive noise
can be modeled by white noise Φ𝑛𝑛 (𝑒𝑗Ω ) = 𝑁0 in many cases.
3.11.4 Interpretation
The result above can be rewritten by introducing the frequency dependent signal-to-noise ratio SNR(e j Ω ) =
Φ𝑠𝑠 (e j Ω )
Φ𝑛𝑛 (e j Ω ) between the orignal signal and the noise as
(︃ )︃
jΩ 1 |𝐺(e j Ω )|2
𝐻(e ) = · 1
𝐺(e j Ω ) |𝐺(e j Ω )|2 + SNR(e j Ω)
This form of the Wiener devonvolution filter can be discussed for two special cases:
1. If there is no additive noise Φ𝑛𝑛 (e j Ω ) = 0, the bracketed expression is equal to 1. Hence, the Wiener filter
is simply given as the inverse system to the distorting system
1
𝐻(e j Ω ) =
𝐺(e j Ω )
2. If the distorting system is just a pass through 𝐺(e j Ω ) = 1, the Wiener filter is given as
SNR(e j Ω ) Φ𝑠𝑠 (e j Ω )
𝐻(e j Ω ) = j Ω
=
SNR(e ) + 1 Φ𝑠𝑠 (e Ω ) + Φ𝑛𝑛 (e j Ω )
j
Hence for a high SNR(e j Ω ), i.e. Φ𝑠𝑠 (e j Ω ) ≫ Φ𝑛𝑛 (e j Ω ) at a given frequency Ω the transfer function
approaches 1; and for a small SNR low values.
Example
The preceding example of the general Wiener filter will now be reevaluated with the Wiener deconvolution filter.
In [2]: N = 8129 # number of samples
M = 256 # length of Wiener filter
Om0 = 0.1*np.pi # frequency of original signal
N0 = .1 # PSD of additive white noise
plt.figure(figsize=(10, 4))
plt.subplot(121)
plt.plot(Om, 20*np.log10(np.abs(.5*Pss)), label=r'$| \Phi_{ss}(e^{j \Omega}) |$
plt.plot(Om, 20*np.log10(np.abs(.5*Pnn)), label=r'$| \Phi_{nn}(e^{j \Omega}) |$
plt.title('PSDs')
plt.xlabel(r'$\Omega$')
plt.legend()
plt.axis([0, np.pi, -60, 40])
plt.grid()
plt.grid()
plt.tight_layout()
# plot signals
idx = np.arange(500, 600)
plt.figure(figsize=(10, 4))
plt.plot(idx, x[idx], label=r'observed signal $x[k]$')
plt.plot(idx, s[idx], label=r'original signal $s[k]$')
plt.plot(idx, y[idx], label=r'estimated signal $y[k]$')
plt.title('Signals')
plt.xlabel(r'$k$')
plt.axis([idx[0], idx[-1], -1.5, 1.5])
plt.legend()
plt.grid()
Exercise
• What is different compared to the general Wiener filter? Why?
4.1 Introduction
In the preceding sections various statistical measures have been introduced to characterize random processes and
signals. For instance the probability density function (PDF) 𝑝𝑥 (𝜃), the mean value 𝜇𝑥 , the auto-correlation func-
tion (ACF) 𝜙𝑥𝑥 [𝜅] and its Fourier transformation, the power spectral density (PSD) Φ𝑥𝑥 (e j Ω ). For many random
processes whose internal structure is known these measures can be given in closed-form. However, for practical
random signals, measures of interest have to be estimated from a limited number of samples. These estimated
quantities can e.g. be used to fit a parametric model of the random process or as parameters in algorithms.
The estimation of the spectral properties of a random signal is of special interest for spectral analysis. The discrete
Fourier transform (DFT) of a random signal is also random. It is not very well suited to get insights into the
spectral structure of a random signal. We therefore aim at estimating the PSD Φ̂𝑥𝑥 (e j Ω ) of a weakly stationary
and ergodic process from a limited number of samples. This is known as *spectral (density) estimation*. Many
techniques have been developed for this purpose. They can be classified into
1. non-parametric and
2. parametric
techniques. Non-parametric techniques estimate the PSD of the random signal without assuming any particular
structure for the generating random process. In contrary, parametric techniques assume that the generating random
process can be modeled by few parameters. Their aim is to estimate these parameters in order to characterize the
random signal.
4.1.2 Evaluation
The estimate Φ̂𝑥𝑥 (e j Ω ) can be regarded as a random signal itself. The performance of an estimator is therefore
evaluated in a statistical sense. For the PSD, the following metrics are of interest
Bias
quantifies the difference between the estimated Φ̂𝑥𝑥 (e j Ω ) and the true Φ𝑥𝑥 (e j Ω ). An estimator is biased if 𝑏Φ^ 𝑥𝑥 ̸=
0 and bias-free if 𝑏Φ^ 𝑥𝑥 = 0.
89
Digital Signal Processing, Release 0.0
Variance
quantifies its quadratic deviation from its mean value 𝐸{Φ̂𝑥𝑥 (e j Ω )}.
Consistency
A consistent estimator is an estimator for which the following conditions hold for a large number 𝑁 of samples:
1. the estimator is unbiased
lim 𝑏 ^ =0
𝑁 →∞ Φ𝑥𝑥
Example
The following example computes and plots the magnitude spectra |𝑋𝑛 [𝜇]| of an ensemble of random signals 𝑥𝑛 [𝑘].
In the plot, each color denotes one sample function.
In [1]: %matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import scipy.signal as sig
# DFT of signal
X = np.fft.rfft(x, axis=1)
Om = np.linspace(0, np.pi, X.shape[1])
Exercise
• What can you conclude on the spectral properties of the random process?
• Increase the number N of samples. What changes? What does not change with respect to the evaluation
criteria introduced above?
• Is the DFT a consistent estimator for the spectral properties of a random process?
The periodogram is an estimator for the power spectral density (PSD) Φ𝑥𝑥 (e j Ω ) of a random signal 𝑥[𝑘]. We
assume a weakly ergodic real-valued random process in the following.
4.2.1 Definition
The PSD is given as the discrete time Fourier transformation (DTFT) of the auto-correlation function (ACF)
Φ𝑥𝑥 (e j Ω ) = ℱ* {𝜙𝑥𝑥 [𝜅]}
Hence, the PSD can be computed from an estimate of the ACF. Let’s assume that we want to estimate the PSD
from 𝑁 samples of the random signal 𝑥[𝑘] by way of the ACF. The truncated signal is given as
𝑥𝑁 [𝑘] = 𝑥[𝑘] · rect𝑁 [𝑘]
The ACF is estimated by using its definition in a straightforward manner. For a random signal 𝑥𝑁 [𝑘] of finite
length, the estimated ACF 𝜙ˆ𝑥𝑥 [𝜅] can be expressed in terms of a convolution
1
𝜙ˆ𝑥𝑥 [𝜅] = · 𝑥𝑁 [𝑘] * 𝑥𝑁 [−𝑘]
𝑁
Applying the DTFT to both sides and rearranging the terms yields
1 1
Φ̂𝑥𝑥 (e j Ω ) = 𝑋𝑁 (e j Ω ) 𝑋𝑁 (e −j Ω ) = |𝑋𝑁 (e j Ω )|2
𝑁 𝑁
where the latter equality has been derived by applying the symmetry relations of the DTFT. This estimate of the
PSD is known as the periodogram. It can be computed directly from the DTFT
𝑁
∑︁−1
𝑋𝑁 (e j Ω ) = 𝑥𝑁 [𝑘] e −j Ω 𝑘
𝑘=0
4.2.2 Example
The following example estimates the PSD of a random process which draws samples from normal distributed
white noise with zero-mean and unit variance. The true PSD is given as Φ𝑥𝑥 (e j Ω ) = 1. In order to compute the
periodogram by the discrete Fourier transformation (DFT), the signal 𝑥[𝑘] has to be zero-padded to ensure that
above convolution is not circular.
In [1]: %matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
# plot results
plt.figure(figsize=(10,4))
plt.stem(Om, Sxx, 'b', label=r'$|\hat{\Phi}_{xx}(e^{j \Omega})|$')
plt.plot(Om, np.ones_like(Sxx), 'r', label=r'$\Phi_{xx}(e^{j \Omega})$')
plt.title('Estimated and true PSD')
plt.xlabel(r'$\Omega$')
plt.axis([0, np.pi, 0, 5])
plt.legend()
Exercise
• What do you have to change to evaluate experimentally if the periodogram is a consistent estimator?
• Based on the results, is the periodogram a consistent estimator?
4.2.3 Evaluation
From above numerical example it should have become clear that the periodogram is no consistent estimator for
the PSD Φ𝑥𝑥 (e j Ω ). It can be shown that the estimator is asymptotically bias free when 𝑁 → ∞, hence
This is due to the leakage effect which limits the spectral resolution for signals of finite length.
The variance of the estimator does not converge towards zero
2
lim 𝜎Φ
^ ̸= 0
𝑁 →∞ 𝑥𝑥
2𝜋
This is due to the fact that with increasing 𝑁 also the number of independent frequencies Ω = 𝑁 𝜇 for 𝜇 =
0, 1, . . . , 𝑁 − 1 increases.
The periodogram is the basis for a variety of advanced estimation techniques for the PSD. These techniques rely
on averaging or smoothing of (overlapping) periodograms.
In the previous section it has been shown that the periodogram, as a non-parametric estimator of the power spec-
tral density (PSD) Φ𝑥𝑥 (e j Ω ) of a random signal 𝑥[𝑘], is not consistent. This is due to the fact that its variance
does not converge towards zero even when the length of the random signal is increased towards infinity. In or-
der to overcome this problem, the [Bartlett method](https://en.wikipedia.org/wiki/Bartlett’s_method) and [Welch
method](https://en.wikipedia.org/wiki/Welch’s_method)
1. split the random signal into segments,
2. estimate the PSD for each segment, and
3. average over these local estimates.
The averaging reduces the variance of the estimated PSD. While Barlett’s method uses non-overlapping segments,
Welch’s is a generalization using windowed overlapping segments. As before we assume a weakly ergodic real-
valued random process for the discussion of Welch’s method.
4.3.1 Derivation
Let’s assume that we split the random signal 𝑥[𝑘] into 𝐿 overlapping segments 𝑥𝑙 [𝑘] of length 𝑁 with 0 ≤ 𝑙 ≤
𝐿 − 1, starting at multiples of the stepsize 𝑀 ∈ 1, 2, . . . , 𝑁 . These segments are then windowed by the window
𝑤[𝑘] of length 𝑁 , resulting in a windowed 𝑙-th segment. The discrete time Fourier transformation (DTFT) 𝑋𝑙 (e j Ω )
of the windowed 𝑙-th segment is thus given as
𝑁
∑︁−1
𝑋𝑙 (e j Ω ) = 𝑥[𝑘 + 𝑙 · 𝑀 ] 𝑤[𝑘] e −j Ω 𝑘
𝑘=0
−1
𝑁∑︀
1
where the window 𝑤[𝑘] defined within 0 ≤ 𝑘 ≤ 𝑁 − 1 should be normalized as 𝑁 |𝑤[𝑘]|2 = 1. The stepsize
𝑘=0
𝑀 determines the overlap between the segments. In general 𝑁 − 𝑀 number of samples overlap between adjacent
segments, for 𝑀 = 𝑁 no overlap occurs. The overlap is sometimes given as ratio 𝑁 −𝑀
𝑁 · 100%.
Introducing 𝑋𝑙 (e j Ω ) into the definition of the periodogram yields the periodogram of the 𝑙-th segment
1
Φ̂𝑥𝑥,𝑙 (e j Ω ) = |𝑋𝑙 (e j Ω )|2
𝑁
The estimated PSD is then given by averaging over the segment’s periodograms Φ̂𝑥𝑥,𝑙 (e j Ω )
𝐿−1
1 ∑︁
Φ̂𝑥𝑥 (e j Ω ) = Φ̂𝑥𝑥,𝑙 (e j Ω )
𝐿
𝑙=0
Note, that the total number 𝐿 of segments has to be chosen such that the last required sample (𝐿 − 1) · 𝑀 + 𝑁 − 1
does not exceed the total length of the random signal. Otherwise the last segment 𝑥𝐿−1 [𝑘] may also be zeropadded
towards length 𝑁 .
The Bartlett method uses a rectangular window and non-overlapping segments. The Welch method uses overlap-
ping segments and a window that must be chosen according to the intended spectral analysis task.
4.3.2 Example
The following example is equivalent to the periodogram example. We aim at estimating the PSD of a random
process which draws samples from normal distributed white noise with zero-mean and unit variance. The true
PSD is given as Φ𝑥𝑥 (e j Ω ) = 1.
In [21]: %matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import scipy.signal as sig
# plot results
plt.figure(figsize=(10,4))
plt.stem(Om, Pxx, 'b', label=r'$|\hat{\Phi}_{xx}(e^{j \Omega})|$')
plt.plot(Om, np.ones_like(Pxx), 'r', label=r'$\Phi_{xx}(e^{j \Omega})$')
plt.title('Estimated and true PSD')
plt.xlabel(r'$\Omega$')
plt.axis([0, np.pi, 0, 2])
plt.legend()
Exercise
• Compare the results to the periodogram example. Is the variance of the estimator lower?
• Change the number of segments L and check if the variance reduces further
• Change the segment length N and stepsize M. What changes?
4.3.3 Evaluation
It is shown in [Stoica et al.] that Welch’s method is asymptotically unbiased. Under the assumption of a weakly
stationary random process, the periodograms Φ̂𝑥𝑥,𝑙 (𝑒𝑗Ω ) of the segments can be assumed to be approximately
uncorrelated. Hence, averaging over these reduces the variance of the estimator. It can be shown formally that in
the limiting case of an infinitely number of segments (infintely long signal) the variance tends to zero. As a result
Welch’s method is an asymptotically consistent estimator of the PSD.
For a finite segment length 𝑁 , the properties of the estimated PSD Φ̂𝑥𝑥 (𝑒𝑗Ω ) depend on the length 𝑁 of the
segments and the window function 𝑤[𝑘] due to the leakage effect.
4.4.1 Motivation
Non-parametric methods for the estimation of the power spectral density (PSD), like the periodogram or Welch’s
method, don’t rely on a-priori information about the process generating the random signal. Often some a-priori
information is available that can be used to formulate a parametric model of the random process. The goal is then
to estimate these parameters in order to characterize the random signal. Such techniques are known as ‘parametric
methods <https://en.wikipedia.org/wiki/Spectral_density_estimation#Parametric_estimation>‘__ or model-based
methods. The incorporation of a-priori knowledge can improve the estimation of the PSD significantly, as long as
the underlying model is a valid description of the random process. The parametric model of the random process
can also be used to generate random signals with a desired PSD.
For the remainder we assume weakly stationary real-valued random processes. For many applications the process
can be modeled by a linear-time invariant (LTI) system
where 𝑛[𝑘] is white noise and 𝐻(e j Ω ) denotes the transfer function of the system. In general, the random signal
𝑥[𝑘] will be correlated as a result of the processing of the uncorrelated input signal 𝑛[𝑘] by the system 𝐻(e j Ω ).
Due to the white noise assumption Φ𝑛𝑛 (e j Ω ) = 𝑁0 , the PSD of the random process is given as
Parametric methods model the system 𝐻(e j Ω ) by a limited number of parameters. These parameters are then
ˆ j Ω ) of the transfer function. This estimate is then used to calculate
estimated from 𝑥[𝑘], providing an estimate 𝐻(e
jΩ
the desired estimate Φ̂𝑥𝑥 (e ) of the PSD.
Autoregressive model
The autoregressive (AR) model assumes a recursive system with a direct path. Its output relation is given as
𝑁
∑︁
𝑥[𝑘] = 𝑎𝑛 · 𝑥[𝑘 − 𝑛] + 𝑛[𝑘]
𝑛=1
where 𝑎𝑛 denote the coefficients of the recursive path and 𝑁 the order of the model. Its system function 𝐻(𝑧) is
derived by 𝑧-transformation of the output relation
1
𝐻(𝑧) = ∑︀𝑁
1− 𝑛=1 𝑎𝑛 𝑧 −𝑛
The moving average (MA) model assumes a non-recursive system. The output relation is given as
𝑀
∑︁−1
𝑥[𝑘] = 𝑏𝑚 · 𝑛[𝑘 − 𝑚] = ℎ[𝑘] * 𝑛[𝑘]
𝑚=0
with the impulse response of the system ℎ[𝑘] = [𝑏0 , 𝑏1 , . . . , 𝑏𝑀 −1 ]. The MA model is a finite impulse response
(FIR) model of the random process. Its system function is given as
𝑀
∑︁−1
𝐻(𝑧) = 𝒵{ℎ[𝑘]} = 𝑏𝑚 𝑧 −𝑚
𝑚=0
The autoregressive moving average (ARMA) model is a combination of the AR and MA model. It constitutes a
general linear process model. Its output relation is given as
𝑁
∑︁ 𝑀
∑︁−1
𝑥[𝑘] = 𝑎𝑛 · 𝑥[𝑘 − 𝑛] + 𝑏𝑚 · 𝑛[𝑘 − 𝑚]
𝑛=1 𝑚=0
The models above describe the synthesis of the samples 𝑥[𝑘] from the white noise 𝑛[𝑘]. For spectral estimation
only the random signal 𝑥[𝑘] is known and we are aiming at estimating the parameters of the model. This can be
achieved by determining an analyzing system 𝐺(e j Ω ) such to decorrelate the signal 𝑥[𝑘]
where 𝑒[𝑘] should be white noise. Due to its desired operation, the filter 𝐺(e j Ω ) is also denoted as whitening filter.
The optimal filter 𝐺(e j Ω ) is given by the inverse system 𝐻(e1j Ω ) . However, 𝐻(e j Ω ) is in general not known. But
this nevertheless implies that our linear process model of 𝐻(e j Ω ) also applies to 𝐺(e j Ω ). Various techniques
have been developed to estimate the parameters of the filter 𝐺(e j Ω ) such that 𝑒[𝑘] becomes decorrelated. For
instance, by expressing the auto-correlation function (ACF) 𝜙𝑥𝑥 [𝜅] in terms of the model parameters and solving
with respect to these. The underlying set of equations are known as Yule-Walker equations.
ˆ j Ω ) of the analysis
Once the model parameters have been estimated, these can be used to calculate an estimate 𝐺(e
system. The desired estimate of the PSD is then given as
Φ𝑒𝑒 (e j Ω )
Φ̂𝑥𝑥 (e j Ω ) =
ˆ j Ω )|2
|𝐺(e
4.4.4 Example
In the follwing example 𝑛[𝑘] is drawn from normal distributed white noise with 𝑁0 = 1. The Yule-Walker
equations are used to estimate the parameters of an AR model of 𝐻(e j Ω ). The implementation provided
by statsmodels.api.regression.yule_walker returns the estimated AR coefficients of the system
𝐻(e j Ω ). These parameters are then used to numerically evaluate the estimated transfer function, resulting in
ˆ j Ω )|2 .
Φ̂𝑥𝑥 (e j Ω ) = 1 · |𝐻(e
In [1]: %matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm
import scipy.signal as sig
# plot PSDs
plt.figure(figsize=(10,5))
plt.plot(Om, np.abs(H)**2, label=r'$\Phi_{xx}(e^{j\Omega})$')
plt.plot(Om2*2*np.pi, .5*np.abs(Pxx), 'k-', alpha=.5 , label=r'$\hat{\Phi}_{xx}(
plt.plot(Om, np.abs(He)**2, label=r'$\hat{\Phi}_{xx}(e^{j\Omega})$ (parametric)'
plt.xlabel(r'$\Omega$')
plt.axis([0, np.pi, 0, 25])
plt.legend();
Exercise
• Change the order N of the AR model used for estimation by the Yule-Walker equations. What happens if
the order is smaller or higher than the order of the true system? Why?
• Change the number of samples K. Is the estimator consistent?
Quantization
5.1 Introduction
Digital signal processors and general purpose processors can only perform arithmetic operations within a limited
number range. So far we considered discrete signals with continuous amplitude values. These cannot be handled
by processors in a straightforward manner. Quantization is the process of mapping a continuous amplitude to a
countable set of amplitude values. This refers also to the requantization of a signal from a large set of countable
amplitude values to a smaller set. Scalar quantization is an instantaneous and memoryless operation. It can be
applied to the continuous amplitude signal, also referred to as analog signal or to the (time-)discrete signal. The
quantized discrete signal is termed as digital signal. The connections between the different domains is illustrated
in the following for time dependent signals.
99
Digital Signal Processing, Release 0.0
In order to quantify the effects of quantizing a continuous amplitude signal, a model of the quantization process
is formulated. We restrict our considerations to a discrete real-valued signal 𝑥[𝑘]. In order to map the continuous
amplitude to a quantized representation the following model is used
𝑥𝑄 [𝑘] = 𝑔( ⌊ 𝑓 (𝑥[𝑘]) ⌋ )
where 𝑔(·) and 𝑓 (·) denote real-valued mapping functions, and ⌊·⌋ a rounding operation. The quantization process
can be split into two stages
1. Forward quantization The mapping 𝑓 (𝑥[𝑘]) maps the signal 𝑥[𝑘] such that it is suitable for the rounding
operation. This may be a scaling of the signal or a non-linear mapping. The result of the rounding operation
is an integer number ⌊ 𝑓 (𝑥[𝑘]) ⌋ ∈ Z, which is termed as quantization index.
2. Inverse quantization The mapping 𝑔(·), maps the quantization index to the quantized value 𝑥𝑄 [𝑘] such
that it is an approximation of 𝑥[𝑘]. This may be a scaling or a non-linear operation.
The quantization error/quantization noise 𝑒[𝑘] is defined as
Rearranging yields that the quantization process can be modeled by adding the quantization noise to the discrete
signal
Example
In order to illustrate the introduced model, the quantization of one period of a sine signal is considered
𝑥[𝑘] = sin[Ω0 𝑘]
1
using 𝑓 (𝑥[𝑘]) = 3 · 𝑥[𝑘] and 𝑔(𝑖) = 3 · 𝑖. The rounding is realized by the nearest integer function. The quantized
signal is then given as
1
𝑥𝑄 [𝑘] = · ⌊ 3 · sin[Ω0 𝑘] ⌋
3
For ease of illustration the signals are not shown by stem plots.
In [1]: %matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
# generate signal
x = np.sin(2*np.pi/N * np.arange(N))
# quantize signal
xi = np.round(3 * x)
xQ = 1/3 * xi
e = xQ - x
ax2.set_ylim([-3.6, 3.6])
ax2.set_ylabel('quantization index')
ax2.grid()
Exercise
• Investigate the quantization noise 𝑒[𝑘]. Is its amplitude bounded?
• If you would represent the quantization index (shown on the right side) by a binary number, how much bits
would you need?
• Try out other rounding operations like np.floor() and np.ceil() instead of np.round(). What
changes?
5.1.2 Properties
Without knowledge of the quantization error 𝑒[𝑘], the signal 𝑥[𝑘] cannot be reconstructed exactly knowing only
its quantization index or quantized representation 𝑥𝑄 [𝑘]. The quantization error 𝑒[𝑘] itself depends on the signal
𝑥[𝑘]. Therefore, quantization is in general an irreversible process. The mapping from 𝑥[𝑘] to 𝑥𝑄 [𝑘] is furthermore
non-linear, since the superposition principle does not hold in general. Summarizing, quantization is an inherently
irreversible and non-linear process.
5.1.3 Applications
The characteristics of a quantizer depend on the mapping functions 𝑓 (·), 𝑔(·) and the rounding operation ⌊·⌋
introduced in the previous section. A linear quantizer bases on linear mapping functions 𝑓 (·) and 𝑔(·). A uniform
quantizer splits the mapped input signal into quantization steps of equal size. Quantizers can be described by their
nonlinear in-/output characteristic 𝑥𝑄 [𝑘] = 𝒬{𝑥[𝑘]}, where 𝒬{·} denotes the quantization process. For linear
uniform quantization it is common to differentiate between two characteristic curves, the so called mid-tread and
mid-rise.
where 𝑄 denotes the quantization step size and ⌊·⌋ the floor function which maps a real number to the largest
integer not greater than its argument. Without restricting 𝑥[𝑘] in amplitude, the resulting quantization indexes
are countable infinite. For a finite number of quantization indexes, the input signal has to be restricted to a
minimal/maximal amplitude 𝑥min < 𝑥[𝑘] < 𝑥max before quantization. The resulting quantization characteristic of
a linear uniform mid-tread quantizer is shown in the following
𝑄
The term mid-tread is due to the fact that small values |𝑥[𝑘]| < 2 are mapped to zero.
Example
The quantization of one period of a sine signal 𝑥[𝑘] = 𝐴 · sin[Ω0 𝑘] by a mid-tread quantizer is simulated. 𝐴
denotes the amplitude of the signal, 𝑥min = −1 and 𝑥max = 1 are the smallest and largest output values of the
quantizer, respectively.
In [1]: %matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
return xQ
# generate signal
x = A * np.sin(2*np.pi/N * np.arange(N))
# quantize signal
xQ = uniform_midtread_quantizer(x, Q)
# plot signals
plot_signals(x, xQ)
Exercise
• Change the quantization stepsize Q and the amplitude A of the signal. Which effect does this have on the
quantization error?
where ⌊·⌋ denotes the floor function. The quantization characteristic of a linear uniform mid-rise quantizer is
illustrated in the following
The term mid-rise copes for the fact that 𝑥[𝑘] = 0 is not mapped to zero. Small positive/negative values around
zero are rather mapped to ± 𝑄
2.
Example
The example from above is now evaluated for the mid-rise characteristic
In [2]: A = 1.2 # amplitude of signal
Q = 1/10 # quantization stepsize
N = 2000 # number of samples
return xQ
# generate signal
x = A * np.sin(2*np.pi/N * np.arange(N))
# quantize signal
xQ = uniform_midrise_quantizer(x, Q)
# plot signals
plot_signals(x, xQ)
Exercise
• What are the differences between the mid-tread and the mid-rise characteristic curves for the given example?
The quantization results in two different types of distortions, as illustrated in the preceding notebook. Overload
distortions are a consequence of exceeding the maximum amplitude of the quantizer. Granular distortions are
a consequence of the quantization process when no clipping occurs. Various measures are used to quantify the
distortions of a given quantizer. We limit ourselves to the Signal-to-Noise ratio.
A quantizer can be evaluated by its signal-to-noise ratio (SNR), which is defined as the power of the unquantized
signal 𝑥[𝑘] divided by the power of the quantization error 𝑒[𝑘]. Under the assumption that both signals are drawn
from a zero-mean weakly stationary process, the average SNR is given as
(︂ 2 )︂
𝜎𝑥
𝑆𝑁 𝑅 = 10 · log10 in dB
𝜎𝑒2
where 𝜎𝑥2 and 𝜎𝑒2 denote the variances of the signals 𝑥[𝑘] and 𝑒[𝑘]. The statistical properties of the signal 𝑥[𝑘] and
the quantization error 𝑒[𝑘] are required in order to evaluate the SNR of a quantizer.
The statistical properties of the quantization error 𝑒[𝑘] have been derived for instance in [Zölzer]. We only summa-
rize the results here. We focus on the non-clipping case first, hence on granular distortions. Here the quantization
error is in general bounded |𝑒[𝑘]| < 𝑄 2.
Under the assumption that the average magnitude of the input signal is much larger that the quantization step size
𝑄, the quantization error 𝑒[𝑘] can be approximated by the following statistical model and assumptions
1. The quantization error 𝑒[𝑘] is not correlated with the input signal 𝑥[𝑘]
Φ𝑒𝑒 (e j Ω ) = 𝜎𝑒2
3. The probability density function (PDF) of the quantization error is given by the zero-mean uniform distri-
bution
(︂ )︂
1 𝜃
𝑝𝑒 (𝜃) = · rect
𝑄 𝑄
The variance of the quantization error is then calculated from its PDF as
𝑄2
𝜎𝑒2 =
12
Let’s assume that the quantization index is represented as binary or fixed-point number with 𝑤-bits. The common
notation for the mid-tread quantizer is that 𝑥min can be represented exactly. The quantization step is then given as
𝑥max |𝑥min |
𝑄= = 𝑤−1
2𝑤−1 − 1 2
Using this, the variance of the quantization error can be related to the word length 𝑤
𝑥2max
𝜎𝑒2 =
3 · 22𝑤
From this result it can be concluded that the power of the quantization error decays by 6 dB per additional bit.
This holds only for the assumptions stated above.
In order to calculate the average SNR of a linear uniform quantizer, a model for the input signal 𝑥[𝑘] is required.
Let’s assume that the signal is modeled by a zero-mean uniform distribution
(︂ )︂
1 𝜃
𝑝𝑥 (𝜃) = rect
2𝑥max 2𝑥max
Hence, all amplitudes between −𝑥max and 𝑥max occur with the same probability. The variance of the signal is then
calculated to
4𝑥2max
𝜎𝑥2 =
12
Introducing 𝜎𝑥2 and 𝜎𝑒2 into the definition of the SNR yields
This is often referred to as the 6 dB/bit rule of thumb for quantization. Note that in the derivation above it has
been assumed that the signal 𝑥[𝑘] uses the full amplitude range of the quantizer. If this is not the case, the SNR
will be lower since 𝜎𝑥2 is lower.
Example
In this example the linear uniform quantization of a random signal drawn from a uniform distribution is evaluated.
The amplitude range of the quantizer is 𝑥min = −1 and 𝑥max = 1 − 𝑄.
In [1]: %matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import scipy.signal as sig
return xQ
plt.subplot(121)
plt.bar(bins[:-1]/Q, pe*Q, width = 2/len(pe))
plt.title('Estimated histogram of quantization error')
plt.xlabel(r'$\theta / Q$')
plt.ylabel(r'$\hat{p}_x(\theta) / Q$')
plt.axis([-1, 1, 0, 1.2])
plt.subplot(122)
plt.plot(nf*2*np.pi, Pee*6/Q**2)
plt.title('Estimated PSD of quantization error')
plt.xlabel(r'$\Omega$')
plt.ylabel(r'$\hat{\Phi}_{ee}(e^{j \Omega}) / \sigma_e^2$')
plt.axis([0, np.pi, 0, 2]);
plt.figure(figsize=(10,6))
ccf = ccf[N-K-1:N+K-1]
kappa = np.arange(-len(ccf)//2,len(ccf)//2)
plt.stem(kappa, ccf)
plt.title('Cross-correlation function between input signal and error')
plt.xlabel(r'$\kappa$')
plt.ylabel(r'$\varphi_{xe}[\kappa]$')
# quantization step
Q = 1/(2**(w-1))
# compute input signal
Exercise
• Change the number of bits w and check if the derived SNR holds
• Change the amplitude A of the input signal. What happens if you make the amplitude very small? Why?
For a harmonic input signal 𝑥[𝑘] = 𝑥max ·cos[Ω0 𝑘] the variance 𝜎𝑥2 is given by its squared root mean square (RMS)
value
𝑥2max
𝜎𝑥2 =
2
Introducing this into the definition of the SNR together with the variance 𝜎𝑒2 of the quantization error yields
(︂ )︂
2𝑤 3
𝑆𝑁 𝑅 = 10 · log10 2 · ≈ 6.02 𝑤 + 1.76 in dB
2
The gain of 1.76 dB with respect to the case of a uniformly distributed input signal is due to the fact that the
amplitude distribution of a harmonic signal is not uniform
1
𝑝𝑥 (𝜃) = √︁
𝜃 2
𝜋 1 − ( 𝑥max )
for |𝜃| < 𝑥max . High amplitudes are more likely to occur. The relative power of the quantization error is lower for
higher amplitudes which results in an increase of the average SNR.
So far, we did not consider clipping of the input signal 𝑥[𝑘], e.g. by ensuring that its minimum/maximum values
do not exceed the limits of the quantizer. However, this cannot always be ensured for practical signals. Moreover,
many practical signals cannot be modeled as a uniform distribution. For instance a normally distributed random
signal exceeds a given maximum value with non-zero probability. Hence, clipping will occur for such an input
signal. Clipping results in overload distortions whose amplitude can be much higher that 𝑄 2 . For the overall
average SNR both granular and overload distortions have to be included.
For a normally distributed signal with a given probability that clipping occurs Pr{|𝑥[𝑘]| > 𝑥max } = 10−5 the
SNR can be calculated to [Zölzer]
𝑆𝑁 𝑅 ≈ 6.02 𝑤 − 8.5 in dB
The reduction of the SNR by 8.5 dB results from the fact that small signal values are more likely to occur for a
normally distributed signal. The relative quantization error for small signals is higher, which results in a lower
average SNR. Overload distortions due to clipping result in a further reduction of the average SNR.
The Laplace distribution is a commonly applied model for speech and music signals. As for the normal distribu-
tion, clipping will occur with a non-zero probability. For a Laplace distributed signal with a given probability that
clipping occurs Pr{|𝑥[𝑘]| > 𝑥max } = 10−4 the SNR can be calculated to [Vary et al.]
𝑆𝑁 𝑅 ≈ 6.02 𝑤 − 9 in dB
Even though the probability of clipping is higher as for the normally distributed signal above, the SNR is in the
same range. The reason for this is, that the Laplace distribution features low signal values with a higher and large
values with a lower probability in comparison to the normal distribution.
Example
The following example evaluates the SNR of a linear uniform quantizer with 𝑤 = 8 for a Laplace distributed
signal 𝑥[𝑘]. The SNR is computed for various probabilities that clipping occurs.
In [2]: w = 8 # wordlength of the quantized signal
Pc = np.logspace(-20, np.log10(.5), num=500) # probabilities for clipping
N = int(1e6) # number of samples
def compute_SNR(Pc):
# compute input signal
sigma_x = - np.sqrt(2) / np.log(Pc)
x = np.random.laplace(size=N, scale=sigma_x/np.sqrt(2) )
# quantize signal
xQ = uniform_midtread_quantizer(x, Q)
e = xQ - x
# compute SNR
SNR = 10*np.log10((np.var(x)/np.var(e)))
return SNR
# quantization step
Q = 1/(2**(w-1))
# compute SNR for given probabilities
SNR = [compute_SNR(P) for P in Pc]
# plot results
plt.figure(figsize=(8,4))
plt.semilogx(Pc, SNR)
plt.xlabel('Probability for clipping')
plt.ylabel('SNR in dB')
plt.grid()
Exercise
• Can you explain the specific shape of the curve? What effect dominates for clipping probabilities be-
low/obove the maximum?
The following example illustrates the requantization of a speech signal. The signal was originally recorded with a
wordlength of 𝑤 = 16 bits. It is requantized with a uniform mid-tread quantizer to various wordlengths. The SNR
is computed and a portion of the (quantized) signal is plotted. It is further possible to listen to the requantized
signal and the quantization error.
In [1]: %matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import soundfile as sf
return xQ
In [2]: xQ = uniform_midtread_quantizer(x, 8)
e = evaluate_requantization(x, xQ)
sf.write('speech_8bit.wav', xQ, fs)
sf.write('speech_8bit_error.wav', e, fs)
SNR: 34.021487 dB
Requantized Signal Your browser does not support the audio element. speech_8bit.wav
Quantization Error Your browser does not support the audio element. speech_8bit_error.wav
In [3]: xQ = uniform_midtread_quantizer(x, 6)
e = evaluate_requantization(x, xQ)
sf.write('speech_6bit.wav', xQ, fs)
sf.write('speech_6bit_error.wav', e, fs)
SNR: 22.889593 dB
Requantized Signal Your browser does not support the audio element. speech_6bit.wav
Quantization Error Your browser does not support the audio element. speech_6bit_error.wav
In [4]: xQ = uniform_midtread_quantizer(x, 4)
e = evaluate_requantization(x, xQ)
sf.write('speech_4bit.wav', xQ, fs)
sf.write('speech_4bit_error.wav', e, fs)
SNR: 11.713678 dB
Requantized Signal Your browser does not support the audio element. speech_4bit.wav
Quantization Error Your browser does not support the audio element. speech_4bit_error.wav
In [5]: xQ = uniform_midtread_quantizer(x, 2)
e = evaluate_requantization(x, xQ)
sf.write('speech_2bit.wav', xQ, fs)
sf.write('speech_2bit_error.wav', e, fs)
SNR: 2.428364 dB
Requantized Signal Your browser does not support the audio element. speech_2bit.wav
Quantization Error Your browser does not support the audio element. speech_2bit_error.wav
The quantized signal as the output of a quantizer can be expressed with the quantization error 𝑒[𝑘] as
According to the introduced model, the quantization noise can be modeled as uniformly distributed white noise.
Hence, the noise is distributed over the entire frequency range. The basic concept of noise shaping is a feedback
of the quantization error to the input of the quantizer. This way the spectral characteristics of the quantization
noise can be changed, i.e. spectrally shaped. Introducing a generic filter ℎ[𝑘] into the feedback loop yields the
following structure
The quantized signal can be deduced from the block diagram above as
where the additive noise model from above has been introduced and it has been assumed that the impulse response
ℎ[𝑘] is normalized such that the magnitude of 𝑒[𝑘]*ℎ[𝑘] is below the quantization step 𝑄. The overall quantization
error is then
The power spectral density (PSD) of the quantization with noise shaping is calculated to
⃒2
Φ𝑒 𝑒 (e j Ω ) = Φ𝑒𝑒 (e j Ω ) · ⃒1 − 𝐻(e j Ω )⃒
⃒
𝐻 𝐻
Hence the PSD Φ𝑒𝑒 (e j Ω ) of the quantizer without noise shaping is weighted by |1 − 𝐻(e j Ω )|2 . Noise shaping
allows a spectral modification of the quantization error. The desired shaping depends on the application scenario.
For some applications, high-frequency noise is less disturbing as low-frequency noise.
5.5.1 Example
If the feedback of the error signal is delayed by one sample we get with ℎ[𝑘] = 𝛿[𝑘 − 1]
⃒2
Φ𝑒𝐻 𝑒𝐻 (e j Ω ) = Φ𝑒𝑒 (e j Ω ) · ⃒1 − e −j Ω ⃒
⃒
For linear uniform quantization Φ𝑒𝑒 (e j Ω ) = 𝜎𝑒2 is constant. Hence, the spectral shaping constitutes a high-pass
characteristic of first order. The following simulation evaluates a noise shaping quantizer of first order.
In [1]: %matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import scipy.signal as sig
return xQ[1:]
# quantization step
Q = 1/(2**(w-1))
# compute input signal
x = np.random.uniform(size=N, low=-A, high=(A-Q))
# quantize signal
xQ = uniform_midtread_quantizer_w_ns(x, Q)
e = xQ - x[1:]
# estimate PSD of error signal
nf, Pee = sig.welch(e, nperseg=64)
# estimate SNR
SNR = 10*np.log10((np.var(x)/np.var(e)))
print('SNR = %f in dB' %SNR)
plt.figure(figsize=(10,5))
Om = nf*2*np.pi
plt.plot(Om, Pee*6/Q**2, label='simulated')
plt.plot(Om, np.abs(1 - np.exp(-1j*Om))**2, label='theory')
plt.plot(Om, np.ones(Om.shape), label='w/o noise shaping')
plt.title('Estimated PSD of quantization error')
plt.xlabel(r'$\Omega$')
plt.ylabel(r'$\hat{\Phi}_{e_H e_H}(e^{j \Omega}) / \sigma_e^2$')
plt.axis([0, np.pi, 0, 4.5]);
plt.legend(loc='upper left')
plt.grid()
SNR = 45.128560 in dB
Exercise
• The overall average SNR is lower than for the quantizer without noise shaping. Why?
5.6 Oversampling
Oversampling is a technique which is applied in analog-to-digital converters to lower the average power of the
quantization error. It requires a joint consideration of sampling and quantization.
Let’s consider the ideal sampling of a signal followed by its quantization, as given by the following block diagram
Ideal sampling is modeled by multiplying the continuous signal 𝑥(𝑡) with a series of equidistant Dirac functions,
resulting in the discrete signal 𝑥[𝑘] = 𝑥(𝑘𝑇 ) where 𝑇 denotes the sampling interval. The discrete signal 𝑥[𝑘] is
then quantized. The output of the ideal analog-to-digital converter is the quantized discrete signal 𝑥Q [𝑘].
Sampling of the continuous signal 𝑥(𝑡) leads to repetitions of the spectrum 𝑋(𝑗𝜔) = ℱ{𝑥(𝑡)} at multiples of
𝜔S = 2𝜋
𝑇 . We limit ourselves to a continuous real-valued 𝑥(𝑡) ∈ R, band-limted signal |𝑋(𝑗𝜔)| = 0 for |𝜔| > 𝜔C
where 𝜔C denotes its cut-off frequency. The spectral repetitions due to sampling do not overlap if the sampling
theorem 𝜔S ≥ 2 · 𝜔C is fulfilled. In the case of Nyquist (critical) sampling, the sampling frequency is chosen as
𝜔S = 2 · 𝜔C .
5.6.3 Oversampling
The basic idea of oversampling is to sample the input signal at frequencies which are significantly higher than
the Nyquist criterion dictates. After quantization, the signal is low-pass filtered by a discrete filter 𝐻LP (e j Ω ) and
resampled back to the Nyquist rate. In order to avoid aliasing due to the resampling this filter has to be chosen as
an ideal low-pass
(︂ )︂
jΩ Ω
𝐻LP (e ) = rect
2 ΩC
where ΩC = 𝜔C · 𝑇 . For an oversampling of factor 𝐿 ∈ Z we have 𝜔S = 𝐿 · 2𝜔C . For this case, the resampling
can be realized by keeping only every 𝐿-th sample which is known as decimation. The following block diagram
illustrates the building blocks of oversampled digital-to-analog conversion, ↓ 𝐿 denotes decimation by a factor of
𝐿
In order the conclude on the benefits of oversampling we have to derive the average power of the overall quan-
tization error. According to our model of the quantization error, the quantization error 𝑒[𝑘] can be modeled as
uniformly distributed white noise. Its power spectral density (PSD) is given as
𝑄2
Φ𝑒𝑒 (e j Ω ) =
12
where 𝑄 denotes the quantization step. Before the discrete low-pass filter 𝐻LP (e j Ω ), the power of the quantization
error is uniformly distributed over the entire frequency range −𝜋 < Ω ≤ 𝜋. However, after the ideal low-pass
𝜋 𝜋
filter the frequency range is limited to − 𝐿 <Ω≤ 𝐿 . The average power of the quantization error is then given as
𝜋
∫︁𝐿
2 1 1 𝑄2
𝜎𝑒,LP = Φ𝑒𝑒 (e j Ω ) dΩ = ·
2𝜋 𝐿 12
𝜋
−𝐿
The average power 𝜎𝑥2 of the sampled signal 𝑥[𝑘] is not affected, since the cutoff frequency of the low-pass filter
has been chosen as the upper frequency limit 𝜔C of the input signal 𝑥(𝑡).
In order to calculate the SNR of the oversampled analog-to-digital converter we assume that the input signal is
drawn from a uniformly distributed zero-mean random process with |𝑥[𝑘]| < 𝑥max . With the results from our
2
discussion of linear uniform quantization and 𝜎𝑒,LP from above we get
where 𝑤 denotes the number of bits used for a binary representation of the quantization index. Hence, oversam-
pling by a factor of 𝐿 brings a plus of 10 · log10 (𝐿) dB in terms of SNR. For instance, an oversampling by a factor
of 𝐿 = 4 results in a SNR which is approximately 6 dB higher. For equal SNR the quantization step 𝑄 can be
chosen larger. In terms the of wordlength of a quantizer this accounts to a reduction by one bit. Consequently,
there is a trade-off between accuracy of the quantizer and its sampling frequency.
5.6.4 Example
The following numerical simulation illustrates the benefit in terms of SNR for an oversampled linear uniform
quantizer with 𝑤 = 16 for the quantization of the harmonic signal 𝑥[𝑘] = cos[Ω0 𝑘].
In [1]: %matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import scipy.signal as sig
return xQ
def SNR_oversampled_ADC(L):
x = (1-Q)*np.cos(Om0*np.arange(N))
xu = (1-Q)*np.cos(Om0*np.arange(N*L)/L)
# quantize signal
xQu = uniform_midtread_quantizer(xu, Q)
# low-pass filtering and decimation
xQ = sig.resample(xQu, N)
# estimate SNR
e = xQ - x
return 10*np.log10((np.var(x)/np.var(e)))
# plot result
plt.figure(figsize=(10, 4))
plt.semilogx(L, SNR, label='with oversampling')
plt.plot(L, (6.02*w+1.76)*np.ones(L.shape), label='without oversampling' )
plt.xlabel(r'oversampling factor $L$')
plt.ylabel(r'SNR in dB')
plt.legend(loc='upper left')
plt.grid()
Exercise
• What SNR can be achieved for an oversampling factor of 𝐿 = 16?
• By how many bits could the word length 𝑤 be reduced in order to gain the same SNR as without oversam-
pling?
Besides an increased SNR, oversampling has also another benefit. In order to ensure that the input signal 𝑥(𝑡) is
band-limited before sampling, a low-pass filter 𝐻LP (j 𝜔) is applied in typical analog-to-digital converters. This is
illustrated in the following
(︁ )︁
The filter 𝐻LP (j 𝜔) is also known as anti-aliasing filter. The ideal low-pass filter is given as 𝐻LP (j 𝜔) = rect 𝜔𝜔S .
The ideal 𝐻LP (j 𝜔) can only be approximated in the analog domain. Since the sampling rate is higher than the
Nyquist rate, there is no need for a steep slope of the filter in order to avoid aliasing. However, the pass-band of
the filter within |𝜔| < |𝜔C | has to be flat.
Before decimation, the discrete filter 𝐻LP (e j Ω ) has to remove the spectral contributions that may lead to aliasing.
However, a discrete filter 𝐻LP (e j Ω ) with steep slope can be realized much easier than in the analog domain.
Speech signals have a non-uniform amplitude distribution which is often modeled by the Laplace distribution.
Linear uniform quantization is not optimal for speech signals, since small signal amplitudes are more likely than
higher ones. This motivates a non-linear quantization scheme, where the signal is companded before linear quan-
tization and expanded afterwards.
The following example illustrates the A-law companding used in European telephone networks. The signal was
originally recorded with a wordlength of 𝑤 = 16 bits using linear uniform quantization. First the A-law compan-
sion is applied, then quantization by a linear uniform quantizer with a wordlength of 𝑤 = 8 bits. For a sampling
rate of 𝑓𝑠 = 8 kHz this results in a bit-rate of 64 kbits/s used in the backbone of many telephone networks.
In [1]: %matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import soundfile as sf
def A_law_compander(x):
A = 87.6
y = np.zeros_like(x)
idx = np.where(np.abs(x) < 1/A)
y[idx] = A*np.abs(x[idx]) / (1 + np.log(A))
idx = np.where(np.abs(x) >= 1/A)
y[idx] = (1 + np.log(A*np.abs(x[idx]))) / (1 + np.log(A))
return np.sign(x)*y
def A_law_expander(y):
A = 87.6
x = np.zeros_like(y)
idx = np.where(np.abs(y) < 1/(1+np.log(A)))
x[idx] = np.abs(y[idx])*(1+np.log(A)) / A
idx = np.where(np.abs(y) >= 1/(1+np.log(A)))
x[idx] = np.exp(np.abs(y[idx])*(1+np.log(A))-1)/A
return np.sign(y)*x
return xQ
Lets first take a look at the non-linear characteristic of the A-law requantizer. The left plot shows the characteristic
of the A-law companding and linear-quantization. The right plot shows the overall characteristic for companding,
linear quantization and expansion.
In [2]: x = np.linspace(-1, 1, 2**16)
y = A_law_compander(x)
yQ4 = uniform_midtread_quantizer(y, 4)
yQ8 = uniform_midtread_quantizer(y, 8)
xQ4 = A_law_expander(yQ4)
xQ8 = A_law_expander(yQ8)
plt.figure(figsize=(10, 4))
plt.subplot(121)
plt.plot(x, yQ4, label=r'$w=4$ bit')
plt.plot(x, yQ8, label=r'$w=8$ bit')
plt.title('Compansion and linear quantization')
plt.xlabel(r'$x$')
plt.ylabel(r'$x_Q$')
plt.legend(loc=2)
plt.axis([-1.1, 1.1, -1.1, 1.1])
plt.grid()
plt.subplot(122)
plt.plot(x, xQ4, label=r'$w=4$ bit')
plt.plot(x, xQ8, label=r'$w=8$ bit')
plt.title('Overall')
plt.xlabel(r'$x$')
plt.ylabel(r'$x_Q$')
plt.legend(loc=2)
plt.axis([-1.1, 1.1, -1.1, 1.1])
plt.grid()
Now the signal-to-noise ratio (SNR) is computed for a Laplace distributed signal for various probabilities
Pr{|𝑥[𝑘]| > 𝑥max } that clipping occurs. The results show that the non-linear quantization scheme provides a
constant SNR over a wide range of signal amplitudes. The SNR is additional higher as for linear quantization of
a Laplace distributed signal.
In [3]: w = 8 # wordlength of the quantized signal
Pc = np.logspace(-20, np.log10(.5), num=500) # probabilities for clipping
N = int(1e6) # number of samples
def compute_SNR(Pc):
# compute input signal
sigma_x = - np.sqrt(2) / np.log(Pc)
x = np.random.laplace(size=N, scale=sigma_x/np.sqrt(2) )
# quantize signal
y = A_law_compander(x)
yQ = uniform_midtread_quantizer(y, 8)
xQ = A_law_expander(yQ)
e = xQ - x
# compute SNR
SNR = 10*np.log10((np.var(x)/np.var(e)))
return SNR
# quantization step
Q = 1/(2**(w-1))
# compute SNR for given probabilities
SNR = [compute_SNR(P) for P in Pc]
# plot results
plt.figure(figsize=(8,4))
plt.semilogx(Pc, SNR)
plt.xlabel('Probability for clipping')
plt.ylabel('SNR in dB')
plt.grid()
Finally we requantize a speech sample with a linear and the A-law quantization scheme. Listen to the samples!
In [5]: # load speech sample
x, fs = sf.read('../data/speech_8k.wav')
x = x/np.max(np.abs(x))
# linear quantization
xQ = uniform_midtread_quantizer(x, 8)
e = evaluate_requantization(x, xQ)
sf.write('speech_8k_8bit.wav', xQ, fs)
sf.write('speech_8k_8bit_error.wav', e, fs)
# A-law quantization
y = A_law_compander(x)
yQ = uniform_midtread_quantizer(y, 8)
xQ = A_law_expander(yQ)
e = evaluate_requantization(x, xQ)
sf.write('speech_Alaw_8k_8bit.wav', xQ, fs)
sf.write('speech_Alaw_8k_8bit_error.wav', e, fs)
SNR: 35.749340 dB
SNR: 38.177564 dB
Original Signal Your browser does not support the audio element. ../data/speech_8k.wav
Linear Requantization to :math:‘w=8‘ bit
Signal Your browser does not support the audio element. speech_8k_8bit.wav
Error Your browser does not support the audio element. speech_8k_8bit_error.wav
A-law Requantization to :math:‘w=8‘ bit
Signal Your browser does not support the audio element. speech_Alaw_8k_8bit.wav
Error Your browser does not support the audio element. speech_Alaw_8k_8bit_error.wav
6.1 Introduction
Computing the output 𝑦[𝑘] = ℋ{𝑥[𝑘]} of a linear time-invariant (LTI) system is of central importance in digital
signal processing. This is often referred to as *filtering* of the input signal 𝑥[𝑘]. The methods for this purpose are
typically classified into
• non-recursive and
• recursive
techniques. This section focuses on the realization of non-recursive filters.
The output signal 𝑦[𝑘] is given by (linear) convolution of the input signal 𝑥[𝑘] with the impulse response ℎ[𝑘]
∞
∑︁
𝑦[𝑘] = 𝑥[𝑘] * ℎ[𝑘] = 𝑥[𝜅] ℎ[𝑘 − 𝜅]
𝜅=−∞
Two aspects of this representation become evident when inspecting above equation:
1. The output signal 𝑦[𝑘] is a linear combination of the input signal 𝑥[𝑘]. There is no feedback of the output
signal of past time-instants. Therefore, such filters are termed as non-recursive filters.
2. In order to compute the output signal at one particular time-instant 𝑘, the input signal needs to be known for
all past and future time-instants.
The second aspect prohibits a practical realization. In order to be able to realize a non-recursive filter by convolu-
tion, the output at time-instant 𝑘 should only depend on the input signal 𝑥[𝑘] up to time-index 𝑘
𝑘
∑︁
𝑦[𝑘] = 𝑥[𝜅] ℎ[𝑘 − 𝜅]
𝜅=−∞
This is the case when the impulse response is causal, hence when ℎ[𝑘] = 0 for 𝑘 < 0. However, this still requires
knowledge of the input signal for all past time-instants. If we further assume that the input signal is causal,
123
Digital Signal Processing, Release 0.0
Many practical systems have an impulse response of finite length 𝑁 or can be approximated by an impulse re-
sponse of finite length
{︃
ℎ[𝑘] for 0 ≤ 𝑘 < 𝑁
ℎ𝑁 [𝑘] =
0 otherwise
Such an impulse response is denoted as *finite impulse response* (FIR). Introducing ℎ𝑁 [𝑘] into above sum and
rearranging terms yields
𝑘
∑︁ 𝑁
∑︁−1
𝑦[𝑘] = 𝑥[𝜅] ℎ𝑁 [𝑘 − 𝜅] = ℎ𝑁 [𝜅] 𝑥[𝑘 − 𝜅]
𝜅=0 𝜅=0
Hence for a causal input signal 𝑥[𝑘] and a FIR the output of the system can be computed by a finite number of
operations.
The evaluation of the convolution for a FIR of length 𝑁 requires 𝑁 multiplications and 𝑁 − 1 additions per time
index 𝑘. For the real-time convolution of an audio signal with a sampling rate of 𝑓S = 48 kHz with a FIR of length
𝑁 = 48000 we have to compute around 2 × 2.3 · 109 numerical operations per second. This is a considerable
numerical complexity, especially on embedded or mobile platforms. Therefore, various techniques have been
developed to lower the computational complexity.
The straightforward convolution of two finite-length signals 𝑥[𝑘] and ℎ[𝑘] is a numerical complex task. This has
led to the development of various techniques with considerably lower complexity. The basic concept of the fast
convolution is to exploit the correspondence between the convolution and the scalar multiplication in the frequency
domain.
The convolution of a causal signal 𝑥𝐿 [𝑘] of length 𝐿 with a causal impulse response ℎ𝑁 [𝑘] of length 𝑁 is given as
𝐿−1
∑︁ 𝑁
∑︁−1
𝑦[𝑘] = 𝑥𝐿 [𝑘] * ℎ𝑁 [𝑘] = 𝑥𝐿 [𝜅] ℎ𝑁 [𝑘 − 𝜅] = ℎ𝑁 [𝜅] 𝑥𝐿 [𝑘 − 𝜅]
𝜅=0 𝜅=0
The resulting signal 𝑦[𝑘] is of finite length 𝑀 = 𝑁 + 𝐿 − 1. The computation of 𝑦[𝑘] for 𝑘 = 0, 1, . . . , 𝑀 − 1
requires 𝑀 · 𝑁 multiplications and 𝑀 · (𝑁 − 1) additions. The computational complexity of the convolution is
consequently in the order of 𝒪(𝑀 · 𝑁 ). Discrete-time Fourier transformation (DTFT) of above relation yields
𝑌 (𝑒𝑗Ω ) = 𝑋𝐿 (𝑒𝑗Ω ) · 𝐻𝑁 (𝑒𝑗Ω )
Discarding the effort of transformation, the computationally complex convolution is replaced by a scalar multipli-
cation with respect to the frequency Ω. However, Ω is a continuous frequency variable which limits the numerical
evaluation of this scalar multiplication. In practice, the DTFT is replaced by the discrete Fourier transformation
(DFT). Two aspects have to be considered before a straightforward application of the DFT
1. The DFTs 𝑋𝐿 [𝜇] and 𝐻𝑁 [𝜇] are of length 𝐿 and 𝑁 respectively and cannot be multiplied straightforward
2. For 𝑁 = 𝐿, the multiplication of the two spectra 𝑋𝐿 [𝜇] and 𝐻𝐿 [𝜇] would result in the periodic/circular
convolution 𝑥𝐿 [𝑘]~ℎ𝐿 [𝑘] due to the periodicity of the DFT. Since we aim at realizing the linear convolution
𝑥𝐿 [𝑘] * ℎ𝑁 [𝑘] with the DFT, special care has to be taken to avoid cyclic effects.
The periodic convolution of the two signals 𝑥𝐿 [𝑘] and ℎ𝑁 [𝑘] is defined as
𝑀
∑︁−1
𝑥𝐿 [𝑘] ~ ℎ𝑁 [𝑘] = ˜𝑀 [𝑘 − 𝜅] ℎ𝑁 [𝜅]
𝑥
𝜅=0
where without loss of generality it is assumed that 𝐿 ≥ 𝑁 and 𝑀 ≥ 𝑁 . The periodic continuation 𝑥
˜𝑀 [𝑘] of 𝑥[𝑘]
with period 𝑀 is given as
∞
∑︁
𝑥
˜𝑀 [𝑘] = 𝑥𝐿 [𝑚 · 𝑀 + 𝑘]
𝑚=−∞
for 𝑘 = 0, 1, . . . , 𝑀 − 1 with 𝑀 = 𝑁 + 𝐿 − 1.
Example
The following example computes the linear, periodic and linear by periodic convolution of two signals 𝑥[𝑘] =
rect𝑁 [𝑘] and ℎ[𝑘].
In [1]: %matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import scipy.signal as sig
return np.asarray(y)
# generate signals
x = np.ones(L)
h = sig.triang(N)
# linear convolution
y1 = np.convolve(x, h, 'full')
# periodic convolution
y2 = cconv(x, h, M)
# linear convolution via periodic convolution
xp = np.append(x, np.zeros(N-1))
hp = np.append(h, np.zeros(L-1))
y3 = cconv(xp, hp, L+N-1)
# plot results
def plot_signal(x):
plt.figure(figsize = (10, 3))
plt.stem(x)
plt.xlabel(r'$k$')
plt.ylabel(r'$y[k]$')
plt.axis([0, N+L, 0, 1.1*x.max()])
plot_signal(x)
plt.title('Signal $x[k]$')
plot_signal(y1)
plt.title('Linear convolution')
plot_signal(y2)
plt.title('Periodic convolution')
plot_signal(y3)
plt.title('Linear convolution by periodic convolution');
Exercise
• Change the lengths L, N and M within the constraints given above and check how the results for the different
convolutions change
Using the above derived equality of the linear and periodic convolution one can express the linear convolution
𝑦[𝑘] = 𝑥𝐿 [𝑘] * ℎ𝑁 [𝑘] by the DFT as
This operation requires three DFTs of length 𝑀 and 𝑀 complex multiplications. On first sight this does not seem
to be an improvement, since one DFT/IDFT requires 𝑀 2 complex multiplications and 𝑀 · (𝑀 − 1) complex
additions. The overall numerical complexity is hence in the order of 𝒪(𝑀 2 ). The DFT can be realized efficiently
by the fast Fourier transformation (FFT), which lowers the computational complexity to 𝒪(𝑀 log2 𝑀 ). The
resulting algorithm is known as fast convolution due to its computational efficiency.
The fast convolution algorithm is composed of the following steps
1. Zero-padding of the two input signals 𝑥𝐿 [𝑘] and ℎ𝑁 [𝑘] to at least a total length of 𝑀 ≥ 𝑁 + 𝐿 − 1
2. Computation of the DFTs 𝑋[𝜇] and 𝐻[𝜇] using a FFT of length 𝑀
3. Multiplication of the spectra 𝑌 [𝜇] = 𝑋[𝜇] · 𝐻[𝜇]
4. Inverse DFT of 𝑌 [𝜇] using an inverse FFT of length 𝑀
The overall complexity depends on the particular implementation of the FFT. Many FFTs are most efficient for
lengths which are a power of two. It therefore can make sense, in terms of computational complexity, to choose
𝑀 as a power of two instead of the shortest possible length 𝑁 + 𝐿 − 1. For real valued signals 𝑥[𝑘] ∈ R and
ℎ[𝑘] ∈ R the computational complexity can be reduced significantly by using a real valued FFT.
Example
The implementation of the fast convolution algorithm is straightforward. Most implementations of the FFT include
the zero-padding to a given length 𝑀 , e.g in numpy by numpy.fft.fft(x, M). In the following example an
implementation of the fast convolution in Python is shown. The output of the fast convolution is compared to
a straightforward implementation by means of the absolute difference |𝑒[𝑘]|. The observed differences are due to
numerical effects in the convolution and the FFT. The differences can be neglected in most applications.
In [2]: L = 16 # length of signal x[k]
N = 16 # length of signal h[k]
M = N+L-1
# generate signals
x = np.ones(L)
h = sig.triang(N)
# linear convolution
y1 = np.convolve(x, h, 'full')
# fast convolution
y2 = np.fft.ifft(np.fft.fft(x, M)*np.fft.fft(h, M))
plt.figure(figsize=(10, 3))
plt.stem(np.abs(y1-y2))
plt.xlabel(r'k')
plt.ylabel(r'|e[k]|');
Numerical Complexity
It was already argued that the numerical complexity of the fast convolution is considerably lower due to the usage
of the FFT. The gain with respect to the convolution is evaluated in the following. In order to measure the execution
times for both algorithms the timeit module is used. The algorithms are evaluated for the convolution of two
signals 𝑥𝐿 [𝑘] and ℎ𝑁 [𝑘] of length 𝐿 = 𝑁 = 2𝑛 for 𝑛 = 0, 1, . . . , 16.
In [3]: import timeit
gain = np.zeros(len(n))
for N in n:
length = 2**N
# setup environment for timeit
tsetup = 'import numpy as np; from numpy.fft import rfft, irfft; \
x=np.random.randn(%d); h=np.random.randn(%d)' % (length, length)
# direct convolution
tc = timeit.timeit('np.convolve(x, x, "full")', setup=tsetup, number=reps)
# fast convolution
tf = timeit.timeit('irfft(rfft(x, %d) * rfft(h, %d))' % (2*length, 2*length)
# speedup by using the fast convolution
gain[N] = tc/tf
Exercise
• When is the fast convolution more efficient/faster than a direct convolution?
• Why is it slower below a given signal length?
• Is the trend of the gain as expected by the numerical complexity of the FFT?
In many applications one of the signals of a convolution is much longer than the other. For instance when filtering
a speech signal 𝑥𝐿 [𝑘] of length 𝐿 with a room impulse response ℎ𝑁 [𝑘] of length 𝑁 ≪ 𝐿. In such cases the fast
convolution, as introduced before, does not bring a benefit since both signals have to be zero-padded to a total
length of at least 𝑁 + 𝐿 − 1. Applying the fast convolution may then even be impossible in terms of memory
requirements or overall delay. The filtering of a signal which is captured in real-time is also not possible by the
fast convolution.
In order to overcome these limitations, various techniques have been developed that perform the filtering on limited
portions of the signals. These portions are known as partitions, segments or blocks. The respective algorithms
are termed as segmented or block-based algorithms. The following section introduces two techniques for the
segmented convolution of signals. The basic concept of these is to divide the convolution 𝑦[𝑘] = 𝑥𝐿 [𝑘] * ℎ𝑁 [𝑘]
into multiple convolutions operating on (overlapping) segments of the signal 𝑥𝐿 [𝑘].
The overlap-add algorithm is based on splitting the signal 𝑥𝐿 [𝑘] into non-overlapping segments 𝑥𝑝 [𝑘] of length 𝑃
𝐿/𝑃 −1
∑︁
𝑥𝐿 [𝑘] = 𝑥𝑝 [𝑘 − 𝑝 · 𝑃 ]
𝑝=0
Note that 𝑥𝐿 [𝑘] might have to be zero-padded so that its total length is a multiple of the segment length 𝑃 .
Introducing the segmentation of 𝑥𝐿 [𝑘] into the convolution yields
where 𝑦𝑝 [𝑘] = 𝑥𝑝 [𝑘] * ℎ𝑁 [𝑘]. This result states that the convolution of 𝑥𝐿 [𝑘] * ℎ𝑁 [𝑘] can be split into a series of
convolutions 𝑦𝑝 [𝑘] operating on the samples of one segment only. The length of 𝑦𝑝 [𝑘] is 𝑁 + 𝑃 − 1. The result of
the overall convolution is given by summing up the results from the segments shifted by multiples of the segment
length 𝑃 . This can be interpreted as an overlapped superposition of the results from the segments, as illustrated in
the following diagram
The overall procedure is denoted by the name overlap-add technique. The convolutions 𝑦𝑝 [𝑘] = 𝑥𝑝 [𝑘] * ℎ𝑁 [𝑘]
can be realized efficiently by the fast convolution using zero-padding and fast Fourier transformations (FFTs) of
length 𝑀 ≥ 𝑃 + 𝑁 − 1.
A drawback of the overlap-add technique is that the next input segment is required to compute the result for the
actual segment of the output. For real-time applications this introduces an algorithmic delay of one segment.
Example
The following example illustrates the overlap-add algorithm by showing the (convolved) segments and the overall
result.
In [1]: %matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import scipy.signal as sig
# overlap-add convolution
xp = np.zeros((L//P, P))
yp = np.zeros((L//P, N+P-1))
y = np.zeros(L+P-1)
for n in range(L//P):
xp[n, :] = x[n*P:(n+1)*P]
yp[n, :] = np.convolve(xp[n,:], h, mode='full')
y[n*P:(n+1)*P+N-1] += yp[n, :]
y = y[0:N+L]
# plot signals
plt.figure(figsize = (10,2))
plt.subplot(121)
plt.stem(x)
for n in np.arange(L//P)[::2]:
plt.axvspan(n*P, (n+1)*P-1, facecolor='g', alpha=0.5)
plt.title(r'Signal $x[k]$ and segments')
plt.xlabel(r'$k$')
plt.ylabel(r'$x[k]$')
plt.axis([0, L, 0, 1])
plt.subplot(122)
plt.stem(h)
plt.title(r'Impulse response $h[k]$')
plt.xlabel(r'$k$')
plt.ylabel(r'$h[k]$')
plt.axis([0, L, 0, 1])
for p in np.arange(L//P):
plt.figure(figsize = (10,2))
plt.figure(figsize = (10,2))
plt.stem(y)
plt.title(r'Result $y[k] = x[k] * h[k]$')
plt.xlabel(r'$k$')
plt.ylabel(r'$y[k]$')
plt.axis([0, L+P, 0, 4]);
Exercises
• Change the length N of the impulse response and the length P of the segments. What changes?
• What influence have these two lengths on the numerical complexity of the overlap-add algorithm?
The overlap-save algorithm, also known as overlap-discard algorithm, follows a different strategy as the overlap-
add technique introduced above. It is based on an overlapping segmentation of the input 𝑥𝐿 [𝑘] and application of
the periodic convolution for the individual segments.
Lets take a closer look at the result of the periodic convolution 𝑥𝑝 [𝑘] ~ ℎ𝑁 [𝑘], where 𝑥𝑝 [𝑘] denotes a segment
of length 𝑃 of the input signal and ℎ𝑁 [𝑘] the impulse response of length 𝑁 . The result of a linear convolution
𝑥𝑝 [𝑘] * ℎ𝑁 [𝑘] would be of length 𝑃 + 𝑁 − 1. The result of the periodic convolution of period 𝑃 for 𝑃 > 𝑁 would
suffer from a circular shift (time aliasing) and superposition of the last 𝑁 − 1 samples to the beginning. Hence,
the first 𝑁 − 1 samples are not equal to the result of the linear convolution. However, the remaining 𝑃 − 𝑁 + 1
do so.
This motivates to split the input signal 𝑥𝐿 [𝑘] into overlapping segments of length 𝑃 where the 𝑝-th segment
overlaps its preceding (𝑝 − 1)-th segment by 𝑁 − 1 samples
{︃
𝑥𝐿 [𝑘 + 𝑝 · (𝑃 − 𝑁 + 1) − (𝑁 − 1)] for 𝑘 = 0, 1, . . . , 𝑃 − 1
𝑥𝑝 [𝑘] =
0 otherwise
The part of the circular convolution 𝑥𝑝 [𝑘] ~ ℎ𝑁 [𝑘] of one segment 𝑥𝑝 [𝑘] with the impulse response ℎ𝑁 [𝑘] that is
equal to the linear convolution of both is given as
{︃
𝑥𝑝 [𝑘] ~ ℎ𝑁 [𝑘] for 𝑘 = 𝑁 − 1, 𝑁, . . . , 𝑃 − 1
𝑦𝑝 [𝑘] =
0 otherwise
Example
The following example illustrates the overlap-save algorithm by showing the results of the periodic convolutions
of the segments. The discarded parts are indicated by the red background.
# overlap-save convolution
nseg = (L+N-1)//(P-N+1) + 1
x = np.concatenate((np.zeros(N-1), x, np.zeros(P)))
xp = np.zeros((nseg, P))
yp = np.zeros((nseg, P))
y = np.zeros(nseg*(P-N+1))
for p in range(nseg):
xp[p, :] = x[p*(P-N+1):p*(P-N+1)+P]
yp[p, :] = np.fft.irfft(np.fft.rfft(xp[p, :]) * np.fft.rfft(h, P))
y[p*(P-N+1):p*(P-N+1)+P-N+1] = yp[p, N-1:]
y = y[0:N+L]
plt.figure(figsize = (10,2))
plt.subplot(121)
plt.stem(x[N-1:])
plt.title(r'Signal $x[k]$')
plt.xlabel(r'$k$')
plt.ylabel(r'$x[k]$')
plt.axis([0, L, 0, 1])
plt.subplot(122)
plt.stem(h)
plt.title(r'Impulse response $h[k]$')
plt.xlabel(r'$k$')
plt.ylabel(r'$h[k]$')
plt.axis([0, L, 0, 1])
for p in np.arange(nseg):
plt.figure(figsize = (10,2))
plt.stem(yp[p, :])
plt.axvspan(0, N-1+.5, facecolor='r', alpha=0.5)
plt.title(r'Result of periodic convolution of $x_%d[k]$ and $h_N[k]$' %(p))
plt.xlabel(r'$k$')
plt.axis([0, L+P, 0, 4])
plt.figure(figsize = (10,2))
plt.stem(y)
plt.title(r'Result $y[k] = x[k] * h[k]$')
plt.xlabel(r'$k$')
plt.ylabel(r'$y[k]$')
plt.axis([0, L+P, 0, 4]);
Exercise
• Change the length N of the impulse response and the length P of the segments. What changes?
• How many samples of the output signal 𝑦[𝑘] are computed per segment for a particular choice of these two
values?
• What would be a good choice for the segment length P with respect to the length N of the impulse response?
• For both the overlap-add and overlap-save algorithm the length 𝑃 of the segments influences the lengths of
the convolutions, FFTs and the number of output samples per segment. The segment length is often chosen
as
Numbers and numerical operations are represented with a finite numerical resolution in digital processors. The
same holds for the amplitude values of signals and the algorithmic operations applied to them. Hence, the in-
tended characteristics of an digital filter may deviate in practice due to the finite numerical resolution. The
double-precision floating point representation used in numerical environments like MATLAB or Python/numpy is
assumed to be quasi-continuous. This representation serves therefore as reference for the evaluation of quantiza-
tion effects.
This section investigates the consequences of quantization in non-recursive filters. We first take a look on the quan-
tization of the filter coefficients, followed by the effects caused by a finite numerical resolution on the operations.
The realization of non-recursive filters is subject to both effects.
The output signal 𝑦[𝑘] of a non-recursive filter with a finite impulse response (FIR) ℎ[𝑘] of length 𝑁 is given as
𝑁
∑︁−1
𝑦[𝑘] = ℎ[𝑘] * 𝑥[𝑘] = ℎ[𝜅] 𝑥[𝑘 − 𝜅]
𝜅=0
where 𝑥[𝑘] denotes the input signal. The quantized impulse response ℎ𝑄 [𝑘] (quantized filter coefficients) is yielded
by quantizing the impulse response ℎ[𝑘]
where 𝑒[𝑘] = ℎ𝑄 [𝑘] − ℎ[𝑘] denotes the quantization error. Introducing ℎ𝑄 [𝑘] into above equation and rearranging
results in
𝑁
∑︁−1 𝑁
∑︁−1
𝑦𝑄 [𝑘] = ℎ[𝑘] 𝑥[𝑘 − 𝜅] + 𝑒[𝑘] 𝑥[𝑘 − 𝜅]
𝜅=0 𝜅=0
The input signal 𝑥[𝑘] is filtered by the quantization noise 𝑒[𝑘] and superimposed to the desired output of the filter.
The overall transfer function 𝐻𝑄 (𝑒𝑗Ω ) of the filter with quantized filter coefficients is given as
𝑁
∑︁−1 𝑁
∑︁−1
𝐻𝑄 (𝑒𝑗Ω ) = ℎ[𝑘] 𝑒−𝑗Ω𝑘 + 𝑒[𝑘] 𝑒−𝑗Ω𝑘 = 𝐻(𝑒𝑗Ω ) + 𝐸(𝑒𝑗Ω )
𝑘=0 𝑘=0
Hence, the quantization of filter coefficients results in a linear distortion of the desired frequency response. To
some extend this distortion can be incorporated into the design of the filter. However, the magnitude of the
quantization error |𝐸(𝑒𝑗Ω )| cannot get arbitrarily small for a finite quantization step 𝑄. This limits the achievable
attenuation of a digital filter with quantized coefficients. It is therefore important to normalize the filter coefficients
ℎ[𝑘] before quantization in order to keep the relative power of the quantization noise small.
Example
𝜋
The coefficients of a digital lowpass filter with a cutoff frequency of Ω0 = 2 are quantized in the following
example.
In [1]: %matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import scipy.signal as sig
return xQ
# design lowpass
h = A * sig.firwin(N, .5)
# quantize coefficients
hQ = uniform_midtread_quantizer(h, Q)
plt.figure(figsize=(10, 4))
plt.plot(Om, 20*np.log10(np.abs(H)), label=r'$| H(e^{j \Omega}) |$ in dB (Design
plt.plot(Om, 20*np.log10(np.abs(HQ)), label=r'$| H_Q(e^{j \Omega}) |$ in dB (Qua
plt.title('Magnitude of the transfer function w and w/o quantization of coeffici
plt.xlabel(r'$\Omega$')
plt.axis([0, np.pi, -130, 10])
plt.legend(loc=3)
plt.grid()
Exercise
• Change the wordlength w of the quantized filter coefficients. How does the magnitude response |𝐻𝑄 (𝑒𝑗Ω )|
of the quantized filter change?
• Change the attenuation A of the filter coefficients. What changes?
• Why does the magnitude response of the quantized filter |𝐻𝑄 (𝑒𝑗Ω )| deviate more from the magnitude re-
sponse of the designed filter |𝐻(𝑒𝑗Ω )| in the frequency ranges with high attenuation?
Besides the quantization of filter coefficients ℎ[𝑘], also the quantization of the signals, state variables and op-
erations has to be considered in a practical implementation of filters. The computation of the output sig-
nal 𝑦[𝑘] = ℎ[𝑘] * 𝑥[𝑘] of a non-recursive filter by the convolution involves multiplications and additions.
In digital signal processors numbers are often represented in fixed-point arithmetic using [two’s comple-
ment](https://en.wikipedia.org/wiki/Two’s_complement). When multiplying two numbers with a wordlength of
𝑤-bits in this representation the result would require 2𝑤-bits. Hence the result has to be requantized to 𝑤-bits. The
rounding operation in the quantizer is often realized as truncation of the 𝑤 least significant bits. The resulting quan-
tization error is known as round-off error. The addition of two numbers may fall outside the maximum/minimum
values of the representation and may suffer from clipping. Similar considerations hold also for other number
representations, like e.g. floating point.
As for the quantization noise, a statistical model for the round-off error in multipliers is used to quantify the
average impact of round-off noise in a non-recursive filter.
As outlined above, multipliers require a requantization of the result in order to keep the wordlength constant. The
multiplication of a quantized signal 𝑥𝑄 [𝑘] with a quantized factor 𝑎𝑄 can be written as
1. The round-off error 𝑒[𝑘] is not correlated with the input signal 𝑥𝑄 [𝑘]
2. The round-off error is white
Φ𝑒𝑒 (e j Ω ) = 𝜎𝑒2
3. The probability density function (PDF) of the round-off error is given by the zero-mean uniform distribution
(︂ )︂
1 𝜃
𝑝𝑒 (𝜃) = · rect
𝑄 𝑄
The variance (power) of the round-off error is derived from its PDF as
𝑄2
𝜎𝑒2 =
12
Round-off noise
Using above model of a multiplier and discarding clipping, a straightforward realization of the convolution with
quantized signals would be to requantize after every multiplication
𝑁
∑︁−1 𝑁
∑︁−1
𝑦𝑄 [𝑘] = 𝒬{ℎ𝑄 [𝜅] 𝑥𝑄 [𝑘 − 𝜅]} = ℎ𝑄 [𝜅] 𝑥𝑄 [𝑘 − 𝜅] + 𝑒[𝜅]
𝜅=0 𝜅=0
The round-off errors for each multiplication are uncorrelated to each other. The overall power of the round-off
error is then given as
𝑄2
𝜎𝑒2 = 𝑁 ·
12
Many digital signal processors allow to perform the multiplications and additions in an internal register with
double wordlength. In this case only the result has to be requantized
{︃𝑁 −1 }︃
∑︁
𝑦𝑄 [𝑘] = 𝒬 ℎ𝑄 [𝜅] 𝑥𝑄 [𝑘 − 𝜅]
𝜅=0
Example
The following example simulates the round-off noise of a non-recursive filter when requantization is performed
after each multiplication. Clipping is not considered. The input signal 𝑥[𝑘] is drawn from a uniform distribution
with 𝑎 = −1 and 𝑏 = 1 − 𝑄. Both the input signal and filter coefficients are quantized. The output signal 𝑦[𝑘]
without requantization of the multiplications is computed, as well as the output signal 𝑦𝑄 [𝑘] with requantization.
The statistical properties of the round-off noise 𝑒[𝑘] = 𝑦𝑄 [𝑘] − 𝑦[𝑘] are evaluated.
return xQ
plt.subplot(121)
plt.bar(bins[:-1]/Q, pe*Q, width = 20/len(pe))
plt.title('Estimated PDF of the round-off noise')
plt.xlabel(r'$\theta / Q$')
plt.ylabel(r'$\hat{p}_x(\theta) / Q$')
#plt.axis([-1, 1, 0, 1.2])
plt.subplot(122)
plt.plot(nf*2*np.pi, Pee*6/Q**2/N)
plt.title('Estimated PSD of the round-off noise')
plt.xlabel(r'$\Omega$')
plt.ylabel(r'$\hat{\Phi}_{ee}(e^{j \Omega}) / \sigma_e^2$')
plt.axis([0, np.pi, 0, 2])
plt.grid();
Power of overall round-off noise is -86.141757 dB
Exercise
• Change the wordlength w and check if the 𝜎𝑒2 derived by numerical simulation is equal to its theoretic value
derived above?
• Can you explain the shape of the estimated PDF for the round-off noise?
7.1 Introduction
Computing the output 𝑦[𝑘] = ℋ{𝑥[𝑘]} of a linear time-invariant (LTI) system is of central importance in digital
signal processing. This is often referred to as *filtering* of the input signal 𝑥[𝑘]. We already have discussed the
realization of non-recursive filters. This section focuses on the realization of recursive filters.
Linear difference equations with constant coefficients represent linear-time invariant (LTI) systems
𝑁
∑︁ 𝑀
∑︁
𝑎𝑛 𝑦[𝑘 − 𝑛] = 𝑏𝑚 𝑥[𝑘 − 𝑚]
𝑛=0 𝑚=0
where 𝑦[𝑘] = ℋ{𝑥[𝑘]} denotes the response of the system to the input signal 𝑥[𝑘], 𝑁 the order, 𝑎𝑛 and 𝑏𝑚 constant
coefficients, respectively. Above equation can be rearranged with respect to the output signal 𝑦[𝑘] by extracting
the first element (𝑛 = 0) of the left hand sum
(︃ 𝑀 𝑁
)︃
1 ∑︁ ∑︁
𝑦[𝑘] = 𝑏𝑚 𝑥[𝑘 − 𝑚] − 𝑎𝑛 𝑦[𝑘 − 𝑛]
𝑎0 𝑚=0 𝑛=1
It is evident that the output signal 𝑦[𝑘] at time instant 𝑘 is given as a linear combination of past output samples
𝑦[𝑘 − 𝑛] superimposed by a linear combination of the actual 𝑥[𝑘] and past 𝑥[𝑘 − 𝑚] input samples. Hence, the
actual output 𝑦[𝑘] is composed from the two contributions
1. a non-recursive part, and
2. a recursive part where a linear combination of past output samples is fed back.
The impulse response of the system is given as the response of the system to a Dirac impulse at the input ℎ[𝑘] =
ℋ{𝛿[𝑘]}. Using above result and the properties of the discrete Dirac impulse we get
(︃ 𝑁
)︃
1 ∑︁
ℎ[𝑘] = 𝑏𝑘 − 𝑎𝑛 ℎ[𝑘 − 𝑛]
𝑎0 𝑛=1
Due to the feedback, the impulse response will in general be of infinite length. The impulse response is termed as
infinite impulse response (IIR) and the system as recursive system/filter.
Applying a 𝑧-transform to the left and right hand side of the difference equation and rearranging terms yields the
transfer function 𝐻(𝑧) of the system
∑︀𝑀
𝑌 (𝑧) 𝑏𝑚 𝑧 −𝑚
𝐻(𝑧) = = ∑︀𝑚=0𝑁
𝑋(𝑧) 𝑛=0 𝑎𝑛 𝑧
−𝑛
145
Digital Signal Processing, Release 0.0
The transfer function is given as a rational function in 𝑧. The polynominals of the numerator and denominator can
expressed alternatively by their roots as
∏︀𝑃 𝑚𝜇
𝑏𝑀 𝜇=1 (𝑧 − 𝑧0𝜇 )
𝐻(𝑧) = · ∏︀𝑄
𝑎𝑁 𝜈=1 (𝑧 − 𝑧∞𝜈 )
𝑛𝜈
where 𝑧0𝜇 and 𝑧∞𝜈 denote the 𝜇-th zero and 𝜈-th pole of degree 𝑚𝜇 and 𝑛𝜈 of 𝐻(𝑧), respectively. The total
number of zeros and poles is denoted by 𝑃 and 𝑄. Due to the symmetries of the 𝑧-transform, the transfer function
of a real-valued system ℎ[𝑘] ∈ R exhibits complex conjugate symmetry
𝐻(𝑧) = 𝐻 * (𝑧 * )
Poles and zeros are either real valued or conjugate complex pairs for real-valued systems (𝑏𝑚 ∈ R, 𝑎𝑛 ∈ R). For
the poles of a causal and stable system 𝐻(𝑧) the following condition has to hold
Hence all poles have to be located inside the unit circle |𝑧| = 1. Amongst others, this implies that 𝑀 ≤ 𝑁 .
7.1.3 Example
The following example shows the pole/zero diagram, the magnitude and phase response, and impulse response of
a recursive filter with so called Butterworth lowpass characteristic.
In [1]: %matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.markers import MarkerStyle
from matplotlib.patches import Circle
import scipy.signal as sig
fig = plt.figure(figsize=(5,5))
ax = fig.gca()
plt.hold(True)
plt.hold(False)
# plot pole/zero-diagram
zplane(np.roots(b), np.roots(a))
# plot magnitude response
plt.figure(figsize=(10, 3))
plt.plot(Om, 20 * np.log10(abs(H)))
plt.xlabel(r'$\Omega$')
plt.ylabel(r'$|H(e^{j \Omega})|$ in dB')
plt.grid()
plt.title('Magnitude response')
# plot phase response
plt.figure(figsize=(10, 3))
plt.plot(Om, np.unwrap(np.angle(H)))
plt.xlabel(r'$\Omega$')
plt.ylabel(r'$\varphi (\Omega)$ in rad')
plt.grid()
plt.title('Phase')
# plot impulse response
plt.figure(figsize=(10, 3))
plt.stem(20*np.log10(np.abs(np.squeeze(h))))
plt.xlabel(r'$k$')
plt.ylabel(r'$|h[k]|$ in dB')
plt.grid()
plt.title('Impulse response');
Exercise
• Does the system have an IIR?
• What happens if you increase the order N of the filter?
The output signal 𝑦[𝑘] = ℋ{𝑥[𝑘]} of a recursive linear-time invariant (LTI) system is given by
(︃ 𝑀 𝑁
)︃
1 ∑︁ ∑︁
𝑦[𝑘] = 𝑏𝑚 𝑥[𝑘 − 𝑚] − 𝑎𝑛 𝑦[𝑘 − 𝑛]
𝑎0 𝑚=0 𝑛=1
where 𝑎𝑛 and 𝑏𝑚 denote constant coefficients and 𝑁 the order. Note that systems with 𝑀 > 𝑁 are in general
not stable. The computational realization of above equation requires additions, multiplications, the actual and past
samples of the input signal 𝑥[𝑘], and the past samples of the output signal 𝑦[𝑘]. Technically this can be realized by
• adders
• multipliers, and
• unit delays or storage elements.
These can be arranged in different topologies. A certain class of structures, which is introduced in the following,
is known as direct form structures. Other known forms are for instance cascaded sections, parallel sections, lattice
structures and state-space structures.
For the following it is assumed that 𝑎0 = 1. This can be achieved for instance by dividing the remaining coeffi-
cients by 𝑎0 .
It is now evident that we can realize the recursive filter by a superposition of a non-recursive and a recursive part.
With the elements given above, this results in the following block-diagram
This representation is not canonical since 𝑁 + 𝑀 unit delays are required to realize a system of order 𝑁 . A
benefit of the direct form I is that there is essentially only one summation point which has to be taken care of when
considering quantized variables and overflow. The output signal 𝑦[𝑘] for the direct form I is computed by realizing
above equation.
The block diagram of the direct form I can be interpreted as the cascade of two systems. Denoting the signal in
between both as 𝑤[𝑘] and discarding initial values we get
𝑀
∑︁
𝑤[𝑘] = 𝑏𝑚 𝑥[𝑘 − 𝑚] = ℎ1 [𝑘] * 𝑥[𝑘] (7.1)
𝑚=0
𝑁
∑︁
𝑦[𝑘] = 𝑤[𝑘] + −𝑎𝑛 𝑤[𝑘 − 𝑛] = ℎ2 [𝑘] * 𝑤[𝑘] = ℎ2 [𝑘] * ℎ1 [𝑘] * 𝑥[𝑘] (7.2)
𝑛=1
where ℎ1 [𝑘] = [𝑏0 , 𝑏1 , . . . , 𝑏𝑀 ] denotes the impulse response of the non-recursive part and ℎ2 [𝑘] =
[1, −𝑎1 , . . . , −𝑎𝑁 ] for the recursive part. From the last equality of the second equation and the commutativity
of the convolution it becomes clear that the order of the cascade can be exchanged.
The direct form II is yielded by exchanging the two systems in above block diagram and noticing that there are
two parallel columns of delays which can be combined, since they are redundant. For 𝑁 = 𝑀 it is given as
Other cases with 𝑁 ̸= 𝑀 can be considered for by setting coefficients to zero. This form is a canonical structure
since it only requires 𝑁 unit delays for a recursive filter of order 𝑁 . The output signal 𝑦[𝑘] for the direct form II
The block diagrams above can be interpreted as linear signal flow graphs. The theory of these graphs provides
useful transformations into different forms which preserve the overall transfer function. Of special interest is the
transposition or reversal of a graph which can be achieved by
• exchanging in- and output,
• exchanging signal split and summation points, and
• reversing the directions of the signal flows.
Applying this procedure to the direct form II shown above for 𝑁 = 𝑀 yields the transposed direct form II
Using the signal before the 𝑛-th delay unit as internal state 𝑤𝑛 [𝑘] we can reformulate this into a set of difference
equations for computation of the output signal
{︃
𝑤𝑛+1 [𝑘 − 1] − 𝑎𝑛 𝑦[𝑘] + 𝑏𝑛 𝑥[𝑘] for 𝑛 = 0, 1, . . . , 𝑁 − 1
𝑤𝑛 [𝑘] = (7.5)
−𝑎𝑁 𝑦[𝑘] + 𝑏𝑁 𝑥[𝑘] for 𝑛 = 𝑁
𝑦[𝑘] = 𝑤1 [𝑘 − 1] + 𝑏0 𝑥[𝑘] (7.6)
7.2.4 Example
The following example illustrates the computation of the impulse response ℎ[𝑘] of a 2nd-order recursive system
using the transposed direct form II as realized by scipy.signal.lfilter.
In [1]: %matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import scipy.signal as sig
p = 0.90*np.exp(-1j*np.pi/4)
a = np.poly([p, np.conj(p)]) # denominator coefficients
b = [1, 0, 0] # numerator coefficients
N = 40 # number of samples
The realization of recursive filters with a high order may be subject to numerical issues. For instance, when the
coefficients span a wide amplitude range, their quantization may require a small quantization step or may impose
a large relative error for small coefficients. The basic concept of cascaded structures is to decompose a high order
filter into a cascade of lower order filters, typically first and second order recursive filters.
The rational transfer function 𝐻(𝑧) of a linear time-invariant (LTI) recursive system can be expressed by its zeros
and poles as
∏︀𝑃 𝑚𝜇
𝑏𝑀 𝜇=1 (𝑧 − 𝑧0𝜇 )
𝐻(𝑧) = · ∏︀𝑄
𝑎𝑁 𝜈=1 (𝑧 − 𝑧∞𝜈 )
𝑛𝜈
where 𝑧0𝜇 and 𝑧∞𝜈 denote the 𝜇-th zero and 𝜈-th pole of degree 𝑚𝜇 and 𝑛𝜈 of 𝐻(𝑧), respectively. The total
number of zeros and poles is denoted by 𝑃 and 𝑄.
The poles and zeros of a real-valued filter ℎ[𝑘] ∈ R are either single real valued or conjugate complex pairs. This
motivates to split the transfer function into
• first order filters constructed from a single pole and zero
• second order filters constructed from a pair of conjugated complex poles and zeros
Decomposing the transfer function into these two types by grouping the poles and zeros into single poles/zeros
and conjugate complex pairs of poles/zeros results in
𝑆1 𝑆2 *
∏︁ (𝑧 − 𝑧0𝜂 ) ∏︁ (𝑧 − 𝑧0𝜂 )(𝑧 − 𝑧0𝜂 )
𝐻(𝑧) = 𝐾 · · *
𝜂=1
(𝑧 − 𝑧∞𝜂 ) 𝜂=1 (𝑧 − 𝑧∞𝜂 )(𝑧 − 𝑧∞𝜂 )
where 𝐾 denotes a constant and 𝑆1 + 2𝑆2 = 𝑁 with 𝑁 denoting the order of the system. The cascade of
two systems results in a multiplication of their transfer functions. Above decomposition represents a cascade of
first- and second-order recursive systems. The former can be treated as a special case of second-order recursive
systems. The decomposition is therefore known as decomposition into second-order sections (SOSs) or biquad
filters. Using a cascade of SOSs the transfer function of the recursive system can be rewritten as
𝑆
∏︁ 𝑏0,𝜇 + 𝑏1,𝜇 𝑧 −1 + 𝑏2,𝜇 𝑧 −2
𝐻(𝑧) =
𝜇=1
1 + 𝑎1,𝜇 𝑧 −1 + 𝑎2,𝜇 𝑧 −2
where 𝑆 = ⌈ 𝑁2 ⌉ denotes the total number of SOSs. These results state that any real valued system of order 𝑁 > 2
can be decomposed into SOSs. This has a number of benefits
• quantization effects can be reduced by sensible grouping of poles/zeros, e.g. such that the spanned amplitude
range of the filter coefficients is limited
• A SOS may be extended by a gain factor to further reduce quantization effects by normalization of the
coefficients
• efficient and numerically stable SOSs serve as generic building blocks for higher-order recursive filters
7.3.2 Example
The following example illustrates the decomposition of a higher-order recursive Butterworth lowpass filter into a
cascade of second-order sections.
In [1]: %matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.markers import MarkerStyle
from matplotlib.patches import Circle
import scipy.signal as sig
ax = plt.gca()
plt.hold(True)
plt.hold(False)
# design filter
b, a = sig.butter(N, 0.2)
# decomposition into SOS
sos = sig.tf2sos(b, a, pairing='nearest')
for n in range(sos.shape[0]):
if not n%3:
plt.figure(figsize=(10, 3.5))
plt.subplot(131+n%3)
zplane(np.roots(sos[n, 0:3]), np.roots(sos[n, 3:6]))
plt.title('Section %d'%n)
plt.tight_layout()
plt.grid();
Coefficients of the recursive part
['1.00', '-5.39', '13.38', '-19.96', '19.62', '-13.14', '5.97', '-1.78', '0.31', '-0.02'
Exercise
• What amplitude range is spanned by the filter coefficients?
• What amplitude range is spanned by the SOS coefficients?
• Change the pole/zero grouping strategy from pairing=’nearest’ to pairing=’keep_odd’. What
changes?
• Increase the order N of the filter. What changes?
The finite numerical resolution of digital number representations has impact on the properties of filters, as already
discussed for non-recursive filters. The quantization of coefficients, state variables, algebraic operations and
signals plays an important role in the design of recursive filters. Compared to non-recursive filters, the impact of
quantization is often more prominent due to the feedback. Severe degradations from the desired characteristics
and instability are potential consequences of a finite word length in practical implementations.
A recursive filter of order 𝑁 ≥ 2 can be decomposed into second-order sections (SOS). Due to the grouping
of poles/zeros to filter coefficients with a limited amplitude range, a realization by cascaded SOS is favorable in
practice. We therefore limit our investigation of quantization effects to SOS. The transfer function of a SOS is
given as
𝑏0 + 𝑏1 𝑧 −1 + 𝑏2 𝑧 −2
𝐻(𝑧) =
1 + 𝑎1 𝑧 −1 + 𝑎2 𝑧 −2
This can be split into a non-recursive part and a recursive part. The quantization effects of non-recursive filters
have already been discussed. We therefore focus here on the recursive part given by the transfer function
1
𝐻(𝑧) =
1 + 𝑎1 𝑧 −1 + 𝑎2 𝑧 −2
This section investigates the consequences of quantization in recursive filters. As for non-recursive filters, we first
take a look at the quantization of filter coefficients. The structure used for the realization of the filter has impact on
the quantization effects. We begin with the direct form followed by the coupled form, as example for an alternative
structure.
Above transfer function of the recursive part of a SOS can be rewritten in terms of its complex conjugate poles
*
𝑧∞ and 𝑧∞ as
1 𝑧 −2
𝐻(𝑧) = =
*
(𝑧 − 𝑧∞ )(𝑧 − 𝑧∞ ) 1 −2𝑟 cos(𝜙) 𝑧 −1 + ⏟𝑟⏞2 𝑧 −2
⏟ ⏞
𝑎1 𝑎2
where 𝑟 = |𝑧∞ | and 𝜙 = arg{𝑧∞ } denote the absolute value and phase of the pole 𝑧∞ , respectively. Let’s assume
a linear uniform quantization of the coefficients 𝑎1 and 𝑎2 with quantization step 𝑄. Discarding clipping, the
following relations for the locations of the poles can be found
√︀
𝑟𝑛 = 𝑛 · 𝑄 (7.7)
(︃√︂ )︃
𝑚2 𝑄
𝜙𝑛𝑚 = arccos (7.8)
4𝑛
for 𝑛 ∈ N0 and 𝑚 ∈ Z. Quantization of the filter coefficients 𝑎1 and 𝑎2 into a finite number of amplitude values
leads to a finite number of pole locations. In the 𝑧-plane the possible pole locations are given by the intersections
of
√
• circles whose radii 𝑟𝑛 are given by 𝑟𝑛 = 𝑛 · 𝑄 with
• equidistant vertical lines which intersect the horizontal axis at 21 𝑚 · 𝑄.
The finite number of pole locations may lead to deviations from a desired filter characteristic since a desired pole
location is moved to the next possible pole location. The filter may even get unstable, when poles are moved
outside the unit circle. For illustration, the resulting pole locations for a SOS realized in direct form are computed
and plotted.
In [1]: %matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.patches import Circle
import scipy.signal as sig
import itertools
def compute_pole_locations(Q):
a1 = np.arange(-2, 2+Q, Q)
a2 = np.arange(0, 1+Q, Q)
return p
Exercise
• What consequences has the distribution of pole locations on the desired characteristics of a filter for e.g.
low/high frequencies?
Besides the quantization step 𝑄, the pole distribution depends also on the topology of the filter. In order to gain a
different distribution of pole locations after quantization, one has to derive structures where the coefficients of the
multipliers are given by other values than the direct form coefficients 𝑎1 and 𝑎2 .
One of these alternative structures is the coupled form (also known as Gold & Rader structure)
where ℜ{𝑧∞ } = 𝑟 · cos 𝜙 and ℑ{𝑧∞ } = 𝑟 · sin 𝜙 denote the real- and imaginary part of the complex pole 𝑧∞ ,
respectively. Analysis of the structure reveals its difference equation as
𝑤[𝑘] = 𝑥[𝑘] + ℜ{𝑧∞ } 𝑤[𝑘 − 1] − ℑ{𝑧∞ } 𝑦[𝑘 − 1] (7.9)
𝑦[𝑘] = ℑ{𝑧∞ } 𝑤[𝑘 − 1] + ℜ{𝑧∞ } 𝑦[𝑘 − 1] (7.10)
Note that the numerator of the transfer function differs from the recursive only SOS given above. However, this
can be considered in the design of the transfer function of a general SOS.
The real- and imaginary part of the pole 𝑧∞ occur directly as coefficients for the multipliers in the coupled form.
Quantization of these coefficients results therefore in a Cartesian grid of possible pole locations in the 𝑧-plane.
This is illustrated in the following.
In [2]: def compute_pole_locations(w):
Q = 1/(2**(w-1)) # quantization stepsize
a1 = np.arange(-1, 1+Q, Q)
a2 = np.arange(-1, 1+Q, Q)
return p
def plot_pole_locations(p):
ax = plt.gca()
Excercise
• What is the benefit of this representation in comparison to the direct from discussed in the previous section?
7.4.3 Example
The following example illustrates the effects of coefficient quantization for a recursive Butterworth filter realized
in cascaded SOSs in transposed direct form II.
In [3]: w = 12 # wordlength of filter coefficients
N = 5 # order of filter
return xQ
Exercise
• Decrease the word length w of the filter. What happens? At what word length does the filter become
unstable?
• Increase the order N of the filter for a fixed word length w. What happens?
As for non-recursive filters, the practical realization of recursive filters may suffer from the quantization of vari-
ables and algebraic operations. The effects of coefficient quantization were already discussed. This section takes
a look at the quantization of variables. We limit the investigations to the recursive part of a second-order section
(SOS), since any recursive filter of order 𝑁 ≥ 2 can be decomposed into SOSs.
The computation of the output signal 𝑦[𝑘] = ℋ{𝑥[𝑘]} by a difference equation involves a number of multipli-
cations and additions. As discussed already for non-recursive filters, multiplying two numbers in a binary repre-
sentation (e.g. [two’s complement](https://en.wikipedia.org/wiki/Two’s_complement) or floating point) requires
requantization of the result to keep the word length constant. The addition of two numbers may fall outside the
maximum/minimum values of the representation and may suffer from clipping.
The resulting round-off and clipping errors depend on the number and sequence of algebraic operations. These
depend on the structure used for implementation of the SOSs. For ease of illustration we limit our discussion to
the direct form I and II. Similar insights can be achieved in a similar manner for other structures.
Round-off errors are a consequence of reducing the word length after a multiplication. In order to investigate the
influence of these errors on a recursive filter, the statistical model for round-off errors in multipliers as introduced
for non-recursive filters is used. We furthermore neglect clipping.
The difference equation for the recursive part of a SOS realized in direct form I or II is given as
where 𝑎0 = 1, 𝑎1 and 𝑎2 denote the coefficients of the recursive part. Introducing the requantization after the
multipliers into the difference equation yields the output signal 𝑦𝑄 [𝑘]
where 𝒬{·} denotes the requantizer. Requantization is a non-linear process which results in a requantization error.
If the value to be requantized is much larger that the quantization step 𝑄, the average statistical properties of this
error can be modeled as additive uncorrelated white noise. Introducing the error into above difference equation
gives
where the two white noise sources 𝑒1 [𝑘] and 𝑒2 [𝑘] are assumed to be uncorrelated to each other. This difference
equation can be split into a set of two difference equations
The first difference equation computes the desired output signal 𝑦[𝑘] as a result of the input signal 𝑥[𝑘]. The
second one the additive error 𝑒[𝑘] due to requantization as a result of the requantization error −(𝑒1 [𝑘] + 𝑒2 [𝑘])
injected into the recursive filter. The power spectral density (PSD) Φ𝑒𝑒 (e j Ω ) of the error 𝑒[𝑘] is then given as
𝑄2
According to the model for the requantization errors, their PSDs are given as Φ𝑒1 𝑒1 (e j Ω ) = Φ𝑒2 𝑒2 (e j Ω ) = 12 .
Introducing this together with the transfer function of the SOS yields
⃒2
⃒ 𝑄2
⃒
jΩ
⃒ 1 ⃒ ·
Φ𝑒𝑒 (e )=⃒
⃒
1 + 𝑎1 e −j Ω + 𝑎2 e −j 2 Ω ⃒ 6
Example
The following example evaluates the error 𝑒[𝑘] = 𝑦𝑄 [𝑘] − 𝑦[𝑘] for a SOS which only consists of a recursive part.
The desired system response 𝑦[𝑘] is computed numerically by floating point operations with double precision,
𝑦𝑄 [𝑘] is computed by applying a uniform midtread quantizer after the multiplications. The system is excited by
(︁ 2 )︁
𝜎
uniformly distributed white noise. Besides the PSD Φ𝑒𝑒 (e j Ω ), the signal-to-noise ratio (SNR) 10 · log10 𝜎𝑦2 in
𝑒
dB of the filter is evaluated.
In [1]: %matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import scipy.signal as sig
def uniform_midtread_quantizer(x):
return xQ
def no_quantizer(x):
return x
return y[0:-2]
# cofficients of SOS
p = 0.90*np.array([np.exp(1j*np.pi/3), np.exp(-1j*np.pi/3)])
a = np.poly(p)
# quantization step
Q = 1/(2**(w-1))
# plot results
plt.figure(figsize=(10,4))
plt.plot(Om, Pxx/Q**2 * 12, 'b', label=r'$|\hat{\Phi}_{ee}(e^{j \Omega})|$')
plt.plot(w, np.abs(H)**2 * 2, 'g', label=r'$|H(e^{j \Omega})|^2$')
plt.title('Estimated PSD and transfer function of requantization noise')
plt.xlabel(r'$\Omega$')
plt.ylabel(r'$Q^2/12$')
plt.axis([0, np.pi, 0, 100])
plt.legend()
plt.grid();
SNR due to requantization: 44.778398 dB
Besides the requantization noise, recursive filters may be subject to periodic oscillations present at the output.
These undesired oscillations are termed limit cycles. Small limit cycles emerge from the additive round-off noise
due to requantization after a multiplication. The feedback in a recursive filter leads to a feedback of the requan-
tization noise. This may lead to a periodic output signal with an amplitude range of some quantization steps 𝑄,
even after the input signal is zero. The presence, amplitude and frequency of small limit cycles depends on the
location of poles and the structure of the filter. A detailed treatment of this phenomenon is beyond the scope of
this notebook and can be found in the literature.
Example
The following example illustrates small limit cycles for the system investigated in the previous example. The input
signal is uniformly distributed white noise till time-index 𝑘 = 256 and zero for the remainder.
In [2]: # compute input signal
x = np.random.uniform(low=-1, high=1, size=256)
x = np.concatenate((x, np.zeros(1024)))
# compute output signal
yQ = sos_df1(x, a, requantize=uniform_midtread_quantizer)
# plot results
plt.figure(figsize=(10, 3))
plt.plot(20*np.log10(np.abs(yQ)))
plt.title('Level of output signal')
plt.xlabel(r'$k$')
plt.ylabel(r'$|y_Q[k]|$ in dB')
plt.grid()
plt.figure(figsize=(10, 3))
k = np.arange(1000, 1050)
plt.stem(k, yQ[k]/Q)
plt.title('Output signal for zero input')
plt.xlabel(r'$k$')
plt.ylabel(r'$y_Q[k] / Q$ ')
plt.axis([k[0], k[-1], -3, 3])
plt.grid();
/opt/local/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/
Exercise
• Estimate the period of the small limit cycles. How is it related to the poles of the system?
• What amplitude range is spanned?
Large limit cycles are periodic oscillations of a recursive filter due to overflows in the multiplications/additions.
As for small limit cycles, large limit cycles may be present even after the input signal is zero. Their level is
typically in the range of the minimum/maximum value of the requantizer. Large limit cycles should therefore be
avoided in a practical implementation. The presence of large limit cycles depends on the scaling of input signal
and coefficients, as well as the strategy used to cope for clipping. Amongst others, they can be avoided by proper
scaling of the coefficients to prevent overflow. Again, a detailed treatment of this phenomenon is beyond the scope
of this notebook and can be found in the literature.
Example
The following example illustrates large limit cycles for the system investigated in the first example. In order to
trigger large limit cycles, the coefficients of the filter have been doubled. The input signal is uniformly distributed
white noise till time-index 𝑘 = 256 and zero for the remainder.
In [3]: def uniform_midtread_quantizer(x, xmin=1):
# limiter
x = np.copy(x)
if x <= -xmin:
x = -1
if x > xmin - Q:
x = 1 - Q
# linear uniform quantization
xQ = Q * np.floor(x/Q + 1/2)
return xQ
# plot results
plt.figure(figsize=(10, 3))
plt.plot(20*np.log10(np.abs(yQ)))
plt.title('Level of output signal')
plt.xlabel(r'$k$')
plt.ylabel(r'$|y_Q[k]|$ in dB')
plt.grid()
plt.figure(figsize=(10, 3))
k = np.arange(1000, 1050)
plt.stem(k, yQ[k])
plt.title('Output signal for zero input')
plt.xlabel(r'$k$')
plt.ylabel(r'$y_Q[k]$ ')
#plt.axis([k[0], k[-1], -1.1, 1.1])
plt.grid();
Exercise
• Determine the period of the large limit cycles. How is it related to the poles of the system?
The design of non-recursive filters with a finite impulse response (FIR) is a frequent task in practical applications.
The designed filter should approximate a desired frequency response as close as possible. First, the design of
causal filters is considered. For many applications the resulting filter should have a linear phase characteristic
since this results in a constant (and thus frequency independent) group delay. We therefore specialize the design
to causal linear-phase filters.
Let’s assume that the desired frequency characteristics of the filter are given by the frequency response (i.e. the
DTFT spectrum) 𝐻d (e j Ω ). The corresponding impulse response is computed by its inverse discrete-time Fourier
transform (IDTFT)
∫︁𝜋
1
ℎd [𝑘] = 𝐻d (e j Ω ) e j Ω 𝑘 dΩ (8.1)
2𝜋
−𝜋
In the general case, ℎd [𝑘] will not be a causal FIR. The Paley-Wiener theorem states, that a causal system 𝐻d (e j Ω )
can only have zeros at single frequencies. This is not the case for idealized filters, like e.g. the ideal low-pass
filter. The basic idea of the window method is to truncate the impulse response ℎd [𝑘] in order to derive a causal
FIR filter. This can be achieved by applying a window 𝑤[𝑘] of finite length 𝑁 to ℎd [𝑘]
where ℎ[𝑘] denotes the impulse response of the designed filter. Its frequency response 𝐻(e j Ω ) is given by the
multiplication<->convolution theorem of the discrete-time Fourier transform (DTFT)
1
𝐻(e j Ω ) = 𝐻d (e j Ω ) ~ 𝑊 (e j Ω ) (8.3)
2𝜋
where 𝑊 (e j Ω ) denotes the DTFT of the window function 𝑤[𝑘]. The frequency response 𝐻(e j Ω ) of the filter is
given as the periodic convolution of the desired frequency response 𝐻d (e j Ω ) and the frequency response of the
window function 𝑊 (e j Ω ). The frequency response 𝐻(e j Ω ) is equal to the desired frequency response 𝐻d (e j Ω )
only if 𝑊 (e j Ω ) = 2𝜋 · 𝛿(Ω). This would require that 𝑤[𝑘] = 1 for 𝑘 = −∞, . . . , ∞. Hence for a window 𝑤[𝑘]
of finite length deviations from the desired frequency response are to be expected.
In order to investigate the effect of truncation on the frequency response 𝐻(e j Ω ), a particular window is con-
sidered. A straightforward choice is the rectangular window 𝑤[𝑘] = rect𝑁 [𝑘] of length 𝑁 . Its DTFT is given
as
𝑁 −1 sin( 𝑁2Ω )
𝑊 (e j Ω ) = e−j Ω 2 · (8.4)
sin( Ω2 )
The frequency-domain properties of the rectangular window have already been discussed for the leakage effect.
The rectangular window features a narrow main lobe at the cost of relative high sidelobe level. The main lobe
169
Digital Signal Processing, Release 0.0
gets narrower with increasing length 𝑁 . The convolution of the desired frequency response with the frequency
response of the window function effectively results in smoothing and ringing. While the main lobe will smooth
discontinuities of the desired transfer function, the sidelobes result in undesirable ringing effects. The latter can be
alleviated by using other window functions. Note that typical window functions decay towards their ends and are
symmetric with respect to their center. This may cause problems for desired impulse responses with large values
towards their ends.
The design of an ideal low-pass filter using the window technique is illustrated in the following example. For
|Ω| < 𝜋 the transfer function of the ideal low-pass is given as
{︃
1 for |Ω| ≤ Ωc
𝐻d (e j Ω ) = (8.5)
0 otherwise
where Ωc denotes the corner frequency of the low-pass. An inverse DTFT of the desired transfer function yields
Ωc
ℎd [𝑘] = · si[Ωc 𝑘] (8.6)
𝜋
The impulse response ℎd [𝑘] is not causal nor FIR. In order to derive a causal and FIR approximation a rectangular
window 𝑤[𝑘] of length 𝑁 is applied
ℎ[𝑘] = ℎd [𝑘] · rect𝑁 [𝑘] (8.7)
The resulting magnitude and phase response is computed numerically in the following.
In [1]: %matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import scipy.signal as sig
N = 32 # length of filter
Omc = np.pi/2
# frequency response
Om, H = sig.freqz(h)
Exercises
• Does the resulting filter have the desired phase?
• Increase the length N of the filter. What changes?
Above results show that an ideal-low pass cannot be realized very well with the window method. The reason is
that an ideal-low pass has zero-phase, as most of the idealized filters.
Lets assume a general zero-phase filter with transfer function 𝐻𝑑 (e j Ω ) = 𝐴(e j Ω ) with amplitude 𝐴(e j Ω ) ∈ R.
Its impulse response ℎ𝑑 [𝑘] = ℱ*−1 {𝐻𝑑 (𝑒𝑗Ω } is conjugate complex symmetric
due to the symmetry relations of the DTFT. Hence, a transfer function with zero-phase cannot be realized by a
causal non-recursive filter. This observation motivates to replace the zero-phase by a linear-phase in such situa-
tions. This is illustrated in the following.
The design of non-recursive FIR filters with a linear phase is often desired due to their constant group delay. Let’s
assume a system with generalized linear phase. For |Ω| < 𝜋 its transfer function is given as
where 𝐴(e j Ω ) ∈ R denotes the amplitude of the filter, 𝛼 its linear phase and 𝛽 a constant phase offset. Such a
system can be decomposed into two cascaded systems: a zero-phase system with transfer function 𝐴(e j Ω ) and an
all-pass with phase 𝜙(Ω) = −𝛼Ω + 𝛽. The linear phase term −𝛼Ω results in the desired constant group delay
𝑡𝑔 (Ω) = 𝛼.
The impulse response ℎ[𝑘] of a linear-phase system shows a specific symmetry which can be deduced from the
symmetry relations of the DTFT for odd/even symmetry of 𝐻d (e j Ω ) as
for 𝑘 = 0, 1, . . . , 𝑁 − 1 where 𝑁 ∈ N denotes the length of the (finite) impulse response. The transfer function
of a linear phase filter is given by its DTFT
𝑁
∑︁−1
𝐻d (e j Ω ) = ℎ[𝑘] e −j Ω 𝑘 (8.11)
𝑘=0
Introducing the symmetry relations of the impulse response ℎ[𝑘] into the DTFT and comparing the result with
above definition of a generalized linear phase system reveals four different types of linear-phase systems. These
can be discriminated with respect to their phase and magnitude characteristics
Type Length Impulse Constant Group Constant Transfer Function 𝐴(e j Ω )
𝑁 Response Delay 𝛼 in Samples Phase 𝛽
ℎ[𝑘]
𝑁 −1
1 odd ℎ[𝑘] = 𝛼= 2 ∈N 𝛽 = {0, 𝜋} 𝐴(e j Ω ) = 𝐴(e− j Ω ), all filter
ℎ[𝑁 − 1 − 𝑘] characteristics
𝑁 −1
2 even ℎ[𝑘] = 𝛼= 2 ∈
/N 𝛽 = {0, 𝜋} 𝐴(e j Ω ) = 𝐴(e− j Ω ), 𝐴(e j 𝜋 ) = 0,
ℎ[𝑁 − 1 − 𝑘] only lowpass or bandpass
𝑁 −1
3 odd ℎ[𝑘] = 𝛼= 2 ∈N 𝛽= 𝐴(e j Ω ) = −𝐴(e− j Ω ),
−ℎ[𝑁 − 1 − 𝑘] { 𝜋2 , 3𝜋
2 } 𝐴(e j 0 ) = 𝐴(e j 𝜋 ) = 0, only
bandpass
𝑁 −1
4 even ℎ[𝑘] = 𝛼= 2 ∈
/N 𝛽= 𝐴(e j Ω ) = −𝐴(e− j Ω ),
−ℎ[𝑁 − 1 − 𝑘] { 𝜋2 , 3𝜋
2 } 𝐴(e j 0 ) = 0, only highpass or
bandpass
These relations have to be considered in the design of a causal linear phase filter. Depending on the desired
frequency characteristics 𝐴(e j Ω ) the suitable type is chosen. The odd/even length 𝑁 of the filter and the phase
(or group delay) is chosen accordingly for the design of the filter. It is also abovious that a filter with zero-phase
𝛼 = 0, e.g. an ideal low-pass, would result in 𝑁 = 1.
We aim at the design of a causal linear-phase low-pass using the window technique. According to the previous
example, the desired frequency response has an even symmetry 𝐴(e j Ω ) = 𝐴(e −j Ω ) with 𝐴(ej 0 ) = 1. This could
be realized by a filter of type 1 or 2. We choose type 1, since the resulting filter then exhibits an integer group
delay of 𝑡𝑔 (Ω) = 𝑁 2−1 samples. Consequently the length of the filter 𝑁 has to be odd.
The impulse response ℎd [𝑘] is given by the inverse DTFT of 𝐻d (e j Ω ) as
[︂ (︂ )︂]︂
Ωc 𝑁 −1
ℎd [𝑘] = · si Ωc 𝑘 − (8.12)
𝜋 2
The impulse response fulfills the desired symmetry for 𝑘 = 0, 1, . . . , 𝑁 − 1. A causal and FIR approximation is
obtained by applying a window function of length 𝑁 to the impulse response ℎd [𝑘]
Note that the window function 𝑤[𝑘] also has to fulfill the desired symmetries.
As already outlined, the chosen window determines the properties of the transfer function 𝐻(e j Ω ). The spectral
properties of commonly applied windows have been discussed previously. The width of the main lobe will gen-
erally influence the smoothing of the desired transfer function 𝐻d (e j Ω ), while the sidelobes influence the typical
ringing artifacts. This is illustrated in the following.
In [2]: N = 33 # length of filter
Omc = np.pi/2
# frequency responses
Om, H1 = sig.freqz(h1)
Om, H2 = sig.freqz(h2)
plt.title('Phase')
plt.xlabel(r'$\Omega$')
plt.ylabel(r'$\varphi (\Omega)$ in rad')
plt.legend(loc=3)
plt.grid()
Exercises
• Does the impulse response fulfill the required symmetries for a type 1 filter?
• Can you explain the differences between the magnitude responses |𝐻(e j Ω )| for the different window func-
tions?
• What happens if you increase the length N of the filter?
For some applications, the desired frequency response is not given at all frequencies but rather at a number of
discrete frequencies. For this case, the frequency sampling method provides a solution for the design of non-
recursive filters.
Let’s assume that the desired transfer function 𝐻d (e j Ω ) is specified at a set of 𝑁 equally spaced frequencies
Ω𝜇 = 2𝜋𝑁 𝜇
2𝜋
𝐻d [𝜇] = 𝐻d (e j 𝑁 𝜇 ) (8.14)
for 𝜇 = 0, 1, . . . , 𝑁 − 1. The coefficients of a non-recursive filter with finite impulse response (FIR) can then be
computed by inverse discrete Fourier transformation (DFT) of 𝐻d [𝜇]
𝑁 −1
1 ∑︁ 2𝜋
ℎ[𝑘] = DFT−1
𝑁 {𝐻d [𝜇]} = 𝐻d [𝜇] e j 𝑁 𝜇 𝑘 (8.15)
𝑁 𝜇=0
for 𝑘 = 0, 1, . . . , 𝑁 − 1.
In order to investigate the properties of the designed filter, its transfer function 𝐻(e j Ω ) is computed. It is given
by discrete-time Fourier transformation (DTFT) of its impulse response ℎ[𝑘]
𝑁 −1 𝑁 −1 𝑁 −1
∑︁ ∑︁ 1 ∑︁ −j 𝑘 (Ω− 2𝜋 𝜇)
𝐻(e j Ω ) = ℎ[𝑘] e −j Ω 𝑘 = 𝐻d [𝜇] · e 𝑁 (8.16)
𝜇=0
𝑁
𝑘=0 𝑘=0
When comparing this result with the interpolation of a DFT, it can be concluded that 𝐻(e j Ω ) is yielded by
interpolation of the desired transfer function 𝐻d [𝜇]
𝑁 −1 (Ω− 2𝜋 𝜇)(𝑁 −1)
∑︁ 𝑁 2𝜋
𝐻(e j Ω ) = 𝐻d [𝜇] · e− j 2 · psinc𝑁 (Ω − 𝜇) (8.17)
𝜇=0
𝑁
where psinc𝑁 (·) denotes the 𝑁 -th order periodic sinc function.
Both the transfer function of the filter 𝐻(e j Ω ) and the desired transfer function 𝐻d [𝜇] are equal at the specified
frequencies Ω𝜇 = 2𝜋 𝑁 𝜇. Values in between adjacent Ω𝜇 are interpolated by the periodic sinc function. This is
illustrated in the following.
The design of an ideal low-pass filter using the frequency sampling method is considered. For |Ω| < 𝜋 the transfer
function of the ideal low-pass is given as
{︃
jΩ 1 for |Ω| ≤ Ωc
𝐻d (e ) = (8.18)
0 otherwise
where Ωc denotes its corner frequency. The desired transfer function 𝐻d [𝜇] for the frequency sampling method
is derived by sampling 𝐻d (e j Ω ). Note that for sampling on the unit circle with 0 ≤ Ω < 2𝜋, the periodicity
𝐻d (e j Ω ) = 𝐻d (e j(Ω+𝑛2𝜋) ) for 𝑛 ∈ Z has to be considered.
In [1]: %matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import scipy.signal as sig
8.2. Design of Non-Recursive Filters using the Frequency Sampling Method 175
Digital Signal Processing, Release 0.0
N = 32 # length of filter
Omc = np.pi/3 # corner frequency of low-pass
Exercises
• What phase behavior does the designed filter have?
• Increase the length N of the filter. Does the attenuation in the stop-band improve?
The reason for the poor perfomance of the designed filter is the zero-phase of the desired transfer function which
cannot be realized by a causal non-recursive system. This was already discussed for the window method. In
comparison to the window method, the freqeuncy sampling method suffers from addtional time-domain aliasing
due to the periodicy of the DFT. Again a linear-phase design is better in such situations.
The design of non-recursive FIR filters with a linear phase is often desired due to their constant group delay. As
for the window method, the design of a digital filter with a generalized linear phase is considered in the following.
For |Ω| < 𝜋 its transfer function is given as
where 𝐴(e j Ω ) ∈ R denotes the amplitude of the filter, −𝛼 Ω its linear phase and 𝛽 a constant phase offset. The
impulse response ℎ[𝑘] of a linear-phase filter shows specific symmetries which have already been discussed for
the design of linear-phase filters using the window method. For the resulting four types of linear-phase FIR filters,
the properties of 𝐴(e j Ω ) and the values of 𝛼 and 𝛽 have to be chosen accordingly for the formulation of 𝐻d (e j Ω )
and 𝐻𝑑 [𝜇], respectively. This is illustrated in the following for the design of a low-pass filter.
We aim at the approximation of an ideal low-pass as a linear-phase non-recursive FIR filter. For the sake of
comparison with a similar example for the window method, we choose a type 1 filter with odd filter length 𝑁 ,
8.2. Design of Non-Recursive Filters using the Frequency Sampling Method 177
Digital Signal Processing, Release 0.0
𝑁 −1
𝛼= 2 and 𝛽 = 0. The desired frequency response 𝐻d [𝜇] is given by sampling
{︃
jΩ −j 𝑁 2−1 Ω 1 for |Ω| ≤ Ωc
𝐻d (e ) = e · (8.20)
0 otherwise
which is defined for |Ω| < 𝜋. Note that for sampling on the unit circle with 0 ≤ Ω < 2𝜋, the periodicity
𝐻d (e j Ω ) = 𝐻d (e j(Ω+𝑛2𝜋) ) for 𝑛 ∈ Z has to be considered.
In [2]: N = 33 # length of filter
Omc = np.pi/3 # corner frequency of low-pass
Exercises
• Does the designed filter have the desired linear phase?
• Increase the length N of the filter. What is different to the previous example?
• How could the method be modified to change the properties of the frequency response?
For a comparison of the frequency sampling to the window method it is assumed that the desired frequency
response 𝐻d (e j Ω ) is given. For both methods, the coefficients ℎ[𝑘] of an FIR approximation are computed as
follows
8.2. Design of Non-Recursive Filters using the Frequency Sampling Method 179
Digital Signal Processing, Release 0.0
1. Window Method
For finite lengths 𝑁 , the difference between both methods is related to the periodicity of the DFT. For a desired
frequency response 𝐻d (e j Ω ) which does not result in a FIR ℎd [𝑘] of length 𝑁 , the inverse DFT in the frequency
sampling method will suffer from time-domain aliasing. In the general case, filter coefficients computed by the
window and frequency sampling method will hence differ.
However, for a rectangular window 𝑤[𝑘] and 𝑁 → ∞ both methods will become equivalent. This reasoning
motivates an oversampled frequency sampling method, where 𝐻d (e j Ω ) is sampled at 𝑀 ≫ 𝑁 points in order to
derive an approximation of ℎd [𝑘] which is then windowed to the target length 𝑁 . The method is beneficial in cases
where a closed-form inverse DTFT of 𝐻d (e j Ω ), as required for the window method, cannot be found.
We consider the design of a linear-phase approximation of an ideal low-pass filter using the oversampled frequency
sampling method. For the sake of comparison, the parameters have been chosen in accordance to a similar example
using the window method. Using 𝐻d (e j Ω ) from the previous example in this section, the filter is computed by
1. (Over)-Sampling the desired response at 𝑀 frequencies
2𝜋
𝐻d [𝜇] = 𝐻d (e j 𝑀 𝜇 ) (8.25)
plt.xlabel(r'$k$')
plt.ylabel(r'$h[k]$')
# plot frequency response
plt.figure(figsize = (10,3))
plt.plot(Om, 20 * np.log10(abs(H)), label='rectangular window')
plt.plot([0, Omc, Omc], [0, 0, -100], 'r--')
plt.title('Magnitude response of designed filter')
plt.xlabel(r'$\Omega$')
plt.ylabel(r'$|H(e^{j \Omega})|$ in dB')
plt.axis([0, np.pi, -100, 3])
plt.legend()
plt.grid()
# plot phase
plt.figure(figsize = (10,3))
plt.plot(Om, np.unwrap(np.angle(H)))
plt.title('Phase of designed filter')
plt.xlabel(r'$\Omega$')
plt.ylabel(r'$\varphi(\Omega)$')
plt.grid()
8.2. Design of Non-Recursive Filters using the Frequency Sampling Method 181
Digital Signal Processing, Release 0.0
Exercises
• Compare the designed filter and its properties to the same design using the window method
• Change the number of samples M used for sampling the desired response. What changes if you in-
crease/decrease M?
Various techniques have been developed to derive digital realizations of analog systems. For instance the impulse
invariance method, the matched Z-transform and the bilinear transform. The following section introduces the
bilinear transform and its application to the digital realization of analog systems and recursive filter design.
The bilinear transform is used to map the transfer function 𝐻(𝑠) = ℒ{ℎ(𝑡)} of a continous system to the transfer
function 𝐻(𝑧) = 𝒵{ℎ[𝑘]} of a discrete system approximating the continuous system. The transform is designed
such that if 𝐻(𝑠) is a rational function in 𝑠 then 𝐻(𝑧) is a rational function in 𝑧 −1 . The coefficients of the powers
of 𝑧 −1 are then the coefficients of the digital system.
Assuming ideal sampling with sampling interval 𝑇 , the frequency variable 𝑠 of the Laplace transformation is
linked to the frequency variable 𝑧 of the 𝑧-transform by
𝑧 = e 𝑠𝑇 (8.28)
For sampled signals the resulting mapping from the 𝑠-plane into the 𝑧-plane is shown in the following illustration:
The shading indicates how the different areas are mapped. The imaginary axis 𝑠 = j 𝜔 is mapped onto the unit
circle 𝑧 = e j Ω , representing the frequency response of the continuous and discrete system. The left half-plane of
the 𝑠-plane is mapped into the unit circle of the 𝑧-plane.
For the desired mapping of 𝐻(𝑠) to 𝐻(𝑧) we need the inverse of above equation. It is given as
1
𝑠= · ln(𝑧) (8.29)
𝑇
However, when introduced into a rational transfer function 𝐻(𝑠) this non-linear mapping would not result in the
desired rational transfer function 𝐻(𝑧). In order to achieve the desired mapping, ln(𝑧) is expanded into the power
series
(𝑧 − 1)3 (𝑧 − 1)5
(︂ )︂
𝑧−1
ln(𝑧) = 2 + + + . . . (8.30)
𝑧 + 1 3(𝑧 + 1)3 5(𝑧 + 1)5
Using only the linear term as approximation of ln(𝑧) yields the bilinear transform
2 𝑧−1
𝑠= · (8.31)
𝑇 𝑧+1
and its inverse
2 + 𝑠𝑇
𝑧= (8.32)
2 − 𝑠𝑇
It worthwhile noting that this mapping rule is a special case of a conformal map.
⃒
Let’s consider the mapping of the frequency response 𝐻(j 𝜔) = 𝐻(𝑠)⃒𝑠=j 𝜔 of a continuous system to the fre-
⃒
quency response 𝐻𝑑 (e j 𝜔𝑇 ) = 𝐻𝑑 (𝑧)⃒𝑧=e j Ω of a discrete system. Introducing the bilinear transform into the
continuous system to yield its discrete counterpart results in
(︂ )︂ ⃒
jΩ 2 𝑧 − 1 ⃒⃒ (︁ 2 Ω )︁
𝐻𝑑 (e ) = 𝐻 · =𝐻 j · tan( ) (8.33)
𝑇 𝑧+1 ⃒ ⏟𝑇 ⏞ 2
⃒
𝑧=ejΩ
𝜔
The imaginary axis 𝑠 = j 𝜔 of the 𝑠-plane is mapped onto the unit circle e j Ω of the 𝑧-plane. Note, that for sampled
signals the mapping between the continuous frequency axis 𝜔 and the frequency axis Ω of the discrete system is
Ω = 𝜔𝑇 . However, for the bilinear transform the mapping is non-linear
2 Ω
𝜔= · tan( ) (8.34)
𝑇 2
𝜔𝑇
Ω = 2 arctan( ) (8.35)
2
In the following, this is illustrated for 𝑇 = 1
In [1]: %matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
plt.figure(figsize=(10,4))
plt.plot(om, Om, label=r'$2 \cdot \arctan(\frac{\omega T}{2})$')
plt.plot(om, om, 'k--', label=r'$\omega T$')
plt.xlabel(r'$\omega$')
plt.ylabel(r'$\Omega$')
plt.axis([-10, 10, -np.pi, np.pi])
plt.legend(loc=2)
plt.grid()
It is evident that the frequency axis deviates from the linear mapping Ω = 𝜔𝑇 , especially for high frequencies.
The frequency response of the digital filter 𝐻𝑑 (e j Ω ) therefore deviates from the desired continuous frequency
response 𝐻(j 𝜔). This is due to the first-order approximation of the mapping from the 𝑠-plane to the 𝑧-plane. The
effect is known as frequency warping. It can be considered explicitly in the filter design stage, as shown in the
examples.
Besides this drawback, the bilinear transform has a number of benefits:
• stability and minimum-phase of the continuous filter is preserved. This is due to mapping of the left half-
space of the 𝑠-plane into the unit-circle of the 𝑧-plane.
• the order of the continuous filter is preserved. This is due to the linear mapping rule.
• no aliasing distortion as observed for the impulse invariance method
The application of the bilinear transform to the design of digital filters is discussed in the following.
We aim at designing a digital filter 𝐻𝑑 (𝑧) that approximates a given continuous prototype 𝐻(𝑠) using the bilinear
transform. For instance, the transfer function 𝐻(𝑠) may result from the analysis of an analog circuit or filter design
technique. The transfer function 𝐻𝑑 (𝑧) of the digital filter is then given by
⃒
𝐻𝑑 (𝑧) = 𝐻(𝑠)⃒ 2 𝑧−1 (8.36)
⃒
𝑠= 𝑇 · 𝑧+1
The coefficients of the digital filter are derived by representing the numerator and denominator of 𝐻𝑑 (𝑧) as poly-
nomials with respect to 𝑧 −1 . For instance, for a continuous system of second order (second order section)
𝛽0 + 𝛽1 𝑠 + 𝛽2 𝑠2
𝐻(𝑠) = (8.37)
𝛼0 + 𝛼1 𝑠 + 𝛼2 𝑠2
the bilinear transform results in
(𝛽2 𝐾 2 − 𝛽1 𝐾 + 𝛽0 ) 𝑧 −2 + (2𝛽0 − 2𝛽2 𝐾 2 ) 𝑧 −1 + (𝛽2 𝐾 2 + 𝛽1 𝐾 + 𝛽0 )
𝐻𝑑 (𝑧) = (8.38)
(𝛼2 𝐾 2 − 𝛼1 𝐾 + 𝛼0 ) 𝑧 −2 + (2𝛼0 − 2𝛼2 𝐾 2 ) 𝑧 −1 + (𝛼2 𝐾 2 + 𝛼1 𝐾 + 𝛼0 )
2
where 𝐾 = 𝑇 .
As outlined in the previous section, the frequency response of the digital filter 𝐻𝑑 (e j Ω ) will differ for high
frequencies from the desired analog frequency response 𝐻(j 𝜔). For the design of a digital filter from an analog
prototype, this can be coped for by replacing corner frequencies with
(︂ )︂
2 𝜔c 𝑇
𝜔cw = · tan (8.39)
𝑇 2
where 𝜔cw denotes a warped corner frequency 𝜔c . This techniques is known as pre-warping.
8.3.3 Examples
The following two examples illustrate the digital realization of an analog system and the design of a recursive
filter, respectively.
where 𝑥(𝑡) denotes the input signal and 𝑦(𝑡) the output signal. Analysis of the circuit reveals its transfer function
as
𝑌 (𝑠) 1
𝐻(𝑠) = = 2
(8.40)
𝑋(𝑠) 𝐿𝐶 𝑠 + 𝑅𝐶 𝑠 + 1
Introducing this into the bilinear transform of a second order section (SOS) given in the previous section yields
𝑇 2 𝑧 −2 + 2𝑇 2 𝑧 −1 + 𝑇 2
𝐻𝑑 (𝑧) = (8.41)
(4𝐿𝐶 − 2𝑇 𝑅𝐶 + 𝑇 2 ) 𝑧 −2 + (−8𝐿𝐶 + 2𝑇 2 ) 𝑧 −1 + (4𝐿𝐶 − 2𝑇 𝑅𝐶 + 𝑇 2 )
In the following, the frequency response of the analog filter and its digital realization is compared numerically.
In [2]: %matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import scipy.signal as sig
B = [0, 0, 1]
A = [L*C, R*C, 1]
# plot results
f = Om*fs/(2*np.pi)
plt.figure(figsize = (10, 4))
plt.semilogx(f, 20*np.log10(np.abs(H)), label=r'$|H(j \omega)|$ of analog filter
plt.semilogx(f, 20*np.log10(np.abs(Hd)), label=r'$|H_d(e^{j \Omega})|$ of digita
plt.xlabel(r'$f$ in Hz')
plt.ylabel(r'dB')
plt.axis([100, fs/2, -70, 3])
plt.legend(loc=1)
plt.grid()
Exercise
• Increase the corner frequency fc of the analog filter. What effect does this have on the deviations between
the analog filter and its digital representation?
The design of analog filters is a topic with a long lasting history. Consequently, many analog filter designs are
known. For instance
• Butterworth filters
• Chebyshev_filters
• Cauer filters
• Bessel filters
The properties of these designs are well documented and therefore digital realizations are of in-
terest. These can be achieved by the bilinear transform. In Python, the ‘scipy.signal
<http://docs.scipy.org/doc/scipy/reference/signal.html>‘__ package provides implementations of various analog
filter design techniques, as well as an implementation of the bilinear transform.
The design of a Butterworth bandpass using pre-warping is illustrated in the following.
In [3]: omc = 2*np.pi*np.array([5000, 6000]) # corner frequencies of bandpass
N = 2 # order of filter
# plot results
f = Om*fs/(2*np.pi)
plt.figure(figsize = (12, 8))
plt.semilogx(f, 20*np.log10(np.abs(H)), label=r'$|H(j \omega)|$ of analog protot
plt.semilogx(f, 20*np.log10(np.abs(Hd)), label=r'$|H_d(e^{j \Omega})|$ of digita
plt.semilogx(f, 20*np.log10(np.abs(Hdp)), label=r'$|H_d(e^{j \Omega})|$ of digit
plt.xlabel(r'$f$ in Hz')
plt.ylabel(r'dB')
plt.axis([100, fs/2, -70, 3])
plt.legend(loc=2)
plt.grid()
/opt/local/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/
/opt/local/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/
Exercise
• What is improved by pre-warping?
• Change the corner frequencies omc of the analog prototype and examine the deviations from the analog
In the following example, the characteristics and computational complexity of a non-recursive and a recursive
filter are compared for a particular design. Quantization is not considered. In order to design the filters we need to
specify the requirements. This is typically done by a tolerance scheme. The scheme states the desired frequency
response and allowed deviations. This is explained at an example.
We aim at the design of a low-pass filter with
1. unit amplitude with an allowable symmetric deviation of 𝛿p for |Ω| < Ωp
2. an attenuation of 𝑎s for |Ω| > Ωs
where the indices p and s denote the pass- and stop-band, respectively. The region between the pass-band Ωp and
the stop-band Ωs is known as transition-band. The phase of the filter is not specified.
𝜋 𝜋
The resulting tolerance scheme is illustrated for the design parameters Ωp = 3, Ωs = 3 + 0.05, 𝛿p = 1.5 dB and
𝑎s = −60 dB.
In [1]: %matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import scipy.signal as sig
p = [[0, -d_p], [Omp, -d_p], [Omp, -300], [np.pi, -300], [np.pi, a_s], [Oms,
polygon = mpatches.Polygon(p, closed=True, facecolor='r', alpha=0.3)
plt.gca().add_patch(polygon)
Exercise
• What corner frequencies 𝑓p and 𝑓s result for a sampling frequency of 𝑓s = 48 kHz?
The comparison of non-recursive and recursive filters depends heavily on the chosen filter design algorithm. For
the design of the non-recursive filter a technique is used which bases on numerical optimization of the filter coeffi-
cients with respect to the desired response. The Remez algorithm, as implemented in scipy.signal.remez,
is used for this purpose. The parameters for the algorithm are the corner frequencies of the pass- and stop-band,
as well as the desired attenuation in the stop-band. For the recursive filter, a Chebyshev type II design is used.
Here the parameters are the corner frequency and attenuation of the stop-band. The order of both filters has been
chosen manually to fit the given tolerance scheme.
In [2]: N = 152 # length of non-recursive filter
M = 13 # order of recursive filter
Exercises
• How do both designs differ in terms of their magnitude and phase responses?
• Calculate the number of multiplications and additions required to realize the non-recursive filter
• Calculate the number of multiplications and additions required to realize the recursive filter in transposed
direct form II
• Decrease the corner frequencies and adapt the order of the filters to match the tolerance scheme
In order to evaluate the computational complexity of both filters, the execution time is measured when filtering a
signal 𝑥[𝑘] of length 𝐿 = 105 samples. The non-recursive filter is realized by direct convolution, the recursive
filter in transposed direct form II using the respective Python functions.
In [3]: import timeit
Exercises
• Do the execution times correspond with the number of algorithmic operations calculated in the previous
exercise?
• Estimate the computational load for the filtering of a signal with a sampling rate of 48 kHz
• How could the execution time of the non-recursive filter be decreased?
• Finally, would you prefer the non-recursive or the recursive design for a practical implementation? Consider
the numerical complexity, as well as numerical aspects in your decision.
Getting Started
This will open a new view in your web browser with a list of notebooks. Click on index.ipynb (or any of the
other available notebooks). Alternatively, you can also download individual notebook files (with the extension
.ipynb) and open them in Jupyter. Note that some notebooks make use of additional files (audio files etc.)
which you’ll then also have to download manually.
193
Digital Signal Processing, Release 0.0
Literature
195
Digital Signal Processing, Release 0.0
Contributors
197