LECTURE NOTES FOR MATH 485

Fritz Keinert
Carver 464
294-5223
keinert@iastate.edu
Contents

Chapter 0. Introduction to these Notes
0.1. About These Notes
0.2. Mathematical Background
0.3. Computer Background
0.4. Suggested Reading
0.5. Bibliography
Chapter 1. Introduction and Overview
1.1. Transforms
1.2. The Wavelet Transform
1.3. Suggested Reading
1.4. Bibliography
Chapter 2. Continuous Time-Frequency Representations
2.1. The Windowed Fourier Transform
2.2. The Continuous Wavelet Transform
2.3. The Uncertainty Principle
2.4. Suggested Reading
2.5. Bibliography
Chapter 3. Discrete Time-Frequency Representations
3.1. The Windowed Fourier Transform
3.2. The Continuous Wavelet Transform
3.3. Suggested Reading
3.4. Bibliography
Chapter 4. Multi-Resolution Approximations
4.1. The spaces V_j and the scaling function φ
4.2. The spaces W_j and the mother wavelet ψ
4.3. Constructing Multi-Resolution Approximations
4.4. Suggested Reading
4.5. Bibliography
Chapter 5. Algorithms
5.1. Inner Products of φ, ψ
5.2. The Algorithm
5.3. Matrix Viewpoint
5.4. Finite-Dimensional Setting
5.5. Suggested Reading
5.6. Bibliography
Chapter 6. Construction of Wavelets
6.1. Approach 1
6.2. Approach 2
6.3. Approach 3
6.4. Suggested Reading
6.5. Bibliography
Chapter 7. Designing Wavelets with Nice Properties
7.1. Basic Properties
7.1.1. In terms of {h_k}
7.1.2. In terms of H(ξ)
7.1.3. In terms of h(z)
7.2. Choice of {g_k}
7.2.1. In terms of H(ξ)
7.2.2. In terms of h(z)
7.2.3. In terms of {h_k}
7.3. Other Desirable Properties
7.3.1. Compact Support
7.4. Symmetry
7.5. Vanishing Moments
7.5.1. Smoothness
7.6. Suggested Reading
7.7. Bibliography
Chapter 8. Daubechies Wavelets
8.1. Suggested Reading
8.2. Bibliography
Chapter 9. Other Topics
9.1. Signal Compression
9.2. Wavelet Packets
9.3. Biorthogonal Wavelets
9.4. Multi-Wavelets
9.5. Higher-dimensional Wavelets
9.6. Suggested Reading
9.7. Bibliography
Appendix A. Computer Information
A.1. Finding a Machine
A.2. Getting An Account
A.3. Getting Help
A.4. Finding Files For This Course
A.5. Viewing Or Printing Files
A.6. Electronic Communication
A.7. Matlab Documentation
A.8. Producing Pictures from Matlab
A.8.1. One-Dimensional Graphs
A.8.2. Two-Dimensional Images
A.9. Electronic Wavelet Resources
Appendix B. Mathematical Background
B.1. Vector Spaces
B.2. Frames and Biorthogonal and Orthonormal Bases
B.2.1. Orthonormal Bases
B.2.2. Biorthogonal Bases
B.2.3. Frames
B.3. Fourier Series and Fourier Transform
B.3.1. The Continuous Fourier Transform
B.3.2. The Periodic Fourier Transform (or Fourier Series)
B.3.3. The Discrete Fourier Transform
B.3.4. The Fourier Transform of some basic functions
B.3.5. Relationship between the Continuous and Periodic Fourier Transform
B.3.6. Relationship between the Periodic and Discrete Fourier Transform
B.4. Bibliography
Appendix C. Wavelet Coefficients
CHAPTER 0

Introduction to these Notes

0.1. About These Notes


Wavelets are a fairly recent development in mathematics. Until very recently, no books at all existed on
this subject, except for some conference proceedings and an out-of-date book by Meyer. The situation is
rapidly improving (see literature list), but as far as I know, there is still nothing that could be used as a text
book at the level of this course.
Therefore, I decided to put my lecture notes in writing. I will let you know when a section of the notes is
finished, so you can view it on the screen or print it out. This will probably not be until after I have finished
teaching this section, so you should take notes in class as well.
I will suggest further reading for each chapter, and put the required materials in the reserve section of the library. You should look at that material as well. Some of it is a little tough, so don't despair if you can't get through it; skip the sections that don't look like anything we did in class, but do make an effort to look at the rest. You may find other titles of interest in the bibliography section.
If you find any errors, omissions or unclear passages, please let me know, either in person or via e-mail to
keinert@iastate.edu.

0.2. Mathematical Background


Mathematical ideas we will use include: vector spaces and other ideas from linear algebra, expansion in series of orthogonal functions, Fourier series and Fourier transforms, and basic ideas of function spaces like L². Ideally, you should have seen this material before, but you can pick up the necessary information in this course. I will review all the needed material briefly (see appendix B).

0.3. Computer Background


I am planning to assign numerical experiments. See appendix A for more information.

0.4. Suggested Reading


There are 3 books on reserve in the library, along with copies of some important journal articles. These
are listed in the bibliography below. For each chapter of these notes, you should check these references for
sections relevant to the same material. In my experience, it helps to see the same material presented in
several ways. My hope is that these notes will make it easier to follow the books and papers, and the books
and papers will fill in the gaps in these notes. I will list the most relevant outside reading at the end of each
chapter.

0.5. Bibliography
[Chu92] Charles K. Chui, An introduction to wavelets, Wavelet Analysis and Its Applications, vol. 1, Academic Press, Boston, 1992.
[Chu92b] Charles K. Chui (ed.), Wavelets: A tutorial in theory and applications, Wavelet Analysis and Its Applications, vol. 2, Academic Press, Boston, 1992.
[Dau92] I. Daubechies, Ten lectures on wavelets, CBMS-NSF Regional Conference Series in Applied Mathematics, vol. 61, SIAM, Philadelphia, 1992.
[HBB92] F. Hlawatsch and G. F. Boudreaux-Bartels, Linear and quadratic time-frequency signal representations, IEEE Signal Proc. Mag. (1992), 21–67.
[JS] Björn Jawerth and Wim Sweldens, An overview of wavelet based multiresolution analyses, available by anonymous ftp from maxwell.math.scarolina.edu as /pub/wavelet/papers/overview.ps.
[RV91] O. Rioul and M. Vetterli, Wavelets and signal processing, IEEE Signal Proc. Mag. 8 (1991), no. 4, 14–38.

CHAPTER 1

Introduction and Overview

In this section, I will try to address the following questions:


• What are the ideas behind transforms in general?
• What is the idea behind the wavelet transform?

1.1. Transforms
A transform is a mapping which takes a function (or a sequence of numbers) and maps it into another
function (or sequence of numbers). Reasons for taking a transform include
• The values of the transform may give us some information on the original function, such as smooth-
ness, rate of decay, periodicity, etc.
• The transform of an equation may be easier to solve than the original equation.
• The transform of a vector may require less storage.
• We may want to apply some operation (such as smoothing) to the original function, and find that
it is easier to do on the transform side.
Most useful transforms are linear, one-to-one, and invertible. This means
• linear: We can pull out constants and transform the terms of a sum separately:

T (αf + βg) = α(T f) + β(T g).

Here we assume that f, g are functions (or sequences), α, β are numbers, and T f is the transform
of f.
• one-to-one: If f, g are different functions (sequences), then their transforms T f, T g are different.
• invertible: There is an inverse transform T −1 which recovers f from T f.
I will use the word continuous transform to denote one that maps functions to functions. The word
“continuous” usually means something else in mathematics, but that should not cause any confusion in
this context. A discrete transform maps (finite or infinite) sequences to sequences. There are also some
semidiscrete transforms that relate functions with sequences.
Most continuous linear transforms have the form

(Tf)(ξ) = ∫ k(ξ, x) f(x) dx,

where the function k is called the kernel. If the kernel depends only on (ξ − x), not on ξ and x separately, this is called a convolution (or filter, for the engineers):

(Tf)(ξ) = ∫ k(ξ − x) f(x) dx.

For discrete transforms, the counterparts are

(Tf)_i = ∑_j k_{ij} f_j

in general, or

(Tf)_i = ∑_j k_{i−j} f_j

in the case of a discrete convolution.
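For instance, here is a minimal numerical sketch of both discrete formulas in Python with NumPy; the kernel matrix and the filter below are arbitrary illustrative choices:

```python
import numpy as np

f = np.array([1.0, 2.0, 3.0, 4.0])

# General discrete transform: (Tf)_i = sum_j k_ij f_j, i.e. a matrix product.
K = np.array([[1, 1, 1, 1],
              [1, -1, 1, -1],
              [1, 0, -1, 0],
              [0, 1, 0, -1]], dtype=float)
print(K @ f)

# Discrete convolution: (Tf)_i = sum_j k_{i-j} f_j, built from a short filter.
k = np.array([0.5, 0.5])
print(np.convolve(f, k))   # np.convolve computes exactly this sum
```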


Example: The Fourier Transform (continuous) is defined by

f̂(ξ) = (1/√(2π)) ∫_{−∞}^{∞} e^{−ixξ} f(x) dx.

The Laplace Transform (continuous) is defined by

Lf(s) = ∫_0^{∞} e^{−st} f(t) dt.

The Discrete Fourier Transform (discrete) of a vector of length N is defined by

f̂_k = ∑_{j=0}^{N−1} e^{−2πijk/N} f_j,   k = 0, …, N − 1.

All these transforms are linear, one-to-one, and have an inverse. None of them is a convolution.
One way to interpret the values of a transform is to consider them as inner products of the original function with various "test functions".
Recall that if f, g are finite-dimensional real vectors, their inner product satisfies

⟨f, g⟩ = ‖f‖_2 · ‖g‖_2 · cos θ,

where θ is the angle between f and g. (The inner product is defined in appendix B.) Thus, ⟨f, g⟩ is large if f, g are almost parallel, and small if f, g are close to perpendicular. If g has length 1, ⟨f, g⟩ can be interpreted as the size of the component of f in direction g.
Now consider a discrete transform

(Tf)_i = ∑_j k_{ij} f_j.

For fixed i, k_i = {k_{ij}}_j is a vector, and

(Tf)_i = ⟨f, k̄_i⟩,

so the ith component of the transformed vector Tf is the component of f in direction k̄_i. Taking the transform of f is equivalent to "testing" f with many different vectors, to see how much f "looks like" the test vectors.
The same idea, with minor modifications, applies to continuous transforms as well.
Example: The Discrete Fourier Transform. The test vectors are complex, but their real and imaginary parts are discretized versions of sin and cos functions of various frequencies. Thus, a coefficient in the DFT of f roughly corresponds to the component of f at the corresponding frequency.
1.2. The Wavelet Transform

This section illustrates one of the reasons for introducing the Continuous Wavelet Transform.

Suppose you want to analyze a time-varying signal. The Fourier Transform tells you what frequencies are
present in the signal.

Example: The data represents the number of sunspots recorded in the years 1700 through 1987 (taken
from [KMN89]). The Fourier Transform is complex. The plot shows the real and imaginary parts of the
FT, and the power spectrum (absolute value of FT). Choosing a frequency axis from 0 to 1 means that a
frequency ξ corresponds to a period of 1/ξ years. (The values in the second half of the FT represent negative
frequencies; for real data, this is just a complex conjugate mirror image of the values in the first half).

It is well known that sunspots have a cycle of about 11.2 years, which shows up as a distinct spike in the
FT (best seen in the power spectrum) around 1/11.2.

[Figure: the sunspot data, 1700–1987, and its Fourier Transform — real part, imaginary part, and power spectrum, plotted on a frequency axis from 0 to 1.]

The Fourier Transform works well for periodic signals. However, it does not yield any information on time
localization of frequencies. Signals whose frequency content varies with time cannot be analyzed properly
this way.

Example: Consider the Fourier Transforms of two signals, both composed of two pure sin functions of
different frequencies. On the left, each frequency is present by itself for half of the interval. On the right,
the two frequencies are both present the entire time. The Fourier transforms show similar peaks.
[Figure: the two signals ("different frequencies at different times" vs. "different frequencies at the same time") and their power spectra, which look essentially alike.]

This problem is of course related to the fact that all the “test functions” employed in the Fourier Transform
cover the entire interval. The Wavelet Transform, in contrast, employs localized test functions.
Taking a Wavelet Transform is equivalent to “testing” the original function with compressed and shifted
versions of a single wave form, the mother wavelet. Some sample wavelets are shown in the picture below.
Notice how the more compressed wavelets have smaller shifts, so it takes more of them to cover a given
interval.
[Figure: a Daubechies wavelet with 2 vanishing moments, together with versions compressed by 2 and by 4, each also shifted repeatedly; the more compressed wavelets are shifted in correspondingly smaller steps.]

Example: As an example, consider the wavelet transform of the two signals in the last example. I used the so-called Haar wavelets for this, whose mother wavelet looks like this:

[Figure: the Haar mother wavelet.]

[Figure: the wavelet transforms of the two signals ("different frequencies at different times" vs. "different frequencies at the same time"), with regions 1 and 2 marked in each.]
A more detailed explanation of what the wavelet transform means has to wait until later, but compare regions 1 and 2 in the two transform pictures; they correspond to the main frequencies present. In the Wavelet Transform, you can tell whether something is happening in the left or right half only, or in the whole region.

1.3. Suggested Reading


Chapter 1 in Daubechies [Dau92]. Chapter 1 in Chui [Chu92]. The survey article by Jawerth and
Sweldens [JS].

1.4. Bibliography
[Chu92] Charles K. Chui, An introduction to wavelets, Wavelet Analysis and Its Applications, vol. 1, Academic Press, Boston, 1992.
[Dau92] I. Daubechies, Ten lectures on wavelets, CBMS-NSF Regional Conference Series in Applied Mathematics, vol. 61, SIAM, Philadelphia, 1992.
[JS] Björn Jawerth and Wim Sweldens, An overview of wavelet based multiresolution analyses, available by anonymous ftp from maxwell.math.scarolina.edu as /pub/wavelet/papers/overview.ps.
[KMN89] David Kahaner, Cleve Moler, and Stephen Nash, Numerical methods and software, Prentice Hall, 1989.
CHAPTER 2

Continuous Time-Frequency Representations

Assume that f(t) is a complex-valued function on R which represents some signal (think of t as time).
The Fourier Transform (FT)

f̂(ξ) = (1/√(2π)) ∫_{−∞}^{∞} f(t) e^{−itξ} dt

is used to decompose f into its frequency components. The inversion formula

f(t) = (1/√(2π)) ∫_{−∞}^{∞} f̂(ξ) e^{itξ} dξ

can be interpreted as writing f as a superposition of time-harmonic waves e^{itξ} of frequency ξ. If f̂ is large near some frequency, then f has a large component that is periodic with that frequency. The Fourier Transform is discussed in more detail in appendix B. You should read the material there first.
This approach works well for analyzing signals that are produced by some periodic process (like sunspot counts, pulsar brightness, weather patterns, etc.). However, in other applications, like speech analysis, we would like to localize the frequency components in time as well, and the FT is not really suitable for that.
In this chapter, we will consider two methods that attempt to provide information on both time and
frequency: the Windowed Fourier Transform (WFT), also called Short Time Fourier Transform (STFT),
and the Continuous Wavelet Transform (CWT). Both of them map a function of one variable (time) into a
function of two variables (time and frequency). A large value of the transform near time t, frequency ξ is
interpreted as: the signal f contains a large component with frequency ξ near time t.
There is a lot more theory to both of these transforms than we will cover in this chapter, but our main
interest lies elsewhere.

2.1. The Windowed Fourier Transform

Fix a function w ∈ L², the window function. w should be localized in time near t = 0, with a spread σ (we will define "spread" more precisely later). Typical choices are

w = χ_[−1,1],  σ = 1/√3
(recall that χ_S is the characteristic function of the set S, which has value 1 on the set, 0 otherwise);

w = (1 + cos(2t))/2 for t ∈ [−π/2, π/2],  σ = 1/√3;

w = (1/√(2π)) e^{−t²/2} (Gabor window),  σ = 1.

[Figure: the three window functions, each plotted with its spread σ marked.]

The WFT of f with window w is defined as

Ψ_w f(a, b) = (1/√(2π)) ∫_{−∞}^{∞} f(t) e^{−ibt} w(t − a) dt.

Thus, a is the time parameter, b is the frequency.


If f ∈ L2 , then Ψw f is pointwise defined, bounded, continuous, and in L2 (R × R). A counterpart to the
Parseval-Plancherel theorem for the FT holds:
hΨw f, Ψw gi = kwk22 hf, gi.
The WFT can be inverted by
Z ∞ Z ∞
1 1
f(t) = √ Ψw f(a, b)eibt w(t − a) da db
kwk22 2π −∞ −∞

(where the integrals are interpreted in a suitable sense, as with the FT).
Ψw f(a, b) can be interpreted as an inner product of f with the test function e−ibt w(t − a).

A typical test function looks like this


(shifted and modulated window):

Note how the time resolution σ (related to the window width) is constant, independent of the frequency.
September 15, 1993 Fritz Keinert 2-3
2.2. The Continuous Wavelet Transform

Fix a function w ∈ L², the mother wavelet. Again, w should be localized near t = 0 to some resolution σ. As will become apparent later, w should also have integral 0, and some "natural frequency" (i.e. ŵ should be localized near some non-zero frequency ξ).

Example: [Figure: the Daubechies wavelet with 5 vanishing moments.]

The CWT of f with mother wavelet w is defined as

Φ_w f(a, b) = ∫_{−∞}^{∞} f(t) |a|^{−1/2} w((t − b)/a) dt.

In this case, b represents time, and a is a multiple of 1/frequency. (This is different from the meaning these letters had for the WFT; I am just trying to follow the notation in [Chu92].)
If f ∈ L², Φ_w f is pointwise defined for a ≠ 0, bounded and continuous.
Further properties require the condition

c_w = (2π ∫_{−∞}^{∞} |ŵ(ξ)|²/|ξ| dξ)^{1/2} < ∞.

If ŵ is continuous, this implies that ŵ(0) = 0, which means ∫ w(t) dt = 0.
Assuming that c_w < ∞, the counterpart to the Parseval-Plancherel theorem from the FT is

⟨Φ_w f, Φ_w g⟩ = c_w² ⟨f, g⟩,

and the inversion formula is

f(t) = (1/c_w²) ∫_{−∞}^{∞} ∫_{−∞}^{∞} Φ_w f(a, b) |a|^{−1/2} w((t − b)/a) (da db)/|a|².

Φ_w f(a, b) can be interpreted as an inner product of f with |a|^{−1/2} w((t − b)/a).

A typical test function looks like this (shifted and compressed mother wavelet):

[Figure: a shifted and compressed mother wavelet.]

Note that this window is localized in time with a spread of aσ, proportional to the wavelength and inversely proportional to frequency: higher frequencies are resolved better in time.
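Since the CWT is just an integral, it can be approximated numerically on a grid of (a, b) values. Here is a brute-force Python sketch; the Mexican-hat mother wavelet (1 − t²)e^{−t²/2} is a standard example with integral 0 (not the wavelet pictured above), and the test signal is arbitrary:

```python
import numpy as np

def cwt(f, t, scales, shifts):
    """Brute-force CWT: one numerical integral per (a, b) pair."""
    out = np.empty((len(scales), len(shifts)))
    for i, a in enumerate(scales):
        for j, b in enumerate(shifts):
            u = (t - b) / a
            w = (1 - u**2) * np.exp(-u**2 / 2)      # Mexican-hat wavelet
            out[i, j] = np.trapz(f * w, t) / np.sqrt(abs(a))
    return out

t = np.linspace(0, 10, 4000)
f = np.where(t < 5, np.sin(2*np.pi*1*t), np.sin(2*np.pi*4*t))
C = cwt(f, t, scales=np.geomspace(0.05, 2.0, 30), shifts=np.linspace(0, 10, 60))
print(C.shape)   # rows: scales a (inverse frequency), columns: times b
```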

2.3. The Uncertainty Principle

Assume w ∈ L².
The mean µ_w of w is defined as

µ_w = (1/‖w‖_2²) ∫_{−∞}^{∞} t |w(t)|² dt.

The uncertainty σ_w of w is defined as

σ_w = (1/‖w‖_2) (∫_{−∞}^{∞} (t − µ_w)² |w(t)|² dt)^{1/2}.

Remark: For any given w, µ_w and σ_w may or may not exist. We are only interested in w for which µ_w and σ_w are defined, as well as µ_ŵ and σ_ŵ (the mean and uncertainty of the FT ŵ).

Remark: If w ∈ L², then |w|²/‖w‖_2² is a probability distribution, i.e. a non-negative function with integral 1. µ_w and σ_w are simply the mean and standard deviation of this distribution, in the statistical sense.

µ_w measures where w is localized in time, and σ_w measures how spread out (or uncertain) this time measurement is. µ_ŵ localizes the frequency, and σ_ŵ measures the uncertainty in frequency.
The Uncertainty Principle states that for any w

σ_w · σ_ŵ ≥ 1/2.

If a function is localized in time, it must be spread out in the frequency domain, and vice versa. The optimal value of 1/2 is achieved if and only if w is a Gaussian distribution.
To visualize this, consider a hypothetical function F(t, ξ) over the time-frequency plane. F(t, ξ) represents the component of f with frequency ξ at time t.
The uncertainty principle says that it makes no sense to try to assign pointwise values to F(t, ξ). All we can do is to assign meaning to averages of F over rectangles of area at least 2. (The signal is localized in time to [µ − σ, µ + σ], and likewise for frequency, so the rectangle has area (2σ_w) × (2σ_ŵ).)
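These quantities are easy to compute numerically. The following Python sketch evaluates µ_w, σ_w, and σ_ŵ for a Gaussian window (grid sizes are arbitrary) and confirms that σ_w · σ_ŵ comes out at the optimal value 1/2:

```python
import numpy as np

t = np.linspace(-20, 20, 40001)
w = np.exp(-t**2 / 2)                       # Gaussian window

def mean_and_spread(x, u):
    """Mean and standard deviation of the distribution |u|^2 / ||u||^2."""
    p = np.abs(u)**2
    p = p / np.trapz(p, x)
    mu = np.trapz(x * p, x)
    sigma = np.sqrt(np.trapz((x - mu)**2 * p, x))
    return mu, sigma

mu, sigma = mean_and_spread(t, w)

# FT via FFT; only |w^|^2 matters, so normalization constants drop out.
xi = np.fft.fftshift(np.fft.fftfreq(len(t), d=t[1] - t[0])) * 2 * np.pi
what = np.fft.fftshift(np.abs(np.fft.fft(w)))
mu_hat, sigma_hat = mean_and_spread(xi, what)

print(sigma * sigma_hat)                    # approximately 0.5
```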
[Figure: possible time-frequency windows in the (t, ξ)-plane — a wide, flat box gives good time resolution but bad frequency resolution; a tall, narrow box gives good frequency resolution but bad time resolution.]
The uncertainty principle applies to both WFT and CWT, but in different ways. Let's drop the subscripts from now on, and use the notation µ = µ_w, σ = σ_w, µ̂ = µ_ŵ, σ̂ = σ_ŵ.
If w has mean µ and uncertainty σ, then ⟨f, w⟩ contains information on f in [µ − σ, µ + σ]. Since ⟨f, w⟩ = ⟨f̂, ŵ⟩, it also contains information on f̂ in [µ̂ − σ̂, µ̂ + σ̂]. Thus, ⟨f, w⟩ represents the frequencies between µ̂ − σ̂ and µ̂ + σ̂ that are present between times µ − σ and µ + σ.
For WFT, the test function is the shifted and modulated window

e^{ibt} w(t − a) = E_b T_a w,

which is localized near µ + a, with uncertainty σ. Its Fourier Transform is T_b E_{−a} ŵ, which is localized near µ̂ + b, with uncertainty σ̂. Thus,

Ψ_w f(a, b) = (1/√(2π)) ⟨f, E_b T_a w⟩ = (1/√(2π)) ⟨f̂, T_b E_{−a} ŵ⟩

contains information on f in [µ + a − σ, µ + a + σ] and information on f̂ in [µ̂ + b − σ̂, µ̂ + b + σ̂].


For a WFT with fixed window w, the time resolution is fixed at σ, the frequency resolution is fixed at σ̂. We can shift the window around in both time and frequency, but the uncertainty box always has the same shape. The shape can only be changed by changing the window w.
All times and frequencies are resolved equally well (or equally badly). A discrete WFT (with equally spaced time and frequency samples) gives a uniform tiling:

[Figure: uniform rectangular tiling of the time-frequency plane by the discrete WFT.]

For CWT, the test function is

|a|^{−1/2} w((t − b)/a) = T_b D_a w

with Fourier Transform E_{−b} D_{1/a} ŵ. Thus,

Φ_w f(a, b) = ⟨f, T_b D_a w⟩ = ⟨f̂, E_{−b} D_{1/a} ŵ⟩

contains information on f in [aµ + b − aσ, aµ + b + aσ], and information on f̂ in [µ̂/a − σ̂/a, µ̂/a + σ̂/a].
The uncertainty boxes have different shapes in different parts of the time-frequency plane.

[Figure: CWT uncertainty boxes in the (b, 1/a)-plane — at high frequencies the boxes give good time resolution but bad frequency resolution; at low frequencies, good frequency resolution but bad time resolution.]

The discrete wavelet transform results in a tiling:

[Figure: dyadic tiling of the time-frequency plane by the discrete wavelet transform.]

In many applications (such as speech analysis), high frequencies are present very briefly at the onset of a
sound, while lower frequencies are present later for longer periods.
To resolve all these frequencies well, the WFT has to be applied several times, with windows of varying
widths. The CWT resolves all frequencies simultaneously, localized in time to a level proportional to their
wavelength. That is what the wavelet people always claim, anyway.
Another way engineers like to look at it is this:
The window function in the WFT is a low-pass filter. The test functions are shifted and modulated versions of w, which act like windows of constant width in both time and frequency.

[Figure: the WFT window function w and its FT; a modulated and shifted window E_b T_a w and its FT.]

The mother wavelet in the CWT is a bandpass filter. The test functions are shifted and dilated versions of w, which change width in both time and frequency, but in opposite directions.

[Figure: the CWT mother wavelet w and its FT; a dilated and shifted mother wavelet and its FT.]

2.4. Suggested Reading


Chapters 1 and 2 in the Daubechies book [Dau92]. Chapter 3 in Chui [Chu92]. Portions of the articles
by Farge [Far92], Hlawatsch and Boudreaux-Bartels [HBB92], Rioul and Vetterli [RV91].

2.5. Bibliography
[Chu92] Charles K. Chui, An introduction to wavelets, Wavelet Analysis and Its Applications, vol. 1, Academic Press, Boston, 1992.
[Dau92] I. Daubechies, Ten lectures on wavelets, CBMS-NSF Regional Conference Series in Applied Mathematics, vol. 61, SIAM, Philadelphia, 1992.
[Far92] Marie Farge, Wavelet transforms and their application to turbulence, Annual Review of Fluid Mechanics 24 (1992), 395–457.
[HBB92] F. Hlawatsch and G. F. Boudreaux-Bartels, Linear and quadratic time-frequency signal representations, IEEE Signal Proc. Mag. (1992), 21–67.
[RV91] O. Rioul and M. Vetterli, Wavelets and signal processing, IEEE Signal Proc. Mag. 8 (1991), no. 4, 14–38.
CHAPTER 3

Discrete Time-Frequency Representations

Both the WFT and the CWT contain a lot of redundancy, since they map a function of one variable (time)
to a function of two variables (time and frequency). A natural question is whether certain subsets of the
transform values already suffice to characterize f, or to reconstruct it. We are especially interested in subsets
of discrete points, for numerical reasons.
Suppose we have a doubly indexed set of points (a_jk, b_jk) in the time-frequency plane, and we know Ψ_w f(a_jk, b_jk) or Φ_w f(a_jk, b_jk). The questions are

• Do these values uniquely determine f?
• Can we recover f from these discrete values?

We observe that the data we have is of the form

⟨f, ψ_jk⟩

for some set of functions ψ_jk (explicit formulas are given in the sections on WFT and CWT below). We can answer both of the above questions with "yes", and give the explicit reconstruction formula

f(t) = ∑_{jk} ⟨f, ψ_jk⟩ ψ̃_jk(t),

provided that the {ψ_jk} form a frame, and the {ψ̃_jk} form the dual frame. (See appendix B for details on frames.)
The question thus becomes: Given a particular (window function or mother wavelet) w(t), and a particular choice of (a_jk, b_jk), do the {ψ_jk} form a frame? Do they maybe form a biorthogonal or even orthonormal basis?
We will address these questions in more detail below.

3.1. The Windowed Fourier Transform

For the WFT, we have

Ψ_w f(a, b) = (1/√(2π)) ∫_{−∞}^{∞} f(t) e^{−ibt} w(t − a) dt.

For a fixed window w, Ψ_w f(a, b) contains time-frequency information on f in a rectangle of fixed size around (a, b). Thus, the most natural choice for (a_jk, b_jk) is equally spaced points: pick a sampling stepsize a_0 in time, a sampling stepsize b_0 in frequency, and choose (a_jk, b_jk) = (ja_0, kb_0).
[Figure: time-frequency sampling lattice for the WFT — equally spaced points with spacing a_0 in time and b_0 in frequency.]
The values of Ψ_w at these points are

Ψ_w f(ja_0, kb_0) = (1/√(2π)) ∫_{−∞}^{∞} f(t) e^{−ikb_0t} w(t − ja_0) dt = ⟨f, ψ_jk⟩,

where

ψ_jk(t) = (1/√(2π)) e^{ikb_0t} w(t − ja_0).

The questions are now: Under what conditions on w, a_0, b_0 do the {ψ_jk} form a frame? Under what conditions do they form a biorthogonal basis, or even an orthonormal basis? A partial answer is given below. For full details, look at the Daubechies book [Dau92].
First, it can be shown that the {ψ_jk} can only possibly form a frame, for any w, if a_0b_0 ≤ 2π. If a_0b_0 > 2π, the sampling is just too sparse. The {ψ_jk} can only possibly form an orthonormal basis if a_0b_0 = 2π. These are necessary conditions: if they are violated, it certainly will not work, but it may not work even if they are satisfied.
The good news is that under fairly mild conditions on the window function w we get a frame for small enough a_0, b_0. A simplified version of a more general sufficient condition given in Daubechies [Dau92] is this: Assume the window w satisfies

|w(t)| ≤ C(1 + |t|)^{−1−ε}

for some constants C, ε > 0, and that w is bounded away from 0 on some interval of length a_0. Then one can calculate a constant B_0 so that the {ψ_jk} form a frame (with frame bounds that can also be calculated) whenever b_0 ≤ B_0.
In words: assume the window function goes to zero fairly rapidly for large |t| (which is what you want anyway), and the time sampling step a_0 is chosen small enough so that the windows overlap. Then for a small enough frequency sampling step, the {ψ_jk} form a frame.
Orthonormal bases also exist. For example, choose w = χ_[0,L] (the characteristic function of the interval [0, L]). The choice a_0 = L, b_0 = 2π/L leads to an orthonormal basis

ψ_jk(t) = (1/√(2π)) e^{i(2π/L)kt} for t ∈ [jL, (j + 1)L], and 0 otherwise.

What you are really doing, of course, is to cut a function f up into sections of length L, and then write each piece as a Fourier Series.
The bad news is that no "good" orthonormal bases exist. This is a theorem of Balian-Low: If the {ψ_jk} constitute an orthonormal basis, then either σ or σ̂ does not exist.
This means that the window is either no good for localizing in time, or no good for localizing in frequency. You can get good localization in both time and frequency, but at the cost of using a frame (which contains redundant basis functions).

3.2. The Continuous Wavelet Transform

Before we imitate the developments above, we simplify the problem a little, by demanding

2π ∫_{−∞}^0 |ŵ(ξ)|²/|ξ| dξ = 2π ∫_0^{∞} |ŵ(ξ)|²/|ξ| dξ = c_w²/2

instead of

2π ∫_{−∞}^{∞} |ŵ(ξ)|²/|ξ| dξ = c_w².

This is not as restrictive as it looks: any real w, for example, satisfies it. (Recall that the FT of a real function satisfies ŵ(−ξ) = ŵ(ξ)*, so |ŵ(−ξ)| = |ŵ(ξ)|.)
Under the new condition, it suffices to consider positive a only for reconstruction:

f(t) = (2/c_w²) ∫_{−∞}^{∞} ∫_0^{∞} Φ_w f(a, b) |a|^{−1/2} w((t − b)/a) (da db)/|a|².

The discussion on uncertainty in chapter 2 suggests the time-frequency sampling

a = a_0^j,  b = k a_0^j b_0,  a_0 > 1, b_0 > 0

for this case. The resulting lattice of points in the time-frequency plane looks like this:

[Figure: time-frequency sampling lattice for the CWT — at scale a = a_0^j the time spacing is a_0^j b_0, so coarse scales are sampled sparsely in time and fine scales densely.]

The corresponding basis functions are

ψ_jk(t) = a_0^{−j/2} w(a_0^{−j} t − kb_0) = a_0^{−j/2} w((t − k a_0^j b_0)/a_0^j).

A necessary condition for an orthonormal basis is

2π ∫_{−∞}^0 |ŵ(ξ)|²/|ξ| dξ = 2π ∫_0^{∞} |ŵ(ξ)|²/|ξ| dξ = (b_0 ln a_0)/(2π).
Again, the exact conditions under which the {ψ_jk} constitute a frame are very technical (see Daubechies [Dau92]). A simplified sufficient condition is this: Assume that the mother wavelet w satisfies

|ŵ(ξ)| ≤ C|ξ|^α (1 + |ξ|)^{−γ}

for some α > 0, γ > 1 + α, and that a_0 is chosen so that |ŵ| is bounded away from zero on some interval [ξ_0, a_0ξ_0]. Then one can calculate a B_0 > 0 such that the {ψ_jk} form a frame for any choice of b_0 ≤ B_0, with frame bounds that can also be calculated.
In this case, we have conditions on the FT of w. Again, we have a minimal condition on the decay at 0 and at infinity, and a_0 must be chosen small enough so that the FTs of the mother wavelet at different scales overlap.
For the rest of this course (and these notes), we will only consider the case a_0 = 2, b_0 = 1. It turns out that even in this special case, many different types of orthonormal and biorthogonal bases exist.

3.3. Suggested Reading


Chapters 3 and 4 in the Daubechies book [Dau92]. Chapter 3 in Chui [Chu92]. Portions of the articles
by Farge [Far92], Hlawatsch and Boudreaux-Bartels [HBB92], Rioul and Vetterli [RV91].

3.4. Bibliography
[Chu92] Charles K. Chui, An introduction to wavelets, Wavelet Analysis and Its Applications, vol. 1, Academic Press, Boston, 1992.
[Dau92] I. Daubechies, Ten lectures on wavelets, CBMS-NSF Regional Conference Series in Applied Mathematics, vol. 61, SIAM, Philadelphia, 1992.
[Far92] Marie Farge, Wavelet transforms and their application to turbulence, Annual Review of Fluid Mechanics 24 (1992), 395–457.
[HBB92] F. Hlawatsch and G. F. Boudreaux-Bartels, Linear and quadratic time-frequency signal representations, IEEE Signal Proc. Mag. (1992), 21–67.
[RV91] O. Rioul and M. Vetterli, Wavelets and signal processing, IEEE Signal Proc. Mag. 8 (1991), no. 4, 14–38.
CHAPTER 4

Multi-Resolution Approximations

Let us now specialize the results of chapter 3 to the Discrete Wavelet Transform alone, with the special choice a_0 = 2, b_0 = 1. In the notation of chapter 3, this means

w_jk(t) = 2^{−j/2} w(2^{−j} t − k).

We will approach the same construction as before from a different angle, and change the notation to the standard wavelet notation: The mother wavelet is now called ψ instead of w, and the dilated and shifted versions of it are called

ψ^k_j(t) = 2^{k/2} ψ(2^k t − j).

Also, another function φ, called the scaling function, will come up.

4.1. The spaces V_j and the scaling function φ

Definition: A Multi-Resolution Approximation (MRA) of L² is a sequence {V_j}, j ∈ Z, of closed subspaces of L² such that
• V_j ⊂ V_{j+1} for all j ∈ Z
• ∪_j V_j is dense in L²
• ∩_j V_j = {0}
• f(t) ∈ V_j ⇔ f(2t) ∈ V_{j+1} for all j ∈ Z
• f(t) ∈ V_j ⇔ f(t − 2^{−j}k) ∈ V_j for all k ∈ Z, j ∈ Z
• There is a function φ(t) ∈ L², called the scaling function, such that

φ^0_j(t) = φ(t − j),  j ∈ Z,

form an orthonormal basis of V_0.

The definition implies that V_1 consists exactly of all the functions in V_0 compressed by a factor of 2, V_2 consists of the functions in V_0 compressed by a factor of 2² = 4, V_{−1} consists of the functions in V_0 dilated by a factor of 2, and so on. Once we know the scaling function φ, everything is determined.
Example: Let V_0 consist of the L² functions which are piecewise constant on intervals of the form [k, k + 1], k ∈ Z. V_1 then consists of L² functions piecewise constant on intervals of the form [k/2, (k + 1)/2], V_2 consists of L² functions piecewise constant on intervals of the form [k/4, (k + 1)/4], V_{−1} consists of L² functions piecewise constant on intervals of the form [2k, 2(k + 1)], and so on.
[Figure: piecewise constant functions in V_{−1}, V_0, V_1, V_2.]

The scaling function is easy to guess:

φ(t) = χ_[0,1].

It is orthogonal to its translates since there is never any overlap.

[Figure: φ = χ_[0,1] and one of its translates.]

Example: Let V_0 consist of the L² functions which are continuous and piecewise linear on intervals of the form [k, k + 1], k ∈ Z. The basic idea is the same:

[Figure: continuous, piecewise linear functions in V_{−1}, V_0, V_1, V_2.]
It is tempting to use the hat function as the scaling function. This generates a basis, but not an orthonormal one: the function overlaps its translates, and the inner product with its translates is not zero.

[Figure: the hat function, supported on [−1, 1].]

We will see below how to find φ.

Theorem 4.1. The integer translates of φ(t) ∈ L² are orthonormal if and only if

∑_k |φ̂(ξ + 2πk)|² = 1/(2π).

Proof: Let Φ(ξ) = ∑_k |φ̂(ξ + 2πk)|². Φ(ξ) is a 2π-periodic function. We need to show

Φ(ξ) = 1/(2π) ⇔ ⟨φ(t − j), φ(t)⟩ = δ_{j0}.

We calculate

⟨φ(t − j), φ(t)⟩ = ⟨φ̂(ξ)e^{−ijξ}, φ̂(ξ)⟩ = ∫_{−∞}^{∞} |φ̂(ξ)|² e^{−ijξ} dξ
  = ∑_k ∫_0^{2π} |φ̂(ξ + 2πk)|² e^{−ij(ξ+2πk)} dξ
  = ∫_0^{2π} {∑_k |φ̂(ξ + 2πk)|²} e^{−ijξ} dξ
  = ∫_0^{2π} Φ(ξ) e^{−ijξ} dξ
  = jth Fourier coefficient of Φ(ξ).

The function with Fourier coefficients δ_{0j} is the constant 1/(2π).
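For the Haar scaling function φ = χ_[0,1] this criterion can be checked numerically: |φ̂(ξ)|² = (1/(2π))·(sin(ξ/2)/(ξ/2))², so the periodized sum should be the constant 1/(2π). A small Python sketch, truncating the sum at |k| ≤ 5000:

```python
import numpy as np

def phi_hat_sq(xi):
    """|phi^(xi)|^2 for phi = chi_[0,1]."""
    x = xi / 2
    safe = np.where(x == 0, 1.0, x)              # avoid 0/0 at xi = 0
    s = np.where(x == 0, 1.0, np.sin(safe) / safe)
    return s**2 / (2 * np.pi)

xi = np.linspace(-np.pi, np.pi, 7)
k = np.arange(-5000, 5001)
Phi = phi_hat_sq(xi[:, None] + 2 * np.pi * k[None, :]).sum(axis=1)
print(2 * np.pi * Phi)    # all entries close to 1
```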
Assume now that g(t) ∈ L² is a function whose integer translates span V_0 and form an exact frame. That is, if f ∈ V_0, then we can write it uniquely as

f(t) = ∑_j α_j g(t − j),

and

A ∑_j |α_j|² ≤ ‖f‖² ≤ B ∑_j |α_j|²,

where 0 < A ≤ B are the frame bounds. As in the previous theorem, one can show that this implies

A/(2π) ≤ ∑_j |ĝ(ξ + 2πj)|² ≤ B/(2π).

If we define

φ̂(ξ) = (1/√(2π)) · ĝ(ξ) / (∑_j |ĝ(ξ + 2πj)|²)^{1/2},

then

∑_j |φ̂(ξ + 2πj)|² = 1/(2π),

so integer translates of φ are orthonormal. One can show that they are a basis of V_0. This technique allows us to find φ for a given V_0.
Example: Take V_0 to be the space of L² functions which are continuous and piecewise linear on intervals of the form [k, k + 1], k ∈ Z. An easy choice for g is the hat function (see previous example). We calculate

ĝ(ξ) = (1/√(2π)) · sin²(ξ/2)/(ξ/2)²,
∑_j |ĝ(ξ + 2πj)|² = (1/(2π)) · (2 + cos ξ)/3,
φ̂(ξ) = √(3/(2π)) · sin²(ξ/2) / ((ξ/2)² √(2 + cos ξ)).

This φ is one of the Battle-Lemarié wavelets [Bat87], [Lem88]. It extends all the way to infinity, but decays exponentially fast.

[Figure: the Battle-Lemarié scaling function φ.]

4.2. The spaces Wj and the mother wavelet ψ


There is a second sequence of subspaces of L² that turns out to be very useful. We define W_j as the orthogonal complement of V_j in V_{j+1}, so that

W_j ⊕ V_j = V_{j+1}.

(The ⊕ stands for "orthogonal sum". This means that the vectors in W_j plus the vectors in V_j generate all of V_{j+1}, and V_j and W_j are orthogonal.)
We can visualize this as follows:
[Diagram: V_{j+1} splits into V_j ⊕ W_j; V_j splits further into V_{j−1} ⊕ W_{j−1}, which splits into V_{j−2} ⊕ W_{j−2}, and so on.]
From the definition, W_j is orthogonal to V_j (written W_j ⊥ V_j), and therefore to all V_k and W_k with k < j, since they are subspaces of V_j. This implies that all W_j are mutually orthogonal.
One can show that the sequence {W_j} has very similar properties to a MRA. About the only difference is that any two W_j and W_k (j ≠ k) are orthogonal, whereas for any two V_j, V_k one is contained in the other. In detail:
• W_j ⊥ W_k for j ≠ k
• ⊕_j W_j is dense in L²
• ∩_j W_j = {0}
• f(t) ∈ W_j ⇔ f(2t) ∈ W_{j+1}
• f(t) ∈ W_j ⇔ f(t − 2^{−j}k) ∈ W_j for all k ∈ Z
• There exists a function ψ(t) ∈ L², called the mother wavelet, such that

ψ^0_j(t) = ψ(t − j),  j ∈ Z,

form an orthonormal basis of W_0.
So, altogether there are really only two functions involved in the MRA approach: the scaling function φ and the mother wavelet ψ. We define the scaled and shifted versions by

φ^k_j(t) = 2^{k/2} φ(2^k t − j)
ψ^k_j(t) = 2^{k/2} ψ(2^k t − j)

Then we know that
• For fixed k, the {φ^k_j}, j ∈ Z, form an orthonormal basis of V_k
• For fixed k, the {ψ^k_j}, j ∈ Z, form an orthonormal basis of W_k
• The {ψ^k_j}, j, k ∈ Z, form an orthonormal basis of L²

Thus, every function f ∈ L² can be written as

f(t) = ∑_{j,k} ⟨f, ψ^k_j⟩ ψ^k_j(t).

4.3. Constructing Multi-Resolution Approximations


How do we find scaling functions and wavelets? There are two basic approaches:
Approach 1:
• Pick a suitable space V0 (like piecewise linear functions)
• Find a basis function which generates an exact frame
• Orthonormalize this basis to get φ (from which you can calculate ψ).
This approach was taken independently by Battle [Bat87] and Lemarié [Lem88]. It is also mentioned in Chui's book [Chu92].
Approach 2:
• Pick a suitable φ which is orthonormal to its integer translates
• Let V_0 be the vector space spanned by the translates of φ, whatever it happens to look like; the other V_j are then the dilated versions of V_0.
• The only thing left to verify is that this produces all of L², not just a subspace of it.
This is the approach taken by Daubechies [Dau92], and it is also the approach we will take.
It turns out that everything you need to know about φ, ψ is based on two sets of coefficients {h_k}, {g_k}. These coefficients need to satisfy certain conditions to produce suitable φ, ψ. The best way to derive these conditions is via Fourier transforms, but we will postpone that to a later chapter. Here, I will derive some necessary properties of {h_k}, {g_k} directly.
The basic observation is that since V_0, W_0 are subspaces of V_1, their basis functions φ and ψ must have an expansion in the basis of V_1. Thus, there have to be so-called recursion coefficients {h_k} and {g_k} in ℓ² so that

φ(t) = ∑_k h_k φ^1_k(t) = √2 ∑_k h_k φ(2t − k)
ψ(t) = ∑_k g_k φ^1_k(t) = √2 ∑_k g_k φ(2t − k)

What are the requirements on {h_k}?

• Assume that ∫φ(t) dt ≠ 0 (this is necessary, as we will see later). Then

∫φ(t) dt = √2 ∑_k h_k ∫φ(2t − k) dt = √2 ∑_k h_k (1/2) ∫φ(t) dt = (1/√2) ∑_k h_k ∫φ(t) dt,

so we must have

∑_k h_k = √2.

• By assumption, the φ^0_j are orthonormal. Thus

δ_{0j} = ⟨φ(t), φ(t − j)⟩ = ∫ φ(t) φ̄(t − j) dt
  = ∫ {√2 ∑_k h_k φ(2t − k)} {√2 ∑_ℓ h̄_ℓ φ̄(2t − 2j − ℓ)} dt
  = 2 ∑_{k,ℓ} h_k h̄_ℓ ⟨φ(2t − k), φ(2t − 2j − ℓ)⟩
  = ∑_k h_k h̄_{k−2j},

since

⟨φ(2t − k), φ(2t − 2j − ℓ)⟩ = 0 if k ≠ 2j + ℓ, and 1/2 if k = 2j + ℓ.

To display the result once more, for easier reference:

∑_k h_k h̄_{k−2j} = δ_{0j}.

• It can be proved from the two boxed identities that

∑_k h_{2k} = ∑_k h_{2k+1} = 1/√2.

This will follow much more easily from the Fourier transform approach later, so I won't bother to prove it here.
The conditions on {g_k} are similar and will also follow much more easily later, so I will just summarize everything here: In order to generate a MRA, the recursion coefficients must satisfy

(1) ∑_k h_{2k} = ∑_k h_{2k+1} = 1/√2, which of course implies ∑_k h_k = √2.
(2) ∑_k g_{2k} = −∑_k g_{2k+1} = (1/√2)·α, where |α| = 1. This implies ∑_k g_k = 0.
(3) ∑_k h_k h̄_{k−2j} = δ_{0j} for all j (orthonormality of translates of φ).
(4) ∑_k g_k ḡ_{k−2j} = δ_{0j} for all j (orthonormality of translates of ψ).
(5) ∑_k h_k ḡ_{k−2j} = 0 for all j (orthogonality of translates of φ and ψ).

If the {h_k} satisfy conditions (1) and (3), then the choice

g_k = (−1)^k h_{1−k}

will satisfy the rest. (Exercise: prove that!)

Remark: The conditions listed here are necessary but not sufficient. They prove that we get a ladder of spaces with the right nesting properties, but we need additional conditions to insure that ∪_k V_k is dense in L² and ∩_k V_k = {0}.
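These conditions are easy to test numerically for a concrete filter. The following Python sketch checks (1)–(5) for the Daubechies coefficients with two vanishing moments (they appear again in chapters 5 and 6), using the choice g_k = (−1)^k h_{1−k}:

```python
import numpy as np

s3 = np.sqrt(3.0)
h = dict(enumerate(np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2.0))))
g = {1 - k: (-1.0)**(1 - k) * hk for k, hk in h.items()}   # g_k = (-1)^k h_{1-k}

def corr(a, b, j):
    """sum_k a_k b_{k-2j} (all coefficients are real here)."""
    return sum(ak * b.get(k - 2*j, 0.0) for k, ak in a.items())

print(sum(v for k, v in h.items() if k % 2 == 0),   # both sums equal 1/sqrt(2)
      sum(v for k, v in h.items() if k % 2 != 0))
print(sum(g.values()))                              # 0
for j in (-1, 0, 1):
    print(j, corr(h, h, j), corr(g, g, j), corr(h, g, j))   # (3), (4), (5)
```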

Example: Let V_0 be the space of L² functions piecewise constant on intervals [k, k + 1]. We know

φ(t) = χ_[0,1](t) = χ_[0,1/2](t) + χ_[1/2,1](t) = φ(2t) + φ(2t − 1) = (1/√2) φ^1_0(t) + (1/√2) φ^1_1(t),

so

h_0 = h_1 = 1/√2.

(Exercise: Check that these satisfy the properties listed above.)
Thus, we should use

g_0 = 1/√2,  g_1 = −1/√2.

This means

ψ(t) = χ_[0,1/2] − χ_[1/2,1].

[Figure: the Haar wavelet.]
4.4. Suggested Reading
Chapter 5 in Daubechies [Dau92]. Chapter 5 in Chui [Chu92]. Papers by Battle [Bat87], Lemarié [Lem88],
Jawerth and Sweldens [JS], Mallat [Mal89], Strang [Str89].

4.5. Bibliography
[Bat87] Guy Battle, A block spin construction of ondelettes. Part I: Lemarié functions, Comm. Math. Phys. 110 (1987), 601–615.
[Chu92] Charles K. Chui, An introduction to wavelets, Wavelet Analysis and Its Applications, vol. 1, Academic Press, Boston, 1992.
[Dau92] I. Daubechies, Ten lectures on wavelets, CBMS-NSF Regional Conference Series in Applied Mathematics, vol. 61, SIAM, Philadelphia, 1992.
[JS] Björn Jawerth and Wim Sweldens, An overview of wavelet based multiresolution analyses, available by anonymous ftp from maxwell.math.scarolina.edu as /pub/wavelet/papers/overview.ps.
[Lem88] P. G. Lemarié, Une nouvelle base d'ondelettes de L²(Rⁿ), J. Math. Pures Appl. 67 (1988), 227–236.
[Mal89] S. Mallat, A theory for multiresolution signal decomposition: The wavelet representation, IEEE Trans. Pattern Analysis and Machine Intelligence 11 (1989), no. 7, 674–693.
[Str89] Gilbert Strang, Wavelets and dilation equations: A brief introduction, SIAM Rev. 31 (1989), no. 4, 614–627.
CHAPTER 5

Algorithms

We know from the previous chapter that any function f ∈ L² can be written as

f(t) = ∑_{j,k} ⟨f, ψ^k_j⟩ ψ^k_j(t),

where ψ is the mother wavelet. We can also write out partial decompositions. Pick any n, and any m < n. Then

V_n = V_{n−1} ⊕ W_{n−1}
    = V_{n−2} ⊕ W_{n−2} ⊕ W_{n−1}
    = ⋯
    = V_m ⊕ W_m ⊕ W_{m+1} ⊕ ⋯ ⊕ W_{n−1}.

For the same decomposition, written out in terms of vectors, take any f ∈ V_n. We can write

f(t) = ∑_j ⟨f, φ^n_j⟩ φ^n_j(t)
     = ∑_j ⟨f, φ^{n−1}_j⟩ φ^{n−1}_j(t) + ∑_j ⟨f, ψ^{n−1}_j⟩ ψ^{n−1}_j(t)
     = ⋯
     = ∑_j ⟨f, φ^m_j⟩ φ^m_j(t) + ∑_j ⟨f, ψ^m_j⟩ ψ^m_j(t) + ⋯ + ∑_j ⟨f, ψ^{n−1}_j⟩ ψ^{n−1}_j(t)

Let us introduce the abbreviations

s^k_j = ⟨f, φ^k_j⟩,
d^k_j = ⟨f, ψ^k_j⟩.

The original representation of f ∈ V_n is then

(5.1)  f(t) = ∑_j s^n_j φ^n_j(t).

The wavelet decomposition of f from level n down to level m is

(5.2)  f(t) = ∑_j s^m_j φ^m_j(t) + ∑_{k=m}^{n−1} ∑_j d^k_j ψ^k_j(t).

In principle, the coefficients s^k_j, d^k_j can all be calculated directly as inner products of f with φ^k_j, ψ^k_j, but that would be very time consuming. The wavelet decomposition algorithm goes from representation (5.1) directly to (5.2); the wavelet reconstruction algorithm goes the opposite way.

5.1. Inner Products of φ, ψ

At the heart of both decomposition and reconstruction algorithms is the calculation of various inner products of φ^k_j, ψ^k_j. Calculating them all is too time-consuming. It suffices to do them for φ, ψ at the same level, and in adjacent levels.
In detail: we need to calculate the following inner products (results are given if already known):

⟨φ^n_j, φ^n_k⟩ = δ_{jk}
⟨ψ^n_j, ψ^n_k⟩ = δ_{jk}
⟨φ^n_j, ψ^n_k⟩ = 0
⟨φ^n_j, φ^{n−1}_k⟩
⟨ψ^n_j, ψ^{n−1}_k⟩ = 0
⟨φ^n_j, ψ^{n−1}_k⟩

First, we need to generalize the recursion formula

φ^0_0(t) = φ(t) = √2 ∑_i h_i φ(2t − i) = ∑_i h_i φ^1_i(t).

We calculate

φ^{n−1}_k(t) = 2^{(n−1)/2} φ(2^{n−1}t − k) = 2^{(n−1)/2} √2 ∑_i h_i φ(2·[2^{n−1}t − k] − i)
             = 2^{n/2} ∑_i h_i φ(2^n t − 2k − i) = ∑_i h_i φ^n_{2k+i}(t)

and likewise for ψ. In summary:

φ^{n−1}_k(t) = ∑_i h_i φ^n_{2k+i}(t)
ψ^{n−1}_k(t) = ∑_i g_i φ^n_{2k+i}(t)

The missing inner products in the list above are therefore

⟨φ^n_j, φ^{n−1}_k⟩ = ⟨φ^n_j, ∑_i h_i φ^n_{2k+i}⟩ = ∑_i h̄_i ⟨φ^n_j, φ^n_{2k+i}⟩ = h̄_{j−2k},

and likewise

⟨φ^n_j, ψ^{n−1}_k⟩ = ḡ_{j−2k}.

5.2. The Algorithm

Now we are ready. For the decomposition, we use

s^{n−1}_k = ⟨f, φ^{n−1}_k⟩ = ⟨∑_j s^n_j φ^n_j, φ^{n−1}_k⟩ = ∑_j s^n_j ⟨φ^n_j, φ^{n−1}_k⟩ = ∑_j h̄_{j−2k} s^n_j,

and likewise

d^{n−1}_k = ∑_j ḡ_{j−2k} s^n_j.

For the reconstruction,

s^n_j = ⟨f, φ^n_j⟩ = ∑_k s^{n−1}_k ⟨φ^{n−1}_k, φ^n_j⟩ + ∑_k d^{n−1}_k ⟨ψ^{n−1}_k, φ^n_j⟩
      = ∑_k {h_{j−2k} s^{n−1}_k + g_{j−2k} d^{n−1}_k}.

Let us display the results once more, for easy reference:

Decomposition:

s^{n−1}_k = ∑_j h̄_{j−2k} s^n_j
d^{n−1}_k = ∑_j ḡ_{j−2k} s^n_j

Reconstruction:

s^n_j = ∑_k {h_{j−2k} s^{n−1}_k + g_{j−2k} d^{n−1}_k}

This form is easy to manipulate for theoretical calculations. For practical programming, it is better to sum over the indices of h and g, since that is typically a much shorter sum:

Decomposition:

s^{n−1}_k = ∑_j h̄_j s^n_{j+2k}
d^{n−1}_k = ∑_j ḡ_j s^n_{j+2k}

Reconstruction:

s^n_{2j} = ∑_k {h_{2k} s^{n−1}_{j−k} + g_{2k} d^{n−1}_{j−k}}
s^n_{2j+1} = ∑_k {h_{2k+1} s^{n−1}_{j−k} + g_{2k+1} d^{n−1}_{j−k}}

A complete decomposition or reconstruction is done in stages:

{s^n_j} ↔ {s^{n−1}_k, d^{n−1}_k} ↔ {s^{n−2}_ℓ, d^{n−2}_ℓ, d^{n−1}_k} ↔ ...
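Here is a minimal Python sketch of one stage of these formulas, assuming real coefficients and periodic vectors (periodization is discussed in section 5.4); the Haar filters and the test data are only for illustration:

```python
import numpy as np

def decompose(s, h, g):
    """One stage: s^n -> (s^{n-1}, d^{n-1}), summing over the filter indices."""
    N = len(s)
    s1, d1 = np.zeros(N // 2), np.zeros(N // 2)
    for k in range(N // 2):
        for j in range(len(h)):
            s1[k] += h[j] * s[(j + 2*k) % N]   # s^{n-1}_k = sum_j h_j s^n_{j+2k}
            d1[k] += g[j] * s[(j + 2*k) % N]   # d^{n-1}_k = sum_j g_j s^n_{j+2k}
    return s1, d1

def reconstruct(s1, d1, h, g):
    """Adjoint of decompose: s^n_j = sum_k h_{j-2k} s^{n-1}_k + g_{j-2k} d^{n-1}_k."""
    N = 2 * len(s1)
    s = np.zeros(N)
    for k in range(len(s1)):
        for j in range(len(h)):
            s[(j + 2*k) % N] += h[j] * s1[k] + g[j] * d1[k]
    return s

h = np.array([1.0, 1.0]) / np.sqrt(2.0)        # Haar filters
g = np.array([1.0, -1.0]) / np.sqrt(2.0)
s3 = np.array([1.0, 2, 3, 4, 3, -1, 0, 1])
s2, d2 = decompose(s3, h, g)
print(s2, d2)
print(np.allclose(reconstruct(s2, d2, h, g), s3))   # perfect reconstruction
```

Since the filters here are real and orthonormal, reconstruct is simply the adjoint (transpose) of decompose — which is exactly the matrix viewpoint of the next section.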

5.3. Matrix Viewpoint

Any linear operator between finite-dimensional vector spaces can be written as a matrix. In our case, the spaces are infinite-dimensional, but the basis is countable, so we can still do this with infinite matrices.
The space V_n is split into two orthogonal subspaces V_{n−1} and W_{n−1}:

[Diagram: V_n maps onto V_{n−1} via P_{n−1} and onto W_{n−1} via Q_{n−1}.]

There are four linear maps associated with this splitting:

P_{n−1} = orthogonal projection from V_n onto V_{n−1}
Q_{n−1} = orthogonal projection from V_n onto W_{n−1}
P*_{n−1} = identity map which embeds V_{n−1} into V_n
Q*_{n−1} = identity map which embeds W_{n−1} into V_n

These operators satisfy the relations

(5.3)
P_{n−1} P*_{n−1} = I   (identity on V_{n−1})
Q_{n−1} Q*_{n−1} = I   (identity on W_{n−1})
P_{n−1} Q*_{n−1} = Q_{n−1} P*_{n−1} = 0
P*_{n−1} P_{n−1} + Q*_{n−1} Q_{n−1} = I   (identity on V_n)

Conversely, assume that V_{n−1}, W_{n−1} are any two subspaces of V_n, P_{n−1}, Q_{n−1} are any two maps from V_n into these subspaces, and P*_{n−1}, Q*_{n−1} the adjoint operators (= complex conjugate transposes). If the relations (5.3) are satisfied, then V_{n−1}, W_{n−1} are orthogonal, V_n = V_{n−1} ⊕ W_{n−1}, and P_{n−1}, Q_{n−1} are the orthogonal projections onto V_{n−1}, W_{n−1}.
The fourth relation in (5.3) is the decomposition/reconstruction property:

s^{n−1} = P_{n−1} s^n
d^{n−1} = Q_{n−1} s^n
s^n = P*_{n−1} s^{n−1} + Q*_{n−1} d^{n−1}
What do the matrices look like? In general, if P is any matrix,

y = Px ⇔ y_k = ∑_j P_{kj} x_j.

In our case,

s^{n−1}_k = ∑_j h̄_{j−2k} s^n_j,

so the kj entry of P_{n−1} is h̄_{j−2k}. Each row of the matrix contains the h̄_j in sequence, but shifted two places to the right compared to the row above it:

              (col −1)  (col 0)  (col 1)
            [ …  h̄_1     h̄_2     h̄_3    … ]   (row −1)
P_{n−1}  =  [ …  h̄_{−1}  h̄_0     h̄_1    … ]   (row 0)
            [ …  h̄_{−3}  h̄_{−2}  h̄_{−1} … ]   (row 1)

P*_{n−1} is the complex conjugate transpose of P_{n−1} (this follows from linear algebra theory, but you can also find it directly from the formulas in the last section):

            [ …  h_1  h_{−1}  h_{−3} … ]
P*_{n−1} =  [ …  h_2  h_0     h_{−2} … ]
            [ …  h_3  h_1     h_{−1} … ]

Q_{n−1}, Q*_{n−1} look the same as P_{n−1}, P*_{n−1}, but using g instead of h.
We can ask what properties of g, h are necessary to make relations (5.3) work. It turns out that we need

∑_k h_k h̄_{k−2m} = δ_{0m},
∑_k g_k ḡ_{k−2m} = δ_{0m},
∑_k h_k ḡ_{k−2m} = 0,

exactly the same conditions we had before.
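For finite, periodized versions of these matrices the relations can be checked directly. A sketch with the Haar filters (the length and the filters are arbitrary choices):

```python
import numpy as np

def band(c, N):
    """(N/2) x N periodized matrix with entries M[k, j] = c_{j-2k mod N}."""
    M = np.zeros((N // 2, N))
    for k in range(N // 2):
        for j, cj in enumerate(c):
            M[k, (j + 2*k) % N] = cj
    return M

N = 8
P = band(np.array([1.0, 1.0]) / np.sqrt(2.0), N)    # rows of shifted h's
Q = band(np.array([1.0, -1.0]) / np.sqrt(2.0), N)   # rows of shifted g's

I = np.eye(N // 2)
print(np.allclose(P @ P.T, I), np.allclose(Q @ Q.T, I))   # P P* = I, Q Q* = I
print(np.allclose(P @ Q.T, 0))                            # P Q* = Q P* = 0
print(np.allclose(P.T @ P + Q.T @ Q, np.eye(N)))          # P*P + Q*Q = I
```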
5.4. Finite-Dimensional Setting
So far, everything took place in an infinite-dimensional space. To do any actual calculations, we need to
restrict ourselves to a finite-dimensional setting. In principle, everything works the same as before, but there
are a few pitfalls. The following detailed examples should make things clear.
Example: Let us use the Haar wavelets, defined by

{h_0, h_1} = {√(1/2), √(1/2)}
{g_0, g_1} = {√(1/2), −√(1/2)}

on the original vector of length 8

s^3 = {s^3_0, …, s^3_7} = {1, 2, 3, 4, 3, −1, 0, 1}.
We calculate

s^2_0 = h̄_0 s^3_0 + h̄_1 s^3_1 = √(1/2)·1 + √(1/2)·2 = 3√(1/2)
s^2_1 = h̄_0 s^3_2 + h̄_1 s^3_3 = √(1/2)·3 + √(1/2)·4 = 7√(1/2)
…
d^2_0 = ḡ_0 s^3_0 + ḡ_1 s^3_1 = √(1/2)·1 − √(1/2)·2 = −√(1/2)
d^2_1 = ḡ_0 s^3_2 + ḡ_1 s^3_3 = √(1/2)·3 − √(1/2)·4 = −√(1/2)
…
s^1_0 = h̄_0 s^2_0 + h̄_1 s^2_1 = √(1/2)·(3√(1/2)) + √(1/2)·(7√(1/2)) = 5
s^1_1 = h̄_0 s^2_2 + h̄_1 s^2_3 = √(1/2)·(2√(1/2)) + √(1/2)·(√(1/2)) = 3/2
…

For the reconstruction,

s^1_0 = h_0 s^0_0 + g_0 d^0_0 = √(1/2)·((13/2)√(1/2)) + √(1/2)·((7/2)√(1/2)) = 5
s^1_1 = h_1 s^0_0 + g_1 d^0_0 = √(1/2)·((13/2)√(1/2)) − √(1/2)·((7/2)√(1/2)) = 3/2
s^2_0 = h_0 s^1_0 + g_0 d^1_0 = √(1/2)·5 + √(1/2)·(−2) = 3√(1/2)
s^2_1 = h_1 s^1_0 + g_1 d^1_0 = √(1/2)·5 − √(1/2)·(−2) = 7√(1/2)
s^2_2 = h_0 s^1_1 + g_0 d^1_1 = √(1/2)·(3/2) + √(1/2)·(1/2) = 2√(1/2)
s^2_3 = h_1 s^1_1 + g_1 d^1_1 = √(1/2)·(3/2) − √(1/2)·(1/2) = √(1/2)
s^3_0 = h_0 s^2_0 + g_0 d^2_0 = √(1/2)·(3√(1/2)) + √(1/2)·(−√(1/2)) = 1
s^3_1 = h_1 s^2_0 + g_1 d^2_0 = √(1/2)·(3√(1/2)) − √(1/2)·(−√(1/2)) = 2
…

At each stage, the new coefficients being calculated can all be stored in place of the old ones:

s^3_0      s^2_0      s^1_0      s^0_0
s^3_1      s^2_1      s^1_1      d^0_0
s^3_2      s^2_2      d^1_0      d^1_0
s^3_3  ⇔   s^2_3  ⇔   d^1_1  ⇔   d^1_1
s^3_4      d^2_0      d^2_0      d^2_0
s^3_5      d^2_1      d^2_1      d^2_1
s^3_6      d^2_2      d^2_2      d^2_2
s^3_7      d^2_3      d^2_3      d^2_3

1      3√(1/2)     5         (13/2)√(1/2)
2      7√(1/2)     3/2       (7/2)√(1/2)
3      2√(1/2)     −2        −2
4  ⇔   √(1/2)  ⇔   1/2    ⇔  1/2
3      −√(1/2)     −√(1/2)   −√(1/2)
−1     −√(1/2)     −√(1/2)   −√(1/2)
0      4√(1/2)     4√(1/2)   4√(1/2)
1      −√(1/2)     −√(1/2)   −√(1/2)

Example: Take the same setup as in the previous example, but use an initial vector of length 6

s^3 = {1, 2, 3, 4, 3, −1}.

We get

s^3_0      s^2_0      s^1_0      s^0_0
s^3_1      s^2_1      s^1_1      d^0_0
s^3_2  ⇔   s^2_2  ⇔   d^1_0  ⇔   d^1_0
s^3_3      d^2_0      d^1_1      d^1_1
s^3_4      d^2_1      d^2_0      d^2_0
s^3_5      d^2_2      d^2_1      d^2_1
                      d^2_2      d^2_2

1      3√(1/2)     5         6√(1/2)
2      7√(1/2)     1         4√(1/2)
3  ⇔   2√(1/2) ⇔   −2     ⇔  −2
4      −√(1/2)     1         1
3      −√(1/2)     −√(1/2)   −√(1/2)
−1     4√(1/2)     −√(1/2)   −√(1/2)
                   4√(1/2)   4√(1/2)

Notice how it is not possible this time to store the decomposed vector in the same place as the original. That happens whenever a vector of odd length gets decomposed. Thus, if the original vector length N contains a factor of 2^ℓ, we can do ℓ stages of decomposition without problems. If we want to do more, the decomposed vectors get longer.
Changing the vector length in the middle of a decomposition is a programming headache. Fortunately, there is an easy way out: just pad the original vector with zeros to a length that is a power of 2 (or at least has a factor that is a large power of 2, so you can do several stages).
Notice, by the way, that if you reconstruct the data in this example, you get a vector of length 8 with two zeros at the end (unless you know the original vector is supposed to be shorter, and take that into account).
Example: Let us now use a longer wavelet, like the Daubechies wavelet with two vanishing moments. The coefficients are

h_0 = −g_3 = (1 + √3)/(4√2) ≈ 0.482962913144534
h_1 = g_2 = (3 + √3)/(4√2) ≈ 0.836516303737808
h_2 = −g_1 = (3 − √3)/(4√2) ≈ 0.224143868042013
h_3 = g_0 = (1 − √3)/(4√2) ≈ −0.129409522551260
Take again

s^3 = {s^3_0, …, s^3_7} = {1, 2, 3, 4, 3, −1, 0, 1}.

If we apply the formulas in a straightforward manner, by treating s^3_j as 0 unless j is between 0 and 7, we again pick up extra coefficients during the decomposition. This happens for any wavelet with more than two coefficients.
The usual way out is to treat all vectors as periodic. It turns out that this works. Thus, we use s^3_8 = s^3_0, s^3_9 = s^3_1, etc. If necessary, this also applies in the negative direction: s^3_{−1} = s^3_7, and so on. The period adapts to the vector length: s^2 is periodic with length 4, s^1 is periodic with length 2, and so on.
Let me write out the complete formulas for the s^k_j in this case:

s^2_0 = h̄_0 s^3_0 + h̄_1 s^3_1 + h̄_2 s^3_2 + h̄_3 s^3_3
s^2_1 = h̄_0 s^3_2 + h̄_1 s^3_3 + h̄_2 s^3_4 + h̄_3 s^3_5
s^2_2 = h̄_0 s^3_4 + h̄_1 s^3_5 + h̄_2 s^3_6 + h̄_3 s^3_7
s^2_3 = h̄_0 s^3_6 + h̄_1 s^3_7 + h̄_2 s^3_0 + h̄_3 s^3_1
s^1_0 = h̄_0 s^2_0 + h̄_1 s^2_1 + h̄_2 s^2_2 + h̄_3 s^2_3
s^1_1 = h̄_0 s^2_2 + h̄_1 s^2_3 + h̄_2 s^2_0 + h̄_3 s^2_1
s^0_0 = h̄_0 s^1_0 + h̄_1 s^1_1 + h̄_2 s^1_0 + h̄_3 s^1_1

The same applies for the reconstruction (the wrap-around now goes in the negative direction, e.g. s^2_{−1} = s^2_3):

s^1_0 = h_0 s^0_0 + h_2 s^0_0 + g_0 d^0_0 + g_2 d^0_0
s^1_1 = h_1 s^0_0 + h_3 s^0_0 + g_1 d^0_0 + g_3 d^0_0
s^2_0 = h_0 s^1_0 + h_2 s^1_1 + g_0 d^1_0 + g_2 d^1_1
s^2_1 = h_1 s^1_0 + h_3 s^1_1 + g_1 d^1_0 + g_3 d^1_1
s^2_2 = h_0 s^1_1 + h_2 s^1_0 + g_0 d^1_1 + g_2 d^1_0
s^2_3 = h_1 s^1_1 + h_3 s^1_0 + g_1 d^1_1 + g_3 d^1_0
s^3_0 = h_0 s^2_0 + h_2 s^2_3 + g_0 d^2_0 + g_2 d^2_3
s^3_1 = h_1 s^2_0 + h_3 s^2_3 + g_1 d^2_0 + g_3 d^2_3
s^3_2 = h_0 s^2_1 + h_2 s^2_0 + g_0 d^2_1 + g_2 d^2_0
s^3_3 = h_1 s^2_1 + h_3 s^2_0 + g_1 d^2_1 + g_3 d^2_0
s^3_4 = h_0 s^2_2 + h_2 s^2_1 + g_0 d^2_2 + g_2 d^2_1
s^3_5 = h_1 s^2_2 + h_3 s^2_1 + g_1 d^2_2 + g_3 d^2_1
s^3_6 = h_0 s^2_3 + h_2 s^2_2 + g_0 d^2_3 + g_2 d^2_2
s^3_7 = h_1 s^2_3 + h_3 s^2_2 + g_1 d^2_3 + g_3 d^2_2
Here are the actual numbers (to 6 decimals):

1      2.31079      5.80232      4.59619
2      5.5968       0.697677     3.60953
3      0.482963    −1.53678     −1.53678
4  ⇔   0.801841  ⇔ −1.01226  ⇔  −1.01226
3      0            0            0
−1     1.70771      1.70771      1.70771
0     −0.647048    −0.647048    −0.647048
1     −0.353553    −0.353553    −0.353553

Another possibility is to use end-point correction formulas. That means that near the ends we use different
coefficients for decomposition and reconstruction. This has certain advantages but is harder to program.
See [CDV] for more details.

5.5. Suggested Reading


Section 5.6 in Daubechies [Dau92]. The survey article by Jawerth and Sweldens [JS]. An article by Cody [Cod92] in Dr. Dobb's Journal of Computer Calisthenics (or whatever the full title is). This article has a computer program with it, but that does not seem very useful for our purposes (for example, it is limited to wavelets with no more than 6 coefficients).

5.6. Bibliography
[CDV] A. Cohen, I. Daubechies, and P. Vial, Wavelets and fast wavelet transform on the interval, preprint.
[Cod92] Mac A. Cody, The fast wavelet transform: Beyond Fourier transforms, Dr. Dobb's J. (1992), 16–28, 100–101.
[Dau92] I. Daubechies, Ten lectures on wavelets, CBMS-NSF Regional Conference Series in Applied Mathematics, vol. 61, SIAM, Philadelphia, 1992.
[JS] Björn Jawerth and Wim Sweldens, An overview of wavelet based multiresolution analyses, available by anonymous ftp from maxwell.math.scarolina.edu as /pub/wavelet/papers/overview.ps.
CHAPTER 6

Construction of Wavelets

So far, we have discussed the ideas behind the introduction of wavelets, and the mechanics of wavelet-based
decomposition and reconstruction algorithms. The next big question is: how do we find wavelets?
In this section, we will address the question of finding φ, ψ once we have the recursion coefficients {hk },
{gk }. After that, we will worry about how to find these coefficients. As a preparation for both of these tasks,
we introduce some auxiliary functions.
Assume {h_k}, {g_k} are given and satisfy

(6.1)  ∑_k |h_k| < ∞,  ∑_k |g_k| < ∞,  ∑_k |k h_k| < ∞,  ∑_k |k g_k| < ∞.

We define

H(ξ) = (1/√2) ∑_k h_k e^{−ikξ},
G(ξ) = (1/√2) ∑_k g_k e^{−ikξ}.

Obviously, H and G are 2π-periodic functions, given in the form of Fourier Series. If {h_k}, {g_k} have finite length, functions of this type are called trigonometric polynomials.
The conditions in (6.1) insure that the series converge everywhere to differentiable functions.
We also define

h(z) = (1/√2) ∑_k h_k z^k,
g(z) = (1/√2) ∑_k g_k z^k.

These functions are polynomials or power series (with negative exponents allowed).
We see from the definition that

H(ξ) = h(e^{−iξ}),
G(ξ) = g(e^{−iξ}).

What is the point of these functions? The coefficients {hk }, {gk }, the functions H, G and the functions
h, g all contain exactly the same information, just in different form. All of them have their uses, and we
will switch back and forth between them, using whichever of them is easiest to use at the moment. This is
similar to switching back and forth between the time and frequency domains, using the Fourier transform.
There are three approaches I am aware of that construct φ from the {hk }. All of them are useful, so we
will do them all. Once you have φ, it is of course easy to find ψ from the recursion relation.

6.1. Approach 1
We take the Fourier Transform of the recursion relation

    φ(t) = √2 Σ_k h_k φ(2t − k)

and find

    φ̂(ξ) = √2 Σ_k h_k (1/2) e^{−ikξ/2} φ̂(ξ/2) = { (1/√2) Σ_k h_k e^{−ikξ/2} } φ̂(ξ/2) = H(ξ/2) φ̂(ξ/2).

This relationship is very important, so I’ll highlight it:

    φ̂(ξ) = H(ξ/2) φ̂(ξ/2)
Repeating this process, we find formally

    φ̂(ξ) = H(ξ/2) H(ξ/4) φ̂(ξ/4) = ··· = { ∏_{j=1}^∞ H(ξ/2^j) } φ̂(0).

Conditions (6.1) insure that the infinite product converges. Indeed, since H is differentiable at 0, and
H(0) = 1, we have
    |H(ξ)| ≤ 1 + c·|ξ| ≤ e^{c|ξ|}

for ξ near 0. For any ξ, choose J large enough so that the previous estimate holds for ξ/2^n, n ≥ J. Then

    ∏_{j=J}^∞ |H(ξ/2^j)| ≤ e^{c|ξ| Σ_{j=J}^∞ 2^{−j}} = e^{c|ξ|/2^{J−1}} < ∞.

The total product is then the product of the first J terms, times something finite.
In general, it is too hard to actually evaluate the infinite product, but we can make some useful observations.
If ∫ φ(t) dt = √(2π) φ̂(0) = 0, then φ = 0, which is not very interesting. Thus, the scaling function must have a nonzero integral. Usually, we require

    ∫ φ(t) dt = 1  ⇔  φ̂(0) = 1/√(2π).

It can be shown that this implies

    ∫ |φ(t)|² dt = 1,

which is exactly the orthonormalization condition we want.


To summarize, and give the corresponding formula for ψ:

    φ̂(ξ) = (1/√(2π)) ∏_{j=1}^∞ H(ξ/2^j),

    ψ̂(ξ) = G(ξ/2) φ̂(ξ/2) = (1/√(2π)) G(ξ/2) ∏_{j=2}^∞ H(ξ/2^j).
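To get a feeling for the infinite product, it can be evaluated numerically by simply truncating it. The following Matlab lines are my own illustration (not part of the course software); the filter, the grid, and the cutoff J are arbitrary choices:

    % Approximate phihat(xi) = (1/sqrt(2*pi)) * prod_{j>=1} H(xi/2^j) by a finite product.
    % Sketch with the Haar coefficients h = (1,1)/sqrt(2); J is an arbitrary cutoff.
    h  = [1 1] / sqrt(2);               % recursion coefficients h_0, h_1
    xi = linspace(-8*pi, 8*pi, 1001);   % frequency grid
    J  = 25;                            % number of factors kept
    phihat = ones(size(xi)) / sqrt(2*pi);
    for j = 1:J
       arg = xi / 2^j;
       % H(xi) = (1/sqrt(2)) * sum_k h_k exp(-i k xi), here with k = 0, 1
       H = (h(1) + h(2) * exp(-sqrt(-1)*arg)) / sqrt(2);
       phihat = phihat .* H;
    end
    plot(xi, abs(phihat))               % for Haar, |phihat| = |sinc(xi/2)|/sqrt(2*pi)

The factors H(ξ/2^j) tend to 1 as j grows, so a moderate J already gives a good approximation.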
6.2. Approach 2
Pick any φ^{(0)} ∈ L¹ ∪ L² with ∫ φ^{(0)}(t) dt = 1, and define recursively

    φ^{(n+1)}(t) = √2 Σ_k h_k φ^{(n)}(2t − k).

On the Fourier Transform side, we see

    φ̂^{(1)}(ξ) = H(ξ/2) φ̂^{(0)}(ξ/2)
    φ̂^{(2)}(ξ) = H(ξ/2) φ̂^{(1)}(ξ/2) = H(ξ/2) H(ξ/4) φ̂^{(0)}(ξ/4)
    ...
    φ̂^{(n)}(ξ) = { ∏_{j=1}^n H(ξ/2^j) } φ̂^{(0)}(ξ/2^n),

so

    φ̂^{(n)}(ξ) → φ̂^{(0)}(0) ∏_{j=1}^∞ H(ξ/2^j) = φ̂(ξ).

Thus, we could start with any reasonable φ^{(0)} (like χ_{[0,1]}) and keep substituting it into the recursion formula.
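This iteration is easy to try in Matlab. Here is a minimal sketch (my illustration, not code from the course toolbox), using the Daubechies coefficients that appear in the example of section 6.3; after the loop, y holds samples of φ^{(J)} on a grid of spacing 2^{−J}:

    % Cascade iteration phi^(n+1)(t) = sqrt(2) * sum_k h_k phi^(n)(2t - k), on dyadic grids.
    h = [1+sqrt(3), 3+sqrt(3), 3-sqrt(3), 1-sqrt(3)] / (4*sqrt(2));
    y = [1 0 0 0];                          % phi^(0) = chi_[0,1], sampled at t = 0,1,2,3
    J = 8;                                  % number of iterations
    for n = 0:J-1
       hup = zeros(1, (length(h)-1)*2^n + 1);
       hup(1 : 2^n : length(hup)) = h;      % h spread out to the grid of spacing 2^-n
       y = sqrt(2) * conv(hup, y);          % samples of phi^(n+1) at spacing 2^-(n+1)
    end
    t = (0 : length(y)-1) / 2^J;            % grid on the support [0, 3]
    plot(t, y)

The convolution implements φ^{(n+1)}(m 2^{−(n+1)}) = √2 Σ_k h_k φ^{(n)}(m 2^{−n} − k), with both functions represented by their samples.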

6.3. Approach 3
This approach works only for finite sequences {hk }, but is numerically the most useful.

Theorem 6.2. If {hk } = {hL , hL+1 , . . . , hM }, then φ is supported in the interval [L, M ].

Proof: Choose any φ^{(0)} with support in [L − ℓ, M + m], and use the iteration from approach 2. Since φ^{(0)}(2t − k) has support in [(L − ℓ + k)/2, (M + m + k)/2],

    φ^{(1)}(t) = √2 Σ_k h_k φ^{(0)}(2t − k)

has support in

    ⋃_{k=L}^{M} [ (L−ℓ+k)/2, (M+m+k)/2 ] ⊂ [ (L−ℓ+L)/2, (M+m+M)/2 ] = [ L − ℓ/2, M + m/2 ].

Likewise, φ^{(2)} has support in [L − ℓ/4, M + m/4], and so on. In the limit, we find that φ has support in [L, M].
Approach 3 is best illustrated by an example, before describing it in general.
Example: Take the Daubechies wavelet with two vanishing moments, with coefficients

    h_0 = (1 + √3)/(4√2)
    h_1 = (3 + √3)/(4√2)
    h_2 = (3 − √3)/(4√2)
    h_3 = (1 − √3)/(4√2)

By the previous theorem, φ has support on the interval [0, 3].
Write out the recursion formulas for the values of φ at the integers. The only interesting integers are 0,
1, 2, 3, of course, since φ is zero at all the others.

    φ(0) = √2 [h_0 φ(0) + h_1 φ(−1) + h_2 φ(−2) + h_3 φ(−3)] = √2 h_0 φ(0)
    φ(1) = √2 [h_0 φ(2) + h_1 φ(1) + h_2 φ(0) + h_3 φ(−1)] = √2 [h_0 φ(2) + h_1 φ(1) + h_2 φ(0)]
    φ(2) = √2 [h_0 φ(4) + h_1 φ(3) + h_2 φ(2) + h_3 φ(1)] = √2 [h_1 φ(3) + h_2 φ(2) + h_3 φ(1)]
    φ(3) = √2 [h_0 φ(6) + h_1 φ(5) + h_2 φ(4) + h_3 φ(3)] = √2 h_3 φ(3)

The first equation shows that φ(0) = 0, and the last equation shows φ(3) = 0. What is left is

    φ(1) = √2 [h_0 φ(2) + h_1 φ(1)]
    φ(2) = √2 [h_2 φ(2) + h_3 φ(1)]

which is an eigenvalue problem:

        ( h_1  h_0 ) ( φ(1) )   ( φ(1) )
    √2  (          ) (      ) = (      )
        ( h_3  h_2 ) ( φ(2) )   ( φ(2) )

If we normalize the solution so that φ(1) + φ(2) = 1, we get

    φ(1) = (1 + √3)/2
    φ(2) = (1 − √3)/2.
From this we can calculate the values at the half-integers:

    φ(1/2) = √2 [h_0 φ(1) + h_1 φ(0) + h_2 φ(−1) + h_3 φ(−2)] = (2 + √3)/4
    φ(3/2) = √2 [h_0 φ(3) + h_1 φ(2) + h_2 φ(1) + h_3 φ(0)] = 0
    φ(5/2) = √2 [h_0 φ(5) + h_1 φ(4) + h_2 φ(3) + h_3 φ(2)] = (2 − √3)/4,
then the values at the quarter-integers, and so on.
In general, the calculation goes exactly the same way: if we have {hL , . . . , hM }, then φ is supported on
the interval [L, M ], and we need to find the values at the integers L, . . . ,M .
The first and last equations are

    φ(L) = √2 h_L φ(L),
    φ(M) = √2 h_M φ(M).

Unless h_L = 1/√2 or h_M = 1/√2, φ is zero at the endpoints. We make a vector out of the remaining unknown values

    φ = ( φ(L+1), ..., φ(M−1) )^T

and get an eigenvalue problem

    Lφ = φ,
where

             ( h_{L+1}  h_L                                 )
             ( h_{L+3}  h_{L+2}  h_{L+1}  h_L               )
    L = √2   (              ...                             )
             (       h_M  h_{M−1}  h_{M−2}  h_{M−3}         )
             (                     h_M  h_{M−1}             )
We know from approach 1 that a solution must exist, but we can also see this directly: A theorem from linear algebra states that a matrix A and its transpose A^T have the same eigenvalues. It follows from

    Σ_k h_{2k} = Σ_k h_{2k+1} = 1/√2

that (1, 1, 1, ..., 1)^T is an eigenvector of L^T with eigenvalue 1 (check that!), so L must also have an eigenvector with eigenvalue 1.
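In Matlab, Approach 3 looks like this (a sketch of mine, with invented variable names; the matrix L is called T here to avoid a clash with the filter length):

    % Values of phi at the interior integers: solve T*phi = phi with T(i,j) = sqrt(2)*h_{2i-j}.
    % Sketch for the Daubechies filter with two vanishing moments (support [0, 3]).
    h = [1+sqrt(3), 3+sqrt(3), 3-sqrt(3), 1-sqrt(3)] / (4*sqrt(2));   % h_0 ... h_3
    n = length(h) - 2;                % number of interior integers (here: phi(1), phi(2))
    T = zeros(n, n);
    for ii = 1:n
       for jj = 1:n
          k = 2*ii - jj;              % subscript of the coefficient h_{2i-j}
          if k >= 0 & k <= length(h) - 1
             T(ii,jj) = sqrt(2) * h(k+1);
          end
       end
    end
    [V, D] = eig(T);
    [dummy, p] = min(abs(diag(D) - 1));    % locate the eigenvalue 1
    phi = V(:,p) / sum(V(:,p))             % normalize so the values sum to 1

For the example above this returns φ(1) = (1+√3)/2 ≈ 1.366 and φ(2) = (1−√3)/2 ≈ −0.366, and the refinement relation then fills in the half-integers, quarter-integers, and so on.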

6.4. Suggested Reading


The best place to look is Strang’s article [Str89]. He covers pretty much the same material that I covered
here, but it often helps to see it phrased in a couple of different ways. The third approach is done in section
5.2 in Chui’s book [Chu92]. Other than that, I could not find this material in the Chui or Daubechies books.
It is probably in there somewhere, but not easy to find.

6.5. Bibliography
[Chu92] Charles K. Chui, An introduction to wavelets, Wavelet Analysis and Its Applications, vol. 1, Aca-
demic Press, Boston, 1992.
[Str89] Gilbert Strang, Wavelets and dilation equations: A brief introduction, SIAM Rev. 31 (1989), no. 4,
614–627.
CHAPTER 7

Designing Wavelets with Nice Properties

In chapter 4 we determined some basic properties the recursion coefficients {hk } must have to make the
decomposition/reconstruction work. We will repeat those conditions here, and also rephrase them in terms
of H(ξ), and h(z). Along the way, we will answer the question of how to find the {gk } from {hk }. One
possibility was given in chapter 4, but now we can give the complete answer.
Then we will discuss what other desirable properties we may want our wavelets to have, and how they are
expressed in terms of {hk }, H(ξ) and h(z). Together, this will allow us to write down the set of equations
we need to solve to find wavelets with particular properties. In the next chapter, we will actually solve these
equations for one important special case.

7.1. Basic Properties


7.1.1. In terms of {hk }. Let’s review what we did in chapter 4. We determined that the following
conditions are necessary for {hk }, {gk } to produce a scaling function and wavelet:
    (a) Σ_k h_k = √2
    (b) Σ_k h_k h̄_{k+2j} = δ_{0j} for all j ∈ Z
    (c) Σ_k h_{2k} = Σ_k h_{2k+1} = 1/√2
    (d) Σ_k g_k = 0
    (e) Σ_k g_k ḡ_{k+2j} = δ_{0j} for all j ∈ Z
    (f) Σ_k h_k ḡ_{k+2j} = 0 for all j ∈ Z

The only really essential ones are (a) and (b). They imply (c) (we will prove that shortly), and the rest are
automatically satisfied for a suitable choice of {gk } (which we will also spell out shortly).
Condition (a) must be satisfied to make the recursion relation work. Condition (b) says that the integer
translates of φ should be orthonormal.

7.1.2. In terms of H(ξ). Recall that

    H(ξ) = (1/√2) Σ_k h_k e^{−ikξ}
    G(ξ) = (1/√2) Σ_k g_k e^{−ikξ}

In terms of H, G, the same conditions as above read

    (a) H(0) = 1
    (b) |H(ξ)|² + |H(ξ+π)|² = 1
    (c) H(π) = 0
    (d) G(0) = 0
    (e) |G(ξ)|² + |G(ξ+π)|² = 1
    (f) H(ξ) Ḡ(ξ) + H(ξ+π) Ḡ(ξ+π) = 0

Proof: For (a), (c), (d), this is easy to see by writing the series out. Let us prove (b); the rest are proved in basically the same way. We calculate

    H(ξ) = (1/√2) Σ_k h_k e^{−ikξ}
    H̄(ξ) = (1/√2) Σ_ℓ h̄_ℓ e^{iℓξ}
    H(ξ+π) = (1/√2) Σ_k h_k e^{−ik(ξ+π)} = (1/√2) Σ_k (−1)^k h_k e^{−ikξ}
    H̄(ξ+π) = (1/√2) Σ_ℓ (−1)^ℓ h̄_ℓ e^{iℓξ}
    |H(ξ)|² = H(ξ) H̄(ξ) = (1/2) Σ_{k,ℓ} h_k h̄_ℓ e^{−i(k−ℓ)ξ}
    |H(ξ+π)|² = H(ξ+π) H̄(ξ+π) = (1/2) Σ_{k,ℓ} (−1)^{k−ℓ} h_k h̄_ℓ e^{−i(k−ℓ)ξ}
    |H(ξ)|² + |H(ξ+π)|² = (1/2) Σ_{k,ℓ} [ 1 + (−1)^{k−ℓ} ] h_k h̄_ℓ e^{−i(k−ℓ)ξ}
                        = Σ_{k−ℓ even} h_k h̄_ℓ e^{−i(k−ℓ)ξ} = Σ_j [ Σ_k h_k h̄_{k−2j} ] e^{−2ijξ}.

The last expression is the Fourier series for the 2π-periodic function |H(ξ)|² + |H(ξ+π)|². The function 1 has Fourier coefficients δ_{0j}, which proves the equivalence.
Note that (c) follows from (a) and (b), as stated before.
7.1.3. In terms of h(z). Instead of rederiving everything from scratch, we just translate it from H(ξ).
    H(ξ)                                        h(z)
    e^{−iξ}                                     z
    ξ = 0                                       z = 1
    ξ = π                                       z = −1
    e^{iξ}                                      1/z
    e^{−i(ξ+π)}                                 −z
    e^{i(ξ+π)}                                  −1/z
    H(ξ) = (1/√2) Σ_k h_k e^{−ikξ}              h(z) = (1/√2) Σ_k h_k z^k
    H̄(ξ) = (1/√2) Σ_k h̄_k e^{ikξ}               h̄(z) = (1/√2) Σ_k h̄_k z^{−k}
    H(ξ+π) = (1/√2) Σ_k (−1)^k h_k e^{−ikξ}     h(−z) = (1/√2) Σ_k (−1)^k h_k z^k
    H̄(ξ+π) = (1/√2) Σ_k (−1)^k h̄_k e^{ikξ}      h̄(−z) = (1/√2) Σ_k (−1)^k h̄_k z^{−k}

The function h̄ is in general not the complex conjugate of h. It is defined by the sum above, or equivalently by

    h̄(z) = \overline{h(1/z̄)}

(conjugate the coefficients and replace z by 1/z). Note, however, that many authors use only z with |z| = 1, in which case h̄(z) is the complex conjugate of h(z).
The conditions in terms of h(z), g(z) are now
(a) h(1) = 1
(b) h(z)h̄(z) + h(−z)h̄(−z) = 1
(c) h(−1) = 0
(d) g(1) = 0
(e) g(z)ḡ(z) + g(−z)ḡ(−z) = 1
(f) h(z)ḡ(z) + h(−z)ḡ(−z) = 0
7.2. Choice of {gk }
We can now answer the following question: Assuming that the {hk } satisfy (a), (b) (and therefore auto-
matically (c)), what choices of {gk } will make the remaining conditions valid?

7.2.1. In terms of H(ξ). Define the matrix

    M(ξ) = ( H(ξ)   H(ξ+π) )
           ( G(ξ)   G(ξ+π) )

and its complex conjugate transpose

    M*(ξ) = ( H̄(ξ)     Ḡ(ξ)   )
            ( H̄(ξ+π)   Ḡ(ξ+π) ).
Conditions (b), (e), (f) are equivalent to

    M(ξ) · M*(ξ) = ( 1  0 )
                   ( 0  1 ),

so M* is the inverse matrix of M. (A matrix which satisfies M* = M^{−1} is called unitary.) One consequence of this is

    M*(ξ) · M(ξ) = ( 1  0 )
                   ( 0  1 ),

which leads to a few more relations, for example

    |H(ξ)|² + |G(ξ)|² = 1.
A more important consequence is that we can now get direct relations between H and G, using the inversion formula for 2 × 2 matrices

    ( a  b )^{−1}                  (  d  −b )
    ( c  d )       = 1/(ad − bc) · ( −c   a )

Define the determinant of M(ξ) as

    Δ(ξ) = H(ξ) G(ξ+π) − G(ξ) H(ξ+π),

then

    M^{−1}(ξ) = 1/Δ(ξ) · (  G(ξ+π)  −H(ξ+π) )  =  ( H̄(ξ)     Ḡ(ξ)   )  =  M*(ξ).
                         ( −G(ξ)      H(ξ)   )     ( H̄(ξ+π)   Ḡ(ξ+π) )

Compare the lower left corner:

    G(ξ) = −Δ(ξ) H̄(ξ+π)

Compare the top left corner:

    G(ξ+π) = Δ(ξ) H̄(ξ)  ⇒  G(ξ) = Δ(ξ+π) H̄(ξ+π)

Therefore, we must have

    Δ(ξ+π) = −Δ(ξ).
Also, the determinant of any unitary matrix must have absolute value 1. One can check directly that these
conditions on ∆(ξ) are also sufficient, so
    If H(ξ) has been found, then all suitable G(ξ) are of the form

        G(ξ) = −Δ(ξ) H̄(ξ+π),

    where Δ(ξ) is any 2π-periodic function with

        Δ(ξ+π) = −Δ(ξ),
        |Δ(ξ)| = 1 for all ξ.

    Conversely, every such Δ(ξ) will produce a suitable G(ξ).
As a special case, if {h_k}, {g_k} are finite sequences, Δ(ξ) must be of the form

    Δ(ξ) = α · e^{−i(2j+1)ξ}

for some j ∈ Z, |α| = 1. (This will be easier to see in the next section, when we rephrase things in terms of h(z), g(z).) This leads to

    g_k = α · (−1)^k h̄_{2j+1−k},   j ∈ Z, |α| = 1.

Thus, once the {h_k} are given, there is essentially only one choice of finite {g_k} (which is {h_k} backwards, with alternating sign), up to multiplication by a complex number of absolute value one, and a shift.
If {h_k}, {g_k} are real, then α = ±1, which leads to

    g_k = ±(−1)^k h_{2j+1−k}.

The choice Δ(ξ) = e^{−iξ} leads to g_k = (−1)^k h̄_{1−k}, which is what some authors prefer. If {h_k} = {h_0, h_1, ..., h_L}, the more usual choice is Δ(ξ) = e^{−iLξ}, which leads to g_k = (−1)^k h̄_{L−k}, since then {g_k} = {g_0, ..., g_L}. It follows from property (b) that L must be odd (exercise!).

7.2.2. In terms of h(z). We copy the approach from H(ξ), and define

    M(z) = ( h(z)   h(−z) ),      M^{−1}(z) = ( h̄(z)    ḡ(z)  )
            ( g(z)   g(−z) )                   ( h̄(−z)   ḡ(−z) ).

The determinant of M(z) is

    Δ(z) = h(z) g(−z) − h(−z) g(z).

From comparing the inversion formula for M(z) with M^{−1}(z), we find as before

    g(z) = −Δ(z) h̄(−z)
    Δ(−z) = −Δ(z)
    Δ(z) Δ̄(z) = 1

In summary:

    If h(z) has been found, then all suitable g(z) are of the form

        g(z) = −Δ(z) h̄(−z),

    where Δ(z) is any function with

        Δ(−z) = −Δ(z),
        Δ(z) Δ̄(z) = 1.

    Conversely, every such Δ(z) will produce a suitable g(z).
As a special case, if {h_k}, {g_k} are finite sequences, then h(z), g(z), Δ(z) are polynomials (with positive or negative powers of z). Δ(z) is an odd polynomial and is never zero, except possibly for z = 0 (because M^{−1}(z) is defined everywhere except possibly at z = 0), so Δ(z) must be of the form

    Δ(z) = α · z^{2j+1}

for some j ∈ Z, |α| = 1 (since Δ(z) Δ̄(z) = |α|² = 1).
If {h_k}, {g_k} are real, then α = ±1, which again leads to

    g_k = ±(−1)^k h_{2j+1−k}.

The choice Δ(z) = z leads to g_k = (−1)^k h̄_{1−k}, the choice Δ(z) = z^L leads to g_k = (−1)^k h̄_{L−k}.
7.2.3. In terms of {h_k}. The easiest way to derive conditions in terms of {h_k} is to just translate the conditions from H(ξ) or h(z). This gives:

    If {h_k} have been found, then all suitable {g_k} are of the form

        g_k = −Σ_j α_{j+k} (−1)^j h̄_j,

    where {α_j} is any sequence with

        α_{2j} = 0 for all j,
        Σ_k α_k ᾱ_{k+2j} = δ_{0j}.

    Conversely, every such {α_k} will produce a suitable {g_k}.


These conditions are not all that useful, but are given here for completeness.
If {h_k}, {g_k} are finite sequences, the only choice is |α_{2j+1}| = 1 for some j, with all other α_k = 0, which again leads to

    g_k = α (−1)^k h̄_{2j+1−k},   |α| = 1.

The standard choices are α_1 = 1, that is g_k = (−1)^k h̄_{1−k}, or α_L = 1, that is g_k = (−1)^k h̄_{L−k}.
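For real finite filters, the standard choice is a one-line computation. A Matlab sketch (the conjugates can be dropped because the coefficients are real):

    % Build {g_k} from a real filter {h_0,...,h_L} via g_k = (-1)^k h_{L-k}.
    h = [1+sqrt(3), 3+sqrt(3), 3-sqrt(3), 1-sqrt(3)] / (4*sqrt(2));
    L = length(h) - 1;              % L = 3; L must be odd by property (b)
    k = 0:L;
    g = (-1).^k .* h(L-k+1)         % h reversed, with alternating signs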

7.3. Other Desirable Properties


We may be interested in wavelets with other “nice” properties, for specific applications. The following subsections address four such properties and how they are expressed in terms of {h_k}, H(ξ), h(z).

7.3.1. Compact Support. We have already seen that compact support of φ, ψ is equivalent to having
finite sequences {hk }, {gk }, which is equivalent to H(ξ) being a trigonometric polynomial, or h(z) being a
polynomial.
The main reason for wanting a support as small as possible is numerical efficiency. The amount of
computation necessary is directly proportional to the number of wavelet recursion coefficients.

7.4. Symmetry
In signal processing applications, wavelets are considered as special cases of filters. Symmetric or anti-symmetric wavelets have a desirable property called linear phase response.
φ is symmetric about a point a if
φ(a + t) = φ(a − t).
φ is anti-symmetric about a if
φ(a + t) = −φ(a − t).
For discrete filters, a must be an integer if we want linear phase response. Symmetry or anti-symmetry is
expressed in terms of {hk } by
ha+k = ±ha−k .
This is not hard to prove, but I won’t write it out here. Equivalently,

    H(ξ) = e^{−iaξ} P(cos ξ)

or

    h(z) = z^a [ P(z) + P(1/z) ]
where in each case, P is a polynomial or power series (with non-negative powers only).
The bad news is that there are no symmetric or antisymmetric wavelets with compact support, except
the Haar wavelet. There do exist wavelets that are almost symmetric, and symmetric biorthogonal wavelets.
7.5. Vanishing Moments
The mth moment of ψ is

    ∫ t^m ψ(t) dt.
It is often desirable to use wavelets with a certain number M of vanishing moments, especially for the
compression of smooth signals. The reason is the following:
    d^n_k = ⟨f, ψ^n_k⟩ = ∫ f(t) 2^{n/2} ψ(2^n t − k) dt = 2^{−n/2} ∫ f(2^{−n}(s + k)) ψ(s) ds

with the substitution s = 2^n t − k. Assume now that f is sufficiently smooth, and expand it in a Taylor series:

    f(2^{−n}k + 2^{−n}s) = f(2^{−n}k) + (2^{−n}s) f′(2^{−n}k) + (1/2!)(2^{−n}s)² f″(2^{−n}k) + ··· + (1/M!)(2^{−n}s)^M f^{(M)}(2^{−n}k) + ···

When we substitute this into the integral, the first M terms vanish, so

    ⟨f, ψ^n_k⟩ = (1/M!) 2^{−n(M+1/2)} f^{(M)}(2^{−n}k) ∫ s^M ψ(s) ds + ··· = O(2^{−n(M+1/2)})

if f^{(M)} is bounded. If f is smooth and ψ has several vanishing moments, we expect that d^n_k = ⟨f, ψ^n_k⟩ will be quite small for moderately large n. We can neglect these small coefficients, and obtain a much smaller number of wavelet coefficients that can be reconstructed into a function very close to the original f. Thus, wavelets with many vanishing moments are useful for data compression.
In some cases, it is also desirable to demand vanishing moments for φ (except the first, which must be 1):

    ∫ φ(t) dt = 1,
    ∫ t^m φ(t) dt = 0,   m = 1, 2, ..., M − 1.

The same argument as above leads to

    s^n_k = ⟨f, φ^n_k⟩ = f(2^{−n}k) + O(2^{−n(M+1/2)}).

This is useful for faster evaluation of the discrete wavelet transform: with a very small error, the s^n_k are just values of f, so we can skip the full computation of the s^n_k.
How are vanishing moments expressed in terms of H(ξ)? Let us assume that φ̂, ψ̂, H(ξ), G(ξ) are sufficiently often differentiable. (For φ, ψ with compact support, this is automatic.)
First, we observe that

    ∫ t^m ψ(t) dt = √(2π) [t^m ψ]ˆ(0),

and

    [t^m ψ]ˆ(ξ) = i^m ψ̂^{(m)}(ξ),

so if ψ has M vanishing moments, then

    ψ̂(0) = ψ̂′(0) = ··· = ψ̂^{(M−1)}(0) = 0.
Next, we differentiate the relation

    ψ̂(ξ) = G(ξ/2) φ̂(ξ/2)

repeatedly:

    ψ̂′(ξ) = (1/2) [ G′(ξ/2) φ̂(ξ/2) + G(ξ/2) φ̂′(ξ/2) ]
    ψ̂″(ξ) = (1/4) [ G″(ξ/2) φ̂(ξ/2) + 2 G′(ξ/2) φ̂′(ξ/2) + G(ξ/2) φ̂″(ξ/2) ]
    ψ̂‴(ξ) = (1/8) [ G‴(ξ/2) φ̂(ξ/2) + 3 G″(ξ/2) φ̂′(ξ/2) + 3 G′(ξ/2) φ̂″(ξ/2) + G(ξ/2) φ̂‴(ξ/2) ]
    ...
Now, evaluate for ξ = 0:

    ψ̂(0) = G(0) φ̂(0)
    ψ̂′(0) = (1/2) [ G′(0) φ̂(0) + G(0) φ̂′(0) ]
    ψ̂″(0) = (1/4) [ G″(0) φ̂(0) + 2 G′(0) φ̂′(0) + G(0) φ̂″(0) ]
    ψ̂‴(0) = (1/8) [ G‴(0) φ̂(0) + 3 G″(0) φ̂′(0) + 3 G′(0) φ̂″(0) + G(0) φ̂‴(0) ]
    ...

Keeping in mind that φ̂(0) ≠ 0, we deduce in turn that G and its first M − 1 derivatives must vanish at the origin. We want a condition on H, so we use the basic relationship G(ξ) = −Δ(ξ) H̄(ξ+π) from before to deduce that M vanishing moments are equivalent to

    H(π) = H′(π) = ··· = H^{(M−1)}(π) = 0.

The theory of complex functions states that this is equivalent to

    H(ξ) = ( (1 + e^{−iξ})/2 )^M L(ξ),

where L is again a trigonometric polynomial.
Vanishing moments for φ are likewise expressed as

    H(0) = 1,   H′(0) = ··· = H^{(M−1)}(0) = 0,

or

    H(ξ) = 1 + ( (1 − e^{−iξ})/2 )^M K(ξ).
We can now translate this easily into conditions on h(z) (exercise): M vanishing moments for ψ mean

    h(−1) = h′(−1) = ··· = h^{(M−1)}(−1) = 0

or

    h(z) = ( (1 + z)/2 )^M p(z),

where p is a polynomial or power series. M vanishing moments for φ mean

    h(1) = 1,   h′(1) = ··· = h^{(M−1)}(1) = 0,

or

    h(z) = 1 + ( (1 − z)/2 )^M q(z).

Just as easily (another easy exercise), we obtain that M vanishing moments for ψ mean

    Σ_k (−1)^k k^m h_k = 0,   m = 0, 1, ..., M − 1,

and M vanishing moments for φ mean

    Σ_k h_k = √2,
    Σ_k k^m h_k = 0,   m = 1, 2, ..., M − 1.
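These sums are easy to check numerically. A small Matlab sketch for the Daubechies coefficients with M = 2 that will appear in the next chapter (each printed value should agree with the stated condition up to round-off):

    % Moment conditions for the Daubechies filter with M = 2 vanishing moments.
    h = [1+sqrt(3), 3+sqrt(3), 3-sqrt(3), 1-sqrt(3)] / (4*sqrt(2));
    k = 0:3;
    sum(h) - sqrt(2)                % condition for phi, m = 0: should be 0
    sum((-1).^k .* h)               % moment of psi, m = 0: should be 0
    sum((-1).^k .* k .* h)          % moment of psi, m = 1: should be 0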
7.5.1. Smoothness. For certain applications (like solving differential equations), we need wavelets with
a number of derivatives. Also, numerical evidence suggests that a minimum amount of smoothness is
necessary for good numerical performance, even though nobody has actually proved that, as far as I know.
The smoothness of ψ is always the same as that of φ, by the recursion relations.
First, we need to define what we mean by smoothness.
Definition: A function f is Hölder continuous of order α, 0 < α < 1, if
    |f(x) − f(y)| ≤ C · |x − y|^α
for some constant C. All such f are uniformly continuous.

Example: f(t) = |t|^α is Hölder continuous of order α.

Definition: C^n is the set of n times continuously differentiable functions (n = 0, 1, 2, ...). In particular, C^0 is the set of continuous functions.
For r ≥ 0, we split it as r = n + α, where n is a non-negative integer, 0 < α < 1, and define C^r as the set of functions in C^n whose nth derivative is Hölder continuous of order α.
This is a nested set of spaces. If r < s, then C^s ⊂ C^r. We use r as a measure of smoothness. The larger r is, the smoother the function becomes.
Theorem 7.1. If

    ∫ |f̂(ξ)| (1 + |ξ|)^r dξ < ∞,   r ≥ 0,

then f ∈ C^r.
The proof is not hard, but lengthy, so we’ll skip it.
Corollary 7.2. If

    |f̂(ξ)| ≤ C · (1 + |ξ|)^{−1−r−ε}

for some ε > 0, then f ∈ C^r.
In other words: smoothness of f is directly related to how fast f̂ goes to 0 at infinity. This should come as no surprise. For example, if f ∈ L¹ and f′ ∈ L¹, then (f′)ˆ(ξ) = iξ f̂(ξ) goes to zero at infinity, so f̂ decays at least as fast as 1/|ξ|. If f has n derivatives in L¹, then f̂ decays at least as fast as |ξ|^{−n}, by the same argument. The corollary just expresses that in more detail.
Recall now the formula

    φ̂(ξ) = (1/√(2π)) ∏_{j=1}^∞ H(ξ/2^j).

If ψ has M vanishing moments, then

    H(ξ) = ( (1 + e^{−iξ})/2 )^M L(ξ).

One can calculate explicitly

    ∏_{j=1}^∞ (1 + e^{−iξ/2^j})/2 = (1 − e^{−iξ})/(iξ) = e^{−iξ/2} sin(ξ/2)/(ξ/2)

and

    | (1 − e^{−iξ})/(iξ) |^M = |sinc(ξ/2)|^M,

which decays like |ξ|^{−M}. Since

    |φ̂(ξ)| = (1/√(2π)) | ∏_{j=1}^∞ H(ξ/2^j) | = (1/√(2π)) |sinc(ξ/2)|^M ∏_{j=1}^∞ |L(ξ/2^j)|,

the sinc term will provide fast decay, as long as the infinite product does not grow too fast.
There is a detailed investigation of this in Daubechies [Dau92]. I will just quote the simplest of the
theorems one can prove.
Theorem 7.3. If

    sup_ξ |L(ξ)| < 2^{M−α−1},

then φ ∈ C^α.

Example: For the Haar wavelets,

    H(ξ) = 1/2 + (1/2) e^{−iξ},

so M = 1, L(ξ) = 1. The theorem says that φ is in C^α for any α < 0, which tells us nothing about continuity. Of course that is as it should be, since φ is not continuous in this case.
For the Daubechies wavelets with 2 vanishing moments,

    H(ξ) = ( (1 + e^{−iξ})/2 )² [ (1 + √3)/2 + ((1 − √3)/2) e^{−iξ} ].

The maximum value is |L(π)| = √3, and we calculate

    α < 1 − ln 3/(2 ln 2) ≈ 0.207159.

With better methods, one can show that the true Hölder coefficient is about 0.550. This φ is continuous, but not differentiable.

7.6. Suggested Reading


The properties listed in this section and relevant discussions can be found in basically all introductory
papers and books on wavelets. Usually, though, they are spread throughout the text, and they are usually
not given in all three notations. The book by Daubechies [Dau92] is the standard reference.

7.7. Bibliography
[Dau92] I. Daubechies, Ten lectures on wavelets, CBMS-NSF Regional Conference Series in Applied Math-
ematics, vol. 61, SIAM, Philadelphia, 1992.
CHAPTER 8

Daubechies Wavelets

As one of the most important examples in practice, we will derive the class of Daubechies wavelets in this section, using the equations from the last chapter.
All wavelet coefficient sequences of finite length must have even length (this follows from property (b)).
For length 2, the equations read

    h_0 = 1/√2
    h_1 = 1/√2
    h_0² + h_1² = 1,

which leads to the Haar wavelets as the only possibility.
For length 4, the equations are

    h_0 + h_2 = 1/√2
    h_1 + h_3 = 1/√2
    h_0² + h_1² + h_2² + h_3² = 1
    h_0 h_2 + h_1 h_3 = 0

These equations have an infinite number of solutions: pick any h_0 ∈ [−1/2 + √(1/8), 1/2 + √(1/8)], then

    h_1 = √(1/8) ± √( 1/8 − h_0² + h_0/√2 )
    h_2 = 1/√2 − h_0
    h_3 = 1/√2 − h_1.
For higher lengths, this approach becomes unwieldy.
Keeping in mind that vanishing moments are useful in many applications, Ingrid Daubechies set out to
find the shortest real wavelet sequences with M vanishing moments, for M = 1, 2, . . . .
Here is an outline of the steps she went through. We will fill in the details below:
Step 1. Fix M, the number of vanishing moments. We write down the equations we need to satisfy, using the H(ξ) approach:

    H(ξ) = ( (1 + e^{−iξ})/2 )^M L(ξ)
    H(0) = 1
    |H(ξ)|² + |H(ξ+π)|² = 1

Step 2. Convert the second and third condition to a condition on L:

    ( cos²(ξ/2) )^M |L(ξ)|² + ( sin²(ξ/2) )^M |L(ξ+π)|² = 1.

Step 3. Show that

    |L(ξ)|² = P(sin²(ξ/2)),
    |L(ξ+π)|² = P(cos²(ξ/2))

for some polynomial P.

Step 4. Convert the equation for L into an equation for P:

    (1 − y)^M P(y) + y^M P(1 − y) = 1

Step 5. Find the polynomial P of lowest possible degree that solves this equation, and convert it back into |L(ξ)|².
Step 6. Factor |L(ξ)|² to find L(ξ), and put everything back together.
Now let us look at the details:
Step 1. These equations are taken straight from the last chapter.
Step 2. We calculate

    |H(ξ)|² = H(ξ) · H̄(ξ) = ( (1 + e^{−iξ})/2 )^M L(ξ) · ( (1 + e^{iξ})/2 )^M L̄(ξ)
            = ( (1 + e^{−iξ})(1 + e^{iξ})/4 )^M |L(ξ)|² = ( cos²(ξ/2) )^M |L(ξ)|².

Likewise,

    |H(ξ+π)|² = ( cos²((ξ+π)/2) )^M |L(ξ+π)|² = ( sin²(ξ/2) )^M |L(ξ+π)|².
Putting these two together, we obtain the equation listed in step 2. Note that since cos 0 = 1, sin 0 = 0,
we get automatically that |L(0)| = 1. By introducing a scale factor of absolute value 1, if necessary, we can
assume that L(0) = 1, which takes care of the second condition in step 1.
Step 3. Here is the point where we need the fact that the coefficients are real. Since

    L(ξ) = Σ_k l_k e^{−ikξ},
    L̄(ξ) = Σ_j l_j e^{ijξ},

we get

    |L(ξ)|² = L(ξ) L̄(ξ) = Σ_{k,j} l_k l_j e^{−i(k−j)ξ}.

We take out the terms with j = k, and group together the remaining kj and jk terms:

    |L(ξ)|² = Σ_k l_k² + Σ_k Σ_{j>k} l_k l_j [ e^{i(j−k)ξ} + e^{−i(j−k)ξ} ]
            = Σ_k l_k² + 2 Σ_k Σ_{j>k} l_k l_j cos((j−k)ξ) = Σ_k α_k cos(kξ).

Using basic trig identities

    cos 2ξ = 2 cos²ξ − 1,
    cos 3ξ = 4 cos³ξ − 3 cos ξ,
    ...

we convert this sum to powers of cos ξ:

    |L(ξ)|² = Σ_k α_k cos(kξ) = Σ_k β_k cos^k ξ,

then we use the substitution cos ξ = 1 − 2 sin²(ξ/2) to get

    |L(ξ)|² = Σ_k γ_k ( sin²(ξ/2) )^k = P(sin²(ξ/2)).

The rest is easy:

    |L(ξ+π)|² = P(sin²((ξ+π)/2)) = P(cos²(ξ/2)).

Step 4. We make the substitution

    y = sin²(ξ/2),

and get immediately the desired equation in step 4.
Step 5. Using a theorem due to Bezout, it can be shown that the equation for P has a unique shortest
solution of degree M − 1. The solution itself can be found by a trick: Write the equation in the form

    P(y) = (1 − y)^{−M} − y^M (1 − y)^{−M} P(1 − y).

This must be valid for all y. For |y| < 1, we can expand (1 − y)^{−M} into a power series. Since P is of degree M − 1, and the second term on the right-hand side starts with a power of y^M, the shortest P(y) must be equal to the first M terms in the power series for (1 − y)^{−M}, or

    P(y) = Σ_{k=0}^{M−1} C(M+k−1, k) y^k,

where C(·,·) denotes a binomial coefficient.

One thing I did not mention before is that we also must have P(y) ≥ 0 for y ∈ [0, 1], since P(sin²(ξ/2)) = |L(ξ)|² and y = sin²(ξ/2) takes on values in [0, 1]. By a lucky coincidence, this is satisfied here for all M.
Maybe this is a good point to talk about the number of coefficients in these trig polynomials and regular
polynomials. Suppose L(ξ) has degree K. Then |L(ξ)|2 , written as a sum of exponentials, has terms going
from −K to K. We collapse the exponentials into cosines by grouping positive and negative powers together,
then we are back to summing from 0 to K. Nothing changes during the next few steps, so P also has degree
K, which we now know to be M − 1. This translates into an H of degree (2M − 1), or 2M coefficients {hk }.
The Daubechies wavelet with M vanishing moments has 2M coefficients.
Step 6. Now we have the P, and we have to find our way back to L. This process is called spectral factorization. Daubechies used the following method, based on a theorem of Riesz.
Do all the substitutions backwards until you have |L(ξ)|² in terms of complex exponentials again. To make life easier, let z = e^{−iξ}, so |L(ξ)|² turns into z^{−M+1} times a polynomial in z of degree 2M − 2. Find all roots of this polynomial. These roots turn out to come in groups of four, consisting of z_j, 1/z_j, z̄_j and 1/z̄_j. If z_j is real or on the unit circle, there are only two roots in a group instead of 4.
We select one pair z_j, z̄_j from each group of 4, or one root from each pair. If we multiply the terms (z − z_k) for all roots z_k together, we get the original polynomial (except for a constant). If we multiply the terms (z − z_j) only for the selected roots, we get the square root we want, except for a constant (this is the content of the Riesz theorem). We can then replace z by e^{−iξ} again, and we are done.
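For small M, the entire construction fits in a short Matlab script. The following sketch is my own illustration (invented variable names, and the bookkeeping assumes real coefficients); it builds the polynomial z^{M−1} P(y(z)), factors it with roots, and keeps the roots outside the unit circle, which reproduces the coefficients derived below. Keeping the roots inside the unit circle instead gives the same filter in reversed order.

    % Sketch of steps 5-6: from P(y) to the Daubechies coefficients {h_k}.
    M = 2;                                   % number of vanishing moments
    c = zeros(1, M);  c(1) = 1;              % c(k+1) = binomial(M+k-1, k)
    for k = 1:M-1
       c(k+1) = c(k) * (M+k-1) / k;
    end
    q  = [-1 2 -1] / 4;                      % y = sin^2(xi/2) = z^{-1}(-z^2+2z-1)/4, z = e^{-i xi}
    qk = 1;                                  % holds q(z)^k, starting with k = 0
    B  = zeros(1, 2*M-1);                    % coefficients of z^{M-1} P(y(z)), degree 2M-2
    for k = 0:M-1
       term = conv(qk, [1, zeros(1, M-1-k)]);          % z^{M-1-k} * q(z)^k
       B(length(B)-length(term)+1 : length(B)) = ...
          B(length(B)-length(term)+1 : length(B)) + c(k+1) * term;
       qk = conv(qk, q);
    end
    r  = roots(B);
    r  = r(abs(r) > 1);                      % one root from each pair z_j, 1/z_j
    Hz = real(poly(r));                      % L(z) up to a constant (conjugate pairs: real)
    for m = 1:M
       Hz = conv(Hz, [1 1] / 2);             % multiply by ((1+z)/2)^M
    end
    Hz = Hz / sum(Hz);                       % normalize so that H(1) = 1
    h  = fliplr(sqrt(2) * Hz)                % h_0, ..., h_{2M-1}

For M = 2 this prints the four coefficients derived below; for larger M it should reproduce the coefficients in appendix C, up to ordering and the choice of roots.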
This whole process is quite a mathematical tour de force. Let me illustrate it by deriving the Daubechies
wavelet with 1 and 2 vanishing moments. We can start with step 5, the other steps were only necessary for
the proof. These two cases can be done by hand. For more vanishing moments, these equations must be
solved numerically.
Daubechies Wavelets with 1 Vanishing Moment. This is pretty trivial: we find

    P_1(y) = 1,

which translates into |L(ξ)|² = 1, and obviously L(ξ) = 1. This leads to

    H(ξ) = (1 + e^{−iξ})/2,

which are the Haar wavelets.
Daubechies Wavelets with 2 Vanishing Moments. Here

    P_2(y) = 1 + 2y.

Recall that y = sin²(ξ/2); plug that in and convert everything back to complex exponentials:

    |L(ξ)|² = −e^{−iξ}/2 + 2 − e^{iξ}/2 = z^{−1} ( −z²/2 + 2z − 1/2 ).

The corresponding polynomial is −z²/2 + 2z − 1/2, which has roots 2 ± √3. We choose one of them, like 2 + √3. The Riesz theorem says that L(ξ) should be

    L(ξ) = const · (z − (2 + √3)) = const · (e^{−iξ} − (2 + √3)).

The constant can be found by multiplying it back together, or from the Riesz theorem; the sign is fixed by the normalization L(0) = 1. It turns out to be (1 − √3)/2.
So,

    H(ξ) = ( (1 + e^{−iξ})/2 )² · ((1 − √3)/2) · ( e^{−iξ} − (2 + √3) ) = (1/√2) Σ_{k=0}^{3} h_k e^{−ikξ},
where

    h_0 = (1 + √3)/(4√2)
    h_1 = (3 + √3)/(4√2)
    h_2 = (3 − √3)/(4√2)
    h_3 = (1 − √3)/(4√2)
The h coefficients for more Daubechies wavelets can be found in Daubechies’ book [Dau92]. They are
reproduced in appendix C.

8.1. Suggested Reading


The Daubechies wavelets were first derived in [Dau88]. This is the most readable 90+ page paper I have
ever encountered. In more compact form, the same derivation can be found in chapter 6 of [Dau92].
The derivation of wavelets with other nice properties can be found in [Dau93]. Among other things, this
paper contains a derivation of Coiflets, which are the shortest wavelets with M vanishing moments for both
φ and ψ.

8.2. Bibliography
[Dau88] Ingrid Daubechies, Orthonormal bases of compactly supported wavelets, Comm. Pure Appl. Math.
41 (1988), 909–996.
[Dau92] I. Daubechies, Ten lectures on wavelets, CBMS-NSF Regional Conference Series in Applied Math-
ematics, vol. 61, SIAM, Philadelphia, 1992.
[Dau93] Ingrid Daubechies, Orthonormal bases of compactly supported wavelets II: Variations on a theme,
SIAM J. Math. Anal. 24 (1993), no. 2, 499–519.
CHAPTER 9

Other Topics

Without going into too much detail, we mention a few other topics and applications relating to wavelets.
This area keeps expanding, and for further information you are referred to the growing literature.

9.1. Signal Compression


In earlier sections, we talked quite a bit about applications of wavelets in signal processing. In fact, that
was one of the main motivators for wavelets. The emphasis there was on signal analysis, time frequency
resolution, and such. Now we want to consider a different application.
Recall the discussion of vanishing moments in chapter 7. We found that when wavelets with several
vanishing moments are used on a smooth signal, then most of the d^j_k coefficients in the decomposed signal
are very small. They can be set to 0 with very little error, resulting in data compression. Data compression
is one of the hottest topics in wavelet theory because of the commercial applications.
For high-quality speech or music, the average listener cannot detect any difference with compression
factors up to about 20. With compression factors around 100, the distortion is very noticeable, but the
original sound is still recognizable. Achievable compression is lower for low-quality sound, like telephone
conversations. Similar numbers are achieved in image compression.
In recent contests, wavelet-based signal compression could not beat compression schemes based on FFT
or related schemes, but that is likely to change soon. FFT has had a head start of several decades.

9.2. Wavelet Packets


In the standard wavelet decomposition, at the first level we decompose the original signal s^n into s^{n−1} and d^{n−1}. At the next level, we decompose s^{n−1} into s^{n−2} and d^{n−2}, and so on. The d^j are not processed any farther. The corresponding decomposition tree looks about like figure 9.1. (The notation V^j is more useful than the previous notation V_j in this section.)

Figure 9.1. Wavelet tree for n = 3

                V^3
               /    \
            V^2      W^2
           /    \
        V^1      W^1
       /    \
    V^0      W^0

The idea behind wavelet packets is very simple: what if we decompose the d^j also? This results in a
decomposition tree as in figure 9.2.
From a practical point of view, the implementation is very easy. At each level we decompose all partial
vectors instead of just the first one. It is customary to retain all the intermediate results instead of overwriting
them with the decomposed vector, for reasons that will become clear shortly. The resulting wave packet
decomposition of a vector of length 8 is shown in figure 9.3.
It is easy to verify that a complete decomposition requires O(N log N ) operations for a vector of length
N , and N log N storage if we want to keep all levels.
From a mathematical point of view, we are subdividing the W j spaces into further subspaces. To describe
this process in more detail, we have to switch notation.


Figure 9.2. Wave packet tree for n = 3

                         V^3
                   /            \
              V_0^2              V_1^2
             /      \           /      \
        V_0^1        V_1^1   V_2^1      V_3^1
        /   \        /   \    /   \      /   \
    V_0^0  V_1^0 V_2^0  V_3^0 V_4^0 V_5^0 V_6^0 V_7^0

Figure 9.3. Wave packet decomposition of a vector for n = 3

    x_0^3  x_1^3  x_2^3  x_3^3  x_4^3  x_5^3  x_6^3  x_7^3
    x_0^2  x_1^2  x_2^2  x_3^2  x_4^2  x_5^2  x_6^2  x_7^2
    x_0^1  x_1^1  x_2^1  x_3^1  x_4^1  x_5^1  x_6^1  x_7^1
    x_0^0  x_1^0  x_2^0  x_3^0  x_4^0  x_5^0  x_6^0  x_7^0

The old space V^j is now called V_0^j, the old W^j becomes V_1^j. The spaces V_2^j, V_3^j, etc. are new. Each V_k^j is subdivided into V_{2k}^{j−1} and V_{2k+1}^{j−1}.
The basis functions of V_k^j are shifted versions of some function φ_k^j. The superscript indicates the level (φ_k^j is φ_k^0, compressed by a factor of 2^j), the subscript denotes the type of function (not a shift as it did before). The new φ_0^j is the old φ^j, the new φ_1^j is the old ψ^j, the other ones are new functions that are generated by recursion. Specifically, the old recursion relations

    φ^{j−1}(x) = (1/√2) Σ_k h_k φ^j(2x − k)
    ψ^{j−1}(x) = (1/√2) Σ_k g_k φ^j(2x − k)

are replaced by the single formula

    φ_{2m+l}^{j−1}(x) = (1/√2) Σ_k h_k^l φ_m^j(2x − k),   l = 0, 1.
2 k

It should come as no surprise that the {h_k^0} are the old {h_k}, and the {h_k^1} are the old {g_k}. The first new functions showing up in the example are

    φ_2^1(x) = (1/√2) Σ_k h_k ψ^2(2x − k),
    φ_3^1(x) = (1/√2) Σ_k g_k ψ^2(2x − k),

where I have used the old notation on the right-hand side.
For the Haar wavelets, the resulting basis functions are called Walsh functions.
In general, these new basis functions don’t have any nice properties like time-frequency localization. The
wave packet decomposition would probably not be very useful in signal analysis. The main application lies
in signal compression.
As mentioned above, we usually save all the levels during decomposition. Observe that the information
contained in each subvector is equivalent to the information contained in the levels below it. We can select
contained in each subvector is equivalent to the information contained in the levels below it. We can select any subset of decomposed vectors that contains all the original information. For example, we could select the vectors corresponding to V_0^1, V_2^0, V_3^0, V_1^2 in figure 9.2. The total length of any such subset is equal to
the length of the original vector.
The idea is to select the subset that gives the best compression. Since the usual wavelet decomposition is one special case, we can do at least as well as that, maybe better.
The implementation works like this. First we do a complete wave packet decomposition. For a vector of
length N = 2n , we get n = log N levels. This takes O(N log N ) operations, and N log N storage locations.
Then, we search for the optimal decomposition from the ground up. For each subvector at the lowest
level, we set negligible elements to zero, and count how many are left. At the next lowest level, we do the
same, and compare the number of elements left in each subvector with the combined number of elements in
the two subvectors below it. Depending on which is smaller, we keep either the subvector at level 1, or the
two subvectors at level 0. Continue up the scale, for a total of O(N log N ) operations. When we are done,
we can throw away everything except the N optimal coefficients.
This process may or may not be useful in practice. It takes O(N log N ) operations instead of O(N ), and
we have to store additional information about the levels we decided to keep, which partially destroys the
savings from better compression.
Many variations on this scheme are possible. There is no law that says we have to use the same wavelet
coefficients at every level. We could vary those as well, trying to adapt them to specific properties of the
signal. There is room for many more theses and patents.
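To make the table of figure 9.3 concrete, here is a sketch of the complete decomposition in Matlab, using the Haar filters and wrap-around at block boundaries (my illustration, not code from the course toolbox):

    % Full wave packet decomposition of a vector of length N = 2^n (figure 9.3).
    h = [1 1] / sqrt(2);   g = [1 -1] / sqrt(2);
    x = [1 2 3 4 3 -1 0 1];                 % the level-3 vector
    N = length(x);   n = round(log(N)/log(2));
    table = zeros(n+1, N);
    table(1,:) = x;                         % row j holds all subvectors of level n-j+1
    for lev = 1:n
       blk = N / 2^(lev-1);                 % length of each subvector at the previous level
       for b = 0:2^(lev-1)-1
          v  = table(lev, b*blk+1 : (b+1)*blk);
          lo = zeros(1, blk/2);   hi = zeros(1, blk/2);
          for m = 1:blk/2                   % filter and downsample each block
             mp = rem(2*m-1, blk) + 1;      % next index, with wrap-around (only
                                            % needed for filters longer than 2)
             lo(m) = h(1)*v(2*m-1) + h(2)*v(mp);
             hi(m) = g(1)*v(2*m-1) + g(2)*v(mp);
          end
          table(lev+1, b*blk+1 : (b+1)*blk) = [lo, hi];
       end
    end
    table                                   % N log N numbers; each row refines the one above

Each row of table is one level of figure 9.3; a best-basis search as described above then compares the number of non-negligible entries in a block with that of the two blocks below it.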

9.3. Biorthogonal Wavelets


So far, we have only considered orthogonal wavelets, where the translates of φ and ψ are orthogonal to
each other. In biorthogonal wavelets, we use two sets of scaling functions and wavelets, denoted by φ, ψ and
φ̃, ψ̃, and all orthogonality relations are between one function from each set. In more detail, we replace the
orthogonality relations

    ⟨φ_j^n, φ_k^n⟩ = δ_{jk}
    ⟨ψ_j^n, ψ_k^m⟩ = δ_{jk} δ_{nm}
    ⟨φ_j^n, ψ_k^m⟩ = 0

by

    ⟨φ_j^n, φ̃_k^n⟩ = δ_{jk}
    ⟨ψ_j^n, ψ̃_k^m⟩ = δ_{jk} δ_{nm}
    ⟨φ_j^n, ψ̃_k^m⟩ = ⟨φ̃_j^n, ψ_k^m⟩ = 0.

This corresponds to having two multiresolution approximation chains of spaces V_j, W_j and Ṽ_j, W̃_j, with

    V_j ⊥ W̃_j,
    W_j ⊥ W̃_k for j ≠ k.

In practice, the only difference is that you use one set of coefficients {hk }, {gk } for decomposition, another
set {h̃k }, {g̃k } for reconstruction. The roles of the two sets of coefficients can be reversed.
The advantage is that biorthogonal wavelets are much easier to find, since you have twice as many
coefficients that need to satisfy the same number of constraints. Biorthogonal wavelet coefficients tend to
be much nicer numbers, often integers divided by some power of 2. Biorthogonal wavelets can be symmetric
or antisymmetric, and they can be shorter for the same number of vanishing moments.
The disadvantage is that you need two sets of coefficients, and that you need to start worrying about
numerical stability. Some biorthogonal wavelets lead to catastrophic growth of round-off error.
An example of biorthogonal wavelets is given by
    j      −1    0    1    2
    h_j          3    2   −1
    g_j          1   −2    1
    h̃_j     1    2    1
    g̃_j     1    2   −3

(all coefficients need to be multiplied by √2/4).
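The biorthogonality condition Σ_k h_k h̃_{k+2j} = δ_{0j} is easy to verify numerically. A small Matlab sketch, with both filters written out on the common index range j = −1, ..., 2:

    % Check sum_k h_k ht_{k+2j} = delta_{0j} for the example above.
    h  = sqrt(2)/4 * [0  3  2 -1];     % h_{-1}, ..., h_2  (no coefficient at j = -1)
    ht = sqrt(2)/4 * [1  2  1  0];     % the dual filter on the same index range
    sum(h .* ht)                       % j =  0: should be 1
    sum(h(1:2) .* ht(3:4))             % j =  1: should be 0
    sum(h(3:4) .* ht(1:2))             % j = -1: should be 0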

9.4. Multi-Wavelets
Instead of having a scaling function and wavelet, you could have a scaling function and several wavelets,
all orthogonal to their translates and each other. You could extend that to the biorthogonal setting as well.
Mathematically, this corresponds to decomposing each V_j space into several subspaces V_{j−1}, W_{j−1}^{(1)}, W_{j−1}^{(2)}, etc.
In one extreme case (see Alpert [Alp92]), there are so many wavelets that you don’t have to use translates
any more.

9.5. Higher-dimensional Wavelets


The standard way to apply wavelets in higher dimensions is to do a one-dimensional decomposition for each
variable. In two dimensions, that means you get basis functions φ(x)φ(y), φ(x)ψ(y), ψ(x)φ(y), ψ(x)ψ(y).
This is what mathematicians call a tensor product.
There are also truly higher-dimensional wavelets, given by recursion relations that look the same as before:
    φ(x) = 2^{n/2} Σ_k h_k φ(2x − k),

but mean something different: each k is now a multi-index (k_1, k_2, ..., k_n), and 2x − k is (2x_1 − k_1, 2x_2 − k_2, ...). Even more generally, the multi-indices k can vary on a non-rectangular regular grid.
The big problem here is that this gets very, very messy. Even the support of the scaling function is often
a fractal. In practice, people are sticking to tensor products so far.

9.6. Suggested Reading


The subjects of signal compression using wavelets and wavelet packets are both discussed in [CMQW91].
Recent papers on the subject are more likely to show up in the engineering literature.
An introduction to biorthogonal wavelets is [CDF92].
Multiple wavelets and higher-dimensional wavelets are mentioned in chapter 10 of Daubechies [Dau92],
or in some articles in [(ed92].

9.7. Bibliography
[Alp92] Bradley Alpert, Wavelets and other bases for fast numerical linear algebra, Wavelets: A Tutorial
in Theory and Applications (San Diego) (C. K. Chui, ed.), Wavelet Analysis and its Applications,
vol. 2, Academic Press, San Diego, 1992, pp. 181–216.
[CDF92] A. Cohen, Ingrid Daubechies, and J.-C. Feauveau, Biorthogonal bases of compactly supported
wavelets, Comm. Pure Appl. Math. 45 (1992), 485–560.
[CMQW91] Ronald R. Coifman, Yves Meyer, Steven Quake, and Mladen Victor Wickerhauser, Signal pro-
cessing and compression with wave packets, Proceedings of the Marseilles Wavelet Meeting 1989,
1991.
[Dau92] I. Daubechies, Ten lectures on wavelets, CBMS-NSF Regional Conference Series in Applied
Mathematics, vol. 61, SIAM, Philadelphia, 1992.
[(ed92] Charles K. Chui (ed.), Wavelets: A tutorial in theory and applications, Wavelet Analysis and
Its Applications, vol. 2, Academic Press, Boston, 1992.
APPENDIX A

Computer Information

We will be doing some computer experiments with wavelets as part of the homework. I will provide a number
of tools and data sets for this on Project Vincent (PV) machines. If you have access to matlab on another
system, you can probably copy all the subroutines and data files over and run your stuff there. However, I
will not give computer help on systems other than PV.
This section contains some hints on how to get started (if you are not familiar with PV), and where to
find the things you need for this course.

A.1. Finding a Machine


Public PV workstations are in Durham 139, in Durham 248 (when they are not used for a class), and
other places. There are four in the Math Department, in rooms Carver 451 through 457. You are allowed
to use the DEC 2100s, DEC 3100s and PCs there, but not the DEC 5000s. These rooms used to be open
Monday through Friday, 8am to 5 pm, until somebody walked off with one of our Macintosh computers in
broad daylight, so now the rooms are locked. You will have to find somebody to let you in, or log in remotely.
You can dial into PV remotely, either from another machine on campus or through a phone line. Un-
fortunately, most of the public PV stations are restricted to the user sitting in front of it. The only public
machines I know that are officially open for remote login are
vincent1.iastate.edu
vincent2.iastate.edu
pv3437.vincent.iastate.edu
pv343d.vincent.iastate.edu
pv3440.vincent.iastate.edu
pv3450.vincent.iastate.edu
Vincent1 and vincent2 are DEC 5000s in the basement of Durham Center, and are usually pretty crowded;
the other four are the public machines in the Math Department. There is also a machine class1 that you
may have access to. It is supposed to be restricted to people taking certain CS courses, but I don’t know if
that is being enforced.
Your department may have other machines you can use.

A.2. Getting An Account


You can sign up for a PV account by going to any unoccupied PV station, clicking on the Register box
and answering the questions. After registering, it will take several hours until you can actually log in. There
are fliers outside Durham 195 with more details about the registration procedure.
The Computation Center (CC) offers introductory courses to PV for free. You even get a free User’s Guide
with that. I recommend this highly for new users. You should sign up soon, the classes at the beginning of
a term fill up fast.

A.3. Getting Help


Manuals and other help for PV machines are available in the following places:
• On-line manual pages. The help command is called man (short for “manual”). You can do man man
for starters. If you are sitting in front of a PV machine, you can use the window version xman instead
of man. Documentation for course-related programs is in section m of the manual (if there is a man
page for them).
• CC consultants are on duty Monday through Friday 8am to 5pm and Sunday through Thursday
6pm to 9pm. Go to Durham 138 or call them at 294-1314.
• Manuals are available for consultation in Durham 138 (the help room) or Durham 195 (where you
get VAX and Wylbur accounts). Manuals are for sale in Durham 195.

• There is an on-line consulting facility olc, where you can ask questions. Type olc and take it from
there. You will get a personal answer within a few hours.
Before you ask a question, type answers from inside olc and look at the answers to frequently asked
questions. Maybe your question is answered there.
• You can post your question to newsgroup isu.cc.vincent. In my experience, this is often more
helpful than olc.
• There is a local newsgroup isu.math.485 for this class.
• Ask a friend.
• Send e-mail to keinert@iastate.edu.
If your problem has to do with computer use in general, try the first seven options first, before you ask
me. I will be glad to answer questions about your assignments or about the programs used in this class, but
I am here to teach mathematics, not computer use.

A.4. Finding Files For This Course


All programs and documentation needed for this course are somewhere in the lockers mathclasses and
matlab. Before you can access them, you have to type
% add mathclasses
% add matlab
You should put these two lines in your .startup.tty file, so they get executed automatically every time
you log in.
You should also put the line
setenv MATLABPATH /home/mathclasses/matlab/WaveletToolbox
in your .environment or .startup.tty file. This tells matlab where to find the wavelet files.
The mathclasses locker is organized as follows (all pathnames start with /home/mathclasses):
• Executable programs are in bin/dec.
• Documentation is either in man/manm (accessible through man or xman) in section m, or in doc. (Note:
xman will not find these man pages unless it is started from a window in which the add mathclasses
has been executed).
• Source code is in src.
• matlab routines are in matlab.
• Handouts, homework assignments, these notes and similar things are somewhere in keinert or one
of its subdirectories. There may also be more documentation, executable programs or source code
here. See below for an explanation on how to view or print them. You should inspect everything
in keinert and in the keinert/485 subdirectory. Most of the other subdirectories probably don’t
relate to this course.

A.5. Viewing Or Printing Files


The documentation will be in one of the following formats:
• Plain text files (ascii files). These files can be viewed on the terminal with the more or cat command,
and printed.
• TEX source and auxiliary files. These are files ending in .tex, .ltx, .alx, .aux, .log, .bib, .bbl,
.blg, .toc. You can ignore those.
• .dvi files. These files are meant to be viewed on the screen, using the xdvi program. You have to
type add tex first. Do not attempt to print these files unless you are looking for a source
of scratch paper. If I forget to put out a PostScript version, run them through the dvips program
before printing.
• PostScript files, ending in .ps. These files can be viewed on the screen, using the gs program (do
add ghost first), or printed. If the same file exists in a .dvi version, I recommend viewing that
instead. The quality is better, and you can go back to a previous page. However, .dvi files do not
show included graphics, if there are any.
If you are not sure what the format of a file is, use the unix file command.
Pictures will be in one of the following formats:
• .met files. This is matlab graphics output. Convert it to PostScript with the gpp -dps command.
• .pgm files. These are Portable Grayscale Maps. You can view them using xv, and save them in
PostScript form from inside xv.
• .ps files. You can view them with gs, and print them.

A.6. Electronic Communication


I encourage you to use electronic communication for questions and discussion about this course. The main
tools are electronic mail and newsgroups.
If you have a question about the material or an assignment, which may be of interest to other people in
the class as well, post it to isu.math.485. This way, I only have to answer the question once, and maybe
somebody else can answer it first. You will be responsible for knowing whatever is posted in this newsgroup.
If you have a personal question (like “when will you be in” or “how am I doing so far”), send a zephyrgram
or electronic mail to me. If I receive a question that seems to be of general interest, I may post your e-mail
and my answer to isu.math.485.
There is a list of all class members that I have access to. If it seems useful, I may make it available as a
general mailing list, so you can send e-mail to everybody in the class.

A.7. Matlab Documentation


If you are not familiar with matlab, you should print out file /home/mathclasses/doc/matlab/primer.ps.
Then, start up matlab and run through its built-in demo.
A matlab User’s Guide can be consulted in Durham 195.

A.8. Producing Pictures from Matlab


A.8.1. One-Dimensional Graphs. One-dimensional graphs are very easy to produce. Use the matlab
plotting commands to produce a graph on your screen. When you have it in the form you like, type
> meta foo
This will produce a file foo.met in your directory. You can save several pictures to a single file, or to different
files. After you are done with matlab, type
vincent% gpp -dps foo
This will produce a PostScript file foo.ps from foo.met. The PostScript file can be printed.

A.8.2. Two-Dimensional Images. I have added some subroutines to load and store Portable Grayscale
Maps (.pgm files) directly from matlab. These are simply two-dimensional arrays of integers between 0 and
255 whose values can be interpreted as gray scales (0 = black, 255 = white).
To load file foo.pgm into matlab, type
> A = loadpgm(’foo’);
Don’t forget the semicolon at the end, or you will find your terminal flooded with numbers. Now A is a
two-dimensional array with numbers between 0 and 255 that you can work with.
To save an array B as a .pgm file, first adjust the scaling. One way to do that is to plot a few rows or
columns as one-dimensional graphs, to see how the values are distributed. Pick out the range where you
think the action is, and scale that to [0, 255]. Values larger (smaller) than 255 (0) will be set to 255 (0)
during the writing, so you don’t need to worry about that.
To actually save it, type one of the following:
> savepgm(B,’bar’);
> savepgm(B);
The first form will save the picture in bar.pgm, the second form will save it in B.pgm.
You can view such a file with the xv program (see manual page). When the picture is displayed, click on
save and select PostScript. This will save it in PostScript form, which can then be printed.
A.9. Electronic Wavelet Resources
There is a vast amount of free information and software available on the Internet. Here is a list of what I
found:
• Björn Jawerth at the University of South Carolina edits the Wavelet Digest, a collection of wavelet-
related announcements that is distributed by e-mail periodically (about every two months). To re-
ceive future issues, send e-mail with subject subscribe to wavelet@math.scarolina.edu. Any text in
the message itself is ignored. Back issues of the digest can be found on maxwell.math.scarolina.edu
in directory pub/wavelet/archive.
• Bjorn Jawerth also maintains the wavelet section of the gopher at the University of South Car-
olina. Do add public first, then gopher bigcheese.math.scarolina.edu, and take it from there.
You will find general information, bibliography lists, pointers to wavelet software, and some
preprints.
• You can ask archie about wavelet-related stuff. Archie maintains a big database of public domain
software all over the world. File /home/mathclasses/keinert/485/archie.search contains the
results of my recent search.
APPENDIX B

Mathematical Background

This section contains some mathematical background material that we need.


Customarily, background material is put in the first chapter(s) of a book. The idea is that the reader
should acquire the background first (or verify that he/she already has it) before going on to the material
itself. I have often taught courses in this order, too. This makes for a nicely organized book or course, but
there is a long “dry stretch” right at the very beginning.
This time, I would like to get going on the material right away, and only introduce the background material
at the point where we really need it. It would still be nice to have all the non-wavelet background material
in one place, for easy reference, so I decided to put it in an appendix.
Some things will be stated with more mathematical rigor than we actually need for this course. If you
don’t understand some terms, ask me about it. Chances are the exact definition will not be that important
for this course.
Some theorems are labeled as facts. This means I consider them very well-known, so I won’t bother
proving them, and you should be able to find the proof in any standard textbook on the subject.
This appendix will probably grow later in the term, as I discover things I should have put here in the first
place.

B.1. Vector Spaces


Further reading on this section can be found in any standard book on linear algebra.
To avoid having to keep track of what type of numbers we are using at the moment, I will do everything
with complex numbers. Think of real numbers as a special case. Complex numbers are also called scalars in
this setting.
If z = x + iy is a complex number, its complex conjugate is z̄ = x − iy.
Z is the set of integers, R is the set of real numbers, and C the set of complex numbers.
Definition: A (complex) vector space V is a set of elements v, w, . . . called vectors, with two arithmetic
operations
• Addition:
v+w ∈ V if v, w ∈ V,
• Scalar Multiplication:
αv ∈ V if α ∈ C, v ∈ V,
with properties
• Commutativity:
v+w=w+v
• Associativity:

(v + w) + z = v + (w + z)
(αβ)v = α(βv)
• Distributivity:

α(v + w) = αv + αw
(α + β)v = αv + βv
• Zero Vector: There exists a vector 0 ∈ V such that
v+0=0+v =v for all v ∈ V
• Inverse Vector: For all v ∈ V , there exists a vector (−v) ∈ V such that
v + (−v) = (−v) + v = 0


Standard Examples:
(1) V = C^n, v = (v_1, ..., v_n)^T (a column vector), or v = {v_1, ..., v_n}, with componentwise addition and scalar multiplication.
(2) V = the space of infinite sequences of complex numbers, v = {v_j}_{j=−∞}^∞, again with componentwise addition and scalar multiplication.
(3) V = the space of complex-valued functions on [a, b], with pointwise addition and scalar multiplication:

    (v + w)(x) = v(x) + w(x)
    (αv)(x) = α v(x)

Definition: A linear combination of vectors v_1, ..., v_n is

    α_1 v_1 + ··· + α_n v_n = Σ_j α_j v_j

Definition: A collection {v_1, ..., v_n} of vectors is linearly dependent if one of them can be written as a linear combination of the others:

    v_k = Σ_{j≠k} α_j v_j   for some k.

Otherwise, they are linearly independent.

Definition: A basis of V is a collection of linearly independent vectors such that any v ∈ V can be
written as a linear combination of basis vectors.

Fact: Every basis contains the same number of vectors; this number is called the dimension of V .

Definition: A norm on a vector space V is a mapping which assigns to each v ∈ V a norm (or length) ‖v‖ with the properties
(1) ‖v‖ is real, ‖v‖ ≥ 0
(2) ‖v‖ = 0 ⟺ v = 0
(3) ‖αv‖ = |α| ‖v‖, α ∈ C
(4) ‖v + w‖ ≤ ‖v‖ + ‖w‖ (the triangle inequality)

Standard Examples:
(1) (The ℓ^p norms) Assume that v = {v_j} (a finite vector or infinite sequence). For 1 ≤ p < ∞, we define the p-norm by

    ‖v‖_p = ( Σ_j |v_j|^p )^{1/p}

The only special cases we need are

    ‖v‖_1 = Σ_j |v_j|
    ‖v‖_2 = ( Σ_j |v_j|² )^{1/2}
    ‖v‖_∞ = sup_j |v_j|

The infinity norm ‖v‖_∞ is not covered by the initial definition. It is defined as the limit of the p-norms as p goes to ∞, which turns out to be the formula given.
For finite vectors, all these norms are well-defined, but for infinite sequences they may not exist. Thus, we define

    ℓ^p = the set of all infinite sequences {v_j} for which ‖v‖_p < ∞.

For each 1 ≤ p ≤ ∞, this is a normed vector space, and a subspace of the space of all infinite sequences.
The sequences in ℓ¹ are called summable, the sequences in ℓ² are square summable, and the sequences in ℓ^∞ are bounded.
(2) (The Lᵖ norms) Assume that v is a function on some interval [a, b] (which could be finite or infinite). For 1 ≤ p < ∞, we define the p-norm by
‖v‖p = ( ∫_a^b |v(x)|ᵖ dx )^{1/p}
The only special cases we need are
‖v‖₁ = ∫_a^b |v(x)| dx
‖v‖₂ = √( ∫_a^b |v(x)|² dx )
‖v‖∞ = ess sup |v(x)|
Again, the infinity norm is the limit of the p-norms. Ess sup stands for essential supremum, which means the supremum except for sets of measure zero. If you don't know what that means, don't worry about it.
Again, for any given function, a p-norm may or may not exist. We introduce the spaces
Lᵖ[a, b] = the set of all functions v on [a, b] for which ‖v‖p < ∞.
Again, for 1 ≤ p ≤ ∞, these are vector spaces, and subspaces of the set of all functions on [a, b]. If the interval is not given, it is R:
L² = L²(R).
The functions in L¹ are integrable, the functions in L² are square integrable, and the functions in L∞ are (essentially) bounded.
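To see the definitions in action, here is a small Matlab sketch (an illustration added to this appendix; the vector and the function are made up). The ℓᵖ norms are built into the norm command, and an Lᵖ norm can be approximated by numerical quadrature:

    % l^p norms of a finite vector
    v = [3 -4 0 1];
    norm(v, 1)      % sum of |v_j|             -> 8
    norm(v, 2)      % sqrt(sum of |v_j|^2)     -> sqrt(26)
    norm(v, Inf)    % max |v_j|                -> 4

    % L^2[0,1] norm of v(x) = x, approximated by the trapezoidal rule;
    % the exact value is 1/sqrt(3)
    x = linspace(0, 1, 1001);
    sqrt(trapz(x, x.^2))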
Definition: An inner product on a vector space is a mapping which assigns to each pair v, w ∈ V a complex number ⟨v, w⟩ such that
⟨αv + βw, z⟩ = α⟨v, z⟩ + β⟨w, z⟩
⟨v, αw + βz⟩ = ᾱ⟨v, w⟩ + β̄⟨v, z⟩
⟨v, w⟩ is the complex conjugate of ⟨w, v⟩
⟨v, v⟩ ≥ 0 for all v
⟨v, v⟩ = 0 ⇐⇒ v = 0

Standard Examples:
(1) Cⁿ and ℓ² have the inner product
⟨v, w⟩ = Σj vj w̄j
(2) L²[a, b] has the inner product
⟨v, w⟩ = ∫_a^b v(x) w̄(x) dx
Fact: Every inner product defines a norm by
‖v‖ = √⟨v, v⟩
The 2-norms in ℓ² and L² come from the standard inner products defined above.
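In Matlab (again an added illustration), the conjugation convention matters: for column vectors, w'*v computes Σj vj w̄j , which is ⟨v, w⟩ in the convention above, and √⟨v, v⟩ reproduces the built-in norm:

    v = [1; 2i]; w = [3; -1];
    w' * v                    % <v,w> = sum v_j conj(w_j) = 3 - 2i
    sqrt(v' * v) - norm(v)    % zero: the inner product induces the 2-norm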

B.2. Frames and Biorthogonal and Orthonormal Bases


The original definition of a frame comes from Duffin and Schaeffer [DS52]. Further reading on frames in the context of wavelet theory can be found in section 3.5 of Chui [Chu92] and the articles by Benedetto [Ben92] and Heil [Hei90].
B.2.1. Orthonormal Bases. Two vectors v, w are orthonormal if they are orthogonal, that is,
⟨v, w⟩ = 0,
and they both have length 1:
‖v‖₂ = ‖w‖₂ = 1.
Likewise, a collection {v1 , . . . , vn } of vectors is orthonormal if they are pairwise orthogonal, and all have length 1:
⟨vi , vj ⟩ = δij ,
where δij = 1 if i = j and δij = 0 if i ≠ j (δij is called the Kronecker delta). Among all possible bases of a vector space V , orthonormal bases are especially popular because of their many nice properties and their numerical stability.
Assume that V is a vector space with an inner product, and with a basis {vj }. Any basis has the property that any f ∈ V can be written in a unique way as a linear combination of basis vectors
f = Σj αj vj .
However, the αj may be hard to calculate, and may not be bounded. For an orthonormal basis, both of these problems disappear:
αj = ⟨f, vj ⟩
and
Σj |αj |² ≤ ‖f‖₂²  (Bessel's Inequality)

Remark: Because of convergence problems, it is in general required that any vector can be written as a finite linear combination of basis vectors. For orthonormal bases, Bessel's inequality provides convergence, so infinite linear combinations are also allowed.

Thus, in an orthonormal basis, any f ∈ V can be written as
f = Σj ⟨f, vj ⟩ vj .

More generally, if {v1 , . . . , vn } is a subset of {vj }, then
P f = Σ_{j=1}^{n} ⟨f, vj ⟩ vj
is the orthogonal projection of f onto the subspace spanned by {v1 , . . . , vn }.


Example: Let V be the space of real polynomials on [−1, 1]. The standard basis is {1, x, x², x³, . . . }, which is not orthonormal:
⟨xʲ, xᵏ⟩ = ∫_{−1}^{1} xʲxᵏ dx = 2/(j + k + 1) if j + k is even, and 0 if j + k is odd.
An orthonormal basis is given by the Legendre polynomials (normalized to have length 1).

Example: Take V = Cᴺ, with basis {v0 , . . . , vN−1 }, where the kth component of vj is
(vj )k = (1/√N) e^{i(2π/N)jk} .
One can check that this basis is orthonormal.
The expansion of a vector f in this basis is the Discrete Fourier Transform:
⟨f, vk ⟩ = (1/√N) Σ_{j=0}^{N−1} fj e^{−i(2π/N)jk} = (1/√N) f¯k
f = Σ_{k=0}^{N−1} ⟨f, vk ⟩ vk = (1/√N) Σ_{k=0}^{N−1} f¯k vk
fj = (1/N) Σ_{k=0}^{N−1} f¯k e^{i(2π/N)jk}
(Note: f¯k stands for the kth Fourier coefficient of f, not for the complex conjugate of fk . This is the notation used in the section on Fourier Transforms below.)
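As a numerical check (a sketch added here, not part of the original text), the basis can be stored as the columns of a matrix; orthonormality and the connection with Matlab's fft command, which uses the same exponentials, are then one-liners:

    N = 8;
    [j, k] = meshgrid(0:N-1, 0:N-1);
    V = exp(2i*pi*j.*k/N) / sqrt(N);   % column j+1 holds the basis vector v_j
    norm(V'*V - eye(N))                % ~ 0: the basis is orthonormal
    f = randn(N, 1);
    c = V' * f;                        % coefficients <f, v_k>
    norm(V*c - f)                      % ~ 0: f = sum_k <f,v_k> v_k
    norm(c - fft(f)/sqrt(N))           % ~ 0: same as the DFT up to scaling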

Example: Likewise, for V = L²[0, L], the basis
vj (x) = (1/√L) e^{i(2π/L)jx} ,  j ∈ Z,
is orthonormal, and leads to Fourier Series.

Remark: The Continuous Fourier Transform does not fit into this pattern, since the functions e^{ixξ} are not in L².

B.2.2. Biorthogonal Bases. Sometimes, orthonormal bases are not available or not convenient to work with. A generalization is the concept of a biorthogonal basis (actually, a pair of bases).
Here, we have two different bases {vj }, {wj } that are mutually orthogonal. This means that {vj } by itself is a basis, so is {wj } by itself, and
⟨vj , wk ⟩ = δjk .
To avoid convergence problems, we demand that {vj }, {wj } are Riesz bases. This means that there are constants 0 < A ≤ B (the Riesz bounds) such that for any sequence of coefficients {αj } ∈ ℓ²
A‖α‖₂² ≤ ‖ Σj αj vj ‖₂² ≤ B‖α‖₂² ,
and likewise for {wj }.


In this setting, any f ∈ V can be written
f = Σj ⟨f, vj ⟩ wj = Σj ⟨f, wj ⟩ vj .
Biorthogonal bases are not as numerically stable as orthonormal ones, but just as easy to work with.
Example: Take V = R², and choose v1 = (1, 0)ᵀ, v2 = (1, 1)ᵀ.
The condition ⟨vj , wk ⟩ = δjk can be read as follows: if V is the matrix with rows vj , then the matrix W with columns wk must be V⁻¹. (Incidentally, this proves that the {wj } are uniquely determined from the {vj }, and that we can find a biorthogonal basis {wj } for any basis {vj }.)
Anyway,
V = [ 1 0 ; 1 1 ],  W = V⁻¹ = [ 1 0 ; −1 1 ]
(rows separated by semicolons), so w1 = (1, −1)ᵀ, w2 = (0, 1)ᵀ.
As an example, take f = (1, 2)ᵀ. Then
f = ⟨f, v1 ⟩w1 + ⟨f, v2 ⟩w2 = 1 · w1 + 3 · w2
f = ⟨f, w1 ⟩v1 + ⟨f, w2 ⟩v2 = (−1) · v1 + 2 · v2
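The matrix formulation translates directly into Matlab. The following sketch (added here) reproduces the numbers of this example:

    V = [1 0; 1 1];             % rows are v_1, v_2
    W = inv(V);                 % columns are w_1 = (1,-1)', w_2 = (0,1)'
    f = [1; 2];
    a = V * f                   % <f,v_j>: gives [1; 3]
    b = W' * f                  % <f,w_j>: gives [-1; 2]
    W * a - f                   % zero: f = sum_j <f,v_j> w_j
    V' * b - f                  % zero: f = sum_j <f,w_j> v_j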

B.2.3. Frames. Sometimes, biorthogonal bases are not available either. We would like to preserve the property
f = Σj ⟨f, vj ⟩ wj = Σj ⟨f, wj ⟩ vj
while giving up orthogonality, and even linear independence.
A sequence {vj } is a frame with frame bounds 0 < A ≤ B if for any f ∈ V
A‖f‖₂² ≤ Σj |⟨f, vj ⟩|² ≤ B‖f‖₂² .

If A = B, the frame is called tight. The frame is exact if it is not a frame any more whenever a single element
is deleted.
The definition implies that a frame spans the whole space. That is, every f ∈ V can be written as a linear
combination of frame vectors. However, the linear combination is not unique any more, so a frame is usually
not a basis. Only exact frames are also bases.
Fact: If {vj } is a frame with frame bounds A, B, then there exists a dual frame {wj } with frame bounds 1/B, 1/A such that
f = Σj ⟨f, vj ⟩ wj = Σj ⟨f, wj ⟩ vj  for all f ∈ V.
Moreover, the coefficients in these expansions are the smallest possible in the sense that if
f = Σj ⟨f, wj ⟩ vj = Σj αj vj
are two different expansions of f, then
Σj |⟨f, wj ⟩|² ≤ Σj |αj |² ,
and likewise for the {wj }.

Example: Take V = R², and the frame v1 = (1, 0)ᵀ, v2 = (1, 1)ᵀ, v3 = (−1, 1)ᵀ. Frame bounds are 2 and 3 (prove that!), and the dual frame is w1 = (1/3, 0)ᵀ, w2 = (1/3, 1/2)ᵀ, w3 = (−1/3, 1/2)ᵀ.
For f = (1, 2)ᵀ,
f = ⟨f, v1 ⟩w1 + ⟨f, v2 ⟩w2 + ⟨f, v3 ⟩w3 = 1 · w1 + 3 · w2 + 1 · w3
f = ⟨f, w1 ⟩v1 + ⟨f, w2 ⟩v2 + ⟨f, w3 ⟩v3 = 1/3 · v1 + 4/3 · v2 + 2/3 · v3
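For a finite frame, both the frame bounds and the dual frame can be computed from the frame operator S = Σj vj vjᵀ; it is a standard fact (used here without proof, and not stated elsewhere in these notes) that the optimal bounds are the smallest and largest eigenvalues of S, and that wj = S⁻¹vj . A Matlab sketch of this example, added for reference:

    F = [1 0; 1 1; -1 1];       % rows are v_1, v_2, v_3
    S = F' * F;                 % frame operator, here diag(3,2)
    eig(S)                      % frame bounds: 2 and 3
    D = F / S;                  % rows are the dual frame w_j = S^{-1} v_j
    f = [1; 2];
    D' * (F*f) - f              % zero: f = sum_j <f,v_j> w_j
    F' * (D*f) - f              % zero: f = sum_j <f,w_j> v_j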
B.3. Fourier Series and Fourier Transform
Further reading on Fourier Series and Fourier Transforms can be found in many books. A brief review of
relevant Fourier Transform theory is also contained in Chui’s book on wavelets [Chu92].
The Fourier Transform (FT) comes in three flavors:
• The Continuous Fourier Transform (CFT), or simply Fourier Transform (FT), which maps a func-
tion on R into a function on R;
• the Periodic Fourier Transform (PFT) or Fourier Series (FS), which maps a function defined on a
finite interval into an infinite sequence;
• The Discrete Fourier Transform (DFT), which maps a finite sequence into a finite sequence.
We will discuss each one separately, and see how they relate to one another. Usually, the symbol fˆ is
used for all of these transforms, but in the present chapter this would cause confusion. I will use fˆ for the
CFT, f˜ for the PFT, and f¯ for the DFT (since there are no complex conjugates in this section, this should
not cause any confusion).

B.3.1. The Continuous Fourier Transform. Assume that f is a complex-valued function defined on R. Its Fourier Transform fˆ is defined as
fˆ(ξ) = (1/√(2π)) ∫_{−∞}^{∞} e^{−ixξ} f(x) dx.
If f ∈ L¹, the integral makes sense for every value of ξ, and fˆ is a continuous, bounded function which goes to zero at infinity (this last fact is called the Riemann–Lebesgue Lemma).
The Fourier Transform is also defined for f ∈ L². In this case, the integral may not be defined in the usual sense. One way to define it is
fˆ(ξ) = (1/√(2π)) lim_{R→∞} ∫_{−R}^{R} e^{−ixξ} f(x) dx.
In this case, fˆ is also in L².
If f ∈ L¹ or L², it can be written in terms of its Fourier Transform as
f(x) = (1/√(2π)) ∫_{−∞}^{∞} e^{ixξ} fˆ(ξ) dξ,
where the integral may need to be interpreted as a limit of finite integrals, as before.
where the integral may need to be interpreted as a limit of finite integrals, as before.
In general, every locally integrable function f has a Fourier Transform, but fˆ may not be a function any
more, rather a generalized function or distribution. We will assume from now on that all functions have
suitable integrability properties so that all Fourier Transforms (and other operations that show up) are well
defined.
Some properties of the Fourier Transform are
• The Fourier Transform preserves L²-norms and inner products (this is called the Parseval–Plancherel Theorem). Thus, if f, g ∈ L², then
⟨f, g⟩ = ⟨fˆ, ĝ⟩
‖fˆ‖₂ = ‖f‖₂
• The Fourier Transform turns convolutions into products, and vice versa. The convolution of f, g is defined as
(f ∗ g)(x) = ∫ f(y) g(x − y) dy = ∫ f(x − y) g(y) dy.
We find
(f ∗ g)ˆ(ξ) = √(2π) fˆ(ξ) ĝ(ξ)
(f · g)ˆ(ξ) = (1/√(2π)) (fˆ ∗ ĝ)(ξ)
• The Fourier Transform turns translation into modulation, and vice versa.
The translation of f by a ∈ R is defined as
Ta f(x) = f(x − a).
The modulation of f by a ∈ R is defined as
Ea f(x) = e^{iax} f(x).
We get
(Ta f)ˆ(ξ) = e^{−iaξ} fˆ(ξ) = E−a fˆ(ξ)
(Ea f)ˆ(ξ) = fˆ(ξ − a) = Ta fˆ(ξ)
• The Fourier Transform turns dilations into inverse dilations. The dilation of f by s ∈ R, s ≠ 0, is given by
Ds f(x) = |s|^{−1/2} f(x/s).
The factor in front is chosen so that ‖Ds f‖₂ = ‖f‖₂. The Fourier Transform relationship is
(Ds f)ˆ(ξ) = |s|^{1/2} fˆ(sξ) = D1/s fˆ(ξ).
• The Fourier Transform turns differentiation into multiplication by iξ, and vice versa.
(f′)ˆ(ξ) = iξ fˆ(ξ)
(xf(x))ˆ(ξ) = i fˆ′(ξ)
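The convolution theorem has a discrete analogue that is easy to experiment with (a sketch added here; in the discrete normalization the √(2π) factor disappears): the Discrete Fourier Transform of Section B.3.3 turns circular convolution into a pointwise product.

    N = 8; f = randn(1, N); g = randn(1, N);
    c = zeros(1, N);                        % circular convolution, computed directly
    for n = 0:N-1
      for m = 0:N-1
        c(n+1) = c(n+1) + f(m+1) * g(mod(n-m, N)+1);
      end
    end
    norm(c - ifft(fft(f) .* fft(g)))        % ~ 0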

B.3.2. The Periodic Fourier Transform (or Fourier Series). Assume that f is completely described by its values in a finite interval [0, L]. It does not matter whether we consider f to be zero outside the interval, or periodic with period L:
f(x + kL) = f(x),  k ∈ Z.
The Periodic Fourier Transform f˜ of f is an infinite sequence {f˜k } defined by
f˜k = ∫_0^L e^{−i(2π/L)kx} f(x) dx .
The inverse of the Periodic Fourier Transform is given by
f(x) = (1/L) Σk e^{i(2π/L)kx} f˜k ,  x ∈ [0, L].
This is also known as a Fourier Series.


The Periodic Fourier Transform has many properties similar to those of the Continuous Fourier Transform,
but I won’t list them all separately here.

B.3.3. The Discrete Fourier Transform. Assume that f is a sequence of length N : f = {f0 , f1 , . . . , fN−1 }. The Discrete Fourier Transform f¯ of f is a sequence of the same length, given by
f¯k = Σj e^{−i(2π/N)jk} fj .
Its inverse is
fj = (1/N) Σk e^{i(2π/N)jk} f¯k .
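These are exactly the formulas implemented by Matlab's fft and ifft commands, with all indices shifted by 1 (Matlab arrays start at index 1) and the 1/N factor supplied by ifft. A quick check (added here):

    f = [1 2 3 4];
    fbar = fft(f);          % fbar(k+1) = sum_j f(j+1) exp(-2*pi*i*j*k/N)
    ifft(fbar) - f          % zero up to roundoff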
B.3.4. The Fourier Transform of some basic functions.
• The characteristic function χS of a set S is the function which has the value 1 on this set, and 0 otherwise. In particular
χ[−1,1] (x) = 1 if x ∈ [−1, 1], and 0 otherwise.
Its Fourier Transform is the sinc function:
(χ[−1,1] )ˆ(ξ) = √(2/π) sinc ξ = √(2/π) (sin ξ)/ξ
((sin x)/x)ˆ(ξ) = √(π/2) χ[−1,1] (ξ)

• The Gaussian with standard deviation σ is the function
gσ (x) = (1/(√(2π) σ)) e^{−x²/(2σ²)} .
Its Fourier Transform is another Gaussian:
(gσ )ˆ(ξ) = (1/√(2π)) e^{−ξ²σ²/2} .

• The δ-function is not really a function, but a generalized function or distribution. It is defined not in terms of its values at individual points, but only in terms of its integrals:
∫_a^b δ(x) dx = 1 if 0 ∈ (a, b), and 0 otherwise.
If f is a continuous function, δ picks out the value at 0:
∫_a^b f(x) δ(x) dx = f(0) if 0 ∈ (a, b), and 0 otherwise.
The shifted version δ(x − a) picks out the value at a:
∫ f(x) δ(x − a) dx = f(a)
(if the interval of integration includes a).

δ(x) is usually graphed as an arrow.
Other properties of δ include
δ(sx) = (1/|s|) δ(x),
δ̂(ξ) = (1/√(2π)) · 1,
1̂(ξ) = √(2π) δ(ξ),
where 1 is the function which has the value 1 everywhere.


From a mathematical standpoint, δ and other distributions are tricky to handle, and all the usual differentiation and integration rules need to be justified. However, for the purposes of this course, we can assume this has been done, and treat δ like a regular function.
• The Ш-function (this is the Russian letter sha) is an infinite string of δ-functions, so it is again a distribution:
Ш(x) = Σj δ(x − j)
This "function" is useful for working with series of equally spaced points, and for extending functions in a periodic way. Its main properties are
Шh (x) = Ш(x/h) = Σj δ(x/h − j) = Σj δ((x − jh)/h) = h Σj δ(x − jh)
⟨f, Ш⟩ = Σj f(j)
⟨f, Шh ⟩ = h Σj f(jh)
Шˆ(ξ) = (1/√(2π)) Ш(ξ/(2π)) = (1/√(2π)) Ш_{2π}(ξ)

B.3.5. Relationship between the Continuous and Periodic Fourier Transform. Assume that f is a function which is 0 outside the interval [0, L], and that fp is the periodic extension of the values of f on [0, L]:
fp (x) = f(y) if x = y + jL for some j ∈ Z, y ∈ [0, L].

Then we can take the Fourier Transform of f or fp , or the Periodic Fourier Transform of fp . These are related by
(fp )˜k = √(2π) fˆ((2π/L)k)
fˆp (ξ) = fˆ(ξ) · Ш(Lξ/(2π))
In words: the coefficients of the Fourier series are the values of fˆ at equally spaced points (up to a factor of √(2π)). The Fourier Transform of fp is a sequence of equally spaced δ-functions, each of them multiplied by the value of fˆ at that point.

[Figure: the values of the CFT fˆ at the equally spaced points (2π/L)k, spacing 2π/L, give the PFT coefficients of fp .]

Observe that the inverse Fourier Transform of fˆp turns into the Fourier Series: For x ∈ [0, L],
f(x) = fp (x) = (1/√(2π)) ∫_{−∞}^{∞} e^{ixξ} fˆp (ξ) dξ
= (1/√(2π)) ∫_{−∞}^{∞} e^{ixξ} fˆ(ξ) Ш(Lξ/(2π)) dξ
= (1/√(2π)) (2π/L) Σj ∫_{−∞}^{∞} e^{ixξ} fˆ(ξ) δ(ξ − (2π/L)j) dξ
= (1/L) Σj e^{i(2π/L)xj} √(2π) fˆ((2π/L)j) = (1/L) Σj e^{i(2π/L)xj} (fp )˜j .

Since the original f can be recovered from either its Fourier Transform fˆ or its Fourier coefficients {f˜k }
(which are the values of fˆ at equally spaced points), we would expect that the Fourier Transform fˆ is
completely determined by its values at these points.
This is expressed by the Shannon Sampling Theorem, which is usually stated the other way around: we
assume it is the Fourier Transform which is limited to a finite interval, and it is the original function which
is determined by its values at equally spaced points. Of course, everything in Fourier theory is symmetric, so
you can interpret this theorem either way. Also, the finite interval is usually taken to be symmetric around
the origin.
The function f is called bandlimited with bandwidth b if fˆ(ξ) = 0 outside the interval [−b, b]. The Shannon Sampling Theorem states that if f is bandlimited with bandwidth b, then
f(x) = Σk f(hk) sinc((π/h)(x − hk))  if h ≤ π/b.

Thus, f is uniquely determined by its samples with any spacing below a certain threshold. The threshold
π/b is called the Nyquist frequency.
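Here is a numerical illustration (added here; the test function is arbitrary): f(x) = (sin x/x)² is bandlimited with bandwidth b = 2, so it can be rebuilt from samples with spacing h = π/2. Since the infinite sum must be truncated, the reconstruction is only approximate.

    s = @(t) sin(t)./(t + (t==0)) + (t==0);   % sinc(t) = sin(t)/t, with value 1 at t = 0
    f = @(x) s(x).^2;                         % bandlimited, bandwidth b = 2
    b = 2; h = pi/b;
    x = linspace(-10, 10, 501);
    fr = zeros(size(x));
    for k = -200:200                          % truncation of the infinite sum
      fr = fr + f(k*h) * s(pi/h * (x - k*h));
    end
    max(abs(fr - f(x)))                       % small truncation error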

B.3.6. Relationship between the Periodic and Discrete Fourier Transform. Assume that f is
a function defined on [0, L], and that fs = {f0 , . . . , fN−1 } is a vector of N equally spaced samples of f with
spacing h = L/N .
Then we can take the Periodic Fourier Transform of f, or the Discrete Fourier Transform of fs . How do
the two relate? The first observation is that the discrete Fourier coefficients can be interpreted as numerical
approximations to the corresponding periodic Fourier coefficients, by replacing the integral by a numerical
quadrature rule (the trapezoidal rule). In more detail:

f˜k = ∫_0^L f(x) e^{−i(2π/L)xk} dx ≈ h Σj f(jh) e^{−i(2π/L)jkh}
= h Σj f(jh) e^{−i(2π/N)jk} = h (fs )¯k .

If the integrand is smooth, and h is fairly small, this should give a good approximation. However, for large
k, the exponential term becomes more and more oscillatory, and the accuracy goes down.
A better characterization is available from the Poisson Summation Formula, which states (for all f, not just f limited to [0, L])
Σj fˆ(ξ + (2π/h)j) = (1/√(2π)) h Σj f(jh) e^{−ihjξ} .

For ξ = (2π/L)k this gives us
(fs )¯k = Σj f(jh) e^{−i(2π/N)jk} = Σj f(jh) e^{−i(2π/L)hjk}
= Σj f(jh) e^{−ihjξ} = (√(2π)/h) Σj fˆ((2π/L)k + (2π/L)N j)
= (√(2π)/h) fˆ((2π/L)k) + (√(2π)/h) Σ_{j≠0} fˆ((2π/L)k + (2π/L)N j).

That is, the value of (fs )¯k is a multiple of the true value of fˆ at (2π/L)k (which is a multiple of f˜k ), PLUS all values of fˆ at points that are multiples of (2π/L)N away from (2π/L)k. This is called aliasing: the Discrete Fourier Transform contains only frequencies up to (2π/L)N ; higher frequencies are shifted down to this range.

[Figure: aliasing. The DFT samples the CFT fˆ; all values of fˆ spaced (2π/L)N apart are folded onto the same coefficient.]
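Aliasing is easy to observe numerically (a sketch added here): on N sample points, the frequencies k and k + N produce identical samples, so their Discrete Fourier Transforms coincide.

    N = 16; x = (0:N-1)'/N;           % N samples on [0,1), so L = 1
    f1 = exp(2i*pi*3*x);              % frequency 3
    f2 = exp(2i*pi*(3+N)*x);          % frequency 19
    norm(fft(f1) - fft(f2))           % zero: frequency 19 aliases to 3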
B.4. Bibliography
[Ben92] John Benedetto, Irregular sampling and frames, Wavelets: A Tutorial in Theory and Applications
(Charles K. Chui, ed.), Wavelet Analysis and Its Applications, vol. 2, Academic Press, 1992, pp. 445–
508.
[Chu92] Charles K. Chui, An introduction to wavelets, Wavelet Analysis and Its Applications, vol. 1, Aca-
demic Press, Boston, 1992.
[DS52] R. J. Duffin and A. C. Schaeffer, A class of nonharmonic Fourier series, Trans. Amer. Math. Soc.
72 (1952), 341–366.
[Hei90] Christopher Heil, Wavelets and frames, Signal Processing, Part I: Signal Processing Theory (New
York), IMA, vol. 22, Springer, New York, 1990, pp. 147–160.
APPENDIX C

Wavelet Coefficients

This appendix lists the coefficients for some of the most commonly used wavelets: The Daubechies wavelets,
and the Coiflets. These tables are taken from Daubechies’ book [Dau92] and included here only for easier
reference.
[Dau92] I. Daubechies, Ten lectures on wavelets, CBMS-NSF Regional Conference Series in Applied Math-
ematics, vol. 61, SIAM, Philadelphia, 1992.
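As a quick sanity check on the tables (added here): with the normalization used in these notes, orthonormal scaling coefficients satisfy Σk hk = √2 and Σk hk hk+2m = δm0 . For the N = 2 Daubechies coefficients, in Matlab:

    h = [ 0.4829629131445341  0.8365163037378077 ...
          0.2241438680420134 -0.1294095225512603 ];
    sum(h) - sqrt(2)        % ~ 0
    sum(h.^2) - 1           % ~ 0
    h(1:2) * h(3:4)'        % ~ 0 (orthogonality to the shift by 2)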


Table C.1. Coefficients for Daubechies wavelets

k hk k hk
N=1 0 0.7071067811865476 N =8 0 0.0544158422431072
1 0.7071067811865476 1 0.3128715909143166
N=2 0 0.4829629131445341 2 0.6756307362973195
1 0.8365163037378077 3 0.5853546836542159
2 0.2241438680420134 4 -0.0158291052563823
3 -0.1294095225512603 5 -0.2840155429615824
N=3 0 0.3326705529500825 6 0.0004724845739124
1 0.8068915093110924 7 0.1287474266204893
2 0.4598775021184914 8 -0.0173693010018090
3 -0.1350110200102546 9 -0.0440882539307971
4 -0.0854412738820267 10 0.0139810279174001
5 0.0352262918857095 11 0.0087460940474065
N=4 0 0.2303778133088964 12 -0.0048703529934520
1 0.7148465705529154 13 -0.0003917403733770
2 0.6308807679398587 14 0.0006754494064506
3 -0.0279837694168599 15 -0.0001174767841248
4 -0.1870348117190931 N =9 0 0.0380779473638778
5 0.0308413818355607 1 0.2438346746125858
6 0.0328830116668852
7 -0.0105974017850690
N=5 0 0.1601023979741929 4 0.1331973858249883
1 0.6038292697971895 5 -0.2932737832791663
2 0.7243085284377726 6 -0.0968407832229492
3 0.1384281459013203 7 0.1485407493381256
4 -0.2422948870663823 8 0.0307256814793385
5 -0.0322448695846381 9 -0.0676328290613279
6 0.0775714938400459 10 0.0002509471148340
7 -0.0062414902127983 11 0.0223616621236798
8 -0.0125807519990820 12 -0.0047232047577518
9 0.0033357252854738 13 -0.0042815036824635
N=6 0 0.1115407433501095 14 0.0018476468830563
1 0.4946238903984533 15 0.0002303857635232
2 0.7511339080210959 16 -0.0002519631889427
3 0.3152503517091982 17 0.0000393473203163
4 -0.2262646939654400 N = 10 0 0.0266700579005473
5 -0.1297668675672625 1 0.1881768000776347
6 0.0975016055873225 2 0.5272011889315757
7 0.0275228655303053 3 0.6884590394534363
8 -0.0315820393174862 4 0.2811723436605715
9 0.0005538422011614 5 -0.2498464243271598
10 0.0047772575109455 6 -0.1959462743772862
11 -0.0010773010853085 7 0.1273693403357541
N=7 0 0.0778520540850037 8 0.0930573646035547
1 0.3965393194818912 9 -0.0713941471663501
2 0.7291320908461957 10 -0.0294575368218399
3 0.4697822874051889 11 0.0332126740593612
4 -0.1439060039285212 12 0.0036065535669870
5 -0.2240361849938412 13 -0.0107331754833007
6 0.0713092192668272 14 0.0013953517470688
7 0.0806126091510774 15 0.0019924052951925
8 -0.0380299369350104 16 -0.0006858566949564
9 -0.0165745416306655 17 -0.0001164668551285
10 0.0125509985560986 18 0.0000935886703202
11 0.0004295779729214 19 -0.0000132642028945
12 -0.0018016407040473
13 0.0003537137999745

Table C.2. Coefficients for Coiflets

k hk k hk
N =2 -2 -0.051429728471 N =8 -8 0.000630961046
-1 0.238929728471 -7 -0.001152224852
0 0.602859456942 -6 -0.005194524026
1 0.272140543058 -5 0.011362459244
2 -0.051429728471 -4 0.018867235378
3 -0.011070271529 -3 -0.057464234429
N =4 -4 0.011587596739 -2 -0.039652648517
-3 -0.029320137980 -1 0.293667390895
-2 -0.047639590310 0 0.553126452562
-1 0.273021046535 1 0.307157326198
0 0.574682393857 2 -0.047112738865
1 0.294867193696 3 -0.068038127051
2 -0.054085607092 4 0.027813640153
3 -0.042026480461 5 0.017735837438
4 0.016744410163 6 -0.010756318517
5 0.003967883613 7 -0.004001012886
6 -0.001289203356 8 0.002652665946
7 -0.000509505399 9 0.000895594529
N =6 -6 -0.002682418671 10 -0.000416500571
-5 0.005503126709 11 -0.000183829769
-4 0.016583560479 12 0.000044080453
-3 -0.046507764479 13 0.000022082857
-2 -0.043220763560 14 -0.000002304942
-1 0.286503335274 15 -0.000001262175
0 0.561285256870 N = 10 -10 -0.0001499638
1 0.302983571773 -9 0.0002535612
2 -0.050770140755 -8 0.0015402457
3 -0.058196250762 -7 -0.0029411108
4 0.024434094321 -6 -0.0071637819
5 0.011229240962 -5 0.0165520664
6 -0.006369601011 -4 0.0199178043
7 -0.001820458916 -3 -0.0649972628
8 0.000790205101 -2 -0.0368000736
9 0.000329665174 -1 0.2980923235
10 -0.000050192775 0 0.5475054294
11 -0.000024465734 1 0.3097068490
2 -0.0438660508
3 -0.0746522389
4 0.0291958795
5 0.0231107770
6 -0.0139736879
7 -0.0064800900
8 0.0047830014
9 0.0017206547
10 -0.0011758222
11 -0.0004512270
12 0.0002137298
13 0.0000993776
14 -0.0000292321
15 -0.0000150720
16 0.0000026408
17 0.0000014593
18 -0.0000001184
19 -0.0000000673
Bibliography
[Alp92] Bradley Alpert, Wavelets and other bases for fast numerical linear algebra, Wavelets: A Tutorial
in Theory and Applications (San Diego) (C. K. Chui, ed.), Wavelet Analysis and its Applications,
vol. 2, Academic Press, San Diego, 1992, pp. 181–216.
[Bat87] Guy Battle, A block spin construction of ondelettes. Part I: Lemarié functions, Comm. Math.
Phys. 110 (1987), 601–615.
[BCR91] Gregory Beylkin, Ravi Coifman, and Vladimir Rokhlin, Fast wavelet transforms and numerical
algorithms: I, Comm. Pure Appl. Math. 44 (1991), no. 2, 141–183.
[Ben92] John Benedetto, Irregular sampling and frames, Wavelets: A Tutorial in Theory and Applica-
tions (Charles K. Chui, ed.), Wavelet Analysis and Its Applications, vol. 2, Academic Press,
1992, pp. 445–508.
[CDF92] A. Cohen, Ingrid Daubechies, and J.-C. Feauveau, Biorthogonal bases of compactly supported
wavelets, Comm. Pure Appl. Math. 45 (1992), 485–560.
[CDV] A. Cohen, I. Daubechies, and P. Vial, Wavelets and fast wavelet transform on the interval,
preprint.
[Chu92] Charles K. Chui, An introduction to wavelets, Wavelet Analysis and Its Applications, vol. 1,
Academic Press, Boston, 1992.
[CMQW91] Ronald R. Coifman, Yves Meyer, Steven Quake, and Mladen Victor Wickerhauser, Signal pro-
cessing and compression with wave packets, Proceedings of the Marseilles Wavelet Meeting 1989,
1991.
[Cod92] Mac A. Cody, The fast wavelet transform: Beyond Fourier transforms, Dr. Dobb’s J. (1992),
16–28, 100–101.
[Dau88] Ingrid Daubechies, Orthonormal bases of compactly supported wavelets, Comm. Pure Appl.
Math. 41 (1988), 909–996.
[Dau92] I. Daubechies, Ten lectures on wavelets, CBMS-NSF Regional Conference Series in Applied
Mathematics, vol. 61, SIAM, Philadelphia, 1992.
[Dau93] Ingrid Daubechies, Orthonormal bases of compactly supported wavelets II: Variations on a theme,
SIAM J. Math. Anal. 24 (1993), no. 2, 499–519.
[DS52] R. J. Duffin and A. C. Schaeffer, A class of nonharmonic Fourier series, Trans. Amer. Math.
Soc. 72 (1952), 341–366.
[Chu92b] Charles K. Chui (ed.), Wavelets: A tutorial in theory and applications, Wavelet Analysis and Its Applications, vol. 2, Academic Press, Boston, 1992.
[Far92] Marie Farge, Wavelet transforms and their application to turbulence, Annual Review of Fluid
Mechanics 24 (1992), 395–457.
[HBB92] F. Hlawatsch and G. F. Boudreaux-Bartels, Linear and quadratic time-frequency signal representations, IEEE Signal Proc. Mag. (1992), 21–67.
[Hei90] Christopher Heil, Wavelets and frames, Signal Processing, Part I: Signal Processing Theory
(New York), IMA, vol. 22, Springer, New York, 1990, pp. 147–160.
[JS] Björn Jawerth and Wim Sweldens, An overview of wavelet based multiresolution analyses, avail-
able by anonymous ftp from maxwell.math.scarolina.edu as /pub/wavelet/papers/overview.ps.
[Kei] F. Keinert, Numerical stability of biorthogonal wavelets, in preparation.
[Lem88] P. G. Lemarié, Une nouvelle base d'ondelettes de L²(Rⁿ), J. Math. Pures Appl. 67 (1988), 227–236.
[Mal89a] S. Mallat, A theory for multiresolution signal decomposition: The wavelet representation, IEEE
Trans. Pattern Analysis and Machine Intelligence 11 (1989), no. 7, 674–693.
[Mal89b] Stephane Mallat, Multiresolution approximations and wavelet orthonormal bases of L2 (R),
Trans. Amer. Math. Soc. 315 (1989), no. 1, 69–87.
[Mey86] Yves Meyer, Principe d'incertitude, bases hilbertiennes et algèbres d'opérateurs, Séminaire Bourbaki 662 (1985–86).
[RV91] O. Rioul and M. Vetterli, Wavelets and signal processing, IEEE Signal Proc. Mag. 8 (1991),
no. 4, 14–38.
[Str89] Gilbert Strang, Wavelets and dilation equations: A brief introduction, SIAM Rev. 31 (1989),
no. 4, 614–627.
[Wic90] Mladen Victor Wickerhauser, Nonstandard matrix multiplication, available via anonymous ftp
from ceres.math.yale.edu, 1990.
