
Lecture Notes

in Control and Information Sciences

Editors: M. Thoma M. Morari

327

J.-D. Fournier J. Grimm J. Leblond

J. R. Partington (Eds.)

Harmonic Analysis
and Rational
Approximation
Their Rôles in Signals, Control
and Dynamical Systems
With 47 Figures

Series Advisory Board

F. Allgöwer P. Fleming P. Kokotovic A.B. Kurzhanski

H. Kwakernaak A. Rantzer J.N. Tsitsiklis

Editors
Dr. J.-D. Fournier
Dr. J. Grimm
Dr. J. Leblond
Département ARTEMIS
CNRS and Observatoire de la Côte d'Azur
BP 4229
06304 Nice Cedex 4
France

Prof. J. R. Partington
University of Leeds
School of Mathematics
LS2 9JT Leeds
United Kingdom

ISSN 0170-8643
ISBN-10 3-540-30922-5 Springer Berlin Heidelberg New York
ISBN-13 978-3-540-30922-2 Springer Berlin Heidelberg New York
Library of Congress Control Number: 2005937084
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilm or in other ways, and storage in data banks. Duplication
of this publication or parts thereof is permitted only under the provisions of the German Copyright
Law of September 9, 1965, in its current version, and permission for use must always be obtained
from Springer-Verlag. Violations are liable to prosecution under German Copyright Law.
Springer is a part of Springer Science+Business Media
springer.com
© Springer-Verlag Berlin Heidelberg 2006
Printed in Germany
The use of general descriptive names, registered names, trademarks, etc. in this publication does
not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
Typesetting: Data conversion by authors.
Final processing by PTP-Berlin Protago-TEX-Production GmbH, Germany
Cover-Design: design & production GmbH, Heidelberg
Printed on acid-free paper
54/3141/Yu - 5 4 3 2 1 0

In memoriam Macieja Pindora

Preface

This book is an outgrowth of a summer school that took place on the Island of
Porquerolles in September 2003. The goal of the school was mainly to teach
certain pieces of mathematics to practitioners coming from three different
communities: signal, control and dynamical systems theory. Our impression
was indeed that, in spite of their great potential applicability, 20th century
developments in approximation theory and Fourier theory, while commonplace
among mathematicians, are unknown or under-appreciated within the above-mentioned communities. Specifically, we had in mind:
- some advances in analytic, meromorphic and rational approximation theory,
  as well as their links with identification, robust control and stabilization
  of infinite-dimensional systems;
- the rich correspondences between the complex and real asymptotic behavior of a function and its Fourier transform, as already described, for
  instance, in Wiener's books.
In this respect, it is noticeable that in the last twenty years, much effort has
been devoted to the research and teaching of recent decomposition tools, like
wavelets or splines, linked to real analysis. From the early stages, we shared
the view that, in contrast, research in certain fields suffers from the lack of a
working knowledge of modern Fourier analysis and modern complex analysis.
Finally, we felt the need to introduce at the core of the school a probabilistic counterpart to some of the questions raised above. Although familiar
to specialists of signal and dynamical systems theory, probability is often ignored by members of the control and approximation theory communities. Yet
we hope to convey to the reader the conviction that there is room for fascinating phenomena and useful results to be discovered at the junction of
probability and complex analysis.
This book is not just a proceedings of the summer school, since the contributions made by the speakers have been totally rewritten, anonymously
refereed and edited in order to reflect some of the common themes in which
the authors are interested, as well as the diversity of the applications. The
contributors were asked to imagine addressing a fellow-scientist with a non-negligible but modest background in mathematics.
In drawing the boundaries between the chapters of the book, we have also
tried to eliminate redundancy, while allowing for repetition of a theme as seen
from different points of view.
We begin in Part I with a general introduction from the late Maciej Pindor. He surveys the conceptual and practical value of complex analyticity,
both in the physical and the conjugate Fourier variables, for physical theories
originally built in the real domain. Obstacles to analytic extension, like polar
singularities known as resonances, a key concept of the school, turn out to
have themselves a physical meaning. It is illustrated here by means of optical
dispersion relations and the scattering of particles.
Part II of this book contains basic material on the complex analysis and
harmonic analysis underlying the further developments presented in the book.
Candelpergher writes on complex analysis, in particular analytic continuation
and the use of Borel summability and Gevrey series. Partington gives an
account of basic harmonic analysis, including Fourier, Laplace and Mellin
transforms, and their links with complex analysis.
Part III contains further basic material, explaining some of the aspects
of approximation theory. Pindor presents the theory of Padé approximation,
including convergence issues. Levin and Saff explain how potential theoretic
tools such as capacity play a role in the study of efficient polynomial and
rational approximation, and analyse some weighted problems. Partington discusses the use of bases of rational functions, including orthogonal polynomials,
Szegő bases, and wavelets.
Finally Part IV completes the foundations by a tour in probability theory.
The driving force behind the order emerging from randomness, the central
limit theorem, is explained by Collet, including convergence and fractal issues.
Dujardin gives an account of the properties of random real polynomials, with
particular reference to the distribution of their real and complex roots. Pindor
puts rational approximation into a stochastic context, the basic idea being to
obtain rational interpolants to noisy data.
The major application of the themes of this book lies in signal and control
theory, which is treated in Part V. Deistler gives a thorough treatment of
the spectral theory of stationary processes, leading to an account of ARMA
and state space systems. Cuoco's paper treats the power spectral density of
physical systems and its estimation, to be used in the extraction of signals
out of noisy data. Olivi continues some of the ideas of Parts II and III, and,
under the general umbrella of the Laplace transform in control theory, discusses linear time-invariant systems, controllability and rational approximation.
Baratchart uses Laplace-Fourier transform techniques in giving an account
of recent work analysing problems originating in the identification of linear
systems subject to perturbations. In a final return to the perspective of the
Introduction, Parts VI and VII show the rôle of the previously-discussed
tools in extremely diverse domains of physics. In Part VI, some mathematical
aspects of dynamical systems theory are discussed. Biasco and Celletti are
concerned with celestial mechanics and the use of perturbation theory to analyse integrable and nearly-integrable systems. Baladi gives a brief introduction
to resonances in hyperbolic and Hamiltonian systems, considered via the spectra of certain transfer operators. Part VII is devoted to a modern approach to
two classical physics problems. Borgnat is concerned with turbulence in fluid
flow; he discusses which tools, including the Mellin transform, can be adapted to reveal the various statistical properties of intermittent signals. Finally,
Bondu and Vinet give an account of the high-performance control and noise
analysis required at the gravitational waves VIRGO antenna.
Last but not least, our thanks go to the authors of the 17 contributions
gathered in this book, as well as to all those who have helped us produce it,
with particular mention of the anonymous referees.

Nice (France), Sophia-Antipolis (France), Leeds (U.K.), July 2005.

The editors:

Jean-Daniel Fournier,
José Grimm,
Juliette Leblond,
Jonathan R. Partington.


Maciej PINDOR
Our colleague Dr. Maciej Pindor of Poland, the friend, collaborator and visitor of Jean-Daniel Fournier (JDF), died on Saturday 5th July 2003 at the
Nice Observatory. Apparently, he was on his way to work from the Pavillon Magnétique, where he was staying, to his office at CION. His death
was attributed to cardiac problems. He was 62 years old. Some colleagues
were present, including the Director of the Observatoire de la Côte d'Azur
(OCA) and JDF, when help arrived.
Maciej Pindor was a senior lecturer at the Institute of Theoretical Physics
at the University of Warsaw. He performed his research work with the same
care that he devoted to his teaching duties. He was a specialist in complex
analysis, applied to some questions of theoretical physics, and, in recent years,
to the processing of data; he produced theoretical and numerical solutions,
which in this regard showed an ingenuity and reliability that is hard to match.
He taught effective computational methods to young physicists. From the
beginning of the thesis that Bénédicte Dujardin has been writing under the
direction of JDF, M. Pindor participated in her supervision.
The collaboration of JDF and his colleagues with M. Pindor began in 1996.
Over the years, it was supported by regular or exceptional funding from the
Cassini Laboratory, the Theoretical Physics Institute of Warsaw, the Polish
Academy of Sciences and from OCA (with an associated post in astronomy).
Thus M. Pindor came to Nice several times, and many people knew him. His
genuine modesty made him a very accessible person, and dealings with him
were agreeable and fruitful in all cases.
For the summer school of Porquerolles, he had agreed to give three courses,
on three different subjects. In this he was motivated by friendship, scientific
interest, and his acute awareness of the teaching responsibility borne by university staff; since then he had overcome the anxiety that he felt towards the
idea of presenting mathematics in front of professional mathematicians. In
particular, he was due to give the opening course, showing the link between
physics and mathematics, treating the ideas of analyticity and resonance. He
produced his notes for the course in good time, and these are therefore included under his name in this book and listed in the table of contents. At
Porquerolles his courses were given by three different people. As co-worker,
JDF took the topic "rational approximation and noise". We sincerely thank
the two others: G. Turchetti, himself an old friend of M. Pindor, agreed to
expound the rôle of analytic continuation and Padé approximants in theoretical and mathematical physics; E. B. Saff kindly offered to lecture on the
mathematics behind Padé approximants.
This book is dedicated to the memory of Maciej Pindor.
This obituary and M. Pindor's photograph have been included here by
agreement with his widow, Dr. Krystyna Pindor-Rakoczy.

Memories of the Porquerolles School, a word from the co-directors

As already mentioned in the Preface, we organized the editing of the present
book as a separate scientific undertaking, distinct from the school itself and
with a wider team including J. Grimm and J.R. Partington. Nevertheless we
feel bound to stress that the book is in part the result of the intellectual
and congenial atmosphere created in Porquerolles in September 2003 by the
speakers and the participants. Such moments are to be cherished, and have
rewarded us for our own preparatory work. This seems a natural place to
thank those of our colleagues who contributed to the running of the school,
either as scientists or assistants, including those whose names do not appear
here. Conversely we thank especially Elena Cuoco, who agreed to write a
chapter for the book, although she had not been able to attend the school for
personal reasons.
List of participants

D. Avanessoff (INRIA, Sophia-Antipolis [SA]),

V. Baladi (CNRS, Univ. Jussieu, Paris),
L. Baratchart (INRIA, SA),
L. Biasco (Univ. Rome III, It.),
B. Beckermann (Univ. Lille),
F. Bondu (CNRS, Observatoire de la Côte d'Azur [OCA], Nice),
P. Borgnat (CNRS, ENS Lyon),

V. Buchin (Russian Academy of Sciences, Moscow, Russia),
B. Candelpergher (Univ. Nice, Sophia-Antipolis [UNSA]),
A. Celletti (Univ. Rome Tor Vergata, It.),
A. Chevreuil (Univ. Marne-la-Vallée),
C. Cichowlas (ENS Ulm, Paris),
P. Collet (CNRS, École Polytechnique, Palaiseau),
D. Coulot (OCA, Grasse),
F. Deleflie (OCA, Grasse),
M. Deistler (Univ. Tech. Vienne, Aut.),
B. Dujardin (OCA, Nice),
Y. Elskens (CNRS, Univ. Provence, Marseille),
J.-D. Fournier (CNRS, OCA, Nice),
V. Fournier (Nice),
Ch. Froeschlé (CNRS, OCA, Nice),
C. Froeschlé (CNRS, OCA, Nice),
A. Gombani (CNR, LADSEB, Padoue, It.),
J. Grimm (INRIA, SA),
E. Hamann (Univ. Tech. Vienne, Aut.),
J.-M. Innocent (Univ. Provence, Marseille),
J.-P. Kahane (Acad. Sciences Paris et Univ. Orsay),
E. Karatsuba (Russian Academy of Sciences, Moscow, Russia),
J. Leblond (INRIA, SA),
M. Mahjoub (LAMSIN-ENIT, Tunis),
D. Matignon (ENST, Paris),
G. Métris (OCA, Grasse),
N.-E. Najid (Univ. Hassan II, Casablanca, Ma.),
A. Neves (Univ. Paris V),
L. Niederman (Univ. Orsay),
N. Nikolski (Univ. Bordeaux),
A. Noullez (OCA, Nice),
M. Olivi (INRIA, SA),
J.R. Partington (Univ. Leeds, GB),
J.-B. Pomet (INRIA, SA),
E.B. Saff (Univ. Vanderbilt, Nashville, USA),
F. Seyfert (INRIA, SA),
N. Sibony (Univ. Orsay),
M. Smith (Univ. York, GB),
G. Turchetti (Univ. Bologne, It.),
G. Valsecchi (Univ. Rome, It.),
J.-Y. Vinet (OCA, Nice),
P. Vitse (Univ. Laval, Québec, Ca.).


Organization:
J. Gosselin (CNRS, Nice),
F. Limouzis (INRIA, SA),
D. Sergeant (INRIA, SA).
Finally we thank the sponsors of the school: CNRS (Formation Permanente),
INRIA (Formation Permanente), INRIA Sophia-Antipolis, Conseil Régional
PACA, Observatoire de la Côte d'Azur (OCA), Département Cassini, Ministère délégué Recherche et Nouvelles Technologies, VIRGO-EGO. We thank
them all for their support.
Nice (France), Sophia-Antipolis (France), July 2005.
The co-directors:

J.-D. Fournier,
J. Leblond

Contents

Part I Introduction
Analyticity and Physics
Maciej Pindor
1 Introduction
2 The optical dispersion relations
3 Scattering of particles and complex energy
References

Part II Complex Analysis, Fourier Transform and Asymptotic Behaviors

From Analytic Functions to Divergent Power Series
Bernard Candelpergher
1 Analyticity and differentiability
2 Analytic continuation and singularities
3 Continuation of a power series
4 Gevrey series
5 Borel summability
6 Acknowledgments
References

Fourier Transforms and Complex Analysis
Jonathan R. Partington
1 Real and complex Fourier analysis
2 DFT, FFT, windows
3 The behaviour of f and f̂
4 Wiener's theorems
5 Laplace and Mellin transforms
References

Part III Interpolation and Approximation

Padé Approximants
Maciej Pindor
1 Introduction
2 The Padé Table
3 Convergence
4 Examples
5 Calculation of Padé approximants
References

Potential Theoretic Tools in Polynomial and Rational Approximation
Eli Levin and Edward B. Saff
1 Classical Logarithmic Potential Theory
2 Polynomial Approximation of Analytic Functions
3 Approximation with Varying Weights - a background
4 Logarithmic Potentials with External Fields
5 Generalized Weierstrass Approximation Problem
6 Rational Approximation
References

Good Bases
Jonathan R. Partington
1 Introduction
2 Orthogonal polynomials and Szegő bases
3 Wavelets
References

Part IV The Rôle of Chance

Some Aspects of the Central Limit Theorem and Related Topics
Pierre Collet
1 Introduction
2 A short elementary probability theory refresher
3 Another proof of the CLT
4 Some extensions and related results
5 Statistical Applications
6 Large deviations
7 Multifractal measures
References

Distribution of the Roots of Certain Random Real Polynomials
Bénédicte Dujardin
1 Introduction
2 Real roots
3 Complex roots
4 Conclusion
References

Rational Approximation and Noise
Maciej Pindor
1 Introduction
2 Rational Interpolation
3 Convergence
4 Rational Interpolation with Noisy Data
5 Froissart Polynomial
6 Conclusions
References

Part V Signal and Control Theory

Stationary Processes and Linear Systems
Manfred Deistler
1 Introduction
2 A Short View on the History
3 The Spectral Theory of Stationary Processes
4 The Wold Decomposition and Forecasting
5 Rational Spectra, ARMA and State Space Systems
6 The Relation to System Identification
References

Parametric Spectral Estimation and Data Whitening
Elena Cuoco
1 Introduction
2 Parametric modeling for Power Spectral Density: ARMA and AR models
3 AR and whitening process
4 AR parameters estimation
5 The whitening filter in the time domain
6 An example of whitening
References

The Laplace Transform in Control Theory
Martine Olivi
1 Introduction
2 Linear time-invariant systems and their transfer functions
3 Function spaces and stability
4 Finite order LTI systems and their rational transfer functions
5 Identification and approximation
References

Identification and Function Theory
Laurent Baratchart
1 Introduction
2 Hardy spaces
3 Motivations from System Theory
4 Some approximation problems
References

Part VI Dynamical Systems Theory

Perturbative Series Expansions: Theoretical Aspects and Numerical Investigations
Luca Biasco and Alessandra Celletti
1 Introduction
2 Hamiltonian formalism
3 Integrable and nearly-integrable systems
4 Perturbation theory
5 A discrete model: the standard map
6 Numerical investigation of the breakdown threshold
References

Resonances in Hyperbolic and Hamiltonian Systems
Viviane Baladi
1 Two elementary key examples - Basic concepts
2 Theorems of Ruelle, Keller, Pollicott, Dolgopyat...
References

Part VII Modern Experiments in Classical Physics: Information and Control

Signal Processing Methods Related to Models of Turbulence
Pierre Borgnat
1 An overview of the main properties of Turbulence
2 Signal Processing Methods for Experiments on Turbulence
References

Control of Interferometric Gravitational Wave Detectors
François Bondu and Jean-Yves Vinet
1 Introduction
2 Interferometers
3 Servo systems
4 Conclusion and perspectives
References

List of Contributors

Viviane Baladi
CNRS UMR 7586,
Institut Mathématique de Jussieu,
75251 Paris (France)
baladi@math.jussieu.fr

Laurent Baratchart
Inria, Apics Team
2004, Route des Lucioles
06902 Sophia Antipolis (France)
baratcha@sophia.inria.fr

Luca Biasco
Dipartimento di Matematica,
Università di Roma Tre,
Largo S. L. Murialdo 1,
I-00146 Roma (Italy)
biasco@mat.uniroma3.it

François Bondu
Laboratoire Artemis
CNRS UMR 6162
Observatoire de la Côte d'Azur
BP4229 Nice (France)
Francois.Bondu@obs-nice.fr

Pierre Borgnat
Laboratoire de Physique
UMR-CNRS 5672
ENS Lyon
46 allée d'Italie
69364 Lyon Cedex 07 (France)
Pierre.Borgnat@ens-lyon.fr

Bernard Candelpergher
University of Nice-Sophia Antipolis
Parc Valrose
06002 Nice (France)
candel@math.unice.fr

Alessandra Celletti
Dipartimento di Matematica,
Università di Roma Tor Vergata,
Via della Ricerca Scientifica 1,
I-00133 Roma (Italy)
celletti@mat.uniroma2.it

Pierre Collet
Centre de Physique Théorique
CNRS UMR 7644
École Polytechnique
F-91128 Palaiseau Cedex (France)
collet@cpht.polytechnique.fr

Elena Cuoco
INFN, Sezione di Firenze,
Via G. Sansone 1,
50019 Sesto Fiorentino (FI),
present address:
EGO, via Amaldi,
Santo Stefano a Macerata,
Cascina (PI) (Italy)
elena.cuoco@ego-gw.it

Manfred Deistler
Department of Mathematical
Methods in Economics,
Econometrics and System Theory,
Vienna University of Technology
Argentinierstr. 8,
A-1040 Wien (Austria)
Deistler@tuwien.ac.at

Bénédicte Dujardin
Département Artemis,
Observatoire de la Côte d'Azur,
BP 4229, 06304 Nice (France)
dujardin@obs-nice.fr

Eli Levin
The Open University of Israel
Department of Mathematics
P.O. Box 808, Ra'anana (Israel)
elile@openu.ac.il

Martine Olivi
Inria, Apics Team
2004, Route des Lucioles
06902 Sophia Antipolis (France)
Martine.Olivi@sophia.inria.fr

Jonathan R. Partington
School of Mathematics
University of Leeds
Leeds LS2 9JT (U.K.)
J.R.Partington@leeds.ac.uk

Maciej Pindor
Instytut Fizyki Teoretycznej,
Uniwersytet Warszawski
ul. Hoża 69,
00-681 Warszawa (Poland)
deceased

Edward B. Saff
Center for Constructive Approximation
Department of Mathematics
Vanderbilt University
Nashville, TN 37240 (USA)
esaff@math.vanderbilt.edu

Jean-Yves Vinet
ILGA, Département Fresnel
Observatoire de la Côte d'Azur
BP 4229, 06304 Nice (France)
vinet@obs-nice.fr

Analyticity and Physics

Maciej Pindor
Instytut Fizyki Teoretycznej,
Uniwersytet Warszawski, ul. Hoża 69,
00-681 Warszawa, Poland.

1 Introduction
My goal is to present to you some aspects of the role that a mathematical
concept as subtle and abstract as analyticity plays in physics.
In retrospect we could say that the notion of a "real number" is also in
fact a very abstract one, and its applicability to the description of the world
external to our mind is a sort of miracle. I do not want to dwell here on the
relation between constructs of the mind and the external world; this is the
playground for philosophers and I do not wish to compete with them. I mean
here the intuitively manifest difference between the obvious nature of integer
numbers (and the nearly obvious nature of rationals) and the abstractness of real
numbers. This abstractness notwithstanding, I do not think that talking in
terms of real numbers when describing the real world needed much more
intellectual effort than applying rational numbers there. This is excellently
demonstrated by the fact that in practice we use only rationals, e.g. floating
point numbers in computer calculations: practitioners just ignore the subtle
flavour of irrationals and treat them as rationals represented in the decimal system
by a sufficient number of digits.
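The remark about floating point numbers can be checked directly on a computer. The following is only a minimal sketch, assuming Python and IEEE double precision floats (neither is part of the original lecture); the helper names are illustrative. It shows that the machine representations of 0.1 and of the square root of 2 are exact rationals whose denominators are powers of two.

```python
from fractions import Fraction
import math


def as_exact_rational(x: float) -> Fraction:
    """Return the exact rational value stored in the binary float x."""
    return Fraction(x)


def is_power_of_two(n: int) -> bool:
    """True when the positive integer n has exactly one bit set."""
    return n > 0 and n & (n - 1) == 0


# The decimal 0.1 is not representable in binary: the stored value is a
# nearby rational with a power-of-two denominator, not 1/10 itself.
tenth = as_exact_rational(0.1)

# The "irrational" sqrt(2) is likewise stored as a rational number,
# so its square cannot be exactly 2.
root_two = as_exact_rational(math.sqrt(2))

print(tenth, tenth == Fraction(1, 10))
print(is_power_of_two(root_two.denominator), root_two ** 2 == 2)
```

Exact rational arithmetic is used here only to expose what the machine stores; in ordinary computations one works with the rounded values, as the text says.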
The situation is completely different with complex numbers. Contrary to
many other mathematical notions, they originated entirely within pure mathematics, and even to mathematicians they seemed so strange that the word
"imaginaire" was attributed to them! No "real world" situation seemed to
demand complex numbers for its mathematical description. However, already
Euler (and also d'Alembert) observed that they were useful in solving problems in hydrodynamics and cartography [4]. Once domesticated by mathematicians, complex numbers slowly crept into physical papers, though only
as an auxiliary and convenient tool when dealing with periodic solutions of
some mechanical systems (the spherical pendulum studied by Tissot [7]). Their
particular usefulness was discovered by Riemann for describing some form of
the potential field [6] and when he studied Maxwell equations [9], but again

J.-D. Fournier et al. (Eds.): Harm. Analysis and Ratio. Approx., LNCIS 327, pp. 3-12, 2006.
© Springer-Verlag Berlin Heidelberg 2006

they played here the role of a shorthand notation for the simultaneous description of two different, though related, physical quantities. Even the advent of quantum mechanics did not change much: although the wave function was essentially complex and its real and imaginary parts had no separate existence, the values of the function had no physical meaning themselves. It was its modulus that was interpretable, and so physicists could think of its complexness as a mathematical trick, though this time with some feeling of uneasiness.
As far as I know, the first individuals who truly opened the complex plane to physics were Kramers and Kronig (see [3]). They had the daredevil idea that extending the domain of a function whose argument is a well-defined physical quantity (the frequency, in their case) to the complex plane can lead to conclusions that are verifiable experimentally. They showed, moreover, that the properties of this function in the complex plane are connected to important physical conditions on another function. Their idea seemed a curiosity, and only 25 years later was it found useful, when the advantages of considering energy in the complex plane were discovered. Even then physicists still felt uneasy with this, and when a few years later Tullio Regge proposed extending the angular momentum to the complex plane, his paper was rejected by many referees [1].
In the following I shall briefly review the original idea of Kramers and Kronig (following closely the exposition of [3]), the consequences of the extension of the energy to the complex plane in the description of particle scattering, and the Regge idea.

2 The optical dispersion relations

Kramers and Kronig considered light in a material medium. The physical situation there is described by two fields: the electric field $E(x, t)$ and the displacement field $D(x, t)$. Let me clarify that the latter field comes from a superposition of the former one and the fields produced by the atoms and particles of the medium polarized by the presence of $E$.
Their monochromatic components of frequency $\omega$ are related through

$$D(x,\omega) = \varepsilon(\omega)\,E(x,\omega) \tag{1}$$

where $\varepsilon(\omega)$ is called the dielectric constant and is frequency dependent, because the response of the medium to the presence of $E$ depends on frequency. These frequency components are just Fourier transforms of the temporal dependence of the fields, e.g.

$$E(x,t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} E(x,\omega)\,e^{-i\omega t}\,d\omega$$

and vice versa

$$E(x,\omega) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} E(x,t')\,e^{i\omega t'}\,dt' .$$

Using now (1) and assuming that the functions considered vanish at infinity in time and frequency fast enough to make an exchange of the order of integration possible, we arrive at

$$D(x,t) = E(x,t) + \int_{-\infty}^{+\infty} G(\tau)\,E(x,t-\tau)\,d\tau \tag{2}$$

where $G(\tau)$ is the Fourier transform of $\varepsilon(\omega) - 1$:

$$G(\tau) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} [\varepsilon(\omega) - 1]\,e^{-i\omega\tau}\,d\omega . \tag{3}$$

These mathematical manipulations may not seem very inspiring, but if we look carefully at (2) we can observe that it is somewhat strange: it says that the value of $D$ at the moment $t$ depends on the values of $E$ at all instants of time. We say that the connection between $D$ and $E$ is nonlocal in time. Well, we can understand that the polarizing of atoms and molecules takes some time, and therefore the effect of a change in $E$ will be felt by the values of $D$ after some time, but how can $D$ at time $t$ depend on the values of $E$ at later times, which is what the part of the integral in (2) from $-\infty$ to $0$ represents? Every physicist would say: IT CANNOT DEPEND! It would violate causality.
This means that we must have $G(\tau) \equiv 0$ for $\tau < 0$. Consequently, there are some necessary conditions on the dependence of $\varepsilon$ on $\omega$. If we now invert the Fourier transform in (3) we get

$$\varepsilon(\omega) = 1 + \int_{0}^{\infty} G(\tau)\,e^{i\omega\tau}\,d\tau . \tag{4}$$

Already at the very birth of theoretical optics, physicists used some simple models (classical ones, because quantum mechanics was not yet born) to describe the interaction between light and matter, and these models led to expressions for $\varepsilon(\omega)$ satisfying our requirement that $G(\tau) \equiv 0$ for $\tau < 0$. However, truth be told, the polarization of atoms and molecules is a very complicated phenomenon; even now it is not easy to describe it in all its details, and it is not obvious how one should guarantee that the predicted $G(\tau)$ vanishes for negative arguments.
Kramers and Kronig observed that the most general condition one should impose on $\varepsilon(\omega)$ to have causality satisfied is just that it be of the form (4) with some real $G(\tau)$. Again, this form would not be so very interesting were it not for their daring concept of considering $\varepsilon(\omega)$ as a function of complex $\omega$. Once they did this, many interesting conclusions followed. The most fundamental observation is that if $G(\tau)$ is finite for all $\tau$, then $\varepsilon(\omega)$ is an analytic function of $\omega$ in the upper half plane.

Although you will soon listen to a lecture on the fundamentals of functions of a complex variable, I am afraid I have to state here very briefly what analyticity is and what its consequences are for $\varepsilon(\omega)$. It sounds deceptively simple: $f(z)$ is analytic at $z = z_0$ if it has a derivative at this point and in some neighbourhood of it. However, the point is now a point of the plane, and therefore the requirement leads to the so-called Cauchy-Riemann equations, which are, actually, differential equations relating the real and the imaginary parts of $f(z)$ as functions of the real and imaginary parts of $z$ (in fact these equations were already written down by d'Alembert and Euler!). The amazing consequence is that if a function possesses a first derivative in a neighbourhood of some point, it possesses all derivatives there and also has a Taylor expansion with non-zero radius of convergence at this point! Moreover, if $f(z)$ is analytic inside some domain $D$ and $C$ is a smooth closed curve in $D$, traversed counterclockwise (a simple closed rectifiable positively oriented curve), whose interior lies in $D$ and contains the point $\omega$, then the Cauchy formula holds:

$$f(\omega) = \frac{1}{2\pi i} \oint_C \frac{f(t)}{t-\omega}\,dt . \tag{5}$$

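We can now take $D$ as the upper half plane, starting infinitesimally above the real axis, and $C$ as in Figure 1, and write (5) for $f(\omega) = \varepsilon(\omega) - 1$. With the condition that $\varepsilon(\omega) - 1$ vanishes for large $|\omega|$ at least like $1/\omega^2$, which can be justified by some physical arguments, we can take $R \to \infty$ and neglect the integral over $C_R$. With some more maneuvering we arrive at the famous dispersion relations for the real and the imaginary parts of $\varepsilon(\omega)$:

$$\mathrm{Re}\,\varepsilon(\omega) = 1 + \frac{1}{\pi}\,P\!\int_{-\infty}^{+\infty} \frac{\mathrm{Im}\,\varepsilon(t)}{t-\omega}\,dt, \qquad \mathrm{Im}\,\varepsilon(\omega) = -\frac{1}{\pi}\,P\!\int_{-\infty}^{+\infty} \frac{\mathrm{Re}\,\varepsilon(t) - 1}{t-\omega}\,dt , \tag{6}$$

where $P$ denotes the Cauchy principal value of the integral.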

The name comes from the fact that the dependence of $\varepsilon$ on $\omega$ leads to the phenomenon called dispersion: the change of the shape of a light wave penetrating a material medium. The real part of $\varepsilon$ is directly related to this phenomenon, while the imaginary part is connected with the absorption of light. Therefore both can be measured and, not unexpectedly, experimental data confirm the validity of (6).
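The dispersion relations (6) can be checked numerically on a toy model. The sketch below is only an illustration under stated assumptions: it takes the causal kernel $G(\tau) = e^{-\gamma\tau}$ for $\tau > 0$ (a hypothetical single-relaxation medium, not a model taken from the text), for which (4) gives $\varepsilon(\omega) - 1 = 1/(\gamma - i\omega)$, and it evaluates the principal-value integrals with a simple midpoint rule. The names `chi`, `principal_value` and `GAMMA` are chosen for this example.

```python
import math

GAMMA = 1.0   # damping rate of the illustrative model (an assumption)

def chi(omega):
    """eps(omega) - 1 for the causal kernel G(tau) = exp(-GAMMA * tau), tau > 0."""
    return 1.0 / complex(GAMMA, -omega)

def principal_value(g, omega, n=20000):
    """Principal-value integral of g(t) / (t - omega) over the real line.

    Uses P int g(t)/(t - omega) dt = int_0^inf [g(omega+u) - g(omega-u)] / u du,
    then maps u = s / (1 - s), s in (0, 1), and applies the midpoint rule.
    """
    total = 0.0
    for k in range(n):
        s = (k + 0.5) / n
        u = s / (1.0 - s)
        jacobian = 1.0 / (1.0 - s) ** 2
        total += (g(omega + u) - g(omega - u)) / u * jacobian
    return total / n

omega = 0.5
# Real part reconstructed from the imaginary part, and vice versa, as in (6).
re_from_im = principal_value(lambda t: chi(t).imag, omega) / math.pi
im_from_re = -principal_value(lambda t: chi(t).real, omega) / math.pi
print(re_from_im, chi(omega).real)
print(im_from_re, chi(omega).imag)
```

Each printed pair should agree to within the quadrature error. Replacing the kernel by one that does not vanish for $\tau < 0$ moves a singularity of $\varepsilon$ into the upper half plane and the agreement is lost.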
On the other hand, the Titchmarsh theorem [8] says that if a function $F(z)$ satisfies relations of the type (6), then its Fourier transform vanishes on the negative real semiaxis. Thus, not only does the physical condition of causality lead to definite analytical properties of some function, implying a relation between its real and imaginary parts on the real axis that can be confirmed by physical experiments, but also the experimentally verifiable relation between two physical quantities, when they are considered as the real and imaginary parts of an analytic function on the real axis, implies a property of the Fourier transform of this function, namely the one having the meaning of causality!

[Figure: contour $C$ in the complex $\omega$ plane; labels $-R$, $R$, $\mathrm{Re}\,\omega$]
Fig. 1. Contour for the integral (5) in the complex plane of $\omega$

3 Scattering of particles and complex energy

In the mid-fifties of the last century the physics of the subatomic constituents of matter, called elementary particles, had amassed a vast amount of experimental observations which were impossible to explain on the grounds of the fundamental theory of the microworld, Quantum Field Theory. Not that they were in contradiction with QFT; simply, the equations of QFT could be solved only in an approximation scheme, called perturbation theory, that seemed to fail completely except in the case of electromagnetism, where it (under the name of Quantum Electrodynamics) worked perfectly.
However, QFT had still another important deficiency: it actually had no firm mathematical foundations. In fact it was a cookbook of recipes for how to deal with objects of very obscure mathematical meaning in order to extract formulae containing quantities related to laboratory observations. Therefore, although QFT came into existence as the logical extrapolation of the ideas of Quantum Mechanics (so fantastically fruitful in explaining the atomic world) to the realm of relativistic phenomena, where mass and energy are one and the same physical quantity and where physical particles are freely created and annihilated, it was slowly coming to be looked at with growing suspicion. Its inability to deal with the growing mass of observational data concerning elementary particles seemed to seal its fate.
In this desperate situation it was recalled that 25 years earlier Kramers and Kronig had been able to derive their dispersion relations using the apparatus of functions of a complex variable, with only a fundamental physical property such as causality as input.
The simplest process studied in elementary particle physics is the elastic scattering of two spinless particles. The word "elastic" means that the same two particles that enter the scattering process emerge from it, and no other particle is created in the process.
The states of the particles are defined by their four-momenta: space-time vectors with three components being the ordinary momentum, and the fourth (or rather zeroth, in the notation I shall use) component being the energy of the particle. The four-momenta of the particles before the scattering are $p_1$ and $p_2$, and after the scattering they are $p_3$ and $p_4$. The squares of these four-momenta are just the squared masses of the particles (let me remind you that space-time has a special metric):

$$p_i^2 = p_{i,0}^2 - p_{i,1}^2 - p_{i,2}^2 - p_{i,3}^2 = E_i^2 - \mathbf{p}_i^2 = m_i^2, \qquad i = 1, \dots, 4 .$$
The total four-momentum of the system

$$P = p_1 + p_2 = p_3 + p_4; \qquad p_i = (E_i, \mathbf{p}_i), \quad i = 1, \dots, 4$$

is conserved and so is the total energy. In the special reference system called the center of mass system (c.m.s.), the total momentum is zero, and therefore the c.m.s. energy squared is equal to $s = P^2$. Another four-vector important in the description of the process is the momentum transfer $q$, together with its square $t$:

$$q = p_1 - p_3 = -p_2 + p_4; \qquad t = q^2 .$$

In the scattering of two particles of identical masses $m$ the momentum transfer is simply related to the scattering angle $\theta$ and the energy via

$$t = -\frac{1}{2}\,(1 - \cos\theta)\,(s - 4m^2)$$

and is negative, while $s$, the (relativistic) energy squared, is larger than $4m^2$.
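The relation between the momentum transfer, the scattering angle and the energy can be verified directly from the four-momenta. The following is a minimal sketch, assuming equal masses, the c.m.s. frame and the metric signature $(+,-,-,-)$ used above; the helper names (`minkowski_square`, `elastic_cms`) and the numerical values are illustrative choices, not taken from the text.

```python
import math

def minkowski_square(p):
    """Square of a four-vector p = (E, px, py, pz) with metric (+, -, -, -)."""
    return p[0] ** 2 - p[1] ** 2 - p[2] ** 2 - p[3] ** 2

def add(p, q):
    return tuple(a + b for a, b in zip(p, q))

def sub(p, q):
    return tuple(a - b for a, b in zip(p, q))

def elastic_cms(m, k, theta):
    """Four-momenta p1..p4 for equal-mass elastic scattering in the c.m.s.

    k is the magnitude of the three-momentum, theta the scattering angle.
    """
    e = math.sqrt(m * m + k * k)
    p1 = (e, 0.0, 0.0, k)
    p2 = (e, 0.0, 0.0, -k)
    p3 = (e, k * math.sin(theta), 0.0, k * math.cos(theta))
    p4 = (e, -k * math.sin(theta), 0.0, -k * math.cos(theta))
    return p1, p2, p3, p4

m, k, theta = 1.0, 0.75, 1.1          # illustrative values, arbitrary units
p1, p2, p3, p4 = elastic_cms(m, k, theta)
s = minkowski_square(add(p1, p2))     # c.m.s. energy squared
t = minkowski_square(sub(p1, p3))     # squared momentum transfer
t_formula = -0.5 * (1.0 - math.cos(theta)) * (s - 4.0 * m * m)
print(s, t, t_formula)
```

The two values of $t$ coincide up to rounding, $t$ is negative, and $s$ exceeds $4m^2$ for any non-zero momentum $k$.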
The quantity relevant in this context is the scattering amplitude $A(s, t)$. The squared modulus of the scattering amplitude is, apart from some kinematical factors, the cross-section for the scattering: loosely speaking, the probability of registering the scattered particle along the direction defined by the given momentum transfer when the scattering takes place at the given energy.
Using the very general formulae for this scattering amplitude following from QFT, and applying as precise a mathematical apparatus as was possible in this context at that time, it turned out to be possible to show again that relativistic causality (i.e. the impossibility of any relation between events separated in such a way that they cannot be connected by signals traveling with a speed less than or equal to the velocity of light) implies some special analyticity properties of the scattering amplitude in the complex plane of energy (see e.g. [2] and references therein).
In fact, the fascinating connection between a physical requirement (causality) and an abstract mathematical property (analyticity) has been rigorously (almost) shown only for the forward scattering amplitude, i.e. at $t = 0$. These analytical properties then allowed one, using theorems from the theory of functions of a complex variable, to write dispersion relations for the scattering amplitude of the type

$$A(s,0) = \frac{1}{\pi}\int_{4m^2}^{\infty} \frac{\mathrm{Im}\,A(s',0)\,ds'}{s' - s - i\varepsilon} + \frac{1}{\pi}\int_{-\infty}^{0} \frac{\mathrm{Im}\,A(s',0)\,ds'}{s' - s - i\varepsilon} . \tag{7}$$

Here $i\varepsilon$ means that the integration runs just above the real axis. This integral representation of $A(s, 0)$ as a function of complex $s$ means that this function has very nasty singularities (i.e. points where it is not analytic) at $s = 4m^2$ and $s = 0$ (and possibly $s = \infty$), called branchpoints. They are nasty because they make the function multivalued: if we walk along a closed curve encircling such a point, then at the point from which we started we find a different value of the function. I cannot dwell on this horror (or, to me, fascinating property of the complex plane) here, and can only say that the multivalued function can be made single-valued by removing from the complex plane lines joining the branchpoints; such lines are called cuts.
Looking at (7) you see that $A(s, 0)$ is not defined on $(-\infty, 0)$ and $(4m^2, \infty)$: these are the cuts. On the other hand, the function has well-defined limits when $s$ approaches these semiaxes from imaginary directions. The limit from above for $s \in (4m^2, \infty)$ is just the physical scattering amplitude, because these values of $s$ correspond to the physical scattering process. On the other hand, the limit from below for $s \in (-\infty, 0)$ corresponds to the scattering amplitude for another process, related to the one we consider through crossing symmetry, a property of the scattering amplitude suggested by QFT. Combining this property with unitarity (loosely speaking, the requirement that the probability that anything at all happens, in the context of the scattering of course, is one) leads to conclusions that again could be verified experimentally. This was a great triumph, because earlier no quantitative predictions could be given concerning phenomena connected with new types of interactions (new with respect to electromagnetism).
The great success of the simplest dispersion relations prompted many theoreticians to study the analytical structure of the scattering amplitude as suggested by perturbation theory, even though the latter produced divergent expansions. This analytical structure turned out to be very rich, with many branchpoints on the real axis (where the amplitude had a physical meaning), with locations depending on the masses of the scattered particles, and poles at the energies of the bound states (if any) of these particles. Moreover, as mentioned above, crossing symmetry implied direct connections between values of the scattering amplitude on some edges of different cuts. Causality implied that the scattering amplitude is analytic on the whole plane of complex energy, properly cut along the real axis, but it was soon realized that there have to exist poles on the unphysical sheets. One of the fantastic properties of analytic functions is that they can undergo analytic continuation. You will learn more about it during the lectures to come, but here I shall describe it as the feature that a function is determined on its whole domain once it is defined on the smallest piece of it. The domain can also include other copies of the complex plane (called Riemann sheets), if there are branchpoints, reached when one continues the function analytically across the cuts. In elementary particle physics, the sheet on which the energy has physical meaning is called the physical sheet. The ones reached through analytic continuation of the amplitude across the cuts are called the unphysical sheets. I want to make this fundamental fact clear: the assumption of analyticity of the scattering amplitude as a function of complex energy means that its values on the sections of the real axis where the values of the energy correspond to the physical scattering process define the scattering amplitude on all its Riemann sheets. In particular, for many types of scattering processes the amplitude had to have poles on the first unphysical sheet. These poles were the manifestations of resonances: experimentally seen enhancements of the cross-section, related in solvable models of scattering (e.g. nonrelativistic scattering described by the Schrödinger equation) to short-lived quasi-bound states of the scattered particles, and therefore in the relativistic description also attributed to the existence of short-lived unstable particles.
Also, using suggestions from the expansions of the scattering amplitude obtained in perturbation theory, the so-called double dispersion relations, written in both the complex $s$ and $t$ planes, were postulated, and some verifiable (and verified!) conclusions followed from them.
Another astonishing concept was put forward by T. Regge [5]. He considered the so-called partial wave expansion of the nonrelativistic scattering amplitude $A(q^2, t)$

$$A(q^2,t) = f(q^2,\cos\theta) = \sum_{l=0}^{\infty} (2l+1)\,A_l(q^2)\,P_l(\cos\theta)$$

where $P_l(z)$ are the Legendre polynomials. The $A_l(q^2)$ are called the partial wave amplitudes and describe the scattering at the given angular (orbital) momentum. The sum runs over integers only, because in quantum physics the angular momentum is quantized, i.e. it can take on values only from a discrete countable set. Regge had, however, the idea of considering the angular momentum in the complex plane!
He studied nonrelativistic scattering for a reasonable class of potentials (superpositions of Yukawa potentials) and was able to show that $A_l(q^2)$ is meromorphic in $l$ in the half plane $\mathrm{Re}\, l > -1/2$, where it vanishes exponentially as $|l| \to \infty$. Using this and writing the above expansion as the integral

$$f(q^2,\cos\theta) = \frac{i}{2}\int_C dl\,(2l+1)\,A(l,q^2)\,\frac{P_l(-\cos\theta)}{\sin(\pi l)}$$

where the contour $C$ encircles the positive semiaxis clockwise (so that the integral is, in fact, a sum over small circles around all non-negative integers), he could deform the contour $C$ by moving its ends from $\infty \pm i\varepsilon$ to $-\tfrac12 \pm i\infty$. As a result he got

$$f(q^2,\cos\theta) = \frac{i}{2}\int_{-\frac12 - i\infty}^{-\frac12 + i\infty} dl\,(2l+1)\,A(l,q^2)\,\frac{P_l(-\cos\theta)}{\sin(\pi l)} - \pi \sum_{n\ge 1} \frac{(2\alpha_n(q^2)+1)\,\beta_n(q^2)}{\sin(\pi\alpha_n(q^2))}\,P_{\alpha_n(q^2)}(-\cos\theta)$$

where the sum runs over all poles $\alpha_n(q^2)$ (called since then the Regge poles) of $A(l, q^2)$, with residues $\beta_n(q^2)$, in the half plane $\mathrm{Re}\, l > -\tfrac12$ of the complex $l$.
The most exciting part came from the fact that for $q^2 < 0$ (we call this "below threshold") all these poles lie on the real axis and correspond precisely to bound states of the potential, at the energies ($q^2$) at which $\alpha_n(q^2)$ equals an integer, this integer being the angular momentum of the given bound state! When $q^2$ grows above the threshold (becomes positive) the $\alpha_n(q^2)$ move into the complex plane, and when at some $q_r$ the real part of $\alpha_n$ crosses an integer, the scattering amplitude has the form

$$\frac{a}{(q^2 - q_r^2)\,b + i\,\mathrm{Im}\,\alpha_n(q_r^2)}$$

characteristic of a resonance. In this way bound states and resonances were grouped into Regge trajectories, each originating from the same $\alpha_n(q^2)$.
It was then immediately conjectured that the relativistic scattering amplitude shows the same (or analogous) behaviour in the complex angular momentum plane. Though many actual resonances were grouped into Regge trajectories, other conclusions were not verified experimentally, which was attributed to the hypothetical existence of branchpoints of the scattering amplitude in the complex angular momentum plane. When such branchpoints were included, the theory lost its beautiful simplicity and its predictive power was considerably limited. Because of that its attractiveness paled, and though it is still held that bound states and resonances actually form families lying on Regge trajectories, not much importance is attributed to this fact any more.
This amazing fact, that elements of the analytical structure of the scattering amplitude, as a function of the complex energy and momentum transfer, have direct physical meaning, induced some physicists to think that just the proper analytical properties of the scattering amplitude, compatible with the fundamental physical conditions (like crossing symmetry or unitarity), could form the correct set of assumptions on which to build a complete theory of the phenomena concerning elementary particles. This point of view later fell out of fashion in view of the spectacular success of the developments of QFT, which now take the shape of Nonabelian Gauge Field Theory. Nevertheless, the lesson that functions describing physical observations in terms of physically measurable parameters must be studied for complex values of these parameters, because the analytic properties of such functions have a direct relation to the true physical phenomena underlying the observations, is now deeply rooted in the thinking of physicists.

References
1. G. Bialkowski. Private information.
2. S. Gasiorowicz. Elementary Particle Physics. John Wiley and Sons, 1966.
3. J.D. Jackson. Classical Electrodynamics. John Wiley and Sons, 1975.
4. A. Markushevich. Basic notions of mathematical analysis in Euler papers. Leonard Euler, Acad. Nauk SSSR, 1959. (in Russian)
5. T. Regge. Nuovo Cimento 14, p. 951. 1959.
6. B. Riemann. Bernhard Riemann's gesammelte mathematische Werke. Dover Publications, 1953. p. 431.
7. Tissot. Journal de Liouville 1857; according to P. Appell, Traité de mécanique rationnelle vol. 1, Paris. 1932.
8. E.C. Titchmarsh. Introduction to the Theory of Fourier Integrals. Oxford Univ. Press, 1948.
9. H. Weber. Die partiellen Differential-Gleichungen der mathematischen Physik nach Riemann's Vorlesungen. Friedrich Vieweg u. Sohn, 1901.

From Analytic Functions to Divergent Power Series
Bernard Candelpergher
University of Nice-Sophia Antipolis
Parc Valrose
06002 Nice (France)
candel@math.unice.fr

1 Analyticity and differentiability

1.1 Differentiability
The functions occurring commonly in classical analysis, such as $x^n$, $e^x$, $\mathrm{Log}(x)$, $\sin(x)$, $\cos(x)$, ..., are not only defined on intervals in $\mathbb{R}$, but they can also be defined when the variable $x$ (which we shall now denote by $z$) lies in some subdomain of $\mathbb{C}$. These domains are the subsets $U$ of $\mathbb{C}$ that we call open sets, and are characterised by the property

$$\forall z_0 \in U \text{ there exists } r > 0 \text{ such that } D(z_0, r) \subset U$$

where $D(z_0, r) = \{z \in \mathbb{C},\ |z - z_0| < r\}$ is the disc with centre $z_0$ and radius $r$.
Let $U$ be an open subset of $\mathbb{C}$ and let $f : U \to \mathbb{C}$ be a function. We say that $f$ is differentiable on $U$ if for every $z_0 \in U$, the expression

$$\frac{f(z) - f(z_0)}{z - z_0}$$

tends to a finite limit when $z$ tends to $z_0$ in $U$. We denote this limit by $f'(z_0)$ or $\partial f(z_0)$. We say also that $f$ is holomorphic on $U$ (this terminology comes from the fact that $f(z) \simeq a + b(z - z_0)$ for $z$ in a neighbourhood of $z_0$, and so $f$ is locally a similarity).
Formally, the definition of differentiability in $\mathbb{C}$ is the same as in $\mathbb{R}$, and its immediate consequences, such as the differentiability of a sum, a product and a composition of functions, will therefore continue to hold. However, the notion of differentiability in $\mathbb{C}$ is more restrictive than in $\mathbb{R}$, since the expression $\frac{f(z) - f(z_0)}{z - z_0}$ has to tend to the same limit no matter how $z$ tends to $z_0$ in the complex plane. In particular, if we write $z = x + iy$, the function $f$, considered as a function of two real variables $x$ and $y$, will have partial derivatives
J.-D. Fournier et al. (Eds.): Harm. Analysis and Ratio. Approx., LNCIS 327, pp. 15-37, 2006.
© Springer-Verlag Berlin Heidelberg 2006

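with respect to $x$ and $y$, satisfying certain equations known as the Cauchy-Riemann equations.
Indeed, let us consider the functions

$$\varphi : (x, y) \mapsto \mathrm{Re}\, f(x + iy), \qquad \psi : (x, y) \mapsto \mathrm{Im}\, f(x + iy).$$

It is easy to check that the differentiability of $f$ with respect to $z$ implies that the functions $\varphi$ and $\psi$ are differentiable with respect to $x$ and $y$, and that

$$f'(x + iy) = \partial_x \varphi(x, y) + i\,\partial_x \psi(x, y) = \frac{1}{i}\,\bigl(\partial_y \varphi(x, y) + i\,\partial_y \psi(x, y)\bigr)$$

and hence the partial derivatives satisfy the Cauchy-Riemann equations:

$$\partial_x \varphi = \partial_y \psi, \qquad \partial_y \varphi = -\partial_x \psi .$$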
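A quick numerical experiment makes the restriction visible. The sketch below, which is only an illustration, approximates the partial derivatives of the real and imaginary parts of $f$ by central differences and measures how far they are from satisfying the Cauchy-Riemann equations; the function names, the step `h` and the sample point are assumptions of this example.

```python
import cmath

def partials(f, x, y, h=1e-6):
    """Central-difference partial derivatives of Re f and Im f at x + iy."""
    fx = (f(complex(x + h, y)) - f(complex(x - h, y))) / (2 * h)
    fy = (f(complex(x, y + h)) - f(complex(x, y - h))) / (2 * h)
    # fx = d(Re f)/dx + i d(Im f)/dx,  fy = d(Re f)/dy + i d(Im f)/dy
    return fx.real, fx.imag, fy.real, fy.imag

def cauchy_riemann_defect(f, x, y):
    """Sum of the absolute violations of the two Cauchy-Riemann equations."""
    re_x, im_x, re_y, im_y = partials(f, x, y)
    return abs(re_x - im_y) + abs(re_y + im_x)

holomorphic = cauchy_riemann_defect(cmath.exp, 0.3, -0.7)
not_holomorphic = cauchy_riemann_defect(lambda z: z.conjugate(), 0.3, -0.7)
print(holomorphic, not_holomorphic)
```

For the exponential the defect is zero up to discretisation error, whereas complex conjugation, although perfectly smooth as a map of two real variables, violates the first equation everywhere.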
The properties of holomorphic functions on an open subset $U$ of $\mathbb{C}$ are therefore much more striking than those of functions of a real variable. In particular, a function that is holomorphic on $U \setminus \{a\}$ and has a finite limit at $a$ extends to a holomorphic function on $U$ (this is the Riemann theorem).
1.2 Integrals
Let $f$ be a holomorphic function on an open subset $U$ of $\mathbb{C}$ and $\gamma$ a path in $U$ (so $\gamma$ is a piecewise continuously differentiable function on an interval $[a, b]$ with values in $U$; if $\gamma(a) = \gamma(b)$, we say that $\gamma$ is a closed path). We write

$$\int_\gamma f(z)\,dz = \int_a^b f(\gamma(t))\,\gamma'(t)\,dt .$$

A natural question is how this integral depends on the path $\gamma$, and in particular what happens if we deform the path continuously while remaining in $U$. It is the concept of homotopy that allows us to make this precise: we say that two paths $\gamma_0$ and $\gamma_1$ with the same endpoints (or two closed paths) are homotopic in $U$ if there exists a family $\gamma_s$ of intermediate paths (resp. of closed paths) between $\gamma_0$ and $\gamma_1$, having the same endpoints as $\gamma_0$ and $\gamma_1$, which depend continuously on the parameter $s \in [0, 1]$.
The homotopy theorem
If $f$ is holomorphic in $U$, and if $\gamma_1$ and $\gamma_2$ are two paths with the same endpoints, or else two closed paths, which are homotopic in $U$, then

$$\int_{\gamma_1} f(z)\,dz = \int_{\gamma_2} f(z)\,dz .$$

Since the integral along a closed path consisting of a single point $z_0$ (i.e., the closed path $t \mapsto z_0$ for all $t$) is zero, it follows from the homotopy theorem that if $f$ is holomorphic in $U$ and if we can continuously contract a closed path $\gamma$ down to a point $z_0$ in $U$ while remaining all the time in $U$, then we have

$$\int_\gamma f(z)\,dz = 0 .$$

Connected open sets $U$ (i.e., ones consisting of a single piece) for which every closed path in $U$ is homotopic in $U$ to a single point in $U$ are called simply connected.
We deduce from the above that if $f$ is holomorphic on a simply connected open set $U$ and $z_0$ is a point of $U$, then for every closed path $\gamma$ in $U$ we have

$$\int_\gamma \frac{f(z) - f(z_0)}{z - z_0}\,dz = 0$$

(the integrand has a finite limit at $z_0$, so by the Riemann theorem it is holomorphic on $U$). Since we have

$$\int_{C(z_0,r)} \frac{1}{z - z_0}\,dz = 2i\pi,$$

with $C(z_0, r)(t) = z_0 + r \exp(it)$, $t \in [0, 2\pi]$, the circle of centre $z_0$ and radius $r$, then if $f$ is holomorphic on a simply connected open set $U$ and $C(z_0, r) \subset U$, we have Cauchy's formula

$$f(z_0) = \frac{1}{2i\pi} \int_{C(z_0,r)} \frac{f(z)}{z - z_0}\,dz .$$
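Cauchy's formula lends itself to a direct numerical check. The sketch below is an illustration only: it discretises the path integral over the circle $C(z_0, r)$ with the trapezoidal rule, using the parametrisation $t \mapsto z_0 + r\exp(it)$ given above. The test function $\sin$, the centre and the radius are arbitrary choices made for this example.

```python
import cmath
import math

def circle_integral(g, z0, r, n=400):
    """Approximate the integral of g over C(z0, r)(t) = z0 + r*exp(it),
    t in [0, 2*pi], with the trapezoidal rule on n equally spaced points."""
    total = 0j
    for k in range(n):
        t = 2 * math.pi * k / n
        z = z0 + r * cmath.exp(1j * t)
        dz_dt = 1j * r * cmath.exp(1j * t)   # derivative of the path
        total += g(z) * dz_dt
    return total * (2 * math.pi / n)

z0, r = 0.5 + 0.25j, 1.5
# Integral of 1/(z - z0) over the circle, expected to be 2*i*pi.
winding = circle_integral(lambda z: 1 / (z - z0), z0, r)
# Cauchy's formula applied to the entire function sin.
cauchy = circle_integral(lambda z: cmath.sin(z) / (z - z0), z0, r) / (2j * math.pi)
print(winding, cauchy, cmath.sin(z0))
```

The value of the function at the centre is thus recovered from its values on the circle alone, whatever radius is chosen, as long as the closed disc stays inside the domain of holomorphy.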

1.3 Power series expansions

Cauchy's formula enables us to show that a function $f$ that is holomorphic on an open subset $U$ of $\mathbb{C}$ is in fact infinitely differentiable; we have

$$\partial^n f(z_0) = \frac{n!}{2i\pi} \int_{C(z_0,r)} \frac{f(z)}{(z - z_0)^{n+1}}\,dz .$$

Writing the Cauchy formula at $z$

$$f(z) = \frac{1}{2i\pi} \int_{C(z_0,r)} \frac{f(u)}{(u - z_0) - (z - z_0)}\,du = \frac{1}{2i\pi} \int_{C(z_0,r)} \frac{f(u)}{u - z_0}\,\frac{1}{1 - \frac{z - z_0}{u - z_0}}\,du$$

and expanding

$$\frac{1}{1 - \frac{z - z_0}{u - z_0}} = \sum_{n \ge 0} \frac{(z - z_0)^n}{(u - z_0)^n}$$

we see that $f$ can be expanded in a Taylor series about every point of $U$. Precisely, for each $z_0 \in U$ and for all $R > 0$ such that $D(z_0, R) \subset U$, we have

$$f(z) = \sum_{n=0}^{+\infty} \frac{\partial^n f(z_0)}{n!}\,(z - z_0)^n$$

for every $z \in D(z_0, R)$. We say that $f$ is analytic on $U$.

We see therefore that if $f$ is holomorphic on an open subset $U$ of $\mathbb{C}$, then the radius of convergence of the Taylor series of $f$ about $z_0$ is greater than or equal to every $R > 0$ for which $D(z_0, R) \subset U$. In other words, the disc of convergence of the Taylor series of $f$ about $z_0$ is controlled only by the regions where $f$ fails to be holomorphic.
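The following sketch illustrates this on an example chosen for the purpose (it is not taken from the text): $f(z) = 1/(1+z^2)$ is perfectly smooth on the real line, yet its Taylor series about $z_0 = 1$ has radius of convergence $\sqrt{2}$, the distance from $z_0$ to the poles $\pm i$. The Taylor coefficients are computed from the Cauchy integral for the derivatives, discretised by the trapezoidal rule; the names and parameters are assumptions of this illustration.

```python
import cmath
import math

def taylor_coefficients(f, z0, r, count, n=512):
    """c_k = (1/(2*i*pi)) * integral of f(u)/(u - z0)**(k+1) over C(z0, r),
    approximated by the trapezoidal rule with n points on the circle."""
    samples = [f(z0 + r * cmath.exp(2j * math.pi * j / n)) for j in range(n)]
    coeffs = []
    for k in range(count):
        acc = sum(s * cmath.exp(-2j * math.pi * j * k / n)
                  for j, s in enumerate(samples))
        coeffs.append(acc / (n * r ** k))
    return coeffs

def f(z):
    return 1 / (1 + z * z)      # holomorphic except at z = i and z = -i

z0 = 1.0                        # distance from z0 to the poles is sqrt(2)
coeffs = taylor_coefficients(f, z0, r=1.0, count=60)

def partial_sum(z):
    """Sum of the first 60 terms of the Taylor series about z0."""
    return sum(c * (z - z0) ** k for k, c in enumerate(coeffs))

inside = abs(partial_sum(1.5) - f(1.5))     # |z - z0| = 0.5 < sqrt(2)
outside = abs(partial_sum(3.0) - f(3.0))    # |z - z0| = 2.0 > sqrt(2)
print(inside, outside)
```

Inside the disc the partial sum reproduces $f$ to rounding accuracy; at the real point $z = 3$, which lies outside the disc although $f$ is regular there, the partial sums are far from $f(3)$ and grow with the number of terms.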
1.4 Some properties of analytic functions
The principle of isolated zeroes
This principle may be expressed as the fact that the points where an analytic function $f$ on a connected open set $U$ takes the value zero, i.e., the zeroes of $f$, cannot accumulate at a point in $U$ (unless $f$ is identically zero). In other words, no compact subset of $U$ can contain more than finitely many zeroes of $f$.
Uniqueness of analytic functions
Cauchy's formula shows that a function analytic in the neighbourhood of a closed disc is fully determined on the interior of the disc if one knows its values on the circle bounding the disc. We see a further uniqueness property in the fact that if $f$ is an analytic function on a connected open set $U$, then the values of $f$ on a complex line segment $[z_0, z_1]$ of $U$, joining two different points $z_0, z_1$ of $U$, determine $f$ uniquely on the whole of $U$.
To put it another way, if two analytic functions $f$ and $g$ on a connected open set $U$ are equal on a segment $[z_0, z_1]$ of $U$, then they are equal on the whole of $U$.
The maximum principle
If $f$ is a non-constant analytic function on a connected open set $U$, then the function $|f|$ cannot have a local maximum in $U$; in particular, if $U$ is bounded and $f$ extends continuously to the closure of $U$, the maximum of $|f|$ is attained on the boundary of $U$.

Sequences, series and integrals of analytic functions

If $(f_n)$ is a sequence of analytic functions in $U$, converging uniformly on every disc in $U$, then the limit function $f$ is also analytic and we have $f'(z) = \lim_{n \to +\infty} f_n'(z)$ for every $z \in U$.
Let $\sum f_n$ be a series of analytic functions on $U$, and suppose that $\sum_{n \ge 0} f_n$ converges uniformly on every disc in $U$. Then $f = \sum_{n \ge 0} f_n$ is analytic on $U$ and we also have $f'(z) = \sum_{n \ge 0} f_n'(z)$ for every $z \in U$.
Let $z \mapsto f(t, z)$ be an analytic function on $U$ depending on a real parameter $t \in\, ]a, b[$. If there exists a function $g$ such that

$$\int_a^b g(t)\,dt < +\infty \quad\text{and}\quad |f(t, z)| \le g(t)$$

for all $z \in U$, then the function

$$z \mapsto \int_a^b f(t, z)\,dt$$

is analytic on $U$.

2 Analytic continuation and singularities

2.1 The problem of analytic continuation
Let f be an analytic function on an open set U , and let V be an open set
containing U . We seek a function g, analytic on V , such that g = f on U .
We say that such a g is an analytic continuation of f to V .
If V is a connected open set containing U , then the analytic continuation
g of f to V , if it exists, is unique.
On the other hand, the existence of an analytic continuation g of f to V
is not guaranteed.
2.2 Isolated singularities
The obstructions to analytic continuation are the points or sets of points that
we call singularities.
More precisely, if $U$ is a non-empty open set, and $a$ is a point on the boundary of $U$, then we say that $a$ is a singularity of $f$ if there is no analytic continuation of $f$ to $U \cup D(a, r)$ for any disc $D(a, r)$ with $r > 0$.

The simplest singularities are the isolated singularities: a singularity $a$ of $f$ is an isolated singularity if $f$ is analytic in a punctured disc $D(a, R) \setminus \{a\}$ for some $R > 0$, but there is no analytic continuation of $f$ to $D(a, R)$.
We can distinguish two types of isolated singularity, depending on the behaviour of $f(z)$ as $z \to a$. If $|f(z)| \to +\infty$ as $z \to a$ we say that $a$ is a pole of $f$; otherwise we say that $a$ is an essential singularity of $f$. This is the case, for example, for $\exp(1/z)$ at $0$.
2.3 Laurent expansion
If the point $a$ is a pole of $f$, then there is a disc $D(a, R)$ with $R > 0$, such that

$$f(z) = \frac{c_{-m}}{(z - a)^m} + \dots + \frac{c_{-1}}{(z - a)} + \sum_{n=0}^{+\infty} c_n (z - a)^n \quad\text{for every } z \in D(a, R) \setminus \{a\}.$$

This is called the Laurent expansion of $f$ about $a$, and the singular part

$$\frac{c_{-m}}{(z - a)^m} + \dots + \frac{c_{-1}}{(z - a)}$$

is called the principal part of $f$ at $a$.
If $a$ is an essential singularity of $f$, then the expansion above becomes $\sum_{n=-\infty}^{+\infty} c_n (z - a)^n$ with an infinite number of non-zero $c_n$ such that $n < 0$.
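The difference between a pole and an essential singularity can be seen in numerically computed Laurent coefficients. The sketch below assumes the standard integral representation $c_k = \frac{1}{2i\pi}\int_{C(a,r)} f(z)\,(z-a)^{-k-1}\,dz$, which is not stated in the text, and discretises it with the trapezoidal rule; the two sample functions and all names are choices made for this illustration.

```python
import cmath
import math

def laurent_coefficient(f, a, k, r=1.0, n=256):
    """c_k = (1/(2*i*pi)) * integral of f(z)/(z - a)**(k+1) over C(a, r),
    approximated by the trapezoidal rule (k may be negative)."""
    acc = 0j
    for j in range(n):
        w = cmath.exp(2j * math.pi * j / n)     # point on the unit circle
        acc += f(a + r * w) * w ** (-k)
    return acc / (n * r ** k)

def with_pole(z):
    return cmath.exp(z) / z ** 2        # pole of order 2 at 0

def essential(z):
    return cmath.exp(1 / z)             # essential singularity at 0

# Coefficients c_{-4}, c_{-3}, c_{-2}, c_{-1} of exp(z)/z**2 about 0.
pole_part = [laurent_coefficient(with_pole, 0, k) for k in (-4, -3, -2, -1)]
# Coefficients c_{-1}, c_{-2}, c_{-3}, c_{-4} of exp(1/z) about 0.
essential_part = [laurent_coefficient(essential, 0, -k) for k in (1, 2, 3, 4)]
print(pole_part)
print(essential_part)
```

For $e^z/z^2$ the coefficients with index below $-2$ vanish, so the principal part is finite, while for $\exp(1/z)$ the coefficient $c_{-k}$ equals $1/k!$ and never vanishes.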
2.4 Residue theorem
Let $U$ be an open set, $a \in U$ and $f$ an analytic function in $U \setminus \{a\}$. The
coefficient $c_{-1}$ of the Laurent expansion of $f$ about $a$ is called the residue of
$f$ at $a$, denoted $\operatorname{Res}(f, a)$. This number is all that is needed to calculate the
integral of $f$ around a small closed path winding round $a$.
More precisely, for every closed path $\gamma$ homotopic in $U \setminus \{a\}$ to a circle
centred at $a$ we have
\[
\int_\gamma f(z)\,dz = 2i\pi \operatorname{Res}(f, a).
\]
We deduce that if $U$ is a simply-connected open set, if $a_1, a_2, \dots, a_n$ are
points in $U$ and $f$ is an analytic function in $U \setminus \{a_1, a_2, \dots, a_n\}$, then we have
\[
\int_\gamma f(z)\,dz = 2i\pi \sum_{i=1}^{n} \operatorname{Res}(f, a_i),
\]
where $\gamma$ is a closed path in $U \setminus \{a_1, a_2, \dots, a_n\}$ such that for every $i$ the curve
$\gamma$ is homotopic in $U \setminus \{a_i\}$ to a circle centred at $a_i$.
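The residue formula can be checked numerically. The following is a minimal sketch, not part of the original text: the test function $f(z) = e^z/z^2$ (residue 1 at 0), the helper name `contour_integral` and the number of nodes are all illustrative assumptions.

```python
import cmath
import math

def contour_integral(f, centre, radius, n=2000):
    """Approximate the integral of f around the circle |z - centre| = radius,
    traversed counterclockwise, using the trapezoidal rule with n nodes."""
    total = 0j
    for k in range(n):
        theta = 2 * math.pi * k / n
        w = cmath.exp(1j * theta)
        z = centre + radius * w
        dz = 1j * radius * w * (2 * math.pi / n)  # z'(theta) * dtheta
        total += f(z) * dz
    return total

# Illustrative choice: f(z) = exp(z) / z^2 has Laurent expansion
# 1/z^2 + 1/z + 1/2 + ..., so its residue at 0 is c_{-1} = 1.
f = lambda z: cmath.exp(z) / z**2
integral = contour_integral(f, 0, 1.0)
expected = 2j * math.pi * 1.0  # 2*i*pi*Res(f, 0)
print(integral, expected)
```

The trapezoidal rule is used because, for a periodic analytic integrand, it converges very quickly; any other quadrature rule would serve equally well.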


2.5 The logarithm

There exist examples of singularities that are not isolated but are branch
points; we see an example when we try to define the function $\log$ on $\mathbb{C}$.
We can define the function $\log$ by
\[
\log(z) = \int_1^z \frac{1}{u}\,du,
\]
where we integrate along the complex line segment joining $1$ to $z$.

Since the line segment must avoid $0$, we see that this function is defined
and analytic on $\mathbb{C} \setminus ]-\infty, 0]$; we call it the principal value of the complex
logarithm.
We write $\arg$ for the continuous function on $\mathbb{C} \setminus ]-\infty, 0]$ with values in
$]-\pi, +\pi]$, such that $z = |z|e^{i \arg(z)}$ for each $z \in \mathbb{C} \setminus ]-\infty, 0]$, and we call this
function the principal value of the argument.
One can check that
\[
\log(z) = \ln |z| + i \arg(z) \quad \text{for every } z \in \mathbb{C} \setminus ]-\infty, 0].
\]
It follows that $e^{\log(z)} = z$ and that $\log$ has a discontinuity of $2i\pi$ on the
half-line $]-\infty, 0[$, that is,
\[
\lim_{\varepsilon \to 0^+} \big( \log(x + i\varepsilon) - \log(x - i\varepsilon) \big) = 2i\pi \quad \text{for every } x \in ]-\infty, 0[.
\]
Thus the point $0$ is a singularity of $\log$, but not an isolated singularity since
$\log$ cannot be continued analytically to a punctured disc centred at $0$. The point $0$ is a
singularity of $\log$ called a branch point.
Let $U$ be a connected open set; then we call any analytic function $\log$ on
$U$ satisfying $e^{\log(z)} = z$ for all $z \in U$ a branch of the logarithm in $U$.
We call a continuous function $\theta$ on a connected open set $U$ a branch of the
argument if for each $z \in U$ we have $z = |z|e^{i\theta(z)}$.
Every branch of the logarithm in $U$ can be written
\[
\log(z) = \ln(|z|) + i\theta(z),
\]
where $\theta$ is a branch of the argument in $U$. Conversely, each branch of the
argument allows us to define a branch of the logarithm, by the above formula.
For example we define a branch of the logarithm on $\mathbb{C} \setminus [0, +\infty[$ by
\[
\operatorname{Log}(z) = \ln |z| + i \operatorname{Arg}(z) \quad \text{for every } z \in \mathbb{C} \setminus [0, +\infty[,
\]
where $\operatorname{Arg}$ is the continuous function on $\mathbb{C} \setminus [0, +\infty[$ with values in $]0, +2\pi[$,
such that $z = |z|e^{i \operatorname{Arg}(z)}$ for each $z \in \mathbb{C} \setminus [0, +\infty[$.
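The two branches can be compared numerically. The sketch below is illustrative only: it relies on the fact that Python's `cmath.log` uses the principal value with a cut along the negative real axis, while `log_cut_positive` is a hypothetical helper written here for the branch with the cut along $[0, +\infty[$.

```python
import cmath
import math

def principal_log(z):
    # cmath.log is the principal value: its cut lies along the negative real axis
    return cmath.log(z)

def log_cut_positive(z):
    """Branch Log(z) = ln|z| + i Arg(z) on the plane cut along [0, +inf[,
    with Arg(z) taken in ]0, 2*pi[ (hypothetical helper for illustration)."""
    arg = cmath.phase(z)  # principal argument, in [-pi, pi]
    if arg <= 0:
        arg += 2 * math.pi
    return complex(math.log(abs(z)), arg)

# Jump of the principal value across the half-line ]-inf, 0[
x, eps = -2.0, 1e-9
jump = principal_log(complex(x, eps)) - principal_log(complex(x, -eps))
print(jump)  # close to 2*i*pi

# Both functions are branches of the logarithm: exp(log(z)) = z
z = 1 - 1j
print(cmath.exp(principal_log(z)), cmath.exp(log_cut_positive(z)))
```

On the lower half-plane the two branches differ by exactly $2i\pi$, which is the general fact that two branches of the logarithm on a common connected domain differ by a constant integer multiple of $2i\pi$.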


3 Continuation of a power series

Let $f(z) = \sum_{n \ge 0} a_n z^n$ be a power series; this will have a natural domain of
convergence that is a disc $D(0, R)$ in the complex plane, where the radius of
convergence $R$ is given by
\[
R = \sup\Big\{ r \ge 0 : \text{there exists } C > 0 \text{ such that } |a_n| \le \frac{C}{r^n} \text{ for all } n \Big\}.
\]
When $R = +\infty$, we can calculate the value of $f(z)$ at every point $z \in \mathbb{C}$
as the limit of the partial sums
\[
f(z) = \lim_{N \to +\infty} \sum_{n=0}^{N} a_n z^n.
\]
If the radius of convergence of $\sum_{n \ge 0} a_n z^n$ is a finite number $R > 0$ (we
shall look at the case $R = 0$ later), then the above formula allows us to
calculate $f(z)$ for $z$ in the disc $D(0, R)$, and the function $f$ defined by the
sum of the power series in $D(0, R)$ is analytic in this disc. There will exist at
least one singularity $z_0$ of $f$ on the boundary of the disc (there can be more
than one, indeed even an infinite number; the whole circle $C(0, R)$ may consist
of singularities).
We will say that $f$ can be continued analytically along a half-line $d$ starting
at $0$ if there exists an open set $U$ containing $d$ and a function $g$, analytic on
$U$, such that
\[
g(z) = f(z) \quad \text{for all } z \in U \cap D(0, R).
\]
We shall suppose that $f$ can be continued analytically along all but finitely
many half-lines.
There is then an open set $\operatorname{Star}(f)$, the star domain of holomorphy of $f$. To
give a formula allowing us to calculate $f$ in this open set, we shall begin by
giving an expression for $f$ in the interior of the disc of convergence, in terms
of a Laplace integral.
An integral formula
To begin, we improve the convergence of the series $\sum_{n \ge 0} a_n z^n$ by multiplying
the $a_n$ by $1/n!$; thus we consider the series
\[
B(f)(\xi) = \sum_{n \ge 0} \frac{a_n}{n!} \xi^n.
\]
Since $|a_n|$ is bounded by $C/r^n$ with $0 < r < R$, it is easy to see that this series
has an infinite radius of convergence and defines an analytic function $B(f)$ on
the whole of $\mathbb{C}$.

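To recover $f$ from $B(f)$ we shall use the fact that
\[
\int_0^{+\infty} e^{-t} \frac{a_n}{n!} (zt)^n \,dt = a_n z^n.
\]
Now, for each $z$ in $D(0, R)$ there exists $r$ such that $0 < |z| < r < R$, so
that
\[
\int_0^{+\infty} e^{-t} \sum_{n \ge 0} \frac{|a_n z^n|}{n!} t^n \,dt \le \int_0^{+\infty} e^{-t}\, C e^{t|z|/r} \,dt < +\infty.
\]
Thus we can exchange sum and integral and write
\[
\int_0^{+\infty} e^{-t} \sum_{n \ge 0} \frac{a_n z^n}{n!} t^n \,dt = \sum_{n \ge 0} \int_0^{+\infty} e^{-t} \frac{a_n z^n}{n!} t^n \,dt,
\]
giving, for each $z$ in $D(0, R)$, the expression
\[
\int_0^{+\infty} e^{-t} \sum_{n \ge 0} \frac{a_n}{n!} (zt)^n \,dt = \sum_{n \ge 0} a_n z^n.
\]
Thus in the disc $D(0, R)$ we can write
\[
f(z) = \int_0^{+\infty} e^{-t} B(f)(zt) \,dt.
\]
This formula will allow us to continue $f$ analytically beyond $D(0, R)$.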
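As an illustration (a sketch under stated assumptions, not taken from the text), consider the geometric series with $a_n = 1$, so that $f(z) = 1/(1-z)$, $R = 1$ and $B(f)(\xi) = e^{\xi}$. The integral then converges whenever $\operatorname{Re}(z) < 1$, which already reaches outside the disc of convergence. The helper names, the truncation of the integral at `upper` and the Simpson rule are choices made here for the demonstration.

```python
import cmath

def simpson(func, a, b, n):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(a + k * h)
    return total * h / 3

def borel_transform_geometric(xi):
    # a_n = 1 for all n, so B(f)(xi) = sum xi^n / n! = exp(xi)
    return cmath.exp(xi)

def laplace_borel(z, upper=40.0, n=40000):
    """Approximate int_0^inf e^{-t} B(f)(z t) dt, truncated at t = upper."""
    return simpson(lambda t: cmath.exp(-t) * borel_transform_geometric(z * t),
                   0.0, upper, n)

inside = laplace_borel(0.5)        # z inside the disc of convergence D(0, 1)
outside = laplace_borel(-2 + 1j)   # |z| > 1, but Re(z) < 1
print(inside, 1 / (1 - 0.5))
print(outside, 1 / (1 - (-2 + 1j)))
```

For this example the integral reproduces $1/(1-z)$ on the whole half-plane $\operatorname{Re}(z) < 1$, although the power series itself only converges for $|z| < 1$.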

Remark. For $z$ in $]0, R[$, we can write
\[
f(z) = \frac{1}{z} \int_0^{+\infty} e^{-\xi/z} B(f)(\xi) \,d\xi.
\]
If we define the Laplace transform of a function $h$ by
\[
L(h)(z) = \int_0^{+\infty} e^{-z\xi} h(\xi) \,d\xi,
\]
we then have, for every $z$ in $]0, R[$, the expression
\[
f(z) = \frac{1}{z} L(B(f))\Big(\frac{1}{z}\Big).
\]
Note that the function $g : z \mapsto \frac{1}{z} \int_0^{+\infty} e^{-\xi/z} B(f)(\xi) \,d\xi$ is analytic in every
domain on which the function
\[
\xi \mapsto e^{-\xi \operatorname{Re}(1/z)} |B(f)(\xi)|
\]
is majorized by an integrable function on $]0, +\infty[$, independently of $z$. Now
we know that $|a_n|$ is majorized by $\text{Const.}/(R - \varepsilon)^n$, and so we have
\[
|B(f)(\xi)| \le C e^{|\xi|/(R-\varepsilon)} \quad \text{for all } \varepsilon > 0;
\]
the function $g$ is therefore analytic in the open set $\{z \mid \operatorname{Re}(1/z) > 1/R\}$, i.e.,
the disc $D(R/2, R/2)$.
Remark. If the function $B(f)$ is such that we have a better bound,
\[
|B(f)(\xi)| \le C e^{B|\xi|}
\]
with $B < 1/R$, we then obtain an analytic continuation of $f$ in the open set
$\operatorname{Re}(1/z) > B$, i.e., the disc $D(1/2B, 1/2B)$.
Continuation outside the disc
We note first that if the integral $\int_0^{+\infty} e^{-t} B(f)(zt) \,dt$ converges for $z = z_0$,
then it converges for all $z$ in the segment $[0, z_0]$; indeed it is enough to write,
for $z$ in the segment $[0, z_0]$,
\[
\int_0^{+\infty} e^{-t} B(f)(zt) \,dt = \int_0^{+\infty} e^{-t} B(f)\Big(z_0 \frac{z}{z_0} t\Big) \,dt
= \Big(\frac{z_0}{z}\Big) \int_0^{+\infty} e^{-(z_0/z)u} B(f)(z_0 u) \,du,
\]
and since this last integral converges for $z_0/z = 1$, it does so for $z_0/z > 1$,
i.e., for $z$ in the segment $[0, z_0]$, and even for those $z$ with $\operatorname{Re}(z_0/z) > 1$, by the
following lemma:
Lemma 1 (Classical lemma). If $a$ is a locally integrable function on $[0, +\infty[$
such that $\int_0^{+\infty} e^{-t} a(t) \,dt$ converges, then $\int_0^{+\infty} e^{-st} a(t) \,dt$ converges for every
$s$ such that $\operatorname{Re}(s) > 1$, and the integral defines an analytic function of $s$ in
this half-plane.
Let us consider the function
\[
z \mapsto \frac{z_0}{z} \int_0^{+\infty} e^{-(z_0/z)u} B(f)(z_0 u) \,du;
\]
this function is analytic in the open set consisting of all $z$ such that
\[
\operatorname{Re}\Big(\frac{z_0}{z}\Big) > 1,
\]
that is, the disc $D(z_0/2, |z_0|/2)$.

To sum up, if the integral $\int_0^{+\infty} e^{-t} B(f)(zt) \,dt$ converges for $z = z_0$, then
it converges for all $z$ in the open set $D(z_0/2, |z_0|/2)$, and defines an analytic
function in this open set. This function equals $f$ on the line segment $[0, z_0] \cap D(0, R)$;
by the uniqueness theorem it therefore equals $f$ on $D(z_0/2, |z_0|/2) \cap D(0, R)$,
and so we obtain an analytic continuation of $f$.
Consider the open set
\[
E(f) = \Big\{ z_0 \in \operatorname{Star}(f) : \text{there exists } \varepsilon > 0 \text{ such that } D\Big(\frac{z_0}{2}, \frac{|z_0|}{2} + \varepsilon\Big) \subset \operatorname{Star}(f) \Big\};
\]
we shall show that the function $z \mapsto \int_0^{+\infty} e^{-t} B(f)(zt) \,dt$ is defined and analytic
in this open set, and therefore provides an analytic continuation of $f$ into $E(f)$.
This is a consequence of the preceding discussion together with the following
lemma:

Lemma 2. For every $z \in E(f)$ the integral $\int_0^{+\infty} e^{-t} B(f)(zt) \,dt$ converges and
its value is $f(z)$.

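Proof of the Lemma. Take $z$ in $E(f)$; we deform the contour $C(z/2, |z|/2)$
to a slightly bigger contour $C$ surrounding $0$ such that if $\xi \in C$ then we have
$\operatorname{Re}(z/\xi) < 1$.
By Cauchy's formula we have
\[
f(z) = \frac{1}{2i\pi} \int_C \frac{f(\xi)}{\xi - z} \,d\xi,
\]
and now we see that if $\operatorname{Re}(z/\xi) < 1$ then
\[
\frac{1}{\xi - z} = \frac{1}{\xi} \int_0^{+\infty} e^{-t} e^{tz/\xi} \,dt.
\]
Substituting this into Cauchy's formula we have
\[
f(z) = \frac{1}{2i\pi} \int_C \frac{f(\xi)}{\xi} \int_0^{+\infty} e^{-t} e^{tz/\xi} \,dt \,d\xi
= \int_0^{+\infty} e^{-t} \Big( \frac{1}{2i\pi} \int_C \frac{f(\xi)}{\xi} e^{zt/\xi} \,d\xi \Big) dt.
\]
Deforming $C$ into a small circle $C(0, r)$ contained in $D(0, R)$, we have
\[
\frac{1}{2i\pi} \int_{C(0,r)} \frac{f(\xi)}{\xi} \sum_{n \ge 0} \frac{1}{n!} \Big(\frac{z}{\xi} t\Big)^n d\xi
= \sum_{n \ge 0} \frac{1}{n!} (zt)^n \frac{1}{2i\pi} \int_{C(0,r)} \frac{f(\xi)}{\xi^{n+1}} \,d\xi
= \sum_{n \ge 0} \frac{a_n}{n!} (zt)^n
= B(f)(zt).
\]
We deduce that the integral $\int_0^{+\infty} e^{-t} B(f)(zt) \,dt$ converges to the value $f(z)$.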


Continuation in the star domain

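To have an analytic continuation of $f$ in the star domain of holomorphy of $f$,
we improve the convergence of the series $\sum_{n \ge 0} a_n z^n$ in a more delicate way.
In fact it is enough to multiply the $a_n$ by a term which behaves like $(1/n!)^\alpha$
with $0 < \alpha \le 1$; we take the term $1/\Gamma(1 + \alpha n)$ where
\[
\Gamma(1 + \alpha n) = \int_0^{+\infty} e^{-t} t^{\alpha n} \,dt.
\]
So we consider the series
\[
B_\alpha(f)(\xi) = \sum_{n \ge 0} \frac{a_n}{\Gamma(1 + \alpha n)} \xi^n.
\]
This series has infinite radius of convergence, and defines an analytic function $B_\alpha(f)$ on the whole complex plane $\mathbb{C}$.
To recover $f$ from $B_\alpha(f)$ we use the fact that
\[
\int_0^{+\infty} e^{-t} \frac{a_n z^n}{\Gamma(1 + \alpha n)} t^{\alpha n} \,dt = a_n z^n,
\]
and obtain, for every $z$ in the disc $D(0, R)$,
\[
f(z) = \int_0^{+\infty} e^{-t} B_\alpha(f)(z t^\alpha) \,dt.
\]
This formula will allow us to continue $f$ analytically outside $D(0, R)$.

We notice as above that if the integral $\int_0^{+\infty} e^{-t} B_\alpha(f)(z t^\alpha) \,dt$ converges
for $z = z_0$, then it converges for all $z$ in the line segment $[0, z_0]$; indeed it is
enough to write
\[
\int_0^{+\infty} e^{-t} B_\alpha(f)(z t^\alpha) \,dt = \int_0^{+\infty} e^{-t} B_\alpha(f)\Big(z_0 \frac{z}{z_0} t^\alpha\Big) \,dt
= \Big(\frac{z_0}{z}\Big)^{1/\alpha} \int_0^{+\infty} e^{-(z_0/z)^{1/\alpha} u} B_\alpha(f)(z_0 u^\alpha) \,du,
\]
and since this last integral converges for $(z_0/z)^{1/\alpha} = 1$, it does so also for
$(z_0/z)^{1/\alpha} > 1$ and even for $\operatorname{Re}\big((z_0/z)^{1/\alpha}\big) > 1$.
The function
\[
z \mapsto \Big(\frac{z_0}{z}\Big)^{1/\alpha} \int_0^{+\infty} e^{-(z_0/z)^{1/\alpha} u} B_\alpha(f)(z_0 u^\alpha) \,du
\]
is analytic in the connected open set $D_\alpha(z_0)$ containing $]0, z_0[$ and consisting of
those $z$ such that
\[
\operatorname{Re}\Big(\frac{z_0}{z}\Big)^{1/\alpha} > 1.
\]
This is a rather thin convex open set, whose boundary $C_\alpha(z_0)$ has the following
equation in polar coordinates, writing $z_0 = \rho_0 e^{i\theta_0}$ and $z = \rho e^{i\theta}$:
\[
\rho = \rho_0 \Big(\cos \frac{\theta - \theta_0}{\alpha}\Big)^{\alpha}, \qquad -\alpha\frac{\pi}{2} < \theta - \theta_0 < \alpha\frac{\pi}{2}.
\]
The smaller $\alpha$ is, the thinner $D_\alpha(z_0)$ is.

Summary. If the integral $\int_0^{+\infty} e^{-t} B_\alpha(f)(z t^\alpha) \,dt$ converges for $z = z_0$, then it
converges for all $z$ in the open set $D_\alpha(z_0)$, and defines an analytic function in
this open set, which is a continuation of $f$.
Consider the open set
\[
E_\alpha(f) = \{ z \in \operatorname{Star}(f) \text{ and } D_\alpha(z) \cup C_\alpha(z) \subset \operatorname{Star}(f) \}.
\]
One can show that for every $z \in E_\alpha(f)$ the integral $\int_0^{+\infty} e^{-t} B_\alpha(f)(z t^\alpha) \,dt$
converges.
We see therefore that on $E_\alpha(f)$ the function
\[
z \mapsto \int_0^{+\infty} e^{-t} B_\alpha(f)(z t^\alpha) \,dt
\]
is an analytic continuation of $f$.
Since
\[
\operatorname{Star}(f) = \bigcup_{0 < \alpha \le 1} E_\alpha(f),
\]
we therefore have a means of calculating the continuation of $f$ for every $z$ in
$\operatorname{Star}(f)$.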

4 Gevrey series
4.1 Denitions
If the radius of convergence of $\sum_{n \ge 0} a_n z^n$ is zero, then the power series
$F = \sum_{n \ge 0} a_n z^n$ cannot define an analytic function $f$ by the formula
\[
f(z) = \lim_{N \to +\infty} \sum_{n=0}^{N} a_n z^n,
\]
since this limit does not exist for any $z \ne 0$.

We shall therefore weaken the concept of convergence, looking for an analytic function $f$ on an open set $U$, with $0$ in $U$ or on the boundary of $U$, such
that the series $\sum_{n \ge 0} a_n z^n$ is an asymptotic expansion of $f$ in the following
sense:
\[
\Big| f(z) - \sum_{n=0}^{N-1} a_n z^n \Big| \le C_N |z|^N \quad \text{for all } z \in U,
\]
with the above holding for every $N \ge 0$, and with $C_N$ independent of $z$,
although the $C_N$ are allowed to tend to infinity.
We cannot hope that such an asymptotic condition could hold on an open
disc $U = D(0, R)$ with $R > 0$, as that would imply that
\[
a_n = \frac{\partial^n f(0)}{n!},
\]
and since $f$ is supposed to be analytic on $U = D(0, R)$, the series $\sum_{n \ge 0} a_n z^n$
would converge in $D(0, R)$ and hence would have a non-zero radius of convergence.
We shall therefore require that the above condition holds in a small sector
$S$ based at $0$ of angle $\theta_1 - \theta_0$ less than $2\pi$, i.e.,
\[
S = \{ z = re^{i\theta} \mid 0 < r < R,\ \theta_0 < \theta < \theta_1 \}.
\]
In this case, one can show (this is the Borel-Ritt theorem) that for every
power series $\sum_{n \ge 0} a_n z^n$ there exists an analytic function $f$ on $S$, such that
the series $\sum_{n \ge 0} a_n z^n$ is the asymptotic expansion of $f$ about $0$ in $S$, but the
function $f$ is not unique (for example, if the sector is contained in $\mathbb{C} \setminus ]-\infty, 0]$,
it is possible to add to $f$ the function $z \mapsto e^{-1/\sqrt{z}}$).
To obtain uniqueness results, we shall strengthen slightly the asymptotic
condition, as it leaves too much freedom in the terms $C_N |z|^N$ since $C_N$ can
grow arbitrarily as $N \to +\infty$.
To make this precise, we introduce the condition of Gevrey asymptoticity
in a small sector $S$, which consists of requiring of $C_N$ a growth rate of at most
$B^N N!$, and one then requires that
\[
\Big| f(z) - \sum_{n=0}^{N-1} a_n z^n \Big| \le C B^N N! \,|z|^N \quad \text{for all } z \in S,
\]
the above holding for all $N \ge 0$, with constants $C > 0$ and $B > 0$ independent
of $z \in S$.
This condition implies that
\[
\frac{f(z) - \sum_{n=0}^{N-1} a_n z^n}{z^N} - a_N \to 0 \quad \text{when } z \to 0 \text{ in } S,
\]
and so it cannot be satisfied unless the coefficients $a_n$ also satisfy an inequality
\[
|a_n| \le C B^n n! .
\]
We say in this case that the series $\sum_{n \ge 0} a_n z^n$ is Gevrey (or Gevrey of order 1).
We shall write this condition of Gevrey asymptoticity in $S$ in the form
\[
f(z) \sim \sum_{n \ge 0} a_n z^n \quad \text{in } S.
\]
4.2 Exponential smallness and uniqueness

In the condition of Gevrey asymptoticity
\[
\Big| f(z) - \sum_{n=0}^{N-1} a_n z^n \Big| \le C B^N N! \,|z|^N
\]
the function $R : N \mapsto C B^N N! \,|z|^N$ is first decreasing and then increasing,
and so it has a minimum at $N_0 \approx (B|z|)^{-1}$, and takes a minimal value
\[
R(N_0) \approx A |z|^{-1/2} e^{-\frac{1}{B|z|}}
\]
with $A > 0$.
We therefore have an exponentially small remainder (when $z \to 0$ in $S$) if
we take the sum as far as $N_0$ (this justifies the method of summation up to
the smallest term, or the astronomers' method).
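Summation up to the smallest term can be tried on a concrete series. The sketch below uses the Euler series $\sum_{n \ge 0} (-1)^n n! \,z^n$, which is not discussed in the text and is chosen here only as a standard Gevrey example with $B = 1$; for $z > 0$ it is asymptotic to $\int_0^{+\infty} e^{-t}/(1 + zt)\,dt$, which serves as the reference value. The function names and quadrature parameters are assumptions of this sketch.

```python
import math

def simpson(func, a, b, n):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(a + k * h)
    return total * h / 3

def euler_reference(z, upper=60.0, n=60000):
    """Reference value int_0^inf e^{-t} / (1 + z t) dt, truncated at t = upper."""
    return simpson(lambda t: math.exp(-t) / (1 + z * t), 0.0, upper, n)

def partial_sum(z, N):
    """Sum of the first N terms of sum_{n>=0} (-1)^n n! z^n."""
    return sum((-1) ** n * math.factorial(n) * z ** n for n in range(N))

z = 0.1
exact = euler_reference(z)
errors = [abs(exact - partial_sum(z, N)) for N in range(1, 31)]
best_N = 1 + min(range(len(errors)), key=errors.__getitem__)
print(best_N, errors[best_N - 1])
```

With $z = 0.1$ the error first decreases, is smallest for $N$ close to $1/(B|z|) = 10$, and then grows without bound, in line with the behaviour of $R(N)$ described above.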
Note that this implies that if we have
\[
f(z) \sim \sum_{n \ge 0} 0 \cdot z^n \quad \text{in } S,
\]
then the function $f$ is exponentially decreasing in $S$, i.e.,
\[
|f(z)| \le C e^{-D/|z|} \quad \text{in } S.
\]
Conversely, one can show that this inequality implies that $f(z) \sim \sum_{n \ge 0} 0 \cdot z^n$
in $S$.

Conclusion. Given a formal series $F = \sum_{n \ge 0} a_n z^n$ that is Gevrey, we do not
have uniqueness of the function $f$ such that
\[
f(z) \sim \sum_{n \ge 0} a_n z^n \quad \text{in } S:
\]
it is enough to add to $f$ an analytic function decreasing exponentially in $S$.


4.3 Gevrey summability

Given a divergent series of Gevrey type $\sum_{n \ge 0} a_n z^n$ and a small sector $S$, does
there exist a unique analytic function $f$ such that
\[
f(z) \sim \sum_{n \ge 0} a_n z^n \quad \text{in } S\,?
\]
If one wants to guarantee the uniqueness of $f$ it is enough to require that the
condition of Gevrey asymptoticity holds on a small sector $S$ of angle $> \pi$.
Indeed, in this case one can show that the only analytic function of exponential
decrease in $S$ is the zero function.
What about the existence of $f$? The condition $|a_n| \le C B^n n!$ for all $n$
guarantees the convergence of the power series
\[
B(F)(\xi) = \sum_{n \ge 0} \frac{a_n}{n!} \xi^n
\]
for all $\xi$ in the disc $D(0, 1/B)$, and defines an analytic function in this disc.
For $0 < b < 1/B$, we can define an analytic function
\[
f_b(z) = \frac{1}{z} \int_0^{b} e^{-(\xi/z)} B(F)(\xi) \,d\xi
\]
for $z$ in $\mathbb{C} \setminus \{0\}$. We can show that we have
\[
f_b(z) \sim \sum_{n \ge 0} a_n z^n
\]
in every small sector $S = \{ z = re^{i\theta} \text{ with } -\frac{\pi}{2} + \varepsilon < \theta < \frac{\pi}{2} - \varepsilon \}$ of angle
$< \pi$. The disadvantage of this construction is the arbitrary choice of $b$, since
all we can say is that $f_b - f_{b'}$ is an analytic function decreasing exponentially
in $S$. If one wants to guarantee existence and uniqueness of $f$ we will need to
impose stronger hypotheses on the function $B(F)$.

5 Borel summability
Let $F = \sum_{n \ge 0} a_n z^n$ be a power series satisfying the Gevrey condition:
$|a_n| \le C B^n n!$ for all $n$. The function
\[
B(F)(\xi) = \sum_{n \ge 0} \frac{a_n}{n!} \xi^n
\]
is defined and analytic in the disc $D(0, 1/B)$. If we want to avoid the arbitrary
choice of an upper limit of integration as above, we try to define the function
\[
f(z) = \frac{1}{z} \int_0^{+\infty} e^{-(\xi/z)} B(F)(\xi) \,d\xi.
\]
To guarantee the existence of the integral we shall suppose that the function
$B(F)$ is continued analytically in a sector $S = \{ z = re^{i\theta} \text{ with } -\varepsilon < \theta < +\varepsilon \}$,
to give a function of at most exponential growth at infinity in this sector, i.e.,
\[
|B(F)(\xi)| \le A e^{B|\xi|}.
\]
In this case we say that the series $\sum_{n \ge 0} a_n z^n$ is Borel-summable in the direction $\theta = 0$. The function $f$ thereby defined is analytic in the domain
$\{z \mid \operatorname{Re}(1/z) > B\}$, which is just the disc
\[
D = D\Big(\frac{1}{2B}, \frac{1}{2B}\Big) = \Big\{ z = re^{i\theta} \;\Big|\; r < \frac{1}{B} \cos(\theta) \Big\}
\]
(or if $B = 0$ it is the half-plane $\operatorname{Re}(z) > 0$). Let $\theta \in ]-\varepsilon, \varepsilon[$; then, setting
\[
f_\theta(z) = \frac{1}{z} \int_0^{+\infty e^{i\theta}} e^{-(\xi/z)} B(F)(\xi) \,d\xi,
\]
we obtain a function $f_\theta$ defined and analytic in the disc
\[
D_\theta = \Big\{ z = re^{i\alpha} \;\Big|\; r < \frac{1}{B} \cos(\alpha - \theta) \Big\},
\]
which is just the disc $D(1/2B, 1/2B)$ rotated by the angle $\theta$.

For $z \in D \cap D_\theta$ we see, using the analyticity of $\xi \mapsto e^{-(\xi/z)} B(F)(\xi)$ and
its decay at infinity, that
\[
f_\theta(z) - f(z) = \frac{1}{z} \int_0^{+\infty e^{i\theta}} e^{-(\xi/z)} B(F)(\xi) \,d\xi - \frac{1}{z} \int_0^{+\infty} e^{-(\xi/z)} B(F)(\xi) \,d\xi
= \frac{1}{z} \lim_{R \to +\infty} \int_{\gamma_R} e^{-(\xi/z)} B(F)(\xi) \,d\xi = 0
\]
(the path $\gamma_R$ consisting of the arc $Re^{it}$, $t \in [0, \theta]$).

Letting $\theta$ vary in $]-\varepsilon, \varepsilon[$, we obtain an analytic continuation of $f$ in an
open set containing a small sector $S$ of angle strictly greater than $\pi$.
Moreover, one can show that
\[
\Big| f(z) - \sum_{n=0}^{N-1} a_n z^n \Big| \le C B^N N! \,|z|^N \quad \text{for all } z \in S.
\]
The function $f$ defined this way in $S$ is then the only analytic function in $S$
such that
\[
f(z) \sim \sum_{n \ge 0} a_n z^n \quad \text{in } S.
\]
We call this the Borel sum of the formal series $F = \sum_{n \ge 0} a_n z^n$, and we write
it $f = s(F)$.
In the same way we can define the notion of Borel-summability in the
direction $\theta \ne 0$, and we write
\[
f_\theta(z) = \frac{1}{z} \int_0^{+\infty e^{i\theta}} e^{-(\xi/z)} B(F)(\xi) \,d\xi.
\]
The function $f_\theta$ defined this way in $e^{i\theta} S$ is the only analytic function in
$e^{i\theta} S$ such that
\[
f_\theta(z) \sim \sum_{n \ge 0} a_n z^n \quad \text{in } e^{i\theta} S,
\]
and we call it the Borel sum in the direction $\theta$ of the formal series $F = \sum_{n \ge 0} a_n z^n$;
we denote it by $s_\theta(F)$.
Remark. If the series $F = \sum_{n \ge 0} a_n z^n$ has radius of convergence $R > 0$,
then it is Borel-summable in every direction $\theta$ and the Borel sums $s_\theta(F)$ give
the analytic continuation of the function $f : z \mapsto \sum_{n \ge 0} a_n z^n$ to an open set
containing $D(0, R)$.
Properties of $s_\theta$
a) $s_\theta$ is linear:
\[
s_\theta(F + G) = s_\theta(F) + s_\theta(G), \qquad s_\theta(c\,F) = c\, s_\theta(F) \text{ if } c \in \mathbb{C},
\]
since $J$, $L_\theta$ and $B$ are linear.
b) $s_\theta$ commutes with differentiation $\partial = d/dz$:
\[
s_\theta(\partial F) = \partial s_\theta(F).
\]
c) $s_\theta$ is a morphism:
\[
s_\theta(F G) = s_\theta(F)\, s_\theta(G)
\]
(where the product $F G$ denotes the usual product of formal series).
5.1 Connection with the usual Laplace transform
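The integral formula
\[
s_\theta(F)(z) = \frac{1}{z} \int_0^{+\infty e^{i\theta}} e^{-(\xi/z)} B(F)(\xi) \,d\xi,
\]
which we use to construct the Borel sum of $F = \sum_{n \ge 0} a_n z^n$, can be expressed
in terms of an ordinary Laplace integral as
\[
\frac{1}{z} L_\theta(B(F))\Big(\frac{1}{z}\Big)
\]
where
\[
L_\theta(g)(z) = \int_0^{+\infty e^{i\theta}} e^{-z\xi} g(\xi) \,d\xi.
\]
Let $J$ be the mapping
\[
h \mapsto J(h), \qquad J(h)(z) = \frac{1}{z} h\Big(\frac{1}{z}\Big).
\]
This satisfies $J \circ J = \mathrm{Id}$ and it interchanges behaviour at $0$ and behaviour at
$\infty$, as
\[
J\Big( \sum_{n \ge 0} a_n z^n \Big) = \sum_{n \ge 0} a_n \frac{1}{z^{n+1}}.
\]
We then have
\[
s_\theta = J \circ L_\theta \circ B.
\]
The behaviour at $0$ of $s_\theta(F)$ is then linked to the behaviour at $\infty$ of $L_\theta(B(F))$.
The asymptoticity condition at $0$:
\[
f(z) \sim \sum_{n \ge 0} a_n z^n \quad \text{in } S_\theta, \qquad
S_\theta = \Big\{ z = re^{i\alpha} \;\Big|\; r < R \text{ and } \theta - \frac{\pi}{2} - \varepsilon < \alpha < \theta + \frac{\pi}{2} + \varepsilon \Big\},
\]
translates into the asymptoticity condition at $\infty$:
\[
L_\theta(B(F))(z) \sim \sum_{n \ge 0} a_n \frac{1}{z^{n+1}} \quad \text{in } S_{\theta,\infty}, \qquad
S_{\theta,\infty} = \Big\{ z = re^{i\alpha} \;\Big|\; r > 1/R \text{ and } -\theta - \frac{\pi}{2} - \varepsilon < \alpha < -\theta + \frac{\pi}{2} + \varepsilon \Big\}.
\]
Given a formal series $F = \sum_{n \ge 0} a_n z^n$ that is Borel-summable in the direction $\theta$, the function
\[
L_\theta(B(F))(z) = \int_0^{+\infty e^{i\theta}} e^{-z\xi} B(F)(\xi) \,d\xi
\]
can be continued analytically in the sector $S_{\theta,\infty}$
and satisfies
\[
L_\theta(B(F))(z) \sim \sum_{n \ge 0} a_n \frac{1}{z^{n+1}} \quad \text{in } S_{\theta,\infty}.
\]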


5.2 Alien derivations

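Let $F$ be a formal power series; then some ambiguities in summation can
appear in directions in which the function $B(F)$ has singularities.
Suppose for example that $B(F)$ possesses a singularity $\omega = re^{i\theta}$, $r \ne 0$,
and that in a sector $S$ containing the half-line in the direction $\theta$ one has
\[
B(F)(\xi) = \frac{1}{2i\pi} \Phi(\xi - \omega) \operatorname{Log}(\xi - \omega) + \Psi(\xi - \omega),
\]
where $\Phi$ and $\Psi$ are analytic in an open neighbourhood of $\omega + S$ with subexponential growth.

If we take two half-lines in $S$ in the directions $\theta_- < \theta$ and $\theta_+ > \theta$, we have
\[
L_{\theta_-}(B(F))(z) - L_{\theta_+}(B(F))(z) = \int_\omega^{+\infty e^{i\theta}} e^{-z\xi} \Phi(\xi - \omega) \,d\xi = e^{-\omega z} L_\theta(\Phi)(z).
\]
Suppose that $\Phi = B(\varphi)$ where $\varphi$ is a formal power series; we have
\[
L_{\theta_-}(B(F)) - L_{\theta_+}(B(F)) = e^{-\omega z} L_\theta(B(\varphi)).
\]
We can write this as
\[
L_{\theta_-}(B(F)) = L_{\theta_+}(B(F)) + e^{-\omega z} L_{\theta_+}(B(\varphi)),
\]
thus
\[
s_{\theta_-}(F) = s_{\theta_+}(F) + e^{-\omega/z} s_{\theta_+}(\varphi).
\]
The ambiguity in the summation shows itself in the appearance of the exponential $e^{-\omega/z}$ multiplied by the function $s_{\theta_+}(\varphi)$. To allow for this we extend
the summation operators $s_\theta$ to the formal products $e^{-\omega/z} \varphi(z)$ by
\[
s_\theta(e^{-\omega/z} \varphi(z)) = e^{-\omega/z} s_\theta(\varphi).
\]
We can then write
\[
s_{\theta_-}(F) = s_{\theta_+}(F + e^{-\omega/z} \varphi(z)),
\]
where the formal series $\varphi$ only depends on $F$ and $\omega$, since the singular part
of $B(F)$ at $\omega$ is
\[
\frac{1}{2i\pi} B(\varphi)(\xi - \omega) \operatorname{Log}(\xi - \omega).
\]
We shall write $S_\omega F = \varphi$; then $S_\omega F$ describes the singularity of $B(F)$ at the
point $\omega$, and we then have
\[
s_{\theta_-}(F) = s_{\theta_+}(F + e^{-\omega/z} S_\omega F).
\]
This formula can be generalized to other singularities than logarithmic
ones; it is the basis of the definition of alien derivations due to J. Ecalle. Let
us show that $S_\omega$ is a derivation, i.e., that it satisfies
\[
S_\omega(F G) = (S_\omega F) G + F (S_\omega G).
\]
Let $F$ and $G$ be two formal series as above, such that we have
\[
s_{\theta_-}(F) = s_{\theta_+}(F + e^{-\omega/z} S_\omega F), \qquad
s_{\theta_-}(G) = s_{\theta_+}(G + e^{-\omega/z} S_\omega G).
\]
Using the fact that $s_{\theta_-}$ and $s_{\theta_+}$ are morphisms, we deduce that
\[
s_{\theta_-}(F G) = s_{\theta_+}\big( (F + e^{-\omega/z} S_\omega F)(G + e^{-\omega/z} S_\omega G) \big)
= s_{\theta_+}\big( F G + e^{-\omega/z} (S_\omega F) G + e^{-\omega/z} F (S_\omega G) + e^{-2\omega/z} (S_\omega F)(S_\omega G) \big).
\]
We see that the product of two formal power series $F$ and $G$ such that
$B(F)$ and $B(G)$ have singularities at $\omega$ can have one at $\omega$, but the exponential
$e^{-2\omega/z}$ shows us that we can also have a singularity at $2\omega$.
On the other hand, we have as above
\[
s_{\theta_-}(F G) = s_{\theta_+}\big( F G + e^{-\omega/z} S_\omega(F G) + e^{-2\omega/z} S_{2\omega}(F G) \big)
\]
where $S_{2\omega}(F G)$ represents the singularity of $B(F G)$ at $2\omega$.
Equating the coefficients of the exponential $e^{-\omega/z}$, we obtain
\[
S_\omega(F G) = (S_\omega F) G + F (S_\omega G),
\]
or, in other words, the mapping $S_\omega$ is a derivation; it is also written $\Delta_\omega$.
More generally, in order to take arbitrary products of power series, it is
therefore necessary to allow $B(F)$ the possibility of singularities at the points
$n\omega$, $n = 1, 2, \dots$ The ambiguity in summation is then described by all the
$S_{n\omega}$, since
\[
s_{\theta_-}(F) = s_{\theta_+}(F + e^{-\omega/z} S_\omega F + e^{-2\omega/z} S_{2\omega} F + \dots).
\]
The mappings $S_{n\omega}$ are defined as above, but for $n \ge 2$ they are not derivations; for example, we have
\[
S_{2\omega}(F G) = (S_{2\omega}(F)) G + F (S_{2\omega}(G)) + S_\omega(F) S_\omega(G).
\]
We can construct derivations $\Delta_{n\omega}$ by suitable combination of the $S_{k\omega}$. To
find this combination, we use the mapping
\[
S(\theta) : F \mapsto e^{-\omega/z} S_\omega F + e^{-2\omega/z} S_{2\omega} F + \dots .
\]
Since
\[
(I + S(\theta))(F) = s_{\theta_+}^{-1} s_{\theta_-}(F),
\]
we see that
\[
(I + S(\theta))(F G) = (I + S(\theta))(F) \, (I + S(\theta))(G).
\]
We call the mapping $I + S(\theta)$ the passage morphism in the direction $\theta$.
The mapping $\Delta(\theta)$ given by
\[
\Delta(\theta) = \sum_{n \ge 1} \frac{(-1)^{n-1}}{n} (S(\theta))^n
\]
is a derivation because it satisfies
\[
I + S(\theta) = \exp(\Delta(\theta)).
\]
This is the global alien derivation in the direction $\theta$.
If we expand $(S(\theta))^n$ we see that we can write
\[
\Delta(\theta) F = e^{-\omega/z} \Delta_\omega F + e^{-2\omega/z} \Delta_{2\omega} F + \dots
\]
where
\[
\Delta_\omega = S_\omega, \qquad
\Delta_{2\omega} = S_{2\omega} - \frac{1}{2} S_\omega S_\omega, \qquad
\Delta_{3\omega} = S_{3\omega} - \frac{1}{2} (S_\omega S_{2\omega} + S_{2\omega} S_\omega) + \frac{1}{3} S_\omega S_\omega S_\omega, \qquad \dots
\]
By construction, the $\Delta_{n\omega}$ are derivations; they are not of the form $a(z) \frac{d}{dz}$;
they are the alien derivations of J. Ecalle.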

5.3 Real summation

If $F$ is a series $\sum_{n \ge 0} a_n z^n$ where the $a_n$ are real, it is natural to calculate the
Borel sum of $F$ in the real direction $\theta = 0$ in order to obtain a real sum when
$z \in \mathbb{R}$. If there exist singularities of $B(F)$ on $\mathbb{R}^+$, then we will have two lateral
sums $s_{0_+} = s_+$ and $s_{0_-} = s_-$, and the ambiguity in summation is described
by the global derivation $\Delta(0) = \Delta$.
If we take as the sum
\[
s(F) = \frac{1}{2} \big( s_+(F) + s_-(F) \big),
\]
we do obtain a real sum for real $z$, although it does not necessarily have the
property
\[
s(F G) = s(F) s(G).
\]
In order to obtain a real sum with this property, we introduce the operator $C$
defined by
\[
C(F)(z) = \overline{F(\bar z)}.
\]
We have $C(F) = F$ if $F$ is a series $\sum_{n \ge 0} a_n z^n$ where the $a_n$ are real; in this
case we wish to determine a sum $f$ of $F$ such that
\[
C(f) = f.
\]
We may see from the explicit formula for Borel summation that $s_+ C = C s_-$.
Since
\[
C^2 = I \quad \text{and} \quad s_+^{-1} s_- = e^{\Delta},
\]
we deduce that
\[
C s_+ e^{\Delta/2} = s_+ e^{\Delta/2} C.
\]
This implies that
\[
C s_+ e^{\Delta/2}(F) = s_+ e^{\Delta/2}(F).
\]
In other words, the function
\[
s(F) = s_+ e^{\Delta/2}(F)
\]
has the property that $s(F)(x)$ is real if $x$ is real, and
\[
s(F G) = s(F) s(G).
\]
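The two lateral sums can be computed explicitly in a simple case. The sketch below is an illustration under assumptions not made in the text: it takes $F = \sum_{n \ge 0} n!\, z^n$, whose Borel transform $B(F)(\xi) = 1/(1 - \xi)$ has a simple pole (rather than a logarithmic singularity) at $\xi = 1$ on $\mathbb{R}^+$. By the residue theorem the difference of the lateral sums should then be $s_-(F)(z) - s_+(F)(z) = -2i\pi\, e^{-1/z}/z$. The directions $\pm 0.3$, the truncation of the rays and the helper names are arbitrary choices.

```python
import cmath
import math

def simpson(func, a, b, n):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(a + k * h)
    return total * h / 3

def lateral_sum(z, theta, upper=40.0, n=80000):
    """(1/z) * integral of e^{-xi/z} / (1 - xi) along the ray xi = t e^{i theta},
    truncated at t = upper (the Borel transform 1/(1 - xi) is an assumed example)."""
    direction = cmath.exp(1j * theta)
    def integrand(t):
        xi = t * direction
        return cmath.exp(-xi / z) / (1 - xi) * direction
    return simpson(integrand, 0.0, upper, n) / z

z = 0.5
s_plus = lateral_sum(z, +0.3)    # ray just above the positive real axis
s_minus = lateral_sum(z, -0.3)   # ray just below it
jump = s_minus - s_plus
expected_jump = -2j * math.pi * math.exp(-1 / z) / z
median = 0.5 * (s_plus + s_minus)
print(jump, expected_jump)
print(median)
```

For real $z$ the two lateral sums are complex conjugates of each other, so their half-sum is real, and their difference is the exponentially small term $e^{-1/z}$ up to a factor, as in the discussion of the ambiguity of summation.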

6 Acknowledgments
I am indebted for the quality of the English version of this paper to Jonathan Partington from Leeds University who kindly agreed to translate it. My
warmest thanks to him.

References
1. E. Borel, Leçons sur les séries divergentes, Gabay, 1988.
2. B. Candelpergher, Une introduction à la résurgence, La Gazette des Mathématiciens, SMF, 42, 1989.
3. J. Ecalle, Les fonctions résurgentes, Publ. Math., Orsay, 1985.
4. B. Malgrange, Sommation des séries divergentes. Expo. Math. 13:163-222,
1995.
5. G. Sansone, J. Gerretsen, Lectures on the theory of functions of a complex
variable, Noordhoff, Groningen, 1960.

Fourier Transforms and Complex Analysis

Jonathan R. Partington
School of Mathematics, University of Leeds,
Leeds LS2 9JT, U.K.
J.R.Partington@leeds.ac.uk

1 Real and complex Fourier analysis

1.1 Fourier series
Let $f$ be a real or complex-valued function defined on the real line $\mathbb{R}$, having
period $T > 0$, say; by this we mean that $f(t + T) = f(t)$ for all real $t$. Then,
assuming that $f$ is sufficiently well-behaved that the following definitions make
sense (in practice this means that $f$ is locally Lebesgue integrable), we can
form its Fourier series
\[
f(t) \sim \frac{a_0}{2} + \sum_{k=1}^{\infty} \Big( a_k \cos \frac{2\pi k t}{T} + b_k \sin \frac{2\pi k t}{T} \Big),
\]
where
\[
a_k = \frac{2}{T} \int_0^T f(t) \cos \frac{2\pi k t}{T} \,dt
\quad \text{and} \quad
b_k = \frac{2}{T} \int_0^T f(t) \sin \frac{2\pi k t}{T} \,dt
\]
are the real Fourier coefficients of $f$.

For example, consider the sawtooth function, $f(t) = t$ on $(-\pi, \pi]$, extended
with period $2\pi$ to $\mathbb{R}$. Then
\[
f(t) \sim \sum_{n=1}^{\infty} \frac{2}{n} (-1)^{n+1} \sin nt
\]
(the cosine terms vanish). The Fourier series converges to the function except
at odd multiples of $\pi$, where it is discontinuous.
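The slow convergence of this series is easy to observe. The following sketch simply evaluates the partial sums of the sawtooth series at one point; the function name and the sample point $t = 1$ are illustrative choices.

```python
import math

def sawtooth_partial_sum(t, N):
    """Partial sum  sum_{n=1}^{N} (2/n) (-1)^{n+1} sin(n t)."""
    return sum(2.0 / n * (-1) ** (n + 1) * math.sin(n * t)
               for n in range(1, N + 1))

t = 1.0
approximations = {N: sawtooth_partial_sum(t, N) for N in (10, 100, 1000, 10000)}
for N, value in approximations.items():
    print(N, value, abs(value - t))

# At t = pi every term sin(n*pi) vanishes (up to rounding), so the series
# gives 0 there, the midpoint of the jump of the sawtooth.
at_jump = sawtooth_partial_sum(math.pi, 1000)
print(at_jump)
```

The error at $t = 1$ decreases only like $1/N$, which reflects the discontinuity of the sawtooth.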
We have used the symbol $\sim$ rather than $=$ above, since, even for
continuous functions, the Fourier series need not converge pointwise. However,
if $f$ is $C^1$ (has a continuous derivative), then in fact there is no problem and
the series converges absolutely. For all continuous functions the partial sums

J.-D. Fournier et al. (Eds.): Harm. Analysis and Ratio. Approx., LNCIS 327, pp. 3955, 2006.
Springer-Verlag Berlin Heidelberg 2006

\[
s_n(f)(t) = \frac{a_0}{2} + \sum_{k=1}^{n} \Big( a_k \cos \frac{2\pi k t}{T} + b_k \sin \frac{2\pi k t}{T} \Big)
\]
converge in an $L^2$ (mean-square) sense, by which we mean that
\[
\int_0^T |f(t) - s_n(f)(t)|^2 \,dt \to 0 \quad \text{as } n \to \infty,
\]
and there are other famous results in the literature, such as Fejér's theorem,
which asserts that the Cesàro averages
\[
\sigma_m(f) = \frac{1}{m+1} \big( s_0(f) + \dots + s_m(f) \big)
\]
converge uniformly to $f$ whenever $f$ is continuous. Thus a continuous periodic
function can always be approximated by trigonometric polynomials (finite
sums of sines and cosines).
It is often more convenient to re-express the Fourier series using the complex exponential function $e^{ix} = \cos x + i \sin x$, and this produces a somewhat
simpler expression, namely
\[
f(t) \sim \sum_{k=-\infty}^{\infty} c_k e^{2\pi i k t/T},
\]
where
\[
c_k = \frac{1}{T} \int_0^T f(t) e^{-2\pi i k t/T} \,dt
\]
are the complex Fourier coefficients of $f$, often written $c_k = \hat f(k)$. Indeed,
the real and complex coefficients are related by the identities
\[
a_k = \hat f(k) + \hat f(-k) \quad \text{and} \quad b_k = i\big(\hat f(k) - \hat f(-k)\big).
\]
There is no essential difference between these two approaches: the partial
sums are now given by
\[
s_n(f) = \sum_{k=-n}^{n} c_k e^{2\pi i k t/T},
\]
as is easily verified.
Underlying all this theory is an inner-product structure, and the basic
orthogonality relation
\[
\frac{1}{T} \int_0^T e^{2\pi i j t/T} e^{-2\pi i k t/T} \,dt =
\begin{cases} 1 & \text{if } j = k, \\ 0 & \text{otherwise,} \end{cases}
\]

which can be used to deduce Parseval's identity, namely
\[
\frac{1}{T} \int_0^T |f(t)|^2 \,dt = \sum_{k=-\infty}^{\infty} |\hat f(k)|^2 .
\]

This expresses the idea that the energy in a signal is the sum of the energies
in each mode.
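Parseval's identity can be checked on the sawtooth function $f(t) = t$ on $(-\pi, \pi]$ considered earlier. From $a_k = 0$ and $b_k = 2(-1)^{k+1}/k$ one finds $|\hat f(k)| = 1/|k|$ for $k \ne 0$ and $\hat f(0) = 0$, so both sides should equal $\pi^2/3$. The sketch below compares them numerically; the midpoint rule and the truncation level are assumptions of the illustration.

```python
import math

def mean_square(f, a, b, n=100000):
    """Midpoint-rule approximation of (1/(b-a)) * integral of |f|^2 over [a, b]."""
    h = (b - a) / n
    return sum(abs(f(a + (k + 0.5) * h)) ** 2 for k in range(n)) * h / (b - a)

# Left-hand side: average energy of the signal over one period
energy_time = mean_square(lambda t: t, -math.pi, math.pi)

# Right-hand side: sum of |f^(k)|^2 = 1/k^2 over k != 0, truncated at |k| <= N
N = 100000
energy_modes = 2 * sum(1.0 / k**2 for k in range(1, N + 1))

print(energy_time, energy_modes, math.pi**2 / 3)
```

The truncated sum over modes falls short of $\pi^2/3$ by about $2/N$, the energy carried by the omitted high-frequency modes.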
Fourier series can be used to study the vibrating string (wave equation),
as well as the heat equation, which was Fourier's original motivation. We
illustrate this by an example.
The temperature in a rod of length $\pi$ with ends held at zero temperature
is governed by the heat equation
\[
\frac{\partial^2 y}{\partial x^2} = \frac{1}{K^2} \frac{\partial y}{\partial t},
\]
with boundary conditions $y(0, t) = y(\pi, t) = 0$. Suppose an initial temperature
distribution $y(x, 0) = F(x)$.
We look for solutions $y(x, t) = f(x) g(t)$, so that
\[
f''(x) g(t) = f(x) g'(t)/K^2,
\]
or
\[
\frac{f''(x)}{f(x)} = C = \frac{1}{K^2} \frac{g'(t)}{g(t)}.
\]
It turns out we should take $f(x) = \sin nx$ (times a constant), and $C = -n^2$,
in which case
\[
g'(t) + K^2 n^2 g(t) = 0.
\]
Thus one solution is
\[
y(x, t) = f(x) g(t),
\]
with
\[
f(x) = \sin nx \quad \text{and} \quad g(t) = a_n e^{-K^2 n^2 t}.
\]
We can now superimpose solutions for different $n$, so we build in the initial
conditions and write
\[
y(x, 0) = F(x) = \sum_{n=1}^{\infty} a_n \sin nx.
\]
We then arrive at the formal solution
\[
y(x, t) = \sum_{n=1}^{\infty} a_n \sin nx \; e^{-K^2 n^2 t}.
\]
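A truncated version of this formal solution is straightforward to evaluate. The sketch below assumes an initial profile with finitely many sine coefficients, here $F(x) = \sin x + 0.5 \sin 3x$, and a value $K = 0.7$; both are illustrative choices, as is the function name.

```python
import math

def heat_solution(coeffs, K, x, t):
    """Evaluate y(x, t) = sum_n a_n sin(n x) exp(-K^2 n^2 t),
    where coeffs[n-1] = a_n (finitely many non-zero coefficients assumed)."""
    return sum(a * math.sin(n * x) * math.exp(-(K * n) ** 2 * t)
               for n, a in enumerate(coeffs, start=1))

coeffs = [1.0, 0.0, 0.5]   # F(x) = sin x + 0.5 sin 3x (illustrative choice)
K = 0.7

for t in (0.0, 0.5, 2.0):
    print(t, [round(heat_solution(coeffs, K, x, t), 4)
              for x in (0.0, math.pi / 4, math.pi / 2, math.pi)])
```

The mode $\sin 3x$ decays nine times faster in the exponent than $\sin x$, so for large $t$ the profile is dominated by the lowest mode.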


1.2 Fourier transforms

We now move to what is sometimes regarded as a limiting case of Fourier
series when $T$ tends to infinity and infinite sums turn into integrals. Here we
work with real or complex functions $f$ defined on $\mathbb{R}$. In fact we assume that
$f$ is in $L^1(\mathbb{R})$, i.e., Lebesgue integrable on the real line; in this case we can
define its Fourier transform by
\[
\hat f(w) = \int_{-\infty}^{\infty} f(t) e^{-iwt} \,dt.
\]
This is a function of $w$, which is sometimes interpreted as denoting frequency, while the variable $t$ denotes time. WARNING: one can find various
alternative expressions in the literature, for example
\[
\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(t) e^{-iwt} \,dt
\qquad \text{or} \qquad
\int_{-\infty}^{\infty} f(t) e^{-2\pi i w t} \,dt.
\]
Each has its advantages and disadvantages, so we have had to make a choice.
On another day we might prefer a different one.
Here is an important example. If
\[
f(x) = e^{-x^2/2}
\]
then
\[
\hat f(w) = \sqrt{2\pi}\, e^{-w^2/2},
\]
that is, the Gaussian function is (up to a constant) the same as its Fourier
transform.
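The Gaussian example gives a convenient check of the chosen convention. The sketch below approximates $\hat f(w) = \int f(t) e^{-iwt}\,dt$ by the trapezoidal rule on a truncated interval and compares it with $\sqrt{2\pi}\, e^{-w^2/2}$; the interval half-width, the number of nodes and the function names are assumptions made for the illustration.

```python
import cmath
import math

def fourier_transform(f, w, half_width=12.0, n=24000):
    """Approximate int f(t) e^{-i w t} dt over [-half_width, half_width]
    with the trapezoidal rule (n subintervals)."""
    h = 2 * half_width / n
    total = 0j
    for k in range(n + 1):
        t = -half_width + k * h
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * f(t) * cmath.exp(-1j * w * t)
    return total * h

gaussian = lambda x: math.exp(-x * x / 2)

for w in (0.0, 1.0, 2.5):
    numeric = fourier_transform(gaussian, w)
    exact = math.sqrt(2 * math.pi) * math.exp(-w * w / 2)
    print(w, numeric, exact)
```

With one of the other conventions mentioned above, the same computation would differ by a constant factor or a rescaling of $w$, which is the reason for the warning.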
In the same way that one can reconstruct a function from its Fourier series,
it is possible to get back from the Fourier transform to the original function.
Accordingly, define the inverse Fourier transform by
\[
\check g(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} g(w) e^{iwt} \,dw = \frac{1}{2\pi} \hat g(-t). \tag{1}
\]
We now have Fourier's inversion theorem, which asserts that if $f : \mathbb{R} \to \mathbb{C}$ is
continuous and satisfies
\[
\int_{\mathbb{R}} |f(t)| \,dt < \infty \quad \text{and} \quad \int_{\mathbb{R}} |\hat f(w)| \,dw < \infty,
\]
then $(\hat f)^{\vee} = f$; that is,
\[
f(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \hat f(w) e^{iwt} \,dw. \tag{2}
\]
We thus deduce a uniqueness theorem for Fourier transforms, namely, that
two continuous and integrable functions with the same Fourier transform must
be identical.
In the interests of beauty as well as truth, we mention Plancherel's theorem,
which is a continuous analogue of Parseval's identity. If (2) holds, and in
addition
\[
\int_{\mathbb{R}} |f(t)|^2 \,dt < \infty
\]
(more concisely: if $f$ and $\hat f$ lie in $L^1(\mathbb{R})$ and $f$ also lies in $L^2(\mathbb{R})$), then
\[
\int_{-\infty}^{\infty} |f(t)|^2 \,dt = \frac{1}{2\pi} \int_{-\infty}^{\infty} |\hat f(w)|^2 \,dw.
\]
Thus, up to a possible constant, $f$ and $\hat f$ have the same energy.

We shall now say a few words about Fourier transforms in $\mathbb{R}^n$, that is, for
functions $f(x) = f(x_1, \dots, x_n)$. The appropriate definition is
\[
\hat f(w) = \int_{\mathbb{R}^n} f(x) e^{-iw.x} \,dx,
\]
giving another function defined on $\mathbb{R}^n$. The corresponding inversion theorem
asserts that
\[
f(x) = \Big(\frac{1}{2\pi}\Big)^n \int_{\mathbb{R}^n} \hat f(w) e^{iw.x} \,dw,
\]
at least if $f$ is continuous and $\int_{\mathbb{R}^n} |f|$ and $\int_{\mathbb{R}^n} |\hat f|$ are both finite.

One application of the multi-dimensional Fourier transform is in the theory
of partial differential equations. The partial derivative $\dfrac{\partial f}{\partial x_k}$ has transform
$i w_k \hat f(w)$, and so the Laplacian
\[
\nabla^2 f = \sum_{k=1}^{n} \frac{\partial^2 f}{\partial x_k^2}
\]
has transform equal to $-\|w\|^2 \hat f(w)$; we shall not go into further details here.
1.3 Harmonic and analytic functions
For simplicity, let us consider $2\pi$-periodic functions $f$. These correspond to
functions $g$ defined on the unit circle
\[
\mathbb{T} = \{ z \in \mathbb{C} : |z| = 1 \}
\]
in the complex plane, by setting $g(e^{it}) = f(t)$. Conversely, any function $g :
\mathbb{T} \to \mathbb{C}$ gives a $2\pi$-periodic function $f$ by the same formula.

Note that the formula for the Fourier coefficients can be written
\[
\hat g(k) = \frac{1}{2\pi} \int_0^{2\pi} g(e^{it}) e^{-ikt} \,dt = \frac{1}{2\pi i} \oint_{\mathbb{T}} \frac{g(z)}{z^{k+1}} \,dz, \tag{3}
\]
where the last integral is a contour integral round the unit circle.
Suppose (for simplicity) that $g$ is continuous. Then it has a harmonic
extension to the unit disc
\[
\mathbb{D} = \{ z \in \mathbb{C} : |z| < 1 \},
\]
namely
\[
g(re^{i\theta}) = \sum_{k=-\infty}^{\infty} \hat g(k) r^{|k|} e^{ik\theta},
\]
for $0 \le r < 1$ and $0 \le \theta \le 2\pi$.

Write $z = x + iy = re^{i\theta}$ as usual. Then the extension of $g$ is a solution to
the Dirichlet problem, i.e., it satisfies Laplace's equation
\[
\frac{\partial^2 g}{\partial x^2} + \frac{\partial^2 g}{\partial y^2} = 0,
\]
with the boundary values of $g$ specified on the unit circle.
One important special case arises if $\hat g(k) = 0$ for all $k < 0$; then the
harmonic extension is
\[
g(re^{i\theta}) = \sum_{k=0}^{\infty} \hat g(k) r^k e^{ik\theta},
\qquad \text{that is,} \qquad
g(z) = \sum_{k=0}^{\infty} \hat g(k) z^k,
\]
where again $z = re^{i\theta}$. This is an analytic function (not just harmonic).

There is a one-one correspondence between power series with square-summable Taylor coefficients (the Hardy class $H^2$), and square-integrable
functions $g$ on the unit circle with $\hat g(k) = 0$ for all $k < 0$.
Suppose now that $g$ has an analytic extension to an annulus containing
the unit circle, say, $\mathcal{A} = \{A < |z| < B\}$ with $0 < A < 1 < B$. Then the
formula (3) can be replaced by integrals round circles of radius $a$ or $b$ for any
$A < a < 1 < b < B$, and we obtain useful estimates for the rate of decrease
of the Fourier coefficients, namely,
\[
|\hat g(k)| \le M_b \, b^{-k} \quad \text{and} \quad |\hat g(-k)| \le M_a \, a^{k}, \tag{4}
\]
for $k \ge 0$, where $M_r$ denotes the maximum value of $|g|$ on the circle of radius $r$.
If now $g$ has an isolated simple pole at a point $z_0$ with $A < |z_0| < 1$, with
residue $c$, but is otherwise analytic in the annulus $\mathcal{A}$, then the identity
\[
\frac{c}{z - z_0} = c \sum_{k=1}^{\infty} \frac{z_0^{k-1}}{z^k},
\]
valid on $|z| = 1$, shows that $\hat g(-k)$ is asymptotic to $c z_0^{k-1}$ as $k \to \infty$. Likewise,
if the location of the pole satisfies $1 < |z_0| < B$ instead, then the identity
\[
\frac{c}{z - z_0} = -c \sum_{k=0}^{\infty} \frac{z^k}{z_0^{k+1}}
\]
shows that $\hat g(k)$ is asymptotic to $-c/z_0^{k+1}$ as $k \to \infty$. The extension to finitely
many poles, and to poles of multiplicity greater than 1, is similar. Thus the
singularities of $g$ are reflected in the behaviour of its Fourier coefficients, a
phenomenon that we shall see again in Section 3.

2 DFT, FFT, windows

We consider again the following formula for Fourier coefficients:
$$\hat g(k) = \frac{1}{2\pi} \int_0^{2\pi} g(e^{it})\, e^{-ikt}\,dt = \frac{1}{2\pi i} \oint_{\mathbb{T}} \frac{g(z)}{z^{k+1}}\,dz.$$

In order to compute Fourier transforms numerically from data, a natural approximation to the above integral is obtained by discretising. Let us take $N$ equally-spaced points: to do this set $\omega = e^{2\pi i/N}$ and consider the expression
$$\hat g_N(k) = \frac{1}{N} \sum_{j=0}^{N-1} g(\omega^j)\, \omega^{-jk}.$$
This is a discrete Fourier transform of $g$. Since $\omega^N = 1$, the values of $\hat g_N$ repeat themselves, and we need only work with $\hat g_N(-\frac{N}{2}), \ldots, \hat g_N(\frac{N}{2} - 1)$. It is not difficult to convince oneself that, if $g$ is continuous, then for each fixed $k$ the number $\hat g_N(k)$ should be close to $\hat g(k)$ when $N$ is sufficiently large (basically, we have replaced a Riemann integral by a Riemann sum). An approximation to the Fourier series for $g$ is now given by taking the function
$$g_N(e^{it}) = \sum_{k=-N/2}^{N/2-1} \hat g_N(k)\, e^{ikt}.$$
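The Riemann-sum approximation can be checked on a function whose Fourier coefficients are known in closed form. The sketch below assumes NumPy; the function name `dft_coefficients` and the test function $g(z) = 1/(2 - z)$, whose coefficients are $2^{-(k+1)}$ for $k \ge 0$ and zero for $k < 0$, are illustrative choices rather than anything taken from the text.

```python
import numpy as np

def dft_coefficients(g, N):
    """Discrete Fourier coefficients (1/N) sum_j g(w^j) w^{-jk}, w = exp(2 pi i / N).

    Returns the indices k = -N/2, ..., N/2 - 1 and the matching coefficients.
    N is assumed even in this sketch.
    """
    j = np.arange(N)
    omega = np.exp(2j * np.pi / N)
    samples = g(omega ** j)                       # g at the N-th roots of unity
    ks = np.arange(-N // 2, N // 2)
    coeffs = np.array([np.sum(samples * omega ** (-j * k)) / N for k in ks])
    return ks, coeffs

# g(z) = 1/(2 - z) = sum_{k>=0} z^k / 2^{k+1} on the unit circle.
ks, coeffs = dft_coefficients(lambda z: 1.0 / (2.0 - z), 32)
```

With $N = 32$ the discrete coefficients should already agree closely with the exact ones, because the only error is the aliasing of coefficients whose indices differ by multiples of $N$.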

The Fast Fourier Transform (FFT) was introduced by Cooley and Tukey as a numerical algorithm for computing the discrete Fourier coefficients of $g$ for values of $N$ which are powers of 2, say $N = 2^n$. At first sight it seems that, starting with $N$ values of $g$, we require approximately $2N^2$ operations (additions and multiplications) to calculate the $N$ values of $\hat g_N$. In fact, if we have an even number of points, say $2r$, and divide them into two halves (the even ones and the odd ones), then we can exploit the algebraic relations existing between $\hat g_r$ and $\hat g_{2r}$. These imply that, if we can find the coefficients $\hat g_r$ in $M$ operations, then we can obtain the coefficients $\hat g_{2r}$ in not more than $2M + 8r$ operations.
The upshot is that, for $N = 2^n$, computers can calculate the coefficients $\hat g_N$ in at most $n 2^{n+2} = 4N \log_2 N$ operations. This is a significant saving if $N$ is of the order of several thousand.
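The even/odd splitting described above can be sketched as a short recursive routine. This is a textbook radix-2 decimation-in-time scheme written for illustration, not necessarily the exact organisation of the original Cooley and Tukey algorithm; the name `fft_radix2` is hypothetical, and the result must be divided by $N$ to obtain the discrete Fourier coefficients.

```python
import cmath

def fft_radix2(values):
    """Return [sum_j values[j] * exp(-2 pi i j k / N) for k in range(N)].

    N = len(values) must be a power of 2. The transform of 2r points is
    assembled from the transforms of the r even-indexed and r odd-indexed points.
    """
    N = len(values)
    if N == 1:
        return list(values)
    if N % 2:
        raise ValueError("length must be a power of 2")
    even = fft_radix2(values[0::2])
    odd = fft_radix2(values[1::2])
    half = N // 2
    out = [0j] * N
    for k in range(half):
        twiddle = cmath.exp(-2j * cmath.pi * k / N) * odd[k]
        out[k] = even[k] + twiddle            # coefficient k
        out[k + half] = even[k] - twiddle     # coefficient k + N/2
    return out

def dft_direct(values):
    """Direct O(N^2) evaluation of the same sums, for comparison."""
    N = len(values)
    return [sum(values[j] * cmath.exp(-2j * cmath.pi * j * k / N) for j in range(N))
            for k in range(N)]

data = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, -2.0, 4.0]
fast = fft_radix2(data)
slow = dft_direct(data)
```

Each level of the recursion costs a number of operations proportional to $N$, and there are $\log_2 N$ levels, which is the source of the $N \log_2 N$ operation count.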
In many applications, it is convenient to work with a windowed discrete Fourier transform of $g$, which is a function of the form
$$g_w(e^{it}) = \sum_{k=-\infty}^{\infty} \hat g_N(k)\, w_k\, e^{ikt},$$
where $(w_k)$ is a sequence of weights, of which usually only finitely many are non-zero. For example, for $0 \le m < N$ we may take the sequence
$$w_k = \begin{cases} \dfrac{m + 1 - |k|}{m + 1} & \text{for } |k| \le m, \\ 0 & \text{otherwise,} \end{cases}$$
in which case the corresponding functions $g_w$ form a sequence of trigonometric polynomials known as the Jackson polynomials, $J_{m,N}(g)$. These have many attractive properties, in particular they converge uniformly to the original function $g$ as $N \to \infty$, for any sequence of $m = m(N)$ remaining less than $N$ but also tending to infinity. They are also robust, in the sense that small measurement errors or perturbations lead to small errors in the polynomials. For rather more rapid convergence, one may use the discrete de la Vallée Poussin polynomials, $V_{m,N}(g)$, defined for $N \ge 3m$ using the following window:
$$w_k = \begin{cases} 1 & \text{for } |k| \le m, \\ \dfrac{2m - |k|}{m} & \text{for } m \le |k| \le 2m, \\ 0 & \text{otherwise.} \end{cases}$$
These have been used in various interpolation and approximation schemes, for example in the identification of linear systems from noisy frequency-domain data.
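The two windows can be written down directly. The sketch below assumes NumPy and takes the de la Vallée Poussin window in its usual trapezoidal form, equal to 1 for $|k| \le m$ and decreasing linearly to 0 at $|k| = 2m$; the function names are hypothetical.

```python
import numpy as np

def jackson_weight(k, m):
    """Triangular window (m + 1 - |k|) / (m + 1) for |k| <= m, else 0."""
    return (m + 1 - abs(k)) / (m + 1) if abs(k) <= m else 0.0

def vallee_poussin_weight(k, m):
    """Trapezoidal window: 1 for |k| <= m, (2m - |k|)/m up to |k| = 2m, else 0."""
    a = abs(k)
    if a <= m:
        return 1.0
    if a <= 2 * m:
        return (2 * m - a) / m
    return 0.0

def windowed_polynomial(samples, weight, m, t):
    """Evaluate sum_k g_N(k) w_k e^{ikt} from N equally spaced samples of g.

    The periodicity g_N(k) = g_N(k mod N) is used to index negative k.
    """
    N = len(samples)
    coeffs = np.fft.fft(samples) / N
    total = 0j
    for k in range(-2 * m, 2 * m + 1):
        total += coeffs[k % N] * weight(k, m) * np.exp(1j * k * t)
    return total

nodes = 2 * np.pi * np.arange(64) / 64
jackson_value = windowed_polynomial(np.cos(nodes), jackson_weight, 4, 0.0)
poussin_value = windowed_polynomial(np.cos(nodes), vallee_poussin_weight, 4, 0.0)
```

For $g(e^{it}) = \cos t$ the Jackson window damps the $k = \pm 1$ terms by the factor $m/(m+1)$, whereas the de la Vallée Poussin window reproduces them exactly, which illustrates why the latter tends to converge faster.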

3 The behaviour of $f$ and $\hat f$

We return to Fourier transforms for functions in $L^1(\mathbb{R})$, and consider how the properties of $f$ and $\hat f$ are linked. For example, it is easily seen that, if $f$ is a real function, then $\hat f(-w) = \overline{\hat f(w)}$; if $f$ is a real even function, then $\hat f$ is purely real, and if $f$ is a real odd function, then $\hat f$ is purely imaginary.
The properties of $f$ and $\hat f$ behave well under translations and dilations: let
$$(T_a f)(t) = f(t - a) \qquad \text{and} \qquad (D_b f)(t) = f(t/b)$$
for $a \in \mathbb{R}$ and $b > 0$. Then
$$\widehat{(T_a f)}(w) = e^{-iaw} \hat f(w) \qquad \text{and} \qquad \widehat{(D_b f)}(w) = b \hat f(bw).$$
Similarly, derivatives transform in a simple fashion: if $f$ is an $L^1(\mathbb{R})$ function with a continuous derivative, such that $\int_{\mathbb{R}} |f'| < \infty$, then
$$\widehat{(f')}(w) = iw \hat f(w).$$
In particular, there is a constant $C > 0$ such that $|\hat f(w)| \le C/|w|$. This argument can be repeated with higher derivatives, and we obtain the slogan: the smoother the function, the faster its Fourier transform decays. A similar phenomenon holds for Fourier series of periodic functions: for smooth functions the Fourier coefficients tend rapidly to zero.
By means of the inversion theorem, we can argue in the other direction too: if $\hat f$ is smooth, then this corresponds to rapid decay of $f$ at $\pm\infty$.
In many applications, it is convenient to work with smooth functions of rapid decay. Thus we define the Schwartz class, $\mathcal{S}$, to be the class of all infinitely differentiable functions $f : \mathbb{R} \to \mathbb{C}$ such that every derivative is rapidly decreasing: thus, for all $n$, $k$, there is $C_{n,k} > 0$ such that
$$|f^{(n)}(t)| \le \frac{C_{n,k}}{(1 + |t|)^k}$$
for all $t \in \mathbb{R}$. A simple example is $\exp(-at^2)$ with $a > 0$, but one can even find such functions with compact support (so-called bump functions).
Now if $\int_{\mathbb{R}} |t^k f(t)|\,dt < \infty$, it follows that $\hat f$ is differentiable $k$ times, and
$$(\hat f)^{(k)}(w) = \int_{-\infty}^{\infty} (-it)^k f(t)\, e^{-itw}\,dt.$$
This can be used to show that the Fourier transform is a linear bijection from $\mathcal{S}$ onto itself.
Suppose now that $f$ is smooth apart from jump discontinuities of the function and its derivatives at the origin, so that we may define
$$\delta_k = \lim_{t \to 0+} f^{(k)}(t) - \lim_{t \to 0-} f^{(k)}(t)$$
for $k = 0, 1, 2, \ldots$. Then it can be shown that $\hat f$ possesses an asymptotic expansion of the form
$$\hat f(w) \sim \sum_{k=0}^{\infty} \frac{\delta_k}{(iw)^{k+1}}$$
as $|w| \to \infty$.
Moreover, since the Fourier transform and inverse Fourier transform are related by (1), we may similarly conclude that jumps $\epsilon_k$ in $\hat f$ and its derivatives at the origin are reflected in an asymptotic expansion
$$f(t) \sim \frac{1}{2\pi} \sum_{k=0}^{\infty} \frac{\epsilon_k}{(-it)^{k+1}}$$
as $|t| \to \infty$.
Finally, the expansions corresponding to jumps occurring at other points on the real line may be derived by a straightforward change of variables.
We now consider the case when $f$ has an analytic extension to a horizontal band $\mathcal{B} = \{A < \operatorname{Im} z < B\}$, where $A < 0$ and $B > 0$. Then certain estimates hold for the Fourier transform, which are analogous to those obtained for Fourier coefficients in (4). If we take $0 < b < B$ and suppose that $f$ is absolutely integrable on the line $\{\operatorname{Im} z = b\}$, tending to zero uniformly in $\mathcal{B}$ as $\operatorname{Re} z \to \pm\infty$, then we can move the contour of integration, and obtain the estimate
$$|\hat f(w)| \le \int_{-\infty}^{\infty} |f(x + ib)|\, |e^{-iw(x+ib)}|\,dx = O(e^{bw})$$
as $w \to -\infty$. Similarly, analyticity in the lower half-plane leads to estimates of the form $|\hat f(w)| = O(e^{aw})$ as $w \to \infty$, provided that we may integrate along the line $\{\operatorname{Im} z = a\}$ with $A < a < 0$.
Once more we may see the existence of singularities of $f$ reflected in the asymptotic behaviour of $\hat f$. The Fourier transform of the function $f(t) = c/(t - z_0)$, with $\operatorname{Im} z_0 > 0$, is easily calculated by contour integration, and is given by
$$\hat f(w) = \begin{cases} 2\pi i c\, e^{-iwz_0} & \text{if } w < 0, \\ 0 & \text{if } w > 0. \end{cases}$$
Similarly, if $\operatorname{Im} z_0 < 0$, the Fourier transform is
$$\hat f(w) = \begin{cases} 0 & \text{if } w < 0, \\ -2\pi i c\, e^{-iwz_0} & \text{if } w > 0. \end{cases}$$
Thus, if $f$ is sufficiently regular in $\mathcal{B}$ except for an isolated pole with residue $c$ occurring at $p + iq$ with $q > 0$, then there is an asymptotic formula valid for $w \to -\infty$, namely
$$\hat f(w) \sim 2\pi i c\, e^{-ipw} e^{qw}.$$
The extensions to poles in the lower half-plane, to a finite number of poles, and to poles of multiple order, are very similar and we omit them. As before, we may exchange the roles of $f$ and $\hat f$, using the identity (1), so that singularities in $\hat f$ are reflected in the asymptotic behaviour of $f$.

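4 Wiener's theorems
Suppose that a $2\pi$-periodic function $f$ has an absolutely convergent Fourier series, that is
$$f(t) = \sum_{k=-\infty}^{\infty} \hat f(k)\, e^{ikt},$$
with $\sum_{k=-\infty}^{\infty} |\hat f(k)| < \infty$. So in fact $f$ is necessarily continuous (although this is not a sufficient condition), but need not be differentiable. These functions form a linear space, and indeed an algebra, closed under multiplication, since if
$$f(t) = \sum_{k=-\infty}^{\infty} \hat f(k)\, e^{ikt} \qquad \text{and} \qquad g(t) = \sum_{k=-\infty}^{\infty} \hat g(k)\, e^{ikt},$$
then
$$f(t) g(t) = \sum_{k=-\infty}^{\infty} c_k\, e^{ikt}, \qquad \text{where} \qquad c_k = \sum_{j=-\infty}^{\infty} \hat f(j)\, \hat g(k - j),$$
and so
$$\sum_{k=-\infty}^{\infty} |c_k| \le \sum_{k=-\infty}^{\infty} \sum_{j=-\infty}^{\infty} |\hat f(j)|\, |\hat g(k - j)| = \sum_{j=-\infty}^{\infty} \sum_{l=-\infty}^{\infty} |\hat f(j)|\, |\hat g(l)| < \infty,$$
i.e., $f \cdot g$ has an absolutely convergent Fourier series.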

It is a much deeper result, due to Wiener, that, if $f$ never takes the value 0, then $1/f$ has an absolutely convergent Fourier series. Originally proved by "hard" analysis, it can now be deduced more easily using the Gelfand theory of commutative Banach algebras.
There is an analogous result for Taylor series (linked by the change of variable $z = e^{it}$): suppose that
$$f(z) = \sum_{k=0}^{\infty} a_k z^k$$
has an absolutely convergent Taylor series, so that $\sum_{k=0}^{\infty} |a_k| < \infty$. Such functions are analytic in the open unit disc $\mathbb{D}$ and continuous on the closed disc $\overline{\mathbb{D}}$. If $f(z) \ne 0$ for $z \in \overline{\mathbb{D}}$, then the function $1/f$ also has an absolutely convergent Taylor series.
A more general result is the Wiener–Lévy theorem: if $G$ is a function holomorphic in a neighbourhood of the range of $f$, then the composite function $G \circ f$ also has an absolutely convergent Fourier series. The special case $G(x) = 1/x$ is the classical Wiener theorem, but one can consider other functions such as $\sqrt{f}$ if $|f(z) - 1| < 1$ for all $z$, and these too have absolutely convergent Fourier series.
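There are analogous results for Fourier transforms. We remark first that if $f$ and $g$ lie in $L^1(\mathbb{R})$, then their convolution $f * g$, given by
$$(f * g)(x) = \int_{-\infty}^{\infty} f(x - y)\, g(y)\,dy,$$
also lies in $L^1(\mathbb{R})$. Indeed $f * g = g * f$, and we also have
$$\|f * g\|_1 \le \|f\|_1 \|g\|_1 \qquad \text{and} \qquad \widehat{(f * g)}(w) = \hat f(w)\, \hat g(w).$$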

The main consequence of the non-vanishing of the Fourier transform of $f$ is Wiener's Tauberian theorem. This may be presented in three forms.
(i) If $f \in L^1(\mathbb{R})$, then the translates of $f$, namely $f_\lambda(x) = f(x - \lambda)$ for $\lambda \in \mathbb{R}$, span a dense subspace of $L^1(\mathbb{R})$ if and only if $\hat f$ is non-zero everywhere.
(ii) If $f \in L^1(\mathbb{R})$, then the convolutions $f * g$ for functions $g$ in $L^1(\mathbb{R})$ form a dense subspace of $L^1(\mathbb{R})$ if and only if $\hat f$ is non-zero everywhere.
(iii) If $f \in L^1(\mathbb{R})$ and $\hat f$ is non-zero everywhere, and if in addition the identity
$$\lim_{x \to \infty} (f * K)(x) = A \int_{-\infty}^{\infty} f(x)\,dx$$
holds for a given function $K \in L^\infty(\mathbb{R})$ and $A \in \mathbb{C}$, then in fact
$$\lim_{x \to \infty} (g * K)(x) = A \int_{-\infty}^{\infty} g(x)\,dx$$
holds for every $g \in L^1(\mathbb{R})$.

The first two forms of the theorem may be seen as results in approximation theory; the last one, Wiener's original version of the theorem, is a "Tauberian theorem" (a name given to a certain kind of theorem that deduces the convergence of a series or integral from other hypotheses).

There is also an $L^2$ version of (i), which is useful in some applications.

(iv) If $f \in L^2(\mathbb{R})$, then the translates of $f$ span a dense subspace of $L^2(\mathbb{R})$ if and only if $\hat f$ is non-zero almost everywhere.

5 Laplace and Mellin transforms

5.1 Laplace
The Laplace transform is an important tool in the theory of differential equations, and we give its basic properties. Let $f$ be a measurable function defined on $(0, \infty)$. Then we define its Laplace transform $F = \mathcal{L}f$ by
$$F(s) = \int_0^{\infty} f(t)\, e^{-st}\,dt,$$
which will, in general, be a holomorphic function of a complex variable lying in some half-plane $\operatorname{Re} s > a$. For example, if $f$ is an exponential function $f(t) = e^{\lambda t}$, then $F(s) = 1/(s - \lambda)$, and the integral converges for $\operatorname{Re} s > \operatorname{Re} \lambda$.
There is an inversion formula available. Namely, if $b > a$, we have
$$f(t) = \lim_{y \to \infty} \frac{1}{2\pi i} \int_{b - iy}^{b + iy} F(s)\, e^{st}\,ds,$$
which is an integral along a vertical contour in the complex plane.

Suppose $f$ is differentiable, and we take the Laplace transform of $f'$. We may integrate by parts to obtain:
$$\begin{aligned} (\mathcal{L}f')(s) &= \int_0^{\infty} e^{-st} f'(t)\,dt \\ &= \left[ e^{-st} f(t) \right]_{t=0}^{\infty} + s \int_0^{\infty} e^{-st} f(t)\,dt \\ &= -f(0) + s (\mathcal{L}f)(s). \end{aligned}$$
Thus a differential equation can be turned into an algebraic equation, using Laplace transforms.
For example, suppose that we have a linear system
$$y''(t) + a y'(t) + b y(t) = c u'(t) + d u(t),$$
where $a$, $b$, $c$ and $d$ are real.
Here $u$ is the input, and $y$ the output. We suppose also (for simplicity) that $u(0) = y(0) = y'(0) = 0$. Then, writing $U = \mathcal{L}u$ and $Y = \mathcal{L}y$, we arrive at
$$(s^2 + as + b)\, Y(s) = (cs + d)\, U(s),$$
and we have an algebraic relation between $U$ and $Y$.


Similarly, we may shift/translate/delay a function $f$ by an amount $T > 0$, to get $g$ defined by
$$g(t) = \begin{cases} f(t - T) & \text{if } t \ge T, \\ 0 & \text{if } t < T. \end{cases}$$
Then
$$\begin{aligned} (\mathcal{L}g)(s) &= \int_0^{\infty} e^{-st} g(t)\,dt \\ &= \int_T^{\infty} e^{-st} f(t - T)\,dt \\ &= \int_0^{\infty} e^{-s(x + T)} f(x)\,dx = e^{-sT} (\mathcal{L}f)(s). \end{aligned}$$
Thus a differential-delay equation also looks simpler in the "frequency domain".

For example, suppose we have (again with zero initial conditions for simplicity) the equation
$$y'(t) + a y(t - 1) = u(t).$$
Taking Laplace transforms $Y = \mathcal{L}y$ and $U = \mathcal{L}u$ gives
$$(s + a e^{-s})\, Y(s) = U(s).$$
Thus, if we know $u$, we can find $y$ by taking Laplace transforms and inverse Laplace transforms.
A key theorem due to Paley and Wiener says that the Laplace transform provides a linear mapping from the Lebesgue space $L^2(0, \infty)$ onto the Hardy class $H^2(\mathbb{C}_+)$ of the right half-plane $\mathbb{C}_+$. This consists of all analytic functions $F : \mathbb{C}_+ \to \mathbb{C}$ such that
$$\|F\|_2 = \sup_{x > 0} \left( \int_{-\infty}^{\infty} |F(x + iy)|^2\,dy \right)^{1/2} < \infty$$
(roughly speaking, functions analytic in the right half-plane, with $L^2$ boundary values), and moreover the Laplace transform is an isomorphism in the sense that
$$\|F\|_2 = \sqrt{2\pi}\, \|f\|_2.$$
This is the basis of various approaches to control theory and approximation theory. One consequence is that if we have an input/output relation
$$Y(s) = G(s)\, U(s)$$
as in our examples, then we can decide whether $L^2$ inputs (finite energy) guarantee $L^2$ outputs. The answer is that it is necessary and sufficient that $G(s)$ be analytic and bounded in $\mathbb{C}_+$ (i.e., lie in the Hardy class $H^\infty(\mathbb{C}_+)$).
For example, if
$$y'(t) + a y(t - 1) = u(t),$$
then we have this form of stability precisely when
$$\frac{1}{s + a e^{-s}} \in H^\infty(\mathbb{C}_+),$$
i.e., for $0 < a < \pi/2$.
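The threshold $a = \pi/2$ can be observed in a crude time-domain experiment. The sketch below is only an illustration under stated assumptions: it integrates the unforced equation $y'(t) = -a\,y(t-1)$ by the explicit Euler method with the arbitrary history $y(t) = 1$ on $[-1, 0]$, and uses decay or growth of this free response as a proxy for the location of the zeros of $s + ae^{-s}$. It is not a proof of the $H^\infty$ criterion, and the function name `simulate_delay` is hypothetical.

```python
def simulate_delay(a, t_end=40.0, h=0.01):
    """Euler scheme for y'(t) = -a * y(t - 1) with y(t) = 1 on [-1, 0].

    Returns the list of grid values of y from t = -1 to t = t_end.
    """
    d = round(1.0 / h)            # number of steps in one unit of delay
    y = [1.0] * (d + 1)           # history sampled at t = -1, ..., 0
    for _ in range(round(t_end / h)):
        # y[-1] is the current value, y[-1 - d] the value one time unit earlier
        y.append(y[-1] - h * a * y[-1 - d])
    return y

# Size of the response over the last five time units of the simulation.
tail_stable = max(abs(v) for v in simulate_delay(1.0)[-500:])    # a < pi/2
tail_unstable = max(abs(v) for v in simulate_delay(2.0)[-500:])  # a > pi/2
```

With $a = 1$ the oscillations die away, while with $a = 2$ they grow, in line with the condition $0 < a < \pi/2$.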

5.2 Mellin
Note that the Laplace transform is closely related to the Fourier transform (put $s = iw$), if we consider only functions which are 0 on the negative real axis. A still closer analogue is the bilateral Laplace transform, where we define
$$G(s) = \int_{-\infty}^{\infty} f(t)\, e^{-st}\,dt = \hat f(-is),$$
as this is simply the Fourier transform with a (sometimes useful) change of variable.
A more complicated change of variable gets us to the Mellin transform. For a function $f$ defined on $(0, \infty)$, we set
$$F(s) = \int_0^{\infty} x^{s-1} f(x)\,dx,$$
so that $F$ is the Mellin transform of $f$. The variable $s$ will in general be complex, and then the function $F$ is holomorphic in some strip.
For example, if we take $f(x) = e^{-x}$, then
$$F(s) = \int_0^{\infty} x^{s-1} e^{-x}\,dx = \Gamma(s),$$
which is in fact analytic in $\mathbb{C}_+$.

If we set $x = e^{-t}$, so that $t \in \mathbb{R}$, we obtain, at least formally,
$$F(s) = \int_{-\infty}^{\infty} f(e^{-t})\, e^{-st}\,dt,$$
which expresses the Mellin transform as a Fourier transform (or a bilateral Laplace transform). The Mellin inversion formula asserts that for suitable functions $f$ such that $x^{a-1} f(x)$ is integrable, we have
$$f(x) = \frac{1}{2\pi i} \lim_{y \to \infty} \int_{a - iy}^{a + iy} F(s)\, x^{-s}\,ds,$$
which is again an integral along a vertical contour in the complex plane.
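The reduction of the Mellin transform to a bilateral Laplace transform can be checked numerically for $f(x) = e^{-x}$, whose Mellin transform is $\Gamma(s)$. The following sketch assumes the substitution $x = e^{-t}$ and a plain trapezoidal rule on a truncated interval; the function name and the truncation limits are illustrative choices.

```python
import math

def mellin_via_exponential_substitution(f, s, t_min=-5.0, t_max=40.0, h=0.01):
    """Approximate F(s) = integral over t of f(e^{-t}) e^{-st} for real s > 0.

    Uses the trapezoidal rule on [t_min, t_max]; the limits are assumed wide
    enough for the integrand to be negligible outside them.
    """
    steps = round((t_max - t_min) / h)
    total = 0.0
    for i in range(steps + 1):
        t = t_min + i * h
        weight = 0.5 if i in (0, steps) else 1.0   # trapezoidal end weights
        total += weight * f(math.exp(-t)) * math.exp(-s * t)
    return total * h

approx = mellin_via_exponential_substitution(lambda x: math.exp(-x), 2.5)
exact = math.gamma(2.5)
```

Because the transformed integrand is smooth and decays rapidly in both directions, even this simple quadrature should reproduce `math.gamma(2.5)` to many digits.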


We conclude with a further application. Let us consider Laplace's equation in a sector $\{(r, \theta) : r > 0 \text{ and } a < \theta < b\}$. In polar coordinates we have
$$r^2 u_{rr} + r u_r + u_{\theta\theta} = 0,$$
with some appropriate boundary conditions. Let us take a Mellin transform in $r$, i.e.,
$$U(s, \theta) = \int_0^{\infty} r^{s-1} u(r, \theta)\,dr.$$
Then it is easily checked that we now have
$$U_{\theta\theta} + s^2 U = 0,$$
for 0 < Re s < , if u(r, ) = O(r ) at 0.
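Suppose, to make life simple, we take $0 < \theta < 1$ and
$$u(r, 0) = 0, \qquad u(r, 1) = \begin{cases} 1 & \text{for } 0 \le r \le 1, \\ 0 & \text{otherwise.} \end{cases}$$
(This might represent the heat flow in a piece of cake, heated on one side only.) Then it is easily verified that
$$U(s, \theta) = \frac{1}{s}\, \frac{\sin s\theta}{\sin s},$$
and we can find $u$ by inverting the Mellin transform:
$$u(r, \theta) = \frac{1}{2\pi i} \int_{a - i\infty}^{a + i\infty} \frac{r^{-s} \sin s\theta}{s \sin s}\,ds,$$
where $0 < a < \pi$.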

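Curiously, the integral can be done using Cauchy's residue theorem. In that case our story comes full circle, as we obtain a Fourier series solution. For $0 < r < 1$, closing the contour to the left and collecting the residues at $s = 0$ and $s = -n\pi$ gives
$$u(r, \theta) = \theta + \frac{1}{\pi} \sum_{n=1}^{\infty} \frac{(-1)^n r^{n\pi}}{n} \sin n\pi\theta.$$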


References
1. Y. Katznelson, An introduction to harmonic analysis. Dover Publications, Inc., New York, 1976.
2. T.W. Körner, Fourier analysis. Cambridge University Press, Cambridge, 1988.
3. M. Levitin, Fourier Tauberian theorems. Appendix in Yu. Safarov and D. Vassiliev, The asymptotic distribution of eigenvalues of partial differential operators. AMS Series Translations of Mathematical Monographs 155, AMS, Providence, R. I., 1997.
4. J.R. Partington, Interpolation, identification, and sampling. The Clarendon Press, Oxford University Press, 1997.
5. N. Wiener, The Fourier integral and certain of its applications. Reprint of the 1933 edition. Cambridge University Press, Cambridge, 1988.
6. W.E. Williams, Partial differential equations. The Clarendon Press, Oxford University Press, 1980.
7. A. Zygmund, Trigonometric series. Vol. I, II. Third edition. Cambridge University Press, Cambridge, 2002.

Padé Approximants
Maciej Pindor
Instytut Fizyki Teoretycznej,
Uniwersytet Warszawski, ul. Hoża 69,
00-681 Warszawa, Poland.

1 Introduction
A situation frequently encountered in applied science is the following: the information we need is contained in the values, or in some features of the analytic structure, of a function that we know only through its power expansion in the vicinity of some point. In the favourable case this is a Taylor expansion with some finite radius of convergence, but it may also be an asymptotic expansion. Let us concentrate on the first case; some remarks concerning the second one will be given later, if time allows.
If the information we need concerns points within the circle of convergence of the Taylor series, then the problem is (almost) trivial. If it concerns points outside the circle, then the problem becomes that of analytic continuation. Unfortunately, the method of direct rearrangement of the series, used in theoretical considerations of analytic continuation, is practically useless here. The method of practical analytic continuation which I shall discuss is called the method of Padé Approximation. There exist ample monographs on Padé Approximants [6], [2], [3], and my purpose here is to present a subjective glimpse of the subject.
Actually, the method is based on the very direct idea of using rational functions instead of polynomials to approximate the function of interest. They are practically as easy to calculate as polynomials, but when we recall that a truncated Laurent expansion is just a rational function, we can expect that they could provide reasonable approximations of functions also in the vicinity of the poles of the latter, not only in circles of analyticity. Therefore the concept, which dates from the nineteenth century, was to replace partial sums of the Taylor series by rational functions whose own Taylor series have the same partial sums. To formulate it precisely, let us assume we have a function $f(z)$ with its Taylor expansion
$$f(z) = \sum_{i=0}^{\infty} f_i z^i \qquad \text{for } |z| < R. \tag{1}$$

J.-D. Fournier et al. (Eds.): Harm. Analysis and Ratio. Approx., LNCIS 327, pp. 59–69, 2006.
Springer-Verlag Berlin Heidelberg 2006

Having the partial sum of the above series up to the power $M$, we seek a rational function $r_{m,n}(z)$ whose Taylor expansion has its first $M + 1$ terms identical to those of $f(z)$, which I shall represent by
$$r_{m,n}(z) - f(z) = O(z^{M+1}). \tag{2}$$
Unfortunately this problem seems to be badly defined: there are probably many rational functions that can satisfy this condition, possibly all such that $m + n = M$. In other words, assuming for the moment that all such rational functions can be found, to the infinite sequence of partial sums of the Taylor series (1) there corresponds an infinite table (or a double sequence) of rational approximants defined by (2). As we are after the analytic continuation of $f(z)$, we expect that some sequence of rational approximants defined this way would converge, in some sense, to $f(z)$ outside the convergence circle of (1). But which one? Is it a case of advantageous flexibility, or that of "embarras du choix"? I shall argue in a moment that it is the former!

2 The Padé Table
Let us, however, first discuss the problem of existence of rational functions defined by (2). If we denote the numerator of $r_{m,n}(z)$ by $P_m(z)$, its denominator by $Q_n(z)$, and $r_{m,n}(z)$ by $[m/n]_f(z)$, then (2) becomes
$$[m/n]_f(z) - f(z) = \frac{P_m(z)}{Q_n(z)} - f(z) = O(z^{m+n+1}). \tag{3}$$
Let me make here the obvious remark that $P_m$ depends also on $n$ and $Q_n$ depends on $m$, and they should be denoted, e.g., $P_m^{[m/n]}$, but for hygienic reasons I shall almost everywhere skip this additional index. Finding the coefficients of $P_m$ and $Q_n$ by expanding $[m/n]_f(z)$ and then comparing the two series would be a horror, but the problem can immediately be reduced to a linear one:
$$P_m(z) - Q_n(z) f(z) = O(z^{m+n+1}). \tag{4}$$
This is how Frobenius [5] defined "Näherungsbrüchen" already in 1881, and therefore (4) is called the Frobenius definition. One can immediately see that it leads to a system of linear equations for the coefficients of $Q_n(z)$ and to formulae expressing the coefficients of $P_m$ by those of $Q_n$. Denoting the former by $\{p_i\}_0^m$ and the latter by $\{q_i\}_0^n$ we have (assuming that $f_i \equiv 0$ for $i < 0$)
$$\begin{pmatrix} f_{m+1} & f_m & \cdots & f_{m-n+2} & f_{m-n+1} \\ f_{m+2} & f_{m+1} & \cdots & f_{m-n+3} & f_{m-n+2} \\ \vdots & \vdots & & \vdots & \vdots \\ f_{m+n} & f_{m+n-1} & \cdots & f_{m+1} & f_m \end{pmatrix} \begin{pmatrix} q_0 \\ q_1 \\ \vdots \\ q_n \end{pmatrix} = 0 \tag{5}$$
and
$$\begin{aligned} p_0 &= f_0 q_0 \\ p_1 &= f_1 q_0 + f_0 q_1 \\ p_2 &= f_2 q_0 + f_1 q_1 + f_0 q_2 \\ &\;\;\vdots \\ p_m &= \sum_{i=0}^{\min(m,n)} f_{m-i}\, q_i. \end{aligned} \tag{6}$$

The system (5) is a homogeneous one, but it has $n$ equations for $n + 1$ unknowns. The reason is that $[m/n]_f$ has $m + n + 1$ free coefficients, but we have written equations for $m + n + 2$ of them. The result is that the system (5) always has at least one nontrivial solution. One could think that we can then take an arbitrary value for one of the coefficients $q_i$ and next solve (5) for the remaining coefficients of $Q_n$. However, it may happen that the determinant of this linear system vanishes, and either we again have an infinite number of solutions, or no solution at all. The problem can also be stated in this way: although the rational approximant defined by (4) always exists, it may happen that it does not satisfy (3). The first study of the table of all approximants $[m/n]_f$ was done by Henri Padé [8] in his PhD dissertation, and this is why they are now called Padé Approximants and the table is called the Padé Table. The result was that there were square areas of the Padé Table where all entries were identical rational functions of degrees equal to those of its upper left corner, and they all fulfil (4). However, only the approximants on the antidiagonal of the square and to the left of (above) it fulfil also (3). According to one of the contemporary definitions, introduced by Baker [1], we take $Q_n(0) \ne 0$ (e.g. 1, i.e. $q_0 = 1$), which is possible only when (3) is satisfied, and say that only in this case Padé Approximants exist. See Fig. 1.
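The linear equations (5) and (6), together with the normalisation $q_0 = 1$, translate into a few lines of code. The sketch below assumes NumPy and floating-point arithmetic; the function name `pade_coefficients` is hypothetical. When the reduced $n \times n$ system is singular, which is what happens inside a block of the table, `numpy.linalg.solve` will typically raise `LinAlgError` rather than return an approximant.

```python
import numpy as np

def pade_coefficients(f, m, n):
    """Coefficients (p, q) of [m/n] from Taylor coefficients f[0], ..., f[m+n].

    Solves the equations (5) with q[0] = 1 and then evaluates (6).
    Coefficients f_i with i < 0 are taken to be zero.
    """
    f = np.asarray(f, dtype=float)
    if f.size < m + n + 1:
        raise ValueError("need Taylor coefficients f_0, ..., f_{m+n}")

    def coef(i):
        return f[i] if i >= 0 else 0.0

    q = np.ones(n + 1)
    if n > 0:
        # Row k of (5): f_k q_0 + f_{k-1} q_1 + ... + f_{k-n} q_n = 0, k = m+1..m+n
        A = np.array([[coef(k - i) for i in range(1, n + 1)]
                      for k in range(m + 1, m + n + 1)])
        rhs = -np.array([coef(k) for k in range(m + 1, m + n + 1)])
        q[1:] = np.linalg.solve(A, rhs)
    # Equations (6): p_k = sum_{i=0}^{min(k, n)} f_{k-i} q_i
    p = np.array([sum(coef(k - i) * q[i] for i in range(min(k, n) + 1))
                  for k in range(m + 1)])
    return p, q

# [1/1] approximant of exp(z): (1 + z/2) / (1 - z/2)
p_exp, q_exp = pade_coefficients([1.0, 1.0, 0.5], 1, 1)
```

Fixing $q_0 = 1$ removes the free scale factor of the homogeneous system, so that a unique solution exists exactly when the remaining square system is non-singular.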
I shall present here only a very brief discussion of the situations leading to the appearance of blocks in the Padé table. Of course their existence is due to special relations between the coefficients of the Taylor series, e.g. the vanishing of some coefficients or the possibility of representing higher coefficients by algebraic functions of lower ones.
The first situation is exemplified by a series containing only even powers of the variable:
$$f(z) = \frac{\log(1 + z^2)}{z^2} = 1 - \frac{z^2}{2} + \frac{z^4}{3} - \frac{z^6}{4} + \frac{z^8}{5} - \cdots$$
For this series we have
$$[2/2]_f = \frac{1 + \frac{z^2}{6}}{1 + \frac{2z^2}{3}} = 1 - \frac{z^2}{2} + \frac{z^4}{3} - \frac{2z^6}{9} + \cdots$$
Obviously, $[2/2]_f$ is simultaneously $[3/2]_f$ and $[2/3]_f$, because its Taylor series matches that of $f(z)$ up to $z^5$. On the other hand, there is no rational function with degrees of the numerator and of the denominator both $\le 3$ that would match the series for $f(z)$ up to $z^6$. $[4/2]_f$ satisfies this condition, but then it is identical with $[5/3]_f$ and $[4/3]_f$:
$$[4/2]_f = \frac{1 + \frac{z^2}{4} - \frac{z^4}{24}}{1 + \frac{3z^2}{4}} = 1 - \frac{z^2}{2} + \frac{z^4}{3} - \frac{z^6}{4} + \frac{3z^8}{16} + \cdots$$
In this case the whole Padé Table consists of blocks of the size 2.

[Figure 1: diagram of a square block of the Padé Table with corners [k/l], [k/l+j] and [k+j/l]. The entries [k/l], [k/l+1], ..., [k/l+j] of the first row, the entries [k+1/l], ..., [k+j/l] of the first column and the entries between them up to the antidiagonal are marked by their symbols or by dots. The remaining part carries the label "Here, in the lower part of the table, Padé Approximants do not exist".]
Fig. 1. A block of the size $j + 1$ in the Padé Table. All Padé Approximants on the positions indicated by their symbols, or by dots, exist and are identical to $[k/l]$; therefore they are rational functions of degrees $k$ and $l$ in the numerator and the denominator, but they fulfil equation (3) with $m$ and $n$ corresponding to their positions in the Padé Table.
The second situation appears typically when $f(z)$ is a rational function itself. In this case there is one infinite block with its upper left corner at the entry corresponding to the exact degrees of the numerator and the denominator of this function. Obviously all the Padé Approximants with degrees of numerators and denominators larger than or equal to those of the function are equal to this function, because it matches its own Taylor expansion to any order!


3 Convergence
Rational functions are meromorphic, and therefore the first speculation that comes to mind (at least mine) is that Padé approximants should be well suited to approximating meromorphic functions.
This speculation turns out to be absolutely correct, because the de Montessus theorem holds ([3] p. 246):
Theorem 1 (de Montessus, 1902). Let $f(z)$ be a function meromorphic in the disk $|z| < R$ with $m$ poles at distinct points $z_1, z_2, \ldots, z_m$ with
$$0 < |z_1| \le |z_2| \le \cdots \le |z_m| < R.$$
Let the pole at $z_k$ have multiplicity $\mu_k$ and let the total multiplicity be $\sum_{k=1}^{m} \mu_k = M$ precisely. Then
$$f(z) = \lim_{L \to \infty} [L/M]$$
uniformly on any compact subset of
$$D = \{z, |z| \le R, z \ne z_k, k = 1, 2, \ldots, m\}.$$
One could be very enthusiastic about this theorem, considering that it completely solves the problem of analytic continuation inside a disc of meromorphy. There is however a practical obstacle in applying the theorem: generally, we cannot say what $M$ we should use. We cannot expect anything particularly interesting if $M$ is too small (e.g. smaller than the multiplicity of the nearest singularity), but when it is too large, uniform convergence can be expected only for subsequences on rows in the Padé Table. This is well illustrated by [4]:
Theorem 2 (Beardon, 1968). Let $f(z)$ be analytic in $|z| \le R$. Then an infinite subsequence of $[L/1]$ Padé approximants converges to $f(z)$ uniformly in $|z| \le R$.
This casts doubt on whether the sequence $[L/1]$ must converge even in a disc of analyticity of the function! Although the theorem does not exclude that the subsequence could be the complete sequence, many counterexamples have been constructed to show that the above theorem is the optimal result. Maybe the best known is the one due to Perron [9]: he constructed a series representing an entire function, but such that the poles of $[L/1]$ were dense in the plane.
On the other hand such "ugly" phenomena do not appear "in practice": e.g. for $f(z) = e^z$ the poles of $[L/1]$ lie at $L + 1$, while those of $[L/2]$ lie at $L + 1 \pm i\sqrt{L + 1}$, and both rows (and also all the other ones) of the Padé table converge to $f(z)$ on any compact subset of the complex plane containing the origin.
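The statement about the poles of $[L/1]$ is easy to verify. With one denominator coefficient the defining equations reduce to $f_{L+1} q_0 + f_L q_1 = 0$, so with $q_0 = 1$ the denominator is $1 + q_1 z$ and the pole sits at $z = f_L / f_{L+1}$. The sketch below uses exact rational arithmetic; the function name `pole_of_L1` is hypothetical.

```python
from fractions import Fraction
from math import factorial

def pole_of_L1(coefficient, L):
    """Pole of the [L/1] approximant, given a function i -> f_i.

    Assumes f_L and f_{L+1} are both non-zero, so that q_1 = -f_{L+1}/f_L
    is well defined and the denominator 1 + q_1 z has a zero.
    """
    q1 = -Fraction(coefficient(L + 1)) / Fraction(coefficient(L))
    return -1 / q1

def exp_coefficient(i):
    """Taylor coefficient 1/i! of exp(z)."""
    return Fraction(1, factorial(i))

poles = [pole_of_L1(exp_coefficient, L) for L in range(5)]
```

For the exponential function the ratio $f_L / f_{L+1}$ equals $L + 1$, so the single pole moves off to infinity along the positive real axis as $L$ grows.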
Happily, the problems caused by stray poles are not as acute as one could think, as explained by the following theorem ([3] p. 264):
Theorem 3. Let $f(z)$ be analytic at the origin and also in a given disk $|z| \le R$ except for $m$ poles counting multiplicity. Consider a row of the Padé table $[L/M]$ of $f(z)$ with $M$ fixed, $M \ge m$, and $L \to \infty$. Suppose that arbitrarily small, positive $\epsilon$ and $\delta$ are given. Then $L_0$ exists such that $|f(z) - [L/M]| < \epsilon$ for any $L > L_0$ and for all $|z| \le R$ except for $z \in E_L$, where $E_L$ is a set of points in the $z$-plane of measure less than $\delta$.
This type of convergence is known as convergence in measure and seems to have been used in this context first by Nuttall [7]. It means that we cannot guarantee convergence at any given point in the $z$-plane, but it assures us that the area where our Padé approximants do not approximate $f(z)$ arbitrarily well can be made as small as we wish.
It is important to understand that the theorem says nothing about where this set $E_L$ is, and practice shows that undesired poles are accompanied by undesired zeros and form so-called defects, which spoil convergence in smaller and smaller neighborhoods, but shift unpredictably from order to order.
But what about functions with a richer analytic structure: essential singularities and branch points?
The amazing (at least for me) fact is that if we are content with convergence in measure (or with the even stronger convergence in capacity), such functions can also be approximated by Padé approximants, if we consider sequences with growing degrees of the numerator and of the denominator. The fundamental theorem on convergence of Padé approximants for functions with essential singularities is due to Pommerenke [10]:
Theorem 4 (Pommerenke, 1973). Let $f(z)$ be a function which is analytic at the origin and analytic in the entire $z$-plane except for a countable number of isolated poles and essential singularities. Suppose $\epsilon > 0$ and $\delta > 0$ are given. Then $M_0$ exists such that any $[L/M]$ Padé approximant of the ray sequence ($L/M = \lambda$; $\lambda \ne 0$, $\lambda \ne \infty$) satisfies
$$|f(z) - [L/M]_f(z)| < \epsilon$$
for any $M \ge M_0$, on any compact set of the $z$-plane except for a set $E_L$ of capacity less than $\delta$.
As you see, the essential notion here is that of capacity. It is also known as the Chebyshev constant, or transfinite diameter. I do not have time here to define it, as it is a difficult concept concerning the geometry of the complex plane. Anyway, to understand the practical implications of the theorem above and of the ones to follow, it is sufficient to know that the capacity is a function on sets in the complex plane such that it vanishes for countable sets of points, but is different from zero on line segments; e.g. for a section of a straight line it equals one fourth of its length. For a circle it is the same as for the disk inside the circle and equals their radius. Actually it is proportional to the electrostatic capacity in plane electrostatics.


If we want to approximate functions having branchpoints, the first question that comes to mind is: how can rational functions approximate a function in the vicinity of its branchpoint? The astonishing answer is that they can do it very well, simulating a cut as a line of coalescence of an infinite number of zeros and poles! This answer may seem puzzling: which cut? There seems to be enormous arbitrariness in joining branchpoints by cuts, so why should Padé approximants choose just this set of cuts and not another, and why should all Padé approximants choose the same cuts? The answer to these questions lies in the interesting fact that although all cuts are equal, some of them are more equal than others.
This fact is established by the following theorem ($\overline{\mathbb{C}}$ denotes here the extended complex plane) [11]:
Theorem 5 (Stahl, 1985). Let $f$ be given by an analytic function element in a neighborhood of infinity. There uniquely exists a compact set $K_0 \subseteq \overline{\mathbb{C}}$ such that
(i) $D_0 := \overline{\mathbb{C}} \setminus K_0$ is a domain in which $f(z)$ has a single-valued analytic continuation,
(ii) $\operatorname{cap}(K_0) = \inf \operatorname{cap}(K)$, where the infimum extends over all compact sets $K \subseteq \overline{\mathbb{C}}$ satisfying (i),
(iii) $K_0 \subseteq K$ for all compact sets $K \subseteq \overline{\mathbb{C}}$ satisfying (i) and (ii).
The set $K_0$ is called the minimal set (for single-valued analytic continuation of $f(z)$) and the domain $D_0 \subseteq \overline{\mathbb{C}}$ the extremal domain.
The following theorem, due to H. Stahl [11], refers to so-called close-to-diagonal sequences of Padé approximants. By the latter one means a sequence $[m/n]$ such that $\lim_{m+n \to \infty} m/n = 1$.
Theorem 6 (Stahl, 1985). Let the function $f(z)$ be defined by
$$f(z) = \sum_{j=0}^{\infty} f_j z^j$$
and have all its singularities in a compact set $E \subseteq \overline{\mathbb{C}}$ of capacity zero. Then any close-to-diagonal sequence of Padé approximants $[m/n](z)$ to the function $f(z)$ converges in capacity to $f(z)$ in the extremal domain $D_0$.
In simple words, the theorem says that close-to-diagonal sequences of Padé approximants converge "practically", for a very wide class of functions, everywhere except on a set of "optimal" cuts. However, we must keep in mind that this is not uniform convergence; therefore when applying Padé approximants we must be careful and compare a few different approximants from a close-to-diagonal sequence.


4 Examples
Let us see some examples of how Padé approximants work for different types of functions. In the illustrations below, I shall devote more attention to demonstrating that Padé approximants correctly discover singularities and zeros than to approximating values of functions, though I shall not forget about the latter.
Let $f(z) = \tanh(z)/z + 1/[2(1 + z)]$. This function has an infinite number of poles uniformly distributed on the imaginary axis at $z = \pm i(2k + 1)\pi/2$, $k = 0, 1, 2, \ldots$, and the pole at $z = -1$. It also has an infinite number of zeros; the ones closest to the origin are: $z = -2.06727$, $-.491559 \pm 2.93395i$, $-.535753 \pm 6.17741i$, $-.545977 \pm 12.5134i$ and so on. I have added the geometric series mainly to have a function with a series containing all powers of $z$, not one with even powers only. A small curiosity is that there is a block in the Padé table of this function, the one consisting of $[0/1]$, $[0/2]$, $[1/1]$, $[1/2]$.
As in any circle centred at the origin there is an odd number of zeros and poles, we consider the sequence $[M/3]$. In the tables below I shall compare the positions of zeros and poles of the approximants in this sequence.
P.A.    zeros                                                  poles
[3/3]   -2.06806, -1.02990 ± 3.17939i                          -1.00065, .002435 ± 1.58229i
[4/3]   -1.98353, -.963506 ± 2.89352i, 25.5413                 -.999348, .006303 ± 1.57462i
[5/3]   -2.06711, -.645195 ± 2.98225i, 6.10166, 8.37705        -.999974, .000237 ± 1.57193i
[6/3]   -2.08494, -.632818 ± 2.91477i, 4.85867,                -1.00003, .000584 ± 1.57118i
        10.7332 ± 4.29698i
[7/3]   -2.06730, -.547353 ± 2.93852i, 4.73415 ± 2.76689i,     -1.00000, .000025 ± 1.57091i
        5.76997 ± 3.47249i
[8/3]   -2.06422, -.540386 ± 2.91715i, 4.13620 ± 2.49272i,     -.999999, .000062 ± 1.57084i
        11.6809, 5.90481 ± 4.63395i

We clearly see that the first three poles and the first three zeros of [M/3] converge
to the corresponding poles and zeros of f(z), as expected from the de Montessus
theorem. We could also have checked that the values of [M/3] converge to the values
of f(z) in the circle of radius smaller than 3π/2, the distance of the next
pair of poles. Stray zeros also appeared, but they were outside this
circle.
Our function has an infinite number of poles, so let us see how diagonal
Padé approximants work here.

P.A.    zeros                                                  poles
[4/4]   -2.02230, -.906076 ± 3.08279i, 2.07416                 -.999772, .001036 ± 1.57569i, 2.08395
[5/5]   -2.06727, -.499934 ± 2.93370i, 2.74928 ± 8.40343i      -1.00000, 2×10^-6 ± 1.57081i,
                                                               .003750 ± 5.06207i
[6/6]   -2.06716, -.497714 ± 2.93312i, 2.03302,                -1.00000, 1×10^-6 ± 1.57080i, 2.03304,
        2.66063 ± 8.25624i                                     .003394 ± 5.02527i

If we remember that [7/3] and [5/5] both use the same number of
coefficients (11), we can conclude that the diagonal Padé approximants approximate our function better than approximants with a prescribed degree of
the denominator. One could say there is a price to pay: [4/4], which uses 9 coefficients like [5/3], has an unwanted pole at 2.08395. We see, however, that it
is accompanied by a zero at 2.07416, and we can (correctly) guess that the values of
[4/4] deviate considerably from those of f(x) only close to this pair, which is
called a defect. An analogous defect appears in [6/6], but the pair is much
tighter there, and we can (correctly) guess that it spoils the approximation quality of [6/6] in an even smaller area close to the defect. This is just how
convergence in measure (and in capacity) manifests itself.
We can also see in Fig. 2 how the behavior of some of the Padé approximants mentioned above compares with the behavior of f(x) on the interval
[-6, -1.5], i.e. behind the singularity at x = -1.
If you are curious what happens when f(z) has a multiple pole, let me
tell you that in that case Padé approximants have as many simple poles as the
multiplicity of that pole, and they all converge to it when the order of
the approximation increases.
Finally, let me say that I would be glad if you have read the message:
diagonal Padé approximants are beautiful; do not be discouraged by their
defects: others can also have defects, but none are as useful.
You should not, however, think that diagonal Padé approximants are
always the best ones. There are some situations when paradiagonal sequences
of Padé approximants, i.e. sequences [m + k/m] with k constant, are optimal.
This can happen if we have some information on the behavior of the function at
infinity. Obviously [m + k/m](x) behaves like x^k for x → ∞. If our function
behaves at infinity in a similar way, such sequences of Padé approximants can
converge faster. This is well exemplified by a study of Padé approximants for
f(x) = √(x + 1) √(2x + 1) + 2/(1 - x). It has zeros at 1.60415 and -1.39193, a
pole at x = 1 and two branch points at x = -1/2 and x = -1. Look at the zeros
and poles of [4/3] and [4/4], remembering that [4/4] uses one more coefficient of the
series.

Fig. 2. Values of different PA to f(x) = tanh(x)/x + 1/[2(1 + x)] (curves shown: f(x), [4/3], [5/3], [6/3], [7/3], [4/4], [5/5])

P.A.    zeros                                      poles
[4/3]   -1.38833, -.754876, -.556928, 1.60403      -.782628, -.564096, .999985
[4/4]   -1.38548, -.739811, -.552442, 1.60441      2499.85, -.767038, -.558807, 1.00002

The positions of the zeros and of the pole are clearly better reproduced by [4/3]
than by [4/4]. Moreover, when x → ∞, [4/3](x) behaves like 1.4146x (√2 ≈
1.4142). Additionally, we see that the cut (-1, -1/2) is simulated by a line of
interlacing zeros and poles: the line of minimal capacity connecting the branch points.


5 Calculation of Padé approximants
In practical applications there arises the problem of how to calculate a given
Padé approximant. In principle one should avoid solving a system of linear
equations, because this process is very sensitive both to errors in the data and to
the precision of the calculations. Thirty and forty years ago much activity was devoted
to finding different algorithms for the recursive calculation of Padé approximants;
this is well documented in [3], ch. 2.4. However, the system
of equations for the coefficients of the denominator has a Toeplitz
matrix, and for such systems there exist relatively fast and reliable routines
in all numerical program libraries. With the speed of computers now in use,
quadruple precision as a standard option in all modern Fortran compilers,
and multiprecision libraries spreading around, I think that finding Padé
approximants this way is in practice the most convenient solution. This is,
e.g., the method used for the calculation of Padé approximants in the symbolic
algebra system Maple.
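The linear-system route described above can be sketched in a few lines. The following is only a minimal illustration, not the routine used by any particular library: it assumes the normalisation b_0 = 1 for the denominator, assumes that the Toeplitz system is nonsingular, and replaces a dedicated Toeplitz solver and multiprecision arithmetic by plain Gauss-Jordan elimination over exact rationals. The names `solve` and `pade` are hypothetical helpers introduced here.

```python
from fractions import Fraction


def solve(A, rhs):
    """Solve A x = rhs exactly by Gauss-Jordan elimination over Fractions."""
    n = len(rhs)
    M = [list(row) + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        # Pick any row with a nonzero pivot (StopIteration if the matrix is singular).
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def pade(c, m, n):
    """Coefficients (a, b) of the [m/n] approximant from Taylor coefficients c[0..m+n].

    Numerator a_0 + ... + a_m z^m, denominator 1 + b_1 z + ... + b_n z^n.
    """
    c = [Fraction(x) for x in c]
    coef = lambda k: c[k] if k >= 0 else Fraction(0)
    # Denominator equations: sum_{j=0}^{n} b_j c_{m+i-j} = 0 for i = 1..n.
    # The matrix entry depends only on i - j, i.e. it is a Toeplitz matrix.
    A = [[coef(m + i - j) for j in range(1, n + 1)] for i in range(1, n + 1)]
    rhs = [-coef(m + i) for i in range(1, n + 1)]
    b = [Fraction(1)] + solve(A, rhs)
    # Numerator: a_k = sum_{j=0}^{min(k, n)} b_j c_{k-j} for k = 0..m.
    a = [sum((b[j] * coef(k - j) for j in range(min(k, n) + 1)), Fraction(0))
         for k in range(m + 1)]
    return a, b


if __name__ == "__main__":
    # Taylor coefficients of exp(z) up to z^4, enough for [2/2].
    taylor_exp = [Fraction(1), Fraction(1), Fraction(1, 2), Fraction(1, 6), Fraction(1, 24)]
    print(pade(taylor_exp, 2, 2))
```

Exact rationals sidestep the sensitivity to rounding mentioned above at the cost of speed; with floating-point data one would instead combine a Toeplitz solver with extended precision, as the text suggests.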

References
1. G.A. Baker, Jr. Existence and Convergence of Subsequences of Padé Approximants. J. Math. Anal. Appl., 43:498-528, 1973.
2. G.A. Baker, Jr. Essentials of Padé Approximants. Academic Press, 1975.
3. G.A. Baker, Jr., P. Graves-Morris. Padé Approximants, volumes 13 and 14
of Encyclopedia of Mathematics and Applications. Addison-Wesley, 1981.
4. A.F. Beardon. The convergence of Padé Approximants. J. Math. Anal. Appl.,
21:344-346, 1968.
5. G. Frobenius. Ueber Relationen zwischen den Näherungsbrüchen von Potenzreihen. J. für Reine und Angewandte Math., 90:1-17, 1881.
6. J. Gilewicz. Approximants de Padé. Number 667 in Springer Lecture Notes
in Mathematics. Springer Verlag, 1978.
7. J. Nuttall. Convergence of Padé approximants of meromorphic functions. J.
Math. Anal. Appl., 31:147-153, 1970.
8. H. Padé. Sur la représentation approchée d'une fonction par des fractions
rationnelles. Ann. de l'École Normale, 9(3ième série, Suppl. 3-93).
9. O. Perron. Die Lehre von den Kettenbrüchen. B.G. Teubner, 1957. Chapter 4.
10. Ch. Pommerenke. Padé approximants and convergence in capacity. J. Math.
Anal. Appl., 31:775-780, 1973.
11. H. Stahl. Three different approaches to a proof of convergence for Padé approximants. In Rational Approximation and its Applications in Mathematics and
Physics, number 1237 in Lecture Notes in Mathematics. Springer Verlag, 1987.

Potential Theoretic Tools in Polynomial and Rational Approximation

Eli Levin and Edward B. Saff
A.L. Levin
The Open University of Israel
Department of Mathematics
P.O. Box 808, Raanana
Israel
elile@openu.ac.il

E.B. Saff
Center for Constructive Approximation
Department of Mathematics
Vanderbilt University
Nashville, TN 37240, USA
esaff@math.vanderbilt.edu

Logarithmic potential theory is an elegant blend of real and complex analysis
that has had a profound effect on many recent developments in approximation
theory. Since logarithmic potentials have a direct connection with polynomial
and rational functions, the tools provided by classical potential theory and
its extensions to cases when an external field (or weight) is present have
resolved some long-standing problems concerning orthogonal polynomials, rates of polynomial and rational approximation, and convergence behavior of Padé
approximants (both classical and multi-point), to name but a few.
In this article we provide an introduction to the tools of classical and
weighted potential theory, along with a taste of various applications. We
begin by introducing three different quantities associated with a compact
(closed and bounded) set in the plane.

1 Classical Logarithmic Potential Theory

Potential theory has its origin in the following
Problem 1 (Electrostatics Problem). Let E be a compact set in the complex plane $\mathbb{C}$. Place a unit positive charge on E so that equilibrium is reached
in the sense that the energy is minimized.
To create a mathematical framework for this problem, we let $\mathcal{M}(E)$ denote
the collection of all positive unit measures $\mu$ supported on E (so that $\mathcal{M}(E)$
contains all possible distributions of charges placed on E). The logarithmic
potential associated with $\mu$ is
\[ U^{\mu}(z) := \int \log\frac{1}{|z-t|}\,d\mu(t), \]

J.-D. Fournier et al. (Eds.): Harm. Analysis and Ratio. Approx., LNCIS 327, pp. 71-94, 2006.
© Springer-Verlag Berlin Heidelberg 2006

which is harmonic outside the support $S(\mu)$ of $\mu$ and is superharmonic in $\mathbb{C}$.
The latter means that the value of the potential at any point z is not less
than its average over any circle centered at z. Notice that, since $\mu$ is a unit
measure,
\[ \lim_{z\to\infty} \bigl( U^{\mu}(z) + \log|z| \bigr) = 0. \tag{1} \]
The energy of such a potential is defined by
\[ I(\mu) := \int U^{\mu}\,d\mu = \iint \log\frac{1}{|z-t|}\,d\mu(t)\,d\mu(z). \]
Thus, the electrostatics problem involves the determination of
\[ V_E := \inf\{ I(\mu) : \mu \in \mathcal{M}(E) \}, \]
which is called the Robin constant for E. Note that since E is bounded, we
have
\[ \operatorname{diam} E := \sup_{z,t\in E} |z-t| < \infty, \]
which implies that
\[ -\infty < V_E \le +\infty. \]
The logarithmic capacity of E, denoted by cap(E), is defined by
\[ \operatorname{cap}(E) := e^{-V_E}. \]
If $V_E = +\infty$, we set cap(E) = 0. Such sets are called polar and they are
very "thin". In particular, the area (= planar Lebesgue measure) and the
length (= one-dimensional Hausdorff measure) of any polar set are both
equal to zero. For example, any countable set has capacity zero. (However,
the classical Cantor set has positive capacity.)
A fundamental theorem of Frostman asserts that if cap(E) > 0, there
exists a unique measure $\mu_E \in \mathcal{M}(E)$ such that $I(\mu_E) = V_E$. This extremal
measure is called the equilibrium measure (or Robin measure) for E.
We do not dwell on the proof of the Frostman result, but only mention
that it utilizes three important properties:
(i) $\mathcal{M}(E)$ is compact with respect to weak-star convergence of measures;
(ii) $I(\mu)$ is a lower semi-continuous function on $\mathcal{M}(E)$;
(iii) $I(\mu)$ is a strictly convex function on $\mathcal{M}(E)$.
The existence of $\mu_E$ follows from (i), (ii), while (iii) guarantees the uniqueness.
The weak-star convergence (denoted weak*) is defined as follows: we say that
a sequence $\{\mu_n\}$ converges weak* to $\mu$ (write $\mu_n \xrightarrow{*} \mu$), if
\[ \int f\,d\mu_n \to \int f\,d\mu \]
for any function f continuous in $\mathbb{C}$.

The potential $U^{\mu_E}$ associated with $\mu_E$ is called the equilibrium potential
(or conductor potential) for E. Some basic facts about cap(E) and $U^{\mu_E}$ are:
(a) Let $\partial_\infty E$ denote the outer boundary of E (that is, the boundary of the
unbounded component of $\mathbb{C} \setminus E$; see Fig. 1). Then $\mu_E$ is supported on $\partial_\infty E$:
\[ S(\mu_E) \subseteq \partial_\infty E. \]

Fig. 1. Outer boundary of E

Moreover, if strict inclusion takes place, then the set $\partial_\infty E \setminus S(\mu_E)$ has capacity
zero. It follows from the above inclusion that, being unique, the equilibrium
measures for E and for $\partial_\infty E$ coincide. Therefore
\[ \operatorname{cap}(E) = \operatorname{cap}(\partial_\infty E). \]
(b) For all $z \in \mathbb{C}$,
\[ U^{\mu_E}(z) \le V_E \]
with equality holding quasi-everywhere on E; that is, except possibly for a set
of capacity zero. We write this as
\[ U^{\mu_E}(z) = V_E = \log\frac{1}{\operatorname{cap}(E)} \quad \text{q.e. on } E. \tag{2} \]
Moreover, such equality characterizes $\mu_E$:
if the potential of some $\mu \in \mathcal{M}(E)$ is constant q.e. on E and $I(\mu) < \infty$, then
$\mu = \mu_E$.
(c) A point $z \in \partial_\infty E$ is called regular if (2) holds at z. If the interior Int E of E
is not empty (a point $z_0$ belongs to Int E if and only if there is some open disk with center at $z_0$ that
lies entirely in E), it follows from (a) that the conductor potential is harmonic there.
Then (b) guarantees that (2) holds at every point of Int E. The following fact is
deeper: if E is connected, then every point of $\partial_\infty E$ is regular. Furthermore,
at every regular point the conductor potential is continuous.
It is helpful to keep in mind the following two simple examples.
Example 1. Let E be the closed disk of radius R, centered at 0. Then $d\mu_E =
ds/(2\pi R)$, where ds is the arclength on the circle |z| = R. One way to derive this
is to observe that E is invariant under rotations. The equilibrium measure,
being unique and supported on |z| = R, must enjoy the same property, and
therefore must be of the above form. Calculating the potential, we obtain
\[ U^{\mu_E}(z) = \log\frac{1}{|z|}, \quad |z| > R, \qquad\text{and}\qquad U^{\mu_E}(z) = \log\frac{1}{R}, \quad |z| \le R. \]
Therefore (see (2)), cap(E) = R.

Example 2. Let E = [a, b] be a segment on the real line. Then cap(E) =
(b - a)/4 and $d\mu_E$ is the arcsine measure; i.e.
\[ d\mu_E = \frac{1}{\pi}\,\frac{dx}{\sqrt{(x-a)(b-x)}}, \quad x \in [a, b]. \]
If a = -1, b = 1, the conductor potential is given by
\[ U^{\mu_E}(z) = \log 2 - \log\bigl| z + \sqrt{z^2 - 1} \bigr| \]
(for arbitrary a, b the expression is a bit more complicated). These results
can be obtained from Example 1 by applying the Joukowski conformal map
of $\overline{\mathbb{C}} \setminus [-1, 1]$ onto |w| > 1.
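The closed form in Example 2 can be checked numerically. The sketch below is only an illustration under stated assumptions: it approximates the arcsine measure on [-1, 1] by equal point masses at the zeros of the Chebyshev polynomial of degree n (Gauss-Chebyshev quadrature) and compares the resulting discrete potential with log 2 - log|z + sqrt(z^2 - 1)| at points off the segment. The function names are hypothetical, and the branch of the square root is chosen as sqrt(z - 1) sqrt(z + 1), which behaves like z at infinity.

```python
import cmath
import math


def arcsine_potential(z, n=200):
    """Approximate U(z) = int log(1/|z - t|) dmu(t) for the arcsine measure on [-1, 1].

    Gauss-Chebyshev quadrature: the nodes t_k = cos((2k - 1) pi / (2n)) carry
    equal weights 1/n with respect to dmu = dx / (pi sqrt(1 - x^2)).
    """
    nodes = [math.cos((2 * k - 1) * math.pi / (2 * n)) for k in range(1, n + 1)]
    return sum(-math.log(abs(z - t)) for t in nodes) / n


def exact_potential(z):
    """Closed form log 2 - log|z + sqrt(z^2 - 1)| for z outside [-1, 1]."""
    z = complex(z)
    root = cmath.sqrt(z - 1) * cmath.sqrt(z + 1)  # branch that behaves like z at infinity
    return math.log(2) - math.log(abs(z + root))


if __name__ == "__main__":
    for point in (2.0, 1j, -3.0 + 0.5j):
        print(point, arcsine_potential(point), exact_potential(point))
```

On the segment itself the discrete potential is singular at the nodes, so the comparison is made only at points outside [-1, 1].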
There is an important relation between the equilibrium potential and the
notion of Green function. Assume, for simplicity, that E is connected and
let $\Omega$ denote the unbounded component of $\mathbb{C} \setminus E$ (so that $\partial\Omega = \partial_\infty E$ and
$\Omega \cup \{\infty\}$ is a simply connected domain in the extended complex plane).
Let $w = \Phi(z)$ denote the conformal map of $\Omega$ onto |w| > 1, normalized by
$\Phi(\infty) = \infty$, $\Phi'(\infty) > 0$. That is, for some constant c > 0,
\[ \Phi(z) = \frac{1}{c}\,z + \text{lower order terms}, \quad z \to \infty. \]
By the Riemann Mapping Theorem, such a $\Phi$ exists and is unique. Moreover,
its absolute value $|\Phi|$ becomes a continuous function in the whole plane if we
set
\[ |\Phi(z)| = 1, \quad z \in E. \]
Let us examine some properties of the function $g = \log|\Phi|$.
First, g is the real part of $\log\Phi(z)$, which is analytic in $\Omega$. Therefore
(i) g is harmonic in $\Omega$;
Second, our normalization implies that
(ii) $\lim_{z\to\infty} \bigl( g(z) - \log|z| \bigr)$ exists and is finite;
Finally,
(iii) g is continuous in the closed domain $\overline{\Omega}$ and equals zero on its boundary.
There is a unique function that enjoys these three properties. It is called
the Green function for $\Omega$ with pole at infinity and is denoted by $g_\Omega(\cdot, \infty)$. So
we have just shown that
\[ \log|\Phi(z)| = g_\Omega(z, \infty) \]
(and that the limit in (ii) is equal to log(1/c)). It is now easy to see that
\[ U^{\mu_E}(z) = \log\frac{1}{\operatorname{cap}(E)} - g_\Omega(z, \infty). \tag{3} \]
Indeed, let h denote the difference of the two sides of (3). Then h is harmonic
in the domain $\Omega$, and is equal to zero on its boundary. Moreover, h has a
finite limit at infinity, namely log(1/c) - log(1/cap(E)), recall (1). By the
maximum principle, h is identically zero and we are done. We also obtain
that the constant c is just cap(E).
In the case when $\partial_\infty E$ is a smooth closed Jordan curve, there is a simple
representation for $\mu_E$. The equilibrium measure of any arc $\gamma$ on $\partial_\infty E$ is given
by
\[ \mu_E(\gamma) = \frac{1}{2\pi}\int_\gamma \frac{\partial g_\Omega}{\partial n}\,ds = \frac{1}{2\pi}\int_\gamma |\Phi'|\,ds, \]
where the derivative in the first integral is taken in the direction of the outer normal on $\partial_\infty E$. Alternatively, $\mu_E(\gamma)$ is given by the normalized angular
measure of the image $\Phi(\gamma)$:
\[ \mu_E(\gamma) = \frac{1}{2\pi}\int_{\Phi(\gamma)} d\theta \tag{4} \]
(for this representation, the smoothness of $\partial_\infty E$ is not needed).
The reader is invited to carry out the above calculations for the special
case of a disk, considered in Example 1.
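For orientation, the disk calculation can be sketched as follows. This is a worked instance under the notation used in this section, with $\Phi$ the exterior conformal map, $g_\Omega$ the Green function with pole at infinity and $\mu_E$ the equilibrium measure; it is not an additional result.

```latex
% E = closed disk of radius R centered at 0, so \Omega = \{ |z| > R \}.
\[
  \Phi(z) = \frac{z}{R}, \qquad c = R, \qquad
  g_\Omega(z,\infty) = \log|\Phi(z)| = \log\frac{|z|}{R}.
\]
% On |z| = R the outer normal derivative is the radial derivative:
\[
  \frac{\partial g_\Omega}{\partial n}
    = \frac{d}{dr}\,\log\frac{r}{R}\,\Big|_{r=R} = \frac{1}{R}
    = |\Phi'(z)|,
  \qquad\text{hence}\qquad
  \mu_E(\gamma) = \frac{1}{2\pi}\int_\gamma \frac{ds}{R}
    = \frac{\operatorname{length}(\gamma)}{2\pi R}.
\]
% Relation (3) then gives, for |z| > R,
\[
  U^{\mu_E}(z) = \log\frac{1}{R} - \log\frac{|z|}{R} = \log\frac{1}{|z|}.
\]
```

Both conclusions agree with Example 1: $d\mu_E = ds/(2\pi R)$ and cap(E) = c = R.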
We now introduce another quantity associated with E. It arises in the
following
Problem 2 (Geometric Problem). Place n points on E so that they are
"as far apart" as possible in the sense of the geometric mean of the distances
between the points. Since the number of different pairs of n points is n(n-1)/2,
we consider the quantity
\[ \delta_n(E) := \max_{z_1,\dots,z_n \in E} \Bigl( \prod_{1\le i<j\le n} |z_i - z_j| \Bigr)^{2/n(n-1)}. \]
Any system of points $\mathcal{F}_n = \{ z_1^{(n)}, \dots, z_n^{(n)} \}$ for which the maximum is attained
is called an n-point Fekete set for E; the points $z_i^{(n)}$ in $\mathcal{F}_n$ are called
Fekete points.
For example, if n = 2, then $\mathcal{F}_2 = \{ z_1^{(2)}, z_2^{(2)} \}$, where $|z_1^{(2)} - z_2^{(2)}| =
\operatorname{diam} E$. Obviously, these 2 points lie on the outer boundary of E. In general, it follows from the maximum modulus principle for analytic functions
that for all n, the Fekete sets lie on the outer boundary of E.
It turns out (cf. [11], [12]) that the sequence $\delta_n$ decreases, so we may
define
\[ \tau(E) := \lim_{n\to\infty} \delta_n(E). \]
The quantity $\tau(E)$ is called the transfinite diameter of E.

Example 3. Let E be the closed unit disk. Then one can show that the set
of n-th roots of unity is an n-point Fekete set for E (and so is any of its
rotations). Furthermore, $\tau(E) = 1$.
Example 4. Let E = [-1, 1]. Then (cf. [15]) the set $\mathcal{F}_n$ turns out to be unique
and it coincides with the zeros of $(1 - x^2)P_{n-2}^{(1,1)}(x)$, where $P_{n-2}^{(1,1)}$ is the Jacobi
polynomial with parameters (1, 1) of degree n - 2. Also, $\tau(E) = 1/2$.
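As a small numerical illustration of Example 3, the geometric mean of pairwise distances can be evaluated for the n-th roots of unity. The sketch below takes for granted the statement of Example 3 that these points form a Fekete set for the unit disk; the closed form n^(1/(n-1)) used for comparison follows from the discriminant of z^n - 1, and the function names are hypothetical.

```python
import cmath
import math


def delta(points):
    """Geometric mean of pairwise distances: (prod_{i<j} |z_i - z_j|)^(2/(n(n-1)))."""
    n = len(points)
    log_sum = sum(math.log(abs(points[i] - points[j]))
                  for i in range(n) for j in range(i + 1, n))
    return math.exp(2.0 * log_sum / (n * (n - 1)))


def roots_of_unity(n):
    """The n-th roots of unity, an n-point Fekete set for the closed unit disk."""
    return [cmath.exp(2j * math.pi * k / n) for k in range(n)]


if __name__ == "__main__":
    for n in (2, 4, 8, 16, 64):
        # The value equals n^(1/(n-1)) and decreases towards the transfinite diameter 1.
        print(n, delta(roots_of_unity(n)), n ** (1.0 / (n - 1)))
```

For n = 2 the value is the diameter of the disk, and the values decrease towards 1 as n grows, in line with the monotonicity of the sequence.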
Finally, we introduce a third quantity, the Chebyshev constant cheb(E),
which arises in a mini-max problem.
Problem 3 (Polynomial Extremal Problem). Determine the minimal
sup-norm on E for monic polynomials of degree n. That is, determine
\[ t_n(E) := \min_{p \in \mathcal{P}_{n-1}} \| z^n + p(z) \|_E, \]
where $\mathcal{P}_{n-1}$ denotes the collection of all polynomials of degree $\le n - 1$ and
$\|\cdot\|_E$ is defined by
\[ \| f \|_E := \max_{z \in E} |f(z)|. \]
We assume that E contains infinitely many points (which is always the case
if cap(E) > 0). Then for every n there is a unique monic polynomial $T_n(z) =
z^n + \cdots$ such that $\| T_n \|_E = t_n(E)$. It is called the n-th Chebyshev polynomial
for E.
In view of the simple inequality
\[ t_{m+n}(E) = \| T_{m+n} \|_E \le \| T_m T_n \|_E \le \| T_m \|_E \, \| T_n \|_E = t_m(E)\,t_n(E), \]
one can show (cf. [11], [12]) that the sequence $t_n(E)^{1/n}$ converges, so we may
define
\[ \operatorname{cheb}(E) := \lim_{n\to\infty} t_n(E)^{1/n}. \]
Example 5. Let E be the closed disk of radius R, centered at 0. For any $p \in
\mathcal{P}_{n-1}$, the ratio $(z^n + p(z))/z^n$ represents an analytic function in $|z| \ge R$ that
takes the value 1 at $\infty$. By the maximum principle,
\[ \| z^n + p(z) \|_E = \max_{|z|=R} |z^n + p(z)| = R^n \max_{|z|=R} \left| \frac{z^n + p(z)}{z^n} \right| \ge R^n, \]
and strict inequality takes place if p(z) is not identically zero. It follows that
$T_n(z) = z^n$. Therefore $t_n(E) = R^n$ and cheb(E) = R.
Example 6. Let E = [-1, 1]. Then $T_n$ is the classical monic Chebyshev polynomial
\[ T_n(x) = 2^{1-n} \cos(n \arccos x), \quad x \in [-1, 1], \quad n \ge 1. \]
Also, $t_n(E) = 2^{1-n}$, from which it follows that cheb(E) = 1/2.
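The values in Example 6 are easy to probe numerically. The following sketch evaluates the monic Chebyshev polynomial on a uniform grid of [-1, 1] and compares its maximum with that of the monic power x^n; the grid maximum is only a lower bound for the true sup-norm, and the helper names and the grid size are assumptions made for this illustration.

```python
import math


def monic_chebyshev(n, x):
    """T_n(x) = 2^(1-n) cos(n arccos x) on [-1, 1], for n >= 1."""
    return 2.0 ** (1 - n) * math.cos(n * math.acos(x))


def sup_norm(f, a=-1.0, b=1.0, samples=20001):
    """Maximum of |f| on an equally spaced grid (a lower bound for the sup-norm)."""
    return max(abs(f(a + (b - a) * k / (samples - 1))) for k in range(samples))


if __name__ == "__main__":
    for n in (2, 4, 8, 16):
        t_n = sup_norm(lambda x: monic_chebyshev(n, x))
        power = sup_norm(lambda x: x ** n)
        # t_n^(1/n) = 2^((1-n)/n) tends to cheb([-1, 1]) = 1/2, while x^n has norm 1.
        print(n, t_n, t_n ** (1.0 / n), power)
```

The comparison with x^n shows how much is gained by the optimal choice of the lower order terms: the norm drops from 1 to 2^(1-n).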

Closely related to Chebyshev polynomials are Fekete polynomials. An n-th
Fekete polynomial $F_n(z)$ is a monic polynomial having all its zeros at the n
points of a Fekete set $\mathcal{F}_n$.
Example 7. If E is the closed unit disk centered at 0, then one can take $F_n(z) =
z^n - 1$, so that $\| F_n \|_E = 2$. Comparing this with Example 5 we see that the
$F_n$'s are asymptotically optimal for the Chebyshev problem:
\[ \lim_{n\to\infty} \| F_n \|_E^{1/n} = \lim_{n\to\infty} \| T_n \|_E^{1/n} = 1 = \operatorname{cheb}(E). \]
Moreover, uniformly on compact subsets of |z| > 1, we have
\[ \lim_{n\to\infty} |F_n(z)|^{1/n} = \lim_{n\to\infty} |T_n(z)|^{1/n} = |z| = \exp\{ -U^{\mu_E}(z) \} \]
(the last equality follows from Example 1). Finally, it is easy to see that the
zeros of $F_n$ are asymptotically uniformly distributed on |z| = 1. By that we
mean that for any arc $\gamma$ on this circle,
\[ \frac{1}{n}\,\{\text{number of zeros of } F_n \text{ in } \gamma\} \to \frac{1}{2\pi}\,\{\text{length of } \gamma\}, \quad n \to \infty. \]
Note that the second ratio coincides with $\mu_E(\gamma)$ (cf. Example 1).
The examples of this section illustrate the following fundamental theorem,
various parts of which are due to Fekete, Frostman, and Szegő.
Theorem 1 (Fundamental Theorem of Classical Potential Theory).
For any compact set $E \subset \mathbb{C}$,
(a) $\operatorname{cap}(E) = \tau(E) = \operatorname{cheb}(E)$;
(b) Fekete polynomials are asymptotically optimal for the Chebyshev problem:
\[ \lim_{n\to\infty} \| F_n \|_E^{1/n} = \operatorname{cheb}(E) = \operatorname{cap}(E). \]
If cap(E) > 0 (so that $\mu_E$ is defined), then we also have:
(c) Uniformly on compact subsets of the unbounded component of $\mathbb{C} \setminus E$,
\[ \lim_{n\to\infty} |F_n(z)|^{1/n} = \exp\{ -U^{\mu_E}(z) \}; \]
(d) Fekete points (the zeros of $F_n$) have asymptotic distribution $\mu_E$.
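For the unit disk, parts (b), (c) and (d) can be observed directly with the Fekete polynomials z^n - 1 of Example 7. The sketch below is an illustration only; the function names are hypothetical, and the arc in part (d) is described by a range of arguments.

```python
import math


def fekete_root(z, n):
    """|F_n(z)|^(1/n) for the Fekete polynomial F_n(z) = z^n - 1 of the unit disk."""
    return abs(z ** n - 1) ** (1.0 / n)


def disk_norm_root(n):
    """||F_n||_E^(1/n) on the closed unit disk, where ||z^n - 1||_E = 2."""
    return 2.0 ** (1.0 / n)


def zero_fraction(n, start, end):
    """Fraction of the n-th roots of unity whose argument lies in [start, end)."""
    hits = sum(1 for k in range(n) if start <= 2 * math.pi * k / n < end)
    return hits / n


if __name__ == "__main__":
    z = 2 + 1j
    for n in (10, 100, 400):
        # (b): tends to cap(E) = 1; (c): tends to |z| = exp(-U(z)); (d): tends to 1/4.
        print(n, disk_norm_root(n), fekete_root(z, n), zero_fraction(n, 0.0, math.pi / 2))
    print(abs(z))
```

The quarter-circle arc carries equilibrium measure 1/4, which is the limit of the fraction of zeros it contains.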

The last statement is illustrated in Example 7, but let us make it more
precise. Let
\[ P_n(z) = \prod_{k=1}^{n} (z - z_k) \]
and let $\delta_{z_k}$ denote the unit mass placed at $z_k$. Then $U^{\delta_{z_k}}(z) = \log\frac{1}{|z - z_k|}$
and we see that
\[ |P_n(z)|^{1/n} = e^{-U^{\nu}(z)}, \]
where $\nu$ is the unit measure (normalized zero counting measure for $P_n$) given
by
\[ \nu = \nu_{P_n} := \frac{1}{n} \sum_{k=1}^{n} \delta_{z_k}. \]
Notice that for any set K,
\[ \nu(K) = \frac{1}{n}\,\{\text{number of zeros of } P_n \text{ in } K\}. \]
We can now rigorously formulate part (d) of the Fundamental Theorem:
The normalized zero counting measures for Fekete polynomials converge weak*
to $\mu_E$.
In applications, the following result is also useful:
Let $\{P_n\}$ be any sequence of monic polynomials having all their zeros in E
and such that $\nu_{P_n} \xrightarrow{*} \mu_E$. If E is regular (e.g., if it is connected), then the
assertions (b) and (c) of the Fundamental Theorem hold for the $P_n$'s.
Such sequences can be constructed by various discretization techniques.
One of the simplest discretizations was employed by J.L. Walsh in his work
on polynomial and rational approximation; see Remark (a) at the end of the
next section.

2 Polynomial Approximation of Analytic Functions

Let f be a continuous function on a compact set E (symbolically, $f \in C(E)$)
and let
\[ e_n(f; E) = e_n(f) := \min_{p \in \mathcal{P}_n} \| f - p \|_E \tag{5} \]
be the error in best uniform approximation of f by polynomials of degree at
most n. We denote by $p_n$ the polynomial of best approximation: $\| f - p_n \|_E =
e_n(f)$.
If $e_n(f) \to 0$ as $n \to \infty$, the series
\[ p_1 + \sum_{n=1}^{\infty} (p_{n+1} - p_n) \]
converges to f uniformly on E, so that the continuous function f must be
analytic at every interior point of E. (The collection of all functions that are
continuous on E and analytic in Int E is denoted by A(E).) Furthermore, it
follows from the maximum principle that the above series automatically converges on every bounded component of $\mathbb{C} \setminus E$, so that its sum represents an
analytic continuation of f to these components (e.g., if E is the unit circle
|z| = 1, then the convergence holds in the unit disk $|z| \le 1$). Such a continuation, however, may be impossible. Therefore, in order to ensure that
$e_n(f) \to 0$ for every function f in A(E), it is necessary to assume that the
only component of $\mathbb{C} \setminus E$ is the unbounded one; that is, $\mathbb{C} \setminus E$ is connected
(so that E does not separate the plane).
A celebrated theorem of S.N. Mergelyan (cf. [3]) asserts that this assumption is also sufficient. Here we prove this result in a special case when E is
connected and f is analytic in some neighborhood of E. The proof will also
give the rate of approximation.
So let $\mathbb{C} \setminus E$ and E both be connected. Then the complement of E with
respect to the extended complex plane is a simply-connected domain. Let
$\Phi$ be the conformal map considered in Section 1 and recall that
\[ \log|\Phi(z)| = \log\frac{1}{\operatorname{cap}(E)} - U^{\mu_E}(z) = g_{\mathbb{C}\setminus E}(z, \infty), \quad z \in \mathbb{C} \setminus E. \tag{6} \]
For any R > 1, let $\Gamma_R$ denote the level curve $\{ z : |\Phi(z)| = R \}$, see Fig. 2 (we
call such a curve a level curve with index R).

Fig. 2. Level curve of $\Phi$

Note that $\Gamma_R$ is also a level curve for the potential:
\[ U^{\mu_E}(z) = \log\frac{1}{R \operatorname{cap}(E)}, \quad z \in \Gamma_R. \tag{7} \]

Let $F_{n+1}$ be the (n + 1)-st Fekete polynomial for E and let $P_n$ be the polynomial of degree n that interpolates f at the zeros of $F_{n+1}$. We are given that
f is analytic in a neighborhood of E; hence there exists R > 1 such that f is
analytic on and inside $\Gamma_R$. For any such R, the Hermite interpolation formula
yields
\[ f(z) - P_n(z) = \frac{1}{2\pi i} \int_{\Gamma_R} \frac{F_{n+1}(z)}{F_{n+1}(t)}\,\frac{f(t)\,dt}{t - z}, \quad z \text{ inside } \Gamma_R. \tag{8} \]
(The validity of the Hermite formula follows by first observing that the right-hand side vanishes at the zeros of $F_{n+1}(z)$, and then by replacing f(z) by its
Cauchy integral representation to deduce that the difference between f and
the right-hand side is indeed a polynomial of degree at most n.)
Formula (8) leads to a simple estimate:
\[ e_n(f) \le \| f - P_n \|_E \le K\,\frac{\| F_{n+1} \|_E}{\min_{t \in \Gamma_R} |F_{n+1}(t)|}, \]
where K is some constant independent of n. Applying parts (b), (c) of the
Fundamental Theorem we obtain, with the aid of (7), that
\[ \limsup_{n\to\infty} e_n(f)^{1/n} \le \frac{\operatorname{cap}(E)}{R \operatorname{cap}(E)} = \frac{1}{R} < 1. \tag{9} \]
We have proved that indeed $e_n(f) \to 0$ and that the convergence is geometrically fast. Since R > 1 was arbitrary (but such that f is analytic on and
inside $\Gamma_R$), we have actually proved that (9) holds with R replaced by R(f),
where
\[ R(f) := \sup\{ R : f \text{ admits analytic continuation to the interior of } \Gamma_R \}. \]
Can we improve on this? The answer is no! In order to show this, we need
the following very useful result.
Theorem 2 (Bernstein-Walsh Lemma). Assume that both E and $\mathbb{C} \setminus E$
are connected. If a polynomial p of degree n satisfies $|p(z)| \le M$ for $z \in E$,
then $|p(z)| \le M r^n$ for $z \in \Gamma_r$, r > 1.
The proof uses essentially the same argument as in Example 5. The function $p(z)/\Phi^n(z)$ is analytic outside E, even at $\infty$. Since $|\Phi| = 1$ on $\partial E$, we
know that $|p(z)/\Phi^n(z)| \le M$ for $z \in \partial E$. Hence the maximum principle yields
\[ \left| \frac{p(z)}{\Phi^n(z)} \right| \le M, \quad z \in \mathbb{C} \setminus E, \]
and the result follows by the definition of $\Gamma_r$.
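A quick numerical check of the lemma is possible for E = [-1, 1]. The sketch below relies on two standard facts that are assumptions relative to the text above: the exterior conformal map of the segment is the inverse of the Joukowski map, so the level curve with index r is the ellipse z = (w + 1/w)/2 with |w| = r, and the classical (non-monic) Chebyshev polynomial of degree n is bounded by M = 1 on [-1, 1]. The function names are hypothetical.

```python
import cmath
import math


def chebyshev(n, z):
    """Classical T_n(z) by the recurrence T_{k+1} = 2 z T_k - T_{k-1}; |T_n| <= 1 on [-1, 1]."""
    t_prev, t_cur = 1.0 + 0j, complex(z)
    if n == 0:
        return t_prev
    for _ in range(n - 1):
        t_prev, t_cur = t_cur, 2 * z * t_cur - t_prev
    return t_cur


def level_curve_point(r, theta):
    """Point of the level curve with index r for E = [-1, 1]: z = (w + 1/w)/2, w = r e^{i theta}."""
    w = r * cmath.exp(1j * theta)
    return (w + 1 / w) / 2


if __name__ == "__main__":
    n, r = 7, 1.5
    worst = max(abs(chebyshev(n, level_curve_point(r, 2 * math.pi * k / 360)))
                for k in range(360))
    # Bernstein-Walsh bound with M = 1: the maximum on the level curve is at most r^n.
    print(worst, r ** n)
```

For this particular polynomial the maximum on the ellipse is (r^n + r^(-n))/2, roughly half of the bound r^n, so the factor r^n in the lemma captures the correct exponential growth rate.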

Assume now that (9) holds for some R > R(f) and let $R(f) < \rho < R$.
Then for some constant c > 1,
\[ e_n(f) \le \frac{c}{\rho^n}, \quad n \ge 1. \]
Since, from the triangle inequality,
\[ \| p_{n+1} - p_n \|_E = \| p_{n+1} - f + f - p_n \|_E \le e_{n+1}(f) + e_n(f) \le \frac{2c}{\rho^n}, \]
we obtain from the Bernstein-Walsh Lemma that for any r > 1,
\[ \| p_{n+1} - p_n \|_{\Gamma_r} \le 2c\,\frac{r^{n+1}}{\rho^n}, \quad n \ge 1. \]
If we choose $R(f) < r < \rho$, we obtain that the series $p_1 + \sum_{n=1}^{\infty} (p_{n+1} - p_n)$
converges uniformly inside $\Gamma_r$. Hence it gives an analytic continuation of f to
the interior of $\Gamma_r$, which contradicts the definition of R(f).
Let us summarize what we have proved.
Theorem 3 (Walsh [17, Ch. VII]). Let the compact set E be connected
and have a connected complement. Then for any $f \in A(E)$,
\[ \limsup_{n\to\infty} e_n(f)^{1/n} = \frac{1}{R(f)}. \]
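The geometric rate in the theorem can be observed on a simple example. The sketch below is an illustration under stated assumptions, not the construction used in the proof: E = [-1, 1], f(x) = 1/(2 - x) with its only singularity at z = 2, so that R(f) = 2 + sqrt(3) for the elliptical level curves of the segment; and, instead of Fekete points, the interpolation nodes are the zeros of Chebyshev polynomials, which have the same arcsine limit distribution. The barycentric weights are the standard ones for these nodes; all function names are hypothetical.

```python
import math


def chebyshev_nodes(m):
    """Zeros of the Chebyshev polynomial of degree m; they follow the arcsine distribution."""
    return [math.cos((2 * k + 1) * math.pi / (2 * m)) for k in range(m)]


def interpolate(f, nodes, x):
    """Barycentric evaluation at x of the polynomial interpolating f at the given Chebyshev zeros."""
    m = len(nodes)
    num = den = 0.0
    for k, t in enumerate(nodes):
        if x == t:
            return f(t)
        # Barycentric weight for the k-th Chebyshev zero: (-1)^k sin((2k + 1) pi / (2m)).
        w = (-1) ** k * math.sin((2 * k + 1) * math.pi / (2 * m))
        num += w * f(t) / (x - t)
        den += w / (x - t)
    return num / den


def sup_error(f, n, samples=2001):
    """Grid estimate of ||f - P_n|| on [-1, 1] for the degree-n interpolant at n + 1 nodes."""
    nodes = chebyshev_nodes(n + 1)
    grid = [-1 + 2 * k / (samples - 1) for k in range(samples)]
    return max(abs(f(x) - interpolate(f, nodes, x)) for x in grid)


if __name__ == "__main__":
    f = lambda x: 1.0 / (2.0 - x)
    errors = [sup_error(f, n) for n in range(4, 15)]
    for n, (e0, e1) in enumerate(zip(errors, errors[1:]), start=4):
        # Successive error ratios approach 1/R(f) = 1/(2 + sqrt(3)).
        print(n, e1 / e0)
    print(1.0 / (2.0 + math.sqrt(3.0)))
```

Moving the singularity closer to the segment lowers R(f) and slows the decay; only the first level curve that hits a singularity matters for the rate.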

Remarks.
(a) The proof of this theorem shows that on interpolating f at Fekete points we
obtain a sequence of polynomials that gives, asymptotically, the best possible
rate of approximation. It may not be easy, however, to find these points, and
it is desirable to have other methods at hand. Assume, for example, that E
is bounded by a smooth Jordan curve $\Gamma$. With $\Phi$ as above, let the points
$w_1, \dots, w_n$ be equally spaced on |w| = 1 and let $z_i = \Phi^{-1}(w_i) \in \Gamma$ be their
preimages. These points (called the Fejér points) divide $\Gamma$ into n subarcs,
each having $\mu_E$-measure 1/n (the latter can be derived from formula (4)).
Therefore, the Fejér points have asymptotic distribution $\mu_E$. Let $P_n$ be the
monic polynomial with zeros at $z_1, \dots, z_n$. According to the statement at the
end of Section 1, the sequence $\{P_n\}$ enjoys the same properties (b), (c) as
$\{F_n\}$ does, and the proof of Theorem 3 shows that the polynomials interpolating f at the Fejér points (again denoted by $P_n$) satisfy
\[ \limsup_{n\to\infty} \| f - P_n \|_E^{1/n} = \frac{1}{R(f)}. \]
(b) R(f) is the first value of R for which the level curve $\Gamma_R$ contains a
singularity of f. It may well be possible that f is analytic at some other points
of $\Gamma_{R(f)}$, but the geometric rate of best polynomial approximation does not
"feel" this: whether every point of $\Gamma_{R(f)}$ is a singularity or merely one point is
a singularity, the rate of approximation remains the same as if f were analytic
only inside of $\Gamma_{R(f)}$! To take advantage of any extra analyticity, different
approximation tools are needed; e.g., rational functions. We demonstrate this
in Section 6.
(c) It follows from (6) that
\[ \Gamma_R = \{ z \in \mathbb{C} \setminus E : g_{\mathbb{C}\setminus E}(z, \infty) = \log R \}. \tag{10} \]
Assume now that $\mathbb{C} \setminus E$ is connected but E is not. Then one can still define
the Green function $g_{\mathbb{C}\setminus E}$ via the formula
\[ g_{\mathbb{C}\setminus E} = \log\frac{1}{\operatorname{cap}(E)} - U^{\mu_E}, \]
from which it follows that properties (i)-(iii) described in Section 1 will hold,
provided E is regular. Then, with $\Gamma_R$ defined by (10), it is easy to modify the
above proof to show that Walsh's Theorem 3 holds in this case as well.
Example 8. Let $E = [-1, -\alpha] \cup [\alpha, 1]$, $0 < \alpha < 1$, and let f = 0 on $[-1, -\alpha]$
and f = 1 on $[\alpha, 1]$. Some level curves $\Gamma_R$ of $g_{\mathbb{C}\setminus E}$ are depicted in Fig. 3. For
R small, $\Gamma_R$ consists of two pieces, while for R large, $\Gamma_R$ is a single curve.
There is a critical value $R_0 = \exp\{ g_{\mathbb{C}\setminus E}(0, \infty) \}$ for which $\Gamma_{R_0}$ represents a self-intersecting lemniscate-like curve (the bold curve in Fig. 3). Clearly, f can be
extended as an analytic function to the interior of $\Gamma_{R_0}$ (define f = 0 inside
the left lobe and f = 1 inside the right lobe). For $R > R_0$, the interior of $\Gamma_R$
is a (connected) domain; hence there is no function analytic inside of $\Gamma_R$ that
is equal to 0 on $[-1, -\alpha]$ and to 1 on $[\alpha, 1]$. Therefore
\[ R(f) = R_0 = \exp\{ g_{\mathbb{C}\setminus E}(0, \infty) \}, \]
and by the (extension of) Walsh's theorem:
\[ \limsup_{n\to\infty} e_n(f)^{1/n} = \exp\{ -g_{\mathbb{C}\setminus E}(0, \infty) \}. \]
3 Approximation with Varying Weights: a background

We start with two problems that have triggered much of the recent potential
theoretic research on polynomial and rational approximation and on orthogonal polynomials.
Let $0 < \theta < 1$. A polynomial $P(x) = \sum_{k=0}^{n} a_k x^k$ is said to be incomplete
of type $\theta$ ($P \in I_\theta$), if $a_k = 0$ for $k < \theta n$. The study of such polynomials was
introduced in [6] by Lorentz, who proved the following.
Theorem 4 (G.G. Lorentz, 1976). If $P_n \in I_\theta$, $\deg P_n \to \infty$ as $n \to \infty$
and
\[ \| P_n \|_{[0,1]} = \max_{[0,1]} |P_n(x)| \le M, \quad \text{all } n, \]
then
\[ P_n(x) \to 0 \quad \text{for } x \in [0, \theta^2). \]

Fig. 3. Level curves of $g_{\mathbb{C}\setminus E}$ ($\Gamma_R$ for $R < R_0$ and for $R > R_0$)

Concerning the sharpness of this result, we state
Problem 4. Is $[0, \theta^2)$ the largest interval where the convergence to zero is
guaranteed?
Another problem, dealing with the asymptotic behavior of recurrence coefficients for orthogonal polynomials, was posed by G. Freud, also in 1976 [2].
Let
\[ w_\alpha(x) := e^{-|x|^\alpha}, \tag{11} \]
be a weight on the real line and let $\{p_n\}$ be orthonormal polynomials with
respect to this weight:
\[ \int_{-\infty}^{\infty} p_m(x)\,p_n(x)\,e^{-|x|^\alpha}\,dx = \delta_{mn} \]
(for $\alpha = 2$ these are the classical Hermite polynomials). Since the weight is
even, the polynomials $p_n$ satisfy the following 3-term recurrence relation
\[ x\,p_n(x) = a_{n+1}\,p_{n+1}(x) + a_n\,p_{n-1}(x), \]
where $\{a_n\}$ is some sequence of real numbers (cf. [15]). For the weights (11),
G. Freud conjectured that
\[ \lim_{n\to\infty} n^{-1/\alpha} a_n \quad \text{exists.} \]
Problem 5. Resolve this conjecture.

Seemingly very different, these two problems are connected by a common
thread: both can be formulated in terms of weighted polynomials of the
form
\[ w^n(x)\,P_n(x), \quad \deg P_n \le n. \]
For the Lorentz Problem, one simply observes that any $P \in I_\theta$ of degree
$n/(1 - \theta)$ (which for simplicity we assume to be an integer) can be written in
the form
\[ P(x) = x^{\theta n/(1-\theta)}\,P_n(x), \]
where $P_n$ is a polynomial of degree n. Therefore, this problem deals with
sequences of weighted polynomials that satisfy
\[ \| w^n P_n \|_{[0,1]} \le M, \quad w(x) = x^{\theta/(1-\theta)}, \quad \deg P_n \le n. \]
Regarding Problem 5, we observe that from the normalization
\[ \int_{-\infty}^{\infty} p_n^2(x)\,e^{-|x|^\alpha}\,dx = 1 \]
the substitution
\[ x \to n^{1/\alpha} x, \quad p_n(x) \to P_n(x) := n^{1/(2\alpha)}\,p_n(n^{1/\alpha} x) \]
leads again to a sequence of weighted polynomials for which
\[ \| w^n P_n \|_{L^2(\mathbb{R})} = 1, \quad w(x) = e^{-|x|^\alpha}, \quad \deg P_n \le n, \]
where $\|\cdot\|_{L^2(\mathbb{R})}$ is defined by
\[ \| f \|_{L^2(\mathbb{R})} := \Bigl( \int_{-\infty}^{\infty} |f(x)|^2\,dx \Bigr)^{1/2}. \]
In this framework, the following question is of fundamental importance:
Problem 6 (Generalized Weierstrass Approximation Problem). For
$E \subset \mathbb{R}$ closed, $w : E \to [0, \infty)$, characterize those functions f continuous on
E that are uniform limits on E of some sequence of weighted polynomials
$\{w^n P_n\}$, $\deg P_n \le n$.
It turns out that Problems 4, 5, and 6 can be resolved with the aid of
potential theory, when an external field is introduced.


4 Logarithmic Potentials with External Fields

Let E be a closed (not necessarily compact) subset of $\mathbb{C}$ and let w(z) be a
nonnegative weight on E. We define a new "distance function" on E, replacing
|z - t| by |z - t| w(z) w(t). This gives rise to weighted versions of logarithmic
capacity, transfinite diameter and Chebyshev constant.
Weighted capacity: cap(w, E).
As before, let $\mathcal{M}(E)$ denote the collection of all unit measures supported
on E. We set
\[ Q := \log\frac{1}{w} \]
and call it the external field. Consider the modified energy integral for $\mu \in
\mathcal{M}(E)$:
\[ I_w(\mu) := \iint \log\frac{1}{|z-t|\,w(z)\,w(t)}\,d\mu(z)\,d\mu(t) = \iint \log\frac{1}{|z-t|}\,d\mu(z)\,d\mu(t) + 2\int Q(z)\,d\mu(z) \tag{12} \]
and let
\[ V_w := \inf_{\mu \in \mathcal{M}(E)} I_w(\mu). \]

The weighted capacity is dened by

cap(w, E) := eVw .
In the sequel, we assume that $w$ satisfies the following conditions:
(i) $w > 0$ on a subset of positive logarithmic capacity;
(ii) $w$ is continuous (or, more generally, upper semi-continuous);
(iii) If $E$ is unbounded, then $|z|w(z) \to 0$ as $|z| \to \infty$, $z \in E$.
Under these restrictions on $w$, there exists a unique measure $\mu_w \in \mathcal{M}(E)$, called the weighted equilibrium measure, such that
$$I_w(\mu_w) = V_w.$$
The above integral (12) can be interpreted as the total energy of the unit charge $\mu$, in the presence of the external field $Q$ (in this electrostatics interpretation, the field is actually $2Q$). Since this field has a strong repelling effect near points where $w = 0$ (i.e. $Q = \infty$), assumption (iii) physically means that, for the equilibrium distribution, no charge occurs near $\infty$. In other words, the support $S(\mu_w)$ of $\mu_w$ is necessarily compact. However, unlike the unweighted case, the support need not lie entirely on the outer boundary $\partial_\infty E$ and, in fact, it can be quite an arbitrary closed subset of $E$. Determining this set is one of the most important aspects of weighted potential theory.
Weighted transfinite diameter: $\tau(w, E)$.
Let
$$\delta_n(w) := \max_{z_1,\dots,z_n \in E} \Bigl( \prod_{1 \le i < j \le n} |z_i - z_j|\, w(z_i)\, w(z_j) \Bigr)^{2/(n(n-1))}.$$
Points $z_1^{(n)}, \dots, z_n^{(n)}$ at which the maximum is attained are called weighted Fekete points. The corresponding Fekete polynomial is the monic polynomial with all its zeros at these points.
As in the unweighted case, the sequence $\delta_n(w)$ is decreasing, so one can define
$$\tau(w, E) := \lim_{n\to\infty} \delta_n(w),$$
which we call the weighted transfinite diameter of $E$.

Weighted Chebyshev constant: $\mathrm{cheb}(w, E)$.
Let
$$t_n(w) := \min_{p \in \mathcal{P}_{n-1}} \bigl\| w^n(z)\,(z^n - p(z)) \bigr\|_E.$$
Then the weighted Chebyshev constant is defined by
$$\mathrm{cheb}(w, E) := \lim_{n\to\infty} t_n(w)^{1/n}.$$
The following theorem (due to Mhaskar and Saff) generalizes the classical results of Section 1.
Theorem 5 (Generalized Fundamental Theorem). Let $E$ be a closed set of positive capacity. Assume that $w$ satisfies the conditions (i)-(iii) and let $Q = \log(1/w)$. Then
$$\mathrm{cap}(w, E) = \tau(w, E) = \mathrm{cheb}(w, E)\, \exp\Bigl( -\int Q\, d\mu_w \Bigr).$$
Moreover, weighted Fekete points have asymptotic distribution $\mu_w$ as $n \to \infty$, and weighted Fekete polynomials are asymptotically optimal for the weighted Chebyshev problem.
How can one find $\mu_w$?
In most applications, the weight $w$ is continuous and the set $E$ is regular. Recall that the latter means that the classical (unweighted) equilibrium potential for $E$ is equal to $V_E$ everywhere on $E$, not just quasi-everywhere.
Under these assumptions, the equilibrium measure $\mu = \mu_w$ is characterized by the conditions that $\mu \in \mathcal{M}(E)$, $I(\mu) < \infty$ and, for some constant $c_w$, the following variational conditions hold:
$$U^{\mu} + Q = c_w \ \text{ on } S(\mu), \qquad U^{\mu} + Q \ge c_w \ \text{ on } E. \qquad (13)$$
On integrating (against $\mu = \mu_w$) the first condition, we obtain that the constant is given by
$$c_w = I_w(\mu_w) - \int Q\, d\mu_w = V_w - \int Q\, d\mu_w.$$
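The integration step can be spelled out as follows. This is a short check using only (12) and the first condition in (13); $I(\mu)$ denotes the unweighted logarithmic energy.

```latex
\begin{align*}
c_w &= \int \bigl( U^{\mu_w} + Q \bigr)\, d\mu_w
      = I(\mu_w) + \int Q\, d\mu_w
      && \text{(first line of (13), } \mu_w \text{ a unit measure)} \\
    &= \Bigl( I_w(\mu_w) - 2\int Q\, d\mu_w \Bigr) + \int Q\, d\mu_w
      && \text{(by (12))} \\
    &= I_w(\mu_w) - \int Q\, d\mu_w
      = V_w - \int Q\, d\mu_w .
\end{align*}
```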

When trying to find $\mu_w$, an essential step (and a nontrivial problem in its own right!) is to determine the support $S(\mu_w)$. There are several methods by which $S(\mu_w)$ can be numerically approximated, but they are complicated from the computational point of view. Therefore, knowing properties of the support can be useful and we list some of them.
Properties of the support $S(\mu_w)$
(a) The sup-norm of weighted polynomials "lives" on $S(\mu_w)$. That is, for any $n$ and for any polynomial $P_n$ of degree at most $n$, there holds
$$\|w^n P_n\|_E = \|w^n P_n\|_{S(\mu_w)}.$$
(b) Let $K$ be a compact subset of $E$ of positive capacity, and define
$$F(K) := \log \mathrm{cap}(K) - \int Q\, d\mu_K,$$
where $\mu_K$ is the classical (unweighted) equilibrium measure for $K$. This so-called F-functional of Mhaskar and Saff is often a helpful tool in finding $S(\mu_w)$. Since $\mathrm{cap}(K)$ and $\mu_K$ remain the same if we replace $K$ by its outer boundary $\partial_\infty K$, we obtain that $F(K) = F(\partial_\infty K)$. It turns out that the outer boundary of $S(\mu_w)$ maximizes the F-functional:
$$\max_K F(K) = F(\partial_\infty S(\mu_w)).$$
This result is especially useful when $E$ is a real interval and $Q$ is convex. It is then easy to derive from (13) that $S(\mu_w)$ is an interval. Thus, to find the support, one merely needs to maximize $F(K)$ only over intervals $K \subset E$, which amounts to a standard calculus problem for the determination of the endpoints of $S(\mu_w)$.
Example 9 (Incomplete polynomials). Here $E = [0, 1]$ and
$$Q(x) = \log(1/w(x)) = -\frac{\theta}{1-\theta} \log x$$
is convex. Maximizing the F-functional one gets $S(\mu_w) = [\theta^2, 1]$. (For details, see [12, Sec. IV.1]).
Example 10 (Freud Weights). Here $E = \mathbb{R}$ and $w(x) = \exp(-|x|^\alpha)$. Hence $Q(x) = |x|^\alpha$ is convex provided that $\alpha > 1$, and we obtain $S_w = [-a_\alpha, a_\alpha]$, where $a_\alpha$ can be given explicitly in terms of the Gamma function. (Actually, this result also holds for all $\alpha > 0$; see [12, Sec. IV.1].) For example, when $\alpha = 2$, we get $S_w = [-1, 1]$.
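The two examples can be checked numerically by maximizing the F-functional over intervals. The sketch below is illustrative only: the helper names (`arcsine_mean`, `f_functional`, `argmax_golden`) are not from the text, the choice $\theta = 1/2$ is arbitrary, and two simplifying assumptions are made, namely that the candidate support is symmetric, $[-a, a]$, in the Freud case, and that its right endpoint is 1 in the incomplete-polynomial case. It uses the standard facts that an interval of length $L$ has capacity $L/4$ and that its equilibrium measure is the arcsine distribution.

```python
import math

def arcsine_mean(f, left, right, nodes=400):
    """Integrate f against the equilibrium (arcsine) measure of [left, right]
    by Gauss-Chebyshev quadrature (equal weights at Chebyshev nodes)."""
    centre, radius = (left + right) / 2.0, (right - left) / 2.0
    total = 0.0
    for k in range(1, nodes + 1):
        x = centre + radius * math.cos((2 * k - 1) * math.pi / (2 * nodes))
        total += f(x)
    return total / nodes

def f_functional(Q, left, right):
    """F(K) = log cap(K) - int Q d(mu_K) for the interval K = [left, right]."""
    return math.log((right - left) / 4.0) - arcsine_mean(Q, left, right)

def argmax_golden(g, lo, hi, tol=1e-9):
    """Golden-section search for the maximiser of a unimodal function g."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    x1, x2 = hi - ratio * (hi - lo), lo + ratio * (hi - lo)
    g1, g2 = g(x1), g(x2)
    while hi - lo > tol:
        if g1 < g2:                      # maximum lies in [x1, hi]
            lo, x1, g1 = x1, x2, g2
            x2 = lo + ratio * (hi - lo)
            g2 = g(x2)
        else:                            # maximum lies in [lo, x2]
            hi, x2, g2 = x2, x1, g1
            x1 = hi - ratio * (hi - lo)
            g1 = g(x1)
    return (lo + hi) / 2.0

# Freud weight with alpha = 2: Q(x) = x^2, candidate supports K = [-a, a].
a_freud = argmax_golden(lambda a: f_functional(lambda x: x * x, -a, a), 0.1, 3.0)

# Incomplete polynomials with theta = 1/2: Q(x) = -(theta/(1-theta)) log x,
# candidate supports K = [a, 1] (right endpoint fixed at 1 by assumption).
theta = 0.5
Q_incomplete = lambda x: -(theta / (1.0 - theta)) * math.log(x)
a_incomplete = argmax_golden(lambda a: f_functional(Q_incomplete, a, 1.0), 0.01, 0.9)

print(a_freud, a_incomplete)
```

Under these assumptions the two maximisers should come out close to $1$ and $\theta^2 = 0.25$, in agreement with Examples 10 and 9 respectively.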

5 Generalized Weierstrass Approximation Problem

We address here Problems 4, 5, and 6. Let $E$ be a regular closed subset of $\mathbb{R}$ and $w(x)$ be continuous on $E$. Then we have the following weighted analogue of the Bernstein-Walsh lemma:
$$|w^n(x)P_n(x)| \le \|w^n P_n\|_{S(\mu_w)} \exp\{-n(U^{\mu_w}(x) + Q(x) - c_w)\}, \qquad x \in E \setminus S(\mu_w).$$
With the aid of (13) and a variant of the Stone-Weierstrass theorem (cf. [12]), one can show that if a sequence $\{w^n(x)P_n(x)\}$, $\deg P_n \le n$, converges uniformly on $E$, then it tends to 0 for every $x \in E \setminus S(\mu_w)$.
Thus, if some $f \in C(E)$ is a uniform limit on $E$ of such a sequence, it must vanish on $E \setminus S(\mu_w)$. The converse is not true, in general, but it is true in many important cases.
Incomplete polynomials
For the weight $w = x^{\theta/(1-\theta)}$, we have mentioned that $S(\mu_w) = [\theta^2, 1]$. It was proved by Saff and Varga and, independently, by M. v. Golitschek (cf. [13], [5]), that any $f \in C[0, 1]$ that vanishes on $[0, \theta^2]$ is a uniform limit on $[0, 1]$ of incomplete polynomials of type $\theta$.
In particular, choosing $f(x) = 0$ for $x \in [0, \theta^2]$, and $f(x) = x - \theta^2$ for $x > \theta^2$, the sequence of type $\theta$ polynomials converging uniformly to $f$ on $[0, 1]$ is uniformly bounded on $[0, 1]$, but does not tend to zero for $x > \theta^2$. Thus the answer to Problem 4 is yes, Lorentz's Theorem 4 is indeed sharp!
Freud Conjecture
For $\alpha > 1$, let $[-a_\alpha, a_\alpha]$ be the support of the equilibrium measure for the weight $e^{-|x|^\alpha}$. Lubinsky and Saff showed in [7] that any $f \in C(\mathbb{R})$ that vanishes outside this support is a uniform limit of a sequence of the form $\exp\{-n|x|^\alpha\}P_n(x)$, $n \ge 1$. This result was the major ingredient in the argument given by Lubinsky, Mhaskar, and Saff [8] that resolved the Freud Conjecture in the affirmative.
Concerning more general weights, Saff made the following conjecture:

Conjecture 1. Let $E$ be a real interval, and assume that $Q = \log(1/w)$ is convex on $E$. Then any function $f \in C(E)$ that vanishes on $E \setminus S(\mu_w)$ is the uniform limit on $E$ of some sequence of weighted polynomials $\{w^n P_n\}$, $\deg P_n \le n$.
This conjecture was proved by V. Totik [16] utilizing a careful analysis of the smoothness of the density of the weighted equilibrium measure. We remark that for more general $Q$ and $E$, the conjecture is false, and additional requirements on $f$ are needed.

6 Rational Approximation
For a rational function $R(z) = P_1(z)/P_2(z)$, where $P_1$ and $P_2$ are monic polynomials of degree $n$, one can write
$$-\frac{1}{n} \log |R(z)| = U^{\mu_1}(z) - U^{\mu_2}(z),$$
where $\mu_1$, $\mu_2$ are the normalized zero counting measures for $P_1$, $P_2$, respectively. The right-hand side represents the logarithmic potential of the signed measure $\mu = \mu_1 - \mu_2$:
$$U^{\mu_1}(z) - U^{\mu_2}(z) = U^{\mu}(z) = \int \log \frac{1}{|z - t|}\, d\mu(t).$$
The theory of such potentials can be developed along the same lines as in Section 1. We present below only the very basic notions of this theory that are needed to formulate the approximation results. A more in-depth treatment can be found in the works of Bagby [1], Gonchar [4], as well as [12].
The analogy with electrostatics problems suggests considering the following energy problem. Let $E_1, E_2 \subset \mathbb{C}$ be two closed sets that are a positive distance apart. The pair $(E_1, E_2)$ is called a condenser and the sets $E_1$, $E_2$ are called the plates. Let $\mu_1$ and $\mu_2$ be positive unit measures supported on $E_1$ and $E_2$, respectively. Consider the energy integral of the signed measure $\mu = \mu_1 - \mu_2$:
$$I(\mu) = \iint \log \frac{1}{|z - t|}\, d\mu(z)\, d\mu(t).$$
Since $\mu(\mathbb{C}) = 0$, the integral is well-defined, even if one of the sets is unbounded. While not obvious, it turns out that such $I(\mu)$ is always positive. We assume that $E_1$ and $E_2$ have positive logarithmic capacity. Then the minimal energy (over all signed measures of the above form)
$$V(E_1, E_2) := \inf_{\mu} I(\mu)$$
is finite and positive. We then define the condenser capacity $\mathrm{cap}(E_1, E_2)$ by
$$\mathrm{cap}(E_1, E_2) := 1/V(E_1, E_2).$$

One can show, as with the Frostman theorem, that there exists a unique signed measure $\mu^* = \mu_1^* - \mu_2^*$ (the equilibrium measure for the condenser) for which $I(\mu^*) = V(E_1, E_2)$. Furthermore, the corresponding potential (called the condenser potential) is constant on each plate:
$$U^{\mu^*} = c_1 \ \text{ on } E_1, \qquad U^{\mu^*} = -c_2 \ \text{ on } E_2, \qquad (14)$$
(we assume throughout that $E_1$, $E_2$ are regular; otherwise the above equalities hold only quasi-everywhere). On integrating against $\mu^*$, we deduce from (14) that
$$c_1 + c_2 = V(E_1, E_2) = 1/\mathrm{cap}(E_1, E_2). \qquad (15)$$
We mention that (similar to the case of the conductor potential) the relations of type (14) characterize $\mu^*$. Moreover, one can deduce from (14) that the measure $\mu_i^*$ is supported on the boundary (not necessarily the outer one) of $E_i$, $i = 1, 2$. Therefore, on replacing each $E_i$ by its boundary, we do not change the condenser capacity or the condenser potential.
Example 11. Let $E_1$, $E_2$ be, respectively, the circles $|z| = r_1$, $|z| = r_2$, $r_1 < r_2$. These sets are invariant under rotations. Being unique, the measure $\mu^*$ is therefore also invariant under rotations and we obtain that
$$d\mu_1^* = \frac{1}{2\pi r_1}\, ds, \qquad d\mu_2^* = \frac{1}{2\pi r_2}\, ds,$$
where $ds$ denotes the arclength over the respective circles $E_1$, $E_2$. Applying the result of Example 1, we find that
$$U^{\mu^*}(z) = \begin{cases} 0, & |z| > r_2, \\ \log(r_2/|z|), & r_1 \le |z| \le r_2, \\ \log(r_2/r_1), & |z| < r_1. \end{cases}$$
Therefore (recall (15))
$$\mathrm{cap}(E_1, E_2) = 1/\log \frac{r_2}{r_1}. \qquad (16)$$
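The piecewise formula of Example 11 is easy to confirm numerically. The following sketch is an illustration, not part of the original argument: it approximates the potential of normalized arclength on each circle by the trapezoid rule and compares the difference with the closed form. The function names and the sample radii $r_1 = 1$, $r_2 = 2$ are choices made here.

```python
import cmath
import math

def circle_potential(z, radius, nodes=2000):
    """U^mu(z) for mu = normalised arclength on |t| = radius (trapezoid rule)."""
    total = 0.0
    for k in range(nodes):
        t = radius * cmath.exp(2j * math.pi * k / nodes)
        total -= math.log(abs(z - t))
    return total / nodes

def condenser_potential(z, r1, r2):
    """Potential of the signed measure mu1 - mu2 carried by the two circles."""
    return circle_potential(z, r1) - circle_potential(z, r2)

def closed_form(z, r1, r2):
    """Piecewise expression for the condenser potential in Example 11."""
    rho = abs(z)
    if rho > r2:
        return 0.0
    if rho >= r1:
        return math.log(r2 / rho)
    return math.log(r2 / r1)

r1, r2 = 1.0, 2.0
sample_points = [0.5, 1.5, 3.0]           # inside E1, between the plates, outside E2
errors = [abs(condenser_potential(z, r1, r2) - closed_form(z, r1, r2))
          for z in sample_points]

# c1 is the value on (and inside) E1; -c2 is the value on (and outside) E2.
c1 = condenser_potential(0.0, r1, r2)
c2 = -condenser_potential(10.0, r1, r2)
capacity_estimate = 1.0 / (c1 + c2)
print(errors, capacity_estimate, 1.0 / math.log(r2 / r1))
```

The sample points are kept off the circles themselves, where the integrand has a logarithmic singularity and the plain trapezoid rule would need more care.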

Assume now that each plate of a condenser is a single Jordan arc or curve (without self-intersections), and let $G$ be the doubly-connected domain that is bounded by $E_1$ and $E_2$, see Fig. 4. We call such a $G$ a ring domain.
For ring domains one can give an alternative definition of condenser capacity. Let
$$u(z) := \int \log(z - t)\, d\mu^*(t) + c_1.$$

Fig. 4. Ring domains

This function is locally analytic but not single-valued in $G$ (notice that there is no modulus sign in the integral). Moreover, if we fix $t$ and let $z$ move along a simple closed counterclockwise oriented curve in $G$ that encircles $E_1$, say, then the imaginary part of $\log(z - t)$ increases by $2\pi$, for $t \in E_1$, while for $t \in E_2$ it returns to the original value. Since $\mu_1^*$ and $\mu_2^*$ are unit measures, it follows that the function $\varphi : z \to w = \exp(u(z))$ is analytic and single-valued. Moreover, it can be shown to be one-to-one in $G$. By its definition, $\varphi$ satisfies
$$\log |\varphi| = -U^{\mu^*} + c_1 = 0 \ \text{ on } E_1; \qquad \log |\varphi| = -U^{\mu^*} + c_1 = c_1 + c_2 \ \text{ on } E_2.$$
Therefore $\varphi$ maps $G$ conformally onto the annulus $1 < |w| < e^{c_1 + c_2}$.
It is known from the theory of conformal mapping that, for a ring domain $G$, there exists a unique $R > 1$, called the modulus of $G$ (we denote it by $\mathrm{mod}(G)$), such that $G$ can be mapped conformally onto the annulus $1 < |w| < R$. We have thus shown that
$$\mathrm{cap}(E_1, E_2) = 1/\log(\mathrm{mod}(G)). \qquad (17)$$
We remark that if $G_1 \subset G_2$ are two ring domains, then $\mathrm{mod}(G_1) \le \mathrm{mod}(G_2)$.

Example 12. Let $E_1$, $E_2$ be as above, and assume that $E_2$ is the $R$-th level curve for $E_1$. That is, $|\Phi(z)| = R$ for $z \in E_2$, where $\Phi$ maps conformally the unbounded component of $\overline{\mathbb{C}} \setminus E_1$ onto $|w| > 1$. In particular, $\Phi$ maps the corresponding ring domain $G$ onto the annulus $1 < |w| < R$, and we conclude that $\mathrm{mod}(G) = R$ (so that $\mathrm{cap}(E_1, E_2) = 1/\log R$). Applying this to the configuration of Example 11, we see that $\Phi(z) = z/r_1$, so that $R = r_2/r_1$, and we obtain again (16).
We now turn to rational approximation. Let $E \subset \mathbb{C}$ be compact. We denote by $\mathcal{R}_n$ the collection of all rational functions of the form $R = P/Q$, where $P$, $Q$ are polynomials of degree at most $n$, and $Q$ has no zeros in $E$. For $f \in A(E)$, let
$$r_n(f; E) = r_n(f) := \inf_{r \in \mathcal{R}_n} \|f - r\|_E$$
be the error in best approximation of $f$ by rational functions from $\mathcal{R}_n$. Clearly, since polynomials are rational functions, we have (cf. (5)) $r_n(f) \le e_n(f)$. A basic theorem regarding the rate of rational approximation was proved by Walsh [17, Ch. IX]. Following is a special case of this theorem.
Theorem 6 (Walsh). Let $E$ be a single Jordan arc or curve and let $f$ be analytic on a simply connected domain $D \supset E$. Then
$$\limsup_{n\to\infty} r_n(f)^{1/n} \le \exp\{-1/\mathrm{cap}(E, \partial D)\}. \qquad (18)$$

The proof of (18) follows the same ideas as the proof of inequality (9). Let $\Gamma$ be a contour in $D \setminus E$ that is arbitrarily close to $\partial D$. Let $\mu^* = \mu_1^* - \mu_2^*$ be the equilibrium measure for the condenser $(E, \Gamma)$. For any $n$, let $\alpha_1^{(n)}, \dots, \alpha_n^{(n)}$ be equally spaced on $E$ (with respect to $\mu_1^*$) and let $\beta_1^{(n)}, \dots, \beta_n^{(n)}$ be equally spaced on $\Gamma$ (with respect to $\mu_2^*$). Then one can show that the rational functions $r_n(z)$ with zeros at the $\alpha_i^{(n)}$'s and poles at the $\beta_i^{(n)}$'s satisfy
$$\left( \frac{\max_E |r_n|}{\min_\Gamma |r_n|} \right)^{1/n} \to e^{-1/\mathrm{cap}(E, \Gamma)}. \qquad (19)$$
Let $R_n = p_{n-1}/q_n$ be the rational function with poles at the $\beta_i^{(n)}$'s that interpolates $f$ at the points $\alpha_i^{(n)}$. Then the Hermite formula (cf. (8)) takes the following form:
$$f(z) - R_n(z) = \frac{1}{2\pi i} \int_\Gamma \frac{r_n(z)}{r_n(t)}\, \frac{f(t)}{t - z}\, dt, \qquad z \text{ inside } \Gamma,$$
and it follows from (19) that
$$\limsup_{n\to\infty} r_n(f)^{1/n} \le \limsup_{n\to\infty} \|f - R_n\|_E^{1/n} \le e^{-1/\mathrm{cap}(E, \Gamma)}.$$
Letting $\Gamma$ approach $\partial D$, we get the result.
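For the concentric circles of Example 11 the limit (19) can be observed directly. In the sketch below (an illustration with hypothetical function names, not the authors' construction in general) the zeros are the $n$-th roots of $r_1^n$ on the inner circle, the poles are the $n$-th roots of $r_2^n$ on the outer circle, and the predicted limit is $\exp\{-\log(r_2/r_1)\} = r_1/r_2$. Logarithms are summed to avoid overflow, and the sample angles are offset so that they never hit a zero or a pole.

```python
import cmath
import math

def log_abs_rational(z, zeros, poles):
    """log |r_n(z)| for r_n with the given zeros and poles."""
    return (sum(math.log(abs(z - a)) for a in zeros)
            - sum(math.log(abs(z - b)) for b in poles))

def walsh_rate(n, r1, r2):
    """(max_E |r_n| / min_Gamma |r_n|)^(1/n) for E = {|z| = r1}, Gamma = {|z| = r2},
    with zeros and poles equally spaced in angle on E and Gamma respectively."""
    zeros = [r1 * cmath.exp(2j * math.pi * k / n) for k in range(n)]
    poles = [r2 * cmath.exp(2j * math.pi * k / n) for k in range(n)]
    m = 3 * n  # sample angles 2*pi*(j + 1/2)/m avoid every zero and pole
    angles = [2 * math.pi * (j + 0.5) / m for j in range(m)]
    log_max_on_E = max(log_abs_rational(r1 * cmath.exp(1j * t), zeros, poles)
                       for t in angles)
    log_min_on_gamma = min(log_abs_rational(r2 * cmath.exp(1j * t), zeros, poles)
                           for t in angles)
    return math.exp((log_max_on_E - log_min_on_gamma) / n)

r1, r2 = 1.0, 2.0
rates = {n: walsh_rate(n, r1, r2) for n in (10, 50, 200)}
print(rates, "predicted limit:", r1 / r2)
```

The convergence is slow (the ratio carries a bounded factor that only disappears after taking the $n$-th root), so moderately large $n$ is needed before the computed rate settles near $r_1/r_2$.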

Remarks.
(a) Unlike in the polynomial approximation, no rate of convergence of $r_n(f)$ to 0 can ensure that a function $f \in C(E)$ is analytic somewhere beyond $E$.
(b) One can construct a function for which equality holds in (18), so that this bound is sharp. Such a function necessarily has a singularity at every point of $\partial D$; otherwise $f$ would be analytic in a larger domain, so that the corresponding condenser capacity will become smaller. In view of Theorem 6, this would violate the assumed equality in (18).
Although sharp, the bound (18) is unsatisfactory, in the following sense. Assume, for example, that $E$ is connected and has a connected complement, and let $\Gamma_R$, $R > 1$, be a level curve for $E$. Let $f$ be a function that is analytic in the domain $D$ bounded by $\Gamma_R$ and such that the equality holds in (18). According to Example 12 we then obtain that
$$\limsup_{n\to\infty} r_n(f)^{1/n} = \frac{1}{R}. \qquad (20)$$
By Remark (b) above, such $f$ must have singularities on $\Gamma_R$. Hence (recall Remark (b) following Theorem 3) the relation (20) holds with $r_n(f)$ replaced by $e_n(f)$. But the family $\mathcal{R}_n$ contains $\mathcal{P}_n$ and it is much richer than $\mathcal{P}_n$: it depends on $2n + 1$ parameters while $\mathcal{P}_n$ depends only on $n + 1$ parameters. One would expect, therefore, that at least for a subsequence of $n$'s, $r_n(f)$ behaves asymptotically like $e_{2n}(f)$. This was a motivation for the following conjecture.
Conjecture 2 (A.A. Gonchar). Let $E$ be a compact set and $f$ be analytic in an open set $D$ containing $E$. Then
$$\liminf_{n\to\infty} r_n(f; E)^{1/n} \le \exp\{-2/\mathrm{cap}(E, \partial D)\}. \qquad (21)$$
This conjecture was proved by O. Parfenov [9] for the case when $E$ is a continuum with connected complement and in the general case by V. Prokhorov [10]; they used a very different method, the so-called "AAK Theory" (cf. [18]). However this method is not constructive, and it remains a challenging problem to find such a method. Yet, potential theory can be used to obtain bounds like (21) in the stronger form
$$\lim_{n\to\infty} r_n(f; E)^{1/n} = \exp\{-2/\mathrm{cap}(E, \partial D)\}$$
for some important subclasses of analytic functions, such as Markov functions ([4]) and functions with a finite number of algebraic branch-points ([14]).

References
1. T. Bagby, The modulus of a plane condenser, J. Math. Mech., 17:315-329, 1976.
2. G. Freud, On the coefficients in the recursion formulae of orthogonal polynomials, Proc. Roy. Irish Acad. Sect. A(1), 76:1-6, 1976.
3. D. Gaier, Lectures on Complex Approximation, Birkhäuser Boston Inc., Boston, MA, 1987.
4. A.A. Gonchar, On the speed of rational approximation of some analytic functions, Math USSR-Sb., 125(167):117-127, 1984.
5. M. v. Golitschek, Approximation by incomplete polynomials, J. Approx. Theory, 28:155-160, 1980.
6. G.G. Lorentz, Approximation by Incomplete Polynomials (problems and results). In E.B. Saff and R.S. Varga, editors, Padé and Rational Approximations: Theory and Applications, Academic Press, New York, 289-302, 1977.
7. D.S. Lubinsky, E.B. Saff, Uniform and mean approximation by certain weighted polynomials, with applications, Constr. Approx., 4:21-64, 1988.
8. D.S. Lubinsky, H.N. Mhaskar, E.B. Saff, Freud's conjecture for exponential weights, Bull. Amer. Math. Soc., 15:217-221, 1986.
9. O.G. Parfenov, Estimates of singular numbers of the Carleson embedding operator, Math. USSR Sbornik, 59:497-514, 1986.
10. V.A. Prokhorov, Rational approximation of analytic functions, Mat. Sb., 184:3-32, 1993, English transl. Russian Acad. Sci. Sb. Math. 78, 1994.
11. T. Ransford, Potential Theory in the Complex Plane, Cambridge University Press, Cambridge, 1995.
12. E.B. Saff, V. Totik, Logarithmic Potentials with External Fields, Springer-Verlag, New York, 1997.
13. E.B. Saff, R.S. Varga, The sharpness of Lorentz's theorem on incomplete polynomials, Trans. Amer. Math. Soc., 249:163-186, 1979.
14. H. Stahl, General convergence results for rational approximation, in: Approximation Theory VI, volume 2, C.K. Chui et al. (eds.), Academic Press, Boston, 605-634.
15. G. Szegő, Orthogonal Polynomials, volume 23 of Colloquium Publications, Amer. Math. Soc., Providence, R.I., 1975.
16. V. Totik, Weighted polynomial approximation for convex external fields, Constr. Approx., 16:261-281, 2000.
17. J.L. Walsh, Interpolation and Approximation by Rational Functions in the Complex Plane, volume 20 of Colloquium Publications, Amer. Math. Soc., Providence, R.I., 1960.
18. N. Young, An Introduction to Hilbert Space, Cambridge University Press, Cambridge, 1988.

Good Bases
Jonathan R. Partington
School of Mathematics,
University of Leeds,
Leeds LS2 9JT, U.K.
J.R.Partington@leeds.ac.uk

1 Introduction
There are two standard approaches to finding rational approximants to a given function. The first approach, which we shall review in this paper, is to employ a basis of possible functions (interpreted in a fairly loose sense) such that the possible rational approximants are linear combinations of the basis functions, and thus given by a simple parametrization. In this situation it is required to choose the most appropriate parameters or "coordinates". An alternative, which we shall not discuss, is the situation when the possible approximants are not linearly parametrized: this is seen in Padé approximation, Hankel-norm approximation, and similar schemes.
Thus the theme of this paper is to describe some families of bases that have been found to be particularly useful in problems of approximation, identification, and analysis of data. The techniques employed are mostly Hilbertian; even in a comparatively simple Banach space such as the disc algebra (the space of functions continuous on the closed unit disc and analytic on the open disc), the technical problems involved in constructing bases well-adapted to the given norm are much more complicated. In this case the functions constructed also tend to have a much less natural appearance, and seem to be of mainly theoretical interest.
Our material divides naturally into two sections. In Section 2 we shall explore situations when we have an orthonormal basis of rational functions and can use inner-product space techniques, such as least squares. Then in Section 3 we review the theory of wavelets, where the basis functions are obtained by translation and dilation of one fixed function: under these circumstances rational approximation is most usefully achieved in the context of frames, which are a convenient generalization of orthonormal bases.

J.-D. Fournier et al. (Eds.): Harm. Analysis and Ratio. Approx., LNCIS 327, pp. 95-102, 2006.
© Springer-Verlag Berlin Heidelberg 2006


2 Orthogonal polynomials and Szegő bases
2.1 Functions in the unit disc
We recall that the Hardy space $H^2(\mathbb{D})$ is the Hilbert space consisting of all functions $f(z) = \sum_{n=0}^{\infty} a_n z^n$ analytic in the unit disc $\mathbb{D}$, such that $\|f\|^2 = \sum_{n=0}^{\infty} |a_n|^2 < \infty$. These functions have $L^2$ boundary values on the unit circle $\mathbb{T}$, and we have
$$\|f\|^2 = \frac{1}{2\pi} \int_0^{2\pi} |f(e^{it})|^2\, dt.$$
The inner product is
$$\Bigl\langle \sum_{n=0}^{\infty} a_n z^n, \sum_{n=0}^{\infty} b_n z^n \Bigr\rangle = \sum_{n=0}^{\infty} a_n \overline{b_n}.$$
The functions $\{1, z, z^2, z^3, \dots\}$ provide a simple orthonormal basis of $H^2(\mathbb{D})$ and of course they connect very well with the theory of Fourier series, since a $2\pi$-periodic function $g(t)$ may be identified with a function on the circle, by writing $g(t) = f(e^{it})$.
Now let $w$ be a positive continuous (weight) function defined on $\mathbb{T}$. Then it is possible to define a new inner product on $H^2(\mathbb{D})$, by
$$\langle f, g \rangle_w = \frac{1}{2\pi} \int_0^{2\pi} f(e^{it})\, \overline{g(e^{it})}\, w(e^{it})\, dt.$$
With a view to analysing certain questions of weighted approximation, one can construct a new sequence $(g_n)_{n \ge 0}$ of polynomials, defined to be orthonormal with respect to the inner product $\langle \cdot, \cdot \rangle_w$. These can be obtained by applying the Gram-Schmidt procedure to $\{1, z, z^2, z^3, \dots\}$.
It is clear that $\deg g_n = n$ for all $n$, and that for any $f \in H^2(\mathbb{D})$, the polynomial $p = p_N$ of degree $N$ that minimizes the quantity
$$\frac{1}{2\pi} \int_0^{2\pi} |f(e^{it}) - p(e^{it})|^2\, w(e^{it})\, dt$$
is simply $p_N = \sum_{k=0}^{N} \langle f, g_k \rangle_w\, g_k$.
These orthogonal functions are sometimes known as Szegő polynomials. Indeed, it was Szegő who made the first systematic study of the asymptotic properties of such polynomials; he also looked at the convergence of expansions of analytic functions in orthogonal polynomials, and studied the location of the zeroes of such polynomials (in the situation described above, all the zeroes of $g_n$ lie in the open unit disc). Moreover, Szegő's work goes further and includes an analysis of the behaviour of functions orthogonal with respect to a line integral along a general curve in the plane.


One particular case in which the orthogonal polynomials can be calculated very simply is when $v(e^{it}) := w(e^{it})^{-1}$ is a positive trigonometric polynomial. In that case, by a theorem of Fejér and Riesz, $v$ has a spectral factorization as $v(e^{it}) = |h(z)|^2$, where $h$ is a polynomial in $z = e^{it}$ having no zeroes in the unit disc. It can easily be verified that
$$g_n(z) = z^n\, \overline{h}(z^{-1}) \qquad \text{for } n \ge \deg h,$$
where $\overline{h}$ denotes the polynomial whose coefficients are the complex conjugates of the coefficients of $h$; thus we have an explicit expression for all but a finite number of the $g_n$. The remaining ones are easy to calculate as well.
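Both the Gram-Schmidt construction and the explicit formula can be tried out numerically. The sketch below is a toy illustration with assumptions made here: the polynomial $h(z) = 1 - z/2$ (so $w = 1/|h|^2$), the number of quadrature nodes, and all function names are illustrative choices. The weighted inner product is approximated by the trapezoid rule on the circle, and polynomials are stored as coefficient lists in increasing degree.

```python
import cmath
import math

NODES = 256
CIRCLE = [cmath.exp(2j * math.pi * k / NODES) for k in range(NODES)]

def evaluate(coeffs, z):
    """Evaluate sum(coeffs[k] * z**k) by Horner's rule."""
    value = 0j
    for c in reversed(coeffs):
        value = value * z + c
    return value

def inner_w(p, q, weight):
    """<p, q>_w = (1/2pi) int p conj(q) w dt, via the trapezoid rule."""
    return sum(evaluate(p, z) * evaluate(q, z).conjugate() * weight(z)
               for z in CIRCLE) / NODES

def szego_polynomials(weight, max_degree):
    """Gram-Schmidt applied to 1, z, z^2, ... in the weighted inner product."""
    basis = []
    for n in range(max_degree + 1):
        v = [0j] * n + [1 + 0j]                      # the monomial z^n
        projections = [inner_w(v, g, weight) for g in basis]
        for c, g in zip(projections, basis):
            for k, gk in enumerate(g):
                v[k] -= c * gk                       # remove the component along g
        norm = math.sqrt(inner_w(v, v, weight).real)
        basis.append([c / norm for c in v])
    return basis

# Assumed example: h(z) = 1 - z/2 has its zero at z = 2, outside the disc.
weight = lambda z: 1.0 / abs(1 - 0.5 * z) ** 2
g = szego_polynomials(weight, 3)

# Predicted by the formula g_n(z) = z^n * conj(h)(1/z) = z^n - z^(n-1)/2 for n >= 1.
predicted = {1: [-0.5, 1.0], 2: [0.0, -0.5, 1.0], 3: [0.0, 0.0, -0.5, 1.0]}
deviation = max(abs(a - b) for n in predicted for a, b in zip(g[n], predicted[n]))
print(deviation)
```

With this weight the computed coefficients should agree with $z^n - z^{n-1}/2$ up to rounding for $n = 1, 2, 3$, which is the range $n \ge \deg h$ covered by the formula.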
We shall now consider bases of more general rational functions in $H^2(\mathbb{D})$. Let $(z_n)_{n=1}^{\infty}$ be a sequence of distinct points in the unit disc satisfying
$$\sum_{n=1}^{\infty} (1 - |z_n|) = \infty,$$
which implies that the only function $f \in H^2(\mathbb{D})$ such that $f(z_n) = 0$ for all $n$ is the identically zero function. (The negation of this condition is called the Blaschke condition.)
We define the Malmquist basis $(g_n)_{n=1}^{\infty}$ in $H^2(\mathbb{D})$ by
$$g_1(z) = \frac{(1 - |z_1|^2)^{1/2}}{1 - \overline{z_1} z},$$
and
$$g_n(z) = \frac{(1 - |z_n|^2)^{1/2}}{1 - \overline{z_n} z} \prod_{k=1}^{n-1} \frac{z - z_k}{1 - \overline{z_k} z}, \qquad \text{for } n \ge 2.$$
Note that each $g_n$ has zeroes in the disc at $z_1, \dots, z_{n-1}$ and poles outside the disc. In fact the functions $(g_n)$ form an orthonormal basis for $H^2(\mathbb{D})$. The Fourier coefficients of a function $f$ with respect to this basis are given by interpolation at the points $(z_n)$, since
$$f(z_m) = \sum_{n=1}^{\infty} \langle f, g_n \rangle\, g_n(z_m),$$
for each $m$, and we observe that $g_n(z_m) = 0$ if $n > m$, and so we have the following formulae, which are a form of multi-point Padé approximant:
$$f(z_1) = \langle f, g_1 \rangle g_1(z_1),$$
$$f(z_2) = \langle f, g_1 \rangle g_1(z_2) + \langle f, g_2 \rangle g_2(z_2),$$
$$f(z_3) = \langle f, g_1 \rangle g_1(z_3) + \langle f, g_2 \rangle g_2(z_3) + \langle f, g_3 \rangle g_3(z_3),$$
and so on. Indeed, the Malmquist basis can also be obtained by applying the Gram-Schmidt procedure to the reproducing kernels $k_{z_n}(z) = 1/(1 - \overline{z_n} z)$, which satisfy $f(z_n) = \langle f, k_{z_n} \rangle$ for $f \in H^2(\mathbb{D})$.
Thus if we want the best rational $H^2(\mathbb{D})$ approximant to $f$ with poles at $1/\overline{z_1}, \dots, 1/\overline{z_n}$, then the above interpolation procedure explains how to find it.
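The triangular structure of these interpolation formulae can be illustrated with a small numerical experiment. In this sketch the three points, the test function $f(z) = 1/(2 - z)$ and all function names are assumptions chosen for illustration. The coefficients $\langle f, g_n \rangle$ are computed once by forward substitution from the values $f(z_m)$, and once directly from the boundary integral, and the two results are compared.

```python
import cmath
import math

def malmquist(points):
    """Return the Malmquist functions g_1, ..., g_N as callables."""
    def make(n):
        zn = points[n]
        def g(z):
            value = math.sqrt(1 - abs(zn) ** 2) / (1 - zn.conjugate() * z)
            for zk in points[:n]:                     # Blaschke factors for z_1..z_{n-1}
                value *= (z - zk) / (1 - zk.conjugate() * z)
            return value
        return g
    return [make(n) for n in range(len(points))]

def coefficients_by_interpolation(f, points):
    """Forward substitution in f(z_m) = sum_{n <= m} c_n g_n(z_m)."""
    basis = malmquist(points)
    coeffs = []
    for m, zm in enumerate(points):
        partial = sum(c * g(zm) for c, g in zip(coeffs, basis))
        coeffs.append((f(zm) - partial) / basis[m](zm))
    return coeffs

def coefficients_by_quadrature(f, points, nodes=512):
    """c_n = <f, g_n> = (1/2pi) int f conj(g_n) dt, via the trapezoid rule."""
    basis = malmquist(points)
    circle = [cmath.exp(2j * math.pi * k / nodes) for k in range(nodes)]
    return [sum(f(z) * g(z).conjugate() for z in circle) / nodes for g in basis]

points = [0.5 + 0j, -0.3 + 0.2j, 0.1j]               # assumed sample points in the disc
f = lambda z: 1.0 / (2.0 - z)                        # assumed test function in H^2(D)

c_interp = coefficients_by_interpolation(f, points)
c_quad = coefficients_by_quadrature(f, points)
mismatch = max(abs(a - b) for a, b in zip(c_interp, c_quad))
print(c_interp, mismatch)
```

The partial sum $\sum_{n \le N} c_n g_n$ built from these coefficients is then the orthogonal projection of $f$ onto the span of $g_1, \dots, g_N$, that is, onto the rational functions with the prescribed poles.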
2.2 Functions in the right half-plane
Recall that $H^2(\mathbb{C}_+)$ consists of all analytic functions $F : \mathbb{C}_+ \to \mathbb{C}$ such that
$$\|F\| := \sup_{x > 0} \left( \int_{-\infty}^{\infty} |F(x + iy)|^2\, dy \right)^{1/2} < \infty,$$
(roughly speaking, functions analytic in the right half-plane, with $L^2$ boundary values). These are the Laplace transforms of functions in $L^2(0, \infty)$.
We cannot use polynomial approximation this time, since there are no non-constant polynomials in the space. However, two simple rational bases are of interest, namely the Laguerre basis (with poles all at one point), and the Malmquist basis (with poles all at different points).
To construct the Laguerre basis, we fix a number $a > 0$ and write
$$e_k(s) = \sqrt{\frac{a}{\pi}}\, \frac{(a - s)^k}{(a + s)^{k+1}}, \qquad k = 0, 1, \dots.$$
These are a natural analogue of $\{1, z, z^2, \dots\}$ in $H^2(\mathbb{D})$. Note that the functions $e_k$ all have zeroes at $a$ and poles at $-a$. Moreover, they form an orthonormal basis of $H^2(\mathbb{C}_+)$. Their inverse Laplace transforms form an orthogonal basis of $L^2(0, \infty)$ and have the form
$$f_k(t) = p_k(t)\, e^{-at},$$
where $p_k$ is a polynomial of degree $k$. In fact
$$p_k(t) = (-1)^k \sqrt{\frac{a}{\pi}}\, L_k(2at),$$
where $L_k$ denotes the Laguerre polynomial
$$L_k(t) = \frac{e^t}{k!} \frac{d^k}{dt^k} \bigl( t^k e^{-t} \bigr).$$

Alternatively some people use Kautz functions, which are more appropriate for approximating lightly damped dynamical systems. These have all their poles at two complex conjugate points: the approximate models have the form
$$\frac{p(s)}{(s^2 + bs + c)^m},$$
where $p$ is a polynomial.
It is also possible to construct Malmquist bases in the half-plane using the reproducing kernel functions for $H^2(\mathbb{C}_+)$. Recall the defining formula for a reproducing kernel, namely
$$f(s_n) = \langle f, k_{s_n} \rangle.$$
In this case the reproducing kernel functions have the formula
$$k_{s_n}(s) = \frac{1}{2\pi (s + \overline{s_n})}.$$
The Malmquist basis functions for the right half-plane are given by
$$g_1(s) = \frac{1}{\sqrt{\pi}}\, \frac{(\mathrm{Re}\, s_1)^{1/2}}{s + \overline{s_1}}$$
and
$$g_n(s) = \frac{1}{\sqrt{\pi}}\, \frac{(\mathrm{Re}\, s_n)^{1/2}}{s + \overline{s_n}} \prod_{k=1}^{n-1} \frac{s - s_k}{s + \overline{s_k}}, \qquad \text{for } n \ge 2.$$

In some examples from the theory of linear systems, an approximate location of the poles of a rational transfer function is known, and these techniques
enable one to construct models with poles in the required places. In the next
section we shall see how wavelet theory enables one to gain further insight
into the local behaviour of functions.

3 Wavelets
3.1 Orthonormal bases
One of the purposes of wavelet theory is to provide good orthonormal bases for function spaces such as $L^2(\mathbb{R})$. These basis functions are derived from a single function $\psi$ by taking translated and dilated versions of it, and will be denoted $(\psi_{j,k})_{j,k \in \mathbb{Z}}$, where the parameter $j$ controls the scaling and $k$ controls the positioning. Thus the inner product $\langle f, \psi_{j,k} \rangle$ gives information on $f$ at "resolution" $j$ and "time" $k$. One may compare classical Fourier analysis, where the Fourier coefficients
$$\hat{f}(k) = \frac{1}{T} \int_0^T f(t)\, e^{-2\pi i k t/T}\, dt = \langle f, e_k \rangle, \quad \text{say,}$$
give us information about $f$ at "frequency" $2\pi k/T$.

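To illustrate this, we construct the Haar wavelets. Let $V_0 \subset L^2(\mathbb{R})$ be the closed subspace consisting of all functions $f$ that are constant on all intervals $(k, k + 1)$, $k \in \mathbb{Z}$. Let $\varphi(t) = \chi_{(0,1)}(t)$ and $\varphi_k(t) = \varphi(t - k)$ for $k \in \mathbb{Z}$. Then $(\varphi_k)_{k \in \mathbb{Z}}$ is an orthonormal basis of $V_0$. Any function $f \in V_0$ has the form
$$f = \sum_{k=-\infty}^{\infty} \langle f, \varphi_k \rangle\, \varphi_k,$$
and
$$\|f\|^2 = \int_{-\infty}^{\infty} |f(t)|^2\, dt = \sum_{k=-\infty}^{\infty} |\langle f, \varphi_k \rangle|^2.$$
Next, for $j \in \mathbb{Z}$, let $V_j$ be the space of functions constant on all intervals $(k/2^j, (k + 1)/2^j)$, $k \in \mathbb{Z}$. Functions in $V_j$ have steps of width $2^{-j}$. Then we have a chain of subspaces,
$$\dots \subset V_{-2} \subset V_{-1} \subset V_0 \subset V_1 \subset \dots.$$
Also $\overline{\bigcup_j V_j} = L^2(\mathbb{R})$ and $\bigcap_j V_j = \{0\}$. Now we have a rescaling property,
$$f(t) \in V_j \iff f(2^{-j} t) \in V_0,$$
and $V_j$ has orthonormal basis consisting of the functions $2^{j/2} \varphi(2^j t - k)$ for $k \in \mathbb{Z}$. Any chain of subspaces with these properties is called a multi-resolution approximation or multi-resolution analysis of $L^2(\mathbb{R})$.
We cannot directly use these functions as an orthonormal basis of $L^2(\mathbb{R})$, and one new trick is needed. We build the Haar wavelet, which is a function bridging the gap between $V_0$ and $V_1$.
We define the Haar wavelet by
$$\psi(t) = \varphi(2t) - \varphi(2t - 1) = \chi_{(0,1/2)}(t) - \chi_{(1/2,1)}(t).$$
The functions $\psi_k(t) = \psi(t - k)$, $k \in \mathbb{Z}$, form an orthonormal basis for a space $W_0$ such that $V_0 \oplus W_0 = V_1$ (orthogonal direct sum). Then $V_j \oplus W_j = V_{j+1}$, where $W_j$ has orthonormal basis
$$\psi_{j,k}(t) = 2^{j/2} \psi(2^j t - k), \qquad k \in \mathbb{Z}.$$
Finally
$$L^2(\mathbb{R}) = \dots \oplus W_{-2} \oplus W_{-1} \oplus W_0 \oplus W_1 \oplus \dots$$
and has orthonormal basis $(\psi_{j,k})_{j,k \in \mathbb{Z}}$. Hence, if $f \in L^2(\mathbb{R})$, we have
$$f = \sum_{j=-\infty}^{\infty} \sum_{k=-\infty}^{\infty} \langle f, \psi_{j,k} \rangle\, \psi_{j,k},$$
converging in $L^2$ norm.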
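The splitting $V_{j+1} = V_j \oplus W_j$ has a direct discrete counterpart, sketched below for a finite signal. This is an illustration under assumptions made here: the signal is a list of step heights of a function in some $V_J$, already scaled so that the coefficient vector is orthonormal; its length is a power of two; the function names and sample values are hypothetical. Each step replaces a pair of neighbouring values by a normalized average (the coarser $V_j$ part) and a normalized difference (the $W_j$ detail, matching $\psi = \chi_{(0,1/2)} - \chi_{(1/2,1)}$).

```python
import math

def haar_step(samples):
    """One level: split V_{j+1} coefficients into V_j averages and W_j details."""
    s = math.sqrt(2.0)
    half = len(samples) // 2
    averages = [(samples[2 * i] + samples[2 * i + 1]) / s for i in range(half)]
    details = [(samples[2 * i] - samples[2 * i + 1]) / s for i in range(half)]
    return averages, details

def haar_transform(samples):
    """Full decomposition; len(samples) is assumed to be a power of two."""
    details_by_level = []
    current = list(samples)
    while len(current) > 1:
        current, details = haar_step(current)
        details_by_level.append(details)
    return current, details_by_level

def haar_inverse(coarse, details_by_level):
    """Undo haar_transform by re-merging averages and details level by level."""
    s = math.sqrt(2.0)
    current = list(coarse)
    for details in reversed(details_by_level):
        rebuilt = []
        for a, d in zip(current, details):
            rebuilt.extend([(a + d) / s, (a - d) / s])
        current = rebuilt
    return current

signal = [4.0, 2.0, 5.0, 5.0, 1.0, 0.0, 3.0, 7.0]
coarse, details = haar_transform(signal)
recovered = haar_inverse(coarse, details)

energy_in = sum(x * x for x in signal)
energy_out = sum(x * x for x in coarse) + sum(d * d for level in details for d in level)
print(recovered, energy_in, energy_out)
```

Because every step is an orthogonal change of coordinates, the signal is recovered exactly (up to rounding) and the sum of squares is preserved, which is the discrete form of the Parseval identity for the Haar basis.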


In the construction sketched above, the $\psi_{j,k}$ are very simple functions, but they are all discontinuous. By working harder, one may obtain wavelets that are better adapted to approximation problems.
Here is a list of the wavelets most commonly seen in the literature. To obtain good properties of $\psi$ and its Fourier transform $\hat{\psi}$ is not straightforward, and the following are listed in (approximately) increasing order of difficulty (see footnote 1).

Wavelet             Properties of $\psi(t)$             Properties of $\hat{\psi}(w)$
Haar                C.S., discontinuous                 $O(1/w)$, $C^\infty$
Littlewood-Paley    $O(1/t)$, $C^\infty$                C.S., discontinuous
Meyer               Rapidly-decreasing, $C^\infty$      C.S., can be $C^\infty$
Battle-Lemarié      Rapidly-decreasing, $C^k$           $O(1/w^k)$, $C^\infty$
Daubechies          C.S., $C^k$                         $O(1/w^k)$, $C^\infty$
3.2 Frames
For rational approximation, orthogonal wavelets are not so useful, and we settle for something weaker. A frame $(\psi_{j,k})$ in a Hilbert space $H$ is a sequence for which there are constants $A, B > 0$ such that
$$A \|f\|^2 \le \sum_{j,k} |\langle f, \psi_{j,k} \rangle|^2 \le B \|f\|^2 \qquad \text{for all } f \in H.$$
This is a weaker notion than an orthonormal basis (in a finite-dimensional vector space it would correspond to a finite spanning set), but an element $f \in H$ can be reconstructed from its frame coefficients, the numbers $\langle f, \psi_{j,k} \rangle$. Namely, there exist dual functions $(\tilde{\psi}_{j,k})$ such that every $f \in H$ has the representation
$$f = \sum_{j,k} \langle f, \psi_{j,k} \rangle\, \tilde{\psi}_{j,k}.$$
The $(\tilde{\psi}_{j,k})$ also form a frame, the dual frame.
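A finite-dimensional toy case makes the definitions concrete. The sketch below is an illustration chosen here, not one of the wavelet frames of the text: it takes three unit vectors in $\mathbb{R}^2$ at 120-degree spacing, forms the frame operator $S = \sum_k v_k v_k^T$, obtains the dual frame as $S^{-1} v_k$, and reconstructs a vector from its three frame coefficients. All names are hypothetical.

```python
import math

# Three unit vectors at 120-degree spacing: a redundant spanning set of R^2.
FRAME = [(math.cos(math.pi / 2 + 2 * math.pi * k / 3),
          math.sin(math.pi / 2 + 2 * math.pi * k / 3)) for k in range(3)]

def frame_coefficients(f):
    """The numbers <f, v_k>."""
    return [f[0] * v[0] + f[1] * v[1] for v in FRAME]

def frame_operator():
    """S = sum_k v_k v_k^T as a symmetric 2x2 matrix."""
    s11 = sum(v[0] * v[0] for v in FRAME)
    s12 = sum(v[0] * v[1] for v in FRAME)
    s22 = sum(v[1] * v[1] for v in FRAME)
    return ((s11, s12), (s12, s22))

def dual_frame():
    """Dual vectors S^{-1} v_k, using the explicit inverse of a 2x2 matrix."""
    (a, b), (c, d) = frame_operator()
    det = a * d - b * c
    inv = ((d / det, -b / det), (-c / det, a / det))
    return [(inv[0][0] * v[0] + inv[0][1] * v[1],
             inv[1][0] * v[0] + inv[1][1] * v[1]) for v in FRAME]

def reconstruct(coeffs):
    """f = sum_k <f, v_k> * dual_k."""
    dual = dual_frame()
    return (sum(c * w[0] for c, w in zip(coeffs, dual)),
            sum(c * w[1] for c, w in zip(coeffs, dual)))

f = (0.3, -1.2)
coeffs = frame_coefficients(f)
energy_ratio = sum(c * c for c in coeffs) / (f[0] ** 2 + f[1] ** 2)
recovered = reconstruct(coeffs)
print(energy_ratio, recovered)
```

For this particular frame $S$ is a multiple of the identity, so the frame bounds coincide ($A = B = 3/2$) and the dual frame is just a rescaling of the original one; for a general frame the inverse of $S$ does real work.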

There is a general condition, due to Daubechies, which shows that the following two examples, and many others, produce frames.
If we take
$$\psi(t) = (1 - t^2)\, e^{-t^2/2},$$
the Mexican hat function (the puzzled reader is invited to sketch it), then the functions $\psi_{j,k}(t) = 2^{j/2} \psi(2^j t - k)$, with $j, k \in \mathbb{Z}$, form a frame for $L^2(\mathbb{R})$; these were used by Morlet in the analysis of seismic data.
Footnote 1: In the table, "C.S." is short for "compact support".


An example involving rational functions can be found by taking $\psi(t) = (1 - it)^{-n}$, with $n \ge 2$, which leads to frames in $H^2(\mathbb{C}^+)$, the Hardy space of the upper half-plane (and thus, by conformal mapping, one obtains rational frames in $H^2(\mathbb{D})$). We easily obtain rational frames in $H^2(\mathbb{C}_+)$, i.e., functions of the form
$$2^{j/2} (1 + 2^j s + ik)^{-n}.$$
Their poles lie on a dyadic lattice in the left half-plane, and they have been used for approximation purposes in linear systems theory.
One advantage of such frames over orthonormal bases is their built-in redundancy: the representation of a function is, in general, less sensitive to errors or perturbations in the frame coefficients. It is this property that makes frames so useful in problems of reconstruction, as well as in certain approximation problems. Certainly non-orthogonal expansions ("non-harmonic Fourier series") have shown themselves to be an efficient alternative to more traditional methods within the last few years; nowadays, a familiarity with both methods is essential in many branches of analysis and its applications.

References
1. I. Daubechies, Ten lectures on wavelets. SIAM, 1992.
2. P.J. Davis, Interpolation and approximation. Dover, 1975.
3. G. Kaiser, A friendly guide to wavelets. Birkhäuser, 1994.
4. J.R. Partington, Interpolation, identification, and sampling. The Clarendon Press, Oxford University Press, 1997.
5. G. Szegő, Orthogonal polynomials. American Mathematical Society, New York, 1939.
6. J.L. Walsh, Interpolation and approximation by rational functions in the complex domain. American Mathematical Society, New York, 1935.

Some Aspects of the Central Limit Theorem and Related Topics

Pierre Collet
Centre de Physique Théorique
CNRS UMR 7644, École Polytechnique
F-91128 Palaiseau Cedex (France)
collet@cpht.polytechnique.fr

1 Introduction
Very often the observation of natural phenomena leads to an average trend
with fluctuations around it. One of the best known examples is the observation by Brown and others of a pollen particle in water. The particle is
subject to many collisions with water molecules and an average behaviour
follows by the law of large numbers. Here the average velocity of the particle
is zero, and the particle should stay at rest. However the observation reveals
an erratic motion known as Brownian motion. The goal of the central limit
theorem (abbreviated below as CLT) and the related results is to study these
fluctuations around the average trend.
The CLT is historically attributed to De Moivre and then to Laplace for
a more rigorous study. The original argument is interesting for its relation to
Statistical Mechanics and we will come back to this approach several times. I
will therefore briefly present this argument although it is not the most efficient
approach nowadays.
Consider a game of head or tail. One performs independent flips of a coin
which has a probability $p$ to display head and $q = 1 - p$ to display tail. We
assume $0 < p < 1$ and leave to the reader the discussion of the extreme cases.
One performs a large number $n$ of independent flips and records the number
$N(n)$ of times the coin displayed head. This is equivalent to a simple model of
Statistical Mechanics of $n$ uncoupled spins $1/2$ in a magnetic field. The law of
large numbers gives the average behaviour of $N(n)$ for large $n$. Namely, with
probability one
\[ \lim_{n\to\infty} \frac{N(n)}{n} = p . \]

In other words, if the number of flips $n$ is large, we typically observe $np$ times
the coin displaying head and $n(1 - p)$ times the coin displaying tail. However
the law of large numbers only tells us that $(N(n) - np)/n$ tends to zero with
probability one. This does not say anything about the size of $N(n) - np$,
namely the fluctuations.

J.-D. Fournier et al. (Eds.): Harm. Analysis and Ratio. Approx., LNCIS 327, pp. 105–127, 2006.
© Springer-Verlag Berlin Heidelberg 2006

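Since the flips are independent, the probability that a given sequence of $n$ flips
gives $r$ heads (and hence $n - r$ tails) is $p^r q^{n-r}$. Since there are $\binom{n}{r}$ sequences with exactly $r$ heads, we obtain
\[ P\big(N(n) = r\big) = \binom{n}{r}\, p^r q^{n-r} . \tag{1} \]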

In particular, since the events $\{N(n) = r\}$ are mutually exclusive and one of
them is realized, we have
\[ 1 = \sum_{r=0}^{n} P\big(N(n) = r\big) = \sum_{r=0}^{n} \binom{n}{r}\, p^r q^{n-r} . \]

It turns out that relatively few terms contribute to this sum. By the law of
large numbers, only those terms with $r \approx np$ contribute. More precisely, using
Stirling's approximation, one gets the following result for $r - np = O(\sqrt{n})$:
\[ P\big(N(n) = r\big) = \frac{e^{-(r-np)^2/(2npq)}}{\sqrt{2\pi npq}} + O\!\left(\frac{1}{n}\right) . \tag{2} \]
We will discuss later on in more detail the case $|r - np| \gg \sqrt{n}$ and it will
turn out that the event
\[ \big\{ |N(n) - np| \gg \sqrt{n} \big\} \]
has a probability which tends to zero when $n$ tends to infinity.

Coming back to formula (2), we see that since the Gaussian function decays
very fast, the number $r - np$ should be of order $\sqrt{npq}$. In other words, we now
have the speed of convergence of $N(n)/n$ toward $p$ (of the order of $1/\sqrt{n}$), or
equivalently the size of the fluctuations of $N(n) - np$ (of the order of $\sqrt{n}$).
This can be measured more precisely by the variance of $N(n)$, which is equal
to $npq$:
\[ \operatorname{Var}\big(N(n)\big) = \sum_r r^2\, P\big(N(n) = r\big) - \Big( \sum_r r\, P\big(N(n) = r\big) \Big)^2 = npq . \]

Notice that for $p$ fixed, with a probability very near to 1 (for large $n$), the
observed sequence of heads and tails satisfies
\[ r = np + O\big(\sqrt{npq}\big) . \]
All these sequences have about the same probability $e^{-nh}$ and their number
is about $e^{nh}$, where $h$ is the entropy per flip
\[ h = -\big( p \log p + q \log q \big) . \]
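The statements above can be checked numerically. The following Python sketch evaluates the exact binomial law (1), the Gaussian approximation (2), the variance $npq$ and the entropy per flip. The function names are illustrative, and the use of `math.lgamma` (to avoid overflow in the binomial coefficient) is an implementation choice, not part of the original argument.

```python
import math


def binom_pmf(n: int, r: int, p: float) -> float:
    """Exact P(N(n) = r) from formula (1), computed through logarithms."""
    q = 1.0 - p
    log_binom = (math.lgamma(n + 1) - math.lgamma(r + 1)
                 - math.lgamma(n - r + 1))
    return math.exp(log_binom + r * math.log(p) + (n - r) * math.log(q))


def gauss_local(n: int, r: int, p: float) -> float:
    """Leading term of formula (2): a Gaussian with mean np, variance npq."""
    v = n * p * (1.0 - p)
    return math.exp(-(r - n * p) ** 2 / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)


def variance(n: int, p: float) -> float:
    """Var N(n) = sum r^2 P(N=r) - (sum r P(N=r))^2, expected to equal npq."""
    probs = [binom_pmf(n, r, p) for r in range(n + 1)]
    mean = sum(r * w for r, w in enumerate(probs))
    second = sum(r * r * w for r, w in enumerate(probs))
    return second - mean * mean


def entropy_per_flip(p: float) -> float:
    """h = -(p log p + q log q)."""
    q = 1.0 - p
    return -(p * math.log(p) + q * math.log(q))
```

For instance, with $n = 1000$ and $p = 0.3$ one can compare `binom_pmf(1000, 310, 0.3)` with `gauss_local(1000, 310, 0.3)`; the difference is expected to be of order $1/n$.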

Using formula (1), the reader can give a rigorous proof of these results (see
also [21]).
A more modern and more efficient approach to the CLT is due to Paul
Lévy. This approach is based on the notion of characteristic function. Before
we present this method, we will briefly recall some basic facts in probability
theory and introduce some standard notations.

2 A short elementary probability theory refresher

We will give in this section a brief account of probability theory, mostly to
fix notations and to recall the main definitions and results. For more details
we refer the reader to [13], [24], [25], [36], [17] and to the numerous other
excellent books on the subject.
A real random variable $X$ is defined by a positive measure $P$ on $\mathbb{R}$ with total
mass one. In other words, to any (measurable) subset $A$ of $\mathbb{R}$, we associate a
number (weight) $P(A)$ which is the probability of the event $\{X \in A\}$ ($X$ falls
in $A$). We refer to the previously mentioned references for a discussion of the
interpretation of a probability. If $A$ and $B$ are two disjoint subsets of $\mathbb{R}$, we
have $P(A \cup B) = P(A) + P(B)$. This means that the probability of occurrence
of one or the other of the two events is the sum of their probabilities (it is
important that they are mutually exclusive for this formula to hold, namely
that $A$ and $B$ are disjoint). If $f$ is a function of a real variable, $f(X)$ is also
a random variable, and we have
\[ P\big(f(X) \in A\big) = P\big(f^{-1}(A)\big) \]
where, since $f$ may not be invertible, $f^{-1}(A)$ is defined as the set of points
whose image by $f$ is in $A$, namely
\[ f^{-1}(A) = \{\, x \mid f(x) \in A \,\} . \]
The expectation of $f(X)$, denoted by $E\big(f(X)\big)$, is defined by
\[ E\big(f(X)\big) = \int f(x)\, dP(x) . \]
In particular
\[ P\big(f(X) \in A\big) = E\big(1_A(f(X))\big) \]
where $1_C(y)$ is the function equal to 1 if $y \in C$ and zero otherwise. The
variance of $f(X)$, denoted by $\operatorname{Var}\big(f(X)\big)$, is defined by
\[ \operatorname{Var}\big(f(X)\big) = E\big(f(X)^2\big) - E\big(f(X)\big)^2 = E\Big( \big(f(X) - E(f(X))\big)^2 \Big) . \]
Observe that the variance is always non-negative, and it is equal to zero if
the random variable $f(X)$ is constant except on a set of measure zero. The
standard deviation is the square root of the variance. For $f(x) = x$ we obtain
the average and the variance of the random variable $X$.
The law of $X$ (also called its distribution) is the function $F_X(x) =
P\big((-\infty, x]\big)$. It is equivalent to know the probability $P$ or the function $F_X$.
The characteristic function of $X$ is defined by
\[ \varphi_X(t) = E\big(e^{itX}\big) . \]
It is nothing else than the Fourier transform of $P$.

We say that a property of $X$ is true almost surely if it holds except on a
set of probability zero. For example consider the Lebesgue measure on $[0, 1]$
and the random variable $X(x) = x$. Then the value of $X$ is almost surely
irrational.
We recall that the measure $P$ can be of different natures. It can be a finite
(or countable) combination of Dirac (atomic) point mass measures. In this case
$X$ takes only a finite (or countable) number of different values. The measure
$P$ can also be absolutely continuous with respect to the Lebesgue measure,
namely have a density $h$ which should be a nonnegative integrable function
(of integral one)
\[ dP(x) = h(x)\, dx . \]
There are other types of measures called singular continuous (like the Cantor measure described in section 7), which are neither atomic nor absolutely
continuous with respect to the Lebesgue measure. The general case is a combination of these three types of measure, but we emphasize that a probability
measure is always normalized to have total mass one (and is always a positive
measure).
We often have to consider several real random variables $X_1, \ldots, X_n$ at
the same time. In order to describe their joint properties, we need to extend
slightly the above definition. One considers a set $\Omega$ (equipped with a measurable structure) and a positive measure $P$ of total mass one on this set (we
refer to [15] for all measurability questions). To make the link with
the previous definition, the above simple definition corresponds to $\Omega = \mathbb{R}$. We
now define a real random variable $Y$ as a (measurable) real valued function
on $\Omega$. Define a measure $\nu$ on $\mathbb{R}$ by
\[ \nu(A) = P\big(Y^{-1}(A)\big) \]
where $Y^{-1}(A)$ is as before the set of points $\omega$ in $\Omega$ such that $Y(\omega)$ belongs
to $A$. We leave to the reader the easy exercise to check that $\nu$ is a probability
measure, and that if $Y$ is the function $Y(x) = x$ we recover the above definition.
The set $\Omega$ can be chosen in various ways; for $n$ real random variables
$X_1, \ldots, X_n$ (and not more) it is convenient to take $\Omega = \mathbb{R}^n$, and $P$ is now a
measure of total mass one on this space. Two real random variables $X_1$ and
$X_2$ are said to be independent if for any pair $A$ and $B$ of (measurable) subsets
of $\mathbb{R}$ we have
\[ P\big( \{X_1 \in A\} \cap \{X_2 \in B\} \big) = P\big(X_1 \in A\big)\, P\big(X_2 \in B\big) . \]
Equivalently, $X_1$ and $X_2$ are independent if for any pair of real (measurable)
functions $f$ and $g$ we have
\[ E\big( f(X_1)\, g(X_2) \big) = E\big( f(X_1) \big)\, E\big( g(X_2) \big) . \]

3 Another proof of the CLT

Consider a sequence $X_1, X_2, \ldots$ of real random variables, independent and
identically distributed (with the same law). This is often abbreviated by i.i.d.
Denote by $\mu$ the common average of these random variables and by $\sigma$ their
common standard deviation (both assumed to be finite and $\sigma > 0$).
As in section 1, by the law of large numbers, the sum
\[ S_n = \sum_{j=1}^{n} X_j \]
behaves like $n\mu$ for large $n$. It is therefore natural to subtract this dominant
term and to consider the sequence of random variables $S_n - n\mu$. In order to
understand the behaviour of this random variable for large $n$, it is natural
to look for a normalization (a scale), namely for a sequence of numbers $a_n$
such that $(S_n - n\mu)/a_n$ stabilizes to something nontrivial (i.e. non-zero and
finite). Of course, if $(a_n)$ diverges too fast, $(S_n - n\mu)/a_n$ tends to zero, while
if $(a_n)$ diverges too slowly the limit will be almost surely infinite.
The method of characteristic functions to prove the CLT is based on the
asymptotic behaviour of the sequence of functions
\[ \varphi_n(s) = E\big( e^{is(S_n - n\mu)/a_n} \big) . \]
A very important result of Paul Lévy is the following.

Theorem 1. If for any real number $s$, we have $\varphi_n(s) \to \varphi(s)$, and the function $\varphi(s)$ is continuous at $s = 0$, then $\varphi$ is the Fourier transform of a probability measure $\nu$ (on $\mathbb{R}$), and
\[ P\Big( \frac{S_n - n\mu}{a_n} \le x \Big) \longrightarrow \nu\big( (-\infty, x] \big) . \]
We refer to [24] or other standard probability books for a proof. We will say
that a sequence $\nu_n$ of probabilities on $\mathbb{R}$ converges in law to the probability
$\nu$ if for any real number $x$ we have
\[ \lim_{n\to\infty} \nu_n\big( (-\infty, x] \big) = \nu\big( (-\infty, x] \big) . \]
This implies (see [31]) that for any (measurable) set $B$ such that $\nu(\partial B) = 0$,
\[ \lim_{n\to\infty} \nu_n(B) = \nu(B) . \]

In other words, Lévy's Theorem relates the convergence in law to the convergence of the characteristic functions.
In order to be able to apply Lévy's Theorem, we have to understand the
behaviour of $\varphi_n(s)$ for large $n$. Since the random variables $X_1, X_2, \ldots$ are
independent, we have
\[ \varphi_n(s) = \varphi(s/a_n)^n \]
where
\[ \varphi(s) = E\big( e^{is(X_1 - \mu)} \big) . \]
If we assume that the numbers $a_n$ diverge with $n$, we have for any fixed $s$
\[ \varphi(s/a_n) = 1 - \frac{\sigma^2 s^2}{2a_n^2} + o\!\left( \frac{1}{a_n^2} \right) . \]

This estimate follows from the Lebesgue dominated convergence theorem. It
also follows more easily if the fourth order moment is finite. We now see that
except for a fixed change of scale, there is only one choice of the sequence $a_n$
(more precisely of its asymptotic behaviour) for which we obtain a non-trivial
limit, namely $a_n = \sqrt{n}$. Indeed, with this choice we have
\[ \varphi_n(s) = \Big( 1 - \frac{\sigma^2 s^2}{2a_n^2} + o\!\left( \frac{1}{a_n^2} \right) \Big)^n \longrightarrow e^{-\sigma^2 s^2/2} . \]
We now observe that
\[ e^{-\sigma^2 s^2/2} = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{isx}\, e^{-x^2/(2\sigma^2)}\, dx \]

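and we can apply the above Lévy theorem, which proves the following version
of the CLT (stated after dividing by $\sigma$).
Theorem 2. Let $X_j$ be a sequence of i.i.d. real random variables with mean
$\mu$ (finite) and standard deviation $\sigma$ (finite and non-zero). Then, for any real
number $x$,
\[ \lim_{n\to\infty} P\Big( \frac{S_n - n\mu}{\sigma\sqrt{n}} \le x \Big) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-y^2/2}\, dy . \]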
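The two steps of this proof can be illustrated numerically. The sketch below assumes, purely as an example, that the $X_j$ are uniform on $[-1, 1]$ (so $\mu = 0$, $\sigma^2 = 1/3$ and $\varphi(s) = \sin(s)/s$). It compares $\varphi(s/\sqrt{n})^n$ with $e^{-\sigma^2 s^2/2}$ and estimates the distribution function of $S_n/(\sigma\sqrt{n})$ by simulation. All function names and the fixed seed are illustrative choices.

```python
import math
import random


def char_fn_uniform(s: float) -> float:
    """Characteristic function of X uniform on [-1, 1]: sin(s) / s."""
    return 1.0 if s == 0.0 else math.sin(s) / s


def char_fn_normalized_sum(s: float, n: int) -> float:
    """phi_n(s) = phi(s / a_n)^n with the normalization a_n = sqrt(n)."""
    return char_fn_uniform(s / math.sqrt(n)) ** n


def gaussian_char_fn(s: float, sigma2: float) -> float:
    """Limit exp(-sigma^2 s^2 / 2)."""
    return math.exp(-sigma2 * s * s / 2.0)


def normal_cdf(x: float) -> float:
    """Distribution function of the standard Gaussian."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def empirical_cdf(x: float, n: int, trials: int, seed: int = 0) -> float:
    """Fraction of simulated values of S_n / (sigma sqrt(n)) that are <= x."""
    rng = random.Random(seed)
    sigma = math.sqrt(1.0 / 3.0)
    hits = 0
    for _ in range(trials):
        s_n = sum(rng.uniform(-1.0, 1.0) for _ in range(n))
        if s_n / (sigma * math.sqrt(n)) <= x:
            hits += 1
    return hits / trials
```

For example, `empirical_cdf(0.5, 30, 4000)` should be close to `normal_cdf(0.5)`, up to a statistical error of order $1/\sqrt{\text{trials}}$.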


4 Some extensions and related results

Unless otherwise stated, we will consider throughout this section a sequence of
i.i.d. real random variables $X_j$. We will denote by $\mu$ their common average
(assumed to be finite), and by $\sigma^2$ their common variance (assumed to be finite
and non-zero). The sequence $S_n$ of partial sums is defined by
\[ S_n = \sum_{j=1}^{n} X_j . \]

4.1 Other proofs of the CLT

We have already seen the combinatorial proof of De Moivre and the proof
using characteristic functions. There are many other proofs, for example the
proof due to Lindeberg based on a semi-group idea (see [13]), the proof of
Kolmogorov also based on a semi-group idea (see [6]), the so-called Stein
method (see [32]), and many others. A useful extension deals with the case of
independent random variables but with different distributions. In this context
one has the well known Lindeberg-Feller theorem (see [13]).
Theorem 3. Let $X_j$ be a sequence of independent real random variables
with averages $\mu_j$ (finite) and variances $\sigma_j^2$ (finite and non-zero). In other
words
\[ E\big(X_j\big) = \mu_j , \qquad \sigma_j^2 = \operatorname{Var}\big(X_j\big) = E\big(X_j^2\big) - E\big(X_j\big)^2 . \]
Let
\[ s_n^2 = \sum_{j=1}^{n} \sigma_j^2 . \]
If for any $t > 0$
\[ \lim_{n\to\infty} \frac{1}{s_n^2} \sum_{j=1}^{n} E\Big( (X_j - \mu_j)^2\, 1_{\{|X_j - \mu_j| > t s_n\}} \Big) = 0 , \]
then $\big( S_n - \sum_{j=1}^{n} \mu_j \big)/s_n$ converges in law to a Gaussian random variable with
zero mean and unit variance.
4.2 Rate of convergence in the CLT
One can control the rate of convergence if something is known about moments
higher than the second one. A classical result is the Berry-Esseen theorem for
i.i.d. real random variables.
Theorem 4. Let $\rho = E\big( |X_j - \mu|^3 \big) < \infty$; then for any integer $n$
\[ \sup_x \left| P\Big( \frac{S_n - n\mu}{\sigma\sqrt{n}} \le x \Big) - \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-y^2/2}\, dy \right| \le \frac{33}{4}\, \frac{\rho}{\sigma^3 \sqrt{n}} . \]
This result can be used for finite $n$ if one has information about the three
numbers $\mu$, $\sigma$ and $\rho$. If one assumes that higher order moments are finite, one
can construct higher order approximations. They involve Hermite functions
(Edgeworth expansion). We refer the reader to [13] and [4] for more details.
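As an illustration of how the bound can be used for finite $n$, the following Python sketch computes the exact uniform distance between the law of a normalized sum of coin flips and the Gaussian law, and compares it with $33\rho/(4\sigma^3\sqrt{n})$. The Bernoulli example and the function names are assumptions made for the illustration; for a coin, $\mu = p$, $\sigma^2 = pq$ and $\rho = pq^3 + qp^3$.

```python
import math


def normal_cdf(x: float) -> float:
    """Distribution function of the standard Gaussian."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def binom_pmf(n: int, r: int, p: float) -> float:
    """P(N(n) = r) for n flips with head probability p."""
    log_binom = (math.lgamma(n + 1) - math.lgamma(r + 1)
                 - math.lgamma(n - r + 1))
    return math.exp(log_binom + r * math.log(p) + (n - r) * math.log(1.0 - p))


def kolmogorov_distance(n: int, p: float) -> float:
    """sup_x |P((S_n - n mu)/(sigma sqrt(n)) <= x) - Phi(x)| for a coin.

    The distribution function of S_n is a step function, so the supremum
    is reached at a jump, either just before it or at it.
    """
    sigma = math.sqrt(p * (1.0 - p))
    cdf_before = 0.0
    worst = 0.0
    for r in range(n + 1):
        x = (r - n * p) / (sigma * math.sqrt(n))
        gauss = normal_cdf(x)
        cdf_at = cdf_before + binom_pmf(n, r, p)
        worst = max(worst, abs(cdf_before - gauss), abs(cdf_at - gauss))
        cdf_before = cdf_at
    return worst


def berry_esseen_bound(n: int, p: float) -> float:
    """33 rho / (4 sigma^3 sqrt(n)) with rho = E|X - p|^3 = p q^3 + q p^3."""
    q = 1.0 - p
    rho = p * q ** 3 + q * p ** 3
    return 33.0 * rho / (4.0 * (p * q) ** 1.5 * math.sqrt(n))
```

With $p = 0.3$ the bound only drops below 1 for $n$ of the order of a hundred or more, so it is mainly informative for fairly large $n$; the observed distance is typically much smaller than the bound.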
4.3 Other types of convergence
A first result is the so-called local CLT which deals with the convergence of
probability densities (if they exist). The simplest version is as follows.
Theorem 5. If the common characteristic function of the real i.i.d. random
variables $X_j$ is summable (its modulus is integrable), then for any integer $n$
the random variable $(S_n - n\mu)/(\sigma\sqrt{n})$ has a density $f_n$ (its distribution has
a density with respect to the Lebesgue measure), and we have uniformly in $x$
\[ \lim_{n\to\infty} f_n(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2} . \]
Stronger versions of this result have been proved recently. We refer the reader
to [2] for the convergence in the sense of Fisher information and in the sense
of relative entropy.
One may wonder if a stronger form of convergence may hold. For example
one could be tempted (as is often seen in bad texts) to formulate the CLT
by saying that there is a Gaussian random variable $\xi$ with zero average and
variance unity such that
\[ \frac{S_n - n\mu}{\sigma\sqrt{n}} = \xi + \epsilon_n \tag{3} \]
with $\epsilon_n \to 0$ when $n$ tends to infinity. This is not true in general; one can
consider for example the case of i.i.d. Gaussian random variables and use the
associated Hilbert space representation. The convergence only holds in the weaker sense of
distributions as stated above. There is however a so-called almost sure version
of the CLT. In some sense it accumulates all the information gathered for the
various values of $n$. A simple version is as follows.
Theorem 6. For any real number $x$, we have almost surely
\[ \lim_{n\to\infty} \frac{1}{\log n} \sum_{j=1}^{n} \frac{1}{j}\, \theta\Big( x - \frac{S_j - j\mu}{\sigma\sqrt{j}} \Big) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-s^2/2}\, ds , \]
where $\theta(y)$ is the Heaviside function which vanishes for $y < 0$ and equals 1 for
$y > 0$. We refer to [3] for references and a review of the results in this domain.
We only emphasize that $1/j$ is essentially the unique weight for which the
result holds.

4.4 Bounds on the fluctuations

A classical theorem about fluctuations is the law of the iterated logarithm, which
gives the asymptotic size of the fluctuations.
Theorem 7. Assume that for some $\delta > 0$, $0 < E\big( |X_j|^{2+\delta} \big) < \infty$. Then we
have almost surely
\[ \limsup_{n\to\infty} \frac{S_n - n\mu}{\big( 2\sigma^2\, n \log\log n \big)^{1/2}} = 1 . \]
There is of course an analogous result for the lim inf. We refer the reader to
[33] for a proof and similar results.
4.5 Brownian motion
It is also quite natural to study the sequence $S_n - n\mu$ as a function of $n$ and
to ask if there is a normalization of the sequence and of the time ($n$) such that
one obtains a non-trivial limit. Let $\xi_n(t)$ be the sequence of random functions
of time ($t$) defined by
\[ \xi_n(t) = \frac{1}{\sigma\sqrt{n}} \sum_{j=1}^{[nt]} \big( X_j - \mu \big) , \]
where $[\ ]$ denotes the integer part. This function is piecewise constant and has
discontinuities for some rational values of $t$. One can also interpolate linearly
to obtain a continuous function. Note that this is a random function since it
depends on the random variables $X_j$. More generally, a random function on
$\mathbb{R}_+$ (or $\mathbb{R}$) is called a stochastic process.
An important result is that this sequence of processes converges to the
Brownian motion in a suitable sense. We recall that the Brownian motion
$B(t)$ is a real valued Gaussian stochastic process with zero average and such
that
\[ E\big( B_t B_s \big) = \min\{t, s\} . \]
We refer the reader to [5] or [14] for the definition of convergence and the proof.
We refer to [37] for the description of the original experimental observation
by Brown.
A related result is connected with the
question of convergence of the sequence of random variables $(S_n - n\mu)/\sqrt{n}$ to a Gaussian random variable.
This is the almost sure invariance principle.
Theorem 8. For any sequence $X_j$ of i.i.d. real random variables with zero
average, non-zero variance $\sigma^2$, and such that for some $\delta > 0$,
\[ E\big( |X_1|^{2+\delta} \big) < \infty , \]
there exists another (enriched) probability space with a sequence $\tilde S_n$ of real
valued random variables having the same joint distributions as the sequence
$S_n$, a Brownian motion $B(t)$, and two constants $C > 0$ and $0 < \epsilon < 1/2$
such that almost surely
\[ \big| \tilde S_n - \sigma B(n) \big| \le C\, n^{1/2 - \epsilon} . \]
In other words, there exists on this other probability space an integer valued
random variable $\tilde N$ such that for any $n > \tilde N$ the above inequality is satisfied.
Using the scaling properties of the Brownian motion, we have also (with $C' =
C/\sigma$)
\[ \left| \frac{\tilde S_n}{\sigma\sqrt{n}} - B(1) \right| \le C'\, n^{-\epsilon} . \]
We see how this result escapes from the difficulty mentioned about the formulation (3) by constructing in some sense a larger probability space which
contains the limit. We refer the reader to [28] or [33] for a proof.
We also stress an important consequence of the central limit theorem which
explains the ubiquity of the Brownian motion. A stochastic process (random
function) is called continuous if its realizations are almost surely continuous.
Theorem 9. Any continuous stochastic process with independent increments
has Gaussian increments.
We refer to [14] for a proof. It is also possible to express any such process in
terms of the Brownian motion. Indeed, if $\xi(t)$ is a continuous stochastic process
(with $\xi(0) = 0$) with independent increments, there are two (deterministic)
functions $e(t)$ and $\sigma(t)$ such that
\[ \xi(t) = e(t) + \int_0^t \sigma(s)\, dB_s . \]
The integral in the above formula has to be defined in a suitable way since the
function $B_s$ is almost surely not differentiable. We refer to [14] for the details.
In the physics literature, the derivative of $B$ (in fact a random distribution) is
known as a white noise. The independence of the increments reflects the fact
that the system is subject to a noise which is renewed at a rate much
faster than the typical rate of evolution of the system. We also refer to [37]
for more discussions on this subject.
4.6 Dependent random variables
There are many extensions of the CLT and of the above mentioned results to
the case of dependent random variables under different assumptions. A first
difficulty is that even for non-trivial random variables, the asymptotic variance
may vanish. Indeed, let $(Y_j)$ be a sequence of real i.i.d. random variables with
finite non-zero variance $\sigma^2$, and consider the sequence $(X_j)$ given by
\[ X_j = Y_{j+1} - Y_j . \]
It is easy to verify that $E\big(X_j\big) = 0$ and the common variance is $2\sigma^2 > 0$.
However, the sum telescopes and we have
\[ S_n = \sum_{j=1}^{n} X_j = Y_{n+1} - Y_1 , \]
which implies that $S_n/\sqrt{n}$ converges in law to zero. Also, the variance of
$S_n/\sqrt{n}$ is equal to $2\sigma^2/n$ and tends to zero when $n$ tends to infinity. We refer
to [19] for a general discussion around this phenomenon.
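The cancellation is easy to observe on a computer. The sketch below assumes, only for the sake of the example, that the $Y_j$ are independent signs $\pm 1$ (so $\sigma^2 = 1$); the function name and the seed are illustrative.

```python
import random


def telescoping_example(n: int, seed: int = 0):
    """Return (S_n, Y_{n+1} - Y_1) for X_j = Y_{j+1} - Y_j, Y_j = +/-1 i.i.d."""
    rng = random.Random(seed)
    ys = [rng.choice((-1, 1)) for _ in range(n + 1)]   # Y_1, ..., Y_{n+1}
    xs = [ys[j + 1] - ys[j] for j in range(n)]         # X_1, ..., X_n
    return sum(xs), ys[-1] - ys[0]
```

Whatever the value of $n$, the sum $S_n$ only takes the values $-2$, $0$ or $2$ here, so $S_n/\sqrt{n}$ tends to zero even though each $X_j$ has variance 2.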
For a sequence of non-independent random variables $(X_j)$, the variance of
$S_n/\sqrt{n}$ involves the correlation functions
\[ C_{i,j} = C(X_i, X_j) = E\big( X_i X_j \big) - E\big( X_i \big)\, E\big( X_j \big) . \]
If we moreover assume that the sequence is stationary (i.e. the joint distributions of $X_{i_1}, \ldots, X_{i_k}$ are equal to those of $X_{i_1 + l}, \ldots, X_{i_k + l}$ for any $k$, $i_1, \ldots, i_k$
and any $l > -\min\{i_1, \ldots, i_k\}$), then $C_{i,j}$ depends only on $|i - j|$. In this case,
if as a function of $|i - j|$, $|C_{i,j}|$ is summable, then
\[ \lim_{n\to\infty} E\Big( \Big( \frac{S_n - n\mu}{\sqrt{n}} \Big)^2 \Big) = E\big( (X_1 - \mu)^2 \big) + 2 \sum_{j=2}^{\infty} E\big( (X_1 - \mu)(X_j - \mu) \big) . \tag{4} \]
Under slightly stronger assumptions on the decay of correlations (and some
other technical hypotheses) one can prove a CLT and many other related
results. We refer to [30] for the precise hypotheses, proofs and references.
A situation where non-independent random variables appear naturally is
the case of dynamical systems. Consider for example the map of the unit
interval $f(x) = 2x \pmod 1$. It is easy to verify that the Lebesgue measure $\lambda$ is
invariant ($\lambda\big( f^{-1}(A) \big) = \lambda(A)$) and ergodic (the law of large numbers holds).
It is a probability measure on the set $\Omega = [0, 1]$. If $g$ is a real function, one
defines a sequence of identically distributed real random variables $X_j$ by
\[ X_j(x) = g\big( f^{j-1}(x) \big) \]
where $f^n$ denotes the $n$th iterate of $f$. Namely, $f^0(x) = x$, and for any integer
$n$, $f^n(x) = f\big( f^{n-1}(x) \big)$. In general the random variables $X_j$ are not independent. Note that here the randomness is only coming from the choice of the
initial condition $x$ under the Lebesgue measure. In this context it is natural
to ask about the asymptotic fluctuations of the ergodic sum
\[ S_n(x) = \sum_{j=1}^{n} g\big( f^{j-1}(x) \big) = \sum_{j=1}^{n} X_j , \]
and to wonder if there is a central limit theorem. In order to ensure that the
asymptotic variance does not vanish, one has to impose that $g$ is not of the
form $u - u \circ f$, and with this assumption one can prove a CLT. We refer to
[16] or [8] for the details and to [38] for more general cases.
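A numerical experiment with the doubling map needs some care: iterating $2x \pmod 1$ in floating point discards one binary digit of $x$ per step, so the orbit collapses to 0 after about 53 iterations. The Python sketch below avoids this by drawing the binary digits of $x$ at random and shifting them, which is what $f$ does to the binary expansion. The observable $g(x) = x - 1/2$, the number of digits kept, and the function names are assumptions made for this illustration. For this $g$ a direct computation gives $C_{1,1+k} = 2^{-k}/12$, so the right-hand side of (4) equals $1/12 + 2/12 = 1/4$.

```python
import random

BITS = 40  # number of binary digits kept when evaluating g (an assumption)


def ergodic_sum(n: int, rng: random.Random) -> float:
    """S_n(x) = sum_{j=1}^n g(f^{j-1}(x)) for g(x) = x - 1/2, x Lebesgue-random.

    The point x is represented by n + BITS random binary digits; applying
    f = 2x (mod 1) amounts to dropping the leading digit.
    """
    total_bits = n + BITS
    digits = rng.getrandbits(total_bits)
    mask = (1 << BITS) - 1
    s = 0.0
    for j in range(n):
        # leading BITS digits of f^j(x)
        window = (digits >> (total_bits - BITS - j)) & mask
        s += window / (1 << BITS) - 0.5
    return s


def variance_of_normalized_sum(n: int, trials: int, seed: int = 0) -> float:
    """Empirical variance of S_n / sqrt(n) over independent initial points."""
    rng = random.Random(seed)
    values = [ergodic_sum(n, rng) / n ** 0.5 for _ in range(trials)]
    mean = sum(values) / trials
    return sum((v - mean) ** 2 for v in values) / trials
```

With a few hundred iterates and about a thousand initial conditions, the empirical variance should be close to $1/4$, and noticeably larger than the value $1/12$ that independent uniform variables would give.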
Of course it may happen that even though the $X_j$ have a finite variance,
the quantity (4) diverges. This is for example the case for some observables
in a second order phase transition in Statistical Mechanics. One should use a
non-trivial normalization to understand the fluctuations. Some non-Gaussian
limiting distributions may then show up. We refer to [18] for a review of this
question in connection with probability theory.

5 Statistical Applications
The CLT is one of the main tools in statistics. For example it allows one to construct
confidence intervals for statistical tests. We refer to [7] for a detailed exposition
and many other statistical applications. There are also many results about
fluctuations of empirical distributions; we refer to [35] for more on this subject.

6 Large deviations

The CLT describes the fluctuations of order $\sqrt{n}$ of a sum of $n$ random variables having zero average. One can also ask what would be the probability of
observing a fluctuation of larger (untypical) size. For example, a giant fluctuation (large deviation) which would provide a wrong estimate of the average
(i.e. an anomaly in the law of large numbers). There are many results in this
direction starting with Chernoff's exponential bound. We will give some ideas
for the i.i.d. case, and refer to the literature for deeper results.
We will assume that for any real $s$, the random variable $\exp(sX_j)$ is integrable (existence of exponential moments). One can then define the sequence
of functions
\[ Z_n(s) = E\big( e^{sS_n} \big) . \tag{5} \]
Using the i.i.d. property, it follows immediately that
\[ Z_n(s) = \Big( E\big( e^{sX_1} \big) \Big)^n . \tag{6} \]
Therefore we immediately conclude the existence of the limit
\[ \lim_{n\to\infty} \frac{1}{n} \log Z_n(s) = P(s) = \log E\big( e^{sX_1} \big) . \]

We now come back to (5). If we want to know (estimate) the probability of
the event $S_n > n\alpha$ ($\alpha > 0$), we can obtain an upper bound as follows using
Chebyshev's inequality. Starting from (5) we have for any $s \ge 0$
\[ Z_n(s) \ge E\big( e^{sS_n}\, 1_{\{S_n > n\alpha\}} \big) \ge e^{ns\alpha}\, P\big( S_n > n\alpha \big) . \]
Using (6) we have
\[ P\big( S_n > n\alpha \big) \le e^{-n( s\alpha - P(s) )} . \]
$\alpha$ being kept fixed, we now choose $s$ optimally. In other words, we take the
value of $s$ maximizing $s\alpha - P(s)$. In doing so there appears the so called
Legendre transform of the function $P$ defined by
\[ \phi(\alpha) = \sup_s \big( s\alpha - P(s) \big) . \tag{7} \]
If the function $P$ is differentiable, the optimal $s$ is obtained by solving the
equation
\[ \alpha = \frac{dP}{ds}(s) . \tag{8} \]
One may wonder (and should wonder) if the solution is unique. It is easy to
see that $P$ is a convex function. We leave to the reader the interesting exercise
of computing $P'(s)$ and $P''(s)$ and to interpret the results, in particular for
$s = 0$. The solution of the problem (7) is unique unless $P$ has affine pieces.
This occurs in statistical mechanics in the presence of phase transitions (we
refer to [22] for more details). Finally, we have
\[ \limsup_{n\to\infty} \frac{1}{n} \log P\big( S_n > n\alpha \big) \le -\phi(\alpha) . \]

With some more work, one can also obtain a lower bound. The following result
is due to Plachky and Steinebach.
Theorem 10. Let $W_n$ be a sequence of real random variables and assume
that there exists a number $T > 0$ such that
i)
\[ Z_n(t) = E\big( e^{tW_n} \big) < \infty \]
for any $0 \le t < T$ and any integer $n$.
ii)
\[ P(t) = \lim_{n\to\infty} \frac{1}{n} \log Z_n(t) \]
exists for any $0 < t < T$, is differentiable on $(0, T)$, and $P'$ is strictly
monotone on the interval $(0, T)$.
Then for any
\[ \alpha \in \big\{ P'(t) \mid t \in (0, T) \big\} \]
we have
\[ \lim_{n\to\infty} \frac{1}{n} \log P\big( W_n > n\alpha \big) = -\phi(\alpha) \]
where
\[ \phi(\alpha) = \sup_{t \in (0,T)} \big( t\alpha - P(t) \big) . \]
We refer to [29] for a proof. In the present context, one applies this result
with $W_n = S_n - n\mu$, or $W_n = -S_n + n\mu$ to obtain information on the large
deviations in the other direction. In the case where $\mu = 0$, it is an interesting
exercise to compute the first and second derivatives of $\phi(\alpha)$ at $\alpha = 0$ and to
relate at least intuitively the above result to the CLT.
We now give an application to the (easy) case of the game of head or tail
discussed in the introduction. Formula (1) already solves the problem in this
case, namely one gets easily for $q > x > 0$ using Stirling's formula
\[ P\big( N(n) > n(p + x) \big) \le \frac{O(1)}{\sqrt{n}}\, e^{-n\left[ (q - x)\log(q - x) + (p + x)\log(p + x) - (q - x)\log q - (p + x)\log p \right]} . \]
In other words,
\[ \phi(p + x) = (q - x)\log(q - x) + (p + x)\log(p + x) - (q - x)\log q - (p + x)\log p . \tag{9} \]
A similar formula holds for the large deviations below $np$. Let us recover this
expression using the large deviation formalism (this is essentially the original
Chernoff bound). We first have to compute the partition function
\[ Z_n(s) = E\big( e^{s(N(n) - np)} \big) = e^{-nps} \sum_{r=0}^{n} e^{sr}\, P\big( N(n) = r \big) = e^{-nps} \big( p e^s + q \big)^n . \]
This immediately implies
\[ P(s) = \log\big( p e^s + q \big) - ps . \]
We therefore get from (8)
\[ \alpha = \frac{p e^s}{p e^s + q} - p . \]
After easy manipulations we obtain
\[ \phi(\alpha) = (\alpha + p)\log(\alpha + p) + (q - \alpha)\log(q - \alpha) - (\alpha + p)\log p - (q - \alpha)\log q , \]
which, with $\alpha = x$, is identical to the right-hand side of (9).
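The "easy manipulations" can be cross-checked numerically. The following Python sketch computes the Legendre transform (7) of $P(s) = \log(pe^s + q) - ps$ by a ternary search, which is legitimate because $s \mapsto s\alpha - P(s)$ is concave, and compares it with the closed form above. The search interval, the iteration count and the function names are illustrative assumptions.

```python
import math


def pressure(s: float, p: float) -> float:
    """P(s) = log(p e^s + q) - p s for the centred coin N(n) - np."""
    return math.log(p * math.exp(s) + (1.0 - p)) - p * s


def rate_closed_form(alpha: float, p: float) -> float:
    """phi(alpha) from the closed formula, valid for -p < alpha < q."""
    q = 1.0 - p
    return ((alpha + p) * math.log((alpha + p) / p)
            + (q - alpha) * math.log((q - alpha) / q))


def rate_numeric(alpha: float, p: float,
                 lo: float = -50.0, hi: float = 50.0,
                 iterations: int = 200) -> float:
    """sup_s (s alpha - P(s)) by ternary search on a concave function."""
    def objective(s: float) -> float:
        return s * alpha - pressure(s, p)

    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            lo = m1
        else:
            hi = m2
    return objective(0.5 * (lo + hi))
```

One can also check numerically that $\phi(0) = 0$, $\phi'(0) = 0$ and $\phi''(0) = 1/(pq)$, the inverse of the variance of one flip, which is the link with the CLT suggested in the exercise after Theorem 10.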

The initial paper by Lanford [22] is still a fundamental reference. In particular, it makes the connection with the formalism of Statistical Mechanics.
One can also refer to [29], [11], [10] and many other books and articles.
Note that these results are formulated in terms of the asymptotics of
\[ \frac{1}{n} \log P\big( S_n > n\alpha \big) . \]
In other words, they don't say anything on the behaviour of $e^{n\phi(\alpha)}\,
P\big( S_n > n\alpha \big)$ except that it should be sub-exponential. One can compare
for example with the more precise formula (9). For results in this direction
one can refer to [27], or [26]. This question also falls in the realm of recent
results concerning the so called concentration phenomenon (see [34]).

7 Multifractal measures
One among the numerous applications of large deviations is the analysis of the
multifractal behaviour of measures. We first introduce briefly this notion. In
order to simplify the discussion we will restrict ourselves to (positive) measures
on the unit interval, the extension to higher dimension being more or less
immediate. The driving question in the multifractal analysis of a (positive)
measure $\mu$ is what is the measure of a small interval. The simplest behaviour
that immediately comes to mind is that the measure of any interval of length
$r$ could be proportional to a power of $r$. More precisely, we will say that a measure $\mu$ is
monofractal if there is a number $\delta > 0$ and two positive numbers $C_1 < C_2$
such that for any point $x \in [0, 1]$ belonging to the support of $\mu$ and for any
$r > 0$ small enough
\[ C_1 r^{\delta} \le \mu\big( B_r(x) \big) \le C_2 r^{\delta} \tag{10} \]
where $B_r(x)$ is the interval $[x - r, x + r]$ (or more precisely $[x - r, x + r] \cap [0, 1]$).
The Lebesgue measure satisfies this property with $\delta = 1$ (with $C_2 = 2$ and
$C_1 = 1$ because of the boundary points). The number $\delta$ is intuitively related
to a dimension. If one considers the Lebesgue measure in dimension two, one
gets a similar relation with exponent two, and this extends immediately to
any dimension. We will say more about this below. Another interesting case
is the Cantor set $K$. This set can be defined easily as the set of real numbers
in $[0, 1]$ whose triadic expansion does not contain the digit one. In other words
\[ K = \Big\{ \sum_{j=1}^{\infty} \epsilon_j 3^{-j} \;\Big|\; \epsilon_j \in \{0, 2\} \Big\} . \]
It is well known (see [12] or [23]) that this set has dimension $\log 2/\log 3$. This
set can also be defined as the intersection of a decreasing sequence of finite
unions of closed intervals. Namely for any $n \ge 1$ and for any finite sequence
$\epsilon_1, \ldots, \epsilon_n$ of numbers 0 or 2, let
\[ x_{\epsilon_1, \ldots, \epsilon_n} = \sum_{j=1}^{n} \epsilon_j 3^{-j} , \]
and
\[ I_{\epsilon_1, \ldots, \epsilon_n} = \big[ x_{\epsilon_1, \ldots, \epsilon_n} ,\; x_{\epsilon_1, \ldots, \epsilon_n} + 3^{-n} \big] . \]
It is left to the reader to verify that
\[ K = \bigcap_{n} \; \bigcup_{\epsilon_1, \ldots, \epsilon_n} I_{\epsilon_1, \ldots, \epsilon_n} . \]
Since each interval $I_{\epsilon_1, \ldots, \epsilon_n}$ has length $3^{-n}$, and there are $2^n$ such intervals,
this almost immediately leads to the above mentioned fact that the (Hausdorff) dimension of $K$ is $\log 2/\log 3$. Let us now define a measure $\mu$ on $K$ (the
Cantor measure) by imposing
\[ \mu\big( I_{\epsilon_1, \ldots, \epsilon_n} \big) = 2^{-n} . \]
There are various ways to prove that this indeed defines a probability measure
supported by $K$. We refer to [12] or [23] for the details. We now check (10).
For $x \in K$ and a given $r > 0$ ($r < 1/3$), let $n$ be the unique integer such that
$3^{-n} \le r < 3^{-n+1}$. It is easy to check that there is a finite sequence $\epsilon_1, \ldots, \epsilon_n$
of numbers equal to 0 or 2 such that
\[ I_{\epsilon_1, \ldots, \epsilon_n} \subset B_r(x) \qquad \text{and} \qquad B_r(x) \cap K \subset I_{\epsilon_1, \ldots, \epsilon_{n-1}} . \]
Therefore
\[ 2^{-n} \le \mu\big( B_r(x) \big) \le 2^{-n+1} \]
and we obtain an estimate (10) with $\delta = \log 2/\log 3$ (it is left to the reader
to compute the two constants $C_1$ and $C_2$). We see again a relation between
$\delta$ and the dimension. This is a general fact discovered by Frostman, namely
if (10) holds, the dimension of any set of non-zero measure is at least $\delta$.
There is a converse to this result known as Frostman's Lemma. We refer to
[20] for the complete statement and a proof. We only sketch the proof of the
direct (easy) part. Recall (see [20], [12] or [23]) that the Hausdorff dimension
of a set $A$ is defined as follows. Let $B_{r_j}(x_j)$ be a sequence of balls covering
$A$, namely
\[ A \subset \bigcup_j B_{r_j}(x_j) . \]
For numbers $d > 0$ and $\epsilon > 0$, define
\[ H_d(\epsilon) = \inf_{A \subset \bigcup_j B_{r_j}(x_j) ,\ \sup_j r_j \le \epsilon} \; \sum_j r_j^d . \]
This is obviously a non-increasing function of $\epsilon$ which may diverge when $\epsilon$
tends to zero. Moreover, if the limit when $\epsilon \searrow 0$ is finite for some $d$, it is
equal to zero for any larger $d$. This limit is also non-increasing in $d$. Moreover,
if it is finite and non-zero for some $d$, it is infinite for any smaller $d$. The
Hausdorff dimension of $A$, denoted below by $d_H(A)$, is defined as the infimum
of the set of $d$ such that the limit vanishes (for this special $d$ the limit may
be infinite). This is also the supremum of the set of $d$ such that the limit is
infinite. Coming back to (10), let $B_{r_j}(x_j)$ be a sequence of balls covering a
set $A$. Using (10) we have immediately
\[ \mu(A) \le \sum_j \mu\big( B_{r_j}(x_j) \big) \le C_2 \sum_j r_j^{\delta} . \]
Therefore, if $\mu(A) > 0$,
\[ \lim_{\epsilon \searrow 0} H_{\delta}(\epsilon) > 0 , \]
and hence the Hausdorff dimension of $A$ is at least $\delta$.
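The scaling (10) for the Cantor measure can be observed numerically. The Python sketch below evaluates $\mu([0, x])$ (the Cantor function) from the ternary digits of $x$, using exact rational arithmetic for the digits, and then $\mu(B_r(x))$ as a difference. The function names and the cut-off of 60 ternary digits are illustrative assumptions. From $2^{-n} \le \mu(B_r(x)) \le 2^{-n+1}$ and $3^{-n} \le r < 3^{-n+1}$ one expects the ratio $\mu(B_r(x))/r^{\delta}$ to stay between $1/2$ and $2$ for $x \in K$.

```python
import math
from fractions import Fraction

DELTA = math.log(2.0) / math.log(3.0)


def cantor_function(x, digits: int = 60) -> float:
    """mu([0, x]) for the Cantor measure, read off the ternary digits of x."""
    if x <= 0:
        return 0.0
    if x >= 1:
        return 1.0
    x = Fraction(x)
    value, weight = 0.0, 0.5
    for _ in range(digits):
        x *= 3
        digit = int(x)          # next ternary digit: 0, 1 or 2
        x -= digit
        if digit == 1:          # x lies in a removed gap: the function is flat
            return value + weight
        if digit == 2:
            value += weight
        weight /= 2.0
    return value


def ball_measure(x, r) -> float:
    """mu(B_r(x)) with B_r(x) = [x - r, x + r] intersected with [0, 1]."""
    lo = max(x - r, 0)
    hi = min(x + r, 1)
    return cantor_function(hi) - cantor_function(lo)


def scaling_ratio(x, r) -> float:
    """mu(B_r(x)) / r^delta with delta = log 2 / log 3."""
    return ball_measure(x, r) / float(r) ** DELTA
```

For instance `scaling_ratio(Fraction(1, 4), Fraction(1, 1000))` uses the point $1/4 = 0.020202\ldots$ in base 3, which belongs to $K$.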

Monofractal measures are relatively simple objects and one often encounters more complicated situations. The notion of multifractal measures originated from theoretical investigations on turbulence. There a whole spectrum
of values for the exponent is allowed. The multifractal analysis is devoted to
understanding the characteristics of the sets where the exponent has a given
value. For $\alpha > 0$ we define the set
\[ E_{\alpha} = \Big\{ x \;\Big|\; \lim_{r \to 0} \frac{\log \mu\big( B_r(x) \big)}{\log r} = \alpha \Big\} . \tag{11} \]
Roughly speaking, if $x \in E_{\alpha}$, then $\mu\big( B_r(x) \big) \sim r^{\alpha}$. One way to say something
interesting about these sets is to compute their (Hausdorff) dimension. Note
that for monofractal measures, these sets are all empty except one which is
the support of the measure. In the simplest generalization, all these sets have
measure zero except one which has full measure. The corresponding $\alpha$ is called
in this case the dimension of the measure. We warn the reader that one can
construct wilder examples of measures where the set of $\alpha$ with $\mu\big( E_{\alpha} \big) > 0$
is of cardinality larger than one, and even wilder examples where the sets $E_{\alpha}$
are not well defined.
A way to obtain the Hausdorff dimension of the sets $E_\alpha$ is to use the thermodynamic formalism. This method works under some assumptions on the measure $\mu$ for which we refer the reader to the literature (see for example [9] or [1]). We first consider a sequence $\mathcal{A}_n$ of partitions of the support of $\mu$ with atoms of decreasing diameter. The simplest case is to use a partition with atoms of equal size, for example a dyadic partition. We also assume that the cardinality of $\mathcal{A}_n$ grows exponentially fast with $n$, more precisely that there is a number $\Lambda > 0$ such that
\[ \lim_{n \to \infty} \frac{1}{n} \log \#(\mathcal{A}_n) = \Lambda \]
where $\#(\,\cdot\,)$ denotes the cardinality. From now on we will only consider this case. We then consider the sequence of partition functions at inverse temperature $\beta$ defined by
\[ Z_n(\beta) = \frac{1}{\#(\mathcal{A}_n)} \sum_{I \in \mathcal{A}_n} \mu(I)^{\beta} . \]
The parameter $\beta$ may be chosen positive or negative (this is a difference with standard Statistical Mechanics where temperature, which is proportional to the inverse of $\beta$, is non-negative). We now define the pressure function $P(\beta)$ (if it exists) by
\[ P(\beta) = \lim_{n \to \infty} \frac{1}{n} \log Z_n(\beta) . \]
Note that
\[ -\frac{d \log Z_n}{d\beta}(1) = -\sum_{I \in \mathcal{A}_n} \mu(I) \log \mu(I) \]
which is the entropy of $\mu$ with respect to the partition $\mathcal{A}_n$. For $\beta \neq 1$, the quantity $P(\beta)/(\beta - 1)$ is often referred to as the Rényi entropy. At this point, provided the function $P$ is non-trivial, we can make the link with large deviations. For this purpose, we first define a sequence $(W_n)$ of random variables. The values of $W_n$ are the numbers $\log \mu(I)$ ($I \in \mathcal{A}_n$) and the corresponding probability is $1/\#(\mathcal{A}_n)$. Of course, we should take care that $\mu(I) \neq 0$. However, if $\mu(I) = 0$ we can simply ignore the atom $I$ since it does not belong to the support of $\mu$. We now immediately see that our partition function $Z_n(\beta)$ is exactly the expectation appearing in hypothesis i) of Theorem 10. Therefore, if the second hypothesis ii) of this Theorem is also satisfied, we get
\[ \lim_{n \to \infty} \frac{1}{n} \log \frac{\#\{ I \in \mathcal{A}_n \mid \mu(I) \approx e^{n\gamma} \}}{e^{n\Lambda}} = -\varphi(\gamma) . \]
Here we assume a little more than the conclusion of Theorem 10, namely that instead of having information about those atoms $I$ for which $\mu(I) > e^{n\gamma}$, we have information for $\mu(I) \approx e^{n\gamma}$. This follows easily if $\varphi(\gamma)$ is differentiable with non-zero derivative for this value of $\gamma$.
From this result we can come back to the Hausdorff dimension of the sets $E_\alpha$. For this purpose, we will assume that there is a number $\rho \in (0, 1)$ such that all atoms of $\mathcal{A}_n$ are intervals of length $\rho^n$ (uniform partition). Therefore, using definition (11), we need about $e^{n\Lambda} e^{-n\varphi(\alpha \log \rho)}$ balls of radius $\rho^n$ to cover $E_\alpha$. In other words
\[ \inf_{E_\alpha \subset \bigcup_j B_{r_j}(x_j),\ \sup_j r_j \le \rho^n} \ \sum_j r_j^{d} \ \lesssim\ \rho^{nd}\, e^{-n\varphi(\alpha \log \rho)}\, e^{n\Lambda} . \]
If $d > (\varphi(\alpha \log \rho) - \Lambda)/\log \rho$, the above quantity tends to zero when $n$ tends to infinity and we conclude that $d_H(E_\alpha) \le (\varphi(\alpha \log \rho) - \Lambda)/\log \rho$. Under some bounded distortion properties, one can prove that this is also a lower bound (see [9] and [1]).
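As an easy example consider on the Cantor set $K$ the measure $\mu$ defined for $0 < p < 1$ (and $q = 1 - p$) by
\[ \mu\bigl(I_{\epsilon_1, \ldots, \epsilon_n}\bigr) = p^{\sum_{j=1}^{n} \epsilon_j/2}\ q^{\,n - \sum_{j=1}^{n} \epsilon_j/2} . \]
When $p = q = 1/2$ we get the Cantor measure defined above which has a trivial mono-fractal structure. From now on we assume $p \neq q$. Consider the sequence of partitions $\mathcal{A}_n = \{ I_{\epsilon_1, \ldots, \epsilon_n} \}$. It is easy to prove that the multifractal formalism applies to this measure using the large deviation results previously established. One gets, since $\Lambda = \log 2$,
\[ P(\beta) = \log\bigl(p^{\beta} + q^{\beta}\bigr) - \log 2 , \]
and assuming for example $p > q$ (the other case is left to the reader) it follows that
\[ s = \frac{\log(\log p - \gamma) - \log(\gamma - \log q)}{\log q - \log p} , \]
and one deduces immediately
\[ \varphi(\gamma) = \frac{\gamma - \log q}{\log p - \log q} \log(\gamma - \log q) + \frac{\log p - \gamma}{\log p - \log q} \log(\log p - \gamma) + \log 2 - \log(\log p - \log q) . \]
Therefore, since $\rho = 1/3$ we get $\gamma = -\alpha \log 3$ (see the definition (11)) and
\[ d_H(E_\alpha) = -\frac{\log_3 p + \alpha}{\log_3 p - \log_3 q} \log_3(\log_3 p + \alpha) + \frac{\log_3 q + \alpha}{\log_3 p - \log_3 q} \log_3(-\log_3 q - \alpha) + \log_3(\log_3 p - \log_3 q) . \]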

When we consider the graph of $d_H(E_\alpha)$ in figure 1, there are four particularly interesting points. First of all there are the two extreme points $\alpha_{\min} = -\log_3 p$ and $\alpha_{\max} = -\log_3 q$ where $d_H(E_\alpha)$ vanishes (recall that $p > q$). These correspond respectively to the largest and smallest measure of atoms of fixed size, namely for an atom $I$ of $\mathcal{A}_n$ we have $q^n \le \mu(I) \le p^n$, and there is only one atom reaching each bound. The constraint $-\log_3 p \le \alpha \le -\log_3 q$ can also be deduced from the formula
\[ -\alpha \log 3 = \frac{p^s \log p + q^s \log q}{p^s + q^s} . \]

[Figure 1: graph of $d_H(E_\alpha)$ against $\alpha$, with $\alpha_{\min}$ and $\alpha_{\max}$ marked on the horizontal axis.]

Fig. 1. The Hausdorff dimension $d_H(E_\alpha)$ as a function of $\alpha$ for $p = 2/3$.

There is a unique maximum for $d_H(E_\alpha)$ at $\alpha = -(\log_3 p + \log_3 q)/2$ which gives $d_H(E_\alpha) = \log 2/\log 3$, namely the Hausdorff dimension of the support $K$ of $\mu$. Note that it corresponds to the value of $\alpha$ such that $E_\alpha$ is covered by the largest number of atoms of $\mathcal{A}_n$. However the total contribution of these atoms to the weight of $\mu$ is asymptotically negligible, namely
\[ \sum_{I \in \mathcal{A}_n,\ \mu(I) \approx e^{-n\alpha \log 3}} \mu(I) \ \approx\ 2^n e^{-n\alpha \log 3} e^{-n\varphi(-\alpha \log 3)} \ \approx\ (2\sqrt{pq})^n \]
which tends to zero when $n$ tends to infinity since for $p \neq 1/2$ one has $p(1-p) < 1/4$.
Finally there is the point $\alpha_H = -p \log_3 p - q \log_3 q$ where the slope is equal to one. Since $\varphi'(\gamma) = s$, this gives $s = 1$. Note also that by the normalization of the probability measure $\mu$, we have for each $\alpha$
\[ \sum_{I \in \mathcal{A}_n,\ \mu(I) \approx e^{n\gamma}} \mu(I) \ \approx\ 2^n e^{n\gamma} e^{-n\varphi(\gamma)} \ \le\ 2^n e^{nP(1)} \]
where $\gamma = -\alpha \log 3$. In particular we obtain immediately
\[ d_H(E_\alpha) \le \alpha . \]
We can have equality only if $P(s) = -\log 2$, and this must be a tangency. This equation leads immediately to $p^s + q^s = 1$, and since $p + q = 1$ we deduce $s = 1$. From $\gamma = P'(s)$ we deduce $\gamma = p \log p + q \log q$, and an easy calculation leads indeed to $\varphi'(-\alpha_H \log 3) = 1$. The number $d_H(E_{\alpha_H}) = \alpha_H$ is called the dimension of the measure $\mu$. This quantity is defined in general as the infimum of the dimensions of sets of measure one. We see here that because of the multifractal behaviour, this dimension is strictly smaller than the dimension of the support (since $p \neq q$). We also see again a phenomenon reminiscent of Statistical Mechanics. If we consider atoms of a given size, only a very small percentage contribute to the total mass of the measure. Moreover, these atoms have about the same measure (roughly equal to the inverse of their number).
For $p = q = 1/2$, the curve collapses to one point and we recover the monofractal Cantor measure.
We refer to the literature (see for example [9] and [1]) for other examples and extensions.
We also mention that although most of the sets $E_\alpha$ have zero measure, they may be of positive (and even full) measure for another measure. This is why in certain situations they may become important and in fact observable.
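The Cantor-measure example can be checked numerically. The sketch below is only an illustration under stated assumptions: it takes the pressure as $P(\beta) = \log(p^\beta + q^\beta) - \log 2$, parametrizes the spectrum by the conjugate variable $s$ through $\gamma = P'(s)$, $\varphi(\gamma) = s\gamma - P(s)$, $\alpha = -\gamma/\log 3$ and $d_H(E_\alpha) = (\log 2 - \varphi(\gamma))/\log 3$, and uses function names (`pressure`, `partition_function`, `spectrum_point`) that are hypothetical and not taken from the text.

```python
import math


def pressure(beta, p):
    """Assumed pressure of the Cantor measure: log(p^beta + q^beta) - log 2."""
    q = 1.0 - p
    return math.log(p ** beta + q ** beta) - math.log(2.0)


def partition_function(n, beta, p):
    """Z_n(beta): average of mu(I)^beta over the 2^n atoms of generation n.

    Atoms are grouped by the number k of factors p in their measure.
    """
    q = 1.0 - p
    total = sum(math.comb(n, k) * (p ** k * q ** (n - k)) ** beta
                for k in range(n + 1))
    return total / 2 ** n


def spectrum_point(s, p):
    """Return (alpha, d_H(E_alpha)) for the conjugate parameter s."""
    q = 1.0 - p
    ps, qs = p ** s, q ** s
    gamma = (ps * math.log(p) + qs * math.log(q)) / (ps + qs)  # gamma = P'(s)
    phi = s * gamma - pressure(s, p)                           # Legendre transform
    alpha = -gamma / math.log(3.0)
    dimension = (math.log(2.0) - phi) / math.log(3.0)
    return alpha, dimension


if __name__ == "__main__":
    p = 2.0 / 3.0
    for s in (-5.0, 0.0, 1.0, 5.0):
        alpha, dim = spectrum_point(s, p)
        print(f"s = {s:5.1f}   alpha = {alpha:.4f}   d_H = {dim:.4f}")
```

With this parametrization, $s = 0$ corresponds to the maximum of the curve and $s = 1$ to the point where the curve touches the diagonal $d_H(E_\alpha) = \alpha$.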

References
1. L. Barreira, Y. Pesin, J. Schmeling. On a general concept of multifractality: multifractal spectra for dimensions, entropies, and Lyapunov exponents.
Multifractal rigidity. Chaos 7:27-38 (1997).
2. A. Barron, O. Johnson. Fisher information inequality and the central limit
theorem. http://arXiv.org/abs/math/0111020.
3. I. Berkes, E. Csáki. A universal result in almost sure central limit theory.
Stochastic Process. Appl. 94:105-134 (2001).
4. R.N. Bhattacharya, R. Ranga Rao. Normal approximations and asymptotic
expansion. Krieger, Melbourne Fla. 1986.
5. P. Billingsley. Convergence of Probability Measures. John Wiley & Sons,
New York 1968.
6. A. Borovkov. Boundary-value problems, the invariance principle, and large
deviations. Russian Math. Surveys 38:259-290 (1983).
7. A. Borovkov. Statistique Mathématique. Éditions Mir, Moscou 1987.
8. P. Collet. Ergodic properties of maps of the interval. In Dynamical Systems.
R. Bamon, J.-M. Gambaudo & S. Martínez éditeurs, Hermann, Paris 1996.
9. P. Collet, J. Lebowitz, A. Porzio. The dimension spectrum of some dynamical systems. J. Statist. Phys. 47:609-644 (1987).
10. A. Dembo, O. Zeitouni. Large Deviation Techniques and Applications. Jones
and Bartlett, Boston 1993.

11. R.S. Ellis. Entropy, Large Deviations, and Statistical Mechanics. Springer,
Berlin 1985.
12. K.J. Falconer. The geometry of fractal sets. Cambridge Tracts in Mathematics, 85. Cambridge University Press, Cambridge, 1986.
13. W. Feller. An introduction to Probability Theory and its Applications I, II.
John Wiley & Sons, New York, 1966.
14. I. Guikhman, A. Skorokhod. Introduction à la Théorie des Processus
Aléatoires. Éditions Mir, Moscou 1980.
15. P. Halmos. Measure Theory. D. Van Nostrand Company, Inc., New York,
N. Y., 1950.
16. F. Hofbauer, G. Keller. Ergodic properties of invariant measures for piecewise monotonic transformations. Math. Zeit. 180:119-140 (1982).
17. E.T. Jaynes. Probability Theory, The Logic of Science. Cambridge University
Press, Cambridge 2004.
18. G. Jona-Lasinio. Renormalization group and probability theory. Physics Report 352:439-458 (2001).
19. A. Kachurovskii. The rate of convergence in ergodic theorems. Russian Math.
Survey 51:73-124 (1996).
20. J.-P. Kahane. Some random series of functions. Cambridge University press,
Cambridge 1985.
21. A.I. Khinchin. Mathematical Foundations of Statistical Mechanics. Dover, New
York 1949.
22. O.E. Lanford III. Entropy and equilibrium states in classical statistical mechanics. In Statistical Mechanics and Mathematical Problems. A. Lenard editor,
Lecture Notes in Physics 20, Springer, Berlin 1973.
23. P. Mattila. Geometry of sets and measures in Euclidean spaces. Fractals and
rectifiability. Cambridge Studies in Advanced Mathematics, 44. Cambridge University Press, Cambridge, 1995.
24. M. Métivier. Notions fondamentales de la théorie des probabilités. Dunod,
Paris 1968.
25. J. Neveu. Calcul des Probabilités. Masson, Paris 1970.
26. P. Ney. Notes on dominating points and large deviations. Resenhas 4:79-91
(1999).
27. V.V. Petrov. Limit Theorems of Probability Theory. Sequences of independent
random variables. Clarendon Press, Oxford 1995.
28. W. Philipp, W. Stout. Almost sure invariance principles for partial sums of
weakly dependent random variables. Memoirs of the AMS, 161:1975.
29. D. Plachky, J. Steinebach. A theorem about probabilities of large deviations
with an application to queuing theory. Periodica Mathematica 6:343-345 (1975).
30. E. Rio. Théorie asymptotique des processus aléatoires faiblement dépendants.
Springer, Berlin 2000.
31. L. Schwartz. Cours d'Analyse de l'École Polytechnique. Hermann, Paris 1967.
32. C. Stein. Approximate Computations of Expectations. IMS, Hayward Cal.
1986.
33. W. Stout. Almost Sure Convergence. Academic Press, New York 1974.
34. M. Talagrand. Concentration of measure and isoperimetric inequalities in
product spaces. Inst. Hautes Études Sci. Publ. Math. 81:73-205 (1995).
35. W. van der Vaart, J. Wellner. Weak convergence and empirical processes :
with applications to statistics. Springer, Berlin 1996.

36. H. Ventsel. Théorie des Probabilités. Éditions Mir, Moscou 1973.

37. N. Wax. Selected Papers on Noise and Stochastic Processes. Dover, New York
1954.
38. L.S. Young. Recurrence times and rates of mixing. Israel J. Math. 110:153-188
(1999).

Distribution of the Roots of Certain Random Real Polynomials

Bénédicte Dujardin
Département Artemis, Observatoire de la Côte d'Azur,
BP 4229, 06304 Nice Cedex 4, France.
dujardin@obs-nice.fr

1 Introduction
Random polynomials appear naturally in different fields of physics, like quantum chaotic dynamics, where one has to study the statistical properties of wavefunctions of chaotic systems and the distribution of their zeros [2, 11]. Our personal interest lies rather in their connection with noisy data analysis, especially in the context of the linear parametric modeling of random processes [5, 3, 16] and the problem of resonance recognition, i.e. the identification of the poles of rational estimators of the power spectrum, computed from a measured sample of the signal.
In this contribution we address a probabilistic question concerning the real and complex roots of certain classes of random polynomials, the coefficients of which are random real numbers. The roots of such polynomials are random variables, real or complex conjugates, and one is interested in the mathematical expectation of their distribution in the complex plane, according to the degree of the polynomial and the statistics of its coefficients.
Because of mathematical simplicity, we study in section 2 the statistics of the real roots of polynomials with real random Gaussian coefficients. This material is taken from the historical papers by Kac [8, 9] and subsequent works [6, 4, 12]. In section 3 several directions of generalization are introduced; we investigate the statistics of the roots in the whole complex plane, and introduce the notion of generalized monic polynomials. We just give an outline of the derivation of the density of complex roots by recalling the passage from the real case to the complex one, the proof of which can be found in [13]. We use this result in order to understand and characterize the behavior of the roots in the two extreme cases, homogeneous random polynomials on the one hand, monic polynomials with weak disorder on the other hand. We briefly look at the particular class of self-inversive random polynomials [2], whose roots have an interesting behavior on the unit circle; the case of random complex coefficients is just mentioned in the conclusion.

J.-D. Fournier et al. (Eds.): Harm. Analysis and Ratio. Approx., LNCIS 327, pp. 129–143, 2006.
© Springer-Verlag Berlin Heidelberg 2006

2 Real roots
The first problem about random polynomials is the question of the average number of real roots of a polynomial of degree $n$ and was solved by Kac in the 40s [8] in the simple case of coefficients $a_k$ independent identically distributed (i.i.d.) with Gaussian probability density function (pdf) $N(0, 1)$ of average zero and variance 1. Let
\[ P_n(z) = \sum_{k=0}^{n} a_k z^k \tag{1} \]
be a random polynomial, and $N_n$ the number of different real roots, called $t_k^{(n)}$, of $P_n$. In the following it is assumed that the probability to have multiple roots is negligible. $N_n$ is the integral on the real axis of the counting measure
\[ \rho_n(t) \equiv \sum_{k=1}^{N_n} \delta\bigl(t - t_k^{(n)}\bigr) = |P_n'(t)|\, \delta(P_n(t)), \tag{2} \]
the Jacobian being due to the change of independent variable in the Dirac distribution. $\rho_n(t)$ is actually the exact density of the roots for a given realization, and one can calculate its mathematical expectation
\[ \bar\rho_n(t) \equiv E(\rho_n(t)) = \int dP\, dP'\ \mathcal{P}(P, P')\, |P'|\, \delta(P) \tag{3} \]
considering, for any fixed $t \in \mathbb{R}$, $P_n(t)$ and $P_n'(t)$ as two correlated random variables written $P$ and $P'$. Let us now make the hypothesis that the $a_k$ are i.i.d. $N(0, 1)$; since $P_n(t)$ and $P_n'(t)$ are linear combinations of the $a_k$, they are themselves Gaussian variables with zero mean and joint pdf
\[ \mathcal{P}(P, P') = \frac{1}{2\pi \sqrt{\det(C)}} \exp\Bigl\{ -\frac{1}{2}\, (P, P')\, C^{-1} \begin{pmatrix} P \\ P' \end{pmatrix} \Bigr\} \tag{4} \]
given the correlation matrix
\[ C = \begin{pmatrix} E(P^2) & E(P P') \\ E(P P') & E(P'^2) \end{pmatrix} . \]
Equations (3) and (4) lead to the average density
\[ \bar\rho_n(t) = \frac{1}{2\pi\sqrt{\Delta}} \int dP'\, |P'| \exp\Bigl\{ -\frac{E(P^2)\, P'^2}{2\Delta} \Bigr\} = \frac{\sqrt{\Delta}}{\pi\, E(P^2)} , \tag{5} \]
where $\Delta = E(P^2) E(P'^2) - E(P P')^2$.
Replacing $P$ and $P'$ by their respective values $\sum_0^n a_k t^k$ and $\sum_0^n k\, a_k t^{k-1}$ we obtain the exact formula of the average density of real roots
\[ \bar\rho_n(t) = \frac{1}{\pi} \Bigl[ \frac{1}{(1 - t^2)^2} - \frac{(n+1)^2\, t^{2n}}{(1 - t^{2n+2})^2} \Bigr]^{1/2} . \tag{6} \]
Figure 1 shows the distribution of real roots for $n = 10$ and 100. The dotted line is the theoretical density $\bar\rho_n(t)$ given by (6); it has two peaks centered near $\pm 1$ and these peaks get narrower when the degree of the polynomial $n$ increases. The black line is a histogram over 1000 realizations of the real roots of random polynomials as defined by (1), and we see that the simulations match the theory quite well.
[Figure 1: two panels, n = 10 and n = 100, each showing the theoretical density and the histogram of real roots against t.]

Fig. 1. Density of real roots and histograms over 1000 realizations for polynomials
of degrees n = 10 and n = 100.
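The comparison between theory and simulation described above can be reproduced along the following lines. This is a minimal sketch, not the author's code: it evaluates the density in the form (5) using $E(P^2) = \sum t^{2k}$, $E(PP') = \sum k\,t^{2k-1}$ and $E(P'^2) = \sum k^2 t^{2k-2}$ for i.i.d. $N(0,1)$ coefficients, compares it with the closed form (6), and integrates it numerically. The function names, the midpoint rule and the use of the symmetries $t \to -t$ and $t \to 1/t$ to reduce the integral to $(0, 1)$ are choices made for this illustration.

```python
import numpy as np


def kac_rice_density(t, n):
    """Average density of real roots in the form (5): sqrt(Delta) / (pi E(P^2))."""
    t = np.asarray(t, dtype=float)[..., None]
    k = np.arange(1, n + 1)
    e_pp = 1.0 + np.sum(t ** (2 * k), axis=-1)          # E(P^2)  = sum_{k=0}^n t^(2k)
    e_pd = np.sum(k * t ** (2 * k - 1), axis=-1)        # E(P P') = sum k t^(2k-1)
    e_dd = np.sum(k ** 2 * t ** (2 * k - 2), axis=-1)   # E(P'^2) = sum k^2 t^(2k-2)
    delta = np.maximum(e_pp * e_dd - e_pd ** 2, 0.0)    # guard against round-off
    return np.sqrt(delta) / (np.pi * e_pp)


def kac_closed_form(t, n):
    """Closed form (6); numerically delicate very close to t = +1 or t = -1."""
    t = np.asarray(t, dtype=float)
    a = 1.0 / (1.0 - t ** 2) ** 2
    b = (n + 1) ** 2 * t ** (2 * n) / (1.0 - t ** (2 * n + 2)) ** 2
    return np.sqrt(a - b) / np.pi


def expected_real_roots(n, m=20000):
    """E(N_n) by the midpoint rule on (0, 1), times 4 by symmetry."""
    t = (np.arange(m) + 0.5) / m
    return 4.0 * np.mean(kac_rice_density(t, n))


def simulated_real_roots(n, trials, rng):
    """Monte Carlo estimate of the mean number of real roots."""
    count = 0
    for _ in range(trials):
        roots = np.roots(rng.standard_normal(n + 1))
        count += int(np.sum(np.abs(np.imag(roots)) < 1e-9))
    return count / trials


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    for n in (10, 100):
        print(n, expected_real_roots(n), simulated_real_roots(n, 1000, rng),
              2.0 / np.pi * np.log(n))
```

The sum form is used for the quadrature because it avoids the cancellation between the two large terms of (6) near $t = \pm 1$.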

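The number of real roots is then
\[ N_n = \int_{-\infty}^{\infty} dt\ \rho_n(t). \tag{7} \]
Since the integrals on $t$ and on $P$, $P'$ can be inverted, the average number of real roots is
\[ E(N_n) = \int_{-\infty}^{\infty} dt\ \bar\rho_n(t) = \frac{1}{\pi} \int_{-\infty}^{\infty} dt\, \Bigl[ \frac{1}{(1 - t^2)^2} - \frac{(n+1)^2\, t^{2n}}{(1 - t^{2n+2})^2} \Bigr]^{1/2} . \tag{8} \]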

2
1
2
{ln n + ln(2 )} E(Nn ) {ln n + ln 2 + 4 3} n N,

(9)

This integral is bounded by

so for large degrees the main term of $E(N_n)$ varies like $\frac{2}{\pi} \ln n$, as illustrated in Fig. 2 with numerical simulations.
Several other works have been carried out on this problem, relaxing the hypothesis of independence or gaussianity of the coefficients $a_k$. Littlewood and Offord [12] worked with uniform and bimodal distributions of the $a_k$, and proved that for large degrees the order of magnitude of $E(N_n)$ kept on growing like $\frac{2}{\pi} \ln n$.

[Figure 2: average number of real roots $\langle N_n \rangle$ against the degree $n$ of the random polynomial, compared with the curve $\frac{2}{\pi}\{\ln n + \ln 2\}$.]

Fig. 2. Average number of real roots of $P_n$ over 1000 realizations, as a function of the degree of the polynomial $n$.

Edelman and Kostlan [4] developed another method based on geometrical considerations that led them to Kac's formula (6). Their method, just like the
calculation above, can be generalized to correlated and non-centered Gaussian
coecients, as long as the joint pdf P(P, P ) is Gaussian. The average density
is not so easy to write, mainly because of the emergence of an error function
erf, but the asymptotic limit for E(Nn ) remains generally valid. Ibragimov
and Maslova [6] extended this result to any i.i.d. centered variables whose
probability laws belong to the basin of attraction of the normal law according
to the central limit theorem.
The logarithmic growth of the average number of real roots as a function
of the degree is thus a common feature of random real polynomials. Although
its Lebesgue measure is zero, the real axis of the complex plane, which is a
symmetry axis for the set of the roots, is a singularity for the distribution of
roots.

3 Complex roots
3.1 Complex roots
We are now interested in the average distribution in the whole complex plane of the roots of the random polynomial $P_n$, at least in the limit $n \gg 1$. The same argument as in section 2 can be applied so we get an integral formula for the roots density, with slight modifications. The counting measure in the plane is
\[ \rho_n(z) = \sum_k \delta^{(2)}\bigl(z - z_k^{(n)}\bigr) = |P_n'(z)|^2\, \delta^{(2)}(P_n(z)), \qquad z \in \mathbb{C}. \tag{10} \]
The change between formulae (2) and (10) is due to the transition from a 1-dimensional space to a 2-dimensional space, and to the holomorphy of $P_n$. The notation is the same as in section 2, $P$ and $P'$ being the random variables that are the polynomial and its derivative at point $z$. Those quantities are complex, since $z$ is complex, so we actually have 4 random variables, the real and imaginary parts of $P$ and $P'$, that we must take into account when writing the average density of roots
\[ \bar\rho_n(z) = \int d^2 P'\ |P'|^2\ \mathcal{P}\bigl(0, 0, \mathrm{Re}(P'), \mathrm{Im}(P')\bigr). \tag{11} \]
The average number of roots in a domain $\Omega \subset \mathbb{C}$ is the integral
\[ E(N_n(\Omega)) = \int_\Omega d^2 z\ \bar\rho_n(z). \tag{12} \]

3.2 Generalized monic polynomials

Motivated by the remarkable properties of the roots of Szegő polynomials, Mezincescu et al. [13] became interested in the case of monic polynomials, and more widely, in the class of random generalized monic polynomials, defined as
\[ P_n(z) = \Phi(z) + \sum_{k=0}^{n} a_k f_k(z), \qquad z \in \mathbb{C}. \tag{13} \]
$\Phi, f_0, \ldots, f_n$ are holomorphic functions, and the $a_k$ are the real random coefficients. We will later focus on two cases of particular interest: taking $\Phi = 0$ and $f_k = z^k$ returns a homogeneous random polynomial as studied in section 2; taking $\Phi(z)$ as a polynomial of degree $n$ and $f_k = z^k$, we get a monic polynomial in the classical sense.
With the hypothesis that the joint pdf of $P$ and $P'$ is Gaussian, computing the integral over $P'$ becomes possible and leads to the density of complex roots. Let us suppose that the $a_k$ are i.i.d. $N(0, 1)$; this hypothesis is not restrictive, since a judicious choice of $\Phi$ and $f_k$ allows one to reduce systematically the coefficients to zero-mean and same-variance random variables.
Working with 4 random variables instead of 2 makes the calculations more mathematically intensive but does not change the principle, so we just give the final result. Let us first introduce some notations adapted to the problem
[13, 14, 16]. Let v and w be two complex vectors of dimension n, (v, w) is
dened as the 2 2 matrix
(v, w)

Re(v) Re(w) Re(v) Im(w)

Im(v) Re(w) Im(v) Im(w)

(14)

The transposed column vector of (f0 (z), . . . , fn (z)) and its derivative are written f and f is considered as a 2-dimensional vector (Re((z)), Im((z)))
of derivative . With those notations, the mathematical expectation of the
counting measure of the complex roots is, at the points where det(f , f ) = 0,

1
2

det(f , f )

exp 2 (f ,f )

(15)

Tr[(f , f ) (f , f )(f , f )1 (f , f )] + (f , f )(f , f )1

3.3 Strong disorder limit: classical homogeneous polynomial

When $\Phi = 0$ and $f_k = z^k$, the polynomial is a homogeneous random polynomial of degree $n$. The average pdf of its complex roots can be explicitly computed for any $z = r e^{i\theta} \in \mathbb{C} \setminus \mathbb{R}$ according to
1
2

1
det(f , f )

Tr[(f , f ) (f , f )(f , f )1 (f , f )].

(16)

This expression is a function of r, and n, plotted in Fig. 3 for n = 10 and

100. As one can see, the area of the plane where this function is not negligible
is an annulus around the unit circle that becomes narrower when the degree
n increases. This result is not valid on the real axis, where det(f , f ) = 0;
function (16) is equal to zero, as it appears on the plot, but we have already
seen in section 2 that the global measure has a singular component on the
real axis given by (6).

[Figure 3: density of complex roots of $P_n$ over the plane (Re(z), Im(z)), panels n = 10 and n = 100.]

Fig. 3. 3-dimensional representation of the density of complex roots of a homogeneous random polynomial of degree n, for n = 10 (left) and 100 (right).

In the domain of the plane dened by

1 1 r2n+2
sin
ln r 1 + r2n+2

z = rei ,

(17)

i.e. close to the unit circle and far enough from the real axis, as shown in Fig. 4
for n = 10, and which corresponds to the interesting area of strong density,
the average density can be approximated by

[Figure 4: interesting domain for the study of the density of complex roots, n = 10.]

Fig. 4. 3-dimensional representation of the function (17) for n = 10.

n (re

1
1
(n + 1)2 r2n+2
.

2
2
(ln r )
(1 r2n+2 )2

(18)

With this approximation, $\bar\rho_n(z)$ is a function of the radial variable $r$ only and is independent of the angular variable $\theta$, which implies a certain uniformity of the angular distribution of the roots. The curve, plotted in Fig. 5 for $n = 10$ and 100, has a peak for $r = 1$.
This behavior is characterized by two asymptotic results in the limit $n \gg 1$, still valid in the case of an $\alpha$-stable distribution of the coefficients [7]. The first theorem concerns the fraction of roots in a disc of radius $R$
\[ \frac{1}{n}\, E\bigl(N_n(D(0, R))\bigr) \xrightarrow[n \to \infty]{} 0 \quad \text{in probability} \quad \forall R < 1. \tag{19} \]
Since the $a_k$ are i.i.d., the statistics of the distribution of complex roots is invariant under the transformation $z \to z^{-1}$, and (19) implies that most of the roots are present in a neighborhood of the unit circle. The other result concerns the fraction of roots in an angular sector $[\theta_1, \theta_2]$
\[ \frac{1}{n}\, E\bigl(N_n(\theta_1, \theta_2)\bigr) \xrightarrow[n \to \infty]{} \frac{|\theta_2 - \theta_1|}{2\pi} \quad \text{in probability} \quad \forall [\theta_1, \theta_2] \subset\ ]0, \pi[. \tag{20} \]
Because of the symmetry with regards to the real axis, and apart from its singularity, formula (20) implies an angular uniform distribution. In Fig. 5 are plotted, on the left, the 10000 roots of 1000 random polynomials of degree 10. We observe the strong concentration of points around the unit circle, and the singularity of the real axis. On the right are plotted histograms of the moduli of the complex (non-real) roots for $n = 10$ and 100, and with dotted lines the asymptotic curves given by (18); for the angular distribution, see Fig. 10.
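The concentration near the unit circle and the angular uniformity can be observed with a short simulation. The following is a sketch under assumptions of our own (degree 50, 200 realizations, an annulus $0.8 < |z| < 1.25$ and six angular bins on $]0, \pi[$); none of these choices comes from the text, and `sample_roots` is a hypothetical helper name.

```python
import numpy as np


def sample_roots(n, trials, rng):
    """Roots of `trials` polynomials of degree n with i.i.d. N(0, 1) coefficients."""
    return np.concatenate(
        [np.roots(rng.standard_normal(n + 1)) for _ in range(trials)]
    )


def radial_and_angular_summary(roots, bins=6):
    """Fraction of roots in the annulus 0.8 < |z| < 1.25, median modulus,
    and the share of non-real upper-half-plane roots per angular bin."""
    moduli = np.abs(roots)
    near_circle = np.mean((moduli > 0.8) & (moduli < 1.25))
    upper = roots[np.imag(roots) > 1e-9]          # drop real roots and conjugates
    hist, _ = np.histogram(np.angle(upper), bins=bins, range=(0.0, np.pi))
    return near_circle, np.median(moduli), hist / hist.sum()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    roots = sample_roots(50, 200, rng)
    near_circle, median_modulus, angular = radial_and_angular_summary(roots)
    print("fraction in annulus:", near_circle)
    print("median modulus:", median_modulus)
    print("angular shares:", angular)
```

Restricting the angular histogram to the open upper half-plane removes both the real roots and the redundant complex conjugates.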
Let us now call $\lambda$ the order of magnitude of the parameters of the deterministic term $\Phi$. In the limit $\lambda \ll 1$, i.e. for a strong disorder, the governing term of $P_n$ is its random part; the resulting density of roots is similar to the density for the homogeneous polynomial, and this behavior remains true as long as $\lambda = O(1)$.

[Figure 5: left panel, roots in the plane (Re(z), Im(z)) for n = 10, 1000 realizations, with the unit circle; right panel, histograms of the moduli |z| and theoretical curves for n = 10 and n = 100.]

Fig. 5. On the left, the location in the complex plane of the roots of a homogeneous random polynomial of degree 10, for 1000 realizations. On the right, the radial distribution of complex roots and histograms of the moduli for n = 10 and 100.
3.4 Weak disorder limit: monic polynomials
In the weak disorder limit $\lambda \gg 1$, $P_n$ is dominated by $\Phi(z)$; the main contributions to the density of roots (given by formula (15)) come from
2
det(f , f )

(f , f )(f , f )1

1
exp 2 (f , f )1 .
2

(21)

Let $\Phi$ be a polynomial of degree $n$, and $z_0$ a root of $\Phi$ of multiplicity $M$. In a neighborhood of $z_0$, $\Phi(z)$ can be written as $\Phi_0^{(M)} (z - z_0)^M + O((z - z_0)^{M+1})$. The first function grows then like $\lambda^2 M^2 \|z - z_0\|^{2M-2}$, when the other one is a decreasing exponential of the shape $\exp\{-\lambda^2 C \|z - z_0\|^{2M}\}$; $C$ is a positive quantity depending on $z$ and $z_0$, of order 0 in $z - z_0$.
If the root is simple, for $M = 1$, the result is simply Gaussian, and the density has a peak centered on $z_0$, with a width of order $\lambda^{-1}$.
If $M > 1$, the areas of strong density result from a balance between the positive power of $\|z - z_0\|$ and the decreasing exponential $\exp\{-\lambda^2 C \|z - z_0\|^{2M}\}$. The average modulus of $(z - z_0)$ is not zero but is such that $\lambda^2 C \|z - z_0\|^{2M} = O(1)$, as illustrated in Fig. 6. The roots are then located in an annulus centered on $z_0$, with radius of order $C^{-1/(2M)} \lambda^{-1/M}$.
A rough estimate of the integral of the density in a domain surrounding $z_0$ shows that the relative weight of the annulus circling a root of multiplicity $M$, with regards to the weight of the peak corresponding to a simple root, is of order $M$. In other words, each peak in density centered on a root of multiplicity $M$ does actually contain the equivalent of $M$ roots.
Furthermore, the dependence of $C$ with regards to the argument of $(z - z_0)$ causes some directions to possibly be more favorable. This phenomenon is strongly related to the expression of $\Phi$, so nothing more can be said at this level of generality, but we will observe the emergence of an angular structure with the example $\Phi(z) = z^n$.

[Figure 6: curves of $\|z - z_0\|^{2M-2} \exp(-\|z - z_0\|^{2M})$ against $\|z - z_0\|$ for M = 1, 2, 3 and 10.]

Fig. 6. Function $\|z - z_0\|^{2M-2} \exp\{-\|z - z_0\|^{2M}\}$ for different values of $M$.
In Fig. 7 are plotted the 10 roots of the random polynomial
\[ P_{10}(z) = 20\, (z^2 + 4) \Bigl(z + \frac{1}{2} - \frac{i}{2}\Bigr) \Bigl(z + \frac{1}{2} + \frac{i}{2}\Bigr) (z - 1 - i)^3 (z - 1 + i)^3 + \sum_k a_k z^k \tag{22} \]
for 500 realizations. We observe the presence of four (very) sharp peaks of density around the four simple roots $\pm 2i$ and $-0.5 \pm 0.5i$. The $2 \times 3$ remaining roots are located on two circles surrounding the roots of order $M = 3$, $1 \pm i$.
Let us now consider the particular case of a root of multiplicity $n$ at the origin by taking $\Phi(z) = z^n$. The density has then the shape
\[ \lambda^2 r^{2n-2} \exp\Bigl\{ -\frac{1}{2} \lambda^2 r^{2n-2}\, \frac{\sin^2 n\theta + O(r^2)}{\sin^2 \theta + O(r^2)} \Bigr\}, \qquad z = r e^{i\theta} \in \mathbb{C} \setminus \mathbb{R} ; \tag{23} \]
it is of order 1 as long as $\lambda^2 r^{2n-2} \sin^2 n\theta \lesssim 1$, which means on a circle, the radius of which varies like $\lambda^{-1/(n-1)}$, more particularly in the directions $\theta = k\pi/n$, for $k = 0, \ldots, 2n - 1$. Outside of those areas, the density of roots is negligible. This result can be recovered using dimensional analysis. Let us introduce the rescaled variable $y \equiv \lambda^{1/n} |a_0|^{-1/n} z$. The roots of $P_n$ are given by the roots of the new polynomial
\[ P_n(y) = y^n + \sum_{k=1}^{n-1} a_k\, |a_0|^{\frac{k}{n} - 1}\, \lambda^{-\frac{k}{n}}\, y^k + \mathrm{sgn}(a_0). \tag{24} \]

[Figure 7: roots in the plane (Re(z), Im(z)) for 500 realizations; markers distinguish the zeros of the randomly perturbed polynomial from the zeros of the non-perturbed polynomial.]

Fig. 7. Positions of the roots of 500 realizations of a random polynomial of degree 10. The deterministic part has 4 simple roots in $\pm 2i$, $-0.5 \pm 0.5i$, and 2 roots of multiplicity 3 in $1 \pm i$, indicated by a circle, and is of order $\lambda = 20$.

Since $\lambda \gg 1$ we neglect all the negative powers of $\lambda$. The roots of $P_n(y)$ are then the $n$-th roots of $\pm 1$, depending on the sign of the random variable $a_0$, and the roots of $P_n(z)$ are located on a circle of average radius
\[ \lambda^{-1/n}\, E\bigl(|a_0|^{1/n}\bigr) = \lambda^{-1/n}\, 2^{1/2n}\, \frac{1}{\sqrt{\pi}}\, \Gamma\Bigl(\frac{n+1}{2n}\Bigr), \tag{25} \]
where $\Gamma(t) = \int_0^\infty dz\, z^{t-1} e^{-z}$ is the Euler function, at the favored angles $2k\pi/n$, $k = 0, \ldots, n - 1$, if $a_0 \le 0$, and $(2k + 1)\pi/n$ if $a_0 \ge 0$, which happens with equal probability since $a_0$ is $N(0, 1)$.
Figures 8, 9 and 10 demonstrate this particular behavior of the distribution of the roots of monic polynomials. The roots form what is called a quasi-crystal [1, 5], with a specific angular structure, located at a distance from the origin that depends on the order of magnitude of the deterministic part and the degree $n$. On the left are located in the plane the roots of 1000 monic polynomials with $\Phi(z) = z^n$ for $n = 4$ (Fig. 8) and $n = 10$ (Fig. 9). On the right of Fig. 8 is plotted a function of the shape given by (23), for $n = 4$. We observe 8 peaks of density at the regular angular intervals $k\pi/4$, $k = 0, \ldots, 7$. The radial behavior is studied in Fig. 9. On the right are plotted histograms of the moduli of complex (non-real) roots of monic polynomials of degree $n = 10$ and 100. As $n$ increases, the peak gets sharper and its location tends to 1.
The radial and angular behaviors of the density of roots for respectively homogeneous and monic random polynomials are characterized and compared in Fig. 10. On the left we observe the evolution of the average modulus of complex roots as a function of the degree of the polynomial $n$. In the homogeneous case, the position of the peak is almost constant, close to 1; in the monic case, we have an exponential law of the inverse of $n$, corresponding to the order of magnitude $\lambda^{-1/n}$. On the right, we study the angular distribution with histograms of the arguments of the complex (non-real) roots. We observe the appearance of an angular structure in the monic case, while the angular distribution is quite uniform in the homogeneous case.

[Figure 8: left panel, roots in the plane (Re(z), Im(z)) for n = 4, 1000 realizations, with the circles C(0, 1) and C(0, $\lambda^{-1/n}$); right panel, 3-dimensional plot of the function $r^{2n-2} \exp\{-r^{2n-2} \sin^2 n\theta / \sin^2 \theta\}$ for n = 4.]

Fig. 8. Left, position in the plane of the roots of 1000 monic random polynomials with $\Phi(z) = z^4$ and $\lambda = 20$. Right, a 3-dimensional plot of function (23) for n = 4.

[Figure 9: left panel, roots in the plane (Re(z), Im(z)) for n = 10, 1000 realizations, with the circles C(0, 1) and C(0, $\lambda^{-1/n}$); right panel, histograms of the moduli |z| for n = 10 and n = 100.]

Fig. 9. On the left, position in the plane of the roots of 1000 monic random polynomials with $\Phi(z) = z^{10}$ and $\lambda = 20$. On the right, histograms of the moduli of the non-real roots of monic polynomials for n = 10 and 100.
3.5 Self-inversive polynomials
Let us finally mention the particular case when the polynomial $P_n$ has the self-inversive symmetry, which means that its coefficients have the reflective property $a_k = a_{n-k}$, $k = 0, \ldots, n$, with the consequence that the set of its roots is invariant through the transformation $z \to z^{-1}$.

[Figure 10: left panel, average modulus of the complex roots against 1/n, the inverse of the degree of the polynomial, for monic and homogeneous polynomials, with theoretical values; right panel, histograms (200 bins) of the argument arg(z) of the roots for the monic and homogeneous cases, n = 10.]

Fig. 10. Left, the average moduli of the complex roots of homogeneous and monic ($\lambda = 20$) random polynomials, as a function of the inverse of the degree 1/n, compared to their theoretical values (the constant 1 and $\lambda^{-1/n}$). Right, histograms of the positive arguments of the complex roots of homogeneous and monic ($\lambda = 20$, $\Phi = z^n$) random polynomials of degree n = 10.

Such polynomials have been studied by Bogomolny et al. [1, 2] and we
just recall here their main result, which more generally holds for complex
coefficients. The roots of a self-inversive random polynomial $P_n$ have the
remarkable property that they are not only concentrated in the neighborhood
of the unit circle, but that a macroscopic fraction of them lies precisely on
the circle, a fraction equal on average to
$$\frac{1}{n}\, E\big(N_n(\{|z| = 1\})\big) = \sqrt{\frac{n-2}{3n}}, \qquad (26)$$
which tends to $1/\sqrt{3} \approx 0.577$ in the limit of large degrees. In contrast to the
fraction of real roots of real random polynomials, this fraction does not tend
to zero as n tends to infinity.
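The concentration of roots exactly on the unit circle can be probed numerically. The following is a minimal sketch, not the computation of [1, 2]: it assumes i.i.d. standard Gaussian real coefficients mirrored to enforce $a_k = a_{n-k}$, and it decides that a root is on the circle when its modulus differs from 1 by less than a tolerance `tol`. The function name, the tolerance, the seed and the sample size are all illustrative assumptions.

```python
import numpy as np

def fraction_on_unit_circle(n, trials, rng, tol=1e-6):
    """Average fraction of roots with |z| = 1 (within tol) for random real
    self-inversive polynomials of degree n, i.e. with a_k = a_{n-k}."""
    on_circle = 0
    for _ in range(trials):
        # Independent coefficients a_0 .. a_{floor(n/2)} (assumed N(0, 1)).
        half = rng.standard_normal(n // 2 + 1)
        # Mirror them so that the full coefficient list is palindromic.
        coeffs = np.concatenate([half, half[: (n + 1) // 2][::-1]])
        roots = np.roots(coeffs)
        on_circle += np.count_nonzero(np.abs(np.abs(roots) - 1.0) < tol)
    return on_circle / (n * trials)

rng = np.random.default_rng(0)
frac = fraction_on_unit_circle(10, 500, rng)
print("fraction on the circle, n = 10:", frac)
print("asymptotic value 1/sqrt(3):    ", 1.0 / np.sqrt(3.0))
```

For a finite degree the simulated fraction need not coincide with the asymptotic value; the point of the sketch is only that it stays of order one instead of decaying with n.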
This behavior is illustrated in Fig. 11. On the left are plotted the roots of
1000 self-inversive real polynomials of degree n = 10. Apart from the singularity of the real axis, we observe that a certain number of them are located
exactly on the circle. On the right is plotted the average fraction of roots located exactly on the unit circle, as a function of the degree of the polynomial.
As n tends to infinity, this fraction tends to the asymptotic value $1/\sqrt{3}$.

4 Conclusion
We have discussed two classes of random real polynomials, according to the
order of magnitude of the random part with regard to the deterministic
component. In the strong disorder case, the roots are concentrated around
the unit circle. The second class concerns random monic polynomials in the
presence of weak disorder. Their roots are located in the neighborhood of the
roots of the non-perturbed polynomial if those roots are simple, and, in the
case of multiple roots, on a quasi-crystal centered on the root.

Fig. 11. Left, roots of 1000 self-inversive polynomials of degree n = 10. Right,
evolution of the average fraction of roots on the unit circle as a function of the
degree of the polynomial, tending towards an asymptotic value of $1/\sqrt{3}$ (dashed
line). The average number is computed over 1000 realizations.

We have seen two examples of polynomials whose roots have a certain
symmetry, with regard to the real axis when the coefficients are real, or with
regard to the unit circle in the self-inversive case. In both situations, the
symmetry line attracts a certain fraction of the roots.
We have not studied here the case of random polynomials with complex
coefficients. Yet, many studies have been carried out concerning this problem
[1, 2, 15]. For high degrees, the roots are located in an annulus around the
unit circle, with a uniform angular distribution, as one can see in Fig. 12 with
the roots of 1000 random polynomials of degree 10 with complex coefficients
whose real and imaginary parts are i.i.d. Gaussian variables. Histograms of
the moduli and arguments complete this illustration.
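The setting of Fig. 12 can be reproduced in a few lines. This is a minimal sketch under the assumptions stated in the text (degree 10, 1000 realizations, real and imaginary parts of the coefficients i.i.d. N(0, 1)); the function name, the seed and the choice of 8 angular bins are illustrative and not taken from the original.

```python
import numpy as np

def complex_random_roots(n, trials, rng):
    """Roots of `trials` random polynomials of degree n whose coefficients
    have i.i.d. N(0, 1) real and imaginary parts."""
    re = rng.standard_normal((trials, n + 1))
    im = rng.standard_normal((trials, n + 1))
    coeffs = re + 1j * im
    return np.concatenate([np.roots(c) for c in coeffs])

roots = complex_random_roots(10, 1000, np.random.default_rng(1))
moduli = np.abs(roots)
arguments = np.angle(roots)

# The moduli cluster around 1 ...
print("median modulus:", np.median(moduli))
# ... and the arguments fill the angular bins evenly.
counts, _ = np.histogram(arguments, bins=8, range=(-np.pi, np.pi))
print("angular bin fractions:", counts / counts.sum())
```

The median is used rather than the mean because a few roots can lie far from the origin and would dominate an average of the moduli.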
The accumulation of the roots around the unit circle is related to the
existence of a natural boundary of analyticity on this circle for the random
series [10]
$$\sum_{k=0}^{\infty} a_k z^k. \qquad (27)$$
The zeros of homogeneous random polynomials, i.e. partial sums of this series,
are located in the neighborhood of the boundary [15].
It is possible to pursue the study of the statistical properties of the zeros
of random polynomials with the determination of the k-point correlation functions of the zeros $z_1, \dots, z_k$, using the same method [14]. Taking k = 1 returns the
density of zeros, and the 2-point function of $(z_1, z_2)$ characterizes the correlation between the roots.

Fig. 12. Left, roots of 1000 complex random polynomials of degree n = 10. The real
and imaginary parts of the coefficients are i.i.d. Gaussian N(0, 1) random variables.
Right, histograms of the moduli and of the arguments of the roots.

Acknowledgements: I want first of all to thank Juliette and Jean-Daniel
for directing the organization of the school and the edition of these proceedings. In addition to the questions some people asked me during the school,
and the help of J.-D., Jonathan and David, I appreciated the constructive
remarks and advice of my anonymous referee, and tried to take all of these
suggestions into account. As usual, I am infinitely grateful to PB for his sympathetic ear and his olympian calm. I found that botching up this lecture was
particularly profitable.

References
1. E. Bogomolny, O. Bohigas, P. Leboeuf, Distribution of roots of random
polynomials, Phys. Rev. Lett., 68(18):2726-2729, 1992.
2. E. Bogomolny, O. Bohigas, P. Leboeuf, Quantum Chaotic Dynamics and
Random Polynomials, J. Stat. Phys., 85:639-679, 1996.
3. B. Dujardin, J.-D. Fournier, Coloured noisy data analysis using Padé
approximants, submitted.
4. A. Edelman, E. Kostlan, How many zeros of a random polynomial are
real?, Bull. Amer. Math. Soc., 32:1-37, 1995.
5. J.-D. Fournier, Complex zeros of random Szegő polynomials, Computational Methods and Function Theory, pp. 203-223, 1997.
6. I.A. Ibragimov, N.B. Maslova, On the expected number of real zeros of
random polynomials I. Coefficients with zero means, Theory Probab. Appl.,
16:228-248, 1971.
7. I.A. Ibragimov, O. Zeitouni, On Roots of Random Polynomials, Trans.
Amer. Math. Soc., 349(6):2427-2441, 1997.
8. M. Kac, On the average number of real roots of a random algebraic equation,
Bull. Amer. Math. Soc., 49:314-320, 1943.
9. M. Kac, Probabilities & Related Topics in Physical Sciences, Lectures in
Applied Mathematics Vol. 1A, Am. Math. Soc., 1959.
10. J.-P. Kahane, Some random series of functions, Cambridge University Press,
1968.
11. P. Leboeuf, P. Shukla, Universal fluctuations of zeros of chaotic wavefunctions, J. Phys. A: Math. Gen., 29:4827-4835, 1996.
12. J.E. Littlewood, A.C. Offord, On the number of real roots of a random
algebraic equation, J. London Math. Soc., 13:288-295, 1938.
13. A. Mezincescu, D. Bessis, J.-D. Fournier, G. Mantica, F. Aaron, Distribution of Roots of Random Real Generalized Polynomials, J. Stat. Phys.,
86:675-705, 1997.
14. T. Prosen, Exact statistics of complex zeros for Gaussian random polynomials with real coefficients, J. Phys. A: Math. Gen., 29:4417-4423, 1996.
15. B. Shiffman, S. Zelditch, Equilibrium distribution of zeros of random
polynomials, Int. Math. Res. Not., 2003.
16. R. Schober, W.H. Gerstacker, The zeros of random polynomials: Further
results and applications, IEEE Transactions on Communications, 50(6):892-896, 2002.

Rational Approximation and Noise

Maciej Pindor
Instytut Fizyki Teoretycznej,
Uniwersytet Warszawski ul.Hoza 69,
00-681 Warszawa, Poland.

1 Introduction
In the previous lecture I discussed (and advertised) a special type of rational
approximation to functions of a complex variable: the one that can be
constructed when the information on the approximated function is given in the
form of coefficients of its power (preferably Taylor) expansion. The knowledge
of the Taylor series coefficients specifies a function completely and, as we have
seen, one can construct from a finite number of coefficients a rational function
which (almost everywhere) approximates this function better and better as
we take into account more and more coefficients. There are, however, other
interesting sequences of rational approximants, and I shall first say a few words
about them. They use the information on the behaviour of the function at
several points. In either case, every application of this or another approximation
scheme encounters in practice an additional difficulty: all the information we
want and can use to construct an approximating rational function is biased
by errors; we can know either expansion coefficients or function values with
finite accuracy only. Consequences of this fact, fundamental in all practical
applications, will be discussed in later sections.

2 Rational Interpolation
As we know, an analytic function $f(z)$ can also be uniquely specified by an
infinite number of its values at points contained in a compact set on which
the function is analytic. Construction of a sequence of rational functions having the same values as $f(z)$ on a given finite set of points is known as the
rational interpolation problem. It is a classical problem of numerical
analysis and was studied long ago. My exposition will be partially based on
[10]. Before we discuss the convergence of sequences of rational interpolants,
let us comment on the problem of their existence. Let there be a sequence
of points in the complex plane $\{z_i\}_{i=0}^{N}$ (which we shall call the nodes) and a
sequence of complex numbers $\{f_i\}_{i=0}^{N}$ (we call them henceforth the data)
such that
$$f(z_i) = f_i, \qquad i = 0, \dots, N. \qquad (1)$$

J.-D. Fournier et al. (Eds.): Harm. Analysis and Ratio. Approx., LNCIS 327, pp. 145-156, 2006.
© Springer-Verlag Berlin Heidelberg 2006

We are looking for a rational function $r_{m,n}(z)$ of degrees $m$ in the numerator
and $n$ in the denominator such that $r_{m,n}(z_i) = f_i$, $i = 0, \dots, N$. If we call the
numerator and the denominator of $r_{m,n}$ $T_m(z)$ and $B_n(z)$, respectively, and
treat their coefficients as unknowns, we get the system of equations for these
coefficients
$$\frac{T_m(z_i)}{B_n(z_i)} = f_i, \qquad i = 0, \dots, N, \qquad (2)$$
and we can expect a unique solution if $m + n = N$, since the ratio then carries
$m + n + 1$ free coefficients, matching the $N + 1$ conditions.
These equations are nonlinear, but can be linearized to the form
$$T_m(z_i) = B_n(z_i) f_i, \qquad i = 0, \dots, N; \qquad (3)$$
however, (3) is not strictly equivalent to (2): all solutions of (2) are
solutions of (3), but not vice versa. The situation seems analogous to
the one encountered in the construction of Padé approximants, but its origin is
even easier to comprehend here. It is obvious that if (3) is satisfied and $B_n(z)$
does not vanish at any node, then we can divide the equation by $B_n(z_i)$ and
(2) is also satisfied. We conclude that if a solution of (3) does not satisfy
(2), then $B_n(z)$ must vanish on some subset of (say $k$) nodes. Then, however,
$T_m(z)$ must also vanish there. Therefore both $T_m(z)$ and $B_n(z)$ contain a
common factor, the polynomial of degree $k$ vanishing on these nodes; let it
be $w_k(z)$. In this case (3) reads
$$T_{m-k}(z_i)\, w_k(z_i) = B_{n-k}(z_i)\, w_k(z_i)\, f_i, \qquad i = 0, \dots, N, \qquad (4)$$
and it means that there exists a rational function of degrees $m - k$ and $n - k$,
respectively, that interpolates our data on a subset of $N + 1 - k$ nodes.
Vice versa, it is easy to see that if (4) is satisfied then there is no rational
function of degrees $m$ and $n$, respectively, with relatively prime numerator
and denominator, that interpolates all our data; the $k$ nodes at which $w_k(z)$
vanishes are called unattainable. All the details of the problem are studied in
depth in [8]. We can say that the problem is one of degeneracy and we
shall not be concerned with it in the rest of the lecture.
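The linearized conditions (3) form a homogeneous linear system for the coefficients of the numerator and the denominator, so a solution can be read off a null vector of the system matrix. The following is a minimal sketch of that idea, assuming exactly m + n + 1 nodes; the function names `rational_interpolant` and `evaluate`, the use of a singular value decomposition to obtain the null vector, and both data sets are illustrative choices, not taken from [8] or [10].

```python
import numpy as np

def rational_interpolant(z, f, m, n):
    """Solve T_m(z_i) = B_n(z_i) f_i for the coefficients of T_m and B_n
    (ascending powers), given exactly m + n + 1 nodes."""
    z = np.asarray(z, dtype=complex)
    f = np.asarray(f, dtype=complex)
    if len(z) != m + n + 1:
        raise ValueError("need exactly m + n + 1 nodes")
    V_T = np.vander(z, m + 1, increasing=True)     # columns z^0 .. z^m
    V_B = np.vander(z, n + 1, increasing=True)     # columns z^0 .. z^n
    system = np.hstack([V_T, -f[:, None] * V_B])   # [T | -f B] c = 0
    # The system has one more unknown than equations, so it has a null
    # vector: the last right singular vector (conjugated, since vh = V^H).
    _, _, vh = np.linalg.svd(system)
    c = vh[-1].conj()
    return c[: m + 1], c[m + 1:]

def evaluate(coeffs, x):
    """Evaluate a polynomial given by ascending coefficients."""
    return np.polyval(coeffs[::-1], x)

# Regular case: data from f(z) = (1 + 2z)/(1 + 3z), recovered by r_{1,1}.
nodes = [0.0, 0.5, 1.0]
data = [(1 + 2 * x) / (1 + 3 * x) for x in nodes]
T, B = rational_interpolant(nodes, data, 1, 1)
print(evaluate(T, 0.7) / evaluate(B, 0.7), (1 + 2 * 0.7) / (1 + 3 * 0.7))

# Degenerate case: data (0,1), (1,1), (2,2). The solution of (3) is
# T = B = const * (z - 2), so B vanishes at the node z = 2, which is
# therefore unattainable in the sense discussed above.
T_deg, B_deg = rational_interpolant([0.0, 1.0, 2.0], [1.0, 1.0, 2.0], 1, 1)
print(abs(evaluate(B_deg, 2.0)))
```

Checking whether the computed denominator vanishes at a node is then a direct test for unattainable nodes.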
Before I talk about convergence, let me first point out that the rational
interpolants discussed above and Padé approximants are not entirely alien
to each other. Actually, they are the two extreme cases of general rational
interpolants. To see that, we consider an interpolation scheme, i.e. a triangular
matrix of interpolation nodes $a_{i,j} \in \mathbb{C}$ defined as follows:
$$A := \begin{pmatrix} a_{00} & & & \\ \vdots & \ddots & & \\ a_{0n} & \cdots & a_{nn} & \\ \vdots & & & \ddots \end{pmatrix} \qquad (5)$$
Each row of the matrix $A$,
$$A_n = (a_{0n}, \dots, a_{nn}), \qquad (6)$$
defines an interpolation set of $n + 1$ nodes. We allow here some or all nodes
in the set to be identical.
To each interpolation set $A_n$ we assign the polynomial
$$w_n(z) = \prod_{x \in A_n} (z - x) = \prod_{i=0}^{n} (z - a_{in}). \qquad (7)$$

We say now that $r_{m,n}(z)$ is the (generalized) rational interpolant of the
function $f(z)$ on the set $A_{m+n}$ (where the function is assumed to be analytic)
if it is a rational function of degree at most $m$ in the numerator and at
most $n$ in the denominator, such that
$$\frac{f(x) - r_{m,n}(x)}{w_{m+n}(x)} \ \text{is bounded at each } x \in A_{m+n}. \qquad (8)$$
$r_{m,n}(z)$ is also called a rational interpolant of Hermite type (in the Hermite sense), or a multi-point
Padé approximant. This latter name can be understood if we observe that
when all the points in the interpolation set are identical, $r_{m,n}$ is just the
$[m/n]$ Padé approximant. On the other hand, if all of them are distinct we
have the ordinary rational interpolant. In the intermediate cases $r_{m,n}$ and its
derivatives $r_{m,n}^{(k)}$ are identical with $f$ and its derivatives $f^{(k)}$ at $x \in A_{m+n}$, up
to an order corresponding to the number of occurrences of $x$ in $A_{m+n}$; these values
are our data in this situation.
As is the case for the classical rational interpolant, after introducing the
numerator and the denominator of $r_{m,n}$, $P_m$ and $Q_n$, respectively, we can
replace (8) by the linearized version
$$\frac{Q_n(x) f(x) - P_m(x)}{w_{m+n}(x)} \ \text{is bounded at each } x \in A_{m+n}. \qquad (9)$$
Again, not every pair of polynomials satisfying (9) yields a rational function
satisfying (8), but (9) always has a solution. If, however, there
exists a pair of polynomials satisfying (9) with $Q_n(z) \neq 0$ at all the points
of $A_{m+n}$, then the problem (8) is also soluble and the solution is
$$r_{m,n}(z) = \frac{P_m(z)}{Q_n(z)}.$$
Many algebraic problems connected with the existence of multipoint Padé approximants have been studied in [5], and it is known that blocks appearing
in the table do not need to be of square shape.
A special intermediate case is the one called Two-Point Padé Approximants. In this case the interpolation set consists of only two distinct points,
zero and the point at infinity, appearing in $A_{m+n}$ altogether $m + n + 1$ times.
This type of rational approximation sometimes appears in physics or technology when we are interested in a function (assumed or postulated to be
analytic) of some variable having a special meaning at zero and at infinity, and
we know some number of coefficients of its expansions around these two points.
For example, the function can be the dielectric constant of a composite of
two different materials and the variable the ratio of their contents [11].

3 Convergence
The convergence problem is in many respects analogous to that of Padé
approximants, though here we have an additional dependence on the asymptotic distribution of the interpolation nodes $a_{in}$ in the interpolation scheme
$A_{m+n}$. There is no room here to discuss it in detail, but we can summarize the
results by saying that rational interpolants converge in capacity for holomorphic functions and, apart from the set of cuts they choose, also for functions with
branchpoints; however, depending on the location of the interpolation scheme
and of the set of branchpoints, it can happen that in different areas of the complex plane the rational interpolants converge to different branches of the
function.

4 Rational Interpolation with Noisy Data

As I have mentioned in the introduction, practical application of rational interpolation encounters a serious obstacle in the fact that the data (function
values and expansion coefficients) are always known with finite accuracy only.
This poses a problem when they are to be used to construct an
approximation to an analytic function. An analytic function is a stiff
object: any, even the smallest, modification of it at some place may result
in an arbitrarily large change at a finite distance from the place at which
we made the modification. Look at the simplest possible, even naive, example: assume that we have two sets of values at the nodes $\{z_i\}_{i=0}^{N}$, namely $\{d_i\}_{i=0}^{N}$ and
$\{d_i + \varepsilon/(1 + z_i)\}_{i=0}^{N}$ with a small $\varepsilon$. They can differ arbitrarily little, but if the first set comes
from a function $f(z)$, the second one comes from $f(z) + \varepsilon/(1 + z)$, which differs
from $f(z)$ arbitrarily much at $z = -1$ (assuming that $f(z)$ is regular there).
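The naive example can be made concrete with a few numbers. In this minimal sketch the function `f` is taken to be the exponential, the nodes are placed on [0, 1] and the scale is set to 1e-6; all three are arbitrary assumptions chosen only for illustration.

```python
import numpy as np

eps = 1e-6                          # assumed size of the modification
nodes = np.linspace(0.0, 1.0, 11)   # assumed nodes, away from z = -1

def f(z):
    """Any function regular at z = -1 will do; exp is an arbitrary choice."""
    return np.exp(z)

def g(z):
    """The modified function f(z) + eps / (1 + z)."""
    return f(z) + eps / (1.0 + z)

# On the nodes the two data sets differ by at most eps ...
data_gap = np.max(np.abs(g(nodes) - f(nodes)))
# ... but close to z = -1 the two functions differ enormously.
z_near = -1.0 + 1e-12
far_gap = abs(g(z_near) - f(z_near))

print("largest difference on the nodes:", data_gap)
print("difference near z = -1:         ", far_gap)
```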
In the following I shall take for granted that the interpolation set is contained in the real line, that the functions studied are real on the real line, and
that perturbations of the data are also real. These assumptions are inessential in
all algebraic considerations, and I shall comment below when they influence
the presented results.
The problem was observed when the first applications of Padé approximants in physics appeared. In physics Padé approximants have been used to
sum so-called perturbation series, or rather to estimate the sum of the series
from a finite number (usually small, unfortunately) of coefficients. Calculation
of those coefficients is generally a serious task, involving numerical calculation
of multiple integrals, and usually physicists must accept substantial limitations in the accuracy with which they can know such coefficients. Very soon it was
found that varying these coefficients within the limits of the accuracy with which
they were known resulted in wild variations of the singularities of Padé approximants constructed from the coefficients. In this situation Marcel Froissart
[4] made very simple, but highly enlightening, numerical experiments. He took
just the geometric series and randomly perturbed the coefficients of its power
series in the following manner:
$$1 + x + x^2 + \dots \ \longrightarrow \ 1 + \varepsilon r_0 + (1 + \varepsilon r_1)x + (1 + \varepsilon r_2)x^2 + \dots \qquad (10)$$
with some small $\varepsilon$ and random $r_i$'s taken from the same distribution. Obviously
all Padé approximants to the series on the left are equal to [0/1], i.e. to the
function itself. On the other hand, almost all (except for a set of measure
zero in the event space) Padé approximants to the series on the right are
different and, if we concentrate first on the sequence $[n-1/n]$, they have $n - 1$
different zeros and $n$ different poles, both randomly distributed. At first this
seems a catastrophe: independently of how small $\varepsilon$ is, Padé approximants to
the perturbed series seem to have nothing in common with the function represented by the original series! However, when one looks where the zeros and poles
of these Padé approximants are, an amazing phenomenon can be seen. Look
at the zeros and poles of [4/5] with some choice of (real, normally distributed)
$r_i$'s:

ε = .00001
  zeros: .57471 ± .64809i, .091740, 3.1348
  poles: .57472 ± .64809i, .091740, 3.1349, 1.00000098
ε = .01
  zeros: .57034 ± .64907i, .091958, 3.0223
  poles: .57468 ± .64812i, .091958, 3.1384, 1.00099
(11)

First you see that there is a pole close to 1, the place where the function
represented by the original series has one. Next you see that all the other zeros
and poles, which represent only noise, come in tight pairs. The smaller ε is,
the tighter they are; this is quite natural, because we want to return
to the original series at ε = 0! Of course you must remember that the positions of all these
zeros and poles are random, and for any finite ε both the separation of the
pairs and the distance of the unpaired pole from 1 can be arbitrarily
large, but we expect that both vanish as ε → 0. You can see it clearly
in the next example, where I took a different choice of $r_i$'s:

ε = .00001
  zeros: −395.688, .55084, .013502 ± 1.48561i
  poles: −387.376, .55084, .013471 ± 1.48566i, 1.000000097
ε = .0001
  zeros: −490.299, .55084, .013776 ± 1.48518i
  poles: −387.370, .55084, .013471 ± 1.48566i, 1.00000097
ε = .001
  zeros: 356.883, .55083, .016466 ± 1.48084i
  poles: −387.311, .55084, .013471 ± 1.48566i, 1.0000097
(12)

When ε grows, the pair at large negative x's separates so strongly that at
ε = .001 there is no pair at all.
This phenomenon of pairing of noise-induced zeros and poles would not
be interesting at all if it manifested itself only for the geometric series, but it
turned out to be universal and got the name of the Froissart phenomenon; the
pairs are known as Froissart doublets.
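The experiment is easy to repeat. The sketch below is a minimal illustration and not Froissart's original computation: the helper names `pade` and `froissart_experiment`, the seed, the value 1e-5 of the perturbation scale and the use of standard normal $r_i$'s are assumptions made here. The Padé approximant is obtained by solving the standard linear system for the denominator coefficients, and its zeros and poles are the roots of the numerator and denominator.

```python
import numpy as np

def pade(c, L, M):
    """[L/M] Pade approximant from Taylor coefficients c[0..L+M].
    Returns (p, q) in ascending powers, normalized so that q[0] = 1."""
    c = np.asarray(c, dtype=float)

    def coef(i):
        return c[i] if i >= 0 else 0.0

    # Denominator: sum_{j=1..M} c_{L+k-j} q_j = -c_{L+k},  k = 1..M
    A = np.array([[coef(L + k - j) for j in range(1, M + 1)]
                  for k in range(1, M + 1)])
    rhs = -np.array([coef(L + k) for k in range(1, M + 1)])
    q = np.concatenate([[1.0], np.linalg.solve(A, rhs)])
    # Numerator: p_i = sum_{j=0..min(i,M)} c_{i-j} q_j,  i = 0..L
    p = np.array([sum(coef(i - j) * q[j] for j in range(min(i, M) + 1))
                  for i in range(L + 1)])
    return p, q

def froissart_experiment(L, M, eps, rng):
    """Zeros and poles of the [L/M] approximant to the geometric series
    whose coefficients are perturbed as 1 + eps * r_i."""
    r = rng.standard_normal(L + M + 1)
    c = 1.0 + eps * r
    p, q = pade(c, L, M)
    return np.roots(p[::-1]), np.roots(q[::-1])

# Unperturbed series: [0/1] is 1/(1 - x), with its single pole at 1.
p0, q0 = pade([1.0, 1.0], 0, 1)
print("q of [0/1]:", q0)

zeros, poles = froissart_experiment(4, 5, 1e-5, np.random.default_rng(0))
print("zeros:", zeros)
print("poles:", poles)
print("distance of the closest pole from 1:", np.min(np.abs(poles - 1.0)))
```

Comparing the printed zeros with the printed poles shows which of them come in pairs; since the positions are random, the result depends on the seed.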
For other sequences of Padé approximants, when there is a surplus of
zeros or poles, it turned out that these extra zeros or poles escape to infinity
as ε → 0.
Let us now see in Fig. 1 what happens when n grows: as you see, Froissart
doublets are distributed in a close vicinity of the unit circle! This phenomenon
would be even more pronounced if we took n larger; on the other hand, one
would always find some doublets, even for large n, at a finite distance from
the circle, like those inside the circle.
To show you what happens if we perturb, in the same way as before, a
series representing a function with branchpoints, let us consider the function
$$f(z) = \sqrt{z + 1}\,\sqrt{2z + 1} + \frac{2}{1 - z}. \qquad (13)$$
It has branchpoints at −1 and −1/2, a pole at 1, and also zeros at
1.604148754 and −1.391926826. Moreover, it behaves like $\sqrt{2}\,z$ when $z \to \infty$.
We know already that for this function we should expect the approximants
[n + 1/n] to be the best suited. Below you have the zeros and poles of [6/5], exact
(i.e. ε = 0) and also for ε = .001 and ε = .00001. We clearly see that the exact
approximants give zeros and the pole very close to the zeros and the pole of the
function, while the remaining zeros and poles of the approximant simulate the
cut (−1, −1/2).
It is also interesting to note that [6/5] behaves for $z \to \infty$ like
1.4142139z ($\sqrt{2} \approx 1.4142136$). For approximants to perturbed series we see
that the zeros and the pole of the function are reproduced much worse, but they
are there. The behaviour at infinity is also perturbed, but makes sense. Finally, we
also see Froissart doublets.

Fig. 1. Zeros and poles of [39/40] to perturbed geometrical series; ε = .001

ε = 0
  zeros: −1.3918660, −.89206841, −.71781989, −.59123097, −.52175271, 1.60414873
  poles: −.90621750, −.73283509, −.59816026, −.52337768, .999999998
  p6/q5: 1.4142139
ε = 10^−5
  zeros: −1.391698, −.854144, −.651406, −.535246, .1948449666, 1.6041319
  poles: −.873892, −.665806, −.538656, .1948449670, .9999996
  p6/q5: 1.414239
ε = .001
  zeros: −1.38207, −.712257, .23399823 ± .63417286i, −.545154, 1.602343
  poles: −.737541, −.550235, .23399385 ± .63418746i, .999951
  p6/q5: 1.41749
ε = .01
  zeros: −1.3608, −.587831, .358697 ± .433745i, .0517020759, 1.58744
  poles: −.602766, .999609, .358691 ± .433605i, .051702075
  p6/q5: 1.43968
(14)

But what happens with Froissart doublets when n grows? Look at Fig. 2.
Now it seems that Froissart doublets are attracted by the circle of radius
1/2. Yes, this is what takes place. But what is so special about 1/2? It is
the distance to the closest (with respect to the point of expansion) singularity.

Fig. 2. Zeros and poles of [20/19] to perturbed series of the function discussed in
the text; ε = .001

Let us summarize our observations: Padé approximants to perturbed series exhibit the Froissart phenomenon, i.e. part of the zeros and poles form
doublets that become tighter and tighter as the perturbation becomes smaller. How
large this part is also depends on the size of the perturbation: when it is small, most
of the zeros and poles of the approximant are only slightly perturbed. When it
grows, more and more zeros and poles leave the neighborhood of the exact zeros
and poles and form Froissart doublets. For growing degrees of the numerator
and of the denominator (more and more terms of the series used) and the size
of the perturbation kept constant, Froissart doublets become attracted by the
circle whose radius is the distance to the closest singularity.
Before I give you some explanation of this behaviour, let me show you
what happens for another type of rational interpolant. To this end I calculate 12
values of our function at equidistant interpolation nodes on (2, 4) and calculate
the (6/5) rational interpolant, first from exact function values and then from
values perturbed in a way analogous to the way I perturbed the coefficients of
the series:
$$f(z_i) \ \longrightarrow \ f(z_i)(1 + \varepsilon r_i), \qquad (15)$$
where the $z_i$'s belong to the interpolation set and the $r_i$ are independent random
numbers from the same distribution.
In the table below you see that noise-induced doublets appear here too,
but they seem to be attracted by the interpolation interval! On the
other hand, even for the largest ε this pairing of zeros and poles produced
by noise is so strong that we can guess that there are somewhere two real
zeros and one real pole, though their positions came out very badly. This
example demonstrates the main characteristic feature of the result of rational interpolation made out of noisy data: Froissart doublets appear on the
interpolation interval, or in its very close vicinity.

ε = 0
  zeros: −1.3919268, 1.604148754, −.940143, −.799964, −.649026, −.539497
  poles: .9999999999, −.947363, −.814864, −.660840, −.543340
  p6/q5: 1.41421356
ε = 10^−8
  zeros: −1.38538, 1.604150, −.640039, 2.541938, 3.016366, 3.341408
  poles: 1.00004, −.671489, 2.541938, 3.016366, 3.341408
  p6/q5: 1.414230
ε = 10^−6
  zeros: −1.34710, 1.604116, 2.51337413, 3.1022690, 3.4848785, 3.7939914
  poles: .998614, 2.51337430, 3.1022689, 3.4848786, 3.7939911
  p6/q5: 1.41544
ε = 10^−4
  zeros: −1.33677, 1.604231, 2.523215, 2.590149, 3.095483, 3.683980
  poles: 1.000956, 2.523219, 2.590160, 3.095482, 3.6839740
  p6/q5: 1.40403
(16)

Summing up what has been observed in many numerical experiments and
what I have demonstrated to you on simple examples: the main effect of noise in
the data used for rational approximation is the appearance of doublets of zeros
and poles separated by a distance roughly proportional to the size of the noise
(relative to the deterministic part of the data). The
distribution of these Froissart doublets depends on the specific type of
rational approximation we use. We have seen that for Padé approximants
they coalesced on a circle whose radius is the distance to the closest singularity, while for classical
rational interpolants they stuck to the interpolation interval. Except for the
close vicinity of such doublets, rational approximation reproduces values of
the function with an accuracy determined by the size of the noise. Can we find
an explanation of this phenomenon?


5 Froissart Polynomial
If we assume that $f(z)$, the function responsible for our unperturbed data,
can be approximated well by a sequence of rational approximants, then we can
consider our perturbed data as perturbed data produced by some member of
this sequence. Let us, therefore, assume that, for given $m$ and $n$ such that
$m + n + 1 + 2k = M$, there exists a rational function
$$r_{m,n}(z) = \frac{T_m(z)}{B_n(z)} \qquad (17)$$
approximating $f(z)$ with some accuracy on some vicinity of the set of interpolation nodes $\{z_i\}_{i=0}^{M}$, where $M > m + n + 1$. We are given data $\{d_i\}_{i=0}^{M}$ at
these nodes; let me recall that if some (even all) nodes appear with
multiplicity $m_i > 1$, then the data at this (multiple) node are the value of
$f(z)$ and the derivatives of $f(z)$ up to the $(m_i - 1)$th one. The data are perturbed
randomly with some scale $\varepsilon$, as in (10) or (15):
$$d_i = d_i^{(0)}(1 + \varepsilon r_i), \qquad i = 0, \dots, M, \qquad (18)$$

where $d_i^{(0)}$ are the exact data. Introducing $\{d_i^{(r)}\}_{0}^{M}$, data of the same type
as the $d_i$'s but coming from $r_{m,n}(z)$, we can write
$$d_i = d_i^{(r)}(1 + \varepsilon r_i) + \big(d_i^{(0)} - d_i^{(r)}\big)(1 + \varepsilon r_i), \qquad i = 0, \dots, M, \qquad (19)$$
and describe the total deviation of $d_i$ from $d_i^{(r)}$ by some arbitrarily introduced scale $\bar\varepsilon$. It
summarizes the effect of random perturbations and of the differences $d_i^{(r)} - d_i^{(0)}$.
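It can be proved using elementary algebra, but with some effort [2], that
the rational interpolant of degrees $m + k$ and $n + k$, $R_{m+k,n+k}(z)$, constructed
from the data $d_i$ has the form
$$R_{m+k,n+k}(z) = \frac{T_m(z)G_k(z) + \sum_{l=1}^{n+1} \bar\varepsilon^{\,l}\, U^{(l)}_{m+k}(z)}{B_n(z)G_k(z) + \sum_{l=1}^{n} \bar\varepsilon^{\,l}\, V^{(l)}_{n+k}(z)}, \qquad (20)$$
with $\bar\varepsilon$ the scale of the deviations of the $d_i$ from the $d_i^{(r)}$.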

Actually this formula is almost obvious: it says that if the coefficients of a system
of linear equations and the right-hand sides of the system are polynomials of the first
degree in some $\bar\varepsilon$, then the solution is a polynomial in $\bar\varepsilon$ of degree equal to the
size of the system. If $\bar\varepsilon$ vanishes we must also get $r_{m,n}$; therefore the free terms
of the numerator and of the denominator must be proportional to $T_m(z)$ and
$B_n(z)$, correspondingly, with the same coefficient. What is not obvious is that
this coefficient must be a polynomial of degree $k$ in $z$.
If we now look at this formula with attention, we see that it perfectly
explains the appearance of Froissart doublets: if both $\varepsilon$ and $d_i^{(0)} - d_i^{(r)}$ are
sufficiently small, i.e. the overall deviation scale $\bar\varepsilon$ is small, then the zeros of the numerator and of the
denominator are close to the zeros of $T_m(z)G_k(z)$ and of $B_n(z)G_k(z)$. It is then
$G_k(z)$, depending on the individual deviations of the data, that governs where the Froissart doublets appear;
the distances of the zeros of the numerator and of the denominator from the zeros of $G_k(z)$
will be $O(\bar\varepsilon)$. If $f(z)$ itself were a rational function $r_{m,n}(z)$, or if it differed negligibly
from $r_{m,n}(z)$, then $G_k(z)$ would depend only on the perturbations $r_i$ and on $r_{m,n}$.
In that case we shall call it the Froissart polynomial and use the symbol $F_k(z)$.
This is the manageable situation and we can say a lot about $F_k(z)$ ([6], [7],
[3]).
Before I discuss the Froissart polynomial, let me point out that, as
seen from (20) and (19), the zero-pole doublets appearing for perturbed data
coming from an arbitrary function will behave like Froissart doublets, i.e. will
be distributed randomly with their mutual distance being $O(\varepsilon)$, only when
$\varepsilon$ is definitely larger than $d_i^{(r)} - d_i^{(0)}$. We can formulate it this way: Froissart
doublets will be observed in a rational approximant constructed from perturbed data of a function when the perturbations of the data are much larger than
the differences between the exact data from this function and the exact data from the best
rational approximation, of the same type but of lower degrees, to the function of
interest.
To say where the Froissart doublets go, we would have to study the distribution of the zeros of $F_k(z)$. To this end one needs a formula expressing the coefficients of this polynomial in terms of the $r_i$'s, and this formula depends on the type of
rational interpolant we deal with. From the considerations in [2] one can only say
that they will be linear combinations of all possible products of $k$ different $r_i$'s
from a set of $M$ of them. The only thing that it was possible to find from this
very general information was the asymptotic behaviour of the pdf of the zeros of
$F_k$ for $|z| \to \infty$ [1].
As was shown in [9], the pdf of zeros of polynomials with random but real
coefficients has two components: the pdf of real zeros (called the singular component) and the pdf of complex zeros (called the regular component). One can
show that the pdf of the singular component of $F_k(z)$, as a function of real $x$,
behaves like $1/x^2$ as $x \to \infty$, while the pdf of the regular component
falls off like $1/|z|^4$ for $|z| \to \infty$, except for $k$ directions along which it falls off
like $1/|z|^3$. This behaviour means that, whatever the locus of coalescence of
Froissart doublets when their number grows, their distribution has a long
tail, the behaviour observed in numerical experiments.
Up to now it has been possible to find the exact form of the pdf of zeros of
the Froissart polynomial only for $k = 1$; in that case the coefficients of the
polynomial are linear in the $r_i$'s, therefore the pdfs of the polynomial and of its
derivative, necessary to calculate the pdf of zeros according to the formulae in [9],
are very simple. A very interesting result came out for classical rational
interpolation on equidistant nodes inside a real interval [3]: the pdf of zeros
of $F_1(x)$ (they are all real here) has a maximum on the interpolation interval,
and the probability of finding the zero on the interpolation interval is
larger than the probability of finding it outside. It means that the Froissart
doublets will appear inside the interpolation interval rather than outside, i.e.
the extrapolation will be less affected by noise in the data than the interpolation!

6 Conclusions
My goal was to convince you that rational functions are a very powerful tool in
deciphering (or, if you prefer, making a sophisticated guess about) the analytic
structure of a function known from a finite set of data. This explains why
they are so good at approximating values of functions most economically; no
wonder they are exploited in your pocket calculators and in internal compiler
routines for transcendental functions. Moreover, rational approximation
of the form I discussed has the amazing property of being practically stable
with respect to perturbation of these data: noise in the data goes mainly into
Froissart doublets that almost annihilate themselves. This is one more reason,
I think, why rational functions have much more potential in applications than
is generally recognised.

References
1. J.-D. Fournier, M. Pindor. In preparation.
2. J.-D. Fournier, M. Pindor. On multi-point Padé approximants to perturbed rational functions. Submitted to Constr. Math. and Funct. Th.
3. J.-D. Fournier, M. Pindor. Rational interpolation from stochastic data: A new Froissart phenomenon. Rel. Comp., 6:391–409, 2000.
4. M. Froissart. Private information. J. Gammel; see also J. Gilewicz, Approximants de Padé, LNM 667, Springer Verlag, 1976, ch. 6.4.
5. M.A. Gallucci, W.B. Jones. Rational approximations corresponding to Newton series (Newton-Padé approximants). J. Appr. Th., 17:366–372, 1976.
6. J. Gilewicz, M. Pindor. Padé approximants and noise: A case of geometric series. JCAM, 87:199–214, 1997.
7. J. Gilewicz, M. Pindor. Padé approximants and noise: rational functions. JCAM, 105:285–297, 1999.
8. J. Meinguet. On the solubility of the Cauchy interpolation problem. In Proc. of the University of Lancaster Symposium on Approximation Theory and its Applications, pages 137–164. Academic Press, 1970.
9. G.A. Mezincescu, D. Bessis, J.-D. Fournier, G. Mantica, F.D. Aaron. Distribution of roots of random real generalized polynomials. J. Stat. Phys., 86:675–705, 1997.
10. H. Stahl. Convergence of rational interpolants. Technical Report 299/8-1, Deutsche Forschungsgemeinschaft Report Sta, 2002.
11. S. Tokarzewski, J.J. Telega, M. Pindor, J. Gilewicz. Basic inequalities for multipoint Padé approximants to Stieltjes functions. Arch. Mech., 54:141–153, 2002.
12. H. Wallin. Potential theory and approximation of analytic functions by rational interpolation. In Proc. of the Colloquium on Complex Analysis at Joensuu, number 747 in LNM, pages 434–450. Springer Verlag, 1979.

Stationary Processes and Linear Systems

Manfred Deistler
Department of Mathematical Methods in Economics, Research Group
Econometrics and System Theory, Vienna University of Technology
Argentinierstr. 8, A-1040 Wien, Austria
Deistler@tuwien.ac.at

1 Introduction
Time series analysis is concerned with systematic approaches to extracting
information from time series, i.e. from observations ordered in time. Unlike
in classical statistics of independent and identically distributed observations,
not only the values of the observations, but also their ordering in time may
contain information. The main questions in time series analysis concern trends,
cycles, dependence over time and dynamics.
Stationary processes are perhaps the most important models for time series. In this contribution we present two central parts of the theory of wide
sense stationary processes, namely spectral theory and the Wold decomposition; in addition we treat the interface between the theory of stationary
processes and linear systems theory, namely ARMA and state-space systems,
with an emphasis on structure theory for such systems.
The contribution is organized as follows: In section 2 we give a short introduction to the history of the subject. In section 3 we deal with the spectral
theory of stationary processes, with an emphasis on the spectral representation
of stationary processes and covariance functions and on linear transformations
in the frequency domain. In section 4, the Wold decomposition and prediction are
treated. Due to the Wold decomposition every (linearly) regular stationary
process can be considered as an (in general infinite dimensional) linear system
with white noise inputs. These systems are finite dimensional if and only if
their spectral density is rational, and this case is of particular importance for
statistical modeling. Processes with rational spectral densities can be described as solutions of ARMA or (linear) state space systems (with white noise
inputs), and the structure of the relation between the Wold decomposition and
ARMA or state space parameters is analyzed in section 5. This structure is
important for the statistical analysis of such systems, as is briefly described
in section 6.
The intention of this contribution is to present the main ideas and to give
a clear picture of the structure of fundamental results. The contribution is
oriented towards a mathematically knowledgeable audience. A certain familiarity with probability theory and the theory of Hilbert spaces is required.
We give no proofs. The main references are [12], [7], [8], [10] and [11]. For the
sake of brevity, we do not give references, even to important
original literature, if they are cited in the references listed above; for this reason important and seminal papers by Kolmogorov, Khinchin, Wold, Hannan, Kalman,
Akaike and others will not be found in the list of references at the end of this
contribution.

J.-D. Fournier et al. (Eds.): Harm. Analysis and Ratio. Approx., LNCIS 327, pp. 159–179, 2006.
© Springer-Verlag Berlin Heidelberg 2006

2 A Short View on the History

Here we give a short account of the historical development of the subject treated in this contribution. For the early history of time series analysis we refer to
[3], for the history of stationary processes to the historical and bibliographic
references in [12] and for a recent account to [6].
The early history of time series analysis dates back to the late eighteenth
century. At this time more accurate data on the orbits of the planets and
the moon became available due to improvements in telescope building. The
fact that Kepler's laws result from a two-body problem, whereas there are more than
two bodies in our planetary system, triggered interest in the detection of
systematic deviations from these laws, and in particular in hidden periodicities
and long-term trends in these orbits. Harmonic analysis probably begins with
a memoir on these problems published by Lagrange in 1772. Subsequently the
theory of Fourier series was developed by Euler and Fourier. The method
of least squares fitting of a line to a scatter plot was introduced by Legendre
and Gauss in the early nineteenth century. Later in the nineteenth century
Stokes and Schuster introduced the periodogram as a method for detecting
hidden periodicities, in order to study, among other things, sunspot numbers.
The empirical analysis of business cycles was another important area
in early time series analysis. In the seventies and eighties of the nineteenth
century the British economist Jevons investigated fluctuations in economic
time series.
The statistical theory of linear regression analysis was developed at the
turn of the nineteenth to the twentieth century by Galton, Pearson, Gosset
and others.
Stochastic models for time series, namely moving average and autoregressive models, were proposed by Yule in the nineteen-twenties, mainly in
order to model non-exactly periodic fluctuations such as business cycles. Closely related is the work of Slutzky on the summation of random causes as
a source of cyclical processes and Frisch's work on propagation and impulse
problems in dynamic economics.
In the thirties and forties of the twentieth century, the theory of stationary
processes was developed. The concept of a stationary process was introduced
by Khinchin; the spectral representation is due to Kolmogorov, and its proof based
on the spectral representation of unitary operators was given by Karhunen.
The properties of covariance functions were investigated by Khinchin, Wold,
Cramér and Bochner; linear transformations of stationary processes appear in
Kolmogorov's work. ARMA processes and the Wold representation were introduced in Wold's thesis. The prediction theory was developed by Kolmogorov;
the rational case was investigated by Wiener and Doob.
At about the same time, the work of the Cowles Commission, which established
econometrics as a field of its own, took place. Triggered by the Great Depression, which started in 1929, economic research activities in describing the
macrodynamics of an economy were intensified. Questions of quantitative
economic policy based on Keynesian theory led to problems of estimating parameters in macroeconomic models. In the work of the Cowles Commission, in
particular in the works of Mann and Wald, Haavelmo and Koopmans, identifiability and least squares and maximum likelihood estimators, in particular
their asymptotic properties, were investigated for multi-input, multi-output
AR(X) systems.
In the late forties and fifties, time series analysis using non-parametric
frequency domain methods, mainly for the scalar case, was booming, in particular
in engineering applications. The statistical properties of the periodogram were
derived and smoothed spectral estimators were introduced and analyzed
by Tukey, Grenander and Rosenblatt, Hannan and others; analogously, nonparametric transfer function estimation methods based on spectral estimation
were developed.
Almost parallel to the development of non-parametric frequency domain
analysis, the parametric time domain counterparts, namely identification of
AR(X) and ARMA(X) models, were developed in the forties, fifties and sixties of the twentieth century, mainly for the scalar case, by Mann and Wald,
T.W. Anderson, Hannan, and others. For AR(X) models actual identification and the corresponding asymptotic theory turned out to be much simpler
than in the ARMA(X) case. The reason is that in the first case parameterization is simple and ordinary least squares estimators are asymptotically
efficient and numerically simple at the same time. For the ARMA(X) case,
on the other hand, the maximum likelihood estimator (MLE) has, in general,
no explicit representation and is obtained by numerically optimizing the likelihood function. In addition, questions of parameterization and the derivation
of the asymptotic properties of the MLE are quite involved.
The work of Kalman, which is based on state space representations, triggered a "time domain revolution" in engineering. A particularly important
aspect in the context of this paper is Kalman's work on realization and parameterization of, in general, multi-input, multi-output state space systems.
The book by Box and Jenkins [1] triggered a boom in applications, mainly because explicit
instructions for actually performing applications for the scalar case were given. This included rules for transforming data to stationarity, for determining
orders, an algorithm for maximizing the likelihood function and procedures
for model validation.


A major shortcoming of the Box-Jenkins approach was that order determination had to be done by an experienced modeler in a non-automatic way.
Thus an important step was the development and evaluation of automatic
model selection procedures based on information criteria like AIC or BIC by
Akaike, Hannan, Rissanen and Schwarz.
Identification of multivariate ARMA(X) and state space systems was further developed in the seventies and eighties of the last century, leading to a
certain maturity of methods and theory. This is also documented in the monographs on the subject appearing in the late eighties and early nineties, in
particular [9], [2], [8], [13] and [11]. However, substantial research in this area
is still going on.

3 The Spectral Theory of Stationary Processes

For more details, in particular for proofs, concerning results presented in this
and the next section we refer to [12] and [7].
3.1 Stationary Processes and Hilbert spaces
Here we give the basic definitions and introduce the Hilbert space setting for
stationary processes.
Let $(\Omega, \mathcal{A}, P)$ be a probability space and consider random variables $x_t : \Omega \to \mathbb{C}^s$, where $\mathbb{C}$ denotes the complex numbers. A stochastic process $(x_t \mid t \in T)$ is
a family of random variables; here $T \subset \mathbb{R}$, where $\mathbb{R}$ denotes the real numbers,
and in particular the case $T = \mathbb{Z}$, the integers, is considered. In the latter case
we write $(x_t)$ and $\mathbb{Z}$ is interpreted as the time axis. A stochastic process $(x_t)$ is
said to be (wide sense) stationary if
(i) $E x_t^* x_t < \infty$ for all $t \in \mathbb{Z}$
(ii) $E x_t = m = \text{const}$
(iii) $E x_{t+r} x_t^*$ does not depend on $t$, for every $r \in \mathbb{Z}$
hold. Here $^*$ denotes the conjugate transpose of a vector or a matrix. For
a stationary process the first and second moments exist and do not depend
on time $t$; in particular the linear dependence relations between arbitrary
one-dimensional component variables $x_{t+r}^{(i)}$ and $x_t^{(j)}$, $i, j = 1, \dots, s$, which
are described by the (central) covariances $E(x_{t+r}^{(i)} - E x_{t+r}^{(i)})\overline{(x_t^{(j)} - E x_t^{(j)})}$,
depend only on the time difference $r$ but not on the position in time $t$. The
covariance function then is defined by
$$\gamma : \mathbb{Z} \to \mathbb{C}^{s \times s} : \quad \gamma(r) = E(x_{t+r} - E x_{t+r})(x_t - E x_t)^*$$
Note that here the covariance matrices are defined as being central; this is of
no great importance and in many cases $m$ is assumed to be zero.


Stationary processes are appropriate descriptions for many steady state
random phenomena. But even in apparently nonstationary situations, such as
in the presence of trends, stationary processes are often used as models, e.g.
for transformed data or as part of an overall model. The first and second
moments do not fully describe a stationary process or its probability law,
but they contain important information about the process which is sufficient
e.g. for forecasting and filtering problems. Here we restrict ourselves to this
information.
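An arbitrary function $\gamma : \mathbb{Z} \to \mathbb{C}^{s \times s}$ is called nonnegative-definite if, for
every $T = 1, 2, \dots$, the matrices of the form
$$\Gamma_T = \begin{pmatrix} \gamma(0) & \gamma(-1) & \cdots & \gamma(-T+1) \\ \gamma(1) & \gamma(0) & & \vdots \\ \vdots & & \ddots & \\ \gamma(T-1) & \cdots & & \gamma(0) \end{pmatrix}$$
are nonnegative-definite (denoted by $\Gamma_T \ge 0$). The following theorem gives a
mathematical characterization of covariance functions of stationary processes:
Theorem 1. A function $\gamma : \mathbb{Z} \to \mathbb{C}^{s \times s}$ is a covariance function of a stationary
process if and only if it is nonnegative-definite.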
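The criterion of Theorem 1 can be checked numerically for small $T$. The following is a minimal sketch for the scalar case ($s = 1$), assuming NumPy; the helper names and the two example functions are hypothetical and only serve as illustration. It builds $\Gamma_T$ and inspects its smallest eigenvalue.

```python
import numpy as np

def toeplitz_covariance(gamma, T):
    """T x T matrix Gamma_T with (i, j) entry gamma(i - j), scalar case."""
    return np.array([[gamma(i - j) for j in range(T)] for i in range(T)])

def min_eigenvalue(gamma, T):
    """Smallest eigenvalue of the (symmetric) matrix Gamma_T."""
    return float(np.linalg.eigvalsh(toeplitz_covariance(gamma, T)).min())

def ma1_gamma(r, b=0.5, sigma2=1.0):
    """Covariance function of the scalar MA(1) process y_t = e_t + b e_{t-1}."""
    if r == 0:
        return sigma2 * (1.0 + b * b)
    if abs(r) == 1:
        return sigma2 * b
    return 0.0

def not_a_covariance(r):
    """gamma(0) = 1, gamma(+-1) = 0.9, zero otherwise (illustrative counterexample)."""
    return {0: 1.0, 1: 0.9, -1: 0.9}.get(r, 0.0)

ok = min_eigenvalue(ma1_gamma, 8)          # positive: consistent with Theorem 1
bad = min_eigenvalue(not_a_covariance, 3)  # negative: Gamma_3 is not >= 0
```

A finite check of this kind can only refute nonnegative-definiteness (as for the second function); confirming it requires the condition for every $T$.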
Let $L_2$ denote the Hilbert space of square integrable random variables
$x : \Omega \to \mathbb{C}$ (or, to be more precise, of the corresponding $P$-a.e. equivalence classes), over the complex numbers, with inner product defined by $\langle x, y \rangle = E x \bar{y}$,
where $\bar{y}$ denotes the conjugate of $y$. Then the Hilbert space $H_x \subset L_2$, spanned
by the one-dimensional process variables $x_t^{(i)}$, $t \in \mathbb{Z}$, $i = 1, \dots, s$, is called the
time domain of the stationary process $(x_t)$. (Note that condition (i) above
implies $x_t^{(i)} \in L_2$.)
The stationarity condition (iii), in Hilbert space language, means that for
every $i = 1, \dots, s$, the lengths $\|x_t^{(i)}\|$ of the $x_t^{(i)}$ do not depend on $t$ and
that the angles between $x_{t+r}^{(i)}$ and $x_t^{(j)}$ also do not depend on $t$. Note that the
lengths are the square roots of the noncentral variances and the angles are
noncentral correlations. Thus the operator shifting the process in time does
not change lengths and angles.
This motivates the following theorem:
Theorem 2. For every stationary process $(x_t)$ there is a unique unitary operator $U : H_x \to H_x$ such that
$$x_t^{(i)} = U^t x_0^{(i)}$$
holds.
We only consider stationary processes where the random variables are
$\mathbb{R}^s$-valued; clearly then $\gamma$ is $\mathbb{R}^{s \times s}$-valued; the complex notation is only used
to simplify the formulas for the spectral representation.
Important examples of stationary processes are:


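1. White noise $(\varepsilon_t)$, which is defined by $E\varepsilon_t = 0$; $E\varepsilon_s \varepsilon_t' = \delta_{st}\Sigma$, where $\delta_{st}$ is
the Kronecker delta, $\varepsilon_t'$ is the transpose of $\varepsilon_t$ (the same notation is used
for matrices) and where $\Sigma \ge 0$ holds. White noise has no linear memory
(i.e. dependencies) over time.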
2. Moving average (MA) processes can be represented as:
$$y_t = \sum_{j=0}^{q} b_j \varepsilon_{t-j}, \qquad b_j \in \mathbb{R}^{s \times m} \tag{1}$$
where $(\varepsilon_t)$ is white noise. $(y_t)$ is said to be an MA($q$) process if $b_q \ne 0$.
A stationary process is an MA($q$) process if and only if its covariance
function satisfies $\gamma(q + r) = 0$ for some $q > 0$ and for all $r > 0$ and if
$\gamma(q) \ne 0$. MA processes have finite linear memory.
3. Linear or MA($\infty$) processes can be represented as
$$y_t = \sum_{j=-\infty}^{\infty} b_j \varepsilon_{t-j}, \qquad b_j \in \mathbb{R}^{s \times m} \tag{2}$$
where $(\varepsilon_t)$ is white noise and where the condition
$$\sum_{j=-\infty}^{\infty} \|b_j\|^2 < \infty \tag{3}$$
guaranteeing the existence of the infinite sum in (2) in the sense of mean
squares convergence, holds. In this paper limits of random variables are
always defined in this sense; $\|b_j\|$ denotes a norm. Note that the first and
second moments of MA($\infty$) processes are given by
$$E y_t = 0$$
and
$$\gamma(r) = \sum_{j=-\infty}^{\infty} b_j \Sigma b_{j-r}'. \tag{4}$$
From (4) and (3), we see that an MA($\infty$) process has fading linear memory.
The class of MA($\infty$) processes is a large class of stationary processes; it
includes important subclasses, such as the class of causal or one-sided
MA($\infty$) processes
$$y_t = \sum_{j=0}^{\infty} b_j \varepsilon_{t-j} \tag{5}$$
or the class of stationary processes with rational spectral density treated
in detail in section 5.
4. Harmonic processes are of the form
$$x_t = \sum_{j=1}^{h} e^{i\lambda_j t} z_j \tag{6}$$
where without loss of generality the (angular) frequencies $\lambda_j$ are restricted to $(-\pi, \pi]$, $\lambda_1 < \lambda_2 < \dots < \lambda_h$, and where $z_j : \Omega \to \mathbb{C}^s$ are, in general, genuine complex random variables describing random amplitudes and
phases. In order to guarantee stationarity of $(x_t)$ we have to assume
$$E z_j^* z_j < \infty$$
$$E z_j = \begin{cases} E x_t & \text{for } \lambda_j = 0 \\ 0 & \text{for } \lambda_j \ne 0 \end{cases}$$
and
$$E z_j z_l^* = 0 \quad \text{for } j \ne l.$$
Since $x_t$ is $\mathbb{R}^s$-valued, in addition we have
$$\lambda_{1+j} = -\lambda_{h-j}, \quad j = 0, \dots, h - 1$$
and
$$z_{1+j} = \bar{z}_{h-j}, \quad j = 0, \dots, h - 1.$$
A harmonic process has a finite dimensional time domain; actually $H_x$ is
spanned by $z_j^{(i)}$, $i = 1, \dots, s$, $j = 1, \dots, h$.
The covariance function of a harmonic process is of the form
$$\gamma(r) = \sum_{j=1}^{h} e^{i\lambda_j r} F_j; \qquad F_j = \begin{cases} E z_j z_j^* & \text{for } \lambda_j \ne 0 \\ E(z_j - E z_j)(z_j - E z_j)^* & \text{for } \lambda_j = 0 \end{cases} \tag{7}$$
From this we see that for (nontrivial) harmonic processes the memory
is not fading. The spectral distribution function $F : [-\pi, \pi] \to \mathbb{C}^{s \times s}$ is
defined by
$$F(\lambda) = \sum_{j : \lambda_j \le \lambda} F_j. \tag{8}$$
As is easily seen, $\gamma$ and $F$ are in a one-to-one relation, and thus contain the
same information about the underlying process; however, this information
is displayed in $F$ in a different way. The $k$-th diagonal element of $F_j$ is a
measure of the expected amplitude of the frequency component $e^{i\lambda_j t} z_j^{(k)}$ of
the $k$-th component process $(x_t^{(k)} \mid t \in \mathbb{Z})$. The $(k, l)$ off-diagonal element
of $F_j$ (which is a complex number in general) measures by its absolute
value the strength of the linear dependence between the $k$-th and $l$-th
component process at frequency $\lambda_j$ and by its phase the expected phase
shift.
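To make the "finite linear memory" of MA processes concrete, formula (4) can be evaluated directly for an MA($q$) process, where $b_j = 0$ outside $0 \le j \le q$. The sketch below assumes NumPy; the function name and the coefficient matrices are hypothetical illustration values, not taken from the text.

```python
import numpy as np

def ma_covariance(b, sigma, r):
    """gamma(r) = sum_j b_j Sigma b_{j-r}' as in (4), for b = [b_0, ..., b_q]."""
    q = len(b) - 1
    s = b[0].shape[0]
    gamma = np.zeros((s, s))
    for j in range(q + 1):
        if 0 <= j - r <= q:  # b_{j-r} vanishes outside 0..q
            gamma += b[j] @ sigma @ b[j - r].T
    return gamma

# Hypothetical bivariate MA(1): y_t = eps_t + b_1 eps_{t-1}, Sigma = I.
b0 = np.eye(2)
b1 = np.array([[0.5, 0.0], [0.2, 0.3]])
sigma = np.eye(2)

gamma0 = ma_covariance([b0, b1], sigma, 0)  # I + b_1 b_1'
gamma1 = ma_covariance([b0, b1], sigma, 1)  # b_1
gamma2 = ma_covariance([b0, b1], sigma, 2)  # zero matrix: memory ends at lag q = 1
```

For real-valued processes the result also satisfies $\gamma(-r) = \gamma(r)'$, which can be used as a quick consistency check.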


3.2 The Spectral Representation

In this subsection the Fourier representations of stationary processes and of
their covariance functions are described. The main result states that, in a
certain sense, every stationary process can be obtained as a limit of a sequence
of harmonic processes.
A stochastic process $(z(\lambda) \mid \lambda \in [-\pi, \pi])$, where the random variables
$z(\lambda) : \Omega \to \mathbb{C}^s$ are complex in general, is said to be a process with orthogonal increments if:
1. $E z^*(\lambda) z(\lambda) < \infty$
2. $z(-\pi) = 0$
3. $\lim_{\varepsilon \downarrow 0} z(\lambda + \varepsilon) = z(\lambda)$, $\lambda \in [-\pi, \pi]$
4. $E(z(\lambda_4) - z(\lambda_3))(z(\lambda_2) - z(\lambda_1))^* = 0$ for $\lambda_1 < \lambda_2 \le \lambda_3 < \lambda_4$
hold. A process of orthogonal increments can be considered as a random
variable or $L_2^s$-valued distribution function and thus determines an $L_2^s$-valued
measure on the Borel sets of $[-\pi, \pi]$ and an associated integral (defined in the
sense of convergence in mean squares).
By Theorem 2, the shift operator for a stationary process is unitary. From
the spectral representation of unitary operators we then obtain:
Theorem 3 (Spectral representation of stationary processes). For
every stationary process $(x_t)$ there is a unique process with orthogonal increments $(z(\lambda) \mid \lambda \in [-\pi, \pi])$ satisfying $z(\pi) = x_0$ and $z^{(i)}(\lambda) \in H_x$ such that
$$x_t = \int_{[-\pi,\pi]} e^{i\lambda t}\, dz(\lambda) \tag{9}$$
holds.
The importance of the spectral representation (9) is twofold: First, it allows us to interpret a stationary process in terms of frequency components. In
particular, as has been said already, every stationary process may be obtained
as a limit, pointwise in $t$, of a sequence of harmonic processes. Note that, in
general, convergence will not be uniform in $t$. Second, as will be seen in the
next subsection, certain operations are easier to perform and to interpret in
the frequency domain.
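For a general stationary process its spectral distribution function $F :
[-\pi, \pi] \to \mathbb{C}^{s \times s}$ is defined by
$$F(\lambda) = E \tilde{z}(\lambda) \tilde{z}^*(\lambda) \quad \text{where} \quad \tilde{z}(\lambda) = \begin{cases} z(\lambda) & \text{for } \lambda < 0 \\ z(\lambda) - E x_t & \text{for } \lambda \ge 0. \end{cases} \tag{10}$$
Theorem 3 implies that the covariance function has a spectral representation
of the form
$$\gamma(t) = \int_{[-\pi,\pi]} e^{i\lambda t}\, dF(\lambda) \tag{11}$$
constituting a one-to-one relation between $\gamma$ and $F$.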

In many cases $F$ is absolutely continuous w.r.t. Lebesgue measure;
then there exists the so-called spectral density $f : [-\pi, \pi] \to \mathbb{C}^{s \times s}$ satisfying
$$F(\lambda) = \int_{-\pi}^{\lambda} f(\omega)\, d\omega.$$
A sufficient condition for the existence of a spectral density is that
$$\sum_{t=-\infty}^{\infty} \|\gamma(t)\|^2 < \infty \tag{12}$$
holds. Clearly a spectral density is uniquely defined only Lebesgue-a.e.; analogously
to the case of random variables, we do not distinguish between $f$ as a function
and $f$ as an equivalence class of a.e. identical functions. If (12) holds, then
the one-to-one relation between $f$ and $\gamma$ is given by
$$\gamma(t) = \int_{-\pi}^{\pi} e^{i\lambda t} f(\lambda)\, d\lambda \tag{13}$$
$$f(\lambda) = (2\pi)^{-1} \sum_{t=-\infty}^{\infty} \gamma(t) e^{-i\lambda t} \tag{14}$$
where the infinite sum in (14) corresponds to convergence in the $L_2$ over
$[-\pi, \pi]$ with Lebesgue measure.
As a consequence of Theorem 1, a function $f : [-\pi, \pi] \to \mathbb{C}^{s \times s}$ is a spectral
density if and only if
$$f(\lambda) \ge 0 \quad \text{a.e.}$$
and
$$\int_{-\pi}^{\pi} f(\lambda)\, d\lambda \quad (= \gamma(0)) \quad \text{exists} \tag{15}$$
hold. Since we only consider $\mathbb{R}^s$-valued stationary processes, $\gamma(t) = \gamma(-t)'$
holds and thus, in addition, $f(\lambda) = f(-\lambda)'$ has to be satisfied.
(Nontrivial) harmonic processes are examples of stationary processes having no spectral density.
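$F$ describes the second moments of $(\tilde{z}(\lambda) \mid \lambda \in [-\pi, \pi])$. In particular we
have
$$F(\lambda_2) - F(\lambda_1) = E(\tilde{z}(\lambda_2) - \tilde{z}(\lambda_1))(\tilde{z}(\lambda_2) - \tilde{z}(\lambda_1))^* \quad \text{for } \lambda_2 > \lambda_1 \tag{16}$$
and, if the spectral density exists, this is equal to
$$\int_{\lambda_1}^{\lambda_2} f(\lambda)\, d\lambda. \tag{17}$$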

Interpreting the integral in (9) as a limit of sums of the form (6), we can
adopt the interpretation of $F$ given for harmonic processes for general stationary processes and, if $f$ exists, analogously for $f$. For instance, for the case
$s = 1$, the integral (17) is a measure of the expected amplitudes in the
interval (often called frequency band) $(\lambda_1, \lambda_2)$. In a certain sense, peaks of $f$
(to be more precise, areas under such peaks) mark the important frequency
bands. Equation (15) gives a decomposition of the variance of the stationary
process $(x_t)$ into the variance contributions (17) corresponding to different
frequency bands. For the case $s > 1$, the off-diagonal elements in (17)
(which are complex in general) again convey the information concerning the
strength of the linear dependence between different component processes in a
certain frequency band and about expected phase shifts there.
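The pair (13), (14) can be illustrated numerically in the scalar case. The following sketch assumes NumPy and a hypothetical MA(1) process $y_t = \varepsilon_t + 0.5\,\varepsilon_{t-1}$ with unit noise variance; the function names are illustrative. It evaluates (14) from the covariances, compares the result with the closed form $(2\pi)^{-1}|1 + b e^{-i\lambda}|^2$, and recovers the covariances via a Riemann sum for (13).

```python
import numpy as np

def spectral_density(gamma, lam, max_lag):
    """Truncated (14), s = 1: (2 pi)^{-1} sum_{|t| <= max_lag} gamma(t) e^{-i lam t}."""
    lags = np.arange(-max_lag, max_lag + 1)
    values = np.array([gamma(int(t)) for t in lags])
    phases = np.exp(-1j * np.outer(lam, lags))
    return (phases @ values).real / (2 * np.pi)

def covariance_from_density(f_values, lam, t):
    """Riemann-sum version of (13) on an equispaced grid over [-pi, pi)."""
    step = 2 * np.pi / len(lam)
    return float((np.exp(1j * lam * t) * f_values).sum().real * step)

# Hypothetical example: gamma(0) = 1 + b^2, gamma(+-1) = b, zero otherwise.
b = 0.5
def gamma_ma1(t):
    return {0: 1.0 + b * b, 1: b, -1: b}.get(t, 0.0)

lam = np.linspace(-np.pi, np.pi, 512, endpoint=False)
f = spectral_density(gamma_ma1, lam, max_lag=1)
f_direct = np.abs(1 + b * np.exp(-1j * lam)) ** 2 / (2 * np.pi)
variance = covariance_from_density(f, lam, 0)  # area under f, i.e. gamma(0) as in (15)
```

The area under `f` over a sub-band $(\lambda_1, \lambda_2)$ would give the variance contribution (17) of that band in the same way.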
3.3 The Isomorphism between Time Domain and Frequency Domain. Linear Transformations of Stationary Processes
The spectral representation (9) defines an isomorphism between the time domain $H_x$ and another Hilbert space introduced here, the so-called frequency
domain. For simplicity of notation here we assume $E x_t = 0$; otherwise $F(\lambda)$
has to be replaced by $E z(\lambda) z^*(\lambda)$ in this subsection. As shown in this subsection, the analysis of linear transformations of stationary processes has some
appealing features in the frequency domain.
We start by introducing the frequency domain: For the one-dimensional
(i.e. $s = 1$) case, the frequency domain $L_2^F$ is the $L_2$ over the measure space
$([-\pi, \pi], \mathcal{B} \cap [-\pi, \pi], \mu_F)$, where $\mathcal{B} \cap [-\pi, \pi]$ is the $\sigma$-algebra of Borel sets over
$[-\pi, \pi]$ and $\mu_F$ is the measure corresponding to the spectral distribution function, i.e. $\mu_F((a, b]) = F(b) - F(a)$. The isomorphism $I : H_x \to L_2^F$, given by
(9), then is defined by $I(x_t) = e^{i\lambda t}$.
For the multivariate ($s > 1$) case, things are more complicated: First
consider a measure $\mu$ on $\mathcal{B} \cap [-\pi, \pi]$ such that there exists a density $f^{(\mu)}$ for
$F$ w.r.t. this measure, i.e. such that
$$F(\lambda) = \int_{[-\pi,\lambda]} f^{(\mu)}\, d\mu$$
holds. Such a measure always exists; one particular choice is the measure
corresponding to the sum of all diagonal elements of $F$. Let $\varphi = (\varphi_1, \dots, \varphi_s)$
and $\psi = (\psi_1, \dots, \psi_s)$ denote row vectors of functions $\varphi_i, \psi_i : [-\pi, \pi] \to \mathbb{C}$; we
identify $\varphi$ and $\psi$ if
$$\int_{[-\pi,\pi]} (\varphi - \psi) f^{(\mu)} (\varphi - \psi)^*\, d\mu = 0$$
holds. Then the set (of equivalence classes)
$$L_2^F = \Big\{ \varphi \;\Big|\; \int_{[-\pi,\pi]} \varphi f^{(\mu)} \varphi^*\, d\mu < \infty \Big\}$$
endowed with the inner product
$$\langle \varphi, \psi \rangle = \int_{[-\pi,\pi]} \varphi f^{(\mu)} \psi^*\, d\mu$$
is a Hilbert space; in particular, $L_2^F$ is the frequency domain of the stationary
process $(x_t)$. As can be shown, the frequency domain does not depend on the
special choice of the measure $\mu$ and of $f^{(\mu)}$. We have:
Theorem 4. The mapping $I : H_x \to L_2^F$, defined by $I(x_t^{(j)}) = (0, \dots,
e^{i\lambda t}, 0, \dots, 0)$, where $e^{i\lambda t}$ is in the $j$-th position, is an isomorphism of the two
Hilbert spaces.
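Now, we consider linear transformations of $(x_t)$ of the form
$$y_t = \sum_{j=-\infty}^{\infty} a_j x_{t-j}; \qquad a_j \in \mathbb{R}^{s \times m}. \tag{18}$$
Here
$$\sum_{j=-\infty}^{\infty} \|a_j\| < \infty \tag{19}$$
is a sufficient condition for the existence of the infinite sum in (18) or, to be
more precise, a necessary and sufficient condition for the existence of this infinite sum for all stationary inputs $(x_t)$. As can easily be seen, the stationarity
of $(x_t)$ implies that $(x_t', y_t')'$ is (jointly) stationary. From (9) we obtain (using
an obvious notation):
$$y_t = \int_{[-\pi,\pi]} e^{i\lambda t}\, dz_y(\lambda) = \int_{[-\pi,\pi]} e^{i\lambda t} \Big( \sum_{j=-\infty}^{\infty} a_j e^{-i\lambda j} \Big) dz_x(\lambda). \tag{20}$$
The transfer function, defined by
$$k(z) = \sum_{j=-\infty}^{\infty} a_j z^j \tag{21}$$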

is in one-to-one relation with the weighting function $(a_j \mid j \in \mathbb{Z})$.
By definition $y_t^{(j)} \in H_x$ and thus $H_y \subset H_x$ holds. If $U$ is the unitary shift
for $(x_t)$ then, by linearity and continuity of $U$, the restriction of $U$ to $H_y$ is the
shift for $(y_t)$. Due to the isomorphism between the time and the frequency
domain of $(x_t)$, $k_j(e^{-i\lambda}) e^{i\lambda t}$, where $k_j$ is the $j$-th row of the transfer function
$k$, corresponds to $y_t^{(j)}$. Strictly speaking there are two transfer functions. The
first is defined under the condition (19), from (21), as a function in the sense
of pointwise convergence. The second is a matrix whose rows are elements of
the frequency domain of $(x_t)$. In the latter case (19) is not required.
Note that the discrete convolution (18) in the time domain corresponds to
multiplication in the frequency domain. In a sloppy notation we have from (20)
$$dz_y(\lambda) = k(e^{-i\lambda})\, dz_x(\lambda) \tag{22}$$
As a straightforward consequence of (20) we obtain:
Theorem 5. Let $(x_t)$ be stationary with spectral density $f_x$ and let (18) hold.
Then the spectral density $f_y$ of $(y_t)$ and the cross spectral density $f_{yx}$ between
$(y_t)$ and $(x_t)$ (i.e. the upper off-diagonal block in the spectral density matrix
of the joint process $(y_t', x_t')'$) exist and are given by
$$f_y(\lambda) = k(e^{-i\lambda}) f_x(\lambda) k(e^{-i\lambda})^* \tag{23}$$
$$f_{yx}(\lambda) = k(e^{-i\lambda}) f_x(\lambda) \tag{24}$$
where $k$ is given by (21).

An analogous (and more general) result holds for spectral distribution
functions. As a direct consequence of the above theorem, we see that for a
linear process the spectral density exists and is of the form
$$f_y(\lambda) = (2\pi)^{-1} k(e^{-i\lambda}) \Sigma k(e^{-i\lambda})^*; \qquad k(z) = \sum_{j=-\infty}^{\infty} b_j z^j \tag{25}$$
where (3) holds. Note that (3) is more general than (19). The expression (18)
shows an input process $(x_t)$ transformed by a (deterministic) linear system
(described by its weighting function $(a_j \mid j \in \mathbb{Z})$ or its transfer function $k$) to
an output process $(y_t)$. Such systems are time invariant, i.e. the $a_j$ do not
depend on $t$, and stable, i.e. the input-output operator is bounded.
The effect of the linear transformation (18) can easily be interpreted from
(22). For instance for the case $s = m = 1$, where $k$ is scalar, the absolute
value of $k(e^{-i\lambda})$ shows how the frequency components of $(x_t)$ are amplified
(for $|k(e^{-i\lambda})| > 1$) or attenuated (for $|k(e^{-i\lambda})| < 1$) by passing through the
linear system, and its phase indicates the phase shift.
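The gain and phase interpretation can be made concrete with a small scalar example. This is a sketch assuming NumPy and a hypothetical two-term averaging filter $y_t = (x_t + x_{t-1})/2$, i.e. $k(z) = (1 + z)/2$; the function name is illustrative.

```python
import numpy as np

def transfer_function(a, lam):
    """k(e^{-i lam}) = sum_j a_j e^{-i lam j} for causal scalar weights a = [a_0, a_1, ...]."""
    j = np.arange(len(a))
    return np.exp(-1j * np.outer(lam, j)) @ np.asarray(a, dtype=complex)

# Hypothetical filter: average of the current and the previous input.
lam = np.array([0.0, np.pi / 2, np.pi])
k = transfer_function([0.5, 0.5], lam)

gain = np.abs(k)     # equals |cos(lam / 2)|: low frequencies pass, lam = pi is removed
phase = np.angle(k)  # equals -lam / 2 where the gain is nonzero
```

In this example no frequency is amplified; the component at $\lambda = \pi$ is annihilated, which is why such an average acts as a simple low-pass filter.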
Linear systems with noise are of the form
$$y_t = \hat{y}_t + u_t \tag{26}$$
$$\hat{y}_t = \sum_{j=-\infty}^{\infty} l_j x_{t-j}; \qquad l_j \in \mathbb{R}^{s \times m} \tag{27}$$
$$u_t = \sum_{j=-\infty}^{\infty} k_j \varepsilon_{t-j}; \qquad k_j \in \mathbb{R}^{s \times s} \tag{28}$$
where $(x_t)$ are the observed inputs, $(u_t)$ is the noise on the unobserved outputs
$(\hat{y}_t)$, $(\varepsilon_t)$ is white noise and finally $(y_t)$ are the observed outputs. We assume
that
$$E x_t u_s' = 0 \quad \text{for all } s, t \in \mathbb{Z} \tag{29}$$
holds. This is equivalent to saying that $\hat{y}_t^{(j)}$ is the projection of $y_t^{(j)}$ on $H_x$ or,
due to the projection theorem, that $\hat{y}_t^{(j)}$ is the best approximation of $y_t^{(j)} \in L_2$
by an element in $H_x$. We will then say that $(\hat{y}_t)$ is the best linear least squares
approximation of $(y_t)$ by $(x_t)$.
If $(x_t)$ has a spectral density, then we have
$$f_y(\lambda) = l(e^{-i\lambda}) f_x(\lambda) l(e^{-i\lambda})^* + (2\pi)^{-1} k(e^{-i\lambda}) \Sigma k(e^{-i\lambda})^* \tag{30}$$
and
$$f_{yx}(\lambda) = l(e^{-i\lambda}) f_x(\lambda) \tag{31}$$
where $l(z) = \sum_{j=-\infty}^{\infty} l_j z^j$ and $k(z) = \sum_{j=-\infty}^{\infty} k_j z^j$ hold.
Formulas (30), (31) describe the relations between the second moments of
observed inputs and outputs on one side and the covariance matrix $\Sigma$ and
the two linear systems described by $l$ and $k$ on the other side. If $f_x(\lambda) > 0$,
$\lambda \in [-\pi, \pi]$, holds, then $l$ is obtained from the second moments of the observations by the so-called Wiener filter formula
$$l(e^{-i\lambda}) = f_{yx}(\lambda) f_x(\lambda)^{-1}.$$
An important special case occurs if both transformations (27), (28) are causal,
i.e. $l_j = 0$, $j < 0$; $k_j = 0$, $j < 0$, and the transfer functions $k(z)$ and $l(z)$
are rational, i.e. there exist polynomial matrices
$$a(z) = \sum_{j=0}^{p} a_j z^j, \qquad b(z) = \sum_{j=0}^{q} b_j z^j, \qquad d(z) = \sum_{j=0}^{r} d_j z^j$$
such that $k = a^{-1} b$, $l = a^{-1} d$. In this case the linear system can be represented
by an ARMA(X) system (see e.g. [8])
$$a(z) y_t = d(z) x_t + b(z) \varepsilon_t \tag{32}$$
or a state space system
$$s_{t+1} = A s_t + B \varepsilon_t + D x_t \tag{33}$$
$$y_t = C s_t + \varepsilon_t + E x_t. \tag{34}$$
Here $z$ is used for a complex variable as well as for the backward shift
$z(x_t \mid t \in \mathbb{Z}) = (x_{t-1} \mid t \in \mathbb{Z})$, $s_t$ is the state at time $t$ and $A, B, C, D, E$ are
parameter matrices. For further details we refer to [8].


4 The Wold Decomposition and Forecasting

The Wold decomposition provides important insights into the structure of stationary processes. These insights are particularly useful for forecasting.
Let $(x_t)$ again be stationary. Linear least squares forecasting is concerned
with the best (in the linear least squares sense) approximation of a future
variable $x_{t+h}$, $h > 0$, by past (and present) variables $x_r$, $r \le t$. By the
projection theorem, this approximation, $\hat{x}_{t,h} = (\hat{x}_{t,h}^{(1)}, \dots, \hat{x}_{t,h}^{(s)})'$ say, is obtained by projecting the components $x_{t+h}^{(j)}$ of $x_{t+h}$ on the Hilbert space $H_x(t)$
spanned by the $x_r^{(j)}$; $r \le t$, $j = 1, \dots, s$, yielding $\hat{x}_{t,h}^{(j)}$. Then $\hat{x}_{t,h}$ is called the
predictor and $x_{t+h} - \hat{x}_{t,h}$ is called the prediction error.
As far as forecasting is concerned, we may distinguish the following two
extreme cases:
A stationary process $(x_t)$ is called (linearly) singular if $x_{t+h} = \hat{x}_{t,h}$ for
one $t$ and $h > 0$, and thus for all $t$, $h$, holds. Thus a singular process can
be forecasted without error and $H_x(t) = H_x$ holds. Harmonic processes are
examples of singular processes.
A stationary process $(x_t)$ is called (linearly) regular if
$$\lim_{h \to \infty} \hat{x}_{t,h} = 0$$
for one $t$, and thus for all $t$, holds. White noise is a simple example of a regular
process. For a regular process we have $\bigcap_{r \le t} H_x(r) = \{0\}$.
Theorem 6 (Wold).
1. Every stationary process $(x_t)$ can be uniquely decomposed as
$$x_t = y_t + z_t \tag{35}$$
where
$$E y_s z_t' = 0 \quad \text{for all } s, t,$$
$y_t^{(j)}, z_t^{(j)} \in H_x(t)$, $j = 1, \dots, s$, and where $(y_t)$ is regular and $(z_t)$ is
singular.
2. Every regular process $(y_t)$ can be represented as
$$y_t = \sum_{j=0}^{\infty} k_j \varepsilon_{t-j}, \qquad \sum_{j=0}^{\infty} \|k_j\|^2 < \infty \tag{36}$$
where $(\varepsilon_t)$ is white noise and where $H_\varepsilon(t) = H_y(t)$ holds.

From the theorem above we see that $H_x(t)$ is the orthogonal sum of $H_y(t)$
and $H_z(t)$ and thus we can predict the regular and the singular part separately.
For a regular process we can split the Wold representation (36) as
$$y_{t+h} = \sum_{j=h}^{\infty} k_j \varepsilon_{t+h-j} + \sum_{j=0}^{h-1} k_j \varepsilon_{t+h-j} \tag{37}$$
The components of the first part of the r.h.s. of (37) are elements of
$H_y(t) = H_\varepsilon(t)$ and the components of the second part of the r.h.s. are orthogonal to $H_y(t)$. Thus, by the projection theorem,
$$\hat{y}_{t,h} = \sum_{j=h}^{\infty} k_j \varepsilon_{t+h-j} \tag{38}$$
and the second part on the r.h.s. of (37) is the prediction error. Expressing
the $\varepsilon_l^{(j)}$ as linear combinations or limits of linear combinations of $y_r^{(j)}$, $r \le l$,
and inserting this in (38) gives the prediction formula, i.e. $\hat{y}_{t,h}$ as a linear
function of $y_r$, $r \le t$. Thus, for a given Wold representation (36) (i.e. for given
$k_j$, $j = 0, 1, \dots$) the predictor formula can be determined.
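The split (37) can be illustrated in the scalar case. The sketch below assumes NumPy and a hypothetical regular process with Wold coefficients $k_j = a^j$, $a = 0.8$ (a causal AR(1) process), truncated to finitely many terms; the function names are illustrative. The predictor follows (38), and the prediction error variance $\sigma^2 \sum_{j=0}^{h-1} k_j^2$ is obtained from the second sum in (37) using the fact that $(\varepsilon_t)$ is white noise.

```python
import numpy as np

def predictor(k, eps_past, h):
    """(38), truncated: sum_{i >= 0} k_{h+i} eps_{t-i}, with eps_past[i] = eps_{t-i}."""
    n = min(len(eps_past), len(k) - h)
    return float(np.dot(k[h:h + n], eps_past[:n]))

def prediction_error_variance(k, sigma2, h):
    """Variance of sum_{j < h} k_j eps_{t+h-j}, the second part of (37), scalar case."""
    return sigma2 * float(np.sum(np.asarray(k[:h]) ** 2))

# Hypothetical Wold coefficients of y_t = a y_{t-1} + eps_t.
a = 0.8
k = a ** np.arange(200)

rng = np.random.default_rng(0)
eps_past = rng.standard_normal(50)   # eps_t, eps_{t-1}, ..., eps_{t-49}
y_t = float(np.dot(k[:50], eps_past))

y_hat_2 = predictor(k, eps_past, 2)              # equals a**2 * y_t for this model
var_1 = prediction_error_variance(k, 1.0, 1)     # sigma^2
var_2 = prediction_error_variance(k, 1.0, 2)     # sigma^2 (1 + a^2)
```

As $h$ grows, the error variance increases to the variance of $y_t$, in line with the definition of a regular process.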
From (36) we see that every linearly regular process can be interpreted
as the output of a linear system with white noise inputs. Thus the spectral
density $f_y$ of $(y_t)$ exists and is of the form (see (25))
$$f_y(\lambda) = (2\pi)^{-1} k(e^{-i\lambda}) \Sigma k(e^{-i\lambda})^* \tag{39}$$
where
$$k(z) = \sum_{j=0}^{\infty} k_j z^j, \qquad \Sigma = E \varepsilon_t \varepsilon_t'. \tag{40}$$

5 Rational Spectra, ARMA and State Space Systems

From a statistical point of view, AR(X), ARMA(X) and state space systems
are the most important models for stationary processes. The reason is that
for such models only finitely many parameters have to be estimated and that
a large class of linear systems can be approximated by such models. Here, for
simplicity of presentation, we only consider the case of no observed inputs.
Most of the results can be extended to the case of observed inputs in a straightforward
manner. In this section we investigate the relation between the internal parameters (system parameters and possibly the variance covariance
matrix $\Sigma$ of the white noise $(\varepsilon_t)$) and the external behavior (described by the
second moments of the observations $(y_t)$ or the transfer function $k(z)$) of such
systems.
An ARMA system is of the form
$$a(z) y_t = b(z) \varepsilon_t \tag{41}$$
where $z$ is the backward shift operator, $\varepsilon_t$ is the unobserved white noise,
$a(z) = \sum_{j=0}^{p} a_j z^j$, $b(z) = \sum_{j=0}^{q} b_j z^j$, $a_j, b_j \in \mathbb{R}^{s \times s}$ and $(y_t)$ is the (observed)
output process. As is well known, the set of all solutions of a linear difference
equation (41) is of the form one particular solution plus the set of all solutions
of $a(z) y_t = 0$. We are only interested in stationary solutions; they are obtained
by the so-called z-transform. In solving (41), the equation, in a certain sense,
has to be multiplied by the inverse of $a(z)$ from the left. Using the fact that
multiplication of power series in the backward shift and in $z \in \mathbb{C}$ is done in
the same way, we obtain:
Theorem 7. Under the assumption
$$\det a(z) \neq 0, \qquad |z| \le 1 \tag{42}$$
the causal stationary solution of (41) is given by
$$y_t = k(z) \varepsilon_t = \sum_{j=0}^{\infty} k_j \varepsilon_{t-j} \tag{43}$$
where the transfer function is given by
$$k(z) = \sum_{j=0}^{\infty} k_j z^j = a^{-1}(z) b(z) = (\det a(z))^{-1} \operatorname{adj}(a(z)) b(z), \qquad |z| \le 1 \tag{44}$$
Here det and adj denote the determinant and the adjoint respectively.
Condition (42) is called the stability condition. It guarantees that the
norms $\|k_j\|$ in the causal solution converge geometrically to zero. Thus the
ARMA process has an exponentially fading (linear) memory. For actually
determining the $k_j$, the following block recursive linear equation system
$$a_0 k_0 = b_0, \qquad a_0 k_1 + a_1 k_0 = b_1, \quad \dots$$
obtained by a comparison of coefficients in $a(z) k(z) = b(z)$, has to be solved.
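
The block recursion above can be sketched in a few lines of Python with NumPy. This is only an illustration: the function name `arma_impulse_response` and the list-of-matrices layout of the coefficients are assumptions made here, not notation from the text.

```python
import numpy as np

def arma_impulse_response(a, b, n_terms):
    """Solve a_0 k_0 = b_0, a_0 k_1 + a_1 k_0 = b_1, ... for k_0, ..., k_{n_terms-1}.

    a, b: lists holding the (s x s) coefficient matrices a_0..a_p and b_0..b_q
    (scalars are accepted and treated as 1 x 1 matrices).
    """
    a = [np.atleast_2d(np.asarray(m, dtype=float)) for m in a]
    b = [np.atleast_2d(np.asarray(m, dtype=float)) for m in b]
    s = a[0].shape[0]
    k = []
    for j in range(n_terms):
        # Right-hand side b_j (zero beyond the MA order) ...
        rhs = b[j].copy() if j < len(b) else np.zeros((s, s))
        # ... minus the already known terms a_i k_{j-i}, i = 1..min(j, p)
        for i in range(1, min(j, len(a) - 1) + 1):
            rhs -= a[i] @ k[j - i]
        k.append(np.linalg.solve(a[0], rhs))
    return k

# Scalar example (assumed): (1 - 0.5 z) y_t = (1 + 0.4 z) eps_t
k = arma_impulse_response([1.0, -0.5], [1.0, 0.4], 4)
print([float(m[0, 0]) for m in k])
```

For this scalar example the coefficients decay geometrically with ratio 0.5 after the first step, in line with the exponentially fading memory mentioned above.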

If in addition the so-called miniphase condition
$$\det b(z) \neq 0, \qquad |z| < 1 \tag{45}$$
is imposed, then we have $H_y(t) = H_\varepsilon(t)$ and thus the solution (43) is
already the Wold representation (36). Condition (42) sometimes is relaxed to $\det a(z) \neq 0$ for $|z| = 1$. Then there exists a stationary solution
$y_t = \sum_{j=-\infty}^{\infty} k_j \varepsilon_{t-j}$, which in general will not be causal.

A state space system (in innovations form) is given as
$$s_{t+1} = A s_t + B \varepsilon_t \tag{46}$$
$$y_t = C s_t + \varepsilon_t \tag{47}$$
Here $s_t$ is the, in general unobserved, $n$-dimensional state and $A \in \mathbb{R}^{n \times n}$,
$B \in \mathbb{R}^{n \times s}$, $C \in \mathbb{R}^{s \times n}$ are parameter matrices.
For state space systems, the stability condition (42) is of the form
$$|\lambda_{\max}(A)| < 1 \tag{48}$$
where $\lambda_{\max}(A)$ denotes an eigenvalue of $A$ of maximum modulus. The steady
state solution then is of the form
$$y_t = (C (I z^{-1} - A)^{-1} B + I) \varepsilon_t. \tag{49}$$
Here the coefficients of the transfer function are given as $k_j = C A^{j-1} B$ for
$j > 0$.
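
A minimal numerical sketch of the stability condition (48) and of the coefficients $k_j = C A^{j-1} B$ is given below. The helper names `is_stable` and `markov_parameters` are hypothetical and chosen here for illustration only; $k_0 = I$ follows from (47).

```python
import numpy as np

def is_stable(A):
    """Check condition (48): all eigenvalues of A strictly inside the unit circle."""
    return bool(np.max(np.abs(np.linalg.eigvals(A))) < 1.0)

def markov_parameters(A, B, C, n_terms):
    """Return k_0 = I and k_j = C A^(j-1) B for j = 1, ..., n_terms - 1."""
    s = C.shape[0]
    k = [np.eye(s)]
    A_power = np.eye(A.shape[0])      # holds A^(j-1)
    for _ in range(1, n_terms):
        k.append(C @ A_power @ B)
        A_power = A_power @ A
    return k

# Scalar example (assumed): n = s = 1
A = np.array([[0.5]])
B = np.array([[0.9]])
C = np.array([[1.0]])
print(is_stable(A), [float(m[0, 0]) for m in markov_parameters(A, B, C, 4)])
```

In this scalar example the transfer function is $1 + 0.9z/(1 - 0.5z) = (1 + 0.4z)/(1 - 0.5z)$, so the same coefficients are obtained from the ARMA system $(1 - 0.5z) y_t = (1 + 0.4z)\varepsilon_t$.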
The miniphase condition
$$|\lambda_{\max}(A - BC)| \le 1 \tag{50}$$
then guarantees that (49) corresponds to the Wold representation (36). Note that
(43) and (49) define causal linear processes, with a spectral density given by
(39). Clearly the transfer functions of both ARMA and state space solutions
are rational and so are their spectral densities. The following theorem clarifies the relation between rational spectral densities, ARMA and state space
systems:
Theorem 8.
1. Every rational and $\lambda$-a.e. nonsingular spectral density $f_y$ can
be uniquely factorized (as in (39)) such that $k(z)$ is rational, analytic within a circle containing the closed unit disk, $\det k(z) \neq 0$ for $|z| < 1$, $k(0) = I$
and $\Sigma > 0$;
2. For every rational transfer function $k(z)$ with the properties given in (1),
there is a stable and miniphase ARMA system with $a_0 = b_0$ and conversely, every such ARMA system has a rational transfer function with the
properties given in (1);
3. For every rational transfer function $k(z)$ with the properties given in (1),
there is a stable and miniphase state space system and conversely, every
such state space system has a rational transfer function with the properties
given in (1).
Thus, in particular, (stable and causal) ARMA systems (with $a_0 = b_0$) and (stable
and causal) state space systems represent the same class of transfer functions
or spectral densities.
Now we consider the inverse problem of finding an ARMA or state space
system from the spectral density, or, equivalently, from the transfer function.
From now on we assume throughout that the stability and the miniphase
conditions hold. Two ARMA systems $(a, b)$ and $(\bar{a}, \bar{b})$, say, are called observationally equivalent if they have the same transfer function (and thus, for given
$\Sigma$, the same second moments of the solution), i.e. if $a^{-1} b = \bar{a}^{-1} \bar{b}$ holds. Observational equivalence for state space systems is defined analogously. Now,
in general, $(a, b)$ is not uniquely determined from $k = a^{-1} b$. Let us assume
that $(a, b)$ is relatively left prime, i.e. that every common left (polynomial
matrix) divisor $u$ of $(a, b)$ is unimodular, i.e. $\det u(z) = \mathrm{const} \neq 0$ holds. Here
a polynomial matrix $u$ is called a common left divisor of $(a, b)$ if there exist
polynomial matrices $(\bar{a}, \bar{b})$ such that $(a, b) = u(\bar{a}, \bar{b})$ holds. In a certain sense,
relative left primeness excludes redundant ARMA systems.
Then we have:
Theorem 9. Let $(a, b)$ and $(\bar{a}, \bar{b})$ be relatively left prime; then $(a, b)$ and $(\bar{a}, \bar{b})$
are observationally equivalent if and only if there exists a unimodular matrix $u$
such that
$$(a, b) = u(\bar{a}, \bar{b}) \tag{51}$$
holds.
A state space system $(A, B, C)$ is called minimal if the state dimension $n$
is minimal among all state space systems corresponding to the same transfer
function. This is the case if and only if the observability matrix
$$O_n = (C', A'C', \dots, (A')^{n-1} C')$$
and the controllability matrix
$$C_n = (B, AB, \dots, A^{n-1} B)$$
both have rank $n$. Minimality, too, is a requirement of nonredundancy. We
have:
Theorem 10. Two minimal state space systems $(A, B, C)$ and $(\bar{A}, \bar{B}, \bar{C})$ are
observationally equivalent if and only if there exists a nonsingular matrix
$T \in \mathbb{R}^{n \times n}$ such that
$$A = T \bar{A} T^{-1}, \qquad B = T \bar{B}, \qquad C = \bar{C} T^{-1}$$
holds.
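
The rank test for minimality and the invariance of the transfer function under a change of state basis can be checked numerically. The sketch below is an illustration under assumptions: the function names and the example matrices are invented here, and the observability matrix is built by stacking $C, CA, \dots$ row-wise, which is the transpose of the matrix $O_n$ above and has the same rank.

```python
import numpy as np

def observability_matrix(A, C):
    """Stack C, CA, ..., CA^(n-1); same rank as (C', A'C', ..., (A')^(n-1) C')."""
    n = A.shape[0]
    return np.vstack([C @ np.linalg.matrix_power(A, i) for i in range(n)])

def controllability_matrix(A, B):
    """Build (B, AB, ..., A^(n-1) B)."""
    n = A.shape[0]
    return np.hstack([np.linalg.matrix_power(A, i) @ B for i in range(n)])

def is_minimal(A, B, C):
    n = A.shape[0]
    rank_o = np.linalg.matrix_rank(observability_matrix(A, C))
    rank_c = np.linalg.matrix_rank(controllability_matrix(A, B))
    return bool(rank_o == n and rank_c == n)

def transfer_coefficient(A, B, C, j):
    """k_j = C A^(j-1) B for j >= 1."""
    return C @ np.linalg.matrix_power(A, j - 1) @ B

# Assumed example system and an arbitrary nonsingular T
A = np.array([[0.5, 0.1], [0.0, 0.3]])
B = np.array([[1.0], [1.0]])
C = np.array([[1.0, 0.0]])
T = np.array([[2.0, 1.0], [0.0, 1.0]])
T_inv = np.linalg.inv(T)

# Solve A = T A_bar T^{-1}, B = T B_bar, C = C_bar T^{-1} for the barred system
A_bar, B_bar, C_bar = T_inv @ A @ T, T_inv @ B, C @ T
print(is_minimal(A, B, C), is_minimal(A_bar, B_bar, C_bar))
```

Both systems are minimal and give identical coefficients $k_j$, so they cannot be distinguished from the second moments of the output.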
A class of ARMA or state space systems is called identifiable if it contains no distinct observationally equivalent systems. Of course identifiability
is a desirable property, because it attaches to a given spectral density or a
given transfer function a unique ARMA or state space system. In general
terms, identifiability is obtained by selecting representatives from the classes
of observationally equivalent systems. In addition, from an estimation point
of view, subclasses of the class of all ARMA or state space systems, leading
to finite dimensional parameter spaces and to a continuous dependence of the
parameters on the transfer function (for details see e.g. [5]), are preferred.
As an example consider the set of ARMA systems $(a, b)$ where (42), (45)
and $a_0 = b_0 = I$ hold, which are relatively left prime and where the degrees
of $a(z)$ and $b(z)$ are both $p$ and where $(a_p, b_p)$ has rank $s$. We denote the set
of all corresponding $\mathrm{vec}(a_1, \dots, a_p, b_1, \dots, b_p)$, where vec means stacking
the columns of the respective matrix, by $T_{p,p}$. As can be shown, $T_{p,p}$ contains
a nontrivial open subset of $\mathbb{R}^{2ps^2}$ and is identifiable, since under these conditions
(51) implies that $u$ must be the identity matrix; thus $T_{p,p}$ is a reasonable parameter space. In this setting a system is described by the integer
valued parameter $p$ and by the real valued parameters in $\mathrm{vec}(a_1, \dots, b_p)$. For
the description of $f_y$, of course, $\Sigma$ is also needed. Let $U_{p,p}$ denote the set of all
transfer functions $k$ corresponding to $T_{p,p}$ via (44). Then due to identifiability
there exists a mapping $\psi : U_{p,p} \to T_{p,p}$ attaching to the transfer functions the
corresponding ARMA parameters. Such a mapping is called a parameterization.
A disadvantage of the specific approach described above is that for $s > 1$ not
every transfer function corresponding to an ARMA system can be described
in this way, i.e. there are $k$ for which there is no $p$ such that $k \in U_{p,p}$.
For a general account of parameter spaces for and parameterizations of
ARMA and state space systems we refer to [8], [4] and [5].

6 The Relation to System Identification

In system identification, the task is to find a good model from data. The
approach we have in mind here is semi-nonparametric in the sense that identification can be decomposed into the following two steps, see [5]:
1. Model selection: Here we commence from the original model class, i.e. the
class of all a priori candidate systems, for instance the class of all ARMA
systems (41), for given $s$ and for arbitrary $p$ and $q$. The task then is to find
a reasonable subclass from the data, such as the class $T_{p,p}$ described in
the last section; typically estimation of the subclass consists in estimation
of integers, such as $p$ for $T_{p,p}$, e.g. by information criteria such as AIC or
BIC, see e.g. [8].
2. Estimation of real valued parameters: Here, for a given subclass, the system parameters such as $\mathrm{vec}(a_1, \dots, b_p)$ for $T_{p,p}$ and the variance covariance matrix $\Sigma$ are estimated. As has been mentioned already in the
previous section, the subclasses are chosen in such a way that they can be
described by finite dimensional parameter spaces.
It should be noted that in most cases only parameters describing the second
moments (the spectral density) of $(y_t)$ are estimated; accordingly estimation
of moments of order greater than two, which is of interest in some applications,
is not considered here.
For the AR(X) case, i.e. when $b(z) = I$ holds on the r.h.s. of (32), identification is much simpler compared to the ARMA(X) or state space case.
This is the reason why AR(X) models still dominate in many applications,
despite the fact that they are less flexible, so that more parameters may be
needed for modeling. Again we restrict ourselves to the case of no observed
inputs. Once the maximum lag $p$ has been determined, assuming $a_0 = I$,
the parameter space $T_p = \{\mathrm{vec}(a_1, \dots, a_p) \in \mathbb{R}^{s^2 p} \mid \det a(z) \neq 0, \ |z| \le 1\}$
is identifiable. Ordinary least squares type estimators (such as Yule–Walker
estimators) can be shown to be consistent and asymptotically efficient under
general conditions on the one hand and are easy to calculate on the other
hand.
For ARMA and state space systems identification is more complicated for
two reasons:
1. The maximum likelihood estimators (for the real valued parameters) are
in general not explicitly given, but have to be determined by numerical
optimization procedures.
2. Parameter spaces and parameterizations are more complicated.
In this case, for a full understanding of identification procedures an analysis of
topological and geometric properties of parameter spaces and parameterizations is needed. For instance, as shown in [8], the MLEs of the transfer function
are consistent; thus continuity of the parameterization guarantees consistency
of the parameter estimators. The importance of such structural properties for
identification is discussed in detail in [8], [4] and [5].

References
1. G. Box, G. Jenkins, Time Series Analysis. Forecasting and Control, Holden
Day, San Francisco, 1970.
2. P. E. Caines, Linear Stochastic Systems, John Wiley & Sons, New York, 1988.
3. H. Davis, The Analysis of Economic Time Series, Principia Press, Bloomington, 1941.
4. M. Deistler, Identification of Linear Dynamic Multiinput/Multioutput Systems, in D. Peña et al. (eds.), A Course in Time Series Analysis, John Wiley
& Sons, New York, 2001.
5. M. Deistler, System Identification - General Aspects and Structure, in
G. Goodwin (ed.), System Identification and Adaptive Control, (Festschrift for
B.D.O. Anderson), Springer, London, pp. 3–26, 2001.
6. M. Deistler, System Identification and Time Series Analysis: Past, Present
and Future, in B. Pasik-Duncan (ed.), Stochastic Theory and Control, Springer,
Kansas, USA, pp. 97–108. (Festschrift for Tyrone Duncan), 2002.
7. E. Hannan, Multiple Time Series, Wiley, New York, 1970.
8. E. Hannan, M. Deistler, The Statistical Theory of Linear Systems, John
Wiley & Sons, New York, 1988.
9. L. Ljung, System Identification: Theory for the User, Prentice Hall, Englewood
Cliffs, 1987.
10. M. Pourahmadi, Foundations of Time Series Analysis and Prediction Theory,
Wiley, New York, 2001.
11. G. Reinsel, Elements of Multivariate Time Series Analysis, Springer Verlag,
New York, 1993.
12. Y. A. Rozanov, Stationary Random Processes, Holden Day, San Francisco,
1967.
13. T. Söderström, P. Stoica, System Identification, Prentice Hall, New York,
1989.

Parametric Spectral Estimation and Data Whitening
Elena Cuoco
INFN, Sezione di Firenze, Via G. Sansone 1, 50019 Sesto Fiorentino (FI),
present address: EGO, via Amaldi, Santo Stefano a Macerata, Cascina (PI)
cuoco@fi.infn.it, elena.cuoco@ego-gw.it

Abstract
Knowledge of the noise Power Spectral Density is fundamental in signal
processing, both for detection algorithms and for the analysis of the data. In this
lecture we address both the problem of identifying the noise Power Spectral
Density of a physical system using parametric techniques and the problem of
whitening the data sequence in the time domain.

1 Introduction
In the detection of signals buried in noisy data, it is necessary to know the
Power Spectral Density (PSD) $S(\omega)$ of the detector noise in order
to be able to apply the Wiener filter [9]. By the theory of optimal filtering
for signals buried in stationary Gaussian noise [9, 10], if we are looking for
a signal of known waveform with unknown parameters, the optimal filter is
given by the Wiener matched filter in the frequency domain
$$C(\theta) = \int \frac{x(\omega) h^*(\omega, \theta)}{S(\omega)} \, d\omega, \tag{1}$$
where $h(\omega, \theta)$ is the template of the signal we are looking for, $\theta$ are the parameters of the waveform and $x(\omega)$ is the Fourier transform of our sequence of
data $x[n]$.
We can implement the Wiener filter in the frequency domain and, supposing the noise is stationary, we can estimate the PSD, for example, as a
windowed averaged periodogram:
$$P_{PER}(\omega) = \frac{1}{N} \left| \sum_{n=0}^{N-1} x[n] \exp(-2\pi i \omega n) \right|^2. \tag{2}$$

J.-D. Fournier et al. (Eds.): Harm. Analysis and Ratio. Approx., LNCIS 327, pp. 181191, 2006.
Springer-Verlag Berlin Heidelberg 2006


$$P_{\mathrm{medio}}(\omega) = \frac{1}{K} \sum_{m=0}^{K-1} P_{PER}^{(m)}(\omega) = S(\omega). \tag{3}$$
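
The averaging in (2) and (3) can be sketched as follows. This is a minimal illustration, not the procedure used by the author: the function name `averaged_periodogram`, the use of non-overlapping segments and the window normalization are assumptions made here. With the default rectangular window the normalization reduces to the factor $1/N$ of (2).

```python
import numpy as np

def averaged_periodogram(x, segment_length, window=None):
    """Average the periodograms of K non-overlapping segments of length N."""
    x = np.asarray(x, dtype=float)
    n_segments = len(x) // segment_length
    if window is None:
        window = np.ones(segment_length)       # rectangular window
    norm = np.sum(window ** 2)                 # equals N for the rectangular window
    psd = np.zeros(segment_length)
    for m in range(n_segments):
        seg = x[m * segment_length:(m + 1) * segment_length] * window
        psd += np.abs(np.fft.fft(seg)) ** 2 / norm   # periodogram of segment m
    return psd / n_segments

# Usage on simulated unit-variance white noise (assumed test signal)
rng = np.random.default_rng(0)
x = rng.standard_normal(256 * 160)
psd = averaged_periodogram(x, 256)
print(psd.mean())
```

For white noise the estimate fluctuates around a constant level equal to the variance; averaging over more segments reduces the fluctuations at the price of frequency resolution.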

Sometimes it can be useful to implement the Wiener filter in the time domain
[1, 3]. In this case what we perform in the time domain is the so-called whitening
procedure, i.e. we estimate the filter which fits our PSD and we use the filter
parameters to perform the division by the PSD of the Wiener filter in the time
domain.
This procedure can help if we know that our noise is not stationary.
In that case we could use an adaptive whitening filter in the time domain, but it is
beyond the scope of this lecture to go into the theory of adaptive filters
[6, 5, 11].

2 Parametric modeling for Power Spectral Density: ARMA and AR models
The advantages of parametric modeling with respect to the classical spectral
methods are described exhaustively in reference [4]. We focus on
rational functions in the field of parametric estimation, because they offer
the possibility of building a stable whitening filter in the time domain.
What we want to parametrize is the transfer function of our physical system: a linear system can be modeled as an object which transforms an input
sequence $w(z)$ into the output $x(z)$ by the transfer function $H(z)$ (see Fig. 1).

[Figure: block diagram with input w(z), block H(z) and output x(z)]
Fig. 1. Linear model

If the transfer function is of the form
$$H(z) = \frac{B(z)}{A(z)} \tag{4}$$
the system is said to have a rational transfer function.

In particular a general process described by an ARMA$(p, q)$ model satisfies
the relation
$$x[n] = -\sum_{k=1}^{p} a_k x[n-k] + \sum_{k=0}^{q} b_k w[n-k] \tag{5}$$
in the time domain, and its transfer function, in the $z$-domain, is given by
$H(z) = B(z)/A(z)$, where $A(z) = \sum_{k=0}^{p} a_k z^{-k}$ (with $a_0 = 1$) represents the autoregressive
(AR) part and $B(z) = \sum_{k=0}^{q} b_k z^{-k}$ the moving average (MA) part.
An AR model is called an all-pole model, while the MA one is called an all-zero
model. Some physical systems are well described by an ARMA model,
others by an AR model and others by an MA one.
If we want to model our physical process with a parametric one, we have
to choose the appropriate model and then we have to estimate its parameters.
The parameters of an ARMA model are linked to the autocorrelation function of the system $r_{xx}[n]$. So we have to estimate it before determining the $a_k$ or
$b_k$ parameters. The relation between these parameters and the autocorrelation
function is given by the Yule–Walker equations.
2.1 The Yule–Walker equations
The parameters of the ARMA model are linked to the autocorrelation function
of the process by the Yule–Walker equations [4].
One way to derive the Yule–Walker equations is to make the correlation function
$r_{xx}[k]$ appear on the left-hand side of equation (5). To do this, we simply
multiply equation (5) by $x^*[n-k]$ and take the expectation value on both
sides.
We obtain the relation
$$r_{xx}[k] = -\sum_{l=1}^{p} a_l r_{xx}[k-l] + \sum_{l=0}^{q} b_l r_{xw}[k-l], \tag{6}$$
where $r_{xw}[k]$ is the cross correlation between the output $x[n]$ and the driving
noise $w[n]$. Let $h[l]$ be the taps of the filter $H(z)$; the filter being causal, we
can write the output as $x[n] = \sum_{l=-\infty}^{n} h[n-l] w[l]$. It is evident that $r_{xw}[k] = 0$
for $k > 0$, since the output depends only on the driving input at steps $l \le n$.
Noting that
$$r_{xw}[k] = E\{x^*[n] w[n+k]\} = E\Big\{\sum_{l=-\infty}^{n} h^*[n-l] w^*[l] w[n+k]\Big\} = \sigma^2 h^*[-k],$$
($\sigma$ is the amplitude of the driving white noise) we can write the Yule–Walker
equations in the following way
$$r_{xx}[k] = \begin{cases} -\sum_{l=1}^{p} a_l r_{xx}[k-l] + \sigma^2 \sum_{l=0}^{q-k} h^*[l] b_{l+k} & \text{for } k = 0, 1, \dots, q \\ -\sum_{l=1}^{p} a_l r_{xx}[k-l] & \text{for } k \ge q+1. \end{cases} \tag{7}$$
In the general case of an ARMA process we must solve a set of nonlinear
equations while, if we specialize to an AR process (that is, an all-pole model),
the equations to be solved to find the AR parameters become linear.

The relationship between the parameters of the AR model and the autocorrelation function $r_{xx}[n]$ is given by the Yule–Walker equations written in
the form
$$r_{xx}[k] = \begin{cases} -\sum_{l=1}^{p} a_l r_{xx}[k-l] & \text{for } k \ge 1 \\ -\sum_{l=1}^{p} a_l r_{xx}[-l] + \sigma^2 & \text{for } k = 0. \end{cases} \tag{8}$$
In the following, we specialize the discussion to AR estimation, since
we are looking for a way to build a stable whitening filter in the time domain,
and the AR filter gives us the solution.

3 AR and whitening process

An AR($P$) process is identified by the relation
$$x[n] = -\sum_{k=1}^{P} a_k x[n-k] + w[n], \tag{9}$$
$w[n]$ being the driving white noise.
The tight relation between the AR filter and the whitening filter is clear
in Figure 2. The figure describes the scheme of an AR filter. The AR filter
colors the white process $w[n]$ at the input of the filter (look at the picture from
left to right). If you look at the picture from right to left you see a colored
process at the input which passes through the inverse AR filter, coming out
as a white process.

[Figure: AR filter block diagram with input w[n], output x[n], a chain of delays z^{-1} producing x[n-1], x[n-2], ..., x[n-p], and feedback taps up to a_p]
Fig. 2. Link between AR filter and whitening filter

Suppose you have a sequence $x[n]$ of data which is characterized by an
autocorrelation $r_{xx}[n]$ which is not a delta function, and that you need to
remove all the correlation, making it a white process (see refs. [1, 2, 3] for
applications of whitening in real cases). The idea is to model $x[n]$ as an AR
process, find the AR parameters and use them to whiten the process.
Since we want to deal with real physical problems we have to assume the
causality of the filter. So when we whiten the data we must ensure that
causality is preserved. Moreover we must have a stable filter to avoid
divergences in the application of this filter to the data. In the next section
we will show how estimating the AR parameters ensures the causality and
stability of the whitening filter in the time domain.
3.1 Minimum phase filter and stability
The necessary condition to have a stable and causal filter $H(z)$ is that
all the poles of the filter are inside the unit circle of the $z$-plane [4, 8]. This filter
is called a minimum-phase filter. A completely anticausal filter will have all its
poles outside the unit circle. This filter is called a maximum-phase filter.
If a system has poles or zeros outside the unit circle, it can be made minimum phase by moving the poles and zeros $z_0$ inside the unit circle. For example, if
we want to move a zero inside the unit circle we have to multiply the function
by the term
$$\frac{z^{-1} - z_0^*}{1 - z_0 z^{-1}}. \tag{10}$$
This will alter the phase, but not the magnitude of the transfer function.
If we want to find the filter $H(z)$ of a linear system for a random process
with given PSD $S(z)$ (the complex PSD), which satisfies the minimum phase
condition, we must perform a spectral factorization (see [8]) into causal and
anti-causal components:
$$S(z) = H(z) H(z^{-1}). \tag{11}$$
We can perform this operation in an alternative way. We can find a rational function fit to the PSD with polynomials that are minimum phase. A
minimum-phase polynomial is one that has all of its zeros and poles strictly
inside the unit circle. If we consider a rational function $H(z) = B(z)/A(z)$,
both $B(z)$ and $A(z)$ must be minimum-phase polynomials.
If we restrict ourselves to an AR fit, we are looking for a polynomial $A(z)$ which is
minimum phase. The AR estimation algorithm we choose will ensure that
this condition is always satisfied.
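
The effect of the factor (10) can be checked numerically on the unit circle. The sketch below is an illustration under assumptions: the helper names `reflect_zero` and `allpass_factor` and the example zero $z_0 = 1.5 + i$ are chosen here and do not come from the text.

```python
import numpy as np

def reflect_zero(z0):
    """Location of the zero after reflection through the unit circle: 1 / conj(z0)."""
    return 1.0 / np.conj(z0)

def allpass_factor(z, z0):
    """Evaluate the term (z^-1 - conj(z0)) / (1 - z0 z^-1) of eq. (10)."""
    z_inv = 1.0 / z
    return (z_inv - np.conj(z0)) / (1.0 - z0 * z_inv)

# Points on the unit circle
omega = np.linspace(0.0, 2.0 * np.pi, 512, endpoint=False)
z = np.exp(1j * omega)

z0 = 1.5 + 1.0j                      # assumed zero outside the unit circle
H = 1.0 - z0 / z                     # first-order factor with a zero at z0
H_min = H * allpass_factor(z, z0)    # same magnitude, zero moved to 1 / conj(z0)

print(np.max(np.abs(np.abs(H) - np.abs(H_min))), abs(reflect_zero(z0)))
```

The magnitudes of the two responses agree up to rounding, while the zero of the modified factor lies at $1/z_0^*$, inside the unit circle.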

4 AR parameters estimation
There are different algorithms to estimate the AR parameters of a process
which we assume can be modeled as an autoregressive one, for example the
Levinson, Durbin or Burg ones [4, 8]. We are looking for the parameters of a
transfer function which models a real physical system.
We can show that the problem of determining the AR parameters is the same
as that of finding the optimal weight vector $\mathbf{w} = \{w_k\}$, $k = 1, \dots, P$,
for the problem of Linear Prediction [4]. In Linear Prediction we want to
predict the sample $x[n]$ using the $P$ previously observed data $\mathbf{x}[n] = \{x[n-1],
x[n-2], \dots, x[n-P]\}$, building the estimate $\hat{x}[n]$ as a transversal filter:
$$\hat{x}[n] = \sum_{k=1}^{P} w_k x[n-k]. \tag{12}$$
We choose the coefficients of the Linear Predictor filter by minimizing a
cost function that is the mean square error $\epsilon = E[e[n]^2]$, where
$$e[n] = x[n] - \hat{x}[n] \tag{13}$$
is the error we make in this prediction, obtaining the so-called Normal or Wiener–Hopf equations
$$\epsilon_{\min} = r_{xx}[0] - \sum_{k=1}^{P} w_k r_{xx}[k], \tag{14}$$
which are identical to the Yule–Walker equations with
$$w_k = -a_k \tag{15}$$
$$\epsilon_{\min} = \sigma^2. \tag{16}$$
This equivalence between the AR model and linear prediction
assures us that we obtain a filter which is stable and causal [4].
It is possible to show that an equivalent representation for an AR process
is based on the value of the autocorrelation function at lag 0 and a set of coefficients called reflection coefficients or parcor (partial correlation) coefficients
$k_p$, $p = 1, \dots, P$, $P$ being the order of our model. The $k$-th reflection coefficient is the partial correlation coefficient between $x[n]$ and $x[n-k]$, when the
dependence on the samples in between has been removed.
We report hereafter the procedure to estimate the reflection coefficients
and the AR parameters using the Levinson–Durbin algorithm.
The algorithm proceeds in the following way:
• Initialize the mean square error as $\epsilon_0 = r_{xx}[0]$.
• Introduce the reflection coefficients $k_p$, linked to the partial correlation
between $x[n]$ and $x[n-p]$ [8]:
$$k_p = -\frac{1}{\epsilon_{p-1}} \left[ r_{xx}[p] + \sum_{j=1}^{p-1} a_j^{(p-1)} r_{xx}[p-j] \right]. \tag{17}$$
• At stage $p$ the last parameter of the model is equal to the $p$-th reflection
coefficient
$$a_p^{(p)} = k_p. \tag{18}$$
• The other parameters are updated in the following way: for $1 \le j \le p-1$
$$a_j^{(p)} = a_j^{(p-1)} + k_p a_{p-j}^{(p-1)} \tag{19}$$
$$\epsilon_p = (1 - k_p^2) \epsilon_{p-1} \tag{20}$$
• At the end of the $p$ loop, when $p = P$, the final AR parameters are
$$a_j = a_j^{(P)}, \qquad \sigma^2 = \epsilon_P. \tag{21}$$
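
The recursion (17) to (21) can be sketched as below. This is an illustrative implementation under stated assumptions: it uses the sign convention $x[n] = -\sum_k a_k x[n-k] + w[n]$ (so that the update is the one with the plus sign, as in (24)), real-valued data, and a hypothetical function name `levinson_durbin`.

```python
import numpy as np

def levinson_durbin(r, order):
    """Levinson-Durbin recursion on autocorrelation values r[0..order].

    Assumes real data and the convention x[n] = -sum_k a_k x[n-k] + w[n].
    Returns (a_1..a_P, reflection coefficients k_1..k_P, sigma^2).
    """
    r = np.asarray(r, dtype=float)
    a = np.zeros(order + 1)
    a[0] = 1.0
    k = np.zeros(order)
    eps = r[0]                                   # epsilon_0 = r_xx[0]
    for p in range(1, order + 1):
        # eq. (17): r[p] + sum_{j=1}^{p-1} a_j^(p-1) r[p-j]
        acc = r[p] + np.dot(a[1:p], r[p - 1:0:-1])
        kp = -acc / eps
        a_prev = a.copy()
        a[p] = kp                                # eq. (18)
        for j in range(1, p):
            a[j] = a_prev[j] + kp * a_prev[p - j]    # eq. (19)
        eps = (1.0 - kp ** 2) * eps              # eq. (20)
        k[p - 1] = kp
    return a[1:], k, eps                         # eq. (21)

# Usage on the exact autocorrelation of an assumed AR(1) process
# x[n] = 0.5 x[n-1] + w[n] with sigma^2 = 1, i.e. a_1 = -0.5.
r = (1.0 / (1.0 - 0.25)) * 0.5 ** np.arange(3)
a_hat, k_hat, sigma2_hat = levinson_durbin(r, 2)
print(a_hat, k_hat, sigma2_hat)
```

Fitting an order-2 model to this order-1 process returns a vanishing second reflection coefficient, which is the usual sign that the extra stage is not needed.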

5 The whitening filter in the time domain

We can use the reflection coefficients in implementing the whitening filter [2,
3] in a lattice structure. Let us suppose we have a stationary Gaussian stochastic
process $x[n]$ which we model as an autoregressive process of
order $P$. Remember that an AR model can be viewed as a linear prediction
problem. In this context we can define the forward prediction error (FPE) for the filter
of order $P$ in the following way
$$e_P^f[n] = x[n] + \sum_{k=1}^{P} a_k^{(P)} x[n-k], \tag{22}$$
where the coefficients $a_k$ are the coefficients of the AR model for the process
$x[n]$. The FPE represents the output of our filter. We can write the $z$-transform
of the FPE at each stage $p$ for the filter of order $P$ as
$$FPE(z) = F_p^f[z] X[z] = \left( 1 + \sum_{j=1}^{p} a_j^{(p)} z^{-j} \right) X[z]. \tag{23}$$
At each stage $p$ of the Durbin algorithm the coefficients $a_j^{(p)}$ are updated as
$$a_j^{(p)} = a_j^{(p-1)} + k_p a_{p-j}^{(p-1)}, \qquad 1 \le j \le p-1. \tag{24}$$
If we use the above relation for the transform $F_p^f[z]$, we obtain
$$F_p^f[z] = F_{p-1}^f[z] + k_p \left[ z^{-p} + \sum_{j=1}^{p-1} a_{p-j}^{(p-1)} z^{-j} \right]. \tag{25}$$
Now we introduce in a natural way the backward prediction error (BPE)
$$F_{p-1}^b[z] = z^{-(p-1)} + \sum_{j=1}^{p-1} a_{p-j}^{(p-1)} z^{-(j-1)}. \tag{26}$$
In order to understand the meaning of $F_p^b[z]$ let us see its action in the time
domain
$$F_{p-1}^b[z] x[n] = e_{p-1}^b[n] = x[n-p+1] + \sum_{j=1}^{p-1} a_{p-j}^{(p-1)} x[n-j+1]. \tag{27}$$
So $e_{p-1}^b[n]$ is the error we make, in a backward way, in the prediction of the
data $x[n-p+1]$ using the $p-1$ successive data $\{x[n], x[n-1], \dots, x[n-p+2]\}$.
We can write eq. (25) using $F_{p-1}^b[z]$. Substituting this relation in the
$z$-transform of the filter $F_p^f[z]$ gives
$$F_p^f[z] = F_{p-1}^f[z] + k_p z^{-1} F_{p-1}^b[z]. \tag{28}$$
In order to know the FPE filter at stage $p$ we must know the BPE filter
at stage $p-1$.
For the backward error we may write in a similar way the relation
$$F_p^b[z] = z^{-1} F_{p-1}^b[z] + k_p F_{p-1}^f[z]. \tag{29}$$
Equations (28) and (29) represent our lattice filter, which in the time domain
can be written
$$e_p^f[n] = e_{p-1}^f[n] + k_p e_{p-1}^b[n-1], \tag{30}$$
$$e_p^b[n] = e_{p-1}^b[n-1] + k_p e_{p-1}^f[n]. \tag{31}$$

ebp [n] = ebp1 [n 1] +

Figure 3 shows how the lattice structure is used to estimate the forward
and backward errors.

[Figure: single lattice stage mapping e_p^f(n), e_p^b(n) to e_{p+1}^f(n), e_{p+1}^b(n) through a delay z^{-1} and the coefficient k_{p+1}(n), and the cascade of stages 1, 2, ..., P from the input x(n) to the outputs e_P^f(n), e_P^b(n)]
Fig. 3. Lattice structure for Durbin filter.

Using a lattice structure we can implement the whitening filter following
these steps:
• estimate the values of the autocorrelation function $r_{xx}[k]$, $0 \le k \le P$, of
our process $x[n]$;
• use the Durbin algorithm to find the reflection coefficients $k_p$, $1 \le p \le P$;
• implement the lattice filter with these coefficients $k_p$, initializing the
filter with $e_0^f[n] = e_0^b[n] = x[n]$.
In this way the forward error at the $P$-th stage is equivalent to the forward
error of a transversal filter and represents the output of the whitening filter.
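
The lattice recursion (30), (31) and its equivalence with the transversal filter (22) can be sketched as follows. The function names, the zero initial state (samples before the first one are taken to be zero) and the example reflection coefficients are assumptions made for this illustration.

```python
import numpy as np

def reflection_to_ar(k):
    """Step-up recursion: a_p^(p) = k_p, a_j^(p) = a_j^(p-1) + k_p a_{p-j}^(p-1)."""
    a = np.zeros(0)
    for kp in k:
        a = np.concatenate((a + kp * a[::-1], [kp]))
    return a

def lattice_whiten(x, k):
    """Forward prediction error e_P^f[n] computed stage by stage, eqs. (30)-(31)."""
    x = np.asarray(x, dtype=float)
    ef = x.copy()                       # e_0^f[n] = x[n]
    eb = x.copy()                       # e_0^b[n] = x[n]
    for kp in k:
        # e_{p-1}^b[n-1], with zero assumed before the first sample
        eb_delayed = np.concatenate(([0.0], eb[:-1]))
        ef, eb = ef + kp * eb_delayed, eb_delayed + kp * ef
    return ef

# Compare with the transversal filter x[n] + sum_k a_k x[n-k]
rng = np.random.default_rng(1)
x = rng.standard_normal(200)
k = [-0.5, 0.3, 0.1]                    # assumed reflection coefficients
a = reflection_to_ar(k)
e_lattice = lattice_whiten(x, k)
e_transversal = np.convolve(x, np.concatenate(([1.0], a)))[:len(x)]
print(np.max(np.abs(e_lattice - e_transversal)))
```

One practical reason for preferring the lattice form is that the stages are decoupled: adding a stage leaves the earlier reflection coefficients unchanged.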

6 An example of whitening
We will show an example of the whitening procedure in the time domain. First of
all, we generate a stochastic process in the time domain, using an AR(4) model
with the values for the parameters reported in Table 1.

σ       a1          a2          a3          a4
0.01    0.326526    0.338243    0.143203    0.101489

Table 1. Parameters for the simulated noise

These parameters describe a transfer function which is stable and causal,
since all the poles are inside the unit circle (see Fig. 4). We perform an AR(4)
fit to this noise, estimating the reflection coefficients for the whitening filter
and the AR parameters for the PSD fit. We obtain a stable and causal fitted
filter. In fact, in Figure 4 we report in the complex plane the poles obtained
using the Durbin algorithm: they are all inside the unit circle.

[Figure: two complex-plane panels, "AR simulated poles" and "Poles of the AR estimated model"]
Fig. 4. Poles for the simulated AR model and poles for the estimated AR fit

In Figure 5 we report the PSD of this noise process and the AR fit. It is
evident that the fit reproduces the features of the noise (for realistic examples
see [1, 2, 3]).

[Figure: log-log plot of S(f) versus f, curves "simulated noise" and "AR fit"]
Fig. 5. Simulated power spectral density and AR(4) fit

In Figure 6 we show the PSD of the simulated noise process and the PSD
of the output of the whitening filter applied in the time domain, using the
estimated reflection coefficients.

[Figure: log-log plot of S(f) versus f, curves "Simulated process" and "Whitened process"]
Fig. 6. Simulated power spectral density and PSD of the whitened data

The PSD of the output of the whitening filter, as we expected, is flat.
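
An experiment of this kind can be reproduced in outline as follows. This is only a sketch under assumptions and not the author's code: the coefficient magnitudes follow Table 1 but their signs, the convention $x[n] = -\sum_k a_k x[n-k] + w[n]$, the sample size and the direct solution of the Yule–Walker equations (instead of the Durbin recursion) are choices made here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Magnitudes as in Table 1; the signs and the convention
# x[n] = -sum_k a_k x[n-k] + w[n] are assumptions of this sketch.
a_true = np.array([0.326526, 0.338243, 0.143203, 0.101489])
sigma = 0.01
N, P = 50000, 4

# Simulate the AR(4) process driven by white Gaussian noise
w = sigma * rng.standard_normal(N)
x = np.zeros(N)
for n in range(N):
    past = x[max(n - P, 0):n][::-1]             # x[n-1], x[n-2], ...
    x[n] = -np.dot(a_true[:len(past)], past) + w[n]

# Biased sample autocorrelation r[0..P]
r = np.array([np.dot(x[:N - k], x[k:]) / N for k in range(P + 1)])

# Yule-Walker equations for k = 1..P: sum_l a_l r[|k-l|] = -r[k]
R = np.array([[r[abs(k - l)] for l in range(P)] for k in range(P)])
a_hat = np.linalg.solve(R, -r[1:])
sigma2_hat = r[0] + np.dot(a_hat, r[1:])

# Whitening: forward prediction error e[n] = x[n] + sum_k a_k x[n-k]
e = np.convolve(x, np.concatenate(([1.0], a_hat)))[:N]

def lag1_correlation(y):
    """Normalized sample autocorrelation at lag 1."""
    y = y - y.mean()
    return np.dot(y[:-1], y[1:]) / np.dot(y, y)

print(a_hat, sigma2_hat, lag1_correlation(x), lag1_correlation(e))
```

The lag-1 correlation of the filtered sequence should be close to zero while that of the simulated process is not, which is the time-domain counterpart of the flat spectrum in Fig. 6.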

References
1. M. Beccaria, E. Cuoco, G. Curci, Adaptive System Identification of VIRGO-like noise spectrum, Proc. of 2nd Amaldi Conference, World Scientific, 1997.
2. E. Cuoco et al., Class. Quant. Grav., 18, 1727–1752, 2001.
3. E. Cuoco et al., Phys. Rev., D 64, 122002, 2001.
4. S. Kay, Modern spectral estimation: Theory and Application, Prentice Hall,
Englewood Cliffs, 1998.
5. S. Haykin, Adaptive Filter Theory, (Upper Saddle River: Prentice Hall), 1996.
6. S.T. Alexander, Adaptive Signal Processing, (Berlin: Springer-Verlag), 1986.
7. M.H. Hayes, Statistical Digital Signal Processing and Modeling, Wiley, 1996.
8. C.W. Therrien, Discrete Random Signals and Statistical Signal Processing
(Englewood Cliffs: Prentice Hall), 1992.
9. L.A. Zubakov, V.D. Wainstein, Extraction of signals from noise, (Englewood
Cliffs: Prentice Hall), 1962.
10. E. Parzen, An approach to Time series modeling: determining the order of approximating autoregressive schemes in Multivariate Analysis, North Holland,
1977.
11. B. Widrow, S. D. Stearns, Adaptive Signal Processing, (Englewood Cliffs:
Prentice Hall), 1985.
12. S.J. Orfanidis, Introduction to Signal Processing, (Englewood Cliffs: Prentice-Hall), 1996.

The Laplace Transform in Control Theory

Martine Olivi
INRIA, BP 93,
06902 Sophia-Antipolis Cedex, FRANCE,
olivi@sophia.inria.fr

1 Introduction
The Laplace transform is extensively used in control theory. It appears in
the description of linear time-invariant systems, where it changes convolution
operators into multiplication operators and allows one to define the transfer
function of a system. The properties of systems can then be translated into
properties of the transfer function. In particular, causality implies that the
transfer function must be analytic in a right half-plane. This will be explained
in section 2; a good reference for these preliminary properties and for a
range of concrete examples is [11].
Via the Laplace transform, functional analysis provides a framework to formulate, discuss and solve problems in control theory. This will be sketched
in section 3, in which the important notion of stability is introduced. We
shall see that several kinds of stability, with different physical meanings, can
be considered in connection with some function spaces, the Hardy spaces of
the half-plane. These function spaces provide with their norms a measure
of the distance between transfer functions. This allows one to translate into
well-posed mathematical problems some important topics in control theory,
for example the notion of robustness. A design is robust if it works not only
for the postulated model, but also for neighboring models. We may interpret
closeness of models as closeness of their transfer functions.
In section 4, we review the main properties of finite order linear time-invariant (LTI) causal systems. They are described by state-space equations
and their transfer function is rational. We give the definition of the McMillan
degree or order of a system, which is a good measure of its complexity, and
some useful factorizations of a rational transfer function, closely connected
with its pole and zero structure. Then, we consider the past inputs to future
outputs map, which provides a nice interpretation of the notions of controllability and observability, and we define the Hankel singular values. As claimed
by Glover in [6], the Hankel singular values are extremely informative invariants when considering system complexity and gain. For this section we refer
the reader to [8] and [6].

J.-D. Fournier et al. (Eds.): Harm. Analysis and Ratio. Approx., LNCIS 327, pp. 193–209, 2006.
Springer-Verlag Berlin Heidelberg 2006
Section 5 is concerned with system identification. In many areas of engineering, high-order linear state-space models of dynamic systems can be derived
(this can already be a difficult problem). In this way, identification issues
are translated into model reduction problems that can be tackled by means of
rational approximation. The function spaces introduced in section 3 provide,
through their norms, a measure of the accuracy of a model. The most popular
norms are the Hankel norm and the $L^2$ norm. In these two cases, the
Hardy space $H^2$, with its Hilbert space structure, plays a decisive role in finding
a solution to the model reduction problem. In the case of the Hankel norm,
explicit solutions can be found [6], while in the $L^2$ case, local minima can be
computed numerically using gradient flow methods. Note that approximation in $L^2$ norm has an interpretation in stochastic identification: it minimizes
the variance of the output error when the model is fed with white noise. These
approximation problems are also relevant in the design of controllers which
maximize robustness with respect to uncertainty or minimize sensitivity to
disturbances of sensors, and in other problems from $H^\infty$ control theory. For an
introduction to these fields we refer the reader to [4].
In this paper, we are concerned with continuous-time systems, for which the
Laplace transform is a valuable aid. The z-transform performs the same task
for discrete-time systems. This is the object of [3] in the framework of stochastic systems. It must be noted that continuous-time and discrete-time systems
are related through a Möbius transform which preserves the McMillan degree
[6]. For some purposes, it may be easier to deal with discrete time. In particular, the poles of stable discrete-time systems lie in a bounded domain, the
unit disk. The Laplace transform is also considered among other transforms in
[12]. This paper also provides an introduction to [2].

2 Linear time-invariant systems and their transfer functions
Linear time-invariant systems play a fundamental role in signal and system
analysis. Many physical processes possess these properties and, even for nonlinear systems, linear approximations can be used for the analysis of small deviations from an equilibrium. The Laplace transform has a number of properties
that make it useful for analysing LTI systems, thereby providing a set of
powerful tools that form the core of signal and system analysis.
A continuous-time system is an input-output map
\[ u(t) \mapsto y(t), \]
from an input signal $u : \mathbb{R} \to \mathbb{C}^m$ to an output signal $y : \mathbb{R} \to \mathbb{C}^p$. It will be
called linear if the map is linear, and time-invariant if a time shift in the input
signal results in an identical time shift in the output signal.

The Laplace Transform in Control Theory

A linear time-invariant system can be represented by a convolution integral
\[ y(t) = \int_{-\infty}^{\infty} h(t-\tau)\,u(\tau)\,d\tau = \int_{-\infty}^{\infty} h(\tau)\,u(t-\tau)\,d\tau, \]
in terms of its response to a unit impulse [11]. The $p \times m$ matrix function $h$
is called the impulse response of the system.
The importance of complex exponentials in the study of LTI systems stems
from the fact that the response of an LTI system to a complex exponential
input is the same complex exponential with a change of amplitude. Indeed, for
an input of the form $u(t) = e^{st}$, the output computed through the convolution
integral will be
\[ y(t) = \int_{-\infty}^{\infty} h(\tau)\,e^{s(t-\tau)}\,d\tau = e^{st} \int_{-\infty}^{\infty} h(\tau)\,e^{-s\tau}\,d\tau. \]
Assuming that the integral converges, the response to $e^{st}$ is of the form
\[ y(t) = H(s)\,e^{st}, \]
where $H(s)$ is the Laplace transform of the impulse response $h(t)$, defined by
\[ H(s) = \int_{-\infty}^{\infty} h(\tau)\,e^{-s\tau}\,d\tau. \]

In the specific case in which $\mathrm{Re}\{s\} = 0$, the input is a complex exponential $e^{i\omega t}$ at
frequency $\omega$, and $H(i\omega)$, viewed as a function of $\omega$, is known as the frequency
response of the system. It is given by the Fourier transform
\[ H(i\omega) = \int_{-\infty}^{\infty} h(\tau)\,e^{-i\omega\tau}\,d\tau. \]
In practice, pointwise measurements of the frequency response are often
available, and the classical problem of harmonic identification consists in finding a model for the system which reproduces these data well enough.
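As a rough numerical illustration of the formula for $H(s)$, the sketch below approximates the integral by a trapezoidal rule and compares it with the closed form. The impulse response $h(t) = e^{-t}$ for $t \ge 0$ (so that $H(s) = 1/(s+1)$), the truncation time and the helper name `laplace_numeric` are assumptions chosen for this example only; they do not come from the text.

```python
import cmath
import math

def laplace_numeric(h, s, t_max=40.0, n=200_000):
    """Trapezoidal approximation of H(s) = int_0^t_max h(t) exp(-s t) dt."""
    dt = t_max / n
    total = 0.5 * (h(0.0) + h(t_max) * cmath.exp(-s * t_max))
    for k in range(1, n):
        t = k * dt
        total += h(t) * cmath.exp(-s * t)
    return total * dt

def h(t):
    """Hypothetical causal impulse response h(t) = exp(-t), t >= 0."""
    return math.exp(-t)

omega = 2.0                      # frequency at which the response is sampled
s = 1j * omega                   # point on the imaginary axis
H_numeric = laplace_numeric(h, s)
H_exact = 1.0 / (s + 1.0)        # closed form of the transform of exp(-t)

print("numeric:", H_numeric)
print("exact:  ", H_exact)
print("gain |H(i omega)|:", abs(H_exact))
```

The modulus of `H_exact` is the amplitude change undergone by the input $e^{i\omega t}$, and its argument is the phase shift.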
The Laplace transform of a scalar function $f(t)$,
\[ \mathcal{L}f(s) = \int_{-\infty}^{\infty} e^{-st} f(t)\,dt, \]
is defined for those $s = x + iy$ such that
\[ \int_{-\infty}^{\infty} |f(\tau)|\,e^{-x\tau}\,d\tau < \infty. \]
The range of values of $s$ for which the integral converges is called the region of
convergence. It is a strip parallel to the imaginary axis. In particular,
if $f \in L^1(\mathbb{R})$, i.e.
\[ \int_{-\infty}^{\infty} |f(t)|\,dt < \infty, \]
then $\mathcal{L}f$ is defined on the imaginary axis and the Laplace transform can be
viewed as a generalization of the Fourier transform.
Another obvious and important property of the Laplace transform is the
following. Assume that $f(t)$ is right-sided, i.e. $f(t) = 0$ for $t < T$, and that the
Laplace transform of $f$ converges for $\mathrm{Re}\{s\} = \sigma_0$. Then, for all $s$ such that
$\mathrm{Re}\{s\} = \sigma > \sigma_0$, we have that
\[ \int_{-\infty}^{\infty} |f(\tau)|\,e^{-\sigma\tau}\,d\tau = \int_{T}^{\infty} |f(\tau)|\,e^{-\sigma\tau}\,d\tau \le e^{(\sigma_0-\sigma)T} \int_{T}^{\infty} |f(\tau)|\,e^{-\sigma_0\tau}\,d\tau, \]
and the integral converges, so that the Laplace transform is well defined in
$\mathrm{Re}\{s\} \ge \sigma_0$. If, in addition, $f \in L^1(\mathbb{R})$, then the Laplace transform is defined on the right
half-plane and it can be proved that it is an analytic function there. It is
possible that for some right-sided signal, there is no value of $s$ for which the
Laplace transform converges. One example is the signal $h(t) = 0$ for $t < 0$
and $h(t) = e^{t^2}$ for $t \ge 0$.
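The importance of the Laplace transform in control theory is mainly due to
the fact that it allows one to express any LTI system
\[ y(t) = \int_{-\infty}^{\infty} h(t-\tau)\,u(\tau)\,d\tau \]
as a multiplication operator
\[ Y(s) = H(s)\,U(s), \]
where
\[ Y(s) = \int_{-\infty}^{\infty} y(\tau)\,e^{-s\tau}\,d\tau, \quad H(s) = \int_{-\infty}^{\infty} h(\tau)\,e^{-s\tau}\,d\tau, \quad U(s) = \int_{-\infty}^{\infty} u(\tau)\,e^{-s\tau}\,d\tau \]
are the Laplace transforms. The $p \times m$ matrix function $H(s)$ is called the
transfer function of the system.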
Causality is a common property of physical systems. A system is causal
if the output at any time depends only on the present and past values of the
input. An LTI system is causal if its impulse response satisfies
\[ h(t) = 0 \quad \text{for } t < 0, \]
and in this case, the output is given by the convolution integral
\[ y(t) = \int_{0}^{\infty} h(\tau)\,u(t-\tau)\,d\tau = \int_{-\infty}^{t} h(t-\tau)\,u(\tau)\,d\tau. \]
Then, the transfer function of the system is defined by the unilateral Laplace
transform
\[ H(s) = \int_{0}^{\infty} h(\tau)\,e^{-s\tau}\,d\tau, \tag{1} \]
whose region of convergence is, by what precedes, a right half-plane (if it is
not empty). In the sequel, we shall restrict ourselves to causal systems.
Of course, our signals must satisfy some conditions to ensure the existence
of the Laplace transforms. There are many ways to proceed. We shall require
our signals to belong to some spaces of integrable functions, and this is closely
related to the notion of stability of a system. This will be the object of the next
section. Via the Laplace transform, properties of an LTI system can be expressed
in terms of the transfer function and, in this way, function theory brings
insight into control theory.

3 Function spaces and stability

An undesirable feature of a physical device is instability. In this section, we
translate this into a statement about transfer functions. Intuitively, a stable
system is one in which small inputs lead to responses that do not diverge.
To give a mathematical statement, we need a measure of the size of a signal
which will be provided by appropriate function spaces.
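We denote by $L^q(X)$ the space of complex valued measurable functions $f$
on $X$ satisfying
\[ \|f\|_q^q = \int_X |f(t)|^q\,dt < \infty, \quad \text{if } 1 \le q < \infty, \]
\[ \|f\|_\infty = \sup_{t \in X} |f(t)| < \infty, \quad \text{if } q = \infty. \]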

The most natural measure is the $L^\infty$ norm. A signal will be called bounded
if there is some $M > 0$ such that
\[ \|u\|_\infty = \sup_{t>0} \|u(t)\| < M, \]
where $\|\cdot\|$ denotes the Euclidean norm of a vector. We still denote by $L^\infty(0, \infty)$
the space of bounded signals, omitting the vector dimension. A
system will be called BIBO stable if a bounded input produces a bounded
output.
We may also be interested in the energy of a signal, which is given by the
integral
\[ \|u\|_2^2 = \int_0^\infty u(t)^* u(t)\,dt. \]
We still denote by $L^2(0, \infty)$ the space of signals with bounded energy.

Notions of stability are associated with the requirement that the convolution operator
\[ u(t) \mapsto y(t) = h * u(t) \]
is a bounded linear operator, the input and output spaces being endowed with
some (possibly different) norms. This implies that the transfer functions of such
stable systems belong to some spaces of analytic functions, the Hardy spaces
of the right half-plane [7]. We first introduce these spaces.
3.1 Hardy spaces of the half-plane
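The Hardy space $H^p$ is defined to be the space of functions $f(s)$ analytic in
the right half-plane which satisfy
\[ \|f\|_p := \sup_{0<x<\infty} \left( \int_{-\infty}^{\infty} |f(x+iy)|^p\,dy \right)^{1/p} < \infty, \]
when $1 \le p < \infty$, and, when $p = \infty$,
\[ \|f\|_\infty := \sup_{\mathrm{Re}\{s\}>0} |f(s)| < \infty. \]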

A theorem of Fatou says that, for any $f \in H^p$, $1 \le p \le \infty$, the limit
\[ f_0(iy) = \lim_{x \to 0^+} f(x+iy) \]
exists a.e. on the imaginary axis. We may identify $f \in H^p$ with $f_0 \in L^p(i\mathbb{R})$
and the identification is isometric, so that we may consider $H^p$ as a subspace
of $L^p(i\mathbb{R})$. The case $p = 2$ is of particular importance since $H^2$ is a Hilbert
space. We denote by $H^2_-$ the left half-plane analog of $H^2$: that is, $f \in H^2_-$ if
and only if the function $s \mapsto f(-s)$ is in $H^2$. We may also consider $H^2_-$ as
a subspace of $L^2(i\mathbb{R})$. We denote by $\Pi_+$ and $\Pi_-$ the orthogonal projections
from $L^2(i\mathbb{R})$ to $H^2$ and $H^2_-$ respectively, and we have
\[ L^2(i\mathbb{R}) = H^2 \oplus H^2_-. \]

If $f \in L^1(0, \infty)$, then $\mathcal{L}f$ is defined and analytic on the right half-plane.
Moreover, we may extend the definition to functions $f \in L^2(0, \infty)$, since
$L^1(0, \infty) \cap L^2(0, \infty)$ is dense in $L^2(0, \infty)$. The Laplace transform of a function
$f \in L^2(0, \infty)$ is again defined and analytic on the right half-plane and we have
the following theorem [13, Th.1.4.5].
Theorem 1. The Laplace transform gives the following bijections
\[ \mathcal{L} : L^2(0, \infty) \to H^2, \qquad \mathcal{L} : L^2(-\infty, 0) \to H^2_-, \]
and for $f \in L^2(0, \infty)$ (resp. $L^2(-\infty, 0)$)
\[ \|\mathcal{L}f\|_2 = \sqrt{2\pi}\,\|f\|_2. \]
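The norm identity of Theorem 1 can be checked numerically on a simple case. The sketch below assumes the test function $f(t) = e^{-t}$ on $(0, \infty)$, whose Laplace transform is $1/(s+1)$, and uses a hypothetical helper `trapezoid` with truncated integration windows; it is only a sanity check under these assumptions, not a proof.

```python
import math

def trapezoid(f, a, b, n):
    """Composite trapezoidal rule for the integral of f over [a, b]."""
    step = (b - a) / n
    inner = sum(f(a + k * step) for k in range(1, n))
    return step * (0.5 * f(a) + inner + 0.5 * f(b))

# Squared L2(0, inf) norm of f(t) = exp(-t); the tail beyond t = 40 is negligible.
norm_f_sq = trapezoid(lambda t: math.exp(-2.0 * t), 0.0, 40.0, 100_000)

# Squared norm of Lf on the imaginary axis: |1/(1 + iy)|^2 = 1/(1 + y^2).
# The window [-W, W] drops a tail of size about 2/W.
W = 1000.0
norm_Lf_sq = trapezoid(lambda y: 1.0 / (1.0 + y * y), -W, W, 200_000)

ratio = norm_Lf_sq / norm_f_sq
print("ratio of squared norms:", ratio)
print("2*pi:                  ", 2.0 * math.pi)
```

The printed ratio is close to $2\pi$, up to the truncation error of the frequency window.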

Since we are concerned with multi-input and multi-output systems, vectorial and matricial versions of these spaces are needed. For $p, m \in \mathbb{N}$, $H^\infty_{p \times m}$
and $H^2_{p \times m}$ are the spaces of $p \times m$ matrix functions with entries in $H^\infty$ and
$H^2$ respectively, endowed with the norms
\[ \|F\|_\infty = \sup_{-\infty<w<\infty} \|F(iw)\|, \tag{2} \]
\[ \|F\|_2^2 = \mathrm{Tr} \int_{-\infty}^{\infty} F(iw)^* F(iw)\,dw, \tag{3} \]
where $\|\cdot\|$ denotes the Euclidean norm for a vector and, for a matrix, the
operator norm or spectral norm (that is, the largest singular value). We shall
often write $H^\infty$, $H^2$ etc. for $H^\infty_{p \times m}$ and $H^2_{p \times m}$, the size of the matrix or vector
functions (case $m = 1$) being understood from the context.
3.2 Some notions of stability
We shall study the notions of stability which arise from the following choices
of norm on the input and output function spaces:
• stability $L^\infty \to L^\infty$ (BIBO). A system is BIBO stable if and only if
its impulse response is integrable over $(0, \infty)$. Indeed, if $h(t)$ is integrable
and $\|u\|_\infty < M$, then
\[ \|y(t)\| \le M \int_0^t \|h(t-\tau)\|\,d\tau = M \int_0^t \|h(\tau)\|\,d\tau \le M \int_0^\infty \|h(\tau)\|\,d\tau, \]
and $y(t)$ is bounded. Conversely, if $h(t)$ is not integrable, a bounded input
can be constructed which produces an unbounded output (see [13] in the
SISO case and [1, Prop.23.1.1] in the MIMO case).
• stability $L^2 \to L^2$. By Theorem 1, $\frac{1}{\sqrt{2\pi}}\mathcal{L}$ is a unitary operator from
$L^2(0, \infty)$ onto the Hardy space $H^2$. Thus a system
\[ y(t) = h * u(t) \]
will be $L^2 \to L^2$ stable if its transfer function $H$ is a bounded operator
from $H^2$ to $H^2$. Now, the transfer function is a multiplication operator
\[ M_H : U(s) \mapsto Y(s), \]
whose operator norm is $\|H\|_\infty$ given by (2), so that $H$ must belong to the
Hardy space $H^\infty$.

• stability $L^2 \to L^\infty$. The interest of this notion of stability comes from
the fact that it requires the transfer function $H(s)$ to belong to the
Hardy space $H^2$, which is a Hilbert space. Indeed, it can be proved that
the impulse response of such a stable system must be in $L^2(0, \infty)$ and thus,
by Theorem 1, its transfer function must be in $H^2$.

4 Finite order LTI systems and their rational transfer functions
Among LTI systems, of particular interest are the systems governed by differential equations
\[ \dot{x}(t) = A\,x(t) + B\,u(t), \qquad y(t) = C\,x(t) + D\,u(t), \tag{4} \]
where $A$, $B$, $C$, $D$ are constant complex matrices of type $n \times n$, $n \times m$,
$p \times n$ and $p \times m$, and $x(t) \in \mathbb{C}^n$ is the state of the system. Assuming $x(0) = 0$,
the solution is
\[ x(t) = \int_0^t e^{(t-\tau)A} B\,u(\tau)\,d\tau, \]
\[ y(t) = \int_0^t C e^{(t-\tau)A} B\,u(\tau)\,d\tau + D\,u(t), \]
and the impulse response is given by
\[ g(t) = C e^{At} B + D\,\delta_0, \]
where $\delta_0$ is the delta function or Dirac measure at 0. Thus $g$ is a generalized
function.
As previously, we denote by a capital roman letter the Laplace transform of the function designated by the corresponding small letter. The Laplace
transform has the nice property of converting differentiation into a shift
operator (multiplication by $s$): since $x(0) = 0$,
\[ \mathcal{L}\dot{x}(s) = sX(s), \]
so that the system (4) takes the form
\[ sX(s) = A\,X(s) + B\,U(s), \qquad Y(s) = C\,X(s) + D\,U(s), \]
and yields
\[ Y(s) = [D + C(sI - A)^{-1}B]\,U(s), \tag{5} \]
where $G(s) = D + C(sI - A)^{-1}B$ is the transfer function of the system. It is
remarkable that the transfer functions of such LTI systems are rational.
Conversely, if the transfer function of an LTI system is rational and proper
(its value at infinity is finite), then it can be written in the form (see [1])
\[ G(s) = D + C(sI - A)^{-1}B. \]
We call $(A, B, C, D)$ a realization of $G$, and the system then admits a state-space representation of the form (4). A rational transfer function has many
realizations. If $T$ is a non-singular matrix, then $(TAT^{-1}, TB, CT^{-1}, D)$ is also
a realization of $G(s)$. A minimal realization of $G$ is a realization in which the
size of $A$ is minimal among all the realizations of $G$. The size $n$ of $A$ in a
minimal realization is called the McMillan degree of $G(s)$. It represents the
minimal number of state variables and is a measure of the complexity of the
system.
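A minimal sketch of these formulas on a two-state SISO example follows. The matrices $A$, $B$, $C$, $D$, the change of basis $T$ and the helper names `inv2`, `matmul` and `transfer` are all hypothetical choices made for illustration. The code evaluates $G(s) = D + C(sI - A)^{-1}B$ at one point and checks that the realization $(TAT^{-1}, TB, CT^{-1}, D)$ gives the same value.

```python
def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul(x, y):
    """Product of two matrices given as nested lists."""
    return [[sum(x[i][k] * y[k][j] for k in range(len(y)))
             for j in range(len(y[0]))] for i in range(len(x))]

def transfer(A, B, C, D, s):
    """G(s) = D + C (sI - A)^{-1} B for a SISO system with two states."""
    sI_minus_A = [[s - A[0][0], -A[0][1]], [-A[1][0], s - A[1][1]]]
    return D + matmul(matmul(C, inv2(sI_minus_A)), B)[0][0]

# Hypothetical realization; its transfer function is 1 / (s^2 + 3 s + 2).
A = [[0.0, 1.0], [-2.0, -3.0]]
B = [[0.0], [1.0]]
C = [[1.0, 0.0]]
D = 0.0

# Change of state-space basis by a non-singular T.
T = [[1.0, 2.0], [0.0, 1.0]]
T_inv = inv2(T)
A2 = matmul(matmul(T, A), T_inv)
B2 = matmul(T, B)
C2 = matmul(C, T_inv)

s = 1.0 + 2.0j
g1 = transfer(A, B, C, D, s)
g2 = transfer(A2, B2, C2, D, s)
print(g1, g2, 1.0 / (s * s + 3.0 * s + 2.0))
```

The poles of this example are $-1$ and $-2$, the eigenvalues of $A$, so the realization is stable.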
For finite order systems all the notions of stability agree: a system given by a minimal realization is stable
if and only if all the eigenvalues of $A$ lie in the open left half-plane.
To end this section, we answer some natural questions concerning these rational matrix functions: what is a pole? a zero? their multiplicity?
what could be a fractional representation?
Let $G(s)$ be a rational $p \times m$ matrix function. Then $G(s)$ admits the Smith
form
\[ G(s) = U(s)\,D(s)\,V(s), \]
where $U(s)$ and $V(s)$ are square polynomial matrices with constant nonzero determinant and $D(s)$ is a diagonal matrix
\[ D(s) = \mathrm{diag}\left( \frac{\epsilon_1}{\psi_1}, \frac{\epsilon_2}{\psi_2}, \ldots, \frac{\epsilon_r}{\psi_r}, 0, \ldots, 0 \right) \]
in which, for $i = 1, \ldots, r$, $\epsilon_i$ and $\psi_i$ are polynomials satisfying the divisibility
conditions
\[ \epsilon_1 \mid \epsilon_2 \mid \ldots \mid \epsilon_r, \qquad \psi_r \mid \psi_{r-1} \mid \ldots \mid \psi_1. \]
This representation exhibits the pole-zero structure of a rational matrix. A
zero of $G(s)$ is a zero of at least one of the polynomials $\epsilon_i$. The multiplicity of
a given zero in each of the $\epsilon_i$ is called a partial multiplicity, and the sum of the
partial multiplicities is the multiplicity of the zero. In the same way, the poles
of $G(s)$ are the zeros of the $\psi_i$. They are also the eigenvalues of the dynamic
matrix $A$ of a minimal realization. It must be noticed that a complex number can be a pole and a zero
at the same time. For more details on the Smith form, see [8]. It provides
a new interpretation of the McMillan degree as the number of poles of the
rational function counted with multiplicity, i.e. the degree of $\psi = \psi_1 \psi_2 \cdots \psi_r$.
The Smith form also allows one to write a left coprime polynomial factorization (see [1, Chap.11] or [8]) of the form
\[ G(s) = D(s)^{-1} N(s), \]
where $D(s)$ and $N(s)$ are left coprime polynomial matrices, i.e.
\[ D(s)E_1(s) + N(s)E_2(s) = I, \quad s \in \mathbb{C}, \]
for some polynomial matrices $E_1(s)$ and $E_2(s)$. In this factorization the matrix
$D(s)$ carries the pole structure of $G(s)$ and the matrix $N(s)$ its zero structure.
This representation is very useful in control theory. In our function space context another factorization is more natural. It is the inner-unstable or
Douglas-Shapiro-Shields factorization
\[ G(s) = Q(s)\,P(s), \]
where $Q(s)$ is an inner function in $H^\infty$, i.e. such that
\[ Q(iw)^* Q(iw) = I, \quad w \in \mathbb{R}, \]
and $P(s)$ is unstable (analytic in the left half-plane). We shall also require this
factorization to be minimal. It is then unique up to a common left constant
unitary matrix and the McMillan degree of $Q$ is the McMillan degree of $G$.
The existence of such a factorization follows from Beurling's theorem on shift
invariant subspaces of $H^2$ [5]. Here again, the inner factor carries the pole
structure of the transfer function and the unstable factor the zero structure. In
many approximation problems this factorization allows one to reduce the number
of optimization parameters, since the unstable factor can often be computed
from the inner one. This, together with the fact that inner functions are the
transfer functions of conservative systems, explains the interest of inner functions.
4.1 Controllability, observability and associated gramians
The notions of controllability and observability are central to the state-space
description of dynamical systems. Controllability is a measure of the ability to
use a system's external inputs to manipulate its internal state. Observability is
a measure of how well the internal states of a system can be inferred from knowledge
of its external outputs.
The following facts are well-known [8]. A system described by a state-space
realization $(A, B, C, D)$ is controllable if the pair $(A, B)$ is controllable, i.e. the
matrix
\[ \begin{bmatrix} B & AB & A^2B & \cdots & A^{n-1}B \end{bmatrix} \]
has rank $n$, and observable if the pair $(C, A)$ is observable, i.e. the matrix
\[ \begin{bmatrix} C \\ CA \\ CA^2 \\ \vdots \\ CA^{n-1} \end{bmatrix} \]
has rank $n$. A realization is minimal if and only if it is both controllable and
observable. Note that the matrix $D$ plays no role in this context.
We now give an alternative description of these notions which is better
adapted to our functional framework [6, Sect.2]. If the eigenvalues of $A$ are
assumed to be strictly in the left half-plane, then we can define the controllability gramian as
\[ P = \int_0^\infty e^{At} B B^* e^{A^* t}\,dt, \]
and the observability gramian as
\[ Q = \int_0^\infty e^{A^* t} C^* C e^{At}\,dt. \]
It is easily verified that $P$ and $Q$ satisfy the following Lyapunov equations
\[ AP + PA^* + BB^* = 0, \]
\[ A^*Q + QA + C^*C = 0. \]
A standard result is that the pair $(A, B)$ is controllable if and only if $P$ is
positive definite and the pair $(C, A)$ observable if and only if $Q$ is positive
definite.
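The following sketch computes a controllability gramian by direct quadrature and checks the first Lyapunov equation. It assumes a hypothetical real diagonal example, $A = \mathrm{diag}(-1, -2)$ and $B = (1, 1)^T$, for which the integrand has the entries $b_i b_j e^{(a_i + a_j)t}$ and the exact gramian has entries $-b_i b_j / (a_i + a_j)$.

```python
import math

a = [-1.0, -2.0]   # diagonal of A (hypothetical stable example)
b = [1.0, 1.0]     # entries of the column vector B

def gramian_entry(i, j, t_max=40.0, n=100_000):
    """Trapezoidal approximation of int_0^t_max b_i b_j exp((a_i + a_j) t) dt."""
    rate = a[i] + a[j]
    dt = t_max / n
    total = 0.5 * (1.0 + math.exp(rate * t_max))
    total += sum(math.exp(rate * k * dt) for k in range(1, n))
    return b[i] * b[j] * total * dt

P = [[gramian_entry(i, j) for j in range(2)] for i in range(2)]

# For diagonal real A, entry (i, j) of AP + PA* + BB* is (a_i + a_j) P_ij + b_i b_j.
residual = max(abs((a[i] + a[j]) * P[i][j] + b[i] * b[j])
               for i in range(2) for j in range(2))

# P is positive definite if and only if P_00 > 0 and det P > 0 (2x2 symmetric case).
det_P = P[0][0] * P[1][1] - P[0][1] * P[1][0]
print("P =", P)
print("Lyapunov residual:", residual)
print("det P:", det_P)
```

Here both leading minors of `P` are positive, which agrees with the pair $(A, B)$ being controllable (the eigenvalues of $A$ are distinct and no entry of $B$ vanishes).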
These gramians can be illustrated by considering the mapping from the
past inputs to the future outputs, $\Gamma_g : L^2(-\infty, 0) \to L^2(0, \infty)$, given by
\[ (\Gamma_g u)(t) = \int_{-\infty}^{0} C e^{A(t-\tau)} B\,u(\tau)\,d\tau = \int_{0}^{\infty} C e^{A(t+\tau)} B\,v(\tau)\,d\tau, \tag{6} \]
where $v(t) = u(-t)$ is in $L^2(0, \infty)$. The mapping $\Gamma_g$ can be viewed as a composition of two mappings:
\[ u(t) \mapsto x(0) = \int_{-\infty}^{0} e^{-A\tau} B\,u(\tau)\,d\tau, \]
and
\[ x(0) \mapsto y(t) = C e^{At} x(0), \]
where $x(0)$ is the state at time $t = 0$. Now, consider the following minimum
energy problem
\[ \min_{u \in L^2(-\infty,0)} \|u\|_2^2 \quad \text{subject to } x(0) = x_0. \]
Since $x_0$ is a linear function of $u(t)$, the solution $\hat{u}$ exists provided that $P$ is
positive definite and is given by the pseudo-inverse
\[ \hat{u}(t) = B^* e^{-A^* t} P^{-1} x_0. \]

It satisfies
\[ \|\hat{u}\|_2^2 = x_0^* P^{-1} x_0. \]
If $P^{-1}$ is large, there will be some state that can only be reached if a large input
energy is used. If the system is released from $x(0) = x_0$ with $u(t) = 0$, $t \ge 0$,
then
\[ \|y\|_2^2 = x_0^* Q x_0, \]
so that, if the observability gramian $Q$ is nearly singular, then some initial
conditions will have little effect on the output.
4.2 Hankel singular values and Hankel operator
We now introduce the Hankel singular values, which turn out to be fundamental invariants of a linear system related to both gain and complexity [6]. The
link with complexity will be further illustrated in section 5.1.
The problem of approximating a matrix by a matrix of lower rank was
one of the earliest applications of the singular value decomposition ([10], see
[6, Prop.2.2] for a proof).
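Proposition 1. Let $M \in \mathbb{C}^{p \times m}$ have singular value decomposition given by
\[ M = U D V, \]
where $U$, $V$ are square unitary and
\[ D = \begin{bmatrix} D_r & 0 \\ 0 & 0 \end{bmatrix}, \quad D_r = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_r), \]
where $\sigma_1 \ge \sigma_2 \ge \ldots \ge \sigma_r > 0$ are the singular values of $M$. Then,
\[ \inf_{\mathrm{rank}\,\hat{M} \le k} \|M - \hat{M}\| = \sigma_{k+1}, \]
and the bound is achieved by
\[ \hat{M} = U \hat{D}_k V, \quad \hat{D}_k = \begin{bmatrix} D_k & 0 \\ 0 & 0 \end{bmatrix}, \quad D_k = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_k). \]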
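Proposition 1 can be illustrated on a real $2 \times 2$ matrix with $k = 1$, where everything is available in closed form. The sketch below is only an illustration: the matrix `M` and the helper names `gram` and `singular_values` are assumptions, and the singular values are obtained as square roots of the eigenvalues of $M^T M$ rather than from a library routine.

```python
import math

def gram(m):
    """Entries (a, b, d) of the symmetric matrix m^T m for a real 2x2 matrix m."""
    a = m[0][0] ** 2 + m[1][0] ** 2
    b = m[0][0] * m[0][1] + m[1][0] * m[1][1]
    d = m[0][1] ** 2 + m[1][1] ** 2
    return a, b, d

def singular_values(m):
    """Singular values of a real 2x2 matrix, largest first."""
    a, b, d = gram(m)
    mean, rad = 0.5 * (a + d), math.hypot(0.5 * (a - d), b)
    return math.sqrt(mean + rad), math.sqrt(max(mean - rad, 0.0))

M = [[3.0, 0.0], [4.0, 5.0]]          # hypothetical example matrix
s1, s2 = singular_values(M)

# Right singular vector v1: eigenvector of M^T M for the eigenvalue s1^2
# (this formula requires the off-diagonal entry b to be nonzero).
a, b, d = gram(M)
v = (b, s1 ** 2 - a)
norm_v = math.hypot(*v)
v1 = (v[0] / norm_v, v[1] / norm_v)

# Best rank-one approximant: M_hat = s1 * u1 * v1^T = (M v1) v1^T.
Mv = [M[i][0] * v1[0] + M[i][1] * v1[1] for i in range(2)]
M_hat = [[Mv[i] * v1[j] for j in range(2)] for i in range(2)]

# The spectral norm of the error equals the second singular value of M.
R = [[M[i][j] - M_hat[i][j] for j in range(2)] for i in range(2)]
err = singular_values(R)[0]
print("singular values:", s1, s2)
print("rank-one error: ", err)
```

For larger matrices one would use a numerical SVD routine instead of these closed forms.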

This result can be generalized to the case of a bounded linear operator
$T \in \mathcal{L}(H, K)$ from a Hilbert space $H$ to another, $K$. For $k = 0, 1, 2, \ldots$, the
$k$th singular value $\sigma_k(T)$ of $T$ is defined by
\[ \sigma_k(T) = \inf\{ \|T - R\| : R \in \mathcal{L}(H, K),\ \mathrm{rank}\,R \le k \}. \]
Thus $\sigma_0(T) = \|T\|$ and
\[ \sigma_0(T) \ge \sigma_1(T) \ge \sigma_2(T) \ge \cdots \ge 0. \]
When $T$ is compact, it can be proved that $\sigma_k(T)^2$ is an eigenvalue of $T^*T$
[15, Th.16.4]. Any corresponding eigenvector of $T^*T$ is called a Schmidt vector
of $T$ corresponding to the singular value $\sigma_k(T)$. A Schmidt pair is a pair of
vectors $x \in H$ and $y \in K$ such that
\[ Tx = \sigma_k(T)\,y, \qquad T^*y = \sigma_k(T)\,x. \]

The past inputs to future outputs mapping $\Gamma_g$ associated with an LTI system
by (6) is a compact operator from $L^2(-\infty, 0)$ to $L^2(0, \infty)$. The Hankel singular
values of an LTI system are defined to be the singular values of $\Gamma_g$. Via the
Laplace transform, we may associate with $\Gamma_g$ the Hankel operator
\[ \Gamma_G : H^2_- \to H^2, \]
whose symbol $G$ is the Laplace transform of $g$. It is defined by
\[ \Gamma_G(x) = \Pi_+(Gx), \quad x \in H^2_-. \]
Since $\Gamma_g$ and $\Gamma_G$ are unitarily equivalent via the Laplace transform, they share
the same set of singular values
\[ \sigma_0(G) \ge \sigma_1(G) \ge \sigma_2(G) \ge \cdots \ge 0. \]
The Hankel norm is defined to be the operator norm of $\Gamma_G$, which turns out
to be its largest singular value $\sigma_0(G)$:
\[ \|G\|_H = \|\Gamma_G\| = \sigma_0(G). \]
Note that
\[ \|G\|_H = \sup_{u \in L^2(-\infty,0)} \frac{\|y\|_{L^2(0,\infty)}}{\|u\|_{L^2(-\infty,0)}}, \]
so that the Hankel norm gives the $L^2$ gain from past inputs to future outputs.
If the LTI system has finite order, then its Hankel singular values are the
square roots of the eigenvalues of the matrix $PQ$, where $P$ is the controllability
gramian and $Q$ the observability gramian. Indeed, let $\sigma$ be a singular value of
$\Gamma_g$ with $u$ the corresponding eigenvector of $\Gamma_g^* \Gamma_g$: $(\Gamma_g^* \Gamma_g u)(t) = \sigma^2 u(t)$. Then,
since the adjoint operator $\Gamma_g^*$ is given by
\[ (\Gamma_g^* y)(t) = \int_0^\infty B^* e^{A^*(-t+\tau)} C^* y(\tau)\,d\tau, \]
we have that
\[ (\Gamma_g^* \Gamma_g u)(t) = (\Gamma_g^* y)(t) = B^* e^{-A^* t} Q x_0, \]
where $x_0 = x(0)$ is the state reached at time 0 and $y(t) = C e^{At} x_0$, so that
\[ u(t) = \sigma^{-2} B^* e^{-A^* t} Q x_0. \tag{7} \]
Now,
\[ \sigma^2 x_0 = \int_{-\infty}^{0} e^{-A\tau} B\,\sigma^2 u(\tau)\,d\tau = P Q x_0, \]
and $\sigma^2$ is an eigenvalue of $PQ$ associated with the eigenvector $x_0$. Conversely,
if $\sigma^2$ is an eigenvalue of $PQ$ associated with the eigenvector $x_0$, then $\sigma$ is
a singular value of $\Gamma_g$ with corresponding eigenvector of $\Gamma_g^* \Gamma_g$ given by (7).
A useful state-space realization in this respect is the balanced realization, for
which $P = Q = \mathrm{diag}(\sigma_0, \sigma_1, \ldots, \sigma_{n-1})$.
Remark 1. The Hankel norm of a finite order LTI system doesn't depend on
its $D$ matrix.
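As a small worked instance of this characterization, the sketch below computes the Hankel singular values of a hypothetical two-state system, $A = \mathrm{diag}(-1, -2)$, $B = (1, 1)^T$, $C = (1, 1)$, whose transfer function is $G(s) = 1/(s+1) + 1/(s+2)$. For such a diagonal real example the gramians have the closed-form entries $b_i b_j / (\lambda_i + \lambda_j)$ and $c_i c_j / (\lambda_i + \lambda_j)$, where $-\lambda_i$ are the eigenvalues of $A$; the eigenvalues of the $2 \times 2$ product $PQ$ are then obtained from its trace and determinant.

```python
import math

lam = [1.0, 2.0]   # A = diag(-lam[0], -lam[1]); B = [1, 1]^T and C = [1, 1] (assumed example)

# Closed-form gramians for this diagonal example (all b_i = c_i = 1).
P = [[1.0 / (lam[i] + lam[j]) for j in range(2)] for i in range(2)]
Q = [[1.0 / (lam[i] + lam[j]) for j in range(2)] for i in range(2)]

# Eigenvalues of the 2x2 product PQ from its trace and determinant.
PQ = [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)] for i in range(2)]
tr = PQ[0][0] + PQ[1][1]
det = PQ[0][0] * PQ[1][1] - PQ[0][1] * PQ[1][0]
disc = math.sqrt(tr * tr / 4.0 - det)
eigs = [tr / 2.0 + disc, tr / 2.0 - disc]

# Hankel singular values: square roots of the eigenvalues of PQ, largest first.
hankel_sv = [math.sqrt(e) for e in eigs]
print("Hankel singular values:", hankel_sv)
print("Hankel norm:", hankel_sv[0])
```

In this example the Hankel norm is about 0.73, smaller than the $H^\infty$ norm $G(0) = 1.5$, and the second singular value is much smaller than the first, which suggests that a first-order approximant could already be accurate.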

5 Identification and approximation

The identification problem is to find an accurate model of an observed system
from measured data. This definition covers many different approaches, depending on the class of models we choose and on the data we have at hand. We
shall pay more attention to harmonic identification. The data are then pointwise values of the frequency response in some bandwidth and the models are
finite order linear time-invariant (LTI) systems. A robust way to proceed is
to interpolate the data on the bandwidth into a high order transfer function,
possibly unstable. A first step consists in approximating the unstable transfer function by a stable one. This can be done by solving bounded extremal
problems (see [2]).
For computational reasons, it is desirable that such a high-order model
be replaced by a reduced-order model without incurring too much error. This
can be stated as follows:
Model reduction problem: given a $p \times m$ stable rational matrix function
$G(s)$ of McMillan degree $N$, find $\hat{G}$ stable of McMillan degree $n < N$ which
minimizes
\[ \|G - \hat{G}\|. \tag{8} \]
The choice of the norm $\|\cdot\|$ is influenced by what norms can be minimized with reasonable computational effort and whether the chosen norm is an
appropriate measure of error. The most natural norm from a physical viewpoint is the norm $\|\cdot\|_\infty$. But this is an unresolved problem: there is no known
numerical method which is guaranteed to converge. In Banach spaces other
than Hilbert spaces, best approximation problems are usually difficult. There
are two cases in which the situation is easier since they involve the Hardy
space $H^2$, which is a Hilbert space: the $L^2$ norm and the Hankel norm, since
the Hankel operator acts on $H^2$. In this last case an explicit solution can be
computed.

5.1 Hankel-norm approximation

In the seventies, it was realized that the recent results on $L^\infty$ approximation
problems, such as Nehari's theorem and the result of Adamjan, Arov and
Krein on the Nehari-Takagi problem, were relevant to the current problems
of some engineers in control theory. In the context of LTI systems, they have
led to efficient new methods of model reduction.
A first step in solving the model reduction problem in Hankel norm is
provided by Nehari's theorem. Translated into the control theory framework,
it states that if one wishes to approximate a causal function $G(s)$ by an
anticausal function, then the smallest error norm that can be achieved is
precisely the Hankel norm of $G(s)$.
Theorem 2. For $G \in H^\infty$,
\[ \sigma_0(G) = \|G\|_H = \inf_{F \in H^\infty_-} \|G - F\|_\infty. \]
The model reduction problem, known under the name of Nehari-Takagi,
was first solved by Adamjan, Arov and Krein for SISO systems and by Kung and
Lin for MIMO discrete-time systems. In our continuous-time framework, it
can be stated as follows:
Theorem 3. Given a stable, rational transfer function $G(s)$, then
\[ \sigma_k(G) = \inf \{ \|G - \hat{G}\|_H : \hat{G} \text{ stable, McMillan degree of } \hat{G} \le k \}. \]

The fact that $\|G - \hat{G}\|_H \ge \sigma_k(G)$, for all $\hat{G}(s)$ stable and of McMillan
degree $\le k$, is no more than a continuous-time version of Proposition 1 [6,
Lemma 7.1]. This famous paper [6] gives a beautiful solution of the computational problem using state-space methods. An explicit construction of a
solution $\hat{G}(s)$ is presented which makes use of a balanced realization of $G(s)$
[6, Th.6.3]. Moreover, in [6] all the optimal Hankel norm approximations are
characterized in state-space form.
Since
\[ \|G - \hat{G}\|_H = \inf_{F \in H^\infty_-} \|G - \hat{G} - F\|_\infty, \]
the Hankel norm approximation $\hat{G}(s)$ can be a rather bad approximant in
$L^\infty$ norm. However, the choice of the $D$ matrix for the approximation is
arbitrary, since the Hankel norm doesn't depend on $D$, while $\|G - \hat{G}\|_\infty$ does
depend on $D$. In [6, Sect.9, Sect.10.2] a particular choice of $D$ is suggested
which ensures that
\[ \|G - \hat{G}\|_\infty \le \sigma_k(G) + \sum_{j>k} \sigma_j(G). \]
It is often the case in practical applications that $G$ has a few sizable singular
values and the remaining ones tail away very quickly to zero. In that case the
right-hand side can be made very small, and one is assured that an optimal
Hankel norm approximant is also good with respect to the $L^\infty$ norm.
5.2 $L^2$-norm approximation
In the case of the $L^2$ norm, an explicit solution of the model reduction problem
cannot be computed. However, the $L^2$ norm being differentiable, we may think
of using a gradient flow method. The main difficulty in this problem is to
describe the set of approximants, i.e. of rational stable functions of McMillan
degree $n$. The approaches that can be found in the literature mainly differ
in the choice of a parametrization to describe this set of approximants.
These parametrizations often arise from realization theory and the parameters
are some entries of the matrices $(A, B, C, D)$. To cope with their inherent
complexity, some approaches choose to relax a constraint: stability or fixed
McMillan degree. They often run into difficulties, since smoothness can be lost
or an undesirable approximant reached.
Another approach can be proposed. The number of optimization parameters can be reduced using the inner-unstable factorization (see section 4) and
the projection property of a Hilbert space. Let $\hat{G}$ be a best $L^2$ approximant
of $G$, with inner-unstable factorization
\[ \hat{G} = QP, \]
where $Q$ is the inner factor and $P$ the unstable one. Then, $H^2$ being a Hilbert
space, $\hat{G}$ must be the projection of $G$ onto the space $H(Q)$ of matrix functions
of degree $n$ whose left inner factor is $Q$. We shall denote this projection by
$\hat{G}(Q)$, and the problem now consists in minimizing
\[ Q \mapsto \|G - \hat{G}(Q)\|_2 \]
over the set of inner functions of McMillan degree $n$.
Then, more efficient parametrizations can be used which arise from the
manifold structure of this set. The idea is to work with an atlas of charts, that
is, a collection of local coordinate maps (the charts) which cover the manifold
and such that changing from one map to another is a smooth operation. Such
a parametrization has the advantage of ensuring identifiability, stability of
the result and good behavior of the optimization process. The optimization
is run over the set as a whole, changing from one chart to another when
necessary. Parametrizations of this type are available either from realization
theory or from interpolation theory, in which the parameters are interpolation
values. Their description goes beyond the aim of this paper, and we refer
the reader to [9] and the bibliography therein for more information on this
approach.


References
1. J.A. Ball, I. Gohberg, L. Rodman. Interpolation of rational matrix functions, Birkhäuser, Operator Theory: Advances and Applications, vol. 45, 1990.
2. L. Baratchart. Identification and function theory. This volume, pages 211–230.
3. M. Deistler. Stationary processes and linear systems. This volume, pages 159.
4. B.A. Francis. A course in $H^\infty$ control theory, Springer, 1987.
5. P.A. Fuhrmann. Linear systems and operators in Hilbert space, McGraw-Hill, 1981.
6. K. Glover. All optimal Hankel norm approximations of linear multivariable systems and their $L^\infty$ error bounds, Int. J. Control, 39(6):1115-1193, 1984.
7. K. Hoffman. Banach spaces of analytic functions, Dover Publications, New York, 1988.
8. T. Kailath. Linear systems, Prentice-Hall, 1980.
9. J.-P. Marmorat, M. Olivi, B. Hanzon, R.L.M. Peeters. Matrix rational $H^2$ approximation: a state-space approach using Schur parameters, in Proceedings of the CDC'02, Las Vegas, USA.
10. L. Mirsky. Symmetric gauge functions and unitarily invariant norms, Quart. J. Math. Oxford Ser. 2(11):50-59, 1960.
11. A.V. Oppenheim, A.S. Willsky, S.H. Nawab. Signals and systems, Prentice-Hall, 1997.
12. J.R. Partington. Fourier transforms and complex analysis. This volume, pages 39.
13. J.R. Partington. Interpolation, identification and sampling, Oxford University Press, 1997.
14. W. Rudin. Real and complex analysis, McGraw-Hill, New York, 1987.
15. N.J. Young. An introduction to Hilbert space, Cambridge University Press, 1988.
16. N.J. Young. The Nehari problem and optimal Hankel norm approximation, Analysis and optimization of systems: state and frequency domain approach for infinite dimensional systems, Proceedings of the 10th International Conference, Sophia-Antipolis, France, June 9-12, 1992.

Identification and Function Theory

Laurent Baratchart
INRIA, BP 93, 06902 Sophia-Antipolis Cedex, FRANCE
laurent.baratchart@sophia.inria.fr

1 Introduction
We survey in these notes certain constructive aspects of how to recover an
analytic function in a plane domain from complete or partial knowledge of
its boundary values. This we do with an eye on identification issues for linear
dynamical systems, i.e. one-dimensional deconvolution schemes, and for that
reason we restrict ourselves either to the unit disk or to the half-plane, because
these are the domains encountered in this context. To ensure the existence of
boundary values, restrictions on the growth of the function must be made,
resulting in a short introduction to Hardy spaces in the next section. We
hasten to say that, in any case, the problem just mentioned is ill-posed in
the sense of Hadamard [32], and actually a prototypical inverse problem: the
Cauchy problem for the Laplace equation. We approach it as a constrained
optimization issue, which is one of the classical routes when dealing with ill-posedness [51]. There are of course many ways of formulating such issues;
those surveyed below make connection with the quantitative spectral theory
of Toeplitz and Hankel operators, which are deeply linked with meromorphic
approximation. Standard regularization, which consists in requiring additional
smoothness on the approximate solution, would allow us here to use classical
interpolation theory; this is not the path we shall follow, but we warn the
reader that linear interpolation schemes are usually not very efficient
in the present context. An excellent source on this topic and other matters
related to our subject is [39].

2 Hardy spaces
Let $\mathbb{T}$ be the unit circle and $\mathbb{D}$ the unit disk in the complex plane. We let $C(\mathbb{T})$
denote the continuous functions and $L^p = L^p(\mathbb{T})$ the familiar Lebesgue spaces. For
$1 \le p \le \infty$, the Hardy space $H^p$ of the unit disk is the closed subspace of $L^p$
consisting of functions whose Fourier coefficients of strictly negative index
vanish. These are the nontangential limits of functions $g$ analytic in the unit
disk $\mathbb{D}$ having uniformly bounded $L^p$ means over all circles centered at 0 of
radius less than 1:
\[ \sup_{0<r<1} \|g(re^{i\theta})\|_p < \infty. \tag{1} \]

J.-D. Fournier et al. (Eds.): Harm. Analysis and Ratio. Approx., LNCIS 327, pp. 211–230, 2006.


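The correspondence is one-to-one and, using this identification, we alternatively regard members of $H^p$ as holomorphic functions in the variable $z \in \mathbb{D}$. The extension to $\mathbb{D}$ is obtained from the values on $\mathbb{T}$ through a Cauchy as well as a Poisson integral, namely if $g \in H^p$ then:
\[ g(z) = \frac{1}{2\pi i} \int_{\mathbb{T}} \frac{g(\xi)}{\xi - z}\, d\xi, \qquad z \in \mathbb{D}, \tag{2} \]
and also
\[ g(z) = \frac{1}{2\pi} \int_{\mathbb{T}} \operatorname{Re}\left\{ \frac{e^{i\theta} + z}{e^{i\theta} - z} \right\} g(e^{i\theta})\, d\theta, \qquad z \in \mathbb{D}. \tag{3} \]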
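The two integral representations can be checked numerically. The following is only a minimal sketch: the test function $g(z) = 1/(2-z)$, the evaluation point and the equispaced quadrature grid are illustrative choices, not taken from the text, and the Poisson kernel is taken to be the real part of $(e^{i\theta}+z)/(e^{i\theta}-z)$.

```python
import numpy as np

# Equispaced quadrature nodes on the unit circle (illustrative grid size).
theta = 2 * np.pi * np.arange(2048) / 2048
xi = np.exp(1j * theta)

def g(z):
    # Hypothetical test function, analytic in a neighbourhood of the closed disk.
    return 1.0 / (2.0 - z)

def cauchy_extension(boundary, z):
    # With xi = e^{i theta}, d(xi) = i xi d(theta), so the Cauchy integral
    # becomes the mean over theta of g(xi) * xi / (xi - z).
    return np.mean(boundary * xi / (xi - z))

def poisson_extension(boundary, z):
    # Poisson kernel = real part of (xi + z) / (xi - z).
    kernel = np.real((xi + z) / (xi - z))
    return np.mean(kernel * boundary)

z0 = 0.3 + 0.4j
boundary_values = g(xi)
value_cauchy = cauchy_extension(boundary_values, z0)
value_poisson = poisson_extension(boundary_values, z0)
```

Both `value_cauchy` and `value_poisson` should agree with `g(z0)` up to rounding, since the trapezoidal rule converges geometrically for analytic periodic integrands.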

The sup in (1) is precisely $\|g(e^{i\theta})\|_p$. The space $H^\infty$ consists of bounded analytic functions in $\mathbb{D}$, and by Parseval's theorem we also get that
\[ g \in H^2 \iff g(z) = \sum_{j=0}^{\infty} a_j z^j \quad \text{with} \quad \sum_{j=0}^{\infty} |a_j|^2 < \infty. \]
If $p \neq 2$ it is not easy to characterize $H^p$ functions from their Fourier-Taylor coefficients. Very good expositions on Hardy spaces are [19, 22, 30], and we recall just a few facts here. Actually, we only work with $p = 2$ and $p = \infty$, but nothing would be gained in this section from such a restriction.
A nonzero $g \in H^p$ can be uniquely factored as $g = jw$ where
\[ w(z) = \exp\left\{ \frac{1}{2\pi} \int_0^{2\pi} \frac{e^{i\theta} + z}{e^{i\theta} - z} \log |g(e^{i\theta})|\, d\theta \right\} \tag{4} \]
belongs to $H^p$ and is called the outer factor of $g$, while $j \in H^\infty$ has modulus 1 a.e. on $\mathbb{T}$ and is called the inner factor of $g$. The latter may be further decomposed as $j = b\, S_\mu$, where
\[ b(z) = c\, z^k \prod_{z_l \neq 0} \frac{-\bar z_l}{|z_l|}\, \frac{z - z_l}{1 - \bar z_l z} \tag{5} \]
is the Blaschke product, with order $k \ge 0$ at the origin, associated to a sequence of points $z_l \in \mathbb{D} \setminus \{0\}$ and to the constant $c \in \mathbb{T}$, while
\[ S_\mu(z) = \exp\left\{ -\frac{1}{2\pi} \int_0^{2\pi} \frac{e^{i\theta} + z}{e^{i\theta} - z}\, d\mu(\theta) \right\} \tag{6} \]
is the singular inner factor associated with $\mu$, a positive measure on $\mathbb{T}$ which is singular with respect to Lebesgue measure. The $z_l$ are the zeros of $g$ in $\mathbb{D} \setminus \{0\}$, counted with their multiplicities, while $k$ is the order of the zero at 0. If there are infinitely many zeros, the convergence of the product $b(z)$ in $\mathbb{D}$ is ensured by the condition $\sum_l (1 - |z_l|) < \infty$ which holds automatically when $g \in H^p \setminus \{0\}$. If there are only finitely many $z_l$, say $n$, we say that (5) is a finite Blaschke product of degree $n$.
That $w(z)$ in (4) is well-defined rests on the fact that $\log |g| \in L^1$ if $g \in H^1 \setminus \{0\}$; this also entails that a $H^p$ function cannot vanish on a set of strictly positive Lebesgue measure on $\mathbb{T}$ unless it is identically zero.
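As an illustration of the inner-outer factorization, the outer factor (4) can be evaluated by quadrature. The sketch below uses the hypothetical example $g(z) = z - 1/2$, for which the outer factor is $1 - z/2$ and the inner factor is a Blaschke product of degree 1; the grid size and evaluation point are arbitrary choices.

```python
import numpy as np

theta = 2 * np.pi * np.arange(4096) / 4096
xi = np.exp(1j * theta)

def g(z):
    # Illustrative function with a single zero at 1/2 inside the disk.
    return z - 0.5

def outer_factor(z):
    # Formula (4): exponential of the Schwarz-type integral of log|g| on the circle.
    kernel = (xi + z) / (xi - z)
    return np.exp(np.mean(kernel * np.log(np.abs(g(xi)))))

z0 = 0.3 - 0.2j
w_numeric = outer_factor(z0)
w_expected = 1.0 - 0.5 * z0          # outer factor of z - 1/2

# Inner factor j = g / w on the circle: it should have modulus 1 there.
inner_on_circle = g(xi) / (1.0 - 0.5 * xi)
```

Here `inner_on_circle` is the Blaschke factor $(z - 1/2)/(1 - z/2)$, which corresponds to (5) with $k = 0$, a single zero $z_1 = 1/2$ and $c = -1$.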
Intimately related to Hardy functions is the Nevanlinna class $N^+$ consisting of holomorphic functions in $\mathbb{D}$ that can be factored as $jE$, where $j$ is an inner function, and $E$ an outer function of the form
\[ E(z) = \exp\left\{ \frac{1}{2\pi} \int_0^{2\pi} \frac{e^{i\theta} + z}{e^{i\theta} - z} \log \rho(e^{i\theta})\, d\theta \right\}, \tag{7} \]
$\rho$ being a positive function on $\mathbb{T}$ such that $\log \rho \in L^1(\mathbb{T})$, although $\rho$ itself need not be summable. Such functions again have nontangential limits a.e. on $\mathbb{T}$ that serve as definition for their boundary values, and they are often instrumental in that $N^+ \cap L^p = H^p$. In fact, (7) defines an $H^p$-function with modulus $\rho$ a.e. on $\mathbb{T}$ if, and only if, $\rho \in L^p$. A useful consequence is that, whenever $g_1 \in H^{p_1}$ and $g_2 \in H^{p_2}$, we have $g_1 g_2 \in H^{p_3}$ if, and only if, $g_1 g_2 \in L^{p_3}$.
We need also introduce the Hardy space $\bar H^p$ of the complement of the disk, consisting of $L^p$ functions whose Fourier coefficients of strictly positive index do vanish; these are, a.e. on $\mathbb{T}$, the complex conjugates of $H^p$-functions, and they can also be viewed as nontangential limits of functions analytic in $\mathbb{C} \setminus \overline{\mathbb{D}}$ having uniformly bounded $L^p$ means over all circles centered at 0 of radius bigger than 1. We further single out the subspace $\bar H^p_0 \subset \bar H^p$, consisting of functions vanishing at infinity or, equivalently, having vanishing mean on $\mathbb{T}$. Thus, a function belongs to $\bar H^p_0$ if, and only if, it is of the form $e^{-i\theta}\, \overline{g(e^{i\theta})}$ for some $g \in H^p$.
We let $\mathcal{R}_{m,n}$ be the set of rational functions of type $(m, n)$ that can be written $p/q$ where $p$ and $q$ are algebraic polynomials of degree at most $m$ and $n$ respectively. Note that a rational function belongs to some $H^p$ if, and only if, its poles lie outside $\overline{\mathbb{D}}$, in which case it belongs to every $H^p$. Similarly, a rational function belongs to $\bar H^p$ if, and only if, it can be written as $p/q$ with $\deg p \le \deg q$ where $q$ has roots in $\mathbb{D}$ only; in the language of system theory, such a rational function is called stable and proper, and it belongs to $\bar H^p_0$ if, and only if, $\deg p < \deg q$ in which case it is called strictly proper. We define $H^p_n$ to be the set of meromorphic functions with at most $n$ poles in $\mathbb{D}$, that may be written $g/q$ where $g \in H^p$ and $q$ is a polynomial of degree at most $n$ with roots in $\mathbb{D}$ only.
We now turn to the Hardy spaces $\mathcal{H}^p$ of the right half-plane. These consist of functions $G$ analytic in $\Pi^+ = \{s;\ \operatorname{Re} s > 0\}$ such that
\[ \sup_{x>0} \int_{-\infty}^{+\infty} |G(x + iy)|^p\, dy < \infty, \]
and again they have nontangential limits at almost every point of the imaginary axis, thereby giving rise to a boundary function $G(iy)$ that lies in $L^p(i\mathbb{R})$. The space $\mathcal{H}^\infty$ consists of bounded analytic functions in $\Pi^+$, and a theorem of Paley-Wiener characterizes $\mathcal{H}^2$ as the space of Fourier transforms of functions in $L^2(\mathbb{R})$ that vanish for negative arguments.
The study of $\mathcal{H}^p$ can be reduced to that of $\bar H^p_0$ thanks to the isometry:
\[ g \mapsto (s - 1)^{-2/p}\, g\!\left( \frac{s+1}{s-1} \right) \tag{8} \]
from $\bar H^p_0$ onto $\mathcal{H}^p$. The latter preserves rationality and the degree for $p = 2, \infty$.
For applications to system-theory, it is often necessary to consider functions in $H^p$ or $\mathcal{H}^p$ that have the conjugate-symmetry $g(\bar z) = \overline{g(z)}$; in the case of $H^p$ this means they have real Fourier coefficients, or in the case of $\mathcal{H}^2$ that they are Fourier transforms of real functions. For rational functions it means that the coefficients of $p$ and $q$ are real in the irreducible form $p/q$. In the presence of conjugate symmetry, every symbol will be decorated by a subscript or a superscript $\mathbb{R}$, like in $\mathcal{H}^p_{\mathbb{R}}$ or $\mathcal{R}^{\mathbb{R}}_{m,n}$ etc.

3 Motivations from System Theory

We provide some motivation from control and signal theory for some of the
approximation problems that we will consider. The connection between linear control systems and function theory has two cornerstones:
• the fact that these systems can be described in the so-called frequency domain as a multiplication operator by the transfer function which belongs to certain Hardy classes if the system has certain stability properties;
• the fact that rational functions are precisely transfer functions of systems having finite-dimensional state-space, namely those that can be designed and handled in practice.
A discrete control system is a map $u \mapsto y$ where the input $u = (\ldots, u_{k-1}, u_k, u_{k+1}, \ldots)$ is a real-valued function of the discrete time $k$, generating an output $y = (\ldots, y_{k-1}, y_k, y_{k+1}, \ldots)$ of the same kind, where $y_k$ depends on $u_j$ for $j \le k$ only. The system is said to be time-invariant if a shift in time of the input produces a corresponding shift of the output.
Particularly important in applications are the linear systems:
\[ y_k = \sum_{j=0}^{\infty} f_j\, u_{k-j}, \]
where the output at time $k$ is a linear combination of the past inputs with fixed coefficients $f_j \in \mathbb{R}$. For such systems, function theory enters the picture when signals are encoded by their generating functions:
\[ u(z) = \sum_{k \in \mathbb{Z}} u_k z^{-k}, \qquad y(z) = \sum_{k \in \mathbb{Z}} y_k z^{-k}. \]
Indeed, if we define the transfer function of the linear control system to be:
\[ f(z) = \sum_{k=0}^{\infty} f_k z^{-k}, \]
the input-output behavior can be described as $y(z) = f(z)\, u(z)$. In particular:
(i) the system is a bounded operator $l^2 \to l^2$ iff $f \in \bar H^\infty_{\mathbb{R}}$, and the operator norm is $\|f\|_\infty$: the system is called $(l^2, l^2)$-stable;
(ii) the system is a bounded operator $l^2 \to l^\infty$ iff $f \in \bar H^2_{\mathbb{R}}$, and the operator norm is $\|f\|_2$: the system is called $(l^2, l^\infty)$-stable.
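The identity $y(z) = f(z)u(z)$ and the $l^2$ gain bound in (i) can be illustrated on finitely supported sequences. This is a minimal sketch under stated assumptions: the impulse response $f_j = 2^{-j}$ (truncated to 40 terms), the random input and the helper name `generating_function` are all hypothetical choices made for the example.

```python
import numpy as np

def generating_function(c, z):
    # Evaluate sum_k c_k z^{-k} for a finitely supported sequence c_0, c_1, ...
    return np.sum(c * z ** (-np.arange(c.size, dtype=float)))

f = 0.5 ** np.arange(40)              # illustrative impulse response f_j = 2^{-j}
rng = np.random.default_rng(0)
u = rng.standard_normal(200)          # input supported on k = 0..199
y = np.convolve(f, u)                 # y_k = sum_j f_j u_{k-j}

# Frequency-domain description: y(z) = f(z) u(z) at a point of the unit circle.
z0 = np.exp(0.7j)
lhs = generating_function(y, z0)
rhs = generating_function(f, z0) * generating_function(u, z0)

# l^2 gain of this particular input versus the sup norm of f on the circle.
grid = np.linspace(0.0, 2 * np.pi, 4096, endpoint=False)
sup_norm_f = np.abs(np.exp(-1j * np.outer(grid, np.arange(f.size))) @ f).max()
l2_gain = np.linalg.norm(y) / np.linalg.norm(u)
```

For any input the ratio `l2_gain` stays below `sup_norm_f`, which is the content of (i); the bound is approached by inputs concentrated near the frequency where $|f|$ is largest.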
A linear control system is said to have finite dimension $n$ if it can be described as a linear automaton in terms of a state variable $x_k \in \mathbb{R}^n$ which is updated at each time $k \in \mathbb{Z}$:
\[ x_{k+1} = A x_k + B u_k, \qquad y_k = C x_k + D u_k, \]
where $A$ is a real $n \times n$ matrix, $B$ (resp. $C$) a column (resp. row) vector with $n$ real entries, and $D$ some real number, $n$ being the smallest possible integer for which such an equation holds. The classical result here (see e.g. [29], [42]) is that a linear time-invariant system has dimension $n$ iff its transfer-function is rational of degree $n$ and analytic at infinity. The transfer-function is then
\[ f(z) = D + C\, (z I_n - A)^{-1} B. \]
For finite-dimensional linear systems the requirement that the poles of $f$ should lie in $\{|z| < 1\}$ amounts to $f \in \bar H^p$ for some, and in fact all $p$, which is equivalent to any reasonable definition of stability. A much broader picture is obtained by letting input and output signals be vector-valued and transfer functions matrix-valued, and indeed many questions to come are significantly enriched by doing so; this, however, is beyond the scope of these notes. We now mention two specific applications of approximation theory in Hardy spaces to identification of linear dynamical systems. Further applications to control can be found in [20, 40].
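The link between the state-space description and the transfer function can be sketched numerically: the impulse response is $f_0 = D$, $f_k = C A^{k-1} B$ for $k \ge 1$, and its generating function should coincide with $D + C(zI_n - A)^{-1}B$ for $|z|$ larger than the spectral radius of $A$. The matrices below are arbitrary illustrative values, not taken from the text.

```python
import numpy as np

# Hypothetical stable realization of dimension 2 (eigenvalues 0.5 and -0.3).
A = np.array([[0.5, 0.2], [0.0, -0.3]])
B = np.array([1.0, 1.0])
C = np.array([1.0, 2.0])
D = 0.5

def transfer(z):
    # f(z) = D + C (z I - A)^{-1} B
    return D + C @ np.linalg.solve(z * np.eye(2) - A, B)

def impulse_response(num_terms):
    # f_0 = D and f_k = C A^{k-1} B for k >= 1.
    f = np.empty(num_terms)
    f[0] = D
    x = B.copy()
    for k in range(1, num_terms):
        f[k] = C @ x
        x = A @ x
    return f

f = impulse_response(200)
z0 = 1.1 * np.exp(0.4j)
series_value = np.sum(f * z0 ** (-np.arange(f.size, dtype=float)))
closed_form = transfer(z0)
spectral_radius = np.max(np.abs(np.linalg.eigvals(A)))
```

Since the poles of $f$ are the eigenvalues of $A$, `spectral_radius < 1` is the stability condition discussed above.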
3.1 Stochastic identification
Consider a discrete time real-valued stationary stochastic process:
\[ y = (\ldots, y(k-1), y(k), y(k+1), \ldots). \]
If it is regular (i.e. purely non-deterministic in a certain sense), we have the Wold decomposition:
\[ y(k) = \sum_{j=0}^{\infty} f_j\, u(k - j) \]
where $u$ is a white noise called the innovation, and where $f_j$ is independent of $k$ by stationarity. We also have, by the Parseval identity, that
\[ \sum_{j=0}^{\infty} f_j^2 = E\{ y(k)^2 \} \]
which is independent of $k$ by stationarity. If we set
\[ f(z) = \sum_{k=0}^{\infty} f_k z^{-k} \in \bar H^2, \]
we see that a regular process is obtained by feeding white noise to an $l^2 \to l^\infty$ stable linear system [45].
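The Parseval identity above lends itself to a quick Monte Carlo check. The sketch below assumes a unit-variance Gaussian innovation and the illustrative coefficients $f_j = 0.8^j$ truncated to 60 terms; sample size and seed are arbitrary, so agreement is only up to sampling error.

```python
import numpy as np

rng = np.random.default_rng(1)
f = 0.8 ** np.arange(60)               # illustrative Wold coefficients (truncated)
u = rng.standard_normal(400_000)       # white noise innovation, unit variance

# y(k) = sum_j f_j u(k - j); "valid" keeps samples using a full window of past inputs.
y = np.convolve(u, f, mode="valid")

variance_empirical = np.mean(y ** 2)   # time average, justified by ergodicity
variance_parseval = np.sum(f ** 2)     # sum of f_j^2
```

With these values `variance_parseval` is close to $1/(1 - 0.64) \approx 2.78$, and the time average should match it to within a fraction of a percent.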
In the special case where $f$ is rational, $y$ is called an Auto-Regressive Moving Average process, which is popular because it lends itself to efficient computations. When trying to fit such a model, say of order $n$, to $y$, a typical interest is in minimizing the variance of the error between the true output and the prediction of the model. In this way one is led to solve for
\[ \min_{g \in \mathcal{R}^{\mathbb{R}}_{n,n} \cap \bar H^2} \|f - g\|_2. \]
This principle can be used to identify a linear system from observed stochastic inputs, although computing the $f_j$ is difficult because it requires spectral factorization of the function
\[ \sum_{j \in \mathbb{Z}} E\{ y(j + k)\, y(k) \}\, e^{ij\theta} \]
whose Fourier coefficients can only be estimated by ergodicity through time averages of the observed sample path of $y$. In practice, one would rather use time averages already in the optimization criterion, but this can be proved asymptotic to the previous problem [26].
To lend perspective to the discussion, let us briefly digress on the more general case where the input is an arbitrary stationary process. Applying the spectral theorem to the shift operator on the Hilbert space of the process allows one to compute the variance of the output error as a weighted $L^2$ integral:
\[ \frac{1}{2\pi} \int_0^{2\pi} |f - g|^2\, d\mu, \tag{9} \]
where the positive measure $\mu$ is the so-called spectral measure of the input process (that reduces to Lebesgue measure when the latter is white noise), and $f$ now has to belong to a weighted Hardy space $\bar H^2(d\mu)$. Though we shall not dwell on this, we want to emphasize that the spectral theorem, as applied to shift operators, stresses deep links between time and frequency representations of a stochastic process, and the isometric character of this theorem (that may be viewed as a far-reaching generalization of Parseval's relation) is a fundamental reason why $L^2$ approximation problems arise in system theory. The scheme just mentioned is a special instance of maximum likelihood identification where the noise model is fixed [26, 33, 49], that aims at a rational extension of the Szegő theory [50] of orthogonal polynomials.
At this point, it must be mentioned that stochastic identification, as applied to linear dynamical systems, is not just concerned with putting up probabilistic interpretations to rational approximation criteria. Its main methodological contribution is to provide one with a method of choosing the degree of the approximant as the result of a trade-off between the bias term (i.e. the approximation error that goes small when the degree goes large) and the variance term (i.e. the dispersion of the estimates that goes large when the degree goes large and eventually makes the identification unreliable). We shall not touch on this deeper aspect of the stochastic paradigm, whose deterministic counterpart pertains to the numerical analysis of approximation theory (when should we stop increasing the degree to get a better fit since all we shall approximate further is the error caused by truncation, round-off, etc.?). For an introduction to this circle of ideas, the interested reader is referred to the above-quoted textbooks.
3.2 Harmonic Identification
This example deals with continuous-time rather than discrete-time linear control systems, namely with convolution operators $u(t) \mapsto y(t)$ of the form:
\[ y(t) = \int_0^t h(t - \tau)\, u(\tau)\, d\tau. \]
The function $h : [0, \infty) \to \mathbb{R}$ is called the impulse response of the system, as it formally corresponds to the output generated by a delta function. If $h$ and $u$ have exponential growth, so does $y$ and the one-sided Laplace transforms $Y(s)$, $U(s)$ and $H(s)$ are defined on some common half-plane $\{\operatorname{Re} s > \sigma\}$. The system operates in this frequency domain as multiplication by the transfer-function $H$:
\[ Y(s) = H(s)\, U(s). \]
This time, rational transfer-functions of degree $n$ correspond to linear differential operators of order $n$ forced by the input $u$.
The Hardy spaces involved are now those of the right half-plane, and their relation to stability is:
(i) the system is bounded $L^2[0, \infty) \to L^2[0, \infty)$ iff $H \in \mathcal{H}^\infty_{\mathbb{R}}$;
(ii) the system is bounded $L^2[0, \infty) \to L^\infty[0, \infty)$ iff $H \in \mathcal{H}^2_{\mathbb{R}}$;
(iii) the system is bounded $L^\infty[0, \infty) \to L^\infty[0, \infty)$ iff $H \in \mathcal{W}_{\mathbb{R}}$, the Wiener algebra of the right half-plane consisting of Laplace transforms of summable functions $[0, \infty) \to \mathbb{R}$.
One of the most effective methods to identify a system which sends bounded inputs to bounded outputs is to plug in a periodic input $u = e^{i\omega t}$ and to observe the asymptotic steady-state output which is:
\[ y(t) = \lambda\, e^{i\phi}\, e^{i\omega t}, \]
where $\lambda$ and $\phi$ are respectively the modulus and the argument of $H(i\omega)$.
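The steady-state principle can be simulated on a toy system. The sketch below assumes the impulse response $h(t) = e^{-t}$, whose transfer function is $H(s) = 1/(s+1)$; the frequency, time horizon and step size are illustrative, and the convolution integral is approximated by a hand-written trapezoidal rule.

```python
import numpy as np

omega = 2.0                                   # probing frequency (illustrative)

def H(s):
    # Transfer function of the assumed impulse response h(t) = exp(-t).
    return 1.0 / (s + 1.0)

t = np.linspace(0.0, 20.0, 20001)
dt = t[1] - t[0]
T = t[-1]

# y(T) = int_0^T h(T - tau) u(tau) d tau with u(tau) = exp(i omega tau).
integrand = np.exp(-(T - t)) * np.exp(1j * omega * t)
y_T = dt * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))

# Dividing by the input at time T recovers modulus and argument of H(i omega).
estimate = y_T * np.exp(-1j * omega * T)
modulus_estimate = np.abs(estimate)
argument_estimate = np.angle(estimate)
```

After the transient $e^{-T}$ has died out, `estimate` approximates `H(1j * omega)`, so one frequency sweep of this kind yields pointwise data for the transfer function on the imaginary axis.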
In this way, one can estimate the transfer function on the imaginary axis and, for physical as well as computational purposes, one is often led to rationally approximate the experimental data thus obtained. In practice, the situation is considerably more complicated, because experiments cannot be performed on the whole axis and usually the system will no longer behave linearly at high frequencies. In fact, if $\Omega$ designates the bandwidth on the imaginary axis where experiments are performed, one can usually get a fairly precise estimate of $H|_\Omega$, the restriction of $H$ to $\Omega$, but all one has on $i\mathbb{R} \setminus \Omega$ are qualitative features of the model induced by the physics of the system.
Though it seems natural to seek
\[ \min_{G \in \mathcal{R}^{\mathbb{R}}_{n,n} \cap \mathcal{H}^2} \|H - G\|_{L^2(\Omega)}, \qquad \min_{G \in \mathcal{R}^{\mathbb{R}}_{n,n} \cap \mathcal{H}^\infty} \|H - G\|_{L^\infty(\Omega)}, \]
such a problem is poorly behaved because the optimum may not exist (the best rational may have unstable poles) and even if it exists it may lead to a wild behavior off $\Omega$. One way out, which is taken up in these notes, is to extrapolate a complete model in $\mathcal{H}^p$ from the knowledge of $H|_\Omega$ by solving an analytic bounded extremal problem as presented in the forthcoming section. Once the complete model is obtained, one faces a rational approximation problem in $\mathcal{H}^p$ that we will comment upon. The $\mathcal{H}^2$ norm is often better suited, due to the measurement errors, but physical constraints on the global model, like passivity for instance, typically involve the uniform norm. Figure 1 shows a numerical example of this two-step identification scheme on a hyperfrequency filter (see [5, 44]), and an illustration in the design of transmission lines can be found in [47]. Often, weights are added in the criteria to trade off between $L^2$ norms, that tend to oversmooth the data, and $L^\infty$ norms that are put off by irregular samples. We shall not consider this here, and turn to approximation proper.

4 Some approximation problems

We discuss below some approximation problems connected to identification along the previous lines. This will provide us with an opportunity to introduce analytic operators of Hankel and Toeplitz type. We begin with analytic extrapolation from partial boundary data.

[Figure 1: plot titled "rational approximation of a hyperfrequency filter"; legend: $H^2$ approx. in degree 8, frequency measurements.]

Fig. 1. The dotted line in this diagram is the Nyquist plot (i.e. the image of the bandwidth on the imaginary axis) of the transfer function of the reflexion of a hyperfrequency filter measured by the French CNES (Toulouse). The data were first completed by solving an $\bar H^2$ bounded extremal problem and then approximated by a rational function of degree 8 whose Nyquist plot has been superimposed on the figure. The locus is not conjugate-symmetric because a low-pass transformation sending the central frequency to the origin was performed on the data. This illustrates that approximation with complex Fourier coefficients can be useful in system identification, even though the physical system is real.
4.1 Analytic bounded extremal problems
For $I$ a measurable subset of $\mathbb{T}$ and $J$ the complementary subset, if $h_1$ is a function defined on $I$ and $h_2$ a function defined on $J$, we use the notation $h_1 \vee h_2$ for the concatenated function, defined on the whole of $\mathbb{T}$, which is $h_1$ on $I$ and $h_2$ on $J$. The $L^2(I)/L^2(J)$ analytic bounded extremal problem is:
[ABEP($L^2(I), L^2(J)$)]
Given $f \in L^2(I)$, $\psi \in L^2(J)$ and a strictly positive constant $M$, find $g_0 \in H^2$ such that
\[ \|g_0(e^{i\theta}) - \psi(e^{i\theta})\|_{L^2(J)} \le M \]
and
\[ \|f - g_0\|_{L^2(I)} = \min_{\substack{g \in H^2 \\ \|g - \psi\|_{L^2(J)} \le M}} \|f - g\|_{L^2(I)}. \tag{10} \]
We saw in section 3.2 the relevance of such a problem in identification, although on the line rather than the circle. The isometry (8) transforms the version there to the present one. Moreover, if $I$ is symmetric with respect to the real axis and $f$ has the conjugate symmetry (i.e. $f \vee 0$ has real Fourier coefficients), then $g_0$ also will have the conjugate symmetry because it is unique by strict convexity.
Problem [ABEP($L^2(I), L^2(J)$)] may be viewed as a Tikhonov-like regularization of a classical ill-posed issue mentioned in the introduction, namely how can one recover an analytic function in a disk from its values on a subset of the boundary circle. In the setting of Hardy spaces, this issue was initially approached using the so-called Carleman interpolation formulas [2]. Apparently the first occurrence of (one instance of) Problem [ABEP($L^2(I), L^2(J)$)] was investigated in [28], that proceeds in the time rather than the frequency domain. Our exposition below is different in that it emphasizes the connection with Harmonic Analysis and Analytic Operator theory. This approach lends itself to various generalizations and algorithmic analyses that we shall discuss briefly.
As in any constrained convex problem, one expects the solution of Problem [ABEP($L^2(I), L^2(J)$)] to depend linearly on the data via some unknown Lagrange parameter. This is best expressed upon introducing the Toeplitz operator:
\[ \phi_{\chi_J} : H^2 \to H^2, \qquad g \mapsto P_{H^2}(\chi_J\, g) \tag{11} \]
with symbol $\chi_J$, the characteristic function of $J$.

Theorem 1 ([1, 7]). Assume that $I$ has positive measure. Then, there is a unique solution $g_0$ to (10). Moreover, if $f$ is not the restriction to $I$ of a $H^2$ function whose $L^2(J)$-distance to $\psi$ is less than or equal to $M$, this unique solution is given by
\[ g_0 = \left( 1 + \lambda\, \phi_{\chi_J} \right)^{-1} P_{H^2}\big( f \vee (1 + \lambda)\psi \big), \tag{12} \]
where $\lambda \in (-1, +\infty)$ is the unique real number such that the right hand side of (12) lies at $L^2(J)$-distance exactly $M$ from $\psi$.
Note that (12) indeed makes sense because the spectrum of $\phi_{\chi_J}$ is $[0, 1]$. Theorem 1 provides a constructive means of solving [ABEP($L^2(I), L^2(J)$)] because, although the correct value for $\lambda$ is not known a priori, the $L^2(J)$-distance to $\psi$ of the right-hand side in (12) is decreasing with $\lambda$ so that iterating by dichotomy allows one to converge to the solution.
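The dichotomy on the Lagrange parameter can be sketched on a discretized version of the problem. What follows is an illustration only, under assumptions that are not in the text: $H^2$ is truncated to polynomials of degree less than $N$, integrals are replaced by means over an equispaced grid, and the function name `solve_abep_l2`, the data $f(e^{i\theta}) = |\theta|$ on a half-circle, $\psi = 0$ and $M = 0.1$ are hypothetical.

```python
import numpy as np

def solve_abep_l2(f_vals, psi_vals, on_I, theta, M, N, iters=100):
    """Discretized sketch of (12): bisection on mu = 1 + lambda > 0."""
    K = theta.size
    E = np.exp(1j * np.outer(theta, np.arange(N)))       # E[m, k] = e^{i k theta_m}
    chi_J = (~on_I).astype(float)
    T = (E.conj().T * chi_J) @ E / K                     # truncated Toeplitz matrix of chi_J
    b_I = E.conj().T @ np.where(on_I, f_vals, 0.0) / K   # coefficients of P(f v 0)
    b_J = E.conj().T @ np.where(on_I, 0.0, psi_vals) / K # coefficients of P(0 v psi)

    def candidate(lam):
        a = np.linalg.solve(np.eye(N) + lam * T, b_I + (1.0 + lam) * b_J)
        g = E @ a
        dist_J = np.sqrt(np.mean(chi_J * np.abs(g - psi_vals) ** 2))
        return a, g, dist_J

    lo, hi = 1e-8, 1e8                                   # bracket for mu = 1 + lambda
    for _ in range(iters):
        mid = np.sqrt(lo * hi)                           # bisection on a log scale
        if candidate(mid - 1.0)[2] > M:
            lo = mid                                     # constraint violated: increase mu
        else:
            hi = mid                                     # feasible: try a smaller mu
    a, g, dist_J = candidate(hi - 1.0)
    err_I = np.sqrt(np.mean(on_I * np.abs(f_vals - g) ** 2))
    return a, hi - 1.0, dist_J, err_I

theta = -np.pi + 2 * np.pi * np.arange(512) / 512
on_I = np.abs(theta) <= np.pi / 2                        # I = right half of the circle
f_vals = np.abs(theta)                                   # non-analytic data on I
psi_vals = np.zeros_like(theta)                          # reference behaviour on J
coeffs, lam, dist_J, err_I = solve_abep_l2(f_vals, psi_vals, on_I, theta, M=0.1, N=12)
norm_f_on_I = np.sqrt(np.mean(on_I * f_vals ** 2))
```

The routine returns the feasible end of the final bracket, so `dist_J` never exceeds `M`; when the constraint is active it is essentially equal to `M`. Increasing `N` or decreasing `M` makes the linear systems more ill-conditioned, in line with the ill-posedness discussed in the text.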
Let us point out that $H^2|_I$ is dense in $L^2(I)$, hence the error in (10) can be made very small, but this is at the cost of making $M$ very big unless $f \in H^2|_I$, a circumstance that essentially never happens due to modeling and measurement errors. In this connection, it is interesting to ask how fast $M$ goes to $+\infty$ as the error $e = \|f - g_0\|_{L^2(I)}$ goes to 0. Using the constructive diagonalization of Toeplitz operators [41] with multiplicity 1, one can get fairly precise asymptotics when $I$ is an interval. To state a typical result, put $I = (e^{-ia}, e^{ia})$ with $0 < a < \pi$, and let $W^{1,1}(I)$ denote the Sobolev space of absolutely continuous functions on $I$.
Theorem 2 ([6]). Let $I$ be as above and assume that $f$ satisfies the following two assumptions:
\[ (1 - e^{-i\theta} e^{ia})^{-1/2} (1 - e^{-i\theta} e^{-ia})^{-1/2} f(e^{i\theta}) \in L^1(I), \tag{13} \]
\[ (1 - e^{-i\theta} e^{ia})^{1/2} (1 - e^{-i\theta} e^{-ia})^{1/2} f(e^{i\theta}) \in W^{1,1}(I). \tag{14} \]
If we set $e = \|f - g_0\|^2_{L^2(I)}$, where $g_0$ is the solution to (10), then to each $K_1 > 0$ there is $K_2 = K_2(f) > 0$ such that
\[ M^2 \le K_2\, e^2 \exp\{ K_1 e^{-1} \}. \tag{15} \]
In the above statement, the factor $e^{-1}$ in the exponent cannot be replaced by $h(e)$ for some function $h : \mathbb{R}^+ \to \mathbb{R}^+$ such that $h(x) = o(1/x)$ as $x \to 0$.
Adding finitely many degrees of smoothness would improve the above rate but only polynomially, and the meaning of Theorem 2 is that a good approximation is numerically not feasible if $f \notin C^\infty$, because $M$ goes too large, unless $f$ is very close to the trace of a Hardy function. It is striking to compare this with the analogous result when $f$ is the trace on $I$ of a meromorphic function:
Theorem 3. Let $f$ be of the form $h/q_N$ with $h \in H^2$ and $q_N$ a polynomial of degree $N$ whose roots lie at distance $d > 0$ from $\mathbb{T}$. Then
\[ M^2 = O\big( N^2\, |\log e| \big), \tag{16} \]
and the Landau symbol $O$ holds uniformly with respect to $h$ and $d$, the estimate being sharp in the considered class of functions.
Comparing Theorems 2 and 3 suggests that the approximation is much easier if $f$ extends holomorphically in a 2-D neighborhood of $I$, and that data in rational or meromorphic form should be favored, say as compared to splines.
The $L^\infty(I)/L^\infty(J)$ analog can also be addressed:
[ABEP($L^\infty(I), L^\infty(J)$)]
Given $f \in L^\infty(I)$, $\psi \in L^\infty(J)$, and a strictly positive constant $M$, find $g_0 \in H^\infty$ such that
\[ \|g_0(e^{i\theta}) - \psi(e^{i\theta})\|_{L^\infty(J)} \le M \]
and
\[ \|f - g_0\|_{L^\infty(I)} = \min_{\substack{g \in H^\infty \\ \|g - \psi\|_{L^\infty(J)} \le M}} \|f - g\|_{L^\infty(I)}. \tag{17} \]
A more general version is obtained by letting $M$ be a function in $L^\infty(J)$ and the constraint become $|g - \psi| \le M$ a.e. on $J$. If $\psi/M \in L^\infty(J)$, this version reduces to the present one because either $\log M \notin L^1(J)$ in which case the inequality $\log |g| \le \log M + \log(1 + |\psi/M|)$ shows that $g = 0$ is the only candidate approximant, or else $\log M \in L^1(J)$ and we can form the outer function $w_M \in H^\infty$ having modulus 1 on $I$ and $M$ on $J$; then, upon replacing $f$ by $f/w_M$ and $\psi$ by $\psi/w_M$ and observing that $g$ belongs to $H^\infty$ and satisfies $|g - \psi| \le M$ a.e. on $J$ if, and only if, $g/w_M$ lies in $H^\infty$ and satisfies $|g/w_M - \psi/w_M| \le 1$ a.e. on $J$ (because $g/w_M$ lies by construction in the Nevanlinna class whose intersection with $L^\infty(\mathbb{T})$ is $H^\infty$), we are back to $M = 1$. If $\psi/M \notin L^\infty(J)$ the situation is more complicated.
This time we need introduce Hankel rather than Toeplitz operators. Very nice expositions can be found in [38, 40, 36], the first being very readable and the second very comprehensive. Given $\varphi \in L^\infty$, the Hankel operator of symbol $\varphi$ is the operator
\[ \Gamma_\varphi : H^2 \to \bar H^2_0 \]
given by
\[ \Gamma_\varphi\, g = P_{\bar H^2_0}(\varphi\, g) \]
where $P_{\bar H^2_0}$ denotes the orthogonal projection of $L^2(\mathbb{T})$ onto $\bar H^2_0$. A Hankel operator is clearly bounded, and it is compact whenever it admits a continuous symbol; note that the operator only characterizes the symbol up to the addition of some $H^\infty$-function. Thus, whenever $\varphi \in H^\infty + C(\mathbb{T})$ (the latter is in fact an algebra), the operator $\Gamma_\varphi$ is compact and therefore it has a maximizing vector $v_0 \in H^2$, namely a function of unit norm such that $\|\Gamma_\varphi(v_0)\|_2 = |||\Gamma_\varphi|||$, the norm of $\Gamma_\varphi$. Let us mention also that Hankel operators of finite rank are those admitting a rational symbol.
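In the bases $\{z^k\}_{k \ge 0}$ of $H^2$ and $\{z^{-m}\}_{m \ge 1}$ of $\bar H^2_0$, a Hankel operator is represented by the matrix whose $(m, k)$ entry is the Fourier coefficient of index $-(m+k)$ of the symbol. The sketch below illustrates the finite-rank statement on the hypothetical symbol $1/(z - a)$ with $a = 0.6$, using a truncated $60 \times 60$ matrix; the helper name `hankel_matrix` is an assumption of this example.

```python
import numpy as np

def hankel_matrix(c, size):
    # c[n] is the Fourier coefficient of index -(n + 1) of the symbol;
    # entry (i, k) of the matrix is c[i + k].
    idx = np.arange(size)[:, None] + np.arange(size)[None, :]
    return c[idx]

a = 0.6
size = 60
# Symbol 1/(z - a) = sum_{n >= 1} a^{n-1} z^{-n}: one pole in the disk.
c = a ** np.arange(2 * size - 1)
H = hankel_matrix(c, size)

singular_values = np.linalg.svd(H, compute_uv=False)
operator_norm = singular_values[0]
numerical_rank = int(np.sum(singular_values > 1e-10))
```

For this rank-one example the norm equals $\sum_k a^{2k} = 1/(1 - a^2)$, and a maximizing vector has Taylor coefficients proportional to $a^k$, that is, it is a multiple of $1/(1 - a z)$ for real $a$.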
Theorem 4 ([8]). Assume that $I$ has positive measure and that $\psi$ extends continuously to $\overline{J}$. Then, there is a solution $g_0$ to (17). Moreover, if $f$ is not the restriction to $I$ of a $H^\infty$ function whose $L^\infty(J)$-distance to $\psi$ is less than $M$, so that the value $\beta$ of the problem is strictly positive, and if moreover $f \vee \psi \in H^\infty + C(\mathbb{T})$, this solution is unique and given by:
\[ g_0 = w_{M/\beta}^{-1}\, \frac{P_{H^2}\big( (f \vee \psi)\, w_{M/\beta}\, v_0 \big)}{v_0}, \tag{18} \]
where $w_{M/\beta}$ is the outer function with modulus $M/\beta$ on $I$ and modulus 1 on $J$, and $v_0$ is a maximizing vector of the Hankel operator $\Gamma_{(f \vee \psi) w_{M/\beta}}$.
Here again, the solution has conjugate symmetry if the data do. Although the value $\beta$ of the problem is not known a priori, it is the unique positive real number such that the right hand side of (18) differs from $\psi$ by a function of modulus $M$ a.e. on $J$, and so the theorem allows us to constructively solve [ABEP($L^\infty(I), L^\infty(J)$)] if a maximizing vector of $\Gamma_{(f \vee \psi) w_{M/\beta}}$ can be computed for given $\beta$. In [9], generically convergent algorithms to this effect are detailed in the case where $I$ is an interval and $f \vee \psi$ is $C^1$-smooth. They are based on the fact that a smooth $H^\infty$ function may be added to $f \vee \psi$ to make it vanish at the endpoints of $I$, and in this case $(f \vee \psi) w_{M/\beta}$ may be sufficiently well approximated by rationals, say in Hölder norm (uniform convergence is not sufficient here, see [40]). Reference [9] also contains a meromorphic extension.
Analytic bounded extremal problems have been generalized to abstract Hilbert and Banach space settings, with applications to hyperinvariant subspaces [34, 17, 48, 18]; they can be posed with different constraints where bounds are put on the imaginary part rather than the modulus [27]. In another connection, the work [3] investigates the problem of mixed type [ABEP($L^2(I), L^\infty(J)$)], which is important for instance when identifying passive systems whose transfer-function must remain less than 1 in modulus at every frequency. It turns out that the solution can be expressed very much along the same lines as [ABEP($L^2(I), L^2(J)$)], except that this time unbounded Toeplitz operators appear. We shall not go further into such generalizations, and we rather turn to the rational approximation part of the two-step identification procedure sketched in section 3.
4.2 Meromorphic and rational approximation
We saw in subsection 3.1, and in the second step of the identification scheme sketched in subsection 3.2, that stable and proper rational approximation of a complete model on the line or the circle is an important problem from the system-theoretic viewpoint. Here again, the isometry (8) makes it enough to consider the case of the circle. We shall start with the Adamjan-Arov-Krein theory (in short: AAK) which deals with a related issue, namely meromorphic approximation in the uniform norm.
For $k = 0, 1, 2, \ldots$, recall that the singular values of $\Gamma_\varphi$ are defined by the formula:
\[ s_k(\Gamma_\varphi) := \inf \big\{ |||\Gamma_\varphi - A|||,\ A \text{ an operator of rank} \le k \text{ on } H^2 \big\}. \]
When $\varphi \in H^\infty + C(\mathbb{T})$, the singular values are, by compactness, the square roots of the eigenvalues of $\Gamma_\varphi^* \Gamma_\varphi$ arranged in non-increasing order; a $k$-th singular vector is an eigenvector of unit norm associated to $s_k(\Gamma_\varphi)$.
A celebrated connection between the spectral theory of Hankel operators and best meromorphic approximation on the unit circle is given by AAK theory [38, 40] as follows. Recall from section 2 the notation $H^\infty_n$ for meromorphic functions in $L^\infty$ with at most $n$ poles in $\mathbb{D}$. The main result asserts that:
\[ \inf_{g \in H^\infty_n} \|\varphi - g\|_\infty = s_n(\Gamma_\varphi) \tag{19} \]
where the infimum is attained; moreover, the unique minimizer is given by the formula
\[ g_n = \varphi - \frac{\Gamma_\varphi v_n}{v_n} = \frac{P_{H^2}(\varphi\, v_n)}{v_n}, \tag{20} \]
where $v_n$ is any $n$-th singular vector of $\Gamma_\varphi$. Formula (20) entails in particular that the inner factor of $v_n$ is a Blaschke product of degree at most $n$. The error function $\varphi - g_n$ has further remarkable properties; for instance it has constant modulus $s_n(\Gamma_\varphi)$ a.e. on $\mathbb{T}$.
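The constant-modulus property of the error $\varphi - g_n = \Gamma_\varphi v_n / v_n$ can be observed numerically from a singular value decomposition of a truncated Hankel matrix. This is a sketch under assumptions: the symbol $1/(z - 0.5) + 1/(z + 0.3)$, the truncation size 80 and the evaluation grid are illustrative choices, and truncation is harmless here only because the Fourier coefficients decay geometrically.

```python
import numpy as np
from numpy.polynomial import polynomial as P

size = 80
n_idx = np.arange(2 * size - 1)
# Fourier coefficients of index -(n+1) of the symbol 1/(z - 0.5) + 1/(z + 0.3).
c = 0.5 ** n_idx + (-0.3) ** n_idx
H = c[np.arange(size)[:, None] + np.arange(size)[None, :]]

U, s, Vh = np.linalg.svd(H)
z = np.exp(2j * np.pi * np.arange(256) / 256)

def error_modulus(n):
    v = Vh[n].conj()                     # Taylor coefficients of the n-th singular vector
    Gv = H @ v                           # coefficients of z^{-1}, z^{-2}, ... in Gamma v
    v_on_circle = P.polyval(z, v)
    Gv_on_circle = P.polyval(1.0 / z, Gv) / z
    return np.abs(Gv_on_circle / v_on_circle)

deviation_0 = np.max(np.abs(error_modulus(0) - s[0]))   # best H^infty approximation
deviation_1 = np.max(np.abs(error_modulus(1) - s[1]))   # best approximation with one pole
```

Since the symbol has two poles, the Hankel matrix has rank 2 and `s[2]` vanishes up to rounding, reflecting that the symbol itself belongs to $H^\infty_2$.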
From the point of view of constructive approximation, it is remarkable that the infimum in (19) can be computed, and the problem as to whether one can pass from the optimal meromorphic approximant in (20) to a nearly optimal rational approximant has attracted much attention. Most notably, it is shown in [23] that $P_{\bar H^2_0}(g_n)$, which is rational in $\mathcal{R}_{n-1,n}$, produces an $L^\infty$ error within
\[ \sum_{j=n+1}^{\infty} s_j(\Gamma_\varphi) \tag{21} \]
of the optimal one out of $\mathcal{R}_{n-1,n}$. To estimate how good this bound is requires a link between the decay of the singular values of $\Gamma_\varphi$ and the smoothness of $\varphi$. The summability of the singular values is equivalent to the belonging of $P_{\bar H^2_0}(\varphi)$ to the Besov class $B^1_1$ of the disk [40], but this does not tell how fast the series converges.
For an appraisal of this, we need introduce some basic notions of potential theory. For more on fundamental notions like equilibrium measure, potential, capacity, balayage, as well as the basic theorems concerning them, the reader may want to consult some recent textbook such as [43]. However, for convenience, we review below the main concepts, starting with logarithmic potentials.
Let $E \subset \mathbb{C}$ be a compact set. To support intuition, one may view $E$ as a plane conductor and imagine putting a unit electric charge on it. Then, if a distribution of charge is described as being a Borel measure $\mu$ on $E$, the electrostatic equilibrium has to minimize the internal energy:
\[ I(\mu) = \int\!\!\int \log \frac{1}{|x - t|}\, d\mu(x)\, d\mu(t) \]
among all probability measures supported on $E$. This is because on the plane the Coulomb force is proportional to the inverse of the distance between the particles, and therefore the potential is its logarithm. There are sets $E$ (called polar sets or sets of zero logarithmic capacity) which are so thin that the energy $I(\mu)$ is infinite, no matter what the probability measure $\mu$ is on $E$; for those we do not define the equilibrium measure. But if $E$ is such that $I(\mu)$ is finite for some probability measure on $E$, then there is a unique minimizer for $I(\mu)$ among all such probability measures. This minimizer is called the equilibrium measure (with respect to logarithmic potential) of $E$, and we denote it by $\omega_E$. For example, the equilibrium measure of a disk or circle is the normalized arc measure on the circumference, while the equilibrium measure of a segment $[a, b]$ is
\[ d\omega_{[a,b]}(x) = \frac{dx}{\pi \sqrt{(x - a)(b - x)}}. \]
That the equilibrium measure of a disk is supported on its circumference is no accident: the equilibrium measure of $E$ is always supported on the outer boundary of $E$.
Associated to a measure $\mu$ on a set $E$ is its logarithmic potential:
\[ U^\mu(x) = \int \log \frac{1}{|x - t|}\, d\mu(t), \]
which is superharmonic on $\mathbb{C}$ with values in $(-\infty, +\infty]$.
From the physical viewpoint, this is simply the electrostatic potential corresponding to the distribution of charge $\mu$. Perhaps the nicest characterization of $\omega_E$ among all probability measures on $E$ is that $U^{\omega_E}(x)$ is equal to some constant $M$ on $E$, except possibly on a polar subset of $E$ where it may be less than $M$. Of necessity then, we have that $M = I(\omega_E)$ because measures of finite energy like $\omega_E$ do not charge polar sets. Points at which $U^{\omega_E}$ is discontinuous are called irregular (of necessity they lie on the outer boundary of $E$), and $E$ itself is said to be regular if it has no irregular points. All nice compact sets are regular, in particular all whose boundary has no connected component that reduces to a point. The regularity of $E$ is equivalent to the regularity of the Dirichlet problem in the unbounded component $V$ of $\mathbb{C} \setminus E$, meaning that for any continuous function $f$ on the outer boundary $\partial_e E$ of $E$, there is a harmonic function in $V$ which is continuous on $\overline{V} = V \cup \partial_e E$ and coincides with $f$ on $\partial_e E$.
The number $\exp\{-I(\omega_E)\}$ is called the logarithmic capacity of $E$; conventionally, polar sets have capacity zero. A property that holds in the complement of a polar set is said to hold quasi-everywhere.
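For the segment $[-1, 1]$ the equilibrium energy is $\log 2$ and the logarithmic capacity is $1/2$ (a quarter of the length). A rough numerical illustration, assuming that the arcsine distribution is discretized by equal point masses at Chebyshev nodes and that the diagonal (self-interaction) terms are dropped, is sketched below; the number of nodes is an arbitrary choice and convergence is slow.

```python
import numpy as np

n = 2000
# Chebyshev nodes: equal masses 1/n at these points approximate the arcsine
# (equilibrium) distribution dx / (pi * sqrt(1 - x^2)) on [-1, 1].
nodes = np.cos((2 * np.arange(1, n + 1) - 1) * np.pi / (2 * n))

distances = np.abs(nodes[:, None] - nodes[None, :])
np.fill_diagonal(distances, 1.0)              # log(1) = 0 removes self-interaction

discrete_energy = -np.log(distances).sum() / n ** 2
capacity_estimate = np.exp(-discrete_energy)
```

The discrete energy approaches $\log 2$ from below at rate roughly $(\log n)/n$, so `capacity_estimate` slightly overestimates $1/2$ for finite `n`.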
We now turn to Green potentials: when $E \subset U$ where U is a domain
(i.e. a connected open set) whose boundary is non-polar, one can introduce
similar concepts upon replacing the logarithmic kernel $\log 1/|x-t|$ by the
Green function of U with pole at t. This gives rise to the notions of Green
equilibrium measure, Green potential, and Green capacity. We use them only
when U is the unit disk, in which case the Green function with pole at a is
\[ g(z, a) = \log\left|\frac{1 - \bar{a} z}{z - a}\right|. \]
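The basic properties of this Green function are easy to verify numerically. The following sketch (the name `green_disk` is our own, hypothetical choice) evaluates g(z, a) and can be used to check that it vanishes on the unit circle, is symmetric in its two arguments, and is positive inside the disk.

```python
import numpy as np


def green_disk(z, a):
    """Green function of the unit disk with pole at a: log|(1 - conj(a) z)/(z - a)|."""
    z = np.asarray(z, dtype=complex)
    return np.log(np.abs((1.0 - np.conj(a) * z) / (z - a)))


if __name__ == "__main__":
    a = 0.3 + 0.4j
    circle = np.exp(1j * np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False))
    print(np.max(np.abs(green_disk(circle, a))))   # close to zero on the circle
    print(green_disk(0.1, a), green_disk(a, 0.1))  # symmetric in (z, a)
```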


To each probability measure $\mu$ with support in E, we associate the Green
energy:
\[ I_G(\mu) = \iint g(z, a)\, d\mu(z)\, d\mu(a). \]
If E is non polar, then among all probability measures supported on E there
is one and only one measure $\Omega_E$ minimizing the Green energy, which is called
the Green equilibrium measure of E associated with the unit disk. It is the
only probability measure on E whose Green potential
\[ G_{\Omega_E}(z) = \int g(z, a)\, d\Omega_E(a) \]
is equal to a constant M quasi-everywhere on E and less than or equal to that
constant everywhere (see e.g. [43]). Of necessity $M = I_G(\Omega_E)$, and the number 1/M is called the Green capacity of E (note the discrepancy with the
logarithmic case: we do not take exponentials here). It is also referred to as
the capacity of the condenser $(\mathbb{T}, E)$.
To explain this last piece of terminology, let us mention that from the
point of view of electrostatics, the Green equilibrium distribution of E is the
distribution of charges at the equilibrium if one puts a positive unit charge
on E (the first plate of the condenser) and a negative unit charge on $\mathbb{T}$ (the
second plate of the condenser).
With these definitions in mind, we are ready to go deeper in analyzing our
rational approximation problem. When the function to be approximated is analytic outside some compact
$K \subset \mathbb{D}$, it is shown in [52] that, if $e_n$ is the optimal error in uniform approximation to this function from $R_{n,n}$ on $\mathbb{C} \setminus \mathbb{D}$, then
\[ \limsup_{n \to \infty} e_n^{1/n} \le e^{-1/C} \tag{22} \]
where C is the capacity of the condenser $(K, \mathbb{T})$.

Equation (22) shows that the decay of the singular values is geometric
when the function is analytic outside $\mathbb{D}$ and across $\mathbb{T}$, and allows for an appraisal of (21)
in this case, although this appraisal is pessimistic in that, as was proved in
[37], one actually has:
\[ \liminf_{m \to \infty} e_m^{1/m} \le e^{-1/(2C)}. \]
For functions defined by Cauchy integrals over so-called symmetric arcs, this
lim inf is a true limit [24]. Moreover, in the particular case of functions analytic
except for two branchpoints in the disk, the probability measure having equal
mass at each pole of $g_n$ converges weak-* (in the dual of continuous functions
with compact support on $\mathbb{C}$) to the equilibrium distribution on the cut K that
minimizes the capacity of $(K, \mathbb{T})$ among all cuts joining the branch points
[12, 31], a hyperbolic analog to Stahl's theorem on Padé approximants [46].
Generalizing these results would bring us into current research.


Let us conclude with a few words concerning $H^2$ rational approximation
of type (n, n). We saw in subsections 3.1 and 3.2 the relevance of this problem
in identification, but still it is basically unsolved. For a comparison, observe
from the Courant minimax principle that the error in AAK approximation
is
sn ( ) = inf

V Vn

sup
vV

(v),

2 =1

where $V_n$ denotes the collection of subspaces of codimension at least n,
whereas it is easy to show that the error in $H^2$ rational approximation is
En () = min (b)
bBn

where $B_n$ denotes the set of Blaschke products of degree at most n. The nonlinear character of $B_n$ makes for a much more difficult problem, and practical
algorithms have to rely on numerical searches with the usual burden of local
minima. Space prevents us from describing in detail what is known on this
problem, and we refer the reader to [4] for a survey. Let us simply mention
that in the special case of Markov functions, namely Cauchy transforms of positive measures on a segment that also correspond to the transfer functions of
so-called relaxation systems [53, 16], a lot is known, including sharp error rates
[10, 13], asymptotic uniqueness of a critical point for Szegő-smooth measures
[14], and uniqueness for all orders and small support [15]. For certain entire
functions like the exponential, sharp error rates and asymptotic uniqueness
of a critical point have also been derived [11], but for most classes of functions the situation is unclear. Results obtained so far concern functions for
which the decay of the error is comparable to the one in AAK approximation
and fairly regular. Finally we point out that, despite the lack of a general
theory, rather efficient algorithms are available to generate local minima, e.g.
[21, 25, 35].

References
1. D. Alpay, L. Baratchart, J. Leblond. Some extremal problems linked
with identification from partial frequency data. In J.L. Lions, R.F. Curtain,
A. Bensoussan, editors, 10th conference on analysis and optimization of systems,
Sophia-Antipolis 1992, volume 185 of Lect. Notes in Control and Information
Sc., pages 563–573. Springer-Verlag, 1993.
2. L. Aizenberg. Carleman's formulas in complex analysis. Kluwer Academic
Publishers, 1993.
3. L. Baratchart, J. Leblond, F. Seyfert. Constrained analytic approximation of mixed $H^2/H^\infty$ type on subsets of the circle. In preparation.
4. L. Baratchart. Rational and meromorphic approximation in $L^p$ of the circle:
system-theoretic motivations, critical points and error rates. In Computational
Methods and Function Theory, pages 45–78. World Scientific Publish. Co, 1999.
N. Papamichael, St. Ruscheweyh and E.B. Saff eds.


5. L. Baratchart, J. Grimm, J. Leblond, M. Olivi, F. Seyfert, F. Wielonsky. Identification d'un filtre hyperfréquence. Rapport Technique INRIA
No 219, 1998.
6. L. Baratchart, J. Grimm, J. Leblond, J.R. Partington. Asymptotic estimates for interpolation and constrained approximation in $H^2$ by diagonalization
of Toeplitz operators. Integral Equations and Operator Theory, 45:269–299, 2003.
7. L. Baratchart, J. Leblond. Hardy approximation to $L^p$ functions on subsets
of the circle with $1 \le p < \infty$. Constructive Approximation, 14:41–56, 1998.
8. L. Baratchart, J. Leblond, J.R. Partington. Hardy approximation to
$L^\infty$ functions on subsets of the circle. Constructive Approximation, 12:423–436,
1996.
9. L. Baratchart, J. Leblond, J.R. Partington. Problems of Adamjan–Arov–Krein
type on subsets of the circle and minimal norm extensions. Constructive
Approximation, 16:333–357, 2000.
10. L. Baratchart, V. Prokhorov, E.B. Saff. Best $L^p$ meromorphic approximation of Markov functions on the unit circle. Foundations of Constructive
Math, 1(4):385–416, 2001.
11. L. Baratchart, E.B. Saff, F. Wielonsky. A criterion for uniqueness of a
critical point in $H^2$ rational approximation. J. Analyse Mathématique, 70:225–266,
1996.
12. L. Baratchart, F. Seyfert. An $L^p$ analog to the AAK theory. Journal of
Functional Analysis, 191:52–122, 2002.
13. L. Baratchart, H. Stahl, F. Wielonsky. Asymptotic error estimates for
$L^2$ best rational approximants to Markov functions on the unit circle. Journal
of Approximation Theory, 108:53–96, 2001.
14. L. Baratchart, H. Stahl, F. Wielonsky. Asymptotic uniqueness of best
rational approximants of given degree to Markov functions in $L^2$ of the circle.
2001.
15. L. Baratchart, F. Wielonsky. Rational approximation in the real Hardy
space $H^2$ and Stieltjes integrals: a uniqueness theorem. Constructive Approximation, 9:1–21, 1993.
16. R.W. Brockett, P.A. Fuhrmann. Normal symmetric dynamical systems.
SIAM J. Control and Optimization, 14(1):107–119, 1976.
17. I. Chalendar, J.R. Partington. Constrained approximation and invariant
subspaces. J. Math. Anal. Appl. 280 (2003), no. 1, 176–187.
18. I. Chalendar, J.R. Partington, M.P. Smith. Approximation in reflexive
Banach spaces and applications to the invariant subspace problem. Proc. Amer.
Math. Soc. 132 (2004), no. 4, 1133–1142.
19. P.L. Duren. Theory of $H^p$-spaces. Academic Press, 1970.
20. B. Francis. A course in $H^\infty$ control theory. Lecture notes in control and
information sciences. Springer-Verlag, 1987.
21. P. Fulcheri, M. Olivi. Matrix rational $H^2$ approximation: a gradient algorithm based on Schur analysis. SIAM Journal on Control
and Optimization, 36(6):2103–2127, 1998.
22. J.B. Garnett. Bounded Analytic Functions. Academic Press, 1981.
23. K. Glover. All optimal Hankel-norm approximations of linear multivariable
systems and their $L^\infty$ error bounds. Int. J. Control, 39(6):1115–1193, 1984.
24. A.A. Gonchar, E.A. Rakhmanov. Equilibrium distributions and the degree
of rational approximation of analytic functions. Math. USSR Sbornik, 176:306–352,
1989.


25. J. Grimm. Rational approximation of transfer functions in the hyperion software. Rapport de recherche 4002, INRIA, September 2000.
26. E.J. Hannan, M. Deistler. The statistical theory of linear systems. Wiley,
New York, 1988.
27. B. Jacob, J. Leblond, J.-P. Marmorat, J.R. Partington. A constrained
approximation problem arising in parameter identification. Linear Algebra and
its Applications, 351–352:487–500, 2002.
28. M.G. Krein, P. Ya Nudelman. Approximation of $L^2(\omega_1, \omega_2)$ functions by
minimum-energy transfer functions of linear systems. Problemy Peredachi Informatsii, 11(2):37–60, English transl., 1975.
29. R.E. Kalman, P.L. Falb, M.A. Arbib. Topics in mathematical system theory.
McGraw-Hill, 1969.
30. P. Koosis. Introduction to $H_p$-spaces. Cambridge University Press, 1980.
31. R. Küstner. Distribution asymptotique des zéros de polynômes orthogonaux
par rapport à des mesures complexes ayant un argument à variation bornée.
Ph.D. thesis, University of Nice, 2003.
32. M.M. Lavrentiev. Some Improperly Posed Problems of Mathematical Physics.
Springer, 1967.
33. L. Ljung. System identification: Theory for the user. Prentice-Hall, 1987.
34. J. Leblond, J.R. Partington. Constrained approximation and interpolation
in Hilbert function spaces. J. Math. Anal. Appl., 234(2):500–513, 1999.
35. J.P. Marmorat, M. Olivi, B. Hanzon, R.L.M. Peeters. Matrix rational $H^2$
approximation: a state-space approach using Schur parameters. In Proceedings
of the C.D.C., 2002.
36. N.K. Nikolskii. Treatise on the shift operator. Grundlehren der Math. Wissenschaften 273. Springer, 1986.
37. O.G. Parfenov. Estimates of the singular numbers of a Carleson operator.
Math USSR Sbornik, 59(2):497–514, 1988.
38. J.R. Partington. An Introduction to Hankel Operators. Cambridge University
Press, 1988.
39. J.R. Partington. Robust identification and interpolation in $H^\infty$. Int. J. of
Control, 54:1281–1290, 1991.
40. V.V. Peller. Hankel Operators and their Applications. Springer, 2003.
41. M. Rosenblum, J. Rovnyak. Hardy classes and operator theory. Oxford
University Press, 1985.
42. H.H. Rosenbrock. State Space and Multivariable Theory. Wiley, New York,
1970.
43. E.B. Saff, V. Totik. Logarithmic Potentials with External Fields, volume 316
of Grundlehren der Math. Wiss. Springer-Verlag, 1997.
44. F. Seyfert, J.P. Marmorat, L. Baratchart, S. Bila, J. Sombrin. Extraction of coupling parameters for microwave filters: Determination of a stable
rational model from scattering data. Proceedings of the International Microwave
Symposium, Philadelphia, 2003.
45. A.N. Shiryaev. Probability. Springer, 1984.
46. H. Stahl. The convergence of Padé approximants to functions with branch
points. J. of Approximation Theory, 91:139–204, 1997.
47. J. Skaar. A numerical algorithm for extrapolation of transfer functions. Signal
Processing, 83:1213–1221, 2003.
48. M.P. Smith. Constrained approximation in Banach spaces. Constructive Approximation, 19(3):465–476, 2003.


49. T. Söderström, P. Stoica. System Identification. Prentice-Hall, 1987.
50. G. Szegő. Orthogonal Polynomials. Colloquium Publications. AMS, 1939.
51. A. Tikhonov, N. Arsenine. Méthodes de résolution des problèmes mal posés.
MIR, 1976.
52. J.L. Walsh. Interpolation and approximation by rational functions in the complex domain. A.M.S. Publications, 1962.
53. J.C. Willems. Dissipative dynamical systems, Part I: general theory, Part
II: linear systems with quadratic supply rates. Arch. Rat. Mech. and Anal.,
45:321–351, 352–392, 1972.

Perturbative Series Expansions: Theoretical Aspects and Numerical Investigations
Luca Biasco and Alessandra Celletti
Dipartimento di Matematica,
Università di Roma Tre,
Largo S. L. Murialdo 1,
I-00146 Roma (Italy)
biasco@mat.uniroma3.it

Dipartimento di Matematica,
Università di Roma Tor Vergata,
Via della Ricerca Scientifica 1,
I-00133 Roma (Italy)
celletti@mat.uniroma2.it

Abstract
Perturbation theory is introduced by means of models borrowed from Celestial Mechanics, namely the two-body and three-body problems. Such models
allow one to introduce in a simple way the concepts of integrable and nearly-integrable
systems, which can be conveniently investigated using Hamiltonian
formalism. After discussing the problem of the convergence of perturbative series expansions, we introduce the basic notions of KAM theory, which allows one
(under quite general assumptions) to state the persistence of invariant tori.
The value at which such surfaces break down can be determined by means of
numerical algorithms. Among others, we review three methods to which
we refer as Greene, Padé and Lyapunov. We present some concrete applications of the three different techniques to discrete models, in order to provide
complementary information about the breakdown of invariant tori.

1 Introduction
The dynamics of the planets and satellites is ruled by Newton's law, according
to which the gravitational force is proportional to the product of the masses
of the interacting bodies and inversely proportional to the square of their
distance. The description of the trajectories spanned by the celestial bodies
starts with the simplest model, in which one considers only the attraction exerted by the Sun, neglecting all contributions due to other planets or satellites.
Such a model is known as the two-body problem and it is fully described by
Kepler's laws, according to which the motion is represented by a conic. Consider, for example, the trajectory of an asteroid moving on an elliptic orbit
around the Sun. In the two-body approximation the semimajor axis and the
eccentricity of the ellipse are fixed in time. However, such an example represents
only the first approximation of the asteroid's motion, which is actually subject
also to the attraction of the other planets and satellites of the solar system.
The most important contribution comes from the gravitational influence of
Jupiter, which is the largest planet of the solar system, its mass being approximately
$10^{-3}$ times the mass of the Sun. Therefore, the next step is to consider the
three-body problem formed by the Sun, the asteroid and Jupiter. A complete
mathematical solution of this problem has been sought for the last three centuries. A conclusive answer was given by H. Poincaré [18], who proved that
the three-body problem does not admit a mathematical solution, in the sense
that it is not possible to find explicit formulae which describe the motion of
the asteroid under the simultaneous attraction of the Sun and Jupiter. For
this reason, the three-body problem belongs to the class of non-integrable
systems. However, since the mass of Jupiter is much smaller than the mass
of the Sun, the trajectory of the asteroid will be in general weakly perturbed
with respect to the Keplerian solution. In this sense, one can speak of the
three-body problem as a nearly-integrable system (see section 3).

J.-D. Fournier et al. (Eds.): Harm. Analysis and Ratio. Approx., LNCIS 327, pp. 233–261, 2006.
© Springer-Verlag Berlin Heidelberg 2006

Let us consider a trajectory of the three-body problem with preassigned
initial conditions. Though an explicit solution of such motion cannot be found,
one can look for an approximate solution of the equations of motion by means
of mathematical techniques [2] known as perturbation theory (see section 4),
which can be conveniently introduced in terms of the Hamiltonian formalism
reviewed in section 2. More precisely, one can construct a transformation of
coordinates such that the system expressed in the new variables provides a
better approximation of the true solution. The coordinate transformations
are built up by constructing suitable series expansions, usually referred to¹
as Poincaré–Lindstedt series (see [2]), and a basic question concerns their domain of convergence. In the context of perturbation theory, a
definite breakthrough is provided by Kolmogorov's theorem, later extended
by Arnold and Moser, henceforth known as KAM theory after the initials of
the authors [15], [1], [17]. Let $\varepsilon$ denote the perturbing parameter, such that
for $\varepsilon = 0$ one recovers the integrable case (in the three-body problem the perturbing parameter is readily seen to represent the Jupiter–Sun mass ratio).
The novelty of KAM theory lies in fixing the frequency, rather than the initial conditions, and in using a quadratically convergent procedure of solution
(rather than a linear one, as in classical perturbation theory). The basic assumption is a strongly irrational (or diophantine) condition
on the frequency $\omega$. In conclusion, having fixed a diophantine frequency $\omega$ for
the unperturbed system ($\varepsilon = 0$), KAM theory provides an explicit algorithm
to prove the persistence, for a sufficiently small $\varepsilon \ne 0$, of an invariant torus
on which a quasi-periodic motion with frequency $\omega$ takes place; moreover,
Kolmogorov's theory proves that the set of such invariant tori has positive
measure in the phase space.

¹ Quoting [2]: "the Lindstedt technique is one of the earliest methods for eliminating
fast phases. We owe its contemporary form to Poincaré."

The KAM theorem provides a lower bound on the breakdown threshold of
the invariant torus with frequency $\omega$, say $\varepsilon = \varepsilon_c(\omega)$. Nowadays there are
several numerical techniques which allow one to evaluate accurately
the transition value $\varepsilon_c(\omega)$. One of the most widely accepted methods was
developed by Greene [13] and it is based on the conjecture that the breakdown
of the invariant torus with rotation number $\omega$ is related to the transition
from stability to instability of the periodic orbits with frequency equal to the
rational approximants to $\omega$. We remark that such conjecture was partially
justified in [10], [16].
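The rational approximants to a frequency are the convergents of its continued fraction expansion. The sketch below illustrates how they can be generated; the function names are our own, and the golden mean $(\sqrt{5}-1)/2$ is used purely as an example frequency (an assumption, chosen because its partial quotients are all equal to one). The floating-point expansion loses accuracy quickly, so only a modest number of terms should be requested.

```python
import math
from fractions import Fraction


def continued_fraction(x, n_terms):
    """First partial quotients of x, computed in floating point (keep n_terms modest)."""
    quotients = []
    for _ in range(n_terms):
        a = math.floor(x)
        quotients.append(a)
        frac = x - a
        if frac < 1e-12:  # x is (numerically) rational: stop
            break
        x = 1.0 / frac
    return quotients


def convergents(quotients):
    """Rational approximants p_k/q_k from the recurrence p_k = a_k p_{k-1} + p_{k-2}."""
    p_prev, q_prev = 1, 0
    p, q = quotients[0], 1
    result = [Fraction(p, q)]
    for a in quotients[1:]:
        p, p_prev = a * p + p_prev, p
        q, q_prev = a * q + q_prev, q
        result.append(Fraction(p, q))
    return result


if __name__ == "__main__":
    golden = (math.sqrt(5.0) - 1.0) / 2.0
    for c in convergents(continued_fraction(golden, 10)):
        print(c, float(c) - golden)
```

For the golden mean the convergents are ratios of consecutive Fibonacci numbers; in a method of Greene's type one would study periodic orbits whose frequencies are given by such ratios.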
Since the invariant torus is described in terms of the Poincaré–Lindstedt
series expansions, it is important to analyze the domain of convergence of such series in the complex parameter plane; we denote by $\varrho_c(\omega)$
the corresponding radius of convergence. Numerical experiments suggest that
Greene's threshold coincides with the intersection of such domain with the positive real axis. Whenever the domain of convergence deviates from a circle,
the two thresholds $\varepsilon_c(\omega)$ and $\varrho_c(\omega)$ may be markedly different. The domain of
analyticity can be obtained by implementing Padé's method [3], which allows one
to approximate the perturbative series by the ratio of two polynomials. The
zeros of the denominators provide the poles, which contribute to the determination of the analyticity domain. In order to evaluate the radius of convergence
one can apply an alternative technique developed in [5], to which we refer as
Lyapunov's method, based on the computation of a quantity related to the
numerical definition of Lyapunov's exponents.
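To make the role of the denominators concrete, here is a minimal sketch of how the poles of a Padé approximant can be extracted from Taylor coefficients. It is an illustration only, not the procedure of [3] or [5]: the function names, the plain linear solve and the toy series are our own assumptions. The denominator coefficients $b_1, \dots, b_n$ (with $b_0 = 1$) of the [m/n] approximant solve $\sum_{j=0}^{n} b_j c_{m+k-j} = 0$ for $k = 1, \dots, n$.

```python
import numpy as np


def pade_denominator(c, m, n):
    """Coefficients [1, b_1, ..., b_n] of the [m/n] Pade denominator.

    c holds the Taylor coefficients c_0..c_{m+n}; coefficients with negative
    index are taken to be zero.
    """
    c = np.asarray(c, dtype=float)

    def coef(i):
        return c[i] if i >= 0 else 0.0

    matrix = np.array([[coef(m + k - j) for j in range(1, n + 1)]
                       for k in range(1, n + 1)])
    rhs = -np.array([coef(m + k) for k in range(1, n + 1)])
    return np.concatenate(([1.0], np.linalg.solve(matrix, rhs)))


def pade_poles(c, m, n):
    """Zeros of the denominator, i.e. the poles of the [m/n] approximant."""
    b = pade_denominator(c, m, n)
    return np.roots(b[::-1])  # np.roots expects the highest degree first


if __name__ == "__main__":
    # Toy series: 1/(1 - 2z) + 1/(1 + 4z), with poles at z = 1/2 and z = -1/4.
    coeffs = [2.0**k + (-4.0)**k for k in range(4)]
    print(pade_poles(coeffs, 1, 2))
```

For a genuine perturbative series the linear system tends to become ill-conditioned as n grows, so the computed poles should be checked for stability across several orders.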
Concrete applications of Greene's, Padé's and Lyapunov's techniques are
presented in sections 5 and 6 for a discrete model, known as the standard map.
We consider also variants of such mapping (adding suitable Fourier components) and we analyze the behavior of invariant curves with three different
frequencies. Notice that the results (Tables and Figures) provided in sections
5 and 6 are taken from [5].

2 Hamiltonian formalism
Consider a smooth function $H := H(y, x)$ with $(y, x) \in M := \mathbb{R}^n \times \mathbb{R}^n$,
n = 1, 2, 3, . . . and the following system of O.D.E.s:
\[ \dot y(t) = -H_x(y(t), x(t)), \qquad \dot x(t) = H_y(y(t), x(t)) \tag{1} \]
(here and in the following $H_x(y, x) \equiv \frac{\partial H}{\partial x}(y, x)$, $H_y(y, x) \equiv \frac{\partial H}{\partial y}(y, x)$). Define
\[ \Phi^t_H(y_0, x_0) := (y(t), x(t)) \]
as the solution at time t with initial data $y(0) := y_0$ and $x(0) := x_0$. We
remark that the value of H along the solution of (1) is constant, i.e.
\[ H(y(t), x(t)) \equiv H(y_0, x_0) =: E \tag{2} \]
for a suitable $E \in \mathbb{R}$. The function H is called Hamiltonian, M is the phase
space, (1) are Hamilton's equations associated to H, $\Phi^t_H$ is the Hamiltonian
flow of H at time t starting from $(y_0, x_0)$ and (2) expresses the preservation
of the energy.
An elementary example of the derivation of the Hamiltonian function associated to a mechanical model is provided by a pendulum. Consider a point of
mass m, which keeps constant distance d from the origin of a reference frame.
Suppose that the system is embedded in a weak gravity field of strength $\varepsilon$.
Normalizing the units of measure so that m = 1 and d = 1, the Hamiltonian
describing the motion is given by
\[ H(r, \theta) = \frac{1}{2} r^2 + \varepsilon \cos\theta, \tag{3} \]
where $\theta \in \mathbb{T} \equiv \mathbb{R}/(2\pi\mathbb{Z})$ is the angle described by the particle with the vertical
axis and $r \in \mathbb{R}$ is the velocity, i.e. $r = \dot\theta$. Being one-dimensional, the system
can be easily integrated by quadratures.
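Energy preservation can also be observed numerically. The sketch below integrates Hamilton's equations for the pendulum Hamiltonian with a leapfrog scheme; the scheme, the step size and the function names are our own assumptions and are not part of the original text. With the sign convention of (1), the equations read $\dot r = \varepsilon \sin\theta$ and $\dot\theta = r$.

```python
import math


def pendulum_energy(r, theta, eps):
    """H(r, theta) = r^2/2 + eps*cos(theta)."""
    return 0.5 * r * r + eps * math.cos(theta)


def leapfrog(r, theta, eps, dt, n_steps):
    """Leapfrog steps for r' = -dH/dtheta = eps*sin(theta), theta' = dH/dr = r."""
    for _ in range(n_steps):
        r += 0.5 * dt * eps * math.sin(theta)   # half kick
        theta += dt * r                         # full drift
        r += 0.5 * dt * eps * math.sin(theta)   # half kick
    return r, theta


if __name__ == "__main__":
    eps, r0, theta0 = 0.1, 0.5, 1.0
    e0 = pendulum_energy(r0, theta0, eps)
    r1, theta1 = leapfrog(r0, theta0, eps, dt=0.01, n_steps=10000)
    print("energy drift:", pendulum_energy(r1, theta1, eps) - e0)
```

The numerical flow does not conserve H exactly, but for a symplectic scheme such as this one the energy error is expected to stay small and bounded rather than to grow steadily.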
Another classical example of a physical model which can be conveniently
studied through Hamiltonian formalism is provided by the harmonic oscillator. The equation governing the small oscillations (described by the coordinate
$x \in \mathbb{R}$) of a body attached to the end of an elastic spring is given by Hooke's
law, which can be expressed as
\[ \ddot x = -\omega^2 x, \]
for a suitable $\omega > 0$ representing the strength of the spring. The corresponding
Hamiltonian is given by
\[ H(y, x) = \frac{1}{2} y^2 + \frac{1}{2} \omega^2 x^2, \]
whose associated Hamilton's equations are
\[ \dot y = -\omega^2 x, \qquad \dot x = y. \tag{4} \]
Equations (4) can be trivially solved as
\[ y(t) = -\omega\alpha \sin(\omega t + \beta), \qquad x(t) = \alpha \cos(\omega t + \beta), \tag{5} \]

for suitable constants $\alpha$, $\beta$ related to the initial conditions x(0), y(0). It is rather instructive to use this example in order to introduce suitable coordinates,
known as action-angle variables, which will play a key role in the context of
perturbation theory. Indeed, we proceed to construct a change of coordinates
$\Psi(I, \varphi) = (y, x)$, defined through the relation
\[ \Phi^t_H \circ \Psi = \Psi \circ \Phi^t_{H \circ \Psi}. \tag{6} \]
For $(y, x) \ne (0, 0)$, we can define the change of variables
\[ y = \sqrt{2\omega I}\, \cos\varphi, \qquad x = \sqrt{2I/\omega}\, \sin\varphi, \tag{7} \]
where I > 0 and $\varphi \in \mathbb{T} := \mathbb{R}/(2\pi\mathbb{Z})$. Notice that the coordinate I has the
dimension of an action², while $\varphi$ is an angle. A trivial computation shows that
the previous change of coordinates satisfies (6). Moreover, we remark that the
new Hamiltonian
\[ K := K(I) := H \circ \Psi(I, \varphi) = \omega I \]
does not depend on $\varphi$. Since $\frac{\partial K(I)}{\partial I} = \omega$, equations (1) become
\[ \dot I = 0, \qquad \dot\varphi = \omega, \tag{8} \]
whose solution is given by
\[ I(t) \equiv I_0, \qquad \varphi(t) \equiv \omega t + \varphi_0. \tag{9} \]
Notice that inserting (9) in (7), one recovers the solution (5). In view of this
example, we are led to the following
Definition: A change of coordinates $\Psi$ verifying (6) is said to be canonical.
The coordinates $(I, \varphi) \in M$ (where the phase space is $M := \mathbb{R}^n \times \mathbb{T}^n$ with
$\mathbb{T}^n := \mathbb{R}^n/(2\pi\mathbb{Z})^n$) are called action-angle variables.
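For the harmonic oscillator the passage to action-angle variables can be written out explicitly. The sketch below assumes the change of variables $y = \sqrt{2\omega I}\cos\varphi$, $x = \sqrt{2I/\omega}\sin\varphi$ with Hamiltonian $H = y^2/2 + \omega^2 x^2/2$; the function names are hypothetical. It computes the flow by rotating the angle linearly at frequency $\omega$ while keeping the action fixed, which can be compared with the closed-form solution of the linear equations.

```python
import math


def to_action_angle(y, x, omega):
    """Invert y = sqrt(2*omega*I)*cos(phi), x = sqrt(2*I/omega)*sin(phi)."""
    action = (y * y + omega * omega * x * x) / (2.0 * omega)
    angle = math.atan2(omega * x, y) % (2.0 * math.pi)
    return action, angle


def from_action_angle(action, angle, omega):
    """Map (I, phi) back to (y, x)."""
    y = math.sqrt(2.0 * omega * action) * math.cos(angle)
    x = math.sqrt(2.0 * action / omega) * math.sin(angle)
    return y, x


def oscillator_flow(y0, x0, omega, t):
    """Flow at time t: constant action, angle advanced by omega*t."""
    action, angle = to_action_angle(y0, x0, omega)
    return from_action_angle(action, angle + omega * t, omega)


if __name__ == "__main__":
    omega, y0, x0 = 1.7, 0.3, -0.8
    action, angle = to_action_angle(y0, x0, omega)
    print("H =", 0.5 * y0**2 + 0.5 * omega**2 * x0**2, " omega*I =", omega * action)
    print(oscillator_flow(y0, x0, omega, 2.5))
```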
Notice that if the Hamiltonian H is expressed in terms of action-angle
variables, namely $H = H(I, \varphi)$, then equations (1) become
\[ \dot I(t) = -H_\varphi(I(t), \varphi(t)), \qquad \dot\varphi(t) = H_I(I(t), \varphi(t)). \tag{10} \]

3 Integrable and nearly-integrable systems

Let us consider a Hamiltonian function expressed in action-angle variables,
namely $H = H(I, \varphi)$, where $(I, \varphi) \in M := \mathbb{R}^n \times \mathbb{T}^n$. A system described by a
Hamiltonian H which does not depend on the angles, namely
\[ H(I, \varphi) = h(I) \]
for a suitable function h, is said to be (completely) integrable. For these systems, Hamilton's equations can be written as $\dot I = 0$, $\dot\varphi = \nabla h(I)$. Correspondingly, we introduce the invariant tori³
\[ \mathcal{T}_{\omega_0} := \{(I, \varphi) \mid I \equiv I_0,\ \varphi \in \mathbb{T}^n\}, \]
run by the linear flow $\varphi(t) = \omega_0 t + \varphi_0$, with $\omega_0 := \nabla h(I_0) \in \mathbb{R}^n$. If the
frequency $\omega_0$ is rationally independent (i.e., $\omega_0 \cdot k \ne 0$ for all $k \in \mathbb{Z}^n \setminus \{0\}$),
the torus $\mathcal{T}_{\omega_0}$ is called nonresonant and it is densely filled by the flow
$t \mapsto \omega_0 t + \varphi_0$. In this case, the flow is said to be quasi-periodic. If the frequency $\omega_0$ is
rationally dependent, the torus is called resonant (and it is foliated by lower
dimensional invariant tori). We remark that this case is highly degenerate,
since the probability of having a rationally dependent frequency is zero.

² Namely energy × time.
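The difference between resonant and nonresonant tori can be visualized with a crude experiment on a two-dimensional torus. The sketch below is only illustrative: the grid size, the sampling step, the function name and the choice of frequencies are our own assumptions. It samples the linear flow $\varphi(t) = \omega_0 t$ modulo $2\pi$ and measures the fraction of cells of a grid that the orbit visits.

```python
import numpy as np


def visited_fraction(omega, t_max=10000.0, dt=0.05, cells=20):
    """Fraction of a cells x cells grid on the 2-torus met by phi(t) = omega*t mod 2*pi."""
    t = np.arange(0.0, t_max, dt)
    phi1 = np.mod(omega[0] * t, 2.0 * np.pi)
    phi2 = np.mod(omega[1] * t, 2.0 * np.pi)
    i = np.minimum((phi1 / (2.0 * np.pi) * cells).astype(int), cells - 1)
    j = np.minimum((phi2 / (2.0 * np.pi) * cells).astype(int), cells - 1)
    return len(set(zip(i.tolist(), j.tolist()))) / cells**2


if __name__ == "__main__":
    golden = (np.sqrt(5.0) - 1.0) / 2.0
    print("resonant (1, 1):        ", visited_fraction((1.0, 1.0)))
    print("nonresonant (1, golden):", visited_fraction((1.0, golden)))
```

With the resonant frequency (1, 1) the orbit is a closed curve and only the diagonal cells are visited, whereas with an irrational frequency ratio the orbit should eventually enter every cell of the grid.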
In the previous statements we have fully classified all motions associated
to integrable Hamiltonian systems. A more difficult task concerns the study
of systems which are close to integrable ones. More precisely, consider a weak
perturbation of an integrable Hamiltonian h(I): denoting by $\varepsilon \in \mathbb{R}$ the size of
the perturbation, we can write a perturbed system as
\[ H(I, \varphi; \varepsilon) := h(I) + \varepsilon f(I, \varphi; \varepsilon), \tag{11} \]
where $(I, \varphi) \in M := \mathbb{R}^n \times \mathbb{T}^n$, f is a smooth function and the real parameter
$\varepsilon$ is small, i.e. $0 < \varepsilon \ll 1$. The equations of motion (10) become
\[ \dot I(t) = -\varepsilon \frac{\partial f}{\partial \varphi}(I(t), \varphi(t); \varepsilon), \qquad \dot\varphi(t) = \nabla h(I(t)) + \varepsilon \frac{\partial f}{\partial I}(I(t), \varphi(t); \varepsilon). \tag{12} \]
Mechanical systems described by Hamiltonian functions of the form (11) are
called nearly-integrable, since for $\varepsilon = 0$ equations (12) can be trivially integrated (compare with (8), (9)).
In order to provide explicit examples of integrable and nearly-integrable
Hamiltonian systems, we recall in the following subsections the celebrated
two-body and three-body problems.
3.1 The two-body problem
Consider the motion of an asteroid A orbiting around the Sun S, which is
assumed to coincide with the origin of a fixed reference frame. Let us
neglect the gravitational interaction of the asteroid with the other objects of
the solar system. The two-body asteroid–Sun motion is described by Kepler's
laws, which ensure that for negative energy the orbit of the asteroid around the
Sun is an ellipse with the Sun located at one of the two foci. The Hamiltonian
formulation of the two-body problem is described as follows. Choose the units
of length, mass and time so that the gravitational constant and the mass of the
Sun are normalized to one. In order to investigate the asteroid–Sun problem, it
is convenient to introduce suitable coordinates, known as Delaunay variables:
\[ (I_1, I_2) \equiv (L, G) \in \mathbb{R}^2, \qquad (\varphi_1, \varphi_2) \equiv (l, g) \in \mathbb{T}^2, \]
whose definition is the following. Denoting by a and e, respectively, the semimajor axis and eccentricity of the orbit of the asteroid, the Delaunay actions
are:
\[ L := \sqrt{a}, \qquad G := \sqrt{a(1 - e^2)}. \tag{13} \]
The conjugated angle variables are defined as follows: l is the mean anomaly,
which provides the position of the asteroid along its orbit, and g is the longitude of perihelion, namely the angle between the perihelion line and a fixed
reference direction (see [20]).
The Hamiltonian function in Delaunay variables can be written as
\[ H(L, G, l, g) := h(L) := -\frac{1}{2L^2}, \tag{14} \]
which shows that the system is integrable, since H = h(L) depends only on
the actions. From the equations of motion $\dot L = 0$, $\dot G = 0$, we immediately
recognize that L and G are constants, $L = L_0$, $G = G_0$, which in view of (13)
is equivalent to saying that the orbital elements (semimajor axis and eccentricity)
do not vary along the motion. Since g is also constant ($\dot g = 0$), we obtain that
the orbit is a fixed ellipse with one of the foci coinciding with the Sun. Finally,
the mean anomaly is obtained from $\dot l = \frac{\partial H(L)}{\partial L} := \omega(L)$ as $l(t) = \omega(L_0) t + l(0)$.
Let us remark that the Hamiltonian (14) does not depend on all the actions,
being independent of G: such systems are called degenerate and often
arise in Celestial Mechanics.

³ Namely $\Phi^t_H(\mathcal{T}_{\omega_0}) = \mathcal{T}_{\omega_0}$.
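As a small check of these formulae, the sketch below computes the Delaunay actions from the orbital elements and differentiates the Hamiltonian $h(L) = -1/(2L^2)$ numerically. The function names and the finite-difference step are our own assumptions. In the normalized units used here the resulting frequency should agree with $1/L^3 = a^{-3/2}$, which is Kepler's third law, and it does not depend on the eccentricity, in line with the degeneracy noted above.

```python
import math


def delaunay_actions(a, e):
    """Delaunay actions L = sqrt(a), G = sqrt(a*(1 - e^2)) in normalized units."""
    return math.sqrt(a), math.sqrt(a * (1.0 - e * e))


def h_kepler(L):
    """Two-body Hamiltonian h(L) = -1/(2 L^2)."""
    return -1.0 / (2.0 * L * L)


def mean_motion(L, step=1e-6):
    """omega(L) = dh/dL, approximated by a centred finite difference."""
    return (h_kepler(L + step) - h_kepler(L - step)) / (2.0 * step)


if __name__ == "__main__":
    a, e = 2.5, 0.3
    L, G = delaunay_actions(a, e)
    print("L, G =", L, G)
    print("omega(L) =", mean_motion(L), " a^(-3/2) =", a**-1.5)
```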
3.2 The three-body problem
The two-body problem describes only a rough approximation of the motion
of the asteroid around the Sun; indeed, the most important contribution we
neglected while considering Kepler's model comes from the gravitational influence of Jupiter. We are thus led to consider the motion of the three bodies: the
Sun (S), Jupiter (J) and a minor body of the asteroidal belt (A). We restrict
our attention to the special case of the planar, circular, restricted three-body
problem. More precisely, we assume that the Sun and Jupiter revolve around
their common center of mass, describing circular orbits (circular case). Choose
the units of length, mass and time so that the gravitational constant, the
orbital angular velocity and the sum of the masses of the primaries (Sun and
Jupiter) are identically equal to one. Consider the motion of an asteroid A moving in the same orbital plane as the primaries (planar case). Assume that the
mass of A is negligible with respect to the masses of S and J; this hypothesis
implies that the motion of the primaries is not affected by the gravitational
attraction of the asteroid (restricted case). Finally, let us identify the mass
of J with a suitable small parameter $\varepsilon$. Though this is the simplest (non-trivial) three-body model, as shown by Poincaré [18] such a problem cannot be
explicitly integrated (as was the case for the two-body problem through Kepler's solution).
In order to introduce the Hamiltonian function associated to such problem,
it is convenient to write the equations of motion in a barycentric coordinate
frame (with origin at the center of mass of the Sun–Jupiter system), which
rotates uniformly at the same angular velocity as the primaries. The resulting
system is described by a nearly-integrable Hamiltonian function with two degrees of freedom (see [20]) with the perturbing parameter $\varepsilon$ representing the
Jupiter–Sun mass ratio.
We immediately recognize that for $\varepsilon = 0$ (i.e., neglecting Jupiter), the
system reduces to the unperturbed two-body problem. As described in the
previous section, we can identify the Delaunay elements with action-angle
variables; the only difference is that in the rotating reference frame the variable
g represents the longitude of the pericenter, evaluated from the axis coinciding
with the direction of the primaries. If $H = h^{(AS)}(L)$ denotes the Hamiltonian
function associated to the asteroid–Sun problem, it can be shown [20] that
the Hamiltonian describing the three-body problem has the form
\[ H(L, G, l, g; \varepsilon) := h^{(AS)}(L) - G + \varepsilon f(L, G, l, g; \varepsilon), \tag{15} \]
for a suitable analytic function f, which represents the interaction between
the asteroid and Jupiter. The Hamiltonian (15) is a prototype of a nearly-integrable
system, since the integrable two-body problem is recovered as soon
as the perturbing parameter is set equal to zero.
The action-angle formalism provides a standard tool to solve explicitly the
equations of motion associated to integrable systems; on the other hand, one
could expect that for small perturbations the behavior of nearly-integrable
systems is similar to that of the integrable ones. By (12) this remark is obviously true
as long as the time is less than $1/\varepsilon$, though the question remains open for longer time scales. In order to face this problem, one could naively try to perform
a change of variables which transforms the nearly-integrable system (11) into
a trivially integrable one (at least for $\varepsilon$ small enough). However, a natural question concerns the existence of a canonical transformation $\Psi : (J, \psi) \mapsto (I, \varphi)$
such that on a given time scale the nearly-integrable Hamiltonian system (11)
is transformed into a new Hamiltonian system, $K := H \circ \Psi$, which does not
depend on the new angle coordinates $\psi$; perturbation
theory will provide a tool to investigate such a strategy of approaching nearly-integrable
systems.


4 Perturbation theory
Perturbation theory flourished during the last two centuries through the works
of Laplace, Leverrier, Delaunay, Poincaré, Tisserand, etc.; it provides constructive methods to investigate the behavior of nearly-integrable systems.
The importance of studying the effects of small Hamiltonian perturbations
on an integrable system was pointed out by Poincaré, who referred to it as
the fundamental problem of dynamics. We introduce perturbation theory in
section 4.1 and we present in section 4.2 the celebrated Kolmogorov–Arnold–Moser
theorem.
4.1 Classical perturbation theory
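Consider an analytic Hamiltonian of the form
\[ H(I, \varphi; \varepsilon) := h(I) + \varepsilon f(I, \varphi; \varepsilon), \tag{16} \]
where $I \in B_R := \{I \in \mathbb{R}^n,\ |I| \le R\}$, $\varphi \in \mathbb{T}^n$ and $|\varepsilon| \le \varepsilon_0$ for suitable real
constants R > 0, $\varepsilon_0 > 0$. Let us expand f in Taylor series of $\varepsilon$ as
\[ f(I, \varphi; \varepsilon) =: \sum_{j \ge 0} \varepsilon^j f^{(j)}(I, \varphi), \]
for suitable functions $f^{(j)}(I, \varphi)$, which can be expanded in Fourier series as
$f^{(j)}(I, \varphi) = \sum_{k \in \mathbb{Z}^n} f^{(j)}_k(I)\, e^{ik \cdot \varphi}$. We implement an integrating transformation
\[ \Psi : (J, \psi) \mapsto (I, \varphi), \]
defined by the implicit equations
\[ I = J + \varepsilon\, \partial_\varphi S(J, \varphi; \varepsilon), \qquad \psi = \varphi + \varepsilon\, \partial_J S(J, \varphi; \varepsilon), \tag{17} \]
where the generating function S can be expanded as a Poincaré–Lindstedt
series of the form
\[ S(J, \varphi; \varepsilon) = \sum_{j \ge 0} \varepsilon^j S^{(j)}(J, \varphi; \varepsilon) \]
for suitable analytic functions $S^{(j)}$ to be determined as follows.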

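By the implicit function theorem it is simple to prove that, for $\varepsilon$ small enough, (17) defines a good diffeomorphism, which is also canonical. We want to determine recursively $S^{(j)}$ so that the function $K := H \circ \Phi_\varepsilon$ does not depend on $\psi$, where
$$\Phi_\varepsilon(J, \psi) = \bigl(J + \varepsilon\, \partial_\psi S^{(0)}(J, \psi; \varepsilon) + O(\varepsilon^2),\ \psi + O(\varepsilon)\bigr).$$
To this end, we start by expanding $H \circ \Phi_\varepsilon$ as
$$(H \circ \Phi_\varepsilon)(J, \psi) = h(J) + \varepsilon \bigl[\nabla h(J) \cdot \partial_\psi S^{(0)}(J, \psi; \varepsilon) + f^{(0)}(J, \psi)\bigr] + O(\varepsilon^2). \tag{18}$$
We look for $S^{(0)}$ so that the term of order $\varepsilon$ does not depend on the angles; we are thus led to the equation
$$\nabla h(J) \cdot \partial_\psi S^{(0)}(J, \psi; \varepsilon) + f^{(0)}(J, \psi) = h^{(1)}(J; \varepsilon), \tag{19}$$
where $h^{(1)}(J; \varepsilon) \equiv f_0^{(0)}(J)$ and
$$S^{(0)}(J, \psi) := i \sum_{k \ne 0} \frac{f_k^{(0)}(J)}{\nabla h(J) \cdot k}\, e^{ik\cdot\psi}. \tag{20}$$
The above expression contains a quantity at the denominator to which we refer as the small divisor, since it can become arbitrarily small:
$$\nabla h(J) \cdot k. \tag{21}$$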

Indeed, the function $S^{(0)}$ can be defined only for values of the actions such that the small divisors (21) are different from zero, namely only for $J$ belonging to the set of rationally independent frequencies
$$\mathcal{I} := \{J \in \mathbb{R}^n \text{ such that } \nabla h(J) \cdot k \ne 0,\ \forall\, k \in \mathbb{Z}^n \setminus \{0\}\}.$$
Since $\mathcal{I}$ has empty interior, if we want to define $S^{(0)}$ in an open neighborhood of a fixed $J_0$, we must truncate the series in (20) up to a suitable order, say $|k| \le d_0$ for a given $d_0 \in \mathbb{Z}_+$. In particular, let us write the Fourier expansion of $f^{(0)}$ as
$$f^{(0)}(J, \psi) = \sum_{|k| \le d_0} f_k^{(0)}(J)\, e^{ik\cdot\psi} + \sum_{|k| > d_0} f_k^{(0)}(J)\, e^{ik\cdot\psi}; \tag{22}$$
choosing a sufficiently large value of the truncation index $d_0 := d_0(\varepsilon)$, we can make the second sum in (22) of order $\varepsilon$, so that it will finally contribute to the $O(\varepsilon^2)$-term in the development (18). In summary, we have eliminated the angles in the expression (18) up to $O(\varepsilon)$ by using the transformation associated to $S^{(0)}$, which is defined for $J \in B_{\rho_0}(J_0)$, for a sufficiently small $\rho_0 := \rho_0(\varepsilon)$ such that no zero-divisors will occur, i.e.
$$\nabla h(J) \cdot k \ne 0, \qquad \forall\, 0 < |k| \le d_0(\varepsilon), \quad \forall\, J \in B_{\rho_0(\varepsilon)}(J_0).$$
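The effect of the truncation order on the small divisors can be illustrated numerically. The following is only a sketch: the frequency vectors are hypothetical samples (not taken from the text), and the norm $|k|$ is assumed to be the sum of the absolute values of the components. It computes the smallest value of $|\omega \cdot k|$ over $0 < |k| \le d$, which shrinks as $d$ grows for a rationally independent $\omega$ and vanishes for a resonant one.

```python
import itertools
import math

def min_small_divisor(omega, d):
    """Smallest |omega . k| over integer vectors k with 0 < |k|_1 <= d."""
    best = math.inf
    for k in itertools.product(range(-d, d + 1), repeat=len(omega)):
        norm = sum(abs(c) for c in k)
        if norm == 0 or norm > d:
            continue
        best = min(best, abs(sum(w * c for w, c in zip(omega, k))))
    return best

golden = (math.sqrt(5) - 1) / 2
omega_irr = (1.0, golden)  # hypothetical rationally independent frequency
omega_res = (1.0, 0.5)     # hypothetical resonant frequency: omega . (1, -2) = 0

# The minimal divisor decreases as the truncation order d increases.
divisors = [min_small_divisor(omega_irr, d) for d in (2, 5, 13, 34)]
zero_divisor = min_small_divisor(omega_res, 3)
```

For the resonant sample a zero divisor already appears at $d = 3$, which is the situation the truncation and the shrinking radii are meant to avoid.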

In order to eliminate the angle dependence in (18) for the terms of order $\varepsilon^2, \varepsilon^3, \dots, \varepsilon^{m+1}, \dots$, we determine $S^{(1)}, S^{(2)}, \dots, S^{(m)}, \dots$ by solving equations similar to (19). Again, we need to truncate the series associated to $S^{(m)}$ at the orders $|k| \le d_m$ for sufficiently large indexes
$$d_m := d_m(\varepsilon) \to \infty \quad \text{for} \quad m \to \infty.$$
Therefore, $S^{(m)}$ will be defined only for $J \in B_{\rho_m}(J_0)$ for sufficiently small values of the radii, such that
$$\rho_m := \rho_m(\varepsilon) \to 0 \quad \text{for} \quad m \to \infty.$$
As a consequence, we remark that the radii of the domains of definition of the functions $S^{(0)}, S^{(1)}, \dots, S^{(m)}, \dots$ will drastically shrink to 0 as $m$ increases, so that we cannot expect that the overall procedure will converge on an open set.
Finally, iterating infinitely many times the above procedure we can formally write the resulting Hamiltonian function as
$$K(J; \varepsilon) := H \circ \Phi_\varepsilon(J, \psi; \varepsilon) = h(J) + \varepsilon h_\infty(J; \varepsilon),$$
for a suitable function $h_\infty$. However, we are immediately faced with the problem of the convergence of such a procedure, namely with the question of the existence of $J_0$ and $\varepsilon_0 > 0$ such that $K(J_0; \varepsilon)$ is well-defined for $|\varepsilon| \le \varepsilon_0$.
In general, as shown by Poincaré, the answer to this question is negative. We report here an example quoted in [2] of a diverging Poincaré-Lindstedt series. Consider the Hamiltonian
$$H(I, \varphi; \varepsilon) := \omega \cdot I + \varepsilon \Bigl[ I_1 + \sum_{k \in \mathbb{N}^2 \setminus \{0\}} a_k \sin(k \cdot \varphi) \Bigr], \tag{23}$$
$(I, \varphi) \in \mathbb{R}^2 \times \mathbb{T}^2$, where $a_k := \exp(-|k|)$ and $\omega = (\omega_1, \omega_2)$, $\omega_1 < 0$, $\omega_2 > 0$ with $\omega_1/\omega_2 \in \mathbb{R} \setminus \mathbb{Q}$. For $\varepsilon = 0$ the phase space is foliated by non-resonant invariant tori, wrapped by the quasi-periodic flow $t \mapsto \omega t + \varphi_0$. In this example, it is very simple to evaluate the Poincaré-Lindstedt series at any order, though it is not necessary for our purposes. Indeed, we just notice that if the Poincaré-Lindstedt series converges, we can define the canonical transformation $\Phi_\varepsilon : (J, \psi) \mapsto (I, \varphi)$, which integrates the system, i.e. $H \circ \Phi_\varepsilon(J, \psi) =: K(J; \varepsilon)$. Therefore, from (6) the values of $I_1$ and $I_2$ remain bounded in time, since $J$ is a constant vector. On the other hand, we can readily integrate (23) and in particular we have $\varphi_1(t) = (\omega_1 + \varepsilon)t + \varphi_1(0)$ and $\varphi_2(t) = \omega_2 t + \varphi_2(0)$, so that the solution for $I_1$ (with initial condition $\varphi(0) = 0$) is given by
$$I_1(t) = I_1(0) - \varepsilon \sum_{k \in \mathbb{N}^2 \setminus \{0\}} a_k k_1 \int_0^t \cos\bigl( [(\omega_1 + \varepsilon) k_1 + \omega_2 k_2]\, s \bigr)\, ds, \tag{24}$$
which can be solved by quadratures. Moreover, whenever
$$\frac{\omega_1 + \varepsilon}{\omega_2} = -\frac{p}{q} \qquad \text{for some} \quad p \in \mathbb{Z},\ q \in \mathbb{N},$$
the sum in (24) over all the terms with $(k_1, k_2) \ne (q, p)$ gives a uniformly bounded contribution (for any $t \in \mathbb{R}$), while the term with $(k_1, k_2) = (q, p)$ gives a contribution which diverges when $t \to \infty$, since we identically have
$$(\omega_1 + \varepsilon) k_1 + \omega_2 k_2 = 0.$$
This computation allows us to conclude that the Poincaré-Lindstedt series diverges.
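The divergence of the resonant term can be made explicit. As a short worked step, assuming the reading of (24) in which the integrand is $\cos([(\omega_1+\varepsilon)k_1 + \omega_2 k_2]\,s)$ and $a_k = \exp(-|k|)$, the term with $(k_1, k_2) = (q, p)$ has a vanishing argument, so that its integral grows linearly in time:

```latex
\begin{align*}
-\varepsilon\, a_{(q,p)}\, q \int_0^t \cos(0)\, ds
  &= -\varepsilon\, q\, e^{-|(q,p)|}\, t ,
\end{align*}
```

which is unbounded in $t$ for any $\varepsilon \ne 0$, in contrast with the boundedness of $I_1$ that convergence of the series would imply.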
As a final remark, we note that if the Poincaré-Lindstedt series converges for some $J_0$ and $|\varepsilon| \le \varepsilon_0$, then for any $|\varepsilon| \le \varepsilon_0$ the perturbed system admits the invariant torus
$$\mathcal{T}_{\omega_0} := \Phi_\varepsilon(J_0, \psi; \varepsilon),$$
with frequency $\omega_0 = \omega_0(\varepsilon) := \nabla h(J_0) + \varepsilon \nabla h^{(1)}(J_0; \varepsilon) + \dots$. In general, the rational dependence of the vector $\omega_0$ will vary according to the values of $\varepsilon$. Therefore the torus $\mathcal{T}_{\omega_0}$ will be non-resonant or resonant according to the values of $\varepsilon$. As we have seen before, a resonant torus is foliated into lower dimensional tori; actually, this situation is highly degenerate and it determines the intrinsic reason for the divergence of the Poincaré-Lindstedt series.
4.2 KAM theory
The Kolmogorov-Arnold-Moser (KAM) theory provides a constructive method to investigate nearly-integrable Hamiltonian systems [15], [1], [17]. In analogy to the Poincaré-Lindstedt theory, the basic idea relies on the elimination of the angle variables through suitable changes of coordinates, with the further requirement that the sequence of transformations be quadratically convergent. Ignoring the size of the contribution of the small divisors, after performing the $m$-th change of coordinates the part of the remainder which depends on the angle variables will be of order $\varepsilon^{2^m}$. The superconvergent estimate of the remainder terms counteracts the influence of the small divisors, ensuring the convergence of the KAM procedure on a suitable non-resonant set. Indeed, the second ingredient of KAM theory is to focus on a given strongly rationally independent frequency, rather than on a given action.
To be more precise, consider an analytic Hamiltonian of the form (16). Fix a rationally independent frequency vector $\omega_0$, such that $\omega_0 := \nabla h(J_0)$ for a suitable $J_0 \in \mathbb{R}^n$. We perform a canonical change of variables $\Phi_1$, implicitly defined as in (17) with $S = S^{(0)}$, where $S^{(0)}$ can be determined as in section 4.1. Due to the strong rational independence of $\omega_0$, for $J \simeq J_0$ the following estimate on the small divisors holds:
$$|\nabla h(J) \cdot k| \approx |\omega_0 \cdot k|. \tag{25}$$
The transformed Hamiltonian becomes
$$K_1 := H \circ \Phi_1 = h(J) + \varepsilon h^{(1)}(J; \varepsilon) + \varepsilon^2 R_2(J, \psi; \varepsilon)$$
for a suitable remainder function $R_2$.
The next step is substantially different from the corresponding case of section 4.1; let $h'(J; \varepsilon) := h(J) + \varepsilon h^{(1)}(J; \varepsilon)$ be the integrable part and let $\varepsilon^2 R_2(J, \psi; \varepsilon)$ be the perturbation. The crucial advantage is that after one transformation the new perturbative parameter is $\varepsilon' := \varepsilon^2$, with Hamiltonian $K_2 = h' + \varepsilon' R_2$. Then we introduce a second canonical change of variables $\Phi_2$, defined in a neighborhood of a suitable $J_1(\varepsilon)$, which is chosen to satisfy $\nabla h'(J_1(\varepsilon); \varepsilon) = \omega_0$; this relation will ensure a good estimate on the small divisors as in (25). For $\varepsilon$ small enough, we can find $J_1(\varepsilon)$ by the implicit function theorem, if we assume that
$$\det h''(J_0) \ne 0. \tag{26}$$
It is easy to see that after the last change of variables the new remainder term will be of order $\varepsilon'^2 = \varepsilon^{2^2}$.
Iterating this procedure, at the $m$-th step the angle-depending remainder will be of order $\varepsilon^{2^m}$. Finally, applying the KAM scheme infinitely many times, we end up with an integrable Hamiltonian of the form
$$K_\infty(J_\infty; \varepsilon) := H \circ \Phi_1 \circ \Phi_2 \circ \dots =: H \circ \Phi_\infty(J_\infty, \psi_\infty; \varepsilon), \tag{27}$$
which is defined for $J_\infty = J_\infty(\varepsilon)$ and satisfies
$$\nabla K_\infty(J_\infty(\varepsilon); \varepsilon) = \omega_0.$$
We have thus proved the existence of the invariant torus
$$\mathcal{T}_{\omega_0} := \Phi_\infty(J_\infty(\varepsilon), \psi_\infty; \varepsilon), \qquad \psi_\infty \in \mathbb{T}^n,$$
provided that the overall procedure converges on a non-trivial domain. Actually such a domain turns out to be a Cantor set; the equality (27) holds precisely on such a set and can be differentiated infinitely many times on it, see [19], [7].
Remark 1. Before stating the KAM theorem, let us summarize the main differences between Poincaré-Lindstedt series and KAM procedures.
In the first case:
(1) after the $m$-th coordinate transformation the remainder term (depending upon angles) is of order $\varepsilon^{m+1}$,
(2) the initial datum $J_0$ is kept fixed, while the final frequency $\nabla K(J_0; \varepsilon)$ (respectively the invariant surface $\mathcal{T}_{\nabla K(J_0; \varepsilon)}$) varies with $\varepsilon$, eventually becoming rationally dependent (respectively, a resonant torus).
Concerning the KAM procedure we recall that:
(1) after the $m$-th coordinate transformation the remainder term (depending upon angles) is of order $\varepsilon^{2^m}$ (ignoring the contribution of the small divisors; see footnote 4),
(2) the frequency vector $\omega_0$ is fixed and it is supposed to be strongly rationally independent, so that the corresponding torus $\mathcal{T}_{\omega_0}$ is strongly non-resonant.
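The gap between the two convergence rates of Remark 1 is easy to tabulate. The snippet below is a minimal illustration, not part of the original text: the value $\varepsilon = 0.1$ and the helper name `remainder_orders` are arbitrary choices, and the small divisors are ignored as in the remark.

```python
def remainder_orders(eps, steps):
    """Nominal remainder sizes after m steps of each scheme (small divisors ignored)."""
    rows = []
    for m in range(1, steps + 1):
        lindstedt = eps ** (m + 1)  # classical scheme: order eps^(m+1)
        kam = eps ** (2 ** m)       # quadratic scheme: order eps^(2^m)
        rows.append((m, lindstedt, kam))
    return rows

table = remainder_orders(0.1, 5)
for m, lindstedt, kam in table:
    print(f"m={m}: eps^(m+1)={lindstedt:.1e}  eps^(2^m)={kam:.1e}")
```

The two orders coincide at the first step and separate rapidly afterwards, which is the margin the KAM scheme uses to absorb the small divisors.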
We now proceed to state the KAM theorem as follows. Consider a Hamiltonian system as in (11). As we discussed previously, for $\varepsilon = 0$ the phase space of the unperturbed (integrable) system $h$ is foliated by $n$-dimensional invariant tori labeled by $I = I_0$. Such tori are resonant or non-resonant according to whether the frequency $\omega_0 := \nabla h(I_0)$ is rationally dependent or not. The KAM theorem describes the fate of non-resonant tori under perturbation. We recall that the three assumptions necessary to prove the theorem are the following: the smallness of $\varepsilon$, the strong rational independence of $\omega_0$ and the non-degeneracy of the unperturbed Hamiltonian $h$ as given in (26).
Theorem 1 (KAM Theorem). If the unperturbed system is non-degenerate, then, for a sufficiently small perturbation, most non-resonant invariant tori do not break down, though being deformed and displaced with respect to the integrable situation. Therefore, in the phase space of the perturbed system there exist invariant tori densely filled with quasi-periodic phase trajectories winding around them. These invariant tori form a majority, in the sense that the measure of the complement of their union is small when the perturbation is small.
The proof of the KAM theorem is based on the superconvergent procedure described above. We remark that the strong rational independence of $\omega_0$ (see (25)) plays a central role. In particular, the KAM scheme works if $\omega_0$ satisfies the so-called Diophantine condition
$$|\omega_0 \cdot k| \ge \frac{\gamma}{|k|^\tau}, \qquad \forall\, k \in \mathbb{Z}^n \setminus \{0\}, \tag{28}$$
for a suitable $\gamma = \gamma(\varepsilon) > 0$ and $\tau > n - 1$. More precisely, it results that all tori having Diophantine frequency vector with
$$\gamma > \text{const.}\, \sqrt{\varepsilon}$$
are preserved under the perturbation.
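A finite-range check of the Diophantine condition (28) can be sketched as follows. The frequency vectors, the exponent $\tau = 1$ (admissible only in the borderline two-dimensional sense and chosen here purely for illustration), the cutoff `kmax` and the use of the norm $|k| = |k_1| + |k_2|$ are all assumptions of this example; a finite scan can of course only suggest, not prove, that the condition holds.

```python
import itertools
import math

def diophantine_constant(omega, tau, kmax):
    """Largest gamma with |omega . k| >= gamma / |k|^tau for all 0 < |k|_1 <= kmax."""
    best = math.inf
    for k in itertools.product(range(-kmax, kmax + 1), repeat=len(omega)):
        norm = sum(abs(c) for c in k)
        if norm == 0 or norm > kmax:
            continue
        divisor = abs(sum(w * c for w, c in zip(omega, k)))
        best = min(best, divisor * norm ** tau)
    return best

golden = (math.sqrt(5) - 1) / 2
# Golden-ratio frequency: the constant stays well away from zero.
gamma_golden = diophantine_constant((1.0, golden), tau=1, kmax=60)
# Nearly resonant frequency: the constant collapses.
gamma_near_resonant = diophantine_constant((1.0, 0.5 + 1e-6), tau=1, kmax=60)
```

In this scan the golden-ratio vector keeps a constant of order one, while the nearly resonant vector yields a constant several orders of magnitude smaller.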
We remark that the KAM result is global, in the sense that for all $\varepsilon \le \varepsilon_0$ ($\varepsilon_0$ fixed) and for all Diophantine frequency vectors $\omega_0$ satisfying (28), the Hamiltonian system (11) admits an invariant perturbed torus with frequency vector $\omega_0$. We refer to [9], [11], [6], [12] for different methods to construct invariant tori through a classical proof of the convergence of the perturbative series involved.

Footnote 4: Taking into account the contribution of the small divisors, one can nevertheless obtain a super-exponential decay, even if it is not strictly necessary for the convergence of the KAM scheme.
It is worth stressing that the KAM theorem can also be stated as follows. Consider an unperturbed ($\varepsilon = 0$) surface with a fixed Diophantine frequency $\omega_0$ and look at its fate when the perturbation is switched on ($\varepsilon > 0$). Then, for $\varepsilon$ small enough, the torus persists, being deformed and displaced with respect to the integrable limit, until the perturbing parameter reaches a threshold $\varepsilon_c := \varepsilon_c(\omega_0)$ at which the surface is destroyed (i.e., it loses regularity). For low-dimensional systems, KAM theory allows one to prove a strong stability result concerning the confinement of the action variables in phase space.
Theorem 2. In a non-degenerate system with two degrees of freedom, for $\varepsilon \le \varepsilon_0$ (for a suitable sufficiently small $\varepsilon_0 > 0$) and for all initial conditions, the values of the actions remain forever near their initial values.
The above statement is based on the following remark. For a two degrees of freedom Hamiltonian system, the phase space is four-dimensional, the energy level sets (on which the motion takes place) are three-dimensional, while the KAM tori are two-dimensional, filling a large part of each energy level. Any orbit starting in the region between two invariant tori remains forever trapped in it.
As an immediate corollary of the previous theorem, the stability of the actions in the planar, circular, restricted three-body problem follows. Using the original versions of Arnold's theorem, M. Hénon [14] proved the existence of invariant tori provided that the perturbing parameter is less than $10^{-333}$; this value can be improved by implementing Moser's theorem, which yields an estimate of the order of $10^{-50}$. We recall that the perturbing parameter represents the ratio of the masses of Jupiter and the Sun; its astronomical value amounts to about $10^{-3}$.
Accurate analytical estimates based on a computer-assisted implementation allow one to drastically improve such results by showing the existence of invariant tori on a preassigned energy level for parameter values less than or equal to $10^{-3}$, in agreement with the physical measurements (see [4] for further details).

5 A discrete model: the standard map

In the previous sections we focused our attention on continuous systems; now we want to introduce discrete systems, which can be viewed as surfaces of section of Hamiltonian flows. Such models are definitely simpler than continuous systems, since their evolution can be known without introducing any numerical error due to integration algorithms, though roundoff errors cannot be avoided. Let us start by considering the familiar pendulum model described by equation (3). In order to compute numerically the trajectory associated to Hamilton's equations for (3), i.e.
$$\dot r = \varepsilon \sin\varphi, \qquad \dot\varphi = r, \tag{29}$$
we can use a leap-frog method, such that if $T$ is the time-step and $(r_n, \varphi_n)$ denotes the solution at time $nT$, then (29) can be integrated as
$$r_{n+1} = r_n + \varepsilon T \sin\varphi_n, \qquad \varphi_{n+1} = \varphi_n + T r_{n+1}.$$
Normalizing the time-step to one, we are reduced to the study of the so-called standard map introduced by Chirikov in [8]:
$$r_{n+1} = r_n + \varepsilon \sin\varphi_n, \qquad \varphi_{n+1} = \varphi_n + r_{n+1}, \tag{30}$$
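The map (30) is straightforward to iterate. The sketch below is an illustration rather than the authors' code: the function names are hypothetical, and the angle is reduced modulo $2\pi$ at every step. It also evaluates the Jacobian determinant of one step with respect to $(\varphi, r)$, which equals one for every $\varphi$ and $\varepsilon$.

```python
import math

def standard_map(r, phi, eps):
    """One step of map (30): r' = r + eps*sin(phi), phi' = phi + r' (mod 2*pi)."""
    r_new = r + eps * math.sin(phi)
    phi_new = (phi + r_new) % (2 * math.pi)
    return r_new, phi_new

def orbit(r0, phi0, eps, n_steps):
    """Return the points (r_n, phi_n) for n = 0..n_steps."""
    points = [(r0, phi0)]
    for _ in range(n_steps):
        points.append(standard_map(*points[-1], eps))
    return points

def jacobian_det(phi, eps):
    """Determinant of the Jacobian of (phi, r) -> (phi', r') at angle phi."""
    # d(phi')/d(phi) = 1 + eps*cos(phi), d(phi')/dr = 1,
    # d(r')/d(phi)   = eps*cos(phi),     d(r')/dr   = 1.
    c = eps * math.cos(phi)
    return (1 + c) * 1 - 1 * c

unperturbed = orbit(1.0, 0.0, 0.0, 5)
```

For $\varepsilon = 0$ the orbit keeps $r_n = r_0$ and advances the angle by $r_0$ at each step, i.e. it is a rotation.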

with $r_n \in \mathbb{R}$ and $\varphi_n \in \mathbb{T} \equiv \mathbb{R}/(2\pi\mathbb{Z})$. We will also consider the generalized standard map, which is obtained by replacing the sine term in (30) by any periodic, continuous function $f(\varphi)$:
$$r_{n+1} = r_n + \varepsilon f(\varphi_n), \qquad \varphi_{n+1} = \varphi_n + r_{n+1} = \varphi_n + r_n + \varepsilon f(\varphi_n), \tag{31}$$
where $r_n \in \mathbb{R}$ and $\varphi_n \in \mathbb{T} \equiv \mathbb{R}/(2\pi\mathbb{Z})$. We notice that the Jacobian associated to (31) is identically one, the map being area-preserving. We remark also that for $\varepsilon = 0$ the mapping reduces to a simple rotation, i.e.
$$r_{n+1} = r_n = r_0, \qquad \varphi_{n+1} = \varphi_n + r_0.$$
We refer to $r_0$ as the frequency or rotation number of the unperturbed mapping, which is generally defined as
$$\omega := \lim_{n \to \infty} \frac{\varphi_n}{n}.$$
In the unperturbed case the motion always takes place on an invariant circle. If $\omega/(2\pi) \in \mathbb{Q}$, the trajectory described by the iteration of the mapping with initial data $(r_0, \varphi_0)$ is a periodic orbit; if $\omega/(2\pi) \in \mathbb{R} \setminus \mathbb{Q}$, the motion densely fills an invariant curve with frequency $\omega$, say $C_{0,\omega}$. When $\varepsilon \ne 0$ the system becomes non-integrable and we proceed to describe the conclusions which can be drawn by applying perturbation theory. As we described in the previous sections, the KAM theorem [15], [1], [17] ensures that if $\varepsilon$ is sufficiently small, there still exists for the perturbed system an invariant curve $C_{\varepsilon,\omega}$ with frequency $\omega$.
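The rotation number of an orbit can be estimated by a finite-time average. This is a rough sketch under stated assumptions: the angle is lifted to the real line (no reduction modulo $2\pi$), the limit is replaced by a finite number of steps, and the function name and the sample initial data are hypothetical.

```python
import math

def rotation_number(r0, phi0, eps, n_steps=20_000):
    """Finite-time estimate of lim phi_n / n for the standard map (30)."""
    r, phi = r0, phi0  # phi is kept on the real line (lift), not reduced mod 2*pi
    for _ in range(n_steps):
        r = r + eps * math.sin(phi)
        phi = phi + r
    return (phi - phi0) / n_steps

omega_unperturbed = rotation_number(1.0, 0.0, 0.0)
golden = (math.sqrt(5) - 1) / 2
omega_perturbed = rotation_number(2 * math.pi * golden, 0.0, 0.1)
```

For $\varepsilon = 0$ the estimate returns $r_0$; for $\varepsilon \ne 0$ the value depends on the initial point and converges slowly, so it should be read only as an indication.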

The KAM theorem can be applied provided that the frequency $\omega$ satisfies a strong irrationality assumption, namely the Diophantine condition (28), which can now be rephrased as
$$\Bigl| \frac{\omega}{2\pi} - \frac{p}{q} \Bigr| \ge \frac{1}{C q^2}, \qquad \forall\, p, q \in \mathbb{Z} \setminus \{0\},$$
for some positive constant $C$. As the perturbing parameter is increased, the invariant curve becomes more and more distorted and displaced, until one reaches a critical value at which $C_{\varepsilon,\omega}$ breaks down. Let us define the critical breakdown threshold $\varepsilon_c(\omega)$ as the supremum of the positive values of $\varepsilon$ such that there still exists an invariant curve with rotation number $\omega$.
We provide a mathematical definition of $C_{\varepsilon,\omega}$ as the invariant curve described by the parametric equations
$$r_n = \omega + U(\theta_n) - U(\theta_n - \omega), \qquad \varphi_n = \theta_n + U(\theta_n), \tag{32}$$
where $\theta_n \in \mathbb{T}$ and the conjugating function $U(\theta_n)$ is $2\pi$-periodic and analytic in $\varepsilon$; moreover, the parameterization is defined so that the flow is linear, i.e. $\theta_{n+1} = \theta_n + \omega$.
The function $U$ can be expanded as a Taylor series in $\varepsilon$ and a Fourier series in $\theta$ (the so-called Poincaré-Lindstedt series) as
$$U(\theta_n) \equiv \sum_{j=1}^{\infty} U_j(\theta_n)\, \varepsilon^j = \sum_{j=1}^{\infty} \varepsilon^j \sum_{l \in \mathbb{Z}} \hat U_{lj}\, e^{il\theta_n}$$
for suitable Taylor coefficients $U_j$ and Fourier-Taylor coefficients $\hat U_{lj}$. We define the radius of convergence of the Poincaré-Lindstedt series as
$$\rho_c(\omega) = \inf_{\theta \in \mathbb{T}} \Bigl( \limsup_{j \to \infty} |U_j(\theta)|^{1/j} \Bigr)^{-1}.$$
We intend to study the relation between the two quantities $\varepsilon_c(\omega)$ and $\rho_c(\omega)$ in the complex parameter space (i.e., taking $\varepsilon \in \mathbb{C}$). In particular, we want to investigate the shape of the domain of convergence of the Poincaré-Lindstedt series and the location of the complex singularities. This task will be performed by looking at the behavior of the periodic orbits which approach the invariant curve with frequency $\omega$ more and more closely. We stress that if the domain of analyticity is not a circle, then the analyticity radius and the critical breakdown threshold might be very different.
The tools adopted to investigate the shape of the analyticity domains are based on different methods: Greene's technique [13], the computation of Padé approximants and a recent method developed in [5].


5.1 Link between periodic orbits and invariant curves

In order to become familiar with the concepts of periodic orbits and invariant curves, let us consider a specific example provided by the mapping (30) and by the invariant curve with frequency proportional to the golden ratio: $\omega_g = 2\pi \frac{\sqrt{5} - 1}{2}$. As is well known, the golden ratio is approximated by the sequence of ratios of Fibonacci numbers $\{F_k / F_{k+1}\} \subset \mathbb{Q}$, defined as
$$F_{k+1} = F_k + F_{k-1} \quad (k \ge 1) \qquad \text{with} \quad F_0 = 0, \quad F_1 = 1.$$
Figure 1 shows in the $(\varphi, r)$-plane (normalized by a factor $2\pi$) how the different periodic orbits with frequency $2\pi F_k / F_{k+1}$ approach the golden-mean invariant curve.

Fig. 1. The invariant curve with rotation number $\omega_g = 2\pi \frac{\sqrt{5} - 1}{2}$ and its approximating periodic orbits with frequencies (proportional by a factor $2\pi$) 1, 1/2, 2/3, 3/5 for the mapping (30) in the $(\varphi, r)$-plane.

In order to explore the link between periodic orbits and invariant curves, it is useful to recall some simple notions of number theory and in particular about rational approximants.
For any $\alpha \in \mathbb{R}$, let the continued fraction representation of $\alpha$ be defined as the sequence of integer numbers $a_j$ such that
$$\alpha = a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \dots}} \equiv [a_0; a_1, a_2, \dots].$$
If $\alpha \in \mathbb{R} \setminus \mathbb{Q}$, then the sequence of the $\{a_j\}$'s is infinite; if $\alpha \in \mathbb{Q}$, the sequence is finite: $\{a_j\} = \{a_1, \dots, a_N\}$. For example, in the case of the golden ratio, one has:
$$\frac{\sqrt{5} - 1}{2} = [0; 1, 1, 1, 1, \dots] \equiv [0; 1^\infty].$$
Those numbers whose continued fraction expansion is eventually 1, i.e.
$$\alpha = [a_0; a_1, \dots, a_N, 1, 1, 1, 1, \dots]$$
for some integer $N$, are called noble numbers. Let $\{p_j / q_j\}$ be the sequence of rational approximants to $\alpha$, whose terms are computed as the truncations of the continued fraction expansion, i.e.
$$\frac{p_0}{q_0} = a_0, \qquad \frac{p_1}{q_1} = a_0 + \frac{1}{a_1}, \qquad \frac{p_2}{q_2} = a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2}}, \qquad \dots$$
The rational numbers $\{p_j / q_j\}$ are the best rational approximants to the irrational number $\alpha$.
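The truncations above can be generated with the standard recurrence $p_j = a_j p_{j-1} + p_{j-2}$, $q_j = a_j q_{j-1} + q_{j-2}$. The following sketch (the function name is a hypothetical choice; exact arithmetic is done with Python's `fractions` module) reproduces the Fibonacci ratios for the golden ratio and the first approximants of $[0; 3, 12, 1, 1, \dots]$.

```python
from fractions import Fraction

def convergents(partial_quotients):
    """Convergents p_j/q_j of [a0; a1, a2, ...] via p_j = a_j p_{j-1} + p_{j-2}."""
    p_prev, q_prev = 1, 0    # p_{-1}, q_{-1}
    p_prev2, q_prev2 = 0, 1  # p_{-2}, q_{-2}
    result = []
    for a in partial_quotients:
        p = a * p_prev + p_prev2
        q = a * q_prev + q_prev2
        result.append(Fraction(p, q))
        p_prev2, q_prev2, p_prev, q_prev = p_prev, q_prev, p, q
    return result

golden_conv = convergents([0] + [1] * 10)         # [0; 1, 1, 1, ...]
omega1_conv = convergents([0, 3, 12, 1, 1, 1])    # [0; 3, 12, 1, 1, ...]
```

The list `golden_conv` starts with 0, 1, 1/2, 2/3, 3/5, 5/8, and `omega1_conv` with 0, 1/3, 12/37, 13/40, 25/77, 38/117.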
5.2 Perturbative series expansions
Let us consider the standard map defined in (30); by the relations $\varphi_{n+1} - \varphi_n = r_{n+1}$ and $\varphi_n - \varphi_{n-1} = r_n$, one obtains that $\varphi_n$ must satisfy the equation
$$\varphi_{n+1} - 2\varphi_n + \varphi_{n-1} = \varepsilon \sin\varphi_n. \tag{33}$$
Let us look for a periodic solution with frequency $\omega = 2\pi p/q$, satisfying the periodicity conditions
$$\varphi_{n+q} = \varphi_n + 2\pi p, \qquad r_{n+q} = r_n. \tag{34}$$
In analogy to (32), we parameterize the solution as
$$\varphi_n = \theta_n + u(\theta_n),$$
where $\theta_n \in \mathbb{T}$ and $u(\theta_n)$ is $2\pi$-periodic; moreover, the flow is linear with frequency $2\pi p/q$: $\theta_{n+1} = \theta_n + 2\pi p/q$ (notice that the conditions (34) are trivially satisfied). Next, we expand $u$ in Fourier-Taylor series as
$$u(\theta_n) \equiv \sum_{j=1}^{\infty} u_j(\theta_n)\, \varepsilon^j = \sum_{j=1}^{\infty} \varepsilon^j \sum_{l=1}^{\min(j,q)} a_{lj} \sin(l\theta_n), \tag{35}$$
where the real coefficients $a_{lj}$ will be recursively determined by means of (33). Notice that the summation ends at $\min(j, q)$ because one needs to avoid zero divisors, as an explicit computation shows.
From (33), one finds that $u$ must satisfy the relation
$$u\Bigl(\theta_n + 2\pi\frac{p}{q}\Bigr) - 2u(\theta_n) + u\Bigl(\theta_n - 2\pi\frac{p}{q}\Bigr) = \varepsilon \sin(\theta_n + u(\theta_n)). \tag{36}$$
The coefficients $a_{lj}$ can be obtained by inserting the series expansion (35) in (36) and equating same orders of $\varepsilon$. More precisely, defining $\alpha_{l,j}$ such that
$$\sin(\theta_n + u(\theta_n)) \equiv \sum_{j=0}^{\infty} \varepsilon^j \sum_{l=1}^{j+1} \alpha_{l,j+1} \sin(l\theta_n),$$
one obtains (see [5]) that the coefficients $a_{lj}$ in (35) are given by
$$a_{lj} = \frac{\alpha_{lj}}{2[\cos(2\pi l p/q) - 1]}.$$
At the order $\varepsilon^q$, one meets a singularity whenever $l = q$. In order to bypass this problem, one is forced to select $\theta_n$ so that the sine term in (35) is zero, i.e.
$$\sin(q\theta_n) = 0; \tag{37}$$
such a choice compensates the zero term (for $l = q$) occurring in
$$\cos\Bigl(2\pi q \frac{p}{q}\Bigr) - 1.$$
Equation (37) provides a value for $\theta_n$ as well as for $\theta_0$, since $\theta_n = \theta_0 + 2\pi n p/q$. Correspondingly, one has two solutions (modulo $2\pi$), given by
$$\theta_0 = \frac{\pi}{q} \qquad \text{and} \qquad \theta_0 = \frac{2\pi}{q}.$$
The two solutions are stable or unstable according to the parity of $q$. In particular $\pi/q$ is stable for $q$ odd, unstable for $q$ even, while the opposite situation occurs for $2\pi/q$. Due to (37) the coefficient $a_{qq}$ is not determined by the recursive formulae; however, this term does not contribute to the general solution since $\sin(q\theta_0) = 0$.
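As a consistency check on the formula for $a_{lj}$, the first-order coefficient can be worked out by hand. This is only a sketch based on the reading of (35) and (36) given above, with the shorthand $\omega := 2\pi p/q$ and assuming $q \ge 2$, so that $\cos\omega \ne 1$. At order $\varepsilon$ the right-hand side of (36) reduces to $\sin\theta$, hence $\alpha_{11} = 1$, and inserting $u_1(\theta) = a_{11}\sin\theta$ gives:

```latex
\begin{align*}
u_1(\theta + \omega) - 2u_1(\theta) + u_1(\theta - \omega)
  &= a_{11}\bigl[\sin(\theta+\omega) + \sin(\theta-\omega) - 2\sin\theta\bigr] \\
  &= 2a_{11}(\cos\omega - 1)\sin\theta = \sin\theta , \\
a_{11} &= \frac{1}{2[\cos(2\pi p/q) - 1]} .
\end{align*}
```

The same identity, $\sin(l(\theta+\omega)) + \sin(l(\theta-\omega)) - 2\sin(l\theta) = 2(\cos(l\omega) - 1)\sin(l\theta)$, produces the denominator of $a_{lj}$ at every order and shows why $l = q$ is singular.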
Remark 2. For an invariant curve with rotation number $\omega$, the conjugating function must satisfy the equation
$$U(\theta_n + \omega) - 2U(\theta_n) + U(\theta_n - \omega) = \varepsilon \sin(\theta_n + U(\theta_n)).$$
Notice that in this case the function $U = U(\theta_n; \omega)$ depends explicitly on the parametric coordinate $\theta_n$.


6 Numerical investigation of the breakdown threshold

6.1 Padé approximants
A numerical investigation of the analyticity domains of periodic orbits is performed using the Padé approximants of order $[N/N]$, i.e. polynomials $P_N(\varepsilon)$, $Q_N(\varepsilon)$ such that
$$\sum_{j=1}^{\infty} u_j \varepsilon^j \equiv \frac{P_N(\varepsilon)}{Q_N(\varepsilon)} + O(\varepsilon^{2N+1}).$$
The shape of the analyticity domain will be provided by the zeros of the denominators. In particular, having fixed a value for $\theta_0$, we consider the series
$$u(\theta_0) = \sum_{j=1}^{\infty} u_j(\theta_0)\, \varepsilon^j. \tag{38}$$
We compute the Padé approximants of order [200/200], where for consistency the coefficients $u_j(\theta_0)$ must be calculated with a precision of 400 decimal digits. False poles have been discarded by comparison with the zeros of the numerator; indeed, we recognize a pole as spurious whenever its coordinates are close to those of a zero within a suitable tolerance parameter.
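The construction of an $[N/N]$ Padé approximant from Taylor coefficients amounts to solving a linear system for the denominator. The sketch below is a generic textbook construction in exact rational arithmetic, not the high-precision implementation used for the figures; the function names and the test series (a sum of two simple poles at $x = 2$ and $x = -3$) are assumptions of this example, and the solver raises an error if the linear system is singular.

```python
from fractions import Fraction

def solve_linear(a, b):
    """Solve a x = b exactly by Gauss-Jordan elimination (a is a square list of lists)."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [x / m[col][col] for x in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [row[n] for row in m]

def pade(c, n):
    """[n/n] Pade approximant of sum_j c[j] x^j (needs len(c) >= 2n + 1).

    Returns (P, Q) as coefficient lists in increasing powers, with Q[0] = 1.
    """
    # Denominator: sum_{i=0..n} Q[i] c[k-i] = 0 for k = n+1, ..., 2n.
    a = [[c[n + 1 + r - i] for i in range(1, n + 1)] for r in range(n)]
    b = [-c[n + 1 + r] for r in range(n)]
    q = [Fraction(1)] + solve_linear(a, b)
    # Numerator: P[k] = sum_{i=0..k} Q[i] c[k-i] for k = 0, ..., n.
    p = [sum(q[i] * c[k - i] for i in range(k + 1)) for k in range(n + 1)]
    return p, q

def evaluate(poly, x):
    """Evaluate a polynomial given by increasing-power coefficients."""
    return sum(coef * x ** k for k, coef in enumerate(poly))

# Test series: 1/(1 - x/2) + 1/(1 + x/3), with poles at x = 2 and x = -3.
series = [Fraction(1, 2) ** j + Fraction(-1, 3) ** j for j in range(5)]
P, Q = pade(series, 2)
```

The zeros of `Q` are the poles of the approximant; here `Q` vanishes exactly at $x = 2$ and $x = -3$. For large orders the roots would have to be computed numerically in multiple precision, and poles nearly cancelled by zeros of `P` discarded as described above.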
We study the periodic orbits generated by the best approximants to the rotation numbers defined as $\omega_g = 2\pi[0; 1^\infty]$, $\omega_1 = 2\pi[0; 3, 12, 1^\infty]$, $\omega_2 = 2\pi[0; 2, 10, 1^\infty]$; the corresponding sequences of rational approximants are
$$\Bigl\{ \frac{1}{2}, \frac{2}{3}, \frac{3}{5}, \frac{5}{8}, \frac{8}{13}, \frac{13}{21}, \frac{21}{34}, \frac{34}{55}, \frac{55}{89}, \frac{89}{144}, \dots \Bigr\} \to \frac{\omega_g}{2\pi},$$
$$\Bigl\{ \frac{1}{3}, \frac{12}{37}, \frac{13}{40}, \frac{25}{77}, \frac{38}{117}, \dots \Bigr\} \to \frac{\omega_1}{2\pi},$$
$$\Bigl\{ \frac{10}{21}, \frac{11}{23}, \frac{21}{44}, \frac{32}{67}, \frac{53}{111}, \frac{85}{178}, \dots \Bigr\} \to \frac{\omega_2}{2\pi}.$$
Figure 2 (see [5]) shows the Padé approximants of the periodic orbits 3/5, 13/21, 34/55, 89/144 (times $2\pi$), associated to the mapping (30); the inner black region denotes the analyticity domain of the invariant curve with frequency $\omega_g$. We remark that, as the order $q$ of the periodic orbit grows, the singularities associated to the periodic orbits approach more and more closely the analyticity domain of the golden-mean invariant curve.
Similarly, the Padé approximants corresponding to $\omega_1$ and $\omega_2$ and to some of their rational approximants are shown, respectively, in Figures 3a and 3b (see [5]).
Let $u$ satisfy (36) and let the radius of convergence of the series $u(\theta_0) = \sum_{j=1}^{\infty} u_j(\theta_0)\, \varepsilon^j$ be defined as
$$\rho_c\Bigl(\frac{p}{q}\Bigr) = \Bigl( \limsup_{j \to \infty} |u_j(\theta_0)|^{1/j} \Bigr)^{-1},$$
while, for $\omega$ irrational, we have already defined
$$\rho_c(\omega) = \inf_{\theta \in \mathbb{T}} \Bigl( \limsup_{j \to \infty} |U_j(\theta)|^{1/j} \Bigr)^{-1}.$$
As a byproduct of our numerical analysis, we conjecture that
$$\lim_{k \to \infty} \rho_c\Bigl(\frac{p_k}{q_k}\Bigr) = \rho_c(\omega),$$
where $\{p_k / q_k\}$ are the rational approximants to $\omega$.

Fig. 2. Padé approximants of order [200/200] of the golden mean curve and of the periodic orbits with frequencies 3/5, 13/21, 34/55, 89/144 (times $2\pi$) for the map (30) (after [5]).

Fig. 3. Padé approximants of order [200/200] for the invariant curve with rotation number $\omega_1$ (a) and $\omega_2$ (b) and for some of their rational approximants in the framework of the mapping (30) (after [5]).

Remark 3. The evidence that the domains of the approximating periodic orbits tend to a limiting domain seems to suggest that the position of the poles of the invariant curve does not depend on the specific value of the coordinate $\theta$, which must be fixed while computing the Padé approximants.
In order to make our analysis more exhaustive, we also investigate different standard-map-like systems; in particular, we consider the following examples:
$$f_{12}: \qquad r_{n+1} = r_n + \varepsilon \Bigl( \sin\varphi_n + \frac{1}{20} \sin 2\varphi_n \Bigr), \tag{39}$$
$$\phantom{f_{12}:} \qquad \varphi_{n+1} = \varphi_n + r_{n+1}, \tag{40}$$
$$f_{13}: \qquad r_{n+1} = r_n + \varepsilon \Bigl( \sin\varphi_n + \frac{1}{30} \sin 3\varphi_n \Bigr), \tag{41}$$
$$\phantom{f_{13}:} \qquad \varphi_{n+1} = \varphi_n + r_{n+1}. \tag{42}$$
Figure 4 (see [5]) provides the singularities of $f_{12}$ (Figure 4a) and $f_{13}$ (Figure 4b) for $\omega_g = 2\pi \frac{\sqrt{5} - 1}{2}$.
Fig. 4. Padé approximants of order [200/200] for the golden mean curve and some rational approximants associated to the mapping $f_{12}$ (a) and $f_{13}$ (b), respectively (after [5]).

6.2 Lyapunov's method
In order to estimate the radius of convergence of the Poincaré-Lindstedt series (38), we review the algorithm proposed in [5], to which we refer as Lyapunov's method. This technique consists in applying the following procedure:
1) consider discrete values of the small parameter $\varepsilon$ from an initial $\varepsilon_{in}$ to a final $\varepsilon_{fin}$ with a relative increment $(1 + h)$;
2) for any of these values, compute the distance $d_k$ between the truncated series at order $k$ calculated at $\varepsilon$ and that at $\varepsilon(1 + h)$; more precisely, denoting by
$$u^{(k)}(\theta_0; \varepsilon) \equiv \sum_{j=1}^{k} u_j(\theta_0)\, \varepsilon^j,$$
we define the quantity $d_k(\varepsilon)$ as
$$d_k = d_k(\varepsilon) \equiv |u^{(k)}(\theta_0; \varepsilon(1 + h)) - u^{(k)}(\theta_0; \varepsilon)|;$$
3) for $N \in \mathbb{Z}_+$ large enough, compute the sum
$$s_1 = s_1(\varepsilon) \equiv \frac{1}{N - 1} \sum_{k=2}^{N} \log \frac{d_k}{d_1};$$
4) plot $s_1$ versus $\varepsilon$ (see Figure 5a). Experimentally one notices that all graphs show an initial almost constant value of $s_1$ as $\varepsilon$ is increased, followed by a small well, and then by a sharp increase with almost linear behavior;
5) estimate the analyticity radius as follows (compare with Figure 5b): having fixed the order $N$ (see step 3), at which the series is explicitly computed, and the increment $h$ (see step 1), we interpolate the points before the well with a straight line. The critical value, say $\varepsilon_L$, is determined as the intersection of such a line with the portion of the curve after the minimum is reached.
Figure 5 (see [5]) shows an implementation of such an algorithm for the standard mapping (30) and the frequency $\omega = 2\pi \cdot 34/55$; the parameters have been set as $N = 800$ and $h = 0.001$.
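Steps 2) and 3) can be sketched in a few lines. This is an illustration under stated assumptions rather than the implementation of [5]: the upper summation limit is taken to be $N$, the coefficients $u_j(\theta_0)$ are passed as a plain list, and the test series (all coefficients equal to one, so that the radius of convergence is 1) is a hypothetical stand-in for the Poincaré-Lindstedt coefficients.

```python
import math

def lyapunov_indicator(u, eps, h=1e-3):
    """s_1(eps) = 1/(N-1) * sum_{k=2..N} log(d_k / d_1), where
    d_k = |u^(k)(eps*(1+h)) - u^(k)(eps)| and u = [u_1, ..., u_N]."""
    n = len(u)
    partial_a = partial_b = 0.0
    distances = []
    for j, coeff in enumerate(u, start=1):
        partial_a += coeff * eps ** j               # truncated series at eps
        partial_b += coeff * (eps * (1 + h)) ** j   # truncated series at eps*(1+h)
        distances.append(abs(partial_b - partial_a))
    d1 = distances[0]
    return sum(math.log(dk / d1) for dk in distances[1:]) / (n - 1)

geometric = [1.0] * 200                       # sum_j eps^j, radius of convergence 1
inside = lyapunov_indicator(geometric, 0.5)   # below the radius
outside = lyapunov_indicator(geometric, 1.5)  # above the radius
```

Below the radius of convergence the distances $d_k$ saturate and $s_1$ stays small; above it they grow geometrically with $k$ and $s_1$ increases sharply, which is the behavior exploited in steps 4) and 5).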
Fig. 5. (a) Plot of $s_1$ versus $\varepsilon$ for $\omega = 2\pi \cdot 34/55$, $N = 800$, $h = 0.001$; (b) zoom of (a) for $1.01 \le \varepsilon \le 1.016$ and computation of $\varepsilon_L$ (after [5]).

6.3 Greene's method
Let $C_{\varepsilon,\omega}$ denote the invariant curve with irrational frequency $\omega$. We define the critical threshold at which $C_{\varepsilon,\omega}$ breaks down as
$$\varepsilon_c(\omega) = \sup\{ \varepsilon \ge 0 : \text{for any } \varepsilon' < \varepsilon, \text{ there exists an analytic invariant curve } C_{\varepsilon',\omega} \}.$$
The most widely accepted numerical technique to compute $\varepsilon_c(\omega)$ is Greene's method [13], which is based on the analysis of the stability of the periodic orbits approaching $C_{\varepsilon,\omega}$. In order to investigate the stability character of a periodic orbit with frequency $2\pi p/q$ (for some $p, q \in \mathbb{Z}$), we look at the eigenvalues of the monodromy matrix
$$M = \prod_{i=1}^{q} \begin{pmatrix} 1 + \varepsilon \cos(\varphi_i) & 1 \\ \varepsilon \cos(\varphi_i) & 1 \end{pmatrix},$$
where $(\varphi_1, \dots, \varphi_q)$ are successive points on the periodic orbit associated to the mapping (30). From the area-preserving property, one has that $\det M = 1$. Let $T$ be the trace of $M$, whose eigenvalues are solutions of the equation $\lambda^2 - T\lambda + 1 = 0$. Then, if $|T| < 2$, the eigenvalues of $M$ are complex conjugates on the unit circle and the periodic orbit is stable. On the contrary, if $|T| > 2$ the eigenvalues are real and the periodic orbit is unstable.
To be concrete, let us fix a periodic orbit with frequency $2\pi p/q$, such that it is elliptic for small values of $\varepsilon$ (as well as for $\varepsilon = 0$). As $\varepsilon$ increases, the trace of the matrix $M$ eventually exceeds 2 in modulus and the periodic orbit becomes unstable.
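The stability test can be tried on an orbit that is known in closed form. The sketch below is an illustration, not the procedure of [13]: the function names are hypothetical, and the example uses the period-two orbit $\varphi = 0 \to \pi \to 0$ (with $r = \pi$) of the map (30), which can be checked by direct substitution to be periodic for every $\varepsilon$, since $\sin 0 = \sin\pi = 0$.

```python
import math

def monodromy_trace(angles, eps):
    """Trace of M = prod_i [[1 + eps*cos(phi_i), 1], [eps*cos(phi_i), 1]]."""
    m = [[1.0, 0.0], [0.0, 1.0]]
    for phi in angles:
        c = eps * math.cos(phi)
        step = [[1.0 + c, 1.0], [c, 1.0]]
        # Left-multiply: the Jacobian of the latest step acts last.
        m = [[sum(step[i][k] * m[k][j] for k in range(2)) for j in range(2)]
             for i in range(2)]
    return m[0][0] + m[1][1]

def is_stable(angles, eps):
    """Stability criterion |T| < 2 for the periodic orbit through the given angles."""
    return abs(monodromy_trace(angles, eps)) < 2.0

period_two = [0.0, math.pi]  # orbit phi = 0 -> pi -> 0 (mod 2*pi), with r = pi
```

For this orbit the trace is $2 - \varepsilon^2$, so the transition from stability to instability occurs at $\varepsilon = 2$; for longer orbits the points $\varphi_i$ must first be located numerically.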
Figure 6a (see [5]) shows the quantity $\varepsilon_{Gr}(p_k/q_k)$, which corresponds to the value of $\varepsilon$ marking the transition from stability to instability of the periodic orbit with frequency $2\pi p_k/q_k$. We selected the sequence of rational approximants to the golden ratio and we represented with a dotted line the estimated breakdown value of $C_{\varepsilon,\omega}$. Figure 6b provides a comparison between Greene's and Lyapunov's methods, by showing the plot of the relative error of the quantities $\rho_c(p_k/q_k)$ and $\varepsilon_{Gr}(p_k/q_k)$.

Fig. 6. (a) $\varepsilon_{Gr}(p_k/q_k)$ versus the order $k$ pertaining to the rational approximants to the golden ratio; the dotted line represents Greene's threshold, about equal to 0.971635. (b) The relative error associated to $\rho_c(p_k/q_k)$ and $\varepsilon_{Gr}(p_k/q_k)$ for some rational approximants to the golden ratio (after [5]).

6.4 Results
In the framework of the standard map (30), we show some results on the behavior of the invariant curves with frequencies , 1 , 2 , where = 2[0; 1 ],
1 = 2[0; 3, 12, 1 ], 2 = 2[0; 2, 10, 1 ]. We have implemented the three
techniques presented in the previous sections, i.e. Pade, Greene and Lyapunov. Such methods depend on the choice of some parameters and of tolerance
errors. In particular, Lyapunovs method depends on the order N of the truncation and on the increment h of the perturbing parameter . For a xed N
the agreement of the three methods is almost optimal taking a suitable value
of the increment, typically h = 0.001.
Table 1 shows a comparison of the three methods for the golden ratio
approximants of the standard map (30). The series have been computed up to
N = 800 and the increment of Lyapunovs method has been set to h = 103
(compare with [5]).
Table 2 (see [5]) provides the results for the invariant curve associated to
the standard map (30) with frequency equal to 1 ; we underline the good
agreement between Pades and Lyapunovs methods, providing estimates of
the analyticity radius. Due to the length of the calculations, it was possible to
compute only the rst few approximants while applying Pades method (up
to 38/117 in Table 2 and up to 85/178 in Table 3).
Similarly, we report in Table 3 the results for the invariant curve associated
to the standard map (30) with frequency equal to 2 (compare with [5]).
In the cases of the mappings $f_{12}$ and $f_{13}$, we found the results reported in
Figure 7 (see [5]), where the abscissa k refers to the order of the approximant
$p_k/q_k$ of the golden ratio: for example, k = 1 corresponds to 2/3, while k = 9
corresponds to 89/144. With reference to Figure 7, the breakdown threshold,
as computed by Greene's method (squares), amounts to 1.2166 for $f_{12}$
(see Figure 7a) and to 0.7545 for $f_{13}$ (see Figure 7b). Lyapunov's
method has been applied with a series truncated at N = 800 and for h = 0.01
(crosses) and h = 0.001 (triangles). Padé approximants of order [200/200]
have also been computed (diamonds). Due to the length of the calculations,
it was possible to compute only the first few approximants (up to 21/34) by
applying Padé's method.

p/q        Greene    Padé      Lyapunov ($h = 10^{-3}$)
2/3        1.5176    2.0501    2.0584
3/5        1.2856    1.4873    1.4913
5/8        1.1485    1.2495    1.2561
8/13       1.0790    1.1440    1.1492
13/21      1.0353    1.0753    1.0766
21/34      1.0106    1.0366    1.0397
34/55      0.9953    1.0115    1.0142
55/89      0.9862    0.9974    0.9960
89/144     0.9832    0.9909    0.9897

Table 1. Comparison of Greene's, Padé's and Lyapunov's methods for the rational
approximants to the golden ratio in the framework of the mapping (30) (after [5]).

p/q        Greene    Padé      Lyapunov ($h = 10^{-3}$)
12/37      0.7486    0.5730    0.57513
13/40      0.7322    0.5447    0.54658
25/77      0.7232    0.5579    0.56063
38/117     0.7145    0.5539    0.55580
63/194     0.7134    -         0.55755
101/311    0.7085    -         0.55688
164/505    0.7073    -         0.55715
265/816    0.7066    -         0.55706
429/1321   0.7056    -         0.55705

Table 2. Same as Table 1 for the invariant curve with frequency $\omega_1$ (after [5]).

p/q        Greene    Padé      Lyapunov ($h = 10^{-3}$)
10/21      0.7487    0.5395    0.54165
11/23      0.7198    0.5008    0.49557
21/44      0.7060    0.5158    0.51795
32/67      0.6919    0.5084    0.51058
53/111     0.6869    0.5116    0.51354
85/178     0.6842    0.5105    0.51239
138/289    0.6801    -         0.51281
223/467    0.6785    -         0.51265
361/756    0.6781    -         0.51271
584/1223   0.6771    -         0.51268

Table 3. Same as Table 1 for the invariant curve with frequency $\omega_2$ (after [5]).
[Figure 7, panels (a) and (b)]

Fig. 7. Comparison of the results for the mappings $f_{12}$ (a) and $f_{13}$ (b) and for some
rational approximants labeled by the index k; square: Greene's value, diamond:
Padé's results, cross: Lyapunov's indicator s2 (after [5]).

The results indicate that there is a good agreement between all methods as far
as the approximants to the golden mean curve associated to (30) are analyzed.
In such a case the analyticity domain is close to a circle, so that the intersection
with the positive real axis (providing an estimate of Greene's threshold) almost
coincides with the analyticity radius, which is obtained by implementing Padé's
and Lyapunov's methods.
This situation does not hold whenever the analyticity domain is not close
to a circular shape. This happens, for example, if the invariant curves with
rotation numbers $\omega_1 = 2\pi\,[0; 3, 12, 1^\infty]$ and $\omega_2 = 2\pi\,[0; 2, 10, 1^\infty]$ are considered; in these cases the two thresholds (breakdown value and analyticity
radius) are markedly different.
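
The rational approximants p/q listed in Tables 1, 2 and 3 can be recovered as convergents of continued fractions. The following is a minimal sketch, assuming that the continued fractions are [0; 1, 1, 1, ...] for the golden ratio, [0; 3, 12, 1, 1, ...] for the first frequency and [0; 2, 10, 1, 1, ...] for the second one (the infinite tail of 1s is truncated here); the helper name `convergents` is illustrative and not taken from [5].

```python
from fractions import Fraction

def convergents(partial_quotients):
    """Convergents p_k/q_k of the continued fraction [a0; a1, a2, ...]."""
    p_prev, q_prev = 1, 0                  # p_{-1}, q_{-1}
    p, q = partial_quotients[0], 1         # p_0, q_0
    result = [Fraction(p, q)]
    for a in partial_quotients[1:]:
        # standard recurrence: p_k = a_k p_{k-1} + p_{k-2}, same for q_k
        p, p_prev = a * p + p_prev, p
        q, q_prev = a * q + q_prev, q
        result.append(Fraction(p, q))
    return result

golden = convergents([0] + [1] * 11)        # [0; 1, 1, 1, ...]
omega1 = convergents([0, 3, 12] + [1] * 8)  # [0; 3, 12, 1, 1, ...]
omega2 = convergents([0, 2, 10] + [1] * 9)  # [0; 2, 10, 1, 1, ...]

print([str(c) for c in golden[3:]])   # p/q column of Table 1
print([str(c) for c in omega1[2:]])   # p/q column of Table 2
print([str(c) for c in omega2[2:]])   # p/q column of Table 3
```

The first convergents (0/1, 1/1, 1/2 and so on) are skipped so that the printed lists start at the first row of each table.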

References
1. V.I. Arnold. Proof of a Theorem by A.N. Kolmogorov on the invariance of
quasiperiodic motions under small perturbations of the Hamiltonian, Russ. Math.
Surveys 18(9), 1963.
2. V.I. Arnold (editor). Encyclopedia of Mathematical Sciences, Dynamical Systems III, Springer-Verlag 3, 1988.
3. G.A. Baker Jr., P. Graves-Morris. Padé Approximants, Cambridge University Press, New York, 1996.
4. A. Celletti, L. Chierchia. KAM Stability and Celestial Mechanics, Preprint
(2003), http://www.mat.uniroma3.it/users/chierchia/PREPRINTS/SJV 03.pdf
5. A. Celletti, C. Falcolini. Singularities of periodic orbits near invariant curves, Physica D 170(2):87, 2002.
6. L. Chierchia, C. Falcolini. A direct proof of a theorem by Kolmogorov in
Hamiltonian systems, Ann. Scuola Norm. Sup. Pisa Cl. Sci. 4(21):541–593, 1994.
7. L. Chierchia, G. Gallavotti. Smooth prime integrals for quasi-integrable Hamiltonian systems, Nuovo Cimento B (11) 67(2):277–295, 1982.
8. B.V. Chirikov. A universal instability of many dimensional oscillator systems,
Physics Reports 52:264–379, 1979.
9. L.H. Eliasson. Absolutely convergent series expansions for quasi periodic motions, Math. Phys. Electron. J. 2:33+, 1996.
10. C. Falcolini, R. de la Llave. A rigorous partial justification of Greene's
criterion, J. Stat. Phys. 67:609, 1992.
11. G. Gallavotti. Twistless KAM tori, Comm. Math. Phys. 164(1):145–156,
1994.
12. A. Giorgilli, U. Locatelli. Kolmogorov theorem and classical perturbation
theory, Z. Angew. Math. Phys. 48(2):220–261, 1997.
13. J.M. Greene. A method for determining a stochastic transition, J. Math. Phys.
20, 1979.
14. M. Hénon. Exploration numérique du problème restreint IV: Masses égales,
orbites non périodiques, Bulletin Astronomique 3(1)(fasc. 2):49–66, 1966.
15. A.N. Kolmogorov. On the conservation of conditionally periodic motions under small perturbation of the Hamiltonian, Dokl. Akad. Nauk. SSR 98:469, 1954.
16. R.S. MacKay. Greene's residue criterion, Nonlinearity 5:161–187, 1992.
17. J. Moser. On invariant curves of area-preserving mappings of an annulus,
Nach. Akad. Wiss. Göttingen, Math. Phys. Kl. II 1(1), 1962.
18. H. Poincaré. Les Méthodes Nouvelles de la Mécanique Céleste, Gauthier-Villars, Paris, 1892.
19. J. Pöschel. Integrability of Hamiltonian systems on Cantor sets, Comm. Pure
Appl. Math. 35(5):653–696, 1982.
20. V. Szebehely. Theory of orbits, Academic Press, New York and London, 1967.

Resonances in Hyperbolic and Hamiltonian Systems

Viviane Baladi
CNRS UMR 7586,
Institut Mathématique de Jussieu,
75251 Paris (France), baladi@math.jussieu.fr

Abstract
This text is a brief introduction to Ruelle resonances, i.e. the spectra of
transfer operators and their relation with poles and zeroes of dynamical zeta
functions, and with poles of the Fourier transform of correlation functions.

1 Two elementary key examples - Basic concepts

1.1 Finite transition matrix and dynamical zeta function
Let $A$ be a finite, say $N \times N$, complex matrix, with $N \ge 2$. Then, denoting by
$I$ the $N \times N$ identity matrix, we have (recall the Taylor series for $\log(1 - t)$
and check, first in the diagonal case, that $\log\det B = \operatorname{Tr}\log B$ for any finite
matrix $B$):

\[ \det(I - zA) = \exp\Bigl( -\sum_{m=1}^{\infty} \frac{z^m}{m} \operatorname{Tr} A^m \Bigr) . \]

The left hand side of the above expression is a polynomial in $z$ of degree at
most $N$. Its zeroes are the inverses of the nonzero eigenvalues of $A$ (the order
of the zero coincides with the algebraic multiplicity of the eigenvalue). Let us
show that the right hand side can be viewed as the inverse of a dynamical
zeta function

\[ \zeta_f(z) = \exp\Bigl( \sum_{m=1}^{\infty} \frac{z^m}{m} \,\#\operatorname{Fix} f^m \Bigr) , \]

for a discrete-time dynamical system, i.e. the iterates $f^m = f \circ \cdots \circ f$ ($m$ times) of a
transformation $f$, and their fixed points $\operatorname{Fix} f^m = \{x \mid f^m(x) = x\}$ (note that
$\operatorname{Fix} f^m$ contains $\operatorname{Fix} f^k$ for each $k$ which is a divisor of $m$).
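
The determinant identity above can be checked numerically. The following is a minimal sketch in plain Python, assuming an illustrative 2 x 2 matrix and a value of z small enough for the series to converge (both choices are ours, not the text's); the series is truncated at 200 terms.

```python
import cmath

def mat_mul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def trace(X):
    return sum(X[i][i] for i in range(len(X)))

A = [[1, 1], [1, 0]]      # illustrative 2 x 2 matrix
z = 0.2 + 0.1j            # any z with |z| * (spectral radius of A) < 1

# Left-hand side: det(I - zA), written out for a 2 x 2 matrix.
lhs = (1 - z * A[0][0]) * (1 - z * A[1][1]) - (z * A[0][1]) * (z * A[1][0])

# Right-hand side: exp(-sum_{m>=1} z^m/m Tr A^m), truncated at 200 terms.
series, power, z_pow = 0, [[1, 0], [0, 1]], 1
for m in range(1, 201):
    power = mat_mul(power, A)   # power == A^m
    z_pow *= z                  # z_pow == z^m
    series += z_pow / m * trace(power)
rhs = cmath.exp(-series)

print(abs(lhs - rhs))           # difference at the level of rounding errors
```

For this matrix the left hand side is the polynomial 1 - z - z^2, whose zeroes are the inverses of the two eigenvalues of A.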
J.-D. Fournier et al. (Eds.): Harm. Analysis and Ratio. Approx., LNCIS 327, pp. 263–274, 2006.
© Springer-Verlag Berlin Heidelberg 2006


Indeed, if $A$ is an $N \times N$ matrix with coefficients 0 and 1, it can be seen as
a transition matrix, and one can associate to it a subshift of finite type on the
alphabet $S = \{1, \ldots, N\}$. This subshift is the shift to the left $(\sigma_A(x))_i = x_{i+1}$
on the space of unilateral admissible sequences

\[ \Sigma_A^+ = \{ (x_i) \in S^{\mathbb{N}} \mid A_{x_i x_{i+1}} = 1 , \ \forall i \in \mathbb{N} \} . \]

It is then easy to see that $\operatorname{Tr} A^m = \#\operatorname{Fix} \sigma_A^m$ (consider first $m = 1$ and note
that $x$ is fixed by $\sigma_A$ if and only if $x = aaaa\cdots$ with $A_{aa} = 1$).
We thus have $\det(I - zA) = 1/\zeta_{\sigma_A}(z)$ (cf [32]).
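
The equality between the trace of a power of the transition matrix and the number of fixed points of the corresponding power of the shift can be verified by brute force on a small example. This is a sketch under our own assumptions: the alphabet is {0, 1} rather than {1, ..., N}, the matrix forbids the word "11", and the function names are illustrative.

```python
from itertools import product

def count_fixed_points(A, m):
    """Number of admissible sequences fixed by the m-th power of the shift.

    Such a sequence is periodic of period m, so it is determined by a word
    x_0 ... x_{m-1} with A[x_i][x_{i+1}] = 1 cyclically (x_m = x_0).
    """
    n = len(A)
    return sum(
        all(A[w[i]][w[(i + 1) % m]] == 1 for i in range(m))
        for w in product(range(n), repeat=m)
    )

def trace_of_power(A, m):
    n = len(A)
    P = [[int(i == j) for j in range(n)] for i in range(n)]
    for _ in range(m):
        P = [[sum(P[i][k] * A[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
    return sum(P[i][i] for i in range(n))

A = [[1, 1], [1, 0]]   # symbols 0, 1; the word "11" is forbidden
for m in range(1, 9):
    print(m, count_fixed_points(A, m), trace_of_power(A, m))
```

The two columns printed for each m coincide, which is the statement used to identify the determinant with the inverse of the zeta function.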
In this example the matrix $A$ can be seen as the (transposed) matrix of
the restriction of the unweighted transfer operator

\[ \mathcal{L}\varphi(x) = \sum_{\sigma_A(y) = x} \varphi(y) , \]

to functions $\varphi : \Sigma_A^+ \to \mathbb{C}$ which only depend on the coefficient $x_0$ (this is an
$N$-dimensional vector space).
To finish, note that since the coefficients of $A$ are non-negative, the classical
Perron-Frobenius theorem holds (cf e.g. [3]). For example, if $A$ satisfies an
aperiodicity assumption, i.e. if there exists $m_0$ such that all coefficients of
$A^{m_0}$ are (strictly) positive, then the matrix $A$ (and thus the operator $\mathcal{L}$) has
a simple eigenvalue $\lambda > 0$ equal to its spectral radius, with strictly positive
right $Au = \lambda u$ and left $vA = \lambda v$ eigenvectors, while the rest of the spectrum is
contained in a disc of radius strictly smaller than $\lambda$. In fact, $\lambda$ is the exponential
of the topological entropy of $\sigma_A$ and the vectors $u$ and $v$ can be used to construct
a $\sigma_A$-invariant measure which maximises entropy (cf e.g. [3]).
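
The Perron-Frobenius picture can be illustrated with a few lines of power iteration. This is only a sketch, assuming the 2 x 2 transition matrix below (whose square has strictly positive entries, so the aperiodicity assumption holds with exponent 2); the function name and the number of iterations are our choices.

```python
import math

def perron_eigenpair(A, steps=200):
    """Power iteration for the leading eigenvalue and right eigenvector."""
    n = len(A)
    u = [1.0 / n] * n                       # positive start, sum(u) == 1
    lam = 0.0
    for _ in range(steps):
        w = [sum(A[i][j] * u[j] for j in range(n)) for i in range(n)]
        lam = sum(w)                        # growth factor, since sum(u) == 1
        u = [x / lam for x in w]            # renormalise to sum 1
    return lam, u

A = [[1, 1], [1, 0]]                        # A^2 has strictly positive entries
lam, u = perron_eigenpair(A)
entropy = math.log(lam)                     # logarithm of the leading eigenvalue

print(lam, u, entropy)
```

For this matrix the leading eigenvalue is the golden mean (1 + sqrt(5))/2 and the eigenvector has strictly positive entries; the logarithm of the eigenvalue is the quantity the text identifies with the topological entropy of the subshift.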

This situation gives a good introductory example, but it is far too simple:
in general, the transfer operator must be considered as acting on an infinite-dimensional space on which it is often not trace-class. Nevertheless, one can
still sometimes interpret the zeroes of a dynamical zeta function as the inverses
of some subset of the eigenvalues of this operator.
1.2 Correlation functions and spectrum of the transfer operator
Let us consider a circle mapping $f$ which is a small $C^2$ perturbation of the map
$x \mapsto 2x$ (modulo 1). This transformation is not invertible (it has two branches)
and it is locally uniformly expanding (hyperbolicity). Let us associate to $f$
a weighted transfer operator

\[ \mathcal{L}\varphi(x) = \sum_{f(y) = x} \frac{\varphi(y)}{|f'(y)|} , \]

which acts boundedly (but not compactly) on each of the infinite-dimensional
Banach spaces $L^1(\mathrm{Leb})$, $C^0(S^1)$, or $C^1(S^1)$. Our choice $1/|f'|$ for the weight,
i.e. the Jacobian of the inverse branch, implies that the dual of $\mathcal{L}$ acting (e.g.)
on Radon measures preserves Lebesgue measure: $\mathcal{L}^*(dx) = dx$.
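
To make the operator concrete, here is a small sketch for the unperturbed map f(x) = 2x (modulo 1), an assumption made for simplicity since the text allows a small perturbation of it. The two inverse branches are x/2 and (x + 1)/2 and the weight is 1/2. The code checks numerically that the integral of the image of a test function equals the integral of the function, which is the preservation of Lebesgue measure by the dual operator; the test function and the quadrature rule are arbitrary choices.

```python
import math

def transfer(phi):
    """Transfer operator of f(x) = 2x mod 1 with weight 1/|f'| = 1/2."""
    return lambda x: 0.5 * (phi(x / 2) + phi((x + 1) / 2))

def integrate(phi, n=20000):
    """Midpoint rule on [0, 1]."""
    h = 1.0 / n
    return h * sum(phi((k + 0.5) * h) for k in range(n))

# arbitrary test function
phi = lambda x: math.cos(2 * math.pi * x) ** 2 + x ** 3

print(integrate(phi), integrate(transfer(phi)))   # the two integrals agree
print(transfer(lambda x: 1.0)(0.3))               # constants are fixed: 1.0
```

In this linear example the constant function 1 is fixed by the operator, so the invariant density is the Lebesgue density itself.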
Recall that if $M$ is a bounded operator on a Banach space $B$, the essential
spectral radius of $M$ is the smallest $\rho \ge 0$ so that the spectrum of $M$ outside
of the disc of radius $\rho$ contains only isolated eigenvalues of finite multiplicity
(cf [13, 3]).
In this situation, we can prove quasi-compactness: the spectral radius of $\mathcal{L}$
on $C^1(S^1)$ is 1 while its essential spectral radius $\rho_{\mathrm{ess}}$ is $< 1$ (cf. e.g. [3]). (Note
that the spectrum on $L^1(\mathrm{Leb})$ or $C^0(S^1)$ is too big: on these two Banach
spaces, each point of the open unit disc is an eigenvalue of infinite multiplicity
[41].) In fact, for the operator acting on $C^1(S^1)$, we even have a Perron-Frobenius-type picture: 1 is a simple eigenvalue for a positive eigenvector $\varphi_0$
(up to normalisation, one can assume that the integral of $\varphi_0$ is 1), while the
rest of the spectrum is contained in a disc of radius $\tau$ with $\rho_{\mathrm{ess}} \le \tau < 1$.
There is thus a spectral gap. The eigenvalues in the annulus $\rho_{\mathrm{ess}} < |z| \le \tau$,
if there are any, are called resonances. To motivate this terminology, let us
describe ergodic-theoretical consequences of these spectral properties. Before
this, note that one can show that the dynamical zeta function

\[ \zeta_{1/|f'|}(z) = \exp \sum_{m=1}^{\infty} \frac{z^m}{m} \sum_{x \in \operatorname{Fix} f^m} |(f^m)'(x)|^{-1} \]

is meromorphic in the disc of radius $1/\rho_{\mathrm{ess}}$, where its poles are the inverses of
the eigenvalues of $\mathcal{L}$ acting on $C^1(S^1)$, i.e. the resonances (together with the
simple pole at 1).
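
For the unperturbed map f(x) = 2x (modulo 1) this zeta function can be computed by hand, which gives a quick sanity check. The sketch below assumes this linear map: the m-th iterate has 2^m - 1 fixed points on the circle, each with derivative 2^m, so the inner sum equals 1 - 2^{-m} and the series sums to (1 - z/2)/(1 - z). The truncation at 200 terms and the sample point z = 0.3 are our choices.

```python
import math

def zeta_doubling(z, terms=200):
    """Truncated series for the weighted zeta function of x -> 2x mod 1."""
    total = 0.0
    for m in range(1, terms + 1):
        n_fixed = 2 ** m - 1                # x = j / (2^m - 1), j = 0, ..., 2^m - 2
        weight = 1.0 / 2 ** m               # 1 / |(f^m)'(x)|
        total += z ** m / m * n_fixed * weight
    return math.exp(total)

z = 0.3
closed_form = (1 - z / 2) / (1 - z)
print(zeta_doubling(z), closed_form)
```

In this linear example the closed form shows a single simple pole at z = 1, in agreement with the simple pole mentioned above.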
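Let us first observe that the absolutely continuous probability measure $\mu_0$
with density $\varphi_0$ (with respect to Lebesgue) is $f$-invariant: if $\varphi \in L^1(\mathrm{Leb})$

\[ \int \varphi \circ f \, \varphi_0 \, dx = \int \mathcal{L}\bigl( (\varphi \circ f) \, \varphi_0 \bigr) \, dx = \int \varphi \, \mathcal{L}(\varphi_0) \, dx = \int \varphi \, \varphi_0 \, dx . \]

One can show that $\mu_0$ is ergodic, therefore the Birkhoff ergodic theorem says
that for all $\varphi$ in $L^1(\mathrm{Leb})$ and $\mu_0$-almost every $x$ (i.e., Lebesgue almost every $x$!),
the temporal averages $(1/m) \sum_{k=0}^{m-1} \varphi(f^k(x))$ converge to the spatial average
$\int \varphi \, d\mu_0$.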
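We shall next see that this measure $\mu_0$ is exponentially mixing for test
functions $\varphi_1$, $\varphi_2$ in $C^1(S^1)$. Since the spectral projector corresponding to the
eigenvalue 1 is $\varphi \mapsto \varphi_0 \int \varphi \, dx$, we have the spectral decomposition

\[ \mathcal{L}\varphi = \varphi_0 \int \varphi \, dx + P\mathcal{L}\varphi , \]

with $P$ the spectral projector associated to the complement of 1 in the spectrum. This projector satisfies $\|P\mathcal{L}^m\| \le C \tilde\tau^m$ for any $\tau < \tilde\tau < 1$ and for the
operator-norm acting on $C^1$. Therefore,

\[ \int \varphi_1 \circ f^m \, \varphi_2 \, \varphi_0 \, dx = \int \mathcal{L}^m\bigl( \varphi_1 \circ f^m \, \varphi_2 \, \varphi_0 \bigr) \, dx = \int \varphi_1 \, \mathcal{L}^m(\varphi_2 \varphi_0) \, dx \]
\[ = \int \varphi_1 \Bigl[ \varphi_0 \int \varphi_2 \varphi_0 \, dx + P\mathcal{L}^m(\varphi_2 \varphi_0) \Bigr] dx \]
\[ = \int \varphi_1 \, d\mu_0 \int \varphi_2 \, d\mu_0 + \int \varphi_1 \, P\mathcal{L}^m(\varphi_2 \varphi_0) \, dx , \]

and since

\[ \Bigl| \int \varphi_1 \, P\mathcal{L}^m(\varphi_2 \varphi_0) \, dx \Bigr| \le \int |\varphi_1| \, dx \; \| P\mathcal{L}^m(\varphi_2 \varphi_0) \| \le C \int |\varphi_1| \, dx \; \|\varphi_2 \varphi_0\|_{C^1} \, \tilde\tau^m , \]

we obtain the claimed exponential decay.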
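
The exponential decay can be observed directly on the unperturbed map f(x) = 2x (modulo 1), for which the invariant density is 1. The sketch below relies on the identity derived above, computing the correlation as the integral of the first observable against the m-th image of the second one under the transfer operator. The observable x - 1/2 is our choice and is only piecewise smooth on the circle, so this is an illustration rather than an instance of the statement for C^1 test functions; it has zero mean, so the product of the averages vanishes.

```python
def transfer(phi):
    """Transfer operator of f(x) = 2x mod 1 with weight 1/2."""
    return lambda x: 0.5 * (phi(x / 2) + phi((x + 1) / 2))

def integrate(phi, n=2000):
    """Midpoint rule on [0, 1]."""
    h = 1.0 / n
    return h * sum(phi((k + 0.5) * h) for k in range(n))

def correlation(phi1, phi2, m):
    """Integral of phi1 * L^m(phi2) dx, for phi2 of zero mean."""
    g = phi2
    for _ in range(m):
        g = transfer(g)
    return integrate(lambda x: phi1(x) * g(x))

centered = lambda x: x - 0.5
for m in range(7):
    print(m, correlation(centered, centered, m), 2.0 ** (-m) / 12)
```

Here the transfer operator maps x - 1/2 to (x - 1/2)/2, so the correlation equals 2^{-m}/12: the decay rate is given by an eigenvalue of the operator, as in the general argument.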

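Finally, note that for $\varphi_1$ and $\varphi_2$ in $C^1(S^1)$, the Fourier transform

\[ \hat C_{\varphi_1, \varphi_2}(\omega) = \sum_{m \in \mathbb{Z}} e^{i\omega m} \, C_{\varphi_1, \varphi_2}(m) , \]

of their correlation function

\[ C_{\varphi_1, \varphi_2}(m) = \begin{cases} \int \varphi_1 \circ f^m \, \varphi_2 \, d\mu_0 - \int \varphi_1 \, d\mu_0 \int \varphi_2 \, d\mu_0 & m \ge 0 , \\ C_{\varphi_2, \varphi_1}(-m) & m \le 0 , \end{cases} \]

is meromorphic in the strip $|\operatorname{Im} \omega| < \log(\rho_{\mathrm{ess}}^{-1})$ where its poles are those
$\omega$ such that $e^{i\omega}$ is a resonance:

\[ \sum_{m \ge 0} e^{i\omega m} \Bigl[ \int \varphi_1 \circ f^m \, \varphi_2 \, d\mu_0 - \int \varphi_1 \, d\mu_0 \int \varphi_2 \, d\mu_0 \Bigr] = \sum_{m \ge 0} \int \varphi_1 \, P (e^{i\omega} \mathcal{L})^m (\varphi_2 \varphi_0) \, dx \]
\[ = \int \varphi_1 \sum_{m \ge 0} P (e^{i\omega} \mathcal{L})^m (\varphi_2 \varphi_0) \, dx = \int \varphi_1 \, (1 - e^{i\omega} P\mathcal{L})^{-1} (\varphi_2 \varphi_0) \, dx . \]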

1.3 Basic concepts

Let $f : M \to M$ be a map and let $g : M \to \mathbb{C}$ be a weight. We assume (these
assumptions can in fact be weakened) that $\operatorname{Fix} f^m$ is a finite set for each fixed
$m \ge 1$ and that the set $\{y \mid f(y) = x\}$ is finite for each $x$.

Definition 1. (Ruelle) transfer operator - Resonances
The transfer operator is the linear operator

\[ \mathcal{L}_{f,g}\varphi(x) = \sum_{y : f(y) = x} g(y) \varphi(y) , \]

acting on an appropriate Banach (sometimes Hilbert) space of functions or
distributions on $M$. In general $\mathcal{L}$ is bounded but not compact. If the essential
spectral radius $\rho_{\mathrm{ess}}$ of $\mathcal{L}$ is strictly smaller than its spectral radius, one says
that $\mathcal{L}$ is quasi-compact. The spectrum of $\mathcal{L}$ outside of the disc of radius $\rho_{\mathrm{ess}}$
is called the set of resonances of $(f, g)$.

Definition 2. Weighted zeta function
A weighted zeta function is a power series

\[ \zeta_{f,g}(z) = \exp \sum_{m=1}^{\infty} \frac{z^m}{m} \sum_{x \in \operatorname{Fix} f^m} \prod_{k=0}^{m-1} g(f^k(x)) . \]

One can sometimes show that it is meromorphic in a disc where its poles are
in bijection with the resonances.

Definition 3. Dynamical (Ruelle-Fredholm) determinant
Assume moreover that $f$ is (at least) $C^1$ and set $\operatorname{Fix}_h f^m = \{x \in \operatorname{Fix} f^m \mid
\det(I - Df^m(x)) \ne 0\}$. The dynamical determinant is the power series

\[ d_{f,g}(z) = \exp \Bigl( -\sum_{m=1}^{\infty} \frac{z^m}{m} \sum_{x \in \operatorname{Fix}_h f^m} \frac{\prod_{k=0}^{m-1} g(f^k(x))}{\det(I - Df^m(x))} \Bigr) . \]

One can sometimes show that this series is holomorphic in a disc which is
larger than the disc associated to $\zeta_{f,g}$, and that in this larger disc, its zeroes
are in bijection with the resonances.

Definition 4. Correlation function
Let $\mu$ be an $f$-invariant probability measure (for example, a measure absolutely continuous with respect to Lebesgue or an equilibrium state for $\log |g|$).
The correlation function for $(f, \mu)$ and a class of functions $\varphi : M \to \mathbb{C}$, is the
function $C_{\varphi_1, \varphi_2} : \mathbb{N} \to \mathbb{C}$ defined for $\varphi_1$, $\varphi_2$ in this class by

\[ C_{\varphi_1, \varphi_2}(m) = \int \varphi_1 \circ f^m \, \varphi_2 \, d\mu - \int \varphi_1 \, d\mu \int \varphi_2 \, d\mu . \]

Analogous concepts exist for continuous-time dynamics (flows, in particular geodesic flows in not necessarily constant negative curvature - an example of an intersection between hyperbolic and Hamiltonian dynamics). The
corresponding zeta function $\zeta(s)$ is then often holomorphic in a half-plane
$\operatorname{Re}(s) > s_0$ and it admits a meromorphic extension in a larger half-plane.
We refer to the various surveys mentioned in the bibliography, which contain
references to the fundamental articles of Smale, Artin-Mazur, Ruelle, etc.


2 Theorems of Ruelle, Keller, Pollicott, Dolgopyat...

The authors mentioned in the title are by far not the only ones to have made
important contributions to the theory of dynamical zeta functions and transfer operators. One should also mention (see the bibliography) Fried, Mayer,
Hofbauer, Haydn, Sharp, Rugh, Kitaev, Liverani, Buzzi, and many others (in
particular Cvitanović for a more physical approach). Let us discuss a selection
of themes.
1. Ruelle [29] observed that the transfer operator associated to a discrete-time dynamical system given by an expanding and analytic map, together with
an analytic weight $g$, is nuclear (trace-class) on an appropriate Banach space
of holomorphic functions. In this case, the dynamical zeta function is an alternating product of Fredholm-Grothendieck determinants and the dynamical
determinant is a determinant. (In the case when contraction and expansion
coexist, an assumption of regularity of foliations was needed until the work of
Rugh and Fried [30, 24].) This fact is the key to proving that the dynamical
zeta function and the Fourier transform of the correlation function admit a
meromorphic extension to the whole complex plane, in some cases. Let us also
mention a recent Hilbert space version of this theory by Guillopé, Lin and
Zworski [84], who are able to estimate the density of resonances (in the classical sense, which coincides here with the sense of Ruelle) of certain Schottky
groups (another example of application to Hamiltonian dynamics).
2. Symbolic dynamics allows us to model a hyperbolic dynamical system
by a subshift of finite type, via Markov partitions (Sinai, Ratner, cf. [7]). The
unstable Jacobian is Lipschitz (for a suitable metric) in symbolic coordinates.
So we are led to study transfer operators $\mathcal{L}_{\sigma_A, g}$ associated to the unilateral
subshift $\sigma_A$ and a Lipschitz (or Hölder) weight $g$: they are bounded but not
compact, on the Banach space of Lipschitz (or Hölder) functions. Ruelle [7, 4]
proved the first Perron-Frobenius-type theorem in this kind of setting: there
is a spectral gap, and thus exponential decay of correlations in good cases. By
combining the results of Ruelle [19], Pollicott [37] and Haydn [33], we obtain
a meromorphic extension of the Fourier transform of the correlation function
to a strip, which in fact is the largest possible that can be obtained in this
setting.
3. Several years after the pioneering work of Lasota and Yorke on existence
of absolutely continuous invariant measures, Hofbauer and Keller [50, 51, 52]
obtained quasi-compactness of the transfer operator associated to piecewise
expanding (not necessarily Markov) interval maps acting on functions of bounded variation. This operator is not compact, but the dynamical zeta function
$\zeta_{f,g}$ has a nontrivial meromorphic extension to a disc where its poles are in
bijection with the resonances (eigenvalues) of the operator [46]. The higher-dimensional case is much more recent [49, 54] and only partial results have
been obtained [48]. There are stronger results for (Markov) differentiable
locally expanding maps [41, 43] for which one may also study the dynamical
determinant $d_{f,g}(z)$ [45, 42] (see also [44, 39] in the hyperbolic case). Let us
also mention the recent results of Collet and Eckmann [40] who show that in
general the essential rate of decay of correlations is slower than the smallest
Lyapunov exponent, contrary to a widespread misconception.
4. The case of continuous-time dynamical systems is much more delicate. A
meromorphic extension of the zeta function of a hyperbolic flow to a half-plane
larger than the half-plane of convergence was obtained in the eighties by Ruelle and Pollicott [76, 74]. Parry and Pollicott [73] obtained a striking analogue of the
prime number theorem for hyperbolic flows. This result was followed by many
other counting results. Ikawa [86] proved a modified Lax-Phillips conjecture
(see also [98]). However, in order to get exponential decay of correlations, a
vertical strip without poles is required, and this is not always possible: Ruelle
[75] constructed examples of uniformly hyperbolic flows which do not mix exponentially fast. Only recently could Dolgopyat [70, 71] prove (among other
things) exponential decay of correlations for certain Anosov flows, by using
oscillatory integrals. This result has consequences for billiards [97, 94, 91], yet
another hyperbolic/Hamiltonian system. Liverani very recently introduced a
new method to prove exponential decay of correlations: instead of representing
the flow as a (local) suspension of hyperbolic diffeomorphisms under return
times (using the Poincaré map associated to Markov sections), he studies directly the semi-group of operators associated to the flow [14, 72].
Despite its length, the bibliography is not complete. We hope that the
decomposition in items, although rather arbitrary, will make it more useful.
We do not mention at all the vast existing literature on sub-exponential decay
of correlations.

References
Surveys and books
1. V. Baladi. Dynamical zeta functions, Proceedings of the NATO ASI Real
and Complex Dynamical Systems (1993), B. Branner and P. Hjorth, Kluwer
Academic Publishers, Dordrecht, pages 1–26, 1995. See www.math.jussieu.fr/baladi/zeta.ps.
2. V. Baladi. Periodic orbits and dynamical spectra, Ergodic Theory Dynamical
Systems, 18:255–292, 1998. See www.math.jussieu.fr/baladi/etds.ps.
3. V. Baladi. Positive Transfer Operators and Decay of Correlations, World Scientific, Singapore, 2000. Erratum available on www.math.jussieu.fr/baladi/erratum.ps.
4. V. Baladi. The Magnet and the Butterfly: Thermodynamic formalism and
the ergodic theory of chaotic dynamics, Développement des mathématiques au
cours de la seconde moitié du XXe siècle, Birkhäuser, Basel, 2000. Available on
www.math.jussieu.fr/baladi/thermo.ps
5. V. Baladi. Spectrum and Statistical Properties of Chaotic Dynamics, Proceedings Third European Congress of Mathematics Barcelona 2000, pages 203–224,
Birkhäuser, 2001. Available on www.math.jussieu.fr/baladi/barbal.ps
6. V. Baladi. Decay of correlations, AMS Summer Institute on Smooth ergodic
theory and applications, (Seattle, 1999), Proc. Symposia in Pure Math. AMS,
69:297–325, 2001. See www.math.jussieu.fr/baladi/seattle.ps
7. R. Bowen. Equilibrium states and the ergodic theory of Anosov diffeomorphisms,
Springer (Lecture Notes in Math., Vol. 470), Berlin, 1975.
8. P. Cvitanović, R. Artuso, R. Mainieri, G. Tanner, G. Vattay. Chaos:
Classical and Quantum, Niels Bohr Institute, Copenhagen, 2005.
9. D. Dolgopyat, M. Pollicott. Addendum to: Periodic orbits and dynamical
spectra, Ergodic Theory Dynam. Systems, 18:293–301, 1998.
10. N. Dunford, J.T. Schwartz. Linear Operators, Part I, General Theory,
Wiley-Interscience (Wiley Classics Library), New York, 1988.
11. J.-P. Eckmann. Resonances in dynamical systems, IXth International Congress
on Mathematical Physics, (Swansea, 1988), Hilger, Bristol, pages 192–207, 1989.
12. I. Gohberg, S. Goldberg, N. Krupnik. Traces and Determinants of Linear
Operators, Birkhäuser, Basel, 2000.
13. T. Kato. Perturbation Theory for Linear Operators, Springer-Verlag, Berlin,
1984. Second Corrected Printing of the Second Edition.
14. C. Liverani. Invariant measures and their properties. A functional analytic
point of view, Dynamical systems. Part II, pages 185–237, Pubbl. Cent. Ric.
Mat. Ennio Giorgi, Scuola Norm. Sup., Pisa, 2003.
15. D.H. Mayer. The Ruelle-Araki transfer operator in classical statistical mechanics, Lecture Notes in Physics, Vol 123, Springer-Verlag, Berlin-New York, 1980.
16. D. Mayer. Continued fractions and related transformations, Ergodic Theory,
Symbolic Dynamics and Hyperbolic Spaces, T. Bedford et al., Oxford University
Press, 1991.
17. W. Parry, M. Pollicott. Zeta functions and the periodic orbit structure of
hyperbolic dynamics, Société Mathématique de France (Astérisque, vol. 187-188),
Paris, 1990.
18. D. Ruelle. Resonances of chaotic dynamical systems, Phys. Rev. Lett, 56:405–407, 1986.
19. D. Ruelle. Dynamical Zeta Functions for Piecewise Monotone Maps of the
Interval, CRM Monograph Series, Vol. 4, Amer. Math. Soc., Providence, NJ,
1994.
20. D. Ruelle. Dynamical zeta functions and transfer operators, Notices Amer.
Math. Soc, 49:887–895, 2002.
21. M. Zinsmeister. Formalisme thermodynamique et systèmes dynamiques holomorphes, Panoramas et Synthèses, 4. Société Mathématique de France, Paris,
1996.

Analytical framework
22. V. Baladi, H.H. Rugh. Floquet spectrum of weakly coupled map lattices,
Comm. Math. Phys, 220:561–582, 2001.
23. D. Fried. The zeta functions of Ruelle and Selberg I, Ann. Sci. École Norm.
Sup. (4), 19:491–517, 1986.
24. D. Fried. Meromorphic zeta functions for analytic flows, Comm. Math. Phys.,
174:161–190, 1995.
25. A. Grothendieck. Produits tensoriels topologiques et espaces nucléaires,
(Mem. Amer. Math. Soc. 16), Amer. Math. Soc., 1955.
26. A. Grothendieck. La théorie de Fredholm, Bull. Soc. Math. France, 84:319–384, 1956.
27. G. Levin, M. Sodin, P. Yuditskii. A Ruelle operator for a real Julia set,
Comm. Math. Phys., 141:119–131, 1991.
28. D. Mayer. On the thermodynamic formalism for the Gauss map, Comm. Math.
Phys., 130:311–333, 1990.
29. D. Ruelle. Zeta functions for expanding maps and Anosov flows, Inv. Math.,
34:231–242, 1976.
30. H.H. Rugh. Generalized Fredholm determinants and Selberg zeta functions for
Axiom A dynamical systems, Ergodic Theory Dynam. Systems, 16:805–819, 1996.
31. H.H. Rugh. Intermittency and regularized Fredholm determinants, Invent.
Math., pages 1–24, 1999.

Symbolic dynamics framework (Hölder-Lipschitz)
32. R. Bowen, O.E. Lanford III. Zeta functions of restrictions of the shift transformation, Proc. Sympos. Pure Math., 14:43–50, 1970.
33. N.T.A. Haydn. Gibbs functionals on subshifts, Comm. Math. Phys., 134:217–236, 1990.
34. N.T.A. Haydn. Meromorphic extension of the zeta function for Axiom A flows,
Ergodic Theory Dynamical Systems, 10:347–360, 1990.
35. A. Manning. Axiom A diffeomorphisms have rational zeta functions, Bull. London Math. Soc., 3:215–220, 1971.
36. M. Pollicott. A complex Ruelle operator theorem and two counter examples,
Ergodic Theory Dynamical Systems, 4:135–146, 1984.
37. M. Pollicott. Meromorphic extensions of generalised zeta functions, Invent.
Math., 85:147–164, 1986.
38. D. Ruelle. One-dimensional Gibbs states and Axiom A diffeomorphisms, J.
Differential Geom., 25:117–137, 1987.

Differentiable framework
39. M. Blank, G. Keller, C. Liverani. Ruelle-Perron-Frobenius spectrum for
Anosov maps, Nonlinearity, 15:1905–1973, 2002.
40. P. Collet, J.-P. Eckmann. Liapunov Multipliers and Decay of Correlations
in Dynamical Systems, J. Statist. Phys., 115:217–254, 2004.
41. P. Collet, S. Isola. On the essential spectrum of the transfer operator for
expanding Markov maps, Comm. Math. Phys., 139:551–557, 1991.
42. D. Fried. The flat-trace asymptotics of a uniform system of contractions, Ergodic Theory Dynamical Systems, 15:1061–1073, 1995.
43. V.M. Gundlach, Y. Latushkin. A sharp formula for the essential spectral radius of the Ruelle transfer operator on smooth and Hölder spaces, Ergodic Theory
Dynam. Systems, 23:175–191, 2003.
44. A. Kitaev. Fredholm determinants for hyperbolic diffeomorphisms of finite
smoothness, Nonlinearity, 12:141–179, 1999. See also Corrigendum, 1717–1719.
45. D. Ruelle. An extension of the theory of Fredholm determinants, Inst. Hautes
Études Sci. Publ. Math., 72:175–193, 1991.


Non Markov settings (BV, logistic maps, Hénon...)
46. V. Baladi, G. Keller. Zeta functions and transfer operators for piecewise
monotone transformations, Comm. Math. Phys., 127:459–479, 1990.
47. M. Benedicks, L.-S. Young. Markov extensions and decay of correlations for
certain Hénon maps, In Géométrie complexe et systèmes dynamiques (Orsay,
1995), Astérisque (261):13–56, 2000.
48. J. Buzzi, G. Keller. Zeta functions and transfer operators for multidimensional piecewise affine and expanding maps, Ergodic Theory Dynam. Systems,
21:689–716, 2001.
49. J. Buzzi, V. Maume-Deschamps. Decay of correlations for piecewise invertible
maps in higher dimensions, Israel J. Math, 131:203–220, 2002.
50. F. Hofbauer, G. Keller. Ergodic properties of invariant measures for piecewise monotonic transformations, Math. Z., 180:119–140, 1982.
51. F. Hofbauer, G. Keller. Zeta-functions and transfer-operators for piecewise
linear transformations, J. reine angew. Math., 352:100–113, 1984.
52. G. Keller. On the rate of convergence to equilibrium in one-dimensional systems, Comm. Math. Phys, 96:181–193, 1984.
53. G. Keller, T. Nowicki. Spectral theory, zeta functions and the distribution of
periodic points for Collet-Eckmann maps, Comm. Math. Phys., 149:31–69, 1992.
54. B. Saussol. Absolutely continuous invariant measures for multidimensional expanding maps, Israel J. Math., 116:223–248, 2000.
55. L.-S. Young. Statistical properties of dynamical systems with some hyperbolicity, Ann. of Math. (2), 147:585–650, 1998.

Birkhoff cone methods
56. P. Ferrero, B. Schmitt. Produits aléatoires d'opérateurs matrices de transfert, Probab. Theory Related Fields, 79:227–248, 1988.
57. C. Liverani. Decay of correlations, Ann. of Math. (2), 142:239–301, 1995.
58. C. Liverani. Decay of correlations for piecewise expanding maps, J. Stat. Phys.,
78:1111–1129, 1995.

Milnor-Thurston kneading methods
59. M. Baillif. Kneading operators, sharp determinants, and weighted Lefschetz
zeta functions in higher dimensions, Duke Math. J., 124:145–175, 2004.
60. M. Baillif, V. Baladi. Kneading determinants and spectrum in higher dimensions: the isotropic case. Preprint (2003).
61. V. Baladi, A. Kitaev, D. Ruelle, S. Semmes. Sharp determinants and kneading operators for holomorphic maps, Proc. Steklov Inst. Math., 216:186–228,
1997.
62. V. Baladi, D. Ruelle. Sharp determinants, Invent. Math., 123:553–574, 1996.
63. S. Gouëzel. Spectre de l'opérateur de transfert en dimension 1, Manuscripta
Math, 106:365–403, 2001.
64. J. Milnor, W. Thurston. Iterated maps of the interval, Dynamical Systems
(Maryland 1986-87), Lecture Notes in Math. Vol. 1342, J.C. Alexander, Springer-Verlag, Berlin Heidelberg New York, 1988.
65. D. Ruelle. Sharp zeta functions for smooth interval maps, Proceedings Conference on Dynamical Systems (Montevideo, 1995), pages 188–206, Pitman Res.
Notes Math. Ser. 362, Longman, Harlow, 1996.


Random systems and spectral stability
66. V. Baladi, M. Viana. Strong stochastic stability and rate of mixing for unimodal maps, Annales scient. École normale sup. (4), 29:483–517, 1996.
67. V. Baladi, L.-S. Young. On the spectra of randomly perturbed expanding maps,
Comm. Math. Phys., 156:355–385, 1993. Erratum, Comm. Math. Phys., 166,
219–220 (1994).
68. J. Buzzi. Some remarks on random zeta functions, Ergodic Theory Dynam.
Systems, 22:1031–1040, 2002.
69. G. Keller, C. Liverani. Stability of the spectrum for transfer operators, Annali
Scuola Normale Sup. Pisa (4), XXVIII:141–152, 1999.

Flows
70. D. Dolgopyat. On decay of correlations in Anosov flows, Ann of Math.,
147:357–390, 1998.
71. D. Dolgopyat. Prevalence of rapid mixing for hyperbolic flows, Ergodic Theory
Dynam. Systems, 18:1097–1114, 1998.
72. C. Liverani. On contact Anosov flows, Preprint (2002). To appear Ann. of Math.
73. W. Parry, M. Pollicott. An analogue of the prime number theorem for closed
orbits of Axiom A flows, Ann. of Math. (2), 118:573–591, 1983.
74. M. Pollicott. On the rate of mixing of Axiom A flows, Invent. Math., 81:413–426, 1985.
75. D. Ruelle. Flots qui ne mélangent pas exponentiellement, C. R. Acad. Sci.
Paris Sér. I Math., 296:191–193, 1983.
76. D. Ruelle. Resonances for Axiom A flows, J. Differential Geom., 25:99–116,
1987.

Numerics and applications
77. R. Artuso, E. Aurell, P. Cvitanović. Recycling of strange sets: I. Cycle
expansions, II. Applications, Nonlinearity, 3:325–359 + 361–386, 1990.
78. M. Babillot, M. Peigné. Homologie des géodésiques fermées sur des variétés
hyperboliques avec bouts cuspidaux, Ann. Sci. École Norm. Sup. (4), vol 33, pages
81–120, 2000.
79. V. Baladi, J.-P. Eckmann, D. Ruelle. Resonances for intermittent systems,
Nonlinearity, 2:119–135, 1989.
80. C.-H. Chang, D.H. Mayer. Eigenfunctions of the transfer operators and the
period functions for modular groups, Dynamical, spectral, and arithmetic zeta
functions (San Antonio, TX, 1999), Contemp. Math., 290:1–40, Amer. Math.
Soc., Providence, RI, 2001.
81. F. Christiansen, P. Cvitanović, H.H. Rugh. The spectrum of the period-doubling operator in terms of cycles, J. Phys. A, 23:L713–L717, 1990.
82. P. Cvitanović, P.E. Rosenqvist, G. Vattay, H.H. Rugh. A Fredholm determinant for semiclassical quantization, Chaos, 3:619–636, 1993.
83. M. Dellnitz, G. Froyland, S. Sertl. On the isolated spectrum of the Perron-Frobenius operator, Nonlinearity, 13:1171–1188, 2000.
84. L. Guillopé, K.K. Lin, M. Zworski. The Selberg zeta function for convex
co-compact Schottky groups, Comm. Math. Phys., 245:149–176, 2004.
85. J. Hilgert, D. Mayer. Transfer operators and dynamical zeta functions for a
class of lattice spin models, Comm. Math. Phys, 232:19–58, 2002.
86. M. Ikawa. Singular perturbation of symbolic flows and poles of the zeta functions, Osaka J. Math., 27:281–300, 1990.
87. S. Isola. Resonances in chaotic dynamics, Comm. Math. Phys, 116:343–352,
1988.
88. O. Jenkinson, M. Pollicott. Calculating Hausdorff dimensions of Julia sets
and Kleinian limit sets, Amer. J. Math., 124:495–545, 2002.
89. D. Mayer. The thermodynamic formalism approach to Selberg's zeta function
for PSL(2, Z), Bull. Amer. Math. Soc., 25:55–60, 1991.
90. T. Morita. Markov systems and transfer operators associated with cofinite
Fuchsian groups, Ergodic Theory Dynam. Systems, 17:1147–1181, 1997.
91. F. Naud. Analytic continuation of a dynamical zeta function under a Diophantine condition, Nonlinearity, 14:995–1009, 2001.
92. F. Naud. Expanding maps on Cantor sets, analytic continuation of zeta functions with applications to convex co-compact surfaces. Preprint, 2003.
93. S.J. Patterson, P.A. Perry. The divisor of Selberg's zeta function for Kleinian groups, Duke Math. J., 106:321–390, 2001.
94. V. Petkov. Analytic singularities of the dynamical zeta function, Nonlinearity,
12:1663–1681, 1999.
95. M. Pollicott, A.C. Rocha. A remarkable formula for the determinant
of the Laplacian, Invent. Math, 130:399–414, 1997.
96. M. Pollicott, R. Sharp. Exponential error terms for growth functions on
negatively curved surfaces, Amer. J. Math., 120:1019–1042, 1998.
97. L. Stoyanov. Spectrum of the Ruelle operator and exponential decay of correlations for open billiard flows, Amer. J. Math., 123:715–759, 2001.
98. L. Stoyanov. Scattering resonances for several small convex bodies and the
Lax-Phillips conjecture, Preprint, 2003.

Signal Processing Methods Related to Models of Turbulence

Pierre Borgnat
Laboratoire de Physique (UMR-CNRS 5672)
ENS Lyon
46 allée d'Italie
69364 Lyon Cedex 07 (France)
Pierre.Borgnat@ens-lyon.fr

1 An overview of the main properties of Turbulence

Turbulence deals with the complex motions in a fluid at high velocity and/or involving a large range of length scales. Understanding turbulence is challenging and involves many questions, from modeling this complexity to measuring it. In this text, we aim at describing some signal processing tools that have been used to study signals measured in turbulence experiments. Before that, another objective is to survey some properties relevant for turbulent flows (experiments and/or models): scaling laws, self-similarity, multifractality and non-stationarity, which explain why those techniques are useful.
1.1 Qualitative Analysis of Turbulence
Introduction.
Turbulence is first a problem of mechanics applied to fluids. The fundamental relation of dynamics may be written for a fluid element, and this rules its evolution. The velocity field $u(r(0); t)$, giving the velocity at time $t$ of a fluid element that is at $r(0)$ at the initial time, is called the Lagrangian velocity in the fluid. This follows the point of view of Lagrange: one tracks the behavior of each part of the fluid along its trajectory $r(t)$. The velocity is driven by a balance between the inertial effects and the forces in the fluid: friction, pressure, gravity. If the fluid is incompressible, the pressure derives from the whole velocity field and the resulting problem is not local. Added to that, it is experimentally hard to track the movement of one fluid element: nothing distinguishes it from the rest of the fluid. Nevertheless, we will see later how some measurements of Lagrangian velocity were made possible.
J.-D. Fournier et al. (Eds.): Harm. Analysis and Ratio. Approx., LNCIS 327, pp. 277–301, 2006.
© Springer-Verlag Berlin Heidelberg 2006

But usually, instead of this Lagrangian velocity, the problem is studied through the point of view of Euler: the velocity $v(r, t)$ at the fixed position $r$ and at time $t$ characterizes all the motions in the fluid. It is called the Eulerian velocity. Both velocities are related via the change of variable $u(r(0); t) = v(r(t); t)$. The partial differential equation for this Eulerian velocity is called the Navier-Stokes (NS) equation and reads:
$$\underbrace{\partial_t v}_{\text{local derivative}} + \underbrace{(v \cdot \nabla) v}_{\text{convective derivative}} = \underbrace{-(1/\rho)\nabla p}_{\text{pressure}} + \underbrace{\nu \Delta v}_{\text{viscous friction}} + f_v. \tag{1}$$
Here $\rho$ is the density of the fluid. The term $f_v$ stands for volumic forces in the fluid (electric forces, gravity, ...). Internal friction in the fluid (supposed Newtonian) is proportional to the viscosity $\nu$. Due to this friction, the boundary conditions are taken so that the fluid has zero velocity relative to the boundaries. The friction also imposes that the motion of the fluid decays if there is no forcing external to the fluid. For an incompressible flow, the continuity equation completes the problem: $\nabla \cdot v = 0$. Note that the pressure term is non-local because of a Poisson equation that relates $p$ to $v$: $\Delta p = -\rho\, \partial^2 (v_i v_j)/\partial x_i \partial x_j$ (with summation over repeated indices).
The NS equation could be analyzed from its inner symmetries but, because the boundaries and the forcing usually do not satisfy the same symmetries, a simple approach adopted by physicists is to study turbulence in open systems far from the boundaries, in order to find a possible generic behavior of an incompressible turbulent fluid, disregarding the specific geometry of the boundaries. The purpose of this part is to provide first an overview of the properties of this situation, called homogeneous turbulence, and second some elements of its statistical modeling.
Dimensional analysis of turbulence.
A difficulty of the NS equation is the non-linearity of the convective term, which is part of the inertial behavior of the fluid. On the one hand, one may expect solutions with irregular shapes but, on the other hand, the friction term works to impose some regularity on the solutions. The balance between the two effects is evaluated by engineering dimensional analysis [6]. Let $U$ be a typical velocity, and $L$ a typical length scale of the full flow (for instance the size of an experiment). Let us use the symbol $\sim$ for equality of typical values. Then $(v \cdot \nabla) v \sim U^2/L$ and $\nu \Delta v \sim \nu U/L^2$. The ratio of these two terms is the Reynolds number $Re$, which equals $UL/\nu$. This is the only quantity left if one takes out the dimensions from the variables. When $Re$ is large, the non-linear term is dominant and the flow becomes irregular, with motions at many different length scales. Typical turbulent flows seem far from having symmetries: the flow is disordered spatially; it is unpredictable temporally, with strong variations from one time to another; nor is the velocity clearly stationary, as it displays excursions far from its mean during long periods of time. Those events at long time scales are mixed with unceasing short-time variations of the velocity.
A turbulent flow is very different from a flow with zero viscosity, even in the situation of fully developed turbulence, when $Re \to \infty$. Indeed, the energy dissipated in the flow is never zero because the irregularity of the solutions increases correspondingly, creating stronger gradients in the flow. The local dissipation is defined as $\varepsilon(r) = \nu (\partial v_j/\partial x_i + \partial v_i/\partial x_j)^2/2$. If the flow is stationary, its mean along the time is constant. If further the flow is homogeneous, this mean equals the spatial mean: $\varepsilon = \langle \varepsilon(r) \rangle_r$. For a simple dimensional analysis, we keep only $\nu$ and $\varepsilon$ as relevant parameters. From them, one can build a dissipative length scale $\eta = (\nu^3/\varepsilon)^{1/4}$ where the solution should become smooth because of the fluid friction. As a consequence, the estimated number of modes $(L/\eta)^3$ (needed for computer simulations) is, for the three-dimensional velocity, proportional to $Re^{9/4}$. This number is too large to conveniently use methods from non-linear dynamical systems. Characterizing the flow directly from the NS equation is a hard task because of all those properties.
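As a rough numerical illustration of these dimensional estimates, the following Python sketch computes $Re$, the dissipative scale $\eta$ and the number of modes $(L/\eta)^3$. It assumes the usual dimensional estimate $\varepsilon \sim U^3/L$ for the mean dissipation, which is not stated in the text above; the function name `kolmogorov_estimates` is hypothetical and serves only as an illustration.

```python
def kolmogorov_estimates(U, L, nu):
    """Dimensional estimates for a flow of typical velocity U (m/s),
    typical length scale L (m) and kinematic viscosity nu (m^2/s)."""
    Re = U * L / nu                  # Reynolds number Re = U L / nu
    eps = U**3 / L                   # ASSUMPTION: mean dissipation ~ U^3 / L
    eta = (nu**3 / eps) ** 0.25      # dissipative scale (nu^3 / eps)^(1/4)
    n_modes = (L / eta) ** 3         # estimated number of modes in 3D
    return Re, eta, n_modes
```

Under this assumption, $L/\eta = Re^{3/4}$, so the returned number of modes equals $Re^{9/4}$; calling the function with, say, `U=1.0`, `L=0.1` and `nu=1e-6` shows how quickly this count grows with $Re$.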
[Figure 1: two time-series panels. Left panel: velocity (m/s) versus time (ms). Right panel: dissipation versus time (ms).]

Fig. 1. Left: a typical Eulerian velocity signal $v(t)$. Right: its corresponding surrogate dissipation (derivative of the squared velocity). Those signals were measured by Pietropinto et al. as part of the GReC experiment [64] of fluid turbulence in low-temperature gaseous helium.

Experimental Eulerian velocity.

A sample velocity signal $v(t)$ is shown in Figure 1 as an illustration. This signal was obtained during the GReC experiment [64] in a jet at high Reynolds number (up to $10^7$) in helium at 4.5 K (so that its viscosity is very low). Experiments on turbulence consist in studying high-speed motions in a fluid where laminarity is broken, for instance by means of a grid or by creating a jet, so that the flow becomes turbulent; here it is a jet. Common apparatus are hot-wire probes that measure one component $v(t)$ of the Eulerian velocity at one point (we choose to discuss only single-probe measurements here). The erratic fluctuations are typical of such signals and one can see numerous points where the signal appears almost singular. The singular fluctuations are clearer in the time derivative of its energy $v(t)^2/2$, which is an estimate of the dissipation, the so-called surrogate dissipation signal [53]. This dissipation signal seems made of numerous peaks of variable amplitudes, separated by periods of almost no activity: the density of peaks fluctuates from time to time.
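The surrogate dissipation described above can be sketched numerically as follows. This is a minimal illustration, assuming a regularly sampled one-component signal `v` with sampling period `dt`; the function name is hypothetical and the finite-difference scheme (NumPy's `gradient`) is our choice, not necessarily the one used for Figure 1.

```python
import numpy as np

def surrogate_dissipation(v, dt):
    """Time derivative of the energy v(t)^2 / 2 of a sampled velocity signal.

    v  : 1-D array of velocity samples (at least 3 samples)
    dt : sampling period
    """
    v = np.asarray(v, dtype=float)
    energy = 0.5 * v**2
    # Second-order finite differences, including at both ends of the record.
    return np.gradient(energy, dt, edge_order=2)
```

On an experimental record, plotting the output against time would be expected to show the intermittent peaks discussed in the text, provided the sampling is fine enough to resolve the small scales.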
Measurements of Eulerian velocity provide signals recorded along time. The Taylor hypothesis postulates that, if $Re$ is large enough, the velocity field is advected quicker than it changes, so that the evolution of $v$ during a short time is mainly given by the dominant convection term. This way, the velocity at position $x$ and at time $t - \tau$ is essentially the same as the velocity $v(x + v\tau; t)$. This is a hypothesis of frozen turbulence during the time scale of the measurement. This hypothesis is the basic fact behind the choice of modeling the spatial structure of the velocity field in the following, instead of its evolution along time.
1.2 Statistical modeling of Eulerian turbulence
A simple approach is to forget about the dynamical equation and find only the statistical properties of the velocity field [7, 13]. Knowing the complete initial velocity field of a turbulent fluid, and then following its proper evolution in time, is hopeless because of the high number of modes and of the non-linearity of the equation. Experimental observations support this assertion: the fluid seems erratic, with many ever-changing currents and eddies, and typical measurements of the Eulerian velocity as a function of time are strongly disordered signals. Forgetting about the initial conditions and the exact geometrical setting, one can find simple models to describe statistical properties of the signal, assuming that $v(r; t)$ is a random process indexed by $r$ and $t$. We briefly review the main results obtained in this way. Many textbooks on the subject exist, see for instance [56, 33, 54], and we sketch here the major steps that are especially relevant for signal processing questions.
Summary of the statistical properties of turbulence.
From the phenomenological and dimensional analysis of signals of turbulence, several properties are clear-cut. The signals have relevant characteristics at many scales: the fluctuations should be accounted for at small, intermediate and large scales in space and time. The signals are said to be intermittent both in space (a complex geometric structure of eddies) and in time (irregularity and unpredictability of the time series). Added to that, the signals are also intermittent in statistics: there are large deviations that are evident in the dissipation signals displayed here. The distribution of the velocity is close to Gaussian. This does not mirror the apparent burstiness of experimental signals, which is related to the existence of a broad band of characteristic scales. Actually, large fluctuations, far from the mean, exist both at large and small scales.
To encompass all these properties, turbulence is studied through the velocity increments $\delta v(r; x, t)$ over distance $|r|$, at time $t$ and position $x$: $\delta v(r; x, t) = v(x + r; t) - v(x; t)$. One studies a component of the velocity $v$, for instance the component parallel to $r$. The velocity increment was introduced to probe the velocity at different scales, because turbulence is, before all, a multi-scale phenomenon. In question is then a model of the statistical properties of the random variables $\delta v(r; x, t)$.
Scale invariance and Self-similarity.
Kolmogorov proposed in 1941 [40, 41] a first statistical description of velocity increments (hereafter named K41 theory). He postulated that the velocity increment statistically obeys some symmetries that are compatible with the NS equation: time stationarity (independence from $t$), spatial homogeneity (independence from $x$; note that this is valid because it models turbulence far from the boundaries) and isotropy (invariance under rotations). Let us recall that stationarity is the invariance under time-shifts $S_\tau$; a stochastic process $Y$ is stationary if and only if $(S_\tau Y)(t) = Y(t + \tau) \stackrel{d}{=} Y(t)$ for any $\tau \in \mathbb{R}$. The equality $\stackrel{d}{=}$ should be understood as equality of all finite-dimensional probability distributions of the random variables. The other symmetries are defined in the same way with the corresponding operators.
To those symmetries, a property of scale invariance, or self-similarity, is added. Let us recall the definition of self-similarity: it is a statistical invariance under the action of dilations. Let $D_{H,\lambda}$ be a dilation of scale ratio $\lambda$, so that $(D_{H,\lambda} X)(t) = \lambda^{-H} X(\lambda t)$. A random process $\{X(t), t \in \mathbb{R}_+^*\}$ is self-similar with exponent $H$ ($H$-ss) if and only if, for any $\lambda \in \mathbb{R}_+^*$, one has ([68])
$$\{(D_{H,\lambda} X)(t),\ t \in \mathbb{R}_+^*\} \stackrel{d}{=} \{X(t),\ t \in \mathbb{R}_+^*\}.$$
For the velocity increments, this reads:
$$\delta v(\lambda r; x, t) \stackrel{d}{=} \lambda^h\, \delta v(r; x, t), \qquad \lambda \to 0^+. \tag{2}$$
This last property is also a prescription of the regularity of the solution because, if this relation holds for small separations $|r|$, one solution is to have $\delta v(r) = |r|^h$ for small $r$, and that rules the behavior of the derivative, and consequently of the dissipation. This defines the singularities and peaks expected in the dissipation signal.
With those symmetries, the only parameters left to describe the velocity are the mean dissipation $\varepsilon$, the viscosity $\nu$, the self-similarity exponent $h$ and the length scale $r = |r|$ one considers. Kolmogorov supposes a full scale invariance so that all spatial scales behave the same, sharing the same mean dissipation, so that for any $r$, $\varepsilon = C[\delta v(r)]^2/[r/\delta v(r)]$, where $C$ is some constant. Thus $\delta v(r) = c_1 \varepsilon^{1/3} r^{1/3}$: the velocity has a unique exponent of self-similarity $h = 1/3$. The moment of order $p$ of $\delta v$ is called the structure function of order $p$ and obeys, according to this theory, the following relation:
$$E\{|\delta v(r; x, t)|^p\} = c_p (\varepsilon r)^{p/3}. \tag{3}$$
Here $E$ is the mathematical expectation, i.e. the mean of the quantity, and the $c_p$ are constants. Here $L$ is an integral scale, that is, a characteristic distance of the whole flow, for instance the scale of the forcing. The scales between $\eta$ and $L$ for which the scaling of Eq. (3) holds are called the inertial zone because friction is small at those scales and the inertial effects are dominant in the NS equation, especially the convection term. Note that for order 3 the exponent is 1, and this is fortunate because the Kármán-Howarth equation derived from the NS equation imposes it [33]. The scaling law for $p = 2$ imposes the spectrum of the velocity by means of the Wiener-Khinchin relation. Kolmogorov's well-known prediction is that the spectrum should be $S_v(k) = c_2 \nu^{5/4} \varepsilon^{1/4} (k\eta)^{-5/3}$ if $k$ is between $1/L$ and $1/\eta$ (the inertial zone). This is a property of long-range dependence: the spectrum and the correlation of the process decrease slowly. In this model, this is related to scale invariance, or self-similarity.
The prediction for the spectrum holds well, as seen in Figure 2. The structure functions as a function of $r$ are also roughly power laws $r^{\zeta_p}$, but not exactly [9]. However, the general prediction of (3) is found to fail for other orders. Indeed, experimental exponents $\zeta_p$ depart from the linearity in $p$ predicted by $p/3$. We report in Figure 2 some properties of the structure functions: they look like power laws over the inertial range. On the right, we display the evolution of the exponents $\zeta_p$ of this power law with the order $p$ of the moment, and the probability density function of the increments $\delta v(r)$ for different $r$.
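To make the estimation of structure functions and of their exponents concrete, here is a minimal Python sketch. It assumes a regularly sampled one-dimensional signal and integer lags (in samples) standing for the separation $r$; the function names are hypothetical, and the plain least-squares fit in log-log coordinates is only the simplest possible estimator of the exponents, not the procedure used for Figure 2.

```python
import numpy as np

def structure_functions(v, lags, orders):
    """Empirical S_p(r) = mean(|v(x + r) - v(x)|^p).

    Returns an array of shape (len(orders), len(lags)).
    lags must be positive integers (separations in samples).
    """
    v = np.asarray(v, dtype=float)
    S = np.empty((len(orders), len(lags)))
    for j, r in enumerate(lags):
        dv = v[r:] - v[:-r]              # increments over separation r
        for i, p in enumerate(orders):
            S[i, j] = np.mean(np.abs(dv) ** p)
    return S

def scaling_exponents(S, lags):
    """Slope of log S_p(r) versus log r for each order (least squares)."""
    log_r = np.log(lags)
    return np.array([np.polyfit(log_r, np.log(Sp), 1)[0] for Sp in S])
```

As a sanity check on a process with known scaling, a discrete Brownian motion (cumulative sum of white Gaussian noise) should give an order-2 exponent close to 1. On turbulence data, the fit would have to be restricted to lags inside the inertial range.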
Multifractality: Characterization in terms of singularities.
The failure of the previous theory is related to the spatial and temporal intermittency of the dissipation: random bursts of activity exist and the regularity of the signal changes from one point to another, and also from one scale to another. The statistical self-similarity property (2) is now true only if $h$ is also a random variable that depends on $x$ and $t$. If this property holds for $\lambda \to 0^+$, $h$ is called the Hölder exponent of the signal at point $x$. The set of points sharing the same Hölder exponent is a complicated random set that is a fractal set with dimension $D(h)$. This is a multifractal model [32, 33] that describes the signal in terms of singularities at small scale. The underlying hypothesis is that all the statistics are ruled by those singularities. The complementary property of multifractality is the conjecture of a relation between the singularity spectrum $D(h)$ and the scaling exponents $\zeta_p$, by means of a Legendre transform: $D(h) = \inf_p (hp + 1 - \zeta_p)$. Mathematical aspects of multifractality and of this equality can be found in [37, 38]. Experimentally, in order to measure the multifractal spectrum that is the core of this model, one has first to compute a multiresolution quantity, then use a Legendre transform that gives a statistical measure of $D(h)$ from the exponents $\zeta_p$. Experiments now agree with $\zeta_p \simeq c_1 p - c_2 p^2/2$, where $c_1 \simeq 0.370$ and $c_2 \simeq 0.025$; this is a development in a power series in $p$, and terms $p^n$ with $n \geq 3$ are too small to be correctly estimated nowadays. The corresponding singularity spectrum $D(h)$ is $1 - (h - c_1)^2/2c_2$, for values of $h$ such that $D(h) \geq 0$. The expected value of $h$ on a set of dimension 1 in the signal is 0.37, close to the 1/3 exponent predicted by Kolmogorov, but the local exponent fluctuates.
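The Legendre transform relating $\zeta_p$ to $D(h)$ can be checked numerically on the quadratic model quoted above. The sketch below evaluates $\inf_p (hp + 1 - \zeta_p)$ on a grid of orders $p$ and compares it with the closed form $1 - (h - c_1)^2/2c_2$. The grid of $p$ values and the function names are our own illustrative choices; on experimental exponents only a limited range of $p$ would be available.

```python
import numpy as np

C1, C2 = 0.370, 0.025   # experimental values quoted in the text

def zeta(p):
    """Quadratic model of the scaling exponents: c1 p - c2 p^2 / 2."""
    return C1 * p - C2 * p**2 / 2

def legendre_spectrum(h, p_grid):
    """D(h) = inf over p of (h p + 1 - zeta_p), evaluated on a grid of p."""
    h = np.atleast_1d(np.asarray(h, dtype=float))
    values = h[:, None] * p_grid[None, :] + 1.0 - zeta(p_grid)[None, :]
    return values.min(axis=1)

def closed_form_spectrum(h):
    """Closed form 1 - (h - c1)^2 / (2 c2) for the quadratic model."""
    return 1.0 - (np.asarray(h, dtype=float) - C1) ** 2 / (2 * C2)
```

The infimum is reached at $p = (c_1 - h)/c_2$, so the grid must cover that value for each $h$ of interest; for $h$ between 0.2 and 0.55 a grid over $[-20, 20]$ is more than sufficient.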

[Figure 2: four panels. Recoverable panel titles and axis labels: "Structure functions for order 1 to 4" (log(Structure functions) versus log(r), curves S1 to S4, with an ESS inset plotted against S2); "pdf of velocity increments δv" (log(probability density), arbitrary scale, versus rescaled δv on a linear scale), annotated "pdf at small scale (exponential tail)" and "pdf at large scale (almost Gaussian)".]

Fig. 2. Statistical analysis of one-point velocity measurements. Top left: spectrum $S_v(k)$ of the velocity, which follows the K41 prediction of a power law of exponent $-5/3$. Bottom left: structure functions $S_p(r) = E|\delta v(r)|^p$ for $p = 1$ to 4; the inset shows the Extended Self-Similarity property [9]: the structure functions are not really power laws of $r$, but are acceptable power laws if drawn as a function of one another (on a log-log diagram). Top right: exponents $\zeta_p$ of the higher-order statistics (taken from [35] and [19]) are shown to differ from the K41 model, and to be closer to multifractal models (here the Kolmogorov-Obukhov model of 1962 (K62) and the She-Leveque (SL) model). Bottom right: pdf of the increments $\delta v$ at various scales, from small scale (a few $\eta$), where the pdf is non-Gaussian with heavy tails, to large scale (around $L$), where the pdf is almost Gaussian; note that the scale is logarithmic for the pdf. The experimental spectrum and pdf are from the data of the GReC experiment [64].

Because the physical velocity should be a continuous signal at small scales (smaller than $\eta$), a further refinement in modeling is that singularities appear only in an analytic continuation of the velocity for complex times. The singularities in the velocity signal have the form $|t - z_0|^{h(t_0)}$, with $z_0 = t_0 + i\delta \in \mathbb{C}$, and are then a basis for multifractal interpretation. Such a distribution of singularities, each having a spectrum $k^{-2h-1} e^{-2U\delta k}$, leads to a mean spectrum consistent with quantitative measurements. Yet the existence of such isolated singularities was neither proved nor derived from the NS equation, but only in simpler dynamical systems [30, 31].
Another approach was to relate the fluctuations of the exponents $h$ to the dissipative scales $\eta$. Beneath the dissipative scale, the velocity is differentiable: $\delta v(r) = r\, \partial v/\partial x$. This small-scale regularisation is obtained via a local dissipative scale [65, 34], defined as the scale where the local Reynolds number $Re(r) = r\, \delta v(r)/\nu$ equals 1. In fact we have $\delta v(r) = U (r/L)^{h(x)}$ if $r > \eta(x)$, so that $Re(r) = r\, \delta v(r)/\nu = (r/L)^{1+h(x)} Re$. The dissipative scale fluctuates locally as $\eta(h) = L\, Re^{-1/(1+h)}$, whereas K41 uses a fixed dissipative scale $\eta = (\nu^3/\varepsilon)^{1/4}$, which is now the mean of the $\eta(h)$. Given this behavior, a unified description of the statistics $E\{|\delta v(r; x, t)|^p\}$ was derived, valid both in the inertial and dissipative scales [22].
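The definition of the local dissipative scale can be verified with a few lines of Python. This sketch only encodes the two relations given above, $Re(r) = (r/L)^{1+h} Re$ and $\eta(h) = L\, Re^{-1/(1+h)}$; the function names and the numerical values used to exercise them are hypothetical.

```python
def local_reynolds(r, L, Re, h):
    """Local Reynolds number Re(r) = (r / L)^(1 + h) * Re."""
    return (r / L) ** (1.0 + h) * Re

def local_dissipative_scale(L, Re, h):
    """Scale eta(h) = L * Re^(-1 / (1 + h)) at which Re(r) equals 1."""
    return L * Re ** (-1.0 / (1.0 + h))
```

For the K41 value $h = 1/3$, this gives $\eta = L\, Re^{-3/4}$, which matches the count of $(L/\eta)^3 \propto Re^{9/4}$ modes; a smaller $h$ (a stronger singularity) gives a smaller local dissipative scale.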
Characterization as random cascades.
We now examine further statistical aspects of the intermittency of the flows; for this we stick with modeling only the statistics of the flows. A feature of equation (3) is notable: if the equation were true, the distribution of the random variable $\delta v(r)/(\varepsilon r)^{1/3}$ should be independent of $r$ [18]. However, experimental measurements of the probability density function (pdf) of $\delta v(r)$ show that its shape changes with $r$, even in the inertial domain; see Figure 2. At large scale (close to $L$), the pdf is almost Gaussian; when probing smaller scales, exponential tails become more and more prominent: rare intense events are more frequent at small scale. This is the statistical face of intermittency.
This property is best modeled as a multiplicative random process, where each scale is derived from the larger one. The general class of this model comes from the Mandelbrot martingales [42, 69] and was also developed from the experimental data in turbulence [59, 23, 63]. The challenge is to model dependencies between scales, for instance by means of multipliers between scales $W(r_1, r_2)$ defined by $\delta v(r_2) = W(r_1, r_2)\, \delta v(r_1)$. For the probability density function $P_{r_1}(\log|\delta v|)$ at scale $r_1$, this equation is a convolution between $P_{r_1}$ and the pdf of the multipliers $\log(W(r_1, r_2))$. Because the relation holds for every couple of scales, the relevant solutions are infinitely divisible distributions. For instance, one can explicitly write [18]: $P_{r_2}(\log|\delta v|) = G^{\star[n(r_2) - n(r_1)]} \star P_{r_1}(\log|\delta v|)$, where $\star$ is a convolution. $G$ is here the kernel of the cascade, that is, the operator that maps the fluctuations from one scale $r_1$ to another $r_2$; equivalently, it gives the distribution of $\log(W(r_1, r_2))$. Derived from this, the structure functions read:
$$E\{|\delta v(r; x, t)|^p\} = e^{-H(p) n(r)} \quad \text{with} \quad H(p) = -\log \tilde{G}(p), \tag{4}$$
where $\tilde{G}$ is the Laplace transform of $G$. The interest of multiplicative cascades seen as infinitely divisible processes is that this leads to elegant constructions of stochastic processes satisfying exactly the relations (4) [14], and they can be used as benchmarks for the estimation tools of multifractality [15, 21]. A consequence of the model is that if $n(r)$ is close to $-\log r$, the structure function obeys a power law with exponents $\zeta_p = H(p)$. If not, the property is the so-called Extended Self-Similarity because all orders share the same law $e^{-n(r)}$ and for instance, with $\zeta_3 = 1$: $E\{|\delta v(r; x, t)|^p\} = (E|\delta v(r; x, t)|^3)^{H(p)}$, as illustrated in Figure 2.
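The moment relation of the cascade can be illustrated by a small Monte Carlo experiment. The sketch below assumes a Gaussian kernel for $\log W$ (a log-normal cascade), with mean $-c_1$ and variance $c_2$ per unit step of $n(r)$, so that $H(p) = -\log \tilde{G}(p) = c_1 p - c_2 p^2/2$ with the sign convention $E|W|^p = e^{-H(p) n}$. This particular kernel, the integer number of steps and the function names are assumptions made for illustration only; the text does not prescribe them.

```python
import numpy as np

C1, C2 = 0.370, 0.025   # ASSUMPTION: log-normal kernel matching zeta_p above

def H(p):
    """H(p) = -log of the Laplace transform of the Gaussian kernel."""
    return C1 * p - C2 * p**2 / 2

def sample_multipliers(n_steps, size, rng):
    """Draw multipliers W across n_steps unit steps of n(r).

    log W is a sum of n_steps independent Gaussian variables, i.e. the
    n_steps-fold convolution of the kernel G with itself.
    """
    steps = rng.normal(loc=-C1, scale=np.sqrt(C2), size=(size, n_steps))
    return np.exp(steps.sum(axis=1))

def empirical_moment(W, p):
    """Monte Carlo estimate of E{|W|^p}."""
    return np.mean(np.abs(W) ** p)
```

With a few hundred thousand draws, `empirical_moment(W, p)` should agree with `np.exp(-H(p) * n_steps)` to within about a percent for small orders $p$; higher orders are dominated by rare large values and converge more slowly.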
1.3 Vortex modeling for turbulence and oscillating singularities
The models reported above were built on multi-scale properties of the velocity and on its singularities, and they are good descriptions of the data. Nevertheless, these models lack connections with the NS equation and with the structured organization of turbulent flows, which are not purely random flows. One would like to characterize a flow from its own structures. Experiments on turbulence show that there are intense vortices: objects similar to stretched filaments around which the particles are mainly swirling [25]. The singularities in velocity signals could then be understood as features of a few organized objects with a complex inner structure and a singular behavior near their core [51, 36]. A mechanism could be spiraling structures, analogous to the phenomenon of a Kelvin-Helmholtz instability [52]. Lundgren studied a specific collection of elongated vortices having a spiraling structure in their orthogonal section, and that are solutions of the NS equation given a specified strain [46]. It was shown that such a collection could be responsible for a spectrum in $k^{-5/3}$ and for an intermittency of the structure functions consistent with modern measurements of $\zeta_p$ [66]. Turbulence is understood in this case as some superposition of building objects with complex geometrical characteristics, such as oscillations or fractality (now in a geometrical, not statistical, way).
A simple model for corresponding Eulerian velocity signals would be an accumulation of complex singularities. This is different from modeling singularities in complex times in the sense that here the exponent is complex, $(t - t_0)^{h+i\beta}$, not the central time $t_0$ of the singularity. See some examples of those functions in Figure 5. The exponent $\beta$ is responsible for oscillations in the signal, and multifractal estimation is perturbed by such oscillations [1]. The Fourier spectrum of a function $e^{-a(t-t_0)} (t - t_0)^{h+i\beta}$ behaves like $e^{4\pi\beta\, \mathrm{atan}(2\pi\nu/a)}\, |4\pi^2\nu^2 + a^2|^{-h-1}$; except at low frequencies, that is when $\nu \gg a$, the spectrum scales like $|\nu|^{-2h-2}$. This is a power law, so these functions can be used as basis functions to build a synthetic signal with properties of turbulence. A sum of many functions of this kind may have multifractal properties that depend on the distribution of the $h$ and $\beta$ exponents [17]. One is then interested in finding whether or not there are such oscillations in velocity signals.
The consequence of the existence of spiraling structures for the Lagrangian velocity would be the existence of swirling motions when a particle is close to a vortex core. Far from vortices, the motion should be almost ballistic, with small acceleration. As a result, Lagrangian trajectories are expected to go through periods of large acceleration and periods of almost no acceleration. Non-stationary descriptions would then give interesting characterizations of the velocity.

The vortices and the swirling motions are described by the vorticity $\omega = \nabla \times v$. Vorticity is related to dissipation since $\varepsilon = \nu \langle |\omega|^2 \rangle_r$. If vortices are relevant features of a flow, vorticity should be strongly organized in those specific structures. One expects that they can be detected as isolated objects, and a question is their role in intermittency. Here too, the non-stationary evolution of those objects is an expected feature.
To sum up, the general problem is that one cannot easily track at the same time the three kinds of interesting properties for turbulence: non-stationarity of the signals; the inner oscillating or geometric structure; and the statistical self-similar properties (exponent $h$ or multifractality) of the spiraling vortices or their consequence for velocity.
Alternative representations of signals.
Dealing with these three properties, we know how to construct a representation jointly suited to two of them at the same time. The third one is then difficult to assess.
1) Time evolution and self-similarity: statistical methods using wavelets are adapted to multifractal models or random cascades because they probe statistical quantities of stationary signals with relevant self-similar properties but no inner oscillations [1, 39].
2) Time evolution and Fourier analysis: modern Lagrangian and vorticity measurements are made possible by following the instantaneous variation of the Fourier spectrum of some non-stationary signal. Neither the temporal nor the spectral representation is enough: time-frequency representations that unfold the information jointly in time and frequency [28] are needed.
3) Self-similarity and inner geometry: one may be interested in oscillations and self-similarity at the same time. It is known that wavelets are not well adapted to study oscillations [1]. A variant is to measure geometry in a non-stationary context (since self-similarity implies non-stationarity). Ad hoc procedures constructed on the wavelet transform [43] or on the Mellin-time representations [17] were considered, but for now without clear-cut results. The third part of this section is devoted to the Mellin representation, which is adapted to probe self-similarity and some features of geometry because it is based on self-similar oscillating functions $(t - t_0)^{h+i\beta}$.
To conclude this overview of turbulence, let us summarize the complexity of fluid turbulence. The problem is driven by a non-linear PDE that resists mathematical analysis. Still, we have at our disposal strong phenomenological properties on which to build stochastic models of the velocity. The signals are irregular and intermittent, and one would like to question their (multi)fractal aspects and their singularities, but also situations where their geometrical organization or some non-stationary properties are more relevant. Because there exists no single method that captures all these features, multiple tools of signal processing are useful.

2 Signal Processing Methods for Experiments on Turbulence
2.1 Some limitations of Fourier analysis
Physicists often describe signals in terms of their harmonic Fourier analysis; it first relies on order-2 statistics, through the spectral analysis of the signal. The well-known Fourier representation reads: $v(t) = \int e^{2i\pi\nu t}\, d\xi(\nu)$. This decomposition as spectral increments $d\xi(\nu)$ on the cosine basis is especially suited if $v$ is stationary. In this case, its spectrum $S_v$ is given by $E\{d\xi(\nu_1)\, \overline{d\xi(\nu_2)}\} = S_v(\nu_1)\, \delta(\nu_1 - \nu_2)\, d\nu_1 d\nu_2$. The spectral increments are thus uncorrelated. In the context of turbulence, one sees in Figure 2 an estimate of $S_v(\nu)$, obtained using standard signal processing tools. The support of the spectrum is broad-band and $S_v(\nu)$ roughly follows a power law with cut-offs at the integral scale $L$ and at the small scale where dissipation becomes dominant (around $\eta$). This corresponds to the lack of a single time scale of evolution. Figure 2 displays the spectrum of the Eulerian velocity. The lack of separate characteristic frequencies (or times) is evident. The spectrum closely follows the Kolmogorov $k^{-5/3}$ law.
A first difficulty in using the Fourier representation in turbulence is that the higher-order statistics and the geometric organization are coded in the phase of the Fourier transform. This information is hard to recover. For instance, realizations of the random Weierstrass functions (that will be defined later), which are fractal, and of oscillating singularities (for instance the function $|t - t_0|^{h+i\beta}$), which are not, would share the same Fourier spectrum but not the same statistical and fractal properties [36]. Redundant representations, depending jointly on time and frequency or scale variables, will be found to capture those properties in a clearer way.
A second problem in the context of turbulence is the long-range dependence in the time series [10]. Generally, this reduces the performance of estimation of all classical quantities from the time series, including the Fourier transform. For instance, let us suppose that $X(t)$ is a stationary process with long memory, or long-range dependence, i.e. its correlation decreases like $\tau^{2H-2}$ when the lag $\tau$ is large. If the process is known for $n$ samples, the periodogram $I_n(\nu)$ is computed from the Fourier transform:
$$I_n(\nu) = \frac{1}{n} \left| \sum_{t=1}^{n} X(t)\, e^{-2i\pi\nu t} \right|^2. \tag{5}$$
The periodogram is an estimate of the spectral density $S_X(\nu)$. For frequencies that are close to zero, the asymptotic variance of the periodogram is of the order $\mathrm{Var}[I_n(\nu)] = O(n^{4H-2})$. It means that, for low frequencies, it fluctuates much more than for short-range correlated processes, and provides a cruder estimate.
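The periodogram of Eq. (5) can be computed at the Fourier frequencies $\nu = k/n$ with a fast Fourier transform. The following Python sketch assumes a real-valued, regularly sampled record with unit sampling period; the function name is hypothetical. The FFT indexes samples from $t = 0$ rather than $t = 1$, which only changes the phase and leaves the squared modulus unchanged.

```python
import numpy as np

def periodogram(x):
    """Periodogram I_n(nu) = |sum_t x(t) exp(-2 i pi nu t)|^2 / n.

    Returns (freqs, I) at the non-negative Fourier frequencies
    nu = k / n, in cycles per sample.
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    freqs = np.fft.rfftfreq(n)               # 0, 1/n, ..., up to 1/2
    I = np.abs(np.fft.rfft(x)) ** 2 / n      # squared modulus, normalized by n
    return freqs, I
```

This raw periodogram is not smoothed; for long-memory signals, the large variance near $\nu = 0$ mentioned above would show up as strong scatter of the lowest-frequency values from one realization to another.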

These shortcomings of the Fourier transform are motivations to study different kinds of representations, which will be more adapted to multi-scale properties, singularities and long memory.
2.2 Multiresolution characterization and estimation of scaling laws
Velocity increments.
In order to question all the time scales in a turbulent signal, the velocity increment over the time separation $\tau$ was introduced as a more relevant quantity: $\delta v(\tau; x, t) = v(x; t - \tau) - v(x; t) = v(x + r; t) - v(x; t)$. The second equality is obtained from the Taylor hypothesis, provided that $r = \tau\, v(x; t)$. Velocity increments are a multiresolution quantity in the sense that they describe the velocity at the varying resolution $\tau$. They are relevant to capture both the long-time evolution of the signal, which is dominated by the statistics of $v$ because $v(t - \tau)$ and $v(t)$ are then almost independent, and the short-time behavior, where the dominant features are the intermittent peaks of activity seen in the derivative of the signal.
Wavelet decompositions.
More general multiresolution quantities exist besides velocity increments, which are not the best-behaved for estimation in the presence of long-range dependence. One class of multiresolution representations is the wavelet transform [49, 48]:

$$T_v(a, t) = \int v(u)\, \psi([u - t]/a)\, du/a, \qquad (6)$$

provided that the mother wavelet $\psi(t)$ has zero integral. The wavelet is further characterized by an integer $N \ge 1$, the number of vanishing moments. The representation is then blind to polynomial trends of order less than N, and this gives robustness to the representation regarding the slow, large-period excursions that one finds in signals of turbulent velocity. Velocity increments are the "poor man's wavelet", setting $\psi(u) = \delta(u + 1) - \delta(u)$ and letting $\tau$ be the scale variable a. This wavelet has only one vanishing moment, N = 1. With a larger number of vanishing moments, wavelets give good methods of estimation when one expects power-law statistics, such as self-similarity or multifractality. We briefly report two estimation methods.
Wavelet transforms and singularities.
A property of the wavelet transform is that it captures the singularities of a signal on the maxima of the wavelet transform [48, 37]. Let us assume that $v(t - \tau) - v(t)$ behaves like $|\tau|^h$ near point t. If the number N of vanishing moments of $\psi$ is higher than h, the wavelet transform will have a maximum in the scale-time cone (a, u) defined by $|u - t| \le Ca$ for some constant C. This maximum behaves as $a^h$ when $a \to 0^+$. This property permits one to estimate directly the Hölder exponents of singularities. In the context of multifractality, the singularities are not isolated and it is hopeless to try to estimate the exponents separately. Combining the multifractal formalism with the properties of the maxima of the wavelet transform, it was proposed to estimate the spectrum D(h) by computing the moments of velocity on those maxima only. This is called the Wavelet Transform Modulus Maxima (WTMM) method [4] and it gives reliable estimates of D(h) in turbulence [5]. A limitation is that there are few theoretical calculations of D(h) that validate the WTMM method, so it lacks a complete mathematical justification.
Wavelet transform and estimation of scaling laws.
Another possibility is to take advantage of discrete orthogonal wavelet bases (see the text of J.R. Partington, page 95). Let $\psi_{j,k}(t) = 2^{-j/2}\, \psi(2^{-j} t - k)$ denote the dilated and translated templates of $\psi$ on the dyadic grid, and $d_X(j, k) = \int \psi_{j,k}(u)\, X(u)\, du$ the corresponding discrete wavelet coefficients. For any second-order stationary process X, its spectrum $S_X(\nu)$ can be related to its wavelet coefficients through:

$$E\, d_X(j, k)^2 = \int S_X(\nu)\, 2^j\, |\Psi(2^j \nu)|^2\, d\nu, \qquad (7)$$

where $\Psi$ stands for the Fourier transform of $\psi$. This provides an estimate of the spectrum. One can also recover statistics of all orders. If X is a self-similar process with parameter H, they behave as:

$$E\{d_X(j, k)^p\} = C\, 2^{jpH}, \quad \text{if } 2^j \to +\infty. \qquad (8)$$

Moreover, it has been proven that the $\{d_X(j, k), k \in \mathbb{N}\}$ form short-range dependent sequences as soon as $N - 1 > H$. This means that they no longer suffer from the statistical difficulties implied by the long-memory property. In particular, the time averages $S(j; p) = \frac{1}{n_j} \sum_{k=1}^{n_j} |d_X(j, k)|^p$ can then be used as relevant, efficient and robust estimators of $E\{d_X(j, k)^p\}$. The possibility of varying N brings robustness to these analysis and estimation procedures. The performance of the estimators has been studied; see for instance [3]. One can then characterize all the statistics of X with the following estimation procedure: for a velocity signal v, a weighted linear regression of $\log_2 S(j; p)$ against $\log_2 2^j = j$, performed in the limit of the coarsest scales, provides an estimate of the exponents $\zeta_p$ of the structure functions $E\{|\delta v(r)|^p\}$.
Combining the WTMM idea and the properties of the discrete wavelet transform, Jaffard proposed an exact characterization of multifractal signals using wavelet leaders (local maxima of discrete wavelet coefficients) [39], which are now being developed as signal processing tools [45].
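
The regression of $\log_2 S(j; p)$ against j can be sketched as follows. This is only an illustration under stated assumptions: it uses the Haar wavelet (one vanishing moment, chosen for brevity rather than for the condition on N discussed above), an L1 normalisation of the coefficients so that the slope reads directly as pH, and an ordinary random walk (H = 1/2) as a hypothetical test signal. The names `haar_details` and `estimate_hurst` are not from the text.

```python
import numpy as np

def haar_details(x, n_octaves):
    """Haar detail coefficients d(j, k) for octaves j = 1..n_octaves.

    L1 normalisation: each detail is half the difference between the means
    of two adjacent blocks of 2**(j-1) samples.
    """
    approx = np.asarray(x, dtype=float)
    details = []
    for _ in range(n_octaves):
        approx = approx[: 2 * (approx.size // 2)]  # drop a trailing odd sample
        left, right = approx[0::2], approx[1::2]
        details.append((right - left) / 2.0)
        approx = (left + right) / 2.0
    return details

def estimate_hurst(x, p=2, j_min=3, j_max=10):
    """Weighted regression of log2 S(j; p) against j; the slope is read as p*H."""
    details = haar_details(x, j_max)
    octaves = np.arange(j_min, j_max + 1)
    log_s = np.array([np.log2(np.mean(np.abs(details[j - 1]) ** p)) for j in octaves])
    n_j = np.array([details[j - 1].size for j in octaves])
    # np.polyfit weights multiply the residuals; sqrt(n_j) gives more weight
    # to the octaves that contain more coefficients.
    slope, _ = np.polyfit(octaves, log_s, 1, w=np.sqrt(n_j))
    return slope / p

rng = np.random.default_rng(0)
brownian = np.cumsum(rng.standard_normal(2 ** 16))  # ordinary random walk, H = 1/2
print(estimate_hurst(brownian))
```

The finest octaves are excluded (`j_min=3`) because the sampled random walk only approaches the power law at coarser scales, in line with the remark that the regression is performed towards the coarsest scales.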


Fig. 3. Lagrangian velocity of a particle in turbulence (from [50]). Left: the Doppler signal whose instantaneous frequency gives the velocity of the tracked solid particle in a turbulent fluid, and its time-frequency representation. Right: acceleration, velocity and trajectory, reconstructed for two components from the measurement of velocity by Doppler effect.

2.3 Time-frequency methods for Lagrangian and vorticity measurements
Time-frequency representations.
A linear time-frequency decomposition is achieved in the same manner as a wavelet transform, using a basis built from shifts in time and frequency of a small wave packet:

$$v(t) = \iint r_v(u, \nu)\, b_{u,\nu}(t)\, du\, d\nu, \quad \text{with } b_{u,\nu}(t) = b_0(t - u)\, e^{2i\pi\nu t}.$$

The variable $\nu$ is indeed a frequency and $r_v(u, \nu)$ gives the component of v at frequency $\nu$ and time u. The time-frequency spectrum is $E\,|r_v(u, \nu)|^2$. Note that instead of time and frequency shifts, the wavelet transform uses time shifts and dilations of the mother wavelet, so that the variables are time and scale rather than time and frequency.
If one is interested in the time-frequency spectrum, it is possible to achieve better estimation using bilinear densities that are time-frequency decompositions of the energy [28]. They derive from the Wigner-Ville distribution:

$$W_v(t, \nu) = \int v(t + \tau/2)\, \overline{v(t - \tau/2)}\, e^{-2i\pi\nu\tau}\, d\tau.$$


A general class is obtained by applying some smoothing in time and/or frequency. Such a distribution represents well the energy of the signal because of the following physical properties.
1. Marginals in time and frequency: $\int W_v(t, \nu)\, d\nu = |v(t)|^2$; $\int W_v(t, \nu)\, dt = |V(\nu)|^2$ if V is the Fourier transform of v.
2. Covariance with time and frequency shifts: $W_v(t - \tau, \nu)$ is the transform of $v(t - \tau)$ and $W_v(t, \nu - f)$ is the transform of $e^{2i\pi f t} v(t)$.
3. Instantaneous frequency: the mean frequency $\int \nu\, W_v(t, \nu)\, d\nu$ (normalized by the marginal $|v(t)|^2$) is equal to the instantaneous frequency of the signal v(t), that is the derivative of the phase of the analytic signal associated with v(t).
Representations of this kind are used, because of these properties, to analyze the non-stationary signals of Lagrangian experiments and of vorticity measurements.
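
The time marginal and the localization on the instantaneous frequency can be checked numerically. The sketch below is one possible discretization of the Wigner-Ville distribution, not the one used for the figures: the lag is sampled at even multiples of the sampling period, so the frequency axis runs over [0, 1/2), and the signal is assumed to be analytic (complex). The function name `wigner_ville` and the linear chirp used as a test signal are assumptions of this example.

```python
import numpy as np

def wigner_ville(z, n_freq):
    """Discrete Wigner-Ville distribution of a complex (analytic) signal z.

    Row t holds the FFT over the lag m of z[t + m] * conj(z[t - m]);
    column k corresponds to the frequency k / (2 * n_freq) cycles per sample.
    """
    z = np.asarray(z, dtype=complex)
    n = z.size
    half = n_freq // 2
    tfr = np.zeros((n, n_freq))
    for t in range(n):
        m_max = min(t, n - 1 - t, half - 1)      # largest lag that stays inside the signal
        m = np.arange(-m_max, m_max + 1)
        kernel = np.zeros(n_freq, dtype=complex)
        kernel[m % n_freq] = z[t + m] * np.conj(z[t - m])
        tfr[t] = np.fft.fft(kernel).real          # the kernel is Hermitian in m, so the FFT is real
    freqs = np.arange(n_freq) / (2.0 * n_freq)
    return freqs, tfr

# Hypothetical test signal: linear chirp with instantaneous frequency f0 + rate * t.
n = 256
t = np.arange(n)
f0, rate = 0.1, 0.2 / n
chirp = np.exp(2j * np.pi * (f0 * t + 0.5 * rate * t ** 2))
freqs, tfr = wigner_ville(chirp, n_freq=256)
ridge = freqs[np.argmax(tfr, axis=1)]             # frequency of the maximum at each time
```

Summing a row of `tfr` over frequency and dividing by `n_freq` returns $|v(t)|^2$, the discrete counterpart of the first marginal, and `ridge` follows `f0 + rate * t` away from the edges of the signal.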

Fig. 4. Measurement of vorticity by acoustic scattering [16]. Top: examples of recorded signals for two different scattered waves at the same time by the same volume: they both represent the same $\hat{\omega}_i(k, t)$ along time t. Bottom: quadratic time-frequency representation of one signal, exhibiting packets of structured vorticity advected through the measurement volume [16].

Measurements of Lagrangian velocity.

Recent experiments have been able to measure characteristics of Lagrangian velocity. Solid particles are released in a turbulent fluid, then tracked to record their Lagrangian velocities u(t) [47, 50]. One solution uses high-speed detectors to record the trajectories, and the second one relies on tracking by sonar methods. In both cases the experiment deals with a non-stationary signal that should be tracked in position and value along time. In the second experiment, ultrasonic waves are reflected by the particle and the Doppler effect captures its velocity. Figure 3 shows a sample experimental signal whose instantaneous frequency is the Lagrangian velocity. A time-frequency analysis follows the instantaneous frequency and thus u(t). Acceleration, velocity and trajectory are reconstructed from these data. The signals contain many oscillating events such as the one shown there, and many more trajectories which are almost smooth and ballistic between short periods of time with strong accelerations. This is consistent with the existence of a few swirling structures, but a clear connection between oscillations and intermittency has not been established. So far, statistical analysis of the data shows that Lagrangian velocity is intermittent [50], and this is well described by a multifractal model analogous to the one for Eulerian velocity [22].
Measurements of vortices and of vorticity.
Instead of trying to find indirect effects of the vortices, the intermittency of turbulence has been sought directly in vorticity. Measuring vorticity locally is difficult and so far not reliable. Using the sound scattering property of vorticity, an acoustic spectroscopy method was developed [16]. The method measures a time-resolved Fourier component of vorticity, $\hat{\omega}_i(k, t) = \int \omega_i(r, t)\, e^{-2i\pi k \cdot r}\, dr$, summed over some spatial volume. Figure 4 shows recorded signals of scattering amplitudes for two different incident waves; they look alike because both are measurements of the same quantity, $\hat{\omega}_i(k, t)$. The intermittency here is the existence of bursts of vorticity that cross the measurement volume; those packets are characteristic of some structuring of vorticity, which could be vortices. They are revealed in the time-frequency decomposition of one signal shown in Figure 4. The intermittency is well captured by the description of a slow non-stationary activity that drives many short-time bursts, and so causes multi-scale properties [62].
2.4 Mellin representation for self-similarity
Another signal processing method uses oscillating functions as basis functions: the Mellin transformation. Its interest is that it encompasses both self-similar and oscillating properties in one description. Because those tools are less well known, we will survey some of their properties in more mathematical detail.
Dilation and Mellin representation.
We aim at finding a formalism suited to scale invariance. Self-similarity is a statistical invariance under the action of dilations. Given an exponent H, the group $\{D_{H,\lambda}, \lambda \in \mathbb{R}^+_*\}$ is a continuous unitary representation of $(\mathbb{R}^+_*, \times)$ in the space $L^2(\mathbb{R}^+_*, t^{-2H-1}\, dt)$. The associated harmonic analysis is the Mellin representation. Indeed, the Hermitian generator of this group is C, defined as $2i\pi (CX)(t) = (-H + t\, d/dt) X(t)$, so that $D_{H,\lambda} = e^{2i\pi (\ln\lambda) C}$. The operator C characterizes a scale because its eigenfunctions are unaffected by scale changes (dilations), so the eigenvalues are a possible measure of scale. The eigenfunctions $E_{H,\beta}(t)$ satisfy $dE_{H,\beta}(t)/E_{H,\beta}(t) = (H + 2i\pi\beta)\, dt/t$, thus $E_{H,\beta}(t) = t^{H + 2i\pi\beta}$ up to a multiplicative constant. One obtains the basis of Mellin functions with associated representation:

$$(\mathcal{M}_H X)(\beta) = \int_0^{+\infty} t^{-H - 2i\pi\beta}\, X(t)\, \frac{dt}{t}, \qquad X(t) = \int_{-\infty}^{+\infty} E_{H,\beta}(t)\, (\mathcal{M}_H X)(\beta)\, d\beta. \qquad (9)$$

A signal processing view of several applications of the Mellin transform may be found in [20, 8, 29, 58], and mathematical aspects are documented in [24, 71]. Relevant features here are, first, that $\beta$ is a meaningful scale and, second, the oscillating aspects of the Mellin functions $E_{H,\beta}(t)$. Those functions are chirps of instantaneous frequency $\beta/t$. See a drawing of such a function in Figure 5. One can disregard the behavior of those functions near 0; the important feature is the chirp part, and it holds even if the function is filtered by some window, as seen in this figure. By this means we may describe both self-similarity and oscillations, as long as they can be well approximated by smoothed Mellin functions.
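
The Mellin transform of equation (9) can be approximated numerically by sampling t on a geometric grid, since dt/t becomes a uniform step in log t. The following sketch assumes a signal that decays quickly in log t so that a truncated grid is sufficient; the function name `mellin_transform`, the grid bounds and the log-Gaussian test signal are choices made for this illustration only.

```python
import numpy as np

def mellin_transform(X, H, beta, u_min=-12.0, u_max=12.0, n=4096):
    """Approximate (M_H X)(beta) = int_0^inf t^(-H - 2i*pi*beta) X(t) dt/t.

    With t = exp(u) one has dt/t = du, so the integral is a plain sum over a
    uniform grid in u (a geometric grid in t).
    """
    u = np.linspace(u_min, u_max, n)
    du = u[1] - u[0]
    t = np.exp(u)
    integrand = t ** (-H - 2j * np.pi * beta) * X(t)
    return np.sum(integrand) * du

# Hypothetical test signal X(t) = t^H exp(-(ln t)^2 / 2); its transform is the
# Fourier transform of a Gaussian: sqrt(2*pi) * exp(-2 * pi^2 * beta^2).
H = 0.5
X = lambda t: t ** H * np.exp(-0.5 * np.log(t) ** 2)
beta = 0.3
print(mellin_transform(X, H, beta), np.sqrt(2 * np.pi) * np.exp(-2 * np.pi ** 2 * beta ** 2))
```

The geometric sampling is the design choice that matters here: a uniform grid in t would badly undersample the fast oscillations of the chirp $t^{-2i\pi\beta}$ near the origin.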
Interpretation for self-similarity
When introducing self-similarity [44], J. Lamperti noticed a specific property of the invertible transformation $\mathcal{L}_H$, now called the Lamperti transformation and defined as:

$$(\mathcal{L}_H Y)(t) = t^H\, Y(\log t), \ t > 0; \qquad (\mathcal{L}_H^{-1} X)(t) = e^{-Ht}\, X(e^t), \ t \in \mathbb{R}. \qquad (10)$$

This transformation maps stationary processes onto self-similar processes, and conversely for its inverse. The Lamperti transformation is a unitary equivalence between the group of time shifts $S_\tau$ and the group of dilations $D_{H,\lambda}$:

$$\mathcal{L}_H^{-1} D_{H,\lambda} \mathcal{L}_H = S_{\log\lambda} \quad \text{and} \quad \mathcal{L}_H S_\tau \mathcal{L}_H^{-1} = D_{H,e^\tau}. \qquad (11)$$

This equivalence has interesting consequences: a natural representation of a self-similar process X is to use its stationary generator $\mathcal{L}_H^{-1} X$. Signal processing for stationary signals is a well-known field, and its methods can then be converted into tools for self-similar processes by applying equivalence (11) [27, 11]. In this context, the Mellin representation is suited to H-ss processes in the same way as the Fourier representation is suited to stationary processes, since $\mathcal{M}_H = \mathcal{F} \mathcal{L}_H^{-1}$:

$$(\mathcal{M}_H X)(\beta) = \int_0^{+\infty} t^{-H}\, X(t)\, t^{-2i\pi\beta - 1}\, dt \qquad (12)$$

$$= \int_{-\infty}^{+\infty} (\mathcal{L}_H^{-1} X)(u)\, e^{-2i\pi\beta u}\, du = (\mathcal{F} \mathcal{L}_H^{-1} X)(\beta). \qquad (13)$$
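
The equivalence (11) can be checked on ordinary functions. The sketch below represents each operator as a Python function acting on callables. It assumes the conventions $(S_\tau Y)(u) = Y(u + \tau)$ and $(D_{H,\lambda} X)(t) = \lambda^{-H} X(\lambda t)$, which are the ones consistent with equations (10) and (11); the function names and the test function are hypothetical.

```python
import math

def lamperti(Y, H):
    """(L_H Y)(t) = t^H Y(log t), for t > 0."""
    return lambda t: t ** H * Y(math.log(t))

def inverse_lamperti(X, H):
    """(L_H^{-1} X)(u) = exp(-H u) X(exp(u)), for real u."""
    return lambda u: math.exp(-H * u) * X(math.exp(u))

def shift(Y, tau):
    """Assumed convention: (S_tau Y)(u) = Y(u + tau)."""
    return lambda u: Y(u + tau)

def dilation(X, H, lam):
    """Assumed convention: (D_{H,lam} X)(t) = lam^(-H) X(lam t)."""
    return lambda t: lam ** (-H) * X(lam * t)

H, lam, tau = 0.3, 2.5, 0.8
Y = math.cos                      # any function of a real variable
X = lambda t: t ** H * math.sin(3.0 * math.log(t))

# First identity of (11): L_H^{-1} D_{H,lam} L_H = S_{log lam}
lhs_1 = inverse_lamperti(dilation(lamperti(Y, H), H, lam), H)
rhs_1 = shift(Y, math.log(lam))

# Second identity of (11): L_H S_tau L_H^{-1} = D_{H, exp(tau)}
lhs_2 = lamperti(shift(inverse_lamperti(X, H), tau), H)
rhs_2 = dilation(X, H, math.exp(tau))

print(lhs_1(0.7), rhs_1(0.7))
print(lhs_2(1.9), rhs_2(1.9))
```

Composing the two transformations in either order gives back the original function, which is the invertibility stated with equation (10).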

Canonical spectral analysis of self-similar processes.

An H-ss process X(t) has a covariance that necessarily reads:

$$R_X(t, s) = E\{X(t) X(s)\} = (ts)^H\, c_X(t/s).$$

This comes from the correlation function $\gamma_Y(\tau)$ of its stationary generator $Y = \mathcal{L}_H^{-1} X$, with $\gamma_Y(\log k) = c_X(k)$. The Mellin spectral density $\Xi_X(\beta)$ of X is then simply introduced by means of the spectrum of Y:

$$\Gamma_Y(\beta) = \int_{-\infty}^{+\infty} \gamma_Y(\tau)\, e^{-2i\pi\beta\tau}\, d\tau = \int_0^{+\infty} c_X(k)\, k^{-2i\pi\beta - 1}\, dk = (\mathcal{M}_0 c_X)(\beta) = \Xi_X(\beta). \qquad (14)$$

H-ss processes also admit a harmonizable decomposition on the Mellin basis, so that $X(t) = \int t^{H + 2i\pi\beta}\, dX(\beta)$, with uncorrelated spectral increments $dX(\beta)$. Thus we have $E\{dX(\beta_1)\, dX(\beta_2)\} = \delta(\beta_1 - \beta_2)\, \Xi_X(\beta_1)\, d\beta_1\, d\beta_2$.
Among the tools coming from the Lamperti equivalence are scale-invariant filters. A linear operator G is invariant under dilations if it satisfies $G D_{H,\lambda} = D_{H,\lambda} G$ for any scale ratio $\lambda \in \mathbb{R}^+_*$. Using equation (11), we may replace $D_{H,\lambda}$ by $S_{\log\lambda}$ and we obtain the equality:

$$(\mathcal{L}_H^{-1} G \mathcal{L}_H)\, S_{\log\lambda} = S_{\log\lambda}\, (\mathcal{L}_H^{-1} G \mathcal{L}_H).$$

Thus, $\mathcal{L}_H^{-1} G \mathcal{L}_H = \mathcal{H}$ is a linear stationary operator, so it acts as a filter by means of a convolution. The Lamperti transformation maps addition onto multiplication, so that G acts by means of a multiplicative convolution instead of the usual one:

$$(GX)(t) = \int_0^{+\infty} g(t/s)\, X(s)\, \frac{ds}{s} = \int_0^{+\infty} g(s)\, X(t/s)\, \frac{ds}{s}. \qquad (15)$$

Let us consider A = GX, with {X(t), t > 0} an H-ss process and G a scale-invariant filter. Then A(t) is also self-similar because

$$D_{H,\lambda} A = D_{H,\lambda} G X = G\, (D_{H,\lambda} X) \stackrel{d}{=} G X = A.$$

This filter acts on the Mellin spectrum as a multiplication:

$$\Xi_A(\beta) = |(\mathcal{M}_H g)(\beta)|^2\, \Xi_X(\beta).$$
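
The multiplicative convolution (15) and its action on the Mellin basis can be illustrated numerically: applying G to a Mellin function $E_{H,\beta}(t) = t^{H + 2i\pi\beta}$ returns the same function multiplied by $(\mathcal{M}_H g)(\beta)$, which is why the filter acts on the Mellin spectrum by multiplication. In this sketch the name `scale_invariant_filter`, the grid bounds and the log-Gaussian impulse response g are assumptions chosen so that the gain is known in closed form.

```python
import numpy as np

def scale_invariant_filter(g, X, t, u_min=-12.0, u_max=12.0, n=4096):
    """Approximate (G X)(t) = int_0^inf g(s) X(t/s) ds/s on a geometric grid in s."""
    u = np.linspace(u_min, u_max, n)
    du = u[1] - u[0]
    s = np.exp(u)                      # ds/s = du
    return np.sum(g(s) * X(t / s)) * du

H, beta = 0.4, 0.25
g = lambda s: s ** H * np.exp(-0.5 * np.log(s) ** 2)     # hypothetical impulse response
mellin_function = lambda t: t ** (H + 2j * np.pi * beta)  # E_{H,beta}(t)

# For this g, (M_H g)(beta) = sqrt(2*pi) * exp(-2 * pi^2 * beta^2).
gain = np.sqrt(2 * np.pi) * np.exp(-2 * np.pi ** 2 * beta ** 2)

t0 = 2.0
print(scale_invariant_filter(g, mellin_function, t0), gain * mellin_function(t0))
```

The two printed values agree up to the quadrature error, for any t0 > 0.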


By means of the Bochner theorem, any H-ss process may be represented as the output of a scale-invariant linear system:

$$X(t) = \int_0^{+\infty} g(t/s)\, V(s)\, \frac{ds}{s}, \quad \text{with } E\{V(t) V(s)\} = \sigma^2\, t^{2H+1}\, \delta(t - s). \qquad (16)$$

The random noise V(t) is white and Gaussian but non-stationary; it is the image by $\mathcal{L}_H$ of the Wiener process. The self-similar process X is defined by g; its second-order properties are the covariance, given by means of

$$c_X(k) = \sigma^2\, k^{-H} \int_0^{+\infty} g(k\theta)\, g(\theta)\, \theta^{-2H-1}\, d\theta,$$

and the Mellin spectrum, which is $\Xi_X(\beta) = \sigma^2\, |(\mathcal{M}_H g)(\beta)|^2$. Models of this kind were studied in [70, 57].
Other methods are derived in the same way. For instance, time-frequency methods that were suited to measure jointly time and frequency components of a signal will be converted into time-Mellin scale representations that measure contents as a joint function of time and Mellin scale.
Examples of self-similar processes.
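A fractional Brownian motion $B_H$ is defined as an H-ss process with Gaussian stationary increments [55]. Its covariance is necessarily:

$$R_{B_H}(t, s) = \sigma^2 \left( |t|^{2H} + |s|^{2H} - |t - s|^{2H} \right)/2,$$

which satisfies the general expected structure with

$$c_{B_H}(k) = \sigma^2 \left[ k^H + k^{-H} - |\sqrt{k} - 1/\sqrt{k}|^{2H} \right]/2.$$

The corresponding Mellin spectrum is obtained by a straightforward calculation ($\Gamma$ is the Euler Gamma function):

$$\Xi_{B_H}(\beta) = \frac{\sigma^2}{H^2 + 4\pi^2\beta^2} \left| \frac{\Gamma(1/2 + 2i\pi\beta)}{\Gamma(H + 2i\pi\beta)} \right|^2 \qquad (17)$$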

Here, we have a representation of fractional Brownian motion alternative to its harmonic or moving-average representations [67]. From this spectral representation, one can synthesize exact samples of fractional Brownian motion: it is enough to prescribe Mellin spectral increments satisfying equation (17) with random i.i.d. phases in $[0, 2\pi[$. An inverse Mellin transform then gives a fractional Brownian motion. Classical methods of whitening, prediction and interpolation for this process were derived from this Mellin representation in [60, 61]. Extensions of the synthesis method from the Mellin spectrum to other self-similar processes without stationary increments were also studied in [17].
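
As a quick numerical check of the covariance structure of fractional Brownian motion given above, the sketch below verifies that $(ts)^H c_{B_H}(t/s)$ reproduces $R_{B_H}(t, s)$, and that H = 1/2 gives back the covariance min(t, s) of ordinary Brownian motion. The function names and the sample points are chosen for this illustration only.

```python
import math

def fbm_covariance(t, s, H, sigma=1.0):
    """R(t, s) = sigma^2 (|t|^2H + |s|^2H - |t - s|^2H) / 2."""
    return 0.5 * sigma ** 2 * (abs(t) ** (2 * H) + abs(s) ** (2 * H) - abs(t - s) ** (2 * H))

def c_fbm(k, H, sigma=1.0):
    """c(k) = sigma^2 [k^H + k^-H - |sqrt(k) - 1/sqrt(k)|^2H] / 2, for k > 0."""
    return 0.5 * sigma ** 2 * (k ** H + k ** (-H) - abs(math.sqrt(k) - 1.0 / math.sqrt(k)) ** (2 * H))

H, t, s = 0.3, 3.0, 0.7
print(fbm_covariance(t, s, H), (t * s) ** H * c_fbm(t / s, H))   # the two values agree
print(fbm_covariance(t, s, 0.5), min(t, s))                      # Brownian case
```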

Fig. 5. Left: Mellin functions with various H (panel titles: β = 6.4, H = 0.5; β = 12, H = 0.5), and spectrogram of one smoothed Mellin function (where g(t) is a Kaiser window) that shows the instantaneous frequency path, i.e. the chirp behavior of the Mellin functions. Middle: samples of Weierstrass-Mandelbrot functions, both deterministic and random (H = 0.3, λ = 1.07). Right: spectrogram of the empirical variogram of a Weierstrass-Mandelbrot function (adapted from [26]). Spectrograms are computed here using reassignment techniques for time-frequency distributions [2].

The random Weierstrass-Mandelbrot function is a good model of inexact self-similarity that can be studied by means of a Mellin decomposition. It is a step towards properties closer to turbulence than pure self-similarity. It is defined [12] as $W(t) = \sum_{n \in \mathbb{Z}} \lambda^{-nH} (1 - e^{i\lambda^n t})\, e^{i\varphi_n}$, with i.i.d. phases $\varphi_n$. The function is given here as a sum of Fourier modes. This is possible since it has stationary increments. But another feature is more obvious if one considers its decomposition on a Mellin basis, namely its scale invariance. W(t) has Discrete Scale Invariance [11] because $W(\lambda^k t) \stackrel{d}{=} \lambda^{kH} W(t)$, i.e. scale invariance for dilations with a scale ratio that is a power of $\lambda$ only. Using $\mathcal{L}_H$, one can find the Mellin representation for the deterministic version of the function, with $\varphi_n = 0$ [12, 26]:

$$W(t) = \sum_{m \in \mathbb{Z}} \frac{-\Gamma(-H - 2i\pi m/\ln\lambda)}{\ln\lambda}\, e^{-i\pi (H + 2i\pi m/\ln\lambda)/2}\, E_{H, m/\ln\lambda}(t).$$


The two expressions of W(t) are its time-frequency representation and its time-Mellin scale representation. Both methods of analysis are valid as tools to assess the characteristics of the function. The relevance comes from the joint properties of stationary increments and self-similarity (even in the weakened sense of Discrete Scale Invariance). A time-frequency analysis illustrates this, see Figure 5. Deterministic and randomized versions of W(t) have a spectrogram (from the detrended empirical variogram) that is made partly of pure tones and partly of chirps, which are localized on the Mellin modes $\beta = m/\ln\lambda$. Here both aspects are shown, depending on the width of the smoothing window with respect to the rate of variation of the chirp (one sees the chirp when its frequency does not change quickly over the length of the window) [26].
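
The Discrete Scale Invariance of the deterministic Weierstrass-Mandelbrot function can be observed on a truncated version of the sum of Fourier modes. The sketch below is an illustration under stated assumptions: the sum is truncated to |n| <= n_max, all phases are set to zero, and the scale ratio is taken as λ = 2 (not the value 1.07 of Figure 5) so that the products $\lambda^n t$ are exact in floating point and the series converges quickly. The function name is hypothetical.

```python
import numpy as np

def weierstrass_mandelbrot(t, H, lam, n_max, phases=None):
    """Truncated sum over n = -n_max..n_max of lam^(-nH) (1 - exp(i lam^n t)) exp(i phi_n)."""
    n = np.arange(-n_max, n_max + 1)
    if phases is None:
        phases = np.zeros(n.size)          # deterministic version, phi_n = 0
    scales = float(lam) ** n               # lam^n for every retained n
    terms = scales ** (-H) * (1.0 - np.exp(1j * scales * t)) * np.exp(1j * phases)
    return terms.sum()

H, lam, n_max = 0.3, 2.0, 150
t0 = 0.37
w_t = weierstrass_mandelbrot(t0, H, lam, n_max)
w_lam_t = weierstrass_mandelbrot(lam * t0, H, lam, n_max)
print(w_lam_t, lam ** H * w_t)   # equal up to the two terms lost by the truncation
```

For a dilation by a factor that is not a power of λ the two values differ, which is what distinguishes discrete from full scale invariance.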

Concluding remarks.
We have presented here a signal processing view of turbulence. We have surveyed how the complexity of turbulence, and the need to understand various models and experiments, is linked to a great diversity of signal processing methods that are useful for turbulence: time-scale analysis, time-frequency analysis, self-similarity and Mellin analysis, and geometrical characterizations.
Concerning the last point, we are far from having at our disposal convenient tools for estimating the geometry (fractal sets, oscillations, ...) of a self-similar process. We have proposed here a framework adapted to self-similarity and based on the oscillating Mellin functions $t^{h + 2i\pi\beta}$, but a tractable extension to oscillating singularities of the form $|t - t_0|^{h + 2i\pi\beta}$ is yet to be found. To be relevant for turbulence, the central point $t_0$ of the singularity has to be a variable, whereas the Lamperti framework is for a fixed central time, $t_0 = 0$, of the Mellin functions. Consequently, though a mixture of oscillating functions such as $|t - t_0|^{h + 2i\pi\beta}$ may have multifractal properties close to the ones measured in turbulence, one lacks signal processing tools to invert the mixture and estimate the various parameters $(t_0, h, \beta)$ of each object.
Finally, turbulence is an active, challenging and open field with many problems that are interesting from a mathematical, physical or signal processing point of view. This is a subject where one needs to establish fruitful interactions between models, tools of analysis and experimental measurements.
Thanks.
I would like to thank the people who helped me through their competence and their
willingness to share their knowledge and ideas. Many thanks thus to Olivier
Michel, Patrick Flandrin and Pierre-Olivier Amblard, with whom I have the
pleasure to work. I am also thankful to C. Baudet, B. Castaing, L. Chevillard,
N. Mordant, J.F. Pinton, and J.C. Vassilicos.


References
1. A. Arneodo, E. Bacry, S. Jaffard, J.F. Muzy. Singularity spectrum of multifractal functions involving oscillating singularities. J. Four. Anal. Appl., 4(2):159-174, 1998.
2. F. Auger, P. Flandrin. Improving the readability of time-frequency and time-scale representations by reassignment methods. IEEE Trans. on Signal Proc., SP-43(5):1068-1089, 1995.
3. P. Abry, P. Flandrin, M. Taqqu, D. Veitch. Wavelets for the analysis, estimation, and synthesis of scaling data. In K. Park and W. Willinger, editors, Self-Similar Network Traffic and Performance Evaluation. Wiley, 2000.
4. A. Arneodo, E. Bacry, J.F. Muzy. The thermodynamics of fractals revisited with wavelets. Physica A, 213:232-275, 1995.
5. A. Arneodo, J.F. Muzy, S. Roux. Experimental analysis of self-similarity and random cascade processes: applications to fully developed turbulence data. J. Phys. France II, 7:363-370, 1997.
6. G. Barenblatt. Scaling, self-similarity, and intermediate asymptotics. CUP, Cambridge, 1996.
7. G.K. Batchelor. The theory of homogeneous turbulence. Cambridge University Press, 1953.
8. J. Bertrand, P. Bertrand, J.P. Ovarlez. The Mellin transform. In A.D. Poularikas, editor, The Transforms and Applications Handbook. CRC Press, 1996.
9. R. Benzi, S. Ciliberto, R. Tripiccione, C. Baudet, F. Massaioli. Extended self-similarity in turbulent flows. Phys. Rev. E, 48:R29-R32, 1993.
10. J. Beran. Statistics for Long-memory processes. Chapman & Hall, New York, 1994.
11. P. Borgnat, P. Flandrin, P.-O. Amblard. Stochastic discrete scale invariance. Signal Processing Lett., 9(6):181-184, June 2002.
12. M. Berry, Z. Lewis. On the Weierstrass-Mandelbrot fractal function. Proc. Roy. Soc. Lond. A, 370:459-484, 1980.
13. A. Blanc-Lapierre, R. Fortet. Théorie des fonctions aléatoires. Masson, Paris, 1953.
14. J. Barral, B. Mandelbrot. Multifractal products of cylindrical pulses. Probab. Theory Relat. Fields, 124:409-430, 2002.
15. E. Bacry, J.F. Muzy. Log-infinitely divisible multifractal processes. Comm. in Math. Phys., 236:449-475, 2003.
16. C. Baudet, O. Michel, W. Williams. Detection of coherent vorticity structures using time-scale resolved acoustic spectroscopy. Physica D, 128:1-17, 1999.
17. P. Borgnat. Modèles et outils pour les invariances d'échelle brisée : variations sur la transformation de Lamperti et contributions aux modèles statistiques de vortex en turbulence. PhD thesis, École normale supérieure de Lyon, November 2002.
18. B. Castaing. Turbulence: Statistical approach. In B. Dubrulle, F. Graner, and D. Sornette, editors, Scale Invariance and Beyond, pages 225-234. Springer, 1997.
19. G. Chavarria, C. Baudet, S. Ciliberto. Hierarchy of the energy dissipation moments in fully developed turbulence. Phys. Rev. Lett., 74:1986-1989, 1995.
20. L. Cohen. The scale representation. IEEE Trans. on Signal Proc., 41(12):3275-3292, December 1993.


21. P. Chainais, R. Riedi, P. Abry. On non scale invariant infinitely divisible cascades. To appear in IEEE Trans. on Info. Theory, 2004.
22. L. Chevillard, S. Roux, E. Lévêque, N. Mordant, J.F. Pinton, A. Arneodo. Lagrangian velocity statistics in turbulent flows: Effects of dissipation. Phys. Rev. Lett., 91:214502, 2003.
23. A. Chhabra, K. Sreenivasan. Scale-invariant multiplier distributions in turbulence. Phys. Rev. Lett., 68(18):2762-2765, 1992.
24. B. Davies. Integral transforms and their applications. Springer-Verlag, New York, 1985.
25. S. Douady, Y. Couder, M.E. Brachet. Direct observation of the intermittency of intense vorticity filaments in turbulence. Phys. Rev. Lett., 67:983-986, 1991.
26. P. Flandrin, P. Borgnat. On the chirp decomposition of Weierstrass-Mandelbrot functions, and their time-frequency interpretation. Applied and Computational Harmonic Analysis, 15:134-146, September 2003.
27. P. Flandrin, P. Borgnat, P.-O. Amblard. From stationarity to self-similarity, and back: Variations on the Lamperti transformation. In G. Rangarajan and M. Ding, editors, Processes with Long-Range Correlations: Theory and Applications, volume 621 of Lecture Notes in Physics, pages 88-117. Springer-Verlag, June 2003.
28. P. Flandrin. Temps-Fréquence (1ère éd.). Hermès, 1993.
29. P. Flandrin. Inequalities in Mellin-Fourier signal analysis. Newton Institute Preprint NI98030-NSP, Cambridge, UK, November 1998.
30. U. Frisch, R. Morf. Intermittency in nonlinear dynamics and singularities at complex times. Phys. Rev. A, 23(5):2673-2705, May 1981.
31. U. Frisch, M. Mineev-Weinstein. Extension of the pole decomposition for the multidimensional Burgers equation. Phys. Rev. E, 67:067301, 2003.
32. U. Frisch, G. Parisi. On the singularity structure of fully developed turbulence. In M. Ghil, R. Benzi, and G. Parisi, editors, Proc. of Int. School of Phys. on Turbulence and predictability in geophysical fluid dynamics, pages 84-87, Amsterdam, 1985. North-Holland.
33. U. Frisch. Turbulence. CUP, Cambridge, 1995.
34. U. Frisch, M. Vergassola. A prediction of the multifractal model: the intermediate dissipative range. Europhys. Lett., 14:429, 1991.
35. Y. Gagne. Étude expérimentale de l'intermittence et des singularités dans le plan complexe en turbulence développée. PhD thesis, INP Grenoble, 1987.
36. J.C.R. Hunt, N.K.-R. Kevlahan, J.C. Vassilicos, M. Farge. Wavelets, fractals and Fourier transforms: detection and analysis of structures. In M. Farge, J.C.R. Hunt, and J.C. Vassilicos, editors, Wavelets, Fractals, and Fourier Transforms, pages 1-38. Oxford: Clarendon Press, 1993.
37. S. Jaffard. Multifractal formalism for functions, part 1 and 2. SIAM J. of Math. Anal., 28(4):944-998, 1997.
38. S. Jaffard. On the Frisch-Parisi conjecture. J. Math. Pures Appl., 79(6):525-552, 2000.
39. S. Jaffard. Wavelet techniques in multifractal analysis. Proceedings of Symposia in Pure Mathematics, 2004.
40. A.N. Kolmogorov. The local structure of turbulence in incompressible viscous fluid for very large Reynolds numbers. Dokl. Akad. Nauk SSSR, 30:9-13, 1941.


41. A.N. Kolmogorov. On degeneration of isotropic turbulence in an incompressible viscous liquid. Dokl. Akad. Nauk SSSR, 31:538-540, 1941.
42. J.-P. Kahane, J. Peyrière. Sur certaines martingales de Benoit Mandelbrot. Adv. Math., 22:131-145, 1976.
43. N. Kevlahan, J.C. Vassilicos. The space and scale dependencies of the self-similar structure of turbulence. Proc. R. Soc. Lond. A, 447:341-363, 1994.
44. J. Lamperti. Semi-stable stochastic processes. Trans. Amer. Math. Soc., 104:62-78, 1962.
45. B. Lashermes. PhD thesis, ENS Lyon, in preparation, 2004.
46. T.S. Lundgren. Strained spiral vortex model for turbulent fine structure. Phys. Fluids, 25(12):2193-2203, 1982.
47. A. La Porta, G.A. Voth, A.M. Crawford, J. Alexander, E. Bodenschatz. Fluid particle accelerations in fully developed turbulence. Nature, 409:1017-1019, 2001.
48. S. Mallat. A Wavelet tour of signal processing. Academic Press, 1999.
49. Y. Meyer. Ondelettes et opérateurs. Hermann, 1990.
50. N. Mordant, P. Metz, O. Michel, J.F. Pinton. Scaling and intermittency of Lagrangian velocity in fully developed turbulence. Phys. Rev. Lett., 87:21-24, 2001.
51. H. Moffatt. Simple topological aspects of turbulent velocity dynamics. In T. Tatsumi, editor, Turbulence and chaotic phenomena in fluids, pages 223-230. Elsevier, 1984.
52. H. Moffatt. Spiral structures in turbulent flows. In M. Farge, J.C.R. Hunt, and J.C. Vassilicos, editors, Wavelets, Fractals, and Fourier Transforms, pages 317-324. Oxford: Clarendon Press, 1993.
53. C. Meneveau, K.R. Sreenivasan. The multifractal nature of turbulent energy-dissipation. J. Fluid Mechanics, 224:429-484, March 1991.
54. J. Mathieu, J. Scott. An introduction to Turbulent Flow. Cambridge University Press, 2000.
55. B. Mandelbrot, J. W. Van Ness. Fractional Brownian motions, fractional Brownian noises and applications. SIAM review, 10:422-437, 1968.
56. A.S. Monin, A.S. Yaglom. Statistical fluid mechanics (vol. 1 and 2). The MIT Press, 1971.
57. E. Noret, M. Guglielmi. Modélisation et synthèse d'une classe de signaux auto-similaires et à mémoire longue. In Proc. Conf. Delft (NL): Fractals in Engineering, pages 301-315. INRIA, 1999.
58. J.M. Nicolas. Introduction aux statistiques de deuxième espèce : Applications aux lois d'images SAR. Rapport interne, ENST, Paris, February 2002.
59. E.A. Novikov. Intermittency and scale-similarity in the structure of a turbulent flow. P.M.M. Appl. Math. Mech., 45:231-241, 1971.
60. C. Nuzman, V. Poor. Transformed spectral analysis of self-similar processes. In Proc. CISS'99, May 1999.
61. C. Nuzman, V. Poor. Linear estimation of self-similar processes via Lamperti's transformation. J. of Applied Probability, 37(2):429-452, June 2000.
62. C. Poulain, N. Mazellier, P. Gervais, Y. Gagne, C. Baudet. Lagrangian vorticity and velocity measurements in turbulent jets. Flow, Turbulence, and Combustion, 72(2-4):245-271, 2004.
63. G. Pedrizzetti, E. Novikov, A. Praskovsky. Self-similarity and probability distributions of turbulent intermittency. Physical Review E, 53(1):475-484, 1996.


64. S. Pietropinto, C. Poulain, C. Baudet, B. Castaing, B. Chabaud, Y. Gagne, B. Hébral, Y. Ladam, P. Lebrun, O. Pirotte, P. Roche. Superconducting instrumentation for high Reynolds turbulence experiments with low temperature gaseous helium. Physica C, 386:512-516, 2003.
65. G. Paladin, A. Vulpiani. Degrees of freedom of turbulence. Phys. Rev. A, 35:1971, 1987.
66. P.G. Saffman, D.I. Pullin. Calculation of velocity structure functions for vortex models of isotropic turbulence. Phys. Fluids, 8(11):3072-3077, 1996.
67. G. Samorodnitsky, M. Taqqu. Stable Non-Gaussian Random Processes. Chapman & Hall, 1994.
68. W. Vervaat. Properties of general self-similar processes. Bull. of International Statistical Inst., 52:199-216, 1987.
69. E. Waymire, S. Williams. A general decomposition theory for random cascades. Bull. Amer. Math. Soc., 31:216-222, 1994.
70. B. Yazici, R. L. Kashyap. A class of second-order stationary self-similar processes for 1/f phenomena. IEEE Trans. on Signal Proc., 45(2):396-410, 1997.
71. A. Zemanian. Generalized integral transforms. Dover, 1987.

Control of Interferometric Gravitational Wave Detectors

François Bondu and Jean-Yves Vinet
Laboratoire Artemis
Observatoire de la Côte d'Azur,
BP 4229, 06304 Nice (France).
Francois.Bondu@obs-nice.fr
vinet@obs-nice.fr

1 Introduction
Interferometric gravitational wave detectors are promising instruments to make the first direct detection of gravitational waves, and later to permanently open a new window on the universe [1, 2, 3, 4]. A detector like Virgo aims at observing signals in the 10 Hz - 10 kHz band. The detectors have very stringent noise requirements, which lead to challenging designs of control loops.
It was directly observed in 1919 that gravity has some effect on light propagation, and in particular is able to bend light rays near massive objects like the sun. This effect was predicted by A. Einstein as a consequence of General Relativity, a relativistic theory of gravitation describing gravitational fields as the geometry of space-time. Non-static gravitational fields can, in this theory, have some time-variable effects on space-time, and be considered as gravitational waves. Highly energetic astrophysical events are expected to produce such waves, the observation of which would be of the highest interest for our understanding of the Universe. These waves are predicted, in all theoretical studies, to be very weak (analogous, in an interferometric length measurement, to a distortion of one interferometer arm $\Delta L/L \sim 10^{-22}$ in the best cases).

2 Interferometers
Interferometric detection of gravitational waves amounts to continuously measuring the length difference between two orthogonal paths. The right topology
of the instrument is that of a Michelson interferometer [5]. Virgo is the nearest (Pisa, Italy) example of such an interferometric detector of gravitational
waves. It consists essentially of large mirrors suspended by wires in a vacuum,
and of light beams partially reflected and/or transmitted by these mirrors
over long distances. The right sensitivity is reached by two steps of light
J.-D. Fournier et al. (Eds.): Harm. Analysis and Ratio. Approx., LNCIS 327, pp. 303-311, 2006.
Springer-Verlag Berlin Heidelberg 2006


Fig. 1. Principles of an interferometer to detect gravitational waves. A laser beam
lights a Michelson interferometer. The mirror suspensions are efficient enough to
filter out the seismic noise in the detection band. Resonant Fabry-Perot cavities in
the arms enhance the interferometer sensitivity.

Fig. 2. Aerial view of the Virgo interferometer


power build-up: one step is the resonance caused by the so-called recycling
mirror, a second one is due to the use of Fabry-Perot cavities in each arm.
Recycling has the effect of increasing the light power; the Fabry-Perot cavities have the
effect of increasing the effective lengths of the arms.

3 Servo systems
3.1 Introduction
To make the complex optical structure of an interferometer work, an ensemble
of servo systems is needed in order to lock the resonant cavities and the laser's
frequency at the right place. The design of the open loop transfer functions
has to make a trade-off between large gain for frequencies below 1 Hz (where
seismic noise is large), stability of the system, and large attenuation of loop
gain above 10 Hz, where the aim is to detect gravitational waves. This is the
reason why sophisticated servo loops have been studied for several years.
An interferometer as a complex optical structure
An interferometer to detect gravitational waves is, today, typically made up
of 6 main mirrors. Each arm of the interferometer contains a long resonant
optical cavity, and an additional mirror, in front of the interferometer, helps
to build up the light power.
[Figure: schematic with labels laser, PR, splitter, IMX, EMX (cavity X), IMY, EMY (cavity Y), and distances l0, ly]

Fig. 3. Main distances of an interferometer to detect gravitational waves

Here we will simplify the analysis to the displacements of the mirrors along
the optical axes.


The main output of the interferometer, the so-called dark fringe, sensitive
to gravitational waves, measures the variations of the difference of the two
cavity lengths $L_x - L_y$; this is the differential mode of the interferometer (the
static difference is close to zero). In order for the cavities to work close to
their optimal point, both $L_x$ and $L_y$ have to be controlled at the picometer
level.
The common mode of the interferometer, $L_x + L_y$, can be used to control
the laser frequency fluctuations.
The short Michelson difference, $l_x - l_y$, and the recycling cavity length,
$l_0 + (l_x + l_y)/2$, should be controlled at the picometer level as well.
The error signals for these controls are provided by additional outputs of
the interferometer (ports other than the dark fringe), and actuation is done
by means of currents in coils facing magnets glued on the mirrors.
Of these four lengths, only the interferometer differential mode can be
monitored with a high signal-to-noise ratio. Thus, for the other lengths, the
unity gain frequency and the loop gain should be designed so that the loop does not introduce
noise on the dark fringe.
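As an illustration, the four controlled lengths can be computed from the individual distances of Fig. 3. The sketch below is not part of any Virgo software: the class and function names are hypothetical, and it assumes that l0 is the distance from the recycling mirror to the beam splitter and that lx, ly are the distances from the beam splitter to the input mirrors.

```python
from dataclasses import dataclass


@dataclass
class Distances:
    """Distances along the optical axes, in metres (hypothetical container)."""
    Lx: float  # length of the long cavity X
    Ly: float  # length of the long cavity Y
    lx: float  # beam splitter to input mirror IMX (assumed meaning)
    ly: float  # beam splitter to input mirror IMY (assumed meaning)
    l0: float  # recycling mirror PR to beam splitter (assumed meaning)


def controlled_lengths(d: Distances) -> dict:
    """Return the four longitudinal degrees of freedom named in the text."""
    return {
        "differential": d.Lx - d.Ly,            # seen on the dark fringe
        "common": d.Lx + d.Ly,                  # used for the laser frequency
        "michelson": d.lx - d.ly,               # short Michelson difference
        "recycling": d.l0 + (d.lx + d.ly) / 2,  # recycling cavity length
    }
```

Only the first of these four combinations is read out with a high signal-to-noise ratio; the other three are the ones whose loops must stay quiet in the detection band.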
Seismic noise and the suspensions
The spectrum of the seismic noise, in the 10 Hz - 10 kHz bandwidth where
one expects to detect gravitational waves, is orders of magnitude higher than
what is acceptable for detecting gravitational waves. Thus, seismic noise isolators
are necessary [6, 7].
The suspensions act as low-pass filters: at the test mass
mirror level, the seismic noise is strongly attenuated for frequencies above 10 Hz.
But the seismic noise is amplified on resonances (several resonances in the 0.1
Hz - few Hz band), and is still quite high at low frequencies. As a result, the
free suspended mirror motion is about 1 μm on a 1 second timescale. There
is a need for a loop gain of $10^6$ below 1 Hz. When controlling the degrees
of freedom other than the differential one, the loop gain should be very low
for frequencies above 10 Hz, in order not to re-inject the error signal noise.
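To illustrate the low-pass behaviour, the sketch below evaluates a single-stage pendulum model, $H(s) = \omega_0^2/(s^2 + (\omega_0/Q)s + \omega_0^2)$. This is a simplifying assumption: the real suspensions have several stages and several resonances. The resonance frequency of 0.6 Hz is the one of the simplified suspension of Fig. 4, and the quality factor Q = 100 is an arbitrary illustrative value.

```python
import numpy as np


def pendulum_response(f, f0=0.6, q=100.0):
    """Single-stage pendulum model (assumption), evaluated at s = 2*pi*i*f."""
    s = 2j * np.pi * np.asarray(f, dtype=float)
    w0 = 2 * np.pi * f0
    return w0**2 / (s**2 + (w0 / q) * s + w0**2)


# Magnitude well below the resonance, on it, and in the detection band.
freqs = np.array([0.01, 0.6, 10.0, 100.0])
gains = np.abs(pendulum_response(freqs))
```

With these values the response is close to 1 well below the resonance, equal to Q on the resonance, and roughly $(f_0/f)^2 \approx 3.6 \times 10^{-3}$ at 10 Hz, which shows both the amplification on resonance and the attenuation in the detection band.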
Laser frequency stabilization
A similar feedback loop issue exists for the laser frequency stabilization [8].
The Virgo instrument requires a very stable laser frequency, with a relative
level as low as $10^{-20}/\sqrt{\mathrm{Hz}}$ in the 10 Hz - 10 kHz band; the frequency also has
to be reasonably stable for frequencies below 10 Hz, so that the Fabry-Perot cavities are kept resonant. The stability is ensured by the quality of the
reference oscillator.
The laser frequency in the 10 Hz - 10 kHz band is locked on the common
mode of the two long Fabry-Perot cavities. The seismic isolation ensures that
this level of stability can be reached.


The low frequency stability is defined by locking the common mode to a
rigid and very stable cavity, manufactured in a material with a low thermal
expansion coefficient. The spectral resolution of such a cavity is about 4 orders
of magnitude higher than that of the long Fabry-Perot cavities. Therefore, the
lock of the common mode of the long arms to the short cavity should have a
negligible action at 10 Hz.
Other constraints
Each mirror of the interferometer also has to be controlled in its angular
degrees of freedom, with constraints similar to, although less stringent than,
the longitudinal ones.
3.2 Mathematical requirements on open loop transfer function
The noise requirements on the mirror motions are very strong, and impose
the design of efficient corrector filters for the various feedback loops.
Model of servo loops
All servo loops are defined by their frequency domain open loop transfer function. This makes it possible to impose constraints on the gain at various frequencies,
which is a fundamental issue in the control of the locking point of the
resonant cavities, in order not to re-inject the noise of the error signals.
The open loop transfer function G is a complex function of the Laplace
variable s, or, equivalently, of a Fourier frequency f. This function is the
product of the various elements of the loop: the system to be corrected, including the actuator transfer functions; the error signal transfer function, related
to the optical properties; the corrector filter, whose design must satisfy the
requirements for the open loop transfer function.
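The product structure of G can be sketched numerically as follows. The helper names (`rational`, `open_loop`) and all numerical values are hypothetical and chosen only for illustration; each element is represented by polynomial coefficients in s and evaluated on the imaginary axis.

```python
import numpy as np


def rational(num, den):
    """Return H(s) = num(s)/den(s); coefficients are highest power first."""
    return lambda s: np.polyval(num, s) / np.polyval(den, s)


def open_loop(f, *elements):
    """Multiply the loop elements, evaluated at s = 2*pi*i*f."""
    s = 2j * np.pi * np.asarray(f, dtype=float)
    g = np.ones_like(s)
    for h in elements:
        g = g * h(s)
    return g


# Hypothetical loop elements (illustrative values only).
w0 = 2 * np.pi * 0.6
system = rational([w0**2], [1.0, w0 / 100.0, w0**2])  # pendulum-like system
error_signal = rational([1.0], [1.0])                 # flat optical response
corrector = rational([50.0], [1.0, 0.0])              # integrator with gain

G = open_loop([0.1, 1.0, 10.0], system, error_signal, corrector)
```

Keeping the elements separate makes it easy to swap the corrector filter while the modeled system and error signal responses stay fixed.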
System identification
The error signal transfer function and the actuator transfer function can be
easily approximated by rational functions of the variable s. The approximation
of resonances, poles, quality factors, etc. is done so that the difference between
the modeled transfer function and the measured one is not bigger than a few
percent in amplitude and a few degrees in phase. This is done by manually
fitting the measured curves.
The corrector filter is implemented in a DSP by a description of gain, poles
and zeros together with their quality factors. Thus, the corrector filter is also
a rational function of the variable s, with a reasonable order (could be 10,
exceptionally 20).
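A description by gain, poles and zeros with quality factors translates directly into polynomial coefficients. The sketch below assumes that every pole or zero is given as a complex-conjugate pair (f0, Q), contributing a factor $s^2 + (\omega_0/Q)s + \omega_0^2$; real (first-order) poles and the actual DSP format are not covered, and the function names are hypothetical.

```python
import numpy as np


def second_order(f0, q):
    """Coefficients of s^2 + (w0/Q) s + w0^2, highest power first."""
    w0 = 2 * np.pi * f0
    return np.array([1.0, w0 / q, w0**2])


def corrector(gain, zeros, poles):
    """Build numerator and denominator from lists of (f0, Q) pairs."""
    num = np.array([gain])
    den = np.array([1.0])
    for f0, q in zeros:
        num = np.polymul(num, second_order(f0, q))
    for f0, q in poles:
        den = np.polymul(den, second_order(f0, q))
    return num, den


# Hypothetical filter: one zero pair and two pole pairs (order 4).
num, den = corrector(2.0, zeros=[(1.0, 0.5)], poles=[(2.0, 0.7), (3.0, 1.0)])
order = len(den) - 1
```

The order of the resulting rational function is twice the number of pole pairs, so the orders of 10 to 20 quoted above correspond to 5 to 10 such pairs.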


Mathematical description of the requirements

We make the assumption that the error signal and actuator transfer functions
are unity at all frequencies. Of course, this is not true, and exact compensation
of pole-zero pairs is not possible either. But we assume that we can deal with
this problem once corrector filters for unity system transfer functions are
available.
The customary variable is the frequency f, with $s = 2\pi i f$.
Ideally, we would like to find a solution for a function G modeled as a rational function of a reasonable order (for example numerator and denominator
order not bigger than 20). The constraints become:
$$f < f_1: \quad |G(f)| > G_1 \qquad (1)$$

$$f > f_2: \quad |G(f)| < G_2 \qquad (2)$$

The closed loop should be stable, i.e. 1/(1 + G) should not have any pole
with a positive real part.
The closed loop should be robust. We could define gain and phase margins, as is
commonly done, but unfortunately this can still allow low effective margins
for complicated functions. We therefore require that:

$$\forall f, \quad |1/(1 + G(f))| < k \qquad (3)$$

Example of target values: $f_1 = 0.1$ Hz, $f_2 = 10$ Hz, $G_1 = 10^6$, $G_2 = 10^{-4}$,
$k = 2$ or, even better, $f_1 = 1$ Hz, $f_2 = 10$ Hz, $G_1 = 10^4$, $G_2 = 10^{-4}$, $k = 2$.
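Requirements (1) to (3) and the stability condition can be tested numerically for a candidate rational G. The following is a minimal sketch, not the program used by the authors: the function name is hypothetical, the frequency grid is finite (so a pass is only indicative), and the closed loop poles are taken as the roots of numerator plus denominator, assuming no pole-zero cancellation.

```python
import numpy as np


def check_requirements(num, den, f1, f2, g1, g2, k, n_points=4000):
    """Test constraints (1)-(3) and closed loop stability for G = num/den.

    num, den: polynomial coefficients in s, highest power first.
    """
    # Log-spaced grid from two decades below f1 to two decades above f2.
    f = np.logspace(np.log10(f1) - 2, np.log10(f2) + 2, n_points)
    s = 2j * np.pi * f
    g = np.polyval(num, s) / np.polyval(den, s)
    # Poles of 1/(1 + G) are the roots of num + den.
    closed_loop_poles = np.roots(np.polyadd(num, den))
    return {
        "low_gain": bool(np.all(np.abs(g[f < f1]) > g1)),     # constraint (1)
        "high_gain": bool(np.all(np.abs(g[f > f2]) < g2)),    # constraint (2)
        "robust": bool(np.all(np.abs(1 / (1 + g)) < k)),      # constraint (3)
        "stable": bool(np.all(closed_loop_poles.real < 0)),
    }


# A plain integrator with unity gain at 1 Hz (illustrative candidate).
wc = 2 * np.pi * 1.0
integrator = ([wc], [1.0, 0.0])
relaxed = check_requirements(*integrator, f1=0.1, f2=10.0, g1=5.0, g2=0.2, k=2.0)
target = check_requirements(*integrator, f1=0.1, f2=10.0, g1=1e6, g2=1e-4, k=2.0)
```

The integrator is stable and robust, and it meets relaxed gain bounds, but it fails both gain constraints for the target values quoted above. This is why a more elaborate corrector filter is needed.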
3.3 Coulon's solution
At the Observatoire de la Côte d'Azur, J.-P. Coulon has built a program to find solutions such that $f_2 = 10$ Hz, $f_1 = 0.1$ Hz or lower. The
program actually looks for functions built from a function $G_s(f)$ such that
$G(f) = G_s(f)/G_s(1/f)$. $G_s(f)$ is a rational function with real coefficients,
with m zeroes and n poles.
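One consequence of the form $G = G_s(x)/G_s(1/x)$ is that $G(x)\,G(1/x) = 1$: the gain at a frequency and the attenuation at the mirrored frequency are tied together. The sketch below checks this numerically. It assumes, as an interpretation not stated in the text, that the argument is a normalized complex frequency $x = i f$ with f dimensionless; the coefficients of $G_s$ are arbitrary illustrative values.

```python
import numpy as np


def symmetric_open_loop(gs_num, gs_den):
    """Return G(x) = Gs(x) / Gs(1/x) for a rational Gs with real coefficients."""
    def gs(x):
        return np.polyval(gs_num, x) / np.polyval(gs_den, x)
    return lambda x: gs(x) / gs(1.0 / x)


# Hypothetical Gs with m = 1 zero and n = 2 poles.
g = symmetric_open_loop([1.0, 3.0], [1.0, 0.5, 0.2])

f = np.array([0.1, 0.5, 2.0, 10.0])   # normalized frequencies (assumption)
product = np.abs(g(1j * f)) * np.abs(g(1j / f))
unity_gain = np.abs(g(1j))
```

Under these assumptions the product of the magnitudes at f and 1/f equals 1, and the magnitude at the normalized frequency 1 is exactly 1, so the Bode magnitude curve is antisymmetric about the unity gain point on log-log axes.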
The program searches over the coefficients of the $G_s$ function with a successive approximation technique, and some heuristic recipes to save computing
time.
The algorithm finds solutions with up to m = 6 zeroes and
n = 8 poles for the $G_s$ function in a reasonable time. Despite this modest
complexity, the solutions given are much better than the current engineer
design.
Plotting the open loop transfer function in a Nichols plot helps to check the
stability.
Such a filter has been tried successfully on the real Virgo suspensions.


[Figure: open loop transfer function in Bode plot; loop gain versus frequency (Hz)]

Fig. 4. Two open loop transfer functions, based on the same simplified suspension
system (one resonant pole at 0.6 Hz): Coulon's filter (continuous line), engineer
design (dashed line). Coulon's filter is defined for $k = 2$, $f_2/f_1 = 278$.
[Figure: open loop transfer function in Nichols plot; transfer function amplitude versus transfer function phase (degrees)]

Fig. 5. Coulon's filter used to stabilize a pendulum, shown in the Nichols plot. The dotted
circles correspond to a closed loop overshoot of 2 (corresponding to a gain margin
of 2 and a phase margin of about 30°).
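The correspondence quoted in the caption of Fig. 5 can be checked from requirement (3), which keeps G(f) at a distance larger than 1/k from the point -1. On the negative real axis this gives a gain margin of at least k/(k - 1); on the unit circle it gives a phase margin of at least 2 arcsin(1/(2k)). The short sketch below evaluates both bounds; the function name is hypothetical.

```python
import math


def guaranteed_margins(k):
    """Lower bounds on gain and phase margins implied by |1/(1+G)| < k."""
    if k <= 1.0:
        raise ValueError("k must be larger than 1")
    gain_margin = k / (k - 1.0)
    phase_margin_deg = math.degrees(2.0 * math.asin(1.0 / (2.0 * k)))
    return gain_margin, phase_margin_deg


gain_margin, phase_margin_deg = guaranteed_margins(2.0)
```

For k = 2 this gives a gain margin of 2 and a phase margin of about 29°, consistent with the rounded value of 30° in the caption.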


[Figure: "Attenuation of crossmn filters"; attenuation for frequencies above f2 versus f2/f1, with curves for m=2 n=4, m=4 n=6 and m=6 n=8]

Fig. 6. Performance of various Coulon filters, with m zeroes and n poles for the
$G_s(f)$ function.

The performances of the filters have been computed as a function of the ratio
$f_2/f_1$, for various orders of poles and zeroes of the $G_s(f)$ function. Figure 6
seems to indicate that the filter performances will not be very high on a short
frequency span $f_2/f_1 = 10$, even if one increases the order of the filter.

4 Conclusion and perspectives

Coulon's filters give very good results when $f_2/f_1$ is about two decades.
This already gives very good attenuation at 10 Hz (more than $10^6$).
Yet, the loop gain in the 0.1 Hz - 1 Hz band is not enough to attenuate the
real seismic noise. One would need Coulon-like filters with $f_2/f_1$ of the order
of 10, if possible. New kinds of mathematical functions might be investigated
to improve the feedback filter performances.

References
1. F. Acernese et al. Status of Virgo. Class. Quantum Grav., 21(5):S385+, March
2004.
2. D. Sigg. Commissioning of Ligo detectors. Class. Quantum Grav., 21(5):S409+,
March 2004.


3. B. Willke et al. Status of Geo 600. Class. Quantum Grav., 21(5):S417+, March
2004.
4. R. Takahashi and the TAMA collaboration. Status of Tama 300. Class. Quantum Grav., 21(5):S403+, March 2004.
5. P. Saulson. Fundamentals of interferometric gravitational wave detectors. World
Scientific Publishing Company, Singapore, 1994.
6. F. Acernese et al. The last stage suspension of the mirrors for the gravitational
wave antenna Virgo. Class. Quantum Grav., 21(5):S245+, March 2004.
7. F. Acernese et al. Properties of seismic noise at the Virgo site. Class. Quantum
Grav., 21(5):S433+, March 2004.
8. F. Bondu, A. Brillet, F. Cleva, H. Heitmann, M. Loupias, C.N. Man,
H. Trinquet, and the VIRGO collaboration. The Virgo injection system. Class.
Quantum Grav., 19(7), April 2002.
