Harmonic Analysis and Rational Approximation
Their Rôles in Signals, Control and Dynamical Systems
With 47 Figures
Editors
Dr. J.-D. Fournier
Dr. J. Grimm
Dr. J. Leblond
Département ARTEMIS
CNRS and Observatoire de la Côte d'Azur
BP 4229
06304 Nice Cedex 4
France
Prof. J. R. Partington
University of Leeds
School of Mathematics
LS2 9JT Leeds
United Kingdom
ISSN 0170-8643
ISBN-10 3-540-30922-5 Springer Berlin Heidelberg New York
ISBN-13 978-3-540-30922-2 Springer Berlin Heidelberg New York
Library of Congress Control Number: 2005937084
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation,
broadcasting, reproduction on microfilm or in other ways, and storage in data banks. Duplication
of this publication or parts thereof is permitted only under the provisions of the German Copyright
Law of September 9, 1965, in its current version, and permission for use must always be obtained
from Springer-Verlag. Violations are liable to prosecution under German Copyright Law.
Springer is a part of Springer Science+Business Media
springer.com
© Springer-Verlag Berlin Heidelberg 2006
Printed in Germany
The use of general descriptive names, registered names, trademarks, etc. in this publication does
not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
Typesetting: Data conversion by authors.
Final processing by PTP-Berlin Protago-TEX-Production GmbH, Germany
Cover-Design: design & production GmbH, Heidelberg
Printed on acid-free paper
54/3141/Yu - 5 4 3 2 1 0
Preface
This book is an outgrowth of a summer school that took place on the Island of Porquerolles in September 2003. The goal of the school was mainly to teach certain pieces of mathematics to practitioners coming from three different communities: signal, control and dynamical systems theory. Our impression was indeed that, in spite of their great potential applicability, 20th century developments in approximation theory and Fourier theory, while commonplace among mathematicians, are unknown or under-appreciated within the above-mentioned communities. Specifically, we had in mind:
• some advances in analytic, meromorphic and rational approximation theory, as well as their links with identification, robust control and stabilization of infinite-dimensional systems;
• the rich correspondences between the complex and real asymptotic behavior of a function and its Fourier transform, as already described, for instance, in Wiener's books.
In this respect, it is noticeable that in the last twenty years, much effort has been devoted to the research and teaching of recent decomposition tools, like wavelets or splines, linked to real analysis. From the early stages, we shared the view that, in contrast, research in certain fields suffers from the lack of a working knowledge of modern Fourier analysis and modern complex analysis.
Finally, we felt the need to introduce at the core of the school a probabilistic counterpart to some of the questions raised above. Although familiar to specialists of signal and dynamical systems theory, probability is often ignored by members of the control and approximation theory communities. Yet we hope to convey to the reader the conviction that there is room for fascinating phenomena and useful results to be discovered at the junction of probability and complex analysis.
This book is not just a proceedings of the summer school, since the contributions made by the speakers have been totally rewritten, anonymously refereed and edited in order to reflect some of the common themes in which the authors are interested, as well as the diversity of the applications. The contributors were asked to imagine addressing a fellow-scientist with a non-negligible but modest background in mathematics.
In drawing the boundaries between the chapters of the book, we have also tried to eliminate redundancy, while allowing for repetition of a theme as seen from different points of view.
We begin in Part I with a general introduction from the late Maciej Pindor. He surveys the conceptual and practical value of complex analyticity,
both in the physical and the conjugate Fourier variables, for physical theories
originally built in the real domain. Obstacles to analytic extension, like polar
singularities known as resonances, a key concept of the school, turn out to
have themselves a physical meaning. It is illustrated here by means of optical
dispersion relations and the scattering of particles.
Part II of this book contains basic material on the complex analysis and
harmonic analysis underlying the further developments presented in the book.
Candelpergher writes on complex analysis, in particular analytic continuation
and the use of Borel summability and Gevrey series. Partington gives an
account of basic harmonic analysis, including Fourier, Laplace and Mellin
transforms, and their links with complex analysis.
Part III contains further basic material, explaining some of the aspects
of approximation theory. Pindor presents the theory of Padé approximation,
including convergence issues. Levin and Saff explain how potential theoretic
tools such as capacity play a role in the study of efficient polynomial and
rational approximation, and analyse some weighted problems. Partington discusses the use of bases of rational functions, including orthogonal polynomials,
Szegő bases, and wavelets.
Finally Part IV completes the foundations by a tour in probability theory.
The driving force behind the order emerging from randomness, the central
limit theorem, is explained by Collet, including convergence and fractal issues.
Dujardin gives an account of the properties of random real polynomials, with
particular reference to the distribution of their real and complex roots. Pindor
puts rational approximation into a stochastic context, the basic idea being to
obtain rational interpolants to noisy data.
The major application of the themes of this book lies in signal and control
theory, which is treated in Part V. Deistler gives a thorough treatment of
the spectral theory of stationary processes, leading to an account of ARMA
and state space systems. Cuoco's paper treats the power spectral density of physical systems and its estimation, to be used in the extraction of signals out of noisy data. Olivi continues some of the ideas of Parts II and III, and, under the general umbrella of the Laplace transform in control theory, discusses linear time-invariant systems, controllability and rational approximation. Baratchart uses Laplace–Fourier transform techniques in giving an account of recent work analysing problems originating in the identification of linear systems subject to perturbations. In a final return to the perspective of the Introduction, Parts VI and VII show the rôle of the previously discussed tools in extremely diverse domains of physics. In Part VI, some mathematical aspects of dynamical systems theory are discussed. Biasco and Celletti are concerned with celestial mechanics and the use of perturbation theory to analyse integrable and nearly-integrable systems. Baladi gives a brief introduction to resonances in hyperbolic and Hamiltonian systems, considered via the spectra of certain transfer operators. Part VII is devoted to a modern approach to two classical physics problems. Borgnat is concerned with turbulence in fluid flow; he discusses which tools, including the Mellin transform, can be adapted to reveal the various statistical properties of intermittent signals. Finally, Bondu and Vinet give an account of the high-performance control and noise analysis required at the gravitational waves VIRGO antenna.
Last but not least, our thanks go to the authors of the 17 contributions
gathered in this book, as well as to all those who have helped us produce it,
with particular mention of the anonymous referees.
Jean-Daniel Fournier,
José Grimm,
Juliette Leblond,
Jonathan R. Partington.
Preface
Maciej PINDOR
Our colleague Dr. Maciej Pindor of Poland, the friend, collaborator and visitor of Jean-Daniel Fournier (JDF), died on Saturday 5th July 2003 at the Nice Observatory. Apparently, he was on his way to work from the Pavillon Magnétique, where he was staying, to his office at CION. His death was attributed to cardiac problems. He was 62 years old. Some colleagues were present, including the Director of the Observatoire de la Côte d'Azur (OCA) and JDF, when help arrived.
Maciej Pindor was a senior lecturer at the Institute of Theoretical Physics
at the University of Warsaw. He performed his research work with the same
care that he devoted to his teaching duties. He was a specialist in complex
analysis, applied to some questions of theoretical physics, and, in recent years,
to the processing of data; he produced theoretical and numerical solutions,
which in this regard showed an ingenuity and reliability that is hard to match.
He taught effective computational methods to young physicists. From the
beginning of the thesis that Bénédicte Dujardin has been writing under the
direction of JDF, M. Pindor participated in her supervision.
The collaboration of JDF and his colleagues with M. Pindor began in 1996.
Over the years, it was supported by regular or exceptional funding from the
Cassini Laboratory, the Theoretical Physics Institute of Warsaw, the Polish
Academy of Sciences and from OCA (with an associated post in astronomy).
Thus M. Pindor came to Nice several times, and many people knew him. His
genuine modesty made him a very accessible person, and dealings with him
were agreeable and fruitful in all cases.
For the summer school of Porquerolles, he had agreed to give three courses, on three different subjects. In this he was motivated by friendship, scientific interest, and his acute awareness of the teaching responsibility borne by university staff; since then he had overcome the anxiety that he felt towards the idea of presenting mathematics in front of professional mathematicians. In particular, he was due to give the opening course, showing the link between physics and mathematics, treating the ideas of analyticity and resonance. He produced his notes for the course in good time, and these are therefore included under his name in this book and listed in the table of contents. At Porquerolles his courses were given by three different people. As co-worker, JDF took the topic "rational approximation and noise". We sincerely thank the two others: G. Turchetti, himself an old friend of M. Pindor, agreed to expound the rôle of analytic continuation and Padé approximants in theoretical and mathematical physics; E. B. Saff kindly offered to lecture on the mathematics behind Padé approximants.
This book is dedicated to the memory of Maciej Pindor.
This obituary and M. Pindor's photograph have been included here by
agreement with his widow, Dr. Krystyna Pindor-Rakoczy.
Organization:
J. Gosselin (CNRS, Nice),
F. Limouzis (INRIA, SA),
D. Sergeant (INRIA, SA).
Finally we thank the sponsors of the school: CNRS (Formation Permanente), INRIA (Formation Permanente), INRIA Sophia-Antipolis, Conseil Régional PACA, Observatoire de la Côte d'Azur (OCA), Département Cassini, Ministère délégué Recherche et Nouvelles Technologies, VIRGO-EGO. We thank them all for their support.
Nice (France), Sophia-Antipolis (France), July 2005.
The co-directors:
J.-D. Fournier,
J. Leblond
Contents

Part I Introduction

Analyticity and Physics
Maciej Pindor
1 Introduction
2 The optical dispersion relations
3 Scattering of particles and complex energy
References

Part II Complex Analysis, Fourier Transform and Asymptotic Behaviors

From Analytic Functions to Divergent Power Series
Bernard Candelpergher
1 Analyticity and differentiability
2 Analytic continuation and singularities
3 Continuation of a power series
4 Gevrey series
5 Borel summability
6 Acknowledgments
References

Good Bases
Jonathan R. Partington
1 Introduction
2 Orthogonal polynomials and Szegő bases
3 Wavelets
References

Part IV The Rôle of Chance

Some Aspects of the Central Limit Theorem and Related Topics
Pierre Collet
1 Introduction
2 A short elementary probability theory refresher
3 Another proof of the CLT
4 Some extensions and related results
5 Statistical Applications
6 Large deviations
7 Multifractal measures
References
List of Contributors
Viviane Baladi
CNRS UMR 7586,
Institut Mathématique de Jussieu,
75251 Paris (France)
baladi@math.jussieu.fr
Laurent Baratchart
Inria, Apics Team
2004, Route des Lucioles
06902 Sophia Antipolis (France)
baratcha@sophia.inria.fr
Luca Biasco
Dipartimento di Matematica,
Università di Roma Tre,
Largo S. L. Murialdo 1,
I-00146 Roma (Italy)
biasco@mat.uniroma3.it
François Bondu
Laboratoire Artemis
CNRS UMR 6162
Observatoire de la Côte d'Azur
BP4229 Nice (France)
Francois.Bondu@obs-nice.fr
Pierre Borgnat
Laboratoire de Physique
UMR-CNRS 5672
ENS Lyon
46 allée d'Italie
69364 Lyon Cedex 07 (France)
Pierre.Borgnat@ens-lyon.fr
Bernard Candelpergher
University of Nice-Sophia Antipolis
Parc Valrose
06002 Nice (France)
candel@math.unice.fr
Alessandra Celletti
Dipartimento di Matematica,
Università di Roma Tor Vergata,
Via della Ricerca Scientifica 1,
I-00133 Roma (Italy)
celletti@mat.uniroma2.it
Pierre Collet
Centre de Physique Théorique
CNRS UMR 7644
École Polytechnique
F-91128 Palaiseau Cedex (France)
collet@cpht.polytechnique.fr
Elena Cuoco
INFN, Sezione di Firenze,
Via G. Sansone 1,
50019 Sesto Fiorentino (FI),
present address:
EGO, via Amaldi,
Santo Stefano a Macerata,
Cascina (PI) (Italy)
elena.cuoco@ego-gw.it
Manfred Deistler
Department of Mathematical
Methods in Economics,
Econometrics and System Theory,
Vienna University of Technology
Argentinierstr. 8,
A-1040 Wien (Austria)
Deistler@tuwien.ac.at
Bénédicte Dujardin
Département Artemis,
Observatoire de la Côte d'Azur,
BP 4229, 06304 Nice (France)
dujardin@obs-nice.fr
Jonathan R. Partington
School of Mathematics
University of Leeds
Leeds LS2 9JT (U.K.)
J.R.Partington@leeds.ac.uk
Maciej Pindor
Instytut Fizyki Teoretycznej,
Uniwersytet Warszawski ul. Hoża 69,
00-681 Warszawa (Poland)
deceased
Eli Levin
The Open University of Israel
Department of Mathematics
P.O. Box 808, Raanana (Israel)
elile@openu.ac.il
Edward B. Saff
Center for Constructive Approximation
Department of Mathematics
Vanderbilt University
Nashville, TN 37240 (USA)
esaff@math.vanderbilt.edu
Martine Olivi
Inria, Apics Team
2004, Route des Lucioles
06902 Sophia Antipolis (France)
Martine.Olivi@sophia.inria.fr
Jean-Yves Vinet
ILGA, Département Fresnel
Observatoire de la Côte d'Azur
BP 4229, 06304 Nice (France)
vinet@obs-nice.fr
1 Introduction
My goal is to present to you some aspects of the role that a mathematical concept as subtle and abstract as analyticity plays in physics.
In retrospect we could say that the notion of a "real number" is also in fact a very abstract one, and its applicability to the description of the world external to our mind is something of a miracle. I do not want to dwell here on the relation between constructs of the mind and the external world; this is the playground of philosophers and I do not wish to compete with them. I mean here the intuitively manifest difference between the obvious nature of the integers (and the nearly obvious nature of the rationals) and the abstractness of the real numbers. This abstractness notwithstanding, I do not think that talking in terms of real numbers when describing the real world needed much more intellectual effort than applying rational numbers there. This is excellently demonstrated by the fact that in practice we use only rationals, e.g. floating point numbers in computer calculations: practitioners just ignore the subtle flavour of irrationals and treat them as rationals represented in the decimal system by a sufficient number of digits.
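The remark about floating point numbers can be checked directly. The following is a minimal sketch in Python (a language chosen here for illustration, not used in the original lectures); the sample value 0.1 is an arbitrary assumption.

```python
from fractions import Fraction

# Illustrative value (an assumption): the literal 0.1 is rounded to the
# nearest binary floating point number when the program is read.
x = 0.1

# The number actually stored is an exact rational.
exact = Fraction(x)
num, den = x.as_integer_ratio()

print(exact)                      # a ratio of two integers
print(den & (den - 1) == 0)       # True: the denominator is a power of two
print(exact == Fraction(1, 10))   # False: the stored rational is not 1/10
```

Every finite floating point value is thus a rational whose denominator is a power of two; no irrational number is ever stored.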
The situation is completely different with complex numbers. Contrary to many other mathematical notions, they originated entirely within pure mathematics, and even to mathematicians they seemed so strange that the word "imaginaire" was attributed to them! No real-world situation seemed to demand complex numbers for its mathematical description. However, already Euler (and also d'Alembert) observed that they were useful in solving problems in hydrodynamics and cartography [4]. Once domesticated by mathematicians, complex numbers slowly crept into physical papers, though only as an auxiliary and convenient tool when dealing with periodic solutions of some mechanical systems (the spherical pendulum studied by Tissot [7]). Their particular usefulness was discovered by Riemann for describing some form of the potential field [6] and when he studied the Maxwell equations [9], but again they played here the role of a shorthand notation for the simultaneous description of two different, though related, physical quantities. Even the advent of quantum mechanics did not change too much: although the wave function was essentially complex, and its real and imaginary parts had no separate existence, the values of the function had no physical meaning themselves. It was its modulus that was interpretable, and so physicists could think of its complexness as some mathematical trick, though with some feeling of uneasiness this time.
J.-D. Fournier et al. (Eds.): Harm. Analysis and Ratio. Approx., LNCIS 327, pp. 3–12, 2006.
© Springer-Verlag Berlin Heidelberg 2006
As far as I know, the first individuals who truly opened the complex plane to physics were Kramers and Kronig (see [3]). They had the daredevil idea that extending the domain of a function whose argument is a well-defined physical quantity (the frequency in their case) to the complex plane can lead to experimentally verifiable conclusions. They showed, moreover, that the properties of this function in the complex plane are connected to important physical conditions on another function. Their idea seemed a curiosity, and only 25 years later was it found useful, when the advantages of considering energy in the complex plane were discovered. Even then physicists still felt uneasy about this, and when a few years later Tullio Regge proposed extending the angular momentum to the complex plane his paper was rejected by many referees [1].
In the following I shall briefly review the original idea of Kramers and Kronig (following closely the exposition of [3]), the consequences of the extension of the energy to the complex plane in the description of particle scattering, and the Regge idea.
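$$D(x, \omega) = \varepsilon(\omega)\, E(x, \omega) \tag{1}$$

where $\varepsilon(\omega)$ is called the dielectric constant and is frequency dependent, because the response of the medium to the presence of $E$ depends on frequency. These frequency components are just Fourier transforms of the temporal dependence of the fields, e.g.

$$E(x, t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} E(x, \omega)\, e^{-i\omega t}\, d\omega$$

and vice versa

$$E(x, \omega) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{+\infty} E(x, t')\, e^{i\omega t'}\, dt'.$$

Using now (1) and assuming that the functions considered vanish at infinity in time and frequency fast enough to make the exchange of the order of integration possible, we arrive at

$$D(x, t) = E(x, t) + \int_{-\infty}^{+\infty} G(\tau)\, E(x, t - \tau)\, d\tau \tag{2}$$

where

$$G(\tau) = \frac{1}{2\pi} \int_{-\infty}^{+\infty} [\varepsilon(\omega) - 1]\, e^{-i\omega\tau}\, d\omega. \tag{3}$$

$$\varepsilon(\omega) = 1 + \int_{0}^{\infty} G(\tau)\, e^{i\omega\tau}\, d\tau. \tag{4}$$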
Already at the very birth of theoretical optics physicists used some simple models (classical ones, because quantum mechanics was not yet born) to describe the interaction between light and matter, and these models lead to expressions for $\varepsilon(\omega)$ satisfying our requirement that $G(\tau) = 0$ for $\tau < 0$. However, strictly speaking, the phenomenon of polarization of atoms and molecules is a very complicated one; even now it is not easy to describe it in all its details, and it is not obvious how one should guarantee the vanishing of the predicted $G(\tau)$ for negative arguments.
Kramers and Kronig observed that the most general condition one should impose on $\varepsilon(\omega)$ to have causality satisfied is just that it be of the form (4) with some real $G(\tau)$. Again, this form would not be so very interesting if not for their daring concept of considering $\varepsilon(\omega)$ as a function of complex $\omega$. Once they did this, many interesting conclusions followed. The most fundamental observation is that if $G(\tau)$ is finite for all $\tau$, $\varepsilon(\omega)$ is an analytic function of $\omega$ in the upper half plane.
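To make the analyticity statement concrete, here is a minimal sketch using the classical damped-oscillator (Lorentz) model of the dielectric function. The model, the function names and the parameter values are assumptions introduced for illustration; the text itself only speaks of "simple models". With the time dependence $e^{-i\omega t}$ used above, the model reads $\varepsilon(\omega) = 1 + \omega_p^2 / (\omega_0^2 - \omega^2 - i\gamma\omega)$, and the code checks that its only singularities lie in the lower half plane.

```python
import cmath

def lorentz_denominator(omega, omega_0=2.0, gamma=0.3):
    """Denominator of the assumed Lorentz model eps = 1 + omega_p^2 / denominator."""
    return omega_0**2 - omega**2 - 1j * gamma * omega

def lorentz_poles(omega_0=2.0, gamma=0.3):
    """Roots of omega^2 + i*gamma*omega - omega_0^2 = 0, i.e. the poles of eps."""
    root = cmath.sqrt(omega_0**2 - gamma**2 / 4.0)
    return (-0.5j * gamma + root, -0.5j * gamma - root)

for pole in lorentz_poles():
    # Both poles have a negative imaginary part for any positive damping gamma,
    # so this eps(omega) is analytic in the upper half plane.
    print(pole, abs(lorentz_denominator(pole)))
```

For positive damping the poles stay below the real axis whether the oscillator is underdamped or overdamped, which is the model-level counterpart of the causality condition on $G(\tau)$.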
$$f(\omega) = \frac{1}{2\pi i} \oint_{C} \frac{f(t)}{t - \omega}\, dt. \tag{5}$$
We can now take $D$ as the upper half plane, infinitesimally above the real axis, and $C$ as in Figure 1, and write (5) for $f(\omega) = \varepsilon(\omega) - 1$. With the condition that $\varepsilon(\omega) - 1$ vanishes for large $\omega$ at least like $1/\omega^2$, which can be justified by some physical arguments, we can take $R \to \infty$ and neglect the integral over $C_R$. With some more maneuvering we arrive at the famous dispersion relations for the real and the imaginary parts of $\varepsilon(\omega)$.
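$$\operatorname{Re} \varepsilon(\omega) = 1 + \frac{1}{\pi}\, P \int_{-\infty}^{+\infty} \frac{\operatorname{Im} \varepsilon(t)}{t - \omega}\, dt, \qquad \operatorname{Im} \varepsilon(\omega) = -\frac{1}{\pi}\, P \int_{-\infty}^{+\infty} \frac{[\operatorname{Re} \varepsilon(t) - 1]}{t - \omega}\, dt. \tag{6}$$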
The name comes from the fact that the dependence of $\varepsilon$ on $\omega$ leads to the phenomenon called dispersion: the change of the shape of a light wave penetrating a material medium. The real part of $\varepsilon$ is directly related to this phenomenon, while the imaginary part is connected with the absorption of light. Therefore both can be measured and, not unexpectedly, experimental data confirm the validity of (6).
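The first relation in (6) can also be checked numerically. The sketch below again assumes the Lorentz model $\varepsilon(\omega) = 1 + \omega_p^2 / (\omega_0^2 - \omega^2 - i\gamma\omega)$, which is not taken from the text, and evaluates the principal value integral on a grid placed symmetrically around the singular point, so that the divergent contributions cancel in pairs. Function names, parameters and grid sizes are illustrative choices.

```python
import numpy as np

def lorentz_eps(omega, omega_p=1.0, omega_0=2.0, gamma=0.3):
    """Assumed Lorentz-model dielectric function (illustrative parameters)."""
    return 1.0 + omega_p**2 / (omega_0**2 - omega**2 - 1j * gamma * omega)

def re_eps_from_im(omega, half_width=200.0, h=0.01):
    """Re eps(omega) from Im eps via the first dispersion relation in (6).

    The grid points t_k = omega + (k + 1/2) h are symmetric about omega,
    which implements the principal value prescription.
    """
    n = int(round(half_width / h))
    k = np.arange(-n, n)
    t = omega + (k + 0.5) * h
    integrand = lorentz_eps(t).imag / (t - omega)
    return 1.0 + h * np.sum(integrand) / np.pi

for omega in (0.5, 1.9, 2.0, 3.0):
    # Reconstructed real part next to the directly computed one.
    print(omega, re_eps_from_im(omega), lorentz_eps(omega).real)
```

The two printed columns should agree closely, which is the numerical counterpart of saying that absorption data alone determine the dispersion.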
On the other hand, the Titchmarsh theorem [8] says that if a function $F(z)$ satisfies relations of the type (6) then its Fourier transform vanishes on the negative real semiaxis. Thus, not only does the physical condition of causality lead to definite analytical properties of some function, implying a relation between its real and imaginary parts on the real axis that can be confirmed by physical experiments; but also the experimentally verifiable relation between two physical quantities, when they are considered as the real and imaginary parts of an analytic function on the real axis, implies a property of the Fourier transform of this function, the one having the meaning of causality!
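[Figure 1: the contour $C$ in the complex plane, running along the real axis from $-R$ to $R$ and closed by the semicircle $C_R$ through $iR$; axes Re and Im.]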
i = 1, ..., 4 .
$$p_i = (E_i, \mathbf{p}_i), \qquad i = 1, \dots, 4$$
is conserved and so is the total energy. In the special reference system, called
the center of mass system (c.m.s.), the total momentum is zero, and therefore
the c.m.s. energy squared is equal to $s = P^2$. Another four-vector important in the description of the process is the momentum transfer $q$, together with its square $t$:
$$q = p_1 - p_3 = -p_2 + p_4; \qquad t = q^2.$$
this context at that time, it appeared possible to show again that relativistic causality (i.e. the impossibility of any relation between events separated in such a way that they could not be connected by signals traveling with a speed less than or equal to the velocity of light) implies some special analyticity properties of the scattering amplitude in the complex plane of energy (see e.g. [2] and references therein).
In fact, the fascinating connection between the physical requirement, "causality", and the abstract mathematical property, "analyticity", has been rigorously (almost) shown only for the forward scattering amplitude, i.e. at $t = 0$. These analytical properties then allowed one, using theorems from the theory of functions of a complex variable, to write dispersion relations for the scattering amplitude of the type
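$$A(s, 0) = \frac{1}{\pi} \int_{4m^2}^{\infty} \frac{\operatorname{Im} A(s', 0)\, ds'}{s' - s - i\epsilon} + \frac{1}{\pi} \int_{-\infty}^{0} \frac{\operatorname{Im} A(s', 0)\, ds'}{s' - s - i\epsilon}. \tag{7}$$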
Here $i\epsilon$ means that the integration runs just above the real axis. This integral representation of $A(s, 0)$ as a function of complex $s$ means that this function has very nasty singularities (i.e. points where it is not analytic) at $s = 4m^2$ and $s = 0$ (and possibly $s = \infty$), called branchpoints. They are nasty because they make the function multivalued: if we walk along a closed curve encircling such a point, then at the point from which we started we find a different value of the function. I cannot dwell on this horror (or, to me, fascinating property of the complex plane) here, but can only say that the multivalued function can be made single-valued by removing from the complex plane lines joining the branchpoints; such lines are called cuts. Looking at (7) you see that $A(s, 0)$ is not defined on $(-\infty, 0)$ and $(4m^2, \infty)$: these are the cuts. On the other hand, the function has well-defined limits when $s$ approaches these semiaxes from imaginary directions. The limit from above for $s \in (4m^2, \infty)$ is just the physical scattering amplitude, because these values of $s$ correspond to a physical scattering process. On the other hand, the limit from below for $s \in (-\infty, 0)$ corresponds to the scattering amplitude for another process, related to the one we consider through crossing symmetry, a property of the scattering amplitude suggested by QFT. Combining this property with unitarity (loosely speaking, the requirement that the probability that anything can happen, in the context of the scattering, of course, is one) leads to conclusions that again could be verified experimentally. This was a great triumph, because earlier no quantitative predictions concerning phenomena connected with new types of interactions (new with respect to electromagnetism) could be given.
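The multivaluedness around a branchpoint can be illustrated with the simplest example, a square root. The sketch below is an assumption-laden toy: it takes the function $\sqrt{s - s_0}$ with $s_0 = 4$ (standing in for $4m^2$ with $m = 1$), not the scattering amplitude itself, and follows its value continuously along a circle around $s_0$. All names and parameters are hypothetical.

```python
import cmath
import math

def continue_sqrt_around(center, radius=1.0, steps=2000, turns=1):
    """Follow sqrt(s - center) continuously along a circle around `center`.

    At each step the sign of the square root is chosen so that the value
    stays close to the previous one, which mimics analytic continuation.
    Returns the starting value and the value after `turns` full turns.
    """
    value = cmath.sqrt(complex(radius))
    start = value
    for k in range(1, turns * steps + 1):
        angle = 2.0 * math.pi * k / steps
        s = center + radius * cmath.exp(1j * angle)
        candidate = cmath.sqrt(s - center)
        if abs(candidate - value) > abs(-candidate - value):
            candidate = -candidate
        value = candidate
    return start, value

print(continue_sqrt_around(4.0, turns=1))  # the sign has flipped after one turn
print(continue_sqrt_around(4.0, turns=2))  # the original value is recovered
```

After one turn the function returns with the opposite sign, and only after two turns with its original value: the two signs correspond to the two sheets that a cut starting at $s_0$ separates.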
The great success of the simplest dispersion relations prompted many theoreticians to study the analytical structure of the scattering amplitude as suggested by perturbation theory, though the latter produced divergent expansions. This analytical structure appeared to be very rich, with many branchpoints on the real axis (where the amplitude had a physical meaning), with locations depending on the masses of the scattered particles, and poles at energies of the bound states (if any) of these particles. Moreover, as mentioned above, crossing symmetry implied direct connections between values of the scattering amplitude on some edges of different cuts. Causality implied that the scattering amplitude is analytic on the whole plane of complex energy, properly cut along the real axis, but it was soon realized that there have to exist poles on the unphysical sheets. One of the fantastic properties of analytic functions is that they can undergo analytic continuation. You will learn more about it during the lectures to come, but here I shall describe it as a feature which makes the function defined on its whole domain once it is defined on the smallest piece of it. The domain can also include other copies of the complex plane (called Riemann sheets) if there are branchpoints, reached when one continues the function analytically across the cuts. In elementary particle physics, the sheet on which energy has its physical meaning is called the physical sheet. The ones reached through analytic continuation of the amplitude across the cuts are called the unphysical sheets. I want to make clear this fundamental fact: the assumption of analyticity of the scattering amplitude as a function of complex energy means that its values on sections of the real axis, where the values of energy correspond to the physical scattering process, define the scattering amplitude on all its Riemann sheets. In particular, for many types of scattering processes the amplitude had to have poles on the first unphysical sheet. These poles were the manifestations of resonances: experimentally seen enhancements of the cross-section, related in solvable models of scattering (e.g. nonrelativistic scattering described by the Schrödinger equation) to short-lived quasibound states of the scattered particles, and therefore also attributed, in the relativistic description, to the existence of short-lived unstable particles.
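How a pole off the real axis shows up as an enhancement on the real axis can be seen in a toy example. The sketch below assumes a single-pole (Breit–Wigner type) amplitude $A(E) = (\Gamma/2) / (E_0 - E - i\Gamma/2)$; this form, the function name and the parameter values are illustrative assumptions, not formulas from the text.

```python
def breit_wigner(energy, e0=1.0, width=0.1):
    """Toy amplitude with a single pole at e0 - i*width/2 (assumed form)."""
    return (width / 2) / (e0 - energy - 1j * width / 2)

# Scan real energies and locate the maximum of |A|^2.
energies = [0.5 + 0.001 * k for k in range(1001)]
peak = max(energies, key=lambda e: abs(breit_wigner(e)) ** 2)

print(peak)                           # close to e0, the real part of the pole
print(abs(breit_wigner(1.05)) ** 2)   # about one half of the peak value
```

The real part of the pole position fixes where the bump sits, and its distance from the real axis fixes the width: the closer the pole, the narrower and longer-lived the resonance.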
Also using suggestions from the expansions of the scattering amplitude obtained in perturbation theory, the so-called double dispersion relations, written both in the complex $s$ and $t$ planes, were postulated, and some verifiable (and verified!) conclusions followed from them.
Another astonishing concept was put forward by T. Regge [5]. He considered the so-called partial wave expansion of the nonrelativistic scattering amplitude $A(q^2, t)$
$$A(q^2, t) = f(q^2, \cos\theta) = \sum_{l=0}^{\infty} (2l + 1)\, A_l(q^2)\, P_l(\cos\theta)$$
where $P_l(z)$ are the Legendre polynomials. The $A_l(q^2)$ are called the partial wave amplitudes and describe the scattering at a given angular (orbital) momentum. The sum runs over integers only, because in quantum physics the angular momentum is quantized, i.e. it can take on values only from a discrete countable set. Regge had, however, the idea of considering the angular momentum in the complex plane!
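For a finite number of partial waves the expansion is easy to evaluate. The following sketch sums $(2l + 1) A_l P_l(\cos\theta)$ with NumPy's Legendre-series evaluator; the helper name and the sample amplitudes are hypothetical and serve only to illustrate the formula at fixed $q^2$.

```python
import numpy as np
from numpy.polynomial import legendre

def amplitude_from_partial_waves(partial_amplitudes, cos_theta):
    """Sum over l of (2l + 1) * A_l * P_l(cos_theta) for a finite list of A_l."""
    a = np.asarray(partial_amplitudes, dtype=complex)
    l = np.arange(len(a))
    # legval evaluates a Legendre series with the given coefficients.
    return legendre.legval(cos_theta, (2 * l + 1) * a)

# Hypothetical partial wave amplitudes A_0, A_1, A_2 at some fixed q^2.
sample = [0.3 + 0.1j, 0.05j, -0.02]
for z in (-1.0, 0.0, 0.5, 1.0):
    print(z, amplitude_from_partial_waves(sample, z))
```

A single wave with $l = 2$ and unit amplitude gives $5 P_2(z) = 5(3z^2 - 1)/2$, which is a convenient hand check of the weights $2l + 1$.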
He studied the nonrelativistic scattering for a reasonable class of potentials (a superposition of Yukawa potentials) and was able to show that
$$\frac{i}{2} \oint_{C} dl\, (2l + 1)\, A(l, q^2)\, \frac{P_l(-\cos\theta)}{\sin(\pi l)}$$
where the contour $C$ encircled the positive semiaxis clockwise (so it was, in fact, the sum of small circles around all positive integers), he could deform the contour $C$ by moving its ends at $\infty \pm i\epsilon$ to $-\frac{1}{2} \pm i\infty$. As the result he got
$$f(q^2, \cos\theta) = \frac{i}{2} \int_{-\frac{1}{2} - i\infty}^{-\frac{1}{2} + i\infty} dl\, (2l + 1)\, A(l, q^2)\, \frac{P_l(-\cos\theta)}{\sin(\pi l)} - \pi \sum_{n \geq 1} \frac{(2\alpha_n(q^2) + 1)\, \beta_n(q^2)\, P_{\alpha_n(q^2)}(-\cos\theta)}{\sin(\pi \alpha_n(q^2))}$$
where the sum runs over all poles (called since then the Regge poles) of $A(l, q^2)$ in the half plane of the complex $l$, $\operatorname{Re} l > -\frac{1}{2}$.
The most exciting part came from the fact that for $q^2 < 0$ (we call it below threshold) all these poles lie on the real axis and correspond precisely to bound states of the potential, at energies ($q^2$) at which $\alpha_n(q^2)$ equals an integer, this integer being the angular momentum of the given bound state! When $q^2$ grows above the threshold (becomes positive) the $\alpha_n(q^2)$ move into the complex plane, and when at some $q_r$ the real part of one of them crosses an integer, the scattering amplitude has the form
$$\frac{a}{(q^2 - q_r^2)\, b + i \operatorname{Im} \alpha_n(q_r^2)}$$
characteristic of a resonance. In this way bound states and resonances were grouped into Regge trajectories originating from the same $\alpha_n(q^2)$.
It was then immediately conjectured that the relativistic scattering amplitude shows the same (or analogous) behaviour in the complex angular momentum plane. Though many actual resonances were grouped into Regge trajectories, other conclusions were not verified experimentally, which was attributed to the hypothetical existence of branch points of the scattering amplitude in the complex angular momentum plane. When such branch points were included, the theory lost its beautiful simplicity and its predictive power was considerably limited. Because of that its attractiveness paled, and though it is still considered that bound states and resonances actually form families lying on Regge trajectories, not much importance is attributed to this fact any more.
The amazing fact that elements of the analytic structure of the scattering amplitude, as a function of the complex energy and momentum transfer, have direct physical meaning led some physicists to think that just the proper analytic properties of the scattering amplitude, compatible with the fundamental physical conditions (like crossing symmetry or unitarity), could form the correct set of assumptions on which to build a complete theory of the phenomena concerning elementary particles. This point of view later fell out of fashion in view of the spectacular success of the developments of QFT, which now take the shape of Nonabelian Gauge Field Theory.
Nevertheless the lesson that functions describing the physical observations in
terms of the physically measurable parameters must be studied for complex
values of these parameters because the analytic properties of such functions
have direct relation to true physical phenomena underlying the observations,
is now deeply rooted in the thinking of physicists.
References
1.
2.
3.
4.
5.
6.
7.
8.
9.
16
Bernard Candelpergher
with respect to x and y, satisfying certain equations known as the Cauchy-Riemann equations.
Indeed, let us consider the functions
\[ \phi : (x, y) \mapsto \operatorname{Re} f(x + iy), \qquad \psi : (x, y) \mapsto \operatorname{Im} f(x + iy). \]
It is easy to check that the differentiability of f with respect to z implies that the functions $\phi$ and $\psi$ are differentiable with respect to x and y, and that
\[ f'(x + iy) = \partial_x \phi(x, y) + i\, \partial_x \psi(x, y) = \frac{1}{i} \big( \partial_y \phi(x, y) + i\, \partial_y \psi(x, y) \big) \]
and hence the partial derivatives satisfy the Cauchy-Riemann equations:
\[ \partial_x \phi = \partial_y \psi, \qquad \partial_y \phi = -\partial_x \psi. \]
The properties of holomorphic functions on an open subset U of C are therefore much more striking than those of functions of a real variable. In particular, a function that is holomorphic on U \ {a} and has a finite limit at a extends to a holomorphic function on U (this is the Riemann theorem).
1.2 Integrals
Let f be a holomorphic function on an open subset U of C and $\gamma$ a path in U (so $\gamma$ is a piecewise continuously differentiable function on an interval [a, b] with values in U; if $\gamma(a) = \gamma(b)$, we say that $\gamma$ is a closed path). We write
\[ \int_\gamma f(z)\, dz = \int_a^b f(\gamma(t))\, \gamma'(t)\, dt. \]
A natural question is how this integral depends on the path $\gamma$, and in particular what happens if we deform the path continuously while remaining in U. The concept of homotopy allows us to make this precise: two paths $\gamma_0$ and $\gamma_1$ with the same endpoints (or two closed paths) are homotopic in U if there exists a family $\gamma_s$ of intermediate paths (resp. of closed paths) between $\gamma_0$ and $\gamma_1$, having the same endpoints as $\gamma_0$ and $\gamma_1$, which depends continuously on the parameter $s \in [0, 1]$.
The homotopy theorem
If f is holomorphic in U, and if $\gamma_1$ and $\gamma_2$ are two paths with the same endpoints, or else two closed paths, which are homotopic in U, then
\[ \int_{\gamma_1} f(z)\, dz = \int_{\gamma_2} f(z)\, dz. \]
Since the integral along a closed path consisting of a single point $z_0$ (i.e., the closed path $t \mapsto z_0$ for all t) is zero, it follows from the homotopy theorem that if f is holomorphic in U and if we can continuously contract a closed path $\gamma$ down to a point $z_0$ in U while remaining all the time in U, then we have
\[ \int_\gamma f(z)\, dz = 0. \]
Connected open sets U (i.e., ones consisting of a single piece) for which every closed path in U is homotopic in U to a single point in U are called simply connected.
We deduce from the above that if f is holomorphic on a simply connected open set U and $z_0$ is a point of U, then for every closed path $\gamma$ in U we have
\[ \int_\gamma \frac{f(z) - f(z_0)}{z - z_0}\, dz = 0. \]
Since we have
\[ \int_{C(z_0, r)} \frac{1}{z - z_0}\, dz = 2i\pi, \]
with $C(z_0, r)(t) = z_0 + r \exp(it)$, $t \in [0, 2\pi]$, the circle of centre $z_0$ and radius r, then if f is holomorphic on a simply connected open set U and $C(z_0, r) \subset U$, we have Cauchy's formula
\[ f(z_0) = \frac{1}{2i\pi} \int_{C(z_0, r)} \frac{f(z)}{z - z_0}\, dz. \]
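As a quick numerical illustration of Cauchy's formula, the following Python sketch approximates the contour integral over $C(z_0, r)$ by an equally spaced Riemann sum. The test function (the exponential), the point z0, the radius and the number of nodes are illustrative assumptions and are not taken from the text.

```python
import cmath
import math

def cauchy_value(f, z0, r, n_nodes=64):
    """Approximate (1/(2*pi*i)) * integral of f(z)/(z - z0) over the circle C(z0, r)."""
    total = 0j
    dt = 2 * math.pi / n_nodes
    for k in range(n_nodes):
        w = cmath.exp(1j * k * dt)   # point e^{it} on the unit circle
        z = z0 + r * w               # corresponding point of C(z0, r)
        dz = 1j * r * w * dt         # derivative of the parametrisation times dt
        total += f(z) / (z - z0) * dz
    return total / (2j * math.pi)

z0 = 0.3 + 0.2j                      # hypothetical evaluation point
approx = cauchy_value(cmath.exp, z0, r=1.0)
error = abs(approx - cmath.exp(z0))  # difference from the directly computed value
```

For an entire function such as the exponential the equally spaced sum converges very quickly, so a modest number of nodes is enough in this sketch.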
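More generally, the derivatives of f are given by
\[ \partial^n f(z_0) = \frac{n!}{2i\pi} \int_{C(z_0, r)} \frac{f(z)}{(z - z_0)^{n+1}}\, dz. \]
Writing Cauchy's formula at a point z inside the circle as
\[ f(z) = \frac{1}{2i\pi} \int_{C(z_0, r)} \frac{f(u)}{(u - z_0) - (z - z_0)}\, du = \frac{1}{2i\pi} \int_{C(z_0, r)} \frac{f(u)}{u - z_0}\, \frac{1}{1 - \frac{z - z_0}{u - z_0}}\, du \]
and expanding
\[ \frac{1}{1 - \frac{z - z_0}{u - z_0}} = \sum_{n \ge 0} \frac{(z - z_0)^n}{(u - z_0)^n} \]
we obtain
\[ f(z) = \sum_{n=0}^{+\infty} \frac{\partial^n f(z_0)}{n!} (z - z_0)^n. \]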
\[ \int_a^b g(t)\, dt < +\infty \]
and
\[ |f(t, z)| \le g(t) \]
for all $z \in U$, then the function
\[ z \mapsto \int_a^b f(t, z)\, dt \]
is analytic on U.
\[ f(z) = \frac{c_{-m}}{(z - a)^m} + \dots + \frac{c_{-1}}{(z - a)} + \sum_{n=0}^{+\infty} c_n (z - a)^n \quad \text{for every } z \in D(a, r) \setminus \{a\}. \]
This is called the Laurent expansion of f about a, and the singular part
\[ \frac{c_{-m}}{(z - a)^m} + \dots + \frac{c_{-1}}{(z - a)} \]
is called the principal part of f at a.
If a is an essential singularity of f, then the expansion above becomes $\sum_{n=-\infty}^{+\infty} c_n (z - a)^n$ with an infinite number of non-zero $c_n$ such that n < 0.
2.4 Residue theorem
Let U be an open set, $a \in U$ and f an analytic function in U \ {a}. The coefficient $c_{-1}$ of the Laurent expansion of f about a is called the residue of f at a, denoted Res(f, a). This number is all that is needed to calculate the integral of f around a small closed path winding round a.
More precisely, for every closed path $\gamma$ homotopic in U \ {a} to a circle centred at a we have
\[ \int_\gamma f(z)\, dz = 2i\pi \operatorname{Res}(f, a). \]
More generally, if f is analytic in $U \setminus \{a_1, a_2, \dots, a_n\}$, then
\[ \int_\gamma f(z)\, dz = 2i\pi \sum_{i=1}^{n} \operatorname{Res}(f, a_i), \]
where $\gamma$ is a closed path in $U \setminus \{a_1, a_2, \dots, a_n\}$ such that for every i the curve $\gamma$ is homotopic in $U \setminus \{a_i\}$ to a circle centred at $a_i$.
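The residue theorem can be checked numerically on a simple example. The sketch below is only an illustration: the function $e^z / (z(z - 2))$, the two circles and the node count are assumptions chosen for the demonstration, and the residues are computed by hand from the simple poles at 0 and 2.

```python
import cmath
import math

def contour_integral(f, centre, radius, n_nodes=400):
    """Riemann-sum approximation of the integral of f over a circle, traversed anticlockwise."""
    total = 0j
    dt = 2 * math.pi / n_nodes
    for k in range(n_nodes):
        w = cmath.exp(1j * k * dt)
        total += f(centre + radius * w) * (1j * radius * w * dt)
    return total

def f(z):
    # simple poles at z = 0 and z = 2
    return cmath.exp(z) / (z * (z - 2))

res_0 = -0.5                  # Res(f, 0) = e^0 / (0 - 2)
res_2 = math.exp(2) / 2       # Res(f, 2) = e^2 / 2

small = contour_integral(f, 0, 1.0)   # circle enclosing only the pole at 0
large = contour_integral(f, 0, 3.0)   # circle enclosing both poles
```

Here `small` should agree with $2i\pi \operatorname{Res}(f, 0)$ and `large` with $2i\pi (\operatorname{Res}(f, 0) + \operatorname{Res}(f, 2))$, up to discretisation and rounding error.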
21
z
1
1
du
u
0+
Thus the point 0 is a singularity of log, but not an isolated singularity since log cannot be continued analytically to a disc centred at 0. The point 0 is a singularity of log called a branch point.
Let U be a connected open set; then we call any analytic function log on U satisfying $e^{\log(z)} = z$ for all $z \in U$ a branch of the logarithm in U.
We call a continuous function $\theta$ on a connected open set U a branch of the argument if for each $z \in U$ we have $z = |z| e^{i\theta(z)}$.
Every branch of the logarithm in U can be written
\[ \log(z) = \ln(|z|) + i\theta(z), \]
where $\theta$ is a branch of the argument in U. Conversely, each branch of the argument allows us to define a branch of the logarithm, by the above formula.
For example we define a branch of the logarithm on $C \setminus [0, +\infty[$ by
\[ \operatorname{Log}(z) = \ln |z| + i \operatorname{Arg}(z) \]
where Arg is the continuous function on $C \setminus [0, +\infty[$ with values in $]0, 2\pi[$, such that $z = |z| e^{i \operatorname{Arg}(z)}$ for each $z \in C \setminus [0, +\infty[$.
C
for all n}.
rn
f (z) =
lim
N +
an z n .
n0
n0
an z n by multiplying
an n
.
n!
Since $|a_n|$ is bounded by $C / r^n$ with 0 < r < R, it is easy to see that this series has an infinite radius of convergence and defines an analytic function B(f) on the whole of C.
23
et
an
(zt)n dt = an z n .
n!
However, for each z in D(0, R) there exists r such that 0 < |z| < r < R, so
that
+
0
et
n0
|an z n | n
t dt
n!
et Cet|z|/r dt < +.
et
n0
an z n n
t dt =
n!
+
n0
et
an z n n
t dt,
n!
an
(zt)n dt =
n!
et
n0
an z n .
n0
+
0
et B(f )(zt)dt.
L(h)(z) =
ez h()d,
1
1
L(B(f ))( ).
z
z
1
z
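\[ \int_0^{+\infty} e^{-t} B(f)(zt)\, dt = \int_0^{+\infty} e^{-t} B(f)\Big(z_0 \frac{z}{z_0} t\Big)\, dt = \Big(\frac{z_0}{z}\Big) \int_0^{+\infty} e^{-(z_0/z) u} B(f)(z_0 u)\, du \]
and since this last integral converges for $z_0 / z = 1$, it does so for $z_0 / z > 1$, i.e., for z in the segment $]0, z_0]$, and even for those z with $\operatorname{Re}(z_0 / z) > 1$, by the following lemma: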
Lemma 1 (Classical lemma). If a is a locally integrable function on $[0, +\infty[$ such that $\int_0^{+\infty} e^{-t} a(t)\, dt$ converges, then $\int_0^{+\infty} e^{-st} a(t)\, dt$ converges for every s such that Re(s) > 1, and the integral defines an analytic function of s in this half-plane.
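Let us consider the function
\[ z \mapsto \frac{z_0}{z} \int_0^{+\infty} e^{-(z_0/z) u} B(f)(z_0 u)\, du; \]
this function is analytic in the open set consisting of all z such that
\[ \operatorname{Re}\Big(\frac{z_0}{z}\Big) > 1, \]
that is, in the disc $D(z_0/2, |z_0|/2)$, and it coincides with f on the part of the segment $]0, z_0]$ lying in D(0, R); by the uniqueness theorem it therefore equals f on $D(z_0/2, |z_0|/2) \cap D(0, R)$, and so we obtain an analytic continuation of f.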
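The continuation mechanism can be seen on the simplest example, which is an assumption made here for illustration and is not taken from the text: for $f(z) = \sum_{n \ge 0} z^n = 1/(1 - z)$ we have $a_n = 1$, R = 1 and $B(f)(\xi) = e^{\xi}$ in closed form, and the integral $\int_0^{+\infty} e^{-t} B(f)(zt)\, dt$ converges for Re(z) < 1, well outside the disc of convergence. The following Python sketch evaluates the integral with a truncated Simpson rule; the truncation point and the step count are arbitrary choices.

```python
import cmath

def borel_laplace(borel, z, t_max=60.0, n_steps=20000):
    """Approximate the integral of e^{-t} * borel(z*t) over [0, t_max] by Simpson's rule."""
    h = t_max / n_steps              # n_steps is assumed to be even
    def g(t):
        return cmath.exp(-t) * borel(z * t)
    total = g(0.0) + g(t_max)
    for k in range(1, n_steps):
        total += (4 if k % 2 else 2) * g(k * h)
    return total * h / 3

# f(z) = sum z^n = 1/(1 - z): a_n = 1, radius of convergence 1,
# Borel transform B(f)(xi) = sum xi^n / n! = exp(xi).
z = -3.0                             # outside D(0, 1), but with Re(z) < 1
continued = borel_laplace(cmath.exp, z)
exact = 1 / (1 - z)
```

At z = -3 the power series itself diverges, while `continued` should agree with `exact` to within the quadrature error.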
Consider the open set
\[ E(f) = \Big\{ z_0 \in \operatorname{Star}(f) : \text{there exists } \epsilon > 0 \text{ such that } D\Big(\frac{z_0}{2}, \frac{|z_0|}{2} + \epsilon\Big) \subset \operatorname{Star}(f) \Big\}; \]
we shall show that the function $z \mapsto \int_0^{+\infty} e^{-t} B(f)(zt)\, dt$ is defined and analytic in this open set, and therefore provides an analytic continuation of f into E(f). This is a consequence of the preceding discussion together with the following lemma:
\[ \int_0^{+\infty} e^{-t} B(f)(zt)\, dt \]
converges and
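Proof of the Lemma. Take z in E(f); we deform the contour C(z/2, |z|/2) to a slightly bigger contour C surrounding 0 such that if $\xi \in C$ then we have $\operatorname{Re}(z/\xi) < 1$.
By Cauchy's formula we have
\[ f(z) = \frac{1}{2i\pi} \int_C \frac{f(\xi)}{\xi - z}\, d\xi, \]
and, since $\operatorname{Re}(z/\xi) < 1$ on C,
\[ \frac{1}{\xi - z} = \frac{1}{\xi} \int_0^{+\infty} e^{-t} e^{tz/\xi}\, dt. \]
Hence
\[ f(z) = \frac{1}{2i\pi} \int_C \frac{f(\xi)}{\xi} \int_0^{+\infty} e^{-t} e^{tz/\xi}\, dt\, d\xi = \int_0^{+\infty} e^{-t} \Big( \frac{1}{2i\pi} \int_C \frac{f(\xi)}{\xi} e^{zt/\xi}\, d\xi \Big) dt. \]
Moreover
\[ \frac{1}{2i\pi} \int_C \frac{f(\xi)}{\xi} e^{zt/\xi}\, d\xi = \frac{1}{2i\pi} \int_C \frac{f(\xi)}{\xi} \sum_{n \ge 0} \frac{1}{n!} \Big(\frac{z}{\xi} t\Big)^n d\xi = \sum_{n \ge 0} \frac{1}{n!} (zt)^n \frac{1}{2i\pi} \int_{C(0, r)} \frac{f(\xi)}{\xi^{n+1}}\, d\xi = \sum_{n \ge 0} \frac{a_n}{n!} (zt)^n = B(f)(zt). \]
We deduce that the integral
\[ \int_0^{+\infty} e^{-t} B(f)(zt)\, dt \]
converges and equals f(z).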
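\[ \Gamma(1 + \alpha n) = \int_0^{+\infty} e^{-t} t^{\alpha n}\, dt. \]
\[ \sum_{n \ge 0} \frac{a_n}{\Gamma(1 + \alpha n)}\, \xi^n. \]
This series has infinite radius of convergence, and defines an analytic function $B_\alpha(f)$ on the whole complex plane C.
To recover f from $B_\alpha(f)$ we use the fact that
\[ \int_0^{+\infty} e^{-t}\, \frac{a_n z^n}{\Gamma(1 + \alpha n)}\, t^{\alpha n}\, dt = a_n z^n, \]
which gives
\[ f(z) = \int_0^{+\infty} e^{-t} B_\alpha(f)(z t^\alpha)\, dt. \]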
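\[ \int_0^{+\infty} e^{-t} B_\alpha(f)(z t^\alpha)\, dt = \int_0^{+\infty} e^{-t} B_\alpha(f)\Big(z_0 \frac{z}{z_0} t^\alpha\Big)\, dt = \Big(\frac{z_0}{z}\Big)^{1/\alpha} \int_0^{+\infty} e^{-(z_0/z)^{1/\alpha} u} B_\alpha(f)(z_0 u^\alpha)\, du, \]
and since this last integral converges for $(z_0/z)^{1/\alpha} = 1$, it does so also for $(z_0/z)^{1/\alpha} > 1$ and even for $\operatorname{Re} (z_0/z)^{1/\alpha} > 1$.
The function
\[ z \mapsto \Big(\frac{z_0}{z}\Big)^{1/\alpha} \int_0^{+\infty} e^{-(z_0/z)^{1/\alpha} u} B_\alpha(f)(z_0 u^\alpha)\, du \]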
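This is a rather thin convex open set, whose boundary $C_\alpha(z_0)$ has the following equation in polar coordinates (with $z_0 = \rho_0 e^{i\theta_0}$):
\[ \rho = \rho_0 \Big( \cos \frac{\theta - \theta_0}{\alpha} \Big)^{\alpha}, \qquad -\frac{\alpha\pi}{2} < \theta - \theta_0 < \frac{\alpha\pi}{2}. \]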
+
0
+ t
e B (f )(zt )dt
0
et B (f )(zt )dt
is an analytic continuation of f.
Since
E (f ),
Star(f ) =
0<1
4 Gevrey series
4.1 Definitions
If the radius of convergence of $\sum_{n \ge 0} a_n z^n$ is zero, then the power series $F = \sum_{n \ge 0} a_n z^n$ cannot define an analytic function f by the formula
\[ f(z) = \lim_{N \to +\infty} \sum_{n=0}^{N} a_n z^n, \]
n0
|f (z)
n0
n f (0)
,
n!
|f (z)
n0
the above holding for all $N \ge 0$, with constants C > 0 and B > 0 independent of $z \in S$.
This condition implies that
\[ \frac{f(z) - \sum_{n=0}^{N-1} a_n z^n}{z^N} - a_N \to 0 \quad \text{when } z \to 0 \text{ in } S, \]
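\[ |a_n| \le C B^n n!. \]
We say in this case that the series $\sum_{n \ge 0} a_n z^n$ is Gevrey (or Gevrey of order 1).
We shall write this condition of Gevrey asymptoticity in S in the form
\[ f(z) \sim \sum_{n \ge 0} a_n z^n \quad \text{in } S. \]
Taking $N = N_0$ close to $1/(B|z|)$, where the bound $C B^N N!\, |z|^N$ is smallest, Stirling's formula gives
\[ \Big| f(z) - \sum_{n=0}^{N_0 - 1} a_n z^n \Big| \le C B^{N_0} N_0!\, |z|^{N_0} \le A |z|^{-1/2} e^{-\frac{1}{B|z|}} \quad \text{with } A > 0. \]
We therefore have an exponentially small remainder (when $z \to 0$ in S) if we take the sum as far as $N_0$ (this justifies the method of summation up to the smallest term, or the astronomers' method).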
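Summation up to the smallest term can be observed numerically. The example below is an assumption chosen for illustration: the Euler series $\sum_{n \ge 0} (-1)^n n!\, z^n$ (so that B = 1), compared with the function $f(z) = \int_0^{+\infty} e^{-t} / (1 + zt)\, dt$, to which it is asymptotic for z > 0. The Python sketch computes f by a truncated Simpson rule and records the error of each partial sum.

```python
import math

def reference_value(z, t_max=50.0, n_steps=20000):
    """Simpson approximation of the integral of e^{-t} / (1 + z t) over [0, t_max]."""
    h = t_max / n_steps              # n_steps is assumed to be even
    def g(t):
        return math.exp(-t) / (1 + z * t)
    total = g(0.0) + g(t_max)
    for k in range(1, n_steps):
        total += (4 if k % 2 else 2) * g(k * h)
    return total * h / 3

z = 0.1
f_z = reference_value(z)

# errors[N] = |f(z) - sum_{n < N} a_n z^n| with a_n = (-1)^n n!
errors = []
partial = 0.0
for N in range(30):
    errors.append(abs(f_z - partial))
    partial += (-1) ** N * math.factorial(N) * z ** N

best_N = min(range(len(errors)), key=errors.__getitem__)
```

With z = 0.1 the smallest error is expected near $N_0 \approx 1/(B|z|) = 10$; adding further terms makes the partial sums worse, not better.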
Note that this implies that if we have
0z n in S,
f (z)
n0
n0
0z n
f (z)
n0
f (z)
n0
an n
n!
for all in the disc D(0, 1/B), and denes an analytic function in this disc.
For 0 < < 1/B, we can dene an analytic function
f (z) =
1
z
f (z)
n0
5 Borel summability
Let $F = \sum_{n \ge 0} a_n z^n$ be a power series satisfying the Gevrey condition: $|a_n| \le C B^n n!$ for all n. The function
\[ B(F)(\xi) = \sum_{n \ge 0} \frac{a_n}{n!}\, \xi^n \]
is defined and analytic in the disc D(0, 1/B). If we want to avoid the arbitrary choice made above, we try to define the function
\[ f(z) = \frac{1}{z} \int_0^{+\infty} e^{-\xi/z} B(F)(\xi)\, d\xi. \]
To guarantee the existence of the integral we shall suppose that the function B(F) is continued analytically in a sector $S = \{z = re^{i\theta} \text{ with } -\epsilon < \theta < +\epsilon\}$, to give a function of at most exponential growth at infinity in this sector, i.e.,
\[ |B(F)(\xi)| \le A e^{B|\xi|}. \]
In this case we say that the series $\sum_{n \ge 0} a_n z^n$ is Borel-summable in the direction $\theta = 0$. The function f thereby defined is analytic in the domain {z | Re(1/z) > B}, which is just the disc
\[ D = D\Big(\frac{1}{2B}, \frac{1}{2B}\Big) = \Big\{ z = re^{i\theta} \ \Big|\ r < \frac{1}{B} \cos(\theta) \Big\}, \]
1
f (z) =
z
1
cos( )},
B
+ei
0
1
lim
z R+
+
0
|f (z)
n0
The function f defined this way in S is then the only analytic function in S such that
\[ f(z) \sim \sum_{n \ge 0} a_n z^n \quad \text{in } S. \]
We call this the Borel sum of the formal series $F = \sum_{n \ge 0} a_n z^n$, and we write it f = s(F).
In the same way we can define the notion of Borel-summability in a direction $\theta$, and we write
\[ f(z) = \frac{1}{z} \int_0^{+\infty e^{i\theta}} e^{-\xi/z} B(F)(\xi)\, d\xi. \]
The function f defined this way in $e^{i\theta} S$ is the only analytic function in $e^{i\theta} S$ such that
\[ f(z) \sim \sum_{n \ge 0} a_n z^n \quad \text{in } e^{i\theta} S, \]
and we call it the Borel sum in the direction $\theta$ of the formal series $F = \sum_{n \ge 0} a_n z^n$; we denote it $s_\theta(F)$.
Remark. If the series $F = \sum_{n \ge 0} a_n z^n$ has radius of convergence R > 0, then it is Borel-summable in every direction $\theta$ and the Borel sums $s_\theta(F)$ give the analytic continuation of the function $f : z \mapsto \sum_{n \ge 0} a_n z^n$ to an open set containing D(0, R).
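Properties of $s_\theta$
a) $s_\theta$ is linear:
\[ s_\theta(F + G) = s_\theta(F) + s_\theta(G), \qquad s_\theta(c F) = c\, s_\theta(F) \ \text{if } c \in C, \]
since J, $L_\theta$ and B are linear.
b) $s_\theta$ commutes with differentiation $\partial = d/dz$:
\[ s_\theta(\partial F) = \partial s_\theta(F). \]
c) $s_\theta$ is a morphism:
\[ s_\theta(F G) = s_\theta(F)\, s_\theta(G) \]
(where the product FG denotes the usual product of formal series).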
5.1 Connection with the usual Laplace transform
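The integral formula
\[ s_\theta(F)(z) = \frac{1}{z} \int_0^{+\infty e^{i\theta}} e^{-\xi/z} B(F)(\xi)\, d\xi \]
for the Borel sum of $F = \sum_{n \ge 0} a_n z^n$ can be expressed as
\[ s_\theta(F)(z) = \frac{1}{z}\, L_\theta(B(F))\Big(\frac{1}{z}\Big) \]
where
\[ L_\theta(g)(z) = \int_0^{+\infty e^{i\theta}} e^{-z\xi} g(\xi)\, d\xi. \]
With J the map sending g to the function $z \mapsto \frac{1}{z}\, g(\frac{1}{z})$, we then have
\[ s_\theta = J \circ L_\theta \circ B. \]
The behaviour at 0 of $s_\theta(F)$ is then linked to the behaviour at $\infty$ of $L_\theta(B(F))$.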
The asymptoticity condition at 0:
an z n in S ,
f (z)
n0
< < + + },
2
2
translates into the asymptoticity condition at :
1
an n+1 in S, ,
L (B(F ))(z)
z
S = {z = rei | r < R and
n0
n0
L (B(F ))(z) =
< < + + }.
2
2
ez B(F )()d
< < + + },
2
2
and satises
L (B(F ))(z)
an
n0
1
in S, .
z n+1
1
( ) Log( ) + ( ),
2i
+ei
ez ( )d
L ()(z).
35
\[ s_-(F) = s_+(F + e^{-\omega/z} S_\omega F). \]
This formula can be generalized to singularities other than logarithmic ones; it is the basis of the definition of alien derivations due to J. Ecalle. Let us show that $S_\omega$ is a derivation, i.e., that it satisfies
\[ S_\omega(FG) = (S_\omega F) G + F (S_\omega G). \]
Let F and G be two formal series as above, such that we have
\[ s_-(F) = s_+(F + e^{-\omega/z} S_\omega F), \qquad s_-(G) = s_+(G + e^{-\omega/z} S_\omega G). \]
Using the fact that $s_-$ and $s_+$ are morphisms, we deduce that
\[ s_-(FG) = s_+\big( (F + e^{-\omega/z} S_\omega F)(G + e^{-\omega/z} S_\omega G) \big) = s_+\big( FG + e^{-\omega/z} (S_\omega F) G + e^{-\omega/z} F (S_\omega G) + e^{-2\omega/z} (S_\omega F)(S_\omega G) \big). \]
We see that the product of two formal power series F and G such that B(F) and B(G) have singularities at $\omega$ can have one at $\omega$, but the exponential $e^{-2\omega/z}$ shows us that we can also have a singularity at $2\omega$.
On the other hand, we have as above
\[ s_-(FG) = s_+\big( FG + e^{-\omega/z} S_\omega(FG) + e^{-2\omega/z} S_{2\omega}(FG) \big) \]
where $S_{2\omega}(FG)$ represents the singularity of B(FG) at $2\omega$.
Equating the coefficients of the exponentials, we obtain
\[ S_\omega(FG) = (S_\omega F) G + F (S_\omega G), \]
or, in other words, the mapping $S_\omega$ is a derivation; it is also written $\Delta_\omega$.
More generally, in order to take arbitrary products of power series, it is therefore necessary to allow B(F) the possibility of singularities at the points $n\omega$, n = 1, 2, . . . The ambiguity in summation is then described by all the $S_{n\omega}$, since
\[ s_-(F) = s_+(F + e^{-\omega/z} S_\omega F + e^{-2\omega/z} S_{2\omega} F + \dots). \]
The mappings $S_{n\omega}$ are defined as above, but for $n \ge 2$ they are not derivations; for example, we have
\[ S_{2\omega}(FG) = (S_{2\omega}(F)) G + F (S_{2\omega}(G)) + S_\omega(F) S_\omega(G). \]
We can construct derivations $\Delta_{n\omega}$ by suitable combination of the $S_{k\omega}$. To find this combination, we use the mapping
(1)n1
(S())n .
n
1
(s+ (F ) + s (F )),
2
37
we do obtain a real sum for real z, although it does not necessarily have the
property
s(F G) = s(F )s(G).
In order to obtain a real sum with this property, we introduce the operator C
dened by
C(F )(z) = F (z).
We have C(F ) = F if F is a series n0 an z n where the an are real; in this
case we wish to determine a sum f of F such that
C(f ) = f.
We may see from the explicit formula for Borel summation that s+ C = Cs .
Since
C 2 = I and s1
+ s = e ,
we deduce that
Cs+ e/2 = s+ e/2 C.
This implies that
Cs+ e/2 (F ) = s+ e/2 (F ).
In other words, the function
s(F ) = s+ e/2 (F )
has the property that s(F )(x) is real if x is real, and
s(F G) = s(F )s(G).
6 Acknowledgments
I am indebted for the quality of the English version of this paper to Jonathan Partington from Leeds University who kindly agreed to translate it. My
warmest thanks to him.
References
1. E. Borel, Leçons sur les séries divergentes, Gabay, 1988.
2. B. Candelpergher, Une introduction à la résurgence, La Gazette des Mathématiciens, SMF, 42, 1989.
3. J. Ecalle, Les fonctions résurgentes, Publ. Math., Orsay, 1985.
4. B. Malgrange, Sommation des séries divergentes. Expo. Math. 13:163-222, 1995.
5. G. Sansone, J. Gerretsen, Lectures on the theory of functions of a complex variable, Noordhoff, Groningen, 1960.
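\[ \frac{a_0}{2} + \sum_{k=1}^{\infty} \Big( a_k \cos \frac{2\pi k t}{T} + b_k \sin \frac{2\pi k t}{T} \Big) \]
where
\[ a_k = \frac{2}{T} \int_0^T f(t) \cos \frac{2\pi k t}{T}\, dt \]
and
\[ b_k = \frac{2}{T} \int_0^T f(t) \sin \frac{2\pi k t}{T}\, dt, \]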
\[ f(t) \sim 2 \sum_{n=1}^{\infty} (-1)^{n+1}\, \frac{\sin nt}{n} \]
(the cosine terms vanish). The Fourier series converges to the function except at odd multiples of $\pi$, where it is discontinuous.
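The behaviour of this series is easy to observe numerically. The sketch below assumes that the function in question is the $2\pi$-periodic sawtooth with f(t) = t on $]-\pi, \pi[$, which is the function having this sine series; the evaluation points and the number of terms are illustrative choices.

```python
import math

def sawtooth_partial_sum(t, n_terms):
    """Partial sum 2 * sum_{n=1}^{N} (-1)^{n+1} sin(n t) / n."""
    return 2 * sum((-1) ** (n + 1) * math.sin(n * t) / n
                   for n in range(1, n_terms + 1))

inside = sawtooth_partial_sum(1.0, 2000)        # point of continuity, where f(1) = 1
at_jump = sawtooth_partial_sum(math.pi, 2000)   # an odd multiple of pi
```

At the point of continuity the partial sums approach f(1) = 1 (slowly, with an error of order 1/N), whereas at the jump every term vanishes and the sum stays at 0, the midpoint of the two one-sided limits.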
We have used the symbol $\sim$ rather than = above, since, even for continuous functions, the Fourier series need not converge pointwise. However, if f is $C^1$ (has a continuous derivative), then in fact there is no problem and the series converges absolutely. For all continuous functions the partial sums
J.-D. Fournier et al. (Eds.): Harm. Analysis and Ratio. Approx., LNCIS 327, pp. 3955, 2006.
Springer-Verlag Berlin Heidelberg 2006
40
Jonathan R. Partington
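\[ s_n(f)(t) = \frac{a_0}{2} + \sum_{k=1}^{n} \Big( a_k \cos \frac{2\pi k t}{T} + b_k \sin \frac{2\pi k t}{T} \Big) \]
satisfy
\[ \int_0^T |f(t) - s_n(f)(t)|^2\, dt \to 0 \quad \text{as } n \to \infty, \]
and there are other famous results in the literature, such as Fejér's theorem, which asserts that the Cesàro averages
\[ \sigma_m(f) = \frac{1}{m+1} \big( s_0(f) + \dots + s_m(f) \big) \]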
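\[ f(t) \sim \sum_{k=-\infty}^{\infty} c_k e^{2\pi i k t / T}, \]
where
\[ c_k = \frac{1}{T} \int_0^T f(t) e^{-2\pi i k t / T}\, dt \]
are the complex Fourier coefficients of f, often written $c_k = \hat f(k)$. Indeed, the real and complex coefficients are related by the identities
\[ a_k = \hat f(k) + \hat f(-k) \quad \text{and} \quad b_k = i \big( \hat f(k) - \hat f(-k) \big), \]
and
\[ s_n(f) = \sum_{k=-n}^{n} c_k e^{2\pi i k t / T}, \]
as is easily verified.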
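Underlying all this theory is an inner-product structure, and the basic orthogonality relation
\[ \frac{1}{T} \int_0^T e^{2\pi i j t / T} e^{-2\pi i k t / T}\, dt = \begin{cases} 1 & \text{if } j = k, \\ 0 & \text{otherwise,} \end{cases} \]
which leads to Parseval's identity
\[ \frac{1}{T} \int_0^T |f(t)|^2\, dt = \sum_{k=-\infty}^{\infty} |\hat f(k)|^2. \]
This expresses the idea that the energy in a signal is the sum of the energies in each mode.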
Fourier series can be used to study the vibrating string (wave equation), as well as the heat equation, which was Fourier's original motivation. We illustrate this by an example.
The temperature in a rod of length $\pi$ with ends held at zero temperature is governed by the heat equation
\[ \frac{\partial^2 y}{\partial x^2} = \frac{1}{K^2} \frac{\partial y}{\partial t}, \]
with boundary conditions $y(0, t) = y(\pi, t) = 0$. Suppose an initial temperature distribution y(x, 0) = F(x).
We look for solutions y(x, t) = f(x)g(t), so that
\[ f''(x) g(t) = f(x) g'(t) / K^2, \]
or
\[ \frac{f''(x)}{f(x)} = C = \frac{1}{K^2} \frac{g'(t)}{g(t)}. \]
It turns out we should take f(x) = sin nx (times a constant), and $C = -n^2$, in which case
\[ g'(t) + K^2 n^2 g(t) = 0. \]
Thus one solution is
\[ y(x, t) = f(x) g(t), \]
with
\[ f(x) = \sin nx \quad \text{and} \quad g(t) = a_n e^{-K^2 n^2 t}. \]
Superposing such solutions, with the coefficients $a_n$ taken from the sine series of the initial data,
\[ y(x, 0) = F(x) = \sum_{n=1}^{\infty} a_n \sin nx, \]
we obtain
\[ y(x, t) = \sum_{n=1}^{\infty} a_n \sin nx\; e^{-K^2 n^2 t}. \]
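The series solution can be evaluated directly. The following Python sketch is an illustration under stated assumptions: the initial temperature $F(x) = x(\pi - x)$, the value K = 1 and the truncation at 50 terms are hypothetical choices, and the coefficients $a_n = \frac{2}{\pi} \int_0^{\pi} F(x) \sin nx\, dx$ are computed by Simpson's rule.

```python
import math

def sine_coefficient(F, n, n_steps=2000):
    """a_n = (2/pi) * integral of F(x) sin(n x) over [0, pi], by Simpson's rule."""
    h = math.pi / n_steps            # n_steps is assumed to be even
    def g(x):
        return F(x) * math.sin(n * x)
    total = g(0.0) + g(math.pi)
    for k in range(1, n_steps):
        total += (4 if k % 2 else 2) * g(k * h)
    return (2 / math.pi) * total * h / 3

def temperature(x, t, coeffs, K=1.0):
    """y(x, t) = sum a_n sin(n x) exp(-K^2 n^2 t), truncated to len(coeffs) terms."""
    return sum(a * math.sin(n * x) * math.exp(-(K * n) ** 2 * t)
               for n, a in enumerate(coeffs, start=1))

F = lambda x: x * (math.pi - x)      # illustrative initial temperature
coeffs = [sine_coefficient(F, n) for n in range(1, 51)]

y_start = temperature(math.pi / 2, 0.0, coeffs)   # should be close to F(pi/2)
y_later = temperature(math.pi / 2, 1.0, coeffs)   # dominated by the n = 1 mode
```

For this F the exact coefficients are $8/(\pi n^3)$ for odd n and 0 for even n, so after a short time the solution is essentially the first mode $a_1 \sin x\, e^{-K^2 t}$, which shows how quickly the high modes are damped.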
\[ \hat f(w) = \int_{-\infty}^{\infty} f(t) e^{-iwt}\, dt. \]
This is a function of w, which is sometimes interpreted as denoting frequency, while the variable t denotes time. WARNING: one can find various alternative expressions in the literature, for example
\[ \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} f(t) e^{-iwt}\, dt \quad \text{or} \quad \int_{-\infty}^{\infty} f(t) e^{-2\pi i w t}\, dt. \]
Each has its advantages and disadvantages, so we have had to make a choice. On another day we might prefer a different one.
Here is an important example. If
\[ f(x) = e^{-x^2/2} \]
then
\[ \hat f(w) = \sqrt{2\pi}\, e^{-w^2/2}, \]
that is, the Gaussian function is (up to a constant) the same as its Fourier transform.
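The Gaussian example can be confirmed numerically with the convention $\hat f(w) = \int f(t) e^{-iwt}\, dt$. The sketch below uses a truncated trapezoidal rule; the truncation interval, the step count and the test frequency are illustrative assumptions.

```python
import cmath
import math

def fourier_transform(f, w, t_max=12.0, n_steps=2400):
    """Trapezoidal approximation of the integral of f(t) e^{-iwt} over [-t_max, t_max]."""
    h = 2 * t_max / n_steps
    total = 0j
    for k in range(n_steps + 1):
        t = -t_max + k * h
        weight = 0.5 if k in (0, n_steps) else 1.0   # half weight at the two endpoints
        total += weight * f(t) * cmath.exp(-1j * w * t)
    return total * h

gaussian = lambda x: math.exp(-x * x / 2)
w = 1.5
numeric = fourier_transform(gaussian, w)
closed_form = math.sqrt(2 * math.pi) * math.exp(-w * w / 2)
```

The trapezoidal rule is used because it is extremely accurate for smooth, rapidly decaying integrands such as this one.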
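In the same way that one can reconstruct a function from its Fourier series, it is possible to get back from the Fourier transform to the original function. Accordingly, define the inverse Fourier transform by
\[ \check g(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} g(w) e^{iwt}\, dw = \frac{1}{2\pi}\, \hat g(-t). \tag{1} \]
\[ \int_{-\infty}^{\infty} |f(t)|\, dt < \infty \quad \text{and} \quad \int_{-\infty}^{\infty} |\hat f(w)|\, dw < \infty, \]
\[ f(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \hat f(w) e^{iwt}\, dw. \tag{2} \]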
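\[ \int_{-\infty}^{\infty} |f(t)|^2\, dt < \infty \]
(more concisely: if f and $\hat f$ lie in $L^1(R)$ and f also lies in $L^2(R)$), then
\[ \int_{-\infty}^{\infty} |f(t)|^2\, dt = \frac{1}{2\pi} \int_{-\infty}^{\infty} |\hat f(w)|^2\, dw. \]
\[ \hat f(w) = \int_{R^n} f(x) e^{-iw.x}\, dx, \]
\[ f(x) = \Big( \frac{1}{2\pi} \Big)^n \int_{R^n} \hat f(w) e^{iw.x}\, dw, \]
\[ \Delta f = \sum_{k=1}^{n} \frac{\partial^2 f}{\partial x_k^2} \]
has transform equal to $-\|w\|^2 \hat f(w)$; we shall not go into further details here.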
1.3 Harmonic and analytic functions
For simplicity, let us consider $2\pi$-periodic functions f. These correspond to functions g defined on the unit circle
\[ T = \{z \in C : |z| = 1\} \]
in the complex plane, by setting $g(e^{it}) = f(t)$. Conversely, any function $g : T \to C$ gives a $2\pi$-periodic function f by the same formula.
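Note that the formula for the Fourier coefficients can be written
\[ \hat g(k) = \frac{1}{2\pi} \int_0^{2\pi} g(e^{it}) e^{-ikt}\, dt = \frac{1}{2\pi i} \oint \frac{g(z)}{z^{k+1}}\, dz, \tag{3} \]
where the last integral is a contour integral round the unit circle.
Suppose (for simplicity) that g is continuous. Then it has a harmonic extension to the unit disc
\[ D = \{z \in C : |z| < 1\}, \]
namely
\[ g(re^{i\theta}) = \sum_{k=-\infty}^{\infty} \hat g(k) r^{|k|} e^{ik\theta}. \]
When $\hat g(k) = 0$ for all k < 0 this becomes
\[ \sum_{k=0}^{\infty} \hat g(k) r^k e^{ik\theta}, \]
or
\[ g(z) = \sum_{k=0}^{\infty} \hat g(k) z^k, \]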
|
g (k)| Ma a ,
and
(4)
45
for k 0, where Mr denotes the maximum value of |g| on the circle of radius r.
If now g has an isolated simple pole at a point z0 with A < |z0 | < 1, with
residue c, but is otherwise analytic in the annulus A, then the identity
c
=c
z z0
k=1
z0k1
,
zk
k=0
zk
z0k+1
\[ \hat g(k) = \frac{1}{2\pi} \int_0^{2\pi} g(e^{it}) e^{-ikt}\, dt = \frac{1}{2\pi i} \oint \frac{g(z)}{z^{k+1}}\, dz. \]
In order to compute Fourier transforms numerically from data, a natural approximation to the above integral is obtained by discretising. Let us take N equally-spaced points: to do this set $\omega = e^{2\pi i / N}$ and consider the expression
\[ g_N(k) = \frac{1}{N} \sum_{j=0}^{N-1} g(\omega^j)\, \omega^{-jk}. \]
gN (eit ) =
gN (k)eikt ,
k=N/2
The Fast Fourier Transform (FFT) was introduced by Cooley and Tukey
as a numerical algorithm for computing the discrete Fourier coecients of g
\[ g_w(e^{it}) = \sum_{k=-\infty}^{\infty} g_N(k)\, w_k\, e^{ikt}, \]
where $(w_k)$ is a sequence of weights, of which usually only finitely many are non-zero. For example, for $0 \le m < N$ we may take the sequence
\[ w_k = \begin{cases} \dfrac{m + 1 - |k|}{m + 1} & \text{for } |k| \le m, \\ 0 & \text{otherwise,} \end{cases} \]
in which case the corresponding functions $g_w$ form a sequence of trigonometric polynomials known as the Jackson polynomials, $J_{m,N}(g)$. These have many attractive properties, in particular they converge uniformly to the original function g as $N \to \infty$, for any sequence of m = m(N) remaining less than N but also tending to infinity. They are also robust, in the sense that small measurement errors or perturbations lead to small errors in the polynomials. For rather more rapid convergence, one may use the discrete de la Vallée Poussin polynomials, $V_{m,N}(g)$, defined for $N \ge 3m$ using the following window:
\[ w_k = \begin{cases} 1 & \text{for } |k| \le m, \\ \dfrac{2m - |k|}{m} & \text{for } m \le |k| \le 2m, \\ 0 & \text{otherwise.} \end{cases} \]
These have been used in various interpolation and approximation schemes, for example in the identification of linear systems from noisy frequency-domain data.
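The discrete coefficients and the Jackson window translate directly into code. The Python sketch below evaluates $g_N(k)$ by the defining sum (not by an FFT) and then forms $J_{m,N}(g)$; the test function $g(z) = z + 0.5/z$ and the values N = 16, m = 3 are assumptions chosen so that the result can be checked by hand.

```python
import cmath
import math

def discrete_coefficient(g, k, N):
    """g_N(k) = (1/N) * sum_{j=0}^{N-1} g(omega^j) omega^{-jk}, with omega = e^{2 pi i / N}."""
    total = 0j
    for j in range(N):
        omega_j = cmath.exp(2j * math.pi * j / N)
        total += g(omega_j) * omega_j ** (-k)
    return total / N

def jackson(g, m, N, t):
    """Evaluate J_{m,N}(g) at e^{it} using the weights w_k = (m + 1 - |k|) / (m + 1)."""
    value = 0j
    for k in range(-m, m + 1):
        weight = (m + 1 - abs(k)) / (m + 1)
        value += weight * discrete_coefficient(g, k, N) * cmath.exp(1j * k * t)
    return value

g = lambda z: z + 0.5 / z                   # g(e^{it}) = e^{it} + 0.5 e^{-it}
c_plus = discrete_coefficient(g, 1, 16)     # coefficient of e^{it}
c_minus = discrete_coefficient(g, -1, 16)   # coefficient of e^{-it}
smoothed = jackson(g, 3, 16, 0.0)
```

For this trigonometric polynomial the discrete coefficients are exact, and at t = 0 the Jackson polynomial with m = 3 weights both non-zero modes by 3/4. In practice one would compute all the $g_N(k)$ at once with an FFT; the direct sum is used here only to mirror the formula.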
47
and
and
Similarly, derivatives transform in a simple fashion: if f is an $L^1(R)$ function with a continuous derivative, such that $\int_R |f'| < \infty$, then
\[ \widehat{(f')}(w) = iw \hat f(w). \]
In particular, there is a constant C > 0 such that $|\hat f(w)| \le C / |w|$. This argument can be repeated with higher derivatives, and we obtain the slogan: the smoother the function, the faster its Fourier transform decays. A similar phenomenon holds for Fourier series of periodic functions: for smooth functions the Fourier coefficients tend rapidly to zero.
By means of the inversion theorem, we can argue in the other direction too: if $\hat f$ is smooth, then this corresponds to rapid decay of f at $\pm\infty$.
In many applications, it is convenient to work with smooth functions of rapid decay. Thus we define the Schwartz class, S, to be the class of all infinitely differentiable functions $f : R \to C$ such that every derivative is rapidly decreasing: thus, for all n, k, there is $C_{n,k} > 0$ such that
\[ |f^{(n)}(t)| \le \frac{C_{n,k}}{(1 + |t|)^k} \]
for all $t \in R$. A simple example is $\exp(-at^2)$ with a > 0, but one can even find such functions with compact support (so-called bump functions).
Now if $\int_R |t^k f(t)|\, dt < \infty$, it follows that $\hat f$ is differentiable k times, and
\[ (\hat f)^{(k)}(w) = \int_{-\infty}^{\infty} (-it)^k f(t) e^{-iwt}\, dt. \]
This can be used to show that the Fourier transform is a linear bijection from S onto itself.
Suppose now that f is smooth apart from jump discontinuities of the function and its derivatives at the origin, so that we may define
\[ \delta_k = \lim_{t \to 0+} f^{(k)}(t) - \lim_{t \to 0-} f^{(k)}(t) \]
\[ \hat f(w) \sim \sum_{k=0}^{\infty} \frac{\delta_k}{(iw)^{k+1}} \quad \text{as } |w| \to \infty. \]
Moreover, since the Fourier transform and inverse Fourier transform are related by (1), we may similarly conclude that jumps $\epsilon_k$ in $\hat f$ and its derivatives at the origin are reflected in an asymptotic expansion
\[ f(t) \sim \frac{1}{2\pi} \sum_{k=0}^{\infty} \frac{\epsilon_k}{(-it)^{k+1}} \quad \text{as } |t| \to \infty. \]
Finally, the expansions corresponding to jumps occurring at other points on the real line may be derived by a straightforward change of variables.
We now consider the case when f has an analytic extension to a horizontal band B = {A < Im z < B}, where A < 0 and B > 0. Then certain estimates hold for the Fourier transform, which are analogous to those obtained for Fourier coefficients in (4). If we take 0 < b < B and suppose that f is absolutely integrable on the line {Im z = b}, tending to zero uniformly in B as $\operatorname{Re} z \to \pm\infty$, then we can move the contour of integration, and obtain the estimate
|f(w)|
2iceiwz0
0
if w < 0,
if w > 0.
0
2iceiwz0
if w < 0,
if w > 0.
49
The extensions to poles in the lower half-plane, to a finite number of poles, and to poles of multiple order, are very similar and we omit them. As before, we may exchange the roles of f and $\hat f$, using the identity (1), so that singularities in $\hat f$ are reflected in the asymptotic behaviour of f.
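4 Wiener's theorems
Suppose that a $2\pi$-periodic function f has an absolutely convergent Fourier series, that is
\[ f(t) = \sum_{k=-\infty}^{\infty} \hat f(k) e^{ikt}, \qquad \text{with } \sum_{k=-\infty}^{\infty} |\hat f(k)| < \infty. \]
If g is another such function, so that
\[ f(t) = \sum_{k=-\infty}^{\infty} \hat f(k) e^{ikt} \quad \text{and} \quad g(t) = \sum_{k=-\infty}^{\infty} \hat g(k) e^{ikt}, \]
then
\[ f(t) g(t) = \sum_{k=-\infty}^{\infty} c_k e^{ikt}, \]
where
\[ c_k = \sum_{j=-\infty}^{\infty} \hat f(j)\, \hat g(k - j), \]
and so
\[ \sum_{k=-\infty}^{\infty} |c_k| \le \sum_{k=-\infty}^{\infty} \sum_{j=-\infty}^{\infty} |\hat f(j)|\, |\hat g(k - j)| = \sum_{j=-\infty}^{\infty} \sum_{l=-\infty}^{\infty} |\hat f(j)|\, |\hat g(l)| < \infty, \]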
f (z) =
ak z k
k=0
f (x y)g(y) dy,
and
(f g)(w) = f(w)
g (w).
f (x) dx,
g(x) dx.
51
f (t)est dt,
f (t) = lim
b+iy
biy
F (s)est ds,
est f (t) dt
= [est f (t)]
t=0 + s
est f (t) dt
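\[ g(t) = \begin{cases} f(t - T) & \text{if } t \ge T, \\ 0 & \text{if } t < T. \end{cases} \]
Then
\[ (Lg)(s) = \int_0^{\infty} e^{-st} g(t)\, dt = \int_T^{\infty} e^{-st} f(t - T)\, dt = \int_0^{\infty} e^{-s(x + T)} f(x)\, dx = e^{-sT} (Lf)(s). \]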
\[ \|F\|_2 := \sup_{x > 0} \Big( \int_{-\infty}^{\infty} |F(x + iy)|^2\, dy \Big)^{1/2} < \infty, \]
\[ \|F\|_2 = \sqrt{2\pi}\, \|f\|_2. \]
This is the basis of various approaches to control theory and approximation theory. One consequence is that if we have an input/output relation
\[ Y(s) = G(s) U(s) \]
as in our examples, then we can decide whether $L^2$ inputs (finite energy) guarantee $L^2$ outputs. The answer is that it is necessary and sufficient that G(s) be analytic and bounded in $C_+$ (i.e., lie in the Hardy class $H^\infty(C_+)$). For example, if
\[ y'(t) + a y(t - 1) = u(t), \]
then we have this form of stability precisely when
\[ \frac{1}{s + a e^{-s}} \in H^\infty(C_+), \]
5.2 Mellin
Note that the Laplace transform is closely related to the Fourier transform (put s = iw), if we consider only functions which are 0 on the negative real axis. A still closer analogue is the bilateral Laplace transform, where we define
\[ G(s) = \int_{-\infty}^{\infty} f(t) e^{-st}\, dt = \hat f(-is), \]
F (s) =
xs1 ex dx = (s),
1
lim
2i y
a+iy
aiy
F (s)xs ds,
u(r, 1) =
1 for 0 r 1,
0 otherwise.
(This might represent the heat ow in a piece of cake, heated on one side
only.) Then it is easily veried that
U (s, ) =
1 sin s
,
s sin s
u(r, ) =
1
2i
a+i
ai
rs sin s
ds,
s sin s
(1)n rn
sin n.
n
n=1
55
References
1. Y. Katznelson, An introduction to harmonic analysis. Dover Publications, Inc., New York, 1976.
2. T.W. Körner, Fourier analysis. Cambridge University Press, Cambridge, 1988.
3. M. Levitin, Fourier Tauberian theorems. Appendix in Yu. Safarov and D. Vassiliev, The asymptotic distribution of eigenvalues of partial differential operators. AMS Series Translations of Mathematical Monographs 155, AMS, Providence, R. I., 1997.
4. J.R. Partington, Interpolation, identification, and sampling. The Clarendon Press, Oxford University Press, 1997.
5. N. Wiener, The Fourier integral and certain of its applications. Reprint of the 1933 edition. Cambridge University Press, Cambridge, 1988.
6. W.E. Williams, Partial differential equations. The Clarendon Press, Oxford University Press, 1980.
7. A. Zygmund, Trigonometric series. Vol. I, II. Third edition. Cambridge University Press, Cambridge, 2002.
Padé Approximants
Maciej Pindor
Instytut Fizyki Teoretycznej,
Uniwersytet Warszawski ul.Hoza 69,
00-681 Warszawa, Poland.
1 Introduction
A situation frequently encountered in applied science is the following: the information we need is contained in the values, or in some features of the analytic structure, of a function that we know only through its power expansion in the vicinity of some point. In the favourable case this is a Taylor expansion with some finite radius of convergence, but it may also be an asymptotic expansion. Let us concentrate on the first case; some remarks concerning the second one will be given later, if time allows.
If the information we need concerns points within the circle of convergence of the Taylor series, then the problem is (almost) trivial. If it concerns points outside the circle, then the problem becomes that of analytic continuation. Unfortunately, the method of direct rearrangement of the series, used in theoretical considerations on analytic continuation, is practically useless here. The method of practical analytic continuation which I shall discuss is called the method of Padé Approximation. There exist ample monographs on Padé Approximants [6], [2], [3], and my purpose here is to present a subjective glimpse of the subject.
Actually, the method is based on the very direct idea of using rational functions instead of polynomials to approximate the function of interest. They are practically as easy to calculate as polynomials, but when we recall that a truncated Laurent expansion is just a rational function, we can expect that they could provide reasonable approximations of functions also in the vicinity of the poles of the latter, not only in circles of analyticity. Therefore the concept, born already in the XIXth century, was to replace partial sums of the Taylor series by rational functions whose own Taylor series have the corresponding partial sums identical to the former ones. To formulate it precisely, let us assume we have a function f(z) with its Taylor expansion
\[ f(z) = \sum_{i=0}^{\infty} f_i z^i. \tag{1} \]
J.-D. Fournier et al. (Eds.): Harm. Analysis and Ratio. Approx., LNCIS 327, pp. 59-69, 2006.
© Springer-Verlag Berlin Heidelberg 2006
Having the partial sum of the above series up to the power M, we seek a rational function $r_{m,n}(z)$ which will have the first M + 1 terms of its Taylor expansion identical to those of f(z), which I shall represent by
\[ r_{m,n}(z) - f(z) = O(z^{M+1}). \tag{2} \]
2 The Padé Table
Let us, however, discuss first the problem of existence of rational functions defined by (2). If we denote the numerator of $r_{m,n}(z)$ by $P_m(z)$ and its denominator by $Q_n(z)$, and $r_{m,n}(z)$ by $[m/n]_f(z)$, then (2) becomes
\[ [m/n]_f(z) - f(z) = \frac{P_m(z)}{Q_n(z)} - f(z) = O(z^{m+n+1}). \tag{3} \]
Let me make here an obvious remark that $P_m$ depends also on n and $Q_n$ depends on m, and they should be denoted, e.g., $P_m^{[m/n]}$, but for hygienic reasons I shall almost everywhere skip this additional index. Finding the coefficients of $P_m$ and $Q_n$ by the expansion of $[m/n]_f(z)$ and then comparing the two series would be a horror, but the problem can immediately be reduced to the linear one:
\[ P_m(z) - Q_n(z) f(z) = O(z^{m+n+1}). \tag{4} \]
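\[ \begin{pmatrix} f_{m+1} & f_m & \cdots & f_{m-n+2} & f_{m-n+1} \\ f_{m+2} & f_{m+1} & \cdots & f_{m-n+3} & f_{m-n+2} \\ \vdots & \vdots & & \vdots & \vdots \\ f_{m+n} & f_{m+n-1} & \cdots & f_{m+1} & f_m \end{pmatrix} \begin{pmatrix} q_0 \\ q_1 \\ \vdots \\ q_n \end{pmatrix} = 0 \tag{5} \]
and
\[ \begin{aligned} p_0 &= f_0 q_0 \\ p_1 &= f_1 q_0 + f_0 q_1 \\ p_2 &= f_2 q_0 + f_1 q_1 + f_0 q_2 \\ &\ \ \vdots \\ p_m &= \sum_{i=0}^{\min(m,n)} f_{m-i}\, q_i. \end{aligned} \tag{6} \]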
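Equations (5) and (6) can be solved mechanically. The Python sketch below is one possible way of doing so, under assumptions that are not part of the text: it normalises $q_0 = 1$, takes $f_j = 0$ for j < 0, assumes that the resulting n by n system is nonsingular, and works in exact rational arithmetic. The function names `solve_linear` and `pade` are hypothetical, and the series of $\log(1 + z^2)/z^2$ is used as test data.

```python
from fractions import Fraction

def solve_linear(A, b):
    """Solve A x = b exactly by Gauss-Jordan elimination (A square and nonsingular)."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [M[r][n] for r in range(n)]

def pade(f, m, n):
    """Return (p, q) for [m/n] from Taylor coefficients f[0..m+n], normalised by q_0 = 1."""
    coeff = lambda j: f[j] if j >= 0 else Fraction(0)
    # Equations (5): sum_{j=0}^{n} f_{m+i-j} q_j = 0 for i = 1..n, with q_0 = 1.
    A = [[coeff(m + i - j) for j in range(1, n + 1)] for i in range(1, n + 1)]
    b = [-coeff(m + i) for i in range(1, n + 1)]
    q = [Fraction(1)] + solve_linear(A, b)
    # Equations (6): p_k = sum_{i=0}^{min(k,n)} f_{k-i} q_i for k = 0..m.
    p = [sum(coeff(k - i) * q[i] for i in range(min(k, n) + 1)) for k in range(m + 1)]
    return p, q

# Taylor coefficients of log(1 + z^2) / z^2 = 1 - z^2/2 + z^4/3 - ...
f = [Fraction(0)] * 5
for k in range(3):
    f[2 * k] = Fraction((-1) ** k, k + 1)

p, q = pade(f, 2, 2)   # numerator and denominator coefficients of [2/2]
```

When the matrix in (5) is singular the sketch simply fails; that degenerate situation is exactly the one in which the existence of the approximant has to be examined separately.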
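\[ f(z) = \frac{\log(1 + z^2)}{z^2} = 1 - \frac{z^2}{2} + \frac{z^4}{3} - \frac{z^6}{4} + \frac{z^8}{5} - \dots \]
\[ [2/2]_f(z) = \frac{1 + \frac{z^2}{6}}{1 + \frac{2z^2}{3}} = 1 - \frac{z^2}{2} + \frac{z^4}{3} - \frac{2z^6}{9} + \dots \]
Obviously, $[2/2]_f$ is simultaneously $[3/2]_f$ and $[2/3]_f$ because its Taylor series matches that of f(z) up to $z^5$. On the other hand, there is no rational function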
[Figure 1: schematic of a square block in the Padé Table, with [k/l] in the upper left corner, the entries [k/l+1], [k/l+2], ..., [k/l+j-1] along its first row and [k+1/l], [k+2/l], ..., [k+j-1/l] along its first column; the neighbouring entries [k/l+j] and [k+j/l] are also marked.]
Fig. 1. A block of the size j + 1 in the Padé Table. All Padé Approximants on the positions indicated by their symbols, or by dots, exist and are identical to [k/l]; they are therefore rational functions of degrees k and l in the numerator and the denominator, although they fulfil equation (3) with m and n corresponding to their positions in the Padé Table.
1+
z2
4
1+
3z 2
4
z4
24
=1
z2
z4
z6
3z 8
+
+
+
2
3
4
16
In this case the whole Padé Table consists of blocks of the size 2.
The second situation appears typically when f(z) is a rational function itself. In this case there is one infinite block with its upper left corner at the entry corresponding to the exact degrees of the numerator and the denominator of this function. Obviously all the Padé Approximants with degrees of numerator and denominator larger than or equal to those of the function are equal to this function, because it matches its own Taylor expansion to any order!
3 Convergence
Rational functions are meromorphic, and therefore the first speculation that comes to mind (at least to mine) is that Padé approximants should be well suited to approximate just such functions.
This speculation turns out to be absolutely correct, because the de Montessus theorem holds ([3] p. 246):
Theorem 1 (de Montessus, 1902). Let $f(z)$ be a function meromorphic in the disk $|z| < R$ with $m$ poles at distinct points $z_1, z_2, \ldots, z_m$ with
$$ 0 < |z_1| \le |z_2| \le \cdots \le |z_m| < R . $$
Let the pole at $z_k$ have multiplicity $\mu_k$ and let the total multiplicity be $\sum_{k=1}^{m} \mu_k = M$ precisely. Then
$$ f(z) = \lim_{L \to \infty} [L/M] $$
Theorem 3. Let $f(z)$ be analytic at the origin and also in a given disk $|z| \le R$ except for $m$ poles counting multiplicity. Consider a row of the Padé table $[L/M]$ of $f(z)$ with $M$ fixed, $M \ge m$, and $L \to \infty$. Suppose that arbitrarily small, positive $\varepsilon$ and $\delta$ are given. Then $L_0$ exists such that $|f(z) - [L/M]| < \varepsilon$ for any $L > L_0$ and for all $|z| \le R$ except for $z \in E_L$, where $E_L$ is a set of points in the z-plane of measure less than $\delta$.
This type of convergence is known as convergence in measure and seems to have been first used in this context by Nuttall [7]. It means that we cannot guarantee convergence at any given point in the z-plane, but it assures us that the area where our Padé approximants do not approximate f(z) arbitrarily well can be made as small as we wish.
It is important to understand that the theorem says nothing about where this set $E_L$ is. Practice shows that undesired poles are accompanied by undesired zeros and form so-called defects, which spoil convergence in smaller and smaller neighborhoods but shift unpredictably from order to order.
But what about functions with a richer analytic structure: essential singularities and branch points?
The amazing (at least for me) fact is that if we are content with convergence in measure (or even with the stronger convergence in capacity), such functions can also be approximated by Padé approximants, provided we consider sequences with growing degrees of both the numerator and the denominator. The fundamental theorem on convergence of Padé approximants for functions with essential singularities is due to Pommerenke [10]:
Theorem 4 (Pommerenke, 1973). Let $f(z)$ be a function which is analytic at the origin and analytic in the entire z-plane except for a countable number of isolated poles and essential singularities. Suppose $\varepsilon > 0$ and $\delta > 0$ are given. Then $M_0$ exists such that any $[L/M]$ Padé approximant of the ray sequence ($L/M = \lambda$; $\lambda \ne 0$, $\lambda \ne \infty$) satisfies
$$ |f(z) - [L/M]_f(z)| < \varepsilon $$
for any $M \ge M_0$, on any compact set of the z-plane except for a set $E_L$ of capacity less than $\delta$.
As you see, the essential notion here is that of capacity. It is also known as the Chebyshev constant, or transfinite diameter. I do not have time here to define it, as it is a difficult concept concerning the geometry of the complex plane. Anyway, to understand the practical implications of the theorem above and of the ones to follow, it is sufficient to know that the capacity is a function on sets in the complex plane that vanishes for countable sets of points but is different from zero on line segments; e.g. for a segment of a straight line it equals one fourth of its length. For a circle it is the same as for the disk inside the circle and equals their radius. Actually it is proportional to the electrostatic capacity in plane electrostatics.
$$ f(z) = \sum_{j=0}^{\infty} f_j z^j $$
4 Examples
Let us see some examples of how Padé approximants work for different types of functions. In the illustrations below, I shall devote more attention to demonstrating that Padé approximants correctly discover singularities and zeros than to approximating values of functions, though I shall not forget about the latter.
Let $f(z) = \tanh(z)/z + 1/[2(1 + z)]$. This function has an infinite number of poles uniformly distributed on the imaginary axis at $z = \pm(2k + 1)\pi i/2$, $k = 0, 1, 2, \ldots$, and a pole at $z = -1$. It also has an infinite number of zeros; the ones closest to the origin are $z = -2.06727$, $-.491559 \pm 2.93395i$, $-.535753 \pm 6.17741i$, $-.545977 \pm 12.5134i$ and so on. I have added the geometric series mainly to have a function whose series contains all powers of z, not even powers only. A small curiosity is that there is a block in the Padé table of this function: the one consisting of [0/1], [0/2], [1/1], [1/2].
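The block can be checked by hand or, as in the hedged sketch below, with a few lines of exact arithmetic. The Taylor coefficients of tanh(z)/z are the standard ones; the variable names are illustrative only. The fragment builds [0/1] = p_0/(1 + q_1 z) from (5) and (6) and counts how many leading Taylor coefficients it shares with f.

```python
from fractions import Fraction

# Taylor coefficients up to z^4 of tanh(z)/z (standard series) and of 1/(2(1+z)).
tanh_over_z = [Fraction(1), Fraction(0), Fraction(-1, 3), Fraction(0), Fraction(2, 15)]
geometric = [Fraction((-1) ** k, 2) for k in range(5)]
f = [a + b for a, b in zip(tanh_over_z, geometric)]

# [0/1] = p0 / (1 + q1 z): (5) gives f1*q0 + f0*q1 = 0 with q0 = 1, (6) gives p0 = f0.
p0 = f[0]
q1 = -f[1] / f[0]

# Taylor coefficients of p0 / (1 + q1 z) are p0 * (-q1)^k.
approx = [p0 * (-q1) ** k for k in range(5)]

# Index of the first coefficient on which [0/1] and f disagree.
agree = next(k for k in range(5) if approx[k] != f[k])
print(agree)
```

The count is 3: [0/1] reproduces f through z^2, which is exactly what [1/1] and [0/2] are required to do, while the z^3 coefficients already differ.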
Since any circle centred at the origin contains an odd number of zeros and of poles, we consider the sequence [M/3]. In the tables below I shall compare the positions of zeros and poles of the approximants in this sequence.
P.A.     zeros                               poles
[3/3]    -2.06806, 1.02990 ± 3.17939i        -1.00065, .002435 ± 1.58229i
[4/3]
[5/3]
[6/3]
[7/3]
[8/3]
We clearly see that the first three poles and the first three zeros of [M/3] converge to the corresponding poles and zeros of f(z), as expected from the de Montessus theorem. We could also have checked that the values of [M/3] converge to the values of f(z) in the circle of radius smaller than $3\pi/2$, the distance of the next pair of poles. Stray zeros also appeared, but they were outside this circle.
Our function has an infinite number of poles, so let us see how diagonal Padé approximants work here.
P.A.     zeros                                        poles
[4/4]    2.02230, .906076 ± 3.08279i, 2.07416         -.999772, .001036 ± 1.57569i, 2.08395
[5/5]
[6/6]
If we remember that [7/3] and [5/5] both use the same number of coefficients (11), we can conclude that the diagonal Padé approximants approximate our function better than approximants with a prescribed degree of the denominator. We could say there is a price to pay: [4/4], which uses 9 coefficients like [5/3], has an unwanted pole at 2.08395. We see, however, that it is accompanied by a zero at 2.07416, and we can (correctly) guess that the values of [4/4] deviate considerably from those of f(x) only close to this pair, which is called a defect. An analogous defect appears in [6/6], but the pair is much tighter there and we can (correctly) guess that it spoils the approximation quality of [6/6] in an even smaller area close to the defect. This is just how convergence in measure (and in capacity) manifests itself.
We can also see in Fig. 2 how the behavior of some of the Padé approximants mentioned above compares with the behavior of f(x) on the interval [-6, -1.5], i.e. behind the singularity at x = -1.
If you are curious what happens when f(z) has a multiple pole, let me tell you that in that case Padé approximants have as many simple poles as the multiplicity of that pole, and they all converge to it as the order of the approximation increases.
Finally, let me say that I would be glad if you took away this message: diagonal Padé approximants are beautiful; do not be discouraged by their defects. Others can also have defects, but none are as useful.
You should not, however, think that diagonal Padé approximants are always the best ones; there are situations in which paradiagonal sequences of Padé approximants, i.e. sequences [m + k/m] with k constant, are optimal. This can happen if we have some information on the behavior of the function at infinity. Obviously $[m + k/m](x)$ behaves like $x^k$ for $x \to \infty$. If our function behaves at infinity in a similar way, such sequences of Padé approximants can converge faster. This is well exemplified by a study of Padé approximants for $f(x) = \sqrt{x + 1}\,\sqrt{2x + 1} + 2/(1 - x)$. It has zeros at 1.60415 and -1.39193, a pole at x = 1 and two branch points at x = -1/2 and x = -1. Look at the zeros and poles of [4/3] and [4/4], remembering that [4/4] uses one more coefficient of the series.
[Figure 2: curves of f(x) and of the approximants [4/3], [5/3], [6/3], [7/3], [4/4], [5/5] plotted against x.]
Fig. 2. Values of different PA to f(x) = tanh(x) + 1/(1 + x)/2
P.A.     zeros                                       poles
[4/3]    -1.38833, -.754876, -.556928, 1.60403       -.782628, -.564096, .999985
[4/4]    -1.38548, -.739811, -.552442, 1.60441       2499.85, -.767038, -.558807, 1.00002
Positions of the zeros and of the pole are clearly better reproduced by [4/3] than by [4/4]. Moreover, when $x \to \infty$, $[4/3](x)$ behaves like $1.4146x$ ($\sqrt{2} \approx 1.4142$). Additionally we see that the cut $(-1, -1/2)$ is simulated by a line of interlacing zeros and poles: the line of minimal capacity connecting the branch points.
5 Calculation of Padé approximants
In practical applications there arises the problem of how to calculate given Padé approximants. In principle one should avoid solving a system of linear equations, because this process is very sensitive both to errors in the data and to the precision of the calculations. Thirty and forty years ago much activity was devoted to finding different algorithms for the recursive calculation of Padé approximants; this is well documented in [3], ch. 2.4. However, you can see that the system of equations for the coefficients of the denominator has a Toeplitz matrix, and for such systems there exist relatively fast and reliable routines in all numerical program libraries. With the speed of computers now in use, quadruple precision as a standard option in all modern Fortran compilers, and multiprecision libraries spreading around, I think that finding Padé approximants this way is in practice the most convenient solution. This is, e.g., the method used for calculating Padé approximants in the symbolic algebra system Maple.
References
1. G.A. Baker, Jr. Existence and Convergence of Subsequences of Padé Approximants. J. Math. Anal. Appl., 43:498-528, 1973.
2. G.A. Baker, Jr. Essentials of Padé Approximants. Academic Press, 1975.
3. G.A. Baker, Jr., P. Graves-Morris. Padé Approximants, volume 13 and 14 of Encyclopedia of Mathematics and Applications. Addison-Wesley, 1981.
4. A.F. Beardon. The convergence of Padé Approximants. J. Math. Anal. Appl., 21:344-346, 1968.
5. G. Frobenius. Ueber Relationen zwischen den Näherungsbrüchen von Potenzreihen. J. für Reine und Angewandte Math., 90:1-17, 1881.
6. J. Gilewicz. Approximants de Padé. Number 667 in Springer Lecture Notes in Mathematics. Springer Verlag, 1978.
7. J. Nuttall. Convergence of Padé approximants of meromorphic functions. J. Math. Anal. Appl., 31:147-153, 1970.
8. H. Padé. Sur la représentation approchée d'une fonction par des fractions rationnelles. Ann. de l'École Normale, 9(3ième série, Suppl. 3-93).
9. O. Perron. Die Lehre von den Kettenbrüchen. B.G. Teubner, 1957. Chapter 4.
10. Ch. Pommerenke. Padé approximants and convergence in capacity. J. Math. Anal. Appl., 31:775-780, 1973.
11. H. Stahl. Three different approaches to a proof of convergence for Padé approximants. In Rational Approximation and its Applications in Mathematics and Physics, number 1237 in Lecture Notes in Mathematics. Springer Verlag, 1987.
E.B. Saff
Center for Constructive Approximation
Department of Mathematics
Vanderbilt University
Nashville, TN 37240, USA
esaff@math.vanderbilt.edu
$$ \int \log \frac{1}{|z - t|}\, d\mu(t), $$
J.-D. Fournier et al. (Eds.): Harm. Analysis and Ratio. Approx., LNCIS 327, pp. 71-94, 2006.
Springer-Verlag Berlin Heidelberg 2006
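(1)
$$ \int U^{\mu}\, d\mu = \iint \log \frac{1}{|z - t|}\, d\mu(t)\, d\mu(z). $$
$$ \int f\, d\mu_n \to \int f\, d\mu \quad \text{as } n \to \infty $$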
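$$ S(\mu_E) \subseteq \partial_\infty E. $$
Moreover, if strict inclusion takes place, then the set $\partial_\infty E \setminus S(\mu_E)$ has capacity zero. It follows from the above inclusion that, being unique, the equilibrium measures for $E$ and for $\partial_\infty E$ coincide. Therefore
$$ \operatorname{cap}(E) = \operatorname{cap}(\partial_\infty E). $$
(b) For all $z \in \mathbb{C}$,
$$ U^{\mu_E}(z) \le V_E $$
with equality holding quasi-everywhere on $E$; that is, except possibly for a set of capacity zero. We write this as
$$ U^{\mu_E}(z) = V_E = \log \frac{1}{\operatorname{cap}(E)} \quad \text{q.e. on } E. \tag{2} $$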
A point $z_0 \in \operatorname{Int} E$ if and only if there is some open disk with center at $z_0$ that lies entirely in $E$.
Then (b) guarantees that (2) holds at every point of $\operatorname{Int} E$. The following fact is deeper: if E is connected, then every point of E is regular. Furthermore, at every regular point the conductor potential is continuous.
It is helpful to keep in mind the following two simple examples.
Example 1. Let $E$ be the closed disk of radius $R$, centered at 0. Then $d\mu_E = ds/(2\pi R)$, where $ds$ is the arclength on the circle $|z| = R$. One way to derive this is to observe that $E$ is invariant under rotations. The equilibrium measure, being unique and supported on $|z| = R$, must enjoy the same property, and therefore must be of the above form. Calculating the potential, we obtain
$$ U^{\mu_E}(z) = \log \frac{1}{|z|}, \quad |z| > R, \qquad \text{and} \qquad U^{\mu_E}(z) = \log \frac{1}{R}, \quad |z| \le R. $$
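The two formulas of Example 1 can be checked numerically. The sketch below is only an illustration under stated assumptions: the function name `disk_potential`, the midpoint rule and the sample count are choices made here, not part of the text. It averages log(1/|z - t|) over equally spaced points t on |t| = R, which approximates the potential of the uniform measure ds/(2 pi R).

```python
import cmath
import math


def disk_potential(z, radius, samples=2000):
    """Approximate (1/2pi) * integral over [0, 2pi] of log(1/|z - R e^{i theta}|) d theta."""
    total = 0.0
    for k in range(samples):
        theta = 2.0 * math.pi * (k + 0.5) / samples
        t = radius * cmath.exp(1j * theta)
        total += -math.log(abs(z - t))
    return total / samples


R = 2.0
inside = disk_potential(0.5 + 0.3j, R)    # compare with log(1/R)
outside = disk_potential(3.0 - 1.0j, R)   # compare with log(1/|z|)
print(inside, math.log(1.0 / R))
print(outside, math.log(1.0 / abs(3.0 - 1.0j)))
```

The evaluation points are deliberately kept off the circle itself, where the integrand has a logarithmic singularity and this simple quadrature would lose accuracy.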
$$ d\mu_E = \frac{1}{\pi} \frac{dx}{\sqrt{(x - a)(b - x)}}, \qquad x \in [a, b]. $$
z 2 1|
$$ \Phi(z) = \frac{1}{c}\, z + \text{lower order terms}, \qquad \text{as } z \to \infty. $$
z E.
(i) $g$ is harmonic in $\Omega$;
Second, our normalization implies that
(ii) $\lim_{z \to \infty} \bigl( g(z) - \log|z| \bigr)$ exists and is finite;
Finally,
(iii) $g$ is continuous in the closed domain $\overline{\Omega}$ and equals zero on its boundary.
There is a unique function that enjoys these three properties. It is called the Green function for $\Omega$ with pole at infinity and is denoted by $g_\Omega(\cdot, \infty)$. So we have just shown that
$$ \log|\Phi(z)| = g_\Omega(z, \infty) $$
(and that the limit in (ii) is equal to $\log(1/c)$). It is now easy to see that
$$ U^{\mu_E}(z) = \log \frac{1}{\operatorname{cap}(E)} - g_\Omega(z, \infty). \tag{3} $$
Indeed, let $h$ denote the difference of the two sides of (3). Then $h$ is harmonic in the domain $\Omega$, and is equal to zero on its boundary. Moreover, $h$ has a finite limit at infinity, namely $\log(1/c) - \log(1/\operatorname{cap}(E))$, recall (1). By the maximum principle, $h$ is identically zero and we are done. We also obtain that the constant $c$ is just $\operatorname{cap}(E)$.
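In the case when $E$ is a smooth closed Jordan curve, there is a simple representation for $\mu_E$. The equilibrium measure of any arc $\gamma$ on $E$ is given by
$$ \mu_E(\gamma) = \frac{1}{2\pi} \int_\gamma \frac{\partial g}{\partial n}\, ds = \frac{1}{2\pi} \int_\gamma |\Phi'|\, ds, $$
where the derivative in the first integral is taken in the direction of the outer normal on $E$. Alternatively, $\mu_E(\gamma)$ is given by the normalized angular measure of the image $\Phi(\gamma)$:
$$ \mu_E(\gamma) = \frac{1}{2\pi} \int_{\Phi(\gamma)} d\theta. \tag{4} $$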
$$ \delta_n(E) := \max_{z_1, \ldots, z_n \in E} \Bigl( \prod_{1 \le i < j \le n} |z_i - z_j| \Bigr)^{2/n(n-1)} $$
A set $F_n = \{ z_1^{(n)}, \ldots, z_n^{(n)} \}$ at which the maximum is attained is called an n-point Fekete set for $E$; the points $z_i^{(n)}$ in $F_n$ are called Fekete points.
For example, if $n = 2$, then $F_2 = \{ z_1^{(2)}, z_2^{(2)} \}$, where $|z_1^{(2)} - z_2^{(2)}| = \operatorname{diam} E$. Obviously, these 2 points lie on the outer boundary of $E$. In general, it follows from the maximum modulus principle for analytic functions that for all $n$, the Fekete sets lie on the outer boundary of $E$.
It turns out (cf. [11], [12]) that the sequence $\delta_n$ decreases, so we may define
$$ \delta(E) := \lim_{n \to \infty} \delta_n(E). $$
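To see the decreasing sequence in a concrete case, take the unit circle, for which the n-th roots of unity are known to form a Fekete set (a classical fact assumed here, not derived in the text). The sketch below, with illustrative function names, evaluates the n-th diameter of the roots of unity; the closed form n^{1/(n-1)} follows from the product of pairwise distances being n^{n/2}.

```python
import cmath
import math


def diameter_n(points):
    """Product of pairwise distances raised to the power 2/(n(n-1))."""
    n = len(points)
    log_product = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            log_product += math.log(abs(points[i] - points[j]))
    return math.exp(2.0 * log_product / (n * (n - 1)))


def roots_of_unity(n):
    return [cmath.exp(2j * math.pi * k / n) for k in range(n)]


sizes = (2, 4, 8, 16, 32, 64)
values = [diameter_n(roots_of_unity(n)) for n in sizes]
for n, value in zip(sizes, values):
    # Second column: computed value; third column: closed form n^(1/(n-1)).
    print(n, value, n ** (1.0 / (n - 1)))
```

The values decrease towards 1, the radius of the circle, in agreement with the capacity of a disk found in Example 1.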
$$ t_n(E) := \min_{p \in \mathcal{P}_{n-1}} \| z^n + p(z) \|_E, \qquad \text{where } \|f\|_E := \max_{z \in E} |f(z)|. $$
We assume that $E$ contains infinitely many points (which is always the case if $\operatorname{cap}(E) > 0$). Then for every $n$ there is a unique monic polynomial $T_n(z) = z^n + \cdots$ such that $\|T_n\|_E = t_n(E)$. It is called the n-th Chebyshev polynomial for $E$.
In view of the simple inequality
$$ t_{m+n}(E) = \|T_{m+n}\|_E \le \|T_m T_n\|_E \le \|T_m\|_E\, \|T_n\|_E = t_m(E)\, t_n(E), $$
one can show (cf. [11], [12]) that the sequence $t_n(E)^{1/n}$ converges, so we may define
$$ \operatorname{cheb}(E) := \lim_{n \to \infty} t_n(E)^{1/n}. $$
|z|=R
z n + p(z)
Rn ,
zn
and strict inequality takes place if $p(z)$ is not identically zero. It follows that $T_n(z) = z^n$. Therefore $t_n(E) = R^n$ and $\operatorname{cheb}(E) = R$.
Example 6. Let $E = [-1, 1]$. Then $T_n$ is the classical monic Chebyshev polynomial
$$ T_n(x) = 2^{1-n} \cos(n \arccos x), \qquad x \in [-1, 1], \quad n \ge 1. $$
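For this example the Chebyshev constant can be read off directly: the sup norm of the monic Chebyshev polynomial on [-1, 1] is 2^{1-n}, so its n-th root tends to 1/2, one fourth of the length of the interval. The following sketch confirms this on a grid; the function names and the grid size are assumptions made for the illustration.

```python
import math


def monic_chebyshev(n, x):
    """T_n(x) = 2^(1-n) cos(n arccos x) for x in [-1, 1], n >= 1."""
    return 2.0 ** (1 - n) * math.cos(n * math.acos(x))


def sup_norm_on_interval(n, samples=20001):
    """Maximum of |T_n| over an equally spaced grid on [-1, 1] (endpoints included)."""
    grid = [-1.0 + 2.0 * k / (samples - 1) for k in range(samples)]
    return max(abs(monic_chebyshev(n, x)) for x in grid)


for n in (1, 2, 5, 10, 20, 40):
    t_n = sup_norm_on_interval(n)
    # Columns: n, t_n on the grid, and its n-th root.
    print(n, t_n, t_n ** (1.0 / n))
```

The last column approaches 1/2 slowly, since t_n^{1/n} = 2^{(1-n)/n}.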
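$$ \lim_{n \to \infty} \|F_n\|_E^{1/n} = \lim_{n \to \infty} \|T_n\|_E^{1/n} = 1 = \operatorname{cheb}(E) $$
(the last equality follows from Example 1). Finally, it is easy to see that the zeros of $F_n$ are asymptotically uniformly distributed on $|z| = 1$. By that we mean that for any arc $\gamma$ on this circle,
$$ \frac{1}{n} \{\text{number of zeros of } F_n \text{ in } \gamma\} \to \frac{1}{2\pi} \{\text{length of } \gamma\}, \qquad n \to \infty. $$
Note that the second ratio coincides with $\mu_E(\gamma)$ (cf. Example 1).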
The examples of this section illustrate the following fundamental theorem, various parts of which are due to Fekete, Frostman, and Szegő.
Theorem 1 (Fundamental Theorem of Classical Potential Theory). For any compact set $E \subset \mathbb{C}$,
(a) $\operatorname{cap}(E) = \delta(E) = \operatorname{cheb}(E)$;
(b) Fekete polynomials are asymptotically optimal for the Chebyshev problem:
$$ \lim_{n \to \infty} \|F_n\|_E^{1/n} = \operatorname{cheb}(E) = \operatorname{cap}(E). $$
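$$ P_n(z) = \prod_{k=1}^{n} (z - z_k) $$
and let $\delta_{z_k}$ denote the unit mass placed at $z_k$. Then
$$ U^{\delta_{z_k}}(z) = \log \frac{1}{|z - z_k|}, $$
and we see that
$$ |P_n(z)|^{1/n} = e^{-U^{\nu}(z)}, $$
where $\nu$ is the unit measure (normalized zero counting measure for $P_n$) given by
$$ \nu = \nu_{P_n} := \frac{1}{n} \sum_{k=1}^{n} \delta_{z_k}. $$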
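The identity between the n-th root of |P_n| and the potential of the normalized zero counting measure is elementary, but it is the bridge between polynomials and potentials, so a small numerical illustration may help. The zeros and the evaluation point below are arbitrary sample values chosen for this sketch.

```python
import math


def potential_of_counting_measure(z, zeros):
    """Potential at z of the measure putting mass 1/n at each of the n zeros."""
    return sum(math.log(1.0 / abs(z - zk)) for zk in zeros) / len(zeros)


def monic_poly_abs(z, zeros):
    """|P_n(z)| for the monic polynomial with the given zeros."""
    value = 1.0
    for zk in zeros:
        value *= abs(z - zk)
    return value


zeros = [0.5, -0.25 + 0.5j, -0.25 - 0.5j, 1.0j]   # sample zeros (hypothetical)
z = 2.0 - 1.0j                                     # sample evaluation point
lhs = monic_poly_abs(z, zeros) ** (1.0 / len(zeros))
rhs = math.exp(-potential_of_counting_measure(z, zeros))
print(lhs, rhs)
```

Both printed numbers agree up to rounding, whatever zeros and evaluation point (away from the zeros) are chosen.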
1
{number of zeros of Pn in K}.
n
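$$ e_n(f; E) = e_n(f) := \min_{p \in \mathcal{P}_n} \|f - p\|_E \tag{5} $$
$$ \sum_{n=1}^{\infty} (p_{n+1} - p_n) $$
$$ U^{\mu_E}(z) = \log \frac{1}{\operatorname{cap}(E)} - g_{\mathbb{C} \setminus E}(z, \infty), \qquad z \in \mathbb{C} \setminus E. \tag{6} $$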
For any $R > 1$, let $\Gamma_R$ denote the level curve $\{z : |\Phi(z)| = R\}$, see Fig. 2 (we call such a curve a level curve with index $R$).
$$ U^{\mu_E}(z) = \log \frac{1}{R\, \operatorname{cap}(E)}, \qquad z \in \Gamma_R. \tag{7} $$
Let $F_{n+1}$ be the $(n+1)$-st Fekete polynomial for $E$ and let $P_n$ be the polynomial of degree $n$ that interpolates $f$ at the zeros of $F_{n+1}$. We are given that $f$ is analytic in a neighborhood of $E$; hence there exists $R > 1$ such that $f$ is analytic on and inside $\Gamma_R$. For any such $R$, the Hermite interpolation formula yields
$$ f(z) - P_n(z) = \frac{1}{2\pi i} \int_{\Gamma_R} \frac{F_{n+1}(z)}{F_{n+1}(t)}\, \frac{f(t)}{t - z}\, dt, \qquad z \text{ inside } \Gamma_R. \tag{8} $$
(The validity of the Hermite formula follows by first observing that the right-hand side vanishes at the zeros of $F_{n+1}(z)$, and then by replacing $f(z)$ by its Cauchy integral representation to deduce that the difference between $f$ and the right-hand side is indeed a polynomial of degree at most $n$.)
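Formula (8) leads to a simple estimate:
$$ e_n(f) \le \|f - P_n\|_E \le K\, \frac{\|F_{n+1}\|_E}{\min_{t \in \Gamma_R} |F_{n+1}(t)|}, $$
where $K$ is a constant independent of $n$, and therefore to
$$ \limsup_{n \to \infty} e_n(f)^{1/n} \le \frac{\operatorname{cap}(E)}{R\, \operatorname{cap}(E)} = \frac{1}{R} < 1. \tag{9} $$
We have proved that indeed $e_n(f) \to 0$ and that the convergence is geometrically fast. Since $R > 1$ was arbitrary (but such that $f$ is analytic on and inside $\Gamma_R$), we have actually proved that (9) holds with $R$ replaced by $R(f)$, where
$$ R(f) := \sup\{R : f \text{ admits analytic continuation to the interior of } \Gamma_R\}. $$
Can we improve on this? The answer is no! In order to show this, we need the following very useful result.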
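Theorem 2 (Bernstein-Walsh Lemma). Assume that both $E$ and $\mathbb{C} \setminus E$ are connected. If a polynomial $p$ of degree $n$ satisfies $|p(z)| \le M$ for $z \in E$, then $|p(z)| \le M r^n$ for $z \in \Gamma_r$, $r > 1$.
The proof uses essentially the same argument as in Example 5. The function $p(z)/\Phi^n(z)$ is analytic outside $E$, even at $\infty$. Since $|\Phi| = 1$ on $\partial E$, we know that $|p(z)/\Phi^n(z)| \le M$ for $z \in \partial E$. Hence the maximum principle yields
$$ \left| \frac{p(z)}{\Phi^n(z)} \right| \le M, \qquad z \in \mathbb{C} \setminus E, $$
and the lemma follows because $|\Phi(z)| = r$ on $\Gamma_r$.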
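A hedged numerical illustration of the lemma, under assumptions that go beyond the text: take E = [-1, 1], for which the exterior map is Phi(z) = z + sqrt(z^2 - 1) and the level curve Gamma_r is the image of |w| = r under z = (w + 1/w)/2, and take p to be the classical Chebyshev polynomial T_n, so that M = 1. The function names are illustrative.

```python
import cmath
import math


def chebyshev_t(n, z):
    """Evaluate the Chebyshev polynomial T_n at a complex point by the recurrence
    T_{k+1} = 2 z T_k - T_{k-1}."""
    t_prev, t_curr = 1.0 + 0.0j, complex(z)
    if n == 0:
        return t_prev
    for _ in range(n - 1):
        t_prev, t_curr = t_curr, 2.0 * z * t_curr - t_prev
    return t_curr


def max_on_level_curve(n, r, samples=720):
    """Maximum of |T_n| over sample points of Gamma_r = {(w + 1/w)/2 : |w| = r}."""
    best = 0.0
    for k in range(samples):
        w = r * cmath.exp(2j * math.pi * k / samples)
        best = max(best, abs(chebyshev_t(n, 0.5 * (w + 1.0 / w))))
    return best


for n, r in ((3, 1.2), (8, 1.5), (12, 2.0)):
    # Columns: n, r, observed maximum on Gamma_r, Bernstein-Walsh bound M r^n with M = 1.
    print(n, r, max_on_level_curve(n, r), r ** n)
```

In this special case the maximum equals (r^n + r^{-n})/2, attained on the real axis, so the bound r^n is off by a factor that tends to 2 as n grows.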
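Assume now that (9) holds for some $R > R(f)$ and let $R(f) < \rho < R$. Then for some constant $c > 1$,
$$ e_n(f) \le \frac{c}{\rho^n}, \qquad n \ge 1. $$
Let $p_n$ be a polynomial of degree at most $n$ of best approximation to $f$ on $E$. Then
$$ \|p_{n+1} - p_n\|_E = \|p_{n+1} - f + f - p_n\|_E \le e_{n+1}(f) + e_n(f) \le \frac{2c}{\rho^n}, $$
and the Bernstein-Walsh Lemma gives
$$ \|p_{n+1} - p_n\|_{\Gamma_r} \le \frac{2c}{\rho^n}\, r^{n+1}, \qquad n \ge 1. $$
If we choose $R(f) < r < \rho$, we obtain that the series $p_1 + \sum_{n=1}^{\infty} (p_{n+1} - p_n)$ converges uniformly inside $\Gamma_r$. Hence it gives an analytic continuation of $f$ to the interior of $\Gamma_r$, which contradicts the definition of $R(f)$.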
Let us summarize what we have proved.
Theorem 3 (Walsh [17, Ch. VII]). Let the compact set $E$ be connected and have a connected complement. Then for any $f \in A(E)$,
$$ \limsup_{n \to \infty} e_n(f)^{1/n} = \frac{1}{R(f)}. $$
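The geometric rate 1/R(f) can be observed in a simple experiment. The sketch below rests on assumptions not made in the text: E = [-1, 1], f(x) = 1/(x - a) with a = 2, so that R(f) = a + sqrt(a^2 - 1), and interpolation at Chebyshev nodes as a stand-in for Fekete points (both have the arcsine equilibrium distribution in the limit). It reports the interpolation error, which bounds e_n(f) from above; it does not compute best approximants.

```python
import math


def chebyshev_nodes(count):
    """Zeros of the Chebyshev polynomial of degree `count` on [-1, 1]."""
    return [math.cos((2 * k + 1) * math.pi / (2 * count)) for k in range(count)]


def interpolation_error(f, degree, samples=2001):
    """Max error on a grid of [-1, 1] of the degree-`degree` interpolant at Chebyshev nodes."""
    nodes = chebyshev_nodes(degree + 1)
    values = [f(x) for x in nodes]
    # Barycentric weights w_k = 1 / prod_{j != k} (x_k - x_j).
    weights = []
    for k, xk in enumerate(nodes):
        prod = 1.0
        for j, xj in enumerate(nodes):
            if j != k:
                prod *= xk - xj
        weights.append(1.0 / prod)
    worst = 0.0
    for s in range(samples):
        x = -1.0 + 2.0 * s / (samples - 1)
        if x in nodes:
            continue  # the interpolant is exact at a node
        num = sum(w * v / (x - xk) for w, v, xk in zip(weights, values, nodes))
        den = sum(w / (x - xk) for w, xk in zip(weights, nodes))
        worst = max(worst, abs(num / den - f(x)))
    return worst


a = 2.0


def f(x):
    return 1.0 / (x - a)


R = a + math.sqrt(a * a - 1.0)   # index of the level curve through the pole at a
errors = [interpolation_error(f, n) for n in range(4, 13)]
ratios = [errors[i + 1] / errors[i] for i in range(len(errors) - 1)]
print(ratios)
print(1.0 / R)
```

The successive error ratios settle near 1/R = 2 - sqrt(3), roughly 0.268, which is the rate predicted by the theorem for this f.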
Remarks.
(a) The proof of this theorem shows that on interpolating $f$ at Fekete points we obtain a sequence of polynomials that gives, asymptotically, the best possible rate of approximation. It may not be easy, however, to find these points, and it is desirable to have other methods at hand. Assume, for example, that $E$ is bounded by a smooth Jordan curve $\Gamma$. With $\Phi$ as above, let the points $w_1, \ldots, w_n$ be equally spaced on $|w| = 1$ and let $z_i = \Phi^{-1}(w_i) \in \partial E$ be their preimages. These points (called the Fejér points) divide $\Gamma$ into $n$ subarcs, each having $\mu_E$-measure $1/n$ (the latter can be derived from formula (4)). Therefore, the Fejér points have asymptotic distribution $\mu_E$. Let $P_n$ be the monic polynomial with zeros at $z_1, \ldots, z_n$. According to the statement at the end of Section 1, the sequence $\{P_n\}$ enjoys the same properties (b), (c) as $\{F_n\}$ does, and the proof of Theorem 3 shows that
$$ \limsup_{n \to \infty} \|f - P_n\|_E^{1/n} \le \frac{1}{R(f)}. $$
(b) $R(f)$ is the first value of $R$ for which the level curve $\Gamma_R$ contains a singularity of $f$. It may well be possible that $f$ is analytic at some other points of $\Gamma_{R(f)}$, but the geometric rate of best polynomial approximation does not feel this: whether every point of $\Gamma_{R(f)}$ is a singularity or merely one point is a singularity, the rate of approximation remains the same as if f was analytic
(10)
Assume now that $\mathbb{C} \setminus E$ is connected but $E$ is not. Then one can still define the Green function $g_{\mathbb{C} \setminus E}$ via the formula
$$ g_{\mathbb{C} \setminus E} = \log \frac{1}{\operatorname{cap}(E)} - U^{\mu_E}, $$
from which it follows that properties (i)-(iii) described in Section 1 will hold, provided $E$ is regular. Then, with $\Gamma_R$ defined by (10), it is easy to modify the above proof to show that Walsh's Theorem 3 holds in this case as well.
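Example 8. Let $E = [-1, -\alpha] \cup [\alpha, 1]$, $0 < \alpha < 1$, and let $f = 0$ on $[-1, -\alpha]$ and $f = 1$ on $[\alpha, 1]$. Some level curves $\Gamma_R$ of $g_{\mathbb{C} \setminus E}$ are depicted in Fig. 3. For $R$ small, $\Gamma_R$ consists of two pieces, while for $R$ large, $\Gamma_R$ is a single curve. There is a critical value $R_0 = \exp\bigl( g_{\mathbb{C} \setminus E}(0, \infty) \bigr)$ for which $\Gamma_{R_0}$ represents a self-intersecting lemniscate-like curve (the bold curve in Fig. 3). Clearly, $f$ can be extended as an analytic function to the interior of $\Gamma_{R_0}$ (define $f = 0$ inside the left lobe and $f = 1$ inside the right lobe). For $R > R_0$, the interior of $\Gamma_R$ is a (connected) domain; hence there is no function analytic inside of $\Gamma_R$ that is equal to 0 on $[-1, -\alpha]$ and to 1 on $[\alpha, 1]$. Therefore
$$ R(f) = R_0 = \exp\bigl( g_{\mathbb{C} \setminus E}(0, \infty) \bigr), $$
and by the (extension of) Walsh's theorem:
$$ \limsup_{n \to \infty} e_n(f)^{1/n} = \exp\bigl( -g_{\mathbb{C} \setminus E}(0, \infty) \bigr). $$
[Fig. 3: level curves $\Gamma_{R_0}$ (bold), $\Gamma_R$ with $R < R_0$, and $\Gamma_R$ with $R > R_0$.]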
Pn
[0,1]
all n,
then
Pn (x) 0
x [0, 2 ).
for
$$ w_\alpha(x) := e^{-|x|^\alpha}, \qquad \alpha > 0 \tag{11} $$
be a weight on the real line and let $\{p_n\}$ be orthonormal polynomials with respect to this weight:
$$ \int_{-\infty}^{\infty} p_m(x)\, p_n(x)\, e^{-|x|^\alpha}\, dx = \delta_{mn} $$
(for $\alpha = 2$ these are the classical Hermite polynomials). Since the weight is even, the polynomials $p_n$ satisfy the following 3-term recurrence relation
$$ x\, p_n(x) = a_{n+1}\, p_{n+1}(x) + a_n\, p_{n-1}(x), $$
where $\{a_n\}$ is some sequence of real numbers (cf. [15]). For the weights (11), G. Freud conjectured that
$$ \lim_{n \to \infty} n^{-1/\alpha} a_n \quad \text{exists.} $$
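For orientation, the one case in which the limit can be written down at once is the Hermite case $\alpha = 2$. The short computation below uses the standard recurrence for the physicists' Hermite polynomials $H_n$, which is an outside fact assumed here rather than something stated in the text.

```latex
% Hermite case alpha = 2: orthonormal polynomials p_n = H_n / N_n with respect to e^{-x^2},
% where N_n^2 = 2^n n! \sqrt{\pi}, and  x H_n = \tfrac12 H_{n+1} + n H_{n-1}.
\begin{aligned}
x\,p_n(x) &= \frac{N_{n+1}}{2N_n}\,p_{n+1}(x) + \frac{n\,N_{n-1}}{N_n}\,p_{n-1}(x)
           = \sqrt{\tfrac{n+1}{2}}\;p_{n+1}(x) + \sqrt{\tfrac{n}{2}}\;p_{n-1}(x), \\
a_n &= \sqrt{\tfrac{n}{2}}, \qquad
\lim_{n\to\infty} n^{-1/2}\,a_n = \frac{1}{\sqrt{2}} .
\end{aligned}
```

So for $\alpha = 2$ the sequence $n^{-1/\alpha} a_n$ is in fact constant, and the conjecture concerns the remaining values of $\alpha$, where no such closed form is at hand.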
$$ \deg P_n \le n. $$
For the Lorentz Problem, one simply observes that any $P \in I_\theta$ of degree $n/(1 - \theta)$ (which for simplicity we assume to be an integer) can be written in the form
$$ P(x) = x^{n\theta/(1-\theta)} P_n(x), $$
where $P_n$ is a polynomial of degree $n$. Therefore, this problem deals with sequences of weighted polynomials that satisfy
$$ \|w^n P_n\|_{[0,1]} \le M, \qquad w(x) = x^{\theta/(1-\theta)}, \qquad \deg P_n \le n. $$
p2n (x)e|x| dx = 1
the substitution
x n1/ x,
L2 (R)
L2 (R)
= 1,
w(x) = e|x|
/2
degPn n,
is dened by
f
L2 (R)
:=
|f (x)|2 dx
1/2
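$$ Q := \log \frac{1}{w} $$
and call it the external field. Consider the modified energy integral for $\mu \in \mathcal{M}(E)$:
$$ I_w(\mu) := \iint \log \frac{1}{|z - t|\, w(z)\, w(t)}\, d\mu(z)\, d\mu(t) = \iint \log \frac{1}{|z - t|}\, d\mu(z)\, d\mu(t) + 2 \int Q(z)\, d\mu(z) \tag{12} $$
and let
$$ V_w := \inf_{\mu \in \mathcal{M}(E)} I_w(\mu). $$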
arbitrary closed subset of E. Determining this set is one of the most important aspects of weighted potential theory.
Weighted transfinite diameter: $\delta(w, E)$.
Let
n (w) :=
(n)
max
z1 ,... ,zn E
2/n(n1)
1i<jn
(n)
pPn1
wn (z)(z n p(z))
E.
The following theorem (due to Mhaskar and Saff) generalizes the classical results of Section 1.
Theorem 5 (Generalized Fundamental Theorem). Let $E$ be a closed set of positive capacity. Assume that $w$ satisfies the conditions (i)-(iii) and let $Q = \log(1/w)$. Then
$$ \operatorname{cap}(w, E) = \delta(w, E) = \operatorname{cheb}(w, E)\, \exp\Bigl( -\int Q\, d\mu_w \Bigr). $$
on S()
on E.
(13)
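On integrating (against $\mu = \mu_w$) the first condition, we obtain that the constant is given by
$$ c_w = I(\mu_w) + \int Q\, d\mu_w = V_w - \int Q\, d\mu_w, $$
where $I(\mu_w)$ is the unweighted energy of $\mu_w$, so that $I_w(\mu_w) = I(\mu_w) + 2 \int Q\, d\mu_w = V_w$ by (12).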
= w n Pn
S(w ) .
QdK ,
where $\mu_K$ is the classical (unweighted) equilibrium measure for $K$. This so-called F-functional of Mhaskar and Saff is often a helpful tool in finding $S(\mu_w)$. Since $\operatorname{cap}(K)$ and $\mu_K$ remain the same if we replace $K$ by $\partial_\infty K$, we obtain that $F(K) = F(\partial_\infty K)$. It turns out that the outer boundary of $S(\mu_w)$ maximizes the F-functional:
$$ \max_K F(K) = F(\partial_\infty S(\mu_w)). $$
$$ Q(x) = \log(1/w(x)) = -\frac{\theta}{1 - \theta} \log x $$
is convex. Maximizing the F-functional one gets $S(\mu_w) = [\theta^2, 1]$. (For details, see [12, Sec. IV.1].)
Example 10 (Freud Weights). Here $E = \mathbb{R}$ and $w(x) = \exp(-|x|^\alpha)$. Hence $Q(x) = |x|^\alpha$ is convex provided that $\alpha > 1$, and we obtain $S_w = [-a_\alpha, a_\alpha]$, where $a_\alpha$ can be given explicitly in terms of the Gamma function. (Actually, this result also holds for all $\alpha > 0$; see [12, Sec. IV.1].) For example, when $\alpha = 2$, we get $S_w = [-1, 1]$.
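The case alpha = 2 can be probed numerically. The sketch below assumes, as a standard fact not stated in the text, that the weighted equilibrium measure for w(x) = exp(-x^2) is the semicircle law with density (2/pi) sqrt(1 - t^2) on [-1, 1]; on the support the sum of its potential and Q(x) = x^2 should then be constant, with value 1/2 + log 2. The function name, the substitution t = cos(theta) and the midpoint rule are choices made for this illustration, and the quadrature is only accurate to a few digits because of the logarithmic singularity.

```python
import math


def weighted_potential(x, samples=100000):
    """Approximate U(x) + Q(x) for d mu = (2/pi) sqrt(1 - t^2) dt on [-1, 1], Q(x) = x^2."""
    h = math.pi / samples
    total = 0.0
    for k in range(samples):
        theta = (k + 0.5) * h
        dist = abs(x - math.cos(theta))
        if dist > 0.0:
            # With t = cos(theta): d mu = (2/pi) sin(theta)^2 d theta.
            total += -math.log(dist) * math.sin(theta) ** 2
    return (2.0 / math.pi) * total * h + x * x


points = (0.0, 0.3, 0.7)
values = [weighted_potential(x) for x in points]
print(values)
print(0.5 + math.log(2.0))
```

The three values agree with each other and with 1/2 + log 2 to within the quadrature error, which is consistent with the support [-1, 1] quoted in Example 10.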
S(w )
x E \ S(w ).
With the aid of (13) and a variant of the Stone-Weierstrass theorem (cf. [12]), one can show that if a sequence $\{w^n(x) P_n(x)\}$, $\deg P_n \le n$, converges uniformly on $E$, then it tends to 0 for every $x \in E \setminus S(\mu_w)$.
Thus, if some $f \in C(E)$ is a uniform limit on $E$ of such a sequence, it must vanish on $E \setminus S(\mu_w)$. The converse is not true in general, but it is true in many important cases.
Incomplete polynomials
For the weight $w = x^{\theta/(1-\theta)}$, we have mentioned that $S(\mu_w) = [\theta^2, 1]$. It was proved by Saff and Varga and, independently, by M. v. Golitschek (cf. [13], [5]), that any $f \in C[0, 1]$ that vanishes on $[0, \theta^2]$ is a uniform limit on $[0, 1]$ of incomplete polynomials of type $\theta$.
In particular, choosing $f(x) = 0$ for $x \in [0, \theta^2]$, and $f(x) = x - \theta^2$ for $x > \theta^2$, the sequence of type $\theta$ polynomials converging uniformly to $f$ on $[0, 1]$ is uniformly bounded on $[0, 1]$, but does not tend to zero for $x > \theta^2$. Thus the answer to Problem 4 is yes, Lorentz's Theorem 4 is indeed sharp!
Freud Conjecture
For $\alpha > 1$, let $[-a_\alpha, a_\alpha]$ be the support of the equilibrium measure for the weight $e^{-|x|^\alpha}$. Lubinsky and Saff showed in [7] that any $f \in C(\mathbb{R})$ that vanishes outside this support is a uniform limit of a sequence of the form $\exp\{-n|x|^\alpha\} P_n(x)$, $n \ge 1$. This result was the major ingredient in the argument given by Mhaskar, Lubinsky, and Saff [8] that resolved the Freud Conjecture in the affirmative.
Concerning more general weights, Saff made the following conjecture:
6 Rational Approximation
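For a rational function $R(z) = P_1(z)/P_2(z)$, where $P_1$ and $P_2$ are monic polynomials of degree $n$, one can write
$$ -\frac{1}{n} \log|R(z)| = U^{\nu_1}(z) - U^{\nu_2}(z), $$
where $\nu_1$ and $\nu_2$ are the normalized zero counting measures of $P_1$ and $P_2$. The right-hand side is the logarithmic potential of the signed measure $\mu = \nu_1 - \nu_2$:
$$ U^{\mu}(z) = \int \log \frac{1}{|z - t|}\, d\mu(t). $$
The theory of such potentials can be developed along the same lines as in Section 1. We present below only the very basic notions of this theory that are needed to formulate the approximation results. A more in-depth treatment can be found in the works of Bagby [1], Gonchar [4], as well as [12].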
The analogy with electrostatics problems suggests considering the following energy problem. Let $E_1, E_2 \subset \mathbb{C}$ be two closed sets that are a positive distance apart. The pair $(E_1, E_2)$ is called a condenser and the sets $E_1$, $E_2$ are called the plates. Let $\mu_1$ and $\mu_2$ be positive unit measures supported on $E_1$ and $E_2$, respectively. Consider the energy integral of the signed measure $\mu = \mu_1 - \mu_2$:
$$ I(\mu) = \iint \log \frac{1}{|z - t|}\, d\mu(z)\, d\mu(t). $$
Since $\mu(\mathbb{C}) = 0$, the integral is well-defined, even if one of the sets is unbounded. While not obvious, it turns out that such $I(\mu)$ is always positive. We assume that $E_1$ and $E_2$ have positive logarithmic capacity. Then the minimal energy (over all signed measures of the above form)
$$ V(E_1, E_2) := \inf_{\mu} I(\mu) $$
$$ U^{\mu^*} = c_1 \ \text{on } E_1, \qquad U^{\mu^*} = -c_2 \ \text{on } E_2, \tag{14} $$
(we assume throughout that $E_1$, $E_2$ are regular; otherwise the above equalities hold only quasi-everywhere). On integrating against $\mu^*$, we deduce from (14) that
$$ c_1 + c_2 = V(E_1, E_2) = 1/\operatorname{cap}(E_1, E_2). \tag{15} $$
We mention that (similar to the case of the conductor potential) the relations of type (14) characterize $\mu^*$. Moreover, one can deduce from (14) that the measure $\mu_i^*$ is supported on the boundary (not necessarily the outer one) of $E_i$, $i = 1, 2$. Therefore, on replacing each $E_i$ by its boundary, we do not change the condenser capacity or the condenser potential.
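Example 11. Let $E_1$, $E_2$ be, respectively, the circles $|z| = r_1$, $|z| = r_2$, $r_1 < r_2$. These sets are invariant under rotations. Being unique, the measure $\mu^*$ is therefore also invariant under rotations and we obtain that
$$ d\mu_1^* = \frac{1}{2\pi r_1}\, ds, \qquad d\mu_2^* = \frac{1}{2\pi r_2}\, ds, $$
$$ U^{\mu^*}(z) = \begin{cases} \log(r_2/r_1), & |z| \le r_1, \\ \log(r_2/|z|), & r_1 < |z| \le r_2, \\ 0, & |z| > r_2, \end{cases} \qquad \operatorname{cap}(E_1, E_2) = 1 \Big/ \log \frac{r_2}{r_1}. \tag{16} $$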
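For the two-circle condenser the constants in (14) and (15) can be checked by direct quadrature. The sketch below is an illustration under stated assumptions: the radii are sample values, the function names are hypothetical, and the potentials are evaluated at points off both circles, where the midpoint rule is very accurate. By Example 1 the potential of each uniform circle measure is constant inside its circle, so the value at an interior point equals the value on the plate.

```python
import cmath
import math


def circle_potential(z, radius, samples=4000):
    """Potential at z of the uniform unit measure on |t| = radius (midpoint rule)."""
    total = 0.0
    for k in range(samples):
        t = radius * cmath.exp(2j * math.pi * (k + 0.5) / samples)
        total -= math.log(abs(z - t))
    return total / samples


def condenser_potential(z, r1, r2):
    """Potential of mu1 - mu2 for the condenser formed by |z| = r1 and |z| = r2."""
    return circle_potential(z, r1) - circle_potential(z, r2)


r1, r2 = 1.0, 3.0                                   # sample radii
c1 = condenser_potential(0.2 + 0.1j, r1, r2)        # value inside the inner circle
between = condenser_potential(2.0j, r1, r2)         # value in the ring, |z| = 2
outside = condenser_potential(5.0 - 2.0j, r1, r2)   # value outside the outer circle
capacity = 1.0 / c1                                 # c2 = 0 here, so c1 + c2 = c1
print(c1, between, outside, capacity)
```

With these radii the values come out as log 3, log(3/2) and 0, and the condenser capacity as 1/log 3, matching the modulus r2/r1 of the annulus.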
Assume now that each plate of a condenser is a single Jordan arc or curve (without self-intersections), and let $G$ be the doubly-connected domain that is bounded by $E_1$ and $E_2$, see Fig. 4. We call such a $G$ a ring domain.
For ring domains one can give an alternative definition of condenser capacity. Let
$$ u(z) := \int \log(z - t)\, d\mu^*(t) + c_1. $$
[Fig. 4: ring domains $G$ bounded by the plates $E_1$ and $E_2$.]
This function is locally analytic but not single-valued in $G$ (notice that there is no modulus sign in the integral). Moreover, if we fix $t$ and let $z$ move along a simple closed counterclockwise oriented curve in $G$ that encircles $E_1$, say, then the imaginary part of $\log(z - t)$ increases by $2\pi$ for $t \in E_1$, while for $t \in E_2$ it returns to the original value. Since $\mu_1^*$ and $\mu_2^*$ are unit measures, it follows that the function $\varphi : z \mapsto w = \exp(u(z))$ is analytic and single-valued. Moreover, it can be shown to be one-to-one in $G$. By its definition, $\varphi$ satisfies
$$ \log|\varphi| = -U^{\mu^*} + c_1 = 0 \ \text{on } E_1; \qquad \log|\varphi| = -U^{\mu^*} + c_1 = c_1 + c_2 \ \text{on } E_2. $$
Therefore $\varphi$ maps $G$ conformally onto the annulus $1 < |w| < e^{c_1 + c_2}$.
It is known from the theory of conformal mapping that, for a ring domain $G$, there exists a unique $R > 1$, called the modulus of $G$ (we denote it by $\operatorname{mod}(G)$), such that $G$ can be mapped conformally onto the annulus $1 < |w| < R$. We have thus shown that
$$ \operatorname{cap}(E_1, E_2) = 1/\log(\operatorname{mod}(G)). \tag{17} $$
rRn
f r
(18)
The proof of (18) follows the same ideas as the proof of inequality (9). Let $\Gamma$ be a contour in $D \setminus E$ that is arbitrarily close to $\partial D$. Let $\mu^* = \mu_1^* - \mu_2^*$ be the equilibrium measure for the condenser $(E, \Gamma)$. For any $n$, let $\alpha_1^{(n)}, \ldots, \alpha_n^{(n)}$ be equally spaced on $\partial E$ (with respect to $\mu_1^*$) and let $\beta_1^{(n)}, \ldots, \beta_n^{(n)}$ be equally spaced on $\Gamma$ (with respect to $\mu_2^*$). Then one can show that the rational functions $r_n(z)$ with zeros at the $\alpha_i^{(n)}$'s and poles at the $\beta_i^{(n)}$'s satisfy
$$ \lim_{n \to \infty} \left( \frac{\max_E |r_n|}{\min_\Gamma |r_n|} \right)^{1/n} = e^{-1/\operatorname{cap}(E, \Gamma)}. \tag{19} $$
Let $R_n = p_{n-1}/q_n$ be the rational function with poles at the $\beta_i^{(n)}$'s that interpolates $f$ at the points $\alpha_i^{(n)}$'s. Then the Hermite formula (cf. (8)) takes the following form:
$$ f(z) - R_n(z) = \frac{1}{2\pi i} \int_\Gamma \frac{r_n(z)}{r_n(t)}\, \frac{f(t)}{t - z}\, dt, \qquad z \text{ inside } \Gamma, $$
1/n
E
e1/cap(E, ) .
1
.
R
(20)
(21)
This conjecture was proved by O. Parfenov [9] for the case when $E$ is a continuum with connected complement and in the general case by V. Prokhorov [10]; they used a very different method, the so-called AAK Theory (cf. [18]). However, this method is not constructive, and it remains a challenging problem to find such a method. Yet, potential theory can be used to obtain bounds like (21) in the stronger form
$$ \lim_{n \to \infty} r_n(f; E)^{1/n} = \exp\{-2/\operatorname{cap}(E, \partial D)\} $$
References
1. T. Bagby, The modulus of a plane condenser, J. Math. Mech., 17:315-329, 1976.
2. G. Freud, On the coefficients in the recursion formulae of orthogonal polynomials, Proc. Roy. Irish Acad. Sect. A(1), 76:1-6, 1976.
3. D. Gaier, Lectures on Complex Approximation, Birkhäuser, Boston Inc., Boston, MA, 1987.
4. A.A. Gonchar, On the speed of rational approximation of some analytic functions, Math USSR-Sb., 125(167):117-127, 1984.
5. M. v. Golitschek, Approximation by incomplete polynomials, J. Approx. Theory, 28:155-160, 1980.
6. G.G. Lorentz, Approximation by Incomplete Polynomials (problems and results). In E.B. Saff and R.S. Varga, editors, Padé and Rational Approximations: Theory and Applications, Academic Press, New York, 289-302, 1977.
7. D.S. Lubinsky, E.B. Saff, Uniform and mean approximation by certain weighted polynomials, with applications, Constr. Approx., 4:21-64, 1988.
8. D.S. Lubinsky, H.N. Mhaskar, E.B. Saff, Freud's conjecture for exponential weights, Bull. Amer. Math. Soc., 15:217-221, 1986.
Good Bases
Jonathan R. Partington
School of Mathematics,
University of Leeds,
Leeds LS2 9JT, U.K.
J.R.Partington@leeds.ac.uk
1 Introduction
There are two standard approaches to finding rational approximants to a given
function. The first approach, which we shall review in this paper, is to employ
a basis of possible functions (interpreted in a fairly loose sense) such that the
possible rational approximants are linear combinations of the basis functions,
and thus given by a simple parametrization. In this situation it is required
to choose the most appropriate parameters or coordinates. An alternative,
which we shall not discuss, is the situation when the possible approximants
are not linearly parametrized: this is seen in Padé approximation, Hankel-norm approximation, and similar schemes.
Thus the theme of this paper is to describe some families of bases that have
been found to be particularly useful in problems of approximation, identification, and analysis of data. The techniques employed are mostly Hilbertian;
even in a comparatively simple Banach space such as the disc algebra (the
space of functions continuous on the closed unit disc and analytic on the open
disc), the technical problems involved in constructing bases well-adapted to
the given norm are much more complicated. In this case the functions constructed also tend to have a much less natural appearance, and seem to be of
mainly theoretical interest.
Our material divides naturally into two sections. In Section 2 we shall
explore situations when we have an orthonormal basis of rational functions and
can use inner-product space techniques, such as least squares. Then in Section
3 we review the theory of wavelets, where the basis functions are obtained
by translation and dilation of one fixed function: under these circumstances
rational approximation is most usefully achieved in the context of frames,
which are a convenient generalization of orthonormal bases.
J.-D. Fournier et al. (Eds.): Harm. Analysis and Ratio. Approx., LNCIS 327, pp. 95-102, 2006.
© Springer-Verlag Berlin Heidelberg 2006
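[...] $\sum_{n=0}^{\infty} |a_n|^2 < \infty$. These functions have $L^2$ boundary values on the unit circle
$\mathbb{T}$, and we have [...]
$$\Big\langle \sum_{n=0}^{\infty} a_n z^n,\ \sum_{n=0}^{\infty} b_n z^n \Big\rangle = \sum_{n=0}^{\infty} a_n \overline{b_n}.$$
[...]
$$\langle f, g\rangle_w = \frac{1}{2\pi}\int_0^{2\pi} f(e^{i\theta})\,\overline{g(e^{i\theta})}\,w(e^{i\theta})\,d\theta$$
[...] is simply $p_N = \sum_{k=0}^{N} \langle f, g_k\rangle_w\, g_k$.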
These orthogonal functions are sometimes known as Szegő polynomials.
Indeed, it was Szegő who made the first systematic study of the asymptotic
properties of such polynomials; he also looked at the convergence of expansions
of analytic functions in orthogonal polynomials, and studied the location of the
zeroes of such polynomials (in the situation described above, all the zeroes
of $g_n$ lie in the open unit disc). Moreover, Szegő's work goes further and
includes an analysis of the behaviour of functions orthogonal with respect to
a line integral along a general curve in the plane.
[...] for $n \ge \deg h$,
where $\bar h$ denotes the polynomial whose coefficients are the complex conjugates
of the coefficients of $h$; thus we have an explicit expression for all but a finite
number of the $g_n$. The remaining ones are easy to calculate as well.
We shall now consider bases of more general rational functions in $H^2(\mathbb{D})$.
Let $(z_n)_{n=1}^{\infty}$ be a sequence of distinct points in the unit disc satisfying
$$\sum_{n=1}^{\infty} (1 - |z_n|) = \infty,$$
which implies that the only function $f \in H^2(\mathbb{D})$ such that $f(z_n) = 0$ for all $n$
is the identically zero function. (The negation of this condition is called the
Blaschke condition.)
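We define the Malmquist basis $(g_n)_{n=1}^{\infty}$ in $H^2(\mathbb{D})$ by
$$g_1(z) = \frac{(1 - |z_1|^2)^{1/2}}{1 - \bar z_1 z}$$
and
$$g_n(z) = \frac{(1 - |z_n|^2)^{1/2}}{1 - \bar z_n z}\ \prod_{k=1}^{n-1} \frac{z - z_k}{1 - \bar z_k z}, \qquad \text{for } n \ge 2.$$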
Note that each $g_n$ has zeroes in the disc at $z_1, \dots, z_{n-1}$ and poles outside
the disc. In fact the functions $(g_n)$ form an orthonormal basis for $H^2(\mathbb{D})$. The
Fourier coefficients of a function $f$ with respect to this basis are given by
interpolation at the points $(z_n)$, since
$$f(z_m) = \sum_{n=1}^{\infty} \langle f, g_n\rangle\, g_n(z_m),$$
for each $m$, and we observe that $g_n(z_m) = 0$ if $n > m$, and so we have the
following formulae, which are a form of multi-point Padé approximant:
$$f(z_1) = \langle f, g_1\rangle g_1(z_1),$$
$$f(z_2) = \langle f, g_1\rangle g_1(z_2) + \langle f, g_2\rangle g_2(z_2),$$
$$f(z_3) = \langle f, g_1\rangle g_1(z_3) + \langle f, g_2\rangle g_2(z_3) + \langle f, g_3\rangle g_3(z_3),$$
and so on. Indeed, the Malmquist basis can also be obtained by applying the
Gram-Schmidt procedure to the reproducing kernels $k_{z_n}(z) = 1/(1 - \bar z_n z)$,
which satisfy $f(z_n) = \langle f, k_{z_n}\rangle$ for $f \in H^2(\mathbb{D})$.
Thus if we want the best rational $H^2(\mathbb{D})$ approximant to $f$ with poles at
$1/\bar z_1, \dots, 1/\bar z_n$, then the above interpolation procedure explains how to find
it.
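The triangular interpolation scheme above can be tried numerically. The following Python sketch is only an illustration under stated assumptions: the three nodes, the test function $f = \exp$ and the helper names (`malmquist`, `inner`, `coeffs_by_interpolation`) are hypothetical choices, and the inner product on the circle is approximated by an equispaced quadrature rule. It computes the coefficients $\langle f, g_n\rangle$ from the values $f(z_m)$ alone and compares them with direct quadrature.

```python
import numpy as np

def malmquist(n, pts, z):
    """Evaluate the n-th Malmquist function g_n (n = 1, 2, ...) at z."""
    zn = pts[n - 1]
    val = np.sqrt(1 - abs(zn) ** 2) / (1 - np.conj(zn) * z)
    for zk in pts[: n - 1]:          # Blaschke factors with zeros z_1..z_{n-1}
        val = val * (z - zk) / (1 - np.conj(zk) * z)
    return val

def inner(f, g, num=4096):
    """<f, g> = (1/2pi) * integral of f * conj(g) over the unit circle."""
    w = np.exp(2j * np.pi * np.arange(num) / num)
    return np.mean(f(w) * np.conj(g(w)))

def coeffs_by_interpolation(f, pts):
    """Solve the lower-triangular system f(z_m) = sum_{n<=m} c_n g_n(z_m)."""
    m = len(pts)
    G = np.array([[malmquist(n, pts, zm) for n in range(1, m + 1)] for zm in pts])
    return np.linalg.solve(G, np.array([f(zm) for zm in pts]))

pts = np.array([0.3, -0.5 + 0.2j, 0.1 - 0.6j])   # hypothetical nodes in the disc
c = coeffs_by_interpolation(np.exp, pts)
direct = np.array([inner(np.exp, lambda w, n=n: malmquist(n, pts, w))
                   for n in (1, 2, 3)])
gram = np.array([[inner(lambda w, a=a: malmquist(a, pts, w),
                        lambda w, b=b: malmquist(b, pts, w)) for b in (1, 2, 3)]
                 for a in (1, 2, 3)])
```

Here `c` and `direct` should agree up to rounding, and `gram` should be close to the identity matrix, reflecting the orthonormality of the $g_n$. The rational function $\sum_n c_n g_n$ then has its poles at the points $1/\bar z_n$.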
2.2 Functions in the right half-plane
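Recall that $H^2(\mathbb{C}_+)$ consists of all analytic functions $F : \mathbb{C}_+ \to \mathbb{C}$ such that
$$\|F\| := \sup_{x>0} \Big( \int_{-\infty}^{\infty} |F(x + iy)|^2\, dy \Big)^{1/2} < \infty,$$
[...]
$$e_k(s) = \sqrt{\frac{a}{\pi}}\ \frac{(a - s)^k}{(a + s)^{k+1}}, \qquad k = 0, 1, \dots.$$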
These are a natural analogue of $\{1, z, z^2, \dots\}$ in $H^2(\mathbb{D})$. Note that the functions $e_k$ all have zeroes at $a$ and poles at $-a$. Moreover, they form an orthonormal basis of $H^2(\mathbb{C}_+)$. Their inverse Laplace transforms form an orthogonal
basis of $L^2(0, \infty)$ and have the form
$$f_k(t) = p_k(t)\, e^{-at},$$
where $p_k$ is a polynomial of degree $k$. In fact
$$p_k(t) = (-1)^k \sqrt{\frac{a}{\pi}}\ L_k(2at),$$
[...]
$$L_k(t) = \frac{e^{t}}{k!}\, \frac{d^k}{dt^k}\big(t^k e^{-t}\big).$$
Alternatively some people use Kautz functions, which are more appropriate
for approximating lightly damped dynamical systems. These have all their
poles at two complex conjugate points: the approximate models have the form
$$\frac{p(s)}{(s^2 + bs + c)^m},$$
where $p$ is a polynomial.
It is also possible to construct Malmquist bases in the half-plane using the
reproducing kernel functions for $H^2(\mathbb{C}_+)$. Recall the defining formula for a
reproducing kernel, namely
$$f(s_n) = \langle f, k_{s_n}\rangle.$$
In this case the reproducing kernel functions have the formula
$$k_{s_n}(s) = \frac{1}{2\pi (s + \bar s_n)}.$$
The Malmquist basis functions for the right half-plane are given by
$$g_1(s) = \frac{1}{\sqrt{\pi}}\ \frac{(\operatorname{Re} s_1)^{1/2}}{s + \bar s_1}$$
and
$$g_n(s) = \frac{1}{\sqrt{\pi}}\ \frac{(\operatorname{Re} s_n)^{1/2}}{s + \bar s_n}\ \prod_{k=1}^{n-1} \frac{s - s_k}{s + \bar s_k}, \qquad \text{for } n \ge 2.$$
In some examples from the theory of linear systems, an approximate location of the poles of a rational transfer function is known, and these techniques
enable one to construct models with poles in the required places. In the next
section we shall see how wavelet theory enables one to gain further insight
into the local behaviour of functions.
3 Wavelets
3.1 Orthonormal bases
One of the purposes of wavelet theory is to provide good orthonormal bases
for function spaces such as $L^2(\mathbb{R})$. These basis functions are derived from a
single function $\psi$ by taking translated and dilated versions of it, and will be
denoted $(\psi_{j,k})_{j,k \in \mathbb{Z}}$, where the parameter $j$ controls the scaling and $k$ controls
the positioning. Thus the inner product $\langle f, \psi_{j,k}\rangle$ gives information on $f$ at
resolution $j$ and time $k$. One may compare classical Fourier analysis,
where the Fourier coefficients
$$\hat f(k) = \frac{1}{T} \int_0^T f(t)\, e^{-2\pi i k t/T}\, dt = \langle f, e_k\rangle, \quad \text{say,}$$
[...]
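[...] $(k, k + 1)$, $k \in \mathbb{Z}$. Let $\phi(t) = \chi_{(0,1)}(t)$ and $\phi_k(t) = \phi(t - k)$ for $k \in \mathbb{Z}$. Then
$(\phi_k)_{k \in \mathbb{Z}}$ is an orthonormal basis of $V_0$. Any function $f \in V_0$ has the form
$$f = \sum_{k=-\infty}^{\infty} \langle f, \phi_k\rangle\, \phi_k,$$
and
$$\|f\|^2 = \int_{-\infty}^{\infty} |f(t)|^2\, dt = \sum_{k=-\infty}^{\infty} |\langle f, \phi_k\rangle|^2.$$
[...] $\overline{\bigcup_j V_j} = L^2(\mathbb{R})$ and [...]
and $V_j$ has orthonormal basis consisting of the functions $2^{j/2} \phi(2^j t - k)$ for
$k \in \mathbb{Z}$. Any chain of subspaces with these properties is called a multi-resolution
approximation or multi-resolution analysis of $L^2(\mathbb{R})$.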
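We cannot directly use the functions $2^{j/2}\phi(2^j t - k)$ as an orthonormal basis of $L^2(\mathbb{R})$,
and one new trick is needed. We build the Haar wavelet, which is a function
bridging the gap between $V_0$ and $V_1$.
We define the Haar wavelet by
$$\psi(t) = \phi(2t) - \phi(2t - 1) = \chi_{(0,1/2)}(t) - \chi_{(1/2,1)}(t).$$
The functions $\psi_k(t) = \psi(t - k)$, $k \in \mathbb{Z}$, form an orthonormal basis for a space
$W_0$ such that $V_0 \oplus W_0 = V_1$ (orthogonal direct sum). Then $V_j \oplus W_j = V_{j+1}$,
where $W_j$ has orthonormal basis
$$\psi_{j,k}(t) = 2^{j/2} \psi(2^j t - k), \qquad k \in \mathbb{Z}.$$
Finally
$$L^2(\mathbb{R}) = \dots \oplus W_{-2} \oplus W_{-1} \oplus W_0 \oplus W_1 \oplus \dots$$
and has orthonormal basis $(\psi_{j,k})_{j,k \in \mathbb{Z}}$. Hence, if $f \in L^2(\mathbb{R})$, we have
$$f = \sum_{j=-\infty}^{\infty} \sum_{k=-\infty}^{\infty} \langle f, \psi_{j,k}\rangle\, \psi_{j,k},$$
converging in $L^2$ norm.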
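As a small numerical illustration of the Haar construction, the Python sketch below works on $[0,1)$ only, which is an assumption made to keep the example finite. It takes the scaling function $\phi$ together with the wavelets $\psi_{j,k}$ for $0 \le j \le 3$, checks orthonormality on a dyadic grid, and projects a sample function ($f(t) = t^2$, a hypothetical choice) onto their span. The names `haar` and `psi` are illustrative.

```python
import numpy as np

def haar(t):
    """Haar wavelet: +1 on [0, 1/2), -1 on [1/2, 1), 0 elsewhere."""
    t = np.asarray(t, dtype=float)
    return (np.where((t >= 0) & (t < 0.5), 1.0, 0.0)
            - np.where((t >= 0.5) & (t < 1), 1.0, 0.0))

def psi(j, k, t):
    """psi_{j,k}(t) = 2^{j/2} psi(2^j t - k)."""
    return 2 ** (j / 2) * haar(2 ** j * t - k)

N = 1024
t = (np.arange(N) + 0.5) / N              # midpoints of a dyadic grid on [0, 1)
family = [np.ones(N)] + [psi(j, k, t) for j in range(4) for k in range(2 ** j)]
B = np.array(family)                      # rows: phi, then psi_{j,k}
gram = B @ B.T / N                        # integrals of products (exact for step functions)

f = t ** 2                                # sample function, chosen for illustration
coeffs = B @ f / N                        # approximate inner products <f, basis element>
approx = coeffs @ B                       # projection onto the span of the 16 functions
err = np.sqrt(np.mean((f - approx) ** 2)) # approximate L^2 error on [0, 1)
```

The matrix `gram` should be close to the 16 by 16 identity, and `err` is the error of the piecewise-constant approximation at resolution $2^{-4}$; adding further levels $j$ reduces it.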
In the construction sketched above, the $\psi_{j,k}$ are very simple functions, but
they are all discontinuous. By working harder, one may obtain wavelets that
are better adapted to approximation problems.
Here is a list of the wavelets most commonly seen in the literature. To
obtain good properties of $\psi$ and its Fourier transform $\hat\psi$ is not straightforward,
and the following are listed in (approximately) increasing order of difficulty
(C.S. denotes compact support).

Wavelet          | Properties of $\psi(t)$          | Properties of $\hat\psi(w)$
Haar             | C.S., discontinuous              | $O(1/w)$, $C^\infty$
Littlewood-Paley | $O(1/t)$, $C^\infty$             | C.S., discontinuous
Meyer            | Rapidly-decreasing, $C^\infty$   | C.S., can be $C^\infty$
Battle-Lemarié   | Rapidly-decreasing, $C^k$        | $O(1/w^k)$, $C^\infty$
Daubechies       | C.S., $C^k$                      | $O(1/w^k)$, $C^\infty$
3.2 Frames
For rational approximation, orthogonal wavelets are not so useful, and we
settle for something weaker. A frame $(\psi_{j,k})$ in a Hilbert space $H$ is a sequence
for which there are constants $A, B > 0$ such that
$$A \|f\|^2 \le \sum_{j,k} |\langle f, \psi_{j,k}\rangle|^2 \le B \|f\|^2 \qquad \text{for all } f \in H.$$
[...]
$$\sum_{j,k} \langle f, \psi_{j,k}\rangle\, \psi_{j,k}.$$
[...] $\psi(t) = (1 - t^2)\, e^{-t^2/2}$,
the Mexican hat function (the puzzled reader is invited to sketch it), then
the functions $\psi_{j,k}(t) = 2^{j/2} \psi(2^j t - k)$, with $j, k \in \mathbb{Z}$, form a frame for
$L^2(\mathbb{R})$; these were used by Morlet in the analysis of seismic data.
1 Introduction
Very often the observation of natural phenomena leads to an average trend
with fluctuations around it. One of the best-known examples is the observation by Brown and others of a pollen particle in water. The particle is
subject to many collisions with water molecules and an average behaviour
follows by the law of large numbers. Here the average velocity of the particle
is zero, and the particle should stay at rest. However the observation reveals
an erratic motion known as Brownian motion. The goal of the central limit
theorem (abbreviated below as CLT) and the related results is to study these
fluctuations around the average trend.
The CLT is historically attributed to De Moivre and then to Laplace for
a more rigorous study. The original argument is interesting for its relation to
Statistical Mechanics and we will come back to this approach several times. I
will therefore briefly present this argument although it is not the most efficient
approach nowadays.
Consider a game of heads or tails. One performs independent flips of a coin
which has a probability $p$ to display heads and $q = 1 - p$ to display tails. We
assume $0 < p < 1$ and leave to the reader the discussion of the extreme cases.
One performs a large number $n$ of independent flips and records the number
$N(n)$ of times the coin displayed heads. This is equivalent to a simple model of
Statistical Mechanics of $n$ uncoupled spins 1/2 in a magnetic field. The law of
large numbers gives the average behaviour of $N(n)$ for large $n$. Namely, with
probability one
$$\lim_{n \to \infty} \frac{N(n)}{n} = p.$$
[...] probability one. This does not say anything about the size of $N(n) - np$,
namely the fluctuations.
Since the flips are independent, the probability that a given sequence of $n$ flips
gives $r$ heads (and hence $n - r$ tails) is $p^r q^{n-r}$. Therefore we obtain
$$P\big(N(n) = r\big) = \binom{n}{r} p^r q^{n-r}. \qquad (1)$$
In particular, since the events $\{N(n) = r\}$ are mutually exclusive and one of
them is realized, we have
$$1 = \sum_{r=0}^{n} P\big(N(n) = r\big) = \sum_{r=0}^{n} \binom{n}{r} p^r q^{n-r}.$$
It turns out that relatively few terms contribute to this sum. By the law of
large numbers, only those terms with $r \approx np$ contribute. More precisely, using
Stirling's approximation, one gets the following result for $r - np = O(\sqrt{n})$:
$$P\big(N(n) = r\big) = \frac{e^{-(r - np)^2/(2npq)}}{\sqrt{2\pi npq}} + O\Big(\frac{1}{n}\Big). \qquad (2)$$
We will discuss later on in more detail the case $|r - np| \gg \sqrt{n}$ and it will
turn out that the event
$$|N(n) - np| \gg O(\sqrt{n})$$
[...]
$$\operatorname{Var} N(n) = \sum_r r^2\, P\big(N(n) = r\big) - \Big(\sum_r r\, P\big(N(n) = r\big)\Big)^2 = npq.$$
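Formulas (1) and (2) and the value of the variance are easy to compare numerically. The Python sketch below is only an illustration: the values $n = 1000$, $p = 0.3$ and the window of two standard deviations around $np$ are arbitrary choices, and the function names are hypothetical. The binomial probabilities are evaluated through `math.lgamma` to avoid overflow.

```python
import math

def binom_pmf(n, r, p):
    """P(N(n) = r), formula (1), evaluated through log-gamma for stability."""
    q = 1.0 - p
    log_c = math.lgamma(n + 1) - math.lgamma(r + 1) - math.lgamma(n - r + 1)
    return math.exp(log_c + r * math.log(p) + (n - r) * math.log(q))

def gauss_approx(n, r, p):
    """Leading (Gaussian) term of formula (2)."""
    q = 1.0 - p
    return (math.exp(-(r - n * p) ** 2 / (2 * n * p * q))
            / math.sqrt(2 * math.pi * n * p * q))

n, p = 1000, 0.3                       # illustrative values
q = 1.0 - p
half_width = int(2 * math.sqrt(n * p * q))
window = range(int(n * p) - half_width, int(n * p) + half_width + 1)
max_err = max(abs(binom_pmf(n, r, p) - gauss_approx(n, r, p)) for r in window)

pmf = [binom_pmf(n, r, p) for r in range(n + 1)]
mean = sum(r * w for r, w in enumerate(pmf))
var = sum(r * r * w for r, w in enumerate(pmf)) - mean ** 2
```

For these values `max_err` stays below $1/n$, in line with the error term in (2), and `var` agrees with $npq$ up to rounding.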
Notice that for $p$ fixed, with a probability very near to 1 (for large $n$), the
observed sequence of heads and tails satisfies
$$r = np + O\big(\sqrt{npq}\big).$$
All these sequences have about the same probability $e^{-nh}$ and their number
is about $e^{nh}$ where $h$ is the entropy per flip
$$h = -\big(p \log p + q \log q\big).$$
Using formula (1), the reader can give a rigorous proof of these results (see
also [21]).
A more modern and more efficient approach to the CLT is due to Paul
Lévy. This approach is based on the notion of characteristic function. Before
we present this method, we will briefly recall some basic facts in probability
theory and introduce some standard notations.
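[...] $\int f(x)\, dP(x)$.
In particular
$$P\big(f(X) \in A\big) = E\big(1_{f(X) \in A}\big) = E\,[\dots]$$
[...] measure of total mass one on this space. Two real random variables $X_1$ and
$X_2$ are said to be independent if for any pair $A$ and $B$ of (measurable) subsets
of $\mathbb{R}$ we have
$$P\big(\{X_1 \in A\} \cap \{X_2 \in B\}\big) = P\big(X_1 \in A\big)\, P\big(X_2 \in B\big)$$
[...]
$$[\dots] = E\big(f(X_1)\big)\, E\big(g(X_2)\big)$$
[...]
$$S_n = \sum_{j=1}^{n} X_j$$
[...] $(S_n - n\mu)/a_n$ [...]
$$P\Big(\frac{S_n - n\mu}{a_n} \le x\Big)$$
[...] $(-\infty, x]$.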
We refer to [24] or other standard probability books for a proof. We will say
that a sequence $\mu_n$ of probabilities on $\mathbb{R}$ converges in law to the probability
$\mu$ if for any real number $x$ we have
$$\lim_{n \to \infty} \mu_n\big((-\infty, x]\big) = \mu\big((-\infty, x]\big).$$
This implies (see [31]) that for any (measurable) set $B$ such that $\mu(\partial B) = 0$,
$$\lim_{n \to \infty} \mu_n(B) = \mu(B).$$
In other words, Lévy's Theorem relates the convergence in law to the convergence of the characteristic functions.
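In order to be able to apply Lévy's Theorem, we have to understand the
behaviour of $\phi_n(s)$ for large $n$. Since the random variables $X_1, X_2, \dots$ are
independent, we have
$$\phi_n(s) = \phi(s/a_n)^n$$
where
$$\phi(s) = E\big(e^{is(X_j - \mu)}\big).$$
If we assume that the numbers $a_n$ diverge with $n$, we have for any fixed $s$
$$\phi(s/a_n) = 1 - \frac{s^2 \sigma^2}{2 a_n^2} + o\Big(\frac{1}{a_n^2}\Big),$$
and therefore
$$\phi_n(s) = \Big(1 - \frac{s^2 \sigma^2}{2 a_n^2} + o\Big(\frac{1}{a_n^2}\Big)\Big)^n.$$
With the choice $a_n = \sqrt{n}$ this converges to $e^{-s^2 \sigma^2/2}$ when $n$ tends to infinity. Moreover
$$e^{-s^2 \sigma^2/2} = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{isx}\, e^{-x^2/(2\sigma^2)}\, dx$$
and we can apply the above Lévy theorem which proves the following version
of the CLT.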
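Theorem 2. Let $X_j$ be a sequence of i.i.d. real random variables with mean
$\mu$ (finite) and standard deviation $\sigma$ (finite and non-zero). Then, for any real
number $x$,
$$\lim_{n \to \infty} P\Big(\frac{S_n - n\mu}{\sigma\sqrt{n}} \le x\Big) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-y^2/2}\, dy.$$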
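The characteristic-function argument leading to Theorem 2 can be followed numerically in the coin-flip case. In the Python sketch below (an illustration only; the values $p = 0.3$, $s = 1$ and the function names are assumptions) the characteristic function of $(S_n - np)/\sqrt{n}$ is computed exactly as $\phi(s/\sqrt{n})^n$ and compared with the Gaussian limit $e^{-s^2\sigma^2/2}$, where $\sigma^2 = pq$.

```python
import cmath
import math

def phi_centered(s, p):
    """Characteristic function E exp(i s (X - p)) of a centred Bernoulli(p) variable."""
    q = 1.0 - p
    return q * cmath.exp(-1j * s * p) + p * cmath.exp(1j * s * q)

def phi_n(s, n, p):
    """Characteristic function of (S_n - n p) / sqrt(n), using independence."""
    return phi_centered(s / math.sqrt(n), p) ** n

p, s = 0.3, 1.0                         # illustrative values
sigma2 = p * (1.0 - p)
limit = math.exp(-s * s * sigma2 / 2)   # characteristic function of N(0, sigma^2) at s
errors = {n: abs(phi_n(s, n, p) - limit) for n in (10, 100, 1000, 10000)}
```

The entries of `errors` decrease as $n$ grows, roughly like $1/\sqrt{n}$ when $p \ne 1/2$, which is the rate suggested by the next term of the expansion of $\phi$.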
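[...]
$$S_n = \sum_{j=1}^{n} X_j.$$
[...]
$$\sigma_j = \sqrt{\operatorname{Var} X_j} = \sqrt{E\big(X_j^2\big) - E(X_j)^2}.$$
Let
$$s_n^2 = \sum_{j=1}^{n} \sigma_j^2.$$
[...]
$$\lim_{n \to \infty} \sum_{j=1}^{n} E\big(X_j^2\, 1_{|X_j| > t s_n}\big) = 0,$$
[...]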
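Theorem 4. Let $\rho = E\big(|X_j - \mu|^3\big)$ [...] integer $n$
$$\sup_x \Big| P\Big(\frac{S_n - n\mu}{\sigma\sqrt{n}} \le x\Big) - \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-s^2/2}\, ds \Big| \le \frac{33\,\rho}{4\,\sigma^3 \sqrt{n}}.$$
This result can be used for finite $n$ if one has information about the three
numbers $\mu$, $\sigma$ and $\rho$. If one assumes that higher order moments are finite, one
can construct higher order approximations. They involve Hermite functions
(Edgeworth expansion). We refer the reader to [13] and [4] for more details.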
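For coin flips the left-hand side of the bound in Theorem 4 can be computed exactly and compared with the right-hand side. The following Python sketch is an illustration under stated assumptions: the summands are Bernoulli($p$) with $p = 0.3$, for which $\rho = pq^3 + qp^3$, and the function names are hypothetical. Since the distribution function of the normalized sum is a step function, the supremum is attained at the jump points, where both one-sided limits are checked.

```python
import math

def normal_cdf(x):
    """Distribution function of the standard Gaussian."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def binom_pmf(n, r, p):
    log_c = math.lgamma(n + 1) - math.lgamma(r + 1) - math.lgamma(n - r + 1)
    return math.exp(log_c + r * math.log(p) + (n - r) * math.log(1.0 - p))

def kolmogorov_distance(n, p):
    """sup_x |P((S_n - n p)/(sigma sqrt(n)) <= x) - Phi(x)| for Bernoulli(p) summands."""
    sigma = math.sqrt(p * (1.0 - p))
    cdf_left, worst = 0.0, 0.0
    for r in range(n + 1):
        x = (r - n * p) / (sigma * math.sqrt(n))
        cdf_right = cdf_left + binom_pmf(n, r, p)
        # the step function jumps at x: compare both one-sided values with Phi(x)
        worst = max(worst, abs(cdf_left - normal_cdf(x)), abs(cdf_right - normal_cdf(x)))
        cdf_left = cdf_right
    return worst

def berry_esseen_bound(n, p):
    """Right-hand side of Theorem 4 for Bernoulli(p) summands."""
    q = 1.0 - p
    sigma = math.sqrt(p * q)
    rho = p * q ** 3 + q * p ** 3          # E|X - p|^3 for a Bernoulli(p) variable
    return 33.0 * rho / (4.0 * sigma ** 3 * math.sqrt(n))

distances = {n: kolmogorov_distance(n, 0.3) for n in (50, 200, 800)}
bounds = {n: berry_esseen_bound(n, 0.3) for n in (50, 200, 800)}
```

In this example the computed distances lie well below the bounds; the constant $33/4$ is far from optimal, but both quantities decay like $1/\sqrt{n}$.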
4.3 Other types of convergence
A first result is the so-called local CLT which deals with the convergence of
probability densities (if they exist). The simplest version is as follows.
Theorem 5. If the common characteristic function of the real i.i.d. random
variables $X_j$ is summable (its modulus is integrable), then for any integer $n$ [...]
$$\frac{S_n - n\mu}{\sigma\sqrt{n}} = \xi + \epsilon_n \qquad (3)$$
with $\epsilon_n \to 0$ when $n$ tends to infinity. This is not true in general; one can
consider for example the case of i.i.d. Gaussian random variables and use the
associated Hilbert space representation. This only holds in the weaker sense of
distributions as stated above. There is however a so-called almost sure version
of the CLT. In some sense it accumulates all the information gathered for the
various values of $n$. A simple version is as follows.
Theorem 6. For any real number $x$, we have almost surely
$$\lim_{n \to \infty} \frac{1}{\log n} \sum_{j=1}^{n} \frac{1}{j}\, \theta\Big(x - \frac{S_j - j\mu}{\sigma\sqrt{j}}\Big) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{x} e^{-s^2/2}\, ds,$$
where $\theta(y)$ is the Heaviside function which vanishes for $y < 0$ and equals 1 for
$y > 0$. We refer to [3] for references and a review of the results in this domain.
We only emphasize that $1/j$ is essentially the unique weight for which the
result holds.
[...]
$$\limsup_{n \to \infty} \frac{S_n - n\mu}{\big(2 \sigma^2\, n \log\log n\big)^{1/2}} = 1.$$
There is of course an analogous result for the lim inf. We refer the reader to
[33] for a proof and similar results.
4.5 Brownian motion
It is also quite natural to study the sequence $S_n - n\mu$ as a function of $n$ and
to ask if there is a normalization of the sequence and of the time ($n$) such that
one obtains a non-trivial limit. Let $\eta_n(t)$ be the sequence of random functions
of time ($t$) defined by
$$\eta_n(t) = \frac{1}{\sigma\sqrt{n}} \sum_{j=1}^{[nt]} (X_j - \mu),$$
where $[\,\cdot\,]$ denotes the integer part. This function is piecewise constant and has
discontinuities for some rational values of $t$. One can also interpolate linearly
to obtain a continuous function. Note that this is a random function since it
depends on the random variables $X_j$. More generally, a random function on
$\mathbb{R}_+$ (or $\mathbb{R}$) is called a stochastic process.
An important result is that this sequence of processes converges to the
Brownian motion in a suitable sense. We recall that the Brownian motion
$B(t)$ is a real valued Gaussian stochastic process with zero average and such
that
$$E\big(B_t B_s\big) = \min\{t, s\}.$$
We refer the reader to [5] or [14] for the definition of convergence and the proof.
We refer to [37] for the description of the original experimental observation
by Brown.
A related result is connected with the
question of convergence of the sequence of random variables $(S_n - n\mu)/\sqrt{n}$ to a Gaussian random variable.
This is the almost sure invariance principle.
Theorem 8. For any sequence $X_j$ of i.i.d. real random variables with zero
average, non-zero variance $\sigma^2$, and such that for some $\delta > 0$,
$$E\big(|X_1|^{2+\delta}\big) < \infty,$$
[...]
$$\big| S_n - n\mu - \sigma B(n) \big| \le C\, n^{1/2 - \lambda}.$$
In other words, there exists on this other probability space an integer valued
random variable $N$ such that for any $n > N$ the above inequality is satisfied.
Using the scaling properties of the Brownian motion, we have also (with $C' = C/\sigma$)
$$\Big| \frac{S_n - n\mu}{\sigma\sqrt{n}} - B(1) \Big| \le C'\, n^{-\lambda}.$$
We see how this result escapes from the difficulty mentioned about the formulation (3) by constructing in some sense a larger probability space which
contains the limit. We refer the reader to [28] or [33] for a proof.
We also stress an important consequence of the central limit theorem which
explains the ubiquity of the Brownian motion. A stochastic process (random
function) is called continuous if its realizations are almost surely continuous.
Theorem 9. Any continuous stochastic process with independent increments
has Gaussian increments.
We refer to [14] for a proof. It is also possible to express any such process in
terms of the Brownian motion. Indeed, if $\zeta(t)$ is a continuous stochastic process
(with $\zeta(0) = 0$) with independent increments, there are two (deterministic)
functions $e(t)$ and $\sigma(t)$ such that
$$\zeta(t) = e(t) + \int_0^t \sigma(s)\, dB_s.$$
The integral in the above formula has to be defined in a suitable way since the
function $B_s$ is almost surely not differentiable. We refer to [14] for the details.
In the physics literature, the derivative of $B$ (in fact a random distribution) is
known as a white noise. The independence of the increments reflects the fact
that the system is submitted to a noise which is renewed at a rate much
faster than the typical rate of evolution of the system. We also refer to [37]
for more discussions on this subject.
4.6 Dependent random variables
There are many extensions of the CLT and of the above mentioned results to
the case of dependent random variables under different assumptions. A first
difficulty is that even for non-trivial random variables, the asymptotic variance
may vanish. Indeed, let $(Y_j)$ be a sequence of real i.i.d. random variables with
finite non-zero variance $\sigma^2$, and consider the sequence $(X_j)$ given by
$$X_j = Y_{j+1} - Y_j.$$
It is easy to verify that $E(X_j) = 0$ and the common variance is $2\sigma^2 > 0$.
However, the sum telescopes:
$$S_n = \sum_{j=1}^{n} X_j = Y_{n+1} - Y_1,$$
which implies that $S_n/\sqrt{n}$ converges in law to zero. Also, the variance of
$S_n/\sqrt{n}$ is equal to $2\sigma^2/n$ and tends to zero when $n$ tends to infinity. We refer
to [19] for a general discussion around this phenomenon.
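The vanishing of the asymptotic variance in this example can be seen in a short simulation. The Python sketch below is only illustrative: the Gaussian law of the $Y_j$, the seed and the sample sizes are arbitrary assumptions, not part of the argument above.

```python
import numpy as np

rng = np.random.default_rng(0)           # fixed seed, for reproducibility only
sigma, n, trials = 2.0, 400, 20000       # illustrative values
Y = rng.normal(0.0, sigma, size=(trials, n + 1))   # Y_1, ..., Y_{n+1} in each row
X = Y[:, 1:] - Y[:, :-1]                            # X_j = Y_{j+1} - Y_j
S_n = X.sum(axis=1)

telescoped = Y[:, -1] - Y[:, 0]                     # Y_{n+1} - Y_1
var_X = X.var()                                     # should be close to 2 sigma^2
var_scaled = (S_n / np.sqrt(n)).var()               # should be close to 2 sigma^2 / n
```

Up to rounding, `S_n` coincides with `telescoped` in every trial, while `var_scaled` is smaller than `var_X` by a factor of about $n$: the individual terms fluctuate, but their sum does not grow.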
For a sequence of non-independent random variables $(X_j)$, the variance of
$S_n/\sqrt{n}$ involves the correlation functions
$$C_{i,j} = C(X_i, X_j) = E(X_i X_j) - E(X_i)\, E(X_j).$$
If we moreover assume that the sequence is stationary (i.e. the joint distributions of $X_{i_1}, \dots, X_{i_k}$ are equal to those of $X_{i_1+l}, \dots, X_{i_k+l}$ for any $k$, $i_1, \dots, i_k$
and any $l > -\min\{i_1, \dots, i_k\}$), then $C_{i,j}$ depends only on $|i - j|$. In this case,
if as a function of $|i - j|$, $|C_{i,j}|$ is summable, then
$$\lim_{n \to \infty} E\Big(\Big(\frac{S_n - n\mu}{\sqrt{n}}\Big)^2\Big) = E\big((X_1 - \mu)^2\big) + 2 \sum_{j=2}^{\infty} E\big((X_1 - \mu)(X_j - \mu)\big). \qquad (4)$$
[...]
$$S_n(x) = \sum_{j=1}^{n} g\big(f^{j-1}(x)\big) = \sum_{j=1}^{n} X_j,$$
and to wonder if there is a central limit theorem. In order to ensure that the
asymptotic variance does not vanish, one has to impose that $g$ is not of the
form $u - u \circ f$ and with this assumption one can prove a CLT. We refer to
[16] or [8] for the details and to [38] for more general cases.
Of course it may happen that even though the $X_j$ have a finite variance,
the quantity (4) diverges. This is for example the case for some observables
in a second order phase transition in Statistical Mechanics. One should use a
non-trivial normalization to understand the fluctuations. Some non-Gaussian
limiting distributions may then show up. We refer to [18] for a review of this
question in connection with probability theory.
5 Statistical Applications
The CLT is one of the main tools in statistics. For example it allows one to construct
confidence intervals for statistical tests. We refer to [7] for a detailed exposition
and many other statistical applications. There are also many results about
fluctuations of empirical distributions; we refer to [35] for more on this subject.
6 Large deviations
The CLT describes the fluctuations of order $\sqrt{n}$ of a sum of $n$ random variables having zero average. One can also ask what would be the probability of
observing a fluctuation of larger (untypical) size. For example, a giant fluctuation (large deviation) which would provide a wrong estimate of the average
(i.e. an anomaly in the law of large numbers). There are many results in this
direction starting with Chernoff's exponential bound. We will give some ideas
for the i.i.d. case, and refer to the literature for deeper results.
We will assume that for any real $s$, the random variable $\exp(sX_j)$ is integrable (existence of exponential moments). One can then define the sequence
of functions
$$Z_n(s) = E\big(e^{sS_n}\big) \qquad (5)$$
[...]
$$\frac{1}{n} \log Z_n(s) = P(s) = \log E\big(e^{sX_1}\big) \qquad (6)$$
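[...] For $s > 0$ we have
$$e^{ns\alpha}\, P\big(S_n > n\alpha\big) \le E\big(e^{sS_n}\big) = e^{nP(s)}, \qquad \text{hence} \qquad P\big(S_n > n\alpha\big) \le e^{-n(s\alpha - P(s))}.$$
$\alpha$ being kept fixed, we now choose $s$ optimally. In other words, we take the
value of $s$ maximizing $s\alpha - P(s)$, which makes the bound as small as possible. In doing so there appears the so-called
Legendre transform of the function $P$ defined by
$$\varphi(\alpha) = \sup_s \big(s\alpha - P(s)\big). \qquad (7)$$
[...]
$$\alpha = \frac{dP}{ds}(s). \qquad (8)$$
One may wonder (and should wonder) if the solution is unique. It is easy to
see that $P$ is a convex function. We leave to the reader the interesting exercise
of computing $P'(s)$ and $P''(s)$ and to interpret the results in particular for
$s = 0$. The solution of the problem (7) is unique unless $P$ has affine pieces.
This occurs in statistical mechanics in the presence of phase transitions (we
refer to [22] for more details). Finally, we have
$$\limsup_{n \to \infty} \frac{1}{n} \log P\big(S_n > n\alpha\big) \le -\varphi(\alpha).$$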
With some more work, one can also obtain a lower bound. The following result
is due to Plachky and Steinebach.
Theorem 10. Let $W_n$ be a sequence of real random variables and assume
that there exists a number $T > 0$ such that
i) $Z_n(t) = E\big(e^{tW_n}\big) < \infty$ [...]
ii) $P(t) = \lim_{n \to \infty} \frac{1}{n} \log Z_n(t)$ [...]
$$\lim_{n \to \infty} \frac{1}{n} \log P(W_n > n\alpha) = -\varphi(\alpha)$$
where
$$\varphi(\alpha) = \sup_{t \in (0,T)} \big(t\alpha - P(t)\big).$$
We refer to [29] for a proof. In the present context, one applies this result
with $W_n = S_n - n\mu$, or $W_n = -S_n + n\mu$ to obtain information on the large
deviations in the other direction. In the case where $\mu = 0$, it is an interesting
exercise to compute the first and second derivatives of $\varphi(\alpha)$ in $\alpha = 0$ and to
relate at least intuitively the above result to the CLT.
We now give an application to the (easy) case of the game of heads or tails
discussed in the introduction. Formula (1) already solves the problem in this
case, namely one gets easily for $q > x > 0$ using Stirling's formula
$$P\big(N(n) > n(p + x)\big) \le \frac{O(1)}{\sqrt{n}}\, e^{-n\left[(q-x)\log(q-x) + (p+x)\log(p+x) - (q-x)\log q - (p+x)\log p\right]}.$$
In other words,
$$\varphi(p + x) = (q - x) \log(q - x) + (p + x) \log(p + x) - (q - x) \log q - (p + x) \log p. \qquad (9)$$
A similar formula holds for the large deviations below $np$. Let us recover this
expression using the large deviation formalism (this is essentially the original
Chernoff bound). We first have to compute the partition function
$$Z_n(s) = \sum_{r=0}^{n} \binom{n}{r} p^r q^{n-r} e^{sr} = \big(p e^s + q\big)^n,$$
so that $P(s) = \log(p e^s + q)$ and equation (8) reads
$$\alpha = \frac{p e^s}{p e^s + q}$$
[...]
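The Legendre transform (7) for the coin-flip model can be evaluated by brute force and compared with the closed form (9). The Python sketch below is an illustration only: it assumes $P(s) = \log(p e^s + q)$, which follows from formula (1), and the grid for $s$, the value $p = 0.3$ and the function names are arbitrary choices.

```python
import numpy as np

def pressure(s, p):
    """P(s) = log E exp(s X_1) for a coin with P(X_1 = 1) = p."""
    return np.log(p * np.exp(s) + (1.0 - p))

def rate_numeric(alpha, p):
    """Legendre transform sup_s (s*alpha - P(s)), by brute force on a grid."""
    s = np.linspace(-20.0, 20.0, 400001)
    return np.max(s * alpha - pressure(s, p))

def rate_formula(alpha, p):
    """Right-hand side of (9) with alpha = p + x."""
    q, x = 1.0 - p, alpha - p
    return ((q - x) * np.log(q - x) + (p + x) * np.log(p + x)
            - (q - x) * np.log(q) - (p + x) * np.log(p))

p = 0.3                                   # illustrative value
alphas = [0.3, 0.35, 0.5, 0.7, 0.9]
pairs = [(rate_numeric(a, p), rate_formula(a, p)) for a in alphas]
```

Each pair in `pairs` should agree to high accuracy, and the rate vanishes at $\alpha = p$, the value selected by the law of large numbers.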
7 Multifractal measures
One among the numerous applications of large deviations is the analysis of the
multifractal behaviour of measures. We first introduce briefly this notion. In
order to simplify the discussion we will restrict ourselves to (positive) measures
on the unit interval, the extension to higher dimension being more or less
immediate. The driving question in the multifractal analysis of a (positive)
measure is what is the measure of a small interval. The simplest behaviour
that immediately comes to mind is that the measure of any interval of length
$r$ could be proportional to $r$. More precisely, we will say that a measure $\mu$ is
monofractal if there is a number $\alpha > 0$ and two positive numbers $C_1 < C_2$
such that for any point $x \in [0, 1]$ belonging to the support of $\mu$ and for any
$r > 0$ small enough
$$C_1 r^{\alpha} \le \mu\big(B_r(x)\big) \le C_2 r^{\alpha} \qquad (10)$$
where $B_r(x)$ is the interval $[x - r, x + r]$ (or more precisely $[x - r, x + r] \cap [0, 1]$).
The Lebesgue measure satisfies this property with $\alpha = 1$ (with $C_2 = 2$ and
$C_1 = 1$ because of the boundary points). The number $\alpha$ is intuitively related
to a dimension. If one considers the Lebesgue measure in dimension two, one
gets a similar relation with exponent two, and this extends immediately to
any dimension. We will say more about this below. Another interesting case
is the Cantor set $K$. This set can be defined easily as the set of real numbers
in $[0, 1]$ whose triadic expansion does not contain one. In other words
$$K = \Big\{ \sum_{j=1}^{\infty} \epsilon_j 3^{-j} \ \Big|\ \epsilon_j \in \{0, 2\} \Big\}.$$
It is well known (see [12] or [23]) that this set has dimension $\log 2/\log 3$. This
set can also be defined as the intersection of a decreasing sequence of finite
unions of closed intervals. Namely for any $n \ge 1$ and for any finite sequence
$\epsilon_1, \dots, \epsilon_n$ of numbers 0 or 2, let
$$x_{\epsilon_1, \dots, \epsilon_n} = \sum_{j=1}^{n} \epsilon_j 3^{-j},$$
and
$$I_{\epsilon_1, \dots, \epsilon_n} = \big[ x_{\epsilon_1, \dots, \epsilon_n},\ x_{\epsilon_1, \dots, \epsilon_n} + 3^{-n} \big].$$
It is left to the reader to verify that
$$K = \bigcap_{n} \ \bigcup_{\epsilon_1, \dots, \epsilon_n} I_{\epsilon_1, \dots, \epsilon_n}.$$
Since each interval $I_{\epsilon_1, \dots, \epsilon_n}$ has length $3^{-n}$, and there are $2^n$ such intervals,
this almost immediately leads to the above mentioned fact that the (Hausdorff) dimension of $K$ is $\log 2/\log 3$. Let us now define a measure $\mu$ on $K$ (the
Cantor measure) by imposing
$$\mu\big(I_{\epsilon_1, \dots, \epsilon_n}\big) = 2^{-n}.$$
There are various ways to prove that this indeed defines a probability measure
supported by $K$. We refer to [12] or [23] for the details. We now check (10).
For $x \in K$ and a given $r > 0$ ($r < 1/3$), let $n$ be the unique integer such that
$3^{-n} \le r < 3^{-n+1}$. It is easy to check that there is a finite sequence $\epsilon_1, \dots, \epsilon_n$
of numbers equal to 0 or 2 such that
$$I_{\epsilon_1, \dots, \epsilon_n} \subset B_r(x)$$
and
$$B_r(x) \cap K \subset I_{\epsilon_1, \dots, \epsilon_{n-1}}.$$
Therefore
$$2^{-n} \le \mu\big(B_r(x)\big) \le 2^{-n+1}$$
and we obtain an estimate (10) with $\alpha = \log 2/\log 3$ (it is left to the reader
to compute the two constants $C_1$ and $C_2$). We see again a relation between
$\alpha$ and the dimension. This is a general fact discovered by Frostman, namely
if (10) holds, the dimension of any set of non-zero measure is at least $\alpha$.
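The two-sided estimate $2^{-n} \le \mu(B_r(x)) \le 2^{-n+1}$ for the Cantor measure can be checked numerically. The Python sketch below is an illustration under stated assumptions: the measure of an interval is computed from the distribution function of $\mu$ (the "devil's staircase"), evaluated digit by digit in base 3 to a finite depth, and the test points are left endpoints of generation-6 intervals. The function names are hypothetical.

```python
def cantor_cdf(x, depth=40):
    """mu([0, x]) for the Cantor measure (the devil's staircase), to finite depth."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    total, weight = 0.0, 0.5
    for _ in range(depth):
        x *= 3.0
        digit = int(x)
        x -= digit
        if digit == 1:            # x lies in a removed middle third
            return total + weight
        if digit == 2:
            total += weight
        weight /= 2.0
    return total

def ball_measure(x, r):
    """mu(B_r(x)) with B_r(x) = [x - r, x + r]; the Cantor measure has no atoms."""
    return cantor_cdf(x + r) - cantor_cdf(x - r)

def left_endpoints(n):
    """Left endpoints x_{eps_1..eps_n} of the 2^n intervals of generation n."""
    pts = [0.0]
    for j in range(1, n + 1):
        pts = [x + eps * 3.0 ** (-j) for x in pts for eps in (0, 2)]
    return pts

# Example: x = 2/9 belongs to K, and r = 3^{-3} corresponds to n = 3 in the text.
example = ball_measure(2.0 / 9.0, 3.0 ** (-3))
```

The value `example` then lies between $2^{-3}$ and $2^{-2}$, and the same holds with the appropriate $n$ for the other points returned by `left_endpoints`.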
There is a converse to this result known as Frostman's Lemma. We refer to
[20] for the complete statement and a proof. We only sketch the proof of the
direct (easy) part. Recall (see [20], [12] or [23]) that the Hausdorff dimension
of a set $A$ is defined as follows. Let $\big(B_{r_j}(x_j)\big)$ be a sequence of balls covering
$A$, namely
$$A \subset \bigcup_j B_{r_j}(x_j).$$
j
Hd () =
rjd .
inf
Br j xj
rj .
C2
log Br (x)
=
r0
log r
x lim
(11)
1
log # An =
n
where $\#(\,\cdot\,)$ denotes the cardinality. From now on we will only consider this
case. We then consider the sequence of partition functions at inverse temperature $\beta$ defined by
$$Z_n(\beta) = \frac{1}{\# \mathcal{A}_n} \sum_{I \in \mathcal{A}_n} \mu(I)^{\beta}.$$
P () = lim
Note that
d log Zn
(1) =
d
= () .
Here we assume a little more than the conclusion of Theorem 10, namely that
instead of having information about those atoms $I$ for which $\mu(I) > e^{-n\alpha}$, we
have information for $\mu(I) \approx e^{-n\alpha}$. This follows easily if $\varphi(\alpha)$ is differentiable
with non-zero derivative for this value of $\alpha$.
From this result we can come back to the Hausdorff dimension of the sets
$E_\alpha$. For this purpose, we will assume that there is a number $\lambda \in (0, 1)$ such
that all atoms of $\mathcal{A}_n$ are intervals of length $\lambda^n$ (uniform partition). Therefore
rjd
balls of radius
nd n( log ) n
to cover
If d > (( log ) )/ log , the above quantity tends to zero when n tends
to innity and we conclude that dH E (( log ) )/ log . Under some
bounded distortion properties, one can prove that this is also a lower bound
(see [9] and [1]).
As an easy example consider on the Cantor set $K$ the measure $\mu$ defined
for $0 < p < 1$ (and $q = 1 - p$) by
$$\mu\big(I_{\epsilon_1, \dots, \epsilon_n}\big) = p^{\sum_{j=1}^{n} \epsilon_j/2}\ q^{\,n - \sum_{j=1}^{n} \epsilon_j/2}.$$
When $p = q = 1/2$ we get the Cantor measure defined above which has
a trivial mono-fractal structure. From now on we assume $p \ne q$. Consider
the sequence of partitions $\mathcal{A}_n = \{ I_{\epsilon_1, \dots, \epsilon_n} \}$. It is easy to prove that the
multifractal formalism applies to this measure using the large deviation results
previously established. One gets, since $\lim_{n \to \infty} \frac{1}{n} \log \# \mathcal{A}_n = \log 2$,
$$P(\beta) = \log\big(p^{\beta} + q^{\beta}\big) - \log 2,$$
and assuming for example $p > q$ (the other case is left to the reader) it follows
that
s=
() =
log3 p log3 q
124
Pierre Collet
dH (E )
..
.....
.
.....
....
..
.....
.... ............
.......... ... ..............
.
.
.
.
.
.
.....
..
..
.....
.....
....
.....
..
...
....... ...
..
...
...... .
.
...
.
.
.....
...
...
...
.
.
.
.
...
.
.
..... ....
...
.
.
.
...
.
.
.
...
..... ....
...
...
...
.
.
.
.
...
....
..
..
..
.
.
.
...
.
.
.
.
.
.
.
.
.
...
..
.
.
...
.
.
.
.
.
..
.
.
.
.
.
.
..
.
.
.
.
.
.
...
.
.
.
..
...
.....
....
....
....
...
.
.
.
.
...
..
....
..
.
.
.
.
.
.
.
..
.
.
.
.
.
.
.
.
.
..
...
..
..
.
.
.
.
..
..
.
.
.
.
.
.
.
.
.
..
.
.
.
.
.
...
.
..
...
.....
..
..
....
...
.
.
.
.
.
.
.
.
...
.....
....
...
.
.
.
.
.
.
.
.
.
.
.
..
.
...
.
.
min
max
[...] $\alpha_{\min} = -\log_3 p$ and $\alpha_{\max} = -\log_3 q$ where $d_H(E_\alpha)$ vanishes (recall that $p > q$).
These correspond respectively to the largest and smallest measure of atoms
of fixed size, namely for an atom $I$ of $\mathcal{A}_n$ we have $q^n \le \mu(I) \le p^n$, and there
is only one atom reaching each bound. The constraint $-\log_3 p \le \alpha \le -\log_3 q$
can also be deduced from the formula
$$\alpha = -\frac{p^s \log_3 p + q^s \log_3 q}{p^s + q^s}.$$
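The quantities of this example are simple enough to be evaluated directly. The Python sketch below is an illustration under stated assumptions: it uses the closed form $P(\beta) = \log(p^\beta + q^\beta) - \log 2$, takes all logarithms in the exponent $\alpha$ in base 3 (the atoms of $\mathcal{A}_n$ have length $3^{-n}$), and the value $p = 0.7$, the level $n = 30$ and the function names are arbitrary choices.

```python
import math

def pressure(beta, p):
    """P(beta) = log(p^beta + q^beta) - log 2 for the measure of this example."""
    q = 1.0 - p
    return math.log(p ** beta + q ** beta) - math.log(2.0)

def log_partition(beta, p, n):
    """(1/n) log Z_n(beta), with Z_n(beta) = (1/#A_n) * sum over atoms of mu(I)^beta."""
    q = 1.0 - p
    # comb(n, k) atoms of generation n carry the measure p^k q^(n-k)
    total = sum(math.comb(n, k) * (p ** k * q ** (n - k)) ** beta for k in range(n + 1))
    return math.log(total / 2 ** n) / n

def alpha_of(s, p):
    """Local exponent (base-3 logarithms) selected by the parameter s."""
    q = 1.0 - p
    return -(p ** s * math.log(p, 3) + q ** s * math.log(q, 3)) / (p ** s + q ** s)

p = 0.7                                              # illustrative value, p > q
alpha_min, alpha_max = -math.log(p, 3), -math.log(1.0 - p, 3)
sweep = [alpha_of(s, p) for s in (-50.0, -1.0, 0.0, 1.0, 50.0)]
```

As $s$ runs from very negative to very positive values, `sweep` decreases from about `alpha_max` to about `alpha_min`, since $\alpha$ is a weighted average of $-\log_3 p$ and $-\log_3 q$. For this product measure `log_partition` coincides with `pressure` at every level $n$.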
[...] which tends to zero when $n$ tends to infinity since for $p \ne 1/2$ one has $p(1 - p) < 1/4$.
Finally there is the point $\alpha_H = -p \log_3 p - q \log_3 q$ where the slope is equal
to one. Since $\varphi'(\alpha) = s$, this gives $s = 1$. Note also that by the normalization
of the probability measure $\mu$, we have for each $\alpha$
(I) 2n en log 3 en( log 3) 2n enP
IAn ,
(I)en log 3
1,
References
1. L. Barreira, Y. Pesin, J. Schmeling. On a general concept of multifractality: multifractal spectra for dimensions, entropies, and Lyapunov exponents.
Multifractal rigidity. Chaos 7:27-38 (1997).
2. A. Barron, O. Johnson. Fisher information inequality and the central limit
theorem. http://arXiv.org/abs/math/0111020.
3. I. Berkes, E. Csáki. A universal result in almost sure central limit theory.
Stochastic Process. Appl. 94:105-134 (2001).
4. R.N. Bhattacharya, R. Ranga Rao. Normal approximations and asymptotic
expansion. Krieger, Melbourne Fla. 1986.
5. P. Billingsley. Convergence of Probability Measures. John Wiley & Sons,
New York 1968.
6. A. Borovkov. Boundary-value problems, the invariance principle, and large
deviations. Russian Math. Surveys 38:259-290 (1983).
7. A. Borovkov. Statistique Mathématique. Editions Mir, Moscou 1987.
8. P. Collet. Ergodic properties of maps of the interval. In Dynamical Systems.
R. Bamon, J.-M. Gambaudo & S. Martínez éditeurs, Hermann, Paris 1996.
9. P. Collet, J. Lebowitz, A. Porzio. The dimension spectrum of some dynamical systems. J. Statist. Phys. 47:609-644 (1987).
10. A. Dembo, O. Zeitouni. Large Deviation Techniques and Applications. Jones
and Bartlett, Boston 1993.
11. R.S. Ellis. Entropy, Large Deviations, and Statistical Mechanics. Springer,
Berlin 1985.
12. K.J. Falconer. The geometry of fractal sets. Cambridge Tracts in Mathematics, 85. Cambridge University Press, Cambridge, 1986.
13. W. Feller. An introduction to Probability Theory and its Applications I, II.
John Wiley & Sons, New York, 1966.
14. I. Guikhman, A. Skorokhod. Introduction à la Théorie des Processus
Aléatoires. Editions Mir, Moscou 1980.
15. P. Halmos. Measure Theory. D. Van Nostrand Company, Inc., New York,
N. Y., 1950.
16. F. Hofbauer, G. Keller. Ergodic properties of invariant measures for piecewise monotonic transformations. Math. Zeit. 180:119-140 (1982).
17. E.T. Jaynes. Probability Theory, The Logic of Science. Cambridge University
Press, Cambridge 2004.
18. G. Jona-Lasinio. Renormalization group and probability theory. Physics Report 352:439-458 (2001).
19. A. Kachurovskii. The rate of convergence in ergodic theorems. Russian Math.
Survey 51:73-124 (1996).
20. J.-P. Kahane. Some random series of functions. Cambridge University press,
Cambridge 1985.
21. A.I. Khinchin. Mathematical Foundations of Statistical Mechanics. Dover, New
York 1949.
22. O.E. Lanford III. Entropy and equilibrium states in classical statistical mechanics. In Statistical Mechanics and Mathematical Problems. A. Lenard editor,
Lecture Notes in Physics 20, Springer, Berlin 1973.
23. P. Mattila. Geometry of sets and measures in Euclidean spaces. Fractals and
rectifiability. Cambridge Studies in Advanced Mathematics, 44. Cambridge University Press, Cambridge, 1995.
24. M. Métivier. Notions fondamentales de la théorie des probabilités. Dunod,
Paris 1968.
25. J. Neveu. Calcul des Probabilités. Masson, Paris 1970.
26. P. Ney. Notes on dominating points and large deviations. Resenhas 4:79-91
(1999).
27. V.V. Petrov. Limit Theorems of Probability Theory. Sequences of independent
random variables. Clarendon Press, Oxford 1995.
28. W. Philipp, W. Stout. Almost sure invariance principles for partial sums of
weakly dependent random variables. Memoirs of the AMS, 161:1975.
29. D. Plachky, J. Steinebach. A theorem about probabilities of large deviations
with an application to queuing theory. Periodica Mathematica 6:343-345 (1975).
30. E. Rio. Théorie asymptotique des processus aléatoires faiblement dépendants.
Springer, Berlin 2000.
31. L. Schwartz. Cours d'Analyse de l'Ecole Polytechnique. Hermann, Paris 1967.
32. C. Stein. Approximate Computations of Expectations. IMS, Hayward Cal.
1986.
33. W. Stout. Almost Sure Convergence. Academic Press, New York 1974.
34. M. Talagrand. Concentration of measure and isoperimetric inequalities in
1 Introduction
Random polynomials appear naturally in different fields of physics, like quantum chaotic dynamics, where one has to study the statistical properties of
wavefunctions of chaotic systems and the distribution of their zeros [2, 11].
Our personal interest lies rather in their connection with noisy data analysis, especially in the context of the linear parametric modelling of random
processes [5, 3, 16] and the problem of resonance recognition, i.e. the identification of the poles of rational estimators of the power spectrum, computed
from a measured sample of the signal.
In this contribution we address a probabilistic question concerning the real
and complex roots of certain classes of random polynomials, the coefficients
of which are random real numbers. The roots of such polynomials are random
variables, real or complex conjugates, and one is interested in the mathematical expectation of their distribution in the complex plane, according to the
degree of the polynomial and the statistics of its coefficients.
Because of mathematical simplicity, we study in section 2 the statistics
of the real roots of polynomials with real random Gaussian coefficients. This
material is taken from the historical papers by Kac [8, 9] and subsequent works
[6, 4, 12]. Section 3 introduces several directions of generalization; we
investigate the statistics of the roots in the whole complex plane, and introduce
the notion of generalized monic polynomials. We just give an outline of the
derivation of the density of complex roots by recalling the passage from the
real case to the complex one, the proof of which can be found in [13]. We use
this result in order to understand and characterize the behavior of the roots
in the two extreme cases, homogeneous random polynomials on the one hand,
monic polynomials with weak disorder on the other hand. We briefly look
at the particular class of self-inversive random polynomials [2], whose roots
have an interesting behavior on the unit circle; the case of random complex
coefficients is just mentioned in the conclusion.
J.-D. Fournier et al. (Eds.): Harm. Analysis and Ratio. Approx., LNCIS 327, pp. 129–143, 2006.
© Springer-Verlag Berlin Heidelberg 2006
Bénédicte Dujardin
2 Real roots
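The first problem about random polynomials is that of the average number of real roots of a polynomial of degree $n$. It was solved by Kac in the 1940s [8] in the simple case of independent identically distributed (i.i.d.) coefficients $a_k$ with Gaussian probability density function (pdf) $N(0, 1)$, of average zero and variance 1. Let
$$P_n(z) = \sum_{k=0}^{n} a_k z^k \tag{1}$$
be such a polynomial, with real roots $t_k^{(n)}$. The counting measure of these real roots is
$$\rho_n(t) = \sum_{k} \delta\big(t - t_k^{(n)}\big) = |P_n'(t)|\, \delta(P_n(t)), \tag{2}$$
the Jacobian $|P_n'(t)|$ being due to the change of independent variable in the Dirac distribution. $\rho_n(t)$ is actually the exact density of the roots for a given realization, and one can calculate its mathematical expectation
$$E(\rho_n(t)) = \int_{\mathbb{R}^2} dP\, dP'\; \mathcal{P}(P, P')\, |P'|\, \delta(P) \tag{3}$$
considering, for any fixed $t \in \mathbb{R}$, $P_n(t)$ and $P_n'(t)$ as two correlated random variables written $P$ and $P'$. Let us now make the hypothesis that the $a_k$ are i.i.d. $N(0, 1)$; since $P_n(t)$ and $P_n'(t)$ are linear combinations of the $a_k$, they are themselves Gaussian variables with zero mean and joint pdf
$$\mathcal{P}(P, P') = \frac{1}{2\pi\sqrt{\det C}} \exp\left\{-\frac{1}{2}\, (P, P')\, C^{-1} \begin{pmatrix} P \\ P' \end{pmatrix}\right\}, \qquad C = \begin{pmatrix} E(P^2) & E(PP') \\ E(PP') & E(P'^2) \end{pmatrix}. \tag{4}$$
The Dirac distribution in (3) removes the integration over $P$, and the remaining Gaussian integral over $P'$ gives
$$E(\rho_n(t)) = \frac{1}{2\pi\sqrt{\det C}} \int_{\mathbb{R}} dP'\, |P'| \exp\left\{-\frac{E(P^2)\, P'^2}{2 \det C}\right\} = \frac{\sqrt{\det C}}{\pi\, E(P^2)}. \tag{5}$$
Since $P_n(t) = \sum_{k=0}^{n} a_k t^k$ and $P_n'(t) = \sum_{k=0}^{n} k\, a_k t^{k-1}$, the entries of $C$ are finite sums of powers of $t$, and we obtain the average density, still written $\rho_n(t)$,
$$\rho_n(t) = \frac{1}{\pi} \left[ \frac{1}{(1 - t^2)^2} - \frac{(n + 1)^2\, t^{2n}}{(1 - t^{2n+2})^2} \right]^{1/2}. \tag{6}$$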
Figure 1 shows the distribution of real roots for n = 10 and 100. The dotted
line is the theoretical density $\rho_n(t)$ given by (6); it has two peaks centered
near $\pm 1$ and these peaks get narrower when the degree $n$ of the polynomial
increases. The black line is a histogram over 1000 realizations of the real
roots of random polynomials as defined by (1), and we see that the simulations
match the theory quite well.
[Figure 1: two panels, n = 10 and n = 100, each showing the theoretical density and the histogram of real roots as functions of t.]
Fig. 1. Density of real roots and histograms over 1000 realizations for polynomials
of degrees n = 10 and n = 100.
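As a numerical cross-check of formula (6), the following sketch evaluates the average density of real roots in two ways: from the second moments of $P_n(t)$ and $P_n'(t)$, using the fact that the Gaussian average reduces to $\sqrt{E(P^2)E(P'^2) - E(PP')^2}\,/\,(\pi E(P^2))$, and from the closed form (6). The function names are illustrative only and NumPy is assumed; this is a minimal sketch, not the code used for Fig. 1.

```python
import numpy as np


def density_from_moments(t, n):
    """Average density of real roots at the points t, from the covariance of (P, P')."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    k = np.arange(n + 1)
    p = t[:, None] ** k                               # monomials t^k
    dp = np.zeros_like(p)
    dp[:, 1:] = k[1:] * t[:, None] ** (k[1:] - 1)     # derivatives k t^(k-1)
    e_pp = np.sum(p * p, axis=1)                      # E(P^2)
    e_pd = np.sum(p * dp, axis=1)                     # E(P P')
    e_dd = np.sum(dp * dp, axis=1)                    # E(P'^2)
    det_c = e_pp * e_dd - e_pd ** 2
    return np.sqrt(np.maximum(det_c, 0.0)) / (np.pi * e_pp)


def density_closed_form(t, n):
    """Closed form (6); the removable singularities at t = 1 and t = -1 are not handled."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    first = 1.0 / (1.0 - t ** 2) ** 2
    second = (n + 1) ** 2 * t ** (2 * n) / (1.0 - t ** (2 * n + 2)) ** 2
    return np.sqrt(first - second) / np.pi


if __name__ == "__main__":
    points = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
    print(density_from_moments(points, 10))
    print(density_closed_form(points, 10))
```

The moment-based version stays finite at $t = \pm 1$, where the two terms of (6) diverge separately, which makes it the more convenient one for plotting the peaks.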
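The average number of real roots $E(N_n)$ follows by integrating the density over the real axis,
$$E(N_n) = \int_{-\infty}^{+\infty} dt\, \rho_n(t), \tag{7}$$
that is, with (6),
$$E(N_n) = \frac{1}{\pi} \int_{-\infty}^{+\infty} dt \left[ \frac{1}{(1 - t^2)^2} - \frac{(n + 1)^2\, t^{2n}}{(1 - t^{2n+2})^2} \right]^{1/2}. \tag{8}$$
This integral is bounded from below and from above by logarithmic estimates of the form
$$\frac{2}{\pi} \ln n + c_1 \;\le\; E(N_n) \;\le\; \frac{2}{\pi} \ln n + c_2 \qquad \forall n \in \mathbb{N}, \tag{9}$$
where the constants $c_1$ and $c_2$ do not depend on $n$,
so for large degrees the main term of $E(N_n)$ varies like $\frac{2}{\pi} \ln n$, as illustrated
in Fig. 2 with numerical simulations.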
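The logarithmic growth can be observed with a small Monte Carlo experiment. The sketch below draws polynomials with i.i.d. $N(0, 1)$ coefficients, counts the roots returned by `numpy.roots` whose imaginary part is negligible, and compares the average count with $\frac{2}{\pi} \ln n$. The number of trials, the seed and the tolerance are arbitrary choices made for illustration, not the settings behind Fig. 2.

```python
import numpy as np


def mean_number_of_real_roots(n, trials=4000, seed=0):
    """Monte Carlo estimate of E(N_n) for i.i.d. N(0, 1) coefficients."""
    rng = np.random.default_rng(seed)
    total = 0
    for _ in range(trials):
        roots = np.roots(rng.standard_normal(n + 1))
        # A root is counted as real when its imaginary part is negligible.
        is_real = np.abs(roots.imag) <= 1e-9 * (1.0 + np.abs(roots.real))
        total += np.count_nonzero(is_real)
    return total / trials


if __name__ == "__main__":
    for degree in (5, 10, 20, 50):
        estimate = mean_number_of_real_roots(degree, trials=1000)
        print(degree, estimate, 2.0 / np.pi * np.log(degree))
```

The estimate should sit above the main term by a quantity that stays bounded as the degree grows.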
Several other works have been carried out on this problem, relaxing the
hypothesis of independence or Gaussianity of the coefficients $a_k$. Littlewood
and Offord [12] worked with uniform and bimodal distributions of the $a_k$, and
proved that for large degrees the order of magnitude of $E(N_n)$ kept on growing
like $\frac{2}{\pi} \ln n$.
[Figure 2: average number of real roots as a function of the degree n of the random polynomial, compared with the curve (2/π){ln n + ln 2}.]
3 Complex roots
3.1 Complex roots
We are now interested in the average distribution in the whole complex plane
of the roots of the random polynomial $P_n$, at least in the limit $n \gg 1$. The
same argument as in section 2 can be applied so we get an integral formula
for the roots density, with slight modifications. The counting measure in the
plane is
$$\rho_n(z) = \sum_{k} \delta^2\big(z - z_k^{(n)}\big) = |P_n'(z)|^2\, \delta^2(P_n(z)), \qquad z \in \mathbb{C}. \tag{10}$$
The change between formulae (2) and (10) is due to the transition from a 1-dimensional space to a 2-dimensional space, and to the holomorphy of $P_n$.
(11)
d2 z
n (z).
(12)
$$P_n(z) = \Phi(z) + \sum_{k=0}^{n} a_k f_k(z), \qquad z \in \mathbb{C}. \tag{13}$$
$\Phi, f_0, \dots, f_n$ are holomorphic functions, and the $a_k$ are the real random coefficients. We will later focus on two cases of particular interest: taking $\Phi = 0$
and $f_k = z^k$ returns a homogeneous random polynomial as studied in section
2; taking $\Phi(z)$ as a polynomial of degree $n$ and $f_k = z^k$, we get a polynomial that is monic in
the classical sense.
With the hypothesis that the joint pdf of $P$ and $P'$ is Gaussian, computing
the integral over $P'$ becomes possible and leads to the density of complex roots.
Let us suppose that the $a_k$ are i.i.d. $N(0, 1)$; this hypothesis is not restrictive,
since a judicious choice of $\Phi$ and $f_k$ allows one to reduce systematically the
coefficients to zero-mean and same variance random variables.
Working with 4 random variables instead of 2 makes the calculations more
mathematically intensive but does not change the principle, so we just give
the final result. Let us first introduce some notations adapted to the problem
[13, 14, 16]. Let v and w be two complex vectors of dimension n; (v, w) is
defined as the 2 × 2 matrix
(v, w)
(14)
The transposed column vector of $(f_0(z), \dots, f_n(z))$ and its derivative are written $f$ and $f'$; $\Phi$ is considered as a 2-dimensional vector $(\mathrm{Re}(\Phi(z)), \mathrm{Im}(\Phi(z)))$
of derivative $\Phi'$. With those notations, the mathematical expectation of the
counting measure of the complex roots is, at the points where $\det(f, f) \neq 0$,
E(
n)
1
2
det(f , f )
exp 2 (f ,f )
(15)
1
det(f , f )
(16)
[Figure 3: two surface plots over the (Re(z), Im(z)) plane, for n = 10 and n = 100.]
Fig. 3. 3-dimensional representation of the density of complex roots of a homogeneous random polynomial of degree n, for n = 10 (left) and 100 (right).
1,
z = rei ,
(17)
i.e. close to the unit circle and far enough from the real axis, as shown in Fig. 4
for n = 10, and which corresponds to the interesting area of strong density,
the average density can be approximated by
n (re
1
1
(n + 1)2 r2n+2
.
2
2
(ln r )
(1 r2n+2 )2
(18)
(19)
Since the $a_k$ are i.i.d., the statistics of the distribution of complex roots
is invariant under the transformation $z \to z^{-1}$, and (19) implies that most of
the roots are present in a neighborhood of the unit circle. The other result
concerns the fraction of roots in an angular sector $[\alpha, \beta]$:
$$\frac{1}{n}\, E(N_n(\alpha, \beta)) \xrightarrow[n \to \infty]{} \frac{|\beta - \alpha|}{2\pi} \quad \text{in probability,} \qquad \forall\, [\alpha, \beta] \subset\, ]0, \pi[. \tag{20}$$
Because of the symmetry with regard to the real axis, and apart from its
singularity, formula (20) implies a uniform angular distribution. In Fig. 5 are
plotted, on the left, the 10000 roots of 1000 random polynomials of degree 10.
We observe the strong concentration of points around the unit circle, and the
singularity of the real axis. On the right are plotted histograms of the moduli
of the complex (non-real) roots for n = 10 and 100, and with dotted lines the
asymptotic curves given by (18); for the angular distribution, see Fig. 10.
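Both features, the concentration of the moduli around 1 and the nearly uniform angular distribution away from the real axis, can be seen in a short simulation. The sketch below is only an illustration under assumed settings (degree, number of realizations, annulus bounds and number of angular bins are arbitrary, and the helper names are hypothetical); it is not the code that produced Fig. 5.

```python
import numpy as np


def sample_roots(n, trials=200, seed=0):
    """Roots of `trials` homogeneous random polynomials with i.i.d. N(0, 1) coefficients."""
    rng = np.random.default_rng(seed)
    return np.concatenate([np.roots(rng.standard_normal(n + 1)) for _ in range(trials)])


def summarize(roots, bins=6):
    """Radial and angular statistics of a cloud of roots."""
    moduli = np.abs(roots)
    upper = roots[roots.imag > 0]                    # non-real roots, upper half-plane
    counts, _ = np.histogram(np.angle(upper), bins=bins, range=(0.0, np.pi))
    return {
        "median_modulus": float(np.median(moduli)),
        "fraction_in_annulus": float(np.mean((moduli > 0.5) & (moduli < 2.0))),
        "angular_histogram": counts / counts.sum(),
    }


if __name__ == "__main__":
    for degree in (10, 100):
        print(degree, summarize(sample_roots(degree)))
```

The invariance under $z \to z^{-1}$ shows up as a median modulus close to 1, and the angular histogram flattens as the degree grows.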
Let us now call $\lambda$ the order of magnitude of the parameters of the deterministic term $\Phi$. In the limit $\lambda \ll 1$, i.e. for a strong disorder, the governing
term of $P_n$ is its random part; the resulting density of roots is similar to the
density for the homogeneous polynomial, and this behavior remains true as
long as $\lambda = O(1)$.
[Figure 5: left, roots in the (Re(z), Im(z)) plane together with the unit circle, n = 10, 1000 realizations; right, histograms of the moduli |z|.]
Fig. 5. On the left, the location in the complex plane of the roots of a homogeneous
random polynomial of degree 10, for 1000 realizations. On the right, the radial
distribution of complex roots and histograms of the moduli for n = 10 and 100.
3.4 Weak disorder limit: monic polynomials
In the weak disorder limit $\lambda \gg 1$, $P_n$ is dominated by $\Phi(z)$; the main
contributions to the density of roots (given by formula (15)) come from
2
det(f , f )
(f , f )(f , f )1
1
exp 2 (f , f )1 .
2
(21)
[Figure 6: curve plotted against ‖z − z0‖.]
Fig. 6. Function $\|z - z_0\|^{2M-2} \exp\{-\|z - z_0\|^{2M}\}$.
i
1
i
1
)(z + + ) (z 1 i)3 (z 1 + i)3 +
2 2
2 2
10
ak z k
(22)
for 500 realizations. We observe the presence of four (very) sharp peaks of
density around the four simple roots 2i and 0.5 0.5i. The 2 3 remaining
roots are located on two circles surrounding the roots of order M = 3, 1 i.
Let us now consider the particular case of a root of multiplicity n at the
origin by taking (z) = z n . The density has then the shape
2 r2n2 exp
z = rei R \ C ;
(23)
n1
k=1
ak |a0 | n 1 n y k + sgn(a0 ).
(24)
[Figure 7: roots in the (Re(z), Im(z)) plane, 500 realizations.]
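Since $\lambda \gg 1$ we neglect all the negative powers of $\lambda$. The roots of $P_n(y)$
are then the $n$th roots of $\pm 1$, depending on the sign of the random variable
$a_0$, and the roots of $P_n(z)$ are located on a circle of average radius
$$\lambda^{-1/n}\, E\big(|a_0|^{1/n}\big) = \lambda^{-1/n}\, \frac{2^{1/2n}}{\sqrt{\pi}}\, \Gamma\!\left(\frac{n + 1}{2n}\right), \qquad \text{where } \Gamma(t) = \int_0^{\infty} x^{t-1} e^{-x}\, dx. \tag{25}$$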
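The Gaussian moment entering (25) can be checked directly. For a standard Gaussian variable $a$, $E(|a|^p) = 2^{p/2}\, \Gamma((p+1)/2)/\sqrt{\pi}$, which gives the expression above for $p = 1/n$. The sketch below compares this closed form with a Monte Carlo average; the argument `scale`, standing for the order of magnitude of the deterministic term, and all function names are assumptions introduced for illustration.

```python
import math

import numpy as np


def mean_abs_power(n):
    """Closed form of E(|a|^(1/n)) for a ~ N(0, 1)."""
    return 2.0 ** (1.0 / (2 * n)) * math.gamma((n + 1) / (2 * n)) / math.sqrt(math.pi)


def mean_abs_power_mc(n, samples=1_000_000, seed=0):
    """Monte Carlo estimate of the same moment."""
    rng = np.random.default_rng(seed)
    return float(np.mean(np.abs(rng.standard_normal(samples)) ** (1.0 / n)))


def average_radius(n, scale):
    """Average radius scale^(-1/n) E(|a0|^(1/n)); `scale` is a hypothetical parameter name."""
    return scale ** (-1.0 / n) * mean_abs_power(n)


if __name__ == "__main__":
    for degree in (4, 10, 100):
        print(degree, mean_abs_power(degree), mean_abs_power_mc(degree),
              average_radius(degree, 20.0))
```

As the degree increases the moment tends to 1, so the radius is governed by the factor `scale ** (-1.0 / n)`.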
[Figure 8: left, roots in the (Re(z), Im(z)) plane with two circles, one of them C(0,1), n = 4, 1000 realizations; right, surface plot for n = 4.]
Fig. 8. Left, position in the plane of the roots of 1000 monic random polynomials
with $\Phi(z) = z^4$ and $\lambda = 20$. Right, a 3-dimensional plot of function (23) for n = 4.
[Figure 9: left, roots in the (Re(z), Im(z)) plane with two circles, one of them C(0,1), n = 10, 1000 realizations; right, histograms of the moduli |z|.]
Fig. 9. On the left, position in the plane of the roots of 1000 monic random polynomials with $\Phi(z) = z^{10}$ and $\lambda = 20$. On the right, histograms of the moduli of the
non-real roots of monic polynomials for n = 10 and 100.
On the left of Fig. 10, in the homogeneous case, the position of the peak is almost constant, close to 1; in the
monic case, we have an exponential law of the inverse of n, corresponding to
the order of magnitude $\lambda^{-1/n}$. On the right, we study the angular distribution
with histograms of the arguments of the complex (non-real) roots. We observe
the appearance of an angular structure in the monic case, while the angular
distribution is quite uniform in the homogeneous case.
3.5 Self-inversive polynomials
Let us finally mention the particular case when the polynomial $P_n$ has the
self-inversive symmetry, which means that its coefficients have the reflective
property $a_k = a_{n-k}$, $k = 0, \dots, n$, with the consequence that the set of its
roots is invariant under the transformation $z \to z^{-1}$.
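The invariance of the roots under $z \to z^{-1}$, and the fact that a sizeable fraction of them lies exactly on the unit circle, can be checked numerically. The sketch below builds polynomials with mirrored real coefficients and inspects their roots; the helper names and tolerances are illustrative choices, and the fixed coefficient list in the usage part is just a convenient small example.

```python
import numpy as np


def random_self_inversive(n, rng):
    """Real coefficients a_0..a_n with a_k = a_{n-k}, drawn from N(0, 1)."""
    half = rng.standard_normal(n // 2 + 1)
    return np.concatenate([half, half[: (n + 1) // 2][::-1]])


def is_inversion_invariant(roots, tol=1e-8):
    """True if 1/z is (numerically) a root whenever z is a root."""
    return all(np.min(np.abs(roots - w)) <= tol * max(1.0, abs(w)) for w in 1.0 / roots)


def fraction_on_unit_circle(roots, tol=1e-8):
    """Fraction of the roots whose modulus equals 1 up to tol."""
    return float(np.mean(np.abs(np.abs(roots) - 1.0) <= tol))


if __name__ == "__main__":
    example = np.roots([1.0, 3.0, -2.0, 5.0, -2.0, 3.0, 1.0])
    print(is_inversion_invariant(example), fraction_on_unit_circle(example))

    rng = np.random.default_rng(0)
    fractions = [fraction_on_unit_circle(np.roots(random_self_inversive(10, rng)))
                 for _ in range(1000)]
    print(np.mean(fractions))
```

For the degree-6 example, the substitution $w = z + 1/z$ reduces the equation to the cubic $w^3 + 3w^2 - 5w - 1 = 0$, which has two roots with $|w| < 2$ and one with $|w| > 2$, hence four roots on the unit circle and one real reciprocal pair.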
[Figure 10: left, average modulus as a function of 1/n, the inverse of the degree of the polynomial; right, histograms (200 bins) as functions of arg(z).]
Fig. 10. Left, the average moduli of the complex roots of homogeneous and monic
($\lambda = 20$) random polynomials, as a function of the inverse of the degree 1/n, compared to their theoretical values (the constant 1 and $\lambda^{-1/n}$). Right, histograms of
the positive arguments of the complex roots of homogeneous and monic ($\lambda = 20$,
$\Phi = z^n$) random polynomials of degree n = 10.
n2
,
3n
(26)
4 Conclusion
We have discussed two classes of random real polynomials, according to the
order of magnitude of the random part with regard to the deterministic
component. In the strong disorder case, the roots are concentrated around
the unit circle. The second class concerns random monic polynomials in the
presence of weak disorder. Their roots are located in the neighborhood of the
roots of the unperturbed polynomial if those roots are simple, and in the
case of multiple roots, on a quasi-crystal centered on the root.
[Figure 11: left, roots in the (Re(z), Im(z)) plane, n = 10, 1000 realizations; right, curve plotted against n, the degree of the polynomial, with the level 1/sqrt(3) marked.]
Fig. 11. Left, roots of 1000 self-inversive polynomials of degree n = 10. Right,
evolution of the average fraction of roots on the unit circle as a function
of the degree of the polynomial.
We have seen two examples of polynomials whose roots have a certain
symmetry, with regard to the real axis when the coefficients are real, or with
regard to the unit circle in the self-inversive case. In both situations, the
symmetry line attracts a certain fraction of the roots.
We have not studied here the case of random polynomials with complex
coefficients. Yet, many studies have been carried out concerning this problem
[1, 2, 15]. For high degrees, the roots are located in an annulus around the
unit circle, with a uniform angular distribution, as one can see in Fig. 12 with
the roots of 1000 random polynomials of degree 10 with complex coefficients
whose real and imaginary parts are i.i.d. Gaussian variables. Histograms of
the moduli and arguments complete this illustration.
The accumulation of the roots around the unit circle is related to the
existence of a natural boundary of analyticity on this circle of the random
series [10]
$$\sum_{k=0}^{\infty} a_k z^k. \tag{27}$$
The zeros of homogeneous random polynomials, i.e. partial sums of this series,
are located in the neighborhood of the boundary [15].
It is possible to pursue the study of the statistical properties of the zeros
of random polynomials with the determination of the k-point correlation functions $\rho_k(z_1, \dots, z_k)$ using the same method [14]. Taking k = 1 returns the
density of zeros, and $\rho_2(z_1, z_2)$ characterizes the correlation between the roots.
[Figure 12: left, roots in the (Re(z), Im(z)) plane with the circle C(0,1), n = 10, 1000 realizations; right, histograms of |z| and of arg(z).]
Fig. 12. Left, roots of 1000 complex random polynomials of degree n = 10. The real
and imaginary parts of the coefficients are i.i.d. Gaussian N(0, 1) random variables.
Right, histograms of the moduli and of the arguments of the roots.
References
1. E. Bogomolny, O. Bohigas, P. Leboeuf, Distribution of roots of random
polynomials, Phys. Rev. Lett., 68(18):2726-2729, 1992.
2. E. Bogomolny, O. Bohigas, P. Leboeuf, Quantum Chaotic Dynamics and
Random Polynomials, J. Stat. Phys., 85:639-679, 1996.
3. B. Dujardin, J.-D. Fournier, Coloured noisy data analysis using Padé
approximants, submitted.
4. A. Edelman, E. Kostlan, How many zeros of a random polynomial are
real?, Bull. Amer. Math. Soc., 32:1-37, 1995.
5. J.-D. Fournier, Complex zeros of random Szegő polynomials, Computational Methods and Function Theory, pp. 203-223, 1997.
6. I.A. Ibragimov, N.B. Maslova, On the expected number of real zeros of
random polynomials I. Coefficients with zero means, Theory Probab. Appl.
16:228-248, 1971.
7. I.A. Ibragimov, O. Zeitouni, On Roots of Random Polynomials, Trans.
Amer. Math. Soc., 349(6):2427-2441, 1997.
8. M. Kac, On the average number of real roots of a random algebraic equation,
Bull. Amer. Math. Soc., 49:314-320, 1943.
1 Introduction
In the previous lecture I discussed (and advertised) a special type of rational
approximation to functions of a complex variable: the one that can be
constructed when the information on the function being approximated is given in the
form of coefficients of its power (preferably Taylor) expansion. The knowledge
of the Taylor series coefficients specifies a function completely and, as we have
seen, one can construct from a finite number of coefficients a rational function
which (almost everywhere) approximates this function better and better as
we take into account more and more coefficients. There are however other
interesting sequences of rational approximants and I shall first say a few words
about them. They use the information on the behaviour of the function at
several points. In either case, every application of this or another approximation
scheme encounters in practice an additional difficulty: all the information we
want and can use to construct an approximating rational function is biased
by errors; we can know either expansion coefficients or function values with
finite accuracy only. Consequences of this fact, fundamental in all practical
applications, will be discussed in later sections.
2 Rational Interpolation
As we know, an analytic function $f(z)$ can also be uniquely specified by an
infinite number of its values at points contained in a compact set on which
the function is analytic. Construction of a sequence of rational functions having the same values as $f(z)$ on a given finite set of points is known as the
rational interpolation problem. It is a classical problem of numerical
analysis and was studied long ago. My exposition will be partially based on
[10]. Before we discuss the convergence of sequences of rational interpolants
let us comment on the problem of their existence. Let there be a sequence
of points in the complex plane $\{z_i\}_{i=0}^{N}$ (which we shall call the nodes) and a
set of values
$$f_i = f(z_i), \qquad i = 0, \dots, N. \tag{1}$$
J.-D. Fournier et al. (Eds.): Harm. Analysis and Ratio. Approx., LNCIS 327, pp. 145–156, 2006.
© Springer-Verlag Berlin Heidelberg 2006
Maciej Pindor
We are looking for a rational function $r_{m,n}(z)$ of degrees $m$ in the numerator
and $n$ in the denominator such that $r_{m,n}(z_i) = f_i$, $i = 0, \dots, N$. If we call the
numerator and the denominator of $r_{m,n}$, $T_m(z)$ and $B_n(z)$ respectively, and
treat their coefficients as unknowns we get the system of equations for these
coefficients
$$\frac{T_m(z_i)}{B_n(z_i)} = f_i, \qquad i = 0, \dots, N, \tag{2}$$
and we can expect a unique solution if m + n N .
These equations are nonlinear, but can be linearized to the form
$$T_m(z_i) = B_n(z_i)\, f_i, \qquad i = 0, \dots, N; \tag{3}$$
however (3) is not strictly equivalent to (2): all solutions of the latter are
solutions of the former, but not vice versa. The situation seems analogous to
the one encountered in the construction of Padé approximants, but its origin is
even easier to comprehend here. It is obvious that if (3) is satisfied and $B_n(z)$
does not vanish on any node, then we can divide the equation by $B_n(z_i)$ and
(2) is also satisfied. We conclude that if a solution of (3) does not satisfy
(2) then $B_n(z)$ must vanish on some subset of (say $k$) nodes. Then, however,
$T_m(z)$ must also vanish there. Therefore both $T_m(z)$ and $B_n(z)$ contain a
common factor, the polynomial of degree $k$ vanishing on these nodes; let it
be $w_k(z)$. In this case (3) reads
$$\tilde{T}_{m-k}(z_i)\, w_k(z_i) = \tilde{B}_{n-k}(z_i)\, w_k(z_i)\, f_i, \qquad i = 0, \dots, N, \tag{4}$$
and it means that there exists a rational function of degrees $m - k$ and $n - k$
respectively, such that it interpolates our data on a subset of $N + 1 - k$ nodes.
Vice versa, it is easy to see that if (4) is satisfied then there is no rational
function of degrees $m$ and $n$ respectively, with relatively prime numerator
and denominator, that interpolates all our data; the $k$ nodes at which $w_k(z)$
vanishes are called unattainable. All the details of the problem are studied in
depth in [8]. We can say that the problem is one of degeneracy and we
shall not be concerned with it in the rest of the lecture.
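The linearized system (3) and the appearance of unattainable nodes can be illustrated with a few lines of linear algebra. The sketch below assumes exactly $m + n + 1$ nodes, stacks the conditions $T_m(z_i) - B_n(z_i) f_i = 0$ into a homogeneous system, and takes a null vector from the singular value decomposition; nodes where the computed denominator vanishes are reported as unattainable. The function names and the tolerance are illustrative assumptions, and the degenerate case where the null space has dimension larger than one is not treated.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def linearized_interpolant(z, f, m, n):
    """Coefficients (increasing powers) of T_m and B_n solving T_m(z_i) = B_n(z_i) f_i."""
    z = np.asarray(z, dtype=complex)
    f = np.asarray(f, dtype=complex)
    if z.size != m + n + 1:
        raise ValueError("this sketch expects exactly m + n + 1 nodes")
    vt = np.vander(z, m + 1, increasing=True)         # powers multiplying the t_j
    vb = np.vander(z, n + 1, increasing=True)         # powers multiplying the b_j
    system = np.hstack([vt, -f[:, None] * vb])        # (m+n+1) x (m+n+2), homogeneous
    null_vector = np.linalg.svd(system)[2][-1].conj() # unit-norm solution
    return null_vector[: m + 1], null_vector[m + 1:]


def unattainable_nodes(z, b, tol=1e-9):
    """Indices of the nodes at which the denominator vanishes."""
    values = P.polyval(np.asarray(z, dtype=complex), b)
    return [i for i, v in enumerate(values) if abs(v) <= tol]


if __name__ == "__main__":
    nodes = [0.0, 1.0, 2.0]
    t, b = linearized_interpolant(nodes, [1.0, 0.5, 1.0 / 3.0], 1, 1)
    print(P.polyval(3.0, t) / P.polyval(3.0, b), unattainable_nodes(nodes, b))
    t, b = linearized_interpolant(nodes, [1.0, 1.0, 2.0], 1, 1)
    print(unattainable_nodes(nodes, b))
```

In the first data set the values come from $1/(1+z)$ and are reproduced exactly. In the second one the solution of (3) is $T_1 = B_1 \propto z - 2$, so the common factor vanishes at the node $z = 2$ and the reduced function $r \equiv 1$ misses the value 2 prescribed there.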
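Before I talk about convergence let me first point out that the rational
interpolants we discussed above and Padé approximants are not entirely alien
to each other. Actually, they are rather extreme cases of general rational
interpolants. To see that we consider an interpolation scheme, i.e. a triangular
matrix of interpolation nodes $a_{i,j} \in \mathbb{C}$ defined as follows
$$A := \begin{pmatrix} a_{00} & & \\ \vdots & \ddots & \\ a_{0n} & \cdots & a_{nn} \end{pmatrix} \tag{5}$$
whose $n$th row forms the interpolation set
$$A_n := \{a_{0n}, \dots, a_{nn}\}, \tag{6}$$
and we define
$$w_n(z) = \prod_{x \in A_n} (z - x) = \prod_{i=0}^{n} (z - a_{in}). \tag{7}$$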
We now say that $r_{m,n}(z)$ is the (generalized) rational interpolant of the
function $f(z)$ on the set $A_{m+n}$ (where the function is assumed to be analytic)
if it is the rational function of degrees at most $m$ in the numerator and at
most $n$ in the denominator, such that
$$\frac{f(x) - r_{m,n}(x)}{w_{m+n}(x)} \quad \text{is bounded at each } x \in A_{m+n}. \tag{8}$$
$r_{m,n}(z)$ is also called the Hermite type (sense) rational interpolant, or multi-point
Padé approximant. This latter name can be understood if we observe that
when all the points in the interpolation set are identical then $r_{m,n}$ is just
the $[m/n]$ Padé approximant. On the other hand, if all of them are distinct we
have the ordinary rational interpolant. In the intermediate cases $r_{m,n}$ and its
derivatives $r_{m,n}^{(k)}$ are identical with $f$ and its derivatives $f^{(k)}$ at $x \in A_{m+n}$ up
to an order corresponding to the number of occurrences of $x$ in $A_{m+n}$; these
are our data in this situation.
As is the case for the classical rational interpolant, after introducing the
numerator and the denominator of $r_{m,n}$, $P_m$ and $Q_n$ respectively, we can
replace (8) by the linearized version
$$\frac{Q_n(x) f(x) - P_m(x)}{w_{m+n}(x)} \quad \text{is bounded at each } x \in A_{m+n}. \tag{9}$$
Again, not every pair of polynomials satisfying (9) yields a rational function
satisfying (8), but (9) always has a solution. If however there
exists a pair of polynomials satisfying (9) with $Q_n(z) \neq 0$ at all the points
of $A_{m+n}$, then the problem (8) is also soluble and the solution is
$$r_{m,n}(z) = \frac{P_m(z)}{Q_n(z)}.$$
Many algebraic problems connected with the existence of multipoint Padé approximants have been studied in [5] and it is known that blocks appearing
in the table do not need to be of square shape.
A special intermediate case is the one called Two-Point Padé Approximants. In this case the interpolation set consists of only two distinct points:
3 Convergence
The convergence problem is in many respects analogous to the one for Padé
approximants, though we have here the additional dependence on the asymptotic distribution of the interpolation nodes $a_{in}$ in the interpolation scheme
$A_{m+n}$. There is no room here to discuss it in detail, but we can summarize the
results by saying that rational interpolants converge in capacity for holomorphic functions and, apart from the set of cuts they choose, also for functions with
branchpoints. Depending on the location of the interpolation scheme
and of the set of branchpoints, however, it can happen that in different areas of the complex plane the rational interpolants will converge to different branches of the
function.
$$1 + x + x^2 + \dots \;\longrightarrow\; 1 + \varepsilon r_0 + (1 + \varepsilon r_1)\, x + (1 + \varepsilon r_2)\, x^2 + \dots \tag{10}$$
with some small $\varepsilon$ and random $r_i$'s taken from the same distribution. Obviously
all Padé approximants to the series on the left are equal to [0/1], i.e. to the
function itself. On the other hand, all (almost, except for a set of measure
zero on the event space) Padé approximants to the series on the right are
different and, if we concentrate first on the sequence $[n - 1/n]$, they have $n - 1$
different zeros and $n$ different poles, both randomly distributed. At first this
seems a catastrophe: independently of how small $\varepsilon$ is, Padé approximants to
the perturbed series seem to have nothing in common with the function represented by the original series! However, when one looks where the zeros and poles
of these Padé approximants are, an amazing phenomenon can be seen. Look
at the zeros and poles of [4/5] with some choice of (real, normally distributed)
$r_i$'s
ε         zeros                                  poles
.00001    .57471 ± .64809i, .091740, 3.1348      .57472 ± .64809i, .091740, 3.1349, 1.00000098
.01       .57034 ± .64907i, .091958, 3.0223      .57468 ± .64812i, .091958, 3.1384, 1.00099
(11)
First you see that there is a pole close to 1, the place where the function
represented by the original series has one. Next you see that all the other zeros
and poles, which represent only noise, come in tight pairs. The smaller $\varepsilon$ is,
the tighter they are; this is quite natural, because we want to return
to the original series at $\varepsilon = 0$! Of course you must remember that the positions of all these
zeros and poles are random and for any finite $\varepsilon$ both the separation of the
pairs and the distance of the unpaired pole from 1 can be arbitrarily
large, but we expect that as $\varepsilon \to 0$ they will both vanish. You can see it clearly
in the next example where I took a different choice of $r_i$'s
ε         zeros                                   poles
.00001    395.688, .55084, .013502 ± 1.48561i     387.376, .55084, .013471 ± 1.48566i, 1.000000097
.0001     490.299, .55084, .013776 ± 1.48518i     387.370, .55084, .013471 ± 1.48566i, 1.00000097
.001                                              1.0000097
(12)
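Readers who wish to reproduce experiments of this kind can compute the approximants directly from the series coefficients. The sketch below builds the $[m/n]$ Padé approximant of a perturbed geometric series by solving the linear conditions on the denominator through a null vector of the corresponding matrix, and returns its zeros and poles. It is a minimal illustration: the denominator is normalized to unit norm rather than to $Q(0) = 1$, the seed and helper names are arbitrary, and the output depends on the random draw, so it will not reproduce the tables above digit for digit.

```python
import numpy as np


def pade(c, m, n):
    """[m/n] Pade approximant from Taylor coefficients c[0..m+n] (increasing powers)."""
    c = np.asarray(c, dtype=float)
    if n < 1 or c.size < m + n + 1:
        raise ValueError("need n >= 1 and at least m + n + 1 coefficients")
    # Denominator: sum_j q_j c_{k-j} = 0 for k = m+1, ..., m+n.
    rows = [[c[k - j] if k >= j else 0.0 for j in range(n + 1)]
            for k in range(m + 1, m + n + 1)]
    q = np.linalg.svd(np.array(rows))[2][-1]          # null vector, unit norm
    # Numerator: p_k = sum_j q_j c_{k-j} for k = 0, ..., m.
    p = np.convolve(q, c)[: m + 1]
    return p, q


def zeros_and_poles(p, q):
    """Roots of the numerator and of the denominator."""
    return np.roots(p[::-1]), np.roots(q[::-1])


def perturbed_geometric(order, eps, seed=0):
    """Coefficients 1 + eps * r_k, k = 0..order, with r_k standard normal."""
    rng = np.random.default_rng(seed)
    return 1.0 + eps * rng.standard_normal(order + 1)


if __name__ == "__main__":
    for eps in (1e-5, 1e-2):
        p, q = pade(perturbed_geometric(9, eps), 4, 5)
        zeros, poles = zeros_and_poles(p, q)
        print(eps, np.sort_complex(zeros), np.sort_complex(poles))
```

Printing the sorted zeros and poles side by side makes it easy to look for an unpaired pole near 1 and for zero-pole pairs whose separation shrinks with the perturbation size.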
z + 1 2z + 1 +
2
.
z1
(13)
[Figure 1: zeros and poles in the complex plane.]
Fig. 1. Zeros and poles of [39/40] to perturbed geometrical series; ε = .001
105
zeros
poles
p6 /q5
1.3918660, .89206841, .90621750, .73283509, 1.4142139
.71781989, .59123097, .59816026, .52337768,
.52175271, 1.60414873
.999999998
1.391698, .854144,
.873892, .665806,
.651406, .535246,
.538656, .1948449666,
.1948449670, 1.6041319
.9999996
1.414239
.001
1.38207, .712257,
.23399823 .63417286i
.545154, 1.602343
.737541, .550235,
.23399385 63418746i,
.999951
1.41749
.01
1.3608, .587831
.358697 .433745i,
.0517020759, 1.58744
.602766, .999609,
.358691 .433605i,
.051702075
1.43968
(14)
But what happens with Froissart doublets when n grows? Look at Fig. 2.
Now it seems that Froissart doublets are attracted by the circle of radius
1/2; yes, this is what takes place. But what is so special about 1/2? It is
the distance to the closest (with respect to the point of expansion) singularity.
[Figure 2: zeros and poles in the complex plane.]
Fig. 2. Zeros and poles of [20/19] to perturbed series of the function discussed in
the text; ε = .001
Let us summarize our observations: Padé approximants to perturbed series exhibit the Froissart phenomenon, i.e. part of the zeros and poles form
doublets that become tighter and tighter as the perturbation becomes smaller. How
large this part is also depends on the size of the perturbation: when it is small, most
of the zeros and poles of the approximant are only slightly perturbed. When it
grows, more and more zeros and poles leave the neighborhood of the exact zeros
and poles and form Froissart doublets. For growing degrees of the numerator
and of the denominator (more and more terms of the series used) and a size
of the perturbation kept constant, Froissart doublets become attracted by the
circle whose radius is the distance to the closest singularity.
Before I give you some explanation of this behaviour, let me show you
what happens for another type of rational interpolant. To this end I calculate 12
values of our function at equidistant interpolation nodes on (2, 4) and calculate
the (6/5) rational interpolant, first from exact function values and then from
values perturbed in a way analogous to the way I perturbed the coefficients of
the series:
$$f(z_i) \to f(z_i)(1 + \varepsilon r_i) \tag{15}$$
108
106
104
zeros
poles
p6 /q5
1.3919268, 1.604148754,
.9999999999,
1.41421356
.940143, .799964,
.947363, .814864,
.649026, .539497
.660840, .543340
1.38538, 1.604150,
.640039, 2.541938,
3.016366, 3.341408
1.00004
.671489, 2.541938,
3.016366, 3.341408
1.34710, 1.604116,
.998614,
2.51337413, 3.1022690, 2.51337430, 3.1022689,
3.4848785, 3.7939914
3.4848786, 3.7939911
1.33677, 1.604231,
2.523215, 2.590149,
3.095483, 3.683980
1.000956,
2.523219, 2.590160,
3.095482, 3.6839740
1.414230
(16)
1.41544
1.40403
5 Froissart Polynomial
If we assume that $f(z)$, the function responsible for our unperturbed data,
can be approximated well by a sequence of rational approximants then we can
consider our perturbed data as perturbed data produced by some member of
this sequence. Let us, therefore, assume that, for given $m$ and $n$ such that
$m + n + 1 + 2k = M$, there exists a rational function
$$r_{m,n}(z) = \frac{T_m(z)}{B_n(z)} \tag{17}$$
approximating $f(z)$ with some accuracy on some vicinity of the set of interpolation nodes $\{z_i\}_{i=0}^{M}$ where $M > m + n + 1$. We are given data $\{d_i\}_{i=0}^{M}$ at
these nodes; let me remind you that if some, or even all, nodes appear with
multiplicity $m_i > 1$ then the data at this (multiple) node are the value of
$f(z)$ and the derivatives of $f(z)$ up to the $(m_i - 1)$th one, which are perturbed
randomly with some scale $\varepsilon$ as in (10) or (15)
$$d_i = d_i^{(0)} (1 + \varepsilon r_i), \qquad i = 0, \dots, M. \tag{18}$$
(r)
(0)
(0)
di = di (1 + ri ) + (di
=
(r)
di
(r)
di )(1 + ri )
i = 0, . . . , M
(19)
i
(r)
$$\frac{T_m(z)\, G_k(z) + \sum_{l=1}^{n+1} \varepsilon^l\, U^{(l)}_{m+k}(z)}{B_n(z)\, G_k(z) + \sum_{l=1}^{n} \varepsilon^l\, V^{(l)}_{n+k}(z)}. \tag{20}$$
denominator are close to zeros of $T_m(z) G_k(z)$ and of $B_n(z) G_k(z)$. It is then
$G_k(z)$, depending on the $\varepsilon_i$'s, that governs where the Froissart doublets appear:
the distances of the zeros of the numerator and of the denominator from the zeros of $G_k(z)$
will be $O(\varepsilon)$. If $f(z)$ itself were a rational function $r_{m,n}(z)$, or differed negligibly
from $r_{m,n}(z)$, then $G_k(z)$ would depend only on the perturbations $r_i$ and on $r_{m,n}$.
In that case we shall call it the Froissart polynomial and use the symbol $F_k(z)$.
This is the manageable situation and we can say a lot about $F_k(z)$ ([6], [7],
[3]).
Before I discuss the Froissart polynomial let me point out that, as
seen from (20) and (19), the zero-pole doublets appearing for perturbed data
coming from an arbitrary function will behave like Froissart doublets, i.e. will
be distributed randomly with their mutual distance being $O(\varepsilon)$, only when
$\varepsilon$ is definitely larger than $d_i^{(r)} - d_i^{(0)}$. We can formulate it this way: Froissart
doublets will be observed in a rational approximant constructed from perturbed data of a function when the perturbations of the data are much larger than
the differences between exact data from this function and exact data from the best
rational approximation, of the same type but of lower degrees, to the function of
interest.
To say where the Froissart doublets go, we would have to study the distribution of zeros of $F_k(z)$. To this end one needs a formula expressing the coefficients of this polynomial in terms of the $r_i$'s, and this formula depends on what type of
rational interpolant we deal with. From considerations in [2] one can only say
that they will be linear combinations of all possible products of $k$ different $r_i$'s
from a set of $M$ of them. The only thing that it was possible to find from this
very general information was the asymptotic behaviour of the pdf of zeros of
$F_k$ for $|z| \to \infty$ [1].
As was shown in [9], the pdf of zeros of polynomials with random but real
coefficients has two components: the pdf of real zeros (called the singular component) and the pdf of complex zeros (called the regular component). One can
show that the pdf of the singular component of $F_k(z)$, which we denote $\rho_s(x)$,
behaves like $1/x^2$ as $x \to \infty$, while the pdf of the regular component $\rho_r(z)$
falls off like $1/|z|^4$ for $|z| \to \infty$, except for $k$ directions along which it falls off
like $1/|z|^3$. This behaviour means that whatever the locus of coalescence of
Froissart doublets is when their number grows, their distribution has a long
tail, the behaviour observed in numerical experiments.
Up to now it was possible to find the exact form of the pdf of zeros of
the Froissart polynomial only for $k = 1$; in that case the coefficients of the
polynomial are linear in the $r_i$'s, therefore the pdf of the polynomial and of its
derivative, necessary to calculate the pdf of zeros according to the formulae in [9],
are very simple. A very interesting result came out for classical rational
interpolation on equidistant nodes inside a real interval [3]: the pdf of zeros
of $F_1(x)$ (they are all real here) has a maximum on the interpolation interval,
and the probability of finding the zero on the interpolation interval is
larger than the probability of finding it outside. It means that the Froissart
doublets will appear inside the interpolation interval rather than outside, i.e.
the extrapolation will be less affected by noise in the data than the interpolation!
6 Conclusions
My goal was to convince you that rational functions are a very powerful tool for deciphering (or, if you prefer, making a sophisticated guess about) the analytical structure of a function known from a finite set of data. This explains why they are so good at approximating values of functions most economically; no wonder they are exploited in your pocket calculators and in internal compiler routines for transcendental functions. Moreover, the rational approximation of the form I discussed has the amazing property of being practically stable with respect to perturbations of these data: noise in the data goes mainly into Froissart doublets that almost annihilate themselves. This is one more reason, I think, why rational functions have much more potential in applications than is generally recognised.
References
1. J.-D. Fournier, M. Pindor. In preparation.
2. J.-D. Fournier, M. Pindor. On multi-point Padé approximants to perturbed rational functions. Submitted to Constr. Math. and Funct. Th.
3. J.-D. Fournier, M. Pindor. Rational interpolation from stochastic data: A new Froissart phenomenon. Rel. Comp., 6:391-409, 2000.
4. M. Froissart. Private information. J. Gammel; see also J. Gilewicz, Approximants de Padé, LNM 667, Springer Verlag, 1976, ch. 6.4.
5. M.A. Galluci, W.B. Jones. Rational approximations corresponding to Newton series (Newton-Padé approximants). J. Appr. Th., 17:366-372, 1976.
6. J. Gilewicz, M. Pindor. Padé approximants and noise: A case of geometric series. JCAM, 87:199-214, 1997.
7. J. Gilewicz, M. Pindor. Padé approximants and noise: rational functions. JCAM, 105:285-297, 1999.
8. J. Meinguet. On the solubility of the Cauchy interpolation problem. In Proc. of the University of Lancaster Symposium on Approximation Theory and its Applications, pages 137-164. Academic Press, 1970.
9. G.A. Mezinescu, D. Bessis, J.-D. Fournier, G. Mantica, F. D. Aaron. Distribution of roots of random real generalized polynomials. J. Stat. Phys., 86:675-705, 1997.
10. H. Stahl. Convergence of rational interpolants. Technical Report 299/8-1, Deutsche Forschungsgemeinschaft Report Sta, 2002.
11. S. Tokarzewski, J.J. Telega, M. Pindor, J. Gilewicz. Basic inequalities for multipoint Padé approximants to Stieltjes functions. Arch. Mech., 54:141-153, 2002.
12. H. Wallin. Potential theory and approximation of analytic functions by rational interpolation. In Springer Verlag, editor, Proc. of the Colloquium on Complex Analysis at Joensuu, number 747 in LNM, pages 434-450, 1979.
1 Introduction
Time series analysis is concerned with systematic approaches to extracting
information from time series, i.e. from observations ordered in time. Unlike
in classical statistics of independent and identically distributed observations,
not only the values of the observations, but also their ordering in time may
contain information. Main questions in time series analysis concern trends,
cycles, dependence over time and dynamics.
Stationary processes are perhaps the most important models for time series. In this contribution we present two central parts of the theory of wide
sense stationary processes, namely spectral theory and the Wold decomposition; in addition we treat the interface between the theory of stationary
processes and linear systems theory, namely ARMA and state-space systems,
with an emphasis on structure theory for such systems.
The contribution is organized as follows: In section 2 we give a short introduction to the history of the subject, in section 3 we deal with the spectral
theory of stationary processes with an emphasis on the spectral representation
of stationary processes and covariance functions and on linear transformations
in frequency domain. In section 4, the Wold decomposition and prediction are
treated. By the Wold decomposition, every (linearly) regular stationary process can be considered as a (in general infinite dimensional) linear system with white noise inputs. These systems are finite dimensional if and only if their spectral density is rational, and this case is of particular importance for statistical modeling. Processes with rational spectral densities can be described as solutions of ARMA or (linear) state space systems (with white noise inputs), and the structure of the relation between the Wold decomposition and ARMA or state space parameters is analyzed in section 5. This structure is important for the statistical analysis of such systems, as is briefly described in section 6.
The intention of this contribution is to present main ideas and to give
a clear picture of the structure of fundamental results. The contribution is
J.-D. Fournier et al. (Eds.): Harm. Analysis and Ratio. Approx., LNCIS 327, pp. 159-179, 2006.
Springer-Verlag Berlin Heidelberg 2006
Manfred Deistler
oriented towards a mathematically knowledgeable audience. A certain familiarity with probability theory and the theory of Hilbert spaces is required. We give no proofs. The main references are [12], [7], [8], [10] and [11]. For the sake of brevity, we do not give references, even to important original literature, if they are cited in the references listed above; for this reason important and seminal papers by Kolmogorov, Khinchin, Wold, Hannan, Kalman, Akaike and others will not be found in the list of references at the end of this contribution.
A major shortcoming of the Box-Jenkins approach was that order determination had to be done by an experienced modeler in a non-automatic way. Thus an important step was the development and evaluation of automatic model selection procedures based on information criteria like AIC or BIC by Akaike, Hannan, Rissanen and Schwarz.
Identification of multivariate ARMA(X) and state space systems was further developed in the seventies and eighties of the last century, leading to a certain maturity of methods and theory. This is also documented in the monographs on the subject appearing in the late eighties and early nineties, in particular [9], [2], [8], [13] and [11]. However, substantial research in this area is still going on.
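\[
\Gamma_T = \begin{pmatrix}
\gamma(0) & \gamma(-1) & \cdots & \gamma(-T+1) \\
\gamma(1) & \gamma(0) & & \vdots \\
\vdots & & \ddots & \\
\gamma(T-1) & \cdots & & \gamma(0)
\end{pmatrix}
\]
are nonnegative-definite (denoted by $\Gamma_T \ge 0$). The following theorem gives a
mathematical characterization of covariance functions of stationary processes: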
Theorem 1. A function $\gamma : \mathbb{Z} \to \mathbb{C}^{s \times s}$ is a covariance function of a stationary
process if and only if it is nonnegative-definite.
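The criterion of Theorem 1 can be checked numerically in the scalar case (s = 1). The following is a minimal sketch, not part of the original text: the helper names, the MA(1) example and the tolerance are assumptions chosen for illustration. It builds the matrix with entries $\gamma(i - j)$ and tests its smallest eigenvalue.

```python
import numpy as np

def toeplitz_cov(gamma, T):
    """T x T matrix with (i, j) entry gamma(i - j) (scalar case s = 1)."""
    return np.array([[gamma(i - j) for j in range(T)] for i in range(T)])

def ma1_cov(b):
    """Covariance function of y_t = e_t + b * e_{t-1} with unit-variance white noise."""
    def gamma(r):
        r = abs(r)
        if r == 0:
            return 1.0 + b * b
        return b if r == 1 else 0.0
    return gamma

def is_nonneg_definite(gamma, T, tol=1e-10):
    """True if the T x T covariance matrix has no eigenvalue below -tol."""
    return bool(np.linalg.eigvalsh(toeplitz_cov(gamma, T)).min() >= -tol)

# A genuine covariance function passes the test ...
valid = is_nonneg_definite(ma1_cov(0.5), 8)
# ... while gamma(0) = 1, gamma(+-1) = 0.9, zero otherwise, fails it.
invalid = is_nonneg_definite(lambda r: {0: 1.0, 1: 0.9}.get(abs(r), 0.0), 8)
```

A finite T can only refute nonnegative-definiteness; the theorem requires the property for every T.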
Let $L_2$ denote the Hilbert space of square integrable random variables $x : \Omega \to \mathbb{C}$ (or, to be more precise, of the corresponding P-a.e. equivalence classes), over the complex numbers, with inner product defined by $\langle x, y \rangle = E x \bar{y}$, where $\bar{y}$ denotes the conjugate of y. Then the Hilbert space $H_x \subset L_2$, spanned by the one dimensional process variables $x_t^{(i)}$, $t \in \mathbb{Z}$, $i = 1, \dots, s$, is called the time domain of the stationary process $(x_t)$. (Note that condition (i) above implies $x_t^{(i)} \in L_2$.)
The stationarity condition (iii), in Hilbert space language, means that for every $i = 1, \dots, s$ the lengths $\|x_t^{(i)}\|$ of the $x_t^{(i)}$ do not depend on t and that the angles between $x_{t+r}^{(i)}$ and $x_t^{(j)}$ also do not depend on t. Note that the lengths are the square roots of the noncentral variances and the angles are noncentral correlations. Thus the operator shifting the process in time does not change lengths and angles.
This motivates the following theorem:
Theorem 2. For every stationary process $(x_t)$ there is a unique unitary operator $U : H_x \to H_x$ such that
\[ x_t^{(i)} = U^t x_0^{(i)} \]
holds.
We only consider stationary processes where the random variables are $\mathbb{R}^s$-valued; clearly $\gamma$ is then $\mathbb{R}^{s \times s}$-valued; the complex notation is only used to simplify the formulas for the spectral representation.
Important examples of stationary processes are:
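\[ y_t = \sum_{j=0}^{q} b_j \varepsilon_{t-j}, \qquad b_j \in \mathbb{R}^{s \times m} \tag{1} \]
\[ y_t = \sum_{j=-\infty}^{\infty} b_j \varepsilon_{t-j}, \qquad b_j \in \mathbb{R}^{s \times m} \tag{2} \]
\[ \sum_{j=-\infty}^{\infty} \|b_j\|^2 < \infty \tag{3} \]
guaranteeing the existence of the infinite sum in (2) in the sense of mean squares convergence, holds. In this paper limits of random variables are always defined in this sense; $\|b_j\|$ denotes a norm. Note that the first and second moments of MA($\infty$) processes are given by
\[ E y_t = 0 \]
and
\[ \gamma(r) = \sum_{j=-\infty}^{\infty} b_j \Sigma b_{j-r}'. \tag{4} \]
From (4) and (3), we see that an MA($\infty$) process has fading linear memory. The class of MA($\infty$) processes is a large class of stationary processes; it includes important subclasses, such as the class of causal or one-sided MA($\infty$) processes
\[ y_t = \sum_{j=0}^{\infty} b_j \varepsilon_{t-j} \tag{5} \]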
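\[ x_t = \sum_{j=1}^{h} e^{i \lambda_j t} z_j \tag{6} \]
where without loss of generality the (angular) frequencies $\lambda_j$ are restricted to $(-\pi, \pi]$, $\lambda_1 < \lambda_2 < \dots < \lambda_h$, and where $z_j : \Omega \to \mathbb{C}^s$ are, in general, genuine complex random variables describing random amplitudes and phases. In order to guarantee stationarity of $(x_t)$ we have to assume
\[ E z_j^* z_j < \infty, \qquad E z_j = \begin{cases} E x_t & \text{for } \lambda_j = 0 \\ 0 & \text{for } \lambda_j \neq 0 \end{cases} \]
and
\[ E z_j z_l^* = 0 \quad \text{for } j \neq l. \]
\[ \lambda_{1+j} = -\lambda_{h-j}, \quad j = 0, \dots, h-1 \]
and
\[ z_{1+j} = \bar{z}_{h-j}, \quad j = 0, \dots, h-1. \]
\[ \gamma(r) = \sum_{j=1}^{h} e^{i \lambda_j r} F_j; \qquad F_j = \begin{cases} E z_j z_j^* & \text{for } \lambda_j \neq 0 \\ E (z_j - E z_j)(z_j - E z_j)^* & \text{for } \lambda_j = 0 \end{cases} \tag{7} \]
From this we see that for (nontrivial) harmonic processes the memory is not fading. The spectral distribution function $F : [-\pi, \pi] \to \mathbb{C}^{s \times s}$ is defined by
\[ F(\lambda) = \sum_{j : \lambda_j \le \lambda} F_j. \tag{8} \]
As is easily seen, $\gamma$ and F are in a one-to-one relation, and thus contain the same information about the underlying process; however, this information is displayed in F in a different way. The k-th diagonal element of $F_j$ is a measure of the expected amplitude of the frequency component $e^{i \lambda_j t} z_j^{(k)}$ of $x_t^{(k)}$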
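\[ E z^*(\lambda) z(\lambda) < \infty \]
\[ z(-\pi) = 0 \]
\[ \lim_{\epsilon \downarrow 0} z(\lambda + \epsilon) = z(\lambda), \quad \lambda \in [-\pi, \pi] \]
\[ E (z(\lambda_4) - z(\lambda_3))(z(\lambda_2) - z(\lambda_1))^* = 0 \quad \text{for } \lambda_1 < \lambda_2 \le \lambda_3 < \lambda_4 \]
\[ x_t = \int_{[-\pi, \pi]} e^{i \lambda t} \, dz(\lambda) \tag{9} \]
holds.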
The importance of the spectral representation (9) is twofold: First, it allows one to interpret a stationary process in terms of frequency components. In
particular, as has been said already, every stationary process may be obtained
as a limit, pointwise in t, of a sequence of harmonic processes. Note that, in
general, convergence will not be uniform in t. Second, as will be seen in the
next subsection, certain operations are easier to perform and to interpret in
frequency domain.
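For a general stationary process its spectral distribution function $F : [-\pi, \pi] \to \mathbb{C}^{s \times s}$ is defined by
\[ F(\lambda) = E \tilde{z}(\lambda) \tilde{z}^*(\lambda) \]
where
\[ \tilde{z}(\lambda) = \begin{cases} z(\lambda) & \text{for } \lambda < 0 \\ z(\lambda) - E x_t & \text{for } \lambda \ge 0. \end{cases} \tag{10} \]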
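\[ \gamma(t) = \int_{[-\pi, \pi]} e^{i \lambda t} \, dF(\lambda) \tag{11} \]
$f(\lambda) \, d\lambda$.
\[ \sum_{t=-\infty}^{\infty} \|\gamma(t)\|^2 < \infty \tag{12} \]
\[ \gamma(t) = \int_{-\pi}^{\pi} e^{i \lambda t} f(\lambda) \, d\lambda \tag{13} \]
\[ f(\lambda) = (2\pi)^{-1} \sum_{t=-\infty}^{\infty} \gamma(t) e^{-i \lambda t} \tag{14} \]
a.e.
and
\[ \int_{-\pi}^{\pi} f(\lambda) \, d\lambda \quad (= \gamma(0)) \quad \text{exists} \tag{15} \]
for $\lambda_2 > \lambda_1$ (16)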
\[ \int_{\lambda_1}^{\lambda_2} f(\lambda) \, d\lambda . \tag{17} \]
Interpreting the integral in (9) as a limit of sums of the form (6), we can adopt the interpretation of F, given for harmonic processes, for general stationary processes, and, if f exists, analogously for f. For instance, for the case s = 1, the integral (17) is a measure for the expected amplitudes in the interval (often called frequency band) $(\lambda_1, \lambda_2)$. In a certain sense, peaks of f (to be more precise, areas under such peaks) mark the important frequency bands. Equation (15) gives a decomposition of the variance of the stationary process $(x_t)$ into the variance contributions (17) corresponding to different frequency bands. For the case s > 1, e.g. the off diagonal elements in (17) (which are complex in general) again convey the information concerning the strength of the linear dependence between different component processes in a certain frequency band and about expected phase shifts there.
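The variance decomposition described above can be illustrated for a scalar MA(1) process. This is a sketch under assumed values (the coefficient b, the grid size and the band limits are not from the text): it evaluates f from the covariances as in (14), integrates it over the whole interval to recover $\gamma(0)$ as in (15), and integrates it over one band as in (17).

```python
import numpy as np

b = 0.8                                   # hypothetical MA(1) coefficient
gamma = {0: 1 + b**2, 1: b, -1: b}        # covariances for unit-variance white noise

def f(lam):
    """Spectral density as in (14): (2*pi)^-1 * sum_t gamma(t) * exp(-i*lam*t)."""
    total = sum(g * np.exp(-1j * lam * t) for t, g in gamma.items())
    return total.real / (2 * np.pi)

n = 4096
lam = -np.pi + 2 * np.pi * np.arange(n) / n     # grid on [-pi, pi)
dlam = 2 * np.pi / n

variance = f(lam).sum() * dlam                  # integral (15), should equal gamma(0)
in_band = (lam > 0.0) & (lam <= 0.5)
band = f(lam[in_band]).sum() * dlam             # contribution (17) of the band (0, 0.5]
```

Here `variance` reproduces $\gamma(0) = 1 + b^2$, and `band` is the share of that variance carried by the chosen frequency band.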
3.3 The Isomorphism between Time Domain and Frequency
Domain. Linear Transformations of Stationary Processes
The spectral representation (9) defines an isomorphism between the time domain $H_x$ and another Hilbert space introduced here, the so-called frequency domain. For simplicity of notation we assume here $E x_t = 0$; otherwise $F(\lambda)$ has to be replaced by $E z(\lambda) z^*(\lambda)$ in this subsection. As shown in this subsection, the analysis of linear transformations of stationary processes has some appealing features in the frequency domain.
We start by introducing the frequency domain: For the one-dimensional (i.e. s = 1) case, the frequency domain $L_2^F$ is the $L_2$ over the measure space $([-\pi, \pi], \mathcal{B} \cap [-\pi, \pi], \mu_F)$, where $\mathcal{B} \cap [-\pi, \pi]$ is the $\sigma$-algebra of Borel sets over $[-\pi, \pi]$ and $\mu_F$ is the measure corresponding to the spectral distribution function, i.e. $\mu_F((a, b]) = F(b) - F(a)$. The isomorphism $I : H_x \to L_2^F$ given by (9) is then defined by $I(x_t) = e^{i \lambda t}$.
For the multivariate (s > 1) case, things are more complicated: First consider a measure $\mu$ on $\mathcal{B} \cap [-\pi, \pi]$ such that there exists a density $f^{(\mu)}$ for F w.r.t. this measure, i.e. such that
\[ F(\lambda) = \int_{[-\pi, \lambda]} f^{(\mu)} \, d\mu \]
holds. Such a measure always exists; one particular choice is the measure corresponding to the sum of all diagonal elements of F. Let $\varphi = (\varphi_1, \dots, \varphi_s)$ and $\psi = (\psi_1, \dots, \psi_s)$ denote row vectors of functions $\varphi_i, \psi_i : [-\pi, \pi] \to \mathbb{C}$; we identify $\varphi$ and $\psi$ if
\[ \int_{[-\pi, \pi]} (\varphi - \psi) f^{(\mu)} (\varphi - \psi)^* \, d\mu = 0 \]
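\[ \Bigl\{ \varphi \;\Big|\; \int_{[-\pi, \pi]} \varphi f^{(\mu)} \varphi^* \, d\mu < \infty \Bigr\} \]
\[ \int_{[-\pi, \pi]} \varphi f^{(\mu)} \psi^* \, d\mu \]
\[ y_t = \sum_{j=-\infty}^{\infty} a_j x_{t-j}; \qquad a_j \in \mathbb{R}^{s \times m}. \tag{18} \]
Here
\[ \sum_{j=-\infty}^{\infty} \|a_j\| < \infty \tag{19} \]
is a sufficient condition for the existence of the infinite sum in (18) or, to be more precise, a necessary and sufficient condition for the existence of this infinite sum for all stationary inputs $(x_t)$. As can easily be seen, the stationarity of $(x_t)$ implies that $(x_t, y_t)$ is (jointly) stationary. From (9) we obtain (using an obvious notation):
\[ y_t = \int_{[-\pi, \pi]} e^{i \lambda t} \, dz_y(\lambda) = \int_{[-\pi, \pi]} e^{i \lambda t} \Bigl( \sum_{j=-\infty}^{\infty} a_j e^{-i \lambda j} \Bigr) dz_x(\lambda) \tag{20} \]
\[ k(z) = \sum_{j=-\infty}^{\infty} a_j z^j \tag{21} \]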
domain of $(x_t)$, $k_j(e^{-i \lambda}) e^{i \lambda t}$, where $k_j$ is the j-th row of the transfer function k, corresponds to $y_t^{(j)}$. Strictly speaking there are two transfer functions. The first is defined under the condition (20), from (21), as a function in the sense of pointwise convergence. The second is a matrix whose rows are elements of the frequency domain of $(x_t)$. In the latter case (19) is not required.
Note that the discrete convolution (18) in the time domain corresponds to multiplication in the frequency domain. In a sloppy notation we have from (20)
\[ dz_y(\lambda) = k(e^{-i \lambda}) \, dz_x(\lambda) \tag{22} \]
(23)
(24)
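\[ f_y(\lambda) = (2\pi)^{-1} k(e^{-i \lambda}) \Sigma k(e^{-i \lambda})^*; \qquad k(z) = \sum_{j=-\infty}^{\infty} b_j z^j \tag{25} \]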
where (3) holds. Note that (3) is more general than (19). The expression (18) shows an input process $(x_t)$ transformed by a (deterministic) linear system (described by its weighting function $(a_j \mid j \in \mathbb{Z})$ or its transfer function k) to an output process $(y_t)$. Such systems are time invariant, i.e. the $a_j$ do not depend on t, and stable, i.e. the input-output operator is bounded.
The effect of the linear transformation (18) can easily be interpreted from (22). For instance, for the case s = m = 1, where k is scalar, the absolute value of $k(e^{-i \lambda})$ shows how the frequency components of $(x_t)$ are amplified (for $|k(e^{-i \lambda})| > 1$) or attenuated (for $|k(e^{-i \lambda})| < 1$) by passing through the linear system, and its phase indicates the phase shift.
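As a small illustration of gain and phase, the sketch below evaluates the transfer function (21) on the unit circle for a two-point moving average. The weights are a hypothetical choice, not an example from the text.

```python
import numpy as np

a = {0: 0.5, 1: 0.5}      # hypothetical weights a_j: y_t = 0.5 * x_t + 0.5 * x_{t-1}

def k(lam):
    """Transfer function (21) evaluated at z = exp(-i*lam)."""
    z = np.exp(-1j * lam)
    return sum(aj * z**j for j, aj in a.items())

gain_low = abs(k(0.0))               # frequency 0 passes unchanged
gain_high = abs(k(np.pi))            # frequency pi is removed completely
phase_mid = np.angle(k(np.pi / 2))   # phase shift at lambda = pi/2
```

The filter attenuates every nonzero frequency ($|k| < 1$), which is the smoothing behaviour one expects from an average.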
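Linear systems with noise are of the form
\[ y_t = \hat{y}_t + u_t \tag{26} \]
\[ \hat{y}_t = \sum_{j=-\infty}^{\infty} l_j x_{t-j}; \qquad l_j \in \mathbb{R}^{s \times m} \tag{27} \]
\[ u_t = \sum_{j=-\infty}^{\infty} k_j \varepsilon_{t-j}; \qquad k_j \in \mathbb{R}^{s \times s} \tag{28} \]
where $(x_t)$ are the observed inputs, $(u_t)$ is the noise on the unobserved outputs $(\hat{y}_t)$; $(\varepsilon_t)$ is white noise and finally $(y_t)$ are the observed outputs. We assume that
\[ E x_t u_s' = 0 \quad \text{for all } s, t \in \mathbb{Z} \tag{29} \]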
(j)
(j)
(30)
(31)
and
where $l(z) = \sum_{j=-\infty}^{\infty} l_j z^j$, $k(z) = \sum_{j=-\infty}^{\infty} k_j z^j$ hold.
Formulas (30), (31) describe the relations between the second moments of observed inputs and outputs on one side and the covariance matrix $\Sigma$ and the two linear systems described by l and k on the other side. If $f_x(\lambda) > 0$, $\lambda \in [-\pi, \pi]$ holds, then l is obtained from the second moments of the observations by the so called Wiener filter formula
\[ l(e^{-i \lambda}) = f_{yx}(\lambda) f_x(\lambda)^{-1}. \]
An important special case occurs if both transformations (30), (31) are causal, i.e. $l_j = 0$, $j < 0$; $k_j = 0$, $j < 0$, and the transfer functions k(z) and l(z) are rational, i.e. there exist polynomial matrices
\[ a(z) = \sum_{j=0}^{p} a_j z^j, \qquad b(z) = \sum_{j=0}^{q} b_j z^j, \qquad d(z) = \sum_{j=0}^{r} d_j z^j \]
(32)
(33)
\[ y_t = C s_t + \varepsilon_t + E x_t. \tag{34} \]
Here z is used for a complex variable as well as for the backward shift $z(x_t \mid t \in \mathbb{Z}) = (x_{t-1} \mid t \in \mathbb{Z})$, $s_t$ is the state at time t and A, B, C, D, E are parameter matrices. For further details we refer to [8].
defined by projecting the components $x_{t+h}^{(j)}$ of $x_{t+h}$ on the Hilbert space $H_x(t)$
for one t and thus for all t holds. White noise is a simple example for a regular process. For a regular process we have $\bigcap_{r \le t} H_x(r) = \{0\}$.
Theorem 6 (Wold).
1. Every stationary process $(x_t)$ can be uniquely decomposed as
\[ x_t = y_t + z_t \tag{35} \]
where
\[ E y_s z_t' = 0 \quad \text{for all } s, t \]
\[ y_t = \sum_{j=0}^{\infty} k_j \varepsilon_{t-j}, \tag{36} \]
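\[ y_{t+h} = \sum_{j=h}^{\infty} k_j \varepsilon_{t+h-j} + \sum_{j=0}^{h-1} k_j \varepsilon_{t+h-j} \tag{37} \]
The components of the first part of the r.h.s. of (37) are elements of $H_y(t) = H_\varepsilon(t)$ and the components of the second part of the r.h.s. are orthogonal to $H_y(t)$. Thus, by the projection theorem,
\[ \hat{y}_{t,h} = \sum_{j=h}^{\infty} k_j \varepsilon_{t+h-j} \tag{38} \]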
and the second part on the r.h.s. of (37) is the prediction error. Expressing the $\varepsilon_l^{(j)}$ as linear combinations or limits of linear combinations of $y_r^{(j)}$, $r \le l$, and inserting this in (38) gives the prediction formula, i.e. $\hat{y}_{t,h}$ as a linear function of $y_r$, $r \le t$. Thus, for a given Wold representation (36) (i.e. for given $k_j$, $j = 0, 1, \dots$) the predictor formula can be determined.
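The split (37) into predictor and prediction error can be made concrete for a scalar process. The following sketch assumes Wold coefficients $k_j = \phi^j$ (an AR(1)-type example chosen for illustration, truncated at 200 terms); none of the numerical values come from the text. In this special case the predictor (38) reduces to $\phi^h y_t$.

```python
import numpy as np

phi, sigma2, h = 0.7, 1.0, 3            # hypothetical scalar example
kj = phi ** np.arange(200)              # Wold coefficients k_j = phi^j, truncated

rng = np.random.default_rng(0)
eps = rng.normal(scale=np.sqrt(sigma2), size=400)   # white noise sample

def known_part(t, h):
    """sum_{j >= h} k_j * eps[t + h - j]: the part of y_{t+h} lying in H_y(t)."""
    j = np.arange(h, len(kj))
    return np.dot(kj[j], eps[t + h - j])

t = 300
y_t = known_part(t, 0)                  # y_t itself, from (36)
pred = known_part(t, h)                 # h-step predictor, from (38)
err_var = sigma2 * np.sum(kj[:h] ** 2)  # variance of the second part of (37)
```

The error variance grows with the horizon h and approaches the process variance, which reflects the fading memory of the process.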
From (36) we see that every linearly regular process can be interpreted as the output of a linear system with white noise inputs. Thus the spectral density $f_y$ of $(y_t)$ exists and is of the form (see (25))
\[ f_y(\lambda) = (2\pi)^{-1} k(e^{-i \lambda}) \Sigma k(e^{-i \lambda})^* \tag{39} \]
where
\[ k(z) = \sum_{j=0}^{\infty} k_j z^j, \qquad \Sigma = E \varepsilon_t \varepsilon_t'. \tag{40} \]
(41)
output process. As is well known, the set of all solutions of a linear difference equation (41) is of the form one particular solution plus the set of all solutions of $a(z) y_t = 0$. We are only interested in stationary solutions; they are obtained by the so called z-transform. In solving (41), the equation, in a certain sense, has to be multiplied by the inverse of a(z) from the left. Using the fact that multiplication of power series in the backward shift and in $z \in \mathbb{C}$ is done in the same way, we obtain:
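Theorem 7. Under the assumption
\[ \det a(z) \neq 0, \quad |z| \le 1 \tag{42} \]
\[ y_t = k(z) \varepsilon_t = \sum_{j=0}^{\infty} k_j \varepsilon_{t-j} \tag{43} \]
\[ k(z) = \sum_{j=0}^{\infty} k_j z^j = a^{-1}(z) b(z) = (\det a(z))^{-1} \operatorname{adj}(a(z)) \, b(z), \quad |z| \le 1 \tag{44} \]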
Here det and adj denote the determinant and the adjoint respectively.
Condition (42) is called the stability condition. It guarantees that the norms $\|k_j\|$ in the causal solution converge geometrically to zero. Thus the ARMA process has an exponentially fading (linear) memory. For actually determining the $k_j$, the following block recursive linear equation system
\[ a_0 k_0 = b_0, \qquad a_0 k_1 + a_1 k_0 = b_1, \ \dots \]
has to be solved.
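The block recursion can be sketched in a few lines. This is an illustration, not the author's implementation: the function name is hypothetical, the coefficient matrices are assumed square (m = s), and the recursion is written as $\sum_{i=0}^{\min(j,p)} a_i k_{j-i} = b_j$ with $b_j = 0$ beyond the degree of b(z).

```python
import numpy as np

def arma_impulse_response(a, b, n):
    """First n coefficients k_0..k_{n-1} of k(z) = a(z)^{-1} b(z).

    a, b: lists of square matrices a_0..a_p and b_0..b_q, with a_0 nonsingular.
    """
    s = a[0].shape[0]
    a0_inv = np.linalg.inv(a[0])
    k = []
    for j in range(n):
        rhs = b[j].copy() if j < len(b) else np.zeros((s, s))
        for i in range(1, min(j, len(a) - 1) + 1):
            rhs = rhs - a[i] @ k[j - i]     # move the known terms to the right
        k.append(a0_inv @ rhs)
    return k

# Hypothetical scalar ARMA(1,1): (1 - 0.5 z) y_t = (1 + 0.4 z) eps_t
a = [np.array([[1.0]]), np.array([[-0.5]])]
b = [np.array([[1.0]]), np.array([[0.4]])]
k = arma_impulse_response(a, b, 5)
```

For this stable example the coefficients halve at every step after the first, in line with the geometric decay guaranteed by (42).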
If, in addition, the miniphase condition
\[ \det b(z) \neq 0, \quad |z| < 1 \tag{45} \]
is imposed, then we have $H_y(t) = H_\varepsilon(t)$ and thus the solution (43) is already the Wold representation (36). Condition (42) sometimes is relaxed to $\det a(z) \neq 0$ for $|z| = 1$. Then there exists a stationary solution
(46)
\[ y_t = C s_t + \varepsilon_t \tag{47} \]
(48)
(49)
Here the coefficients of the transfer function are given as $k_j = C A^{j-1} B$ for $j > 0$.
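The formula $k_j = C A^{j-1} B$ is easy to evaluate directly. The sketch below assumes the state equation has the form $s_{t+1} = A s_t + B \varepsilon_t$ (the displayed equation is not legible in this copy, so this form is an assumption), uses hypothetical matrices, and also evaluates the eigenvalue conditions for stability and for the miniphase property.

```python
import numpy as np

# Hypothetical 2-state system; assumed form s_{t+1} = A s_t + B eps_t, y_t = C s_t + eps_t
A = np.array([[0.5, 0.2], [0.0, 0.3]])
B = np.array([[1.0], [0.5]])
C = np.array([[1.0, 0.0]])

def markov_parameters(A, B, C, n):
    """k_0 = I and k_j = C A^(j-1) B for j = 1..n-1."""
    k = [np.eye(C.shape[0])]
    A_power = np.eye(A.shape[0])
    for _ in range(1, n):
        k.append(C @ A_power @ B)
        A_power = A_power @ A
    return k

def spectral_radius(M):
    """Largest eigenvalue modulus of M."""
    return max(abs(np.linalg.eigvals(M)))

k = markov_parameters(A, B, C, 4)
stable = spectral_radius(A) < 1
miniphase = spectral_radius(A - B @ C) <= 1
```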
The miniphase condition
\[ |\lambda_{\max}(A - BC)| \le 1 \tag{50} \]
then guarantees that (49) corresponds to the Wold representation (36). Note that (43) and (49) define causal linear processes, with a spectral density given by (39). Clearly the transfer functions of both ARMA and state space solutions are rational and so are their spectral densities. The following theorem clarifies the relation between rational spectral densities, ARMA and state space systems:
Theorem 8. 1. Every rational and $\lambda$-a.e. nonsingular spectral density $f_y$ can be uniquely factorized (as in (39)) such that k(z) is rational, analytic within a circle containing the closed unit disk, $\det k(z) \neq 0$ for $|z| < 1$, $k(0) = I$ and $\Sigma > 0$;
2. For every rational transfer function k(z) with the properties given in (1),
there is a stable and miniphase ARMA system with a0 = b0 and conversely, every such ARMA system has a rational transfer function with the
properties given in (1);
3. For every rational transfer function k(z) with the properties given in (1),
there is a stable and miniphase state space system and conversely, every
such state space system has a rational transfer function with the properties
given in (1).
Thus, in particular, (stable and causal) ARMA (with a0 = b0 )- and (stable
and causal) state space systems represent the same class of transfer functions
or spectral densities.
Now we consider the inverse problem of finding an ARMA or state space system from the spectral density, or, equivalently, from the transfer function. From now on we assume throughout that the stability and the miniphase conditions hold. Two ARMA systems $(a, b)$ and $(\bar{a}, \bar{b})$, say, are called observationally equivalent if they have the same transfer function (and thus, for given $\Sigma$, the same second moments of the solution), i.e. if $a^{-1} b = \bar{a}^{-1} \bar{b}$ holds. Observational equivalence for state space systems is defined analogously. Now,
(51)
holds.
A state space system (A, B, C) is called minimal if the state dimension n is minimal among all state space systems corresponding to the same transfer function. This is the case if and only if the observability matrix
\[ O_n = (C', A'C', \dots, (A')^{n-1} C') \]
and the controllability matrix
\[ C_n = (B, AB, \dots, A^{n-1} B) \]
both have rank n. Minimality is also a requirement of nonredundancy. We have:
Theorem 10. Two minimal state space systems $(A, B, C)$ and $(\bar{A}, \bar{B}, \bar{C})$ are observationally equivalent if and only if there exists a nonsingular matrix $T \in \mathbb{R}^{n \times n}$ such that
\[ \bar{A} = T A T^{-1}, \qquad \bar{B} = T B, \qquad \bar{C} = C T^{-1} \]
holds.
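Both the rank test for minimality and the invariance stated in Theorem 10 can be checked numerically. The matrices below are hypothetical, and the observability matrix is stacked row-wise (the transpose of the form given above, which has the same rank); this is a sketch, not a prescribed procedure.

```python
import numpy as np

def observability_matrix(A, C):
    """Rows C, CA, ..., CA^(n-1), stacked."""
    n = A.shape[0]
    return np.vstack([C @ np.linalg.matrix_power(A, i) for i in range(n)])

def controllability_matrix(A, B):
    """Columns B, AB, ..., A^(n-1)B, side by side."""
    n = A.shape[0]
    return np.hstack([np.linalg.matrix_power(A, i) @ B for i in range(n)])

def is_minimal(A, B, C):
    n = A.shape[0]
    return bool(np.linalg.matrix_rank(observability_matrix(A, C)) == n
                and np.linalg.matrix_rank(controllability_matrix(A, B)) == n)

A = np.array([[0.5, 0.2], [0.0, 0.3]])
B = np.array([[1.0], [0.5]])
C = np.array([[1.0, 0.0]])

# Change of state basis with a nonsingular T, as in Theorem 10
T = np.array([[2.0, 1.0], [0.0, 1.0]])
T_inv = np.linalg.inv(T)
A2, B2, C2 = T @ A @ T_inv, T @ B, C @ T_inv

same_transfer = all(
    np.allclose(C @ np.linalg.matrix_power(A, j) @ B,
                C2 @ np.linalg.matrix_power(A2, j) @ B2)
    for j in range(6)
)

# A system with an unreachable second state is not minimal
not_minimal = is_minimal(np.diag([0.5, 0.3]), np.array([[1.0], [0.0]]), C)
```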
A class of ARMA or state space systems is called identifiable if it contains no distinct observationally equivalent systems. Of course identifiability is a desirable property, because it attaches to a given spectral density or a given transfer function a unique ARMA or state space system. In general terms, identifiability is obtained by selecting representatives from the classes of observationally equivalent systems. In addition, from an estimation point of view, subclasses of the class of all ARMA or state space systems, leading to finite dimensional parameter spaces and to a continuous dependence of the parameters on the transfer function (for details see e.g. [5]), are preferred.
As an example consider the set of ARMA systems (a, b) where (42), (45) and $a_0 = b_0 = I$ hold, which are relatively left prime, where the degrees of a(z) and b(z) are both p and where $(a_p, b_p)$ has rank s. We denote the set of all corresponding $\mathrm{vec}(a_1, \dots, a_p, b_1, \dots, b_p)$, where vec means stacking the columns of the respective matrix, by $T_{p,p}$. As can be shown, $T_{p,p}$ contains a nontrivial open subset of $\mathbb{R}^{2ps^2}$ and is identifiable, since under these conditions (51) implies that u must be the identity matrix; thus $T_{p,p}$ is a reasonable parameter space. In this setting a system is described by the integer valued parameter p and by the real valued parameters in $\mathrm{vec}(a_1, \dots, b_p)$. For the description of $f_y$, of course $\Sigma$ is also needed. Let $U_{p,p}$ denote the set of all transfer functions k corresponding to $T_{p,p}$ via (44). Then, due to identifiability, there exists a mapping from $U_{p,p}$ to $T_{p,p}$ attaching to the transfer functions the corresponding ARMA parameters. Such a mapping is called a parameterization. A disadvantage of the specific approach described above is that for s > 1 not every transfer function corresponding to an ARMA system can be described in this way, i.e. there are k for which there is no p such that $k \in U_{p,p}$.
For a general account of parameter spaces for and parameterizations of ARMA and state space systems we refer to [8], [4] and [5].
References
1. G. Box, G. Jenkins, Time Series Analysis. Forecasting and Control, Holden Day, San Francisco, 1970.
2. P. E. Caines, Linear Stochastic Systems, John Wiley & Sons, New York, 1988.
3. H. Davis, The Analysis of Economic Time Series, Principia Press, Bloomington, 1941.
4. M. Deistler, Identification of Linear Dynamic Multiinput/Multioutput Systems, in D. Pena et al. (ed.), A Course in Time Series Analysis, John Wiley & Sons, New York, 2001.
5. M. Deistler, System Identification - General Aspects and Structure, in G. Goodwin (ed.), System Identification and Adaptive Control (Festschrift for B.D.O. Anderson), Springer, London, pp. 3-26, 2001.
6. M. Deistler, System Identification and Time Series Analysis: Past, Present and Future, in B. Pasik-Duncan (ed.), Stochastic Theory and Control (Festschrift for Tyrone Duncan), Springer, Kansas, USA, pp. 97-108, 2002.
7. E. Hannan, Multiple Time Series, Wiley, New York, 1970.
8. E. Hannan, M. Deistler, The Statistical Theory of Linear Systems, John Wiley & Sons, New York, 1988.
9. L. Ljung, System Identification: Theory for the User, Prentice Hall, Englewood Cliffs, 1987.
10. M. Pourahmadi, Foundations of Time Series Analysis and Prediction Theory, Wiley, New York, 2001.
11. G. Reinsel, Elements of Multivariate Time Series Analysis, Springer Verlag, New York, 1993.
Abstract
Knowledge of the noise Power Spectral Density is fundamental in signal
processing for detection algorithms and for the analysis of the data. In this
lecture we address both the problem of identifying the noise Power Spectral
Density of a physical system using parametric techniques and the problem of
whitening the sequence of data in the time domain.
1 Introduction
In the detection of signals buried in noisy data, it is necessary to know the Power Spectral Density (PSD) $S(\nu)$ of the noise of the detector in order to be able to apply the Wiener filter [9]. By the theory of optimal filtering for signals buried in stationary and Gaussian noise [9, 10], if we are looking for a signal of known waveform with unknown parameters, the optimal filter is given by the Wiener matching in the filter domain
\[ C(\theta) = \frac{x(\nu) \, h(\nu, \theta)}{S(\nu)}, \tag{1} \]
where $h(\nu, \theta)$ is the template of the signal we are looking for, $\theta$ are the parameters of the waveform and $x(\nu)$ is the Fourier Transform of our sequence of data x[n].
We can implement the Wiener filter in the frequency domain and, supposing the noise is stationary, we can estimate the PSD, for example, as a windowed averaged periodogram:
\[ P_{PER}(\nu) = \frac{1}{N} \Bigl| \sum_{n=0}^{N-1} x[n] \exp(-2 \pi i \nu n) \Bigr|^2. \tag{2} \]
J.-D. Fournier et al. (Eds.): Harm. Analysis and Ratio. Approx., LNCIS 327, pp. 181-191, 2006.
Springer-Verlag Berlin Heidelberg 2006
Elena Cuoco
\[ P_{\mathrm{medio}} = \frac{1}{K} \sum_{m=0}^{K-1} P_{PER}^{(m)}(\nu) = S(\nu). \tag{3} \]
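The windowed averaged periodogram can be sketched with NumPy's FFT. The segment layout (non-overlapping segments), the Hann window and its power normalisation are assumptions made for this illustration; the text does not specify them.

```python
import numpy as np

def periodogram(x):
    """|sum_n x[n] exp(-2 pi i nu_k n)|^2 / N on the grid nu_k = k / N, as in (2)."""
    return np.abs(np.fft.fft(x)) ** 2 / len(x)

def averaged_periodogram(x, K, window=None):
    """Average of the periodograms of K non-overlapping segments of x, as in (3)."""
    L = len(x) // K
    w = np.ones(L) if window is None else window(L)
    w = w / np.sqrt(np.mean(w ** 2))          # keep the total power unchanged
    segments = x[:K * L].reshape(K, L) * w
    return np.mean(np.abs(np.fft.fft(segments, axis=1)) ** 2 / L, axis=0)

rng = np.random.default_rng(1)
x = rng.normal(scale=2.0, size=64 * 256)      # white noise with variance 4
P = averaged_periodogram(x, K=256, window=np.hanning)
raw = periodogram(x[:64])
```

For white noise the true PSD is flat at the variance; averaging leaves that level unchanged and reduces the scatter of the estimate roughly by the square root of the number of segments.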
Fig. 1. Linear model (input w(z), transfer function H(z), output x(z))
\[ H(z) = \frac{B(z)}{A(z)} \tag{4} \]
\[ x[n] = \sum_{k=1}^{p} a_k x[n-k] + \sum_{k=0}^{q} b_k w[n-k] \tag{5} \]
\[ r_{xx}[k] = \sum_{l=1}^{p} a_l r_{xx}[k-l] + \sum_{l=0}^{q} b_l r_{xw}[k-l], \tag{6} \]
where $r_{xw}[k]$ is the cross correlation between the output x[n] and the driving
noise w[n]. Let h[l] be the taps of the filter H(z), the filter being causal, we
n
for k > 0, since the output depends only on the driving input at step l < n.
Noting that
n
l=
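($\sigma$ is the amplitude of the driving white noise) we can write the Yule-Walker
equations in the following way
\[ r_{xx}[k] = \begin{cases} \sum_{l=1}^{p} a_l r_{xx}[k-l] + \sigma^2 \sum_{l=0}^{q-k} h[l] \, b_{l+k} & \text{for } k = 0, 1, \dots, q \\ \sum_{l=1}^{p} a_l r_{xx}[k-l] & \text{for } k \ge q+1. \end{cases} \tag{7} \]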
In the general case of an ARMA process we must solve a set of nonlinear equations while, if we specialize to an AR process (that is, an all-poles model), the equations to be solved to find the AR parameters become linear.
The relationship between the parameters of the AR model and the autocorrelation function $r_{xx}[n]$ is given by the Yule-Walker equations written in the form
\[ r_{xx}[k] = \begin{cases} \sum_{l=1}^{p} a_l r_{xx}[k-l] & \text{for } k \ge 1 \\ \sum_{l=1}^{p} a_l r_{xx}[-l] + \sigma^2 & \text{for } k = 0. \end{cases} \tag{8} \]
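Because (8) is linear in the AR parameters, it can be solved directly once the autocorrelations are estimated. The sketch below assumes the convention $x[n] = \sum_k a_k x[n-k] + w[n]$, a biased sample autocorrelation and a hypothetical AR(2) test process; the function name is illustrative.

```python
import numpy as np

def yule_walker(x, p):
    """Estimate a_1..a_p and sigma^2 for x[n] = sum_k a_k x[n-k] + w[n]."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    N = len(x)
    # Biased sample autocorrelations r[0..p]
    r = np.array([np.dot(x[:N - k], x[k:]) / N for k in range(p + 1)])
    R = np.array([[r[abs(i - j)] for j in range(p)] for i in range(p)])
    a = np.linalg.solve(R, r[1:])            # rows k = 1..p of (8)
    sigma2 = r[0] - np.dot(a, r[1:])         # row k = 0 of (8)
    return a, sigma2

rng = np.random.default_rng(2)
true_a = np.array([0.5, -0.3])               # hypothetical stable AR(2)
w = rng.normal(size=20000)
x = np.zeros_like(w)
for n in range(2, len(x)):
    x[n] = true_a[0] * x[n - 1] + true_a[1] * x[n - 2] + w[n]

a_hat, sigma2_hat = yule_walker(x, 2)
```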
\[ x[n] = \sum_{k=1}^{p} a_k x[n-k] + w[n], \tag{9} \]
[Figure: tapped delay line with delays $z^{-1}$, taps x[n], x[n-1], x[n-2], ..., x[n-p] and weights $a_1, a_2, \dots, a_p$]
remove all the correlation, making it a white process (see refs [1, 2, 3] for applications of whitening in real cases). The idea is to model x[n] as an AR process, find the AR parameters and use them to whiten the process.
Since we want to deal with real physical problems we have to assume the causality of the filter. So when we whiten the data we must ensure that causality is preserved. Moreover, we must have a stable filter to avoid divergences in the application of this filter to the data. In the next section we will show how estimating the AR parameters ensures the causality and stability of the whitening filter in the time domain.
3.1 Minimum phase filter and stability
The necessary condition to have a stable and causal filter H(z) is that all the poles of the filter are inside the unit circle of the z-plane [4, 8]. This filter is called a minimum-phase filter. A completely anticausal filter will have all its poles outside the unit circle. This filter is called a maximum-phase filter.
If a system has poles or zeros outside the unit circle, it can be made minimum phase by moving the poles and zeros $z_0$ inside the unit circle. For example, if we want to put a zero inside the unit circle we have to multiply the function by the term
\[ \frac{z^{-1} - z_0^*}{1 - z_0 z^{-1}}. \tag{10} \]
This will alter the phase, but not the magnitude, of the transfer function.
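That the factor (10) changes only the phase can be verified on a grid of frequencies. The zero $z_0$ and the first-order factor H below are hypothetical choices for this sketch.

```python
import numpy as np

z0 = 1.5 + 0.5j                          # hypothetical zero outside the unit circle
omega = np.linspace(-np.pi, np.pi, 512)
z_inv = np.exp(-1j * omega)              # z^{-1} evaluated on the unit circle

H = 1 - z0 * z_inv                       # factor with a zero at z = z0
allpass = (z_inv - np.conj(z0)) / (1 - z0 * z_inv)   # the term (10)
H_min = H * allpass                      # equals z^{-1} - conj(z0)

new_zero = 1 / np.conj(z0)               # zero of H_min, reflected into the unit circle
```

The reflected zero sits at $1 / z_0^*$, at the same angle as $z_0$ but with reciprocal modulus.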
If we want to find the filter H(z) of a linear system for a random process with given PSD S(z) (the complex PSD), which satisfies the minimum phase condition, we must perform a spectral factorization (see [8]) in causal and anti-causal components:
\[ S(z) = H(z) H(z^{-1}). \tag{11} \]
We can perform this operation in an alternative way. We can find a rational function fit to the PSD with polynomials that are minimum phase. A minimum-phase polynomial is one that has all of its zeros and poles strictly inside the unit circle. If we consider a rational function H(z) = B(z)/A(z), both B(z) and A(z) must be minimum phase polynomials.
If we restrict ourselves to an AR fit, we are looking for a polynomial A(z) which is a minimum phase one. The AR estimation algorithm we choose will ensure that this condition is always satisfied.
4 AR parameters estimation
There are different algorithms to estimate the AR parameters of a process which we assume can be modeled as an autoregressive one, for example the Levinson, Durbin or Burg ones [4, 8]. We are looking for the parameters of a transfer function which models a real physical system.
We can show that the problem of determining the AR parameters is the same as that of finding the optimal weights vector $w = w_k$, for $k = 1, \dots, P$, for the problem of Linear Prediction [4]. In Linear Prediction we predict the sample x[n] using the P previously observed data $\mathbf{x}[n] = \{x[n-1], x[n-2], \dots, x[n-P]\}$, building the estimate $\hat{x}[n]$ as a transversal filter:
\[ \hat{x}[n] = \sum_{k=1}^{P} w_k x[n-k]. \tag{12} \]
(13)
the error we made in this prediction, obtaining the so called Normal or Wiener-Hopf equations
P
wk rxx [k] ,
(14)
k=1
min = .
(15)
(16)
\[ k_p = \frac{1}{\epsilon_{p-1}} \Bigl[ r_{xx}[p] - \sum_{j=1}^{p-1} a_j^{(p-1)} r_{xx}[p-j] \Bigr]. \tag{17} \]
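At the p-th stage the parameter of the model is equal to the p-th reflection coefficient
\[ a_p^{(p)} = k_p. \tag{18} \]
\[ a_j^{(p)} = a_j^{(p-1)} - k_p a_{p-j}^{(p-1)} \tag{19} \]
\[ \epsilon_p = (1 - k_p^2) \, \epsilon_{p-1} \tag{20} \]
\[ a_j = a_j^{(P)}, \qquad \sigma^2 = \epsilon_P. \tag{21} \]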
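A compact sketch of an order recursion of Levinson-Durbin type follows. It assumes the sign convention $x[n] = \sum_k a_k x[n-k] + w[n]$, so that the reflection coefficient is computed as in (17) and the update carries a minus sign as in (19); other texts use the opposite sign for the $a_k$. The function name and the test values are illustrative.

```python
import numpy as np

def levinson_durbin(r, P):
    """Order recursion for x[n] = sum_k a_k x[n-k] + w[n].

    r: autocorrelations r[0..P]. Returns (a_1..a_P, k_1..k_P, final error power).
    """
    r = np.asarray(r, dtype=float)
    a = np.zeros(0)
    eps = r[0]                                         # error power of order 0
    ks = []
    for p in range(1, P + 1):
        k = (r[p] - np.dot(a, r[p - 1:0:-1])) / eps    # reflection coefficient, as in (17)
        a = np.concatenate([a - k * a[::-1], [k]])     # updates (18) and (19)
        eps = eps * (1.0 - k * k)                      # update (20)
        ks.append(k)
    return a, np.array(ks), eps

# Exact autocorrelations of an AR(1) process with a_1 = 0.5 and sigma^2 = 1
r = (4.0 / 3.0) * 0.5 ** np.arange(4)
a, ks, eps = levinson_durbin(r, 3)
```

Fitting order 3 to an AR(1) autocorrelation returns vanishing higher-order reflection coefficients, which is how the recursion signals that the lower order already suffices.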
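\[ e_P^f[n] = x[n] + \sum_{k=1}^{P} a_k^{(P)} x[n-k], \tag{22} \]
where the coefficients $a_k$ are the coefficients of the AR model for the process x[n]. The FPE represents the output of our filter. We can write the z-transform of the FPE at each stage p for the filter of order P as
\[ F_p^f[z] \, X[z] = \Bigl( 1 + \sum_{j=1}^{p} a_j^{(p)} z^{-j} \Bigr) X[z]. \tag{23} \]
\[ a_j^{(p)} = a_j^{(p-1)} + k_p a_{p-j}^{(p-1)}, \qquad 1 \le j \le p-1. \tag{24} \]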
If we use the above relation for the transform $F_p^f[z]$, we obtain
\[ F_p^f[z] = F_{p-1}^f[z] + k_p \Bigl[ z^{-p} + \sum_{j=1}^{p-1} a_{p-j}^{(p-1)} z^{-j} \Bigr]. \tag{25} \]
\[ F_{p-1}^b[z] = z^{-(p-1)} + \sum_{j=1}^{p-1} a_{p-j}^{(p-1)} z^{-(j-1)}. \tag{26} \]
In order to understand the meaning of $F_p^b[z]$ let us see its action in the time domain
\[ F_{p-1}^b[z] \, x[n] = e_{p-1}^b[n] = x[n-p+1] + \sum_{j=1}^{p-1} a_{p-j}^{(p-1)} x[n-j+1]. \tag{27} \]
So $e_{p-1}^b[n]$ is the error we make, in a backward way, in the prediction of the datum $x[n-p+1]$ using the p - 1 successive data $\{x[n], x[n-1], \dots, x[n-p+2]\}$. We can write eq. (25) using $F_{p-1}^b[z]$. Substituting this relation into the z-transform of the filter $F_p^f[z]$ gives
\[ F_p^f[z] = F_{p-1}^f[z] + k_p z^{-1} F_{p-1}^b[z]. \tag{28} \]
In order to know the FPE filter at stage p we must know the BPE filter at stage p - 1.
For the backward error we may write, in a similar way, the relation
\[ F_p^b[z] = z^{-1} F_{p-1}^b[z] + k_p F_{p-1}^f[z]. \tag{29} \]
The equations (28), (29) represent our lattice filter, which in the time domain can be written
\[ e_p^f[n] = e_{p-1}^f[n] + k_p e_{p-1}^b[n-1], \tag{30} \]
\[ e_p^b[n] = e_{p-1}^b[n-1] + k_p e_{p-1}^f[n]. \tag{31} \]
Figure 3 shows how the lattice structure is used to estimate the forward
and backward errors.
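The lattice recursion (30), (31) can be sketched as follows, using the sign convention of this section, in which the forward error is $x[n] + \sum_j a_j x[n-j]$ and the coefficients follow the step-up rule (24). Zero initial conditions and the reflection coefficients are assumptions for this illustration. The sketch also checks that the lattice output coincides with the direct-form filter.

```python
import numpy as np

def lattice_forward_error(x, ks):
    """Forward prediction error of order len(ks), computed stage by stage."""
    ef = np.asarray(x, dtype=float).copy()     # e^f_0[n] = x[n]
    eb = ef.copy()                             # e^b_0[n] = x[n]
    for k in ks:
        eb_delayed = np.concatenate([[0.0], eb[:-1]])    # e^b_{p-1}[n-1], zero start
        ef, eb = ef + k * eb_delayed, eb_delayed + k * ef    # (30) and (31)
    return ef

def step_up(ks):
    """Direct-form coefficients a_1..a_P from reflection coefficients via (24)."""
    a = np.zeros(0)
    for k in ks:
        a = np.concatenate([a + k * a[::-1], [k]])
    return a

rng = np.random.default_rng(3)
x = rng.normal(size=200)
ks = [-0.6, 0.3, 0.1]                          # hypothetical reflection coefficients

lattice_out = lattice_forward_error(x, ks)
a = step_up(ks)
direct_out = np.convolve(x, np.concatenate([[1.0], a]))[:len(x)]
```

One practical appeal of the lattice form is that increasing the order only appends a stage; the earlier stages and their coefficients stay unchanged.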
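[Fig. 3: lattice structure. A single stage takes $e_p^f(n)$ and $e_p^b(n)$, applies the delay $z^{-1}$ and the reflection coefficient $k_{p+1}(n)$, and outputs $e_{p+1}^f(n)$ and $e_{p+1}^b(n)$; stages 1, 2, ..., P are cascaded from the input x(n) to the outputs $e_P^f(n)$ and $e_P^b(n)$.]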
6 An example of whitening
We will show an example of the whitening procedure in time domain. First of
all, we generate a stochastic process in time domain, using an AR(4) model
with the values for the parameters reported in table 1.
Table 1. Parameters of the simulated AR(4) model

Parameter   Value
σ           0.01
a1          0.326526
a2          0.338243
a3          0.143203
a4          0.101489
Fig. 4. Poles for the simulated AR model and poles for the estimated AR fit
[Figure 5: power spectral density S(f) versus frequency f. Legend: simulated noise; AR fit.]
In figure 6 we show the PSD of the simulated noise process and the PSD of the output of the whitening filter applied in the time domain, using the estimated reflection coefficients.
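The principle of time-domain whitening can be illustrated with a small self-contained sketch. It is not the experiment of this section: the AR(2) coefficients `a_demo` below are hypothetical values chosen only to give a stable model, and the filter uses the true coefficients rather than estimated reflection coefficients. Under these assumptions the prediction-error filter $e[n] = x[n] + \sum_k a_k x[n-k]$ recovers the driving white noise exactly.

```python
import random

def simulate_ar(a, w):
    """Generate x[n] = -sum_k a_k x[n-k] + w[n] with zero initial conditions."""
    x = []
    for n, wn in enumerate(w):
        acc = wn
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc -= ak * x[n - k]
        x.append(acc)
    return x

def whiten(a, x):
    """Apply the prediction-error (whitening) filter e[n] = x[n] + sum_k a_k x[n-k]."""
    e = []
    for n, xn in enumerate(x):
        acc = xn
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                acc += ak * x[n - k]
        e.append(acc)
    return e

random.seed(0)
a_demo = [-0.5, 0.25]                       # hypothetical, stable AR(2) model
w_demo = [random.gauss(0.0, 0.01) for _ in range(500)]
x_demo = simulate_ar(a_demo, w_demo)        # coloured process
e_demo = whiten(a_demo, x_demo)             # whitened output
residual = max(abs(e - w) for e, w in zip(e_demo, w_demo))
```

In practice the coefficients are estimated from the data, so the whitened output is only approximately white and its flatness is checked on the PSD, as in figure 6.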
[Figure 6: power spectral density S(f) versus frequency f. Legend: simulated process; whitened process.]
Fig. 6. Simulated power spectral density and PSD of the whitened data
References
1. M. Beccaria, E. Cuoco, G. Curci, Adaptive System Identification of VIRGO-like noise spectrum, Proc. of 2nd Amaldi Conference, World Scientific, 1997.
2. E. Cuoco et al, Class. Quant. Grav., 18, 1727-1752, 2001.
3. E. Cuoco et al, Phys. Rev., D 64, 122002, 2001.
4. S. Kay, Modern spectral estimation: Theory and Application, Prentice Hall, Englewood Cliffs, 1998.
5. S. Haykin, Adaptive Filter Theory, (Upper Saddle River: Prentice Hall), 1996.
6. S.T. Alexander, Adaptive Signal Processing, (Berlin: Springer-Verlag), 1986.
7. M.H. Hayes, Statistical Digital Signal Processing and Modeling, Wiley, 1996.
1 Introduction
The Laplace transform is extensively used in control theory. It appears in
the description of linear time-invariant systems, where it changes convolution
operators into multiplication operators and allows one to define the transfer
function of a system. The properties of systems can then be translated into
properties of the transfer function. In particular, causality implies that the
transfer function must be analytic in a right half-plane. This will be explained
in section 2 and a good reference for these preliminary properties and for a
panel of concrete examples is [11].
Via Laplace transform, functional analysis provides a framework to formulate, discuss and solve problems in control theory. This will be sketched
in section 3, in which the important notion of stability is introduced. We
shall see that several kinds of stability, with different physical meanings, can
be considered in connection with some function spaces, the Hardy spaces of
the half-plane. These function spaces provide with their norms a measure
of the distance between transfer functions. This allows one to translate into
well-posed mathematical problems some important topics in control theory,
as for example the notion of robustness. A design is robust if it works not only
for the postulated model, but also for neighboring models. We may interpret
closeness of models as closeness of their transfer functions.
In section 4, we review the main properties of finite order linear time-invariant (LTI) causal systems. They are described by state-space equations and their transfer function is rational. We give the definition of the McMillan degree or order of a system, which is a good measure of its complexity, and some useful factorizations of a rational transfer function, closely connected with its pole and zero structure. Then, we consider the past inputs to future outputs map, which provides a nice interpretation of the notions of controllability and observability, and we define the Hankel singular values. As claimed by Glover in [6], the Hankel singular values are extremely informative invariants when considering system complexity and gain. For this section we refer the reader to [8] and [6].
J.-D. Fournier et al. (Eds.): Harm. Analysis and Ratio. Approx., LNCIS 327, pp. 193-209, 2006.
© Springer-Verlag Berlin Heidelberg 2006
Martine Olivi
Section 5 is concerned with system identification. In many areas of engineering, high-order linear state-space models of dynamic systems can be derived (this can already be a difficult problem). In this way, identification issues are translated into model reduction problems that can be tackled by means of rational approximation. The function spaces introduced in section 3 provide with their norms a measure of the accuracy of a model. The most popular norms are the Hankel norm and the $L^2$-norm. In these two cases, the role of the Hardy space $H^2$, with its Hilbert space structure, is decisive in finding a solution to the model reduction problem. In the case of the Hankel norm, explicit solutions can be found [6] while in the $L^2$ case, local minima can be numerically computed using gradient flow methods. Note that the approximation in $L^2$ norm has an interpretation in stochastic identification: it minimizes the variance of the output error when the model is fed by a white noise. These approximation problems are also relevant in the design of controllers which maximize robustness with respect to uncertainty or minimize sensitivity to disturbances of sensors, and other problems from $H^\infty$ control theory. For an introduction to these fields we refer the reader to [4].
In this paper, we are concerned with continuous-time systems, for which the Laplace transform is a valuable aid. The z-transform performs the same task for discrete-time systems. This is the object of [3] in the framework of stochastic systems. It must be noted that continuous-time and discrete-time systems are related through a Möbius transform which preserves the McMillan degree [6]. For some purposes, it may be easier to deal with discrete time. In particular, the poles of stable discrete-time systems lie in a bounded domain, the unit disk. The Laplace transform is also considered among other transforms in [12]. This paper also provides an introduction to [2].
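$$y(t) = \int_{-\infty}^{\infty} h(t-\tau)\,u(\tau)\,d\tau = \int_{-\infty}^{\infty} h(\tau)\,u(t-\tau)\,d\tau,$$
$$y(t) = \int_{-\infty}^{\infty} h(\tau)\,e^{s(t-\tau)}\,d\tau = e^{st}\int_{-\infty}^{\infty} h(\tau)\,e^{-s\tau}\,d\tau.$$
Assuming that the integral converges, the response to $e^{st}$ is of the form
$$y(t) = H(s)\,e^{st},$$
where $H(s)$ is the Laplace transform of the impulse response $h(t)$ defined by
$$H(s) = \int_{-\infty}^{\infty} h(\tau)\,e^{-s\tau}\,d\tau.$$
In the specific case in which $\mathrm{Re}\{s\} = 0$, the input is a complex exponential $e^{i\omega t}$ at frequency $\omega$ and $H(i\omega)$, viewed as a function of $\omega$, is known as the frequency response of the system and is given by the Fourier transform
$$H(i\omega) = \int_{-\infty}^{\infty} h(\tau)\,e^{-i\omega\tau}\,d\tau.$$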
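$$\mathcal{L}f(s) = \int_{-\infty}^{\infty} e^{-st} f(t)\,dt$$
$$\int_{-\infty}^{\infty} |f(\tau)|\,e^{-x\tau}\,d\tau < \infty.$$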
The range of values of $s$ for which the integral converges is called the region of convergence. It consists of strips parallel to the imaginary axis. In particular, if $f \in L^1(\mathbb{R})$, i.e.
$$\int_{-\infty}^{\infty} |f(t)|\,dt < \infty,$$
then $\mathcal{L}f$ is defined on the imaginary axis and the Laplace transform can be viewed as a generalization of the Fourier transform.
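Another obvious and important property of the Laplace transform is the following. Assume that $f(t)$ is right-sided, i.e. $f(t) = 0$ for $t < T$, and that the Laplace transform of $f$ converges for $\mathrm{Re}\{s\} = \sigma_0$. Then, for all $s$ such that $\mathrm{Re}\{s\} = \sigma > \sigma_0$, we have that
$$\int_{-\infty}^{\infty} |f(\tau)|\,e^{-\sigma\tau}\,d\tau = \int_{T}^{\infty} |f(\tau)|\,e^{-\sigma\tau}\,d\tau \le e^{-(\sigma-\sigma_0)T}\int_{T}^{\infty} |f(\tau)|\,e^{-\sigma_0\tau}\,d\tau,$$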
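$$y(t) = \int_{-\infty}^{\infty} h(t-\tau)\,u(\tau)\,d\tau$$
as a multiplication operator
$$Y(s) = H(s)\,U(s),$$
where
$$Y(s) = \int_{-\infty}^{\infty} y(\tau)\,e^{-s\tau}\,d\tau, \quad H(s) = \int_{-\infty}^{\infty} h(\tau)\,e^{-s\tau}\,d\tau, \quad U(s) = \int_{-\infty}^{\infty} u(\tau)\,e^{-s\tau}\,d\tau,$$
are the Laplace transforms. The $p \times m$ matrix function $H(s)$ is called the transfer function of the system.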
Causality is a common property for a physical system. A system is causal if the output at any time depends only on the present and past values of the input. An LTI system is causal if its impulse response satisfies
$$h(t) = 0 \quad \text{for } t < 0,$$
$$y(t) = \int_{0}^{\infty} h(\tau)\,u(t-\tau)\,d\tau = \int_{-\infty}^{t} h(t-\tau)\,u(\tau)\,d\tau.$$
Then, the transfer function of the system is defined by the unilateral Laplace transform
$$H(s) = \int_{0}^{\infty} h(\tau)\,e^{-s\tau}\,d\tau, \quad (1)$$
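$$\|f\|_q^q = \int |f(t)|^q\,dt < \infty, \quad \text{if } 1 \le q < \infty,$$
$$\|f\|_\infty = \operatorname{ess\,sup}_t |f(t)| < \infty, \quad \text{if } q = \infty.$$
The most natural measure is the $L^\infty$ norm. A signal will be called bounded if there is some $M > 0$ such that $\|u\|_\infty < M$. The $L^2$ norm of a signal is given by
$$\|u\|_2^2 = \int u(t)^* u(t)\,dt.$$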
Notions of stability are associated with the requirement that the convolution operator
$$u(t) \mapsto y(t) = h * u(t),$$
is a bounded linear operator, the input and output spaces being endowed with some (maybe different) norms. This implies that the transfer functions of such stable systems belong to some spaces of analytic functions, the Hardy spaces of the right half-plane [7]. We first introduce these spaces.
3.1 Hardy spaces of the half-plane
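The Hardy space $H^p$ is defined to be the space of functions $f(s)$ analytic in the right half-plane which satisfy
$$\|f\|_p := \sup_{0<x<\infty}\left(\int_{-\infty}^{\infty} |f(x+iy)|^p\,dy\right)^{1/p} < \infty,$$
$$\|f\|_\infty := \sup_{\mathrm{Re}\{s\}>0} |f(s)| < \infty.$$
$$\|\mathcal{L}f\|_2 = \sqrt{2\pi}\,\|f\|_2.$$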
$H^\infty_{p\times m}$ and $H^2_{p\times m}$ are the spaces of $p \times m$ matrix functions with entries in $H^\infty$ and $H^2$ respectively, endowed with the norms
$$\|F\|_\infty = \sup_{-\infty<w<\infty} \|F(iw)\|, \quad (2)$$
$$\|F\|_2^2 = \mathrm{Tr}\int_{-\infty}^{\infty} F(iw)^* F(iw)\,dw, \quad (3)$$
where $\|\cdot\|$ denotes the Euclidean norm for a vector and, for a matrix, the operator norm or spectral norm (that is, the largest singular value). We shall often write $H^\infty$, $H^2$ etc. for $H^\infty_{p\times m}$ and $H^2_{p\times m}$, the size of the matrix or vector functions (case $m = 1$) being understood from the context.
3.2 Some notions of stability
We shall study the notions of stability which arise from the following choices of norm on the input and output function spaces:
stability $L^\infty \to L^\infty$ (BIBO). A system is BIBO stable if and only if its impulse response is integrable over $(0, \infty)$. Indeed, if $h(t)$ is integrable and $\|u\|_\infty < M$, then
$$\|y(t)\| \le M\int_0^t \|h(t-\tau)\|\,d\tau = M\int_0^t \|h(\tau)\|\,d\tau \le M\int_0^\infty \|h(\tau)\|\,d\tau,$$
$$\dot{x}(t) = A\,x(t) + B\,u(t), \qquad y(t) = C\,x(t) + D\,u(t), \quad (4)$$
$$\int_0^\infty \dot{x}(t)\,e^{-st}\,dt = sX(s).$$
so that the system (4) takes the form
$$sX(s) = A\,X(s) + B\,U(s)$$
$$Y(s) = C\,X(s) + D\,U(s),$$
and yields
$$Y(s) = [D + C(sI - A)^{-1}B]\,U(s), \quad (5)$$
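Formula (5) can be evaluated numerically at any point $s$ that is not an eigenvalue of $A$. The following is a minimal sketch using NumPy; the function name `transfer_function` and the second-order example realization are assumptions made for illustration, not taken from the text.

```python
import numpy as np

def transfer_function(A, B, C, D, s):
    """Evaluate H(s) = D + C (sI - A)^{-1} B at a complex point s."""
    A = np.asarray(A, dtype=complex)
    B = np.asarray(B, dtype=complex)
    C = np.asarray(C, dtype=complex)
    D = np.asarray(D, dtype=complex)
    n = A.shape[0]
    # Solve (sI - A) X = B instead of forming the inverse explicitly.
    return D + C @ np.linalg.solve(s * np.eye(n) - A, B)

# Hypothetical example: companion realization of 1 / (s^2 + 3 s + 2).
A_demo = [[0.0, 1.0], [-2.0, -3.0]]
B_demo = [[0.0], [1.0]]
C_demo = [[1.0, 0.0]]
D_demo = [[0.0]]
H_at_1 = transfer_function(A_demo, B_demo, C_demo, D_demo, 1.0)[0, 0]
H_at_2j = transfer_function(A_demo, B_demo, C_demo, D_demo, 2.0j)[0, 0]
```

Evaluating along $s = i\omega$ gives the frequency response of the state-space model.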
$$\mathrm{diag}\left(\frac{\epsilon_1}{\psi_1}, \frac{\epsilon_2}{\psi_2}, \ldots, \frac{\epsilon_r}{\psi_r}, 0, \ldots, 0\right)$$
$$s \in \mathbb{C},$$
for some polynomial matrices $E_1(s)$ and $E_2(s)$. In this factorization the matrix $D(s)$ carries the pole structure of $G(s)$ and the matrix $N(s)$ its zero structure. This representation is very useful in control theory. In our function spaces context another factorization is more natural. It is the inner-unstable or Douglas-Shapiro-Shields factorization
$$G(s) = Q(s)\,P(s),$$
where $Q(s)$ is an inner function in $H^\infty$, i.e. such that
$$Q(iw)^* Q(iw) = I, \qquad w \in \mathbb{R},$$
and $P(s)$ is unstable (analytic in the left half-plane). We shall also require this factorization to be minimal. It is then unique up to a common left constant unitary matrix and the McMillan degree of $Q$ is the McMillan degree of $G$. The existence of such a factorization follows from Beurling's theorem on shift invariant subspaces of $H^2$ [5]. Here again, the inner factor carries the pole structure of the transfer function and the unstable factor the zero structure. In many approximation problems this factorization allows one to reduce the number of optimization parameters, since the unstable factor can often be computed from the inner one. This, together with the fact that inner functions are the transfer functions of conservative systems, explains the interest of inner functions.
4.1 Controllability, observability and associated gramians
The notions of controllability and observability are central to the state-space description of dynamical systems. Controllability is a measure of the ability to use a system's external inputs to manipulate its internal state. Observability is a measure of how well the internal states of a system can be inferred from knowledge of its external outputs.
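The following facts are well-known [8]. A system described by a state-space realization (A, B, C, D) is controllable if the pair (A, B) is controllable, i.e. the matrix
$$\begin{bmatrix} B & AB & A^2B & \cdots & A^{n-1}B \end{bmatrix}$$
has rank n, and the pair (C, A) observable, i.e. the matrix
$$\begin{bmatrix} C \\ CA \\ CA^2 \\ \vdots \\ CA^{n-1} \end{bmatrix}$$
$$P = \int_0^\infty e^{At}BB^*e^{A^*t}\,dt,$$
$$Q = \int_0^\infty e^{A^*t}C^*Ce^{At}\,dt.$$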
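$$y(t) = (\Gamma_g u)(t) = \int_{-\infty}^{0} Ce^{A(t-\tau)}Bu(\tau)\,d\tau = \int_{0}^{\infty} Ce^{A(t+\tau)}Bv(\tau)\,d\tau, \quad (6)$$
where $v(t) = u(-t)$ is in $L^2(0, \infty)$. The mapping $\Gamma_g$ can be viewed as a composition of two mappings:
$$u(t) \mapsto x(0) = \int_{-\infty}^{0} e^{-A\tau}Bu(\tau)\,d\tau,$$
and
$$x(0) \mapsto y(t) = Ce^{At}x(0),$$
where $x(0)$ is the state at time $t = 0$. Now, consider the following minimum energy problem
$$\min_{u \in L^2(-\infty,0)} \|u\|_2^2 \quad \text{subject to } x(0) = x_0.$$
$$u_{\mathrm{opt}}(t) = B^* e^{-A^* t} P^{-1} x_0.$$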
It satisfies
$$\|u_{\mathrm{opt}}\|_2^2 = x_0^* P^{-1} x_0.$$
If $P^{-1}$ is large, there will be some state that can only be reached if a large input energy is used. If the system is released from $x(0) = x_0$ with $u(t) = 0$, $t \ge 0$, then
$$\|y\|_2^2 = x_0^* Q\,x_0,$$
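$$\inf_{\operatorname{rank}\hat{M} \le k} \|M - \hat{M}\| = \sigma_{k+1},$$
$$D_k = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_k).$$
$$\sigma_k(T) = \inf\{\|T - R\| : \operatorname{rank} R \le k\}.$$
Thus $\sigma_0(T) = \|T\|$ and
$$\sigma_0(T) \ge \sigma_1(T) \ge \sigma_2(T) \ge \cdots \ge 0.$$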
When $T$ is compact, it can be proved that $\sigma_k(T)^2$ is an eigenvalue of $T^*T$ [15, Th.16.4]. Any corresponding eigenvector of $T^*T$ is called a Schmidt vector
$$T^* y = \sigma_k(T)\,x.$$
The past inputs to future outputs mapping $\Gamma_g$ associated with an LTI system by (6) is a compact operator from $L^2(-\infty, 0)$ to $L^2(0, \infty)$. The Hankel singular values of an LTI system are defined to be the singular values of $\Gamma_g$. Via the Laplace transform, we may associate with $\Gamma_g$ the Hankel operator
$$\Gamma_G : H^2_- \to H^2,$$
Since $\Gamma_g$ and $\Gamma_G$ are unitarily equivalent via the Laplace transform, they share the same set of singular values
$$\sigma_0(G) \ge \sigma_1(G) \ge \sigma_2(G) \ge \cdots \ge 0.$$
The Hankel norm is defined to be the operator norm of $\Gamma_G$, which turns out to be its largest singular value $\sigma_0(G)$:
$$\|G\|_H = \|\Gamma_G\| = \sigma_0(G).$$
Note that
$$\|G\|_H = \sup_{u \in L^2(-\infty,0)} \frac{\|y\|_{L^2(0,\infty)}}{\|u\|_{L^2(-\infty,0)}},$$
so that the Hankel norm gives the $L^2$ gain from past inputs to future outputs.
If the LTI system has finite order, then its Hankel singular values are the square roots of the eigenvalues of the matrix $PQ$, where $P$ is the controllability gramian and $Q$ the observability gramian. Indeed, let $\sigma$ be a singular value of $\Gamma_g$ with $u$ the corresponding eigenvector of $\Gamma_g^*\Gamma_g$: $(\Gamma_g^*\Gamma_g u)(t) = \sigma^2 u(t)$. Then, since the adjoint operator $\Gamma_g^*$ is given by
$$(\Gamma_g^* y)(t) = \int_0^\infty B^* e^{A^*(-t+\tau)} C^* y(\tau)\,d\tau,$$
we have that
$$u(t) = \sigma^{-2} B^* e^{-A^* t} Q\,x_0. \quad (7)$$
Now,
$$\sigma^2 x_0 = \int_{-\infty}^{0} e^{-A\tau} B\,\sigma^2 u(\tau)\,d\tau = PQ\,x_0, \quad (8)$$
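Relation (8) gives a practical recipe: compute the two gramians and take the square roots of the eigenvalues of $PQ$. The sketch below assumes real matrices and a stable $A$, and uses the standard fact that the gramians solve the Lyapunov equations $AP + PA^T + BB^T = 0$ and $A^TQ + QA + C^TC = 0$. The helper names and the small example system are hypothetical; the Lyapunov equations are solved through a Kronecker-product linear system, which is adequate only for small state dimensions.

```python
import numpy as np

def solve_lyapunov(A, W):
    """Solve A X + X A^T + W = 0 for X (real A, assumed stable)."""
    n = A.shape[0]
    I = np.eye(n)
    # Row-major vectorisation: vec(A X) = (A kron I) vec(X), vec(X A^T) = (I kron A) vec(X).
    K = np.kron(A, I) + np.kron(I, A)
    return np.linalg.solve(K, -W.reshape(-1)).reshape(n, n)

def hankel_singular_values(A, B, C):
    """Square roots of the eigenvalues of P Q, sorted in decreasing order."""
    A, B, C = (np.asarray(M, dtype=float) for M in (A, B, C))
    P = solve_lyapunov(A, B @ B.T)        # controllability gramian
    Q = solve_lyapunov(A.T, C.T @ C)      # observability gramian
    eig = np.linalg.eigvals(P @ Q).real
    return np.sort(np.sqrt(np.clip(eig, 0.0, None)))[::-1], P, Q

# Hypothetical example: G(s) = 1/(s+1) + 1/(s+2).
A_demo = np.diag([-1.0, -2.0])
B_demo = np.array([[1.0], [1.0]])
C_demo = np.array([[1.0, 1.0]])
hsv_demo, P_demo, Q_demo = hankel_singular_values(A_demo, B_demo, C_demo)
```

In this symmetric example $P = Q$, so the Hankel singular values coincide with the eigenvalues of $P$ and the largest one is the Hankel norm of $G$.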
The choice of the norm $\|\cdot\|$ is influenced by what norms can be minimized with reasonable computational effort and whether the chosen norm is an appropriate measure of error. The most natural norm from a physical viewpoint is the norm $\|\cdot\|_\infty$. But this is an unresolved problem: there is no known numerical method which is guaranteed to converge. In Banach spaces other than Hilbert spaces, best approximation problems are usually difficult. There are two cases in which the situation is easier since they involve the Hardy space $H^2$, which is a Hilbert space: the $L^2$-norm and the Hankel norm, since the Hankel operator acts on $H^2$. In this last case an explicit solution can be computed.
= inf G F
F H
GH
GG
H,
k.
McMillan degree of G
solution G(s)
is presented which makes use of a balanced realization of G(s)
[6, Th.6.3]. Moreover, in [6] all the optimal Hankel norm approximations are
characterized in state-space form.
Since,
GG
F
= inf G G
F H
$$\|G - \hat{G}\|_\infty \le \sigma_k(G) + \sum_{j>k} \sigma_j(G).$$
It is often the case in practical applications that $G$ has a few sizable singular values and the remaining ones tail away very quickly to zero. In that case the right-hand side can be made very small, and one is assured that an optimal Hankel norm approximant is also good with respect to the $L^\infty$ norm.
5.2 $L^2$-norm approximation
In the case of the $L^2$ norm, an explicit solution of the model reduction problem cannot be computed. However, the $L^2$ norm being differentiable, we may think of using a gradient flow method. The main difficulty in this problem is to describe the set of approximants, i.e. of rational stable functions of McMillan degree $n$. The approaches that can be found in the literature mainly differ in the choice of a parametrization to describe this set of approximants. These parametrizations often arise from realization theory and the parameters are some entries of the matrices (A, B, C, D). To cope with their inherent complexity, some approaches choose to relax a constraint: stability or fixed McMillan degree. They often run into difficulties since smoothness can be lost or an undesirable approximant reached.
Another approach can be proposed. The number of optimization parameters can be reduced using the inner-unstable factorization (see section 4) and the projection property of a Hilbert space. Let $\hat{G}$ be a best $L^2$ approximant of $G$, with inner-unstable factorization
$$\hat{G} = QP,$$
where $Q$ is the inner factor and $P$ the unstable one. Then, $H^2$ being a Hilbert space, $\hat{G}$ must be the projection of $G$ onto the space $H(Q)$ of matrix functions of degree $n$ whose left inner factor is $Q$. We shall denote this projection by $\hat{G}(Q)$ and the problem consists now in minimizing
$$Q \mapsto \|G - \hat{G}(Q)\|_2,$$
over the set of inner functions of McMillan degree $n$.
Then, more efficient parametrizations can be used which arise from the manifold structure of this set. The idea is to work with an atlas of charts, that is, a collection of local coordinate maps (the charts) which cover the manifold and such that changing from one map to another is a smooth operation. Such a parametrization has the advantage of ensuring identifiability, stability of the result and good behavior of the optimization process. The optimization is run over the set as a whole, changing from one chart to another when necessary. Parametrizations of this type are available either from realization theory or from interpolation theory, in which the parameters are interpolation values. Their description goes beyond the aim of this paper, and we refer the reader to [9] and the bibliography therein for more information on this approach.
References
1. J.A. Ball, I. Gohberg, L. Rodman. Interpolation of rational matrix functions, Birkhäuser, Operator Theory: Advances and Applications, 1990, vol. 45.
2. L. Baratchart. Identification and Function Theory. This volume, pages 211.
3. M. Deistler. Stationary Processes and Linear Systems. This volume, pages 159.
4. B.A. Francis. A course in $H^\infty$ control theory, Springer, 1987.
5. P.A. Fuhrmann. Linear systems and operators in Hilbert Spaces, McGraw-Hill, 1981.
6. K. Glover. All optimal Hankel norm approximations of linear multivariable systems and their $L^\infty$ error bounds, Int. J. Control, 39(6):1115-1193, 1984.
7. K. Hoffman. Banach spaces of analytic functions, Dover publications, New York, 1988.
8. T. Kailath. Linear systems, Prentice-Hall, 1980.
9. J.-P. Marmorat, M. Olivi, B. Hanzon, R.L.M. Peeters. Matrix rational $H^2$ approximation: a state-space approach using Schur parameters, in Proceedings of the CDC02, Las-Vegas, USA.
10. L. Mirsky. Symmetric gauge functions and unitarily invariant norms, Quart. J. Math. Oxford Ser. 2(11):50-59, 1960.
11. A.V. Oppenheim, A.S. Willsky, S.H. Nawab. Signals and Systems, Prentice-Hall, 1997.
12. J.R. Partington. Fourier transforms and complex analysis. This volume, pages 39.
13. J.R. Partington. Interpolation, identification and sampling, Oxford University Press, 1997.
14. W. Rudin. Real and complex analysis, New York, McGraw-Hill, 1987.
15. N.J. Young. An introduction to Hilbert space, Cambridge University Press, 1988.
16. N.J. Young. The Nehari problem and optimal Hankel norm approximation, Analysis and optimization of systems: state and frequency domain approach for infinite dimensional systems, Proceedings of the 10th International Conference, Sophia-Antipolis, France, June 9-12, 1992.
1 Introduction
We survey in these notes certain constructive aspects of how to recover an analytic function in a plane domain from complete or partial knowledge of its boundary values. This we do with an eye on identification issues for linear dynamical systems, i.e. one-dimensional deconvolution schemes, and for that reason we restrict ourselves either to the unit disk or to the half-plane because these are the domains encountered in this context. To ensure the existence of boundary values, restrictions on the growth of the function must be made, resulting in a short introduction to Hardy spaces in the next section. We hasten to say that, in any case, the problem just mentioned is ill-posed in the sense of Hadamard [32], and actually a prototypical inverse problem: the Cauchy problem for the Laplace equation. We approach it as a constrained optimization issue, which is one of the classical routes when dealing with ill-posedness [51]. There are of course many ways of formulating such issues; those surveyed below make connection with the quantitative spectral theory of Toeplitz and Hankel operators that are deeply linked with meromorphic approximation. Standard regularization, which consists in requiring additional smoothness on the approximate solution, would allow us here to use classical interpolation theory; this is not the path we shall follow, but we warn the reader that linear interpolation schemes are usually not very efficient in the present context. An excellent source on this topic and other matters related to our subject is [39].
2 Hardy spaces
Let $\mathbb{T}$ be the unit circle and $\mathbb{D}$ the unit disk in the complex plane. We let $C(\mathbb{T})$ denote continuous functions and $L^p = L^p(\mathbb{T})$ the familiar Lebesgue spaces. For $1 \le p \le \infty$, the Hardy space $H^p$ of the unit disk is the closed subspace of $L^p$ consisting of functions whose Fourier coefficients of strictly negative index do vanish. These are the nontangential limits of functions $g$ analytic in the unit disk $\mathbb{D}$ having uniformly bounded $L^p$ means over all circles centered at 0 of radius less than 1:
$$\sup_{0<r<1} \|g(re^{i\theta})\|_p < \infty. \quad (1)$$
J.-D. Fournier et al. (Eds.): Harm. Analysis and Ratio. Approx., LNCIS 327, pp. 211-230, 2006.
© Springer-Verlag Berlin Heidelberg 2006
Laurent Baratchart
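The correspondence is one-to-one and, using this identification, we alternatively regard members of $H^p$ as holomorphic functions in the variable $z \in \mathbb{D}$. The extension to $\mathbb{D}$ is obtained from the values on $\mathbb{T}$ through a Cauchy as well as a Poisson integral, namely if $g \in H^p$ then:
$$g(z) = \frac{1}{2i\pi}\int_{\mathbb{T}} \frac{g(\xi)}{\xi - z}\,d\xi, \qquad z \in \mathbb{D}, \quad (2)$$
and also
$$g(z) = \frac{1}{2\pi}\int_{\mathbb{T}} \mathrm{Re}\left\{\frac{e^{i\theta} + z}{e^{i\theta} - z}\right\} g(e^{i\theta})\,d\theta, \qquad z \in \mathbb{D}. \quad (3)$$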
$$f(z) = \sum_{k=0}^{\infty} a_k z^k, \qquad \text{with} \qquad \sum_{j=0}^{\infty} |a_j|^2 < \infty.$$
1
2
2
0
ei + z
log |f (ei )|d
ei z
(4)
z l z zl
|zl | 1 zl z
(5)
1
2
2
0
ei + z
d()
ei z
(6)
1
2
2
0
ei + z
log (ei ) d
ei z
(7)
sup
x>0
and again they have nontangential limits at almost every point of the imaginary axis, thereby giving rise to a boundary function $G(iy)$ that lies in $L^p(i\mathbb{R})$. The space $\mathcal{H}^\infty$ consists of bounded analytic functions in the right half-plane, and a theorem of Paley-Wiener characterizes $\mathcal{H}^2$ as the space of Fourier transforms of functions in $L^2(\mathbb{R})$ that vanish for negative arguments.
The study of $\mathcal{H}^p$ can be reduced to that of $\bar{H}^p_0$ thanks to the isometry:
$$g \mapsto (s-1)^{-2/p}\, g\!\left(\frac{s+1}{s-1}\right) \quad (8)$$
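$$y_k = \sum_{j=0}^{\infty} f_j\, u_{k-j},$$
where the output at time $k$ is a linear combination of the past inputs with fixed coefficients $f_j \in \mathbb{R}$. For such systems, function theory enters the picture when signals are encoded by their generating functions: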
when signals are encoded by their generating functions:
uk z k ,
u(z) =
yk z k .
y(z) =
kZ
kZ
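The link between the convolution $y_k = \sum_j f_j u_{k-j}$ and generating functions can be checked on finite sequences: the coefficient sequence of the product of two generating functions is the convolution of the two coefficient sequences. The sketch below uses hypothetical finite sequences `f_demo` and `u_demo`, with signals taken to start at index 0; the variable `w` stands for $z$ or $1/z$, since the identity does not depend on the sign convention chosen for the exponent.

```python
def convolve(f, u):
    """Return y with y[k] = sum_j f[j] * u[k - j] for finite sequences."""
    y = [0.0] * (len(f) + len(u) - 1)
    for j, fj in enumerate(f):
        for i, ui in enumerate(u):
            y[j + i] += fj * ui
    return y

def generating_function(coeffs, w):
    """Evaluate sum_k coeffs[k] * w**k."""
    return sum(c * w ** k for k, c in enumerate(coeffs))

f_demo = [1.0, 0.5, 0.25]      # hypothetical impulse response coefficients f_j
u_demo = [1.0, -1.0, 2.0]      # hypothetical input sequence u_k
y_demo = convolve(f_demo, u_demo)

w = 0.3 + 0.4j                 # arbitrary evaluation point
lhs = generating_function(y_demo, w)
rhs = generating_function(f_demo, w) * generating_function(u_demo, w)
```

Here `lhs` and `rhs` agree up to rounding, which is the finite-length version of the statement that convolution becomes multiplication on generating functions.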
Indeed, if we dene the transfer function of the linear control system to be:
f (z) =
fk z k ,
k=0
yk = Cxk + Duk ,
B.
y(k) =
fj u(k j)
j=0
fj2 = E y(k)2
j=0
2,
fk z k H
f (z) =
k=0
2
gRR
n,n H
f g 2.
This principle can be used to identify a linear system from observed stochastic inputs, although computing the $f_j$ is difficult because it requires spectral factorization of the function
E {y(j + k)y(k)}
jZ
2
0
|f g|2 d,
(9)
where the positive measure $\mu$ is the so-called spectral measure of the input process (that reduces to Lebesgue measure when the latter is white noise), and $f$ now has to belong to a weighted Hardy space $H^2(d\mu)$. Though we shall not dwell on this, we want to emphasize that the spectral theorem, as applied to shift operators, stresses deep links between time and frequency representations of a stochastic process, and the isometric character of this theorem (that may be viewed as a far-reaching generalization of Parseval's relation) is a fundamental reason why $L^2$ approximation problems arise in system theory. The scheme just mentioned is a special instance of maximum likelihood identification where the noise model is fixed [26, 33, 49], that aims at a rational extension of the Szegő theory [50] of orthogonal polynomials.
At this point, it must be mentioned that stochastic identification, as applied to linear dynamical systems, is not just concerned with putting up probabilistic interpretations to rational approximation criteria. Its main methodological contribution is to provide one with a method of choosing the degree of the approximant as the result of a trade-off between the bias term (i.e. the approximation error that becomes small when the degree becomes large) and the variance term (i.e. the dispersion of the estimates that becomes large when the degree becomes large and eventually makes the identification unreliable). We shall not touch on this deeper aspect of the stochastic paradigm, whose deterministic counterpart pertains to the numerical analysis of approximation theory (when should we stop increasing the degree to get a better fit since all we shall approximate further is the error caused by truncation, round off, etc.?). For an introduction to this circle of ideas, the interested reader is referred to the above-quoted textbooks.
3.2 Harmonic Identification
This example deals with continuous-time rather than discrete-time linear control systems, namely with convolution operators $u(t) \mapsto y(t)$ of the form:
$$y(t) = \int_0^t h(t-\tau)\,u(\tau)\,d\tau.$$
2
GRR
n,n H
H G
L2 () ,
or
min
GRR
n,n H
H G
L () ,
such a problem is poorly behaved because the optimum may not exist (the best rational may have unstable poles) and even if it exists it may lead to a wild behavior off the bandwidth. One way out, which is taken up in these notes, is to extrapolate a complete model in $\mathcal{H}^p$ from the knowledge of $H$ on the bandwidth by solving an analytic bounded extremal problem as presented in the forthcoming section. Once the complete model is obtained, one faces a rational approximation problem in $\mathcal{H}^p$ that we will comment upon. The $\mathcal{H}^2$ norm is often better suited, due to the measurement errors, but physical constraints on the global model, like passivity for instance, typically involve the uniform norm. Figure 1 shows a numerical example of this two-step identification scheme on a hyperfrequency filter (see [5, 44]), and an illustration in the design of transmission lines can be found in [47]. Often, weights are added in the criteria to trade off between $L^2$ norms, that tend to oversmooth the data, and $L^\infty$ norms that are put off by irregular samples. We shall not consider this here, and turn to approximation proper.
[Figure 1: Nyquist plot. Legend: H2 approx. in degree 8; frequency measurements.]
Fig. 1. The dotted line in this diagram is the Nyquist plot (i.e. the image of the bandwidth on the imaginary axis) of the transfer function of the reflection of a hyperfrequency filter measured by the French CNES (Toulouse). The data were first completed by solving an $H^2$ bounded extremal problem and then approximated by a rational function of degree 8 whose Nyquist plot has been superimposed on the figure. The locus is not conjugate-symmetric because a low-pass transformation sending the central frequency to the origin was performed on the data. This illustrates that approximation with complex Fourier coefficients can be useful in system identification, even though the physical system is real.
duce analytic operators of Hankel and Toeplitz type. We begin with analytic
extrapolation from partial boundary data.
4.1 Analytic bounded extremal problems
For $I$ a measurable subset of $\mathbb{T}$ and $J$ the complementary subset, if $h_1$ is a function defined on $I$ and $h_2$ a function defined on $J$, we use the notation $h_1 \vee h_2$ for the concatenated function, defined on the whole of $\mathbb{T}$, which is $h_1$ on $I$ and $h_2$ on $J$. The $L^2(I)/L^2(J)$ analytic bounded extremal problem is:
[ABEP $L^2(I), L^2(J)$]
Given $f \in L^2(I)$, $\psi \in L^2(J)$ and a strictly positive constant $M$, find $g_0 \in H^2$ such that
$$\|g_0(e^{i\theta}) - \psi(e^{i\theta})\|_{L^2(J)} \le M$$
and
$$\|f - g_0\|_{L^2(I)} = \min_{g \in H^2,\ \|g - \psi\|_{L^2(J)} \le M} \|f - g\|_{L^2(I)}. \quad (10)$$
(11)
$$g_0 = (1 + \lambda\,\phi_{\chi_J})^{-1} P_{H^2}\big(f \vee (1+\lambda)\psi\big), \quad (12)$$
where $\lambda \in (-1, +\infty)$ is the unique real number such that the right-hand side of (12) has $L^2(J)$-norm equal to $M$.
Note that (12) indeed makes sense because the spectrum of the Toeplitz operator $\phi_{\chi_J}$ is $[0, 1]$.
Theorem 1 provides a constructive means of solving [ABEP $L^2(I), L^2(J)$] because, although the correct value for $\lambda$ is not known a priori, the $L^2(J)$-norm of the right-hand side in (12) is decreasing with $\lambda$ so that iterating by dichotomy allows one to converge to the solution.
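The dichotomy on the multiplier can be illustrated on a finite-dimensional analogue. The sketch below does not implement formula (12) itself: it replaces $H^2$ by polynomials of degree at most `K` sampled at `N` points of the circle, takes the reference function on $J$ to be zero, and minimizes $\|f - g\|_I^2 + \mu\|g\|_J^2$ by least squares, the penalty weight `mu` playing the role of the multiplier. The data `f_I`, the sizes and the choice of `M` are all hypothetical. As in the text, the norm on $J$ is decreasing in the multiplier, so bisection finds the value for which the constraint is met with equality.

```python
import numpy as np

N, K = 64, 5                                   # sample points, polynomial degree
theta = 2.0 * np.pi * np.arange(N) / N
on_I = theta <= np.pi                          # I: upper half circle, J: the rest
V = np.exp(1j * np.outer(theta, np.arange(K + 1)))   # monomials z^k on the circle
A_I, A_J = V[on_I], V[~on_I]
f_I = np.exp(-1j * theta[on_I])                # data on I (conj(z), not analytic)

def solve(mu):
    """Coefficients minimising ||f - g||_I^2 + mu * ||g||_J^2."""
    S = np.vstack([A_I, np.sqrt(mu) * A_J])
    rhs = np.concatenate([f_I, np.zeros(A_J.shape[0], dtype=complex)])
    c, *_ = np.linalg.lstsq(S, rhs, rcond=None)
    return c

def norm_J(mu):
    """Discrete L2(J)-type norm of the minimiser for a given mu."""
    return np.linalg.norm(A_J @ solve(mu)) / np.sqrt(N)

lo, hi = -6.0, 10.0                            # bracket for log10(mu)
M = 0.5 * norm_J(10.0 ** lo)                   # illustrative bound: half the nearly unconstrained norm
for _ in range(100):                           # dichotomy on log10(mu)
    mid = 0.5 * (lo + hi)
    if norm_J(10.0 ** mid) > M:
        lo = mid                               # constraint violated: increase the penalty
    else:
        hi = mid
mu_star = 10.0 ** (0.5 * (lo + hi))
g_coeffs = solve(mu_star)
```

Working on the logarithm of the multiplier is a design choice made here because the relevant values can span many orders of magnitude.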
Let us point out that $H^2|_I$ is dense in $L^2(I)$, hence the error in (10) can
be made very small, but this is at the cost of making $M$ very big unless
(13)
(14)
(15)
(16)
and d, the
L (J)
and
f g0
L (I)
min
gH
L (J) M
f g
L (I)
(17)
PH 2 (f )wM/ v0
,
v0
(18)
Here again, the solution has conjugate symmetry if the data do. Although
the value of the problem is not known a priori, it is the unique positive real
number such that the right-hand side of (18) has modulus $M$ a.e. on $J$, and
so the theorem allows us to constructively solve [ABEP $L^\infty(I), L^\infty(J)$]
if a maximizing vector of (f )wM/ can be computed for given . In [9],
generically convergent algorithms to this eect are detailed in the case where
I is an interval and f is C 1 -smooth. They are based on the fact that a
smooth H function may be added to f to make it vanish at the endpoints
of I, and in this case (f )wM/ may suciently well approximated by
rationals, say in H
older norm (uniform convergence is not sucient here, see
[40]). Reference [9] also contains a meromorphic extension.
Analytic bounded extremal problems have been generalized to abstract Hilbert and Banach space settings, with applications to hyperinvariant subspaces [34, 17, 48, 18]; they can be posed with different constraints where bounds are put on the imaginary part rather than the modulus [27]. In another connection, the work [3] investigates the problem of mixed type [ABEP $L^2(I), L^\infty(J)$], which is important for instance when identifying passive systems whose transfer function must remain less than 1 in modulus at every frequency. It turns out that the solution can be expressed very much along the same lines as [ABEP $L^2(I), L^2(J)$], except that this time unbounded Toeplitz operators appear. We shall not go further into such generalizations, and turn instead to the rational approximation part of the two-step identification procedure sketched in section 3.
4.2 Meromorphic and rational approximation
We saw in subsection 3.1, and in the second step of the identification scheme sketched in subsection 3.2, that stable and proper rational approximation of a complete model on the line or the circle is an important problem from the system-theoretic viewpoint. Here again, the isometry (8) makes it enough to consider the case of the circle. We shall start with the Adamjan-Arov-Krein theory (in short: AAK) which deals with a related issue, namely meromorphic approximation in the uniform norm.
For k = 0, 1, 2, . . . , recall that the singular values of are dened by the
formula:
sk ( ) := inf ||| A|||,
A an operator of rank k on H 2 .
inf
gHn
= sn ( )
(19)
vn
P 2 (vn )
= H
,
vn
vn
(20)
sj ( )
(21)
j=n+1
of the optimal one out of Rn1,n . To estimate how good this bound requires
a link between the decay of the singular values of and the smoothness of
. The summability of the singular values is equivalent to the belonging of
PH 02 () to the Besov class B11 of the disk [40], but this does not tell how fast
the series converges.
For an appraisal of this, we need to introduce some basic notions of potential theory. For more on fundamental notions like equilibrium measure, potential, capacity, balayage, as well as the basic theorems concerning them, the reader may want to consult a recent textbook such as [43]. However, for convenience, we review below the main concepts, starting with logarithmic potentials.
Let $E \subset \mathbb{C}$ be a compact set. To support intuition, one may view $E$ as a plane conductor and imagine putting a unit electric charge on it. Then, if a distribution of charge is described as being a Borel measure $\mu$ on $E$, the electrostatic equilibrium has to minimize the internal energy:
$$I(\mu) = \iint \log\frac{1}{|x-t|}\,d\mu(x)\,d\mu(t)$$
those we do not define the equilibrium measure. But if $E$ is such that $I(\mu)$ is finite for some probability measure $\mu$ on $E$, then there is a unique minimizer for $I(\mu)$ among all such probability measures. This minimizer is called the equilibrium measure (with respect to logarithmic potential) of $E$, and we denote it by $\omega_E$. For example, the equilibrium measure of a disk or circle is the normalized arc measure on the circumference, while the equilibrium measure of a segment $[a, b]$ is
$$d\omega_{[a,b]}(x) = \frac{dx}{\pi\sqrt{(x-a)(b-x)}}.$$
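As a numerical check of the segment case, one can compute the logarithmic potential of the equilibrium measure of $[-1, 1]$ at a point outside the segment and compare it with the classical closed form $\log 2 - \log|x + \sqrt{x^2-1}|$ for real $x > 1$. This closed form is quoted here as a standard fact, not taken from the text. The sketch uses Gauss-Chebyshev quadrature, whose weight is exactly $1/(\pi\sqrt{1-t^2})$; the function name and the node count are hypothetical choices.

```python
import math

def potential_segment(x, n=200):
    """Approximate U(x) = integral of log(1/|x - t|) d omega(t) for omega the
    equilibrium measure of [-1, 1], at a real point x outside the segment."""
    total = 0.0
    for k in range(1, n + 1):
        t = math.cos((2 * k - 1) * math.pi / (2 * n))   # Chebyshev node in (-1, 1)
        total += math.log(1.0 / abs(x - t))
    return total / n          # Gauss-Chebyshev weights pi/n, divided by pi

x_demo = 2.0
U_numeric = potential_segment(x_demo)
U_closed = math.log(2.0) - math.log(x_demo + math.sqrt(x_demo ** 2 - 1.0))
```

For a general segment $[a, b]$ the same computation applies after the affine change of variable mapping $[a, b]$ onto $[-1, 1]$.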
log
1
d(t),
|x t|
1 az
.
za
g(z, a)d(z)d(a).
g(z, a) dE (a)
(22)
Let us conclude with a few words concerning $H^2$ rational approximation of type $(n, n)$. We saw in subsections 3.1 and 3.2 the relevance of this problem in identification, but still it is basically unsolved. For a comparison, observe
from the Courant minimax principle that the error in AAK approximation to
is
sn ( ) = inf
V Vn
sup
vV
(v),
2 =1
where $B_n$ denotes the set of Blaschke products of degree at most $n$. The nonlinear character of $B_n$ makes for a much more difficult problem, and practical algorithms have to rely on numerical searches with the usual burden of local minima. Space prevents us from describing in detail what is known on this problem, and we refer the reader to [4] for a survey. Let us simply mention that in the special case of Markov functions, namely Cauchy transforms of positive measures on a segment that also correspond to the transfer functions of so-called relaxation systems [53, 16], a lot is known, including sharp error rates [10, 13], asymptotic uniqueness of a critical point for Szegő-smooth measures [14], and uniqueness for all orders and small support [15]. For certain entire functions like the exponential, sharp error rates and asymptotic uniqueness of a critical point have also been derived [11], but for most classes of functions the situation is unclear. Results obtained so far concern functions for which the decay of the error is comparable to the one in AAK approximation and fairly regular. Finally we point out that, despite the lack of a general theory, rather efficient algorithms are available to generate local minima, e.g. [21, 25, 35].
References
1. D. Alpay, L. Baratchart, J. Leblond. Some extremal problems linked with identification from partial frequency data. In J.L. Lions, R.F. Curtain, A. Bensoussan, editors, 10th conference on analysis and optimization of systems, Sophia-Antipolis 1992, volume 185 of Lect. Notes in Control and Information Sc., pages 563–573. Springer-Verlag, 1993.
2. L. Aizenberg. Carleman's formulas in complex analysis. Kluwer Academic Publishers, 1993.
3. L. Baratchart, J. Leblond, F. Seyfert. Constrained analytic approximation of mixed $H^2/H^\infty$ type on subsets of the circle. In preparation.
4. L. Baratchart. Rational and meromorphic approximation in $L^p$ of the circle: system-theoretic motivations, critical points and error rates. In Computational Methods and Function Theory, pages 45–78. World Scientific Publish. Co, 1999. N. Papamichael, St. Ruscheweyh and E.B. Saff eds.
5. L. Baratchart, J. Grimm, J. Leblond, M. Olivi, F. Seyfert, F. Wielonsky. Identification d'un filtre hyperfréquence. Rapport Technique INRIA No 219, 1998.
6. L. Baratchart, J. Grimm, J. Leblond, J.R. Partington. Asymptotic estimates for interpolation and constrained approximation in $H^2$ by diagonalization of Toeplitz operators. Integral Equations and Operator Theory, 45:269–299, 2003.
7. L. Baratchart, J. Leblond. Hardy approximation to $L^p$ functions on subsets of the circle with $1 \le p < \infty$. Constructive Approximation, 14:41–56, 1998.
8. L. Baratchart, J. Leblond, J.R. Partington. Hardy approximation to $L^\infty$ functions on subsets of the circle. Constructive Approximation, 12:423–436, 1996.
9. L. Baratchart, J. Leblond, J.R. Partington. Problems of Adamjan–Arov–Krein type on subsets of the circle and minimal norm extensions. Constructive Approximation, 16:333–357, 2000.
10. L. Baratchart, V. Prokhorov, E.B. Saff. Best $L^p$ meromorphic approximation of Markov functions on the unit circle. Foundations of Computational Math, 1(4):385–416, 2001.
11. L. Baratchart, E.B. Saff, F. Wielonsky. A criterion for uniqueness of a critical point in $H^2$ rational approximation. J. Analyse Mathématique, 70:225–266, 1996.
12. L. Baratchart, F. Seyfert. An $L^p$ analog to the AAK theory. Journal of Functional Analysis, 191:52–122, 2002.
13. L. Baratchart, H. Stahl, F. Wielonsky. Asymptotic error estimates for $L^2$ best rational approximants to Markov functions on the unit circle. Journal of Approximation Theory, (108):53–96, 2001.
14. L. Baratchart, H. Stahl, F. Wielonsky. Asymptotic uniqueness of best rational approximants of given degree to Markov functions in $L^2$ of the circle. 2001.
15. L. Baratchart, F. Wielonsky. Rational approximation in the real Hardy space $H^2$ and Stieltjes integrals: a uniqueness theorem. Constructive Approximation, 9:1–21, 1993.
16. R.W. Brockett, P.A. Fuhrmann. Normal symmetric dynamical systems. SIAM J. Control and Optimization, 14(1):107–119, 1976.
17. I. Chalendar, J.R. Partington. Constrained approximation and invariant subspaces. J. Math. Anal. Appl. 280 (2003), no. 1, 176–187.
18. I. Chalendar, J.R. Partington, M.P. Smith. Approximation in reflexive Banach spaces and applications to the invariant subspace problem. Proc. Amer. Math. Soc. 132 (2004), no. 4, 1133–1142.
19. P.L. Duren. Theory of $H^p$-spaces. Academic Press, 1970.
20. B. Francis. A course in $H^\infty$ control theory. Lecture notes in control and information sciences. Springer-Verlag, 1987.
21. P. Fulcheri, M. Olivi. Matrix rational $H^2$ approximation: a gradient algorithm based on Schur analysis. SIAM Journal on Control and Optimization, 36(6):2103–2127, 1998.
22. J.B. Garnett. Bounded Analytic Functions. Academic Press, 1981.
23. K. Glover. All optimal Hankel-norm approximations of linear multivariable systems and their $L^\infty$ error bounds. Int. J. Control, 39(6):1115–1193, 1984.
24. A.A. Gonchar, E.A. Rakhmanov. Equilibrium distributions and the degree of rational approximation of analytic functions. Math. USSR Sbornik, 176:306–352, 1989.
25. J. Grimm. Rational approximation of transfer functions in the hyperion software. Rapport de recherche 4002, INRIA, September 2000.
26. E.J. Hannan, M. Deistler. The statistical theory of linear systems. Wiley, New York, 1988.
27. B. Jacob, J. Leblond, J.-P. Marmorat, J.R. Partington. A constrained approximation problem arising in parameter identification. Linear Algebra and its Applications, 351–352:487–500, 2002.
28. M.G. Krein, P. Ya Nudelman. Approximation of $L^2(\omega_1, \omega_2)$ functions by minimum-energy transfer functions of linear systems. Problemy Peredachi Informatsii, 11(2):37–60, English transl., 1975.
29. R.E. Kalman, P.L. Falb, M.A. Arbib. Topics in mathematical system theory. McGraw-Hill, 1969.
30. P. Koosis. Introduction to $H_p$-spaces. Cambridge University Press, 1980.
31. R. Küstner. Distribution asymptotique des zéros de polynômes orthogonaux par rapport à des mesures complexes ayant un argument à variation bornée. Ph.D. thesis, University of Nice, 2003.
32. M.M. Lavrentiev. Some Improperly Posed Problems of Mathematical Physics. Springer, 1967.
33. L. Ljung. System identification: Theory for the user. Prentice-Hall, 1987.
34. J. Leblond, J.R. Partington. Constrained approximation and interpolation in Hilbert function spaces. J. Math. Anal. Appl., 234(2):500–513, 1999.
35. J.P. Marmorat, M. Olivi, B. Hanzon, R.L.M. Peeters. Matrix rational $H^2$ approximation: a state-space approach using Schur parameters. In Proceedings of the C.D.C., 2002.
36. N.K. Nikolskii. Treatise on the shift operator. Grundlehren der Math. Wissenschaften 273. Springer, 1986.
37. O.G. Parfenov. Estimates of the singular numbers of a Carleson operator. Math USSR Sbornik, 59(2):497–514, 1988.
38. J.R. Partington. An Introduction to Hankel Operators. Cambridge University Press, 1988.
39. J.R. Partington. Robust identification and interpolation in $H^\infty$. Int. J. of Control, 54:1281–1290, 1991.
40. V.V. Peller. Hankel Operators and their Applications. Springer, 2003.
41. M. Rosenblum, J. Rovnyak. Hardy classes and operator theory. Oxford University Press, 1985.
42. H.H. Rosenbrock. State Space and Multivariable Theory. Wiley, New York, 1970.
43. E.B. Saff, V. Totik. Logarithmic Potentials with External Fields, volume 316 of Grundlehren der Math. Wiss. Springer-Verlag, 1997.
44. F. Seyfert, J.P. Marmorat, L. Baratchart, S. Bila, J. Sombrin. Extraction of coupling parameters for microwave filters: Determination of a stable rational model from scattering data. Proceedings of the International Microwave Symposium, Philadelphia, 2003.
45. A.N. Shiryaev. Probability. Springer, 1984.
46. H. Stahl. The convergence of Padé approximants to functions with branch points. J. of Approximation Theory, 91:139–204, 1997.
47. J. Skaar. A numerical algorithm for extrapolation of transfer functions. Signal Processing, 83:1213–1221, 2003.
48. M.P. Smith. Constrained approximation in Banach spaces. Constructive Approximation, 19(3):465–476, 2003.
49. T. Söderström, P. Stoica. System Identification. Prentice-Hall, 1987.
50. G. Szegő. Orthogonal Polynomials. Colloquium Publications. AMS, 1939.
51. A. Tikhonov, N. Arsenine. Méthodes de résolution des problèmes mal posés. MIR, 1976.
52. J.L. Walsh. Interpolation and approximation by rational functions in the complex domain. A.M.S. Publications, 1962.
53. J.C. Willems. Dissipative dynamical systems, Part I: general theory, Part II: linear systems with quadratic supply rates. Arch. Rat. Mech. and Anal., 45:321–351, 352–392, 1972.
Dipartimento di Matematica,
Università di Roma Tor Vergata,
Via della Ricerca Scientifica 1,
I-00133 Roma (Italy)
celletti@mat.uniroma2.it
Abstract
Perturbation theory is introduced by means of models borrowed from Celestial Mechanics, namely the two-body and three-body problems. Such models allow one to introduce in a simple way the concepts of integrable and nearly-integrable systems, which can be conveniently investigated using Hamiltonian formalism. After discussing the problem of the convergence of perturbative series expansions, we introduce the basic notions of KAM theory, which allows one (under quite general assumptions) to establish the persistence of invariant tori. The value at which such surfaces break down can be determined by means of numerical algorithms. Among others, we review three methods, to which we refer as Greene, Padé and Lyapunov. We present some concrete applications of the three different techniques to discrete models, in order to provide complementary information about the breakdown of invariant tori.
1 Introduction
The dynamics of the planets and satellites is ruled by Newton's law, according to which the gravitational force is proportional to the product of the masses of the interacting bodies and inversely proportional to the square of their distance. The description of the trajectories spanned by the celestial bodies starts with the simplest model, in which one considers only the attraction exerted by the Sun, neglecting all contributions due to other planets or satellites. Such a model is known as the two-body problem and it is fully described by Kepler's laws, according to which the motion is represented by a conic. Consider, for example, the trajectory of an asteroid moving on an elliptic orbit around the Sun. In the two-body approximation the semimajor axis and the eccentricity of the ellipse are fixed in time. However, such an example represents
J.-D. Fournier et al. (Eds.): Harm. Analysis and Ratio. Approx., LNCIS 327, pp. 233–261, 2006.
Springer-Verlag Berlin Heidelberg 2006
only the first approximation of the asteroid's motion, which is actually subject also to the attraction of the other planets and satellites of the solar system. The most important contribution comes from the gravitational influence of Jupiter, which is the largest planet of the solar system, its mass being approximately $10^{-3}$ times the mass of the Sun. Therefore, the next step is to consider the three-body problem formed by the Sun, the asteroid and Jupiter. A complete mathematical solution of this problem was sought for three centuries. A conclusive answer was given by H. Poincaré [18], who proved that the three-body problem does not admit a mathematical solution, in the sense that it is not possible to find explicit formulae which describe the motion of the asteroid under the simultaneous attraction of the Sun and Jupiter. For this reason, the three-body problem belongs to the class of non-integrable systems. However, since the mass of Jupiter is much smaller than the mass of the Sun, the trajectory of the asteroid will be in general weakly perturbed with respect to the Keplerian solution. In this sense, one can speak of the three-body problem as a nearly-integrable system (see section 3).
Let us consider a trajectory of the three-body problem with preassigned initial conditions. Though an explicit solution of such motion cannot be found, one can look for an approximate solution of the equations of motion by means of mathematical techniques [2] known as perturbation theory (see section 4), which can be conveniently introduced in terms of the Hamiltonian formalism reviewed in section 2. More precisely, one can construct a transformation of coordinates such that the system expressed in the new variables provides a better approximation of the true solution. The coordinate transformations are built by constructing suitable series expansions, usually referred to$^1$ as Poincaré–Lindstedt series (see [2]), and a basic question concerns their domain of convergence. In the context of perturbation theory, a definite breakthrough is provided by Kolmogorov's theorem, later extended by Arnold and Moser, henceforth known as KAM theory from the initials of the authors [15], [1], [17]. Let $\varepsilon$ denote the perturbing parameter, such that for $\varepsilon = 0$ one recovers the integrable case (in the three-body problem the perturbing parameter is readily seen to represent the Jupiter–Sun mass ratio). The novelty of KAM theory lies in fixing the frequency, rather than the initial conditions, and in using a quadratically convergent procedure of solution (rather than a linear one, as in classical perturbation theory). The basic assumption is a strongly irrational (or Diophantine) condition on the frequency $\omega$. In conclusion, having fixed a Diophantine frequency $\omega$ for the unperturbed system ($\varepsilon = 0$), KAM theory provides an explicit algorithm to prove the persistence, for a sufficiently small $\varepsilon \neq 0$, of an invariant torus on which a quasi-periodic motion with frequency $\omega$ takes place; moreover, Kolmogorov's theory proves that the set of such invariant tori has positive measure in the phase space.
$^1$ Quoting [2]: "the Lindstedt technique is one of the earliest methods for eliminating fast phases. We owe its contemporary form to Poincaré."
2 Hamiltonian formalism
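Consider a smooth function $H := H(y, x)$ with $(y, x) \in M := \mathbb{R}^n \times \mathbb{R}^n$, $n = 1, 2, 3, \dots$, and the following system of O.D.E.'s:
$$\dot y(t) = -H_x(y(t), x(t)), \qquad \dot x(t) = H_y(y(t), x(t)) \qquad (1)$$
(here and in the following $H_x(y, x) \equiv \frac{\partial H}{\partial x}(y, x)$, $H_y(y, x) \equiv \frac{\partial H}{\partial y}(y, x)$).
Define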
(2)
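$$\frac{1}{2} r^2 + \cos\theta, \qquad (3)$$
where $\theta \in \mathbb{T} \equiv \mathbb{R}/(2\pi\mathbb{Z})$ is the angle described by the particle with the vertical axis and $r \in \mathbb{R}$ is the velocity, i.e. $r = \dot\theta$. Being one-dimensional, the system can be easily integrated by quadratures.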
Another classical example of a physical model which can be conveniently studied through Hamiltonian formalism is provided by the harmonic oscillator. The equation governing the small oscillations (described by the coordinate $x \in \mathbb{R}$) of a body attached to the end of an elastic spring is given by Hooke's law, which can be expressed as
$$\ddot x = -\omega^2 x,$$
for a suitable $\omega > 0$ representing the strength of the spring. The corresponding Hamiltonian is given by
$$H(y, x) = \frac{1}{2} y^2 + \frac{1}{2}\omega^2 x^2,$$
$$\dot x = y.$$
(4)
(5)
for suitable constants related to the initial conditions $x(0)$, $y(0)$. It is rather instructive to use this example in order to introduce suitable coordinates, known as action–angle variables, which will play a key role in the context of perturbation theory. Indeed, we proceed to construct a change of coordinates $\Psi(I, \varphi) = (y, x)$, defined through the relation
$$\Phi^t_H \circ \Psi = \Psi \circ \Phi^t_{H \circ \Psi}. \qquad (6)$$
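$$y = \sqrt{2\omega I}\,\cos\varphi, \qquad x = \sqrt{2I/\omega}\,\sin\varphi, \qquad (7)$$
where $I > 0$ and $\varphi \in \mathbb{T} := \mathbb{R}/(2\pi\mathbb{Z})$. Notice that the coordinate $I$ has the dimension of an action$^2$, while $\varphi$ is an angle. A trivial computation shows that the previous change of coordinates satisfies (6). Moreover, we remark that the new Hamiltonian
$$K := K(I) := H \circ \Psi(I, \varphi) = \omega I \qquad (8)$$
does not depend on $\varphi$. Denoting by $\omega := K'(I) \equiv \frac{\partial K(I)}{\partial I}$, equations (1) become
$$\dot I = 0, \qquad \dot\varphi = \omega, \qquad (9)$$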
Notice that inserting (9) in (7), one recovers the solution (5). In view of this example, we are led to the following
Definition: A change of coordinates verifying (6) is said to be canonical. The coordinates $(I, \varphi) \in M$ (where the phase space is $M := \mathbb{R}^n \times \mathbb{T}^n$ with $\mathbb{T}^n := \mathbb{R}^n/(2\pi\mathbb{Z})^n$) are called action–angle variables.
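The harmonic-oscillator example can be checked numerically. The sketch below assumes the change of coordinates $y = \sqrt{2\omega I}\cos\varphi$, $x = \sqrt{2I/\omega}\sin\varphi$ (the reading of (7) that is consistent with the new Hamiltonian $\omega I$) and verifies that the transformed Hamiltonian does not depend on the angle; all function names are illustrative.

```python
import math

def to_cartesian(action, angle, omega):
    """Map action-angle variables (I, phi) to (y, x) for the harmonic oscillator."""
    y = math.sqrt(2.0 * omega * action) * math.cos(angle)
    x = math.sqrt(2.0 * action / omega) * math.sin(angle)
    return y, x

def hamiltonian(y, x, omega):
    """H(y, x) = y^2 / 2 + omega^2 x^2 / 2."""
    return 0.5 * y * y + 0.5 * omega * omega * x * x

def new_hamiltonian(action, angle, omega):
    """Hamiltonian expressed in action-angle variables."""
    y, x = to_cartesian(action, angle, omega)
    return hamiltonian(y, x, omega)

if __name__ == "__main__":
    omega, action = 1.3, 0.7
    for angle in (0.0, 0.5, 2.0, 4.0):
        # Every line should show the same value omega * action.
        print(angle, new_hamiltonian(action, angle, omega), omega * action)
```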
Notice that if the Hamiltonian $H$ is expressed in terms of action–angle variables, namely $H = H(I, \varphi)$, then equations (1) become
$$\dot I(t) = -H_\varphi(I(t), \varphi(t)), \qquad \dot\varphi(t) = H_I(I(t), \varphi(t)). \qquad (10)$$
$^2$ Namely energy × time.
for a suitable function $h$, is said to be (completely) integrable. For these systems, Hamilton's equations can be written as $\dot I = 0$, $\dot\varphi = h'(I)$. Correspondingly, we introduce the invariant tori$^3$
$$\mathcal{T}_0 := \{(I, \varphi) \mid I \equiv I_0,\ \varphi \in \mathbb{T}^n\},$$
(11)
f
(t)
(12)
Namely tH (T0 ) T0 .
Sun is an ellipse with the Sun located at one of the two foci. The Hamiltonian formulation of the two-body problem is described as follows. Choose the units of length, mass and time so that the gravitational constant and the mass of the Sun are normalized to one. In order to investigate the asteroid–Sun problem, it is convenient to introduce suitable coordinates, known as Delaunay variables:
$$(I_1, I_2) \equiv (L, G) \in \mathbb{R}^2, \qquad (\varphi_1, \varphi_2) \equiv (l, g) \in \mathbb{T}^2,$$
whose definition is the following. Denoting by $a$ and $e$, respectively, the semimajor axis and eccentricity of the orbit of the asteroid, the Delaunay actions are:
$$L := \sqrt{a}, \qquad G := \sqrt{a(1 - e^2)}. \qquad (13)$$
The conjugated angle variables are defined as follows: $l$ is the mean anomaly, which provides the position of the asteroid along its orbit, and $g$ is the longitude of perihelion, namely the angle between the perihelion line and a fixed reference direction (see [20]).
The Hamiltonian function in Delaunay variables can be written as
$$H(L, G, l, g) := h(L) := -\frac{1}{2L^2}, \qquad (14)$$
which shows that the system is integrable, since $H = h(L)$ depends only on the actions. From the equations of motion $\dot L = 0$, $\dot G = 0$, we immediately recognize that $L$ and $G$ are constants, $L = L_0$, $G = G_0$, which in view of (13) is equivalent to saying that the orbital elements (semimajor axis and eccentricity) do not vary along the motion. Since $g$ is also constant ($\dot g = 0$), we obtain that the orbit is a fixed ellipse with one of the foci coinciding with the Sun. Finally, the mean anomaly is obtained from $\dot l = \frac{\partial H(L)}{\partial L} := \omega(L)$ as $l(t) = \omega(L_0) t + l(0)$.
Let us remark that the Hamiltonian (14) does not depend on all the actions, being independent of $G$: systems of this kind are called degenerate and often arise in Celestial Mechanics.
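The Keplerian formulas above translate directly into a few lines of code. The sketch below assumes the normalized units of the text (gravitational constant and solar mass equal to one), the actions $L = \sqrt{a}$, $G = \sqrt{a(1-e^2)}$ and the Hamiltonian $h(L) = -1/(2L^2)$, so that the mean motion is $\omega(L) = 1/L^3$; the function names are illustrative.

```python
import math

def delaunay_actions(a, e):
    """Delaunay actions (L, G) from semimajor axis a and eccentricity e."""
    big_l = math.sqrt(a)
    big_g = math.sqrt(a * (1.0 - e * e))
    return big_l, big_g

def kepler_hamiltonian(big_l):
    """Two-body Hamiltonian h(L) = -1 / (2 L^2) in normalized units."""
    return -1.0 / (2.0 * big_l * big_l)

def mean_motion(big_l):
    """Frequency omega(L) = dh/dL = 1 / L^3 (Kepler's third law, a^(-3/2))."""
    return 1.0 / big_l**3

def mean_anomaly(t, big_l0, l0=0.0):
    """Mean anomaly l(t) = omega(L0) t + l(0), reduced modulo 2 pi."""
    return (mean_motion(big_l0) * t + l0) % (2.0 * math.pi)

if __name__ == "__main__":
    big_l, big_g = delaunay_actions(a=4.0, e=0.1)
    print(big_l, big_g, kepler_hamiltonian(big_l), mean_motion(big_l))
```

Since $G$ does not enter `kepler_hamiltonian`, the degeneracy mentioned in the text is visible in the code: changing the eccentricity changes $G$ but neither the energy nor the frequency.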
3.2 The three-body problem
The two-body problem describes only a rough approximation of the motion of the asteroid around the Sun; indeed, the most important contribution we neglected while considering Kepler's model comes from the gravitational influence of Jupiter. We are thus led to consider the motion of the three bodies: the Sun (S), Jupiter (J) and a minor body of the asteroidal belt (A). We restrict our attention to the special case of the planar, circular, restricted three-body problem. More precisely, we assume that the Sun and Jupiter revolve around their common center of mass, describing circular orbits (circular case). Choose the units of length, mass and time so that the gravitational constant, the orbital angular velocity and the sum of the masses of the primaries (Sun and Jupiter) are identically equal to one. Consider the motion of an asteroid A moving in the same orbital plane as the primaries (planar case). Assume that the mass of A is negligible with respect to the masses of S and J; this hypothesis implies that the motion of the primaries is not affected by the gravitational attraction of the asteroid (restricted case). Finally, let us identify the mass of J with a suitable small parameter $\varepsilon$. Though this is the simplest (non-trivial) three-body model, as shown by Poincaré [18] such a problem cannot be explicitly integrated (unlike the two-body problem, which is integrated through Kepler's solution).
In order to introduce the Hamiltonian function associated to such a problem, it is convenient to write the equations of motion in a barycentric coordinate frame (with origin at the center of mass of the Sun–Jupiter system), which rotates uniformly at the same angular velocity as the primaries. The resulting system is described by a nearly-integrable Hamiltonian function with two degrees of freedom (see [20]), with the perturbing parameter $\varepsilon$ representing the Jupiter–Sun mass ratio.
We immediately recognize that for $\varepsilon = 0$ (i.e., neglecting Jupiter), the system reduces to the unperturbed two-body problem. As described in the previous section, we can identify the Delaunay elements with action–angle variables; the only difference is that in the rotating reference frame the variable $g$ represents the longitude of the pericenter, evaluated from the axis coinciding with the direction of the primaries. If $H = h^{(AS)}(L)$ denotes the Hamiltonian function associated to the asteroid–Sun problem, it can be shown [20] that the Hamiltonian describing the three-body problem has the form
$$H(L, G, l, g; \varepsilon) := h^{(AS)}(L) - G + \varepsilon f(L, G, l, g; \varepsilon), \qquad (15)$$
4 Perturbation theory
Perturbation theory flowered during the last two centuries through the works of Laplace, Leverrier, Delaunay, Poincaré, Tisserand, etc.; it provides constructive methods to investigate the behavior of nearly-integrable systems. The importance of studying the effects of small Hamiltonian perturbations on an integrable system was pointed out by Poincaré, who referred to it as the fundamental problem of dynamics. We introduce perturbation theory in section 4.1 and we present in section 4.2 the celebrated Kolmogorov–Arnold–Moser theorem.
4.1 Classical perturbation theory
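Consider an analytic Hamiltonian of the form
$$H(I, \varphi; \varepsilon) := h(I) + \varepsilon f(I, \varphi; \varepsilon), \qquad (16)$$
where the perturbing function is expanded as
$$f(I, \varphi; \varepsilon) =: \sum_{j \ge 0} \varepsilon^j f^{(j)}(I, \varphi)$$
for suitable functions $f^{(j)}(I, \varphi)$, which can be expanded in Fourier series as $f^{(j)}(I, \varphi) = \sum_{k \in \mathbb{Z}^n} f^{(j)}_k(I) e^{ik\cdot\varphi}$. We implement an integrating transformation
$$\Psi_\varepsilon : (J, \psi) \to (I, \varphi),$$
defined by the implicit equations
$$I = J + \varepsilon\,\partial_\varphi S(J, \varphi; \varepsilon), \qquad \psi = \varphi + \varepsilon\,\partial_J S(J, \varphi; \varepsilon), \qquad (17)$$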
S(J, ; ) =
j0
S (0) (J, ; )
+ O(2 ), + O()).
242
(19)
(0)
S (0) (J, ) := i
k=0
fk (J) ik
e
.
h(J) k
(20)
(21)
Indeed, the function $S^{(0)}$ can be defined only for values of the actions such that the small divisors (21) are different from zero, namely only for $J$ belonging to the set of rationally independent frequencies
$$\{J \in \mathbb{R}^n \ \text{such that}\ h'(J)\cdot k \neq 0,\ \forall k \in \mathbb{Z}^n \setminus \{0\}\}.$$
Since this set has empty interior, if we want to define $S^{(0)}$ in an open neighborhood of a fixed $J_0$, we must truncate the series in (20) up to a suitable order, say $|k| \le d_0$ for a given $d_0 \in \mathbb{Z}_+$. In particular, let us write the Fourier expansion of $f^{(0)}$ as
$$f^{(0)}(J, \varphi) = \sum_{|k| \le d_0} f^{(0)}_k(J) e^{ik\cdot\varphi} + \sum_{|k| > d_0} f^{(0)}_k(J) e^{ik\cdot\varphi}; \qquad (22)$$
J B
0 ()
(J0 ).
In order to eliminate the angle dependence in (18) for the terms of order $\varepsilon^2, \varepsilon^3, \dots, \varepsilon^{m+1}, \dots$, we determine $S^{(1)}, S^{(2)}, \dots, S^{(m)}, \dots$ by solving equations similar to (19). Again, we need to truncate the series associated to $S^{(m)}$ at the orders $|k| \le d_m$ for sufficiently large indexes
$$d_m := d_m(\varepsilon) \to \infty \quad \text{for} \quad m \to \infty.$$
:=
m ()
0 for
m .
H(I, ; ) := I + I1 +
ak sin(k ) ,
(23)
kN2 \{0}
ak k1
kN/{0}
t
0
cos
(1 + )k1 + 2 k2 t dt,
(24)
for some
p Z, q N,
the sum in (24) over all the terms with (k1 , k2 ) = (q, p) gives a uniformly
bounded contribution (for any t R), while the term with (k1 , k2 ) = (q, p)
Tn
0.
(25)
(26)
It is easy to see that after the last change of variables the new remainder term will be of order $\varepsilon_2 = \varepsilon^{2^2}$.
Iterating this procedure, at the $m$th step the angle-dependent remainder will be of order $\varepsilon^{2^m}$. Finally, applying the KAM scheme infinitely many times, we end up with an integrable Hamiltonian of the form
$$K_\infty(J_\infty; \varepsilon) := H \circ \Psi_1 \circ \Psi_2 \circ \dots =: H \circ \Psi_\infty(J_\infty, \varphi_\infty; \varepsilon), \qquad \varphi_\infty \in \mathbb{T}^n, \qquad (27)$$
provided that the overall procedure converges on a non-trivial domain. Actually such a domain turns out to be a Cantor set; the equality (27) holds precisely on such a set and can be differentiated infinitely many times on it, see [19], [7].
Remark 1. Before stating the KAM theorem, let us summarize the main differences between Poincaré–Lindstedt series and KAM procedures.
In the first case:
(1) after the $m$th coordinate transformation the remainder term (depending upon angles) is of order $\varepsilon^{m+1}$,
(2) the initial datum $J_0$ is kept fixed, while the final frequency $h'(J_0; \varepsilon)$ (respectively the invariant surface $\mathcal{T}_{h'(J_0;\varepsilon)}$) varies with $\varepsilon$, eventually becoming rationally dependent (respectively, a resonant torus).
Concerning the KAM procedure we recall that:
(1) after the $m$th coordinate transformation the remainder term (depending upon angles) is of order $\varepsilon^{2^m}$ (ignoring the contribution of the small divisors$^4$),
(2) the frequency vector $\omega_0$ is fixed and it is supposed to be strongly rationally independent, so that the corresponding torus $\mathcal{T}_{\omega_0}$ is strongly non-resonant.
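To make the difference between the two convergence rates concrete, the following sketch tabulates the nominal size of the angle-dependent remainder after $m$ steps, $\varepsilon^{m+1}$ for the linear scheme against $\varepsilon^{2^m}$ for the quadratic one. It deliberately ignores the small divisors, as in the remark above; the value $\varepsilon = 0.1$ and the function names are illustrative.

```python
def linear_remainder(eps, m):
    """Nominal remainder after m steps of a linearly convergent scheme."""
    return eps ** (m + 1)

def quadratic_remainder(eps, m):
    """Nominal remainder after m steps of a quadratically convergent scheme."""
    return eps ** (2 ** m)

if __name__ == "__main__":
    eps = 0.1
    for m in range(1, 7):
        print(m, linear_remainder(eps, m), quadratic_remainder(eps, m))
```

Both schemes agree at the first step; from the second step on, the exponent of the quadratic scheme doubles while that of the linear scheme grows by one.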
We now proceed to state the KAM theorem as follows. Consider a Hamiltonian system as in (11). As we discussed previously, for $\varepsilon = 0$ the phase space of the unperturbed (integrable) system $h$ is foliated by $n$-dimensional invariant tori labeled by $I = I_0$. Such tori are resonant or non-resonant according to whether the frequency $\omega_0 := h'(I_0)$ is rationally dependent or not. The KAM theorem describes the fate of non-resonant tori under perturbation. We recall that the three assumptions necessary to prove the theorem are the following: the smallness of $\varepsilon$, the strong rational independence of $\omega_0$ and the non-degeneracy of the unperturbed Hamiltonian $h$ as given in (26).
Theorem 1 (KAM Theorem). If the unperturbed system is non-degenerate, then, for a sufficiently small perturbation, most non-resonant invariant tori do not break down, though being deformed and displaced with respect to the integrable situation. Therefore, in the phase space of the perturbed system there exist invariant tori densely filled with quasi-periodic phase trajectories winding around them. These invariant tori form a majority, in the sense that the measure of the complement of their union is small when the perturbation is small.
The proof of the KAM theorem is based on the superconvergent procedure described above. We remark that the strong rational independence of $\omega_0$ (see (25)) plays a central role. In particular, the KAM scheme works if $\omega_0$ satisfies the so-called Diophantine condition
$$|\omega_0 \cdot k| \ge \frac{\gamma}{|k|^\tau}, \qquad \forall k \in \mathbb{Z}^n \setminus \{0\}, \qquad (28)$$
> const.
are preserved under the perturbation.
We remark that the KAM result is global, in the sense that for all $\varepsilon \le \varepsilon_0$ ($\varepsilon_0$ fixed) and for all Diophantine frequency vectors $\omega_0$ satisfying (28), the Hamiltonian system (11) admits an invariant perturbed torus with frequency vector $\omega_0$. We refer to [9], [11], [6], [12] for different methods to construct
$^4$ Taking into account the contribution of the small divisors, one can nevertheless obtain a superexponential decay even if it is not strictly necessary for the convergence of the KAM scheme.
(29)
we can use a leap-frog method, such that if $T$ is the time-step and $(r_n, \theta_n)$ denotes the solution at time $nT$, then (29) can be integrated as
$$r_{n+1} = r_n + \varepsilon T \sin\theta_n, \qquad \theta_{n+1} = \theta_n + T r_{n+1}.$$
Normalizing the time-step to one, we are reduced to the study of the so-called standard map introduced by Chirikov in [8]:
$$r_{n+1} = r_n + \varepsilon \sin\theta_n, \qquad \theta_{n+1} = \theta_n + r_{n+1}, \qquad (30)$$
with $r_n \in \mathbb{R}$ and $\theta_n \in \mathbb{T} \equiv \mathbb{R}/(2\pi\mathbb{Z})$. We will also consider the generalized standard map, which is obtained replacing the sine term in (30) by any periodic, continuous function $f(\theta)$:
$$r_{n+1} = r_n + \varepsilon f(\theta_n), \qquad \theta_{n+1} = \theta_n + r_{n+1} = \theta_n + r_n + \varepsilon f(\theta_n), \qquad (31)$$
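The maps (30) and (31) are straightforward to iterate. The sketch below is a minimal implementation, assuming the perturbing parameter multiplies the periodic function as in the standard map of Chirikov; the function names and the choice of reducing the angle modulo $2\pi$ at every step are illustrative.

```python
import math

def generalized_standard_map(r, theta, eps, f=math.sin):
    """One step of r' = r + eps * f(theta), theta' = theta + r' (mod 2 pi)."""
    r_next = r + eps * f(theta)
    theta_next = (theta + r_next) % (2.0 * math.pi)
    return r_next, theta_next

def orbit(r0, theta0, eps, n_steps, f=math.sin):
    """Return the list of points (r_n, theta_n) for n = 0, ..., n_steps."""
    points = [(r0, theta0)]
    r, theta = r0, theta0
    for _ in range(n_steps):
        r, theta = generalized_standard_map(r, theta, eps, f)
        points.append((r, theta))
    return points

if __name__ == "__main__":
    # For eps = 0 the action r stays constant and the angle advances by r.
    for point in orbit(r0=0.5, theta0=0.0, eps=0.0, n_steps=5):
        print(point)
```

Passing a different `f`, for instance `lambda t: math.sin(t) + math.sin(2 * t) / 20`, gives a generalized map of the form (31).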
n
.
n
In the unperturbed case the motion always takes place on an invariant circle. If $\omega/(2\pi) \in \mathbb{Q}$, the trajectory described by the iteration of the mapping with initial data $(r_0, \theta_0)$ is a periodic orbit; if $\omega/(2\pi) \in \mathbb{R} \setminus \mathbb{Q}$, the motion fills densely an invariant curve with frequency $\omega$, say $C_{0,\omega}$. When $\varepsilon \neq 0$ the system becomes nonintegrable and we proceed to describe the conclusions which can be drawn by applying perturbation theory. As we described in the previous sections, the KAM theorem [15], [1], [17] ensures that if $\varepsilon$ is sufficiently small, there still exists for the perturbed system an invariant curve $C_{\varepsilon,\omega}$ with frequency $\omega$.
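The KAM theorem can be applied provided that the frequency $\omega$ satisfies a strong irrationality assumption, namely the Diophantine condition (28), which can now be rephrased as
$$\left| \frac{\omega}{2\pi} - \frac{p}{q} \right| \ge \frac{1}{C q^2}, \qquad \forall p, q \in \mathbb{Z} \setminus \{0\}, \qquad (32)$$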
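A condition of the form (32) can be probed numerically for a given rotation number. The sketch below, a finite check rather than a proof, computes the smallest value of $q^2\,|\alpha - p/q|$ over denominators up to a bound, taking for each $q$ the nearest nonzero integer numerator; the reciprocal of this value is then an admissible constant $C$ on the tested range. The golden mean $\alpha = (\sqrt{5}-1)/2$ and the function name are illustrative choices.

```python
import math

def diophantine_margin(alpha, q_max):
    """Smallest q^2 * |alpha - p/q| over 1 <= q <= q_max, with p the nearest
    nonzero integer to alpha * q (the worst case for each q)."""
    best = float("inf")
    for q in range(1, q_max + 1):
        p = round(alpha * q)
        if p == 0:
            continue  # (32) only involves nonzero numerators
        best = min(best, q * q * abs(alpha - p / q))
    return best

if __name__ == "__main__":
    golden = (math.sqrt(5.0) - 1.0) / 2.0
    margin = diophantine_margin(golden, 2000)
    print("margin:", margin, "admissible C on this range:", 1.0 / margin)
```

Double precision limits how far `q_max` can be pushed, since $|\alpha - p/q|$ shrinks like $1/q^2$.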
U (n )
Uj (n ) =
j=1
j=1
lj eiln
U
j
lZ
($k \ge 1$) with $F_0 = 0$, $F_1 = 1$.
Figure 1 shows in the $(\theta, r)$-plane (with $\theta$ normalized by a factor $2\pi$) how the different periodic orbits with frequency $2\pi F_k/F_{k+1}$ approach the golden-mean invariant curve.
[Figure 1: periodic orbits labelled 1/2, 3/5, 2/3 and 1/1.]
In order to explore the link between periodic orbits and invariant curves, it is useful to recall some simple notions of number theory and in particular about rational approximants.
For any $\alpha \in \mathbb{R}$, let the continued fraction representation of $\alpha$ be defined as the sequence of integer numbers $a_j$ such that
$$\alpha = a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \dots}} \equiv [a_0; a_1, a_2, \dots].$$
$$\frac{\sqrt{5} - 1}{2} = [0; 1, 1, 1, 1, \dots] \equiv [0; 1^\infty].$$
1
a2
...
The rational numbers {pj /qj } are the best rational approximants to the irrational number .
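The rational approximants $p_j/q_j$ of a continued fraction can be generated with the standard three-term recurrence $p_j = a_j p_{j-1} + p_{j-2}$, $q_j = a_j q_{j-1} + q_{j-2}$. The sketch below applies it to the golden mean, whose partial quotients are all equal to one, and recovers ratios of consecutive Fibonacci numbers; the function name is illustrative.

```python
from fractions import Fraction

def convergents(partial_quotients):
    """Convergents p_j/q_j of the continued fraction [a_0; a_1, a_2, ...]."""
    p_prev, q_prev = 1, 0
    p, q = partial_quotients[0], 1
    result = [Fraction(p, q)]
    for a in partial_quotients[1:]:
        # Standard recurrence; the right-hand sides use the old values.
        p, p_prev = a * p + p_prev, p
        q, q_prev = a * q + q_prev, q
        result.append(Fraction(p, q))
    return result

if __name__ == "__main__":
    # Golden mean (sqrt(5) - 1) / 2 = [0; 1, 1, 1, ...]
    for c in convergents([0] + [1] * 10):
        print(c, float(c))
```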
5.2 Perturbative series expansions
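Let us consider the standard map defined in (30); by the relations $\theta_{n+1} - \theta_n = r_{n+1}$ and $\theta_n - \theta_{n-1} = r_n$, one obtains that $\theta_n$ must satisfy the equation
$$\theta_{n+1} - 2\theta_n + \theta_{n-1} = \varepsilon \sin\theta_n. \qquad (33)$$
Let us look for a periodic solution with frequency $\omega = 2\pi p/q$, satisfying the periodicity conditions
$$\theta_{n+q} = \theta_n + 2\pi p, \qquad r_{n+q} = r_n. \qquad (34)$$
In analogy to (32), we parameterize the solution as
$$\theta_n = \vartheta_n + u(\vartheta_n), \qquad u(\vartheta_n) \equiv \sum_{j=1}^{\infty} u_j(\vartheta_n)\varepsilon^j = \sum_{j=1}^{\infty} \varepsilon^j \sum_{l=1}^{\min(j,q)} a_{lj} \sin(l\vartheta_n), \qquad (35)$$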
where the real coefficients $a_{lj}$ will be recursively determined by means of (33). Notice that the summation ends at $\min(j, q)$ because one needs to avoid zero divisors, as an explicit computation shows.
From (33), one finds that $u$ must satisfy the relation
$$u\!\left(\vartheta_n + 2\pi\frac{p}{q}\right) - 2u(\vartheta_n) + u\!\left(\vartheta_n - 2\pi\frac{p}{q}\right) = \varepsilon \sin(\vartheta_n + u(\vartheta_n)). \qquad (36)$$
The coefficients $a_{lj}$ can be obtained by inserting the series expansion (35) in (36) and equating same orders of $\varepsilon$. More precisely, defining $\beta_{l,j}$ such that
$$\sin(\vartheta_n + u(\vartheta_n)) \equiv \sum_{j=0}^{\infty} \sum_{l=1}^{j+1} \beta_{l,j+1} \sin(l\vartheta_n)\, \varepsilon^j,$$
one obtains (see [5]) that the coefficients $a_{lj}$ in (35) are given by
$$a_{lj} = \frac{\beta_{lj}}{2[\cos(2\pi l p/q) - 1]}. \qquad (37)$$
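The divisor appearing in (37) can be traced to the action of the second-difference operator in (36) on a single harmonic. The following short computation is a sketch of that step, writing $\omega = 2\pi p/q$ and $\vartheta$ for the parametrizing angle (an assumed notation) and using only the identity $\sin(A+B) + \sin(A-B) = 2\sin A\cos B$:

```latex
\begin{aligned}
\sin\bigl(l(\vartheta+\omega)\bigr) - 2\sin(l\vartheta) + \sin\bigl(l(\vartheta-\omega)\bigr)
  &= 2\sin(l\vartheta)\cos(l\omega) - 2\sin(l\vartheta) \\
  &= 2\bigl[\cos(2\pi l p/q) - 1\bigr]\,\sin(l\vartheta).
\end{aligned}
```

Matching the coefficient of $\sin(l\vartheta)$ at each order of the perturbing parameter then gives a quotient with this bracket in the denominator. The bracket vanishes exactly when $lp/q$ is an integer, which is the zero-divisor issue mentioned in the text.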
p
) 1.
q
and
0 =
2
.
q
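that $\sum_{j=1}^{\infty} u_j \varepsilon^j = \frac{P_N(\varepsilon)}{Q_N(\varepsilon)} + O(\varepsilon^{2N+1})$. The shape of the analyticity domain will be provided by the zeros of the denominators. In particular, having fixed a value for $\vartheta_0$, we consider the series
$$u(\vartheta_0) = \sum_{j=1}^{\infty} u_j(\vartheta_0)\varepsilon^j, \qquad \varepsilon \in \mathbb{C}. \qquad (38)$$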
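A diagonal Padé approximant $[N/N]$ can be computed from the first $2N+1$ Taylor coefficients by solving a small linear system for the denominator. The sketch below is a generic textbook construction with exact rational arithmetic, not the implementation used in [5]; it normalizes the denominator so that its constant term equals one, and all names are illustrative.

```python
from fractions import Fraction

def solve_linear(a, b):
    """Solve a x = b by Gauss-Jordan elimination over exact rationals."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            raise ValueError("singular system: degenerate Pade approximant")
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [m[r][n] for r in range(n)]

def pade(coeffs, n):
    """Return (p, q), coefficient lists of the [n/n] Pade approximant P/Q of
    sum_k coeffs[k] x^k, with q[0] = 1, so that the series minus P/Q is
    O(x^(2n+1))."""
    c = [Fraction(v) for v in coeffs]
    if len(c) < 2 * n + 1:
        raise ValueError("need at least 2n + 1 Taylor coefficients")
    # Orders n+1..2n of Q * series must vanish:
    #   sum_{i=1..n} q_i c_{k-i} = -c_k.
    a = [[c[k - i] for i in range(1, n + 1)] for k in range(n + 1, 2 * n + 1)]
    b = [-c[k] for k in range(n + 1, 2 * n + 1)]
    q = [Fraction(1)] + solve_linear(a, b)
    # Orders 0..n of Q * series give the numerator.
    p = [sum(q[i] * c[k - i] for i in range(k + 1)) for k in range(n + 1)]
    return p, q

if __name__ == "__main__":
    # Geometric series 1 + x + x^2: the denominator 1 - x locates the pole x = 1.
    print(pade([1, 1, 1], 1))
```

The zeros of the denominator polynomial are the quantities of interest here, since they approximate the singularities that bound the analyticity domain.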
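$$\left\{ \frac{1}{2}, \frac{2}{3}, \frac{3}{5}, \frac{5}{8}, \frac{8}{13}, \frac{13}{21}, \frac{21}{34}, \frac{34}{55}, \frac{55}{89}, \frac{89}{144}, \dots \right\} \to \frac{\omega}{2\pi},$$
$$\left\{ \frac{1}{3}, \frac{12}{37}, \frac{13}{40}, \frac{25}{77}, \frac{38}{117}, \dots \right\} \to \frac{\omega_1}{2\pi},$$
$$\left\{ \frac{10}{21}, \frac{11}{23}, \frac{21}{44}, \frac{32}{67}, \frac{53}{111}, \frac{85}{178}, \dots \right\} \to \frac{\omega_2}{2\pi}.$$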
Figure 2 (see [5]) shows the Padé approximants of the periodic orbits 3/5, 13/21, 34/55, 89/144 (times $2\pi$), associated to the mapping (30); the inner black region denotes the analyticity domain of the invariant curve with frequency $\omega$. We remark that, as the order $q$ of the periodic orbit grows, the singularities associated to the periodic orbits approach more and more closely the analyticity domain of the golden-mean invariant curve.
Similarly, the Padé approximants corresponding to $\omega_1$ and $\omega_2$ and to some of their rational approximants are shown, respectively, in Figures 3a and 3b (see [5]).
Let $u$ satisfy (36) and let the radius of convergence of the series $u(\vartheta_0) = \sum_{j=1}^{\infty} u_j(\vartheta_0)\varepsilon^j$ be defined as
c(
p
)=
q
Fig. 2. Padé approximants of order [200/200] of the golden mean curve and of the periodic orbits with frequencies 3/5, 13/21, 34/55, 89/144 (times $2\pi$) for the map (30) (after [5]).
Fig. 3. Padé approximants of order [200/200] for the invariant curve with rotation number $\omega_1$ (a) and $\omega_2$ (b) and for some of their rational approximants in the framework of the mapping (30) (after [5]).
= inf
c(
pk
)=
qk
c (); ,
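$$f_{12}: \quad r_{n+1} = r_n + \varepsilon\left(\sin\theta_n + \frac{1}{20}\sin 2\theta_n\right), \qquad \theta_{n+1} = \theta_n + r_{n+1},$$
$$f_{13}: \quad r_{n+1} = r_n + \varepsilon\left(\sin\theta_n + \frac{1}{30}\sin 3\theta_n\right), \qquad \theta_{n+1} = \theta_n + r_{n+1}.$$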
(39)
(40)
(41)
(42)
4b) for $\omega = 2\pi\,\frac{\sqrt{5}-1}{2}$.
6.2 Lyapunov's method
In order to estimate the radius of convergence of the Poincaré–Lindstedt series (38), we review the algorithm proposed in [5], to which we refer as Lyapunov's method. This technique consists in applying the following procedure:
1) consider discrete values of the small parameter $\varepsilon$ from an initial $\varepsilon_{in}$ to a final $\varepsilon_{fin}$ with a relative increment $(1 + h)$;
2) for any of these values, compute the distance $d_k$ between the truncated series at order $k$ calculated at $\varepsilon$ with that at $\varepsilon(1 + h)$; more precisely, denoting by
$$u^{(k)}(\vartheta_0; \varepsilon) \equiv \sum_{j=1}^{k} u_j(\vartheta_0)\varepsilon^j,$$
Fig. 4. Padé approximants of order [200/200] for the golden mean curve and some rational approximants associated to the mapping $f_{12}$ (a) and $f_{13}$ (b), respectively (after [5]).
$s_1 \equiv \frac{1}{N-1} \sum_{k=2}^{N} \log \frac{d_k}{d_1}$;
4) plot $s_1$ versus $\varepsilon$ (see Figure 5a). Experimentally one notices that all graphs show an initial almost constant value of $s_1$ as $\varepsilon$ is increased, followed by a small well, and then by a sharp increase with almost linear behavior;
5) estimate the analyticity radius as follows (compare with Figure 5b): having fixed the order $N$ (see step 3), at which the series is explicitly computed, and the increment $h$ (see step 1), we interpolate the points before the well with a straight line. The critical value, say $\varepsilon_L$, is determined as the intersection of this line with the portion of the curve after the minimum is reached.
Figure 5 (see [5]) shows an implementation of this algorithm for the standard mapping (30) and the frequency $\omega = 2\pi \cdot 34/55$; the parameters have been set as $N = 800$ and $h = 0.001$.
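Step 5 can be sketched as follows, under stated assumptions: the curve $s_1(\varepsilon)$ is given as two lists of samples, the number `n_plateau` of points "before the well" is supplied by hand (a hypothetical parameter), and the data in the example are synthetic, chosen only to mimic the plateau, well and linear rise described above. This is not the implementation of [5].

```python
def fit_line(xs, ys):
    """Least-squares straight line y = a*x + b through the given points."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, my - a * mx

def critical_value(eps, s1, n_plateau):
    """Intersect the line fitted to the first n_plateau points with the
    branch of the sampled curve that follows its minimum."""
    a, b = fit_line(eps[:n_plateau], s1[:n_plateau])
    i_min = min(range(len(s1)), key=lambda i: s1[i])  # bottom of the well
    for i in range(i_min, len(s1) - 1):
        g0 = s1[i] - (a * eps[i] + b)
        g1 = s1[i + 1] - (a * eps[i + 1] + b)
        if g0 <= 0.0 < g1:
            # Linear interpolation of the crossing between two samples.
            t = g0 / (g0 - g1)
            return eps[i] + t * (eps[i + 1] - eps[i])
    return None  # the curve never climbs back above the fitted line

def synthetic_s1(e):
    """Synthetic curve: plateau at 2.5, well down to 2.3, then a linear rise."""
    if e <= 1.0:
        return 2.5
    if e <= 1.005:
        return 2.5 - 40.0 * (e - 1.0)
    return 2.3 + 40.0 * (e - 1.005)

if __name__ == "__main__":
    eps = [0.99 + 0.001 * i for i in range(61)]
    s1 = [synthetic_s1(e) for e in eps]
    print(critical_value(eps, s1, n_plateau=10))
```

For real data the choice of `n_plateau` matters, since points inside the well would tilt the fitted line.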
6.3 Greene's method
Let $C_{\varepsilon,\omega}$ denote the invariant curve with irrational frequency $\omega$. We define the critical threshold at which $C_{\varepsilon,\omega}$ breaks down as
Fig. 5. (a) Plot of $s_1$ versus $\varepsilon$ for $\omega = 2\pi \cdot 34/55$, $N = 800$, $h = 0.001$; (b) zoom of (a) for $1.01 \le \varepsilon \le 1.016$ and computation of $\varepsilon_L$ (after [5]).
, }.
\[ M = \prod_{i=1}^{q} \begin{pmatrix} 1 + \varepsilon\cos\theta_i & 1 \\ \varepsilon\cos\theta_i & 1 \end{pmatrix} , \]
where $(\theta_1, \dots, \theta_q)$ are successive points on the periodic orbit associated to the mapping (30). From the area-preserving property, one has that $\det M = 1$. Let $T$ be the trace of $M$, whose eigenvalues are solutions of the equation $\lambda^2 - T\lambda + 1 = 0$. Then, if $|T| < 2$, the eigenvalues of $M$ are complex conjugates on the unit circle and the periodic orbit is stable. On the contrary, if $|T| > 2$ the eigenvalues are real and the periodic orbit is unstable.
To be concrete, let us fix a periodic orbit with frequency $2\pi p/q$, such that it is elliptic for small values of $\varepsilon$ (as well as for $\varepsilon = 0$). As $\varepsilon$ increases, the trace of the matrix $M$ eventually exceeds 2 in modulus and the periodic orbit becomes unstable.
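The stability test based on the trace can be illustrated with a short sketch. It assumes that the mapping (30) is the standard map $r' = r + \varepsilon\sin\theta$, $\theta' = \theta + r'$, whose fixed points $(r, \theta) = (0, \pi)$ and $(0, 0)$ serve as period-one orbits; the ordering of the matrix product (latest point on the left) is also an assumption, and all function names are illustrative.

```python
import math

def tangent_matrix(thetas, eps):
    """Product of the matrices [[1 + eps*cos(t), 1], [eps*cos(t), 1]]
    taken along the orbit points in `thetas`."""
    m = [[1.0, 0.0], [0.0, 1.0]]
    for t in thetas:
        c = eps * math.cos(t)
        step = [[1.0 + c, 1.0], [c, 1.0]]
        # Left-multiply, so that the most recent point acts last.
        m = [[sum(step[i][k] * m[k][j] for k in range(2)) for j in range(2)]
             for i in range(2)]
    return m

def trace(m):
    return m[0][0] + m[1][1]

def det(m):
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def is_stable(m):
    """Eigenvalues lie on the unit circle exactly when |T| < 2."""
    return abs(trace(m)) < 2.0

if __name__ == "__main__":
    for eps in (0.5, 1.0, 3.9, 4.1):
        m_pi = tangent_matrix([math.pi], eps)   # fixed point at theta = pi
        m_0 = tangent_matrix([0.0], eps)        # fixed point at theta = 0
        print(eps, trace(m_pi), is_stable(m_pi), trace(m_0), is_stable(m_0))
```

For the fixed point at $\theta = \pi$ the trace is $2 - \varepsilon$, so in this sketch the transition to instability occurs at $\varepsilon = 4$; for genuine periodic orbits the points $\theta_i$ must first be located numerically.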
Figure 6a (see [5]) shows the quantity $\varepsilon_{Gr}(p_k/q_k)$, which corresponds to the value of $\varepsilon$ marking the transition from stability to instability of the periodic orbit with frequency $2\pi p_k/q_k$. We selected the sequence of rational approximants to the golden ratio and we represented with a dotted line the estimated breakdown value of the golden mean invariant curve. Figure 6b provides a comparison between Greene's and Lyapunov's methods, by showing the plot of the relative error of the quantities $\rho_c(p_k/q_k)$ and $\varepsilon_{Gr}(p_k/q_k)$.
[Fig. 6: panel (a), $\varepsilon_{Gr}(p_k/q_k)$ for the rational approximants 1/2, 2/3, 3/5, ..., 89/144 to the golden ratio; panel (b), relative error between $\rho_c(p_k/q_k)$ and $\varepsilon_{Gr}(p_k/q_k)$.]
6.4 Results
In the framework of the standard map (30), we show some results on the behavior of the invariant curves with frequencies $\gamma$, $\omega_1$, $\omega_2$, where $\gamma = 2\pi[0; 1^\infty]$, $\omega_1 = 2\pi[0; 3, 12, 1^\infty]$, $\omega_2 = 2\pi[0; 2, 10, 1^\infty]$. We have implemented the three techniques presented in the previous sections, i.e. Padé, Greene and Lyapunov. These methods depend on the choice of some parameters and of tolerance errors. In particular, Lyapunov's method depends on the order $N$ of the truncation and on the increment $h$ of the perturbing parameter $\varepsilon$. For a fixed $N$ the agreement of the three methods is almost optimal taking a suitable value of the increment, typically $h = 0.001$.
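The rational approximants $p_k/q_k$ used throughout this section are the convergents of the continued fractions above. A minimal sketch of the standard recurrence $p_k = a_k p_{k-1} + p_{k-2}$, $q_k = a_k q_{k-1} + q_{k-2}$ follows; the helper `noble`, which truncates the infinite tail of ones, is a hypothetical name introduced for illustration.

```python
from fractions import Fraction

def convergents(partial_quotients):
    """Convergents p_k/q_k of the continued fraction [a0; a1, a2, ...]."""
    p_prev, p = 1, partial_quotients[0]
    q_prev, q = 0, 1
    out = [Fraction(p, q)]
    for a in partial_quotients[1:]:
        p_prev, p = p, a * p + p_prev
        q_prev, q = q, a * q + q_prev
        out.append(Fraction(p, q))
    return out

def noble(head, n_ones):
    """Partial quotients [0; head..., 1, 1, ...] with the tail of ones cut at n_ones."""
    return [0] + list(head) + [1] * n_ones

if __name__ == "__main__":
    print("golden ratio:", [str(c) for c in convergents(noble([], 10))[1:]])
    print("omega_1/2pi: ", [str(c) for c in convergents(noble([3, 12], 7))[1:]])
    print("omega_2/2pi: ", [str(c) for c in convergents(noble([2, 10], 8))[1:]])
```

The fractions printed for $\omega_1$ and $\omega_2$ can be compared directly with the $p/q$ columns of Tables 2 and 3.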
Table 1 shows a comparison of the three methods for the golden ratio approximants of the standard map (30). The series have been computed up to $N = 800$ and the increment of Lyapunov's method has been set to $h = 10^{-3}$ (compare with [5]).
Table 2 (see [5]) provides the results for the invariant curve associated to the standard map (30) with frequency equal to $\omega_1$; we underline the good agreement between Padé's and Lyapunov's methods, providing estimates of the analyticity radius. Due to the length of the calculations, it was possible to compute only the first few approximants while applying Padé's method (up to 38/117 in Table 2 and up to 85/178 in Table 3).
Similarly, we report in Table 3 the results for the invariant curve associated to the standard map (30) with frequency equal to $\omega_2$ (compare with [5]).
In the cases of the mappings $f_{12}$ and $f_{13}$, we found the results reported in Figure 7 (see [5]), where the abscissa $k$ refers to the order of the approximant $p_k/q_k$ of the golden ratio: for example, $k = 1$ corresponds to 2/3, while $k = 9$ corresponds to 89/144. With reference to Figure 7, the breakdown threshold, as computed by Greene's method (squares), amounts to $\varepsilon_c(\gamma) = 1.2166$ for $f_{12}$ (see Figure 7a) and to $\varepsilon_c(\gamma) = 0.7545$ for $f_{13}$ (see Figure 7b). Lyapunov's method has been applied with a series truncated at $N = 800$ and for $h = 0.01$.
p/q        Greene    Padé      Lyap.: h = 10^-3
2/3        1.5176    2.0501    2.0584
3/5        1.2856    1.4873    1.4913
5/8        1.1485    1.2495    1.2561
8/13       1.0790    1.1440    1.1492
13/21      1.0353    1.0753    1.0766
21/34      1.0106    1.0366    1.0397
34/55      0.9953    1.0115    1.0142
55/89      0.9862    0.9974    0.9960
89/144     0.9832    0.9909    0.9897
Table 1. Comparison of Greene's, Padé's and Lyapunov's methods for the rational approximants to the golden ratio in the framework of the mapping (30) (after [5]).
p/q        Greene    Padé      Lyap.
12/37      0.7486    0.5730    0.57513
13/40      0.7322    0.5447    0.54658
25/77      0.7232    0.5579    0.56063
38/117     0.7145    0.5539    0.55580
63/194     0.7134    -         0.55755
101/311    0.7085    -         0.55688
164/505    0.7073    -         0.55715
265/816    0.7066    -         0.55706
429/1321   0.7056    -         0.55705
Table 2. Same as Table 1 for the invariant curve with frequency $\omega_1$ (after [5]).
p/q        Greene    Padé      Lyap.
10/21      0.7487    0.5395    0.54165
11/23      0.7198    0.5008    0.49557
21/44      0.7060    0.5158    0.51795
32/67      0.6919    0.5084    0.51058
53/111     0.6869    0.5116    0.51354
85/178     0.6842    0.5105    0.51239
138/289    0.6801    -         0.51281
223/467    0.6785    -         0.51265
361/756    0.6781    -         0.51271
584/1223   0.6771    -         0.51268
Table 3. Same as Table 1 for the invariant curve with frequency $\omega_2$ (after [5]).
Fig. 7. Comparison of the results for the mappings $f_{12}$ (a) and $f_{13}$ (b) and for some rational approximants labeled by the index $k$; square: Greene's value, diamond: Padé's results, cross: Lyapunov's indicator $s_2$ (after [5]).
The results indicate that there is a good agreement between all methods as far as the approximants to the golden mean curve associated to (30) are analyzed. In this case the analyticity domain is close to a circle, so that the intersection with the positive real axis (providing an estimate of Greene's threshold) almost coincides with the analyticity radius, which is obtained by implementing Padé's and Lyapunov's methods.
This situation does not hold whenever the analyticity domain is not close to a circular shape. This happens, for example, if the invariant curves with rotation numbers $\omega_1 = 2\pi[0; 3, 12, 1^\infty]$ and $\omega_2 = 2\pi[0; 2, 10, 1^\infty]$ are considered; in these cases the two thresholds (breakdown value and analyticity radius) are markedly different.
References
1. V.I. Arnold. Proof of a Theorem by A.N. Kolmogorov on the invariance of quasi-periodic motions under small perturbations of the Hamiltonian, Russ. Math. Surveys 18(9), 1963.
2. V.I. Arnold (editor). Encyclopedia of Mathematical Sciences, Dynamical Systems III, Springer-Verlag 3, 1988.
3. G.A. Baker Jr., P. Graves-Morris. Padé Approximants, Cambridge University Press, New York, 1996.
4. A. Celletti, L. Chierchia. KAM Stability and Celestial Mechanics, Preprint (2003), http://www.mat.uniroma3.it/users/chierchia/PREPRINTS/SJV 03.pdf
5. A. Celletti, C. Falcolini. Singularities of periodic orbits near invariant curves, Physica D 170(2):87, 2002.
Abstract
This text is a brief introduction to Ruelle resonances, i.e. the spectra of
transfer operators and their relation with poles and zeroes of dynamical zeta
functions, and with poles of the Fourier transform of correlation functions.
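\[ \exp \sum_{m=1}^{\infty} \frac{z^m}{m}\, \mathrm{Tr}\, A^m \]
\[ \zeta_f(z) = \exp \sum_{m=1}^{\infty} \frac{z^m}{m}\, \#\, \mathrm{Fix}\, f^m , \]
where $f^m = f \circ \cdots \circ f$ ($m$ times).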
Viviane Baladi
\[ \mathcal{L}\varphi(x) = \sum_{\sigma_A(y) = x} \varphi(y) , \]
to functions $\varphi : \Sigma_A^+ \to \mathbb{C}$ which only depend on the coefficient $x_0$ (this is an $N$-dimensional vector space).
To finish, note that since the coefficients of $A$ are non-negative, the classical Perron-Frobenius theorem holds (cf. e.g. [3]). For example, if $A$ satisfies an aperiodicity assumption, i.e. if there exists $m_0$ such that all coefficients of $A^{m_0}$ are (strictly) positive, then the matrix $A$ (and thus the operator $\mathcal{L}$) has a simple eigenvalue $\lambda > 0$ equal to its spectral radius, with strictly positive right ($Au = \lambda u$) and left ($vA = \lambda v$) eigenvectors, while the rest of the spectrum is contained in a disc of radius strictly smaller than $\lambda$. In fact, $\lambda$ is the exponential of the topological entropy of $\sigma_A$ and the vectors $u$ and $v$ can be used to construct a $\sigma_A$-invariant measure which maximises entropy (cf. e.g. [3]).
This situation gives a good introductory example, but it is far too simple: in general, the transfer operator must be considered as acting on an infinite-dimensional space on which it is often not trace-class. Nevertheless, one can still sometimes interpret the zeroes of a dynamical zeta function as the inverses of some subset of the eigenvalues of this operator.
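For a finite matrix the link between the exponential of traces and the eigenvalues is the classical identity $\exp \sum_{m \ge 1} \frac{z^m}{m} \mathrm{Tr}\, A^m = 1/\det(I - zA)$, valid for $|z|$ smaller than the inverse of the spectral radius. The sketch below checks it numerically; the matrix $A$ (the transition matrix of the golden mean shift), the value of $z$ and the function names are illustrative choices, not taken from the text.

```python
import math

def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def zeta_partial(a, z, terms=80):
    """exp of the partial sum  sum_{m=1}^{terms} z^m/m * Tr(A^m)."""
    n = len(a)
    power = [[int(i == j) for j in range(n)] for i in range(n)]
    log_zeta = 0.0
    for m in range(1, terms + 1):
        power = mat_mul(power, a)                 # power == A^m (exact integers)
        tr = sum(power[i][i] for i in range(n))
        log_zeta += z ** m / m * tr
    return math.exp(log_zeta)

def inverse_det_2x2(a, z):
    """1 / det(I - z*A) for a 2x2 matrix A."""
    d = (1 - z * a[0][0]) * (1 - z * a[1][1]) - z * z * a[0][1] * a[1][0]
    return 1.0 / d

if __name__ == "__main__":
    A = [[1, 1], [1, 0]]      # spectral radius (1 + sqrt(5))/2
    z = 0.3                   # inside the disc |z| < 2/(1 + sqrt(5))
    print(zeta_partial(A, z), inverse_det_2x2(A, z))
```

The pole of the right-hand side closest to the origin sits at the inverse of the Perron-Frobenius eigenvalue, which is how the spectral radius can be read off from the zeta function in this finite-dimensional setting.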
1.2 Correlation functions and spectrum of the transfer operator
Let us consider a circle mapping $f$ which is a small $C^2$ perturbation of the map $x \mapsto 2x$ (modulo 1). This transformation is not invertible (it has two branches) and it is locally uniformly expanding (hyperbolicity). Let us associate to $f$ a weighted transfer operator
\[ \mathcal{L}\varphi(x) = \sum_{f(y) = x} \frac{\varphi(y)}{|f'(y)|} . \]
The choice of the weight $1/|f'|$, i.e. the jacobian of the inverse branch, implies that the dual of $\mathcal{L}$ acting (e.g.) on Radon measures preserves Lebesgue measure: $\mathcal{L}^*(dx) = dx$.
Recall that if $M$ is a bounded operator on a Banach space $B$, the essential spectral radius of $M$ is the smallest $\rho \ge 0$ so that the spectrum of $M$ outside of the disc of radius $\rho$ contains only isolated eigenvalues of finite multiplicity (cf. [13, 3]).
In this situation, we can prove quasi-compactness: the spectral radius of $\mathcal{L}$ on $C^1(S^1)$ is 1 while its essential spectral radius $\rho_{ess}$ is $< 1$ (cf. e.g. [3]). (Note that the spectrum on $L^1(\mathrm{Leb})$ or $C^0(S^1)$ is too big: on these two Banach spaces, each point of the open unit disc is an eigenvalue of infinite multiplicity [41].) In fact, for the operator acting on $C^1(S^1)$, we even have a Perron-Frobenius-type picture: 1 is a simple eigenvalue for a positive eigenvector $\varphi_0$ (up to normalisation, one can assume that the integral of $\varphi_0$ is 1), while the rest of the spectrum is contained in a disc of radius $\tau$ with $\rho_{ess} \le \tau < 1$.
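To make the operator concrete, here is a minimal sketch for the unperturbed map $x \mapsto 2x$ (mod 1), an assumption made for simplicity since the text considers small perturbations of it. In that case the two inverse branches are $x/2$ and $(x+1)/2$, the weight is $1/2$, and the invariant density is $\varphi_0 = 1$. The test function and the helper names are illustrative.

```python
import math

def transfer(phi):
    """Transfer operator of x -> 2x mod 1 with weight 1/|f'| = 1/2."""
    return lambda x: 0.5 * (phi(0.5 * x) + phi(0.5 * (x + 1.0)))

def iterate(phi, m):
    """Return L^m(phi) as a function; evaluating it costs 2**m calls to phi."""
    for _ in range(m):
        phi = transfer(phi)
    return phi

def mean(phi, n=1000):
    """Midpoint-rule approximation of the integral of phi over [0, 1]."""
    return sum(phi((i + 0.5) / n) for i in range(n)) / n

def phi(x):
    """A smooth 1-periodic test function."""
    return math.exp(math.cos(2.0 * math.pi * x))

if __name__ == "__main__":
    print("L(1) at 0.3:", transfer(lambda x: 1.0)(0.3))
    print("integral of phi:   ", mean(phi))
    print("integral of L(phi):", mean(transfer(phi)))
    for m in range(7):
        print(m, iterate(phi, m)(0.1))
```

The constant function is fixed by the operator, the integral is preserved, and the iterates of a smooth function flatten out towards its integral, in line with the Perron-Frobenius-type picture described above.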
There is thus a spectral gap. The eigenvalues in the annulus $\rho_{ess} < |z| \le \tau$, if there are any, are called resonances. To motivate this terminology, let us describe ergodic-theoretical consequences of these spectral properties. Before this, note that one can show that the dynamical zeta function
\[ \zeta_{1/|f'|}(z) = \exp \sum_{m=1}^{\infty} \frac{z^m}{m} \sum_{x \in \mathrm{Fix}\, f^m} |(f^m)'(x)|^{-1} \]
is meromorphic in the disc of radius $1/\rho_{ess}$, where its poles are the inverses of the eigenvalues of $\mathcal{L}$ acting on $C^1(S^1)$, i.e. the resonances (together with the simple pole at 1).
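Let us first observe that the absolutely continuous probability measure $\mu_0$ with density $\varphi_0$ (with respect to Lebesgue) is $f$-invariant: if $\varphi \in L^1(\mathrm{Leb})$,
\[ \int \varphi \circ f\, \varphi_0\, dx = \int \mathcal{L}((\varphi \circ f)\, \varphi_0)\, dx = \int \varphi\, \mathcal{L}(\varphi_0)\, dx = \int \varphi\, \varphi_0\, dx . \]
One can show that $\mu_0$ is ergodic, therefore the Birkhoff ergodic theorem says that for all $\varphi$ in $L^1(\mathrm{Leb})$ and $\mu_0$-almost every $x$ (i.e., Lebesgue almost every $x$!), the temporal averages $(1/m) \sum_{k=0}^{m-1} \varphi(f^k(x))$ converge to the spatial average $\int \varphi\, d\mu_0$.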
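We shall next see that this measure $\mu_0$ is exponentially mixing for test functions $\varphi_1$, $\varphi_2$ in $C^1(S^1)$. Since the spectral projector corresponding to the eigenvalue 1 is $\varphi \mapsto \varphi_0 \int \varphi\, dx$, we have the spectral decomposition
\[ \mathcal{L}\varphi = \varphi_0 \int \varphi\, dx + P\mathcal{L}\varphi , \]
with $P$ the spectral projector associated to the complement of 1 in the spectrum. This projector satisfies $\|P\mathcal{L}^m\| \le C \tilde\tau^m$ for any $\tau < \tilde\tau < 1$ and for the operator-norm acting on $C^1$. Therefore,
\[ \int \varphi_1 \circ f^m\, \varphi_2\, \varphi_0\, dx = \int \mathcal{L}^m(\varphi_1 \circ f^m\, \varphi_2\, \varphi_0)\, dx = \int \varphi_1\, \mathcal{L}^m(\varphi_2 \varphi_0)\, dx \]
\[ = \int \varphi_1 \Big[ \varphi_0 \int \varphi_2 \varphi_0\, dx + P\mathcal{L}^m(\varphi_2 \varphi_0) \Big] dx \]
\[ = \int \varphi_1\, d\mu_0 \int \varphi_2\, d\mu_0 + \int \varphi_1\, P\mathcal{L}^m(\varphi_2 \varphi_0)\, dx , \]
and since
\[ \Big| \int \varphi_1\, P\mathcal{L}^m(\varphi_2 \varphi_0)\, dx \Big| \le \int |\varphi_1|\, dx\; \|P\mathcal{L}^m(\varphi_2 \varphi_0)\|_{C^1} \le C \int |\varphi_1|\, dx\; \|\varphi_2 \varphi_0\|_{C^1}\, \tilde\tau^m , \]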
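The identity $\int \varphi_1 \circ f^m\, \varphi_2\, \varphi_0\, dx = \int \varphi_1\, \mathcal{L}^m(\varphi_2 \varphi_0)\, dx$ used above gives a direct way to compute correlations. The sketch below does this for the unperturbed doubling map (so $\varphi_0 = 1$), with the observable $x - 1/2$; this observable has a jump on the circle, so it is only of bounded variation rather than $C^1$, and is chosen here purely because the resulting decay $2^{-m}/12$ can be checked by hand. All names are illustrative.

```python
def transfer(phi):
    """Transfer operator of x -> 2x mod 1 with weight 1/2."""
    return lambda x: 0.5 * (phi(0.5 * x) + phi(0.5 * (x + 1.0)))

def iterate(phi, m):
    """Return L^m(phi) as a function."""
    for _ in range(m):
        phi = transfer(phi)
    return phi

def integral(g, n=2000):
    """Midpoint-rule approximation of the integral of g over [0, 1]."""
    return sum(g((i + 0.5) / n) for i in range(n)) / n

def correlation(phi1, phi2, m):
    """C(m) = int phi1 * L^m(phi2) dx - int phi1 dx * int phi2 dx  (phi0 = 1)."""
    lm = iterate(phi2, m)
    return integral(lambda x: phi1(x) * lm(x)) - integral(phi1) * integral(phi2)

def centered(x):
    """Observable x - 1/2, which satisfies L(centered) = centered / 2."""
    return x - 0.5

if __name__ == "__main__":
    for m in range(6):
        print(m, correlation(centered, centered, m), 2.0 ** (-m) / 12.0)
```

Each application of the operator halves this observable, so the correlations decay exactly like $2^{-m}$ in this example.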
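\[ \widehat{C}_{\varphi_1 \varphi_2}(\omega) = \sum_{m \in \mathbb{Z}} e^{i m \omega}\, C_{\varphi_1, \varphi_2}(m) , \]
\[ C_{\varphi_1, \varphi_2}(m) = \begin{cases} \int \varphi_1 \circ f^m\, \varphi_2\, d\mu_0 - \int \varphi_1\, d\mu_0 \int \varphi_2\, d\mu_0 & m \ge 0 , \\ \int \varphi_2 \circ f^{|m|}\, \varphi_1\, d\mu_0 - \int \varphi_2\, d\mu_0 \int \varphi_1\, d\mu_0 & m \le 0 , \end{cases} \]
\[ \sum_{m \in \mathbb{N}} \int \varphi_1\, P(e^{i\omega}\mathcal{L})^m(\varphi_2 \varphi_0)\, dx = \int \varphi_1 \sum_{m \in \mathbb{N}} P(e^{i\omega}\mathcal{L})^m(\varphi_2 \varphi_0)\, dx = \int \varphi_1\, (1 - e^{i\omega} P\mathcal{L})^{-1}(\varphi_2 \varphi_0)\, dx . \]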
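\[ \mathcal{L}\varphi(x) = \sum_{y : f(y) = x} g(y)\, \varphi(y) , \]
\[ \zeta_{f,g}(z) = \exp \sum_{m=1}^{\infty} \frac{z^m}{m} \sum_{x \in \mathrm{Fix}\, f^m} \prod_{k=0}^{m-1} g(f^k(x)) . \]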
One can sometimes show that it is meromorphic in a disc where its poles are
in bijection with the resonances.
Definition 3. Dynamical (Ruelle-Fredholm) determinant
Assume moreover that $f$ is (at least) $C^1$ and set $\mathrm{Fix}_h f^m = \{x \in \mathrm{Fix}\, f^m \mid \det(I - Df^m(x)) \ne 0\}$. The dynamical determinant is the power series
\[ d_{f,g}(z) = \exp - \sum_{m=1}^{\infty} \frac{z^m}{m} \sum_{x \in \mathrm{Fix}_h f^m} \frac{\prod_{k=0}^{m-1} g(f^k(x))}{\det(I - Df^m(x))} . \]
One can sometimes show that this series is holomorphic in a disc which is larger than the disc associated to $\zeta_{f,g}$, and that in this larger disc, its zeroes are in bijection with the resonances.
Definition 4. Correlation function
Let $\mu$ be an $f$-invariant probability measure (for example, a measure absolutely continuous with respect to Lebesgue or an equilibrium state for $\log |g|$). The correlation function for $(f, \mu)$ and a class of functions $\varphi : M \to \mathbb{C}$ is the function $C_{\varphi_1, \varphi_2} : \mathbb{N} \to \mathbb{C}$ defined for $\varphi_1$, $\varphi_2$ in this class by
\[ C_{\varphi_1, \varphi_2}(m) = \int \varphi_1 \circ f^m\, \varphi_2\, d\mu - \int \varphi_1\, d\mu \int \varphi_2\, d\mu . \]
Analogous concepts exist for continuous-time dynamics (flows, in particular geodesic flows in not necessarily constant negative curvature, an example of an intersection between hyperbolic and hamiltonian dynamics). The corresponding zeta function $\zeta(s)$ is then often holomorphic in a half-plane $\mathrm{Re}(s) > s_0$ and it admits a meromorphic extension in a larger half-plane.
We refer to the various surveys mentioned in the bibliography, which contain references to the fundamental articles of Smale, Artin-Mazur, Ruelle, etc.
determinant $d_{f,g}(z)$ [45, 42] (see also [44, 39] in the hyperbolic case). Let us also mention the recent results of Collet and Eckmann [40] who show that in general the essential rate of decay of correlations is slower than the smallest Lyapunov exponent, contrary to a widespread misconception.
4. The case of continuous-time dynamical systems is much more delicate. A meromorphic extension of the zeta function of a hyperbolic flow to a half-plane larger than the half-plane of convergence was obtained in the eighties by Ruelle and Pollicott [76, 74]. Parry and Pollicott [73] obtained a striking analogue of the prime number theorem for hyperbolic flows. This result was followed by many other counting results. Ikawa [86] proved a modified Lax-Phillips conjecture (see also [98]). However, in order to get exponential decay of correlations, a vertical strip without poles is required, and this is not always possible: Ruelle [75] constructed examples of uniformly hyperbolic flows which do not mix exponentially fast. Only recently could Dolgopyat [70, 71] prove (among other things) exponential decay of correlations for certain Anosov flows, by using oscillatory integrals. This result has consequences for billiards [97, 94, 91], yet another hyperbolic/hamiltonian system. Liverani very recently introduced a new method to prove exponential decay of correlations: instead of representing the flow as a (local) suspension of hyperbolic diffeomorphisms under return times (using the Poincaré map associated to Markov sections), he studies directly the semi-group of operators associated to the flow [14, 72].
Despite its length, the bibliography is not complete. We hope that the decomposition in items, although rather arbitrary, will make it more useful. We do not mention at all the vast existing literature on sub-exponential decay of correlations.
References
Surveys and books
1. V. Baladi. Dynamical zeta functions, Proceedings of the NATO ASI Real and Complex Dynamical Systems (1993), B. Branner and P. Hjorth, Kluwer Academic Publishers, Dordrecht, pages 1-26, 1995. See www.math.jussieu.fr/baladi/zeta.ps.
2. V. Baladi. Periodic orbits and dynamical spectra, Ergodic Theory Dynamical Systems, 18:255-292, 1998. See www.math.jussieu.fr/baladi/etds.ps.
3. V. Baladi. Positive Transfer Operators and Decay of Correlations, World Scientific, Singapore, 2000. Erratum available on www.math.jussieu.fr/baladi/erratum.ps.
4. V. Baladi. The Magnet and the Butterfly: Thermodynamic formalism and the ergodic theory of chaotic dynamics, Développement des mathématiques au cours de la seconde moitié du XXe siècle, Birkhäuser, Basel, 2000. Available on www.math.jussieu.fr/baladi/thermo.ps
5. V. Baladi. Spectrum and Statistical Properties of Chaotic Dynamics, Proceedings Third European Congress of Mathematics Barcelona 2000, pages 203-224, Birkhäuser, 2001. Available on www.math.jussieu.fr/baladi/barbal.ps
6. V. Baladi. Decay of correlations, AMS Summer Institute on Smooth ergodic theory and applications, (Seattle, 1999), Proc. Symposia in Pure Math. AMS, 69:297-325, 2001. See www.math.jussieu.fr/baladi/seattle.ps
7. R. Bowen. Equilibrium states and the ergodic theory of Anosov diffeomorphisms, Springer (Lecture Notes in Math., Vol. 470), Berlin, 1975.
8. P. Cvitanović, R. Artuso, R. Mainieri, G. Tanner, G. Vattay. Chaos: Classical and Quantum, Niels Bohr Institute, Copenhagen, 2005.
9. D. Dolgopyat, M. Pollicott. Addendum to: Periodic orbits and dynamical spectra, Ergodic Theory Dynam. Systems, 18:293-301, 1998.
10. N. Dunford, J.T. Schwartz. Linear Operators, Part I, General Theory, Wiley-Interscience (Wiley Classics Library), New York, 1988.
11. J.-P. Eckmann. Resonances in dynamical systems, IXth International Congress on Mathematical Physics, (Swansea, 1988), Hilger, Bristol, pages 192-207, 1989.
12. I. Gohberg, S. Goldberg, N. Krupnik. Traces and Determinants of Linear Operators, Birkhäuser, Basel, 2000.
13. T. Kato. Perturbation Theory for Linear Operators, Springer-Verlag, Berlin, 1984. Second Corrected Printing of the Second Edition.
14. C. Liverani. Invariant measures and their properties. A functional analytic point of view, Dynamical systems. Part II, pages 185-237, Pubbl. Cent. Ric. Mat. Ennio Giorgi, Scuola Norm. Sup., Pisa, 2003.
15. D.H. Mayer. The Ruelle-Araki transfer operator in classical statistical mechanics, Lecture Notes in Physics, Vol. 123, Springer-Verlag, Berlin-New York, 1980.
16. D. Mayer. Continued fractions and related transformations, Ergodic Theory, Symbolic Dynamics and Hyperbolic Spaces, T. Bedford et al., Oxford University Press, 1991.
17. W. Parry, M. Pollicott. Zeta functions and the periodic orbit structure of hyperbolic dynamics, Société Mathématique de France (Astérisque, vol. 187-188), Paris, 1990.
18. D. Ruelle. Resonances of chaotic dynamical systems, Phys. Rev. Lett., 56:405-407, 1986.
19. D. Ruelle. Dynamical Zeta Functions for Piecewise Monotone Maps of the Interval, CRM Monograph Series, Vol. 4, Amer. Math. Soc., Providence, NJ, 1994.
20. D. Ruelle. Dynamical zeta functions and transfer operators, Notices Amer. Math. Soc., 49:887-895, 2002.
21. M. Zinsmeister. Formalisme thermodynamique et systèmes dynamiques holomorphes, Panoramas et Synthèses, 4. Société Mathématique de France, Paris, 1996.
Analytical framework
22. V. Baladi, H.H. Rugh. Floquet spectrum of weakly coupled map lattices, Comm. Math. Phys., 220:561-582, 2001.
23. D. Fried. The zeta functions of Ruelle and Selberg I, Ann. Sci. École Norm. Sup. (4), 19:491-517, 1986.
24. D. Fried. Meromorphic zeta functions for analytic flows, Comm. Math. Phys., 174:161-190, 1995.
25. A. Grothendieck. Produits tensoriels topologiques et espaces nucléaires, (Mem. Amer. Math. Soc. 16), Amer. Math. Soc., 1955.
26. A. Grothendieck. La théorie de Fredholm, Bull. Soc. Math. France, 84:319-384, 1956.
27. G. Levin, M. Sodin, P. Yuditskii. A Ruelle operator for a real Julia set, Comm. Math. Phys., 141:119-131, 1991.
28. D. Mayer. On the thermodynamic formalism for the Gauss map, Comm. Math. Phys., 130:311-333, 1990.
29. D. Ruelle. Zeta functions for expanding maps and Anosov flows, Inv. Math., 34:231-242, 1976.
30. H.H. Rugh. Generalized Fredholm determinants and Selberg zeta functions for Axiom A dynamical systems, Ergodic Theory Dynam. Systems, 16:805-819, 1996.
31. H.H. Rugh. Intermittency and regularized Fredholm determinants, Invent. Math., pages 1-24, 1999.
Differentiable framework
39. M. Blank, G. Keller, C. Liverani. Ruelle-Perron-Frobenius spectrum for Anosov maps, Nonlinearity, 15:1905-1973, 2002.
40. P. Collet, J.-P. Eckmann. Liapunov Multipliers and Decay of Correlations in Dynamical Systems, J. Statist. Phys., 115:217-254, 2004.
41. P. Collet, S. Isola. On the essential spectrum of the transfer operator for expanding Markov maps, Comm. Math. Phys., 139:551-557, 1991.
42. D. Fried. The flat-trace asymptotics of a uniform system of contractions, Ergodic Theory Dynamical Systems, 15:1061-1073, 1995.
43. V.M. Gundlach, Y. Latushkin. A sharp formula for the essential spectral radius of the Ruelle transfer operator on smooth and Hölder spaces, Ergodic Theory Dynam. Systems, 23:175-191, 2003.
44. A. Kitaev. Fredholm determinants for hyperbolic diffeomorphisms of finite smoothness, Nonlinearity, 12:141-179, 1999. See also Corrigendum, 1717-1719.
45. D. Ruelle. An extension of the theory of Fredholm determinants, Inst. Hautes Études Sci. Publ. Math., 72:175-193, 1991.
Flows
70. D. Dolgopyat. On decay of correlations in Anosov flows, Ann. of Math., 147:357-390, 1998.
71. D. Dolgopyat. Prevalence of rapid mixing for hyperbolic flows, Ergodic Theory Dynam. Systems, 18:1097-1114, 1998.
72. C. Liverani. On contact Anosov flows, Preprint (2002). To appear in Ann. of Math.
73. W. Parry, M. Pollicott. An analogue of the prime number theorem for closed orbits of Axiom A flows, Ann. of Math. (2), 118:573-591, 1983.
74. M. Pollicott. On the rate of mixing of Axiom A flows, Invent. Math., 81:413-426, 1985.
75. D. Ruelle. Flots qui ne mélangent pas exponentiellement, C. R. Acad. Sci. Paris Sér. I Math., 296:191-193, 1983.
76. D. Ruelle. Resonances for Axiom A flows, J. Differential Geom., 25:99-116, 1987.
85. J. Hilgert, D. Mayer. Transfer operators and dynamical zeta functions for a class of lattice spin models, Comm. Math. Phys., 232:19-58, 2002.
86. M. Ikawa. Singular perturbation of symbolic flows and poles of the zeta functions, Osaka J. Math., 27:281-300, 1990.
87. S. Isola. Resonances in chaotic dynamics, Comm. Math. Phys., 116:343-352, 1988.
88. O. Jenkinson, M. Pollicott. Calculating Hausdorff dimensions of Julia sets and Kleinian limit sets, Amer. J. Math., 124:495-545, 2002.
89. D. Mayer. The thermodynamic formalism approach to Selberg's zeta function for PSL(2, Z), Bull. Amer. Math. Soc., 25:55-60, 1991.
90. T. Morita. Markov systems and transfer operators associated with cofinite Fuchsian groups, Ergodic Theory Dynam. Systems, 17:1147-1181, 1997.
91. F. Naud. Analytic continuation of a dynamical zeta function under a Diophantine condition, Nonlinearity, 14:995-1009, 2001.
92. F. Naud. Expanding maps on Cantor sets, analytic continuation of zeta functions with applications to convex co-compact surfaces, Preprint, 2003.
93. S.J. Patterson, P.A. Perry. The divisor of Selberg's zeta function for Kleinian groups, Duke Math. J., 106:321-390, 2001.
94. V. Petkov. Analytic singularities of the dynamical zeta function, Nonlinearity, 12:1663-1681, 1999.
95. M. Pollicott, A.C. Rocha. A remarkable formula for the determinant of the Laplacian, Invent. Math., 130:399-414, 1997.
96. M. Pollicott, R. Sharp. Exponential error terms for growth functions on negatively curved surfaces, Amer. J. Math., 120:1019-1042, 1998.
97. L. Stoyanov. Spectrum of the Ruelle operator and exponential decay of correlations for open billiard flows, Amer. J. Math., 123:715-759, 2001.
98. L. Stoyanov. Scattering resonances for several small convex bodies and the Lax-Phillips conjecture, Preprint, 2003.
ENS Lyon
46 allée d'Italie
69364 Lyon Cedex 07 (France)
Pierre.Borgnat@ens-lyon.fr
Pierre Borgnat
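the Eulerian velocity. Both velocities are related via the change of variable $u(r(0); t) = v(r(t); t)$. The partial differential equation for this Eulerian velocity is called the Navier-Stokes (NS) equation and reads:
\[ \underbrace{\partial_t v}_{\text{local derivative}} + \underbrace{(v \cdot \nabla) v}_{\text{convective derivative}} = \underbrace{-(1/\rho) \nabla p}_{\text{pressure}} + \underbrace{\nu \Delta v}_{\text{viscous friction}} + f_v . \qquad (1) \]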
Here $\rho$ is the density of the fluid. The term $f_v$ stands for volumic forces in the fluid (electric forces, gravity, ...). Internal friction in the fluid (supposed Newtonian) is proportional to the viscosity $\nu$. Due to this friction, the boundary conditions are taken so that the fluid has zero velocity relative to the boundaries. The friction also imposes that the motion of the fluid will decay if there is no forcing external to the fluid. For an incompressible flow, the continuity equation completes the problem: $\nabla \cdot v = 0$. Remark that the pressure term is non-local because of a Poisson equation that relates $p$ to $v$: $\Delta p = -\rho\, \partial^2 (v_i v_j)/\partial x_i \partial x_j$.
The NS equation could be analyzed from its inner symmetries but, because the boundaries and the forcing will usually not satisfy the same symmetries, a simple approach adopted by physicists is to study turbulence in open systems far from the boundaries, in order to find a possible generic behavior of an incompressible turbulent fluid, disregarding the specific geometry of the boundaries. The purpose of this part is to provide, first, an overview of the properties of this situation, called homogeneous turbulence, and second, some elements of its statistical modeling.
Dimensional analysis of turbulence.
A difficulty of the NS equation is the non-linearity of the convective term that is part of the inertial behavior of the fluid. On the one hand, one may expect solutions with irregular shapes but, on the other hand, the friction term works to impose some regularity on the solutions. The balance between the two effects is evaluated by engineering dimensional analysis [6]. Let $U$ be a typical velocity, and $L$ a typical length scale of the full flow (for instance the size of an experiment). Let us use the symbol $\sim$ for equality of typical values. Then: $(v \cdot \nabla) v \sim U^2/L$, and $\nu \Delta v \sim \nu U/L^2$. The ratio is the Reynolds number Re and equals $UL/\nu$. This is the only quantity left if one takes out the dimensions from the variables. When Re is large, the non-linear term is dominant and the flow becomes irregular, with motions at many different length scales. Typical turbulent flows seem far from having symmetries: the flow is disordered spatially; it is unpredictable temporally with strong variations from one time to another; neither is the velocity clearly stationary, displaying excursions far from its mean during long periods of time. Those events at long time-scales are mixed with unceasing short-time variations of the velocity.
A turbulent flow is very different from a flow with zero viscosity, even for the situation of fully developed turbulence, when $\mathrm{Re} \to \infty$. Indeed, the energy dissipated in the flow is never zero because the irregularity of the solutions
Fig. 1. Left: a typical Eulerian velocity signal v(t), in m/s, versus time (ms). Right: its corresponding surrogate dissipation (derivative of the squared velocity) versus time (ms). Those signals were measured by Pietropinto et al. as part of the GReC experiment [64] of fluid turbulence in low temperature gaseous helium.
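\[ \{(D_{H,\lambda} X)(t),\ t \in \mathbb{R}^+\} \stackrel{d}{=} \{X(t),\ t \in \mathbb{R}^+\} . \]
\[ \delta v(\lambda r; x, t) \stackrel{d}{=} \lambda^h\, \delta v(r; x, t) \quad \text{if } \lambda \to 0^+ . \qquad (2) \]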
This last property is also a prescription of the regularity of the solution because, if this relation holds for small separations $|r|$, one solution is to have $\delta v(r) = |r|^h$ for small $r$, and that rules the behavior of the derivative, and consequently of the dissipation. This defines the singularities and peaks expected in the dissipation signal.
With those symmetries, the only parameters left to describe the velocity are the mean dissipation $\varepsilon$, the viscosity $\nu$, the self-similarity exponent $h$ and the length-scale $r = |r|$ one considers. Kolmogorov supposes a full scale invariance so that all spatial scales behave the same, sharing the same mean dissipation, so that for any $r$, $\varepsilon = C[\delta v(r)]^2/[r/\delta v(r)]$, where $C$ is some constant. Thus $\delta v(r) = c_1 \varepsilon^{1/3} r^{1/3}$: the velocity has a unique exponent of self-similarity $h = 1/3$. The moment of order $p$ of $\delta v$ is called the structure function of order $p$ and obeys, according to this theory, the following relation:
\[ E\{|\delta v(r; x, t)|^p\} = c_p\, (\varepsilon r)^{p/3} \quad \text{if } \eta \ll r \ll L . \qquad (3) \]
$E$ is the mathematical expectation, i.e. the mean of the quantity; $c_p$ are constants. Here $L$ is an integral scale, that is a characteristic distance of the whole flow, for instance the scale of the forcing. The scales between $\eta$ and $L$ for which the scaling of Eq. (3) holds are called the inertial zone because friction is small at those scales and the inertial effects are dominant for the NS equation, especially the convection term. Note that for order 3, the exponent is 1 and this is fortunate because the Kármán-Howarth equation derived from the NS equation imposes so [33]. The scaling law for $p = 2$ imposes the spectrum of the velocity by means of the Wiener-Khinchin relation. Kolmogorov's well-known prediction is that the spectrum should be: $S_v(k) = c_2\, \nu^{5/4} \varepsilon^{1/4} (k\eta)^{-5/3}$ if $k$ is in-between $1/L$ and $1/\eta$ (the inertial zone). This is a property of long-range dependence: the spectrum and the correlation of the process decrease slowly. In this model, this is related to scale invariance, or self-similarity.
The prediction for the spectrum holds well, as seen on Figure 2. The structure functions as a function of $r$ are also roughly power laws $r^{\zeta_p}$, but not exactly [9]. But the general prediction of (3) is found to fail for other orders. Indeed, the experimental exponents $\zeta_p$ depart from the linear behavior $p/3$ that was predicted. We report in Figure 2 some properties of the structure functions: they look like power laws over the inertial range. On the right, we display the evolution of the exponents $\zeta_p$ of this power law with the order $p$ of the moment, and the probability density function of the increments $\delta v(r)$ for different $r$.
Multifractality: Characterization in terms of singularities.
The failure of the previous theory is related to the spatial and temporal intermittency of the dissipation: random bursts of activity exist and the regularity of the signal changes from one point to another, and also from one scale to another. The statistical self-similarity property (2) is now true only if $h$ is also a random variable that depends on $x$ and $t$. If this property holds for $\lambda \to 0^+$, $h$ is called the Hölder exponent of the signal at point $x$. The set of points sharing the same Hölder exponent is a complicated random set that is a fractal set with dimension $D(h)$. This is a multifractal model [32, 33] that describes the signal in terms of singularities at small scale. The underlying hypothesis is that all the statistics are ruled by those singularities. The complementary property of the multifractality is the conjecture of a relation between the singularity spectrum $D(h)$ and the scaling exponents $\zeta_p$, by means of a Legendre transform: $D(h) = \inf_p (hp + 1 - \zeta_p)$. Mathematical aspects of multifractality and of this equality can be found in [37, 38]. Experimentally, in order to measure the multifractal spectrum that is the core of this model, one has first to compute a multiresolution quantity, then use a Legendre transform that is a statistical measure of $D(h)$ from the exponents $\zeta_p$. Experiments now agree with $\zeta_p \simeq c_1 p - c_2 p^2/2$, where $c_1 \simeq 0.370$ and $c_2 \simeq 0.025$; this is an expansion in powers $p^n$, and the terms with $n \ge 3$ are too small to be correctly estimated nowadays. The corresponding singularity spectrum $D(h)$ is $1 - (h - c_1)^2/2c_2$, for values of $h$ such that $D(h) \ge 0$. The expected value of $h$ on a set of dimension 1 in the signal is 0.37, close to the 1/3 exponent predicted by Kolmogorov, but the local exponent fluctuates.
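The Legendre transform linking $\zeta_p$ and $D(h)$ can be checked numerically. The sketch below minimises $hp + 1 - \zeta_p$ over a grid of $p$ for the quadratic model $\zeta_p = c_1 p - c_2 p^2/2$ and compares with the closed form $1 - (h - c_1)^2/2c_2$; the grid bounds and resolution are arbitrary illustrative choices.

```python
C1 = 0.370
C2 = 0.025

def zeta(p, c1=C1, c2=C2):
    """Quadratic model of the scaling exponents: zeta_p = c1*p - c2*p^2/2."""
    return c1 * p - c2 * p * p / 2.0

def singularity_spectrum(h, p_min=-20.0, p_max=40.0, n=60001):
    """D(h) = inf_p (h*p + 1 - zeta_p), approximated on a uniform grid of p."""
    best = float("inf")
    for i in range(n):
        p = p_min + (p_max - p_min) * i / (n - 1)
        best = min(best, h * p + 1.0 - zeta(p))
    return best

def closed_form(h, c1=C1, c2=C2):
    """Closed-form Legendre transform of the quadratic model."""
    return 1.0 - (h - c1) ** 2 / (2.0 * c2)

if __name__ == "__main__":
    for h in (0.2, 0.3, 0.37, 0.45, 0.55):
        print(h, singularity_spectrum(h), closed_form(h))
```

The minimum is reached at $p = (c_1 - h)/c_2$, so the grid must contain that value for the range of $h$ of interest; with measured exponents $\zeta_p$ one would replace `zeta` by an interpolation of the data.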
[Fig. 2: panels showing log(structure functions) $S_1$ to $S_4$ versus log(r), the same quantities in ESS representation against $S_2$, and the probability density functions of the increments versus $\delta v/\sigma$ (linear scale), ranging from almost Gaussian to exponential tails.]
singularities was not proved nor derived from the NS equation, but only in
simpler dynamical systems [30, 31].
Another approach was to relate the fluctuations of the exponents h to the
dissipative scales $\eta$. Beneath the dissipative scale, the velocity is differentiable: $\delta v(r) = r\,\partial v/\partial x$. This small-scale regularisation is obtained via a local
dissipative scale [65, 34], defined as the scale where the local Reynolds number
$Re(r) = r\,\delta v(r)/\nu$ equals 1. In fact we have $\delta v(r) = U (r/L)^{h(x)}$ if $r > \eta(x)$,
so that $Re(r) = r\,\delta v(r)/\nu = (r/L)^{1+h(x)} Re$. The dissipative scale fluctuates locally as $\eta(h) = L\,Re^{-1/(1+h)}$, whereas K41 uses a fixed dissipative scale
$\eta = (\nu^3/\varepsilon)^{1/4}$, which is now the mean of the $\eta(h)$. Given this behavior, a unified description of the statistics $E\{|\delta v(r; x, t)|^p\}$ was derived, valid both in
the inertial and dissipative scales [22].
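As an illustration of the local dissipative scale, the short sketch below evaluates the scale at which the local Reynolds number reaches 1, for a few values of the exponent h. The function names and the numerical value of the Reynolds number are assumptions made for the example.

```python
def local_reynolds(r, h, reynolds, integral_scale=1.0):
    """Local Reynolds number Re(r) = (r/L)**(1+h) * Re, for a local exponent h."""
    return (r / integral_scale) ** (1.0 + h) * reynolds


def local_dissipative_scale(h, reynolds, integral_scale=1.0):
    """Scale eta(h) = L * Re**(-1/(1+h)) at which the local Reynolds number equals 1."""
    return integral_scale * reynolds ** (-1.0 / (1.0 + h))


if __name__ == "__main__":
    reynolds = 1.0e6  # hypothetical value, for illustration only
    for h in (0.2, 1.0 / 3.0, 0.5):
        eta = local_dissipative_scale(h, reynolds)
        print(h, eta, local_reynolds(eta, h, reynolds))
```

With h = 1/3 the sketch returns L Re^(-3/4), the usual K41 scaling of the dissipative scale; a smaller h (a stronger singularity) gives a smaller local dissipative scale.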
Characterization as random cascades.
We now examine further statistical aspects of the intermittency of the flows; for
this we stick to modeling only the statistics of the flows. One feature of equation (3) is notable: if the equation were true, the statistics of the random variable $\delta v(r)/(\varepsilon r)^{1/3}$
should be independent of r [18]. However, experimental measurements of the
probability density function (pdf) of $\delta v(r)$ show that its shape changes with
r, even in the inertial domain; see Figure 2. At large scale (close to L), the pdf
is almost a Gaussian; when probing smaller scales, exponential tails become
more and more prominent: rare intense events are more frequent at small scale.
This is the statistical face of intermittency.
This property is best modeled as a multiplicative random process, where
each scale is derived from the larger one. The general class of this model comes from the Mandelbrot martingales [42, 69] and was also developed from
the experimental data in turbulence [59, 23, 63]. The challenge is to model dependencies between scales, for instance by means of multipliers between scales $W(r_1, r_2)$ defined by $\delta v(r_2) = W(r_1, r_2)\,\delta v(r_1)$. For the probability density function $P_{r_1}(\log|\delta v|)$ at scale $r_1$, this equation is a convolution between $P_{r_1}$ and the pdf of the multipliers $\log W(r_1, r_2)$. Because
the relation holds for every couple of scales, the relevant solutions are infinitely divisible distributions. For instance, one can explicitly write [18]:
$P_{r_2}(\log|\delta v|) = G^{*[n(r_2) - n(r_1)]} * P_{r_1}(\log|\delta v|)$, where $*$ is a convolution. G is
here the kernel of the cascade, that is the operator that maps the fluctuations from one scale $r_1$ to another $r_2$; equivalently, it gives the distribution of
$\log W(r_1, r_2)$. Derived from this, the structure functions read:
(4)
obeys a power law with exponents $\zeta_p = H(p)$. If not, the property is the so-called Extended Self-Similarity, because all orders share the same law $e^{n(r)}$
and for instance, with $\zeta_3 = 1$: $E\{|\delta v(r; x, t)|^p\} = (E\{|\delta v(r; x, t)|^3\})^{H(p)}$, as
illustrated in Figure 2.
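The Extended Self-Similarity relation quoted above can be illustrated on synthetic structure functions. The sketch below is not fitted to any data: it assumes a hypothetical third-order structure function that is not a pure power law, builds the order-6 structure function from the relation with H(p) = zeta_p/zeta_3, and compares local log-log slopes against r and against the third-order structure function. All names and numerical choices are assumptions.

```python
import math


def zeta(p, c1=0.370, c2=0.025):
    """Quadratic model of the scaling exponents (values quoted earlier in the text)."""
    return c1 * p - 0.5 * c2 * p * p


def ess_exponent(p):
    """H(p) = zeta_p / zeta_3, normalised so that H(3) = 1."""
    return zeta(p) / zeta(3)


def third_order(r):
    """Hypothetical S_3(r): the power law r bent by a logarithmic correction."""
    return r * (1.0 + 0.5 * math.log(1.0 / r))


def structure_function(r, p):
    """ESS relation: S_p(r) = S_3(r) ** H(p)."""
    return third_order(r) ** ess_exponent(p)


def local_slopes(xs, ys):
    """Finite-difference slopes between consecutive points."""
    return [(ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i]) for i in range(len(xs) - 1)]


scales = [2.0 ** (-j) for j in range(1, 11)]
log_r = [math.log(r) for r in scales]
log_s3 = [math.log(third_order(r)) for r in scales]
log_s6 = [math.log(structure_function(r, 6)) for r in scales]

direct_slopes = local_slopes(log_r, log_s6)  # drift with scale: no power law in r
ess_slopes = local_slopes(log_s3, log_s6)    # constant, equal to H(6)

if __name__ == "__main__":
    print(min(direct_slopes), max(direct_slopes))
    print(min(ess_slopes), max(ess_slopes), ess_exponent(6))
```

The slopes against log(r) vary with the scale, whereas the slopes against log S_3 are constant: this is why plotting one structure function against another extends the apparent scaling range.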
1.3 Vortex modeling for turbulence and oscillating singularities
The models reported were built on multi-scale properties of the velocity and
on its singularities, and they are good descriptions of the data. Nevertheless,
these models lack connections with the NS equation and with the structured
organization of turbulent flows, which are not purely random flows. One would
like to characterize a flow from its own structures. Experiments on turbulence
show that there are intense vortices: objects similar to stretched filaments around which the particles are mainly swirling [25]. The singularities in velocity
signals could then be understood as features of a few organized objects with a
complex inner structure and a singular behavior near their core [51, 36].
A mechanism could be spiraling structures, analogous to the phenomenon of
a Kelvin-Helmholtz instability [52]. Lundgren studied a specific collection of
elongated vortices having a spiraling structure in their orthogonal section, and
that are solutions of the NS equation given a specified strain [46]. It was shown
that such a collection could be responsible for a spectrum in $k^{-5/3}$ and for an intermittency of the structure functions consistent with modern measurements of
$\zeta_p$ [66]. Turbulence is understood in this case as some superposition of building objects with complex geometrical characteristics, such as oscillations or
fractality (now in a geometrical, not statistical, way).
A simple model for corresponding Eulerian velocity signals would be an
accumulation of complex singularities. This is different from modeling singularities in complex times, in the sense that here it is the exponent that is complex,
$(t - t_0)^{h+i\beta}$, not the central time $t_0$ of the singularity. See some examples of
those functions in Figure 5. The exponent $\beta$ is responsible for oscillations in
the signal, and multifractal estimation is perturbed by such oscillations [1].
The Fourier spectrum of a function $e^{-a(t-t_0)}(t - t_0)^{h+i\beta}$ behaves like
$e^{2\beta\,\mathrm{atan}(2\pi\nu/a)}\,|4\pi^2\nu^2 + a^2|^{-h-1}$; except at low frequencies, that is when $\nu \gg a$,
the spectrum scales like $|\nu|^{-2h-2}$. This is a power law, so such functions can be used as
basis functions to build a synthetic signal with properties of turbulence. A sum
of many functions of this kind may have multifractal properties that depend
on the distribution of the h and $\beta$ exponents [17]. One is then interested in
finding whether or not there are such oscillations in velocity signals.
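To make the oscillating behavior concrete, here is a minimal sketch that samples the real part of a single complex singularity and counts its zero crossings per decade of distance to the central time. It assumes the exponent is written h + i*beta, so that the phase is beta*ln(t - t0); the parameter values and function names are hypothetical.

```python
import math


def oscillating_singularity(t, t0=0.0, h=0.3, beta=20.0):
    """Real part of (t - t0)**(h + i*beta) for t > t0."""
    u = t - t0
    return u ** h * math.cos(beta * math.log(u))


def zero_crossings(t_start, t_stop, samples=20000, **params):
    """Count sign changes on a geometric grid between t_start and t_stop."""
    ratio = (t_stop / t_start) ** (1.0 / samples)
    count, t = 0, t_start
    previous = oscillating_singularity(t, **params)
    for _ in range(samples):
        t *= ratio
        current = oscillating_singularity(t, **params)
        if previous * current < 0.0:
            count += 1
        previous = current
    return count


if __name__ == "__main__":
    for lo, hi in ((1e-3, 1e-2), (1e-2, 1e-1), (1e-1, 1.0)):
        print(lo, hi, zero_crossings(lo, hi))
```

Each decade contains about beta*ln(10)/pi zero crossings, whatever its distance to t0: the oscillations are log-periodic and accumulate at the singularity, which is the chirp behavior that perturbs multifractal estimation.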
The consequences of the existence of spiraling structures for Lagrangian
velocity would be the existence of swirling motions when a particle is close to
a vortex core. Far from vortices, the motion should be almost ballistic, with
small acceleration. A consequence is that expected Lagrangian trajectories will
go through periods of large acceleration and periods of almost no acceleration.
Non-stationary descriptions would then give interesting characterizations of
the velocity.
The vortices and the swirling motions are described by the vorticity
$\omega = \nabla \times v$. Vorticity is related to dissipation since $\varepsilon = \nu\,\langle|\omega|^2\rangle_r$. If vortices are relevant features of a flow, vorticity should be strongly organized in
those specific structures. One expects that they can be detected as isolated
objects, and a question is their role in intermittency. Here too, the non-stationary
evolution of those objects is an expected feature.
To sum up, the general problem is that one can not easily track at the same
time the three kinds of interesting properties for turbulence: non-stationarity
of the signals; the inner oscillating or geometric structure; and the statistical
self-similar properties (exponent h or multifractality) of the spiraling vortices
or their consequence for velocity.
Alternative representations of signals.
Dealing with these three properties, we know how to construct a representation jointly suited to two of them at the same time. The third one is then
difficult to assess.
1) Time evolution and self-similarity: statistical methods using wavelets
are adapted to multifractal models or random cascades because they probe
statistical quantities of stationary signals with relevant self-similar properties
but no inner oscillations [1, 39].
2) Time evolution and Fourier analysis: modern Lagrangian and vorticity
measurements are made possible by following the instant variation of the
Fourier spectrum of some non-stationary signal. Neither the temporal nor the
spectral representation is enough: time-frequency representations that unfold
the information jointly in time and frequency [28] are needed.
3) Self-similarity and inner geometry: one may be interested in oscillations and self-similarity at the same time. It is known that wavelets are not
well adapted to the study of oscillations [1]. A variant is to measure geometry in a
non-stationary context (since self-similarity implies non-stationarity). Ad hoc
procedures constructed on the wavelet transform [43] or on the Mellin-time
representations [17] were considered, but for now without clear-cut results.
The third part of this section is devoted to the Mellin representation, which is
adapted to probe self-similarity and some features of geometry because it is
based on self-similar oscillating functions $(t - t_0)^{h+i\beta}$.
To conclude this overview of turbulence, let us summarize the complexity of fluid turbulence. The problem is driven by a non-linear PDE that
resists mathematical analysis. Still, we have at our disposal strong phenomenological properties on which to build stochastic models of the velocity. The signals are
irregular and intermittent, and one would like to question their (multi)fractal
aspects and their singularities, but also situations where their geometrical organization or some non-stationary properties are more relevant. Because there
exists no single method that captures all these features, multiple tools of signal
processing are useful.
X(t)e
2it
(5)
t=1
$\int v(u)\,\psi([u - t]/a)\,du/a$,
(6)
provided that the mother wavelet $\psi(t)$ has zero integral. The wavelet is further
characterized by an integer $N \geq 1$, the number of vanishing moments. The
representation is then blind to polynomial trends of order less than N, and
this gives robustness to the representation regarding the slow, large-period
excursions that one finds in signals of turbulent velocity. Velocity increments
are the poor man's wavelet, setting $\psi(u) = \delta(u + 1) - \delta(u)$ and letting the increment lag
be the scale variable a. This wavelet has only one vanishing moment, N = 1.
With a larger number of vanishing moments, wavelets give good methods of
estimation when one expects power law statistics, such as self-similarity or
multifractality. We report briefly two estimation methods.
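The role of the number of vanishing moments can be seen on finite differences. The sketch below is an illustration only, with hypothetical function names: first-order increments (one vanishing moment) cancel a constant but not a linear trend, whereas second-order differences (two vanishing moments) cancel both.

```python
def increments(signal, lag):
    """First-order increments v(t + lag) - v(t): one vanishing moment (N = 1)."""
    return [signal[i + lag] - signal[i] for i in range(len(signal) - lag)]


def second_differences(signal, lag):
    """Second-order differences v(t + 2 lag) - 2 v(t + lag) + v(t): N = 2."""
    return [signal[i + 2 * lag] - 2.0 * signal[i + lag] + signal[i]
            for i in range(len(signal) - 2 * lag)]


constant = [3.0 for _ in range(64)]
linear_trend = [3.0 + 0.5 * t for t in range(64)]

if __name__ == "__main__":
    print(set(increments(constant, 4)))              # constant removed
    print(set(increments(linear_trend, 4)))          # linear trend leaks: 0.5 * lag
    print(set(second_differences(linear_trend, 4)))  # linear trend removed
```

On real velocity signals, the slow large-period excursions play the role of the trend, which is why wavelets with more vanishing moments give more robust estimates than plain increments.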
Wavelet transforms and singularities.
A property of the wavelet transform is that it captures the singularities of a signal at the maxima of the wavelet transform [48, 37]. Let us assume that $v(t + \tau) - v(t)$
behaves like $|\tau|^h$ near point t. If the number N of vanishing moments of
$\psi$ is higher than h, the wavelet transform will have a maximum in the scale-time cone (a, u) defined by $|u - t| \leq Ca$ for some constant C. This maximum
(7)
(8)
Moreover, it has been proven that the $\{d_X(j, k), k \in \mathbb{N}\}$ form short range
dependent sequences as soon as $N - 1 > H$. This means that they no longer
suffer from statistical difficulties implied by the long memory property. In
particular, the time averages $S(j; p) = \frac{1}{n_j}\sum_{k=1}^{n_j} |d_X(j, k)|^p$ can then be used
as relevant, efficient and robust estimators for $E\{|d_X(j, k)|^p\}$. The possibility of
varying N brings robustness to these analysis and estimation procedures. The
performance of the estimators has been studied, see for instance [3]. One can then
characterize all the statistics of X from the following estimation procedure:
for a velocity signal v, a weighted linear regression of $\log_2 S(j; p)$ against
$\log_2 2^j = j$, performed in the limit of the coarsest scales, provides an
estimate of the exponents $\zeta_p$ of the structure functions $E\{|\delta v(r)|^p\}$.
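The estimation procedure can be sketched on a synthetic signal. The example below is a simplified stand-in, not the estimator studied in [3]: it uses plain increments at dyadic lags (the poor man's wavelet) instead of discrete wavelet coefficients, an unweighted regression over a fixed range of octaves, and a Gaussian random walk as test signal, for which the expected exponents are p/2. All names and parameters are assumptions.

```python
import math
import random


def random_walk(n, seed=0):
    """Gaussian random walk, a self-similar test signal with H = 1/2."""
    rng = random.Random(seed)
    x, path = 0.0, []
    for _ in range(n):
        x += rng.gauss(0.0, 1.0)
        path.append(x)
    return path


def structure_function(signal, lag, p):
    """Time average of |v(t + lag) - v(t)|**p, the estimator S(j; p) at lag = 2**j."""
    values = [abs(signal[i + lag] - signal[i]) ** p for i in range(len(signal) - lag)]
    return sum(values) / len(values)


def scaling_exponent(signal, p, octaves=range(1, 7)):
    """Least-squares slope of log2 S(j; p) against j (unweighted regression)."""
    xs = list(octaves)
    ys = [math.log2(structure_function(signal, 2 ** j, p)) for j in xs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = sum((x - mx) ** 2 for x in xs)
    return num / den


if __name__ == "__main__":
    walk = random_walk(2 ** 14)
    for p in (1, 2, 3):
        print(p, scaling_exponent(walk, p))
```

For this monofractal signal the estimated exponents are close to a straight line in p; for turbulent velocity the same regression would reveal the curvature of the exponents discussed earlier.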
Combining the WTMM idea and the properties of the discrete wavelet transform, Jaffard proposed an exact characterization of multifractal signals using
wavelet leaders (local maxima of discrete wavelet coefficients) [39] that are
now developed as signal processing tools [45].
Fig. 3. Lagrangian velocity of a particle in turbulence (from [50]). Left: the Doppler
signal whose instantaneous frequency gives the velocity of the tracked solid particle
in a turbulent fluid, and its time-frequency representation. Right: acceleration, velocity and trajectory, reconstructed for two components from the measurement of
velocity by Doppler effect.
A general class is obtained by applying some smoothing in time and/or frequency. Such a distribution represents well the energy of the signal because
of the following physical properties.
1. Marginals in time and frequency: $\int W_v(t, \nu)\,d\nu = |v(t)|^2$; $\int W_v(t, \nu)\,dt = |V(\nu)|^2$ if V is the Fourier transform of v.
2. Covariances with time and frequency shifts: $W_v(t - \tau, \nu)$ is the transform
of $v(t - \tau)$ and $W_v(t, \nu - f)$ is the transform of $e^{2i\pi f t}v(t)$.
3. Instantaneous frequency: the mean frequency $\int \nu\,W_v(t, \nu)\,d\nu \,/ \int W_v(t, \nu)\,d\nu$ is equal to
the instantaneous frequency of the signal v(t), that is the derivative of the
phase of the analytic signal associated to v(t).
Representations of this kind are used, because of these properties, to analyze the non-stationary signals of Lagrangian experiments and of vorticity
measurements.
Fig. 4. Measurement of vorticity by acoustic scattering [16]. Up: examples of recorded signals for two different scattered waves at the same time by the same volume: they both represent the same $\tilde\omega_i(k, t)$ along time t. Down: quadratic time-frequency representation of one signal, exhibiting packets of structured vorticity
advected through the measurement volume. [16]
their Lagrangian velocities u(t) [47, 50]. One solution uses high-speed detectors to record the trajectories, and the second one relies on tracking by sonar
methods. In both cases the experiment deals with a non-stationary signal that
should be tracked in position and value along time. In the second experiment,
ultrasonic waves are reflected by the particle and the Doppler effect catches
its velocity. Figure 3 shows a sample experimental signal whose instantaneous
frequency is the Lagrangian velocity. A time-frequency analysis follows the
instantaneous frequency and thus u(t). Acceleration, velocity and trajectory
are reconstructed from these data. The signals contain many oscillating events
such as the one shown here, and many more trajectories which are almost
smooth and ballistic between short periods of time with strong accelerations.
This is consistent with the existence of a few swirling structures, but a clear
connection between oscillations and intermittency has not been made. So far, statistical analysis of the data shows that Lagrangian velocity is intermittent [50],
and this is well described by a multifractal model analogous to the one for
Eulerian velocity [22].
Measurements of vortices and of vorticity.
Instead of trying to find indirect effects of the vortices, the intermittency of
turbulence was sought directly in the vorticity. Measuring $\omega$ locally is difficult and so far not reliable. Using the sound scattering property of vorticity,
an acoustic spectroscopy method was developed [16]. The method measures a
time-resolved Fourier component of vorticity, $\tilde\omega_i(k, t) = \int \omega_i(r, t)\,e^{-2i\pi k\cdot r}\,dr$,
summed all over some spatial volume. Figure 4 shows recorded signals of
scattering amplitudes for two different incident waves; they look alike because
both are measurements of the same quantity, $\tilde\omega_i(k, t)$. The intermittency here
is the existence of bursts of vorticity that cross the measurement volume; those
packets are characteristic of some structuring of vorticity, which could be
vortices. They are revealed in the time-frequency decomposition of one signal on the right. The intermittency is well captured by the description of a
slow non-stationary activity that drives many short-time bursts, and so causes
multi-scale properties [62].
2.4 Mellin representation for self-similarity
Another signal processing method uses oscillating functions as basis functions:
the Mellin transformation. Its interest is that it encompasses both self-similar and oscillating properties in one description. Because those tools are
less known, we will survey some of their properties with more mathematical
details.
Dilation and Mellin representation.
We aim at nding a formalism suited to scale invariance. Self-similarity is a
statistical invariance under the action of dilations. Given exponent H, the
group $\{D_{H,\lambda}, \lambda \in \mathbb{R}^+_*\}$ is a continuous unitary representation of $(\mathbb{R}^+_*, \times)$ in
the space $L^2(\mathbb{R}^+, t^{-2H-1}dt)$. The associated harmonic analysis is the Mellin representation. Indeed, the hermitian generator of this group is C defined as: $2i\pi(CX)(t) = (-H + t\,d/dt)X(t)$, so that $D_{H,\lambda} = e^{2i\pi(\ln\lambda)C}$. The operator C characterizes a scale because its eigenfunctions are unaffected by
scale changes (dilations), so the eigenvalues are a possible measure of scale.
Those eigenfunctions $E_{H,\beta}(t)$ satisfy $dE_{H,\beta}(t)/E_{H,\beta}(t) = (H + 2i\pi\beta)\,dt/t$, thus
$E_{H,\beta}(t) = t^{H+2i\pi\beta}$ up to a multiplicative constant. One obtains the basis of
Mellin functions with associated representation:
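$(M_H X)(\beta) = \int_0^{+\infty} t^{-H-2i\pi\beta}\,X(t)\,\frac{dt}{t}$, $\quad X(t) = \int_{-\infty}^{+\infty} E_{H,\beta}(t)\,(M_H X)(\beta)\,d\beta$
(9)
and
$L_H S_\tau L_H^{-1} = D_{H,e^\tau}$.
(11)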
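$(M_H X)(\beta) = \int_0^{+\infty} t^{-H}\,X(t)\,t^{-2i\pi\beta-1}\,dt$
(12)
$= \int_{-\infty}^{+\infty} Y(\tau)\,e^{-2i\pi\beta\tau}\,d\tau$
(13)
$\int_0^{+\infty} c_X(k)\,k^{-2i\pi\beta-1}\,dk = (M_0 c_X)(\beta) = \Xi_X(\beta)$.
(14)
H-ss processes also admit a harmonisable decomposition on the Mellin basis, so that $X(t) = \int t^{H+2i\pi\beta}\,d\tilde{X}(\beta)$, with uncorrelated spectral increments
$d\tilde{X}(\beta)$. Thus we have $E\{d\tilde{X}(\beta_1)\,\overline{d\tilde{X}(\beta_2)}\} = \delta(\beta_1 - \beta_2)\,\Xi_X(\beta_1)\,d\beta_1\,d\beta_2$.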
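The link between the Mellin transform of X and the Fourier transform of its stationary generator can be checked numerically. The sketch below assumes the convention (M_H X)(beta) = integral of t^(-H - 2 i pi beta) X(t) dt/t, and uses a hypothetical test function X(t) = t^H exp(-(ln t)^2 / 2), whose generator Y(u) = exp(-u^2/2) has the known Fourier transform sqrt(2 pi) exp(-2 pi^2 beta^2). The quadrature grid is an arbitrary choice.

```python
import math


def mellin_transform(x, H, beta, half_width=12.0, steps=4000):
    """Trapezoidal approximation of int_0^inf t**(-H - 2i*pi*beta) x(t) dt/t,
    computed on the geometric grid t = exp(u), for which dt/t = du."""
    du = 2.0 * half_width / steps
    total = 0j
    for i in range(steps + 1):
        u = -half_width + i * du
        t = math.exp(u)
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * t ** complex(-H, -2.0 * math.pi * beta) * x(t)
    return total * du


def make_test_signal(H):
    """X(t) = t**H * exp(-(ln t)**2 / 2): hypothetical signal with a Gaussian generator."""
    return lambda t: t ** H * math.exp(-0.5 * math.log(t) ** 2)


def fourier_of_generator(beta):
    """Fourier transform of Y(u) = exp(-u**2/2) at frequency beta."""
    return math.sqrt(2.0 * math.pi) * math.exp(-2.0 * math.pi ** 2 * beta ** 2)


if __name__ == "__main__":
    H = 0.4
    signal = make_test_signal(H)
    for beta in (0.0, 0.1, 0.2):
        print(beta, mellin_transform(signal, H, beta), fourier_of_generator(beta))
```

The geometric grid is the natural sampling for a Mellin transform: it turns the integral into an ordinary Fourier integral in the variable u = ln t.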
Among the tools coming from the Lamperti equivalence, there are scale-invariant filters. A linear operator G is invariant for dilations if it satisfies
$G D_{H,\lambda} = D_{H,\lambda} G$ for any scale ratio $\lambda \in \mathbb{R}^+_*$. Using equation (11), we may
replace $D_{H,\lambda}$ by $S_{\log\lambda}$ and we obtain the equality:
$(L_H^{-1} G L_H)\,S_{\log\lambda} = S_{\log\lambda}\,(L_H^{-1} G L_H)$.
Thus, $L_H^{-1} G L_H = \mathcal{H}$ is a linear stationary operator, so it acts as a filter
by means of a convolution. The Lamperti transformation maps addition onto
multiplication, so that G will act by means of a multiplicative convolution
instead of the usual one:
$(GX)(t) = \int_0^{+\infty} g(t/s)\,X(s)\,\frac{ds}{s} = \int_0^{+\infty} g(s)\,X(t/s)\,\frac{ds}{s}$.
(15)
Let us consider A = GX with {X(t), t > 0} an H-ss process and G a scale-invariant filter. Then A(t) is also self-similar because
$D_{H,\lambda} A = D_{H,\lambda} G X = G (D_{H,\lambda} X) \stackrel{d}{=} G X = A$.
This filter acts on the Mellin spectrum as a multiplication:
$\Xi_A(\beta) = |(M_H g)(\beta)|^2\,\Xi_X(\beta)$.
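The multiplicative action on the Mellin spectrum reflects the fact that the Mellin functions are eigenfunctions of a multiplicative convolution, with the Mellin transform of g as eigenvalue. The sketch below checks this numerically, assuming the form (15) of the filter and a hypothetical log-Gaussian kernel g(t) = exp(-(ln t)^2 / 2), for which the eigenvalue has the closed form sqrt(2 pi) exp((H + 2 i pi beta)^2 / 2). Names and grid sizes are choices of this illustration.

```python
import cmath
import math


def log_gaussian(t):
    """Hypothetical kernel g(t) = exp(-(ln t)**2 / 2), chosen for its known Mellin transform."""
    return math.exp(-0.5 * math.log(t) ** 2)


def mellin_function(t, H, beta):
    """E_{H,beta}(t) = t**(H + 2i*pi*beta)."""
    return t ** complex(H, 2.0 * math.pi * beta)


def scale_invariant_filter(g, x, t, half_width=12.0, steps=4000):
    """(GX)(t) = int_0^inf g(t/s) X(s) ds/s, trapezoidal rule on s = t*exp(sigma)."""
    d_sigma = 2.0 * half_width / steps
    total = 0j
    for i in range(steps + 1):
        sigma = -half_width + i * d_sigma
        s = t * math.exp(sigma)
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * g(t / s) * x(s)
    return total * d_sigma


def eigenvalue_closed_form(H, beta):
    """(M_H g)(beta) for the log-Gaussian kernel."""
    exponent = complex(H, 2.0 * math.pi * beta)
    return math.sqrt(2.0 * math.pi) * cmath.exp(0.5 * exponent * exponent)


def eigenvalue_numeric(t, H, beta):
    """Ratio (G E_{H,beta})(t) / E_{H,beta}(t), expected not to depend on t."""
    filtered = scale_invariant_filter(log_gaussian, lambda s: mellin_function(s, H, beta), t)
    return filtered / mellin_function(t, H, beta)


if __name__ == "__main__":
    for t in (0.5, 3.0):
        print(t, eigenvalue_numeric(t, 0.5, 0.1), eigenvalue_closed_form(0.5, 0.1))
```

The ratio is the same at every t, so the filter only rescales each Mellin mode; taking the squared modulus of this factor gives the multiplication of the Mellin spectrum stated above.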
$X(t) = \int_0^{+\infty} g(t/s)\,V(s)\,\frac{ds}{s}$, with $E\{V(t)V(s)\} = \sigma^2\,t^{2H+1}\,\delta(t - s)$. (16)
The random noise V(t) is white and Gaussian but non-stationary; it is the
image by $L_H$ of the Wiener process. The self-similar process X is defined by
g; the second-order properties are covariances given by means of
$c_X(k) = \sigma^2\,k^{-H} \int_0^{+\infty} g(k\theta)\,g(\theta)\,\theta^{-2H-1}\,d\theta$,
(1/2 + 2i)
2
H 2 + 4 2 2 (H + 2i)
(17)
[Fig. 5: plot panels not recoverable from the source. Panel titles: Mellin functions $E_{\beta,H}(t)$ for $\beta$ = 6.4 and $\beta$ = 12 with H = 0.5, and the smoothed function g(t) $E_{\beta,H}(t)$, against time; reassigned spectrogram (frequency against time); Weierstrass-Mandelbrot samples with deterministic phase and with random phase; spectrogram (frequency against time).]
Fig. 5. Left: Mellin functions with various H, and spectrogram of one smoothed
Mellin function (where g(t) is a Kaiser window) that shows the instantaneous frequency path, a chirp behavior of the Mellin functions. Middle: samples of Weierstrass-Mandelbrot functions, both deterministic and random (H = 0.3, $\lambda$ = 1.07). Right:
spectrogram of the empirical variogram of a Weierstrass-Mandelbrot function (adapted from [26]). Spectrograms are computed here using reassignment techniques for
time-frequency distributions [2].
(H m/ ln ) [i(H+m/ ln )/2]
EH,m/ ln (t).
e
ln
The two writings of W(t) are its time-frequency representation and its
time-Mellin scale representation. Both methods of analysis are valid as tools
to assess the characteristics of the function. The relevance comes from the joint
properties of stationary increments and self-similarity (even in the weakened
sense of Discrete Scale Invariance). A time-frequency analysis illustrates this,
see Figure 5. Deterministic and randomized versions of W(t) have a spectrogram (from the detrended empirical variogram) that is made partly of pure
tones, and partly of chirps, that are localized on the Mellin modes $\beta = m/\ln\lambda$.
Here both aspects are shown, depending on the width of the smoothing window with respect to the rapidity of variation of the chirp (one sees the chirp
when its frequency does not change quickly over the length of the window)
[26].
Concluding remarks.
We have presented here a signal processing view of turbulence. We have surveyed
how the complexity of turbulence, and the need to understand various models
and experiments, is linked to a great diversity of signal processing methods
that are useful for turbulence: time-scale analysis, time-frequency analysis,
self-similarity and Mellin analysis, and geometrical characterizations.
Concerning the last point, we are far from having at our disposal convenient
tools for estimating the geometry (fractal sets, oscillations, ...) of a self-similar process. We have proposed here a framework adapted to self-similarity
and based on the oscillating Mellin functions $t^{h+2i\pi\beta}$, but a tractable extension to oscillating singularities of the form $|t - t_0|^{h+2i\pi\beta}$ is yet to be found.
To be relevant for turbulence, the central point $t_0$ of the singularity has to
be a variable, whereas the Lamperti framework is for a fixed central time,
$t_0 = 0$, of the Mellin functions. Consequently, though a mixture of oscillating
functions such as $|t - t_0|^{h+2i\pi\beta}$ may have multifractal properties close to the
ones measured in turbulence, one lacks signal processing tools to invert the
mixture and estimate the various parameters $(t_0, h, \beta)$ of each object.
Finally, turbulence is an active, challenging and open field with many problems that are interesting from a mathematical, physical or signal processing
point of view. This is a subject where one needs to establish fruitful interactions between models, tools of analysis and experimental measurements.
Thanks.
I would like to thank people that helped me by their competence and their
willingness to share their knowledge and ideas. Many thanks thus to Olivier
Michel, Patrick Flandrin and Pierre-Olivier Amblard, with whom I have the
pleasure to work. I am also thankful to C. Baudet, B. Castaing, L. Chevillard,
N. Mordant, J.F. Pinton, and J.C. Vassilicos.
References
1. A. Arneodo, E. Bacry, S. Jaffard, J.F. Muzy. Singularity spectrum of multifractal functions involving oscillating singularities. J. Four. Anal. Appl., 4(2):159-174, 1998.
2. F. Auger, P. Flandrin. Improving the readability of time-frequency and time-scale representations by reassignment methods. IEEE Trans. on Signal Proc., SP-43(5):1068-1089, 1995.
3. P. Abry, P. Flandrin, M. Taqqu, D. Veitch. Wavelets for the analysis, estimation, and synthesis of scaling data. In K. Park and W. Willinger, editors, Self-Similar Network Traffic and Performance Evaluation. Wiley, 2000.
4. A. Arneodo, E. Bacry, J.F. Muzy. The thermodynamics of fractals revisited with wavelets. Physica A, 213:232-275, 1995.
5. A. Arneodo, J.F. Muzy, S. Roux. Experimental analysis of self-similarity and random cascade processes: applications to fully developed turbulence data. J. Phys. France II, 7:363-370, 1997.
6. G. Barenblatt. Scaling, self-similarity, and intermediate asymptotics. CUP, Cambridge, 1996.
7. G.K. Batchelor. The theory of homogeneous turbulence. Cambridge University Press, 1953.
8. J. Bertrand, P. Bertrand, J.P. Ovarlez. The Mellin transform. In A.D. Poularikas, editor, The Transforms and Applications Handbook. CRC Press, 1996.
9. R. Benzi, S. Ciliberto, R. Tripiccione, C. Baudet, F. Massaioli. Extended self-similarity in turbulent flows. Phys. Rev. E, 48:R29-R32, 1993.
10. J. Beran. Statistics for Long-memory processes. Chapman & Hall, New York, 1994.
11. P. Borgnat, P. Flandrin, P.-O. Amblard. Stochastic discrete scale invariance. Signal Processing Lett., 9(6):181-184, June 2002.
12. M. Berry, Z. Lewis. On the Weierstrass-Mandelbrot fractal function. Proc. Roy. Soc. Lond. A, 370:459-484, 1980.
13. A. Blanc-Lapierre, R. Fortet. Théorie des fonctions aléatoires. Masson, Paris, 1953.
14. J. Barral, B. Mandelbrot. Multifractal products of cylindrical pulses. Probab. Theory Relat. Fields, 124:409-430, 2002.
15. E. Bacry, J.F. Muzy. Log-infinitely divisible multifractal processes. Comm. in Math. Phys., 236:449-475, 2003.
16. C. Baudet, O. Michel, W. Williams. Detection of coherent vorticity structures using time-scale resolved acoustic spectroscopy. Physica D, 128:1-17, 1999.
17. P. Borgnat. Modèles et outils pour les invariances d'échelle brisée : variations sur la transformation de Lamperti et contributions aux modèles statistiques de
41. A.N. Kolmogorov. On degeneration of isotropic turbulence in an incompressible viscous liquid. Dokl. Akad. Nauk SSSR, 31:538-540, 1941.
42. J.-P. Kahane, J. Peyrière. Sur certaines martingales de Benoit Mandelbrot. Adv. Math., 22:131-145, 1976.
43. N. Kevlahan, J.C. Vassilicos. The space and scale dependencies of the self-similar structure of turbulence. Proc. R. Soc. Lond. A, 447:341-363, 1994.
44. J. Lamperti. Semi-stable stochastic processes. Trans. Amer. Math. Soc., 104:62-78, 1962.
1 Introduction
Interferometric gravitational wave detectors are promising instruments to
make the first direct detection of gravitational waves, and later to permanently open a new window on the universe [1, 2, 3, 4]. A detector like Virgo
aims at observing signals in the 10 Hz - 10 kHz band. The detectors have very
strong noise requirements, which lead to challenging designs of control loops.
It was directly observed in 1919 that gravity has some effect on light
propagation, and in particular is able to bend light rays near massive objects
like the sun. This effect was predicted by A. Einstein as a consequence of
General Relativity, a relativistic theory of gravitation describing gravitational
fields as the geometry of space-time. Non-static gravitational fields can, in this
theory, have some time-variable effects on space-time, and be considered as
gravitational waves. Highly energetic astrophysical events are expected to
produce such waves, the observation of which would be of the highest interest
for our understanding of the Universe. In all theoretical studies, these waves
are foreseen to be very weak (analogous, in an interferometric length measurement,
to a distortion of one interferometer arm $\Delta L/L \sim 10^{-22}$ in the best cases).
2 Interferometers
Interferometric detection of gravitational waves amounts to continuously measuring the length difference between two orthogonal paths. The right topology
of the instrument is that of a Michelson interferometer [5]. Virgo is the nearest (Pisa, Italy) example of such an interferometric detector of gravitational
waves. It consists essentially of large mirrors suspended by wires in a vacuum,
and of light beams partially reflected and/or transmitted by these mirrors
over long distances. The right sensitivity is reached by two steps of light
power build-up: one step is the resonance caused by the so-called recycling
mirror, a second one is due to the use of Fabry-Perot cavities on each arm.
Recycling has the effect of increasing the light power; the Fabry-Perot cavities have the
effect of enlarging the effective lengths of the arms.
J.-D. Fournier et al. (Eds.): Harm. Analysis and Ratio. Approx., LNCIS 327, pp. 303-311, 2006.
Springer-Verlag Berlin Heidelberg 2006
3 Servo systems
3.1 Introduction
To make the complex optical structure of an interferometer work, an ensemble
of servo systems is needed in order to lock the resonant cavities and the laser
frequency at the right place. The design of the open loop transfer functions
has to make a trade-off between large gain for frequencies below 1 Hz (where
seismic noise is large), stability of the system, and large attenuation of the loop
gain above 10 Hz, where the aim is to detect gravitational waves. This is the
reason why sophisticated servo loops have been studied for several years.
An interferometer as a complex optical structure
An interferometer to detect gravitational waves is, today, typically made up
of 6 main mirrors. Each arm of the interferometer is made of a long resonant
optical cavity, and an additional mirror, in front of the interferometer, helps
to build up the light power.
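[Figure: optical layout of the interferometer, not recoverable from the source. Labels: laser, PR, splitter, IMX, IMY, EMX, EMY, cavity X (length Lx), cavity Y (length Ly), and the short lengths l0, lx and ly.]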
Here we will simplify the analysis to the displacements of the mirrors along
the optical axes.
The main output of the interferometer, the so-called dark fringe, sensitive
to gravitational waves, measures the variations of the difference of the two
cavity lengths, Lx - Ly; this is the differential mode of the interferometer (the
static difference is close to zero). In order for the cavities to work close to
their optimal point, both Lx and Ly have to be controlled at the picometer
level.
The common mode of the interferometer, Lx + Ly, can be used to control
the laser frequency fluctuations.
The short Michelson difference, lx - ly, and the recycling cavity length,
l0 + (lx + ly)/2, should be controlled at the order of the picometer as well.
The error signals for these controls are provided by additional outputs of
the interferometer (other ports than the dark fringe), and actuation is done
by means of currents in coils facing magnets glued on the mirrors.
Of these four lengths, only the interferometer differential mode can be
monitored with a high signal-to-noise ratio. Thus, for the other lengths, the
unity gain frequency and the loop gain should be designed so that the loop does not introduce
noise on the dark fringe.
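The four controlled lengths are simple linear combinations of the five optical path lengths. The following sketch only restates these combinations; the function name, the dictionary keys and the numerical values are hypothetical and are not Virgo parameters.

```python
def longitudinal_modes(Lx, Ly, lx, ly, l0):
    """Return the four lengths to be controlled, from the arm cavity lengths (Lx, Ly)
    and the short lengths (lx, ly, l0) defined in the text."""
    return {
        "differential": Lx - Ly,            # dark fringe, sensitive to gravitational waves
        "common": Lx + Ly,                  # usable to control laser frequency fluctuations
        "michelson": lx - ly,               # short Michelson difference
        "recycling": l0 + (lx + ly) / 2.0,  # recycling cavity length
    }


if __name__ == "__main__":
    # Hypothetical lengths in metres, for illustration only.
    print(longitudinal_modes(Lx=3000.0, Ly=2999.0, lx=6.0, ly=5.0, l0=6.0))
```

A displacement of a single end mirror changes both the differential and the common mode by the same amount, which is why the two long-arm modes cannot be controlled independently of each other from one mirror alone.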
Seismic noise and the suspensions
The spectrum of the seismic noise, in the 10 Hz - 10 kHz bandwidth, where
one expects to detect gravitational waves, is orders of magnitude higher than
what is necessary to detect gravitational waves. Thus, seismic noise isolators
are necessary [6, 7].
The suspensions act as low-pass filters: at the test mass
mirror level, the seismic noise is much attenuated for frequencies above 10 Hz.
But the seismic noise is amplified on resonances (several resonances in the 0.1
Hz - few Hz band), and is still quite high at low frequencies. As a result, the
free suspended mirror motion is about 1 µm on a 1 second timescale. There
is a need for a loop gain of $10^6$ below 1 Hz. When controlling the degrees
of freedom other than the differential one, the loop gain should be very low
for frequencies above 10 Hz, in order not to re-inject the error signal noise.
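The orders of magnitude quoted here can be reproduced with a few lines. The sketch below computes the loop gain needed to bring a free motion of about 1 µm down to the picometer level, and the magnitude response of a single damped pendulum stage as a crude stand-in for the suspension. The single-stage model, the quality factor and the function names are assumptions; the 0.6 Hz resonance is the value quoted for the simplified suspension of Fig. 4.

```python
import math


def required_loop_gain(free_motion_m, target_m):
    """Ratio between the free mirror motion and the residual motion to be reached."""
    return free_motion_m / target_m


def pendulum_magnitude(f, f0=0.6, q=100.0):
    """|x_mirror / x_ground| for one damped pendulum stage with resonance f0 (Hz)
    and quality factor q (hypothetical value): a second-order low-pass response."""
    ratio = f / f0
    return 1.0 / math.sqrt((1.0 - ratio ** 2) ** 2 + (ratio / q) ** 2)


if __name__ == "__main__":
    print(required_loop_gain(1.0e-6, 1.0e-12))  # about 1e6, as stated in the text
    for f in (0.1, 0.6, 10.0, 100.0):
        print(f, pendulum_magnitude(f))
```

The response is close to 1 well below the resonance, is amplified by the quality factor at the resonance, and falls off as (f0/f)^2 above it, which is the qualitative behavior described for the suspensions.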
Laser frequency stabilization
A similar feedback loop issue exists for the laser frequency stabilization [8].
The Virgo instrument requires a very stable laser frequency, with a relative
level as low as $10^{-20}/\sqrt{\mathrm{Hz}}$ in the 10 Hz - 10 kHz band; the frequency also has
to be reasonably stable for frequencies below 10 Hz, so that the Fabry-Perot cavities are kept resonant. The stability is ensured by the quality of the
reference oscillator.
The laser frequency in the 10 Hz - 10 kHz band is locked on the common
mode of the two long Fabry-Perot cavities. The seismic isolation ensures that
this level of stability can be reached.
$f < f_1$, $|G(f)| > G_1$
(1)
$f > f_2$, $|G(f)| < G_2$
(2)
The closed loop should be stable, i.e. 1/(1 + G) should not have any pole
with a positive real part.
The closed loop should be robust. We could define gain and phase margins,
as is commonly done, but unfortunately this can still allow low effective margins
for complicated functions. We then require that:
f,
(3)
[Plot for Fig. 4, not recoverable from the source: loop gain against frequency (Hz), logarithmic axes.]
Fig. 4. Two open loop transfer functions, based on the same simplified suspension
system (one resonant pole at 0.6 Hz): Coulon's filter (continuous line), engineer
design (dashed line). Coulon's filter is defined for k = 2, f2/f1 = 278.
[Plot for Fig. 5, not recoverable from the source: open loop transfer function, Nichols plot, against the transfer function phase (degrees).]
Fig. 5. Coulon's filter used to stabilize a pendulum, shown in the Nichols plot. The dotted
circles correspond to a closed loop overshoot of 2 (corresponding to a gain margin
of 2 and a phase margin of 30°).
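The correspondence between an overshoot of 2 and the quoted margins can be worked out explicitly. The sketch below assumes that the overshoot bound is a constraint of the form |1/(1 + G(f))| <= k at all frequencies, which is an assumption of this example; under it, the open loop gain G must stay outside a disc of radius 1/k centred on -1, and the classical margins follow from where the negative real axis and the unit circle meet that disc.

```python
import math


def gain_margin(k):
    """Smallest factor by which a real negative open loop gain G = -x can grow
    before |1 + G| drops below 1/k: x <= 1 - 1/k, hence a margin k/(k - 1)."""
    return k / (k - 1.0)


def phase_margin_degrees(k):
    """Smallest phase distance to -180 degrees at unity gain:
    |1 + G| = 2 sin(phi/2) >= 1/k, hence phi >= 2 asin(1/(2k))."""
    return math.degrees(2.0 * math.asin(1.0 / (2.0 * k)))


if __name__ == "__main__":
    for k in (1.5, 2.0, 3.0):
        print(k, gain_margin(k), phase_margin_degrees(k))
```

For k = 2 this gives a gain margin of 2 and a phase margin of about 29 degrees, in line with the values quoted in the caption; unlike the two margins taken separately, the bound also constrains the loop at every intermediate point of the Nichols plot.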
[Plot for Fig. 6, not recoverable from the source: curves for m=2 n=4, m=4 n=6 and m=6 n=8 against f2/f1, logarithmic axes.]
Fig. 6. Performance of various Coulon's filters, with m zeroes and n poles for the
Gs(f) function.
The performance of the filters has been computed as a function of the ratio f2/f1,
for various orders of poles and zeroes of the Gs(f) function. Figure 6
seems to indicate that the filter performance will not be very high on a short
frequency span, f2/f1 = 10, if one increases the order of the filter.
References
1. F. Acernese et al. Status of Virgo. Class. Quantum Grav., 21(5):S385+, March 2004.
2. D. Sigg. Commissioning of Ligo detectors. Class. Quantum Grav., 21(5):S409+, March 2004.
3. B. Willke et al. Status of Geo 600. Class. Quantum Grav., 21(5):S417+, March 2004.
4. R. Takahashi and the TAMA collaboration. Status of Tama 300. Class. Quantum Grav., 21(5):S403+, March 2004.
5. P. Saulson. Fundamentals of interferometric gravitational wave detectors. World Scientific Publishing Company, Singapore, 1994.
6. F. Acernese et al. The last stage suspension of the mirrors for the gravitational wave antenna Virgo. Class. Quantum Grav., 21(5):S245+, March 2004.
7. F. Acernese et al. Properties of seismic noise at the Virgo site. Class. Quantum Grav., 21(5):S433+, March 2004.
8. F. Bondu, A. Brillet, F. Cleva, H. Heitmann, M. Loupias, C.N. Man, H. Trinquet, and the VIRGO collaboration. The Virgo injection system. Class. Quantum Grav., 19(7), April 2002.