Mathematics For Photonics 2009-2010
The aim of this course is to expose students to various mathematical concepts often used in the
field of photonics. Neither mathematical rigour nor extensive detail is envisaged, as the purpose
of this course is to get students acquainted with a wide range of basic principles, which should
allow them to independently refine their knowledge of mathematical topics that might emerge
during their later career. As such, an important emphasis is placed on demonstrating the links of
these mathematical concepts to concrete problems in the field of photonics.
The list of topics treated in this course also illustrates this focus on breadth rather than depth:
The first four chapters deal with solving the wave equation in the case of time-harmonic optical
signals, which gives rise to an equation in the complex plane. Chapter 1 therefore gives a brief
introduction to complex calculus. Chapter 2 treats the solution of this equation for various special
cases, for which we will need special functions and orthogonal polynomials. In order to solve this
equation in the general case however, numerical techniques are required, so Chapter 3 will briefly
discuss finite elements, finite differences and expansion in basis functions. Finally, Chapter 4 deals
with the influence of periodicity and symmetry on the solutions of the complex wave equation.
After having treated time-harmonic systems, Chapter 5 will consider dynamical optical systems
where the solutions can have a more general time dependence.
Finally, we wish to thank the people who provided feedback for this course and pointed out er-
rors: Peter Vandersteegen, Kristof Vandoorne, Lieven Verslegers, Karel Van Acoleyen. We are also
grateful to Wim Bogaerts for providing some of the figures for Chapter 4.
Peter Bienstman
Contents
1 Complex calculus 1
1.1 The complex wave equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Functions of a complex variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Derivatives of complex functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.4 Integrals of complex functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.4.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.4.2 Cauchy’s integral theorem (Cauchy I) . . . . . . . . . . . . . . . . . . . . . . . 8
1.4.3 Cauchy’s integral formula (Cauchy II) . . . . . . . . . . . . . . . . . . . . . . . 9
1.5 Residue calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.5.1 Laurent series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.5.2 Singularities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
1.5.3 Residues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
1.5.4 Residue theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.5.5 Calculating residues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.5.6 Limit theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
1.5.7 Cauchy principal value . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.5.8 Evaluation of integrals on the real axis . . . . . . . . . . . . . . . . . . . . . . . 20
1.6 Conformal transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
1.6.1 Angle–preserving transformations . . . . . . . . . . . . . . . . . . . . . . . . . 28
1.6.2 Conformal transformations and the Helmholtz equation . . . . . . . . . . . . 29
1.6.3 Example: bent waveguides . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2 Special functions 36
2.1 Bessel functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.1.1 Bessel’s differential equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.1.2 Generating function for integer order . . . . . . . . . . . . . . . . . . . . . . . 37
2.1.3 Recurrence relations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.1.4 Bessel’s differential equation revisited . . . . . . . . . . . . . . . . . . . . . . . 40
2.1.5 Fourier–Bessel series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
2.1.6 Neumann and Hankel functions . . . . . . . . . . . . . . . . . . . . . . . . . . 44
2.1.7 Application: the eigenmodes of an optical fibre . . . . . . . . . . . . . . . . . 46
2.2 Hermite polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
2.2.1 The paraxial wave equation and Gaussian beams . . . . . . . . . . . . . . . . 48
2.2.2 Higher order solutions of the paraxial wave equation . . . . . . . . . . . . . . 48
2.2.3 Generating function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
2.2.4 Recurrence relations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
2.2.5 Hermite’s differential equation . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
2.2.6 Orthogonality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
2.2.7 Normalisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
2.2.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3 Numerical techniques 58
3.1 Finite differences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.1.1 Finite difference approximation of derivatives . . . . . . . . . . . . . . . . . . 58
3.1.2 Example: the 1D Helmholtz equation . . . . . . . . . . . . . . . . . . . . . . . 60
3.1.3 Solving linear systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
3.1.4 Finite differences in the time domain . . . . . . . . . . . . . . . . . . . . . . . . 63
3.2 Finite elements . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
3.2.1 Basic recipe . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
3.2.2 Example: the 1D Helmholtz equation . . . . . . . . . . . . . . . . . . . . . . . 65
3.2.3 Finite elements as a variational method . . . . . . . . . . . . . . . . . . . . . . 68
3.3 Eigenmode expansion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
3.4 Method of weighted residuals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
4.1.1 Formulation as an eigenvalue problem . . . . . . . . . . . . . . . . . . . . . . 79
4.1.2 Hermiticity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
4.1.3 General properties of a Hermitian operator . . . . . . . . . . . . . . . . . . . . 81
4.1.4 A variational form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
4.2 Using symmetries to classify modes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
4.2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
4.2.2 Inversion symmetry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
4.2.3 Continuous translation symmetry . . . . . . . . . . . . . . . . . . . . . . . . . 87
4.3 1D periodic systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
4.3.1 Discrete translational symmetry . . . . . . . . . . . . . . . . . . . . . . . . . . 89
4.3.2 Bloch’s theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
4.3.3 Brillouin zone . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
4.3.4 A second form of Bloch’s theorem . . . . . . . . . . . . . . . . . . . . . . . . . 91
4.3.5 Band gaps in 1D layered stacks . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
4.4 2D and 3D periodic systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
4.4.1 Bloch’s theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
4.4.2 Reciprocal lattice . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
4.4.3 Brillouin zone . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
4.4.4 Irreducible Brillouin zone . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
4.4.5 Example: 2D square lattice of dielectric rods . . . . . . . . . . . . . . . . . . . 99
4.4.6 Time reversal symmetry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
4.5 Applications of photonic crystals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
5.2.7 Sensitive dependence on initial conditions . . . . . . . . . . . . . . . . . . . . 111
5.3 2D discrete systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
5.3.1 The Hénon map . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
5.3.2 Saddle points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
5.3.3 Linear maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
5.3.4 Nonlinear maps and the Jacobian matrix . . . . . . . . . . . . . . . . . . . . . 119
5.3.5 Stable and unstable manifolds . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
5.4 Continuous systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
5.4.1 Relation with discrete maps . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
5.4.2 Phase portraits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
5.4.3 The Lorenz attractor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
Chapter 1
Complex calculus
The shortest path between two truths in the real domain passes through the complex
domain.
— Jacques Hadamard
The imaginary number is a fine and wonderful recourse of the divine spirit, almost an
amphibian between being and not being.
— Gottfried Wilhelm Leibniz
Contents
1.1 The complex wave equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Functions of a complex variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.3 Derivatives of complex functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.4 Integrals of complex functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.5 Residue calculus . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.6 Conformal transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
In this chapter, we will discuss the basic principles of complex analysis. We will define functions
of a complex variable and show how to differentiate and integrate them. In order to facilitate the
calculation of integrals, residue calculus is introduced. Finally, we discuss conformal transforma-
tions as a way to map problems to equivalent ones having an ’easier’ geometry.
We will see that complex analysis has a number of direct applications in photonics, e.g. the
Kramers–Kronig dispersion relations, and the use of conformal transformations to study
waveguide bends.
In the absence of current sources, Maxwell’s curl equations in the time domain are given by
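$$\nabla \times \mathbf{E}(\mathbf{r}, t) = -\mu(\mathbf{r})\, \frac{\partial \mathbf{H}(\mathbf{r}, t)}{\partial t} \tag{1.1}$$
$$\nabla \times \mathbf{H}(\mathbf{r}, t) = \varepsilon(\mathbf{r})\, \frac{\partial \mathbf{E}(\mathbf{r}, t)}{\partial t} \tag{1.2}$$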
For the common case of linear isotropic media, the dielectric permittivity ε and the magnetic
permeability µ are scalars, which can however be a function of the spatial coordinate r.
In the next chapters, we will restrict ourselves to the important case where all the fields have a
harmonic time dependence with a fixed frequency ω, e.g. for the x–component of the electric
field:
Similar expressions can be written for the other field components and for the magnetic field. It is
convenient to introduce a complex number Ẽx called a phasor, that is defined as
The relation between this complex phasor and the real electric field is then simply given by
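$$E_x(\mathbf{r}, t) = \Re\left[\tilde{E}_x(\mathbf{r})\, e^{j\omega t}\right] \tag{1.5}$$
Taking the time derivative of this expression gives
$$\frac{\partial E_x(\mathbf{r}, t)}{\partial t} = \Re\left[j\omega\, \tilde{E}_x(\mathbf{r})\, e^{j\omega t}\right] \tag{1.6}$$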
Therefore, the phasor corresponding to ∂Ex (r, t)/∂t is given by jω Ẽx (r).
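As a quick numerical illustration of Eq. 1.5 and 1.6, the following sketch compares the time derivative obtained from the phasor rule with a finite difference of the real field. The frequency, amplitude and phase are arbitrary values chosen for illustration only, and the function names are ours.

```python
import cmath
import math

omega = 2.0 * math.pi * 1.0e3          # illustrative angular frequency (assumed value)
E_phasor = 0.7 * cmath.exp(1j * 0.4)   # illustrative phasor (assumed amplitude and phase)


def field(t):
    """Real field recovered from the phasor, as in Eq. 1.5."""
    return (E_phasor * cmath.exp(1j * omega * t)).real


def field_derivative_from_phasor(t):
    """Time derivative obtained by multiplying the phasor by j*omega (Eq. 1.6)."""
    return (1j * omega * E_phasor * cmath.exp(1j * omega * t)).real


def field_derivative_numeric(t, dt=1.0e-9):
    """Central finite difference of the real field, used only as a cross-check."""
    return (field(t + dt) - field(t - dt)) / (2.0 * dt)


t0 = 1.3e-4  # arbitrary sample time
print(field_derivative_from_phasor(t0), field_derivative_numeric(t0))
```

Both numbers should agree up to the accuracy of the finite difference, which is the content of the statement that differentiation in time corresponds to multiplication of the phasor by jω.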
In what follows, we will omit the tilde and implicitly assume that we are always dealing with
phasors. We can also collect the phasors for all field components in complex vectors E and H.
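Using phasor notation, Eq. 1.1 and 1.2 become
$$\nabla \times \mathbf{E}(\mathbf{r}) = -j\omega\mu(\mathbf{r})\, \mathbf{H}(\mathbf{r}) \tag{1.7}$$
$$\nabla \times \mathbf{H}(\mathbf{r}) = j\omega\varepsilon(\mathbf{r})\, \mathbf{E}(\mathbf{r}) \tag{1.8}$$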
By taking the curl of Eq. 1.7 and plugging Eq. 1.8 in the result we get
Let’s for the moment consider a uniform medium without charge ρ. In that case, Maxwell’s diver-
gence laws tell us that ∇ · E(r) = 0, leading to
Now, if we are studying a system which has a piecewise constant refractive index, Eq. 1.12 will
be valid in each of the uniform subregions. So formally we can write for the scalar Helmholtz
equation in a medium with discrete index jumps:
$$f(z) \overset{\text{def}}{=} f(x + jy) \overset{\text{def}}{=} u(x, y) + j v(x, y) \tag{1.14}$$
Here, u and v are two real–valued functions of x and y. Unsurprisingly, we call u the real part of
f and v the imaginary part of f .
Example 1.2.1. Take
$$u = x^2 - y^2, \qquad v = 2xy$$
Then
$$f(z) = x^2 - y^2 + 2jxy = (x + jy)^2 = z^2$$
Similarly, take
$$f(z) = z z^*$$
such that
$$u = x^2 + y^2, \qquad v = 0$$
Let us now differentiate complex functions. In analogy with real–valued functions, we would like
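to define the complex derivative as
$$\frac{df}{dz} = \lim_{\delta z \to 0} \frac{f(z + \delta z) - f(z)}{\delta z} = \lim_{\delta z \to 0} \frac{\delta f}{\delta z} \tag{1.15}$$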
The problem here is that it is entirely possible that the result of Eq. 1.15 is dependent on the
direction of approach to the point z (see Fig. 1.1).
Let us derive conditions under which the value of the derivative is independent of the
direction of approach.
First, for an approach where δy = 0 and δx → 0, we get
Figure 1.1: Different approaches to z in the complex plane.
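$$\lim_{\delta z \to 0} \frac{\delta f}{\delta z} = \lim_{\delta z \to 0} \frac{\delta u + j\,\delta v}{\delta x + j\,\delta y} = \lim_{\delta x \to 0} \left( \frac{\delta u}{\delta x} + j \frac{\delta v}{\delta x} \right) = \frac{\partial u}{\partial x} + j \frac{\partial v}{\partial x} \tag{1.16}$$
Second, for an approach where δx = 0 and δy → 0, we get
$$\lim_{\delta z \to 0} \frac{\delta f}{\delta z} = \lim_{\delta z \to 0} \frac{\delta u + j\,\delta v}{\delta x + j\,\delta y} = \lim_{\delta y \to 0} \left( -j \frac{\delta u}{\delta y} + \frac{\delta v}{\delta y} \right) = -j \frac{\partial u}{\partial y} + \frac{\partial v}{\partial y} \tag{1.17}$$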
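A necessary condition for the complex derivative to be independent of the direction of approach
can be derived by equating the real and imaginary parts of Eq. 1.16 and 1.17:
$$\frac{\partial u}{\partial x} = \frac{\partial v}{\partial y}, \qquad \frac{\partial v}{\partial x} = -\frac{\partial u}{\partial y} \tag{1.18}$$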
These are called the Cauchy–Riemann conditions. A complex function that satisfies these conditions
is called analytic (or holomorphic). Important to note is that for a function to be analytic, it nec-
essarily has to be continuous and should not contain any singularities, otherwise the derivative
would certainly be undefined.
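The Cauchy–Riemann conditions of Eq. 1.18 are easy to test numerically. The sketch below estimates the four partial derivatives with central differences and checks both conditions at one point. The two test functions (the complex exponential and complex conjugation) and the sample point are our own illustrative choices, not taken from the text.

```python
import cmath


def partials(f, x, y, h=1.0e-6):
    """Central-difference estimates of du/dx, du/dy, dv/dx, dv/dy at (x, y)."""
    fx = (f(complex(x + h, y)) - f(complex(x - h, y))) / (2.0 * h)
    fy = (f(complex(x, y + h)) - f(complex(x, y - h))) / (2.0 * h)
    return fx.real, fy.real, fx.imag, fy.imag


def satisfies_cauchy_riemann(f, x, y, tol=1.0e-6):
    """Check du/dx = dv/dy and dv/dx = -du/dy (Eq. 1.18) at a single point."""
    ux, uy, vx, vy = partials(f, x, y)
    return abs(ux - vy) < tol and abs(vx + uy) < tol


def conjugate(z):
    """f(z) = z*, i.e. u = x and v = -y."""
    return z.conjugate()


print(satisfies_cauchy_riemann(cmath.exp, 0.3, -1.2))   # exponential: conditions hold
print(satisfies_cauchy_riemann(conjugate, 0.3, -1.2))   # conjugation: conditions fail
```

A check at a single point can of course only falsify analyticity, not prove it; it is meant as a sanity check alongside the analytical argument.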
We will now show that the Cauchy–Riemann conditions are not only necessary, they are also
sufficient.
We can write
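$$df = \left( \frac{\partial u}{\partial x} + j \frac{\partial v}{\partial x} \right) dx + \left( \frac{\partial u}{\partial y} + j \frac{\partial v}{\partial y} \right) dy \tag{1.19}$$
such that
$$\frac{df}{dz} = \frac{\left( \frac{\partial u}{\partial x} + j \frac{\partial v}{\partial x} \right) dx + \left( \frac{\partial u}{\partial y} + j \frac{\partial v}{\partial y} \right) dy}{dx + j\,dy} \tag{1.20}$$
or
$$\frac{df}{dz} = \frac{\left( \frac{\partial u}{\partial x} + j \frac{\partial v}{\partial x} \right) + \left( \frac{\partial u}{\partial y} + j \frac{\partial v}{\partial y} \right) \frac{dy}{dx}}{1 + j \frac{dy}{dx}} \tag{1.21}$$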
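We now need to prove that this expression is independent of dy/dx in order for the complex
derivative to be independent of the direction of approach.
Applying the Cauchy–Riemann conditions to the y–derivatives, we obtain
$$\frac{\partial u}{\partial y} + j \frac{\partial v}{\partial y} = -\frac{\partial v}{\partial x} + j \frac{\partial u}{\partial x} = j \left( \frac{\partial u}{\partial x} + j \frac{\partial v}{\partial x} \right) \tag{1.22}$$
Substituting this into Eq. 1.21, the numerator becomes $\left( \frac{\partial u}{\partial x} + j \frac{\partial v}{\partial x} \right) \left( 1 + j \frac{dy}{dx} \right)$, so the dy/dx dependence cancels out to give
$$\frac{df}{dz} = \frac{\partial u}{\partial x} + j \frac{\partial v}{\partial x} \tag{1.23}$$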
This equation also shows that only derivatives with respect to x are needed to calculate the com-
plex derivative.
Exercise 1.1. Are the following functions analytic? If so, calculate their derivative.
a) f (z) = z 2
b) f (z) = zz ∗
c) f (z) = Re(z) = x
Exercise 1.2. u and v are the real and imaginary parts, respectively, of an analytic
function w. Show that u and v are harmonic functions, i.e. they
satisfy Laplace’s equation
$$\nabla^2 u = \nabla^2 v = 0$$
Figure 1.2: A complex line integral.
1.4.1 Definition
With differentiation under control, it is time to study integrals of complex functions. The integral
of a complex function along a specified path C between z0 and z1 (see Fig. 1.2) is defined by
$$\int_C f(z)\,dz = \int_C \left( u(x, y) + j v(x, y) \right) (dx + j\,dy) = \int_C \left[ u(x, y)\,dx - v(x, y)\,dy \right] + j \int_C \left[ v(x, y)\,dx + u(x, y)\,dy \right] \tag{1.24}$$
So, the integral of a complex function has been expressed in terms of well-known real line in-
tegrals, which are calculated in the normal way by parametrisation of the curve. (Note that in
Eq. 1.24 dx and dy are not independent, but related by the choice of the integration path.)
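The definition in Eq. 1.24 can be turned into a small numerical recipe: parametrise the curve, split it into short segments, and accumulate the two real line integrals separately. The sketch below does this with the midpoint rule. The integrand f(z) = z* and the straight path from 0 to 1 + j are illustrative assumptions, chosen because the exact result (1) is easy to verify by hand.

```python
def complex_line_integral(f, path, n_steps=20000):
    """Approximate the integral of f along z = path(t), t in [0, 1], via Eq. 1.24.

    The real and imaginary parts are accumulated as the two real line
    integrals of Eq. 1.24, using the midpoint rule on each short segment.
    """
    real_part = 0.0
    imag_part = 0.0
    z_prev = path(0.0)
    for k in range(1, n_steps + 1):
        z_next = path(k / n_steps)
        dx = z_next.real - z_prev.real
        dy = z_next.imag - z_prev.imag
        w = f(0.5 * (z_prev + z_next))   # f evaluated at the segment midpoint
        u, v = w.real, w.imag
        real_part += u * dx - v * dy     # first real line integral in Eq. 1.24
        imag_part += v * dx + u * dy     # second real line integral in Eq. 1.24
        z_prev = z_next
    return complex(real_part, imag_part)


def straight_path(t):
    """Straight line from z = 0 to z = 1 + j (illustrative path)."""
    return t * (1 + 1j)


# On this path z* dz = t(1 - j)(1 + j) dt = 2t dt, so the integral equals 1.
integral = complex_line_integral(lambda z: z.conjugate(), straight_path)
print(integral)
```

Note how dx and dy are both derived from the same parametrisation, which is exactly the sense in which they are not independent in Eq. 1.24.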
Example 1.4.1. Let’s calculate the contour integral $\oint_C z^n\,dz$ where the contour
One of two major theorems in complex calculus is Cauchy’s integral theorem, which we will
succinctly call Cauchy I in these notes. It states that
$$\oint_C f(z)\,dz = 0 \tag{1.25}$$
Important to note is that Eq. 1.25 only holds if f (z) is analytic for all the points enclosed by the
contour C, as well as for the points on the contour itself. The symbol $\oint$ is used to emphasise that
the path is closed.
To prove Eq. 1.25, we first apply the definition of the complex contour integral given in Eq. 1.24:
$$\oint_C f(z)\,dz = \oint_C \left( u(x, y) + j v(x, y) \right) (dx + j\,dy) = \oint_C \left[ u(x, y)\,dx - v(x, y)\,dy \right] + j \oint_C \left[ v(x, y)\,dx + u(x, y)\,dy \right] \tag{1.26}$$
The two (real–valued) line integrals on the right–hand side can be converted to surface integrals
by making use of Stokes’s Theorem:
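$$\oint_C \mathbf{A} \cdot d\mathbf{l} = \iint_S (\nabla \times \mathbf{A}) \cdot d\mathbf{S} \tag{1.27a}$$
or in two dimensions (i.e. Az = 0 and the contour contained in the (x, y)–plane):
$$\oint_C (A_x\,dx + A_y\,dy) = \iint_S \left( \frac{\partial A_y}{\partial x} - \frac{\partial A_x}{\partial y} \right) dx\,dy \tag{1.27b}$$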
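For the first term in Eq. 1.26, we choose Ax = u and Ay = −v such that
$$\oint_C \left[ u(x, y)\,dx - v(x, y)\,dy \right] = \oint_C (A_x\,dx + A_y\,dy) = \iint_S \left( \frac{\partial A_y}{\partial x} - \frac{\partial A_x}{\partial y} \right) dx\,dy = -\iint_S \left( \frac{\partial v}{\partial x} + \frac{\partial u}{\partial y} \right) dx\,dy \tag{1.28}$$
By the Cauchy–Riemann conditions (Eq. 1.18), this integrand vanishes. The second term in
Eq. 1.26 vanishes in the same way with the choice Ax = v and Ay = u.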
Exercise 1.3. Prove that for an analytic function f (z) the line integral
$$\int_{z_0}^{z_1} f(z)\,dz$$
The second major theorem in complex calculus is Cauchy’s integral formula (as opposed to Cauchy’s
integral theorem from the previous section). Cauchy II states that for an analytic function f (z)
$$\frac{1}{2\pi j} \oint_C \frac{f(z)}{z - z_0}\,dz = f(z_0) \tag{1.29}$$
Important here is that z0 is a point enclosed by the contour C. If z0 were outside of the contour,
then the integrand of Eq. 1.29 would be analytic in the entire region enclosed by the contour. In
that case, Cauchy I applies and the integral vanishes.
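Both statements (Cauchy II for an enclosed point, and Cauchy I for a point outside the contour) can be checked numerically. The following sketch integrates around a circle using the parametrisation z = c + r e^{jθ}. The choice of f(z) = e^z, the unit circle, and the two test points are illustrative assumptions.

```python
import cmath
import math


def circle_integral(g, centre, radius, n=2000):
    """Integrate g counterclockwise around a circle, z = centre + radius*e^{j*theta}."""
    total = 0.0 + 0.0j
    dtheta = 2.0 * math.pi / n
    for k in range(n):
        e = cmath.exp(1j * k * dtheta)
        z = centre + radius * e
        total += g(z) * (1j * radius * e) * dtheta   # dz = j*radius*e^{j*theta} dtheta
    return total


def cauchy_formula(f, z0, centre=0.0, radius=1.0):
    """Left-hand side of Eq. 1.29 evaluated on a circular contour."""
    return circle_integral(lambda z: f(z) / (z - z0), centre, radius) / (2j * math.pi)


z_inside = 0.3 + 0.2j    # enclosed by the unit circle
z_outside = 2.0 + 0.0j   # outside the unit circle

print(cauchy_formula(cmath.exp, z_inside), cmath.exp(z_inside))   # should agree
print(cauchy_formula(cmath.exp, z_outside))                       # should be close to 0
```

The equally spaced sum used here is the trapezoidal rule for a periodic integrand, which converges very quickly for smooth integrands; this is why a modest number of points suffices.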
To prove Eq. 1.29, we note that although the function f (z) is analytic, f (z)/(z − z0) is not,
because it has a singularity at z = z0. However, if we deform C to exclude the singularity as in
Fig. 1.3, Cauchy I does apply:
$$\oint_C \frac{f(z)}{z - z_0}\,dz - \oint_{C_2} \frac{f(z)}{z - z_0}\,dz = 0 \tag{1.30}$$
Here, C is the original outer contour and C2 is the circle surrounding the point z0 traversed in a
counterclockwise direction (hence the minus sign). The two sections of the path along the contour
line cancel because f (z) is continuous.
To calculate the integral along C2 , we use the parametrisation z = z0 + rejθ :
Figure 1.3: Contour to prove Cauchy’s formula.
$$\oint_{C_2} \frac{f(z)}{z - z_0}\,dz = \oint_{C_2} \frac{f(z_0 + r e^{j\theta})}{r e^{j\theta}}\, r j e^{j\theta}\,d\theta \tag{1.31}$$
Exercise 1.5. By repeating the technique used in Ex. 1.4, show that
$$f^{(n)}(z_0) = \frac{n!}{2\pi j} \oint_C \frac{f(z)}{(z - z_0)^{n+1}}\,dz$$
This also means that if a function is analytic, then the derivatives of any
order automatically exist.
Figure 1.4: Contour to derive Laurent’s theorem.
Let’s try to come up with a series expansion for complex functions. Here, special care has to
be taken with respect to the convergence area. Consider e.g. a function that is analytic in some
annular region of inner radius r and outer radius R (Fig. 1.4). In this region, we can construct a
contour that only encloses points where the function is analytic, such that we can apply Cauchy
II:
$$f(z) = \frac{1}{2\pi j} \oint_{C_1} \frac{f(z')}{z' - z}\,dz' - \frac{1}{2\pi j} \oint_{C_2} \frac{f(z')}{z' - z}\,dz' \tag{1.33}$$
Here, z is a fixed point on the inside of the contour, while z′ is the integration variable running
along the contour. The two line segments cancel once again. Simple manipulation of Eq. 1.33
yields
$$f(z) = \frac{1}{2\pi j} \oint_{C_1} f(z')\, \frac{1/z'}{1 - z/z'}\,dz' + \frac{1}{2\pi j} \oint_{C_2} f(z')\, \frac{1/z}{1 - z'/z}\,dz' \tag{1.34}$$
Note that z is on the inside of C1 , such that |z| < |z′|, or |z/z′| < 1 in the first term in Eq. 1.34.
Similarly, z is on the outside of C2 , such that |z′| < |z|, or |z′/z| < 1 in the second term in Eq. 1.34.
One can prove that the following straightforward generalisation of the well–known real series
expansion holds:
$$\frac{1}{1 - z} = \sum_{n=0}^{\infty} z^n, \qquad |z| < 1 \tag{1.35}$$
With this, we can write Eq. 1.34 as
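$$f(z) = \frac{1}{2\pi j} \oint_{C_1} \frac{f(z')}{z'} \sum_{m=0}^{\infty} \left( \frac{z}{z'} \right)^m dz' + \frac{1}{2\pi j} \oint_{C_2} \frac{f(z')}{z} \sum_{n=0}^{\infty} \left( \frac{z'}{z} \right)^n dz' \tag{1.36}$$
$$f(z) = \frac{1}{2\pi j} \sum_{m=0}^{\infty} z^m \oint_{C_1} \frac{f(z')}{z'^{(m+1)}}\,dz' + \frac{1}{2\pi j} \sum_{n=0}^{\infty} z^{-n-1} \oint_{C_2} f(z')\,(z')^n\,dz' \tag{1.37}$$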
or more symmetrically
$$f(z) = \frac{1}{2\pi j} \sum_{m=0}^{\infty} z^m \oint_{C_1} \frac{f(z')}{z'^{(m+1)}}\,dz' + \frac{1}{2\pi j} \sum_{n=1}^{\infty} z^{-n} \oint_{C_2} \frac{f(z')}{z'^{(-n+1)}}\,dz' \tag{1.38}$$
This means that we can expand f (z) in a power series containing both negative and positive
powers of z:
$$f(z) = \sum_{m=-\infty}^{\infty} a_m z^m \tag{1.39}$$
This generalisation of a Taylor series in the presence of singularities is called a Laurent series.
However, in the second ring, the previous series does not converge, so this time
we write
$$f(z) = \frac{1}{z} \cdot \frac{1/z}{1 - 1/z} = \frac{1}{z^2} \sum_{m=0}^{\infty} \frac{1}{z^m} = \cdots + \frac{1}{z^4} + \frac{1}{z^3} + \frac{1}{z^2}$$
This series converges in the second ring, since |1/z| < 1 there.
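The dependence of the Laurent coefficients on the ring can be made concrete numerically. According to Eq. 1.38, each coefficient is a contour integral of f(z′)/z′^(m+1), which on a circle of radius R reduces to an average of f(z) z^(−m) over the circle. The sketch below assumes that the function in this example is f(z) = 1/(z(z − 1)), which is what the expression above simplifies to; the radii 0.5 and 2 are our illustrative choices for the two rings.

```python
import cmath
import math


def laurent_coefficient(f, m, radius, n=4000):
    """Coefficient a_m of z**m, from (1/(2*pi*j)) * closed integral of f(z)/z**(m+1).

    On the circle z = radius*e^{j*theta} we have dz = j*z*dtheta, so the
    integral reduces to the average of f(z) * z**(-m) over the circle.
    """
    total = 0.0 + 0.0j
    for k in range(n):
        z = radius * cmath.exp(2j * math.pi * k / n)
        total += f(z) * z ** (-m)
    return total / n


def f(z):
    """Assumed example function, inferred from the series above."""
    return 1.0 / (z * (z - 1.0))


# Outer ring |z| > 1: only powers z**-2, z**-3, ... appear, each with coefficient 1.
outer = [laurent_coefficient(f, m, 2.0) for m in (-3, -2, -1, 0)]
# Inner ring 0 < |z| < 1: f(z) = -(1/z) * (1 + z + z**2 + ...).
inner = [laurent_coefficient(f, m, 0.5) for m in (-2, -1, 0, 1)]
print(outer)
print(inner)
```

The same function thus has two different Laurent series, one per ring, which is why the convergence area has to be stated together with the expansion.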
Exercise 1.6. Develop
$$f(z) = \frac{e^z}{z - 1}$$
in a Laurent series for the ring 0 < |z| < 1.
Here $e^z$ is a function which is analytic in all of C and defined by
$$e^z \overset{\text{def}}{=} \sum_{m=0}^{\infty} \frac{z^m}{m!}$$
Exercise 1.7. What is the relationship between a Fourier and a Laurent series?
Exercise 1.8. Show that the exact location of the contours to derive Laurent series
does not matter, as long as the contours don’t cross any singularities.
1.5.2 Singularities
z0 is a singular point of f (z) if f (z) is not analytic there. There are two important types of singu-
larities, poles and branch points, which we will now discuss.
Poles
Consider the Laurent series of f (z) around z0:
$$f(z) = \sum_{m=-\infty}^{\infty} a_m (z - z_0)^m \tag{1.40}$$
If the index m doesn’t run all the way to −∞, but only to a finite value −M , z0 is called a pole of
order M . As a trivial example, f (z) = 1/z has a pole of order 1 at z = 0 (also called a simple pole).
On the other hand, if there is an infinite number of negative m–values for which am is non–zero,
z0 is called an essential singularity.
A pole of order M can be removed by multiplying f (z) by (z − z0 )M . This obviously cannot be
done for an essential singularity.
One can also prove that in any small neighbourhood around an essential singularity, the function
actually takes all possible complex values, with at most one exception (Picard’s theorem). So,
obviously it makes no sense to talk about the limit of that function there.
Branch points
Figure 1.5: Riemann surface of $w = z^{1/2}$.
For $z = \rho e^{j\theta}$, a possible solution to Eq. 1.41 is $w_1 = \rho^{1/2} e^{j\theta/2}$. However, there is a different point in
the w–plane that corresponds to the same z, namely $w_2 = \rho^{1/2} e^{j(\theta/2 + \pi)} = -w_1$. This is the complex
analog of the real equation $y^2 = x$ for x > 0, which has two solutions $y = \pm\sqrt{x}$.
So, Eq. 1.41 is clearly not a single–valued function: a single point in the z–plane (except z = 0)
corresponds to two points in the w–plane. To get a handle on the multivaluedness of $z^{1/2}$, it’s
instructive to look at a graphical representation of this complex function. Since this is essentially
a 4D–object, we can plot e.g. a 3D projection (Re(z), Im(z), Re(w)) and use the phase of w to colour
the surface. This is done in Fig. 1.5.
Such a representation is called the Riemann surface of the complex square root function. It clearly
shows the multivaluedness of the square root: for each point in the z–plane there are two points
on the Riemann surface. Another way to appreciate this multivaluedness is by looking at how the
argument of w changes if we pick a point on the surface, make a full round trip on the Riemann
surface, and arrive back at our original point. From the figure, it is clear that we will have circled
twice around the origin for that, so one could say that on the Riemann surface, the argument w
needs to change over 4π in order to complete a full round trip. Therefore, this Riemann surface
can be seen as an extension of the complex plane which is ’twice as large’ as the regular complex
plane, so now we can have a 1–to–1 mapping between the z–plane and this extended w–plane.
Although the Riemann surface is a very profound and beautiful mathematical concept, for practi-
cal purposes people often like to work with single–valued functions mapping the complex plane
to the regular (i.e. non-extended) complex plane. To do this, we can adopt a convention to restrict
ourselves e.g. to values in the right half of the w–plane, i.e. where Re(w) > 0.
Figure 1.6: The mapping $w = z^{1/2}$.
Figure 1.7: The branch cut cuts the Riemann surface in two separate Riemann sheets. Adjacent regions on
the Riemann surface are marked by similar letters.
The boundary Re(w) = 0 between these two halves of the w–plane corresponds to the negative real
axis in the z–plane. We call this negative real z–axis a branch cut, originating from the branch point
z = 0 (see Fig. 1.6).
Note that our function is now no longer continuous when crossing the branch cut: for
$z_+ = \rho e^{j(\pi - \epsilon)}$, we get $w_+ = \sqrt{\rho}\, e^{j(\pi/2 - \epsilon/2)}$, whereas for $z_- = \rho e^{j(\pi + \epsilon)}$, we get $w_- = \sqrt{\rho}\, e^{j(\pi/2 + \epsilon/2)} e^{j\pi}$.
(This can be seen by looking at Fig. 1.6.) In the limit of zero $\epsilon$, $w_+$ tends to $j\sqrt{\rho}$, whereas $w_-$ tends
to $-j\sqrt{\rho}$.
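This jump across the branch cut can be observed directly in software. Python's standard `cmath.sqrt` uses the same convention as above: the branch cut lies along the negative real axis and the returned root has a non-negative real part. The values of ρ and ε below are illustrative.

```python
import cmath

rho = 4.0        # illustrative modulus, so sqrt(rho) = 2
eps = 1.0e-9     # small angle on either side of the branch cut

z_plus = rho * cmath.exp(1j * (cmath.pi - eps))    # just above the negative real axis
z_minus = rho * cmath.exp(1j * (cmath.pi + eps))   # just below the negative real axis

w_plus = cmath.sqrt(z_plus)     # tends to +j*sqrt(rho)
w_minus = cmath.sqrt(z_minus)   # tends to -j*sqrt(rho)
print(w_plus, w_minus)

# Both w and -w square to the same z: the two values of the multivalued root.
z = 3.0 + 4.0j
w1 = cmath.sqrt(z)
w2 = -w1
print(w1 * w1, w2 * w2)
```

The two points z_plus and z_minus are almost identical, yet their square roots differ by a sign, which is precisely the discontinuity described above.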
The relationship between branch cuts and the Riemann surface is shown schematically in Fig. 1.7:
our branch cut cuts the Riemann surface in two separate Riemann sheets, one corresponding to
arguments [−π, π], the other one corresponding to arguments [π, 3π].
It is important to remark that, when choosing a contour to evaluate the integral theorems we’ve
seen so far, we are not allowed to cross a branch cut, as the resulting discontinuity would make
the function no longer analytic. However, skirting on one edge of a branch cut, circling around
the branch point and returning along the other edge is allowed, as the function stays continuous
and analytic throughout the entire path.
Note that integrals on both sides of a branch cut will generally not cancel out, unlike integrals on
both sides of the contour cut in Fig. 1.3, which only serves to transform a contour into one which
can be used to apply Cauchy’s theorem.
The choice of branch cut is in no way unique: any line (even a curved one) from z = 0 to infinity
would suit the purpose equally well. Its only goal is to lift the ambiguity with respect to the choice
of w. There are however often physical reasons which inspire the choice of branch cut.
Exercise 1.9. What does the Riemann surface of w = ln(z) look like?
1.5.3 Residues
Suppose z0 is an isolated singularity of f (z). We can develop this function in a Laurent series
around z0 :
$$f(z) = \sum_{m=-\infty}^{\infty} a_m (z - z_0)^m \tag{1.42}$$
The residue of f (z) at z0 is defined as $a_{-1}$, i.e. the coefficient of $1/(z - z_0)$ in the Laurent series.
Using Eq. 1.38, we immediately get
$$\mathrm{Res}_{z_0} = \frac{1}{2\pi j} \oint_C f(z)\,dz \tag{1.43}$$
Figure 1.8: Contour to prove residue theorem.
Suppose f (z) is analytic inside a contour C, except for some isolated singularities zk inside C (so,
branch cuts are not allowed, but e.g. poles are OK). Then
$$\oint_C f(z)\,dz = 2\pi j \sum_k \mathrm{Res}_{z_k} \tag{1.44}$$
The summation index k runs over all singularities enclosed by the contour. Note that no singular-
ity should lie on the contour itself.
To prove Eq. 1.44, we make use of the contour in Fig. 1.8, on which we can apply Cauchy I (the
straight segments cancel as before):
$$\oint_C f(z)\,dz - \sum_k \oint_{C_k} f(z)\,dz = 0 \tag{1.45}$$
The integral around any singular point can be written as (Eq. 1.43)
$$\oint_{C_k} f(z)\,dz = 2\pi j\, \mathrm{Res}_{z_k} \tag{1.46}$$
Combining Eq. 1.45 and Eq. 1.46 immediately proves the theorem.
The residue theorem is of enormous practical importance, as it allows us to replace the cumber-
some problem of evaluating contour integrals by the algebraic problem of calculating residues
at enclosed singular points. Because of its importance, we will now discuss a number of tech-
niques that allow us to easily calculate the residues, apart from writing down the Laurent series,
or applying Eq. 1.43. The proofs are left as an exercise.
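To see the residue theorem at work, one can compare a brute-force contour integral with the sum of residues. The sketch below uses the illustrative function f(z) = e^z / (z(z − 2)), which has simple poles at z = 0 (residue −1/2) and z = 2 (residue e²/2); both the function and the two circular contours are our own choices.

```python
import cmath
import math


def circle_integral(g, centre, radius, n=2000):
    """Integrate g counterclockwise around a circle, z = centre + radius*e^{j*theta}."""
    total = 0.0 + 0.0j
    dtheta = 2.0 * math.pi / n
    for k in range(n):
        e = cmath.exp(1j * k * dtheta)
        total += g(centre + radius * e) * (1j * radius * e) * dtheta
    return total


def f(z):
    """Illustrative integrand with simple poles at z = 0 and z = 2."""
    return cmath.exp(z) / (z * (z - 2.0))


res_at_0 = -0.5                  # e^0 / (0 - 2)
res_at_2 = math.exp(2.0) / 2.0   # e^2 / 2

small = circle_integral(f, 0.0, 1.0)   # encloses only the pole at z = 0
large = circle_integral(f, 0.0, 3.0)   # encloses both poles

print(small, 2j * math.pi * res_at_0)
print(large, 2j * math.pi * (res_at_0 + res_at_2))
```

The value of the integral changes only when the contour is enlarged past a singularity, in line with Eq. 1.44.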
Exercise 1.11. If z0 is a simple pole of f (z), use the Laurent series to prove that
Exercise 1.13. If z0 is a simple pole of f (z) = g(z)/h(z), i.e. z0 is a simple
zero of h(z) and g(z0 ) ≠ 0, show that
$$\mathrm{Res}_{z_0} = \frac{g(z_0)}{h'(z_0)}$$
Hint: write h(z) as r(z)(z − z0 ), and apply the result from Ex. 1.11.
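The formula of Ex. 1.13 can be cross-checked numerically without giving away the proof, by comparing it with the integral definition of Eq. 1.43. The sketch below uses the illustrative choice g(z) = e^z and h(z) = sin z around the simple zero z0 = π of h; the contour radius is chosen so that no other zero of sin z is enclosed.

```python
import cmath
import math


def numerical_residue(f, z0, radius=1.0, n=2000):
    """Residue from Eq. 1.43, evaluated on a small circle around z0."""
    total = 0.0 + 0.0j
    dtheta = 2.0 * math.pi / n
    for k in range(n):
        e = cmath.exp(1j * k * dtheta)
        total += f(z0 + radius * e) * (1j * radius * e) * dtheta
    return total / (2j * math.pi)


def residue_simple_pole(g, h_prime, z0):
    """Residue of g/h at a simple zero z0 of h, using g(z0)/h'(z0)."""
    return g(z0) / h_prime(z0)


z0 = math.pi   # simple zero of sin(z); the nearest other zeros are at 0 and 2*pi
from_formula = residue_simple_pole(cmath.exp, cmath.cos, z0)
from_contour = numerical_residue(lambda z: cmath.exp(z) / cmath.sin(z), z0)
print(from_formula, from_contour)
```

Both values should be close to −e^π, since cos π = −1.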
The residue theorem is not directly applicable to contours that are infinitely large or contours that
pass through a singularity. For these cases, the following limit theorems (which we will present
without proof) are of great practical importance.
Jordan’s lemma
For a contour lying on the ’proper’ side of a, i.e. the half plane not containing a + β* (see Fig. 1.9),
we have that if
then
$$\lim_{R \to \infty} \int_{C_R} f(z)\, e^{\beta z}\,dz = 0 \tag{1.48}$$
Note that for this theorem (and also for the next limit theorems) CR only includes the circular part
of the contour and not the straight line segments.
The condition that the contour should lie in the proper half plane is to ensure that $e^{\beta z}$ does not
blow up towards infinity, as can be easily verified for the case where a = 0 and β = jb.
For a circular contour spanning an angle α ≤ π with top at z = a (see Fig. 1.10), we have that if
Figure 1.9: Jordan’s lemma.
For a circular contour spanning an angle α ≤ π with top at z = a (see Fig. 1.10), we have that if
For the special case where f (z) has a simple pole at z = a, we can easily prove this by integrating
the Laurent series term by term.
Figure 1.11: The function 1/x.
Strictly speaking, the same is true for the integral $\int_{-1}^{1} dx/x$, as this should be seen as
$$\lim_{\epsilon_1 \to 0} \int_{-1}^{-\epsilon_1} \frac{dx}{x} + \lim_{\epsilon_2 \to 0} \int_{\epsilon_2}^{1} \frac{dx}{x} \tag{1.53}$$
because each term separately diverges, and the limits are taken independently. However, Fig. 1.11
suggests that both singularities will cancel out. Mathematically, this means that
$$\lim_{\epsilon \to 0} \int_{-1}^{-\epsilon} \frac{dx}{x} + \lim_{\epsilon \to 0} \int_{\epsilon}^{1} \frac{dx}{x} = 0 \tag{1.54}$$
if both limits are taken at the same time. This is written succinctly as
$$PV \int_{-1}^{1} \frac{dx}{x} = 0 \tag{1.55}$$
where PV stands for the Cauchy principal value and represents the balancing process of Eq. 1.54.
However, $\int_{-1}^{1} dx/x$ still diverges.
The same balancing process can also be applied to limits at infinity:
$$PV \int_{-\infty}^{\infty} f(x)\,dx = \lim_{a \to \infty} \int_{-a}^{a} f(x)\,dx \tag{1.56}$$
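The difference between taking the two limits independently (Eq. 1.53) and simultaneously (Eq. 1.54) can be illustrated with a few lines of code. The sketch below uses the known antiderivative ln|x| of 1/x instead of numerical quadrature; the particular cut-off values, and the ratio of 2 between the two cut-offs in the unbalanced case, are illustrative assumptions.

```python
import math


def integral_one_over_x(a, b):
    """Exact integral of 1/x over [a, b], valid when 0 lies outside the interval."""
    if a * b <= 0:
        raise ValueError("interval must not contain or touch x = 0")
    return math.log(abs(b)) - math.log(abs(a))


def split_integral(eps1, eps2):
    """Sum of the two pieces in Eq. 1.53 for finite cut-offs eps1 and eps2."""
    return integral_one_over_x(-1.0, -eps1) + integral_one_over_x(eps2, 1.0)


for eps in (1.0e-2, 1.0e-4, 1.0e-8):
    balanced = split_integral(eps, eps)          # symmetric cut-offs: principal value
    unbalanced = split_integral(eps, 2.0 * eps)  # cut-offs shrink at different rates
    print(eps, balanced, unbalanced)
```

With symmetric cut-offs the sum is zero for every ε, whereas with the unbalanced cut-offs it stays at −ln 2. The result therefore depends on how the two limits are related, which is exactly why the integral without the PV prescription is not defined.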
Now we have all the necessary tools in our box to tackle the important problem of calculating real–
valued integrals by means of complex contour integration. We will present a number of examples
which illustrate some typical techniques used in this respect. After that, we will also present an
important application in the domain of photonics, namely the Kramers–Kronig dispersion rela-
tions.
Example 1
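Consider integrals of the form
$$\int_0^{2\pi} f(\cos\theta, \sin\theta)\,d\theta \tag{1.57}$$
With the substitution $z = e^{j\theta}$, we have
$$\cos\theta = \frac{z + 1/z}{2} \tag{1.58}$$
$$\sin\theta = \frac{z - 1/z}{2j} \tag{1.59}$$
$$d\theta = \frac{1}{j} \frac{dz}{z} \tag{1.60}$$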
The integration contour C is the unit circle, and the integral can easily be evaluated using the
residue theorem.
As an example, we will calculate
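$$I = \int_0^{2\pi} \frac{d\theta}{1 + \epsilon \cos\theta}, \qquad |\epsilon| < 1 \tag{1.62}$$
Using Eq. 1.58 and 1.60, this becomes
$$I = \oint_C \frac{1}{1 + \frac{\epsilon}{2} \left( z + 1/z \right)}\, \frac{1}{j} \frac{dz}{z} \tag{1.63}$$
or
$$I = -\frac{2j}{\epsilon} \oint_C \frac{dz}{z^2 + (2/\epsilon) z + 1} \tag{1.64}$$
The integrand has poles at
$$z = -\frac{1}{\epsilon} \pm \frac{1}{\epsilon} \sqrt{1 - \epsilon^2} \tag{1.65}$$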
Figure 1.12: Contour for example 2.
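Only the pole with the plus sign lies within the unit circle. By using the formula from Ex. 1.11 to
calculate the residue there, we get
$$I = -\frac{2j}{\epsilon} \cdot 2\pi j \left[ \frac{1}{z - \left( -\frac{1}{\epsilon} - \frac{1}{\epsilon} \sqrt{1 - \epsilon^2} \right)} \right]_{z = -\frac{1}{\epsilon} + \frac{1}{\epsilon} \sqrt{1 - \epsilon^2}} \tag{1.66}$$
or
$$I = -\frac{2j}{\epsilon} \cdot 2\pi j \cdot \frac{1}{\frac{2}{\epsilon} \sqrt{1 - \epsilon^2}} \tag{1.67}$$
So finally
$$\int_0^{2\pi} \frac{d\theta}{1 + \epsilon \cos\theta} = \frac{2\pi}{\sqrt{1 - \epsilon^2}}, \qquad |\epsilon| < 1 \tag{1.68}$$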
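It is good practice to check a result such as Eq. 1.68 against direct numerical integration. The sketch below does so with an equally spaced sum over one period; the sample values of ε are arbitrary illustrative choices within |ε| < 1.

```python
import math


def angular_integral(eps, n=2000):
    """Equally spaced sum for the integral of 1/(1 + eps*cos(theta)) over one period."""
    dtheta = 2.0 * math.pi / n
    return sum(dtheta / (1.0 + eps * math.cos(k * dtheta)) for k in range(n))


def residue_result(eps):
    """Closed form of Eq. 1.68."""
    return 2.0 * math.pi / math.sqrt(1.0 - eps * eps)


for eps in (0.0, 0.5, -0.8):
    print(eps, angular_integral(eps), residue_result(eps))
```

For ε = 0 both expressions reduce to 2π, which is a useful limiting case to keep in mind.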
Example 2
Let’s calculate
$$\int_0^{\infty} \frac{\cos \lambda x}{1 + x^2}\,dx, \qquad \lambda > 0 \tag{1.69}$$
To do this, we will calculate the real part of the following contour integral [1] (Fig. 1.12):
$$I = \oint_C \frac{e^{j\lambda z}}{1 + z^2}\,dz \tag{1.70}$$
[1] Alternatively, we could make use of $\cos z = (e^{jz} + e^{-jz})/2$, but this would require two different contours for each
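The only pole enclosed by the contour is at z = j, with residue $e^{-\lambda}/(j + j)$, such that the residue
theorem leads to
$$\int_{-R}^{R} \frac{e^{j\lambda x}}{1 + x^2}\,dx + \int_{C_R} \frac{e^{j\lambda z}}{1 + z^2}\,dz = 2\pi j\, \frac{e^{-\lambda}}{2j} = \pi e^{-\lambda} \tag{1.71}$$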
Let’s take the limit of R → ∞. Can we apply Jordan’s lemma to the contribution of CR ? If λ > 0,
the contour lies indeed in the ’proper’ part of the complex plane. For the second condition, we
need to calculate the following limit:
1 1
lim 2
= lim (1.72)
z→∞ 1 + z z→∞ 1 + |z| e2j arg z
2
Since e2j arg z always stays bounded, this limit goes to zero for all directions in the complex plane.
So, we can apply Jordan’s lemma and the contribution from CR vanishes:
\[ \int_{-\infty}^{\infty} \frac{e^{j\lambda x}}{1 + x^2}\, dx = \pi e^{-\lambda} \tag{1.73} \]
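Taking the real part of Eq. 1.73 and using the fact that the integrand is even in x, the integral of Eq. 1.69 should equal (π/2)e^{−λ}. The following Python sketch checks this numerically with a composite Simpson rule. The truncation point `x_max` and the number of subintervals are assumptions of this illustration; the truncated tail limits the accuracy to roughly 1e-5.

```python
import math

def simpson(f, a, b, n):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0

lam = 1.0
x_max = 400.0  # the infinite upper limit is truncated here (illustrative choice)

numeric = simpson(lambda x: math.cos(lam * x) / (1.0 + x * x), 0.0, x_max, 400_000)
expected = 0.5 * math.pi * math.exp(-lam)
print(numeric, expected)
```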
Example 3
\[ \int_0^\infty \frac{\sqrt{x}}{x^2 + a^2}\, dx, \qquad a > 0 \tag{1.76} \]
We stick to the traditional choice of branch cut, namely the negative real axis (Fig. 1.13). The
contour is placed ’just above’ this branch cut. In order to avoid the branch point at z = 0, we also
make a small detour around the origin.
Figure 1.13: Contour for example 3.
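\[ \begin{aligned}
I = \oint_C \frac{z^{1/2}}{z^2 + a^2}\, dz &= 2\pi j\, \mathrm{Res}_{ja} \\
&= 2\pi j \frac{(ja)^{1/2}}{ja + ja} \\
&= \pi \frac{\sqrt{a}\, e^{\frac{1}{2} j \frac{\pi}{2}}}{a} \\
&= \frac{\pi}{\sqrt{a}}\, e^{j\pi/4}
\end{aligned} \tag{1.77} \]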
a
We use the big limit theorem to calculate the contribution of the outer circular part CR to the
integral. For that, we calculate the following limit:
\[ \lim_{z \to \infty} \frac{z^{1/2}}{z^2 + a^2} \cdot z = \lim_{z \to \infty} \frac{\sqrt{|z|}\, e^{j \arg z / 2}}{|z|^2 e^{2j \arg z} + a^2} \cdot |z| e^{j \arg z} \tag{1.78} \]
Just as before, the complex exponentials involving arg z always stay bounded, so this limit goes to
zero and the contribution from this part of the contour vanishes.
Similarly, we use the small limit theorem to show that the contribution for the semicircle around
the branch point vanishes:
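\[ \lim_{z \to 0} \frac{z^{1/2}}{z^2 + a^2} \cdot z = \frac{0}{a^2} = 0 \tag{1.79} \]
Only the straight part of the contour along the real axis remains:
\[ I = \int_{-\infty}^{0} \frac{z^{1/2}}{z^2 + a^2}\, dz + \int_{0}^{\infty} \frac{z^{1/2}}{z^2 + a^2}\, dz \tag{1.80} \]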
Returning from z to x, we get with our choice of branch cut (see the calculation of w+ in the section
on branch cuts) that
Figure 1.14: Contour for Kramers–Kronig dispersion relations.
\[ z^{1/2} = \begin{cases} \sqrt{x}, & x > 0 \\ j\sqrt{-x}, & x < 0 \end{cases} \tag{1.81} \]
such that
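\[ \begin{aligned}
I &= \int_{-\infty}^{0} \frac{j\sqrt{-x}}{x^2 + a^2}\, dx + \int_{0}^{\infty} \frac{\sqrt{x}}{x^2 + a^2}\, dx \\
&= (1 + j) \int_{0}^{\infty} \frac{\sqrt{x}}{x^2 + a^2}\, dx \\
&= \frac{\pi}{\sqrt{a}}\, e^{j\pi/4}
\end{aligned} \tag{1.82} \]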
So finally
\[ \int_0^\infty \frac{\sqrt{x}}{x^2 + a^2}\, dx = \frac{\pi}{\sqrt{2a}} \tag{1.83} \]
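As a sanity check on Eq. 1.83, the integral can be evaluated numerically. The integrand only decays like x^{−3/2}, so this sketch first substitutes x = e^s (a choice made for this illustration, not taken from the course text), which turns the integral into one over the whole real s–axis with exponentially decaying tails. The integration limits and step size are illustrative assumptions.

```python
import math

def example3_numeric(a, s_min=-40.0, s_max=80.0, h=0.01):
    """Trapezoidal estimate of the integral of sqrt(x)/(x^2 + a^2) over [0, inf).

    With x = exp(s), dx = exp(s) ds, the integrand becomes
    exp(1.5*s) / (exp(2*s) + a^2), which decays exponentially on both sides.
    """
    n = int(round((s_max - s_min) / h))
    total = 0.0
    for k in range(n + 1):
        s = s_min + k * h
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * math.exp(1.5 * s) / (math.exp(2.0 * s) + a * a)
    return total * h

for a in (0.5, 1.0, 2.0):
    print(a, example3_numeric(a), math.pi / math.sqrt(2.0 * a))
```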
Suppose f (z) is an analytic function with lim∞ f (z) = 0 in the upper half of the complex plane.
Applying Cauchy I to the contour in Fig. 1.14 leads to
\[ \oint_C \frac{f(z)}{z - x_0}\, dz = 0 \tag{1.84} \]
The integral over the large semi–circle vanishes because of the big limit theorem. For the con-
tribution of the small semi–circle around the pole x0 we could use the small limit theorem, or
alternatively calculate it directly:
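\[ \lim_{\epsilon \to 0} \int_{-\infty}^{x_0 - \epsilon} \frac{f(x)}{x - x_0}\, dx + \lim_{\epsilon \to 0} \int_{\pi}^{0} \frac{f(x_0 + \epsilon e^{j\theta})}{\epsilon e^{j\theta}}\, j \epsilon e^{j\theta}\, d\theta + \lim_{\epsilon \to 0} \int_{x_0 + \epsilon}^{\infty} \frac{f(x)}{x - x_0}\, dx = 0 \tag{1.85} \]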
This leads to
\[ f(x_0) = \frac{1}{\pi j}\, PV \int_{-\infty}^{\infty} \frac{f(x)}{x - x_0}\, dx \tag{1.86} \]
Exercise 1.14. Show that Eq. 1.86 can also be obtained by choosing a contour that
circles around x0 in the lower half of the complex plane.
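Writing f(x) = u(x) + jv(x) and separating the real and imaginary parts of Eq. 1.86, we get
\[ u(x_0) = \frac{1}{\pi}\, PV \int_{-\infty}^{\infty} \frac{v(x)}{x - x_0}\, dx \tag{1.88a} \]
\[ v(x_0) = -\frac{1}{\pi}\, PV \int_{-\infty}^{\infty} \frac{u(x)}{x - x_0}\, dx \tag{1.88b} \]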
Equations 1.88 are called the Kramers–Kronig dispersion relations. They state that under the con-
ditions given above, the real part of a function can be expressed as an integral over the imaginary
part and vice–versa. By definition, equations 1.88 also express that the real and imaginary parts
are Hilbert transforms of each other.
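To make Eqs. 1.88 concrete, the sketch below checks them numerically for a test function chosen for this illustration: f(z) = 1/(z + j), which is analytic in the upper half plane and vanishes at infinity. On the real axis it has u(x) = x/(x² + 1) and v(x) = −1/(x² + 1). The principal value is computed by pairing the points x0 + s and x0 − s, so that the singularity cancels; the cut-off `s_max` and step `h` are assumptions of the sketch.

```python
import math

def u(x):
    """Real part of f(x) = 1/(x + j) on the real axis."""
    return x / (x * x + 1.0)

def v(x):
    """Imaginary part of f(x) = 1/(x + j) on the real axis."""
    return -1.0 / (x * x + 1.0)

def pv_integral(g, x0, s_max=200.0, h=0.01):
    """Principal value of the integral of g(x)/(x - x0) over the real line.

    Pairing x0 + s with x0 - s gives the regular integrand
    (g(x0 + s) - g(x0 - s)) / s on s > 0. The midpoint rule never samples s = 0.
    """
    n = int(round(s_max / h))
    total = 0.0
    for k in range(n):
        s = (k + 0.5) * h
        total += (g(x0 + s) - g(x0 - s)) / s
    return total * h

x0 = 0.5
u_from_v = pv_integral(v, x0) / math.pi    # Eq. 1.88a
v_from_u = -pv_integral(u, x0) / math.pi   # Eq. 1.88b
print(u_from_v, u(x0))
print(v_from_u, v(x0))
```

The second relation converges more slowly with `s_max`, because u(x) only decays like 1/x, so its agreement is limited to about two or three digits with these settings.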
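In photonics, we apply these relations to the function f(z) = χ(ω) = n(ω)² − 1. (Remember from
the course “Optical Materials” that P(ω) = ε0 χ(ω)E(ω).) Replacing x0 by ω, and x by ω′, we get
\[ \Re\chi(\omega) = \frac{1}{\pi}\, PV \int_{-\infty}^{\infty} \frac{\Im\chi(\omega')}{\omega' - \omega}\, d\omega' \tag{1.89a} \]
\[ \Im\chi(\omega) = -\frac{1}{\pi}\, PV \int_{-\infty}^{\infty} \frac{\Re\chi(\omega')}{\omega' - \omega}\, d\omega' \tag{1.89b} \]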
Remember that the imaginary part of the refractive index n is a measure for optical absorption.
So, this means e.g. that in theory it is enough to measure the losses of a material over a wide
wavelength range to calculate its refractive index.
Exercise 1.15. Suppose you have a material without dispersion, i.e. where the real part of the
squared refractive index is constant. Show that such a material is lossless,
i.e. the imaginary part of the squared refractive index is zero.
Recall the definition of the susceptibility in the time domain:
\[ P(t) = \varepsilon_0 \int_{-\infty}^{t} \chi(t - t') E(t')\, dt'. \tag{1.90} \]
From this, we can easily see that χ(t) is necessarily real, as complex numbers only show up in the
frequency domain after the introduction of phasors. From this, one can derive that for its Fourier
transform χ(−ω) = χ∗ (ω), which in turn has some implications on the even and odd character of
the real and the imaginary parts of χ(ω). These can be used to reformulate the Kramers–Kronig
relations in another common form:
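Exercise 1.16. Given that χ(t) is real-valued, show that the Kramers–Kronig dispersion
relations reduce to
\[ \Re\chi(\omega) = \frac{2}{\pi}\, PV \int_{0}^{\infty} \frac{\omega'\, \Im\chi(\omega')}{\omega'^2 - \omega^2}\, d\omega' \]
\[ \Im\chi(\omega) = -\frac{2}{\pi}\, PV \int_{0}^{\infty} \frac{\omega\, \Re\chi(\omega')}{\omega'^2 - \omega^2}\, d\omega' \]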
Finally, it’s important to note that one can prove (the Titchmarsh theorem) that there is a relation
between the Kramers–Kronig dispersion relations and the fact that the systems considered are
causal (i.e. effect cannot precede cause). Thus, there is an immediate physical significance of the
Kramers–Kronig dispersion relations.
Exercises
Figure 1.15: Conformal transformation.
A complex function w = f (z) = u(x, y) + jv(x, y) can be seen as a mapping from the complex
z–plane to the complex w–plane. A typical way to visualise this is to plot how a curve Cz in the
z–plane is transformed to a curve Cw in the w–plane (see Fig. 1.15).
As long as w = f (z) is an analytic function, we have
\[ \frac{df}{dz} = \frac{dw}{dz} = \lim_{\Delta z \to 0} \frac{\Delta w}{\Delta z} \tag{1.91} \]
By putting this expression in polar form and looking at the angle, we can derive an expression
relating the angles of the tangents at the points z0 and w0 :
\[ \arg \frac{df}{dz} = \arg \lim_{\Delta z \to 0} \frac{\Delta w}{\Delta z} = \arg \lim_{\Delta z \to 0} \Delta w - \arg \lim_{\Delta z \to 0} \Delta z \tag{1.92} \]
For an analytic function arg df /dz = α (the angle of the derivative) depends on z, but for a given z
it is independent of the direction of approach. So, if in Fig. 1.15 the angle of the increment ∆z with
respect to the x–axis is θ, and the increment ∆w forms an angle φ with the u axis, we can relate
these angles by
φ=θ+α (1.93)
This means that an analytic transformation will rotate any line in the z–plane over an angle of α
in the w–plane. Since this result holds for any line through z0 , it holds for any pair of lines, such
that for the angle between these two lines we get
\[ \phi_2 - \phi_1 = (\theta_2 + \alpha) - (\theta_1 + \alpha) = \theta_2 - \theta_1 \]
which means that such a transformation preserves the angle between any pair of lines.
Such an angle–preserving transformation defined by an analytic function is called a conformal
transformation.
Conformal transformations can be useful to solve the Helmholtz equation in a ’difficult’ coordi-
nate system. The idea is to use a conformal mapping to transform the coordinate system to a
different one where the solution is more tractable. In this section, we will investigate what form
the Helmholtz equation takes in the new domain.
Suppose our old (x, y)–coordinate system is related to a new (u, v)–coordinate system by a conformal
transformation f:
\[ w = u + jv = f(z) = f(x + jy) \]
In the z–plane, the Helmholtz equation in media with piecewise constant refractive index has the
following form:
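\[ \frac{\partial^2 \psi(x, y)}{\partial x^2} + \frac{\partial^2 \psi(x, y)}{\partial y^2} + k_0^2 n^2(x, y) \psi(x, y) = 0 \tag{1.96} \]
Applying the chain rule, we get
\[ \frac{\partial \psi}{\partial x} = \frac{\partial \psi}{\partial u} \frac{\partial u}{\partial x} + \frac{\partial \psi}{\partial v} \frac{\partial v}{\partial x} \tag{1.97} \]
\[ \frac{\partial^2 \psi}{\partial x^2} = \frac{\partial}{\partial x}\left( \frac{\partial \psi}{\partial u} \right) \frac{\partial u}{\partial x} + \frac{\partial \psi}{\partial u} \frac{\partial^2 u}{\partial x^2} + \frac{\partial}{\partial x}\left( \frac{\partial \psi}{\partial v} \right) \frac{\partial v}{\partial x} + \frac{\partial \psi}{\partial v} \frac{\partial^2 v}{\partial x^2} \tag{1.98} \]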
Or
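\[ \frac{\partial^2 \psi}{\partial x^2} = \left( \frac{\partial^2 \psi}{\partial u^2} \frac{\partial u}{\partial x} + \frac{\partial^2 \psi}{\partial u \partial v} \frac{\partial v}{\partial x} \right) \frac{\partial u}{\partial x} + \frac{\partial \psi}{\partial u} \frac{\partial^2 u}{\partial x^2} + \left( \frac{\partial^2 \psi}{\partial v^2} \frac{\partial v}{\partial x} + \frac{\partial^2 \psi}{\partial u \partial v} \frac{\partial u}{\partial x} \right) \frac{\partial v}{\partial x} + \frac{\partial \psi}{\partial v} \frac{\partial^2 v}{\partial x^2} \tag{1.99} \]
Adding the analogous expression for the y–derivative gives
\[ \begin{aligned}
\frac{\partial^2 \psi}{\partial x^2} + \frac{\partial^2 \psi}{\partial y^2} ={}& \frac{\partial^2 \psi}{\partial u^2} \left[ \left( \frac{\partial u}{\partial x} \right)^2 + \left( \frac{\partial u}{\partial y} \right)^2 \right] + \frac{\partial \psi}{\partial u} \left[ \frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} \right] + 2 \frac{\partial^2 \psi}{\partial u \partial v} \frac{\partial u}{\partial x} \frac{\partial v}{\partial x} \\
&+ \frac{\partial^2 \psi}{\partial v^2} \left[ \left( \frac{\partial v}{\partial x} \right)^2 + \left( \frac{\partial v}{\partial y} \right)^2 \right] + \frac{\partial \psi}{\partial v} \left[ \frac{\partial^2 v}{\partial x^2} + \frac{\partial^2 v}{\partial y^2} \right] + 2 \frac{\partial^2 \psi}{\partial u \partial v} \frac{\partial u}{\partial y} \frac{\partial v}{\partial y}
\end{aligned} \tag{1.100} \]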
We know from Ex. 1.2 that for a holomorphic function ∇2 u = ∇2 v = 0, so the second and the fifth
term on the right–hand–side of Eq. 1.100 vanish. Also, because of the Cauchy–Riemann equations
we can write
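\[ \left( \frac{\partial u}{\partial x} \right)^2 + \left( \frac{\partial u}{\partial y} \right)^2 = \left( \frac{\partial v}{\partial x} \right)^2 + \left( \frac{\partial v}{\partial y} \right)^2 = \left( \frac{\partial u}{\partial x} \right)^2 + \left( \frac{\partial v}{\partial x} \right)^2 \overset{\mathrm{def}}{=} \frac{1}{T(u, v)^2} \tag{1.101} \]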
It is easy to see that terms three and six cancel as well because of the Cauchy–Riemann conditions.
So finally, we can write the Helmholtz equation in the transformed domain as
\[ \frac{\partial^2 \psi(u, v)}{\partial u^2} + \frac{\partial^2 \psi(u, v)}{\partial v^2} + k_0^2 T^2(u, v) n^2(u, v) \psi(u, v) = 0 \tag{1.102} \]
Formally, this is identical to the Helmholtz equation in the z–plane, except that the index distribution
n has been replaced by a transformed index distribution T · n.
To give an example of the power of conformal transformations, we will use it to find the eigen-
modes of a bent dielectric waveguide (Fig. 1.16). We consider the system to be invariant in the
z–direction.
To tackle this problem, we could formulate the Helmholtz equation in cylindrical coordinates, and
the solution would involve Bessel functions which we will introduce in the next chapter. Here
however, we will use the following conformal transformation to convert the curved geometry to
a straight one:
\[ w = R_t \ln \frac{z}{R_t} \tag{1.103} \]
With z = re^{jθ}, this becomes
\[ u + jv = R_t \ln \frac{r}{R_t} + j R_t \theta \tag{1.104} \]
This clearly shows the effect of the transformation: a circular path in the z–plane with constant r
is transformed to a straight path in the w–plane with constant u (see Fig. 1.16).
Let’s now calculate the transformation factor T as defined in Eq. 1.101.
Figure 1.16: Calculating bend modes using conformal transformation.
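\[ \frac{\partial u}{\partial x} = \frac{R_t}{r} \frac{\partial r}{\partial x} = \frac{R_t}{\sqrt{x^2 + y^2}} \cdot \frac{2x}{2\sqrt{x^2 + y^2}} = R_t \frac{x}{x^2 + y^2} \tag{1.105} \]
Also,
\[ \frac{\partial v}{\partial x} = R_t \frac{\partial}{\partial x} \arctan \frac{y}{x} = \frac{R_t}{1 + \frac{y^2}{x^2}} \cdot y \cdot \frac{-1}{x^2} = -R_t \frac{y}{x^2 + y^2} \tag{1.106} \]
So finally
\[ T = \frac{1}{\sqrt{u_x^2 + v_x^2}} = \frac{\sqrt{x^2 + y^2}}{R_t} = \frac{r}{R_t} = e^{u / R_t} \tag{1.107} \]
In the new coordinate system, the transformed index profile looks like
\[ n_t(u) = n(u)\, e^{u / R_t} \tag{1.108} \]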
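The transformation factor of Eq. 1.107 can be verified numerically. The sketch below evaluates the logarithmic map of Eq. 1.103 with Python's `cmath` module and estimates ∂w/∂x = u_x + jv_x with a central difference. The value of `R_T`, the test point and the step size are arbitrary assumptions made for this illustration.

```python
import cmath
import math

R_T = 2.0  # assumed value for the transformation radius R_t (arbitrary units)

def w_of_z(z):
    """Conformal map of Eq. 1.103: w = R_t ln(z / R_t)."""
    return R_T * cmath.log(z / R_T)

def t_numeric(x, y, h=1e-6):
    """T = 1/sqrt(u_x^2 + v_x^2) (Eq. 1.101), with dw/dx from a central difference."""
    dw_dx = (w_of_z(complex(x + h, y)) - w_of_z(complex(x - h, y))) / (2.0 * h)
    return 1.0 / abs(dw_dx)

x, y = 3.0, 4.0                      # test point, away from the branch cut of the logarithm
r = math.hypot(x, y)
u = w_of_z(complex(x, y)).real
print(t_numeric(x, y), r / R_T, math.exp(u / R_T))
```

All three printed values should coincide, which illustrates that the transformed index n·T grows exponentially with u, i.e. towards the outside of the bend.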
This profile is sketched in Fig. 1.16. The rest of the solution now proceeds as follows. The contin-
uous index profile is approximated by a stepwise constant profile with a large number of steps.
The modes of such a waveguide can be readily found by using standard techniques (see e.g. the
course ’Microphotonics’).
Even without calculating the modes of the transformed waveguide explicitly, we can gain quali-
tative insight in their behaviour by looking at the new index profile. Because the refractive index
is higher near the outside of the bend, the mode will tend to be concentrated there. For very short
bend radii, the mode will leak towards the outer cladding, where the index is even higher. This is
the origin of the radiation loss in waveguide bends.
Augustin Cauchy (1789–1857)
This would not be the last time his political views would get him into trouble. In 1830 after the
overthrow of King Charles X, all members of the Academy were obligated to swear an oath of
allegiance to the new king. Having already taken an oath to Charles, Cauchy refused. He was
removed from his position and self–exiled to Switzerland without his family. There he became a
professor at the University of Turin and planned to spend the rest of his life working on mathemat-
ics. That was not to be. Two years later, Charles X, now in exile, asked the professor to supervise
the education of his heir Henri. Being a good royalist, he agreed and was joined with his family
in Vienna. His new duties overwhelmed him and his mathematical work lessened to a trickle. He
found his escape in 1838 when he returned to Paris. Before he left, the king had given him the
impressive sounding but practically useless title of baron. He still refused to take the oath and
constantly struggled to find and hold a position. Finally in 1848, the oath was abolished and he
resumed his old posts. Recognising his value to the Academy, he was exempted when the oath was
reestablished in 1852.
Augustin Cauchy died on May 23, 1857, after contracting a fever on a trip to the country to help
restore his health. His last words were, ”Men die but their works endure.”
Cauchy’s life was one as unusual and complicated as the times he lived in. Brought up as a devout
Catholic in a time most Frenchmen were opposed to the Church, he suffered prejudice from many
people. However, the discrimination did not discourage him from engaging in his life’s favourite
hobby, charity. When he was not involved in some math problem, he was often working on
some new mercy mission for the less fortunate. On the other hand, he could be bigoted against
those who did not hold his religious views. For example, part of the reason Cauchy delayed the
publication of fellow mathematician Niels Abel’s masterpiece was because the latter called him a
”bigoted Catholic.”
He also tended to be just as opinionated in matters of politics. A supporter of the monarchy,
he came into direct conflict with the supporters of both the republic and Napoleon. Again, he
was both discriminated against and prejudiced against others. On one hand, his life was put
into constant turmoil because of the affair with the oath. On the other hand, he helped repress
the mathematical work of Évariste Galois because the latter was a radical republican. Certainly,
Cauchy led a very complicated and intricate life.
Cauchy is famous in the field of mathematics for two main reasons: his numerous contributions
to the science and his immense publishing. His works spanned every branch of mathematics and
are simply too long to list. He is especially famous for his works with convergent series and rigour
in analysis. Early in his career, Cauchy developed the criteria for determining if an infinite series
is convergent or divergent. While attending a lecture on the subject, it is told that Laplace be-
came panicked and rushed home. He had just finished his masterpiece that used infinite series
as its backbone and desperately checked each one for convergence, which they did. Cauchy's second
great contribution was setting the groundwork for rigour in analysis and all of mathematics.
Rigour is discovery of the logical foundations of a science. Over the previous centuries, mathe-
maticians had tried in vain to discover what were the underlying principles of calculus and many
had asserted that Newton’s discovery was flawed. Cauchy took the first step toward unifying
the science. First, he defined continuity and derivative in terms of the limit. Second, he gave the
first good definition of the limit as : ”When the values successively assigned to the same variable
indefinitely approach a fixed value, so as to end by differing from it as little as desired, this fixed
value is called the limit of all the others.” Though this is not a mathematical definition, it is a good
approximation of the idea, which would be further clarified by future mathematicians. Other
important works include determinants, polygonal numbers, complex numbers and the theory of
substitutions.
Cauchy is also famous for his writings. Simply put, he overwhelmed the mathematics world with
the number and size of his works. All in all, his total output included 789 full length papers, one of
the largest outputs ever. It was not uncommon for him to finish two such papers in one week. In
addition, these works tended to be rather long, sometimes extending for over 300 pages. In fact,
after submitting several large papers to be published in the weekly bulletin, the Academy was
forced to limit submissions to four pages to save their small budget from Cauchy’s pen. However,
all this writing did get his work out into the public and spread his ideas. A lot of his fame can be
attributed to the fact that he simply overwhelmed all his competitors on the bookshelves. Because
of this fact, his name is prominent in almost any analysis textbook.
(Paul Golba, from http://www.shu.edu/projects/reals/history/cauchy.html)
Chapter 2
Special functions
We must admit with humility that, while number is purely a product of our minds,
space has a reality outside our minds, so that we cannot completely prescribe its prop-
erties a priori.
– Carl Friedrich Gauss, letter to Bessel
Contents
2.1 Bessel functions
2.2 Hermite polynomials
In this chapter, we will solve the wave equations for a few special geometries. Firstly for the case
of a medium with cylindrical symmetry, which will allow us to introduce Bessel, Neumann and
Hankel functions. Secondly, we will look at higher order solutions of the paraxial wave equation,
which requires the use of Hermite polynomials.
In the previous chapter, we derived the Helmholtz equation Eq. 1.11. Writing this in cylindrical
coordinates gives
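\[ \frac{\partial^2 \psi}{\partial r^2} + \frac{1}{r} \frac{\partial \psi}{\partial r} + \frac{1}{r^2} \frac{\partial^2 \psi}{\partial \theta^2} + \frac{\partial^2 \psi}{\partial z^2} + k_0^2 n^2 \psi = 0 \tag{2.1} \]
Here, ψ can represent any of the Cartesian components of the electric or magnetic field, or the
axial components Ez and Hz in cylindrical coordinates.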
We assume that we are looking at a region of space where n is constant. Let’s try a solution of the
following form, with kθ and kz constant:
Later in this chapter, we will deal with the physical interpretation behind this expression. For the
time being, let’s just substitute it in the Helmholtz equation. This leads to
Or
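\[ r^2 \frac{d^2 R(r)}{dr^2} + r \frac{dR(r)}{dr} + \left[ r^2 \left( k_0^2 n^2 - k_z^2 \right) - k_\theta^2 \right] R(r) = 0 \tag{2.4} \]
With the change of variable
\[ x \overset{\mathrm{def}}{=} r \sqrt{k_0^2 n^2 - k_z^2} = k_t r \tag{2.5} \]
and writing X(x) = R(r) and ν = kθ, this becomes
\[ x^2 \frac{d^2 X(x)}{dx^2} + x \frac{dX(x)}{dx} + \left( x^2 - \nu^2 \right) X(x) = 0 \tag{2.6} \]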
Eq. 2.6 is called Bessel’s differential equation. A solution to this equation is given by Jν (x), the
Bessel function of the first kind of order ν.
In what follows, we will mostly consider the important special case where ν is an integer. In
this situation, we will use the notation Jn (x), where n is the integer version of ν and is not to be
confused with the refractive index n in Eq. 2.1.
To investigate the properties of Bessel functions, it is instructive to define them through a generat-
ing function. Let’s introduce a function of two variables:
When we expand this function in a Laurent series in t, we define the coefficients as being the Bessel
functions:
\[ e^{\frac{x}{2}\left(t - \frac{1}{t}\right)} = \sum_{n=-\infty}^{\infty} J_n(x)\, t^n \tag{2.8} \]
Figure 2.1: The Bessel functions J0 (x), J1 (x) and J2 (x).
Later, we will show that the functions Jn (x) defined in this way do indeed satisfy Bessel’s differ-
ential equation Eq. 2.6. For now, let’s start with deriving a series expansion for Jn (x) by expanding
the exponentials in Eq. 2.8:
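\[ e^{xt/2} \cdot e^{-x/2t} = \sum_{r=0}^{\infty} \left( \frac{x}{2} \right)^r \frac{t^r}{r!} \cdot \sum_{s=0}^{\infty} (-1)^s \left( \frac{x}{2} \right)^s \frac{t^{-s}}{s!} \tag{2.9} \]
Introducing a new variable n = r − s and getting rid of r, we can reorganise this equation to give
\[ e^{xt/2} \cdot e^{-x/2t} = \sum_{n=-\infty}^{\infty} \left[ \sum_{s=0}^{\infty} \left( \frac{x}{2} \right)^{n+s} \frac{1}{(n+s)!} \cdot (-1)^s \left( \frac{x}{2} \right)^s \frac{1}{s!} \right] t^n \tag{2.10} \]
Comparing this with Eq. 2.8, we find
\[ J_n(x) = \sum_{s=0}^{\infty} \frac{(-1)^s}{s!\,(n+s)!} \left( \frac{x}{2} \right)^{n+2s} \tag{2.11} \]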
This series expansion can be used to numerically calculate the Bessel functions. Fig. 2.1 plots J0 (x),
J1 (x), J2 (x). These functions oscillate, but are not periodic.
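As a rough illustration of how Eq. 2.11 can be used numerically, the sketch below sums the series in plain Python and compares a truncated version of the Laurent sum of Eq. 2.8 with the generating function itself. The number of terms and the truncation order `n_max` are assumptions of this sketch. For negative orders it relies on the standard identity J_{−n}(x) = (−1)^n J_n(x), which is not derived in the text above.

```python
import math

def bessel_j(n, x, terms=40):
    """J_n(x) for integer n from the power series of Eq. 2.11 (moderate |x| only)."""
    if n < 0:
        # Standard identity for integer order, assumed here: J_{-n} = (-1)^n J_n.
        return (-1) ** (-n) * bessel_j(-n, x, terms)
    return sum((-1) ** s / (math.factorial(s) * math.factorial(n + s))
               * (x / 2.0) ** (n + 2 * s) for s in range(terms))

def generating_function(x, t):
    """Left-hand side of Eq. 2.8."""
    return math.exp(0.5 * x * (t - 1.0 / t))

def laurent_sum(x, t, n_max=30):
    """Right-hand side of Eq. 2.8, truncated to |n| <= n_max."""
    return sum(bessel_j(n, x) * t ** n for n in range(-n_max, n_max + 1))

print(bessel_j(0, 0.0), bessel_j(1, 0.0))
print(laurent_sum(1.5, 0.7), generating_function(1.5, 0.7))
```

For large arguments the alternating series suffers from cancellation, so this direct summation is only meant for moderate values of x.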
Exercise 2.1. Use the definition of the generating function Eq. 2.8 to show that
2.1.3 Recurrence relations
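Differentiating the generating function Eq. 2.8 with respect to t gives
\[ \frac{x}{2} \left( 1 + \frac{1}{t^2} \right) e^{\frac{x}{2}\left(t - \frac{1}{t}\right)} = \sum_{n=-\infty}^{\infty} n J_n(x)\, t^{n-1} \tag{2.12} \]
Substituting Eq. 2.8 in the left–hand side:
\[ \frac{x}{2} \left( 1 + \frac{1}{t^2} \right) \sum_{n=-\infty}^{\infty} J_n(x)\, t^n = \sum_{n=-\infty}^{\infty} n J_n(x)\, t^{n-1} \tag{2.13} \]
Equating the coefficients of t^n on both sides:
\[ \frac{x}{2} J_n(x) + \frac{x}{2} J_{n+2}(x) = (n + 1) J_{n+1}(x) \tag{2.14} \]
or, replacing n by n − 1:
\[ J_{n-1}(x) + J_{n+1}(x) = \frac{2n}{x} J_n(x) \tag{2.15} \]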
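The recurrence relation Eq. 2.15 can be used to generate higher orders from J_0 and J_1. The sketch below does this in the upward direction and compares the result with the series of Eq. 2.11. Function names and the test argument are illustrative assumptions. Upward recurrence is only used here for orders below the argument x, as it is known to lose accuracy for orders well above x.

```python
import math

def bessel_j_series(n, x, terms=40):
    """J_n(x), integer n >= 0, from the power series of Eq. 2.11."""
    return sum((-1) ** s / (math.factorial(s) * math.factorial(n + s))
               * (x / 2.0) ** (n + 2 * s) for s in range(terms))

def bessel_j_upward(n_max, x):
    """J_0 .. J_{n_max} from Eq. 2.15 rewritten as J_{n+1} = (2n/x) J_n - J_{n-1}."""
    values = [bessel_j_series(0, x), bessel_j_series(1, x)]
    for n in range(1, n_max):
        values.append(2.0 * n / x * values[n] - values[n - 1])
    return values[:n_max + 1]

x = 6.0
for n, value in enumerate(bessel_j_upward(5, x)):
    print(n, value, bessel_j_series(n, x))
```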
Exercise 2.2. Differentiate the generating function with respect to x to show that
J1 (k) 2
3
k −4
k
2.1.4 Bessel’s differential equation revisited
\[ J_{n-1}(x) = \frac{n}{x} J_n(x) + J_n'(x) \tag{2.16} \]
This can be written as
or after multiplying by x:
This simplifies to
As a next step, we now take another recurrence equation from Ex. 2.3:
\[ J_{n+1}(x) = \frac{n}{x} J_n(x) - J_n'(x) \tag{2.22} \]
Replacing n with n − 1, this can be written as
\[ x J_n(x) = (n - 1) J_{n-1}(x) - x J_{n-1}'(x) \tag{2.23} \]
With this, we can eliminate $J_{n-1}$ and $J_{n-1}'$ from Eq. 2.21:
2.1.5 Fourier–Bessel series
Bessel functions can be used as a basis set for a series expansion of an arbitrary function. Before we
can tackle this, we require some orthogonality relations for which we need to calculate a certain
type of integral.
Lommel’s integral
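Consider the following two differential equations:
\[ x^2 \frac{d^2 \phi}{dx^2} + x \frac{d\phi}{dx} + \left( l^2 x^2 - n^2 \right) \phi = 0 \tag{2.25} \]
\[ x^2 \frac{d^2 \psi}{dx^2} + x \frac{d\psi}{dx} + \left( k^2 x^2 - n^2 \right) \psi = 0 \tag{2.26} \]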
Solutions to these equations are Jn (lx) and Jn (kx) respectively. By multiplying Eq. 2.25 with ψ/x,
Eq. 2.26 with φ/x and subtracting the two results, we get
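\[ x \left( \frac{d^2 \phi}{dx^2} \psi - \frac{d^2 \psi}{dx^2} \phi \right) + \left( \frac{d\phi}{dx} \psi - \frac{d\psi}{dx} \phi \right) + \left( l^2 - k^2 \right) x \phi \psi = 0 \tag{2.27} \]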
such that
\[ x \left[ \frac{dJ_n(lx)}{dx} J_n(kx) - \frac{dJ_n(kx)}{dx} J_n(lx) \right] + C = \left( k^2 - l^2 \right) \int x J_n(lx) J_n(kx)\, dx \tag{2.29} \]
With prime denoting differentiation with respect to the argument, we have $dJ_n(kx)/dx = k J_n'(kx)$ and
$dJ_n(lx)/dx = l J_n'(lx)$, such that for l ≠ ±k
\[ \int x J_n(lx) J_n(kx)\, dx = \frac{x}{k^2 - l^2} \left[ l J_n'(lx) J_n(kx) - k J_n'(kx) J_n(lx) \right] + C \tag{2.30} \]
The second recurrence relation from Ex. 2.3 takes the form
\[ J_n'(lx) = \frac{n}{lx} J_n(lx) - J_{n+1}(lx) \tag{2.31} \]
Likewise,
\[ J_n'(kx) = \frac{n}{kx} J_n(kx) - J_{n+1}(kx) \tag{2.32} \]
So, Eq. 2.30 becomes
\[ \int x J_n(lx) J_n(kx)\, dx = \frac{x}{k^2 - l^2} \left[ k J_{n+1}(kx) J_n(lx) - l J_{n+1}(lx) J_n(kx) \right] + C \tag{2.33} \]
Orthogonality
Suppose that ξi and ξj are two different zeros of Jn (x). Then, from Lommel’s integral Eq. 2.33, it
immediately follows that
\[ \int_0^1 x J_n(\xi_i x) J_n(\xi_j x)\, dx = 0 \tag{2.34} \]
This can be seen as an orthogonality condition that Jn (ξi x) and Jn (ξj x) satisfy.
Fourier–Bessel series
Eq. 2.34 can be used to expand functions in a so–called Fourier–Bessel series. Suppose ξi is the set
of zeros of Jn (x). (One can prove that all ξi are real and that there is an infinite number of them.)
For a function f (x) defined in the interval [0, 1], we can write:
\[ f(x) = \sum_{i=0}^{\infty} a_i J_n(\xi_i x) \tag{2.35} \]
To determine the unknown coefficients ai , we multiply Eq. 2.35 by xJn (ξm x) and integrate over
[0, 1]. Thanks to the orthogonality relations, we get
\[ \int_0^1 x J_n(\xi_m x) f(x)\, dx = a_m \int_0^1 x J_n^2(\xi_m x)\, dx \tag{2.36} \]
The integral on the left–hand side of Eq. 2.36 can be calculated analytically or numerically, de-
pending on the nature of f (x). To evaluate the integral on the right–hand side of 2.36, we cannot
use Lommel’s integral Eq. 2.33, because k = l. So we need to calculate this normalisation integral
in a different way, which we will do now.
Normalisation
We need to calculate
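\[ I = \int x J_n^2(kx)\, dx \tag{2.37} \]
which after a change of variables kx = t becomes
\[ I = \frac{1}{k^2} \int t J_n^2(t)\, dt \tag{2.38} \]
Integrating by parts gives
\[ I = \frac{t^2}{2k^2} J_n^2(t) - \frac{1}{k^2} \int t^2 J_n(t) J_n'(t)\, dt \tag{2.39} \]
Multiplying Bessel's differential equation Eq. 2.6 by $J_n'(t)$, the integrand can be written as
\[ \begin{aligned}
-t^2 J_n(t) J_n'(t) &= t^2 J_n''(t) J_n'(t) + t J_n'^2(t) - n^2 J_n(t) J_n'(t) \\
&= \frac{1}{2} \left[ t^2 J_n'^2(t) - n^2 J_n^2(t) \right]'
\end{aligned} \tag{2.41} \]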
So finally
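\[ I = \frac{t^2}{2k^2} J_n^2(t) + \frac{1}{2k^2} \left[ t^2 J_n'^2(t) - n^2 J_n^2(t) \right] + C \tag{2.42} \]
or
\[ \int x J_n^2(kx)\, dx = \frac{x^2}{2} \left[ J_n'^2(kx) + \left( 1 - \frac{n^2}{k^2 x^2} \right) J_n^2(kx) \right] + C \tag{2.43} \]
Evaluating this between 0 and 1 for k = ξ_m gives
\[ \int_0^1 x J_n^2(\xi_m x)\, dx = \frac{1}{2} \left[ J_n'^2(\xi_m) + \left( 1 - \frac{n^2}{\xi_m^2} \right) J_n^2(\xi_m) \right] + \frac{n^2}{2 \xi_m^2} J_n^2(0) \tag{2.44} \]
This reduces to
\[ \begin{aligned}
\int_0^1 x J_n^2(\xi_m x)\, dx &= \frac{1}{2} J_n'^2(\xi_m) + \frac{n^2}{2 \xi_m^2} J_n^2(0) \\
&= \frac{1}{2} J_{n+1}^2(\xi_m) + \frac{n^2}{2 \xi_m^2} J_n^2(0) \\
&= \frac{1}{2} J_{n+1}^2(\xi_m)
\end{aligned} \tag{2.45} \]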
where the second step makes use of the recurrence relations and the fact that ξm is a zero of Jn (x).
The last transition is based on the fact that Jn (0) = 0 for n ≠ 0 (see Fig. 2.1 or Eq. 2.11).
So, finally we get from Eq. 2.36 the following expression for the expansion coefficients in the
Fourier–Bessel series:
\[ a_m = \frac{2}{J_{n+1}^2(\xi_m)} \int_0^1 x J_n(\xi_m x) f(x)\, dx \tag{2.46} \]
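The orthogonality relation Eq. 2.34 and the normalisation integral Eq. 2.45 can be checked numerically. The sketch below does so for n = 0 and the first two zeros of J_0, which are located by bisection. The helper names, the bracketing intervals and the quadrature settings are assumptions made for this illustration.

```python
import math

def bessel_j(n, x, terms=40):
    """J_n(x), integer n >= 0, from the power series of Eq. 2.11."""
    return sum((-1) ** s / (math.factorial(s) * math.factorial(n + s))
               * (x / 2.0) ** (n + 2 * s) for s in range(terms))

def simpson(f, a, b, n=1000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3.0

def bisect_zero(f, lo, hi, iters=80):
    """Zero of f in [lo, hi], assuming f changes sign on the interval."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

xi1 = bisect_zero(lambda x: bessel_j(0, x), 2.0, 3.0)  # first zero of J_0
xi2 = bisect_zero(lambda x: bessel_j(0, x), 5.0, 6.0)  # second zero of J_0

overlap = simpson(lambda x: x * bessel_j(0, xi1 * x) * bessel_j(0, xi2 * x), 0.0, 1.0)
norm = simpson(lambda x: x * bessel_j(0, xi1 * x) ** 2, 0.0, 1.0)
print(overlap)                               # Eq. 2.34: should be close to zero
print(norm, 0.5 * bessel_j(1, xi1) ** 2)     # Eq. 2.45: both values should agree
```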
From the theory of differential equations, it can be derived that Bessel’s equation has two linearly
independent solutions. One of them is the Bessel function of the first kind Jν (x). It can be shown
that a second independent solution is given by the Bessel function of the second kind defined by
This function is sometimes also called the Neumann or the Weber function, and sometimes sym-
bolised by Nν (x).
Fig. 2.2 plots the Neumann functions of the lowest three orders. Note the logarithmic type singu-
larity at x = 0.
So the general solution of Bessel’s equation can be written as
Of course, any other linearly independent combination of Jν (x) and Yν (x) can also be used to
express the solution, e.g.
Here, $H_\nu^{(1)}(x)$ and $H_\nu^{(2)}(x)$ are the Hankel functions of the first and second kind respectively, defined by
Figure 2.2: The Neumann functions Y0 (x), Y1 (x) and Y2 (x).
Of course, if we are solving Eq. 2.1 instead of Eq. 2.6, the arguments of the functions above are kt r
instead of x. It is instructive to compare these functions to the solutions of the Helmholtz equation
in a 1D Cartesian coordinate system:
The sine and cosine solutions are oscillating solutions which can be interpreted physically as
standing waves. Note the correspondence to Jn (x) and Yn (x) which are also oscillating functions.
Therefore, they can be thought of as the representation of standing waves in a circular coordinate
system, with the only difference that Yn (x) diverges at the origin.
Another way of writing the solutions of Eq. 2.51 is
Figure 2.3: Optical fibre.
It can be proven that for large x the following asymptotic expansions hold: $H_\nu^{(1)}(x) \sim e^{jx}$ and
$H_\nu^{(2)}(x) \sim e^{-jx}$. So, $H_\nu^{(2)}(x)$ is the cylindrical equivalent of an outgoing plane wave, while $H_\nu^{(1)}(x)$
is an incoming plane wave.
Whether to use Bessel/Neumann functions or rather Hankel functions to represent the solution of
Bessel’s differential equation, is usually determined by physical and/or practical considerations,
as we will illustrate next for the case of finding eigenmodes in an optical fibre.
Consider the optical fibre from Fig. 2.3: a central core with radius R made of a material with
refractive index n1 , surrounded by a cladding with refractive index n2 < n1 . The cladding is
taken to be infinitely thick.
Fields in the optical fibre have to satisfy the Helmholtz equation. Just like in Section 2.1.1, we
propose solutions of the form
These kinds of solutions are actually the eigenmodes of this particular waveguide, because of the
form of their z–dependence2 . Also, the fields must be periodic in the θ–direction with period 2π.
This means kθ has to be an integer, which we will call l = 0, ±1, ±2, . . ..
As we’ve seen before, R has to satisfy this equation:
\[ r^2 \frac{d^2 R(r)}{dr^2} + r \frac{dR(r)}{dr} + \left[ r^2 \left( k_0^2 n^2(r) - k_z^2 \right) - l^2 \right] R(r) = 0 \tag{2.55} \]
We define $k_{t,i}^2 = k_0^2 n_i^2 - k_z^2$, where i = 1 for the core and i = 2 for the cladding.
We already know that in each of the regions i the solutions of Eq. 2.55 are Bessel functions of order
l with argument kt,i r.
2 These eigenmodes propagate along z. Contrast this with the treatment of bent waveguides in the previous chapter,
where the z–dimension didn’t come into play because of the 2D nature of the problem and where the propagation was
essentially along θ.
For the core, we write the general solution as
\[ R(r) = C H_l^{(1)}(k_{t,2} r) + D H_l^{(2)}(k_{t,2} r), \qquad r > R \tag{2.57} \]
For physical reasons, there can only be outgoing waves towards infinity, so C = 0. Note that
contrary to kt,1 , kt,2 is imaginary for k0 n2 < kz < k0 n1 , so the fields for guided modes are expo-
nentially decaying in the cladding.
So far, the treatment has been fully vectorial because solutions of the form (2.56) and (2.57) have
to be written both for Ez and Hz , and these two fields couple because of the continuity conditions
at r = R. For the sake of simplicity, we will only consider a scalar approximation here, where
the modes are approximately TEM, Ez and Hz are uncoupled, and the continuity conditions are
simplified to the continuity of R and dR/dr. This approximation turns out to be valid for weakly
guided fibres where n1 ≈ n2 .
Imposing the continuity of R and dR/dr leads to
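\[ A J_l(k_{t,1} R) = D H_l^{(2)}(k_{t,2} R) \tag{2.58} \]
\[ A k_{t,1} J_l'(k_{t,1} R) = D k_{t,2} H_l'^{(2)}(k_{t,2} R) \tag{2.59} \]
Dividing these two equations eliminates A and D:
\[ \frac{J_l(k_{t,1} R)}{k_{t,1} J_l'(k_{t,1} R)} = \frac{H_l^{(2)}(k_{t,2} R)}{k_{t,2} H_l'^{(2)}(k_{t,2} R)} \tag{2.60} \]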
Since kt,1 and kt,2 are functions of kz , Eq. 2.60 can be used to calculate the kz ’s of the eigenmodes
of the fibre. After that, the field profiles can be calculated from Eq. 2.56 and 2.57.
In the Bachelor’s level Photonics course, Gaussian beams were introduced as solutions to the
paraxial wave equation. We will briefly review this material, and then go on to look for higher
order solutions to this equation. In this process, Hermite polynomials will pop up and we will
study their properties as a model for other orthogonal polynomials.
2.2.1 The paraxial wave equation and Gaussian beams
The paraxial wave equation is an approximation to the Helmholtz equation under the so–called
slowly varying envelope approximation (SVEA). This approximation looks for solutions which are
essentially plane waves propagating along the z–direction, but which are modulated by a slowly
varying function A(r):
So,
and
The fact that A(r) is a slowly varying function of z means that we can neglect ∂²A/∂z² with respect
to jk ∂A/∂z in the previous equation. With this, the Helmholtz equation reduces to
\[ \nabla_T^2 A(\mathbf{r}) - 2jk \frac{\partial A(\mathbf{r})}{\partial z} = 0 \tag{2.64} \]
Here, ∇2T stands for the transverse part (∂ 2 /∂x2 ) + (∂ 2 /∂y 2 ) of the Laplacian operator.
A solution to this equation is the Gaussian beam, which is given by
\[ A_G(\mathbf{r}) = \frac{1}{q(z)}\, e^{-\frac{jk\rho^2}{2q(z)}} \tag{2.65} \]
\[ W(z) = \sqrt{\frac{2 b_0}{k} \left( 1 + \frac{z^2}{b_0^2} \right)} \tag{2.66} \]
Let’s try to find a modulated version of the Gaussian beam which also satisfies the paraxial wave
equation:
\[ A(x, y, z) = X\!\left( \frac{\sqrt{2}\, x}{W(z)} \right) Y\!\left( \frac{\sqrt{2}\, y}{W(z)} \right) e^{-jZ(z)} A_G(x, y, z) \tag{2.67} \]
Here, AG is the Gaussian beam from Eq. 2.65 and X(), Y () and Z() are three real–valued functions
that we still need to determine such that Eq. 2.67 satisfies the paraxial Helmholtz equation.
For the derivatives of Eq. 2.67 we get
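\[ \frac{\partial A}{\partial x} = \frac{\sqrt{2}}{W} X' Y e^{-jZ} A_G + X Y e^{-jZ} \frac{\partial A_G}{\partial x} \tag{2.68} \]
and
\[ \frac{\partial^2 A}{\partial x^2} = \frac{2}{W^2} X'' Y e^{-jZ} A_G + 2 \frac{\sqrt{2}}{W} X' Y e^{-jZ} \frac{\partial A_G}{\partial x} + X Y e^{-jZ} \frac{\partial^2 A_G}{\partial x^2} \tag{2.69} \]
\[ \frac{\partial^2 A}{\partial x^2} = \frac{2}{W^2} X'' Y e^{-jZ} A_G - 2jkx \frac{\sqrt{2}}{qW} X' Y e^{-jZ} A_G + X Y e^{-jZ} \frac{\partial^2 A_G}{\partial x^2} \tag{2.70} \]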
and similar equations for the y–derivatives. For the z–derivative we get
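\[ \begin{aligned}
\frac{\partial A}{\partial z} ={}& -\frac{\sqrt{2}\, x W'}{W^2} X' Y e^{-jZ} A_G - \frac{\sqrt{2}\, y W'}{W^2} X Y' e^{-jZ} A_G \\
&+ X Y \left( -jZ' \right) e^{-jZ} A_G + X Y e^{-jZ} \frac{\partial A_G}{\partial z}
\end{aligned} \tag{2.71} \]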
Let’s substitute this in the paraxial equation Eq. 2.64. Because AG is itself a solution of this equa-
tion, the last terms from Eq. 2.70 and 2.71 cancel and we get:
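\[ \begin{aligned}
&\frac{2}{W^2} \left( X'' Y + X Y'' \right) e^{-jZ} A_G \\
&- 2jk \frac{\sqrt{2}}{Wq} \left( x X' Y + y X Y' \right) e^{-jZ} A_G \\
&+ 2jk \frac{\sqrt{2}\, W'}{W^2} \left( x X' Y + y X Y' \right) e^{-jZ} A_G \\
&- 2jk X Y \left( -jZ' \right) e^{-jZ} A_G = 0
\end{aligned} \tag{2.72} \]
Dividing by $2 X Y e^{-jZ} A_G$ gives
\[ \frac{1}{W^2} \left( \frac{X''}{X} + \frac{Y''}{Y} \right) - jk \left( \frac{\sqrt{2}}{Wq} - \frac{\sqrt{2}\, W'}{W^2} \right) \left( x \frac{X'}{X} + y \frac{Y'}{Y} \right) - k Z' = 0 \tag{2.73} \]
or
\[ \frac{X''}{X} + \frac{Y''}{Y} - jk \left( \frac{W^2}{q} - W' W \right) \frac{\sqrt{2}}{W} \left( x \frac{X'}{X} + y \frac{Y'}{Y} \right) - k W^2 Z' = 0 \tag{2.74} \]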
Using the definitions for W and q, it follows that $W^2/q - W'W = -j\lambda/\pi$. If we now perform the
change of variables $u = \sqrt{2}\,x/W(z)$ and $v = \sqrt{2}\,y/W(z)$, we get
\[ \left[ \frac{X''(u)}{X(u)} - 2u \frac{X'(u)}{X(u)} \right] + \left[ \frac{Y''(v)}{Y(v)} - 2v \frac{Y'(v)}{Y(v)} \right] - k W^2(z) Z'(z) = 0 \tag{2.75} \]
The left–hand side of this equation is a sum of three terms, each of which is a function of a single
independent variable (u, v and z respectively). Therefore, each of these terms must be equal to
a constant. Equating the first term to −2µ1 and the second to −2µ2 , the third must be equal to
2(µ1 + µ2 ). This separation of variables leads to the following ordinary differential equations:
\[ b_0 \left( 1 + \frac{z^2}{b_0^2} \right) Z'(z) = -(\mu_1 + \mu_2) \tag{2.78} \]
From this, it follows immediately that Z(z) = −(µ1 + µ2 ) arctan(z/b0 ). However, the differential
equations for X and Y have no obvious solutions at first sight. In the next sections, we will show
that their solutions are Hermite polynomials, and that µ1 and µ2 are integers.
In similar vein to the treatment of Bessel functions, we will start by introducing a generating
function and then continue to derive recurrence relations which will lead to a differential equation.
\[ g(x, t) = e^{-t^2 + 2tx} \tag{2.79} \]
The Hermite polynomials Hn (x) are defined from the Laurent series in t of g(x, t) as
\[ e^{-t^2 + 2tx} = \sum_{n=0}^{\infty} H_n(x) \frac{t^n}{n!} \tag{2.80} \]
Note the absence of a superscript in Hn (x), which distinguishes them from the unrelated Hankel
functions.
Exercise 2.6. Show that
\[ H_n(x) = \sum_{r=0}^{\lfloor n/2 \rfloor} (-1)^r \frac{n!}{(n - 2r)!\, r!} (2x)^{n - 2r} \]
Similar to the treatment of Bessel functions, we can derive recurrence relations by differentiating
the generating function.
E.g. by differentiating Eq. 2.80 with respect to t, we get
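\[ (-2t + 2x)\, e^{-t^2 + 2tx} = \sum_{n=0}^{\infty} H_n(x) \frac{n t^{n-1}}{n!} \tag{2.81} \]
Replacing the exponential by its series expansion Eq. 2.80:
\[ (-2t + 2x) \sum_{n=0}^{\infty} H_n(x) \frac{t^n}{n!} = \sum_{n=0}^{\infty} H_n(x) \frac{n t^{n-1}}{n!} \tag{2.82} \]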
This leads to
Direct expansion of the generating function yields that H0 (x) = 1 and that H1 (x) = 2x. With this
and Eq. 2.84, we can iteratively construct all the Hermite polynomials. For reference, Table 2.1 lists
the first Hermite polynomials. Fig. 2.4 plots the first three Hermite polynomials.
Exercise 2.8. Use the definition of the generating function Eq. 2.80 to prove that
\[ H_{2n}(0) = (-1)^n \frac{(2n)!}{n!} \]
\[ H_{2n+1}(0) = 0 \]
Figure 2.4: The Hermite polynomials H0 (x), H1 (x), H2 (x).
H0(x) = 1
H1(x) = 2x
H2(x) = 4x^2 − 2
H3(x) = 8x^3 − 12x
H4(x) = 16x^4 − 48x^2 + 12
H5(x) = 32x^5 − 160x^3 + 120x
H6(x) = 64x^6 − 480x^4 + 720x^2 − 120
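The iterative construction of the Hermite polynomials can be sketched in a few lines of Python. The sketch assumes that the recurrence referred to above is the standard three–term relation H_{n+1}(x) = 2x H_n(x) − 2n H_{n−1}(x), which is what equating equal powers of t in Eq. 2.82 gives. The function name and the coefficient-list representation are choices made for this illustration.

```python
def hermite_coeffs(n_max):
    """Coefficient lists (lowest power first) of H_0 ... H_{n_max}.

    Built from H_0 = 1, H_1 = 2x and the assumed recurrence
    H_{n+1} = 2x H_n - 2n H_{n-1}.
    """
    polys = [[1], [0, 2]]
    for n in range(1, n_max):
        prev, cur = polys[n - 1], polys[n]
        nxt = [0] + [2 * c for c in cur]      # 2x * H_n shifts all powers up by one
        for k, c in enumerate(prev):
            nxt[k] -= 2 * n * c               # subtract 2n * H_{n-1}
        polys.append(nxt)
    return polys[:n_max + 1]

for n, coeffs in enumerate(hermite_coeffs(6)):
    print(n, coeffs)
```

The printed coefficient lists can be compared directly with the polynomials listed above; for instance, the entry for n = 4 reads [12, 0, −48, 0, 16], i.e. 16x^4 − 48x^2 + 12.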
Figure 2.5: Some Gauss–Hermite modes.
\[ H_{n+1}'(x) = 2 H_n(x) + 2x H_n'(x) - 2n H_{n-1}'(x) \tag{2.85} \]
Using the results from Ex. 2.7, we have $H_{n+1}'(x) = 2(n+1) H_n(x)$ and $2n H_{n-1}'(x) = H_n''(x)$:
\[ 2(n + 1) H_n(x) = 2 H_n(x) + 2x H_n'(x) - H_n''(x) \tag{2.86} \]
This reduces to
\[ H_n''(x) - 2x H_n'(x) + 2n H_n(x) = 0 \]
2.2.6 Orthogonality
Just as we did with Bessel functions, we can use Hermite polynomials to expand a function in a
series. In order to do that, we will need to establish the correct orthogonality relation between
Hermite polynomials and normalise them.
Consider the following two differential equations:
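\[ e^{-x^2} \left( \phi'' \psi - \psi'' \phi \right) - e^{-x^2}\, 2x \left( \phi' \psi - \psi' \phi \right) + e^{-x^2}\, 2(m - n) \phi \psi = 0 \tag{2.90} \]
\[ \left[ e^{-x^2} \left( \phi' \psi - \psi' \phi \right) \right]' = e^{-x^2}\, 2(n - m) \phi \psi \tag{2.91} \]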
The right hand side is equal to zero, because $e^{-x^2}$ goes to zero for $x \to \pm\infty$ more quickly than any
polynomial grows. So in the end we get
\[ \int_{-\infty}^{\infty} e^{-x^2} H_n(x) H_m(x)\, dx = 0, \qquad n \neq m \tag{2.93} \]
This is the orthogonality relation for Hermite polynomials: they are orthogonal over the interval
$[-\infty, \infty]$ with the weighting function $e^{-x^2}$.
2.2.7 Normalisation
We do this by multiplying the generating function Eq. 2.80 by itself and by $e^{-x^2}$:
\[ e^{-x^2} e^{-t^2 + 2tx} e^{-s^2 + 2sx} = \sum_{m,n=0}^{\infty} e^{-x^2} H_n(x) H_m(x) \frac{t^n s^m}{n!\, m!} \tag{2.95} \]
When we integrate this over $x$ from $-\infty$ to $\infty$, the terms with $m \neq n$ on the right–hand side drop
out because of the orthogonality relation Eq. 2.93:
\[ \int_{-\infty}^{\infty} e^{-x^2} e^{-t^2 + 2tx} e^{-s^2 + 2sx}\, dx = \sum_{n=0}^{\infty} \frac{(st)^n}{n!\, n!} \int_{-\infty}^{\infty} e^{-x^2} H_n^2(x)\, dx \tag{2.96} \]
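\[ \begin{aligned}
\int_{-\infty}^{\infty} e^{-x^2} e^{-t^2 + 2tx} e^{-s^2 + 2sx}\, dx &= \int_{-\infty}^{\infty} e^{-(x - s - t)^2} e^{2st}\, dx \\
&= \sqrt{\pi}\, e^{2st} \\
&= \sqrt{\pi} \sum_{n=0}^{\infty} \frac{2^n (st)^n}{n!}
\end{aligned} \tag{2.97} \]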
By equating like powers of $st$ in the right–hand sides of Eq. 2.96 and 2.97 we get the value of
the normalisation integral:
\[ \int_{-\infty}^{\infty} e^{-x^2} H_n^2(x)\, dx = 2^n n! \sqrt{\pi} \tag{2.98} \]
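Both the orthogonality relation Eq. 2.93 and the normalisation integral Eq. 2.98 are easy to check numerically. The sketch below assumes NumPy's `numpy.polynomial.hermite` module, whose Gauss–Hermite quadrature (`hermgauss`) uses exactly the weighting function $e^{-x^2}$ and whose `hermval` evaluates the same (physicists') Hermite polynomials as defined here. The helper name `hermite_inner_product` is ours.

```python
import math
import numpy as np
from numpy.polynomial.hermite import hermgauss, hermval

def hermite_inner_product(n, m, n_nodes=40):
    """Approximate the integral of exp(-x^2) H_n(x) H_m(x) over the real line.

    Gauss-Hermite quadrature with n_nodes nodes is exact (up to round-off)
    for polynomial integrands of degree below 2 * n_nodes.
    """
    x, w = hermgauss(n_nodes)
    h_n = hermval(x, [0] * n + [1])   # coefficient vector selecting H_n
    h_m = hermval(x, [0] * m + [1])   # coefficient vector selecting H_m
    return float(np.sum(w * h_n * h_m))

print(hermite_inner_product(2, 3))    # should vanish (orthogonality)
print(hermite_inner_product(4, 4), 2**4 * math.factorial(4) * math.sqrt(math.pi))
```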
With this we can finally write the complete expression to expand a function f (x) in a series of
Hermite polynomials:
\[ f(x) = \sum_{n=0}^{\infty} a_n H_n(x) \tag{2.99} \]
with
\[ a_n = \frac{1}{2^n n! \sqrt{\pi}} \int_{-\infty}^{\infty} e^{-x^2} H_n(x) f(x)\, dx \tag{2.100} \]
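To make Eq. 2.100 concrete, here is a small sketch that evaluates the expansion coefficients $a_n$ by Gauss–Hermite quadrature. The test function $f(x) = x^3$ is our own choice. From Table 2.1 one finds by hand that $x^3 = \tfrac{3}{4} H_1(x) + \tfrac{1}{8} H_3(x)$, so we expect $a_1 = 0.75$, $a_3 = 0.125$ and all other coefficients to vanish. The function name and the number of quadrature nodes are illustrative assumptions.

```python
import math
import numpy as np
from numpy.polynomial.hermite import hermgauss, hermval

def hermite_series_coefficients(f, n_terms, n_nodes=60):
    """Coefficients a_0 ... a_{n_terms-1} of Eq. 2.100, evaluated by quadrature."""
    x, w = hermgauss(n_nodes)                     # nodes/weights for the weight exp(-x^2)
    coeffs = []
    for n in range(n_terms):
        h_n = hermval(x, [0] * n + [1])           # H_n at the quadrature nodes
        norm = 2**n * math.factorial(n) * math.sqrt(math.pi)
        coeffs.append(float(np.sum(w * h_n * f(x)) / norm))
    return np.array(coeffs)

a = hermite_series_coefficients(lambda x: x**3, n_terms=5)
print(np.round(a, 12))
```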
3 To calculate $\int_{-\infty}^{\infty} e^{-x^2} dx$, write it as $\sqrt{\int_{-\infty}^{\infty} e^{-x^2} dx \int_{-\infty}^{\infty} e^{-y^2} dy}$ and transform it to polar
coordinates with $x^2 + y^2 = r^2$ and $dx\,dy = r\,dr\,d\theta$.
2.2.8 Exercises
\[ H_n(x) = (-1)^n e^{x^2} \frac{d^n}{dx^n} e^{-x^2} \]
Friedrich Wilhelm Bessel (1784–1846)
Chapter 3
Numerical techniques
Understanding grows only logarithmically with the number of floating point operations
— J.P. Boyd
Contents
3.1 Finite differences
3.2 Finite elements
3.3 Eigenmode expansion
3.4 Method of weighted residuals
In this chapter, we will briefly touch upon a number of techniques to numerically solve the wave
equation in cases where no analytic solution is available. Different methods will be introduced:
finite differences, finite elements, eigenmode expansion, and finally a more abstract framework,
namely the method of the weighted residuals. Each of these subjects could be the topic of an entire
course in its own right, so we will only be able to introduce the basic principle of these methods
without going into any level of detail. For more details on these and other methods, we refer to
the course “Computational solutions of wave problems”.
3.1 Finite differences
In the finite difference method, we do not seek to find the solution to the wave equation in the entire
domain of interest, but only in a limited number of discrete points inside the simulation domain. The set of
discretisation points is called the mesh, and the distance between discretisation points is called the
mesh size. Obviously the hope is that as the mesh becomes finer and finer, the solution that we
obtain converges to the true solution of the wave equation.
Since we only have information about the solution at discrete points, we will need to approximate
the derivatives that figure in the wave equation using data from those discrete points. To de-
velop these so–called finite difference approximations for derivatives, we start from the following
truncated Taylor expansions (for a 1D case):
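\[ f(x + \Delta x) = f(x) + \Delta x \frac{df}{dx} + \frac{(\Delta x)^2}{2} \frac{d^2 f}{dx^2} + \frac{(\Delta x)^3}{6} \frac{d^3 f}{dx^3} + O\left(\Delta x^4\right) \tag{3.1} \]
\[ f(x - \Delta x) = f(x) - \Delta x \frac{df}{dx} + \frac{(\Delta x)^2}{2} \frac{d^2 f}{dx^2} - \frac{(\Delta x)^3}{6} \frac{d^3 f}{dx^3} + O\left(\Delta x^4\right) \tag{3.2} \]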
\[ \frac{df}{dx} = \frac{f(x + \Delta x) - f(x)}{\Delta x} + O(\Delta x) \tag{3.3} \]
This is called the forward difference approximation of df /dx. Since the error term is O(∆x), this is
called a first order approximation.
Similarly from Eq. 3.2:
\[ \frac{df}{dx} = \frac{f(x) - f(x - \Delta x)}{\Delta x} + O(\Delta x) \tag{3.4} \]
\[ \frac{df}{dx} = \frac{f(x + \Delta x) - f(x - \Delta x)}{2 \Delta x} + O\left(\Delta x^2\right) \tag{3.5} \]
This formula is second–order accurate thanks to the cancellation of terms in the Taylor expansion.
Another way of looking at this is that the central difference involves both neighbouring points, so
there is more balanced information on the local behaviour of the function, which makes it more
accurate.
Forward, backward and central differences are illustrated in Fig. 3.1.
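By adding Eq. 3.1 and Eq. 3.2, we get an approximation for the second derivative:
\[ \frac{d^2 f}{dx^2} = \frac{f(x + \Delta x) - 2 f(x) + f(x - \Delta x)}{\Delta x^2} + O\left(\Delta x^2\right) \tag{3.6} \]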
If we want to get more accurate results, we could either use a finer mesh or add more information
by including higher order neighbours like f (x − 2∆x) and f (x + 2∆x).
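The orders of accuracy quoted above can be observed numerically. The sketch below is a minimal illustration, assuming the test function $f(x) = \sin x$ at $x = 1$ (our choice). It implements the forward, backward and central differences of Eq. 3.3 to 3.5, together with the three-point second difference that results from adding Eq. 3.1 and Eq. 3.2, and estimates the order $p$ in "error $\sim \Delta x^p$" by halving the step once. All function names are our own.

```python
import numpy as np

def forward_difference(f, x, dx):
    return (f(x + dx) - f(x)) / dx

def backward_difference(f, x, dx):
    return (f(x) - f(x - dx)) / dx

def central_difference(f, x, dx):
    return (f(x + dx) - f(x - dx)) / (2 * dx)

def second_difference(f, x, dx):
    return (f(x + dx) - 2 * f(x) + f(x - dx)) / dx**2

def observed_order(scheme, f, exact, x, dx):
    """Estimate p in error ~ dx^p from the errors at step dx and dx / 2."""
    error_coarse = abs(scheme(f, x, dx) - exact)
    error_fine = abs(scheme(f, x, dx / 2) - exact)
    return np.log2(error_coarse / error_fine)

x0, dx0 = 1.0, 1e-2
orders = {
    "forward": observed_order(forward_difference, np.sin, np.cos(x0), x0, dx0),
    "backward": observed_order(backward_difference, np.sin, np.cos(x0), x0, dx0),
    "central": observed_order(central_difference, np.sin, np.cos(x0), x0, dx0),
    "second derivative": observed_order(second_difference, np.sin, -np.sin(x0), x0, dx0),
}
for name, p in orders.items():
    print(f"{name}: observed order {p:.3f}")
```

The one-sided differences should come out close to first order, the central difference and the second-derivative formula close to second order.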
Figure 3.1: The arc AB represents backward differences, BC forward differences and AC central differences.
Exercise 3.1. Derive the following higher order approximation of the second
derivative:
Illustrating finite difference methods is best done using a simple example, so we will solve the 1D
Helmholtz equation in a uniform medium:
\[ \frac{d^2 f(x)}{dx^2} + k^2 f(x) = 0 \tag{3.7} \]
We solve this problem at a fixed wavelength, i.e. a fixed value of k. The discretised version of
Eq. 3.7 reads
\[ \frac{f(x + \Delta x) - 2 f(x) + f(x - \Delta x)}{\Delta x^2} + k^2 f(x) = 0 \tag{3.8} \]
To make this example even more concrete, let us assume a uniform grid with ∆x = 1 from x = 1
to x = 5. To illustrate boundary conditions, let us further assume that the values of f are fixed at
the boundaries, e.g. f (1) = f1 and f (5) = f5 . This could be from a source forcing the electric field
to a constant value, or from a metal wall which forces the electric field to zero.
We still need to determine the following unknowns: f (2), f (3) and f (4).
The discretised version of Eq. 3.7, including the boundary conditions, reads
\[ f(1) = f_1 \tag{3.9} \]
\[ f(1) - 2f(2) + f(3) + k^2 f(2) = 0 \tag{3.10} \]
\[ f(2) - 2f(3) + f(4) + k^2 f(3) = 0 \tag{3.11} \]
\[ f(3) - 2f(4) + f(5) + k^2 f(4) = 0 \tag{3.12} \]
\[ f(5) = f_5 \tag{3.13} \]
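\[ \begin{pmatrix}
1 & 0 & 0 & 0 & 0 \\
1 & k^2 - 2 & 1 & 0 & 0 \\
0 & 1 & k^2 - 2 & 1 & 0 \\
0 & 0 & 1 & k^2 - 2 & 1 \\
0 & 0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} f(1) \\ f(2) \\ f(3) \\ f(4) \\ f(5) \end{pmatrix}
=
\begin{pmatrix} f_1 \\ 0 \\ 0 \\ 0 \\ f_5 \end{pmatrix} \tag{3.14} \]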
So, we’ve reduced the problem of solving a differential equation to the problem of solving a linear
system of equations of the form A · x = b.
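As an illustration, the five-point system of Eq. 3.14 can be assembled and solved in a few lines. The sketch below assumes NumPy and uses the illustrative values $k = 0.5$, $f_1 = 1$ and $f_5 = 0$, which are not taken from the text. The function name is ours.

```python
import numpy as np

def solve_helmholtz_five_points(k, f1, f5):
    """Solve Eq. 3.14: grid x = 1..5 with dx = 1 and fixed boundary values."""
    d = k**2 - 2.0
    A = np.array([
        [1.0, 0.0, 0.0, 0.0, 0.0],   # f(1) = f1
        [1.0, d,   1.0, 0.0, 0.0],   # discretised Helmholtz equation at x = 2
        [0.0, 1.0, d,   1.0, 0.0],   # ... at x = 3
        [0.0, 0.0, 1.0, d,   1.0],   # ... at x = 4
        [0.0, 0.0, 0.0, 0.0, 1.0],   # f(5) = f5
    ])
    b = np.array([f1, 0.0, 0.0, 0.0, f5])
    return np.linalg.solve(A, b)

f = solve_helmholtz_five_points(k=0.5, f1=1.0, f5=0.0)
print(f)
```

Note that `np.linalg.solve` is a dense direct solver; for this tiny system that is perfectly adequate.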
It is important to realise that the discretisation step that we have introduced replaces the original
equation with a new one, and that even an exact solution of the discretised problem will only
yield an approximate solution of the original problem. The error we have introduced in this way
is called the discretisation error. The hope is that by refining the mesh size, we get successively
smaller discretisation errors.
In order to solve the linear system A · x = b that arises from formulating a problem using finite
differences, there are a number of options.
First of all, we can use a direct method, i.e. explicitly invert the matrix A. Normally, this takes
$O(N^3)$ operations, so it doesn’t scale very well with large problem sizes. However, inspecting
Eq. 3.14, we notice that the discretisation of the wave equation yields a tridiagonal matrix, where
only three diagonals are non–zero. Matrix equations of this type can be solved quite efficiently,
using a fairly compact variant of Gaussian elimination, taking only $O(N)$ operations.
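The compact variant of Gaussian elimination for tridiagonal systems is commonly known as the Thomas algorithm. The following is a minimal sketch of it, not a production routine: it does no pivoting and therefore assumes that no zero pivots occur. The function name and the argument layout (three separate diagonals) are our own conventions.

```python
import numpy as np

def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve a tridiagonal system in O(N) operations (Thomas algorithm, no pivoting).

    lower[i] multiplies x[i-1] in row i (lower[0] is unused),
    upper[i] multiplies x[i+1] in row i (upper[-1] is unused).
    """
    n = len(diag)
    c = np.zeros(n)   # modified upper diagonal
    d = np.zeros(n)   # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):                      # forward elimination sweep
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom if i < n - 1 else 0.0
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = np.zeros(n)
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):             # back substitution sweep
        x[i] = d[i] - c[i] * x[i + 1]
    return x

# The system of Eq. 3.14 with the illustrative values k = 0.5, f1 = 1, f5 = 0
k = 0.5
lower = [0.0, 1.0, 1.0, 1.0, 0.0]
diag = [1.0, k**2 - 2, k**2 - 2, k**2 - 2, 1.0]
upper = [0.0, 1.0, 1.0, 1.0, 0.0]
rhs = [1.0, 0.0, 0.0, 0.0, 0.0]
print(solve_tridiagonal(lower, diag, upper, rhs))
```

Each sweep visits every row exactly once, which is where the linear scaling with N comes from.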
A second philosophy is to give up the notion of solving the system exactly, and search for a faster
approximate solution, converged relative to the discretisation error but perhaps not relative to
machine round–off. This seems acceptable, especially since the algebraic equations are only an
approximation to the continuum equations anyway, subject to discretisation error.
A way of constructing such an approximate solution is to use an iterative method: starting from an
initial guess (e.g. setting all unknowns to zero), successive approximations are constructed in the
hope that after a sufficient number of iterations, the estimate converges to the true solution. There
exists a plethora of literature on iterative methods, which we obviously cannot treat in detail here,
but in order to get a flavour for the basic ideas, we will now discuss one of the most basic iterative
schemes (which unfortunately has limited practical use in its most simple incarnation).
Let’s return to the discretised version of the Helmholtz equation Eq. 3.8:
\[ f_{i+1}^n - 2 f_i^n + f_{i-1}^n + \Delta x^2 k^2 f_i^n = \alpha \left( f_i^{n+1} - f_i^n \right) \tag{3.16} \]
This can be understood as follows. If the method has converged, then the successive estimates
won’t change significantly anymore, such that $f_i^{n+1} \approx f_i^n$. But this means that the right–hand side
of Eq. 3.16 will be equal to zero, or alternatively that $f_i^n$ is indeed the solution of Eq. 3.15.
The (positive) parameter α is called the relaxation parameter, which can be fine–tuned by the user
of the method. A small value of α means that the f values will change rapidly from iteration to
iteration, but with the risk of taking such large steps that the method won’t converge. A larger
value of α, on the other hand, reduces the risk of diverging solutions, but more
iterations will be needed to reach convergence.
To clarify this further, we will now investigate when the Jacobi method diverges, i.e. when the f
values become unbounded for a bounded initial estimate.
To study convergence, we take an approach that is based on Fourier analysis: any wave used as
the initial estimate can be written as a sum of Fourier components $e^{-jk_x x}$, which are nothing other
than plane waves with wavevector $k_x$. Let’s take such a plane wave and investigate what happens
when iterating through the Jacobi method. If the amplitude of such a wave stays bounded when
iterating, and this is true for all admissible wavevectors $k_x$, then we can be sure that whatever
initial estimate will be used, the method will not produce an unbounded result at some time.
Our plane waves take the following form:
\[ f_i^n = A^n e^{-jk_x i \Delta x} \tag{3.17} \]
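\[ f_i^{n+1} = A^n e^{-jk_x i \Delta x} \left\{ 1 + \frac{1}{\alpha} \left[ e^{+jk_x \Delta x} - 2 + e^{-jk_x \Delta x} + \Delta x^2 k^2 \right] \right\} \tag{3.18} \]
or
\[ \frac{f_i^{n+1}}{f_i^n} = 1 + \frac{1}{\alpha} \left[ 2 \cos(k_x \Delta x) - 2 + \Delta x^2 k^2 \right] \tag{3.19} \]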
In order for this not to diverge, the ratio $|f_i^{n+1} / f_i^n|$ should be smaller than 1 for any positive α.
This happens if
\[ 2 \cos(k_x \Delta x) - 2 \le -\Delta x^2 k^2 \tag{3.20} \]
The left–hand side of this equation lies between −4 and 0 and reaches its maximum for $k_x = 0$,
which is an important case as it corresponds to a uniform solution. Since it’s very likely that the
final solution has a DC component, we had better make sure that the method converges in this case.
However, for this case we can only satisfy Eq. 3.20 if k = 0. So this means that unfortunately the
Jacobi method is non–convergent for the general Helmholtz equation, but only works for Laplace’s
equation, i.e. the special case where k = 0.
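The behaviour of the Jacobi iteration is easy to observe in a small experiment. The sketch below solves Eq. 3.16 for $f_i^{n+1}$ and iterates on the unit interval with fixed boundary values. The grid size, the value α = 2, the boundary values and the two values of k are our own illustrative assumptions. One caveat: on a finite grid with fixed boundary values, the smoothest admissible Fourier component of the error is not exactly $k_x = 0$ but roughly $k_x = \pi / L$, so in this particular setting the instability only appears once k exceeds about π/L (here L = 1).

```python
import numpy as np

def jacobi_helmholtz(k, n_points=21, alpha=2.0, n_iter=2000, f_left=0.0, f_right=1.0):
    """Iterate Eq. 3.16 on [0, 1], starting from zero in the interior."""
    dx = 1.0 / (n_points - 1)
    f = np.zeros(n_points)
    f[0], f[-1] = f_left, f_right
    for _ in range(n_iter):
        # Left-hand side of Eq. 3.16, evaluated entirely from the old iterate
        lhs = f[2:] - 2 * f[1:-1] + f[:-2] + (dx * k) ** 2 * f[1:-1]
        f[1:-1] = f[1:-1] + lhs / alpha
    return f

f_laplace = jacobi_helmholtz(k=0.0)     # Laplace's equation: converges to a straight line
f_helmholtz = jacobi_helmholtz(k=5.0)   # k > pi: the iterates grow without bound
print(np.max(np.abs(f_laplace - np.linspace(0.0, 1.0, 21))))
print(np.max(np.abs(f_helmholtz)))
```

For k = 0 the iterates settle on the linear profile between the two boundary values, whereas for k = 5 the amplitude of the smoothest error component is multiplied by a factor slightly larger than 1 in every iteration.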
Without going into any detail, we finally want to mention the existence of a very popular method
to solve Maxwell’s equations in the time domain. This finite–difference time–domain method (FDTD)
uses central differences to discretise the full vectorial Maxwell’s equations both in space and time.
It uses a staggered grid, i.e. the electric field and the magnetic field are not discretised at the same
point in space, but are arranged in a Yee cell (Fig. 3.2). The arrangement of the field components
in a Yee cell is such that the discretised curl laws of Maxwell’s equations can be written out in a
natural form.
As for the time evolution of the fields, FDTD uses a leap–frogging scheme, where the electric fields
calculated at t = 0 are used to determine the magnetic fields at t = 0.5. These are subsequently
used to calculate the electric fields at t = 1 and so on.
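To give a flavour of the leap–frogging idea, here is a minimal one-dimensional sketch in normalised units. This is a strong simplification of the full three-dimensional Yee scheme. The assumptions are ours: vacuum, unit impedance, a Courant number $c \Delta t / \Delta x = 1$, a Gaussian initial pulse in the electric field, zero initial magnetic field, and perfectly conducting walls at both ends. The electric field lives on integer grid points and the magnetic field on the half-integer points in between.

```python
import numpy as np

def fdtd_1d(n_cells=200, n_steps=60, courant=1.0):
    """Leap-frog update of a staggered 1D grid (normalised units)."""
    x = np.arange(n_cells + 1)
    ez = np.exp(-((x - n_cells / 2) / 8.0) ** 2)   # E on integer grid points
    hy = np.zeros(n_cells)                          # H on half-integer grid points
    for _ in range(n_steps):
        hy += courant * (ez[1:] - ez[:-1])          # H advances using the current E
        ez[1:-1] += courant * (hy[1:] - hy[:-1])    # E advances using the new H
        # ez[0] and ez[-1] are never updated: metal walls
    return ez, hy

ez, hy = fdtd_1d()
print(ez.max(), ez[100])
```

With these settings the initial pulse splits into two halves of roughly half the amplitude, travelling in opposite directions at one cell per time step.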
Figure 3.2: The Yee cell in the FDTD algorithm.
3.2 Finite elements
All finite element methods more or less share the same philosophy, which is outlined in the following
recipe:
• Subdivide the structure you want to model into K finite subsections (hence the name finite
elements). The elements don’t have to be the same size; the mesh can be irregular. A triangular
mesh is a popular choice, because this allows one to approximate curved boundaries
much better than e.g. with the rectangular grid that was used in finite difference methods.
Often, more triangles are used where higher resolution is required. See e.g. the grid in
Fig. 3.3.
• Approximate the unknown function using a separate approximation expression for each
element of the form
\[ \psi(x, y) = \sum_{i=1}^{M} u_i b_i(x, y) \tag{3.21} \]
Here the $b_i(x, y)$ are some convenient set of known basis functions. In order to solve the
problem, the M unknown coefficients $u_i$ per element still need to be determined.
• Introduce some constraint on the MK unknowns, e.g. to ensure that ψ is continuous
across the elements.
• Write down an expression J containing the approximating functions $b_i$ and the coefficients
$u_i$. Because the $b_i$ are chosen in advance and therefore known, J is a function of the unknown
$u_i$ only. The precise form of J differs from problem to problem, but is often related to the
total power in the system.
• Find the coefficients ui such that J is minimised. In a popular method due to Rayleigh and
Ritz, this is done simply by setting all ∂J/∂ui equal to zero and solving the resulting linear
system. If J is an expression for the total energy in the system, there is a clear physical
interpretation for this procedure: the true solution to the problem is one that minimises the
energy in the system.
In order to make the recipe from the previous section more concrete, we will now indicate in more
detail how finite element methods can be used to solve the 1D Helmholtz equation in a uniform
medium, i.e. the same problem we used to illustrate the concept of finite differences:
\[ \frac{d^2 \psi(x)}{dx^2} + k^2 \psi(x) = 0 \tag{3.22} \]
Piecewise approximation
To describe the unknown function ψ, we will use a very simple but common approach, which is
that of a piecewise straight approximation. We will subdivide our domain into K finite elements
being line segments of identical length. In each segment, we will assume that ψ varies linearly
between the end points x1 and x2 of that segment:
\[ \psi(x) = \psi_1 \frac{x_2 - x}{x_2 - x_1} + \psi_2 \frac{x - x_1}{x_2 - x_1} \tag{3.23} \]
Here ψ1 and ψ2 are the so far unknown values of ψ at x1 and x2 respectively. End points of the
segments are often called nodes of the mesh.
Comparing Eq. 3.21 and 3.23, we get that M = 2, ui is ψi and
\[ b_1(x) = \frac{x_2 - x}{x_2 - x_1} \tag{3.24} \]
\[ b_2(x) = \frac{x - x_1}{x_2 - x_1} \tag{3.25} \]
Figure 3.4: Simple shape functions b1 (x) and b2 (x) for a 1D finite element model.
Figure 3.5: Top: disconnected local numbering. Bottom: connected global numbering.
The elements are not independent but are usually coupled through some continuity condition. In
the case of our problem, ψ needs to be continuous across elements. The effect of this is that the
number of unknowns is reduced.
To see how this is done in practice, consider Fig. 3.5 with on one hand a disconnected local num-
bering scheme of the nodes, and on the other hand a connected global numbering scheme, which
already takes the continuity conditions into account.
The disconnected values are related to connected values by the following matrix equation:
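\[ \begin{pmatrix} \psi_1 \\ \psi_2 \\ \psi_3 \\ \psi_4 \\ \psi_5 \\ \psi_6 \end{pmatrix}_{disc}
=
\begin{pmatrix}
1 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{pmatrix}
\begin{pmatrix} \psi_1 \\ \psi_2 \\ \psi_3 \\ \psi_4 \end{pmatrix}_{con} \tag{3.26} \]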
The matrix in Eq. 3.26 is called the connection matrix of the mesh. Obviously, for such a simple case
it seems overkill to write the continuity equations in matrix form, but for complicated 3D meshes
such a matrix has its practical value for computer implementation of finite element methods.
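For a chain of K line elements, the connection matrix can be generated programmatically. The sketch below is a minimal illustration for the 1D case only; the function name and the node ordering (left node of each element first) are our own assumptions, chosen to reproduce Eq. 3.26 for K = 3.

```python
import numpy as np

def connection_matrix(n_elements):
    """Map the n_elements + 1 connected nodal values to the 2 * n_elements disconnected ones."""
    C = np.zeros((2 * n_elements, n_elements + 1), dtype=int)
    for e in range(n_elements):
        C[2 * e, e] = 1           # left node of element e
        C[2 * e + 1, e + 1] = 1   # right node of element e, shared with element e + 1
    return C

C = connection_matrix(3)
print(C)
print(C @ np.array([10, 20, 30, 40]))   # shared nodes appear twice in the disconnected list
```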
An expression for J
In order to keep the method physically interpretable, let’s use the total energy of the system as an
expression for J. From the complex Poynting theorem, we know that the time–averaged stored
energy in a system is given by
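\[ J = \frac{1}{2} \int \left( \mathbf{E} \cdot \mathbf{D}^* - \mathbf{H} \cdot \mathbf{B}^* \right) dx \tag{3.27} \]
\[ \phantom{J} = \frac{1}{2} \int \left( \varepsilon |\mathbf{E}|^2 - \mu |\mathbf{H}|^2 \right) dx \tag{3.28} \]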
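\[ \mathbf{E} = \frac{1}{j \omega \varepsilon} \nabla \times \mathbf{H} \tag{3.29} \]
Let’s assume that we are working in the TM case, where $\mathbf{H} = H \mathbf{1}_z = \psi \mathbf{1}_z$. We get
\[ J = \frac{1}{2} \int \left( \frac{1}{\omega^2 \varepsilon} \left| \frac{dH}{dx} \right|^2 - \mu |H|^2 \right) dx \tag{3.30} \]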
Although we will not prove it here, it can be shown that in the case of lossless media, we can
always scale the solution such that the absolute values in the previous equation can be replaced
by simple squares. Also, multiplying J by the constant 2ω 2 ε does not affect the location of the
minimum, such that finally we can write
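\[ J = \int \left[ \left( \frac{dH}{dx} \right)^2 - k^2 H^2 \right] dx \tag{3.31} \]
\[ J = \int \left[ \left( \frac{d\psi}{dx} \right)^2 - k^2 \psi^2 \right] dx \tag{3.32} \]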
The contribution of a single element to J can now be calculated in terms of the unknown nodal
values ψi using the explicit form of the shape functions from Eq. 3.23. This leads to tedious but
straightforward algebra, which we will not repeat here. In view of the structure of Eq. 3.32, the
contribution of a single element leads to a bilinear equation of the form
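\[ J_i = \begin{pmatrix} \psi_1 & \psi_2 \end{pmatrix}_{disc} \begin{pmatrix} A & B \\ C & D \end{pmatrix} \begin{pmatrix} \psi_1 \\ \psi_2 \end{pmatrix}_{disc} \tag{3.33} \]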
The total power is just the sum of the contributions from all elements, which can also be cast in
matrix form, this time with a block diagonal matrix, each block corresponding to a matrix of the
type from Eq. 3.33.
The expressions in terms of disconnected nodal values can be transformed in terms of connected
values using the connection matrix, like e.g. the one from Eq. 3.26. In this way, all the continuity
conditions are imposed.
Minimising J
\[ \frac{\partial J}{\partial \psi_i} = 0 \tag{3.34} \]
and this for all i. Since J is a bilinear form, Eq. 3.34 takes the form of MK linear equations with
MK unknowns. Just as was the case with the finite–difference method, this system can either be
solved directly or iteratively.
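The whole recipe can be put together in a short sketch for the 1D Helmholtz problem. The "tedious but straightforward algebra" for linear shape functions on an element of length h gives, for the functional of Eq. 3.32, the element matrix $\frac{1}{h} \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix} - \frac{k^2 h}{6} \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$; this explicit form is a standard result that we state here without derivation. The boundary values, the domain [0, 1] and the value k = 2 are our own illustrative assumptions, as are all names in the code.

```python
import numpy as np

def fem_helmholtz_1d(k, n_elements, psi_left, psi_right, length=1.0):
    """Linear finite elements for psi'' + k^2 psi = 0 with fixed end values."""
    h = length / n_elements
    # Contribution of one element to J (the matrix of Eq. 3.33)
    element = (np.array([[1.0, -1.0], [-1.0, 1.0]]) / h
               - k**2 * h / 6.0 * np.array([[2.0, 1.0], [1.0, 2.0]]))
    # Connection matrix: disconnected nodal values = C @ connected nodal values
    C = np.zeros((2 * n_elements, n_elements + 1))
    for e in range(n_elements):
        C[2 * e, e] = 1.0
        C[2 * e + 1, e + 1] = 1.0
    block_diagonal = np.kron(np.eye(n_elements), element)
    G = C.T @ block_diagonal @ C            # J = psi^T G psi in connected values
    # dJ/dpsi_i = 0 for the interior nodes; the two end values are prescribed
    psi = np.zeros(n_elements + 1)
    psi[0], psi[-1] = psi_left, psi_right
    rhs = -(G[1:-1, 0] * psi_left + G[1:-1, -1] * psi_right)
    psi[1:-1] = np.linalg.solve(G[1:-1, 1:-1], rhs)
    x = np.linspace(0.0, length, n_elements + 1)
    return x, psi

x, psi = fem_helmholtz_1d(k=2.0, n_elements=50, psi_left=0.0, psi_right=1.0)
exact = np.sin(2.0 * x) / np.sin(2.0)       # analytic solution for these boundary values
print(np.max(np.abs(psi - exact)))
```

The dense `np.kron` and matrix products are used purely for clarity; a practical code would assemble the (tridiagonal) matrix G directly.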
In theory, the outline of the finite element method that we have given so far is perfectly adequate
to describe it. However, it yields additional insight to cast it in the framework of a variational
method.
To do this, let’s introduce some terminology first.
A functional is a mapping from a function to a scalar value, unlike a function, which just maps one
or more scalars to a scalar. The entity J that we have introduced is a functional (see footnote 1), as it maps a
function ψ to a scalar (being in this case the energy of the system), so J = J(ψ).
A variational expression comes from minimising a functional over a set of admissible functions. E.g.
in our case we have been solving the following variational expression:
\[ P = \min_{\psi} J(\psi) \tag{3.35} \]
where P is the total power in the system, and ψ are functions coming from an admissible set, e.g.
continuous functions that satisfy the boundary conditions of the problem.
The Rayleigh–Ritz method to minimise a functional is to expand ψ in a number of known test
functions, and then determine the expansion coefficients such that the functional is minimised.
So far we don’t seem to have done anything useful except introducing some new terminology.
However, by looking at the power P as a minimum of a functional, we realise that a small variation
δψ (caused e.g. by numerical round–off errors) will not have a very big impact on the calculated
value of P , since δJ = 0 at the minimum of the functional. This insensitivity to errors is a big bonus for
practical applications, and means that methods which have a variational basis can be much more
robust.
Now, the power P is a decidedly uninteresting quantity for practical applications. However, it is
possible to derive variational expressions for e.g. the propagation constant of a waveguide, which
is a much more relevant parameter that people are interested in.
1 Note that in applying the Rayleigh–Ritz method, we have parametrised the function ψ using a finite set of
unknowns $\psi_i$. This effectively turns the functional J(ψ) into a standard function $J(\psi(\psi_i))$, albeit one of multiple
arguments.
To conclude this section on finite elements, we wish to provide some extra proof that a function
that minimises the functional from Eq. 3.32 is indeed a solution to the Helmholtz equation. So
far, this was only made acceptable by invoking physical arguments based on minimum energy,
but a mathematical proof for this was lacking. At the same time, let’s extend the method to three
dimensions:
\[ J = \iiint \left[ (\nabla \psi)^2 - k^2 \psi^2 \right] dV \tag{3.36} \]
If a function ψ minimises the functional J, then δJ has to be zero for ψ. For δJ (also called the first
variation of J) we get
So,
\[ \delta J = 2 \iint \delta\psi\, \frac{\partial \psi}{\partial n}\, dS - 2 \iiint \delta\psi\, \nabla^2 \psi\, dV - 2 \iiint k^2 \psi\, \delta\psi\, dV \tag{3.41} \]
Suppose that our boundary conditions are of the Dirichlet type such that ψ = 0 on the boundary.
Then, in order to be an admissible function, δψ also has to be zero on the boundary, so the right–
hand side of Eq. 3.42 will be zero.
Similarly, for Neumann–type boundary conditions where ∂ψ/∂n = 0 on the boundary, the right–
hand side of Eq. 3.42 will also be zero.
So for these common boundary conditions we get
2 Apply the divergence theorem to the vector identity $\nabla \cdot (u \nabla v) = u \nabla \cdot \nabla v + (\nabla u) \cdot (\nabla v)$
\[ \iiint \delta\psi \left( \nabla^2 \psi + k^2 \psi \right) dV = 0 \tag{3.43} \]
Since our choice of δψ has been completely arbitrary, it follows that stationarity of the variational
expression requires that ψ has to satisfy
\[ \nabla^2 \psi + k^2 \psi = 0 \tag{3.44} \]
Exercise 3.2. Consider an arbitrarily shaped waveguide filled with air but
bounded by perfectly conducting metal walls. As we know, its modes will satisfy
the following equation:
\[ \nabla_{xy}^2 \psi + \left( k^2 - k_z^2 \right) \psi = 0 \]
We are interested in finding the value of k at cut–off ($k_z^2 = 0$), because we can
get the cut-off wavelength from that. Show that the following is a variational
expression for this cut–off wavevector k:
\[ k^2 = \frac{\iint_S (\nabla \psi)^2\, dS}{\iint_S \psi^2\, dS} \]
where S is the cross–section of the waveguide.
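Without giving away the proof asked for in the exercise, the practical value of such a variational expression can be illustrated numerically. The sketch below assumes a unit-square cross–section with ψ = 0 on the walls and the crude polynomial trial function $\psi(x, y) = x(1 - x)\, y(1 - y)$; both the geometry and the trial function are our own choices. The exact lowest value for this case is $k^2 = 2\pi^2$, attained by $\sin(\pi x) \sin(\pi y)$.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Trial function psi(x, y) = p(x) p(y) with p(s) = s (1 - s), which vanishes on the walls
p = Polynomial([0.0, 1.0, -1.0])
dp = p.deriv()

def integral_01(poly):
    """Exact integral of a polynomial over [0, 1]."""
    antiderivative = poly.integ()
    return antiderivative(1.0) - antiderivative(0.0)

int_p2 = integral_01(p * p)       # integral of p^2
int_dp2 = integral_01(dp * dp)    # integral of (dp/ds)^2

# For a separable psi the surface integrals factorise into 1D integrals
numerator = 2.0 * int_dp2 * int_p2     # integral of (grad psi)^2 over the square
denominator = int_p2**2                # integral of psi^2 over the square
k2_estimate = numerator / denominator
k2_exact = 2.0 * np.pi**2
print(k2_estimate, k2_exact, (k2_estimate - k2_exact) / k2_exact)
```

Although the trial function is only a rough approximation of the true mode profile, the estimate for $k^2$ is off by little more than one percent, which is the insensitivity to errors discussed above.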
3.3 Eigenmode expansion
In eigenmode expansion, we slice up the structure that we want to model into a number of layers
where the refractive index profile does not change in the propagation direction (often taken to be
the z–direction). A typical example is given in Fig. 3.6, which shows the interface between two
waveguides I and II.
We will expand the field in each waveguide using the eigenmodes of that waveguide. It can be
shown that these eigenmodes form a complete set, i.e. they can be used to expand any arbitrary
field in that layer. Also, by enclosing the structure that we want to model in a metal box, this set
of eigenmodes is a discrete set rather than a continuum.
The interface is placed at z = 0 and a single mode with index p is incident from medium I. This
incident mode will give rise to a backward–propagating field in medium I, which we expand
in terms of the eigenmodes of this medium. Likewise, we expand the transmitted field in the
eigenmodes of medium II. We now apply the well–known mode–matching technique. It starts off
by imposing the continuity of the tangential components of the total field:
Figure 3.6: Interface between two layers.
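\[ \mathbf{E}^{I}_{p,t} + \sum_j R_{j,p}\, \mathbf{E}^{I}_{j,t} = \sum_j T_{j,p}\, \mathbf{E}^{II}_{j,t} \tag{3.45} \]
\[ \mathbf{H}^{I}_{p,t} - \sum_j R_{j,p}\, \mathbf{H}^{I}_{j,t} = \sum_j T_{j,p}\, \mathbf{H}^{II}_{j,t} \tag{3.46} \]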
The minus sign for the reflected H field deserves some additional clarification. It is relatively easy
to show by splitting Maxwell’s curl equations into their transverse and z–components, that for
any eigenmode solution
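\[ \left\langle \mathbf{E}^{I}_{p}, \mathbf{H}^{I}_{i} \right\rangle + \sum_j R_{j,p} \left\langle \mathbf{E}^{I}_{j}, \mathbf{H}^{I}_{i} \right\rangle = \sum_j T_{j,p} \left\langle \mathbf{E}^{II}_{j}, \mathbf{H}^{I}_{i} \right\rangle \tag{3.49} \]
\[ \left\langle \mathbf{E}^{I}_{i}, \mathbf{H}^{I}_{p} \right\rangle - \sum_j R_{j,p} \left\langle \mathbf{E}^{I}_{i}, \mathbf{H}^{I}_{j} \right\rangle = \sum_j T_{j,p} \left\langle \mathbf{E}^{I}_{i}, \mathbf{H}^{II}_{j} \right\rangle \tag{3.50} \]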
where the scalar product is defined as the following overlap integral:
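\[ \left\langle \mathbf{E}_m, \mathbf{H}_n \right\rangle \equiv \iint_S \left( \mathbf{E}_m \times \mathbf{H}_n \right) \cdot d\mathbf{S} \tag{3.51} \]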
If we decide to truncate the series expansion after N terms, we have 2N unknowns: N reflection
coefficients and N transmission coefficients. Eq. 3.49 and 3.50 provide us exactly with 2N equations,
since we can write them for all i in 1 · · · N . However, we can reduce the dimensionality of
this linear system by invoking the orthogonality relation $\iint_S (\mathbf{E}_m \times \mathbf{H}_n) \cdot d\mathbf{S} = 0$ for $m \neq n$:
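\[ \delta_{ip} \left\langle \mathbf{E}^{I}_{p}, \mathbf{H}^{I}_{p} \right\rangle + R_{i,p} \left\langle \mathbf{E}^{I}_{i}, \mathbf{H}^{I}_{i} \right\rangle = \sum_j T_{j,p} \left\langle \mathbf{E}^{II}_{j}, \mathbf{H}^{I}_{i} \right\rangle \tag{3.52} \]
\[ \delta_{ip} \left\langle \mathbf{E}^{I}_{p}, \mathbf{H}^{I}_{p} \right\rangle - R_{i,p} \left\langle \mathbf{E}^{I}_{i}, \mathbf{H}^{I}_{i} \right\rangle = \sum_j T_{j,p} \left\langle \mathbf{E}^{I}_{i}, \mathbf{H}^{II}_{j} \right\rangle \tag{3.53} \]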
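\[ \sum_j \left[ \left\langle \mathbf{E}^{I}_{i}, \mathbf{H}^{II}_{j} \right\rangle + \left\langle \mathbf{E}^{II}_{j}, \mathbf{H}^{I}_{i} \right\rangle \right] T_{j,p} = 2 \delta_{ip} \left\langle \mathbf{E}^{I}_{p}, \mathbf{H}^{I}_{p} \right\rangle \tag{3.54} \]
\[ R_{i,p} = \frac{1}{2 \left\langle \mathbf{E}^{I}_{i}, \mathbf{H}^{I}_{i} \right\rangle} \sum_j \left[ \left\langle \mathbf{E}^{II}_{j}, \mathbf{H}^{I}_{i} \right\rangle - \left\langle \mathbf{E}^{I}_{i}, \mathbf{H}^{II}_{j} \right\rangle \right] T_{j,p} \tag{3.55} \]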
This shows that we can first calculate the transmission coefficients by solving an N × N linear
system, and then obtain the reflection coefficients by a simple matrix multiplication.
After obtaining R and T upon incidence of mode p, we can of course repeat the whole procedure
using all modes p in 1 · · · N . Important to note is that this changes only the right–hand side in the
linear system in Eq. 3.54, so that we do not have to invert another system matrix (see footnote 3).
Usually, we will choose to normalise our modes such that $\langle \mathbf{E}^{I}_{i}, \mathbf{H}^{I}_{i} \rangle = 1$. We can then write Eq. 3.54
and 3.55 more compactly after defining the following overlap matrices:
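\[ \mathbf{O}_{I,II}(i, j) \equiv \left\langle \mathbf{E}^{I}_{i}, \mathbf{H}^{II}_{j} \right\rangle \tag{3.56} \]
\[ \mathbf{O}_{II,I}(i, j) \equiv \left\langle \mathbf{E}^{II}_{i}, \mathbf{H}^{I}_{j} \right\rangle \tag{3.57} \]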
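\[ \mathbf{T}_{I,II} = 2 \left( \mathbf{O}_{I,II} + \mathbf{O}^{T}_{II,I} \right)^{-1} \tag{3.58} \]
\[ \mathbf{R}_{I,II} = \frac{1}{2} \left( \mathbf{O}^{T}_{II,I} - \mathbf{O}_{I,II} \right) \cdot \mathbf{T}_{I,II} \tag{3.59} \]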
3 Actually, when solving the system even an explicit inverse is not required, as we can use the LU decomposition of
the system matrix.
In these expressions $\mathbf{T}_{I,II}$ and $\mathbf{R}_{I,II}$ are the so–called transmission and reflection matrices. Their
p–th columns consist of the $T_{j,p}$ and $R_{j,p}$ from Eq. 3.54 and 3.55. If we collect the expansion coefficients
of an arbitrary incident field in a column vector $\mathbf{A}_{inc}$, we can write very compactly for the
reflected and transmitted fields:
Obviously, we can repeat the entire procedure for incidence from medium II, which gives us the
matrices RII,I and TII,I . These four matrices completely characterise the scattering that occurs at
an interface.
Finally, we want to point out the similarity in structure between Eq. 3.58 and 3.59 and the well–
known Fresnel equations to calculate reflection and transmission for normal incidence of a plane
wave upon the interface between two semi–infinite media:
\[ T = \frac{2 n_1}{n_1 + n_2} \tag{3.62} \]
\[ R = \frac{n_1 - n_2}{n_1 + n_2} \tag{3.63} \]
In fact, Eq. 3.62 and 3.63 can also be derived from the more general treatment presented here. For
homogeneous layers, where the refractive index does not vary in the transverse direction, it turns
out that the overlap matrices are diagonal, meaning that there is no cross–coupling between the
different modes. Further evaluation of the formulas reveals that the Fresnel equations are indeed
recovered.
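The step from overlap matrices to scattering matrices is a few lines of linear algebra. The sketch below implements Eq. 3.58 and 3.59 and checks them on the simplest possible case: a single plane wave at normal incidence. To obtain numbers we assume, purely for illustration, mode profiles normalised as $E \propto 1/\sqrt{n}$ and $H \propto \sqrt{n}$, so that $\langle E, H \rangle = 1$ in each medium and the 1 × 1 overlap matrices become $\sqrt{n_2 / n_1}$ and $\sqrt{n_1 / n_2}$. The function name and the refractive indices are ours as well.

```python
import numpy as np

def interface_matrices(O_I_II, O_II_I):
    """Transmission and reflection matrices of Eq. 3.58 and 3.59.

    Assumes modes normalised such that <E_i, H_i> = 1 in medium I.
    """
    system = O_I_II + O_II_I.T
    # Solve system @ T = 2 * identity instead of forming an explicit inverse
    T = np.linalg.solve(system, 2.0 * np.eye(system.shape[0]))
    R = 0.5 * (O_II_I.T - O_I_II) @ T
    return T, R

n1, n2 = 1.5, 1.0
O_I_II = np.array([[np.sqrt(n2 / n1)]])   # <E^I, H^II> under the assumed normalisation
O_II_I = np.array([[np.sqrt(n1 / n2)]])   # <E^II, H^I> under the assumed normalisation
T, R = interface_matrices(O_I_II, O_II_I)
print(R[0, 0], (n1 - n2) / (n1 + n2))                        # Eq. 3.63
print(T[0, 0] * np.sqrt(n1 / n2), 2 * n1 / (n1 + n2))        # Eq. 3.62 after undoing the normalisation
print(R[0, 0] ** 2 + T[0, 0] ** 2)                           # power conservation
```

With this power normalisation, R reproduces Eq. 3.63 directly, while T differs from the field transmission coefficient of Eq. 3.62 by the ratio of the mode normalisation factors.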
Obviously, a realistic structure will consist of more than one interface, but that problem is outside
the scope of this course.
3.4 Method of weighted residuals
To conclude this chapter, we will now present a rather abstract framework for solving Maxwell’s
equations numerically, which has the advantage of allowing for a unified approach which shows
more clearly the similarities and the differences between several numerical methods.
The method of weighted residuals is a very general scheme for projecting Maxwell’s equations into
a form suitable for numerical solution by standard matrix methods.
We start from a problem of the form
Lu = v (3.64)
where v represents some known excitation and u is the unknown and wanted field. L is a linear
operator involving differentiation, integration or both. For the Helmholtz equation, v = 0 and
L = ∇2 + k 2 .
Suppose we approximate the unknown u by
\[ \tilde{u}(x) = \sum_{i=1}^{N} u_i b_i(x) \tag{3.65} \]
where $b_i(x)$ are a complete set of known basis functions. The general problem is now to choose
the coefficients $u_i$ to approximate as well as possible the unknown solution u to Eq. 3.64 with a
finite number of basis functions. In general, it will be impossible to satisfy Eq. 3.64 exactly, so how
are we to define “approximate as well as possible”? We do not know the exact solution u and so
cannot even discuss any error in its approximation ũ. We can however define what is called the
error residual:
\[ R(x) = L\tilde{u} - v = L \sum_{i=1}^{N} u_i b_i(x) - v(x) \tag{3.66} \]
which is clearly zero if and only if we have an exact solution to Eq. 3.64. Now we have a realistic
objective: trying to make the residual as small as possible.
To proceed, we choose another set of functions, the so–called test or weight functions $w_i$. We also
introduce an inner product (scalar product) formally written as $\langle f(x), g(x) \rangle$. The definition of the
inner product is in a sense arbitrary and often follows from the specifics of the problem considered.
Rather than asking the impossible that R(x) = 0 for all x, we insist that R(x) be orthogonal to
each of the weight functions under the scalar product considered. This results in the following N
equations for N unknowns:
\[ \left\langle \left\{ L \sum_{i=1}^{N} u_i b_i(x) - v(x) \right\}, w_j(x) \right\rangle = 0, \qquad j = 1, 2, \ldots, N \tag{3.67} \]
In the above equation, $b_i(x)$, $v(x)$ and $w_j(x)$ are known functions of x. L is the known operator
and $u_i$ are the unknown and wanted scalars which when put into Eq. 3.65 give our approximate
solution.
Because of the linearity of Eq. 3.67, it can be put into a compact matrix form:
L·u=v (3.68)
Here, u denotes the unknown column vector containing $u_1, u_2, \ldots, u_N$. v denotes the known column
vector with elements $\langle v(x), w_j(x) \rangle$. Finally L is a known N × N matrix whose element in row j and
column i is $\langle L b_i(x), w_j(x) \rangle$.
So, our equation Lu = v has been projected by approximation into the matrix equation L · u = v,
which can be solved with standard numerical routines.
Apart from the weighted residuals method, the above procedure is also called the method of
moments or the generalised Galerkin method.
Choice of basis and test functions
Starting from the general procedure outlined above, different numerical methods can be derived
that are catalogued according to their choice of basis and test functions.
If basis and test functions are chosen equal, i.e. $b_i(x) = w_i(x)$, then the method is called the
(non–generalised) Galerkin method. Often this leads to a formulation which can be cast in a variational
form, having the advantages already outlined previously.
Another choice is
\[ w_j(x) = \delta(x - x_j) \tag{3.69} \]
This corresponds to demanding that the error residual vanishes exactly at the points $x_j$ when the
scalar product is defined as the integral of the product:
\[ \left\langle R(x), w_j(x) \right\rangle = \int R(x)\, \delta(x - x_j)\, dx = R(x_j) = 0 \tag{3.70} \]
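The two choices of test functions discussed so far can be compared on a small model problem. The sketch below is an illustration under our own assumptions, not a general-purpose solver: it treats $u'' + k^2 u = 1$ on [0, 1] with $u(0) = u(1) = 0$, so that $L = d^2/dx^2 + k^2$ and $v = 1$, uses the basis functions $b_i(x) = \sin(i \pi x)$, defines the scalar product as the integral of the product, and builds the matrix equation of Eq. 3.68 once with $w_j = b_j$ (Galerkin) and once with delta functions at equidistant points $x_j$. The scalar products are evaluated with the trapezoidal rule.

```python
import numpy as np

def weighted_residual_solve(k, n_basis, method="galerkin", n_quad=2001):
    """Approximate u'' + k^2 u = 1 on [0, 1], u(0) = u(1) = 0, with a sine basis."""
    i = np.arange(1, n_basis + 1)

    def basis(x):
        return np.sin(np.pi * np.outer(x, i))            # shape (points, basis functions)

    def L_basis(x):
        return (k**2 - (i * np.pi) ** 2) * basis(x)      # L b_i = b_i'' + k^2 b_i

    if method == "galerkin":                              # test functions w_j = b_j
        x = np.linspace(0.0, 1.0, n_quad)
        weights = np.full(n_quad, x[1] - x[0])            # trapezoidal quadrature weights
        weights[[0, -1]] *= 0.5
        test = basis(x) * weights[:, None]
        L_mat = test.T @ L_basis(x)                       # row j, column i: <L b_i, w_j>
        v_vec = test.T @ np.ones(n_quad)                  # entry j: <v, w_j> with v = 1
    elif method == "collocation":                         # test functions delta(x - x_j)
        x_j = np.arange(1, n_basis + 1) / (n_basis + 1)
        L_mat = L_basis(x_j)                              # row j, column i: (L b_i)(x_j)
        v_vec = np.ones(n_basis)                          # entry j: v(x_j)
    else:
        raise ValueError("method must be 'galerkin' or 'collocation'")
    u = np.linalg.solve(L_mat, v_vec)
    return lambda x: basis(np.atleast_1d(x)) @ u

def exact_solution(x, k):
    return (1.0 - np.cos(k * x) - (1.0 - np.cos(k)) * np.sin(k * x) / np.sin(k)) / k**2

xs = np.linspace(0.0, 1.0, 101)
for method in ("galerkin", "collocation"):
    approx = weighted_residual_solve(2.0, 15, method)(xs)
    print(method, np.max(np.abs(approx - exact_solution(xs, 2.0))))
```

Because the sine functions happen to be eigenfunctions of L, the Galerkin matrix is diagonal for this particular problem; the collocation matrix is full.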
Exercise 3.3. Assuming a commutative scalar product, show that the choice
\[ w_j(x) = \frac{\partial R(x)}{\partial u_j} \]
minimises
\[ \left\langle R(x), R(x) \right\rangle \]
Because of this property, this choice is called the least squares residual method. This
is the surest and most explicit way of minimising the error residual, although it is
not the most frequently used in practice.
Although there is no exact one–to–one correspondence, finite elements and eigenmode expansion
share a similar philosophy with the method of weighted residuals, most notably the concept of
projecting an unknown function using a set of basis functions.
In the finite element method, basis functions are the interpolation or shape functions. Because
they are non–zero only in a limited area of the computational domain (the finite element), they are
called subdomain basis functions. Weighting functions are chosen identical to the basis functions.
In the eigenmode expansion method, basis and test functions are also equal and are chosen to be
the eigenmodes of each layer. The scalar product used is defined as the overlap integral
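\[ \left\langle \mathbf{E}_m, \mathbf{H}_n \right\rangle \equiv \iint_S \left( \mathbf{E}_m \times \mathbf{H}_n \right) \cdot d\mathbf{S} \tag{3.71} \]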
As far as finite differences are concerned, one could say that the basis functions are a set of delta
functions at each grid point; however, this is perhaps stretching the analogy a bit too far.
Another interesting comparison between finite differences, finite elements and eigenmode expan-
sion is how they differ in terms of what we could loosely call “local extent”. A second order
finite difference approach only looks at neighbouring points in order to calculate derivatives. For
a finite element approach, information from a larger spatial area is taken into account, namely
the behaviour of the function over a single element. Finally, eigenmode expansion is like finite
elements in the limit of a single element spanning the entire domain, and has the largest spatial
extent.
Boris Grigorievich Galerkin (1871–1945)
Boris Grigorievich Galerkin came from a poor family and this was
to mean that he had a harder time through his years of education
than would otherwise have been the case. He attended secondary
school in Minsk, then in 1893 he entered the Petersburg Techno-
logical Institute. Here he studied mathematics and engineering
but he needed to make money to survive so at first he took on
private tutoring, then from 1896 he worked as a designer.
After graduating from the Technological Institute in 1899 he got
a job at the Kharkov Locomotive Plant. In 1903 Galerkin went to
St Petersburg and there he became engineering manager at the
Northern Mechanical and Boiler Plant.
From 1909 Galerkin began to study building sites and construc-
tion works throughout Europe. In the same year he began teaching at the Petersburg Technolog-
ical Institute. His first publication on longitudinal curvature also appeared in 1909, work which
carried on from beginnings which had been laid by Euler. This paper was highly relevant to his
study of construction sites since the results were applied to the construction of bridges and frames
for buildings.
His visits around European construction sites ended around 1914 but his academic work then
turned to the area for which he is today best known, namely the method of approximate inte-
gration of differential equations known as the Galerkin method. He published his finite element
method in 1915.
In 1920 Galerkin was promoted to Head of Structural Mechanics at the Petersburg Technologi-
cal Institute. By this time he also held two chairs, one in elasticity at the Leningrad Institute of
Communications Engineers and one in structural mechanics at Leningrad University.
In 1921 the St Petersburg Mathematical Society was reopened (it had closed in 1917 due to the
Russian Revolution) as the Petrograd Physical and Mathematical Society. Galerkin played a major
role in the Society along with Steklov, Sergei Bernstein, Friedmann and others.
Other work for which Galerkin is famous is his work on thin elastic plates. His major monograph
on this topic Thin Elastic Plates was published in 1937. From 1940 until his death, Galerkin was
head of the Institute of Mechanics of the Soviet Academy of Sciences.
(J.J. O’Connor and E.F. Robertson, from http://www-gap.dcs.st-and.ac.uk)
Chapter 4
The mathematical sciences particularly exhibit order, symmetry, and limitation; and
these are the greatest forms of the beautiful.
— Aristotle
Contents
4.1 The vectorial Helmholtz equation in non–uniform media
4.2 Using symmetries to classify modes
4.3 1D periodic systems
4.4 2D and 3D periodic systems
4.5 Applications of photonic crystals
Systems which exhibit periodicity and symmetry are important in photonics, as e.g. in so–called
photonic crystals, which are artificial media where the refractive index has a periodicity on a scale
of the order of the wavelength. This chapter introduces some concepts which are important for
the study of these systems.
First, the scalar Helmholtz equation will be generalised to a vectorial equation which can also be
used in the case of continuous index variations. We will interpret this as an eigenvalue problem,
introduce the concept of a Hermitian operator and discuss some of its properties.
Next, we will analyse how we can use symmetries in these systems to classify modes. A very
important symmetry is that of a discrete translational symmetry or periodicity. This will allow us
to talk about Bloch’s theorem, Brillouin zones and band diagrams.
Finally, we will discuss some applications of photonic crystals, and show how they can be used to
localise light in a cavity, or guide it along a waveguide.
We will first derive the most general form of the Helmholtz equation which is also valid in the
case of continuous refractive index variations. We start from
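\[ \nabla \times \left( \frac{1}{\varepsilon(\mathbf{r})} \nabla \times \mathbf{H}(\mathbf{r}) \right) = \omega^2 \mu \mathbf{H}(\mathbf{r}) \tag{4.3} \]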
It is important not to yield to the temptation of bringing ε(r) outside of the ∇–operator. This
would be incorrect, because ε is now no longer constant (although we assume that µ is constant
since most ’normal’ materials are non–magnetic).
Let’s define the following linear differential operator:
\[ \Theta \overset{\text{def}}{=} \nabla \times \frac{1}{\varepsilon(\mathbf{r})} \nabla \times \tag{4.4} \]
This operator takes the curl, divides by ε(r) and then takes the curl again. Using this notation, we
can easily write Eq. 4.3 as an eigenvalue problem:
\[ \Theta \mathbf{H}(\mathbf{r}) = \omega^2 \mu \mathbf{H}(\mathbf{r}) \tag{4.5} \]
So, the eigenfunctions H(r) of the Θ–operator are field distributions that satisfy Maxwell’s curl
equations. The eigenvalues of these eigenfunctions are given by ω²µ. Note that in this point
of view ω is seen as an unknown: for a given system with a certain index distribution, we are
interested in finding the ω–values of the allowed solutions.
For physical reasons, we would obviously like these eigenvalues to be real and positive, as that
would give rise to a real and positive value of the angular frequency ω, provided that µ is real and
positive.
We will prove this by making use of the fact that Θ is a Hermitian operator.
4.1.2 Hermiticity
To define what it means for an operator to be Hermitian, we first need to introduce a scalar product
(also called inner product). Let’s define this inner product between two vector functions F(r) and
G(r) as
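\[ \langle \mathbf{F}, \mathbf{G} \rangle = \iiint \mathbf{F}^*(\mathbf{r}) \cdot \mathbf{G}(\mathbf{r}) \, dV \tag{4.6} \]
From this simple definition, we immediately get that ⟨F, G⟩ = ⟨G, F⟩∗. We also see that ⟨F, F⟩ is
always real, even if F itself is complex.
Now, an operator Ξ is said to be Hermitian if
\[ \langle \mathbf{F}, \Xi \mathbf{G} \rangle = \langle \Xi \mathbf{F}, \mathbf{G} \rangle \tag{4.7} \]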
This means that it doesn’t matter which function is operated upon before taking the inner product.
Clearly this is a special property which not all operators have. Before showing that our operator Θ
has this property, let’s first derive an auxiliary result to help us manipulate the integrals involved
in taking the scalar product.
Starting from
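\[ \nabla \cdot (\mathbf{a} \times \mathbf{b}) = \mathbf{b} \cdot (\nabla \times \mathbf{a}) - \mathbf{a} \cdot (\nabla \times \mathbf{b}) \tag{4.8} \]
we immediately get
\[ \iiint \mathbf{a} \cdot (\nabla \times \mathbf{b}) \, dV = \iiint \mathbf{b} \cdot (\nabla \times \mathbf{a}) \, dV - \iiint \nabla \cdot (\mathbf{a} \times \mathbf{b}) \, dV \tag{4.9} \]
Applying Gauss’s theorem to the last term turns it into an integral over the surface S that encloses
the volume:
\[ \iiint \mathbf{a} \cdot (\nabla \times \mathbf{b}) \, dV = \iiint \mathbf{b} \cdot (\nabla \times \mathbf{a}) \, dV - \oint_S (\mathbf{a} \times \mathbf{b}) \cdot d\mathbf{S} \tag{4.10} \]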
Now, back to our proof of the hermiticity of Θ. Simply applying definitions, we get
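\[ \langle \mathbf{F}, \Theta \mathbf{G} \rangle = \iiint \mathbf{F}^* \cdot \nabla \times \left( \frac{1}{\varepsilon(\mathbf{r})} \nabla \times \mathbf{G}(\mathbf{r}) \right) dV \tag{4.11} \]
Using Eq. 4.9 with a = F∗ and b = (1/ε)∇ × G, and converting the divergence term into a surface
integral that is then dropped, this becomes
\[ \langle \mathbf{F}, \Theta \mathbf{G} \rangle = \iiint \left[ \nabla \times \mathbf{F} \right]^* \cdot \frac{1}{\varepsilon(\mathbf{r})} \nabla \times \mathbf{G}(\mathbf{r}) \, dV \tag{4.12} \]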
The term related to the surface integral was dropped because in all practical cases, one of two
things will be true. Either the fields at infinity will decay to zero, or the fields are periodic on the
surface. In both cases, the surface term will vanish.
If ε is real (which will be the case for lossless media), we can write
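\[ \langle \mathbf{F}, \Theta \mathbf{G} \rangle = \iiint \left[ \frac{1}{\varepsilon(\mathbf{r})} \nabla \times \mathbf{F} \right]^* \cdot \left[ \nabla \times \mathbf{G}(\mathbf{r}) \right] dV \tag{4.13} \]
Integrating by parts once more, and again dropping the surface term, we get
\[ \langle \mathbf{F}, \Theta \mathbf{G} \rangle = \iiint \left[ \nabla \times \frac{1}{\varepsilon(\mathbf{r})} \nabla \times \mathbf{F} \right]^* \cdot \mathbf{G}(\mathbf{r}) \, dV = \langle \Theta \mathbf{F}, \mathbf{G} \rangle \tag{4.14} \]
which shows that Θ is Hermitian.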
Exercise 4.1. From Maxwell’s curl equations, derive an eigenvalue problem for
the electric field. Show that the resulting operator is not Hermitian, which is why
people often prefer a representation in terms of the magnetic field.
Exercise 4.2. In this exercise, we will explore the relation between Hermitian op-
erators and Hermitian matrices. As an operator working on a function, we take
a matrix A which multiplies a column vector x, i.e. the effect of the operator is
given by A · x. Let us define the scalar product between two column vectors as
\[ \langle x, y \rangle \overset{\text{def}}{=} x^H \cdot y = \sum_i x_i^* y_i \]
Now show that the operator defined by matrix multiplication by A is Hermitian
if the matrix A is Hermitian, i.e. A = A^H.
Real eigenvalues
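Suppose H is an eigenfunction of Θ:
\[ \Theta \mathbf{H} = \omega^2 \mu \mathbf{H} \tag{4.15} \]
Taking the inner product with H on both sides gives
\[ \langle \mathbf{H}, \Theta \mathbf{H} \rangle = \omega^2 \mu \langle \mathbf{H}, \mathbf{H} \rangle \tag{4.16} \]
Now, taking the complex conjugate and using the fact that ⟨H, H⟩ is always real, we get
\[ \langle \mathbf{H}, \Theta \mathbf{H} \rangle^* = (\omega^2 \mu)^* \langle \mathbf{H}, \mathbf{H} \rangle \]
For any operator ⟨G, F⟩∗ = ⟨F, G⟩, such that ⟨H, ΘH⟩∗ = ⟨ΘH, H⟩. Because Θ is Hermitian, this is in
turn equal to ⟨H, ΘH⟩, so
\[ \langle \mathbf{H}, \Theta \mathbf{H} \rangle = (\omega^2 \mu)^* \langle \mathbf{H}, \mathbf{H} \rangle \tag{4.19} \]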
Comparing Eq. 4.16 and Eq. 4.19, we get that (ω²µ)∗ = ω²µ, so the eigenvalues of a Hermitian operator
are real. If we now only consider lossless materials with real µ, it immediately follows that ω² is
real.
Positive ω²
Using a different argument, we can show that not only is ω² real, it is also always positive. Setting
F = G = H in Eq. 4.12, we get
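\[ \langle \mathbf{H}, \Theta \mathbf{H} \rangle = \iiint \frac{1}{\varepsilon(\mathbf{r})} \left| \nabla \times \mathbf{H}(\mathbf{r}) \right|^2 dV \tag{4.20} \]
Since H is an eigenfunction of Θ (Eq. 4.15), the left–hand side equals ω²µ⟨H, H⟩, so
\[ \omega^2 \mu \langle \mathbf{H}, \mathbf{H} \rangle = \iiint \frac{1}{\varepsilon(\mathbf{r})} \left| \nabla \times \mathbf{H}(\mathbf{r}) \right|^2 dV \tag{4.21} \]
For lossless materials ε is real and positive. Since now all terms in Eq. 4.21 are positive, ω² must
be positive too. So, we can pick the positive square root, leading to a positive ω and a physical
interpretation.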
Note that the fact that the eigenvalues are positive here, is due to the special form of our operator.
It is not a general property of Hermitian operators.
Orthogonality
Consider two modes H1 and H2 with frequencies ω1 and ω2 . Because these are eigenfunctions of
Θ we can write
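\[ \langle \mathbf{H}_1, \Theta \mathbf{H}_2 \rangle = \omega_2^2 \mu \langle \mathbf{H}_1, \mathbf{H}_2 \rangle \tag{4.22} \]
and
\[ \langle \Theta \mathbf{H}_1, \mathbf{H}_2 \rangle = \omega_1^2 \mu \langle \mathbf{H}_1, \mathbf{H}_2 \rangle \tag{4.23} \]
where we used the fact that ω₁²µ is real. Because Θ is Hermitian, the left–hand sides of Eq. 4.22 and
Eq. 4.23 are equal, so we get
\[ (\omega_2^2 - \omega_1^2) \langle \mathbf{H}_1, \mathbf{H}_2 \rangle = 0 \tag{4.24} \]
So, two modes with different frequencies are orthogonal: ⟨H₁, H₂⟩ = 0.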
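The three properties derived above (real eigenvalues, positive eigenvalues, orthogonal modes) have a
direct matrix analogue that can be checked numerically. The following is only a sketch under stated
assumptions: the random matrix C is a hypothetical stand–in for the curl, the diagonal matrix inv_eps
for a real positive 1/ε(r), and Theta = Cᴴ · inv_eps · C mimics the structure of Θ. None of these
names come from the notes.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6

# Hypothetical stand-ins: C plays the role of the curl, inv_eps that of 1/eps(r) > 0.
C = rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n))
inv_eps = np.diag(rng.uniform(0.1, 1.0, size=n))
Theta = C.conj().T @ inv_eps @ C          # Hermitian by construction

def inner(f, g):
    """Discrete version of <F, G>; np.vdot conjugates its first argument."""
    return np.vdot(f, g)

F = rng.normal(size=n) + 1j * rng.normal(size=n)
G = rng.normal(size=n) + 1j * rng.normal(size=n)
hermiticity_gap = abs(inner(F, Theta @ G) - inner(Theta @ F, G))

# General-purpose solver, so that realness is observed rather than assumed.
eigenvalues = np.linalg.eigvals(Theta)

# Hermitian solver: eigenvectors are returned as orthonormal columns of V.
w, V = np.linalg.eigh(Theta)
gram = V.conj().T @ V                     # should be the identity matrix
```

Within rounding error, hermiticity_gap vanishes, the eigenvalues have no imaginary part and are
positive, and gram equals the identity matrix.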
In order to get some general feeling for the nature of the modes and derive some more of their
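properties, it is useful to look at a variational form:
\[ J(\mathbf{H}) = \frac{\langle \mathbf{H}, \Theta \mathbf{H} \rangle}{\langle \mathbf{H}, \mathbf{H} \rangle} \tag{4.25} \]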
We will show that any function H for which J is stationary (i.e. δJ = 0) will satisfy our eigenvalue
problem.
We can calculate the first variation of J directly from Eq. 4.25. However, it turns out that the
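calculation is a bit easier if we first rewrite Eq. 4.25 as
\[ J \langle \mathbf{H}, \mathbf{H} \rangle = \langle \mathbf{H}, \Theta \mathbf{H} \rangle \tag{4.26} \]
Replacing H by H + δH and J by J + δJ gives
\[ (J + \delta J) \langle \mathbf{H} + \delta\mathbf{H}, \mathbf{H} + \delta\mathbf{H} \rangle = \langle \mathbf{H} + \delta\mathbf{H}, \Theta (\mathbf{H} + \delta\mathbf{H}) \rangle \tag{4.27} \]
Subtracting Eq. 4.26 from Eq. 4.27 and neglecting higher order variations gives
\[ \delta J \, \langle \mathbf{H}, \mathbf{H} \rangle = \langle \delta\mathbf{H}, \mathbf{G} \rangle + \langle \mathbf{G}, \delta\mathbf{H} \rangle \quad \text{with} \quad \mathbf{G} = \Theta \mathbf{H} - J \mathbf{H} \tag{4.28} \]
where we used the hermiticity of Θ and the fact that J is real.
Now, if δJ needs to be zero for all possible δH, this means that G should be equal to zero, or
\[ \Theta \mathbf{H} = \frac{\langle \mathbf{H}, \Theta \mathbf{H} \rangle}{\langle \mathbf{H}, \mathbf{H} \rangle} \mathbf{H} \tag{4.29} \]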
Bearing in mind that the quotient on the right hand side is just a number, Eq. 4.29 can only be
fulfilled if H is indeed an eigenvector of Θ. Incidentally, we have also proven that its eigenvalue
is J.
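The stationarity of J can be illustrated with a matrix in place of Θ. The sketch below is an
illustration under assumptions, not a method from the notes: A is a random real symmetric matrix
standing in for Θ, and J is the corresponding quotient ⟨h, Ah⟩/⟨h, h⟩. Perturbing an eigenvector by a
small amount δ changes J only at order δ², and no trial vector gives a value of J below the lowest
eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
B = rng.normal(size=(n, n))
A = B.T @ B                        # real symmetric (hence Hermitian) test operator

def J(h):
    """Quotient <h, A h> / <h, h>, the matrix analogue of the functional J."""
    return np.vdot(h, A @ h).real / np.vdot(h, h).real

w, V = np.linalg.eigh(A)           # ascending eigenvalues, orthonormal eigenvectors
h0 = V[:, 0]                       # eigenvector with the lowest eigenvalue
dh = rng.normal(size=n)
dh /= np.linalg.norm(dh)           # unit-length perturbation direction

# Stationarity: a perturbation ten times larger changes J about a hundred times more.
change_small = abs(J(h0 + 1e-3 * dh) - J(h0))
change_large = abs(J(h0 + 1e-2 * dh) - J(h0))
ratio = change_large / change_small

# J evaluated for random trial vectors never drops below the lowest eigenvalue.
trial_values = [J(rng.normal(size=n)) for _ in range(200)]
```

Here J(h0) reproduces the lowest eigenvalue w[0], in line with the statement that the value of J at a
stationary point is the eigenvalue itself.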
Having survived the gory mathematical details of verifying that Eq. 4.25 is indeed variational, it’s
time to do something useful with it and derive a physical heuristic for the modes in our system.
First of all, let’s evaluate J for the case where H is a solution to the eigenvalue problem:
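\[ J = \frac{\langle \mathbf{H}, \Theta \mathbf{H} \rangle}{\langle \mathbf{H}, \mathbf{H} \rangle} = \frac{\langle \mathbf{H}, \omega^2 \mu \mathbf{H} \rangle}{\langle \mathbf{H}, \mathbf{H} \rangle} = \omega^2 \mu \tag{4.30} \]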
This shows that solutions H which have low values of J will tend to have a low frequency ω.
Secondly, using Eq. 4.20, we can write the functional Eq. 4.25 as
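\[ J = \frac{1}{\langle \mathbf{H}, \mathbf{H} \rangle} \iiint \frac{1}{\varepsilon(\mathbf{r})} \left| \nabla \times \mathbf{H}(\mathbf{r}) \right|^2 dV \tag{4.31} \]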
Exercise 4.4. Transform Eq. 4.31 into the following form, which contains only the
electric field:
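\[ J = \frac{\iiint \left| \nabla \times \mathbf{E}(\mathbf{r}) \right|^2 dV}{\iiint \varepsilon(\mathbf{r}) \left| \mathbf{E}(\mathbf{r}) \right|^2 dV} \]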
From this equation, we can see that J is minimised if the electric field E is concentrated in regions
where the refractive index is high. So, to minimise J, a mode will try to concentrate its field in
regions of high dielectric, while remaining orthogonal to the modes below it in frequency. With
this heuristic we can e.g. explain why it is that dielectric waveguides do indeed work: the field
likes to be concentrated in regions where the refractive index is high. Also, because of the link
between J and ω, concentrating a mode in regions of high refractive index (thus lowering J) will
tend to lower its frequency.
The same equation also suggests that to minimise J, it is advantageous to keep spatial variations
in the field as low as possible so as to keep |∇ × E(r)| small. Indeed, the fundamental mode has
much slower spatial variations than the higher order modes.
We will use these heuristics in a later section to explain the peculiar behaviour of certain periodic
structures.
Figure 4.1: A 2D metal cavity with inversion symmetry. On the left, an even mode with H(r) = H(−r), on
the right an odd mode with H(r) = −H(−r).
4.2.1 Introduction
In this section the use of symmetries to classify the solutions of Eq. 4.3 is discussed. A symmetry
is a change of coordinates which leaves the form of Eq. 4.3 intact.
To clarify this, we start with the structure from Fig. 4.1. This structure is inversion symmetric
around its centre, i.e. invariant under the coordinate change r′ = −r. Later in this section, we will
consider other symmetries, like e.g. systems invariant under the transformation r′ = r + R, which
is called translation symmetry.
Suppose we want to find the modes of the cavity from Fig. 4.1. Solving Maxwell’s equations
for such a cavity will not be possible analytically, but the cavity has an important symmetry: by
inverting it about its centre, you end up with the same shape. So, if somehow we find that a
particular pattern H(r) is a mode of the cavity with frequency ω, then H(−r) must also be a mode
with frequency ω, since the cavity cannot tell r from −r.
Now suppose H(r) is not degenerate, so that it is the only mode at frequency ω. Then, since H(−r)
is also a mode with frequency ω, it must really be the same mode, which means that it must be a
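simple multiple of H(r):
\[ \mathbf{H}(-\mathbf{r}) = \alpha \mathbf{H}(\mathbf{r}) \tag{4.32} \]
To determine α, we can invert the system twice, meaning that we return to the original function
H(r) and pick up another factor α in the process. So
\[ \alpha^2 \mathbf{H}(\mathbf{r}) = \mathbf{H}(\mathbf{r}) \tag{4.33} \]
This means that α is either 1 or −1. So, a given nondegenerate mode¹ must be one of two types: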
either it is invariant under inversion, H(−r) = H(r), and we call it even, or it becomes its own
opposite, H(−r) = −H(r) and we call it odd.
So, we have classified the modes on how they respond to one of the cavity’s symmetry operations.
Let’s place this on a slightly more mathematical footing.
¹ This is not true for degenerate modes. But, although we do not prove it here, we can always form new modes
which are even or odd, by taking appropriate linear combinations of degenerate modes.
Suppose O_I is an operator that inverts vector fields H(r). Now, what is the mathematical expression
that our system Θ has inversion symmetry? Since inversion is a symmetry of our system, it
does not matter whether we operate with Θ, or first invert the coordinates, then operate with Θ
and then change the coordinates back:
\[ \Theta = O_I^{-1} \Theta O_I \tag{4.34} \]
This can be rearranged as O_I Θ − Θ O_I = 0. Just like in quantum mechanics, we can define the
commutator [A, B] of two operators A and B as
\[ [A, B] \overset{\text{def}}{=} AB - BA \tag{4.35} \]
Note that the commutator is itself an operator. So, we have shown that our system is symmetric
under inversion only if the inversion operator commutes with Θ. If we now operate with this
commutator on any mode H(r) of the system, we get
\[ [O_I, \Theta] \mathbf{H} = O_I (\Theta \mathbf{H}) - \Theta (O_I \mathbf{H}) = 0 \tag{4.36} \]
Or
\[ \Theta (O_I \mathbf{H}) = O_I (\Theta \mathbf{H}) = \omega^2 \mu \, (O_I \mathbf{H}) \tag{4.37} \]
This is nothing other than saying that if H is a mode with frequency ω, then O_I H is also a mode with
the same frequency. But in the absence of degeneracy there can only be one mode per frequency, so
H and O_I H can only differ by a multiplicative factor: O_I H = αH. This is an eigenvalue equation
for O_I, and we already know that the eigenvalues α are 1 and −1.
Tracing back, we see that we started from a function H that was an eigenvector of Θ and ended
up by showing that the same H was also an eigenvector of O_I in the absence of degeneracy.
But what if there is degeneracy in the system? In that case, two modes might have the same
frequency, even though they are not related by a simple multiplier. Although we won’t show it here,
in that case it will still be possible to construct different linear combinations of these degenerate
modes to make new modes which themselves are even or odd.
So, generally speaking, if two operators commute, it is possible to construct simultaneous eigenfunctions
H of both operators, i.e. functions H that are solutions of both eigenproblems ΘH = ω²µH and
O_I H = αH.
This is very convenient, as eigenvectors and eigenvalues for simple symmetry operators can be
easily determined, whereas those for Θ cannot. So we can classify the solutions according to the
properties of the symmetry operation.
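A small numerical experiment illustrates this classification. The sketch below rests on several
assumptions that are not in the notes: a scalar field on a 1D grid that vanishes at z = ±1, a
finite–difference version of −d/dz (1/ε) d/dz as a stand–in for Θ, and a symmetric slab with ε = 13
for |z| < 0.3 and ε = 1 elsewhere. The matrix P reverses the grid and plays the role of O_I. Only the
lowest six modes are inspected.

```python
import numpy as np

n = 41                                    # number of interior grid points
z = np.linspace(-1.0, 1.0, n + 2)         # grid including the two boundary points
h = z[1] - z[0]
z_half = 0.5 * (z[:-1] + z[1:])           # n + 1 midpoints where 1/eps is sampled
inv_eps = np.where(np.abs(z_half) < 0.3, 1.0 / 13.0, 1.0)   # symmetric: eps(z) = eps(-z)

# Difference matrix of shape (n + 1, n); the field is zero on both boundaries.
D = np.eye(n + 1, n) - np.eye(n + 1, n, k=-1)
Theta = D.T @ np.diag(inv_eps) @ D / h**2  # discrete -d/dz (1/eps) d/dz

P = np.fliplr(np.eye(n))                   # inversion: (P f)(z) = f(-z)
commutator = Theta @ P - P @ Theta

w, V = np.linalg.eigh(Theta)               # modes sorted by increasing eigenvalue
# <v, P v> is +1 for an even mode and -1 for an odd mode (v has unit length).
parities = np.array([V[:, i] @ P @ V[:, i] for i in range(6)])
```

The commutator vanishes, and the lowest modes alternate between even and odd, so each of them is a
simultaneous eigenvector of the discretised operator and of the inversion.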
4.2.3 Continuous translation symmetry
One way of describing a translation over a displacement d is by means of a translation operator T_d,
defined as
\[ T_{\mathbf{d}} \, \varepsilon(\mathbf{r}) \overset{\text{def}}{=} \varepsilon(\mathbf{r} + \mathbf{d}) \tag{4.40} \]
Just like before, saying that our system is invariant under that operator amounts to [Θ, T_d] = 0.
A system with continuous translation symmetry in the z–direction is invariant for all T_d in that
direction. So, Θ must commute with all the translation operators over a distance d = d 1_z. In order
not to make the notation too heavy, we will limit ourselves here to scalar fields which only vary
in the z–direction. We know that we can construct modes of Θ that are eigenfunctions of these
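translation operators T_d as well:
\[ T_d H(z) = H(z + d) = \lambda(d) H(z) \tag{4.41} \]
This has to be true for all possible d, but the eigenvalues λ can be different for each d, which is
why we write them as λ(d).
Geometrically it is obvious that a translation over d followed by a translation over d′ is equivalent
to a single translation over d + d′:
\[ T_{d'} T_d = T_{d + d'} \quad \text{so that} \quad \lambda(d') \lambda(d) = \lambda(d + d') \]
A function which has this property of converting sums to products is the exponential. Therefore
there has to exist a number k_z such that
\[ \lambda(d) = e^{-jk_z d} \quad \text{i.e.} \quad H(z + d) = e^{-jk_z d} H(z) \tag{4.45} \]
Setting z = 0 and renaming d to z shows that, up to a constant factor, the modes are of the form
\[ H(z) = e^{-jk_z z} \tag{4.46} \]
Let’s now take a closer look at two systems with continuous translation symmetry.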
The first one is free space ε(r) = 1. This system has translation symmetry in all three dimensions.
Following a similar line of reasoning as before, we conclude that the modes are of the form
\[ \mathbf{H}_{\mathbf{k}}(\mathbf{r}) = \mathbf{H}_0 \, e^{-j\mathbf{k} \cdot \mathbf{r}} \tag{4.47} \]
with H₀ a constant vector. So, we have retrieved the plane wave solutions of Maxwell’s equations
based on symmetry arguments alone! Substituting the form of Eq. 4.47 into Maxwell’s equations,
we can derive that k² = ω²/c².
A second interesting system is that of a slab waveguide. In this case, the dielectric constant only
varies in the x–direction and the system has continuous translation symmetry in the z– and
y–directions (Fig. 4.2). By the same reasoning as before, the modes can be written as
\[ \mathbf{H}_{\mathbf{k}}(\mathbf{r}) = e^{-j(k_y y + k_z z)} \, \mathbf{h}(x) \]
Just like we did for the case of inversions, we can classify the modes by their eigenvalue of the
symmetry operator, i.e. by their value of k = k_y 1_y + k_z 1_z. Although we cannot say anything yet
about h(x), we can nevertheless line up the modes in order of increasing frequency. For a given
k, we call n the number indicating that mode’s place in the line of increasing frequency, so we can
identify each mode by its name (k, n). n is called the band number². If there is a countable number
of modes, n is an integer, but sometimes n may be a continuous variable.
² Don’t confuse with the refractive index n.
Figure 4.3: Dispersion relation of a slab waveguide.
If we make a plot of wave vector versus frequency for the slab waveguide, the different bands
correspond to different lines that rise uniformly in frequency. This band structure is shown in
Fig. 4.3³. It has been computed by solving Maxwell’s equations numerically.
From e.g. the “Photonics” bachelor’s course, we know that a slab waveguide has a discrete set of
guided modes (which correspond to the lines in the band diagram), and also a continuous set of
radiation modes (which correspond to the shaded region in the band diagram). The boundary
between these two sets of solutions is called the light line.
The second variant of systems with translational symmetry, namely that of discrete translational
symmetry is so important that it warrants its own section. We will start with 1D systems and later
on move to higher dimensional systems.
An important class of systems has discrete translational symmetry, i.e. they are not invariant under
translations of any distance, but only under distances that are a multiple of some fixed step length.
Let’s first study systems with such symmetry in one dimension. Higher dimensional systems will
be discussed in the next section.
Fig. 4.4 shows a simple structure that is periodic in the z–direction, or - to phrase things differently
- that has discrete translational symmetry in the z–direction. The basic step length is called the
period or the lattice constant a, and the basic step vector is called the primitive lattice vector a = a 1_z.
³ Beware that in some engineering texts this information is presented in a different way. Rather than plotting ω
against k, n_eff is plotted against λ, and such a plot is called a dispersion relation.
Figure 4.4: A 1D periodic structure with periodicity a in the z–direction.
Because of symmetry, ε(r) = ε(r + a), or by repeating this procedure ε(r) = ε(r + la) with l an
integer. The dielectric block that is considered to be repeated over and over is called the unit cell
and is highlighted in the square box in Fig. 4.4.
We will now derive an important property of the eigenfunctions of systems with discrete transla-
tional symmetry.
Because of the symmetries present, Θ must commute with all the translation operators over a
distance R = l a 1_z in the z–direction. Again, we will restrict ourselves to the case of scalar fields
only varying in the z–direction to lighten the notation.
We can follow the same argument that led to Eq. 4.45, but this time with the set of translations
over la, to arrive at
\[ H(z + la) = e^{-jk_z la} H(z) \tag{4.50} \]
As mentioned before, a crucial difference with the case of continuous translation symmetry is that
Eq. 4.50 is only valid for displacements that are an integer multiple of the period, and not for
arbitrary displacements.
Eigenfunctions of periodic systems are often called Bloch states, and the number k_z is called the
wave number of the Bloch state⁴.
Once again, we can use this number kz to classify the modes and draw a band diagram. Com-
pared to the case of continuous translational symmetry however, the wave number has a different
interpretation. Rather than saying something about the phase relationship between any two dif-
ferent points on e.g. a slab waveguide, the Bloch wave number only provides a phase relationship
between points in the structure that are spaced with the same periodicity as the system.
⁴ Sometimes the name of Floquet is also used instead of that of Bloch. The mathematician Floquet first came up with
this theory in the context of 1D systems in mechanics, while Bloch later generalised it to 3D in the context of solid state
physics.
Figure 4.5: Band structure of a periodic system. The Brillouin zone is the hatched region.
Looking at Eq. 4.50, it is clear that the wave number k_z is only defined up to an integer multiple
of 2π/a:
\[ e^{-j\left(k_z + m\frac{2\pi}{a}\right)a} = e^{-jk_z a} \, e^{-jm2\pi} = e^{-jk_z a} \tag{4.51} \]
Any Bloch state doesn’t have just a single wave number k_z at a given frequency, but in fact a whole
family of equivalent wave numbers k_z + m2π/a at that frequency. This means that we can restrict
our band diagram to e.g. the k_z–range [−π/a, π/a]. This important region of non–redundant
k_z–values is called the Brillouin zone. The band diagram at other k_z–values can be constructed by
periodic repetition of the Brillouin zone (see Fig. 4.5).
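In numerical work it is often convenient to map an arbitrary wave number back to its equivalent
inside the Brillouin zone. A minimal sketch is given below; the function name fold_to_first_bz is
invented here, and the half–open interval [−π/a, π/a) is a choice made so that each Bloch state gets
exactly one label.

```python
import cmath
import math

def fold_to_first_bz(kz, a):
    """Return the wave number equivalent to kz inside [-pi/a, pi/a),
    obtained by subtracting an integer multiple of 2*pi/a."""
    g = 2.0 * math.pi / a                 # length of the Brillouin zone
    return (kz + g / 2.0) % g - g / 2.0

a = 1.0
kz = 0.3 * math.pi / a + 3 * (2.0 * math.pi / a)   # three zones away from 0.3*pi/a
kz_folded = fold_to_first_bz(kz, a)

# Both wave numbers give the same phase factor over one period.
phase_original = cmath.exp(-1j * kz * a)
phase_folded = cmath.exp(-1j * kz_folded * a)
```

For this example kz_folded comes out as 0.3π/a, and the two phase factors coincide up to rounding.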
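To obtain an alternative formulation, let us define the function
\[ u(z) \overset{\text{def}}{=} e^{jk_z z} H(z) \tag{4.52} \]
It is easy to prove that u has the same periodicity as the lattice. We start from
\[ u(z + la) = e^{jk_z (z + la)} H(z + la) \tag{4.53} \]
and use Eq. 4.50 to get
\[ u(z + la) = e^{jk_z z} \, e^{jk_z la} \, e^{-jk_z la} H(z) = e^{jk_z z} H(z) = u(z) \tag{4.54} \]
Rewriting Eq. 4.52, we can say that the eigenfunctions of a 1D periodic system can be written as
\[ H(z) = e^{-jk_z z} u(z) \tag{4.55} \]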
So, the solution H(z) is a plane wave, as it would be in free space, but modulated by a periodic
function because of the periodic lattice.
Actually, since u is periodic in z, we can write it as the following Fourier series:
\[ u(z) = \sum_{m=-\infty}^{\infty} u_m \, e^{-jm\frac{2\pi}{a}z} \tag{4.56} \]
so that
\[ H(z) = \sum_{m=-\infty}^{\infty} u_m \, e^{-j\left(k_z + m\frac{2\pi}{a}\right)z} \tag{4.57} \]
This shows again that a Bloch mode is made up of an infinity of contributions with wave vectors
k_z + m2π/a.
Exercise 4.5. Consider a Bloch state H(z) at frequency ω with wave vector k and
described by a certain periodic function u_k(z) as per Eq. 4.55. The same Bloch
state can also be described by another wave vector k′ = k + m2π/a. What is the
corresponding function u_k′(z) for that description?
As a simple example, let’s plot the band diagram of a planar multilayered film, consisting of
alternating layers of low index and high index materials, each with thickness a/2. This system is
periodic in the z–direction, and Bloch theory tells us that we only need to consider k_z–values in
the interval −π/a ≤ k_z ≤ π/a. As for k_x and k_y, our system has continuous translation symmetry
in these directions, so these k–components can assume any value. However, let’s restrict ourselves
now to the special case of normal incidence or on–axis propagation, where k_x = k_y = 0. Without
risk of confusion, we can abbreviate k_z by k.
Let’s first look at the case of zero index contrast between the materials, so that the medium is
completely homogeneous. We already know that in such a system the solutions are plane waves
with ω = ck/n, so that one band is a straight line starting from the origin upwards toward the
right. There is also a similar solution corresponding to waves propagating in the −z direction
instead of the +z direction. This solution for negative values of k = k_z is another straight line
ω = −ck/n starting from the origin and going upwards toward the left. Now, we have imposed
an artificial periodicity a on this medium, which means that according to Bloch theory, these two
lines should repeat themselves starting from the points k = l2π/a. This is illustrated in Fig. 4.6.
When we restrict such a plot to the Brillouin zone, we get the results from the left panel of Fig. 4.7,
with the bands appearing to fold back at the edges of the Brillouin zone. In this figure, we plot the
band diagrams ωn (k) for three different cases of index contrast between the two layers: zero, low
and high.
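For the zero–contrast case, the folded bands can be written down directly: every straight line
ω = c|k + m2π/a|/n, with m an integer, contributes one branch inside the Brillouin zone. The sketch
below evaluates the lowest few branches; the function name and the normalisation c = 1 are choices
made here for illustration only.

```python
import numpy as np

def empty_lattice_bands(k, a, n_index, num_bands, c=1.0):
    """Lowest num_bands frequencies at Bloch wave number k for a uniform medium
    of refractive index n_index on which an artificial period a is imposed."""
    m = np.arange(-num_bands, num_bands + 1)           # enough shifted copies
    omega = c * np.abs(k + m * 2.0 * np.pi / a) / n_index
    return np.sort(omega)[:num_bands]

a = 1.0
n_index = np.sqrt(13.0)                                # corresponds to eps = 13
bands_centre = empty_lattice_bands(0.0, a, n_index, 3)        # at k = 0
bands_edge = empty_lattice_bands(np.pi / a, a, n_index, 3)    # at k = pi/a
```

At the zone edge the two lowest branches coincide at ω = πc/(na), which is the point where the bands
appear to fold back.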
Figure 4.6: Band structures for a uniform medium where we impose an artificial periodicity a. The thick
lines are the regular dispersion relations, the thin lines are copies of this as required by the artificial period-
icity.
Figure 4.7: Band structures for normal incidence for three different multilayer films where all layers are
a/2 thick. Left: all layers have ε = 13. Middle: layers alternate between ε = 12 and ε = 13. Right: layers
alternate between ε = 1 and ε = 13
Figure 4.8: Two ways to position the field: maximum intensity in the air (top) or in the dielectric (bottom).
Note that this is a diagram in real space, i.e. the horizontal axis represents position.
As we increase the index contrast, a curious feature emerges at the edges of the Brillouin zone in
Fig. 4.7. We start to see frequency regions appear where no solutions exist for any k. This so–called
photonic band gap becomes wider as the index contrast increases.
What is the significance of such a band gap? In this frequency range, there are no allowed prop-
agating states in the system. Suppose we have a semi–infinite periodic system for z > 0, and a
uniform medium for z < 0. So, if we excite the semi–infinite periodic system with light coming
from the uniform medium, this light will find no states to couple to inside the periodic system.
Therefore, it has no choice but to return from where it came and be fully reflected. So, inside a
band gap, a periodic structure acts as a perfect mirror.
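The appearance and widening of the gaps can be checked with a few lines of code. The sketch below
does not follow a derivation from these notes. It relies on the standard textbook dispersion relation
for an infinite stack of two alternating lossless layers at normal incidence, which is quoted here as
an assumption: cos(Ka) = cos(k₁d₁) cos(k₂d₂) − ½ (n₁/n₂ + n₂/n₁) sin(k₁d₁) sin(k₂d₂), with kᵢ = nᵢω/c
and d₁ = d₂ = a/2. A real Bloch wave number K only exists where the right–hand side lies between −1
and 1; other frequencies fall inside a gap. All function names are invented for this example.

```python
import numpy as np

def bloch_cos(omega, eps1, eps2, a=1.0, c=1.0):
    """Right-hand side of the assumed dispersion relation cos(K a) = ...
    for two layers of thickness a/2 with permittivities eps1 and eps2."""
    n1, n2 = np.sqrt(eps1), np.sqrt(eps2)
    p1 = n1 * omega / c * a / 2.0          # phase thickness of layer 1
    p2 = n2 * omega / c * a / 2.0          # phase thickness of layer 2
    return (np.cos(p1) * np.cos(p2)
            - 0.5 * (n1 / n2 + n2 / n1) * np.sin(p1) * np.sin(p2))

def gap_fraction(eps1, eps2, omega_max=2.0, samples=4001):
    """Fraction of the sampled range 0..omega_max (units of c/a) lying in a gap."""
    omega = np.linspace(0.0, omega_max, samples)
    in_gap = np.abs(bloch_cos(omega, eps1, eps2)) > 1.0 + 1e-12
    return float(np.mean(in_gap))

gap_none = gap_fraction(13.0, 13.0)        # zero index contrast
gap_low = gap_fraction(12.0, 13.0)         # low index contrast
gap_high = gap_fraction(1.0, 13.0)         # high index contrast
```

With the three permittivity pairs of Fig. 4.7, no frequency is forbidden for zero contrast, a small
fraction is forbidden for low contrast, and a much larger fraction for high contrast.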
Where does such a band gap come from? One way of explaining it is to say that the reflections
from each of the interfaces in the system interfere constructively, such that the system will behave
as a perfect reflector and no modes can exist inside it.
Another way of looking at it, is by making use of the heuristics we derived from the variational
formulation of Maxwell’s equations. Let’s look at mode profiles for states immediately above and
below the gap. The gap between bands 1 and 2 occurs at the edge of the Brillouin zone, where
k = π/a. For this k–value, one can prove that the modes are standing waves with a wavelength of
λ = 2π/k = 2a, twice the lattice constant.
For small index contrast, one can also prove that the field profiles of the modes look like the
ones in Fig. 4.8. There are two ways to centre a standing wave of this kind, without violating the
symmetry of the system: we can position its peak either in the low or in the high dielectric. But we
know from our variational study that a mode that is more concentrated in high dielectric regions
will have a lower frequency. So the two modes from Fig. 4.8 will have a different frequency and
therefore a gap opens between them.
Since in the low–frequency band the fields are more concentrated in the high dielectric, this band
is usually called the dielectric band. Often, the low dielectric is air, which explains why the upper
band is often called the air band.
All of this is completely analogous to semiconductor physics, where the periodicity of the semi-
conductor crystal gives rise to conduction and valence bands with a forbidden gap in between. In
our case, we have artificial periodicity on a larger length scale (comparable to the wavelength of
light used), so we call these structures photonic crystals.
In the 1D case, the gap will start to close as soon as we move away from normal incidence. This is
logical, as in the limiting case of grazing incidence, there is no longer periodicity in the propaga-
tion direction.
If we want to have a gap for oblique incidence too, we will need periodicity in more than one
dimension: carefully designed 2D structures can have a band gap for all propagation angles in a plane,
whereas carefully designed 3D photonic crystals can have a gap for any propagation direction.
We say ’carefully designed structures’ because in more than one dimension it is not the case that any
periodic structure will give rise to a band gap. This is in contrast with the 1D case, where a gap opens
up for normal incidence as soon as there is any index contrast.
The same principles we explained for 1D systems can of course be extended to higher dimension-
alities.
By following a similar line of reasoning as in the 1D case, we can write down Bloch’s theorem in
three dimensions. For any eigenfunction H of Θ which describes a periodic system, there exists a
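vector k such that for any lattice vector R the following holds:
\[ \mathbf{H}_{\mathbf{k}}(\mathbf{r} + \mathbf{R}) = e^{-j\mathbf{k} \cdot \mathbf{R}} \, \mathbf{H}_{\mathbf{k}}(\mathbf{r}) \tag{4.58} \]
Since we can label each solution by its wavevector, k was used as a subscript of H. Completely
similar to the 1D argument, we can derive an alternative formulation of Bloch’s theorem by saying
that any solution H_k can be written as
\[ \mathbf{H}_{\mathbf{k}}(\mathbf{r}) = e^{-j\mathbf{k} \cdot \mathbf{r}} \, \mathbf{u}_{\mathbf{k}}(\mathbf{r}) \tag{4.59} \]
The function u is periodic for all lattice vectors R of the crystal: u_k(r + R) = u_k(r).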
Exercise 4.6. By explicitly expanding in components, show that Bloch modes with
wavevector k satisfy the following equation:
\[ \Theta_{\mathbf{k}} \mathbf{u}_{\mathbf{k}} = \omega^2 \mu \, \mathbf{u}_{\mathbf{k}} \]
with
\[ \Theta_{\mathbf{k}} \overset{\text{def}}{=} (-j\mathbf{k} + \nabla) \times \frac{1}{\varepsilon(\mathbf{r})} (-j\mathbf{k} + \nabla) \times \]
This formula can be used to build a numerical scheme to calculate Bloch modes:
you first fix a value for k and then solve an eigenvalue problem, where the
eigenvalues give the ω–values where Bloch modes appear for this particular k–value.
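A minimal version of such a scheme is sketched below for the simplest possible case. It is an
illustration under several assumptions that go beyond the notes: a scalar field H(z) at normal
incidence, non–magnetic media with relative permittivity ε so that the eigenvalue is (ω/c)², a unit
cell with a layer ε₁ of thickness a/2 centred at z = 0 and ε₂ elsewhere, and an expansion of u_k in a
truncated set of plane waves e^{−jG_m z} with G_m = m2π/a. With the Fourier coefficients κ_p of 1/ε(z),
the operator becomes the matrix M_mn = (k + G_m) κ_{m−n} (k + G_n). The function name is invented here.

```python
import numpy as np

def bloch_frequencies(k, eps1, eps2, a=1.0, num_pw=41, num_bands=3):
    """Lowest num_bands values of omega/c at Bloch wave number k for a two-layer
    stack (layer eps1 for |z| < a/4, eps2 elsewhere), using num_pw plane waves."""
    m = np.arange(-(num_pw // 2), num_pw // 2 + 1)
    G = 2.0 * np.pi * m / a                       # 1D reciprocal lattice vectors
    p = m[:, None] - m[None, :]                   # index difference m - n
    # Fourier coefficients of 1/eps(z) for the symmetric two-layer unit cell;
    # np.sinc(x) is sin(pi x)/(pi x), so the p = 0 term is handled automatically.
    delta = 1.0 / eps1 - 1.0 / eps2
    kappa = 0.5 * delta * np.sinc(p / 2.0) + (p == 0) / eps2
    M = (k + G)[:, None] * kappa * (k + G)[None, :]
    eigenvalues = np.linalg.eigvalsh(M)           # real symmetric matrix, (omega/c)**2
    return np.sqrt(np.clip(eigenvalues[:num_bands], 0.0, None))

# Uniform medium: the folded straight lines are recovered.
f_uniform = bloch_frequencies(0.5 * np.pi, 13.0, 13.0)
# High contrast at the zone edge k = pi/a: a gap separates bands 1 and 2.
f_edge = bloch_frequencies(np.pi, 1.0, 13.0)
gap_width = f_edge[1] - f_edge[0]
```

Sweeping k over the Brillouin zone and plotting the returned frequencies gives a band diagram of the
kind shown in Fig. 4.7. The truncation to 41 plane waves is arbitrary, and the results should be
checked for convergence before they are trusted quantitatively.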
Just like in 1D, there is a range of k–values which gives rise to non–redundant solutions which
is called the Brillouin zone. However, in higher dimensions it is slightly more complicated to
construct this region, so we will now spend some time discussing that.
Suppose we have a function f (r) that is periodic on a lattice, i.e. f (r + R) = f (r) for all vectors R
that translate the lattice into itself. As we have already seen, these vectors R are called the lattice
vectors.
Let us write down the continuous Fourier transform of the function f , which we can do for any
sufficiently well–behaved function:
\[ f(\mathbf{r}) = \iiint g(\mathbf{k}) \, e^{-j\mathbf{k} \cdot \mathbf{r}} \, d\mathbf{k} \tag{4.60} \]
Physically, this means we write f as a sum of plane waves e^{−jk·r} with wavevectors k and
expansion coefficients g(k).
An expansion like this can be performed on any function. But f is not just any function, it is
periodic:
\[ f(\mathbf{r}) = \iiint g(\mathbf{k}) \, e^{-j\mathbf{k} \cdot \mathbf{r}} \, d\mathbf{k} = f(\mathbf{r} + \mathbf{R}) = \iiint g(\mathbf{k}) \, e^{-j\mathbf{k} \cdot \mathbf{r}} \, e^{-j\mathbf{k} \cdot \mathbf{R}} \, d\mathbf{k} \tag{4.61} \]
But this is impossible, unless either g(k) = 0 or e^{−jk·R} = 1. In other words, the transform g is zero
everywhere, except for discrete spikes at k–values where e^{−jk·R} = 1 for all R.
So, what we have just discovered is that if we construct a periodic function using plane waves,
we only need to consider those plane waves with wave vectors k such that e^{−jk·R} = 1 for all
lattice vectors R. These k–vectors are called the reciprocal lattice vectors and are usually indicated
by G. They form a lattice of their own: e.g. adding two reciprocal lattice vectors yields another
reciprocal lattice vector.
We still need to answer the question of how to build the set of all G–vectors given the set of
R–vectors, i.e. find all G such that, for every R, G · R = 2πN for some integer N.
Every lattice vector R can be written as a linear combination of the so–called primitive lattice vectors,
which are the smallest vectors pointing from one lattice point to another. E.g. for a simple cubic
lattice with spacing a, all R’s are of the form R = l a 1_x + m a 1_y + n a 1_z, with (l, m, n) integers. In
general, we call the primitive lattice vectors a₁, a₂ and a₃. They don’t need to be of unit length.
Since the reciprocal lattice vectors G also form a lattice, they have a set of primitive vectors b_i as
well, such that G = l′b₁ + m′b₂ + n′b₃. Our requirement that G · R = 2πN can now be written as
\[ (l \mathbf{a}_1 + m \mathbf{a}_2 + n \mathbf{a}_3) \cdot (l' \mathbf{b}_1 + m' \mathbf{b}_2 + n' \mathbf{b}_3) = 2\pi N \]
For all choices of (l, m, n), this must hold for some N. We can satisfy this if we construct the b_i
such that a_i · b_j = 2πδ_ij, where δ_ij equals 1 if i = j and 0 otherwise. We can do this by exploiting
the fact that x · (x × y) = 0 for all vectors x and y:
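\[ \begin{aligned}
\mathbf{b}_1 &= 2\pi \frac{\mathbf{a}_2 \times \mathbf{a}_3}{\mathbf{a}_1 \cdot \mathbf{a}_2 \times \mathbf{a}_3} \\
\mathbf{b}_2 &= 2\pi \frac{\mathbf{a}_3 \times \mathbf{a}_1}{\mathbf{a}_1 \cdot \mathbf{a}_2 \times \mathbf{a}_3} \\
\mathbf{b}_3 &= 2\pi \frac{\mathbf{a}_1 \times \mathbf{a}_2}{\mathbf{a}_1 \cdot \mathbf{a}_2 \times \mathbf{a}_3}
\end{aligned} \tag{4.64} \]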
To summarise, when we build the Fourier transform of a periodic function, we only need to in-
clude plane waves with wave vectors that are reciprocal lattice vectors. To construct these recip-
rocal lattice vectors, we take the primitive lattice vectors and apply Eq. 4.64.
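As a quick numerical check of this recipe, the following minimal sketch evaluates Eq. 4.64 and verifies that ai · bj = 2πδij. The helper names (`cross`, `dot`, `reciprocal_vectors`) and the lattice spacing of 0.5 are illustrative choices, not part of the course text.

```python
import math

def cross(u, v):
    """Cross product of two 3-vectors given as tuples."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    """Dot product of two vectors given as tuples."""
    return sum(ui * vi for ui, vi in zip(u, v))

def reciprocal_vectors(a1, a2, a3):
    """Primitive reciprocal lattice vectors b1, b2, b3 following Eq. 4.64."""
    volume = dot(a1, cross(a2, a3))          # a1 . (a2 x a3)
    scale = 2 * math.pi / volume
    b1 = tuple(scale * c for c in cross(a2, a3))
    b2 = tuple(scale * c for c in cross(a3, a1))
    b3 = tuple(scale * c for c in cross(a1, a2))
    return b1, b2, b3

# Simple cubic lattice with spacing a = 0.5 (arbitrary units, hypothetical value)
a = 0.5
A = [(a, 0.0, 0.0), (0.0, a, 0.0), (0.0, 0.0, a)]
B = reciprocal_vectors(*A)
for i in range(3):
    # Each row should be 2*pi on the diagonal and 0 elsewhere
    print([round(dot(A[i], B[j]), 6) for j in range(3)])
```

For the cubic lattice this gives reciprocal vectors of length 2π/a along the same axes, in line with the square-lattice example discussed further on.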
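We already know that for a Bloch mode there exists a vector k such that
$$ \mathbf{H}(\mathbf{r} + \mathbf{R}) = e^{-j\mathbf{k}\cdot\mathbf{R}}\, \mathbf{H}(\mathbf{r}) $$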
If k is incremented by a reciprocal lattice vector G, then e−jk·R is unchanged by the very definition
of reciprocal lattice vector. So, incrementing k by G results in the same physical mode.
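This invariance can be written out in one line, using only the defining property G · R = 2πN of a reciprocal lattice vector (N an integer):

```latex
e^{-j(\mathbf{k}+\mathbf{G})\cdot\mathbf{R}}
  = e^{-j\mathbf{k}\cdot\mathbf{R}}\, e^{-j\mathbf{G}\cdot\mathbf{R}}
  = e^{-j\mathbf{k}\cdot\mathbf{R}}\, e^{-j 2\pi N}
  = e^{-j\mathbf{k}\cdot\mathbf{R}}
```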
Therefore, we can restrict our attention to a region in k–space where you cannot get from any
point in that region to another in that region by adding a reciprocal lattice vector. This region is
the Brillouin zone and consists of all the k–vectors around the origin in reciprocal space that are
closer to the origin than to any other neighbouring lattice point in the reciprocal lattice.
These two definitions are equivalent. If a particular k′ is closer to a neighbouring lattice point
than to the origin, you can always reach it by taking a vector k that stays closer to the origin and
translating it by the G that reaches from one lattice point to the other (Fig. 4.9).
Figure 4.9: Construction of the Brillouin zone using bisectors of lines joining two lattice points. Any vector
k′ that reaches to an arbitrary point on the other side from A can be expressed as the sum of a vector k on
the same side and a lattice vector G.
Figure 4.10: The square lattice. Left: lattice in real space. Middle: lattice in reciprocal space. Right: con-
struction of the Brillouin zone in reciprocal space.
To make this more concrete, let’s consider the case of a 2D square lattice (Fig. 4.10). Its lattice
vectors are a1 = a1x and a2 = a1y . In order to use Eq. 4.64, we need a third basis vector in
the z–direction, but for that we can choose one of any length, since the structure is uniform in the
z–direction. For the reciprocal lattice, we end up with b1 = (2π/a) 1x and b2 = (2π/a) 1y . So, the
reciprocal lattice is also a square lattice, but with spacing 2π/a instead of a.
To construct the Brillouin zone, we focus our attention on a particular point (the origin), and draw
the lattice vectors that start at the origin. For each of these vectors, we draw the perpendicular
bisectors, which divide the lattice into two half–planes (like in Fig. 4.10), one of which contains
the origin. The intersection of all these half–planes forms the Brillouin zone (Fig. 4.10).
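The half-plane construction is equivalent to asking whether a point k is closer to the origin than to any other reciprocal lattice point. A minimal sketch of that test in 2D is given below. The function name `in_first_bz` and the cut-off `n_max` are assumptions made for illustration: only lattice points l b1 + m b2 with |l|, |m| ≤ n_max are checked, which is sufficient for lattices that are not strongly skewed.

```python
import math

def in_first_bz(k, b1, b2, n_max=2):
    """True if the 2D vector k is at least as close to the origin as to any
    other reciprocal lattice point l*b1 + m*b2 with |l|, |m| <= n_max."""
    k2 = k[0] ** 2 + k[1] ** 2
    for l in range(-n_max, n_max + 1):
        for m in range(-n_max, n_max + 1):
            if l == 0 and m == 0:
                continue                      # skip the origin itself
            gx = l * b1[0] + m * b2[0]
            gy = l * b1[1] + m * b2[1]
            # k lies on the wrong side of the bisector of the line 0 -> G
            if (k[0] - gx) ** 2 + (k[1] - gy) ** 2 < k2 - 1e-12:
                return False
    return True

# Square lattice with a = 1 (hypothetical value): b1 = 2*pi*1x, b2 = 2*pi*1y
b1, b2 = (2 * math.pi, 0.0), (0.0, 2 * math.pi)
print(in_first_bz((math.pi / 2, 0.0), b1, b2))    # True: inside the zone
print(in_first_bz((1.2 * math.pi, 0.0), b1, b2))  # False: closer to the point at b1
```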
Exercise 4.7. For a triangular lattice, what are the lattice vectors, reciprocal lattice
vectors and Brillouin zone?
Often, a photonic crystal will have additional symmetries. E.g. the square lattice we considered
earlier stays invariant for rotations over 90, 180 and 270 degrees. Another symmetry for the square
crystal is mirror symmetry about the 45 degrees diagonal. The collection of symmetry operations
(rotations, reflections, inversion) for which the crystal is invariant is called the point group of the
crystal.
Figure 4.11: The irreducible Brillouin zone for the square lattice
Figure 4.12: Full TM band diagram for a square lattice of dielectric rods, as a set of surfaces in k–space.
All of these symmetries mean that there is additional redundancy in the Brillouin zone. The small-
est region in the Brillouin zone for which the bands are not related by symmetry is called the ir-
reducible Brillouin zone. As shown in Fig. 4.11, the irreducible Brillouin zone for the square lattice
is a triangular wedge 1/8 the size of the full Brillouin zone. Also indicated in Fig. 4.11 are some
special high–symmetry points of the Brillouin zone. The origin is called the Γ–point, X is in the
direction along the shortest lattice vector and M is in the direction of the diagonal.
As an example we plot the band structure of a 2D square lattice of dielectric rods in air, for waves
propagating in the xy–plane. For each k–point in the triangular irreducible Brillouin zone there
are a number of bands with increasing frequency. Each band can be plotted as a surface over the
irreducible Brillouin zone, as illustrated in Fig. 4.12:
As such a 3D plot quickly becomes difficult to interpret, people usually only plot the bands along
the edges of the irreducible Brillouin zone, as is done in Fig. 4.13. We can see that there is a
Figure 4.13: Projected band diagram for a square lattice of dielectric rods. Dotted bands are TE, full lines are
TM polarisation.
bandgap for modes with TM polarisation, but not for TE. In the context of photonic crystals, TM
is defined as modes with the E–vector along the rods in the z–direction.
Exercise 4.8. Band diagrams are symmetric around k = 0, i.e. if there is a mode
at frequency ω with wave vector k, there is always a mode at that same frequency
with wave vector −k. This is true even if the crystal itself does not have inversion
symmetry. The mode at −k can be thought of as the backward version of the mode
at k and is a consequence of the time reversal symmetry of Maxwell’s equations.
Make this plausible by taking the following steps:
• From the second form of Bloch’s theorem, associate H∗ with a Bloch mode at −k.
What is its corresponding u function?
Figure 4.14: Defects in a photonic crystal: a) point defect b) line defect c) surface defect
Figure 4.15: Example of photonic crystal components: a) bend b) splitter c) resonant filter
Just as with semiconductors, photonic crystals only become truly useful when defects are intro-
duced in the crystal lattice (Fig. 4.14).
When we introduce a point defect in a crystal, we create a small region of space which is sur-
rounded by a photonic crystal which acts as a perfect reflector. So, light has no way to escape from
this region and we have created a resonating cavity.
Similarly, when we create a line defect, light cannot propagate into the crystal and therefore it has
no choice but to follow the line defect. In this way, we have created a waveguide.
Finally, we can also create a surface defect, where we just exploit the mirror properties of the
photonic crystal.
The nice thing about these structures is that they are very small, on the order of the wavelength of
the light used. This is why they are often called nanophotonic structures. They offer the possibility
of miniaturising and integrating a variety of optical components (like e.g. the bends, splitters and
filters from Fig. 4.15) onto a single photonic IC.
Felix Bloch (1905–1983)
solids, liquids, or gases. A few weeks after the first successful experiments he received the news
of the same discovery having been made independently and simultaneously by E.M. Purcell and
his collaborators at Harvard.
Most of Bloch’s work in the subsequent years has been devoted to investigations with the use of
this new method. In particular, he was able, by combining it with the essential elements of his
earlier work on the magnetic moment of the neutron, to remeasure this important quantity with
great accuracy in collaboration with D. Nicodemus and H.H. Staub (1948). His more recent theo-
retical work has dealt primarily with problems which have arisen in conjunction with experiments
carried out in his laboratory.
In 1952 he was awarded the Nobel prize in Physics, together with Purcell, ”for their develop-
ment of new methods for nuclear magnetic precision measurements and discoveries in connection
therewith”.
In 1954, Bloch took a leave of absence to serve for one year as the first Director General of CERN
in Geneva. After his return to Stanford University he continued his investigations on nuclear
magnetism, particularly in regard to the theory of relaxation. In view of new developments, a
major part of his recent work deals with the theory of superconductivity and of other phenomena
at low temperatures.
In 1961, he received an endowed Chair by his appointment as Max Stein Professor of Physics at
Stanford University.
Prof. Bloch married in 1940 Dr. Lore Misch, a refugee from Germany and herself a physicist.
(From Nobel Lectures. Physics 1942–1962, Elsevier Publishing Company, Amsterdam, 1964)
Chapter 5
Dynamical systems
The mathematician’s patterns, like the painter’s or the poet’s must be beautiful; the
ideas, like the colours or the words must fit together in a harmonious way. Beauty is
the first test.
— Godfrey H. Hardy
As for everything else, so for a mathematical theory: beauty can be perceived but not
explained.
— Arthur Cayley
Contents
5.1 Chaos in a photonic system
5.2 1D discrete systems
5.3 2D discrete systems
5.4 Continuous systems
In this chapter, we will study some aspects of the dynamics of nonlinear systems. We will see that
these systems can exhibit wonderfully complex and intricate phenomena, producing both pretty
pictures and profound insights into the behaviour of physical processes.
After an example from the photonics world, we will look at 1D discrete systems. 2D discrete
systems are studied next, and finally we will discuss continuous systems, i.e. systems that are
described by differential equations instead of difference equations.
In 1991, scientists at the University of Lille studied a CO2 –laser which contained an electrooptic
absorber inside the cavity to modulate the output intensity of the laser beam. An alternating
current I(t) = A + B cos(ωt) was applied to the modulator. The modulation amplitude B was
kept fixed at 3V, and the modulator frequency ω was set to 640 kHz, a resonance frequency of
Figure 5.1: Laser intensity as a function of modulator bias in a CO2 –laser.
the laser. Next, the experimenters varied the DC bias A from 60V to 460V and observed what
happened to the laser output. They sampled the output 640 000 times per second, i.e. in lockstep
with the modulation frequency. They did this experiment for various values of the DC bias to
produce the oscilloscope picture of Fig. 5.1.
For low values of the DC offset, there is only a single branch, meaning that the laser intensity
varies with the same frequency as the modulation current, which is just what you would expect.
However, for larger DC offsets, there are suddenly two branches in the curves: the laser oscilla-
tion takes two periods of the modulation to repeat. This pattern with doubled period is called
a subharmonic of the modulation period. Increasing the DC offset even further leads to patterns
which take 4, 8 or more modulation periods to repeat. There is even a certain range where the
laser output varies between so many levels that it looks very chaotic.
The combination of CO2 –laser plus absorber can be described by a set of relatively simple nonlin-
ear rate equations (see e.g. the course ’High speed photonic components’). At first sight, it is very
surprising that dynamical systems which are described by relatively simple–looking differential
equations can exhibit such rich behaviour.
In this chapter, we will study some aspects of this complex behaviour of dynamical systems,
starting from very simple 1D models described by difference equations. Higher dimensional sys-
tems and systems described by differential equations instead of difference equations will also be
treated.
A dynamical system consists of a set of possible states, together with a rule that determines the
present state in terms of the past states. E.g., a simple model for growth of bacteria in a lab culture
is that the population doubles each hour. So, the update rule of this system is
$$ x_n = f(x_{n-1}) = 2x_{n-1} \tag{5.1} $$
Here, the state x designates the population size, and the subscript n stands for the state at time
step n, in this case the state after n hours.
Based on the properties of the update rule, we can distinguish several types of dynamical systems.
In this chapter, we require the rule to be deterministic, which means we can determine the present
state uniquely from the past states. In the next chapter, we will study stochastic systems, where
randomness is introduced in the update rule.
If the update rule is applied at discrete times, the system is called a discrete–time system. In the
limit of infinitesimally small time steps, the governing rules become a set of differential equations,
and we end up with a continuous–time system.
5.2.2 1D maps
A function f whose input and output space are the same is called a map. The evolution of a
dynamical system is governed by multiple applications of the update rule f . Let us define $f^2(x) = f(f(x))$
and, more generally, define $f^k(x)$ as the result of applying f to x k times. The set of points
$x_0, f(x_0), f^2(x_0), \dots$ is called the orbit of $x_0$ under f . The starting point $x_0$ is called the initial value
of the orbit.
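The definition of an orbit translates directly into a few lines of code. The sketch below uses a hypothetical helper `orbit` and applies it to the doubling rule of Eq. 5.1.

```python
def orbit(f, x0, n):
    """Return the first n+1 points of the orbit: [x0, f(x0), ..., f^n(x0)]."""
    points = [x0]
    for _ in range(n):
        points.append(f(points[-1]))
    return points

doubling = lambda x: 2 * x            # update rule of Eq. 5.1
print(orbit(doubling, 1, 5))          # [1, 2, 4, 8, 16, 32]
```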
Of interest in the study of dynamical systems is the limiting behaviour of $f^k(x)$ for large values
of k. For the exponential growth scenario of Eq. 5.1, the population will clearly increase without
bound, which is obviously not a realistic model. As you will show in the next exercise, an
improved model might be given by
$$ x_n = f(x_{n-1}) = a\, x_{n-1} (1 - x_{n-1}) $$
Contrary to the exponential growth model, this so–called logistic growth model has a nonlinear
update rule.
A cobweb plot is a graphical representation of the orbit of an initial value under a 1D map. It is
obtained by drawing the graph of the update rule, together with the diagonal y = x (see Fig. 5.2).
Sketching the orbit of an initial point is done as follows. E.g. starting with the input value x0 = 0.1
and f (x) = 2x(1 − x) in Fig. 5.2, we can easily plot f (0.1) by drawing a vertical line starting from
x0 = 0.1 to the curve f (x), which gives us a value of 0.18. Next, we need to turn this output value
Figure 5.2: Cobweb plot for f (x) = 2x(1 − x).
of 0.18 into an input value to compute f (0.18). This is done by drawing a horizontal line from the
input–output pair (0.1, 0.18) until it reaches the diagonal y = x. Now, the process can be repeated
by drawing a sequence of vertical and horizontal line segments between the curve of the update
rule and the diagonal.
From the cobweb plot, it is clear that the orbit will converge to the point x = 0.5, which lies on the
intersection of the graph of the update function and the diagonal.
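The construction above can be sketched in code by generating the vertical and horizontal line segments of the cobweb. The function name `cobweb_segments` is an illustrative assumption; the segments could be passed to any plotting tool, together with the graph of f and the diagonal.

```python
def cobweb_segments(f, x0, n):
    """Line segments ((x1, y1), (x2, y2)) of a cobweb plot for n iterations."""
    segments = []
    x, y_start = x0, 0.0        # the first vertical line starts on the horizontal axis
    for _ in range(n):
        y = f(x)
        segments.append(((x, y_start), (x, y)))   # vertical: up or down to the curve
        segments.append(((x, y), (y, y)))         # horizontal: across to the diagonal
        x, y_start = y, y                         # the output becomes the next input
    return segments

f = lambda x: 2 * x * (1 - x)
segments = cobweb_segments(f, 0.1, 20)
print(segments[0])      # first vertical segment, ending on the curve at height f(0.1)
print(segments[-1][1])  # end point of the last segment, close to the fixed point
```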
Such points where f (x) = x are called fixed points. They are of extreme importance when studying
the dynamics of a system, as they contain information on the asymptotic behaviour of the system.
Exercise 5.2. Let f be the map given by $f(x) = (3x - x^3)/2$. Draw cobweb plots
for orbits with initial values x = 1.6 and x = 1.8. Calculate the fixed points of this
map. What happens with orbits starting out in the vicinity of each of these fixed
points?
Not all fixed points are alike. A stable fixed point has the property that points near it will move
even closer to it as the dynamical system evolves. For an unstable fixed point, the opposite is true:
points starting out in its neighbourhood will rapidly move away as time progresses.
A good analogy is that a ball at the bottom of a valley is stable, while a ball balanced at the top of
a mountain is unstable.
Stability is an important concept as real–world systems are constantly subjected to small perturba-
tions. Therefore, a steady state observed in a real system must correspond to a stable fixed point.
If the fixed point were unstable, small perturbations would move the orbit away from the fixed
point, which would then not be observed.
A stable, attracting fixed point is often called a sink, while an unstable, repelling fixed point is
called a source.
To formulate this more precisely, let’s first introduce the concept of an epsilon neighbourhood $N_\epsilon(p)$,
which is the set of all real numbers within a distance ε of p:
$$ N_\epsilon(p) = \{ x \in \mathbb{R} : |x - p| < \epsilon \} $$
A fixed point p is called a sink if there exists an ε > 0 such that for all points x in the epsilon
neighbourhood $N_\epsilon(p)$, $\lim_{k\to\infty} f^k(x) = p$.
Similarly, a fixed point p is called a source if there is an epsilon neighbourhood $N_\epsilon(p)$ such that each
x in $N_\epsilon(p)$ except p eventually maps outside of $N_\epsilon(p)$.
How can we determine from the shape of f whether a fixed point is a source or a sink? The answer
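lies in the following theorem:
Theorem 5.2.1. Let f be a smooth map on R and let p be a fixed point of f . If $|f'(p)| < 1$, then p is a
sink. If $|f'(p)| > 1$, then p is a source.
We prove the case of the sink. By the definition of the derivative,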
$$ \lim_{x \to p} \frac{|f(x) - f(p)|}{|x - p|} = |f'(p)| \tag{5.4} $$
Since $|f'(p)| < 1$, there is a number a between $|f'(p)|$ and 1. This means that there must be a
neighbourhood $N_\epsilon(p)$ where
$$ \frac{|f(x) - f(p)|}{|x - p|} < a, \qquad \forall x \in N_\epsilon(p),\ x \neq p \tag{5.5} $$
Since f (p) = p, this says that $|f(x) - p| < a\,|x - p|$: f (x) is closer to p than x was to p by at least
a factor of a, meaning that f (x) is also in $N_\epsilon(p)$.
Let’s rewrite Eq. 5.6 slightly:
$$ |f^1(x) - p| < a\, |f^0(x) - p|, \qquad \forall x \in N_\epsilon(p) \tag{5.7} $$
Since $f^1(x)$ is also in $N_\epsilon(p)$, we can repeat the above procedure to get
Figure 5.3: Cobweb plot for f (x) = 3.3x(1 − x).
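$$ |f^k(x) - p| < a\, |f^{k-1}(x) - p| < a^2\, |f^{k-2}(x) - p| < \dots < a^k\, |x - p| \tag{5.8} $$
Since a < 1, the right–hand side tends to zero as k → ∞, so the orbit of x converges to p.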
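The derivative criterion used in this argument is easy to apply numerically. The sketch below classifies a fixed point from a central-difference estimate of the derivative: a magnitude below 1 indicates a sink, a magnitude above 1 a source. The function name `classify_fixed_point` and the step size `h` are assumptions made for illustration, and the map f(x) = 2x(1 − x) from the cobweb example serves as the test case.

```python
def classify_fixed_point(f, p, h=1e-6):
    """Classify a fixed point p of a 1D map f from the magnitude of f'(p)."""
    slope = (f(p + h) - f(p - h)) / (2 * h)   # central-difference estimate of f'(p)
    if abs(slope) < 1:
        return "sink"
    if abs(slope) > 1:
        return "source"
    return "undecided"    # |f'(p)| = 1: the derivative test gives no answer

f = lambda x: 2 * x * (1 - x)
print(classify_fixed_point(f, 0.0))   # source, since f'(0) = 2
print(classify_fixed_point(f, 0.5))   # sink, since f'(0.5) = 0
```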
Exercise 5.3. Discuss the stability of the fixed points of the map $f(x) = (3x - x^3)/2$.
Exercise 5.4. Discuss the stability of the fixed points of the map $f(x) = x^3 + x$.
If we change the proportionality constant a of the logistic map fa (x) = ax(1 − x) to 3.3, we get a
different behaviour than what we have seen so far. The fixed points are x = 0 and x = 23/33, both
of which are unstable. So, without any fixed points to attract them, where do the orbits go? Using
a calculator or a computer, it is easy to show that the orbit settles into a pair of alternating values
p1 = 0.4794 and p2 = 0.8236. Fig. 5.3 plots the cobweb diagram for this case.
Note that $f(p_1) = p_2$ and $f(p_2) = p_1$. Another way of looking at this is to say that $f^2(p_1) = p_1$, so
$p_1$ is a fixed point of $h = f^2$ (the same is obviously true for $p_2$).
More formally, we call p a periodic point of period k (or period–k point) of a map f if $f^k(p) = p$ and
k is the smallest positive integer for which this is true.
Note that this definition does not imply that any fixed point of $f^k$ is a period–k point of f , as there
might be a smaller integer n for which $f^n(p) = p$.
Just like fixed points, period–k orbits can be attracting (sinks) or repelling (sources). To discuss
the stability, we apply Theorem 5.2.1 on the map $f^k$, because a period–k point is a fixed point of
$f^k$.
For a period–2 point, we need to evaluate $|(f^2)'(p_1)|$, which can be done easily with the chain rule:
$$ |(f^2)'(p_1)| = |f'(f(p_1))\, f'(p_1)| = |f'(p_2)\, f'(p_1)| \tag{5.10} $$
Exercise 5.5. Show that the following holds: a periodic orbit $\{p_1, p_2, \dots, p_k\}$ is a
sink if
$$ |f'(p_k) \cdots f'(p_2)\, f'(p_1)| < 1 $$
Note that stability is a collective property of the periodic orbit, in the sense that $(f^k)'(p_i) = (f^k)'(p_j)$
for all values of i and j.
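For the logistic map with a = 3.3, both the period–2 orbit and its stability can be checked numerically. This is a minimal sketch: the starting value 0.2 and the number of transient iterations are arbitrary choices, and the stability test is the product of derivatives from Eq. 5.10.

```python
a = 3.3
f = lambda x: a * x * (1 - x)          # logistic map
df = lambda x: a * (1 - 2 * x)         # its derivative

x = 0.2                                # arbitrary starting value (assumption)
for _ in range(1000):                  # let the transient die out
    x = f(x)

p1, p2 = sorted((x, f(x)))             # the two points of the period-2 orbit
multiplier = abs(df(p1) * df(p2))      # |(f^2)'(p1)| via the chain rule, Eq. 5.10

print(round(p1, 4), round(p2, 4))      # 0.4794 0.8236
print(multiplier < 1)                  # True: the period-2 orbit is a sink
```

The product of the two derivatives has magnitude of about 0.29, so deviations from the orbit shrink by roughly that factor every two iterations.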
We have already seen that the scaling parameter a in the family of logistic maps fa (x) = ax(1 − x)
can have a large influence on the behaviour of the system: for some ranges of a there is only one
attracting solution, for other ranges there is a stable period–2 orbit.
To visualise the behaviour of the logistic map for various values of a we can make a diagram like
the one in Fig. 5.4. On the horizontal axis is plotted a, while the vertical axis shows x–values. The
figure is constructed as follows: for each a–value, pick a random starting value x, and calculate its
orbit under the map fa (x). Discard the first 100 points of this orbit, and plot the remaining points
of the orbit. Then increment a and start the procedure again. The points that are plotted will
(within the resolution of the picture) approximate either fixed or periodic sinks or other attracting
orbits of the dynamic system.
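The recipe above can be sketched as follows. The function name `bifurcation_points`, the number of retained points and the range of a–values are illustrative assumptions; the resulting (a, x) pairs can be drawn as a scatter plot with any plotting tool.

```python
import random

def bifurcation_points(a_values, n_discard=100, n_keep=50, seed=0):
    """(a, x) pairs approximating the attracting orbits of f_a(x) = a x (1 - x)."""
    rng = random.Random(seed)
    points = []
    for a in a_values:
        x = rng.uniform(0.01, 0.99)      # random starting value in (0, 1)
        for _ in range(n_discard):       # discard the first points of the orbit
            x = a * x * (1 - x)
        for _ in range(n_keep):          # keep the remaining points
            x = a * x * (1 - x)
            points.append((a, x))
    return points

a_values = [2.5 + 1.5 * i / 150 for i in range(151)]   # a from 2.5 to 4.0
points = bifurcation_points(a_values)
print(len(points))                                      # 151 * 50 = 7550 points
```

Changing `seed` changes the starting values but, as discussed next, not the picture.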
One might ask how Fig. 5.4 will change if different starting values of x were selected. Surprisingly,
nothing will change, which indicates that for each a value, there is at most one attracting orbit.
Fig. 5.4 (and also Fig. 5.1) is called a bifurcation diagram and shows the birth, evolution and death
of attracting orbits. The term ’bifurcation’ is used to describe significant changes in the set of
Figure 5.4: Bifurcation diagram for the family of logistic maps for f (x) = ax(1 − x).
attractors in a dynamical system. E.g. at a = 3, there is a transition between a single sink and the
emergence of a period–2 point. This type of bifurcation is called a period–doubling bifurcation or
also (because of its characteristic shape) a pitchfork bifurcation. For a slightly larger than 3.45, there
appears to be a period–4 sink. In fact, there is an entire sequence of periodic sinks, one for each
period 2,4,8,16,32,... . Such a sequence is called a period–doubling cascade.
Exercise 5.6. Write a computer program that finds the a–values at which the
bifurcations occur in the period–doubling cascade 2,4,8,16, ... . Call the sequence of
these values $a_n$, and calculate
$$ \frac{a_{n-1} - a_{n-2}}{a_n - a_{n-1}} $$
What do you observe as n tends towards infinity? Repeat the experiment for the
period–3 cascade: 3,6,12,... . Also experiment with a different nonlinear map, like
$f_a(x) = a - x^2$.
The number 4.669201609 is called Feigenbaum’s constant.
For other values of the parameter a, the orbit appears to randomly fill out the interval [0, 1], or a
subinterval. A typical cobweb plot for such a situation is shown in Fig. 5.5. These non–periodic
attracting sets are called chaotic attractors and are obviously much harder to describe than periodic
sinks.
So are these non–periodic attractors what we mean by ’chaos’? Partly. Another important concept
in that regard is the sensitive dependence on initial conditions, which we will explore next.
Table 5.1 shows the orbits of two points 0.3000 and 0.3010 under the map f (x) = 4x(1 − x). Al-
though these points start out close together, they quickly diverge. In fact, no matter how close to-
gether we choose these two points, they will eventually move apart. This sensitive dependence on
initial conditions has some far–reaching implications. If a dynamic system exhibits these chaotic
properties, it means we can never hope to accurately predict the long–term behaviour of the
system, even though it is governed by deterministic and well–known rules. Sensitive dependence on
initial conditions means that small variations in the initial state of the system (e.g. due to
measurement errors or finite numerical precision) will always get amplified to such a degree that it is
impossible to make long–term predictions on the evolution of the system. This is the hallmark of
chaos.
Figure 5.5: Cobweb plot for the logistic map in the case of a chaotic attractor.
 n    f^n(0.3)   f^n(0.301)
 0    0.3000     0.3010
 1    0.8400     0.8416
 2    0.5376     0.5332
 3    0.9943     0.9956
 4    0.0225     0.0176
 5    0.0879     0.0692
 6    0.3208     0.2576
 7    0.8716     0.7650
 8    0.4476     0.7190
 9    0.9890     0.8081
10    0.0434     0.6202
Table 5.1: Comparison of orbits of nearly identical points under the map f (x) = 4x(1 − x).
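The two orbits of Table 5.1 can be regenerated with a few lines of code, which also makes it easy to try initial values that lie even closer together. The helper name `two_orbits` is an illustrative assumption.

```python
def logistic4(x):
    """The logistic map f(x) = 4x(1 - x)."""
    return 4 * x * (1 - x)

def two_orbits(x0, y0, n):
    """Rows (x_k, y_k) for k = 0..n of two orbits under f(x) = 4x(1 - x)."""
    rows = [(x0, y0)]
    for _ in range(n):
        x, y = rows[-1]
        rows.append((logistic4(x), logistic4(y)))
    return rows

# Print the step number, both orbits and their separation
for k, (x, y) in enumerate(two_orbits(0.3, 0.301, 10)):
    print(f"{k:2d}  {x:.4f}  {y:.4f}  {abs(x - y):.4f}")
```

The last column shows the separation between the two orbits, which grows from 0.001 to the same order of magnitude as the values themselves within about ten iterations.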
To quantify how sensitive an orbit is to variations in initial conditions, we will now introduce the
concept of Lyapunov numbers and Lyapunov exponents.
We already learnt that for fixed points, stability is heavily influenced by the derivative of the map.
E.g., if p is a fixed point and $f'(p) = a > 1$, then the orbit of a point near p will move away at a
multiplicative rate of approximately a per iteration. This also means that the difference between
two points will be amplified at the same rate. Starting from these observations, we generalise this
to arbitrary orbits by defining the Lyapunov number of an orbit {x0 , x1 , x2 , ...} as
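$$ L(x_0) = \lim_{n\to\infty} \left( |f'(x_0)| \cdots |f'(x_{n-1})| \right)^{1/n} \tag{5.11} $$
if this limit exists. The Lyapunov exponent is its natural logarithm:
$$ h(x_0) = \ln L(x_0) = \lim_{n\to\infty} \frac{1}{n} \left( \ln|f'(x_0)| + \dots + \ln|f'(x_{n-1})| \right) \tag{5.12} $$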
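In practice the limit in Eq. 5.12 is replaced by an average over a long but finite orbit. The sketch below does this for a single map. The function name `lyapunov_exponent`, the orbit length, the number of discarded transient points and the small floor that guards against ln 0 are all assumptions made for illustration.

```python
import math

def lyapunov_exponent(f, df, x0, n=20000, n_discard=100):
    """Finite-orbit estimate of the Lyapunov exponent h(x0) of Eq. 5.12."""
    x = x0
    for _ in range(n_discard):            # skip the transient
        x = f(x)
    total = 0.0
    for _ in range(n):
        # The floor avoids log(0) when the orbit hits a point where f' = 0
        total += math.log(max(abs(df(x)), 1e-300))
        x = f(x)
    return total / n

a = 4.0
h = lyapunov_exponent(lambda x: a * x * (1 - x), lambda x: a * (1 - 2 * x), 0.3)
print(h)   # positive; for a = 4 the exact value is known to be ln 2 = 0.693...
```

A positive exponent means that nearby orbits separate exponentially on average, while a negative exponent indicates an attracting fixed point or periodic orbit.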
Exercise 5.7. Write a computer program to calculate the Lyapunov exponents for
the family of logistic maps fa (x) = ax(1 − x). Plot the Lyapunov exponent as a
function of a. In which regions do you see chaos?
Exercise 5.8. Making use of $\cos 2x = 1 - 2\sin^2 x$, prove the following explicit
formula for any orbit of the logistic map f (x) = 4x(1 − x):
$$ x_n = \frac{1}{2} - \frac{1}{2} \cos\left( 2^n \arccos(1 - 2x_0) \right) $$
Is this a useful formula from a computational point of view?
The Hénon map is to two–dimensional systems what the logistic map is to one–dimensional sys-
tems. It is governed by simple–looking equations, yet already exhibits most of the rich dynamic
behaviour seen in more complicated systems. The version of the Hénon map that we will study is
$$ f(x, y) = (a - x^2 + by,\ x) $$
Figure 5.6: Influence of initial conditions for the Hénon map with b = −0.3. Initial values whose trajectories
diverge to infinity are coloured black. The squares show the location of a period–2 sink, which attracts white
initial conditions. On the left, a = 1.28 and the basin boundary is smooth. On the right, a = 1.4, and the
boundary is fractal.
Figure 5.7: Self–similarity of the Hénon basin boundary for a = 1.4 and b = −0.3. The left picture cor-
responds to [−1.88, −1.6] × [−0.52, −0.24], which was indicated by the box in Fig. 5.6. The right picture
corresponds to the box in the left picture.
For now, set a = 1.28 and b = −0.3. If we start with the initial condition (x, y) = (0, 0), we find
that the orbit moves towards a period–2 sink. The left part of Fig. 5.6 shows an analysis of the
results of iteration with general initial values. Points in black represent initial conditions whose
orbits diverge towards infinity, white points are initial conditions which converge to the period–2
sink. For a = 1.28 the boundary of the basin, which moves in and out of the rectangle of initial
values shown here, is smooth.
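A picture like the left part of Fig. 5.6 can be built by testing, for each initial condition, whether its orbit escapes. The sketch below assumes the commonly used form f(x, y) = (a − x² + by, x) of the Hénon map. The function names, the escape bound of 1000 and the iteration limit are illustrative choices as well.

```python
def henon(x, y, a, b):
    """One step of the Hénon map, assumed here in the form (a - x^2 + b*y, x)."""
    return a - x * x + b * y, x

def diverges(x, y, a=1.28, b=-0.3, n_max=1000, bound=1e3):
    """True if the orbit of (x, y) leaves the box |x|, |y| <= bound within n_max steps."""
    for _ in range(n_max):
        x, y = henon(x, y, a, b)
        if abs(x) > bound or abs(y) > bound:
            return True       # would be coloured black
    return False              # would be coloured white

print(diverges(0.0, 0.0))     # False: this orbit stays bounded
print(diverges(-3.0, 0.0))    # True: this orbit escapes towards infinity

# Follow the bounded orbit until it has settled on the period-2 sink
x, y = 0.0, 0.0
for _ in range(5000):
    x, y = henon(x, y, 1.28, -0.3)
print((x, y), henon(x, y, 1.28, -0.3))   # the two points of the period-2 orbit
```

Applying `diverges` to every point of a grid of initial values gives the black and white regions of the basin picture.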
For a = 1.4 and b = −0.3, we get a completely different picture: the boundary of the basin is no
longer smooth, but is in a sense infinitely complicated: if we were to zoom in on it, we would
still see the same level of complexity, no matter how deep we would zoom. This is illustrated
in Fig. 5.7, which shows successive zooms of a small region of the basin boundary, indicated by
a rectangle in the picture. The self–similarity at different zoom levels is apparent. We call such a
self–similar structure with infinite complexity a fractal.
Figure 5.8: Plot of the Hénon attractor in the (x, y)–plane for a = 1.4 and b = 0.3.
Let us finally flip the sign of b and look at the case a = 1.4 and b = 0.3. Now the period–2
sink disappears, and the orbit will eventually evolve to look like the structure in Fig. 5.8. This
is an example of a dynamical system with a fractal attractor, as it turns out that this attractor is
once again self–similar at different zoom levels. The orbit of nearly any point in this region will
converge to this attractor. Note that this does not mean that two slightly different initial conditions
will converge to the same limiting behaviour. In fact, the opposite is true: because of the infinitely
complicated nature of the attractor, the two orbits will eventually diverge and become completely
uncorrelated.
Exercise 5.9. Consider the following family of 2D maps, this time expressed in
terms of a complex coordinate z:
$$ P_c(z) = z^2 + c $$
For each value of c, check numerically if the orbit of the initial point z = 0 diverges
towards infinity. Plot the points c for which this orbit stays bounded. The set of
these points is called the Mandelbrot set and is plotted in Fig. 5.9.
We used the term ’sink’ in our discussion of 1D maps to refer to a fixed point or a periodic orbit
that attracts an epsilon neighbourhood of initial values. A ’source’ is a fixed point that repels a
neighbourhood. These definitions make sense in higher–dimensional state spaces without alter-
ation. In 2D e.g., the neighbourhoods in question are disks.
Fig. 5.10 shows schematic views of a sink and a source for a 2D map, together with a typical disk
neighbourhood and its image under the map. Along with the sink and the source, a new type of
Figure 5.9: The Mandelbrot set.
Figure 5.10: Images of a disk in the neighbourhood of a a) sink, b) source, c) saddle point.
fixed point is introduced, which cannot occur in 1D systems. This type of point, which we call a
saddle, has at least one attracting direction and one repelling direction.
Points near this type of fixed point act as if they were moving along the surface of a cowboy’s
saddle under the influence of gravity. This is illustrated in Fig. 5.11. A saddle exhibits sensitive
dependence on initial conditions, because of the neighbouring initial conditions that escape along
the repelling direction.
Saddle points play surprising roles in the dynamics of a system. To already give an example of
this, let’s return to Fig. 5.6, where black points are in the basin of infinity and white points are in
the basin of the sink. What happens to points which are located on the boundary between these two
basins? Will they converge towards the sink or towards infinity? The answer is: neither. It turns
out that these points converge towards the saddle! So, although not a general attractor, the saddle
obviously plays an important role in determining which points go to which basin.
Figure 5.11: A saddle point.
Our goal in the next sections is to find ways of identifying sources, sinks and saddles from the
defining equations of the map. In 1D systems, we already saw that the key to determining stability
was looking at the derivative in that point. Since the derivative determines the tangent line, or the
best linear approximation near that point, it determines the amount of shrinking/stretching in the
vicinity of that point. The same principle operates in higher dimensional systems, where we need
to look at the best linear approximation of the system at that point. Before constructing such a
linearisation, let’s first study linear maps and see what we can learn from them.
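A linear map in 2D has the form
$$ f(x, y) = A \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} \tag{5.14} $$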
Every linear map has a fixed point at the origin. Its stability can be investigated in the same way
as we did for 1D maps: if all points in the neighbourhood of the fixed point approach the fixed
point when iterated by the map, we consider the fixed point to be an attractor.
In some cases, the dynamics of a 2D system resemble 1D dynamics. This is the case when the
initial point $v_0$ is an eigenvector of A with eigenvalue λ. Then we get:
$$ v_1 = A v_0 = \lambda v_0, \qquad v_2 = A v_1 = \lambda^2 v_0 $$
and in general $v_n = \lambda^n v_0$. Hence the map behaves like the 1D map $v_{n+1} = \lambda v_n$.
The importance of eigenvalues is further exemplified by looking at the orbit of an arbitrary point.
Let’s restrict ourselves for the moment to the case where the eigenvalues λi of A are real and
distinct, so that we can diagonalise the matrix as follows:
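$$ A = S^{-1} \begin{pmatrix} \lambda_1 & 0 \\ 0 & \lambda_2 \end{pmatrix} S \tag{5.17} $$
The orbit of a point $x_0$ then follows from
$$ x_n = A^n x_0 = S^{-1} \begin{pmatrix} \lambda_1^n & 0 \\ 0 & \lambda_2^n \end{pmatrix} S x_0 \tag{5.18} $$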
From this, we see immediately that if all eigenvalues have a magnitude smaller than 1, the or-
bit converges to our fixed point (0, 0). On the other hand, if all eigenvalues have a magnitude
larger than 1, the iterates will diverge, meaning that the fixed point is unstable. Finally, for one
eigenvalue with magnitude smaller than 1 and one with magnitude larger than 1, we have a saddle
point, as there is one converging and one diverging direction.
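For a 2 × 2 matrix the eigenvalues follow from the trace and the determinant, so these criteria can be checked with very little code. In this sketch the function names are illustrative assumptions, and `cmath` is used so that complex conjugate eigenvalue pairs are handled as well.

```python
import cmath

def eigenvalues_2x2(a11, a12, a21, a22):
    """Roots of the characteristic polynomial lambda^2 - trace*lambda + det = 0."""
    trace = a11 + a22
    det = a11 * a22 - a12 * a21
    root = cmath.sqrt(trace * trace - 4 * det)
    return (trace + root) / 2, (trace - root) / 2

def classify_origin(a11, a12, a21, a22):
    """Classify the fixed point at the origin of the linear map with matrix A."""
    small, large = sorted(abs(lam) for lam in eigenvalues_2x2(a11, a12, a21, a22))
    if large < 1:
        return "sink"
    if small > 1:
        return "source"
    if small < 1 < large:
        return "saddle"
    return "undecided"       # at least one eigenvalue has magnitude exactly 1

print(classify_origin(0.5, 0.2, 0.0, 0.3))    # sink: eigenvalues 0.5 and 0.3
print(classify_origin(0.9, 0.0, 0.0, 1.1))    # saddle: eigenvalues 0.9 and 1.1
print(classify_origin(0.0, -2.0, 2.0, 0.0))   # source: eigenvalues +2j and -2j
```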
Obviously, not all 2 × 2 matrices can be diagonalised. We know from linear algebra that there
are two other canonical forms for 2 × 2 matrices. E.g., when the eigenvalues are not distinct, the
canonical form is
$$ A = S^{-1} \begin{pmatrix} \lambda_1 & 1 \\ 0 & \lambda_1 \end{pmatrix} S \tag{5.19} $$
When the eigenvalues are a complex conjugate pair, the matrix can be diagonalised using complex
numbers. However, one can prove that a more useful real canonical form of such a matrix is:
$$ A = S^{-1} \begin{pmatrix} a & -b \\ b & a \end{pmatrix} S \tag{5.20} $$
Exercise 5.10. Show that for these other canonical forms, the same conclusions
with respect to stability of the fixed point at the origin hold as in the case of real
distinct eigenvalues.
Exercise 5.11. For the following maps, determine whether the origin is a sink,
source or saddle.
a) $\begin{pmatrix} 4 & 30 \\ 1 & 3 \end{pmatrix}$
b) $\begin{pmatrix} 1 & 1/2 \\ 1/4 & 3/4 \end{pmatrix}$
5.3.4 Nonlinear maps and the Jacobian matrix
To construct the linearisation around a fixed point, we now have to calculate a matrix rather than
a single derivative. This matrix is called the Jacobian and is defined as follows.
Let f = (f1 , f2 , ..., fn ) be a map on Rn and let p ∈ Rn . The Jacobian matrix of f at p, denoted as
Df (p), is defined as
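$$ Df(p) = \begin{pmatrix} \frac{\partial f_1}{\partial x_1}(p) & \cdots & \frac{\partial f_1}{\partial x_n}(p) \\ \vdots & & \vdots \\ \frac{\partial f_n}{\partial x_1}(p) & \cdots & \frac{\partial f_n}{\partial x_n}(p) \end{pmatrix} \tag{5.21} $$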
For a fixed point, f(p) = p, so for a small change h we have f(p + h) ≈ p + Df(p) · h: the map
moves p + h approximately Df(p) · h away from p. That is, f magnifies a small change h in input to
a change Df(p) · h in output.
As long as this deviation remains small (so that $|h|^2$ is negligible and our approximation is valid),
the action of the map near p is essentially the same as the linear map A = Df(p), with fixed point
h = 0. So, we can use the stability criteria for linear maps that we have discussed so far.
Although we won’t prove it here, in the general case of n–dimensional maps, the following stabil-
ity criteria hold:
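Let f be a map on $R^n$ and assume f(p) = p.
1. If the magnitude of each eigenvalue of Df(p) is less than 1, then p is a sink.
2. If the magnitude of each eigenvalue of Df(p) is greater than 1, then p is a source.
3. If none of the eigenvalues of Df(p) has magnitude equal to 1 (we then call p hyperbolic), if
at least one eigenvalue of Df(p) has magnitude less than 1, and at least one eigenvalue of
Df(p) has magnitude greater than 1, then p is a saddle.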
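As a rough illustration of these criteria, the Python sketch below linearises a nonlinear 2D map
around its fixed points. Everything specific in it is an assumption made for the example: the
Hénon map is taken in the form (x, y) → (a − x² + by, x), the parameter values a = 1.4 and b = 0.3
are chosen arbitrarily, and the Jacobian is approximated by central finite differences rather than
derived by hand.

```python
import cmath
import math

A_PARAM, B_PARAM = 1.4, 0.3  # assumed parameter values, for illustration only


def henon(x, y):
    """Assumed form of the Henon map: (x, y) -> (a - x^2 + b*y, x)."""
    return (A_PARAM - x * x + B_PARAM * y, x)


def jacobian(f, x, y, h=1e-6):
    """Approximate the 2x2 Jacobian Df(x, y) with central finite differences."""
    fxp, fxm = f(x + h, y), f(x - h, y)
    fyp, fym = f(x, y + h), f(x, y - h)
    return [[(fxp[i] - fxm[i]) / (2 * h), (fyp[i] - fym[i]) / (2 * h)]
            for i in range(2)]


def eigenvalue_magnitudes(m):
    """Sorted magnitudes of the eigenvalues of a 2x2 matrix."""
    trace = m[0][0] + m[1][1]
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    root = cmath.sqrt(trace * trace - 4.0 * det)
    return sorted(abs(lam) for lam in ((trace + root) / 2.0, (trace - root) / 2.0))


def classify(mags):
    """Apply the sink / source / saddle criteria to eigenvalue magnitudes."""
    if mags[-1] < 1.0:
        return "sink"
    if mags[0] > 1.0:
        return "source"
    if mags[0] < 1.0 < mags[-1]:
        return "saddle"
    return "not hyperbolic"


# Fixed points satisfy y = x and x = a - x^2 + b*x, a quadratic equation in x.
disc = math.sqrt((1.0 - B_PARAM) ** 2 + 4.0 * A_PARAM)
fixed_points = [((-(1.0 - B_PARAM) + s * disc) / 2.0,) * 2 for s in (1.0, -1.0)]

results = []
for p in fixed_points:
    mags = eigenvalue_magnitudes(jacobian(henon, *p))
    results.append(classify(mags))
    print(p, mags, results[-1])
```

For a map whose partial derivatives are easy to write down, the analytical Jacobian is of course
preferable to the finite-difference approximation used here.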
Without providing any proofs, we will discuss in this section some more aspects of the special
properties of saddle points.
We have seen that a saddle point is unstable, in the sense that most initial conditions will move
away from it because of the existence of an expanding direction. However, not all initial conditions
will move away. Consider e.g. this very simple linear map:
\[ A = \begin{pmatrix} 0.9 & 0 \\ 0 & 1.1 \end{pmatrix} \tag{5.23} \]
For initial values on the x–axis, the orbit will converge to the saddle point, because it does not feel
the influence of the expanding direction. This set of points is important enough to warrant its own
name:
The stable manifold of a saddle point p is defined as the set of points v for which
$|f^n(v) - p| \to 0$ as $n \to \infty$.
For 2D maps, one can prove the following properties about the stable manifold. First of all, it is
one–dimensional¹. Secondly, the stable manifold will be tangent to the eigenvector of the matrix
Df(p) with eigenvalue of magnitude smaller than 1.
In the example from Eq. 5.23, the y–axis is the unstable manifold of the saddle point. It is defined as
follows:
The unstable manifold of a saddle point p is defined as the set of points v for which
$|f^{-n}(v) - p| \to 0$ as $n \to \infty$, so points in the unstable manifold converge to the saddle
point when iterating the map backwards. In other words, the unstable manifold is the stable manifold
of the inverse map $f^{-1}$.
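Both definitions can be checked directly on the linear map of Eq. 5.23. The Python sketch below is
a minimal illustration; the function names, the starting points and the number of iterations are
arbitrary choices made here. It iterates the map forwards from a point on the x–axis, iterates the
inverse map from a point on the y–axis, and also follows a point that lies just off the x–axis.

```python
def forward(v):
    """The linear map of Eq. 5.23, A = diag(0.9, 1.1)."""
    return (0.9 * v[0], 1.1 * v[1])


def backward(v):
    """The inverse map, A^{-1} = diag(1/0.9, 1/1.1)."""
    return (v[0] / 0.9, v[1] / 1.1)


def iterate(step, v, n):
    """Apply the map `step` n times to the point v."""
    for _ in range(n):
        v = step(v)
    return v


def distance_to_origin(v):
    return (v[0] ** 2 + v[1] ** 2) ** 0.5


# A point on the x-axis converges to the saddle under forward iteration.
on_x_axis = iterate(forward, (1.0, 0.0), 200)
# A point on the y-axis converges to the saddle under backward iteration.
on_y_axis = iterate(backward, (0.0, 1.0), 200)
# A point slightly off the x-axis eventually feels the expanding direction.
off_axis = iterate(forward, (1.0, 1e-6), 200)

print(distance_to_origin(on_x_axis))
print(distance_to_origin(on_y_axis))
print(distance_to_origin(off_axis))
```

The third orbit first approaches the saddle and only later moves away along the y–axis, which is
the typical behaviour of points close to, but not on, the stable manifold.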
Note that we have not defined the unstable manifold as the set of points for which
$|f^n(v) - p| \to \infty$, as this would be too restrictive. There are points on the unstable
manifold which diverge to infinity under the forward map, but there are also points which get
trapped in a chaotic orbit.
For the unstable manifold, similar properties hold as for the stable manifold. It is a 1–manifold,
and it is tangent to the eigenvector of the Jacobian with eigenvalue of magnitude larger than 1.
For linear maps, the stable and unstable manifolds are easily determined, and are just lines in the
direction of the eigenvectors of the Jacobian. For nonlinear maps, they can look significantly more
complicated, as is illustrated in Fig. 5.12 for a Hénon map.
The shape of these curves should look familiar. Indeed, the following properties hold:
1. The stable manifold forms the boundary between the basin of the sink and the basin at
infinity.
2. The attractor of the map (which can be either a periodic sink or a chaotic attractor) lies
along the unstable manifold.
A further important aspect of the stable and unstable manifolds is the following: if they cross
(at a point other than the saddle, of course), there are automatically an infinite number of such
crossings. Also, the system will show chaotic behaviour in such a case.
¹ A one–dimensional manifold is a set of points that locally looks like a curve everywhere along its
length. E.g. the letter ’O’ is a 1–manifold. The letter ’T’ on the other hand is not, because of the
intersection and the end points.
Figure 5.12: Stable and unstable manifold of a Hénon map. The saddle point is the square in the lower left
corner, the stable manifold is mainly vertical, the unstable one mainly horizontal.
These insights are mainly due to Poincaré, who developed much of the theory underlying chaotic
systems when studying the dynamics of three bodies moving under the force of gravity.
In the last part of this chapter, we will discuss some introductory concepts on continuous dynami-
cal systems, i.e. those that are described by differential equations rather than difference equations.
Since we have already spent so much time discussing discrete maps, it would be nice to have
some tools to reduce a continuous system to a discrete one and be able to study some aspects of
its dynamics in that way. There are two methods that can be used for this: the time–T map and
the Poincaré map.
A time–T map is formed by taking snapshots at fixed time intervals of the solution of the contin-
uous system, i.e. a time–T map is a discrete map that advances the continuous system T time
units.
Sometimes it is possible to write down an explicit equation for a time–T map. Consider e.g. the
differential equation which describes the cooling of an object, with rate constant k:
\[ \frac{dx}{dt} = -kx \tag{5.24} \]
Here x stands for temperature. The solutions of this equation are of the form
\[ x(t) = x(0)\, e^{-kt} \tag{5.25} \]
Figure 5.13: Poincaré map of a trajectory.
Advancing from time $t_0$ to $t_0 + T$ amounts to a multiplication with $e^{-kT}$. So, the time–T
map for this system is the 1D map with the following update function:
\[ f(x) = e^{-kT} x \tag{5.26} \]
If it’s not possible to write down an explicit rule, then the map has to be iterated numerically.
Note that the laser dynamics example from the opening of the chapter also used the time–T phi-
losophy, as the system was studied at specific snapshots in time determined by the sampling
frequency.
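The two ways of obtaining a time–T map can be compared on the cooling equation. In the Python
sketch below, the values of k and T, the function names and the use of a fourth–order Runge–Kutta
integrator are all assumptions made for the sake of the example. The explicit map multiplies by
$e^{-kT}$, while the numerical map integrates dx/dt = −kx over one interval T.

```python
import math

K = 0.5  # assumed rate constant
T = 2.0  # assumed sampling interval


def time_T_map_explicit(x):
    """Explicit time-T map for dx/dt = -K x: multiply by exp(-K T)."""
    return math.exp(-K * T) * x


def time_T_map_numeric(x, steps=1000):
    """Time-T map obtained by integrating dx/dt = -K x over T with RK4."""
    dt = T / steps
    for _ in range(steps):
        k1 = -K * x
        k2 = -K * (x + 0.5 * dt * k1)
        k3 = -K * (x + 0.5 * dt * k2)
        k4 = -K * (x + dt * k3)
        x += dt / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
    return x


# Iterate both maps a few times from the same initial temperature.
x_explicit = x_numeric = 10.0
for n in range(1, 4):
    x_explicit = time_T_map_explicit(x_explicit)
    x_numeric = time_T_map_numeric(x_numeric)
    print(n, x_explicit, x_numeric)
```

For systems without a closed–form solution, only the second function is available, with the
right–hand side of the differential equation substituted for −Kx.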
Another technique to reduce a continuous system to a discrete one is the so–called Poincaré map.
One of Poincaré’s most important innovations was a simplified way of looking at complex con-
tinuous trajectories. Instead of studying the entire trajectory, he found that much of the important
information was encoded in the points at which the trajectory passes through a two–dimensional
plane. The order of these intersection points defines a discrete plane map, which is called the
Poincaré map.
Fig. 5.13 shows a schematic view of a trajectory C. The plane S is defined by $x_3$ = constant. Each
time the trajectory C pierces S in a downward direction, as at points A and B, we record the point
of piercing on the plane S. We can label these points with coordinates $(x_1, x_2)$. Let A represent
the k–th downward piercing of the plane, and B the (k + 1)–th downward piercing. The Poincaré
map is the 2D map G such that G(A) = B.
Given A, the differential equations can be solved with A as initial condition, and the solution
followed until the next downward piercing at B. Thus A uniquely determines B, which ensures
that the map G is well–defined. Much of the dynamical behaviour of the trajectory C is present in
the 2D map G. E.g., if C is periodic in the sense that it follows a closed loop, then the plane map G
will have a periodic orbit.
Figure 5.14: a) Solutions of dx/dt = −kx for different initial conditions. b) Corresponding phase portrait.
The Poincaré map is similar in principle to the time–T map we considered above, although the
details are different. While the time–T map is stroboscopic (it logs the value of a variable at equal
time intervals), the Poincaré map records plane piercings, which need not be equally spaced in
time.
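The construction of a Poincaré map can be sketched numerically as follows. The three–dimensional
system used here is a hypothetical test case chosen because its Poincaré map is known exactly: x1
decays at rate 0.1 while (x2, x3) rotate at unit angular frequency, so successive downward piercings
of the plane x3 = 0 are 2π apart in time and x1 shrinks by a factor $e^{-0.2\pi}$ between them. The
integrator, the step size and the linear interpolation of the piercing point are likewise
assumptions of this sketch.

```python
import math


def rhs(state):
    """Hypothetical test system: x1 decays, (x2, x3) rotate at unit frequency."""
    x1, x2, x3 = state
    return (-0.1 * x1, -x3, x2)


def rk4_step(f, state, dt):
    """One fourth-order Runge-Kutta step for d(state)/dt = f(state)."""
    k1 = f(state)
    k2 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
    k3 = f(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
    k4 = f(tuple(s + dt * k for s, k in zip(state, k3)))
    return tuple(s + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
                 for s, a, b, c, d in zip(state, k1, k2, k3, k4))


def poincare_points(f, state, plane=0.0, dt=1e-3, n_points=4):
    """Record (x1, x2) at each downward piercing of the plane x3 = plane."""
    points = []
    while len(points) < n_points:
        new = rk4_step(f, state, dt)
        if state[2] > plane >= new[2]:  # x3 went from above the plane to below
            # Linear interpolation between the two states to locate the piercing.
            w = (state[2] - plane) / (state[2] - new[2])
            points.append((state[0] + w * (new[0] - state[0]),
                           state[1] + w * (new[1] - state[1])))
        state = new
    return points


points = poincare_points(rhs, (1.0, 1.0, 0.0))
for a, b in zip(points, points[1:]):
    # Each pair (A, B) of successive piercings is one application of G(A) = B.
    print(a, "->", b, "ratio of x1:", b[0] / a[0])
```

Note that the piercing times are found as part of the integration and are not prescribed in
advance, in contrast to the fixed sampling interval of a time–T map.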
If we do want to study the trajectory C without reducing it to a discrete map, it is often useful to
construct phase portraits (also called phase planes) of the solutions of a differential equation.
To illustrate what these are, let’s return to the simple differential equation
\[ \frac{dx}{dt} = -kx \tag{5.27} \]
with solutions
\[ x(t) = x(0)\, e^{-kt} \tag{5.28} \]
Fig. 5.14 a) sketches several trajectories of x for different initial values. All curves converge to the
equilibrium value x = 0. An equilibrium solution is a solution of the differential equation which
satisfies dx/dt = 0. They play the same role as fixed points in discrete maps, and are therefore
important to study.
Often, a figure like Fig. 5.14 a) contains too much information, as we are often only interested in
the stability of the equilibrium. We are not so much interested in the different curves for different
initial conditions, nor in the precise rate at which the equilibrium is approached. This leads to
the phase portrait in Fig. 5.14 b), in which the t–axis is suppressed, and it is simply shown on the
x–axis where trajectories are headed. The arrows indicate the direction of solutions (towards or
away from equilibria) without graphing specific values of t. The phase portrait allows one to see
at a glance where the equilibria are located and whether they are stable or not.
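A one–dimensional phase portrait only requires the sign of dx/dt at each point. The Python sketch
below prints a crude text version for dx/dt = −kx; the value of k, the sample points and the symbols
used (">" for motion to the right, "<" for motion to the left, "*" for an equilibrium) are
assumptions of this illustration.

```python
def phase_arrows(f, xs, tol=1e-12):
    """For dx/dt = f(x), return the direction of motion at each sample point."""
    arrows = []
    for x in xs:
        v = f(x)
        if abs(v) <= tol:
            arrows.append("*")  # equilibrium: dx/dt = 0
        elif v > 0:
            arrows.append(">")  # x increases
        else:
            arrows.append("<")  # x decreases
    return arrows


K = 0.5  # assumed rate constant
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
arrows = phase_arrows(lambda x: -K * x, xs)
print(" ".join(arrows))  # prints: > > * < <
```

The arrows on both sides point towards x = 0, which is how the stability of this equilibrium shows
up in the phase portrait.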
Figure 5.15: The Lorenz attractor.
Exercise 5.14. Construct a phase portrait for the logistic differential equation:
\[ \frac{dx}{dt} = ax(1 - x) \]
We finish the chapter by looking at the most famous set of differential equations which exhibit
chaotic behaviour. They were formulated in the early 1960s by Lorenz, in the context of weather
forecasting:
\[ \frac{dx}{dt} = -\sigma x + \sigma y \tag{5.29} \]
\[ \frac{dy}{dt} = -xz + ry - y \tag{5.30} \]
\[ \frac{dz}{dt} = xy - bz \tag{5.31} \]
For σ = 10 and b = 8/3, Lorenz found that the system behaves ’chaotically’ whenever r exceeds
the critical value of 24.74. In that case, all solutions appear to be extremely sensitive to initial
conditions, and almost all of them are apparently neither periodic solutions nor convergent to
periodic solutions or equilibria. Instead, he found one of the very first chaotic attractors,
which is now called the Lorenz attractor. This attractor is plotted in Fig. 5.15.
Lorenz realised that this sensitivity to initial conditions has a huge impact on weather prediction.
In fact, the title of one of his talks was: “Predictability: Does the Flap of a Butterfly’s Wings in
Brazil set off a Tornado in Texas?”. This is why sensitivity to initial conditions is sometimes
referred to as ’the butterfly effect’.
Jules Henri Poincaré (1854–1912)
those functions as part of his doctoral thesis. Below Poincaré explains how he discovered Fuchsian
functions:
For fifteen days I strove to prove that there could not be any functions like those I have since
called Fuchsian functions. I was then very ignorant; every day I seated myself at my work ta-
ble, stayed an hour or two, tried a great number of combinations and reached no results. One
evening, contrary to my custom, I drank black coffee and could not sleep. Ideas rose in crowds;
I felt them collide until pairs interlocked, so to speak, making a stable combination. By the next
morning I had established the existence of a class of Fuchsian functions, those which come from
the hypergeometric series; I had only to write out the results, which took but a few hours.
In 1887, Oscar II, King of Sweden and Norway, held a competition to celebrate his sixtieth birthday
and to promote higher learning. The King wanted a contest that would be of interest, so he decided
to hold a mathematics competition. Poincaré entered the competition, submitting a memoir on the
three–body problem which he describes as:
The goal of celestial mechanics is to answer the great question of whether Newtonian mechanics
explains all astronomical phenomena. The only way one can prove this is by taking the most
precise observation and comparing it to the theoretical calculations.
Poincaré did in fact win the competition. In his memoir he described new mathematical ideas
such as homoclinic points (the intersections between the stable and unstable manifolds of the
same saddle point). The memoir was about to be published in Acta Mathematica when an error
was found by the editor. This error in fact led to the discovery of chaos theory. The memoir
was published later, in 1890. In addition, Poincaré proved that determinism and predictability
are distinct problems. He also found that the solution of the three–body problem would change
drastically with small changes in the initial conditions. This area of research was neglected until
1963, when Edward Lorenz discovered a famous chaotic deterministic system using a simple
model of the atmosphere.
Henri Poincaré and Albert Einstein had an interesting relationship concerning their work on
relativity. Their interaction began in 1905, when Poincaré published his first paper on relativity.
About a month later Einstein published his first paper on relativity. Both continued publishing
work about relativity, but neither of them would reference the other’s work. Not only did Einstein
not reference Poincaré’s work, but he claimed never to have read it. On one occasion, in the text of
a lecture in 1921, Einstein did reference Poincaré and acknowledge his work on relativity. Later in
his life, Einstein would also comment on Poincaré as being one of the pioneers of relativity.
Poincaré made many contributions to different fields of applied mathematics such as: celestial
mechanics, fluid mechanics, optics, electricity, telegraphy, capillarity, elasticity, thermodynamics,
potential theory, quantum theory, theory of relativity and cosmology. In the field of differential
equations Poincaré has given many results that are critical for the qualitative theory of differential
equations, for example the Poincaré sphere and the Poincaré map.
It is that intuition that led him to discover and study so many areas of science. Poincaré is consid-
ered to be the next universalist after Gauss. After Gauss’s death in 1855 people generally believed
that there would be no one else that could master all branches of mathematics. However they
were wrong because Poincaré took all areas of mathematics as ’his province’.
(after http://planetmath.org/encyclopedia/JulesHenriPoincare.html)
Bibliography
• Mathematical methods for physicists, G. Arfken, H. Weber, Harcourt Academic Press, 5th edition,
2001.
• Finite elements for electrical engineers, P. Silvester, R. Ferrari, Cambridge University Press, 3rd
edition, 1996.
• Waves and fields in inhomogeneous media, W. Chew, Van Nostrand Reinhold, 1990.
• Numerical techniques for microwave and millimeter-wave passive structures, T. Itoh (ed.), John
Wiley, 1989.
• Photonic crystals: molding the flow of light, R. Meade, J. Joannopoulos, J. Winn, Princeton Uni-
versity Press, 1995.
• Chaos: an introduction to dynamical systems, K. Alligood, T. Sauer, J. Yorke, Springer, 3rd edi-
tion, 2000.